我有一个形状的3D numpy数组(t, n1, n2):
(t, n1, n2)
x = np.random.rand(10, 2, 4)
我需要计算另一个3D数组y,其形状(t, n1, n1)如下:
y
(t, n1, n1)
y[0] = np.cov(x[0,:,:])
…等等,沿第一个轴的所有切片。
因此,一个循环实现将是:
y = np.zeros((10,2,2)) for i in np.arange(x.shape[0]): y[i] = np.cov(x[i, :, :])
有什么方法可以向量化它,以便我可以一次性计算所有协方差矩阵吗?我试着做:
x1 = x.swapaxes(1, 2) y = np.dot(x, x1)
但这没有用。
入侵numpy.cov source code并尝试使用默认参数。事实证明,np.cov(x[i,:,:])将很简单:
numpy.cov source code
np.cov(x[i,:,:])
N = x.shape[2] m = x[i,:,:] m -= np.sum(m, axis=1, keepdims=True) / N cov = np.dot(m, m.T) /(N - 1)
因此,任务是对这个循环进行矢量化处理,该循环将一次性遍历i并处理所有数据x。同样,我们可以broadcasting在第三步中使用。对于最后一步,我们将sum- reduction沿第一轴上的所有切片在此处执行。可以通过向量化方式有效地实现np.einsum。因此,最终的实现是这样的-
i
x
broadcasting
sum- reduction
np.einsum
N = x.shape[2] m1 = x - x.sum(2,keepdims=1)/N y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
运行时测试
In [155]: def original_app(x): ...: n = x.shape[0] ...: y = np.zeros((n,2,2)) ...: for i in np.arange(x.shape[0]): ...: y[i]=np.cov(x[i,:,:]) ...: return y ...: ...: def proposed_app(x): ...: N = x.shape[2] ...: m1 = x - x.sum(2,keepdims=1)/N ...: out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1) ...: return out ...: In [156]: # Setup inputs ...: n = 10000 ...: x = np.random.rand(n,2,4) ...: In [157]: np.allclose(original_app(x),proposed_app(x)) Out[157]: True # Results verified In [158]: %timeit original_app(x) 1 loops, best of 3: 610 ms per loop In [159]: %timeit proposed_app(x) 100 loops, best of 3: 6.32 ms per loop
巨大的加速!