在寻找和理解pythonic方法以优化嵌套的for循环中的以下数组操作时,我将不胜感激:
def _func(a, b, radius): "Return 0 if a>b, otherwise return 1" if distance.euclidean(a, b) < radius: return 1 else: return 0 def _make_mask(volume, roi, radius): mask = numpy.zeros(volume.shape) for x in range(volume.shape[0]): for y in range(volume.shape[1]): for z in range(volume.shape[2]): mask[x, y, z] = _func((x, y, z), roi, radius) return mask
其中volume.shape(182、218、200)和roi.shape(3)都是ndarray类型;并且radius是int
volume.shape
roi.shape
ndarray
radius
int
方法1
这是向量化方法-
m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] mask = X**2 + Y**2 + Z**2 < radius**2
可能的改进:我们可以通过numexpr模块加快最后一步的速度-
numexpr
import numexpr as ne mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
方法#2
我们还可以逐步构建与形状参数相对应的三个范围,并在运行中对三个元素进行减法运算,roi而无需像之前使用那样实际创建网格np.mgrid。通过broadcasting出于效率目的使用会受益。实现看起来像这样-
roi
np.mgrid
broadcasting
m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \ ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) mask = vals < radius**2
简化版:感谢@Bi Rico在这里提出改进建议,因为我们可以使用它np.ogrid以更简洁的方式执行这些操作,例如-
np.ogrid
m,n,r = volume.shape x,y,z = np.ogrid[0:m,0:n,0:r]-roi mask = (x**2+y**2+z**2) < radius**2
运行时测试
功能定义-
def vectorized_app1(volume, roi, radius): m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] return X**2 + Y**2 + Z**2 < radius**2 def vectorized_app1_improved(volume, roi, radius): m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2') def vectorized_app2(volume, roi, radius): m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \ ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) return vals < radius**2 def vectorized_app2_simplified(volume, roi, radius): m,n,r = volume.shape x,y,z = np.ogrid[0:m,0:n,0:r]-roi return (x**2+y**2+z**2) < radius**2
时间-
In [106]: # Setup input arrays ...: volume = np.random.rand(90,110,100) # Half of original input sizes ...: roi = np.random.rand(3) ...: radius = 3.4 ...: In [107]: %timeit _make_mask(volume, roi, radius) 1 loops, best of 3: 41.4 s per loop In [108]: %timeit vectorized_app1(volume, roi, radius) 10 loops, best of 3: 62.3 ms per loop In [109]: %timeit vectorized_app1_improved(volume, roi, radius) 10 loops, best of 3: 47 ms per loop In [110]: %timeit vectorized_app2(volume, roi, radius) 100 loops, best of 3: 4.26 ms per loop In [139]: %timeit vectorized_app2_simplified(volume, roi, radius) 100 loops, best of 3: 4.36 ms per loop
因此,一如既往地broadcasting显示出疯狂地 10,000x 加快原始代码速度的魔力,并且比 10x 使用即时广播的操作创建网格要好得多!
10,000x
10x