我需要有效地计算给定数组中每个点到另一个数组中每个其他点的欧几里德 加权 距离。这是我拥有的代码,可以正常工作:x,y``x,y
x,y``x,y
import numpy as np import random def rand_data(integ): ''' Function that generates 'integ' random values between [0.,1.) ''' rand_dat = [random.random() for _ in range(integ)] return rand_dat def weighted_dist(indx, x_coo, y_coo): ''' Function that calculates *weighted* euclidean distances. ''' dist_point_list = [] # Iterate through every point in array_2. for indx2, x_coo2 in enumerate(array_2[0]): y_coo2 = array_2[1][indx2] # Weighted distance in x. x_dist_weight = (x_coo-x_coo2)/w_data[0][indx] # Weighted distance in y. y_dist_weight = (y_coo-y_coo2)/w_data[1][indx] # Weighted distance between point from array_1 passed and this point # from array_2. dist = np.sqrt(x_dist_weight**2 + y_dist_weight**2) # Append weighted distance value to list. dist_point_list.append(round(dist, 8)) return dist_point_list # Generate random x,y data points. array_1 = np.array([rand_data(10), rand_data(10)], dtype=float) # Generate weights for each x,y coord for points in array_1. w_data = np.array([rand_data(10), rand_data(10)], dtype=float) # Generate second larger array. array_2 = np.array([rand_data(100), rand_data(100)], dtype=float) # Obtain *weighted* distances for every point in array_1 to every point in array_2. dist = [] # Iterate through every point in array_1. for indx, x_coo in enumerate(array_1[0]): y_coo = array_1[1][indx] # Call function to get weighted distances for this point to every point in # array_2. dist.append(weighted_dist(indx, x_coo, y_coo))
最终列表dist包含的子列表数量与第一个数组中的点数量相同,每个元素中的元素数量与第二个数组中的点数量相同(加权距离)。
dist
我想知道是否有一种方法可以提高此代码的效率,也许使用cdist函数,因为当数组中有很多元素(在我的情况下,它们具有)并且必须检查时,此过程会变得非常昂贵许多数组的距离(我也有)
@Evan和@Martinis Group处在正确的轨道上- 扩大Evan的答案,以下是一个函数,该函数使用广播快速计算n维加权欧式距离,而无需Python循环:
import numpy as np def fast_wdist(A, B, W): """ Compute the weighted euclidean distance between two arrays of points: D{i,j} = sqrt( ((A{0,i}-B{0,j})/W{0,i})^2 + ... + ((A{k,i}-B{k,j})/W{k,i})^2 ) inputs: A is an (k, m) array of coordinates B is an (k, n) array of coordinates W is an (k, m) array of weights returns: D is an (m, n) array of weighted euclidean distances """ # compute the differences and apply the weights in one go using # broadcasting jujitsu. the result is (n, k, m) wdiff = (A[np.newaxis,...] - B[np.newaxis,...].T) / W[np.newaxis,...] # square and sum over the second axis, take the sqrt and transpose. the # result is an (m, n) array of weighted euclidean distances D = np.sqrt((wdiff*wdiff).sum(1)).T return D
为了检查是否可以正常工作,我们将其与使用嵌套Python循环的较慢版本进行比较:
def slow_wdist(A, B, W): k,m = A.shape _,n = B.shape D = np.zeros((m, n)) for ii in xrange(m): for jj in xrange(n): wdiff = (A[:,ii] - B[:,jj]) / W[:,ii] D[ii,jj] = np.sqrt((wdiff**2).sum()) return D
首先,让我们确保两个函数给出相同的答案:
# make some random points and weights def setup(k=2, m=100, n=300): return np.random.randn(k,m), np.random.randn(k,n),np.random.randn(k,m) a, b, w = setup() d0 = slow_wdist(a, b, w) d1 = fast_wdist(a, b, w) print np.allclose(d0, d1) # True
不用说,使用广播而不是Python循环的版本要快几个数量级:
%%timeit a, b, w = setup() slow_wdist(a, b, w) # 1 loops, best of 3: 647 ms per loop %%timeit a, b, w = setup() fast_wdist(a, b, w) # 1000 loops, best of 3: 620 us per loop