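I have a large array in which each row is a time series, so the order of the elements within a row must be preserved.
I want to select a random window of a given size from each row.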
>>> import numpy as np
>>> arr = np.array(range(42)).reshape(6, 7)
>>> arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>> # What I want to do:
>>> select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [38, 39, 40]])
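Ideally, the solution would look something like this:

def select_random_windows(arr, window_size):
    # One start offset per row; randint's upper bound is exclusive
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    return arr[:, offsets: offsets + window_size]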
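Unfortunately, this does not work: slice bounds must be scalar integers, so passing the offsets array as a slice bound raises a TypeError.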
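What I am doing right now works, but it loops over the rows in Python:

def select_random_windows(arr, window_size):
    n_rows, n_cols = arr.shape
    # One start offset per row; randint's upper bound is exclusive
    offsets = np.random.randint(0, n_cols - window_size + 1, size=n_rows)
    result = []
    for row, offset in enumerate(offsets):
        result.append(arr[row, offset: offset + window_size])
    return np.array(result)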
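Sure, I could use a list comprehension instead (and get a minimal speed boost), but I was wondering whether there is some clever vectorized NumPy way to do this.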
Here's one approach that leverages np.lib.stride_tricks.as_strided:
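def random_windows_per_row_strided(arr, W=3):
    # One random start offset per row (upper bound is exclusive)
    idx = np.random.randint(0, arr.shape[1] - W + 1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided
    m, n = arr.shape
    s0, s1 = arr.strides
    # View of every length-W window in every row: shape (m, n-W+1, W), no copy
    windows = strided(arr, shape=(m, n - W + 1, W), strides=(s0, s1, s1))
    # Pick one window per row with integer-array indexing
    return windows[np.arange(len(idx)), idx]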
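On NumPy 1.20 or newer, the same kind of window view can be built with sliding_window_view, which works out the shape and strides itself and returns a read-only view, so there is less room for a stride mistake. A minimal sketch, assuming that NumPy version; the function name random_windows_per_row_swv and the optional rng parameter are my own and not part of the answer above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def random_windows_per_row_swv(arr, W=3, rng=None):
    """Pick one random length-W window from each row of a 2-D array."""
    # rng is a hypothetical convenience parameter for reproducible draws
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    # One start offset per row; the upper bound is exclusive, so n - W + 1
    # still allows the last valid window to be chosen
    idx = rng.integers(0, n - W + 1, size=m)
    # Read-only view of shape (m, n - W + 1, W); no data is copied here
    windows = sliding_window_view(arr, W, axis=1)
    # Integer-array indexing selects one window per row and returns a copy
    return windows[np.arange(m), idx]


arr = np.arange(42).reshape(6, 7)
out = random_windows_per_row_swv(arr, W=3, rng=np.random.default_rng(0))
print(out.shape)  # one window of length 3 per row
```

Each returned row is a contiguous slice of the corresponding input row, so the time ordering within a window is preserved.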
Runtime test on a bigger array with 100,000 rows:
In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop
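The select_random_windows timed above comes from another answer (@Psidom's) and is not reproduced in this post. For readers who want a vectorized baseline to compare against, here is a sketch of one common approach that builds explicit row and column index arrays with broadcasting. This is my own illustration and may differ from the function that was actually timed; the name random_windows_per_row_indexed and the rng parameter are hypothetical:

```python
import numpy as np


def random_windows_per_row_indexed(arr, W=3, rng=None):
    """Pick one random length-W window per row using broadcast index arrays."""
    # rng is a hypothetical convenience parameter for reproducible draws
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    # One start offset per row (upper bound is exclusive)
    starts = rng.integers(0, n - W + 1, size=m)
    # Column indices of shape (m, W): each row is start, start+1, ..., start+W-1
    cols = starts[:, None] + np.arange(W)
    # Row indices of shape (m, 1) broadcast against cols to select (m, W) items
    rows = np.arange(m)[:, None]
    return arr[rows, cols]


arr = np.arange(42).reshape(6, 7)
print(random_windows_per_row_indexed(arr, W=3, rng=np.random.default_rng(0)))
```

This version materializes an (m, W) index array, so its cost grows with the window size, whereas the strided version only indexes into a view with one offset per row.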