I have a pandas DataFrame like the one below, with a column named id that holds the group id of each row:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
    df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

    print(df)

              A         B         C         D id
    0  0.347501 -1.152416  1.441144 -0.144545  w
    1  0.775828 -1.176764  0.203049 -0.305332  w
    2  1.036246 -0.467927  0.088138 -0.438207  w
    3 -0.737092 -0.231706  0.268403  0.464026  x
    4 -1.857346 -1.420284 -0.515517 -0.231774  x
    5 -0.970731  0.217890  0.193814 -0.078838  y
    6 -0.318314 -0.244348  0.162103  1.204386  y
    7  0.340199  1.074977  1.201068 -0.431473  y
    8  0.202050  0.790434  0.643458 -0.068620  z
    9 -0.882865  0.687325 -0.008771 -0.066912  z
Now I want to create new DataFrames (named df_w, df_x, df_y, df_z) that each hold only their own rows from the original, ideally collected in some iterable (e.g. a list):
    df_w
              A         B         C         D id
    0  0.347501 -1.152416  1.441144 -0.144545  w
    1  0.775828 -1.176764  0.203049 -0.305332  w
    2  1.036246 -0.467927  0.088138 -0.438207  w

    df_x
              A         B         C         D id
    0 -0.737092 -0.231706  0.268403  0.464026  x
    1 -1.857346 -1.420284 -0.515517 -0.231774  x

    df_y
              A         B         C         D id
    0 -0.970731  0.217890  0.193814 -0.078838  y
    1 -0.318314 -0.244348  0.162103  1.204386  y
    2  0.340199  1.074977  1.201068 -0.431473  y

    df_z
              A         B         C         D id
    0  0.202050  0.790434  0.643458 -0.068620  z
    1 -0.882865  0.687325 -0.008771 -0.066912  z
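Is there a smart (vectorized, pandas-style) way to do this using groupby, apply and/or applymap with a function?
I was thinking of iterating over the DataFrame, but that does not seem very elegant.
Thanks in advance for any hints!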
We can create a dictionary of DataFrames, keyed by id:
    In [166]: dfs = {k:v for k,v in df.groupby('id')}

    In [168]: dfs.keys()
    Out[168]: dict_keys(['W', 'Y', 'Z'])

    In [169]: dfs['W']
    Out[169]:
              A         B         C         D id
    0 -0.373021 -0.555218  0.022980 -0.512323  W
    1 -1.599466  0.637292  0.045059 -0.334030  W
    2  0.100659  0.557068  0.142226 -0.186214  W

    In [170]: dfs['Y']
    Out[170]:
              A         B         C         D id
    5  0.540107 -0.739077  0.992408  2.010203  Y
    6 -0.201376 -0.913222 -0.173284  1.837442  Y
    7 -1.367659  0.915360  0.072720 -0.886071  Y

    In [171]: dfs['Z']
    Out[171]:
              A         B         C         D id
    3 -0.329087  0.842431  0.839319 -0.597823  Z
    4 -0.594375 -0.950486  1.125584  0.116599  Z
    8  0.366667 -0.978279 -1.449893  0.192451  Z
    9 -0.007439 -0.084612  0.010192 -0.417602  Z
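Since the question asked for an iterable such as a list, here is a small self-contained sketch of both variants. The seeded generator and the name dfs_list are my own additions for illustration; the dictionary comprehension is the same idea as above. Iterating a GroupBy yields (key, sub-DataFrame) pairs, and the keys come out sorted unless sort=False is passed.

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustration is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
df["id"] = ["W", "W", "W", "Z", "Z", "Y", "Y", "Y", "Z", "Z"]

# Dictionary of sub-DataFrames; group keys are sorted by default.
dfs = {key: group for key, group in df.groupby("id")}

# List of sub-DataFrames; sort=False keeps the order in which ids first appear.
dfs_list = [group for _, group in df.groupby("id", sort=False)]

print(list(dfs))                                        # ['W', 'Y', 'Z']
print([g["id"].iloc[0] for g in dfs_list])              # ['W', 'Z', 'Y']
print({key: len(group) for key, group in dfs.items()})  # {'W': 3, 'Y': 3, 'Z': 4}
```

A dictionary lookup such as dfs['W'] is usually easier to work with than separately named variables like df_w, because the set of ids does not have to be known in advance.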
UPDATE: to reset the index of each sub-DataFrame:
    In [177]: {k:v.reset_index(drop=True) for k,v in df.groupby('id')}
    Out[177]:
    {'W':           A         B         C         D id
     0 -0.373021 -0.555218  0.022980 -0.512323  W
     1 -1.599466  0.637292  0.045059 -0.334030  W
     2  0.100659  0.557068  0.142226 -0.186214  W,
     'Y':           A         B         C         D id
     0  0.540107 -0.739077  0.992408  2.010203  Y
     1 -0.201376 -0.913222 -0.173284  1.837442  Y
     2 -1.367659  0.915360  0.072720 -0.886071  Y,
     'Z':           A         B         C         D id
     0 -0.329087  0.842431  0.839319 -0.597823  Z
     1 -0.594375 -0.950486  1.125584  0.116599  Z
     2  0.366667 -0.978279 -1.449893  0.192451  Z
     3 -0.007439 -0.084612  0.010192 -0.417602  Z}
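If this is needed in more than one place, the comprehension could be wrapped in a small function. This is only a sketch: split_by_column and its reset_index flag are hypothetical names of mine, not part of pandas, and the toy frame uses integers in column A so the result is easy to check by eye.

```python
import pandas as pd


def split_by_column(frame, column, reset_index=True):
    """Hypothetical helper: map each distinct value in `column` to its rows."""
    parts = {}
    for key, group in frame.groupby(column):
        # reset_index(drop=True) renumbers rows from 0 and discards the old index.
        parts[key] = group.reset_index(drop=True) if reset_index else group
    return parts


# Toy data: column A simply records each row's original position.
df = pd.DataFrame({
    "A": range(10),
    "id": ["W", "W", "W", "Z", "Z", "Y", "Y", "Y", "Z", "Z"],
})

parts = split_by_column(df, "id")

print(list(parts["Z"].index))   # [0, 1, 2, 3]
print(parts["Z"]["A"].tolist()) # [3, 4, 8, 9]
```

With reset_index=False the sub-DataFrames keep their original row labels (3, 4, 8, 9 for 'Z'), which can be handy if the pieces need to be matched back to the source frame later.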