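I have a DataFrame in which one column holds a label for each row (alongside some other data for that row). I also have a dictionary whose keys are the possible labels and whose values are 2-tuples of information related to each label. I'd like to add two new columns to the frame, one for each element of the 2-tuple that corresponds to the row's label.
Here is the setup: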
    import pandas as pd
    import numpy as np

    np.random.seed(1)
    n = 10

    labels = list('abcdef')
    colors = ['red', 'green', 'blue']
    sizes = ['small', 'medium', 'large']

    labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

    df = pd.DataFrame({'label': np.random.choice(labels, n),
                       'somedata': np.random.randn(n)})
I can get what I want by running:
    df['color'], df['size'] = zip(*df['label'].map(labeldict))
    print(df)

      label  somedata  color    size
    0     b  0.196643    red  medium
    1     c -1.545214  green   small
    2     a -0.088104  green   small
    3     c  0.852239  green   small
    4     b  0.677234    red  medium
    5     c -0.106878  green   small
    6     a  0.725274  green   small
    7     d  0.934889    red  medium
    8     a  1.118297  green   small
    9     c  0.055613  green   small
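But what if I don't want to type out the two columns by hand on the left-hand side of the assignment? In other words, how can I create multiple new columns dynamically? For example, if labeldict held 10-tuples instead of 2-tuples, this would be a real pain as currently written. Here are a couple of things that don't work: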
    # set up attrlist for later use
    attrlist = ['color', 'size']

    # non-working idea 1)
    df[attrlist] = zip(*df['label'].map(labeldict))

    # non-working idea 2)
    df.loc[:, attrlist] = zip(*df['label'].map(labeldict))
The following does work, but it feels like a hack:
    for a in attrlist:
        df[a] = 0
    df[attrlist] = zip(*df['label'].map(labeldict))
Is there a better solution?
You can use a merge instead:
    >>> ld = pd.DataFrame(labeldict).T
    >>> ld.columns = ['color', 'size']
    >>> ld.index.name = 'label'
    >>> df.merge(ld.reset_index(), on='label')
      label  somedata  color    size
    0     b  1.462108    red  medium
    1     c -2.060141  green   small
    2     c  1.133769  green   small
    3     c  0.042214  green   small
    4     e -0.322417    red  medium
    5     e -1.099891    red  medium
    6     e -0.877858    red  medium
    7     e  0.582815    red  medium
    8     f -0.384054    red   large
    9     d -0.172428    red  medium
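A closely related variant, sketched here under a few assumptions, is to build the lookup table once and join it on the label column. The small fixed `labeldict` and `df` below are hypothetical stand-ins for the randomly generated data in the question, and `lookup` and `result` are names introduced only for illustration. The column names come from `attrlist`, so the same code should carry over to longer tuples by extending that list.

```python
import pandas as pd

# Hypothetical fixed inputs (not the random data from the question).
labeldict = {
    'a': ('green', 'small'),
    'b': ('red', 'medium'),
    'c': ('green', 'small'),
}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'c', 'a', 'c'],
                   'somedata': [0.1, 0.2, 0.3, 0.4]})

# One row per label, one column per tuple element, named from attrlist.
lookup = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# Left join of df's 'label' column against lookup's index.
# The rows of df keep their original order and index.
result = df.join(lookup, on='label')
print(result)
```

Unlike the two-target assignment in the question, nothing here depends on the number of tuple elements, and a label missing from the dictionary would simply produce NaN in the new columns rather than raising.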