I have a 227x4 DataFrame containing country names and numerical values that I need to clean (wrangle?).
Here is an abstraction of the DataFrame:
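    import random
    import string

    import numpy as np
    import pandas as pd

    # Six random three-letter "country names"
    pdn = pd.DataFrame(["".join([random.choice(string.ascii_letters) for i in range(3)])
                        for j in range(6)],
                       columns=['Country Name'])
    # np.random.random_integers has been removed from NumPy; randint(1, 11) draws 1..10.
    # Cast to object so the string values below can be assigned without dtype errors.
    measures = pd.DataFrame(np.random.randint(1, 11, size=(6, 2)),
                            columns=['Measure1', 'Measure2']).astype(object)
    df = pdn.merge(measures, how='inner', left_index=True, right_index=True)
    # Plant two non-numeric values to clean up
    df.iloc[4, 1] = 'str'
    df.iloc[1, 2] = 'stuff'
    print(df)

      Country Name Measure1 Measure2
    0          tua        6        3
    1          MDK        3    stuff
    2          RJU        7        2
    3          WyB        7        8
    4          Nnr      str        3
    5          rVN        7        4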
How can I replace the string values with np.nan across all the measure columns without changing the country names?
I tried using a boolean mask:
    mask = df.loc[:,measures.columns].applymap(lambda x: isinstance(x, (int, float))).values
    print(mask)

    [[ True  True]
     [ True False]
     [ True  True]
     [ True  True]
     [False  True]
     [ True  True]]

    # I thought the following would replace by default false with np.nan in place, but it didn't
    df.loc[:,measures.columns].where(mask, inplace=True)
    print(df)

      Country Name Measure1 Measure2
    0          tua        6        3
    1          MDK        3    stuff
    2          RJU        7        2
    3          WyB        7        8
    4          Nnr      str        3
    5          rVN        7        4

    # this give a good output, unfortunately it's missing the country names
    print(df.loc[:,measures.columns].where(mask))

      Measure1 Measure2
    0        6        3
    1        3      NaN
    2        7        2
    3        7        8
    4      NaN        3
    5        7        4
Assign back only to the columns of interest:
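    cols = ['Measure1','Measure2']
    # True where the cell holds a number, False where it holds a string
    # (applymap is deprecated since pandas 2.1; DataFrame.map is its replacement)
    mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))

    # where() keeps values where mask is True and puts NaN elsewhere;
    # assigning the result to df[cols] leaves 'Country Name' untouched
    df[cols] = df[cols].where(mask)
    print (df)

      Country Name Measure1 Measure2
    0          uFv        7        8
    1          vCr        5      NaN
    2          qPp        2        6
    3          QIC       10       10
    4          Suy      NaN        8
    5          eFS        6        4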
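As a possible alternative to the mask, here is a minimal sketch using `pd.to_numeric` with `errors="coerce"`, which turns anything that cannot be parsed as a number into NaN. The sample data is hypothetical, hand-written to mirror the question's layout. Note that this is not quite equivalent to the `isinstance` check: a numeric string such as `'5'` would be converted to a number rather than replaced, and the cleaned columns come back with a numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "Country Name": ["tua", "MDK", "RJU", "WyB", "Nnr", "rVN"],
    "Measure1": [6, 3, 7, 7, "str", 7],
    "Measure2": [3, "stuff", 2, 8, 3, 4],
})

cols = ["Measure1", "Measure2"]

# Coerce each measure column to numeric; unparseable values become NaN.
# Only the listed columns are reassigned, so "Country Name" is untouched.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df)
```

This may be preferable when the columns should end up numeric anyway, since later arithmetic or aggregation then works without a further conversion step.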
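A meta question: is it normal that asking a question here (including the research) takes me more than 3 hours?
I think so; asking a good question is really hard.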