I have been trying to select a specific set of columns, for all rows, from a dataset. I tried something like the following:
train_features = train_df.loc[,[0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
To clarify: I want every row, but only the columns at the listed positions. Is there a better way to do this?
Sample data:
age  job        marital  education    default  housing  loan  equities  contact  duration  campaign  pdays  previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m    nr.employed  y
56   housemaid  married  basic.4y     1        1        1     1         0        261       1         999    0         2         1.1           93.994          -36.4          3.299552287  5191         1
37   services   married  high.school  1        0        1     1         0        226       1         999    0         2         1.1           93.994          -36.4          0.743751247  5191         1
56   services   married  high.school  1        1        0     1         0        307       1         999    0         2         1.1           93.994          -36.4          1.28265179   5191         1
I am trying to leave out the job, marital, education and y columns of my dataset. The y column is the target variable.
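To reproduce the examples, the sample above can be loaded into a DataFrame. This is a minimal sketch that assumes the sample is whitespace-separated exactly as shown; it also confirms that the hand-written positions are precisely "every column except job, marital, education and y".

```python
from io import StringIO

import pandas as pd

# Sample data from the question (assumed whitespace-separated)
data = """age job marital education default housing loan equities contact duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed y
56 housemaid married basic.4y 1 1 1 1 0 261 1 999 0 2 1.1 93.994 -36.4 3.299552287 5191 1
37 services married high.school 1 0 1 1 0 226 1 999 0 2 1.1 93.994 -36.4 0.743751247 5191 1
56 services married high.school 1 1 0 1 0 307 1 999 0 2 1.1 93.994 -36.4 1.28265179 5191 1
"""
train_df = pd.read_csv(StringIO(data), sep=r"\s+")

# Positions of every column that is not excluded
excluded = ["job", "marital", "education", "y"]
positions = [i for i, c in enumerate(train_df.columns) if c not in excluded]
print(positions)  # [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
```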
If you need to select by position, use iloc:
train_features = train_df.iloc[:, [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999
1   37        1        0     1         1        0       226         1    999
2   56        1        1     0         1        0       307         1    999

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552
1         0         2           1.1          93.994          -36.4   0.743751
2         0         2           1.1          93.994          -36.4   1.282652

   nr.employed
0         5191
1         5191
2         5191
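Rather than typing out every position, the list can be generated. A minimal sketch, assuming NumPy is available: numpy.r_ expands the slice 4:19 into 4, 5, ..., 18 and prepends 0. The sample-data setup is repeated so the snippet runs on its own.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """age job marital education default housing loan equities contact duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed y
56 housemaid married basic.4y 1 1 1 1 0 261 1 999 0 2 1.1 93.994 -36.4 3.299552287 5191 1
37 services married high.school 1 0 1 1 0 226 1 999 0 2 1.1 93.994 -36.4 0.743751247 5191 1
56 services married high.school 1 1 0 1 0 307 1 999 0 2 1.1 93.994 -36.4 1.28265179 5191 1
"""
train_df = pd.read_csv(StringIO(data), sep=r"\s+")

# np.r_ concatenates the scalar 0 with the range 4..18 (slice end is exclusive)
cols = np.r_[0, 4:19]
train_features = train_df.iloc[:, cols]

# Same result as spelling the positions out by hand
explicit = train_df.iloc[:, [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]]
print(train_features.equals(explicit))  # True
```

Without NumPy, [0, *range(4, 19)] builds the same list in plain Python.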
Another solution is to drop the columns you don't need:
cols = ['job', 'marital', 'education', 'y']
train_features = train_df.drop(cols, axis=1)
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999
1   37        1        0     1         1        0       226         1    999
2   56        1        1     0         1        0       307         1    999

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552
1         0         2           1.1          93.994          -36.4   0.743751
2         0         2           1.1          93.994          -36.4   1.282652

   nr.employed
0         5191
1         5191
2         5191
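Selecting by name keeps working even if the column order changes later. A sketch of two equivalent label-based forms: drop with the columns= keyword (the same as axis=1), and loc with a boolean mask over the column labels. loc needs : for the row part, which is what loc[, ...] was missing, and it expects labels or a mask rather than integer positions. Keeping the target y as a separate Series is shown as an illustrative extra.

```python
from io import StringIO

import pandas as pd

data = """age job marital education default housing loan equities contact duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed y
56 housemaid married basic.4y 1 1 1 1 0 261 1 999 0 2 1.1 93.994 -36.4 3.299552287 5191 1
37 services married high.school 1 0 1 1 0 226 1 999 0 2 1.1 93.994 -36.4 0.743751247 5191 1
56 services married high.school 1 1 0 1 0 307 1 999 0 2 1.1 93.994 -36.4 1.28265179 5191 1
"""
train_df = pd.read_csv(StringIO(data), sep=r"\s+")

cols = ["job", "marital", "education", "y"]

# Keyword form of drop, equivalent to drop(cols, axis=1)
by_drop = train_df.drop(columns=cols)

# Label-based selection: all rows (:), columns whose name is not in cols
by_loc = train_df.loc[:, ~train_df.columns.isin(cols)]

# Target variable kept separately (illustrative)
target = train_df["y"]

print(by_drop.equals(by_loc))  # True
```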