I have a multi-indexed DataFrame df. The columns are the different "objects" I am analyzing. The row index has two levels: a case ID lc and a time t.
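For each case lc, I need to find the time t at which each object reaches a target value (ideally interpolated, but the nearest sampled value is good enough).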
This target value is a function of the given object's value at time t == 0.
    import pandas as pd
    print(pd.__version__)  # 0.16.2
Example dummy dataset:
    data = {
        1: {(1014, 0.0): 20.25, (1014, 0.0991): 19.08, (1014, 0.1991): 18.43, (1014, 0.2991): 19.03, (1014, 0.3991): 18.71,
            (1015, 0.0): 20.22, (1015, 0.0991): 19.3, (1015, 0.1991): 18.68, (1015, 0.2991): 18.22, (1015, 0.3991): 17.84,
            (1016, 0.0): 21.75, (1016, 0.0991): 19.97, (1016, 0.1991): 19.65, (1016, 0.2991): 19.29, (1016, 0.3991): 18.94},
        2: {(1014, 0.0): 29.11, (1014, 0.0991): 28.68, (1014, 0.1991): 28.27, (1014, 0.2991): 27.46, (1014, 0.3991): 26.96,
            (1015, 0.0): 29.22, (1015, 0.0991): 28.64, (1015, 0.1991): 28.18, (1015, 0.2991): 27.74, (1015, 0.3991): 27.25,
            (1016, 0.0): 29.17, (1016, 0.0991): 28.68, (1016, 0.1991): 28.17, (1016, 0.2991): 27.68, (1016, 0.3991): 27.18},
        3: {(1014, 0.0): 22.01, (1014, 0.0991): 21.5, (1014, 0.1991): 21.18, (1014, 0.2991): 20.58, (1014, 0.3991): 20.21,
            (1015, 0.0): 21.81, (1015, 0.0991): 21.46, (1015, 0.1991): 21.11, (1015, 0.2991): 20.78, (1015, 0.3991): 20.42,
            (1016, 0.0): 21.82, (1016, 0.0991): 21.49, (1016, 0.1991): 21.11, (1016, 0.2991): 20.75, (1016, 0.3991): 20.37},
    }
    # sort_index() sorts by the (case, t) index; the original post used .sort(),
    # which later pandas versions removed.
    df = pd.DataFrame(data).sort_index()
    df.index.names = ['case', 't']
The DataFrame looks like this:
                     1      2      3
    case t
    1014 0.0000  20.25  29.11  22.01
         0.0991  19.08  28.68  21.50
         0.1991  18.43  28.27  21.18
         0.2991  19.03  27.46  20.58
         0.3991  18.71  26.96  20.21
    1015 0.0000  20.22  29.22  21.81
         0.0991  19.30  28.64  21.46
         0.1991  18.68  28.18  21.11
         0.2991  18.22  27.74  20.78
         0.3991  17.84  27.25  20.42
    1016 0.0000  21.75  29.17  21.82
         0.0991  19.97  28.68  21.49
         0.1991  19.65  28.17  21.11
         0.2991  19.29  27.68  20.75
         0.3991  18.94  27.18  20.37
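The target value is the value at t == 0 multiplied by a factor k. Typically k = 0.5, for a half-life period. For the current sample, we will take k = 0.926.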
Since the values are sorted, the first row of each case can be used.
    targets = df.groupby(level='case').first() * 0.926
    print(targets)

                 1         2         3
    case
    1014  18.75150  26.95586  20.38126
    1015  18.72372  27.05772  20.19606
    1016  20.14050  27.01142  20.20532
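Taking the first row per case works only if every case is sorted by t and really starts at t == 0. A minimal sketch of a more explicit variant, which selects the t == 0 rows with xs, is shown here. The small frame and the names small, k, initial and targets_explicit are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Small illustrative frame with the same (case, t) index layout as df.
idx = pd.MultiIndex.from_tuples(
    [(1014, 0.0), (1014, 0.0991), (1015, 0.0), (1015, 0.0991)],
    names=['case', 't'])
small = pd.DataFrame({1: [20.25, 19.08, 20.22, 19.30],
                      2: [29.11, 28.68, 29.22, 28.64]}, index=idx)

k = 0.926  # fraction of the initial value that defines the target

# Select the rows at t == 0 explicitly instead of relying on sort order.
# xs drops the 't' level, so the result is indexed by case only.
initial = small.xs(0.0, level='t')
targets_explicit = initial * k
print(targets_explicit)
```

With this form, a case that has no sample at exactly t == 0 is simply missing from the result rather than silently using its earliest row.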
Now, how can I simply build the following DataFrame, which shows the time t at which each object reaches the target value computed above?
               1       2       3
    case
    1014  0.3991  0.3991  0.2991
    1015  0.1991  0.3991  0.3991
    1016  0.0991  0.3991  0.3991
This is a bit hacky, so let's see whether someone comes up with a better solution:
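    import numpy as np

    # Add a zero 't' column so that subtracting targets leaves the time values unchanged.
    targets['t'] = 0
    df2 = df.reset_index().set_index('case') - targets
    # Per case and column, flag the row whose absolute distance to the target is smallest.
    df3 = df2.groupby(df2.index).transform(lambda x: x.abs() == np.min(x.abs()))
    # Pick the time of the flagged row for each column.
    df4 = pd.DataFrame({'1': df2.t[df3[1]], '2': df2.t[df3[2]], '3': df2.t[df3[3]]})
    print(df4)

               1       2       3
    case
    1014  0.3991  0.3991  0.3991
    1015  0.1991  0.3991  0.3991
    1016  0.0991  0.3991  0.3991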
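The question also asks for an interpolated time. The following is only a sketch of one way to do that, under two assumptions that are not stated in the post: the values decay (k < 1), and "reaching the target" means the first sample at or below it. The helper time_to_target and the frame demo are hypothetical names introduced for illustration. Note that this first-crossing rule is not the same as the nearest-row lookup above, so the two can disagree on non-monotonic data.

```python
import numpy as np
import pandas as pd

def time_to_target(frame, k, interpolate=True):
    """Hypothetical helper: per case and column, the time at which the value
    first drops to k * (value at the first time step of that case)."""
    rows = {}
    for case, group in frame.groupby(level='case'):
        t = group.index.get_level_values('t').values
        row = {}
        for col in group.columns:
            y = group[col].values
            target = k * y[0]
            below = np.nonzero(y <= target)[0]  # positions at or below target
            if len(below) == 0:
                row[col] = np.nan               # target never reached
            elif below[0] == 0 or not interpolate:
                row[col] = t[below[0]]          # first sampled time at/below target
            else:
                i = below[0]
                # Linear interpolation between samples i-1 and i,
                # where y[i-1] > target >= y[i].
                frac = (y[i - 1] - target) / (y[i - 1] - y[i])
                row[col] = t[i - 1] + frac * (t[i] - t[i - 1])
        rows[case] = row
    result = pd.DataFrame.from_dict(rows, orient='index')
    result.index.name = 'case'
    return result

# Illustrative data: case 1015, object 1 from the example dataset.
idx = pd.MultiIndex.from_tuples(
    [(1015, 0.0), (1015, 0.0991), (1015, 0.1991), (1015, 0.2991), (1015, 0.3991)],
    names=['case', 't'])
demo = pd.DataFrame({1: [20.22, 19.30, 18.68, 18.22, 17.84]}, index=idx)

interpolated = time_to_target(demo, k=0.926)
sampled = time_to_target(demo, k=0.926, interpolate=False)
print(interpolated)
print(sampled)
```

The explicit loop trades speed for readability. For large frames a vectorized approach would likely be preferable, but the crossing logic is easier to check in this form.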