I have the following DataFrame:

                        c1          c2
    0  Date of transaction  2021-04-26
    1       Deposit amount      10,000
    2    Withdrawal amount         NaN
    3              Balance      10,000
    4             Comments     store_a
    5  Date of transaction  2021-04-26
    6       Deposit amount      20,000
    7    Withdrawal amount         NaN
    8              Balance      21,000
    9             Comments     store_b
I would like to get:

      Date of transaction Deposit amount Withdrawal amount Balance Comments
    0          2021-04-26         10,000               NaN  10,000  store_a
    5          2021-04-26         20,000               NaN  21,000  store_b
How can I do this?

I tried a pivot, but I got this result:

            c2
    c1 Balance Comments Date of transaction Deposit amount Withdrawal amount
    0      NaN      NaN          2021-04-26            NaN               NaN
    1      NaN      NaN                 NaN         10,000               NaN
    2      NaN      NaN                 NaN            NaN               NaN
    3   10,000      NaN                 NaN            NaN               NaN
    4      NaN  store_a                 NaN            NaN               NaN
    5      NaN      NaN          2021-04-26            NaN               NaN
    6      NaN      NaN                 NaN         20,000               NaN
    7      NaN      NaN                 NaN            NaN               NaN
    8   21,000      NaN                 NaN            NaN               NaN
    9      NaN  store_b                 NaN            NaN               NaN
I believe I need to group my data first, but I have not managed to do that.
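First, assign some kind of unique id to each transaction. Judging by the sample data, I assume a new transaction starts on every row where c1 equals "Date of transaction", so a cumulative count of those rows can serve as the id. Then pivot: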
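    # Number the transactions: the counter increases on each "Date of transaction" row
    df['transaction_no'] = df['c1'].eq('Date of transaction').cumsum()
    # Keyword arguments are used because pandas 2.0+ no longer accepts positional ones here
    df2 = df.pivot(index='transaction_no', columns='c1', values='c2')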
This gives you:

|   transaction_no | Balance   | Comments   | Date of transaction   | Deposit amount   |   Withdrawal amount |
|-----------------:|:----------|:-----------|:----------------------|:-----------------|--------------------:|
|                1 | 10,000    | store_a    | 2021-04-26            | 10,000           |                 nan |
|                2 | 21,000    | store_b    | 2021-04-26            | 20,000           |                 nan |
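For anyone who wants to reproduce this end to end, here is a minimal sketch. It rebuilds the sample data by hand (the `labels` list is just my helper name, and all values are kept as strings, which is an assumption about how the original frame was loaded). It also adds one optional step that is not part of the approach above: `pivot` sorts the resulting columns alphabetically, so the sketch reorders them to match the order in which the labels first appear in `c1`, which is the layout asked for in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (values assumed to be strings).
labels = ["Date of transaction", "Deposit amount", "Withdrawal amount",
          "Balance", "Comments"]
df = pd.DataFrame({
    "c1": labels * 2,
    "c2": ["2021-04-26", "10,000", np.nan, "10,000", "store_a",
           "2021-04-26", "20,000", np.nan, "21,000", "store_b"],
})

# Every "Date of transaction" row starts a new record, so the running
# count of those rows numbers the records 1, 2, ...
df["transaction_no"] = df["c1"].eq("Date of transaction").cumsum()

# One row per transaction, one column per label in c1.
df2 = df.pivot(index="transaction_no", columns="c1", values="c2")

# Optional: restore the order in which the labels first appear in c1,
# and drop the leftover "c1" name on the column axis.
df2 = df2[df["c1"].unique().tolist()]
df2.columns.name = None

print(df2)
```

Note that this relies on every record starting with a "Date of transaction" row. If some records are missing that row, the cumulative count will merge them into the previous transaction, so a different id would be needed.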