我想在Google BigQuery中使用现有的每日收入数据创建一个新表,并使用基于现有数据并需要创建的预测数据扩展该新表。一旦存在某天的新实际数据,它将覆盖该天的预测数据。而且,直到月底的预测数据都将再次更新。
到目前为止,我想出了以下内容,它会生成一条 错误消息 :Scalar subquery produced more than oneelement
Scalar subquery produced more than oneelement
SELECT date, sum(yl_revenue), 'ACTUAL' as type from project.dataset.table where date >"2020-01-01" and date < current_date() group by date union distinct SELECT (select calendar_date FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), INTERVAL 1 DAY)) AS calendar_date), avg(revenue_daily) as average_daily_revenue, 'FORECAST' as type FROM (SELECT sum(revenue) as revenue_daily from project.dataset.table WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date)
我希望数据看起来如何:
+------------+------------+----------+ | date | revenue | type | +------------+------------+----------+ | 01.04.2020 | 100 € | ACTUAL | | … | 5.000 € | ACTUAL | | 23.04.2020 | 200 € | ACTUAL | | 24.04.2020 | 230,43 € | FORECAST | | 25.04.2020 | 230,43 € | FORECAST | | 26.04.2020 | 230,43 € | FORECAST | | 27.04.2020 | 230,43 € | FORECAST | | 28.04.2020 | 230,43 € | FORECAST | | 29.04.2020 | 230,43 € | FORECAST | | 30.04.2020 | 230,43 € | FORECAST | +------------+------------+----------+
在第二天(2020年4月24日),它应如下所示:
+------------+--------------+----------+ | date | revenue | type | +------------+--------------+----------+ | 01.04.2020 | 100 € | ACTUAL | | … | 5.000 € | ACTUAL | | 23.04.2020 | 200 € | ACTUAL | | 24.04.2020 | 1.000,00 € | ACTUAL | <---- | 25.04.2020 | 262,50 € | FORECAST | | 26.04.2020 | 262,50 € | FORECAST | | 27.04.2020 | 262,50 € | FORECAST | | 28.04.2020 | 262,50 € | FORECAST | | 29.04.2020 | 262,50 € | FORECAST | | 30.04.2020 | 262,50 € | FORECAST | +------------+--------------+----------+
预测值只是该月的实际收入之和除以该月到目前为止的天数。请注意,第二个表中的每日预测值已更改,因为已将新的实际值添加到该表中。
非常感谢您提供有关如何解决此问题的帮助!
谢谢
一月
我找到了解决问题的方法。(尽管它可能不是最复杂的一种)
我现在想出了3个新表:
MERGE
以下是各个查询:
1)
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2030-12-31', INTERVAL 1 DAY)) AS calendar_date WITH OFFSET AS OFFSET ORDER BY OFFSET
2)
SELECT date, 'actual' AS type, ROUND(SUM(revenue),2) FROM `project.dataset.revenue_data` WHERE EXTRACT(year FROM date) = EXTRACT (year FROM CURRENT_DATE()) AND EXTRACT(month FROM date) = EXTRACT (month FROM CURRENT_DATE()) GROUP BY date UNION DISTINCT SELECT calendar_date, 'forecast', ( SELECT ROUND(AVG(revenue_daily),2) FROM ( SELECT SUM(revenue) AS revenue_daily FROM `project.dataset.revenue_data` WHERE EXTRACT(year FROM date) = EXTRACT (year FROM CURRENT_DATE()) AND EXTRACT(month FROM date) = EXTRACT (month FROM CURRENT_DATE()) GROUP BY date ORDER BY date) AS average_daily_revenue), FROM `project.dataset.calendar` WHERE calendar_date >= CURRENT_DATE() AND calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY) ORDER BY date
3)
MERGE `project.dataset.forecast_table` f USING `project.dataset.forecast_month` m ON f.date = m.date WHEN MATCHED THEN UPDATE SET f.type = m.type, f.revenue = m.revenue WHEN NOT MATCHED AND m.date >= CURRENT_DATE() THEN INSERT (date, type, revenue) VALUES (date, type, revenue)