小编典典

继续使用现有表,直到月底为止包含预测数据并每天更新

sql

我想在Google BigQuery中使用现有的每日收入数据创建一个新表,并使用基于现有数据并需要创建的预测数据扩展该新表。一旦存在某天的新实际数据,它将覆盖该天的预测数据。而且,直到月底的预测数据都将再次更新。

到目前为止,我想出了以下内容,它会生成一条 错误消息Scalar subquery produced more than oneelement

    SELECT
        date, sum(yl_revenue), 'ACTUAL' as type 
        from project.dataset.table 
        where date >"2020-01-01" and date < current_date() 
        group by date 
        union distinct

        SELECT 
        (select calendar_date 
    FROM 
UNNEST(GENERATE_DATE_ARRAY('2020-01-01', DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), INTERVAL 1 DAY)) 
AS calendar_date), 
        avg(revenue_daily) as average_daily_revenue, 
        'FORECAST' as type FROM 
            (SELECT sum(revenue) as revenue_daily from project.dataset.table 
    WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date)

我希望数据看起来如何:

 +------------+------------+----------+
|    date    |  revenue   |   type   |
+------------+------------+----------+
| 01.04.2020 | 100 €      | ACTUAL   |
| …          | 5.000 €    | ACTUAL   |
| 23.04.2020 | 200 €      | ACTUAL   |
| 24.04.2020 |  230,43 €  | FORECAST |
| 25.04.2020 |  230,43 €  | FORECAST |
| 26.04.2020 |  230,43 €  | FORECAST |
| 27.04.2020 |  230,43 €  | FORECAST |
| 28.04.2020 |  230,43 €  | FORECAST |
| 29.04.2020 |  230,43 €  | FORECAST |
| 30.04.2020 |  230,43 €  | FORECAST |
+------------+------------+----------+

在第二天(2020年4月24日),它应如下所示:

+------------+--------------+----------+
|    date    |   revenue    |   type   |
+------------+--------------+----------+
| 01.04.2020 | 100 €        | ACTUAL   |
| …          | 5.000 €      | ACTUAL   |
| 23.04.2020 | 200 €        | ACTUAL   |
| 24.04.2020 |  1.000,00 €  | ACTUAL   | <----
| 25.04.2020 |  262,50 €    | FORECAST |
| 26.04.2020 |  262,50 €    | FORECAST |
| 27.04.2020 |  262,50 €    | FORECAST |
| 28.04.2020 |  262,50 €    | FORECAST |
| 29.04.2020 |  262,50 €    | FORECAST |
| 30.04.2020 |  262,50 €    | FORECAST |
+------------+--------------+----------+

预测值只是该月的实际收入之和除以该月到目前为止的天数。请注意,第二个表中的每日预测值已更改,因为已将新的实际值添加到该表中。

非常感谢您提供有关如何解决此问题的帮助!

谢谢

一月


阅读 167

收藏
2021-04-07

共1个答案

小编典典

我找到了解决问题的方法。(尽管它可能不是最复杂的一种)

我现在想出了3个新表:

  1. 提供过去和将来的日期,这就是为什么我称其为“日历”
  2. 提供当月的收入数据。我每天都使用计划查询覆盖该表,该查询提供了实际的过去数据和预测的未来数据(基于当月的实际数据),直到当前月末。
  3. 提供过去的数据(回溯的时间比当月还长),再加上2中的每日更新数据。我也使用预定MERGE查询。

以下是各个查询:

1)

SELECT
  *
FROM
  UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2030-12-31', INTERVAL 1 DAY)) AS calendar_date
WITH
OFFSET
  AS
OFFSET
ORDER BY
OFFSET

2)

SELECT
  date,
  'actual' AS type,
  ROUND(SUM(revenue),2)
FROM
  `project.dataset.revenue_data` 
WHERE
  EXTRACT(year
  FROM
    date) = EXTRACT (year
  FROM
    CURRENT_DATE())
  AND EXTRACT(month
  FROM
    date) = EXTRACT (month
  FROM
    CURRENT_DATE())
GROUP BY
  date
UNION DISTINCT
SELECT
  calendar_date,
  'forecast',
  (
  SELECT
    ROUND(AVG(revenue_daily),2)
  FROM (
    SELECT
      SUM(revenue) AS revenue_daily
    FROM
      `project.dataset.revenue_data`
    WHERE
      EXTRACT(year
      FROM
        date) = EXTRACT (year
      FROM
        CURRENT_DATE())
      AND EXTRACT(month
      FROM
        date) = EXTRACT (month
      FROM
        CURRENT_DATE())
    GROUP BY
      date
    ORDER BY
      date) AS average_daily_revenue),
FROM
  `project.dataset.calendar`
WHERE
  calendar_date >= CURRENT_DATE()
  AND calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
ORDER BY
  date

3)

MERGE
  `project.dataset.forecast_table` f
USING
  `project.dataset.forecast_month` m
ON
  f.date = m.date
  WHEN MATCHED THEN UPDATE SET f.type = m.type, f.revenue = m.revenue
  WHEN NOT MATCHED
  AND m.date >= CURRENT_DATE() THEN
INSERT
  (date,
    type,
    revenue)
VALUES
  (date, type, revenue)
2021-04-07