如何从历史数据中检索行程？

小编典典

如何从历史数据中检索行程？

sql

我mytable在Hive中有下表：

id    radar_id     car_id     datetime
1     A21          123        2017-03-08 17:31:19.0
2     A21          555        2017-03-08 17:32:00.0
3     A21          777        2017-03-08 17:33:00.0
4     B15          123        2017-03-08 17:35:22.0
5     B15          555        2017-03-08 17:34:05.0
5     B15          777        2017-03-08 20:50:12.0
6     A21          123        2017-03-09 11:00:00.0
7     C11          123        2017-03-09 11:10:00.0
8     A21          123        2017-03-09 11:12:00.0
9     A21          555        2017-03-09 11:12:10.0
10    B15          123        2017-03-09 11:14:00.0
11    C11          555        2017-03-09 11:20:00.0

我想汽车通过雷达的路线A21和B15同样的行程之内。例如，如果同一日期的日期不同car_id，则它不是同一趟旅程。基本上，我想考虑雷达A21与B15同一车辆之间的最大时差应为30分钟。如果更大，则行程不一样，例如的行程car_id
777。

我的最终目标是计算每天的平均出行次数（非唯一性，因此，如果同一辆车通过相同路线经过2次，则应计算2次）。

预期结果如下：

radar_start   radar_end       avg_tripscount_per_day
A21           B15             1.5

在该日期2017-03-08，雷达A21与之间有2次旅行B15（777由于30分钟的限制，不考虑乘车），而在该日期2017-03-09，只有1次旅行。每天平均2
+ 1 = 1.5次旅行。

我怎么能得到这个结果？基本上，我不知道如何在查询中引入30分钟的限制以及如何通过radar_start和分组乘车radar_end。

谢谢。

更新：

行程在开始日期进行注册。
如果汽车被雷达触发A21的2017-03-08 23:55，并通过雷达B15在2017-03-09 00:15，那么它应该被视为日注册了相同的行程2017-03-08。
在ids6和8的情况下，同一辆车123经过了A21两次，然后转向B15（id10）。最后一次乘id8时应考虑。因此，8-10。因此，最接近的B15。解释是，一辆汽车经过A21两次，第二次转向B15。

阅读 191

2021-04-07

共1个答案

小编典典

select  count(*) / count(distinct to_date(datetime))    as trips_per_day

from   (select  radar_id
               ,datetime
               ,lead(radar_id) over w  as next_radar_id
               ,lead(datetime) over w  as next_datetime

        from    mytable

        where   radar_id in ('A21','B15')

        window  w as 
                (
                    partition by  car_id
                    order by      datetime
                )
        ) t

where   radar_id        = 'A21'
    and next_radar_id   = 'B15'
    and datetime + interval '30' minutes >= next_datetime
;

+----------------+
| trips_per_day  |
+----------------+
| 1.5            |
+----------------+

ps
如果您的版本不支持间隔，则最后的代码记录可以替换为-
and to_unix_timestamp(datetime) + 30*60 > to_unix_timestamp(next_datetime)

2021-04-07