我有一个带有这样的在线会话的表(空行只是为了更好地显示):
ip_address | start_time | stop_time ------------|------------------|------------------ 10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:12 10.10.10.10 | 2016-04-02 08:11 | 2016-04-02 08:20 10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:10 10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:08 10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:11 10.10.10.10 | 2016-04-02 09:02 | 2016-04-02 09:15 10.10.10.10 | 2016-04-02 09:10 | 2016-04-02 09:12 10.66.44.22 | 2016-04-02 08:05 | 2016-04-02 08:07 10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11
我需要“包围”在线时间跨度:
ip_address | full_start_time | full_stop_time ------------|------------------|------------------ 10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:20 10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:15 10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11
我有此查询返回所需的结果:
WITH t AS -- Determine full time-range of each IP (SELECT ip_address, MIN(start_time) AS min_start_time, MAX(stop_time) AS max_stop_time FROM IP_SESSIONS GROUP BY ip_address), t2 AS -- compose ticks (SELECT DISTINCT ip_address, min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE AS ts FROM t CONNECT BY min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE <= max_stop_time), t3 AS -- get all "online" ticks (SELECT DISTINCT ip_address, ts FROM t2 JOIN IP_SESSIONS USING (ip_address) WHERE ts BETWEEN start_time AND stop_time), t4 AS (SELECT ip_address, ts, LAG(ts) OVER (PARTITION BY ip_address ORDER BY ts) AS previous_ts FROM t3), t5 AS (SELECT ip_address, ts, SUM(DECODE(previous_ts,NULL,1,0 + (CASE WHEN previous_ts + INTERVAL '1' MINUTE <> ts THEN 1 ELSE 0 END))) OVER (PARTITION BY ip_address ORDER BY ts ROWS UNBOUNDED PRECEDING) session_no FROM t4) SELECT ip_address, MIN(ts) AS full_start_time, MAX(ts) AS full_stop_time FROM t5 GROUP BY ip_address, session_no ORDER BY 1,2;
但是,我担心性能。该表具有数亿行,时间分辨率为毫秒(如示例中所示,不是一分钟)。因此,CTEt3将会非常庞大。是否有人有避免自我加入和“ CONNECT BY”的解决方案?
t3
单个智能分析功能将非常有用。
也尝试这个。我尽我所能进行了测试,我相信它涵盖了所有可能性,包括合并相邻的时间间隔(10:15至10:30和10:30至10:40合并为一个时间间隔,即10:15至10:40 )。它也应该相当快,用处不大。
with m as ( select ip_address, start_time, max(stop_time) over (partition by ip_address order by start_time rows between unbounded preceding and 1 preceding) as m_time from ip_sessions union all select ip_address, NULL, max(stop_time) from ip_sessions group by ip_address ), n as ( select ip_address, start_time, m_time from m where start_time > m_time or start_time is null or m_time is null ), f as ( select ip_address, start_time, lead(m_time) over (partition by ip_address order by start_time) as stop_time from n ) select * from f where start_time is not null /