我知道BigQuery中没有数据类型。您希望在BigQuery中处理什么语义版本?
我有以下架构:
software:string, software_version:string
software_version列是string但我存储在其中的数据为semver格式:`MAJOR.MINOR.PATCH-prerelease
string
我特别想执行运算符< > =。
<
>
=
select '4.0.0' < '4.0.0-beta'
返回true,但是根据semver的定义,它是错误的。因为char-用于预发布。
true
-
以下是BigQuery标准SQL
您可以使用compareSemanticVersionUDF比较两个语义版本 和/或使用normaizedSemanticVersionUDF通过ORDER BY子句对输出进行排序。 参见下面的示例(将比较用例和排序依据)合二为一
compareSemanticVersion
normaizedSemanticVersion
#standardSQL CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING) AS (( SELECT STRING_AGG( IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp ) || 'zzzzzzzzzzzzzz' FROM ( SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars, FROM ( SELECT OFFSET, char, isDigit, COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET, UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit ) GROUP BY grp, isDigit ))); CREATE TEMP FUNCTION compareSemanticVersions( normSemanticVersion1 STRING, normSemanticVersion2 STRING) AS (( SELECT CASE WHEN v1 < v2 THEN 'v2 newer than v1' WHEN v1 > v2 THEN 'v1 newer than v2' ELSE 'same versions' END FROM UNNEST([STRUCT( normaizedSemanticVersion(normSemanticVersion1) AS v1, normaizedSemanticVersion(normSemanticVersion2) AS v2 )]) )); WITH test AS ( SELECT '1.10.0-alpha' AS v1 , '1.0.0-alpha.1' AS v2 UNION ALL SELECT '4.0.0', '4.0.0-beta' UNION ALL SELECT '1.0.0-alpha.1' , '1.0.0-alpha.beta' UNION ALL SELECT '1.0.0-alpha.beta' , '1.0.0-beta' UNION ALL SELECT '1.0.0-beta' , '1.0.0-beta.2' UNION ALL SELECT '1.0.0-beta.2' , '1.0.0-beta.11' UNION ALL SELECT '1.0.0-beta.11' , '1.0.0-rc.1' UNION ALL SELECT '1.0.0-rc.1' , '1.0.0' UNION ALL SELECT '1.0.0-alpha-1.1+build1234-a', '1.0.0-alpha-1.1+build1234-a' ) SELECT v1, v2, compareSemanticVersions(v1, v2) result FROM test ORDER BY normaizedSemanticVersion(v1)
带输出
Row v1 v2 result 1 1.0.0-alpha-1.1+build1234-a 1.0.0-alpha-1.1+build1234-a same versions 2 1.0.0-alpha.1 1.0.0-alpha.beta v2 newer than v1 3 1.0.0-alpha.beta 1.0.0-beta v2 newer than v1 4 1.0.0-beta.2 1.0.0-beta.11 v2 newer than v1 5 1.0.0-beta.11 1.0.0-rc.1 v2 newer than v1 6 1.0.0-beta 1.0.0-beta.2 v1 newer than v2 7 1.0.0-rc.1 1.0.0 v2 newer than v1 8 1.10.0-alpha 1.0.0-alpha.1 v1 newer than v2 9 4.0.0 4.0.0-beta v1 newer than v2
注意:我在阅读您提供的参考资料后,根据对语义版本控制的理解,在上面编写了UDF。潜在的一些极端情况仍然需要解决。但是绝对应该适用于简单的情况,我希望您将能够简单地采用这些UDF并根据您的特定需求调整输出,甚至可以优化我最终在这里使用的IDF。
仅供参考:在normaizedSemanticVersionUDF中,我zzzzzzzzzz仅用于解决一些极端情况。我尝试的另一个选择是..zzzzzzzzzz(请注意两个额外的点)-我认为这对于更复杂的情况会提供更好的结果- 但是我真的没有时间完成测试。请试试
zzzzzzzzzz
..zzzzzzzzzz
例如,在“语义版本控制”页面中,有一个示例:1.0.0-alpha <1.0.0-alpha.1 <1.0.0-alpha beta <1.0.0-beta <1.0.0-beta.2 <1.0。 0-beta.11 <1.0.0-rc.1 <1.0.0。
要使用与该示例相同的顺序-..zzzzzzzzzz应该使用-参见下文
#standardSQL CREATE TEMP FUNCTION normaizedSemanticVersion(semanticVersion STRING) AS (( SELECT STRING_AGG( IF(isDigit, REPEAT('0', 8 - LENGTH(chars)) || chars, chars), '' ORDER BY grp ) || '..zzzzzzzzzzzzzz' FROM ( SELECT grp, isDigit, STRING_AGG(char, '' ORDER BY OFFSET) chars, FROM ( SELECT OFFSET, char, isDigit, COUNTIF(NOT isDigit) OVER(ORDER BY OFFSET) AS grp FROM UNNEST(SPLIT(semanticVersion, '')) AS char WITH OFFSET, UNNEST([char IN ('1','2','3','4','5','6','7','8','9','0')]) isDigit ) GROUP BY grp, isDigit ))); CREATE TEMP FUNCTION compareSemanticVersions( normSemanticVersion1 STRING, normSemanticVersion2 STRING) AS (( SELECT CASE WHEN v1 < v2 THEN 'v2 newer than v1' WHEN v1 > v2 THEN 'v1 newer than v2' ELSE 'same versions' END FROM UNNEST([STRUCT( normaizedSemanticVersion(normSemanticVersion1) AS v1, normaizedSemanticVersion(normSemanticVersion2) AS v2 )]) )); WITH test AS ( SELECT 1 `order`, '1.0.0-alpha' version UNION ALL SELECT 2, '1.0.0-alpha.1' UNION ALL SELECT 3, '1.0.0-alpha.beta' UNION ALL SELECT 4, '1.0.0-beta' UNION ALL SELECT 5, '1.0.0-beta.2' UNION ALL SELECT 6, '1.0.0-beta.11' UNION ALL SELECT 7, '1.0.0-rc.1' UNION ALL SELECT 8, '1.0.0.' ) SELECT * FROM test ORDER BY normaizedSemanticVersion(version)
输出与语义版本控制规范匹配
Row order version 1 1 1.0.0-alpha 2 2 1.0.0-alpha.1 3 3 1.0.0-alpha.beta 4 4 1.0.0-beta 5 5 1.0.0-beta.2 6 6 1.0.0-beta.11 7 7 1.0.0-rc.1 8 8 1.0.0.