我最近几周正在学习PyParsing。我计划使用它从SQL语句获取表名。我看了http://pyparsing.wikispaces.com/file/view/simpleSQL.py。但是我打算使语法保持简单,因为我不是在试图解析select语句的每个部分,而是在寻找表名。同样,为任何市售的现代数据库(如Teradata)定义完整的语法也很费劲。
#!/usr/bin/env python from pyparsing import * import sys semicolon = Combine(Literal(';') + lineEnd) comma = Literal(',') lparen = Literal('(') rparen = Literal(')') # Keyword definition update_kw, volatile_kw, create_kw, table_kw, as_kw, from_kw, \ where_kw, join_kw, left_kw, right_kw, cross_kw, outer_kw, \ on_kw , insert_kw , into_kw= \ map(lambda x: Keyword(x, caseless=True), \ ['UPDATE', 'VOLATILE', 'CREATE', 'TABLE', 'AS', 'FROM', 'WHERE', 'JOIN' , 'LEFT', 'RIGHT' , \ 'CROSS', 'OUTER', 'ON', 'INSERT', 'INTO']) # Teradata SQL allows SELECT and well as SEL keyword select_kw = Keyword('SELECT', caseless=True) | Keyword('SEL' , caseless=True) # list of reserved keywords reserved_words = (update_kw | volatile_kw | create_kw | table_kw | as_kw | select_kw | from_kw | where_kw | join_kw | left_kw | right_kw | cross_kw | on_kw | insert_kw | into_kw) # Identifier can be used as table or column names. They can't be reserved words ident = ~reserved_words + Word(alphas, alphanums + '_') # Recursive definition for table table = Forward() # simple table name can be identifer or qualified identifier e.g. schema.table simple_table = Combine(Optional(ident + Literal('.')) + ident) # table name can also a complete select statement used as table nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \ from_kw.suppress() + table + rparen.suppress() # table can be simple table or nested table table << (nested_table | simple_table) # comma delimited list of tables table_list = delimitedList(table) # Building from clause only because table name(s) will always appears after that from_clause = from_kw.suppress() + table_list txt = """ SELECT p, (SELECT * FROM foo),e FROM a, d, (SELECT * FROM z), b """ for token, start, end in from_clause.scanString(txt): print token
这里值得一提。我使用“ SkipTo(from_kw)”跳过SQL语句中的列列表。这主要是为了避免为列列表定义语法,列列表可以是逗号分隔的标识符,许多函数名称,DW分析函数之类的列表,而不能用逗号分隔。通过这种语法,我可以解析上面的语句以及SELECT列列表或表列表中的任何嵌套级别。
['foo'] ['a', 'd', 'z', 'b']
当SELECT具有where子句时,我遇到问题:
nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \ from_kw.suppress() + table + rparen.suppress()
当WHERE子句存在时,相同的语句可能看起来像:SELECT … FROM a,d,(SELECT * FROM z WHERE(c1 = 1)和(c2 = 3)),p我想到了更改“ nested_table”的定义到:
nested_table = lparen.suppress() + select_kw.suppress() + SkipTo(from_kw).suppress() + \ from_kw.suppress() + table + Optional(where_kw + SkipTo(rparen)) + rparen
但这不起作用,因为它与“ c = 1”之后的右括号匹配。我想知道的是如何在“ SELECT * FROM z …”之前跳到与左括号匹配的右括号,我不知道如何使用PyParsing
另外,我还寻求一些建议,这是从复杂的嵌套SQL获取表名的最佳方法。任何帮助,我们真的很感激。
谢谢阿比吉特
考虑到您还试图解析嵌套的SELECT,我认为您将无法避免编写一个相当完整的SQL解析器。幸运的是,Pyparsing Wiki示例页面上有一个更完整的示例select_parser.py。我希望这能使您走得更远。