小编典典

Speeding up pandas.DataFrame.to_sql with pyODBC's fast_executemany

python

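I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is to convert the data_frame object to a list of tuples and then send it with pyODBC's executemany() function. It goes something like this: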

 import pyodbc as pdb

 list_of_tuples = convert_df(data_frame)

 connection = pdb.connect(cnxn_str)

 cursor = connection.cursor()
 cursor.fast_executemany = True
 cursor.executemany(sql_statement, list_of_tuples)
 connection.commit()

 cursor.close()
 connection.close()
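
The convert_df helper is not shown in the question. A minimal sketch, assuming it only needs to turn each row into a plain tuple in column order (the index is dropped, and NaN values are not converted to NULL here), might look like this:

```python
import pandas as pd


def convert_df(data_frame):
    """Return the rows of a DataFrame as a list of plain tuples (index excluded)."""
    # name=None makes itertuples yield ordinary tuples instead of namedtuples.
    return list(data_frame.itertuples(index=False, name=None))
```

The resulting list can be passed as the second argument to cursor.executemany(), provided the tuple order matches the placeholders in sql_statement.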

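Then I started to wonder whether this could be sped up (or at least made more readable) by using the data_frame.to_sql() method. I came up with the following solution: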

 import sqlalchemy as sa

 engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
 data_frame.to_sql(table_name, engine, index=False)

Now the code is more readable, but the upload is at least 150 times slower...

Is there a way to turn on fast_executemany when using SQLAlchemy?

I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.

2020-12-20

1 answer

小编典典

After contacting the developers of SQLAlchemy, a way to solve this problem emerged. Many thanks to them for the great work!

One has to listen for a cursor execution event and check whether the executemany flag has been raised. If it has, switch the fast_executemany option on. For example:

from sqlalchemy import event

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

More information on execution events can be found in the SQLAlchemy documentation on core events.

Update: Support for pyodbc's fast_executemany was added in SQLAlchemy 1.3.0, so this hack is no longer necessary.
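
With SQLAlchemy 1.3.0 or later, the option can be passed straight to create_engine for the mssql+pyodbc dialect. The sketch below assumes cnxn_str is a raw ODBC connection string; the helper names build_mssql_url and make_fast_engine are made up for illustration, and the percent-encoding step is an addition that the question's code does not show:

```python
from urllib.parse import quote_plus


def build_mssql_url(odbc_connection_string):
    """Wrap a raw ODBC connection string in a SQLAlchemy mssql+pyodbc URL."""
    # Characters such as ';' and '=' must be percent-encoded before the
    # string is passed through the odbc_connect query parameter.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_connection_string)


def make_fast_engine(odbc_connection_string):
    """Create an engine with pyodbc's fast_executemany enabled (SQLAlchemy >= 1.3)."""
    import sqlalchemy as sa  # imported lazily so the URL helper works without it

    return sa.create_engine(
        build_mssql_url(odbc_connection_string),
        fast_executemany=True,
    )


# Usage (requires SQLAlchemy, pyodbc and a reachable server):
# engine = make_fast_engine(cnxn_str)
# data_frame.to_sql(table_name, engine, index=False)
```

With the flag set on the engine, no event listener is needed, and to_sql keeps the same call signature as in the question.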

2020-12-20