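I have scraped some data from web resources and stored all of it in a pandas DataFrame. Now, to take advantage of the powerful database tools that SQLAlchemy provides, I want to convert that DataFrame into a Table() object and ultimately upload all of the data into a PostgreSQL table. If this is possible, what is a workable way to accomplish it?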
If you are using PostgreSQL 9.5 or later, you can perform the UPSERT with a temporary table and a single INSERT … ON CONFLICT statement:
import pandas as pd
import sqlalchemy as sa

# `engine` is assumed to be an existing SQLAlchemy Engine for a PostgreSQL
# database, created beforehand with sa.create_engine(...)

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.execute(sa.text("DROP TABLE IF EXISTS main_table"))
    conn.execute(
        sa.text(
            "CREATE TABLE main_table (id int primary key, txt varchar(50))"
        )
    )
    conn.execute(
        sa.text(
            "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
        )
    )

    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - create temporary table and upload DataFrame
    conn.execute(
        sa.text(
            "CREATE TEMPORARY TABLE temp_table (id int primary key, txt varchar(50))"
        )
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.execute(
        sa.text(
            """\
            INSERT INTO main_table (id, txt)
            SELECT id, txt FROM temp_table
            ON CONFLICT (id) DO UPDATE SET txt = EXCLUDED.txt
            """
        )
    )

    # step 3 - confirm results
    result = conn.execute(sa.text("SELECT * FROM main_table ORDER BY id")).fetchall()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
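The merge statement in step 2 is written by hand for the two columns of the test table. For a wider DataFrame, it may be convenient to generate that statement from the column list. The following is only a minimal sketch of that idea: the helper name `build_upsert_sql` and its parameters are hypothetical and not part of pandas or SQLAlchemy, and it assumes every table and column name is a trusted, plain identifier, since it does no quoting or escaping.

```python
def build_upsert_sql(target, staging, columns, key_columns):
    """Build an INSERT ... ON CONFLICT statement that merges `staging` into `target`.

    Hypothetical helper: all identifiers are assumed to be trusted, plain names,
    because nothing here quotes or escapes them.
    """
    # Only non-key columns are overwritten when a conflicting row already exists.
    update_columns = [c for c in columns if c not in key_columns]
    column_list = ", ".join(columns)
    key_list = ", ".join(key_columns)

    if update_columns:
        assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_columns)
        action = f"DO UPDATE SET {assignments}"
    else:
        # Every column belongs to the key, so there is nothing to update.
        action = "DO NOTHING"

    return (
        f"INSERT INTO {target} ({column_list}) "
        f"SELECT {column_list} FROM {staging} "
        f"ON CONFLICT ({key_list}) {action}"
    )


sql = build_upsert_sql("main_table", "temp_table", ["id", "txt"], ["id"])
print(sql)
```

With the example tables above, the generated string could be passed to `conn.execute(sa.text(sql))` in place of the handwritten statement in step 2, with `list(df.columns)` supplying the column names. The key columns must match a primary key or unique constraint on the target table, as `ON CONFLICT` requires.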