我目前面临着众所周知的常见Hibernate插入批处理问题。
我需要保存500万行的批次。我首先尝试减轻负载。由于我只需要插入2种类型的实体(首先是类型A的所有记录,然后是类型B的所有记录,都指向公共类型C的ManyToOne父级),所以我想从JDBC批处理插入中获得最大的好处。
ManyToOne
我已经阅读了许多文档,但是没有一个尝试过。
AUTO_INCREMENT
SELECT MAX(ID) FROM ENTITIES
hibernate.jdbc.batch_size
LocalSessionFactoryBean
这是我的实体
共同的父实体。这首先插入到单个事务中。我不在 这里 自动递增列。每批作业仅 一个 记录
@Entity @Table(...) @SequenceGenerator(...) public class Deal { @Id @Column( name = "DEAL_ID", nullable = false) @GeneratedValue( strategy = GenerationType.AUTO) protected Long id; ................ }
其中一个孩子(假设每批250万条记录)
@Entity @Table( name = "TA_LOANS") public class Loan { @Id @Column( name = "LOAN_ID", nullable = false) protected Long id; @ManyToOne( optional = false, targetEntity = Deal.class, fetch = FetchType.LAZY) @JoinColumn( name = "DEAL_ID", nullable = false) protected Deal deal; ............. }
其他孩子类型。假设其他250万条记录
@Entity @Table( name = "TA_BONDS") public class Bond { @Id @Column( name = "BOND_ID") @ManyToOne( fetch = FetchType.LAZY, optional = false, targetEntity = Deal.class) @JoinColumn( name = "DEAL_ID", nullable = false, updatable = false) protected Deal deal; }
插入记录的简化代码
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID) Deal deal = null; List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE); for (String msg: inputStreamReader) { log.debug(msg.toString()); if (this is a deal) { Deal deal = parseDeal(msg.getMessage()); deal = dealManager.persist(holder.deal); //Called in a separate transaction using Spring annotation @Transaction(REQUIRES_NEW) } else if (this is a loan) { Loan loan = parseLoan(msg.getMessage()); loan.setId(++loanIdCounter); loan.setDeal(deal); loanList.add(loan); if (loanList.size() == COMMIT_BATCH_SIZE) { loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time loanList.clear(); } } else if (this is a bond) { Bond bond = parseBond(msg.getMessage()); bond.setId(++bondIdCounter); bond.setDeal(deal); bondList.add(bond); if (bondList.size() == COMMIT_BATCH_SIZE) //As above { bondManager.bulkInsert(bondList); bondList.clear(); } } } if (!bondList.isEmpty()) bondManager.bulkInsert(bondList); if (!loanList.isEmpty()) loanManager.bulkInsert(loanList); //Flush remaining items, not important
实施bulkInsert:
bulkInsert
@Override public void bulkInsert(Collection<Bond> bonds) { // StatelessSession session = sessionFactory.openStatelessSession(); Session session = sessionFactory.openSession(); try { Transaction t = session.beginTransaction(); try { for (Bond bond : bonds) // session.persist(bond); // session.insert(bond); session.save(bond); } catch (RuntimeException ex) { t.rollback(); } finally { t.commit(); } } finally { session.close(); } }
从评论中可以看到,我尝试了有状态/无状态的几种组合session。没有工作。
session
我dataSource是ComboPooledDataSource具有以下网址的
dataSource
ComboPooledDataSource
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&rewriteBatchedStatements=true" />
我的 SessionFactory
SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource"> <b:property name="dataSource" ref="phoenixDataSource" /> <b:property name="hibernateProperties"> <b:props> <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb--> <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop> <b:prop key="hibernate.jdbc.batch_size">500</b:prop> <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop> <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop> <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop> <b:prop key="hibernate.cache.use_query_cache">false</b:prop> <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop> <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop> <b:prop key="hibernate.order_inserts">true</b:prop> <b:prop key="hibernate.order_updates">true</b:prop> </b:props> </b:property> </b:bean>
即使我的项目范围的类extends LocalSessionFactoryBean,它 也不会 覆盖其方法(仅添加一些项目范围的方法)
几天以来我发疯了。我读了几篇文章,但没有一篇文章使我能够批量插入。我从使用Spring上下文(因此可以使用@Autowire我的类)进行的JUnit测试运行所有代码。我所有的尝试只会产生很多单独的INSERT陈述
@Autowire
INSERT
我想念什么?
您的查询可能正在被重写,但是通过查看Hibernate SQL日志您不会知道是否。Hibernate不会重写插入语句- MySQL驱动程序会重写它们。换句话说,Hibernate将向驱动程序发送多个插入语句,然后驱动程序将重写它们。因此,Hibernate日志仅显示向主机发送的SQL Hibernate,而不是驱动程序向数据库发送的SQL。
您可以通过在连接URL中启用MySQL的profileSQL参数来验证这一点:
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&rewriteBatchedStatements=true&profileSQL=true" />
使用类似于您的示例,这就是我的输出:
insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) insert into Person (firstName, lastName, id) values (?, ?, ?) Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)
Hibernate记录了前10行,尽管实际上不是发送到MySQL数据库。最后一行来自MySQL驱动程序,它清楚地显示了具有多个值的单个批处理插入,这实际上是发送到MySQL数据库的内容。