I am trying to insert 100,000 rows into a MySQL table in under 5 seconds using Hibernate (JPA). I have tried every trick Hibernate offers, but I still cannot get below 35 seconds.
First optimization: I started with the IDENTITY generator, which resulted in 60 seconds of insert time. I then dropped the generator and assigned the @Id field myself, reading MAX(id) once and handing out ids with AtomicInteger.incrementAndGet(). That brought the insert time down to 35 seconds.
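For reference, the ID scheme looks roughly like this (a minimal sketch, not the actual code; IdAllocator, the table name and the query are illustrative):

import java.util.concurrent.atomic.AtomicInteger;
import javax.persistence.EntityManager;

// Sketch of the self-assigned ID approach described above: read MAX(id) once,
// then hand out ids from an in-memory counter instead of letting the database generate them.
public class IdAllocator {

    private final AtomicInteger nextId;

    public IdAllocator(EntityManager em) {
        // Table/column names are illustrative; COALESCE covers the empty-table case.
        Number max = (Number) em
                .createNativeQuery("SELECT COALESCE(MAX(id), 0) FROM deal")
                .getSingleResult();
        this.nextId = new AtomicInteger(max.intValue());
    }

    public int next() {
        return nextId.incrementAndGet();   // each entity gets entity.setId(allocator.next())
    }
}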
Second optimization: I enabled batch inserts by adding the following configuration:
<prop key="hibernate.jdbc.batch_size">30</prop> <prop key="hibernate.order_inserts">true</prop> <prop key="hibernate.current_session_context_class">thread</prop> <prop key="hibernate.jdbc.batch_versioned_data">true</prop>
To my shock, batch inserting did nothing to reduce the insert time. Still 35 seconds!
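One detail that may be relevant here (it is not part of the original setup): with MySQL Connector/J, JDBC batching only turns into real multi-row inserts when rewriteBatchedStatements=true is set on the connection URL; without it, hibernate.jdbc.batch_size barely changes what goes over the wire. A hedged sketch, assuming the DriverManagerDataSource used below and placeholder host/database/credentials:

import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class DataSourceConfig {

    // Sketch only: URL, user and password are placeholders. The key part is the
    // rewriteBatchedStatements flag, which makes MySQL Connector/J rewrite a JDBC batch
    // of INSERTs into a single multi-row INSERT.
    public static DriverManagerDataSource mysqlDataSource() {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true");
        ds.setUsername("user");
        ds.setPassword("password");
        return ds;
    }
}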
Now I am considering using multiple threads for the inserts. Does anyone have any pointers? Should I switch to MongoDB instead?
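For reference, a rough sketch of the multi-threaded variant being considered; none of this is from the actual code, and the chunk partitioning, pool size and class names are illustrative:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class ParallelInserter {

    // Each worker gets its own EntityManager and its own transaction,
    // since an EntityManager is not thread-safe.
    public void insertInParallel(List<List<MyEntity>> chunks, EntityManagerFactory emf) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> results = new ArrayList<>();
        for (List<MyEntity> chunk : chunks) {
            results.add(pool.submit(() -> {
                EntityManager em = emf.createEntityManager();
                try {
                    em.getTransaction().begin();
                    for (MyEntity e : chunk) {
                        em.persist(e);
                    }
                    em.getTransaction().commit();
                } finally {
                    em.close();
                }
            }));
        }
        for (Future<?> f : results) {
            f.get();   // propagate any failure from the workers
        }
        pool.shutdown();
    }
}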
Here is my setup:

1. Hibernate configuration:
<bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean"> <property name="dataSource" ref="dataSource" /> <property name="packagesToScan" value="com.progresssoft.manishkr" /> <property name="jpaVendorAdapter"> <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" /> </property> <property name="jpaProperties"> <props> <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop> <prop key="hibernate.dialect">${hibernate.dialect}</prop> <prop key="hibernate.show_sql">${hibernate.show_sql}</prop> <prop key="hibernate.format_sql">${hibernate.format_sql}</prop> <prop key="hibernate.jdbc.batch_size">30</prop> <prop key="hibernate.order_inserts">true</prop> <prop key="hibernate.current_session_context_class">thread</prop> <prop key="hibernate.jdbc.batch_versioned_data">true</prop> </props> </property> </bean> <bean class="org.springframework.jdbc.datasource.DriverManagerDataSource" id="dataSource"> <property name="driverClassName" value="${database.driver}"></property> <property name="url" value="${database.url}"></property> <property name="username" value="${database.username}"></property> <property name="password" value="${database.password}"></property> </bean> <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager"> <property name="entityManagerFactory" ref="entityManagerFactoryBean" /> </bean> <tx:annotation-driven transaction-manager="transactionManager" />
2. The entity class:
@Entity @Table(name = "myEntity") public class MyEntity { @Id private Integer id; @Column(name = "deal_id") private String dealId; .... .... @Temporal(TemporalType.TIMESTAMP) @Column(name = "timestamp") private Date timestamp; @Column(name = "amount") private BigDecimal amount; @OneToOne(cascade = CascadeType.ALL) @JoinColumn(name = "source_file") private MyFile sourceFile; public Deal(Integer id,String dealId, ....., Timestamp timestamp, BigDecimal amount, SourceFile sourceFile) { this.id = id; this.dealId = dealId; ... ... ... this.amount = amount; this.sourceFile = sourceFile; } public String getDealId() { return dealId; } public void setDealId(String dealId) { this.dealId = dealId; } ... ... .... public BigDecimal getAmount() { return amount; } public void setAmount(BigDecimal amount) { this.amount = amount; } .... public Integer getId() { return id; } public void setId(Integer id) { this.id = id; }
3. The service class:

@Service
@Transactional
public class ServiceImpl implements MyService {

    @Autowired
    private MyDao dao;
    ....

    void foo() {
        for (MyObject d : listOfObjects_100000) {
            dao.persist(d);
        }
    }
}
4. The DAO class:
@Repository
public class DaoImpl implements MyDao {

    @PersistenceContext
    private EntityManager em;

    public void persist(Deal deal) {
        em.persist(deal);
    }
}
The logs:
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2]
……
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27
18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection
After trying every possible solution, I finally found a way to insert 100,000 rows in under 5 seconds!
Things I tried:
1) Replaced Hibernate's/the database's AUTO_INCREMENT/generated IDs with IDs generated by my own AtomicInteger
2) Enabled batch inserts with batch_size=50
3) Flushed the session after every "batch_size" persist() calls (see the sketch after this list)
4) Multi-threading (did not attempt this)
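A minimal sketch of what item 3 refers to; the entity list, batch size and method names here are illustrative, not the original code:

import java.util.List;
import javax.persistence.EntityManager;

public class BatchPersister {

    // Persist in JDBC-batch-sized groups, flushing the pending INSERTs and clearing the
    // persistence context so the session does not keep 100,000 managed entities in memory.
    public void persistAll(EntityManager em, List<MyEntity> entities) {
        final int batchSize = 50;   // should match hibernate.jdbc.batch_size
        for (int i = 0; i < entities.size(); i++) {
            em.persist(entities.get(i));
            if ((i + 1) % batchSize == 0) {
                em.flush();
                em.clear();
            }
        }
        em.flush();
        em.clear();
    }
}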
In the end, what worked was a native multi-row insert query, inserting 1000 rows per SQL INSERT statement instead of calling persist() on every entity. To insert the 100,000 entities, I build native queries of the form "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1000 rows in one SQL insert query].
"INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)"
Inserting 100,000 records now takes about 3 seconds! So the bottleneck was the ORM itself. For bulk inserts, the only thing that really works is a native multi-row insert query!