Friday 11 January 2013

Insert data in database in batches using Java JDBC


  executeBatch- JDBC

In this note, we will see how we can use JDBC APIs like Statement andPreparedStatement to insert data in any database in batches.
              Also we will try to explore scenarios where we can run out of memory and how to optimize the batch operation.
Firs how Insert data in database in batches using Java JDBC.

Bad Code

String [] queries = {
    "insert into employee (name, city, phone) values ('A', 'X', '123')",
    "insert into employee (name, city, phone) values ('B', 'Y', '234')",
    "insert into employee (name, city, phone) values ('C', 'Z', '345')",
};
             
Connection connection = new getConnection();
Statement statemenet = connection.createStatement();
             
for (String query : queries) {
    statemenet.execute(query);
}
statemenet.close();
connection.close();

This is the BAD code. You are executing each query separately. This hits the database for each insert statement. Consider if you want to insert 1000 records. This is not a good idea. We’ll below is the basic code to perform batch insert. Check it out:

Good Code

Connection connection = new getConnection();
Statement statemenet = connection.createStatement();
 
for (String query : queries) {
    statemenet.addBatch(query);
}
statemenet.executeBatch();

statemenet.close();
connection.close();

        we used addBatch() method of Statement, instead of directly executing the query. And after adding all the queries we executed them in one go using statement.executeBatch() method.
             Note that we have taken the queries from a String array. Instead you may want to make it dynamically. For example:

import java.sql.Connection; Nothing fancy, just a simple batch insert.
import java.sql.Statement;
//...
Connection connection = new getConnection();
Statement statemenet = connection.createStatement();
for (Employee employee: employees) {
    String query = "insert into employee (name, city) values('"
            + employee.getName() + "','" + employee.getCity + "')";
    statemenet.addBatch(query);
}
statemenet.executeBatch();
statemenet.close();
connection.close();



SQL Injection Safe Batch

Consider the following code:
import java.sql.Connection;
import java.sql.PreparedStatement;
//...
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
Connection connection = new getConnection();
PreparedStatement ps = connection.prepareStatement(sql);
for (Employee employee: employees) {
    ps.setString(1, employee.getName());
    ps.setString(2, employee.getCity());
    ps.setString(3, employee.getPhone());
    ps.addBatch();
}
ps.executeBatch();
ps.close();
connection.close();        
        Checkout the above code. We used java.sql.PreparedStatement and added insert query in the batch. 
This is the solution you must implement in your batch insert logic, instead of aboveStatement one.
         Still there is one problem with this solution. Consider a scenario where you want to insert half 
million records into database using batch. Well, that may generate OutOfMemoryError:
java.lang.OutOfMemoryError: Java heap space
    com.mysql.jdbc.ServerPreparedStatement$BatchedBindValues.<init>(ServerPreparedStatement.java:72)
    com.mysql.jdbc.ServerPreparedStatement.addBatch
        (ServerPreparedStatement.java:330)
    org.apache.commons.dbcp.DelegatingPreparedStatement.addBatch
(DelegatingPreparedStatement.java:171)
This is because you are trying to add everything in one batch and inserting once. Best idea would 
be to execute batch itself in batch. Check out the below solution.

Smart Insert: Batch within Batch

This is a simplest solution. Consider a batch size like 1000 and insert queries in the batches of
  1000 queries at a time.
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
Connection connection = new getConnection();
PreparedStatement ps = connection.prepareStatement(sql);
final int batchSize = 1000;
int count = 0;
for (Employee employee: employees) {
    ps.setString(1, employee.getName());
    ps.setString(2, employee.getCity());
    ps.setString(3, employee.getPhone());
    ps.addBatch();
     
    if(++count % batchSize == 0) {
        ps.executeBatch();
    }
}
ps.executeBatch(); // insert remaining records
ps.close();
connection.close();

This would be the idea solution. This avoids SQL Injection and also takes care of out of memory issue. 
Check how we have incremented a counter count and once it reaches batchSize which is 1000,
 we call executeBatch().

No comments:

Post a Comment