Support for JDBC batching and query fetch has been added
August 14, 2010 14 Comments
JDBC batching is a very important feature which allows sending multiple commands to the database in one call. Scriptella batching is controlled by statement.batchSize parameter. The value of this parameter specifies number of statements to be combined in a batch before sending it to the database.
Please note that behavior for batching depends on the type of statements processed, as the result non-prepared SQL statements (statements without ? parameters) are processed in a single batch group different from parametrized prepared SQL statements. The grouping rules are the following:
- Normal(non-prepared) statements are always grouped in a single batch per ETL script element.
- Parameterized prepared statement use SQL text as a group key, i.e. different statements go into different groups what may sometimes introduce an undesired behavior.
As the result mixing parameterized (prepared statements) with normal statements in a single ETL element is NOT RECOMMENDED for batch-mode since statements are processed in different batch groups and results may be unexpected.
The following 2 examples show RECOMMENDED ways of batching.
Example 1. Extract->Load using PreparedStatement.setParameters/addBatch:
<connection id="out" ...> #enables batching and set batch size to 100 statement.batchSize=100 </connection> <query connection-id="in"> SELECT * from Bug <script connection-id="out"> INSERT INTO Bug VALUES (?ID, ?priority, ?summary, ?status); </script> </query>
<script connection-id="out"> INSERT INTO Bug VALUES (1, 'One'); INSERT INTO Bug VALUES (2, 'Two'); INSERT INTO Bug VALUES (3, '$Value'); .... </script>
Please pay attention to the following items when use batching:
- Queries are not supported in batch mode. Typically batches are intended for DB modifications, consider using a separate connection if you need querying.
- Batch buffers(pending SQL commands) are flushed immediately before ETL script commits, rather than after ETL element completes. You may face this problem only while querying tables being updated, which is not recommended (see item 1).
- Optimal size of batch varies between DBs and available JVM heap size, but in most cases it should not be less than 10.
Another important parameter statement.fetchSize is added to give the JDBC driver a hint about the number of rows that should be fetched from the database when more rows are needed for the result set. Examples:
<connection url="jdbc:mysql://localhost:3306/db" ...> #MySQL-specific hack #set fetchSize to Integer.MIN_VALUE to avoid loading all data into memory statement.fetchSize = -2147483648 </connection> <connection url="jdbc:oracle:thin:@localhost:1521:orcl" ...> #For performance reasons increase fetchSize statement.fetchSize = 1000 </connection>