What Is Spring Batch?

Spring Batch Overview

  • Spring Batch is one of the Spring projects, originally developed by SpringSource (later part of Pivotal) in collaboration with Accenture.
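  • It is a framework for moving work that used to run as plain scheduled tasks into structured, large-scale batch jobs. Spring Batch is not itself a scheduler; it is meant to be triggered by one.
  • With plain scheduled tasks, it was difficult to inspect logs and execution history.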
  • Spring Batch supports bulk data processing.
  • You can create the tables for Spring Batch metadata from a bundled schema script such as schema-mysql.sql.

Basic Spring Batch Concepts

Job

  • A Job is the largest unit of batch work and the unit of execution.
  • You can register a Job as a bean and execute it with parameters.
  • You can create multiple Jobs.
  • A Job consists of one or more Steps. Unless the Job is very complex, a common rule of thumb is to keep it to roughly 2 to 10 Steps.

Step

  • A Step is a unit of work within a Job.
  • A Step can be implemented either as a Tasklet or as chunk-oriented processing with a reader, processor, and writer.
  • A chunk-oriented Step groups reading, processing, and writing. Each chunk runs in its own transaction, which is also what makes it possible to restart a job from the point of failure.

Tasklet

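  • A Tasklet is a single unit of work that a Step executes. It is typically used for simple tasks that do not fit the read-process-write pattern.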
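
As a rough illustration of how a Tasklet-style Step behaves, the sketch below uses plain Java rather than Spring Batch itself. The names SimpleTasklet, runStep, and cleanUp are hypothetical; only the idea that the step keeps calling the tasklet until it reports that it is finished is taken from the framework.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; the names mirror Spring Batch concepts but this is not framework code.
class TaskletSketch {

    enum RepeatStatus { CONTINUABLE, FINISHED }

    // Hypothetical stand-in for a Tasklet: one call performs one unit of work.
    interface SimpleTasklet {
        RepeatStatus execute();
    }

    // A tasklet-style step keeps calling execute() until the tasklet reports FINISHED.
    // Returns how many times the tasklet was called.
    static int runStep(SimpleTasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }

    // Example task: remove one "temp file" per call until none remain.
    static int cleanUp(List<String> files) {
        return runStep(() -> {
            if (!files.isEmpty()) {
                files.remove(files.size() - 1);
            }
            return files.isEmpty() ? RepeatStatus.FINISHED : RepeatStatus.CONTINUABLE;
        });
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>(List.of("a.tmp", "b.tmp", "c.tmp"));
        System.out.println("calls = " + cleanUp(files) + ", remaining = " + files);
    }
}
```

Most simple tasklets do all of their work in one call and return FINISHED immediately; the loop only matters for work that is naturally split into repeated small steps.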

Chunk

  • A chunk is the group of items committed together in one transaction. The chunk size is the number of items per commit.
  • Because each chunk runs in its own transaction, a failure rolls back only the current chunk. Chunks that were already committed are kept.
  • Chunk-oriented processing has the following three stages:
    • Read: Load the data to process from a source such as a database.
    • Process: Transform the loaded data. This stage is optional.
    • Write: Save the processed data to a destination such as a database.
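
The read, process, and write stages above can be sketched in plain Java as follows. This is a simplified model, not Spring Batch code: the run method, the in-memory store, and the simulated failure on item 4 are all assumptions made for illustration, and a list append stands in for a transaction commit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of a chunk-oriented step; a list append stands in for a commit.
class ChunkLoopSketch {

    // Returns the number of chunks that were written ("committed") to the store.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          List<O> store, int chunkSize) {
        int commits = 0;
        while (reader.hasNext()) {
            // Read: collect up to chunkSize items.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // Process: transform each item; an exception here abandons the whole chunk.
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                outputs.add(processor.apply(item));
            }
            // Write: the chunk reaches the store only as a complete unit.
            store.addAll(outputs);
            commits++;
        }
        return commits;
    }

    // Runs items 1..5 with chunk size 2; the processor fails on item 4.
    static List<String> demo() {
        List<String> store = new ArrayList<>();
        try {
            run(List.of(1, 2, 3, 4, 5).iterator(), i -> {
                if (i == 4) {
                    throw new IllegalStateException("bad item " + i);
                }
                return "item-" + i;
            }, store, 2);
        } catch (IllegalStateException e) {
            // The chunk holding items 3 and 4 was never written.
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [item-1, item-2]
    }
}
```

The first chunk survives the failure, while item 3, which was processed successfully but shared a chunk with the failing item 4, is not written. That is the sense in which a failure rolls back the current chunk only.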

Chunk and Page

  • Page: The number of items the reader fetches from the data source in one query.
  • Chunk: The number of fetched items that are processed and written in one transaction.
  • Setting page size = chunk size * n is considered efficient. A common setting is page size = chunk size.
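
To make the difference between page size and chunk size concrete, the following plain-Java sketch counts how many fetches and how many commits a run needs. PagingReader here is a hypothetical in-memory stand-in, not a Spring Batch class, and it skips the final empty query that a real paging reader may issue to detect the end of the data.

```java
import java.util.ArrayList;
import java.util.List;

class PageVsChunkSketch {

    // Hypothetical paging reader over an in-memory list that stands in for a table.
    static class PagingReader {
        private final List<Integer> table;
        private final int pageSize;
        private List<Integer> page = new ArrayList<>();
        private int indexInPage = 0;
        private int nextOffset = 0;
        int fetches = 0;

        PagingReader(List<Integer> table, int pageSize) {
            this.table = table;
            this.pageSize = pageSize;
        }

        // Returns the next item, or null when the data is exhausted.
        Integer read() {
            if (indexInPage >= page.size()) {
                if (nextOffset >= table.size()) {
                    return null;
                }
                int end = Math.min(nextOffset + pageSize, table.size());
                page = table.subList(nextOffset, end); // one "query" per page
                nextOffset = end;
                indexInPage = 0;
                fetches++;
            }
            return page.get(indexInPage++);
        }
    }

    // Returns {number of page fetches, number of chunk commits}.
    static int[] run(int rows, int pageSize, int chunkSize) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            table.add(i);
        }
        PagingReader reader = new PagingReader(table, pageSize);
        int commits = 0;
        while (true) {
            int readInChunk = 0;
            while (readInChunk < chunkSize && reader.read() != null) {
                readInChunk++;
            }
            if (readInChunk == 0) {
                break; // nothing left to commit
            }
            commits++; // one commit per chunk
            if (readInChunk < chunkSize) {
                break; // short chunk means the input is exhausted
            }
        }
        return new int[] {reader.fetches, commits};
    }

    public static void main(String[] args) {
        int[] r = run(10, 4, 2);
        System.out.println("fetches = " + r[0] + ", commits = " + r[1]);
    }
}
```

With 10 rows, a page size of 4, and a chunk size of 2, the sketch reports 3 fetches and 5 commits. With page size equal to chunk size (2 and 2) it reports 5 fetches and 5 commits, so each transaction lines up with exactly one query.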

ItemReader

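  • An ItemReader reads the data to process. It is required in chunk-oriented processing.
  • Major implementation families:
    • Cursor-based readers (for example, JdbcCursorItemReader): Stream items one at a time over an open cursor.
    • Paging readers (for example, JdbcPagingItemReader): Fetch items one page at a time, using the configured page size.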

ItemProcessor

  • An ItemProcessor receives items from the ItemReader one at a time, transforms each one, and passes the result on to the ItemWriter.
  • It is optional.
  • Major implementation:
    • CompositeItemProcessor: Chains processors and runs them sequentially.
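
The chaining idea behind CompositeItemProcessor can be sketched in plain Java as below. The process and chain methods are hypothetical and use java.util.function.Function in place of the real ItemProcessor interface. The sketch also follows the Spring Batch convention that a processor returning null filters the item out, so it never reaches the writer.

```java
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of sequential processor chaining; not the Spring Batch class itself.
class CompositeProcessorSketch {

    // Applies each delegate in order. A null result stops the chain,
    // which models an item being filtered out before it reaches the writer.
    static String process(String item, List<Function<String, String>> delegates) {
        String current = item;
        for (Function<String, String> delegate : delegates) {
            current = delegate.apply(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Example chain: trim, drop blank items, then upper-case.
    static List<Function<String, String>> chain() {
        return List.of(
                s -> s.trim(),
                s -> s.isEmpty() ? null : s,
                s -> s.toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(process("  hello ", chain())); // HELLO
        System.out.println(process("   ", chain()));      // null
    }
}
```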

ItemWriter

  • An ItemWriter receives the items passed from the ItemReader or ItemProcessor, one chunk at a time, and saves them.
  • It is required.
  • Implementations include:
    • CompositeItemWriter
    • FlatFileItemWriter
    • HibernateItemWriter
    • JdbcBatchItemWriter
    • JsonFileItemWriter
    • MongoItemWriter

JobLauncher

A JobLauncher executes a Job with a given set of JobParameters.

JobRepository

A JobRepository is an interface that manages metadata for batch work such as Jobs and Steps. Metadata management is one of the core features provided by Spring Batch.
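
How a launcher and a repository cooperate can be modeled with a small in-memory sketch. Everything here is hypothetical: LauncherSketch, its two maps, and the string-based instance identifier only stand in for the real JobLauncher and JobRepository. The behavior it mirrors is that a job instance (job name plus parameters) that has already completed is not run again, while a failed one can be launched again as a new execution. Spring Batch signals the first case with JobInstanceAlreadyCompleteException; the sketch uses IllegalStateException instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a launcher backed by execution metadata; not Spring Batch code.
class LauncherSketch {

    enum Status { COMPLETED, FAILED }

    // Stand-in for the repository: last status and execution count per job instance.
    private final Map<String, Status> lastStatus = new HashMap<>();
    private final Map<String, Integer> executions = new HashMap<>();

    // A job instance is identified by the job name plus its (sorted) parameters.
    static String instanceId(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Runs the job unless this instance has already completed, and records the outcome.
    Status launch(String jobName, Map<String, String> params, Runnable job) {
        String id = instanceId(jobName, params);
        if (lastStatus.get(id) == Status.COMPLETED) {
            throw new IllegalStateException("Job instance already complete: " + id);
        }
        executions.merge(id, 1, Integer::sum); // each attempt is a new execution
        Status status;
        try {
            job.run();
            status = Status.COMPLETED;
        } catch (RuntimeException e) {
            status = Status.FAILED;
        }
        lastStatus.put(id, status);
        return status;
    }

    int executionCount(String jobName, Map<String, String> params) {
        return executions.getOrDefault(instanceId(jobName, params), 0);
    }

    public static void main(String[] args) {
        LauncherSketch launcher = new LauncherSketch();
        Map<String, String> params = Map.of("date", "2024-01-01");
        System.out.println(launcher.launch("report", params, () -> {
            throw new RuntimeException("boom");
        }));
        System.out.println(launcher.launch("report", params, () -> { }));
        System.out.println(launcher.executionCount("report", params));
    }
}
```

The main method records a FAILED execution followed by a COMPLETED one for the same instance, which corresponds to one row in the job instance table and two rows in the job execution table described in the next section.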

Spring Batch Metadata Schema

Spring Batch has six metadata tables and three sequences. Each Job execution stores information in these structures.

Normally Spring Batch cannot run without metadata tables. You can customize it to run without them, but the tables are generally necessary in production to inspect execution and failure history.

spring-batch-core includes DBMS-specific metadata schemas named schema-{DBMS}.sql. This section examines the MySQL schema, schema-mysql.sql. Because MySQL does not provide sequences, tables are also created to serve as sequences.

BATCH_JOB_INSTANCE

BATCH_JOB_INSTANCE contains information about JobInstances and is the top level of the hierarchy.

CREATE TABLE BATCH_JOB_INSTANCE  (
    JOB_INSTANCE_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT ,
    JOB_NAME VARCHAR(100) NOT NULL,
    JOB_KEY VARCHAR(32) NOT NULL,
    constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
) ENGINE=InnoDB;

Its primary key is generated by BATCH_JOB_SEQ.

CREATE TABLE BATCH_JOB_SEQ (
    ID BIGINT NOT NULL,
    UNIQUE_KEY CHAR(1) NOT NULL,
    constraint UNIQUE_KEY_UN unique (UNIQUE_KEY)
) ENGINE=InnoDB;

INSERT INTO BATCH_JOB_SEQ (ID, UNIQUE_KEY) select * from (select 0 as ID, '0' as UNIQUE_KEY) as tmp where not exists(select * from BATCH_JOB_SEQ);
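
The JOB_KEY column of BATCH_JOB_INSTANCE is a VARCHAR(32), which fits a 32-character hexadecimal MD5 digest of the identifying job parameters. The sketch below shows how such a key could be derived in plain Java. The jobKey method and the "name=value;" serialization are assumptions for illustration and may not match the format Spring Batch actually uses.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class JobKeySketch {

    // Builds a 32-character hex key from the identifying parameters.
    // The "name=value;" layout is an assumption, not the framework's documented format.
    static String jobKey(Map<String, String> identifyingParams) {
        StringBuilder sb = new StringBuilder();
        // Sort by parameter name so that insertion order does not change the key.
        for (Map.Entry<String, String> e : new TreeMap<>(identifyingParams).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            // Left-pad with zeros so the result is always 32 hex characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 is not available", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(jobKey(Map.of("date", "2024-01-01", "region", "eu")));
    }
}
```

Because the same job name and the same identifying parameters always produce the same key, the unique constraint on (JOB_NAME, JOB_KEY) ensures that there is only one JobInstance per parameter set.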

BATCH_JOB_EXECUTION

BATCH_JOB_EXECUTION stores information about JobExecutions, including start time, end time, and exit code for each execution of a JobInstance.

CREATE TABLE BATCH_JOB_EXECUTION  (
    JOB_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT  ,
    JOB_INSTANCE_ID BIGINT NOT NULL,
    CREATE_TIME DATETIME(6) NOT NULL,
    START_TIME DATETIME(6) DEFAULT NULL ,
    END_TIME DATETIME(6) DEFAULT NULL ,
    STATUS VARCHAR(10) ,
    EXIT_CODE VARCHAR(2500) ,
    EXIT_MESSAGE VARCHAR(2500) ,
    LAST_UPDATED DATETIME(6),
    JOB_CONFIGURATION_LOCATION VARCHAR(2500) NULL,
    constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
    references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
) ENGINE=InnoDB;

Its primary key is generated by BATCH_JOB_EXECUTION_SEQ.

CREATE TABLE BATCH_JOB_EXECUTION_SEQ (
    ID BIGINT NOT NULL,
    UNIQUE_KEY CHAR(1) NOT NULL,
    constraint UNIQUE_KEY_UN unique (UNIQUE_KEY)
) ENGINE=InnoDB;

INSERT INTO BATCH_JOB_EXECUTION_SEQ (ID, UNIQUE_KEY) select * from (select 0 as ID, '0' as UNIQUE_KEY) as tmp where not exists(select * from BATCH_JOB_EXECUTION_SEQ);

BATCH_JOB_EXECUTION_PARAMS

BATCH_JOB_EXECUTION_PARAMS stores the JobParameters used to execute a Job.

CREATE TABLE BATCH_JOB_EXECUTION_PARAMS  (
    JOB_EXECUTION_ID BIGINT NOT NULL ,
    TYPE_CD VARCHAR(6) NOT NULL ,
    KEY_NAME VARCHAR(100) NOT NULL ,
    STRING_VAL VARCHAR(250) ,
    DATE_VAL DATETIME(6) DEFAULT NULL ,
    LONG_VAL BIGINT ,
    DOUBLE_VAL DOUBLE PRECISION ,
    IDENTIFYING CHAR(1) NOT NULL ,
    constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ENGINE=InnoDB;

BATCH_STEP_EXECUTION

BATCH_STEP_EXECUTION stores StepExecution information. It resembles BATCH_JOB_EXECUTION and additionally stores information such as read, commit, and skip counts.

CREATE TABLE BATCH_STEP_EXECUTION  (
    STEP_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
    VERSION BIGINT NOT NULL,
    STEP_NAME VARCHAR(100) NOT NULL,
    JOB_EXECUTION_ID BIGINT NOT NULL,
    START_TIME DATETIME(6) NOT NULL ,
    END_TIME DATETIME(6) DEFAULT NULL ,
    STATUS VARCHAR(10) ,
    COMMIT_COUNT BIGINT ,
    READ_COUNT BIGINT ,
    FILTER_COUNT BIGINT ,
    WRITE_COUNT BIGINT ,
    READ_SKIP_COUNT BIGINT ,
    WRITE_SKIP_COUNT BIGINT ,
    PROCESS_SKIP_COUNT BIGINT ,
    ROLLBACK_COUNT BIGINT ,
    EXIT_CODE VARCHAR(2500) ,
    EXIT_MESSAGE VARCHAR(2500) ,
    LAST_UPDATED DATETIME(6),
    constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ENGINE=InnoDB;

Its primary key is generated by BATCH_STEP_EXECUTION_SEQ.

CREATE TABLE BATCH_STEP_EXECUTION_SEQ (
    ID BIGINT NOT NULL,
    UNIQUE_KEY CHAR(1) NOT NULL,
    constraint UNIQUE_KEY_UN unique (UNIQUE_KEY)
) ENGINE=InnoDB;

INSERT INTO BATCH_STEP_EXECUTION_SEQ (ID, UNIQUE_KEY) select * from (select 0 as ID, '0' as UNIQUE_KEY) as tmp where not exists(select * from BATCH_STEP_EXECUTION_SEQ);

BATCH_JOB_EXECUTION_CONTEXT

BATCH_JOB_EXECUTION_CONTEXT stores JobExecution context information. This context generally contains the information needed to restart a failed JobInstance from the point where it stopped.

CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT  (
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT ,
    constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
    references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ENGINE=InnoDB;

BATCH_STEP_EXECUTION_CONTEXT

BATCH_STEP_EXECUTION_CONTEXT stores StepExecution context information. This context generally contains the information needed to restart a failed JobInstance from the point where it stopped.

CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT  (
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT ,
    constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
    references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
) ENGINE=InnoDB;
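
The role of the execution context in a restart can be sketched in plain Java as follows. The runStep method, the "items.read" key, and the map that stands in for the persisted context are all hypothetical. The point is only that progress is saved together with each committed chunk, so a later run can resume from the saved position instead of starting over.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of restart via a persisted context; not Spring Batch code.
class RestartSketch {

    // Hypothetical context key; real readers choose their own key names.
    static final String KEY = "items.read";

    // Processes the input in chunks and records progress in the context after each chunk.
    // failAt is the input index at which a failure is simulated (-1 for none).
    static void runStep(List<String> input, List<String> output,
                        Map<String, Integer> context, int chunkSize, int failAt) {
        int position = context.getOrDefault(KEY, 0); // resume from the saved point
        while (position < input.size()) {
            int end = Math.min(position + chunkSize, input.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("failed at index " + i);
                }
                chunk.add(input.get(i).toUpperCase());
            }
            output.addAll(chunk);       // "commit" the chunk
            position = end;
            context.put(KEY, position); // progress is saved with the commit
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e");
        List<String> output = new ArrayList<>();
        Map<String, Integer> context = new HashMap<>();
        try {
            runStep(input, output, context, 2, 3); // first run fails in the second chunk
        } catch (IllegalStateException e) {
            System.out.println("after failure: " + output + " " + context);
        }
        runStep(input, output, context, 2, -1);    // restart resumes at the saved position
        System.out.println("after restart: " + output + " " + context);
    }
}
```

After the failed run, only the first chunk has been written and the context holds position 2. The restart then continues from index 2 and does not write the first chunk a second time.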