Simple Spring Batch Example

What is Spring Batch

In this tutorial I will show you how Spring Batch works through an example. The example imports data from a CSV (Comma-Separated Values) file, transforms it with custom code and finally saves the result into another CSV file.

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.

Features of Spring Batch

  • Spring Batch is a lightweight, comprehensive batch framework
  • It is designed to enable the development of robust batch applications
  • It builds on the productivity and POJO-based development approach of the Spring Framework
  • Spring Batch is not a scheduling framework
  • It is intended to work in conjunction with a scheduler, not to replace one.
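To make this split of responsibilities concrete, here is a minimal plain-JDK sketch. It assumes a ScheduledExecutorService as a stand-in for a real scheduler such as cron, Quartz or Spring's @Scheduled. The class names ImportJobSketch and SchedulerSketch are hypothetical and nothing here is Spring Batch API: the job only knows what to do, and the scheduler decides when it runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a batch job: it knows WHAT to do, not WHEN to do it.
class ImportJobSketch implements Runnable {
    private final AtomicInteger runs = new AtomicInteger();
    private final CountDownLatch completedRuns;

    ImportJobSketch(CountDownLatch completedRuns) {
        this.completedRuns = completedRuns;
    }

    @Override
    public void run() {
        int runId = runs.incrementAndGet();
        // A real application would launch the batch job here instead of printing.
        System.out.println("Import run " + runId + " finished");
        completedRuns.countDown();
    }

    int runs() {
        return runs.get();
    }
}

// Stand-in for the external scheduler: it decides WHEN the job runs,
// here once every periodMillis milliseconds.
final class SchedulerSketch {
    private SchedulerSketch() {
    }

    /** Triggers the job until it has run at least 'times' times, then stops. */
    static int triggerJob(int times, long periodMillis) {
        CountDownLatch completed = new CountDownLatch(times);
        ImportJobSketch job = new ImportJobSketch(completed);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            scheduler.scheduleAtFixedRate(job, 0, periodMillis, TimeUnit.MILLISECONDS);
            // Bounded wait so a misbehaving schedule cannot hang the caller.
            if (!completed.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("job ran fewer than " + times + " times");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        } finally {
            scheduler.shutdownNow();
        }
        return job.runs();
    }
}
```

For example, SchedulerSketch.triggerJob(3, 1000) runs the job three times, one second apart. In a Spring Batch application the body of run() would typically launch the job through JobLauncher.run(job, jobParameters).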

Usages of Spring Batch

  • used to perform business operations in mission-critical environments
  • used to automate the complex processing of large volumes of data without user interaction
  • used for time-based events and periodic, repetitive, complex processing of large data sets
  • used to integrate internal and external information that requires formatting, validation and processing in a transactional manner
  • used to run jobs in parallel or concurrently
  • provides manual or scheduled restart after a failure
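Restart works because Spring Batch saves progress in its JobRepository; for example, FlatFileItemReader records how many items it has read in the step's ExecutionContext, so a restarted step can resume after the last committed chunk. The following is a simplified plain-Java model of that idea, not the Spring Batch API: a Map plays the role of the ExecutionContext, and the key name read.count and the class RestartableStepSketch are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified model of a restartable step (NOT the Spring Batch API): progress is
// saved in a context map each time a chunk is written, so a restart can skip the
// items that were already written before the failure.
final class RestartableStepSketch {
    static final String READ_COUNT_KEY = "read.count"; // illustrative key name

    private RestartableStepSketch() {
    }

    /**
     * Upper-cases items in chunks of chunkSize and appends them to output.
     * failAtIndex simulates a crash when that item index is reached (-1 = no crash).
     */
    static void run(List<String> items, int chunkSize, Map<String, Integer> context,
                    List<String> output, int failAtIndex) {
        int start = context.getOrDefault(READ_COUNT_KEY, 0); // resume point on restart
        List<String> chunk = new ArrayList<>();
        for (int i = start; i < items.size(); i++) {
            if (i == failAtIndex) {
                // Items buffered in 'chunk' are lost, like an uncommitted chunk.
                throw new IllegalStateException("simulated failure at item " + i);
            }
            chunk.add(items.get(i).toUpperCase(Locale.ROOT));
            boolean lastItem = i == items.size() - 1;
            if (chunk.size() == chunkSize || lastItem) {
                output.addAll(chunk);               // "write" the chunk...
                context.put(READ_COUNT_KEY, i + 1); // ...and save progress with it
                chunk.clear();
            }
        }
    }
}
```

Running it once with a chunk size of 5 and failAtIndex = 7 writes the first five items and stores 5 in the context; calling it again with the same map and failAtIndex = -1 continues at item 5 without writing anything twice.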

Guidelines to use Spring Batch

  • avoid building complex logical structures in a single batch application
  • keep your data close to where the batch processing occurs
  • minimize the use of system resources such as I/O by performing operations in memory wherever possible
  • cache data after the first read from the database and read it from the cache from then on
  • avoid unnecessary table or index scans in the database
  • be specific when retrieving data from the database, i.e., retrieve only the required fields, specify a WHERE clause in the SQL statement, etc.
  • avoid performing the same work multiple times in a batch process
  • allocate enough memory before the batch process starts, because reallocating memory during the batch process is time-consuming
  • check and validate data consistently to maintain data integrity
  • implement checksums for internal validation wherever possible
  • run stress tests at an early stage in a production-like environment
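The caching guideline can be sketched as a small read-through cache: the expensive lookup (for example a database query) runs once per key and later reads are served from memory. This is a minimal illustration assuming a single-threaded step; ReadThroughCache is a hypothetical name, and a multi-threaded step would need a thread-safe map such as ConcurrentHashMap or a caching library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through cache: the loader (e.g. a database query) runs once per key,
// later reads come from memory. Not thread-safe; illustrative only.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private int loads; // how many times the underlying source was hit

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    int loads() {
        return loads;
    }
}
```

Counting the loads makes it easy to confirm that repeated reads of the same key hit the source only once.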

For more on the theory, see http://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html and http://spring.io/guides/gs/batch-processing/

Prerequisites

Java 8/11/12/19, Spring Boot 2.1.4 – 2.6.7/3.1.2, Maven 3.8.5 (Spring Boot 3.x requires Java 17 or later)

I’ll build a service that imports data from a CSV spreadsheet, transforms it with custom code, and stores the final results in another CSV spreadsheet. You could also store the data in a database or any other persistent storage.

Project Setup

Create a Maven-based project in your favorite IDE or tool; the required project structure will be created for you.

For the Maven-based project you can use the following pom.xml file:

Spring Boot 3.x

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>spring-batch</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>19</maven.compiler.source>
		<maven.compiler.target>19</maven.compiler.target>
	</properties>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.2</version>
	</parent>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
			<scope>runtime</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
</project>

Spring Boot 2.x

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>spring-batch</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>11</maven.compiler.source>
		<maven.compiler.target>11</maven.compiler.target>
	</properties>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.6.7</version>
	</parent>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
			<scope>runtime</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
</project>

In the above build script, I have added the H2 database as a runtime dependency because Spring Batch stores its job metadata (the JobRepository tables that track job and step executions) in a database. You can use any other database, such as MySQL, Oracle or Derby.


VO Class

Create a business class User.java which will represent a row of data for inputs and outputs. You can instantiate the User class either with name and email through a constructor, or by setting the properties.

public class User {
	private String name;
	private String email;
	public User() {
	}
	public User(String name, String email) {
		this.name = name;
		this.email = email;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public String getEmail() {
		return email;
	}
	public void setEmail(String email) {
		this.email = email;
	}
	@Override
	public String toString() {
		return "name: " + name + ", email:" + email;
	}
}

ItemProcessor Class

Create an intermediate processor. A common paradigm in batch processing is to ingest data, transform it, and then pipe it out somewhere else.

Here I will write a simple transformer that converts the names to uppercase and changes the email domain.

You can implement your own business logic as your application requires.

import org.springframework.batch.item.ItemProcessor;

public class UserItemProcessor implements ItemProcessor<User, User> {
	@Override
	public User process(final User user) throws Exception {
		final String domain = "roytuts.com";
		final String name = user.getName().toUpperCase();
		final String email = user.getEmail().substring(0, user.getEmail().indexOf("@") + 1) + domain;
		final User transformedUser = new User(name, email);
		System.out.println("Converting [" + user + "] => [" + transformedUser + "]");
		return transformedUser;
	}
}

UserItemProcessor implements Spring Batch’s ItemProcessor interface, which makes it easy to wire the code into the batch job defined further down in this guide.

As the interface requires, process() receives an incoming User object, converts the name to upper case, replaces the email domain with roytuts.com and returns the result as a new User object.
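The substring call in process() assumes every address contains an @. If one does not, indexOf returns -1, substring(0, 0) yields an empty string, and the record is written with the email roytuts.com. The sketch below is a hypothetical plain-Java helper (UserTransform is not part of the tutorial) that guards against this and upper-cases with Locale.ROOT so the result does not depend on the JVM's default locale. It returns null for a malformed address, because returning null from ItemProcessor.process() is how Spring Batch filters an item out so it never reaches the writer.

```java
import java.util.Locale;

// Hypothetical helper mirroring UserItemProcessor's transformation,
// with a guard for addresses that have no '@'.
final class UserTransform {
    static final String DOMAIN = "roytuts.com";

    private UserTransform() {
    }

    /** Returns the email with its domain replaced, or null if there is no '@'. */
    static String rewriteDomain(String email, String domain) {
        int at = email.indexOf('@');
        if (at < 0) {
            return null; // nothing to replace: let the caller filter the record
        }
        return email.substring(0, at + 1) + domain;
    }

    /** Upper-cases the name independently of the JVM's default locale. */
    static String normalizeName(String name) {
        return name.toUpperCase(Locale.ROOT);
    }
}
```

Inside process(), you could return null whenever rewriteDomain(...) returns null, so a malformed record is skipped instead of being written with a bare domain.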

FieldSetMapper Class

A FieldSetMapper maps the fields (values) of one tokenized line to an object, here a User.

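import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

public class UserFieldSetMapper implements FieldSetMapper<User> {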
	@Override
	public User mapFieldSet(FieldSet fieldSet) throws BindException {
		User user = new User();
		user.setName(fieldSet.readString(0));
		user.setEmail(fieldSet.readString(1));
		return user;
	}
}
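Together, DelimitedLineTokenizer and UserFieldSetMapper turn each CSV line into a User. The plain-Java sketch below models those two steps (split the line into fields, then map the fields by index) so you can see what happens to a line such as soumitra,soumitra@gmail.com. It is a simplification: it splits on commas only and does not handle quoted fields the way DelimitedLineTokenizer does, and CsvLineMapperSketch and UserRow are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for DelimitedLineTokenizer + UserFieldSetMapper.
// Does NOT handle quoted fields; illustrative only.
final class CsvLineMapperSketch {
    // Hypothetical value holder mirroring the tutorial's User class.
    static final class UserRow {
        final String name;
        final String email;

        UserRow(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    private CsvLineMapperSketch() {
    }

    /** Splits on commas; limit -1 keeps trailing empty fields ("john," gives 2 fields). */
    static List<String> tokenize(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    /** Maps field 0 to the name and field 1 to the email, as UserFieldSetMapper does. */
    static UserRow map(List<String> fields) {
        if (fields.size() != 2) {
            throw new IllegalArgumentException("expected 2 fields but got " + fields.size());
        }
        // Trim the values, as FieldSet.readString() does.
        return new UserRow(fields.get(0).trim(), fields.get(1).trim());
    }

    static UserRow mapLine(String line) {
        return map(tokenize(line));
    }
}
```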

Spring Batch Configuration

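Now I will write the batch job configuration. In the Spring Boot 2.x version, the @EnableBatchProcessing annotation sets up the Spring Batch infrastructure (such as JobRepository, JobLauncher, JobBuilderFactory and StepBuilderFactory). Because the job metadata is kept in the in-memory H2 database, it is gone once the application stops.

I have added comments to each bean and statement to make it easier to follow what it does.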

Spring Boot 3.x

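Spring Boot 3 uses Spring Batch 5, in which JobBuilderFactory and StepBuilderFactory are deprecated and no longer provided as beans, so use JobBuilder in place of JobBuilderFactory and StepBuilder in place of StepBuilderFactory. Pass a JobRepository to the JobBuilder and StepBuilder constructors, and a PlatformTransactionManager to the chunk() method.

You also need to remove the @EnableBatchProcessing annotation from the batch configuration class: in Spring Boot 3 it makes Spring Boot's batch auto-configuration back off, so the job would no longer be launched automatically at startup and the metadata schema would not be initialized for you.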

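import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration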
public class SpringBatchConfig {
	
	@Autowired
	private JobRepository jobRepository;
	
	@Autowired
	private PlatformTransactionManager platformTransactionManager;

	@Bean
	// creates an item reader
	public ItemReader<User> reader() {
		FlatFileItemReader<User> reader = new FlatFileItemReader<User>();
		// look for file user.csv
		reader.setResource(new ClassPathResource("user.csv"));
		// line mapper
		DefaultLineMapper<User> lineMapper = new DefaultLineMapper<User>();
		// each line with comma separated
		lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
		// map file's field with object
		lineMapper.setFieldSetMapper(new UserFieldSetMapper());
		reader.setLineMapper(lineMapper);
		return reader;
	}

	@Bean
	// creates an instance of our UserItemProcessor for transformation
	public ItemProcessor<User, User> processor() {
		return new UserItemProcessor();
	}

	@Bean
	// creates item writer
	public ItemWriter<User> writer() {
		FlatFileItemWriter<User> writer = new FlatFileItemWriter<User>();
		// output file path
		writer.setResource(new FileSystemResource("C:/eclipse-workspace/transformed_user.csv"));
		// delete if the file already exists
		writer.setShouldDeleteIfExists(true);
		// create lines for writing to file
		DelimitedLineAggregator<User> lineAggregator = new DelimitedLineAggregator<User>();
		// delimit field by comma
		lineAggregator.setDelimiter(",");
		// extract field values from each User item passed to the writer
		BeanWrapperFieldExtractor<User> fieldExtractor = new BeanWrapperFieldExtractor<User>();
		// use User object's properties
		fieldExtractor.setNames(new String[] { "name", "email" });
		lineAggregator.setFieldExtractor(fieldExtractor);
		// turn each item into one delimited line
		writer.setLineAggregator(lineAggregator);
		return writer;
	}

	@Bean
	// define job which is built from step
	public Job importUserJob(Step step) {
		// RunIdIncrementer adds an incremented run.id parameter so each launch creates a new job instance
		return new JobBuilder("importUserJob", jobRepository).incrementer(new RunIdIncrementer()).flow(step).end()
				.build();
	}

	@Bean
	// define step
	public Step step1(ItemReader<User> reader, ItemWriter<User> writer, ItemProcessor<User, User> processor) {
		// chunk(5): read and process up to five items, then write them together
		// in one transaction
		// Next, configure the reader, processor, and writer
		return new StepBuilder("step1", jobRepository).<User, User>chunk(5, platformTransactionManager).reader(reader)
				.processor(processor).writer(writer).build();
	}

}
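The job above is registered with a RunIdIncrementer. Spring Batch identifies a job instance by the job name and its identifying parameters, and a completed instance cannot be launched again with the same parameters; the incrementer sidesteps that by producing a run.id parameter one higher than the previous one (1 if there was none). The plain-Java sketch below only models that arithmetic, with a Map standing in for JobParameters; RunIdSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Models what RunIdIncrementer does to the job parameters: return a copy of the
// previous parameters whose "run.id" is one higher (1 if absent).
// A Map stands in for Spring Batch's JobParameters in this sketch.
final class RunIdSketch {
    static final String RUN_ID_KEY = "run.id";

    private RunIdSketch() {
    }

    static Map<String, Long> next(Map<String, Long> previous) {
        Map<String, Long> next = new HashMap<>(previous); // keep other parameters
        long runId = previous.getOrDefault(RUN_ID_KEY, 0L) + 1;
        next.put(RUN_ID_KEY, runId);
        return next;
    }
}
```

Because each launch gets a different run.id, the parameters never repeat and every launch starts a fresh job instance.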

Spring Boot 2.x

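import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing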
public class SpringBatchConfig {
	@Bean
	// creates an item reader
	public ItemReader<User> reader() {
		FlatFileItemReader<User> reader = new FlatFileItemReader<User>();
		// look for file user.csv
		reader.setResource(new ClassPathResource("user.csv"));
		// line mapper
		DefaultLineMapper<User> lineMapper = new DefaultLineMapper<User>();
		// each line with comma separated
		lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
		// map file's field with object
		lineMapper.setFieldSetMapper(new UserFieldSetMapper());
		reader.setLineMapper(lineMapper);
		return reader;
	}
	@Bean
	// creates an instance of our UserItemProcessor for transformation
	public ItemProcessor<User, User> processor() {
		return new UserItemProcessor();
	}
	@Bean
	// creates item writer
	public ItemWriter<User> writer() {
		FlatFileItemWriter<User> writer = new FlatFileItemWriter<User>();
		// output file path
		writer.setResource(new FileSystemResource("C:/workspace/transformed_user.csv"));
		// delete if the file already exists
		writer.setShouldDeleteIfExists(true);
		// create lines for writing to file
		DelimitedLineAggregator<User> lineAggregator = new DelimitedLineAggregator<User>();
		// delimit field by comma
		lineAggregator.setDelimiter(",");
		// extract field values from each User item passed to the writer
		BeanWrapperFieldExtractor<User> fieldExtractor = new BeanWrapperFieldExtractor<User>();
		// use User object's properties
		fieldExtractor.setNames(new String[] { "name", "email" });
		lineAggregator.setFieldExtractor(fieldExtractor);
		// turn each item into one delimited line
		writer.setLineAggregator(lineAggregator);
		return writer;
	}
	@Bean
	// define job which is built from step
	public Job importUserJob(JobBuilderFactory jobs, Step step) {
		// RunIdIncrementer adds an incremented run.id parameter so each launch creates a new job instance
		return jobs.get("importUserJob").incrementer(new RunIdIncrementer()).flow(step).end().build();
	}
	@Bean
	// define step
	public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader<User> reader, ItemWriter<User> writer,
			ItemProcessor<User, User> processor) {
		// chunk(5): read and process up to five items, then write them together
		// in one transaction
		// Next, I configure the reader, processor, and writer
		return stepBuilderFactory.get("step1").<User, User>chunk(5).reader(reader).processor(processor).writer(writer)
				.build();
	}
}
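Both step definitions use chunk(5). In chunk-oriented processing, Spring Batch reads items until it has a chunk of the configured size, passes each one through the processor, drops items for which the processor returned null, and hands the remaining items to the writer in one call inside one transaction. Below is a plain-Java model of that loop under simplifying assumptions: an Iterator stands in for the ItemReader, java.util.function types stand in for the processor and writer, and there are no transactions, retries or skips. ChunkLoopSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of chunk-oriented processing (not the Spring Batch API).
final class ChunkLoopSketch {
    private ChunkLoopSketch() {
    }

    /** Runs read -> process -> write in chunks and returns the number of chunks read. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            // 1. read up to chunkSize items
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. process each item; a null result filters the item out
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I item : inputs) {
                O result = processor.apply(item);
                if (result != null) {
                    outputs.add(result);
                }
            }
            // 3. write the surviving items in one call (one transaction in Spring Batch)
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
            chunks++;
        }
        return chunks;
    }
}
```

With the twelve records of the sample run in the next section, chunk(5) means three writes of 5, 5 and 2 users. Note that the chunk size counts items read, so filtered items make a written chunk smaller rather than pulling in extra items.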

Spring Boot Main Class

This batch job can also be embedded in a web application, but in this Spring Boot example I will create a main class to run it. You can also build an executable jar from it.

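import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication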
public class SpringBatch {
	public static void main(String[] args) {
		SpringApplication.run(SpringBatch.class, args);
	}
}

Testing Spring Batch Application

Place the input file user.csv in src/main/resources so that it is on the classpath, where the reader looks for it.

Run the above main class. Besides the output file transformed_user.csv, you will see the following output in the console:

Converting [name: soumitra, email:soumitra@gmail.com] => [name: SOUMITRA, email:soumitra@roytuts.com]
Converting [name: soumitra, email:soumitra1@roytuts.com] => [name: SOUMITRA, email:soumitra1@roytuts.com]
Converting [name: liton, email:liton@gmail.com] => [name: LITON, email:liton@roytuts.com]
Converting [name: john, email:jhon@gmail.com] => [name: JOHN, email:jhon@roytuts.com]
Converting [name: sumit, email:sumit@gmail.com] => [name: SUMIT, email:sumit@roytuts.com]
Converting [name: souvik, email:souvik@gmail.com] => [name: SOUVIK, email:souvik@roytuts.com]
Converting [name: debabrata, email:debabrata@gmail.com] => [name: DEBABRATA, email:debabrata@roytuts.com]
Converting [name: debina, email:debina@gmail.com] => [name: DEBINA, email:debina@roytuts.com]
Converting [name: sushil, email:sushil@gmail.com] => [name: SUSHIL, email:sushil@roytuts.com]
Converting [name: francois, email:francois@yahoo.com] => [name: FRANCOIS, email:francois@roytuts.com]
Converting [name: kanimozi, email:kanimozi@gmail.com] => [name: KANIMOZI, email:kanimozi@roytuts.com]
Converting [name: subodh, email:subodh@hotmail.com] => [name: SUBODH, email:subodh@roytuts.com]
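To check the result file rather than the console, a small post-run validation can read transformed_user.csv and report any line that does not look transformed. This is a hypothetical helper (OutputCheck is not part of the tutorial) that assumes exactly two comma-separated fields per line, as written by the DelimitedLineAggregator configured above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical post-run check: every line should be "NAME,local@<domain>"
// with an upper-case name. Returns the offending lines.
final class OutputCheck {
    private OutputCheck() {
    }

    static List<String> invalidLines(List<String> lines, String domain) {
        List<String> invalid = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            boolean ok = fields.length == 2
                    && !fields[0].isEmpty()
                    && fields[0].equals(fields[0].toUpperCase(Locale.ROOT))
                    && fields[1].endsWith("@" + domain);
            if (!ok) {
                invalid.add(line);
            }
        }
        return invalid;
    }
}
```

For example, OutputCheck.invalidLines(Files.readAllLines(Paths.get("C:/eclipse-workspace/transformed_user.csv")), "roytuts.com") should return an empty list after a successful run; adjust the path to the one configured in your writer().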

I hope this gives you an idea of how Spring Batch works.

Source Code

Download
