Spring Batch – Convert CSV To XML File

CSV To XML File Conversion

This tutorial will show you how to read a csv file data and write to xml file with some modification. I will build a service that imports data from a CSV file, transforms it with custom code, and stores the final results in XML file.

You can read the tutorial Spring Batch to read what is Spring Batch and what are the usages of Spring Batch.

Related Posts:

Prerequisites

Spring Boot 2.1.4/2.6.7/3.1.2, Java 8/11/19, Maven 3.8.5

Spring Batch

Project Setup

Create a maven based project in your favorite IDE or tool and you will see the required project structure gets created.

Spring Boot 3.x

For Spring Boot 3.x version you can use the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>spring-batch-csv-to-xml</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>19</maven.compiler.source>
		<maven.compiler.target>19</maven.compiler.target>
	</properties>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.2</version>
	</parent>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-oxm</artifactId>
		</dependency>

		<dependency>
			<groupId>jakarta.xml.bind</groupId>
			<artifactId>jakarta.xml.bind-api</artifactId>
		</dependency>

		<dependency>
			<groupId>org.glassfish.jaxb</groupId>
			<artifactId>jaxb-runtime</artifactId>
		</dependency>

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
</project>

Spring Boot 2.x

For maven based project you can use the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>spring-batch-csv-to-xml</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>11</maven.compiler.source>
		<maven.compiler.target>11</maven.compiler.target>
	</properties>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.6.7</version>
	</parent>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-oxm</artifactId>
		</dependency>

		<dependency>
			<groupId>jakarta.xml.bind</groupId>
			<artifactId>jakarta.xml.bind-api</artifactId>
		</dependency>

		<dependency>
			<groupId>org.glassfish.jaxb</groupId>
			<artifactId>jaxb-runtime</artifactId>
		</dependency>

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
</project>

In the above build script I have added spring-oxm to get benefits of JAXB classes for generating XML file from Java POJO class.

VO Class

Create a model class Person.java which will represent a row of data for inputs and outputs. I have made the below class JAXB annotation enabled for converting Java object to XML file directly.

Remember The import statements will be different according to the Spring Boot 3.x or 2.x version.

@XmlRootElement(name = "person")
public class Person {
	private int id;
	private String firstName;
	private String lastName;
	@XmlAttribute(name = "id")
	public int getId() {
		return id;
	}
	public void setId(int id) {
		this.id = id;
	}
	@XmlElement(name = "firstName")
	public String getFirstName() {
		return firstName;
	}
	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}
	@XmlElement(name = "lastName")
	public String getLastName() {
		return lastName;
	}
	public void setLastName(String lastName) {
		this.lastName = lastName;
	}
	@Override
	public String toString() {
		return "Person [id=" + id + ", firstName=" + firstName + ", lastName=" + lastName + "]";
	}
}

FieldSetMapper Class

Create below mapper class which will map the CSV file row item to Java object.

public class PersonFieldSetMapper implements FieldSetMapper<Person> {
	@Override
	public Person mapFieldSet(FieldSet fieldSet) {
		Person person = new Person();
		person.setId(fieldSet.readInt(0));
		person.setFirstName(fieldSet.readString(1));
		person.setLastName(fieldSet.readString(2));
		return person;
	}
}

ItemProcessor Class

Create an intermediate processor. A common paradigm in batch processing is to ingest data, transform it, and then pipe it out somewhere else. Here I write a simple transformer that converts the initial characters of the names to uppercase.

public class PersonItemProcessor implements ItemProcessor<Person, Person> {
	@Override
	public Person process(Person person) throws Exception {
		System.out.println("Processing: " + person);
		final String initCapFirstName = person.getFirstName().substring(0, 1).toUpperCase()
				+ person.getFirstName().substring(1);
		final String initCapLastName = person.getLastName().substring(0, 1).toUpperCase()
				+ person.getLastName().substring(1);
		Person transformedPerson = new Person();
		transformedPerson.setId(person.getId());
		transformedPerson.setFirstName(initCapFirstName);
		transformedPerson.setLastName(initCapLastName);
		return transformedPerson;
	}
}

Input CSV File

Create below CSV file under classpath directory – src/main/resources.

1000,soumitra,roy
1001,souvik,sanyal
1002,arup,chatterjee
1003,suman,mukherjee
1004,debina,guha
1005,liton,sarkar
1006,debabrata,poddar

Configuration Class

I have created this Spring Configuration class to define several beans for Spring Batch processing.

I have defined beans, such as, ItemProcessor, TransactionManager, JobRepository, DataSource, JobLauncher, Step, Job etc. for our Spring Batch processing.

I used Jaxb2Marshaller to marshall Java POJO to XML. I write the final output into an XML file (person.xml).

The spring boot configuration class is different in spring boot 3.x and 2.x versions.

Spring Boot 3.x

In spring boot 3.x you don’t need to use @EnableBatchProcessing annotation on the configuration class.

The RunIdIncrementer() has not been used for getting the unique run id, instead setIncrementerFactory() method has been used while creating the JobRepository instance. Otherwise you will get the following exception:

Failed to instantiate [org.springframework.batch.core.repository.JobRepository]: Factory method 'jobRepository' threw exception with message: Cannot invoke "org.springframework.batch.item.database.support.DataFieldMaxValueIncrementerFactory.getIncrementer(String, String)" because "this.incrementerFactory" is null

The JobBuilderFactory and StepBuilderFactory have been removed and you can now use JobBuilder and StepBuilder to create respective instances with JobRepository and PlatformTransactionManager instances.

@Configuration
public class SpringBatchConfig {

	@Bean
	@Scope(value = BeanDefinition.SCOPE_PROTOTYPE)
	public Person person() {
		return new Person();
	}

	@Bean
	@Scope(value = BeanDefinition.SCOPE_PROTOTYPE)
	public ItemProcessor<Person, Person> itemProcessor() {
		return new PersonItemProcessor();
	}

	@Bean
	public DataSource dataSource() {
		EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
		return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-h2.sql")
				.addScript("classpath:org/springframework/batch/core/schema-h2.sql").setType(EmbeddedDatabaseType.H2)
				.build();
	}

	@Bean
	public PlatformTransactionManager txManager() {
		return new ResourcelessTransactionManager();
	}

	@Bean
	public JobRepository jbRepository(DataSource dataSource, PlatformTransactionManager transactionManager)
			throws Exception {
		JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
		factory.setDatabaseType(DatabaseType.H2.getProductName());
		factory.setDataSource(dataSource);
		factory.setTransactionManager(transactionManager);
		factory.setIncrementerFactory(new DefaultDataFieldMaxValueIncrementerFactory(dataSource) {
			@Override
			public DataFieldMaxValueIncrementer getIncrementer(String incrementerType, String incrementerName) {
				return new SqlServerSequenceMaxValueIncrementer(dataSource, incrementerName);
			}
		});
		factory.afterPropertiesSet();
		return factory.getObject();
	}

	@Bean
	public JobLauncher jbLauncher(JobRepository jobRepository) {
		TaskExecutorJobLauncher jobLauncher = new TaskExecutorJobLauncher();
		jobLauncher.setJobRepository(jobRepository);
		return jobLauncher;
	}

	@Bean
	public Jaxb2Marshaller jaxb2Marshaller() {
		Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
		jaxb2Marshaller.setClassesToBeBound(Person.class);

		return jaxb2Marshaller;
	}

	@Bean
	public FlatFileItemReader<Person> fileItemReader() {
		FlatFileItemReader<Person> fileItemReader = new FlatFileItemReader<>();
		fileItemReader.setResource(new ClassPathResource("person.csv"));

		DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
		delimitedLineTokenizer.setNames("id", "firstName", "lastName");

		DefaultLineMapper<Person> defaultLineMapper = new DefaultLineMapper<>();
		defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
		defaultLineMapper.setFieldSetMapper(new PersonFieldSetMapper());

		fileItemReader.setLineMapper(defaultLineMapper);

		return fileItemReader;
	}

	@Bean(destroyMethod = "")
	public StaxEventItemWriter<Person> staxEventItemWriter(Jaxb2Marshaller marshaller) {
		StaxEventItemWriter<Person> staxEventItemWriter = new StaxEventItemWriter<>();
		staxEventItemWriter.setResource(new FileSystemResource("C:/eclipse-workspace/person.xml"));
		staxEventItemWriter.setMarshaller(marshaller);
		staxEventItemWriter.setRootTagName("personInfo");

		return staxEventItemWriter;
	}

	@Bean
	public Job jobCsvXml(JobRepository jobRepository, Step step) {
		// return new JobBuilder("jobCsvXml", jobRepository).start(step).build(); //or
		return new JobBuilder("jobCsvXml", jobRepository).flow(step).end().build();
	}

	@Bean
	public Step step1(JobRepository jobRepository, ResourcelessTransactionManager transactionManager,
			ItemReader<Person> reader, ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
		return new StepBuilder("step1", jobRepository).<Person, Person>chunk(2, transactionManager).reader(reader)
				.processor(processor).writer(writer).build();
	}

}

Spring Boot 2.x

Note I have used (destroyMethod = "") on StaxEventItemWriter() to resolve the below error:

Destroy method 'close' on bean with name 'staxEventItemWriter' threw an exception: java.lang.NullPointerException
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {
	@Bean
	@Scope(value = BeanDefinition.SCOPE_PROTOTYPE)
	public Person person() {
		return new Person();
	}
	@Bean
	@Scope(value = BeanDefinition.SCOPE_PROTOTYPE)
	public ItemProcessor<Person, Person> itemProcessor() {
		return new PersonItemProcessor();
	}
	@Bean
	public DataSource dataSource() {
		EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
		return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-h2.sql")
				.addScript("classpath:org/springframework/batch/core/schema-h2.sql").setType(EmbeddedDatabaseType.H2)
				.build();
	}
	@Bean
	public ResourcelessTransactionManager txManager() {
		return new ResourcelessTransactionManager();
	}
	@Bean
	public JobRepository jbRepository(DataSource dataSource, ResourcelessTransactionManager transactionManager)
			throws Exception {
		JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
		factory.setDatabaseType(DatabaseType.H2.getProductName());
		factory.setDataSource(dataSource);
		factory.setTransactionManager(transactionManager);
		return factory.getObject();
	}
	@Bean
	public JobLauncher jbLauncher(JobRepository jobRepository) {
		SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
		jobLauncher.setJobRepository(jobRepository);
		return jobLauncher;
	}
	@Bean
	public Jaxb2Marshaller jaxb2Marshaller() {
		Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
		jaxb2Marshaller.setClassesToBeBound(Person.class);
		return jaxb2Marshaller;
	}
	@Bean
	public FlatFileItemReader<Person> fileItemReader() {
		FlatFileItemReader<Person> fileItemReader = new FlatFileItemReader<>();
		fileItemReader.setResource(new ClassPathResource("person.csv"));
		DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
		delimitedLineTokenizer.setNames("id", "firstName", "lastName");
		DefaultLineMapper<Person> defaultLineMapper = new DefaultLineMapper<>();
		defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
		defaultLineMapper.setFieldSetMapper(new PersonFieldSetMapper());
		fileItemReader.setLineMapper(defaultLineMapper);
		return fileItemReader;
	}
	@Bean(destroyMethod = "")
	public StaxEventItemWriter<Person> staxEventItemWriter(Jaxb2Marshaller marshaller) {
		StaxEventItemWriter<Person> staxEventItemWriter = new StaxEventItemWriter<>();
		staxEventItemWriter.setResource(new FileSystemResource("C:/workspace/person.xml"));
		staxEventItemWriter.setMarshaller(marshaller);
		staxEventItemWriter.setRootTagName("personInfo");
		return staxEventItemWriter;
	}
	@Bean
	public Job jobCsvXml(JobBuilderFactory jobBuilderFactory, Step step) {
		return jobBuilderFactory.get("jobCsvXml").incrementer(new RunIdIncrementer()).flow(step).end().build();
	}
	@Bean
	public Step step1(StepBuilderFactory stepBuilderFactory, ResourcelessTransactionManager transactionManager,
			ItemReader<Person> reader, ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
		return stepBuilderFactory.get("step1").transactionManager(transactionManager).<Person, Person>chunk(2)
				.reader(reader).processor(processor).writer(writer).build();
	}
}

I have used here in-memory H2 database for our Spring Batch processing. It initializes the database for automatically creating spring batch job meta tables in H2 database.

You can also use other database, such as, MySQL, Oracle etc.

Note here I am including SQL scripts while configuring datasource for H2 database but these SQL scripts are loaded from Spring API. So you don’t need to create any SQL scripts here.

A default simple implementation of the Job interface is provided by Spring Batch in the form of the SimpleJob class which creates some standard functionality on top of Job, however the batch namespace abstracts away the need to instantiate it directly.

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing.

ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time.

ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation.

ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item, and the ItemWriter writes them, the ItemProcessor provides access to transform or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out.

TransactionManager – Spring’s that will be used to begin and commit transactions during processing.

Chunk – The number of items that will be processed before the transaction is committed.

JobRepository is the persistence mechanism. It provides CRUD operations for JobLauncher, Job and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository, and during the course of execution StepExecution and JobExecution implementations are persisted by passing them to the repository.

JonLauncher represents a simple interface for launching a Job with a given set of JobParameters.

Spring Boot Main Class

Create below class for launching spring batch job. This will run the spring boot application.

@SpringBootApplication
public class SpringBatch {
	public static void main(String[] args) {
		SpringApplication.run(SpringBatch.class, args);
	}
}

Testing Spring Batch CSV to XML Conversion

Run the above class, you will see the below output.

Processing: Person [id=1000, firstName=soumitra, lastName=roy]
Processing: Person [id=1001, firstName=souvik, lastName=sanyal]
Processing: Person [id=1002, firstName=arup, lastName=chatterjee]
Processing: Person [id=1003, firstName=suman, lastName=mukherjee]
Processing: Person [id=1004, firstName=debina, lastName=guha]
Processing: Person [id=1005, firstName=liton, lastName=sarkar]
Processing: Person [id=1006, firstName=debabrata, lastName=poddar]

In the above output you see the job name, step name and also which row item from csv file is being processed.

MySQL Database

If you use other than in-memory database, such as, MySQL, Oracle etc. then you can also see the SQL scripts have been executed and below tables have been created in the MySQL database with job details.

spring batch csv xml

You need to make the following changes in order to work with MySQL database in Spring Batch application.

First comment out the h2 database and add dependency for MySQL database in the pom.xml file:

<dependencies>
	...

	<!--<dependency>
		<groupId>com.h2database</groupId>
		<artifactId>h2</artifactId>
	</dependency>-->
	
	<dependency>
		<groupId>mysql</groupId>
		<artifactId>mysql-connector-java</artifactId>
		<version>8.0.31</version>
	</dependency>
</dependencies>

Add application.properties file with the following datasource details under src/main/resources folder:

spring.datasource.driverClassName=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost/roytuts
spring.datasource.username=root
spring.datasource.password=root

spring.batch.jdbc.initialize-schema=always

The key/value pair spring.batch.jdbc.initialize-schema=always creates the required tables in the database server otherwise you will see error bad SQL grammer - table or view does not exist.

You need to comment out the DataSource bean defined for h2 database and add for MySQL database:

@Configuration
public class SpringBatchConfig {
	
	@Autowired
	private Environment environment;
	
	...

	/*@Bean
	public DataSource dataSource() {
		EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
		return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-h2.sql")
				.addScript("classpath:org/springframework/batch/core/schema-h2.sql").setType(EmbeddedDatabaseType.H2)
				.build();
	}*/
	
	@Bean
	public DataSource dataSource() {
		return DataSourceBuilder.create().driverClassName(environment.getProperty("spring.datasource.driverClassName"))
				.url(environment.getProperty("spring.datasource.url"))
				.username(environment.getProperty("spring.datasource.username"))
				.password(environment.getProperty("spring.datasource.password")).build();
	}
	
	...

}

Hope you got an idea how to configure RDBMS other than H2 database in spring batch application.

Source Code

Download

Leave a Reply

Your email address will not be published. Required fields are marked *