Handling large data writing to Excel using SXSSF Apache POI

Large Data Writing to Excel

This tutorial will show you an example on handling large data writing to excel using sxssf Apache POI library. Having said that I will show you here how to write large data set to an Excel file using Apache POI using SXSSF. The theoretical text and concept have been borrowed from http://poi.apache.org/spreadsheet/how-to.html#sxssf

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize).

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.

Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled.

Please note that there are still things that still may consume a large amount of memory based on which features you are using, e.g. merged regions, comments, … are still only stored in memory and thus may require a lot of memory if used extensively.

Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.

Prerequisites

Java 8/12/19, Apache POI 4.1.0 – 5.0.0/5.2.4, Gradle 4.10.2 – 6.7.1, Maven 3.6.3/3.8.5

Project Setup

Create a gradle or maven based project in your favorite IDE or tool. The project’s name is apache-poi-excel-large-data-handle.

Update the default generated build.gradle script to include Apache POI library for writing to Excel sheet.

plugins {
    id 'java-library'
}

sourceCompatibility = 12
targetCompatibility = 12

repositories {
    jcenter()
}

dependencies {
    implementation 'org.apache.poi:poi-ooxml:4.1.0' to 5.0.0
}

If you are creating maven based project then you can use the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.roytuts</groupId>
  <artifactId>apache-poi-excel-large-data-handle</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>19</maven.compiler.source>
		<maven.compiler.target>19</maven.compiler.target>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.apache.poi</groupId>
			<artifactId>poi-ooxml</artifactId>
			<version>5.2.4</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.8.1</version>
			</plugin>
		</plugins>
	</build>
</project>

VO Class

Create a VO (Value Object) class to hold data coming from persistence storage or external source.

This class adds 100 attributes to represent 100 columns in the excel file.

public class ExcelVo {

	private String column1;
	private String column2;
	private String column3;
	private String column4;
	private String column5;
	private String column6;
	private String column7;
	private String column8;
	private String column9;
	private String column10;
	private String column11;
	private String column12;
	private String column13;
	private String column14;
	private String column15;
	private String column16;
	private String column17;
	private String column18;
	private String column19;
	private String column20;
	private String column21;
	private String column22;
	private String column23;
	private String column24;
	private String column25;
	private String column26;
	private String column27;
	private String column28;
	private String column29;
	private String column30;
	private String column31;
	private String column32;
	private String column33;
	private String column34;
	private String column35;
	private String column36;
	private String column37;
	private String column38;
	private String column39;
	private String column40;
	private String column41;
	private String column42;
	private String column43;
	private String column44;
	private String column45;
	private String column46;
	private String column47;
	private String column48;
	private String column49;
	private String column50;
	private String column51;
	private String column52;
	private String column53;
	private String column54;
	private String column55;
	private String column56;
	private String column57;
	private String column58;
	private String column59;
	private String column60;
	private String column61;
	private String column62;
	private String column63;
	private String column64;
	private String column65;
	private String column66;
	private String column67;
	private String column68;
	private String column69;
	private String column70;
	private String column71;
	private String column72;
	private String column73;
	private String column74;
	private String column75;
	private String column76;
	private String column77;
	private String column78;
	private String column79;
	private String column80;
	private String column81;
	private String column82;
	private String column83;
	private String column84;
	private String column85;
	private String column86;
	private String column87;
	private String column88;
	private String column89;
	private String column90;
	private String column91;
	private String column92;
	private String column93;
	private String column94;
	private String column95;
	private String column96;
	private String column97;
	private String column98;
	private String column99;
	private String column100;

//        getters and setters
}

Utility Class

The below utility class has one method that will capitalize the initial character of the string. The capitalization is required to find the actual getter and setter methods from a given class.

public final class ExcelUtil {

	private ExcelUtil() {
	}

	// capitalize the first letter of the field name for retrieving value of the
	// field later
	public static String capitalizeInitialLetter(String s) {
		if (s.length() == 0)
			return s;
		return s.substring(0, 1).toUpperCase() + s.substring(1);
	}

}

Writing to Excel File

I will create two classes here – one class is to write data to excel file using auto flush mode and another class is to write data to excel file using manual flush mode.

Using Auto Flush

In the below class, the method writes to a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.

public class ExcelWriterAutoFlush {

	// using auto flush and default window size 100
	public void writeToExcelAutoFlush(String fileName, List<ExcelVo> excelData) {
		SXSSFWorkbook wb = null;
		FileOutputStream fos = null;
		try {
			// keep 100 rows in memory, exceeding rows will be flushed to disk
			wb = new SXSSFWorkbook(SXSSFWorkbook.DEFAULT_WINDOW_SIZE/* 100 */);
			Sheet sh = wb.createSheet();
			@SuppressWarnings("unchecked")
			Class<ExcelVo> classz = (Class<ExcelVo>) excelData.get(0).getClass();
			Field[] fields = classz.getDeclaredFields();
			int noOfFields = fields.length;
			int rownum = 0;
			Row row = sh.createRow(rownum);
			for (int i = 0; i < noOfFields; i++) {
				Cell cell = row.createCell(i);
				cell.setCellValue(fields[i].getName());
			}
			for (ExcelVo excelModel : excelData) {
				row = sh.createRow(rownum + 1);
				int colnum = 0;
				for (Field field : fields) {
					String fieldName = field.getName();
					Cell cell = row.createCell(colnum);
					Method method = null;
					try {
						method = classz.getMethod("get" + ExcelUtil.capitalizeInitialLetter(fieldName));
					} catch (NoSuchMethodException nme) {
						method = classz.getMethod("get" + fieldName);
					}
					Object value = method.invoke(excelModel, (Object[]) null);
					cell.setCellValue((String) value);
					colnum++;
				}
				rownum++;
			}
			fos = new FileOutputStream(fileName);
			wb.write(fos);
		} catch (Exception ex) {
		} finally {
			try {
				if (fos != null) {
					fos.close();
				}
			} catch (IOException e) {
			}
			try {
				if (wb != null) {
					wb.close();
				}
			} catch (IOException e) {
			}
		}
	}

}

Using Manual Flush

The below method turns off auto-flushing (windowSize=-1) and the code manually controls how portions of data are written to disk.

public class ExcelWriterManualFlush {

	// using manual flush and default window size 100
	public void writeToExcelManualFlush(String fileName, List<ExcelVo> exceldata) {
		SXSSFWorkbook wb = null;
		FileOutputStream fos = null;
		try {
			// turn off auto-flushing and accumulate all rows in memory
			wb = new SXSSFWorkbook(-1);
			Sheet sh = wb.createSheet();
			@SuppressWarnings("unchecked")
			Class<ExcelVo> classz = (Class<ExcelVo>) exceldata.get(0).getClass();
			Field[] fields = classz.getDeclaredFields();
			int noOfFields = fields.length;
			int rownum = 0;
			Row row = sh.createRow(rownum);
			for (int i = 0; i < noOfFields; i++) {
				Cell cell = row.createCell(i);
				cell.setCellValue(fields[i].getName());
			}
			for (ExcelVo excelModel : exceldata) {
				row = sh.createRow(rownum + 1);
				int colnum = 0;
				for (Field field : fields) {
					String fieldName = field.getName();
					Cell cell = row.createCell(colnum);
					Method method = null;
					try {
						method = classz.getMethod("get" + ExcelUtil.capitalizeInitialLetter(fieldName));
					} catch (NoSuchMethodException nme) {
						method = classz.getMethod("get" + fieldName);
					}
					Object value = method.invoke(excelModel, (Object[]) null);
					cell.setCellValue((String) value);
					colnum++;
				}
				// manually control how rows are flushed to disk
				if (rownum % 100 == 0) {
					// retain 100 last rows and flush all others
					((SXSSFSheet) sh).flushRows(100);
				}
				rownum++;
			}
			fos = new FileOutputStream(fileName);
			wb.write(fos);
		} catch (Exception ex) {
		} finally {
			try {
				if (fos != null) {
					fos.close();
				}
			} catch (IOException e) {
			}
			try {
				if (wb != null) {
					wb.close();
				}
			} catch (IOException e) {
			}
		}
	}

}

SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size of these temporary files can grow to a very large value. For example, for a 20 MB csv data the size of the temp XML becomes more than a gigabyte.

You may also like to read Examples on Java Reflections.

Mock Data

I will test the application using some mock data and write to the excel file. I will generate 20000 records using random string using Java API.

public class ExcelMockData {

	private List<ExcelVo> excelData;

	public ExcelMockData() {
	}

	public List<ExcelVo> getExcelData() {
		if (excelData == null) {
			populateExcelData();
		}
		return excelData;
	}

	public void setExcelData(List<ExcelVo> excelData) {
		this.excelData = excelData;
	}

	private void populateExcelData() {
		excelData = new ArrayList<>();
		Class<ExcelVo> classz = (Class<ExcelVo>) ExcelVo.class;
		Method[] methods = classz.getMethods();
		for (int i = 0; i < 20000; i++) {
			ExcelVo model = new ExcelVo();
			for (Method method : methods) {
				String methodName = method.getName();
				if (methodName.startsWith("set")) {
					try {
						method.invoke(model, getRandomString());
					} catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
						e.printStackTrace();
					}
				}
			}
			excelData.add(model);
		}
	}

	private String getRandomString() {
		SecureRandom random = new SecureRandom();
		return new BigInteger(130, random).toString(32);
	}

}

Main Class

In order to test the application I will create a main class here. You may also want to create Junit test case to test the application.

public class ExcelApp {

	public static void main(String[] args) {
		ExcelMockData mockData = new ExcelMockData();
		List<ExcelVo> excelData = mockData.getExcelData();
		
		// using auto flush mode
		final long autoFlushStartTime = System.currentTimeMillis();
		String fileName = "C:/eclipse-workspace/apache-poi-excel-large-data-handle/Excel-auto-flush.xlsx";
		
		ExcelWriterAutoFlush autoFlush = new ExcelWriterAutoFlush();
		autoFlush.writeToExcelAutoFlush(fileName, excelData);
		
		final long autoFlushEndTime = System.currentTimeMillis();
		final long autoFlushExeTime = autoFlushEndTime - autoFlushStartTime;
		final long hr = TimeUnit.MILLISECONDS.toHours(autoFlushExeTime);
		final long min = TimeUnit.MILLISECONDS.toMinutes(autoFlushExeTime)
				- TimeUnit.HOURS.toMinutes(TimeUnit.MILLISECONDS.toHours(autoFlushExeTime));
		final long sec = TimeUnit.MILLISECONDS.toSeconds(autoFlushExeTime)
				- TimeUnit.MINUTES.toSeconds(TimeUnit.MILLISECONDS.toMinutes(autoFlushExeTime));
		final long ms = TimeUnit.MILLISECONDS.toMillis(autoFlushExeTime)
				- TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(autoFlushExeTime));
		
		System.out.println(String.format(
				"Total time taken to execute 20000 records using auto flush: %d Hours %d Minutes %d Seconds %d Milliseconds",
				hr, min, sec, ms));
		
		// using manual flush mode
		final long manualFlushStartTime = System.currentTimeMillis();
		
		fileName = "C:/eclipse-workspace/apache-poi-excel-large-data-handle/Excel-manual-flush.xlsx";
		ExcelWriterManualFlush manualFlush = new ExcelWriterManualFlush();
		manualFlush.writeToExcelManualFlush(fileName, excelData);
		
		final long manualFlushEndTime = System.currentTimeMillis();
		final long manualFlushExeTime = manualFlushEndTime - manualFlushStartTime;
		final long manualFlushHr = TimeUnit.MILLISECONDS.toHours(manualFlushExeTime);
		final long manualFlushMin = TimeUnit.MILLISECONDS.toMinutes(manualFlushExeTime)
				- TimeUnit.HOURS.toMinutes(TimeUnit.MILLISECONDS.toHours(manualFlushExeTime));
		final long manualFlushSec = TimeUnit.MILLISECONDS.toSeconds(manualFlushExeTime)
				- TimeUnit.MINUTES.toSeconds(TimeUnit.MILLISECONDS.toMinutes(manualFlushExeTime));
		final long manualFlushMs = TimeUnit.MILLISECONDS.toMillis(manualFlushExeTime)
				- TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(manualFlushExeTime));
		
		System.out.println(String.format(
				"Total time taken to execute 20000 records using manual flush: %d Hours %d Minutes %d Seconds %d Milliseconds",
				manualFlushHr, manualFlushMin, manualFlushSec, manualFlushMs));
	}

}

You may also like to read Time Conversion Example.

Testing the Large Data Writing Application

Run the main class, you will see excel files Excel-auto-flush.xlsx and Excel-manual-flush.xlsx will be generated under C:/eclipse-workspace/apache-poi-excel-large-data-handle/ directory.

Console Output

The total time taken by each mode is given below:

Total time taken to execute 20000 records using auto flush: 0 Hours 0 Minutes 6 Seconds 786 Milliseconds
Total time taken to execute 20000 records using manual flush: 0 Hours 0 Minutes 5 Seconds 731 Milliseconds

Time taken by the program may depend on several factors, such as, Operating System, Hardware, RAM, how many applications are currently running in the system etc.

Source Code

Download

3 thoughts on “Handling large data writing to Excel using SXSSF Apache POI

  1. i have a problem i if have millions of record and want to export into excel file spring boot java as your example has only 20k records and it taking approx. 15 seconds. then in my case definitely it take too much time( correct me if i am wrong?). should i go with your approach or if you have any other which work with large data size please refer me. suppose in case we have millions of record and i want to export all records into excelsheet at once.

    Thanks

  2. Unable to recognize the ExcelUtils.capitalizeInitialLetter. I know u are using reflection mechanism. But,
    Where is this class ExcelUtils? My Program is unable to compile

Leave a Reply

Your email address will not be published. Required fields are marked *