Convert PDF To Image File Using Java

Introduction

This tutorial will show you how to convert a PDF file to image files using Java. For this I am using the Apache pdfbox API. The example walks step by step through converting each page of a PDF file into an image file.

In the 2.0.x versions of the pdfbox library (including 2.0.20), many methods from the 1.8.x API were removed, among them the getAllPages() and convertToImage() methods. Pages are rendered through the PDFRenderer class instead.

In this example I will show you how to convert a PDF file into image files using pdfbox 1.8.3 as well as pdfbox 2.0.20 to 2.0.22.

Prerequisites

Java 1.8+, Maven 3.6.3 to 3.8.2, Gradle 6.4.1 to 6.7.1, PDFBox 1.8.3 or 2.0.20 to 2.0.22

Setup Project

Create a maven or gradle based project in your favorite IDE or tool. The name of the project is java-pdf-to-image.

If you are using a maven based project, you can use the pom.xml file below:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>java-pdf-to-image</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>12</maven.compiler.source>
		<maven.compiler.target>12</maven.compiler.target>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.apache.pdfbox</groupId>
			<artifactId>pdfbox</artifactId>
			<!-- any version from 2.0.20 to 2.0.22 works with this tutorial -->
			<version>2.0.22</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.8.1</version>
			</plugin>
		</plugins>
	</build>
</project>

If you are using a gradle based project, you can use the build.gradle script below. You can change the version of pdfbox according to your requirement.

plugins {
    id 'java-library'
}

repositories {
    // JCenter has been shut down; pdfbox is published on Maven Central
    mavenCentral()
}

dependencies {
    // any version from 2.0.20 to 2.0.22 works with this tutorial
    implementation 'org.apache.pdfbox:pdfbox:2.0.22'
}

Java Class

The Java class below converts each page of a PDF file into an image file. The output images are PNG files.

If you are using pdfbox 1.8.3, you can use the code below.

package com.roytuts.java.pdf.to.image;

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
public class ConvertPdfToImage {
	public static void main(String[] args) {
		try {
			String sourceDir = "sample.pdf";
			String destinationDir = "pdf-to-image";
			File sourceFile = new File(sourceDir);
			File destinationFile = new File(destinationDir);
			if (!destinationFile.exists()) {
				destinationFile.mkdir();
				System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
			}
			if (sourceFile.exists()) {
				PDDocument document = PDDocument.load(sourceDir);
				@SuppressWarnings("unchecked")
				List<PDPage> list = document.getDocumentCatalog().getAllPages();
				String fileName = sourceFile.getName().replace(".pdf", "");
				int pageNumber = 1;
				for (PDPage page : list) {
					BufferedImage image = page.convertToImage();
					// Resolve against the directory so the image lands inside it
					File outputfile = new File(destinationFile, fileName + "_" + pageNumber + ".png");
					ImageIO.write(image, "png", outputfile);
					pageNumber++;
				}
				document.close();
				System.out.println("Image saved at -> " + destinationFile.getAbsolutePath());
			} else {
				System.err.println(sourceFile.getName() + " File does not exist");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

First I define the source PDF file to read and the destination directory where the converted image files will be written.

Next I create the destination directory if it does not exist.

Then I load the PDF file, retrieve all of its pages, and for each page write one PNG image file into the destination directory. The file name is built from the PDF file name and the page number, for example sample_1.png.
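The directory and file name steps described above can be sketched on their own, without pdfbox. This is a minimal sketch and not part of the original listing: the class name OutputPathSketch and the helpers baseName and outputPath are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class OutputPathSketch {

	// Strips a trailing ".pdf" (any letter case) from the file name; other names are returned unchanged.
	static String baseName(String pdfFileName) {
		return pdfFileName.toLowerCase(Locale.ROOT).endsWith(".pdf")
				? pdfFileName.substring(0, pdfFileName.length() - 4)
				: pdfFileName;
	}

	// Builds <destinationDir>/<baseName>_<pageNumber>.png and creates the directory if it is missing.
	static Path outputPath(Path destinationDir, String pdfFileName, int pageNumber) throws IOException {
		Files.createDirectories(destinationDir); // does nothing if the directory already exists
		return destinationDir.resolve(baseName(pdfFileName) + "_" + pageNumber + ".png");
	}

	public static void main(String[] args) throws IOException {
		// Same names as in the tutorial: sample.pdf written into pdf-to-image
		Path target = outputPath(Paths.get("pdf-to-image"), "sample.pdf", 1);
		System.out.println(target);
	}
}
```

Using Path.resolve instead of string concatenation means the result does not depend on whether the directory name ends with a slash.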

If you are using pdfbox 2.0.20 to 2.0.22, you can use the code below:

package com.roytuts.java.pdf.to.image;

import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfToImageConverter {

	public static void main(String[] args) {
		try {
			String destinationDir = "pdf-to-image/";

			File sourceFile = new File("sample.pdf");
			File destinationFile = new File(destinationDir);

			if (!destinationFile.exists()) {
				destinationFile.mkdir();
				System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
			}

			if (sourceFile.exists()) {
				PDDocument document = PDDocument.load(sourceFile);
				PDFRenderer pdfRenderer = new PDFRenderer(document);

				String fileName = sourceFile.getName().replace(".pdf", "");

				// PDFRenderer uses zero-based page indexes
				for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); ++pageIndex) {
					BufferedImage bim = pdfRenderer.renderImage(pageIndex);

					// Number the files from 1 so they match the page numbers in the PDF
					String destFile = destinationDir + fileName + "_" + (pageIndex + 1) + ".png";

					ImageIO.write(bim, "png", new File(destFile));
				}

				document.close();

				System.out.println("Image saved at -> " + destinationFile.getAbsolutePath());
			} else {
				System.err.println(sourceFile.getName() + " File does not exist");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}
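Both listings hand the rendered BufferedImage to ImageIO.write and ignore its return value. ImageIO.write returns false when no suitable writer is found, so checking the result can help catch a silent failure. The sketch below shows one way to do that using only the JDK. The class name PngWriteSketch and the helper writePng are hypothetical, and a blank image stands in for a rendered page.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class PngWriteSketch {

	// Writes the image as PNG and throws if no PNG writer accepted it.
	static void writePng(BufferedImage image, File target) throws IOException {
		boolean written = ImageIO.write(image, "png", target);
		if (!written) {
			throw new IOException("No suitable PNG writer for " + target);
		}
	}

	public static void main(String[] args) throws IOException {
		// Stand-in for a rendered page: a blank 200x100 RGB image
		BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
		File target = File.createTempFile("page_1", ".png");
		writePng(page, target);

		// Read the file back to confirm it decodes to an image of the same size
		BufferedImage readBack = ImageIO.read(target);
		System.out.println(readBack.getWidth() + "x" + readBack.getHeight()); // prints 200x100
	}
}
```

In the converter classes above, the same check could wrap the existing ImageIO.write call inside the page loop.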

Testing the Application

Input pdf file

Output Image file

The output images for page 1 and page 2 are given below:

Page 1: [image: pdf to image using java]

Page 2: [image: pdf to image using java]

That’s all about how to convert a PDF file to image files using a Java program.

Source Code
