In this post you will see how to convert a PDF to an image using Python. I will use the pdf2image module to render the pages of a PDF file and save them as image files.
Although a number of tools are available for converting PDF to image files, in certain situations you may still need to do the conversion programmatically. Here I am going to use the pdf2image library in Python 3.
This library wraps pdftoppm and pdftocairo to convert a PDF to a PIL Image object.
Prerequisites: Python 3.8.0 – 3.9.5, pdf2image 1.10.0 – 1.15.1
Install the required module pdf2image by running pip install pdf2image from the command line tool. Make sure you open the command line tool in administrator mode.
Once the module is installed, you will see a success message like the one below:
Installing collected packages: pdf2image
Successfully installed pdf2image-1.15.1
You also need to install the poppler library in order to convert PDF to image. I am using a Windows 10 64-bit operating system, so I downloaded the poppler-0.68.0_x86 build from the link. Choose the build that matches your operating system.
Extract the zip file to a convenient location, typically under the C drive, so that the poppler bin directory is C:\poppler-0.68.0\bin. Then add this bin directory (C:\poppler-0.68.0\bin) to your Path environment variable.
Open a new command prompt afterwards. A command prompt that was already open will not pick up the changed Path, and the conversion will not work there.
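A quick way to confirm that the new Path setting took effect is to look for the two poppler tools from Python before running any conversion. The sketch below uses only the standard library; the helper name find_poppler_tools is my own and not part of pdf2image.

```python
import shutil


def find_poppler_tools(names=("pdftoppm", "pdftocairo")):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, location in find_poppler_tools().items():
        print(tool, "->", location or "NOT FOUND on PATH")
```

If a tool is reported as not found, open a new command prompt and check the Path entry again. As an alternative to editing Path, convert_from_path also accepts a poppler_path argument that can point at the bin directory directly.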
Convert PDF to Image
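Now you will see how to convert a PDF to an image using the pdf2image library in Python 3.
First, import convert_from_path and convert_from_bytes from pdf2image, together with the associated exceptions from pdf2image.exceptions (PDFInfoNotInstalledError, PDFPageCountError, PDFSyntaxError), so that you get a proper error message if anything goes wrong.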
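The imports and the error handling could look like the following sketch. The function name convert_pdf and the messages it prints are hypothetical, and the imports are placed inside the function only so the snippet can be loaded on a machine where pdf2image is not installed yet; in a normal script they would go at the top of the file.

```python
def convert_pdf(pdf_path):
    """Convert a PDF to a list of PIL images; print a message and return [] on a known error."""
    # Imported inside the function (an assumption of this sketch, see above).
    from pdf2image import convert_from_path
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFSyntaxError,
    )

    try:
        return convert_from_path(pdf_path)
    except PDFInfoNotInstalledError:
        # poppler's pdfinfo could not be started, usually a Path problem.
        print("poppler was not found; check that its bin directory is on Path")
    except PDFPageCountError:
        # pdfinfo ran but could not report a page count for this file.
        print(f"could not read the page count of {pdf_path}")
    except PDFSyntaxError:
        # Reported for malformed PDFs when strict checking is enabled.
        print(f"{pdf_path} contains PDF syntax errors")
    return []
```

Calling convert_pdf('sample.pdf') then either returns the list of page images or an empty list together with a short explanation.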
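I will show you how to convert a PDF to an image in several ways.
You can convert from a file path using the code below.
To produce PNG output from a PDF, use the code below. Note that with a multi-page PDF every page is written to the same file name, so only the last page survives; this form suits single-page files: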
images = convert_from_path('sample.pdf')
for image in images:
    image.save('sample.png', 'PNG')
To produce JPG output instead, use the code below:
images = convert_from_path('gentleman.pdf')
for image in images:
    image.save('gentleman.jpg', 'JPEG')
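Both snippets pass the format name to save explicitly. If the output file name is already known, the format can be derived from its extension. The small helper below is a sketch of that idea; pillow_format and its extension table are my own names, not something pdf2image provides, and the table covers only the two formats used in this post.

```python
from pathlib import Path

# Extension -> format name passed to image.save (only the formats used in this post).
_FORMATS = {".png": "PNG", ".jpg": "JPEG", ".jpeg": "JPEG"}


def pillow_format(filename):
    """Return the format name for filename, based on its (case-insensitive) extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return _FORMATS[suffix]
```

With this in place, image.save('sample.png', pillow_format('sample.png')) saves a PNG. Pillow can also infer the format from the file name when the second argument is omitted; the explicit lookup simply fails early on an unexpected extension.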
If you know that your PDF file has only one page, you can also read the file as bytes and save the first (and only) image in the returned list:
with open('gentleman.pdf', 'rb') as pdf_file:
    images = convert_from_bytes(pdf_file.read())
images[0].save('gentleman-byte.jpg', 'JPEG')
When the PDF file has multiple pages, you can save them one by one and append a counter value to each file name so that the pages do not overwrite the same output file.
images = convert_from_path('output1.pdf')
for i, image in enumerate(images, start=1):
    image.save('output' + str(i) + '.jpg', 'JPEG')
Please note that the output images will be saved in the current directory.
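The counter-based naming can be wrapped in a small function that also lets you choose a directory other than the current one. This is only a sketch: save_pages, stem and out_dir are hypothetical names, and the function assumes nothing more than that each item has a save(path, format) method, as the PIL images returned by convert_from_path do.

```python
from pathlib import Path


def save_pages(images, stem, out_dir=".", ext="jpg", fmt="JPEG"):
    """Save each image as <out_dir>/<stem><n>.<ext>, numbering from 1, and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the directory if it does not exist
    saved = []
    for number, image in enumerate(images, start=1):
        path = target / f"{stem}{number}.{ext}"
        image.save(str(path), fmt)
        saved.append(path)
    return saved
```

For example, save_pages(convert_from_path('output1.pdf'), 'output') writes output1.jpg, output2.jpg and so on to the current directory, while passing out_dir='pages' would place them in a pages folder instead.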
You can download the source code, sample PDF files, and output image files from the link below.