How to convert file content into PDF document using Python

Introduction

In this tutorial I will show you how to read a file and write its content to a PDF document using Python 3. The file content may be in any format; how you parse it is up to you. Here I will read a text file and write it to a PDF file, and I will also read an HTML file (or a text file containing HTML), parse it as HTML and write it to a PDF file.

I will read both text and HTML files and write their content into PDF files. The HTML content in the text file will be written to PDF twice: once as plain text and once as rendered HTML. For rendering HTML I will use the HTMLMixin class, which ships with the fpdf library. The fpdf library itself is needed to write the PDF document.

Prerequisites

Python 3.8.0, fpdf 1.7.2 (pip install fpdf)

Read and Write File Content into PDF

Now we will see how to read and write file content into PDF file.

We will create a Python script that generates PDF files from a text file and an HTML file.

If you look closely at the script below, it has two parts:

  1. Reading a text file and writing it as text content
  2. Reading text and HTML files and writing them as HTML content

from fpdf import FPDF, HTMLMixin

# Part 1: write the file content as plain text
pdf = FPDF()
pdf.add_page()

# Read the whole file into a string
with open('file.txt', 'r', encoding='utf-8') as fh:
    text = fh.read()

# A font must be set before any text is written
pdf.set_font('Times', '', 12)
# Width 0 extends the cell to the right margin; 5 is the line height
pdf.multi_cell(0, 5, text)

pdf.output('FileContentText2Pdf.pdf', 'F')


# Part 2: parse the file content as HTML
class MyFPDF(FPDF, HTMLMixin):
    pass

pdf = MyFPDF()
pdf.add_page()

# Either file works; swap the two lines below to read file.html instead
#with open('file.html', 'r', encoding='utf-8') as f:
with open('file.txt', 'r', encoding='utf-8') as f:
    html = f.read()

pdf.write_html(html)
pdf.output('FileContentHtml2Pdf.pdf', 'F')

To write plain text without mixed styles, we simply create a pdf object using the FPDF constructor and add a page. Then we set the font, which is mandatory before any text can be written. Finally we write the content using the multi_cell() function. This function wraps text over multiple lines; without it, the text would be written on a single line and anything beyond the width of the page would not be visible.
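One caveat with the plain-text part: to my knowledge, the built-in fonts in fpdf 1.7.2 (such as Times) only cover Latin-1 characters, so text read from a UTF-8 file may fail at output time if it contains anything else. The sketch below is one possible workaround, not part of the original script. The helper name to_latin1 is my own, and it only uses the standard library.

```python
def to_latin1(text):
    """Replace every character that Latin-1 cannot represent with '?'."""
    # errors="replace" substitutes '?' for each unencodable character
    return text.encode("latin-1", errors="replace").decode("latin-1")


# An accented letter survives, an en dash (U+2013) does not
print(to_latin1("caf\u00e9 \u2013 menu"))
```

You could then pass to_latin1(text) instead of text to multi_cell(). If you need the full Unicode range, registering a TrueType font is the more complete solution.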

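To write HTML with mixed styles, we first create our own class MyFPDF, which inherits from both FPDF and HTMLMixin. HTMLMixin provides the write_html() method that parses the HTML and renders it into the PDF.

The body of MyFPDF is just a pass statement, because the class needs nothing beyond what it inherits. The pass statement in Python is used when a statement is required syntactically but you do not want any command or code to execute. The pass statement is a null operation; nothing happens when it executes.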
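To see why an empty class body is enough, here is a minimal sketch of the same pattern using two stand-in classes instead of fpdf. Base and HtmlSupport are hypothetical names that only play the roles of FPDF and HTMLMixin. They are not the real implementations.

```python
class Base:
    """Stand-in for FPDF (hypothetical)."""

    def add_page(self):
        return "page added"


class HtmlSupport:
    """Stand-in for HTMLMixin (hypothetical)."""

    def write_html(self, html):
        return f"rendered {len(html)} characters"


class Combined(Base, HtmlSupport):
    # Nothing to add: both parents already supply the methods we need
    pass


doc = Combined()
print(doc.add_page())
print(doc.write_html("<b>hi</b>"))
```

The combined class gets add_page() from one parent and write_html() from the other, which is exactly what MyFPDF does with the real classes.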

I have commented out the line that reads the HTML file. That file contains the same HTML content as the text file, so you can use either one: HTML stored in a text file or in an HTML file will be parsed and written as HTML into the PDF document.
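Instead of commenting and uncommenting the open() line, you could let a small helper pick whichever file exists. This is only a sketch: the function name read_html_source and its parameters are my own. It assumes the files are UTF-8 encoded, as in the script.

```python
from pathlib import Path


def read_html_source(candidates=("file.html", "file.txt"), base_dir="."):
    """Return the content of the first candidate file that exists."""
    for name in candidates:
        path = Path(base_dir) / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(f"none of {candidates} found in {base_dir!r}")
```

With this in place, the second part of the script could call html = read_html_source() and pass the result to write_html().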

File Content

The content of the text and HTML files is the same and is given below:

<h1 align="center">HTML to PDF</h1>
<h2>An example to convert HTML to PDF</h2>
<p>You can now easily print text mixing different
styles : <B>bold</B>, <I>italic</I>, <U>underlined</U>, or
<B><I><U>all at once</U></I></B>!<BR>You can also insert links
on text, such as <A HREF="https://pypi.org/project/fpdf/">www.pypi.org</A>,
or on an image: click on the logo image.<br>
<center>
<a href="http://www.pypi.org"><img src="logo.png" width="104" height="71"></a>
</center>

<h2>Paragraph</h2>

<p>Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, commodo vitae, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis tempus lacus enim ac dui. Donec non enim in turpis pulvinar facilisis. Ut felis. Praesent dapibus, neque id cursus faucibus, tortor neque egestas augue, eu vulputate magna eros eu erat. Aliquam erat volutpat. Nam dui mi, tincidunt quis, accumsan porttitor, facilisis luctus, metus</p>

<h2>Paragraph with Bold Text</h2>

<p><B>Pellentesque habitant morbi tristique</B> senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. <em>Aenean ultricies mi vitae est.</em> Mauris placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, <code>commodo vitae</code>, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis tempus lacus enim ac dui. <a href="#">Donec non enim</a> in turpis pulvinar facilisis. Ut felis.</p>

<h2>Ordered List</h2>

<ol>
   <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
   <li>Aliquam tincidunt mauris eu risus.</li>
</ol>

<blockquote><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus, at luctus turpis elit sit amet quam. Vivamus pretium ornare est.</p></blockquote>

<h3>Unordered List</h3>

<ul>
   <li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
   <li>Aliquam tincidunt mauris eu risus.</li>
</ul>

<h2>Code Block</h2>

<pre><code>
#header h1 a {
  display: block;
  width: 300px;
  height: 80px;
}
</code></pre>

<h2>Definition List</h2>

<dl>
   <dt>Definition list</dt>
   <dd>Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat.</dd>
   <dt>Lorem ipsum dolor sit amet</dt>
   <dd>Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat.</dd>
</dl>

<h2>Table</h2>

<table border="0" align="center" width="50%">
<thead><tr><th width="30%">Header 1</th><th width="70%">header 2</th></tr></thead>
<tbody>
<tr><td>cell 1</td><td>cell 2</td></tr>
<tr><td>cell 2</td><td>cell 3</td></tr>
</tbody>
</table>

Testing the Program

Executing the above Python script produces two PDF files, FileContentText2Pdf.pdf and FileContentHtml2Pdf.pdf, in the directory from which you run the script.
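If you want a quick sanity check without opening the files, the following sketch tests whether each output exists and starts with the standard PDF header bytes. The helper name looks_like_pdf is my own, and the check is deliberately shallow: it does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and begins with the '%PDF-' marker."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


# File names taken from the script above
for name in ("FileContentText2Pdf.pdf", "FileContentHtml2Pdf.pdf"):
    print(name, looks_like_pdf(name))
```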

Thanks for reading.