Parsing XML using Python

XML Parsing

Parsing XML with Python takes only a few lines of code. The XML can come from a file or from a string. Parsing here means reading the data in the XML nodes and then using that data in your application.

Extensible Markup Language (XML) is one of the most widely used data formats because it is well supported by modern applications and well suited to further data manipulation and customization.

Prerequisites

Python 3.6.6/3.11.5

Preparing Workspace

Preparing your workspace is one of the first things you can do to make sure you start off well. The first step is to check your working directory.

When you work in the Python terminal, first navigate to the directory where your file is located and then start Python; that is, make sure the file is in the directory you are working from.
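
The check can also be done from inside Python. This is only a minimal sketch: the helper name xml_file_in_cwd is made up for illustration, and the default file name assumes the bookstore.xml file that the script later in this post reads.

```python
from pathlib import Path


def xml_file_in_cwd(filename="bookstore.xml"):
    """Return the current working directory and whether `filename` is in it.

    Hypothetical helper for illustration; the default file name is an assumption.
    """
    cwd = Path.cwd()
    return cwd, (cwd / filename).is_file()


cwd, found = xml_file_in_cwd()
print("Working directory:", cwd)
print("bookstore.xml found:", found)
```

If the second line prints False, change to the right directory before starting Python, or pass the full path of the file to the parser.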

Let’s move on to the example…

Project Directory

I have opened a command prompt and navigated to the directory that contains the XML file to be parsed. I will parse this file using a Python script.

Python Script

Now I will create a Python script that reads the XML file and displays its content in the console.

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. I will be parsing the XML data using xml.etree.ElementTree. ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
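
The split between the two levels might look like the following sketch. The XML content here is made up for illustration and kept in memory (via io.StringIO and io.BytesIO) instead of a real file, so the example runs without bookstore.xml.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, held in memory instead of a file on disk
source = io.StringIO(
    "<bookstore><book category='web'><title>Demo</title></book></bookstore>"
)

# ElementTree level: read the whole document
tree = ET.parse(source)
root = tree.getroot()

# Element level: work with a single node and its sub-elements
book = root.find("book")
book.set("category", "programming")
title_text = book.find("title").text

# ElementTree level again: write the whole document back out
output = io.BytesIO()
tree.write(output, encoding="utf-8", xml_declaration=True)
xml_bytes = output.getvalue()
print(xml_bytes.decode("utf-8"))
```

With a real file, the same calls would take file names in place of the in-memory objects, for example ET.parse('bookstore.xml').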

To parse XML from a string instead of a file, use root = ET.fromstring(book_data_as_string), which returns the root element directly.
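
A minimal sketch of parsing from a string follows. The XML content is sample data invented for illustration (it is not the bookstore.xml file used in this post); the variable name book_data_as_string follows the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for illustration only
book_data_as_string = """<bookstore>
    <book category="cooking">
        <title lang="en">Sample Title</title>
        <price>30.00</price>
    </book>
</bookstore>"""

# fromstring() returns the root Element directly, with no ElementTree wrapper
root = ET.fromstring(book_data_as_string)

print(root.tag)
for book in root:
    print(book.tag, book.attrib)
    for field in book:
        print(field.tag, field.attrib, field.text)
```

Because fromstring() already returns the root element, there is no getroot() call as there is after ET.parse().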

In the script below, I first import the required module and then parse the whole XML file into the tree variable. Next I get the root node of the XML data. Finally, I iterate through each level of nodes and print each node's name, attributes, and value.

Let's create the Python script (xml-parser.py) to read the XML file:

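import xml.etree.ElementTree as ET

# Load the whole XML file into a tree and get its root element
tree = ET.parse('bookstore.xml')
root = tree.getroot()

# Walk three levels below the root, printing tag, attributes and text
for child in root:
    print(child.tag, child.attrib)
    for node in child:
        print(node.tag, node.attrib, node.text)
        for c in node:
            print(c.tag, c.attrib, c.text)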
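
The nested loops above stop at a fixed depth. If the XML is nested more deeply, a recursive walk is one alternative. This is a sketch only: the walk function and the sample XML are hypothetical and are not part of the original script.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Return one indented line per element, however deeply it is nested."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text}".rstrip()]
    for child in element:
        lines.extend(walk(child, depth + 1))
    return lines


# Hypothetical sample data, for illustration only
sample = ET.fromstring(
    "<bookstore><book id='1'><title>Demo</title>"
    "<author><name>A. Writer</name></author></book></bookstore>"
)

for line in walk(sample):
    print(line)
```

To run it against the file from this post, pass ET.parse('bookstore.xml').getroot() to walk() in place of the sample element.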

Testing the Parsing XML Script

Make sure the bookstore.xml file is in the C:\py_scripts directory and run the script from that directory, because the script opens the file by its relative name. When you run the Python script above, the console shows the tag, attributes, and text of each node it visits.

The full output appears in the console when you run the script.


