DOM – Document Object Model

What is DOM?

The Document Object Model (DOM) represents a document as a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets a developer navigate the tree to find specific information. Working with the structure normally requires the entire document to be loaded and the hierarchy to be built before any work is done. Because it is based on a hierarchy of information, the DOM is said to be tree-based, or object-based.

For exceptionally large documents, parsing and loading the entire document can be slow and resource-intensive.

DOM provides an API that allows a developer to add, edit, move, or remove nodes at any point in the tree in order to create an application. Event-based models like SAX, by contrast, do not allow a developer to change the data in the original document.
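As a rough illustration of those four operations, the sketch below parses a small in-memory document and then adds, edits, moves, and removes nodes. The class name DomEditSketch, the helper methods, and the sample XML are assumptions made up for this example; checked exceptions are wrapped only to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditSketch {

    // Parses an XML string (hypothetical sample data) into a Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Applies one add, edit, move, and remove, then lists the root's children in order.
    static String editAndListChildren() {
        Document doc = parse("<news><item>First</item><item>Second</item><draft>Old</draft></news>");
        Element root = doc.getDocumentElement();

        // Add: create a new element and append it to the root.
        Element added = doc.createElement("item");
        added.setTextContent("Third");
        root.appendChild(added);

        // Edit: change the text of the first child.
        root.getFirstChild().setTextContent("First (edited)");

        // Move: inserting a node that is already in the tree relocates it.
        root.insertBefore(added, root.getFirstChild());

        // Remove: detach the <draft> element from its parent.
        Node draft = root.getElementsByTagName("draft").item(0);
        root.removeChild(draft);

        StringBuilder sb = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            sb.append(n.getNodeName()).append('=').append(n.getTextContent()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(editAndListChildren());
    }
}
```

All changes happen on the in-memory tree; the original source is untouched unless the application writes the Document back out.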

Node Types in DOM

The most common node types are described below, one by one:

Elements

Elements are the basic building blocks of XML. Typically, elements have children that are other elements, text nodes, or a combination of both. Element nodes are also the only type of node that can have attributes.

Attributes

Attribute nodes contain information about an element node, but they are not considered children of the element. For example, limit is an attribute of the withdrawalamount element below:

<withdrawalamount limit="10000000">50000</withdrawalamount>

Text

A text node holds character data and nothing else. It can carry meaningful content or consist of white space only.

Document

The document node is the overall parent for all of the other nodes in the document.
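The following sketch shows how these four node types appear through the org.w3c.dom API, using the withdrawalamount example from above. The class name NodeTypesSketch and its helper methods are assumptions made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypesSketch {

    // Parses an XML string into a Document, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Maps the numeric node type to a readable label.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "DOCUMENT";
            case Node.ELEMENT_NODE:   return "ELEMENT";
            case Node.ATTRIBUTE_NODE: return "ATTRIBUTE";
            case Node.TEXT_NODE:      return "TEXT";
            default:                  return "OTHER";
        }
    }

    static String summary() {
        Document doc = parse("<withdrawalamount limit=\"10000000\">50000</withdrawalamount>");
        Element element = doc.getDocumentElement();
        Attr attribute = element.getAttributeNode("limit");
        Node text = element.getFirstChild();

        // The attribute is reachable from the element but is not one of its children,
        // so the child count is 1 (the text node) and the attribute has no parent node.
        return typeName(doc) + "," + typeName(element) + ","
                + typeName(attribute) + "," + typeName(text)
                + ";children=" + element.getChildNodes().getLength()
                + ";attrParentIsNull=" + (attribute.getParentNode() == null);
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```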

Parsing an XML File using DOM

To work with the information in an XML file, the file must be parsed to create a Document object.

Document is an interface, so it cannot be instantiated directly; generally, the application uses a factory instead.

In Java, parsing the XML file is a three-step process:

  • Create the DocumentBuilderFactory. This object creates the DocumentBuilder.
  • Create the DocumentBuilder. The DocumentBuilder does the actual parsing to create the Document object.
  • Parse the file to create the Document object.

Start by creating the application, a class called NewsProcessor:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import org.w3c.dom.Document;

public class NewsProcessor {

    public static void main(String[] args) {
	
        File docFile = new File("news.xml");
        Document doc = null;
		
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(docFile);
        } catch (Exception e) {
            // Report the failure on the error stream, ending the line
            System.err.println("A problem occurred while parsing the file: " + e.getMessage());
        }
		
    }
	
}

In the above NewsProcessor class within the try-catch block, the application creates the DocumentBuilderFactory, which it then uses to create the DocumentBuilder. Finally, the DocumentBuilder parses the file to create the Document.

Validating Document using DOM

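Call setValidating(true) on the DocumentBuilderFactory instance in the NewsProcessor class above to request a validating parser, as shown below. This setting controls validation against the DTD declared in the document.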

...
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    doc = db.parse(docFile);
} catch (Exception e) {
...
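Validation problems are reported through an org.xml.sax.ErrorHandler registered on the DocumentBuilder. The sketch below is one possible way to collect them, assuming a small DTD embedded in the document itself; the class name ValidationSketch, the validate helper, and the news/item DTD are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationSketch {

    // Hypothetical internal DTD: <news> may contain only <item> elements with text.
    static final String DTD =
            "<!DOCTYPE news ["
            + "<!ELEMENT news (item*)>"
            + "<!ELEMENT item (#PCDATA)>"
            + "]>";

    // Parses DTD + body with validation on and returns every reported problem.
    static List<String> validate(String body) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add("warning: " + e.getMessage());
                }
                @Override
                public void error(SAXParseException e) {
                    // Validity errors arrive here; parsing continues afterwards.
                    problems.add("error: " + e.getMessage());
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Well-formedness errors are fatal, so stop parsing.
                    throw e;
                }
            });
            db.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<news><item>Hello</item></news>"));
        System.out.println(validate("<news><story>Hello</story></news>"));
    }
}
```

Collecting the messages in a list lets the application decide whether an invalid document should be rejected or only logged.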

Accessing Root Element using DOM

Once the document is parsed and a Document is created, an application can step through the structure to review, find, or display information.

This navigation is the basis for many operations that will be performed on a Document.

Stepping through the document begins with the root element.

A well-formed document has exactly one root element, also known as the document element, which getDocumentElement() returns.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NewsProcessor {

    ...
        //Get the root element
        Element root = doc.getDocumentElement();
        System.out.println("The root element is " + root.getNodeName());
    ...

}

Accessing Child Node using DOM

Once the application determines the root element, it retrieves a list of the root element’s children as a NodeList. The NodeList interface represents an ordered collection of nodes through which the application can iterate.

In the example below, the application gets the child nodes and verifies the retrieval by showing how many nodes appear in the resulting NodeList. The count covers every child node, including text nodes that hold only white space between elements:

...
import org.w3c.dom.NodeList;
    ...
        //Get the root element
        Element root = doc.getDocumentElement();
        System.out.println("The root element is " + root.getNodeName());
        //Get the children
        NodeList children = root.getChildNodes();
        System.out.println("There are " + children.getLength() + " child nodes under the root element.");
    }
}
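The count returned by getLength() is often larger than the number of child elements, because white space between tags is kept as text nodes when no DTD tells the parser it can be ignored. A minimal sketch of the difference, using a made-up class name (ChildNodesSketch) and sample document:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodesSketch {

    // Returns {total child nodes, child nodes that are elements} for the root of the given XML.
    static int[] countChildren(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }

        NodeList children = doc.getDocumentElement().getChildNodes();
        int elements = 0;
        // Iterate by index, the usual way to walk a NodeList.
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        return new int[] { children.getLength(), elements };
    }

    public static void main(String[] args) {
        // Hypothetical sample: two <item> elements separated by line breaks and indentation.
        int[] counts = countChildren("<news>\n  <item>A</item>\n  <item>B</item>\n</news>");
        System.out.println(counts[0] + " child nodes, " + counts[1] + " of them elements");
    }
}
```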

getFirstChild() and getNextSibling() in DOM

The parent-child and sibling relationships offer an alternative means of iterating through all of the children of a node. This approach may be more appropriate in some situations, such as when these relationships and the order in which children appear are crucial to understanding the data.

A for-loop starts with the first child of the root. The application iterates through each of the siblings of the first child until they have all been evaluated. Each time the application executes the loop, it retrieves a Node object, outputting its name and value.

Notice also that the elements carry a value of null, rather than the expected text. It is the text nodes that are children of the elements that carry the actual content as their values:

...
import org.w3c.dom.Node;
    ...
        //Step through the children
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            System.out.println(child.getNodeName() + " = " + child.getNodeValue());
        }
    }
}
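To read the content of each child element, the application can look one level further down, at the text node inside it. The sketch below assumes every child element holds a single text node; the class name SiblingWalkSketch, the elementTexts helper, and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingWalkSketch {

    // Walks the root's children via getFirstChild()/getNextSibling() and
    // returns "name = text" for each child element.
    static List<String> elementTexts(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }

        List<String> result = new ArrayList<>();
        Node root = doc.getDocumentElement();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip white-space text nodes between elements
            }
            // child.getNodeValue() is null for elements; the content lives in the text child.
            Node first = child.getFirstChild();
            String text = (first != null && first.getNodeType() == Node.TEXT_NODE)
                    ? first.getNodeValue()
                    : "";
            result.add(child.getNodeName() + " = " + text);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(elementTexts("<news>\n  <title>DOM</title>\n  <author>Jane</author>\n</news>"));
    }
}
```

When an element may contain several text nodes or nested elements, getTextContent() is a simpler way to obtain the concatenated text.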

The Node interface defines constants that represent each type of node, such as ELEMENT_NODE or ATTRIBUTE_NODE. If getNodeType() returns Node.ELEMENT_NODE, the node is an element.

For every element it finds, the application creates a NamedNodeMap that contains all of the attributes for the element. The application can iterate through a NamedNodeMap, printing each attribute’s name and value, just as it iterated through the NodeList:

...
import org.w3c.dom.NamedNodeMap;
...
private static void stepThroughAllNodes(Node root) {

    // Print the current node's name and value (the value is null for elements)
    System.out.println(root.getNodeName() + " = " + root.getNodeValue());

    // Only element nodes can have attributes
    if (root.getNodeType() == Node.ELEMENT_NODE) {
        NamedNodeMap rootAttr = root.getAttributes();

        for (int i = 0; i < rootAttr.getLength(); i++) {
            Node attr = rootAttr.item(i);
            System.out.println("Attribute: " + attr.getNodeName() + " = " + attr.getNodeValue());
        }
    }

    // Recurse into each child in document order
    for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
        stepThroughAllNodes(child);
    }

}

That’s all. I hope this helps you understand what the DOM is and when to use it.

Thanks for reading.
