What is DOM?
The DOM (Document Object Model) represents a document as a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows a developer to navigate around the tree looking for specific information. Analyzing the structure normally requires the entire document to be loaded and the hierarchy to be built before any work is done. Because it is based on a hierarchy of information, the DOM is said to be tree-based, or object-based.
For exceptionally large documents, parsing and loading the entire document can be slow and resource-intensive.
The DOM provides an API that allows a developer to add, edit, move, or remove nodes at any point in the tree while building an application. Event-based models such as SAX, by contrast, do not allow a developer to change the data in the original document.
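The four operations named above can be sketched on a small in-memory document. This is a minimal illustration, not part of the original NewsProcessor application: the class name DomEditSketch, the helper methods, and the sample XML are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class DomEditSketch {

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Adds, edits, moves, and removes nodes, then lists the root's remaining children.
    static String editAndListChildren() {
        Document doc = parse("<news><headline>Old</headline><draft/></news>");
        Element root = doc.getDocumentElement();

        // Add: create a new element and append it as the last child of the root.
        Element author = doc.createElement("author");
        author.setTextContent("Staff");
        root.appendChild(author);

        // Edit: replace the text of an existing element.
        Element headline = (Element) root.getElementsByTagName("headline").item(0);
        headline.setTextContent("New");

        // Move: inserting a node that is already in the tree relocates it.
        root.insertBefore(author, headline);

        // Remove: detach the <draft> element from the tree.
        root.removeChild(root.getElementsByTagName("draft").item(0));

        StringBuilder out = new StringBuilder();
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(c.getNodeName()).append("=").append(c.getTextContent());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(editAndListChildren());
    }
}
```

All of these changes happen on the tree held in memory; the source XML is not modified unless the application writes the tree back out.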
Node Types in DOM
The main node types are described below, one by one:
Elements
Elements are the basic building blocks of XML. Typically, elements have children that are other elements, text nodes, or a combination of both. Element nodes are also the only type of node that can have attributes.
Attributes
Attribute nodes contain information about an element node, but they are not considered children of that element. For example:
<withdrawalamount limit="10000000">50000</withdrawalamount>
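The distinction between attributes and children can be checked directly on the element above. The following is a small sketch; the class name AttributeSketch and its describe() helper are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class AttributeSketch {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Summarizes how the attribute relates to its element.
    static String describe() {
        Element e = parseRoot("<withdrawalamount limit=\"10000000\">50000</withdrawalamount>");
        Attr limit = e.getAttributeNode("limit");
        return "children=" + e.getChildNodes().getLength()       // only the text node
            + ", attributes=" + e.getAttributes().getLength()    // the limit attribute
            + ", limit=" + e.getAttribute("limit")
            + ", attrParent=" + limit.getParentNode()             // attributes have no parent node
            + ", owner=" + limit.getOwnerElement().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The attribute is reachable through getAttributes() or getAttribute(), but it does not appear in getChildNodes(), and its link back to the element goes through getOwnerElement() rather than getParentNode().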
Text
A text node holds character data. It may contain meaningful content or nothing but white space.
Document
The document node is the overall parent for all of the other nodes in the document.
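As a rough illustration of the text and document node types, the sketch below parses a small indented document and inspects the nodes around the root element. The class name NodeTypeSketch, the summary() helper, and the sample XML are assumptions for this example. It relies on the default, non-validating parser configuration, which keeps white space between elements as text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class NodeTypeSketch {

    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String summary() {
        // The line breaks and spaces between the tags become text nodes of their own.
        Document doc = parse("<news>\n  <headline>DOM basics</headline>\n</news>");
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();

        boolean docIsDocumentNode = doc.getNodeType() == Node.DOCUMENT_NODE;
        boolean rootParentIsDocument = root.getParentNode().isSameNode(doc);
        boolean firstIsWhitespaceText = first.getNodeType() == Node.TEXT_NODE
            && first.getNodeValue().trim().isEmpty();

        return doc.getNodeName()
            + ", docIsDocumentNode=" + docIsDocumentNode
            + ", rootParentIsDocument=" + rootParentIsDocument
            + ", rootChildren=" + root.getChildNodes().getLength()
            + ", firstIsWhitespaceText=" + firstIsWhitespaceText;
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

The root element here has three children rather than one: a white-space text node, the headline element, and another white-space text node.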
Parsing an XML File using DOM
To work with the information in an XML file, the file must be parsed to create a Document object.
Document is an interface, so it cannot be instantiated directly; generally, the application uses a factory instead.
In a Java environment, parsing the XML file is a three-step process:
- Create the DocumentBuilderFactory. This object creates the DocumentBuilder.
- Create the DocumentBuilder. The DocumentBuilder does the actual parsing to create the Document object.
- Parse the file to create the Document object.
Start by creating the application, a class called NewsProcessor:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import org.w3c.dom.Document;

public class NewsProcessor {
    public static void main(String[] args) {
        File docFile = new File("news.xml");
        Document doc = null;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(docFile);
        } catch (Exception e) {
            System.out.print("Problem occurred during parsing the file: " + e.getMessage());
        }
    }
}
In the NewsProcessor class above, within the try-catch block, the application creates the DocumentBuilderFactory, which it then uses to create the DocumentBuilder. Finally, the DocumentBuilder parses the file to create the Document.
Validating Document using DOM
To validate the document while it is parsed, call setValidating(true) on the DocumentBuilderFactory instance in the NewsProcessor class above, as shown below.
...
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(docFile);
        } catch (Exception e) {
...
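One point worth keeping in mind: setValidating(true) turns on DTD validation, and validity problems are reported to the parser's ErrorHandler rather than necessarily thrown from parse(). The sketch below registers a handler that collects them. It is an illustration under assumptions: the class name ValidationSketch, the countValidationErrors helper, and the inline DTD are invented for this example and are not part of NewsProcessor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical class for illustration only.
public class ValidationSketch {

    // A small inline DTD: <news> must contain exactly one <headline>.
    static final String DTD =
        "<!DOCTYPE news [<!ELEMENT news (headline)><!ELEMENT headline (#PCDATA)>]>";

    // Parses the XML with DTD validation on and returns how many validity errors were reported.
    static int countValidationErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }

                @Override
                public void error(SAXParseException e) {
                    // Validity errors arrive here; parsing continues.
                    errors.add(e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Well-formedness errors cannot be recovered from.
                    throw e;
                }
            });
            db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        System.out.println(countValidationErrors(DTD + "<news><headline>Hi</headline></news>"));
        System.out.println(countValidationErrors(DTD + "<news><body>Hi</body></news>"));
    }
}
```

An application that wants invalid documents to stop processing could rethrow from error() instead of collecting the messages.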
Accessing Root Element using DOM
Once the document is parsed and a Document is created, an application can step through the structure to review, find, or display information.
This navigation is the basis for many operations that will be performed on a Document.
Stepping through the document begins with the root element.
A well-formed document has only one root element, also known as the DocumentElement.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class NewsProcessor {
...
        //Get the root element
        Element root = doc.getDocumentElement();
        System.out.println("The root element is " + root.getNodeName());
...
}
Accessing Child Node using DOM
Once the application determines the root element, it retrieves a list of the root element's children as a NodeList. The NodeList interface represents an ordered series of nodes through which the application can iterate.
In the example below, the application gets the child nodes and verifies the retrieval by showing how many nodes appear in the resulting NodeList. Note that the count includes text nodes, not only elements:
...
import org.w3c.dom.NodeList;
...
        //Get the root element
        Element root = doc.getDocumentElement();
        System.out.println("The root element is " + root.getNodeName());
        //Get the children
        NodeList children = root.getChildNodes();
        System.out.println("There are " + children.getLength() + " nodes in this document.");
    }
}
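Iterating over a NodeList by index might look like the sketch below, which also filters out everything except element nodes. The class name ChildListSketch, the elementChildNames helper, and the sample XML are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class ChildListSketch {

    static Element parseRoot(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the names of the root's element children, skipping text and other node types.
    static List<String> elementChildNames(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<news>\n<headline>A</headline>\n<body>B</body>\n</news>";
        System.out.println("All child nodes: " + parseRoot(xml).getChildNodes().getLength());
        System.out.println("Element children: " + elementChildNames(xml));
    }
}
```

For this sample the NodeList holds five nodes, because the line breaks between the tags are text nodes, while only two of them are elements.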
getFirstChild() and getNextSibling() in DOM
The parent-child and sibling relationships offer an alternative means of iterating through all the children of a node. This approach may be more appropriate in some situations, such as when these relationships and the order in which children appear are crucial to understanding the data.
A for-loop starts with the first child of the root. The application iterates through each of the siblings of the first child until they have all been evaluated. Each time the application executes the loop, it retrieves a Node object, outputting its name and value.
Notice also that the elements carry a value of null, rather than the expected text. It is the text nodes that are children of the elements that carry the actual content as their values:
...
import org.w3c.dom.Node;
...
        //Step through the children
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            System.out.println(child.getNodeName() + " = " + child.getNodeValue());
        }
    }
}
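The null value on elements can be seen in isolation with a one-element document. This is a small sketch; the class name NodeValueSketch and the describe() helper are hypothetical. It also uses getTextContent(), a standard Node method that concatenates the text of a node's descendants, as one way to read an element's content without stepping down to the text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class for illustration only.
public class NodeValueSketch {

    static Element parseRoot(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String describe() {
        Element headline = parseRoot("<headline>DOM basics</headline>");
        Node text = headline.getFirstChild();   // the text node inside the element
        return "elementValue=" + headline.getNodeValue()   // null for element nodes
            + ", textValue=" + text.getNodeValue()
            + ", textContent=" + headline.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```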
The Node interface defines constants that represent each type of node, such as ELEMENT_NODE or ATTRIBUTE_NODE. If the value returned by getNodeType() matches ELEMENT_NODE, the node is an element.
For every element it finds, the application creates a NamedNodeMap that contains all of the attributes for the element. The application can iterate through a NamedNodeMap, printing each attribute's name and value, just as it iterated through the NodeList:
...
import org.w3c.dom.NamedNodeMap;
...
    // Prints a node, then its attributes (if it is an element), then recurses into its children.
    private static void stepThroughAllNodes(Node root) {
        System.out.println(root.getNodeName() + " = " + root.getNodeValue());
        // ELEMENT_NODE is a constant on the Node interface, so reference it through the type.
        if (root.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap rootAttr = root.getAttributes();
            for (int i = 0; i < rootAttr.getLength(); i++) {
                Node attr = rootAttr.item(i);
                System.out.println("Attribute: " + attr.getNodeName() + " = " + attr.getNodeValue());
            }
        }
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            stepThroughAllNodes(child);
        }
    }
That's all. I hope you now understand what the DOM is and when to use it.
Thanks for reading.