Comparison of DOM and SAX

The tree approach is useful for small documents in which the program needs to process a large portion of the document. The event-driven approach is useful for large documents in which the program needs to process only a small portion of the document.
SAX parsers generally require you to write a bit more code than the DOM interface does. If you use DOM only to construct the tree, extract the data, and then throw the tree away, SAX would probably have been more efficient.
With SAX, unless you build a DOM-style tree from your application’s internal representation of the data, you cannot as easily write the XML file back to disk. On the other hand, because the DOM tree is never constructed, there is potentially less memory allocation.
  If you convert the data in the DOM tree to another format, the SAX API may help remove the intermediate step.
  If you do not need all the XML data in memory, the SAX API allows you to process the data as it is parsed.
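
The trade-off above can be sketched with Python's standard library, which ships both a SAX parser (xml.sax) and a small DOM implementation (xml.dom.minidom). This is only an illustration under stated assumptions: the sample order data, the TotalHandler class, and the sum_with_sax and sum_with_dom helpers are hypothetical names invented for this sketch. The SAX version sums values as they are parsed and never holds the whole document; the DOM version builds the full tree first and then queries it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data, invented for this sketch.
XML = (
    "<orders>"
    "<order id='1'><total>10</total></order>"
    "<order id='2'><total>32</total></order>"
    "</orders>"
)


class TotalHandler(xml.sax.ContentHandler):
    """Sums <total> values as they stream past; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.in_total = False
        self.buffer = []
        self.total = 0

    def startElement(self, name, attrs):
        if name == "total":
            self.in_total = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.in_total:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "total":
            self.total += int("".join(self.buffer))
            self.in_total = False


def sum_with_sax(data):
    """Event-driven: process the data while it is being parsed."""
    handler = TotalHandler()
    xml.sax.parseString(data.encode("utf-8"), handler)
    return handler.total


def sum_with_dom(data):
    """Tree-based: build the whole document in memory, then query it."""
    doc = minidom.parseString(data)
    return sum(int(node.firstChild.data)
               for node in doc.getElementsByTagName("total"))


if __name__ == "__main__":
    print(sum_with_sax(XML), sum_with_dom(XML))
```

Both functions return the same sum. The SAX handler takes more code, as noted above, but its memory use does not grow with the size of the document.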

Document Object Model (DOM)


  • Tree-traversal approach.
  • A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them. Vendors can support the DOM as an interface to their proprietary data structures and APIs, and content authors can write to the standard DOM interfaces rather than product-specific APIs, thus increasing interoperability on the Web. (from the W3C specs).
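
As a minimal sketch of what "dynamically access and update the content and structure" can look like in practice, the following uses Python's xml.dom.minidom, one of many DOM implementations. The choice of Python, the sample document, and the variable names are assumptions made for illustration; the method names (getElementsByTagName, createElement, createTextNode, appendChild, setAttribute) follow the DOM Level 1 interfaces.

```python
from xml.dom import minidom

# Parse a small document into an in-memory tree.
doc = minidom.parseString("<doc><para>Hello, world!</para></doc>")

# Access: locate the first <para> element and read its text node.
para = doc.getElementsByTagName("para")[0]
text_before = para.firstChild.data

# Update content: change the text and add an attribute.
para.firstChild.data = "Hello, DOM!"
para.setAttribute("lang", "en")

# Update structure: create a new element and append it to the root.
new_para = doc.createElement("para")
new_para.appendChild(doc.createTextNode("Second paragraph"))
doc.documentElement.appendChild(new_para)

# Because the whole tree is in memory, writing it back out is one call.
serialized = doc.documentElement.toxml()

if __name__ == "__main__":
    print(text_before)
    print(serialized)
```

Because the tree stays in memory, any node can be revisited and modified in any order, and the result can be serialized again without extra bookkeeping.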

DOM Levels

  • DOM Level 1 – document (XML and HTML) navigation and manipulation
  • DOM Level 2 – stylesheet navigation and manipulation; also adds an event model and XML namespace support
  • DOM Level 3 – document loading, saving, and validation (DTDs and schemas) [Working Draft]
  • DOM Level 1 and 2 are used by Dynamic HTML (DHTML)

Simple API for XML (SAX)


  • Event-driven
  • Language-independent


XML Document:

<?xml version="1.0"?>
<doc>
<para>Hello, world!</para>
</doc>


Events reported by the SAX parser:

start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document
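
An event trace like the one above can be reproduced with a small handler. The sketch below uses Python's xml.sax module; the EventLogger class and trace_events function are hypothetical names introduced here. It assumes the document is written without whitespace between elements, and it skips whitespace-only text chunks so that the output stays comparable to the trace.

```python
import xml.sax

XML = '<?xml version="1.0"?><doc><para>Hello, world!</para></doc>'


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback as a line of text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start document")

    def endDocument(self):
        self.events.append("end document")

    def startElement(self, name, attrs):
        self.events.append(f"start element: {name}")

    def endElement(self, name):
        self.events.append(f"end element: {name}")

    def characters(self, content):
        # Skip whitespace-only chunks (an assumption made for this sketch).
        if content.strip():
            self.events.append(f"characters: {content}")


def trace_events(data):
    """Parse the XML string and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(data.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in trace_events(XML):
        print(event)
```

The parser drives the program by calling the handler methods in document order. The handler decides what to keep, and nothing is retained unless the handler stores it.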

Language Independence