Jun 1, 2020

Tutorial: parsing XML in Python with the xml.dom module

1. What is XML? What are its features?

The Extensible Markup Language (XML) is a markup language for tagging data and defining data types. It is a meta-language that lets users define their own markup languages.

Example: del.xml

<?xml version="1.0" encoding="utf-8"?>
<catalog>
 <maxid>4</maxid>
 <login username="pytest" passwd='123456'>
  <caption>Python</caption>
  <item id="4">
   <caption>test</caption>
  </item>
 </login>
 <item id="2">
  <caption>Zope</caption>
 </item>
</catalog>

Structurally, XML looks a lot like HTML, but the two are designed for different purposes. HTML (Hypertext Markup Language) is designed to display data, and its focus is on the appearance of the data. XML is designed to transmit and store data, and its focus is on the content of the data.

XML has the following characteristics:

• It is made up of tag pairs: <aa></aa>

• Tags can have attributes: <aa id='123'></aa>

• Tag pairs can embed data: <aa>abc</aa>

• Tags can contain subtags (hierarchical structure)
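The four characteristics can be seen together in one small document. The following is a minimal sketch for Python 3; the snippet string and the `bb` tag are made up for illustration, and `parseString` is used so that no file is needed.

```python
import xml.dom.minidom

# A made-up document that shows all four characteristics at once.
snippet = "<aa id='123'>abc<bb>nested</bb></aa>"
root = xml.dom.minidom.parseString(snippet).documentElement

print(root.tagName)             # aa      (the tag pair <aa>...</aa>)
print(root.getAttribute("id"))  # 123     (the attribute of the tag)
print(root.firstChild.data)     # abc     (the data embedded in the tag pair)
print(root.getElementsByTagName("bb")[0].firstChild.data)  # nested  (data of the subtag)
```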

2. Get tag properties

#coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")  # parse the XML document

root = dom.documentElement  # get the root element of the document
print("nodeName:", root.nodeName)    # every node has nodeName, nodeValue and nodeType attributes
print("nodeValue:", root.nodeValue)  # nodeValue is the value of a node; it is None for elements and holds the text for text nodes
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)

nodeType is the type of the node. catalog is of type ELEMENT_NODE.

The following node types are defined:

'ATTRIBUTE_NODE'

'CDATA_SECTION_NODE'

'COMMENT_NODE'

'DOCUMENT_FRAGMENT_NODE'

'DOCUMENT_NODE'

'DOCUMENT_TYPE_NODE'

'ELEMENT_NODE'

'ENTITY_NODE'

'ENTITY_REFERENCE_NODE'

'NOTATION_NODE'

'PROCESSING_INSTRUCTION_NODE'

'TEXT_NODE'
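To see a few of these node types on real nodes, here is a small sketch for Python 3. The comment inside the sample string is an addition for illustration; the constants are read from `xml.dom.Node`.

```python
import xml.dom.minidom
from xml.dom import Node

# Sample string made up for illustration; it contains an element, text and a comment.
dom = xml.dom.minidom.parseString("<catalog><maxid>4</maxid><!-- note --></catalog>")
root = dom.documentElement
text = root.firstChild.firstChild   # the text node inside <maxid>
comment = root.lastChild            # the comment node

print(dom.nodeType == Node.DOCUMENT_NODE)     # the parsed document itself
print(root.nodeType == Node.ELEMENT_NODE)     # <catalog>
print(text.nodeType == Node.TEXT_NODE, repr(text.nodeValue))
print(comment.nodeType == Node.COMMENT_NODE, repr(comment.nodeValue))
```

Unlike an element, a text node or comment node carries its content in nodeValue.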

The results

nodeName: catalog

nodeValue: None

nodeType: 1

ELEMENT_NODE: 1

3. Get subtags

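#coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
bb = root.getElementsByTagName('maxid')  # returns a NodeList of matching elements
print(type(bb))
print(bb)
b = bb[0]  # the first (and only) <maxid> element
print(b.nodeName)
print(b.nodeValue)  # None, because b is an element node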

The results

<class 'xml.dom.minicompat.NodeList'>

[<DOM Element: maxid at 0x2707a48>]

maxid

None
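The None in the last line follows from b being an element node: its text is held by a child text node. The sketch below, for Python 3, shows this and also that getElementsByTagName searches all descendants rather than only direct children. The sample document is embedded as a string (an assumption, so the code runs without del.xml on disk).

```python
import xml.dom.minidom

# The sample document from section 1, embedded so no file is needed.
SAMPLE_XML = """<catalog>
 <maxid>4</maxid>
 <login username="pytest" passwd="123456">
  <caption>Python</caption>
  <item id="4"><caption>test</caption></item>
 </login>
 <item id="2"><caption>Zope</caption></item>
</catalog>"""

root = xml.dom.minidom.parseString(SAMPLE_XML).documentElement

maxid = root.getElementsByTagName("maxid")[0]
print(maxid.nodeValue)             # None: an element has no value of its own
print(maxid.firstChild.nodeValue)  # 4: the text lives in the child text node

# getElementsByTagName looks at every descendant; childNodes only at direct children.
all_items = root.getElementsByTagName("item")
direct_items = [n for n in root.childNodes
                if n.nodeType == n.ELEMENT_NODE and n.tagName == "item"]
print(len(all_items), len(direct_items))  # 2 1
```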

4. Get the tag attribute value

#coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")
item = itemlist[0]     # elements are told apart by their position in itemlist
print(item.getAttribute("id"))

item2 = itemlist[1]    # the second <item> in document order
print(item2.getAttribute("id"))

The results

pytest

123456

4

2
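When an attribute might be missing, minidom's getAttribute returns an empty string rather than raising an error, and hasAttribute tells the two cases apart. A short sketch for Python 3, using a trimmed copy of the sample document; the `email` attribute is a made-up name that does not exist in it.

```python
import xml.dom.minidom

# Trimmed copy of the sample document, embedded so no file is needed.
root = xml.dom.minidom.parseString(
    '<catalog><login username="pytest" passwd="123456"><item id="4"/></login>'
    '<item id="2"/></catalog>'
).documentElement

login = root.getElementsByTagName("login")[0]
print(login.hasAttribute("username"))     # True
print(login.hasAttribute("email"))        # False ("email" is a made-up attribute)
print(repr(login.getAttribute("email")))  # '' for a missing attribute

# Collect the id of every <item> in document order instead of indexing one by one.
ids = [item.getAttribute("id") for item in root.getElementsByTagName("item")]
print(ids)  # ['4', '2']
```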

5. Get the data between tag pairs

#coding: utf-8
import xml.dom.minidom
dom = xml.dom.minidom.parse("del.xml")

root = dom.documentElement
itemlist = root.getElementsByTagName('caption')

item = itemlist[0]
print(item.firstChild.data)   # firstChild is the text node inside <caption>

item2 = itemlist[1]
print(item2.firstChild.data)

The results

Python

test
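Reading firstChild.data works here because every caption contains text. For an empty element, firstChild is None and the attribute access fails. One possible way around that is a small helper, sketched below for Python 3; `get_text` is a hypothetical name, and the empty `<caption/>` is added to the sample only to show the edge case.

```python
import xml.dom.minidom

def get_text(element):
    """Join the data of all text nodes that are direct children of element."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)

# The three captions from the sample plus an empty one (added for illustration).
root = xml.dom.minidom.parseString(
    "<catalog><caption>Python</caption><caption>test</caption>"
    "<caption>Zope</caption><caption/></catalog>"
).documentElement

captions = [get_text(c) for c in root.getElementsByTagName("caption")]
print(captions)  # ['Python', 'test', 'Zope', '']
```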

6. Example

Task: output the name, email, age and sex.
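As a minimal sketch of such a task, assume a hypothetical users document in which each <user> element has <name>, <email>, <age> and <sex> children. The element names, the ids and all values below are invented for illustration, and `child_text` is a hypothetical helper. The code is written for Python 3.

```python
import xml.dom.minidom

# Hypothetical sample data: the element names and every value are invented.
USERS_XML = """<users>
 <user id="1">
  <name>Alice</name>
  <email>alice@example.com</email>
  <age>24</age>
  <sex>female</sex>
 </user>
 <user id="2">
  <name>Bob</name>
  <email>bob@example.com</email>
  <age>31</age>
  <sex>male</sex>
 </user>
</users>"""

FIELDS = ("name", "email", "age", "sex")

def child_text(parent, tag):
    """Return the text of the first <tag> element found under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data

root = xml.dom.minidom.parseString(USERS_XML).documentElement
records = [{field: child_text(user, field) for field in FIELDS}
           for user in root.getElementsByTagName("user")]

for record in records:
    print(", ".join(record[field] for field in FIELDS))
```

The same pattern applies to a file on disk by replacing parseString with xml.dom.minidom.parse and a file name.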

7. Summary

minidom.parse(filename)

 Loads and parses an XML file.

doc.documentElement

 Gets the root element of the XML document.

node.getAttribute(AttributeName)

 Gets the value of a node's attribute.

node.getElementsByTagName(TagName)

 Gets the collection of descendant elements with the given tag name.

node.childNodes

 Returns the list of child nodes.

node.childNodes[index].nodeValue

 Gets the value of a child node.

node.firstChild

 Returns the first child node. When the node has children, this is equivalent to node.childNodes[0].

doc = minidom.parse(filename)
doc.toxml('UTF-8')

 Returns the XML text representation of the node.

a = node.attributes["id"]
a.name   # the attribute name, "id" in the line above
a.value  # the value of the attribute

 Accesses an element's attributes.
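The calls in the summary that the earlier sections did not exercise (childNodes, firstChild, attributes and toxml) can be tried on a one-line document. This is a sketch for Python 3, with the document string made up from the sample data.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<catalog><item id="2"><caption>Zope</caption></item></catalog>'
)
root = doc.documentElement

item = root.firstChild                # the first child node of <catalog>
print(item is root.childNodes[0])     # True: same node as childNodes[0]

a = item.attributes["id"]             # an attribute node
print(a.name, a.value)                # id 2

print(item.toxml())                   # <item id="2"><caption>Zope</caption></item>

encoded = doc.toxml("UTF-8")          # with an encoding, Python 3 returns bytes
print(type(encoded))
```

Calling toxml() without an encoding returns a str; passing an encoding on the document also adds an XML declaration that names it.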