I figured out why my XML parsing code works fine with the pure-Python ElementTree module but fails with the faster, memory-optimized cElementTree module.
The XPath 1.0 specification says '.' is short-hand for 'self::node()', which selects the context node itself.
Parsing an XML document and selecting the context node with ElementTree in Python 2.5:
>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.2.6'
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node1 = ElementTree.fromstring(doc).find('./Example')
>>> node1
<Element Example at 10e0ed8c0>
>>> node1.find('.')
<Element Example at 10e0ed8c0>
>>> node1.find('.') == node1
True
See how the result of node1.find('.') is the node itself? As it should be.
Parsing an XML document and selecting the context node with cElementTree in Python 2.5:
>>> from xml.etree import cElementTree
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node2 = cElementTree.fromstring(doc).find('./Example')
>>> node2
<Element 'Example' at 0x10e0e3660>
>>> node2.find('.')
>>> node2.find('.') == node2
False
Balls. The result of node2.find('.') is None.
However! I have a kludgey work-around that works whether you use ElementTree or cElementTree. Use './' instead of '.':
>>> node1.find('./')
<Element Example at 10e0ed8c0>
>>> node1.find('./') == node1
True
>>> node2.find('./')
<Element 'Example' at 0x10e0e3660>
>>> node2.find('./') == node2
True
Kludgey because './' is not a valid XPath expression.
So we are back on track. The work-around also works in Python 2.6, which ships the same version of ElementTree.
Fortunately Python 2.7 got a new version of ElementTree and the bug is fixed:
>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.3.0'
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node3 = ElementTree.fromstring(doc).find('./Example')
>>> node3
<Element 'Example' at 0x107257210>
>>> node3.find('.')
<Element 'Example' at 0x107257210>
>>> node3.find('.') == node3
True
However! They also fixed my kludgey work-around:
>>> node3.find('./')
>>> node3.find('./') == node3
False
So neither '.' nor './' selects the context node in all three versions. This is annoying. I was hoping to just replace ElementTree with the C version, which makes my code run in one third of the time (the XML parts of it run in one tenth of the time). And I cannot install any compiled modules: the code can only rely on Python 2.5's standard library.
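One possible way around this, sketched here under the assumption that every lookup in the code can be routed through a single helper, is to stop asking the library to resolve '.' at all and return the node directly. The name find_compat is hypothetical and not part of ElementTree; the import fallback is only there so the same snippet also loads on Pythons where cElementTree is unavailable.

```python
try:
    # Prefer the C implementation where the standard library provides it.
    from xml.etree import cElementTree as ET
except ImportError:
    # Fall back to the plain module when cElementTree cannot be imported.
    from xml.etree import ElementTree as ET


def find_compat(node, path):
    """Hypothetical helper: behaves like node.find(path), except that the
    context-node path '.' is answered here instead of by the library."""
    if path == '.':
        return node
    return node.find(path)


doc = "<Root><Example>BUG</Example></Root>"
example = find_compat(ET.fromstring(doc), './Example')

# The context node is returned regardless of how the library treats '.'.
same_node = find_compat(example, '.') is example
```

This only covers the bare '.' path. Any other expression is passed straight to find(), so it still behaves however the installed version behaves.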