test failure with libxml2 2.10.4

Bug #2016939 reported by Steve Kowalik
This bug affects 4 people
Affects  Status        Importance  Assigned to  Milestone
lxml     Fix Released  Medium      scoder

Bug Description

While attempting to build lxml against the recently released libxml2 2.10.4:

Traceback (most recent call last):
  File "/usr/lib64/python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib64/python3.9/unittest/case.py", line 592, in run
    self._callTestMethod(testMethod)
  File "/usr/lib64/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
  File "/home/abuild/rpmbuild/BUILD/lxml-4.9.2/src/lxml/tests/test_etree.py", line 3077, in test_html_prefix_nsmap
    self.assertEqual({}, el.nsmap)
AttributeError: 'NoneType' object has no attribute 'nsmap'

I've tracked this down to the following line in the test case, which now returns None:

el = etree.HTML('<hha:page-description>aa</hha:page-description>').find('.//page-description')

The libxml2 2.10.4 release notes mention a regression: namespaces are now ignored in HTML documents. I'm not certain how best to handle this.

Miro Hrončok (churchyard) wrote :

I've tried to change the tests like this:

    def test_html_prefix_nsmap(self):
        etree = self.etree
        html = etree.HTML('<hha:page-description>aa</hha:page-description>')
        el = html.find('.//page-description')
        if etree.LIBXML_VERSION < (2, 10, 4):
            if etree.LIBXML_VERSION < (2, 9, 11):
                self.assertEqual({'hha': None}, el.nsmap)
            else:
                self.assertEqual({}, el.nsmap)
        else:
            self.assertIsNone(el)
            el = html.find('.//hha:page-description') # <-- breaks
            self.assertEqual({}, el.nsmap)

But the marked line does not work as I expected:

======================================================================
ERROR: test_html_prefix_nsmap (lxml.tests.test_etree.ETreeOnlyTestCase.test_html_prefix_nsmap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "src/lxml/_elementpath.py", line 85, in lxml._elementpath.xpath_tokenizer
    raise KeyError
KeyError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
  File "/usr/lib64/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
  File "/usr/lib64/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/builddir/build/BUILD/lxml-4.9.2/src/lxml/tests/test_etree.py", line 3080, in test_html_prefix_nsmap
    el = html.find('.//hha:page-description')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 1550, in lxml.etree._Element.find
    return _elementpath.find(self, path, namespaces)
  File "src/lxml/_elementpath.py", line 323, in lxml._elementpath.find
    it = iterfind(elem, path, namespaces)
  File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
    selector = _build_path_iterator(path, namespaces)
  File "src/lxml/_elementpath.py", line 295, in lxml._elementpath._build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "src/lxml/_elementpath.py", line 120, in lxml._elementpath.prepare_descendant
    token = next()
  File "src/lxml/_elementpath.py", line 88, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix)
SyntaxError: prefix 'hha' not found in prefix map

----------------------------------------------------------------------
Ran 1979 tests in 5.332s
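
The `SyntaxError` in the traceback above is raised by the ElementPath tokenizer: a `prefix:name` step in a `find()` path is read as a namespace prefix that must be resolved through a prefix map, so a tag name that merely contains a colon cannot be addressed this way. The sketch below reproduces the same error with the standard library's `xml.etree.ElementTree`, used here as a stand-in so that it runs without lxml. It assumes the stdlib tokenizer behaves like the `lxml._elementpath` code shown in the traceback, and the tree is built by hand rather than parsed by libxml2. `find_with_prefix` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-built stand-in for the parsed document: the colon is part of the
# tag name, and no namespace declaration exists anywhere in the tree.
root = ET.Element("html")
body = ET.SubElement(root, "body")
target = ET.SubElement(body, "hha:page-description")
target.text = "aa"


def find_with_prefix(element, path):
    """Run element.find(path) and return (result, error message)."""
    try:
        return element.find(path), None
    except SyntaxError as exc:
        # Raised by the path tokenizer when a prefix cannot be resolved.
        return None, str(exc)


result, error = find_with_prefix(root, ".//hha:page-description")
print(result, error)
```

Passing a `namespaces` mapping would not help here either, because the element is not in any namespace; the prefix is simply text inside the tag name.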

scoder (scoder) wrote :

I'm targeting this to lxml 5.0 since it's a behavioural change in libxml2 2.10.4 and later. I'll just remove the test and declare the parsing of HTML tag "prefixes" as version dependent and not future proof.

Changed in lxml:
milestone: none → 5.0
assignee: nobody → scoder (scoder)
status: New → Triaged
scoder (scoder) wrote :

I changed the behaviour in lxml 5.0 to consider "prefixes" in HTML documents as part of the tag name when searching for tags.
Let's see if this works for our users.

See https://github.com/lxml/lxml/commit/72f5a287a4016ecb405f2e8a4a03ae22a5b0b496
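
For code that has to run against both older and newer libxml2 versions, one option is to compare tag strings directly instead of going through a path expression. This is an assumption on my part, not something prescribed in this report. The sketch below uses the standard library's `xml.etree.ElementTree` with hand-built trees as stand-ins for the two tree shapes discussed in this report, so it runs without lxml. `find_html_tag` and `build_tree` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def find_html_tag(root, local_name, prefix=None):
    """Return the first element tagged `local_name` or `prefix:local_name`.

    Hypothetical helper: compares tag strings directly, so a colon is
    never interpreted as a namespace prefix.
    """
    wanted = {local_name}
    if prefix:
        wanted.add(f"{prefix}:{local_name}")
    for element in root.iter():
        # Comments and processing instructions may have non-string tags.
        if isinstance(element.tag, str) and element.tag in wanted:
            return element
    return None


def build_tree(tag):
    """Build <html><body><TAG>aa</TAG></body></html> by hand."""
    root = ET.Element("html")
    body = ET.SubElement(root, "body")
    ET.SubElement(body, tag).text = "aa"
    return root


# Stand-in for the older behaviour: the element is found under its bare name.
old_style = build_tree("page-description")
# Stand-in for the newer behaviour: the "prefix" is part of the tag name.
new_style = build_tree("hha:page-description")

for tree in (old_style, new_style):
    found = find_html_tag(tree, "page-description", prefix="hha")
    print(found.tag, found.text)
```

The helper trades the convenience of `find()` paths for a linear scan, which is usually acceptable in tests but may be slow on large documents.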

Changed in lxml:
importance: Undecided → Medium
status: Triaged → Fix Committed
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released