lxml fails to build with stricter C compilers

Bug #2045435 reported by Eli Schwartz
This bug affects 3 people
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: scoder

Bug Description

Originally discovered at https://bugs.gentoo.org/917562

Modern C compilers have been getting stricter through a variety of
changes over the last year or so. In particular, GCC 14 and clang 17
will error out by default on the incompatible pointer types shown below.

You can reproduce using an older gcc 13.2.1 and the following CFLAGS to
emulate those changed defaults:

CFLAGS="-Werror=incompatible-pointer-types"

Tested against libxml2 2.11.5

The failure is in etree.c:

```
2023-12-01 14:26:59,852 root INFO x86_64-pc-linux-gnu-gcc -Wsign-compare
-DNDEBUG -march=native -fstack-protector-all -O2 -pipe
-fdiagnostics-color=always -frecord-gcc-switches
-Werror=incompatible-pointer-types -DNDEBUG -fPIC
-DCYTHON_CLINE_IN_TRACEBACK=0 -Isrc/lxml -Isrc/lxml/includes
-I/usr/include/libxml2 -Isrc -I/usr/include/python3.11 -c
src/lxml/etree.c -o
/var/tmp/portage/dev-python/lxml-4.9.3-r2/work/lxml-lxml-4.9.3-python3_11/build/temp.linux-x86_64-cpython-311/src/lxml/etree.o
src/lxml/etree.c: In function
‘__pyx_f_4lxml_5etree_11_BaseParser__registerHtmlErrorHandler’:
src/lxml/etree.c:134305:5: warning: ‘__htmlDefaultSAXHandler’ is
deprecated [-Wdeprecated-declarations]
134305 | __pyx_t_2 = (((xmlSAXHandlerV1 *)__pyx_v_sax) ==
(&htmlDefaultSAXHandler));
       | ^~~~~~~~~
In file included from /usr/include/libxml2/libxml/threads.h:35,
                 from /usr/include/libxml2/libxml/xmlmemory.h:222,
                 from /usr/include/libxml2/libxml/tree.h:1310,
                 from src/lxml/includes/etree_defs.h:189,
                 from src/lxml/etree.c:1301:
/usr/include/libxml2/libxml/globals.h:259:29: note: declared here
  259 | XMLPUBFUN xmlSAXHandlerV1 * __htmlDefaultSAXHandler(void);
      | ^~~~~~~~~~~~~~~~~~~~~~~
src/lxml/etree.c:134352:7: warning: ‘__htmlDefaultSAXHandler’ is
deprecated [-Wdeprecated-declarations]
134352 | (void)(memcpy(__pyx_v_sax, (&htmlDefaultSAXHandler),
(sizeof(htmlDefaultSAXHandler))));
       | ^
/usr/include/libxml2/libxml/globals.h:259:29: note: declared here
  259 | XMLPUBFUN xmlSAXHandlerV1 * __htmlDefaultSAXHandler(void);
      | ^~~~~~~~~~~~~~~~~~~~~~~
src/lxml/etree.c:134352:7: warning: ‘__htmlDefaultSAXHandler’ is
deprecated [-Wdeprecated-declarations]
134352 | (void)(memcpy(__pyx_v_sax, (&htmlDefaultSAXHandler),
(sizeof(htmlDefaultSAXHandler))));
       | ^
/usr/include/libxml2/libxml/globals.h:259:29: note: declared here
  259 | XMLPUBFUN xmlSAXHandlerV1 * __htmlDefaultSAXHandler(void);
      | ^~~~~~~~~~~~~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pf_4lxml_5etree_11TreeBuilder_4data’:
src/lxml/etree.c:155053:66: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxData’ from incompatible
pointer type [-Werror=incompatible-pointer-types]
155053 | __pyx_t_1 =
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxData(((struct
__pyx_obj_4lxml_5etree__SaxParserTarget *)__pyx_v_self), __pyx_v_data);
if (unlikely(__pyx_t_1 == ((int)-1))) __PYX_ERR(3, 832, __pyx_L1_error)
       |
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       | |
       |
 struct __pyx_obj_4lxml_5etree__SaxParserTarget *
src/lxml/etree.c:154330:105: note: expected ‘struct
__pyx_obj_4lxml_5etree_TreeBuilder *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__SaxParserTarget *’
154330 | static int
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxData(struct
__pyx_obj_4lxml_5etree_TreeBuilder *__pyx_v_self, PyObject *__pyx_v_data) {
       |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pf_4lxml_5etree_11TreeBuilder_6start’:
src/lxml/etree.c:155259:67: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxStart’ from incompatible
pointer type [-Werror=incompatible-pointer-types]
155259 | __pyx_t_2 =
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxStart(((struct
__pyx_obj_4lxml_5etree__SaxParserTarget *)__pyx_v_self), __pyx_v_tag,
__pyx_v_attrs, __pyx_v_nsmap); if (unlikely(!__pyx_t_2)) __PYX_ERR(3,
841, __pyx_L1_error)
       |
 ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       | |
       |
  struct __pyx_obj_4lxml_5etree__SaxParserTarget *
src/lxml/etree.c:153949:112: note: expected ‘struct
__pyx_obj_4lxml_5etree_TreeBuilder *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__SaxParserTarget *’
153949 | static PyObject
*__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxStart(struct
__pyx_obj_4lxml_5etree_TreeBuilder *__pyx_v_self, PyObject *__pyx_v_tag,
PyObject *__pyx_v_attrib, PyObject *__pyx_v_nsmap) {
       |
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pf_4lxml_5etree_11TreeBuilder_8end’:
src/lxml/etree.c:155412:65: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxEnd’ from incompatible
pointer type [-Werror=incompatible-pointer-types]
155412 | __pyx_t_1 =
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxEnd(((struct
__pyx_obj_4lxml_5etree__SaxParserTarget *)__pyx_v_self), __pyx_v_tag);
if (unlikely(!__pyx_t_1)) __PYX_ERR(3, 848, __pyx_L1_error)
       |
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       | |
       |
struct __pyx_obj_4lxml_5etree__SaxParserTarget *
src/lxml/etree.c:154222:110: note: expected ‘struct
__pyx_obj_4lxml_5etree_TreeBuilder *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__SaxParserTarget *’
154222 | static PyObject
*__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxEnd(struct
__pyx_obj_4lxml_5etree_TreeBuilder *__pyx_v_self, CYTHON_UNUSED PyObject
*__pyx_v_tag) {
       |
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pf_4lxml_5etree_11TreeBuilder_10pi’:
src/lxml/etree.c:155658:64: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxPi’ from incompatible
pointer type [-Werror=incompatible-pointer-types]
155658 | __pyx_t_1 =
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxPi(((struct
__pyx_obj_4lxml_5etree__SaxParserTarget *)__pyx_v_self), __pyx_v_target,
__pyx_v_data); if (unlikely(!__pyx_t_1)) __PYX_ERR(3, 859, __pyx_L1_error)
       |
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       | |
       |
struct __pyx_obj_4lxml_5etree__SaxParserTarget *
src/lxml/etree.c:154376:109: note: expected ‘struct
__pyx_obj_4lxml_5etree_TreeBuilder *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__SaxParserTarget *’
154376 | static PyObject
*__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxPi(struct
__pyx_obj_4lxml_5etree_TreeBuilder *__pyx_v_self, PyObject
*__pyx_v_target, PyObject *__pyx_v_data) {
       |
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function
‘__pyx_pf_4lxml_5etree_11TreeBuilder_12comment’:
src/lxml/etree.c:155803:69: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxComment’ from incompatible
pointer type [-Werror=incompatible-pointer-types]
155803 | __pyx_t_1 =
__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxComment(((struct
__pyx_obj_4lxml_5etree__SaxParserTarget *)__pyx_v_self),
__pyx_v_comment); if (unlikely(!__pyx_t_1)) __PYX_ERR(3, 867,
__pyx_L1_error)
       |
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       |
    |
       |
    struct __pyx_obj_4lxml_5etree__SaxParserTarget *
src/lxml/etree.c:154556:114: note: expected ‘struct
__pyx_obj_4lxml_5etree_TreeBuilder *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__SaxParserTarget *’
154556 | static PyObject
*__pyx_f_4lxml_5etree_11TreeBuilder__handleSaxComment(struct
__pyx_obj_4lxml_5etree_TreeBuilder *__pyx_v_self, PyObject
*__pyx_v_comment) {
       |
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pf_4lxml_5etree_4XSLT_18__call__’:
src/lxml/etree.c:225871:73: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer
type [-Werror=incompatible-pointer-types]
225871 | __pyx_t_2 = ((PyObject
*)__pyx_f_4lxml_5etree_12_XSLTContext__copy(((struct
__pyx_obj_4lxml_5etree__BaseContext *)__pyx_v_self->_context))); if
(unlikely(!__pyx_t_2)) __PYX_ERR(4, 550, __pyx_L9_error)
       |

~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       |
        |
       |
        struct __pyx_obj_4lxml_5etree__BaseContext *
src/lxml/etree.c:223147:138: note: expected ‘struct
__pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__BaseContext *’
223147 | static struct __pyx_obj_4lxml_5etree__BaseContext
*__pyx_f_4lxml_5etree_12_XSLTContext__copy(struct
__pyx_obj_4lxml_5etree__XSLTContext *__pyx_v_self) {
       |

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_f_4lxml_5etree__copyXSLT’:
src/lxml/etree.c:227688:71: error: passing argument 1 of
‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer
type [-Werror=incompatible-pointer-types]
227688 | __pyx_t_2 = ((PyObject
*)__pyx_f_4lxml_5etree_12_XSLTContext__copy(((struct
__pyx_obj_4lxml_5etree__BaseContext *)__pyx_v_stylesheet->_context)));
if (unlikely(!__pyx_t_2)) __PYX_ERR(4, 691, __pyx_L1_error)
       |

~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       |
      |
       |
      struct __pyx_obj_4lxml_5etree__BaseContext *
src/lxml/etree.c:223147:138: note: expected ‘struct
__pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct
__pyx_obj_4lxml_5etree__BaseContext *’
223147 | static struct __pyx_obj_4lxml_5etree__BaseContext
*__pyx_f_4lxml_5etree_12_XSLTContext__copy(struct
__pyx_obj_4lxml_5etree__XSLTContext *__pyx_v_self) {
       |

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
src/lxml/etree.c: In function ‘__pyx_pymod_exec_etree’:
src/lxml/etree.c:287466:3: warning: ‘xmlThrDefLineNumbersDefaultValue’
is deprecated [-Wdeprecated-declarations]
287466 | (void)(xmlThrDefLineNumbersDefaultValue(1));
       | ^
/usr/include/libxml2/libxml/globals.h:426:15: note: declared here
  426 | XMLPUBFUN int xmlThrDefLineNumbersDefaultValue(int v);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```


scoder (scoder) wrote :

This is a known issue in Cython: https://github.com/cython/cython/issues/2747

Basically, Cython generates a suboptimal cast for the self object that it passes into a cdef function. The code is fine; it's just that the C compiler is unhappy seeing an extension type rather than its (compatible) base type.
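
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of mismatch the compiler log above reports. The names `Base`, `Derived`, `derived_get_extra` and `call_with_matching_cast` are hypothetical stand-ins, not lxml's or Cython's generated identifiers, and the layout (base struct embedded as the first member) is an assumption about how such extension types are typically laid out in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for generated extension-type structs. */
struct Base {
    int refcount;
};

struct Derived {
    struct Base base;   /* first member, so a Derived* can be viewed as a Base* */
    int extra;
};

/* A method of the subtype: it is declared with a pointer to the subtype. */
static int derived_get_extra(struct Derived *self) {
    assert(self != NULL);
    return self->extra;
}

/*
 * The problematic shape is a call such as
 *
 *     derived_get_extra((struct Base *)self);
 *
 * where the argument is cast to the base type although the callee declares
 * the subtype. GCC reports this under -Wincompatible-pointer-types.
 */

/* Casting to the type the callee actually declares avoids the diagnostic.
 * This is only valid when the object really is a struct Derived. */
static int call_with_matching_cast(struct Base *self) {
    return derived_get_extra((struct Derived *)self);
}
```

Compiling the commented-out call shape with `-Werror=incompatible-pointer-types` should reproduce the same class of error as in the log, while the version with the matching cast compiles cleanly.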

Changed in lxml:
status: New → Triaged
Colin Dean (colindean) wrote :

I'm encountering this while installing lxml 5.0.0 on macOS.

My versions:

```
$ sw_vers
ProductName: macOS
ProductVersion: 13.6.2
BuildVersion: 22G320
$ poetry run python --version
Python 3.9.16
$ clang --version
Homebrew clang version 17.0.6
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin
```

And then the build failure, output from pip running within poetry (since I'm using poetry primarily):

```
$ poetry run pip wheel --no-cache-dir --use-pep517 "lxml (==5.0.0)"
Looking in indexes: https://binrepo.mycompany.com/artifactory/api/pypi/pypi-remote/simple/, https://binrepo.mycompany.com/artifactory/api/pypi/tgt-python/simple/
Collecting lxml==5.0.0
  Downloading https://binrepo.mycompany.com/artifactory/api/pypi/pypi-remote/packages/packages/80/2c/076fafd979728858829fb9ce2e13fa6367b6be9acc4da0cff6367aa6a1ce/lxml-5.0.0.zip (4.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 5.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: lxml
  Building wheel for lxml (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for lxml (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [161 lines of output]
      <string>:67: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      Building lxml version 5.0.0.
      Building without Cython.
      Building against libxml2 2.9.4 and libxslt 1.1.29
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-12.6-arm64-cpython-39
      creating build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/_elementpath.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/sax.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/pyclasslookup.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/__init__.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/builder.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/doctestcompare.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/usedoctest.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/cssselect.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      copying src/lxml/ElementInclude.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml
      creating build/lib.macosx-12.6-arm64-cpython-39/lxml/includes
      copying src/lxml/includes/__init__.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml/includes
      creating build/lib.macosx-12.6-arm64-cpython-39/lxml/html
      copying src/lxml/html/soupparser.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml/html
      copying src/lxml/html/defs.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml/html
      copying src/lxml/html/_setmixin.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml/html
      copying src/lxml/html/clean.py -> build/lib.macosx-12.6-arm64-cpython-39/lxml/html
      copying src/lxml/...
```

Eli Schwartz (eschwartz) wrote :

> I anticipate that limiting lxml to '>=4,<5' in Poetry's version spec will suffice for my particular use case.

Are you very sure about that?

I got the original error with 4.9.3, and according to the author, this has nothing to do with the lxml version and everything to do with the Cython version.

Perhaps binrepo.mycompany.com/artifactory already has a prebuilt wheel for lxml >=4,<5 that was compiled with an older clang?

scoder (scoder) wrote :

There are two issues here. One is the incorrect cast that Cython generates. The second issue, which @colindean mentions, is due to incompatibilities between libxml2 versions. They added some 'const' modifiers to their API in 2.12.x, which lxml now adheres to. This leads to warnings (or, in your case, errors) with older libxml2 versions. The code is actually correct as written: I'm now passing a const-requiring function into the old function pointer type, which doesn't care about const. But it seems that clang still complains about it.

As much as I dislike doing this, I'll add casts to silence the warnings (and, sadly, any real future errors that might occur). That solves a larger part of this problem in real-world environments. It'll be in 5.0.1.
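
As a rough illustration of the const mismatch and the cast workaround described above, here is a sketch under stated assumptions. The typedef `old_handler_fn` and the functions `handle_message`, `register_handler` and `dispatch` are hypothetical; they are not libxml2 or lxml APIs and only mimic the shape of an older header that declares a callback without `const`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "old" callback type, as an older header might declare it. */
typedef size_t (*old_handler_fn)(void *ctx, char *message);

/* Hypothetical handler written against a newer, const-correct signature. */
static size_t handle_message(void *ctx, const char *message) {
    (void)ctx;
    assert(message != NULL);
    return strlen(message);
}

/* Hypothetical registration slot using the old callback type. */
static old_handler_fn registered_handler = NULL;

static void register_handler(void) {
    /* Without the cast, this assignment mixes
     *   size_t (*)(void *, const char *)  and  size_t (*)(void *, char *),
     * which strict compilers diagnose as incompatible pointer types. */
    registered_handler = (old_handler_fn)handle_message;
}

static size_t dispatch(char *message) {
    return registered_handler(NULL, message);
}
```

The trade-off is the one noted above: the explicit cast silences the diagnostic, but it would equally silence a genuine signature mismatch introduced later. Strictly speaking, the C standard treats the two function types as incompatible, so the call relies on the platform ABI passing `char *` and `const char *` identically.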

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Low
status: Triaged → Confirmed
Victor Stinner (vstinner) wrote :

Cython 3.0.9 was released 2 days ago: it adds a #pragma to work around the GCC compiler warnings.

scoder (scoder) wrote :

Fixed in lxml 5.0.2 and 5.1.1.

Changed in lxml:
milestone: none → 5.0.2
status: Confirmed → Fix Released