diff -Nru lxml-3.3.2/CHANGES.txt lxml-3.3.3/CHANGES.txt --- lxml-3.3.2/CHANGES.txt 2014-02-26 19:36:12.000000000 +0000 +++ lxml-3.3.3/CHANGES.txt 2014-03-04 16:26:13.000000000 +0000 @@ -2,6 +2,22 @@ lxml changelog ============== +3.3.3 (2014-03-04) +================== + +Bugs fixed +---------- + +* Crash when using Element subtypes with ``__slots__``. + +Other changes +------------- + +* The internal classes ``_LogEntry`` and ``_Attrib`` can no longer be + subclassed from Python code. + + + 3.3.2 (2014-02-26) ================== @@ -10,7 +26,7 @@ * The properties ``resolvers`` and ``version``, as well as the methods ``set_element_class_lookup()`` and ``makeelement()``, were lost from - ``iterparse`` objects. + ``iterparse`` objects in 3.3.0. * LP#1222132: instances of ``XMLSchema``, ``Schematron`` and ``RelaxNG`` did not clear their local ``error_log`` before running a validation. @@ -18,8 +34,8 @@ * LP#1238500: lxml.doctestcompare mixed up "expected" and "actual" in attribute values. -* Some file I/O tests were failing in MS-Windows due to incorrect temp file - usage. Initial patch by Gabi Davar. +* Some file I/O tests were failing in MS-Windows due to non-portable temp + file usage. Initial patch by Gabi Davar. * LP#910014: duplicate IDs in a document were not reported by DTD validation. diff -Nru lxml-3.3.2/debian/changelog lxml-3.3.3/debian/changelog --- lxml-3.3.2/debian/changelog 2014-03-27 18:16:16.000000000 +0000 +++ lxml-3.3.3/debian/changelog 2014-03-27 18:16:16.000000000 +0000 @@ -1,8 +1,8 @@ -lxml (3.3.2-1build1) trusty; urgency=medium +lxml (3.3.3-1) unstable; urgency=medium - * No-change rebuild to drop Python 3.3 support. + * New upstrea version 3.3.3. - -- Matthias Klose Sun, 23 Mar 2014 15:28:27 +0000 + -- Matthias Klose Thu, 27 Mar 2014 11:13:51 +0100 lxml (3.3.2-1) unstable; urgency=medium @@ -10,7 +10,7 @@ - Re-add lost properties of iterparse objects. Closes: #740226, #740102. 
- -- Matthias Klose Mon, 03 Mar 2014 23:22:35 +0100 + -- Matthias Klose Mon, 03 Mar 2014 23:22:35 +0100 lxml (3.3.1-1) unstable; urgency=medium diff -Nru lxml-3.3.2/doc/FAQ.txt lxml-3.3.3/doc/FAQ.txt --- lxml-3.3.2/doc/FAQ.txt 2014-02-26 19:36:12.000000000 +0000 +++ lxml-3.3.3/doc/FAQ.txt 2014-03-04 16:26:13.000000000 +0000 @@ -48,12 +48,12 @@ 6 Parsing and Serialisation 6.1 Why doesn't the ``pretty_print`` option reformat my XML output? 6.2 Why can't lxml parse my XML from unicode strings? - 6.3 What is the difference between str(xslt(doc)) and xslt(doc).write() ? - 6.4 Why can't I just delete parents or clear the root node in iterparse()? - 6.5 How do I output null characters in XML text? - 6.6 Is lxml vulnerable to XML bombs? - 6.7 How do I configure lxml safely as a web-service endpoint? - 6.8 Can lxml parse from file objects opened in unicode mode? + 6.3 Can lxml parse from file objects opened in unicode mode? + 6.4 What is the difference between str(xslt(doc)) and xslt(doc).write() ? + 6.5 Why can't I just delete parents or clear the root node in iterparse()? + 6.6 How do I output null characters in XML text? + 6.7 Is lxml vulnerable to XML bombs? + 6.8 How do I configure lxml safely as a web-service endpoint? 7 XPath and Document Traversal 7.1 What are the ``findall()`` and ``xpath()`` methods on Element(Tree)? 7.2 Why doesn't ``findall()`` support full XPath expressions? @@ -862,13 +862,26 @@ Note that the ``remove_blank_text`` option also uses a heuristic if it has no definite knowledge about the document's ignorable whitespace. It will keep blank text nodes that appear after non-blank text nodes -at the same level. This is to prevent document-style XML from -breaking. +at the same level. This is to prevent document-style XML from loosing +content. -If you want to be sure all blank text is removed, you have to use -either a DTD to tell the parser which whitespace it can safely ignore, -or remove the ignorable whitespace manually after parsing, e.g. 
by -setting all tail text to None: +The HTMLParser has this structural knowledge built-in, which means that +most whitespace that appears between tags in HTML documents will *not* +be removed by this option, except in places where it is truly ignorable, +e.g. in the page header, between table structure tags, etc. Therefore, +it is also safe to use this option with the HTMLParser, as it will keep +content like the following intact (i.e. it will not remove the space +that separates the two words): + +.. sourcecode:: html + +

    <p><b>some</b> <em>text</em></p>

+ +If you want to be sure all blank text is removed from an XML document +(or just more blank text than the parser does by itself), you have to +use either a DTD to tell the parser which whitespace it can safely +ignore, or remove the ignorable whitespace manually after parsing, +e.g. by setting all tail text to None: .. sourcecode:: python @@ -921,6 +934,30 @@ .. _`XML specification`: http://www.w3.org/TR/REC-xml/ +Can lxml parse from file objects opened in unicode/text mode? +------------------------------------------------------------- + +Technically, yes. However, you likely do not want to do that, because +it is extremely inefficient. The text encoding that libxml2 uses +internally is UTF-8, so parsing from a Unicode file means that Python +first reads a chunk of data from the file, then decodes it into a new +buffer, and then copies it into a new unicode string object, just to +let libxml2 make yet another copy while encoding it down into UTF-8 +in order to parse it. It's clear that this involves a lot more +recoding and copying than when parsing straight from the bytes that +the file contains. + +If you really know the encoding better than the parser (e.g. when +parsing HTML that lacks a content declaration), then instead of passing +an encoding parameter into the file object when opening it, create a +new instance of an XMLParser or HTMLParser and pass the encoding into +its constructor. Afterwards, use that parser for parsing, e.g. by +passing it into the ``etree.parse(file, parser)`` function. Remember +to open the file in binary mode (mode="rb"), or, if possible, prefer +passing the file path directly into ``parse()`` instead of an opened +Python file object. + + What is the difference between str(xslt(doc)) and xslt(doc).write() ? --------------------------------------------------------------------- @@ -1050,27 +1087,6 @@ .. _defusedxml: https://bitbucket.org/tiran/defusedxml -Can lxml parse from file objects opened in unicode/text mode? 
-------------------------------------------------------------- - -Technically, yes. However, you likely do not want to do that, because -it is extremely inefficient. The text encoding that libxml2 uses -internally is UTF-8, so parsing from a Unicode file means that Python -first reads a chunk of data from the file, then decodes it into a new -buffer, and then copies it into a new unicode string object, just to -let libxml2 make yet another copy while encoding it down into UTF-8 -in order to parse it. It's clear that this involves a lot more -recoding and copying than when parsing straight from the bytes that -the file contains. - -If you really know the encoding better than the parser (e.g. when -parsing HTML that lacks a content declaration), then instead of passing -an encoding parameter into the file object when opening it, create a -new instance of an XMLParser or HTMLParser and pass the encoding into -its constructor. Afterwards, use that parser for parsing, e.g. by -passing it into the ``etree.parse(file, parser)`` function. - - XPath and Document Traversal ============================ diff -Nru lxml-3.3.2/doc/html/api/abc.ABCMeta-class.html lxml-3.3.3/doc/html/api/abc.ABCMeta-class.html --- lxml-3.3.2/doc/html/api/abc.ABCMeta-class.html 2014-02-26 19:37:35.000000000 +0000 +++ lxml-3.3.3/doc/html/api/abc.ABCMeta-class.html 2014-03-04 18:18:59.000000000 +0000 @@ -426,7 +426,7 @@