diff -Nru feedparser-5.0.1/LICENSE feedparser-5.1.2/LICENSE --- feedparser-5.0.1/LICENSE 2010-11-19 05:34:13.000000000 +0000 +++ feedparser-5.1.2/LICENSE 2012-02-16 08:10:42.000000000 +0000 @@ -4,7 +4,8 @@ ----- begin license block ----- -Copyright (c) 2002-2008, Mark Pilgrim +Copyright (c) 2010-2012 Kurt McKee +Copyright (c) 2002-2008 Mark Pilgrim All rights reserved. Redistribution and use in source and binary forms, with or without modification, @@ -41,16 +42,16 @@ Copyright 2004-2008 Mark Pilgrim. All rights reserved. -Redistribution and use in source (XML DocBook) and "compiled" forms (SGML, -HTML, PDF, PostScript, RTF and so forth) with or without modification, are -permitted provided that the following conditions are met: +Redistribution and use in source (Sphinx ReST) and "compiled" forms (HTML, PDF, +PostScript, RTF and so forth) with or without modification, are permitted +provided that the following conditions are met: -* Redistributions of source code (XML DocBook) must retain the above copyright +* Redistributions of source code (Sphinx ReST) must retain the above copyright notice, this list of conditions and the following disclaimer. -* Redistributions in compiled form (transformed to other DTDs, converted to - PDF, PostScript, RTF and other formats) must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. +* Redistributions in compiled form (converted to HTML, PDF, PostScript, RTF and + other formats) must reproduce the above copyright notice, this list of + conditions and the following disclaimer in the documentation and/or other + materials provided with the distribution. 
THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE diff -Nru feedparser-5.0.1/MANIFEST.in feedparser-5.1.2/MANIFEST.in --- feedparser-5.0.1/MANIFEST.in 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/MANIFEST.in 2012-02-16 08:10:42.000000000 +0000 @@ -0,0 +1,6 @@ +recursive-include feedparser/tests *.xml *.gz *.z +recursive-include docs *.rst *.py *.css +include feedparser/feedparsertest.py +include feedparser/sgmllib3.py +include LICENSE +include NEWS diff -Nru feedparser-5.0.1/NEWS feedparser-5.1.2/NEWS --- feedparser-5.0.1/NEWS 2011-02-20 20:41:11.000000000 +0000 +++ feedparser-5.1.2/NEWS 2012-05-03 13:43:47.000000000 +0000 @@ -1,3 +1,77 @@ +5.1.2 - May 3, 2012 + * Minor changes to the documentation + * Strip potentially dangerous ENTITY declarations in encoded feeds + * feedparser will now try to continue parsing despite compression errors + * Fix issue 321 a little more (the initial fix missed a code path) + * Issue 337 (`_parse_date_rfc822()` returns None on single-digit days) + * Issue 343 (add magnet links to the ACCEPTABLE_URI_SCHEMES) + * Issue 344 (handle deflated data with no headers nor checksums) + * Issue 347 (support `itunes:image` elements with a `url` attribute) + +5.1.1 - March 20, 2011 + * Fix mistakes, typos, and bugs in the unit test code + * Fix crash in Python 2.4 and 2.5 if the feed has a UTF_32 byte order mark + * Replace the RFC822 date parser for more extensibility + * Issue 304 (handle RFC822 dates with timezones like GMT+00:00) + * Issue 309 (itunes:keywords should be split by commas, not whitespace) + * Issue 310 (pubDate should map to `published`, not `updated`) + * Issue 313 (include the compression test files in MANIFEST.in) + * Issue 314 (far-flung RFC822 dates don't throw OverflowError on x64) + * Issue 315 (HTTP server for unit tests runs on 0.0.0.0) + * Issue 321 (malformed URIs can cause ValueError to be thrown) + * 
Issue 322 (HTTP redirect to HTTP 304 causes SAXParseException) + * Issue 323 (installing chardet causes 11 unit test failures) + * Issue 325 (map `description_detail` to `summary_detail`) + * Issue 326 (Unicode filename causes UnicodeEncodeError if locale is ASCII) + * Issue 327 (handle RFC822 dates with extraneous commas) + * Issue 328 (temporarily map `updated` to `published` due to issue 310) + * Issue 329 (escape backslashes in Windows path in docs/introduction.rst) + * Issue 331 (don't escape backslashes that are in raw strings in the docs) + +5.1 - December 2, 2011 + * Extensive, extensive unit test refactoring + * Convert the Docbook documentation to ReST + * Include the documentation in the source distribution + * Consolidate the disparate README files into one + * Support Jython somewhat (almost all unit tests pass) + * Support Python 3.2 + * Fix Python 3 issues exposed by improved unit tests + * Fix international domain name issues exposed by improved unit tests + * Issue 148 (loose parser doesn't always return unicode strings) + * Issue 204 (FeedParserDict behavior should not be controlled by `assert`) + * Issue 247 (mssql date parser uses hardcoded tokyo timezone) + * Issue 249 (KeyboardInterrupt and SystemExit exceptions being caught) + * Issue 250 (`updated` can be a 9-tuple or a string, depending on context) + * Issue 252 (running setup.py in Python 3 fails due to missing sgmllib) + * Issue 253 (document that text/plain content isn't sanitized) + * Issue 260 (Python 3 doesn't decompress gzip'ed or deflate'd content) + * Issue 261 (popping from empty tag list) + * Issue 262 (docs are missing from distribution files) + * Issue 264 (vcard parser crashes on non-ascii characters) + * Issue 265 (http header comparisons are case sensitive) + * Issue 271 (monkey-patching sgmllib breaks other libraries) + * Issue 272 (can't pass bytes or str to `parse()` in Python 3) + * Issue 275 (`_parse_date()` doesn't catch OverflowError) + * Issue 276 (mutable types used 
as default values in `parse()`) + * Issue 277 (`python3 setup.py install` fails) + * Issue 281 (`_parse_date()` doesn't catch ValueError) + * Issue 282 (`_parse_date()` crashes when passed `None`) + * Issue 285 (crash on empty xmlns attribute) + * Issue 286 ('apos' character entity not handled properly) + * Issue 289 (add an option to disable microformat parsing) + * Issue 290 (Blogger's invalid img tags are unparseable) + * Issue 292 (atom id element not explicitly supported) + * Issue 294 ('categories' key exists but raises KeyError) + * Issue 297 (unresolvable external doctype causes crash) + * Issue 298 (nested nodes clobber actual values) + * Issue 300 (performance improvements) + * Issue 303 (unicode characters cause crash during relative uri resolution) + * Remove "Hot RSS" support since the format doesn't actually exist + * Remove the old feedparser.org website files from the source + * Remove the feedparser command line interface + * Remove the Zope interoperability hack + * Remove extraneous whitespace + 5.0.1 - February 20, 2011 * Fix issue 91 (invalid text in XML declaration causes sanitizer to crash) * Fix issue 254 (sanitization can be bypassed by malformed XML comments) diff -Nru feedparser-5.0.1/PKG-INFO feedparser-5.1.2/PKG-INFO --- feedparser-5.0.1/PKG-INFO 2011-02-20 20:46:04.000000000 +0000 +++ feedparser-5.1.2/PKG-INFO 2012-05-03 13:57:53.000000000 +0000 @@ -1,21 +1,13 @@ Metadata-Version: 1.0 Name: feedparser -Version: 5.0.1 +Version: 5.1.2 Summary: Universal feed parser, handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds -Home-page: http://feedparser.org/ -Author: Mark Pilgrim -Author-email: mark@diveintomark.org +Home-page: http://code.google.com/p/feedparser/ +Author: Kurt McKee +Author-email: contactme@kurtmckee.org License: UNKNOWN Download-URL: http://code.google.com/p/feedparser/ -Description: Universal feed parser - - Handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds - - Visit 
http://feedparser.org/ for the latest version - Visit http://feedparser.org/docs/ for the latest documentation - - Required: Python 2.4 or later - Recommended: CJKCodecs and iconv_codec +Description: UNKNOWN Keywords: atom,cdf,feed,parser,rdf,rss Platform: POSIX Platform: Windows @@ -32,5 +24,6 @@ Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.0 Classifier: Programming Language :: Python :: 3.1 +Classifier: Programming Language :: Python :: 3.2 Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: Text Processing :: Markup :: XML diff -Nru feedparser-5.0.1/README feedparser-5.1.2/README --- feedparser-5.0.1/README 2010-11-19 05:34:13.000000000 +0000 +++ feedparser-5.1.2/README 2012-02-16 08:10:42.000000000 +0000 @@ -1,13 +1,75 @@ -Universal Feed Parser -Parse RSS and Atom feeds in Python. 4000 unit tests. Open source. +feedparser - Parse Atom and RSS feeds in Python. -Copyright (c) 2002-2008, Mark Pilgrim -open source, see LICENSE file for details +Copyright (c) 2010-2012 Kurt McKee +Copyright (c) 2002-2008 Mark Pilgrim ------ +feedparser is open source. See the LICENSE file for more information. -To install: -$ python setup.py install -Full documentation is available in the docs/ directory, or online at -http://feedparser.org/docs/ +Installation +============ + +Feedparser can be installed using distutils or setuptools by running: + + $ python setup.py install + +If you're using Python 3, feedparser will automatically be updated by the 2to3 +tool; installation should be seamless across Python 2 and Python 3. + +There's one caveat, however: sgmllib.py was deprecated in Python 2.6 and is no +longer included in the Python 3 standard library. Because feedparser currently +relies on sgmllib.py to handle illformed feeds (among other things), it's a +useful library to have installed. 
+ +If your feedparser download included a copy of sgmllib.py, it's probably called +sgmllib3.py, and you can simply rename the file to sgmllib.py. It will not be +automatically installed using the command above, so you will have to manually +copy it to somewhere in your Python path. + +If a copy of sgmllib.py was not included in your feedparser download, you can +grab a copy from the Python 2 standard library (preferably from the Python 2.7 +series) and run the 2to3 tool on it: + + $ 2to3 -w sgmllib.py + +If you copied sgmllib.py from a Python 2.6 or 2.7 installation you'll +additionally need to edit the resulting file to remove the `warnpy3k` lines at +the top of the file. There should be four lines at the top of the file that you +can delete. + +Because sgmllib.py is a part of the Python codebase, it's licensed under the +Python Software Foundation License. You can find a copy of that license at +python.org: + + http://docs.python.org/license.html + + +Documentation +============= + +The feedparser documentation is available on the web at: + + http://packages.python.org/feedparser + +It is also included in its source format, ReST, in the docs/ directory. To +build the documentation you'll need the Sphinx package, which is available at: + + http://sphinx.pocoo.org/ + +You can then build HTML pages using a command similar to: + + $ sphinx-build -b html docs/ fpdocs + +This will produce HTML documentation in the fpdocs/ directory. + + +Testing +======= + +Feedparser has an extensive test suite that has been growing for a decade. If +you'd like to run the tests yourself, you can run the following command: + + $ python feedparsertest.py + +This will spawn an HTTP server that will listen on port 8097. The tests will +fail if that port is in use. 
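Reviewer note on the README hunk above: the new text says that sgmllib.py is absent from Python 3 and must be supplied by hand. A minimal sketch of how calling code might check for it, assuming the module is importable either as `sgmllib` (the name the README suggests renaming to) or as `feedparser_sgmllib3` (the name the Debian packaging later in this debdiff installs). The helper `load_sgmllib` is hypothetical and is not part of feedparser.

```python
import importlib


def load_sgmllib(candidates=("sgmllib", "feedparser_sgmllib3")):
    """Return the first importable module named in candidates, or None.

    Hypothetical helper for illustration; feedparser itself uses a
    try/except import at module level instead.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed under this name; try the next candidate.
            continue
    return None


sgmllib = load_sgmllib()
if sgmllib is None:
    print("sgmllib not found; see the README for how to install a copy")
else:
    print("using", sgmllib.__name__)
```

If neither name is importable the helper returns None, which matches the README's point that the library is useful rather than strictly required.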
diff -Nru feedparser-5.0.1/README-PYTHON3 feedparser-5.1.2/README-PYTHON3 --- feedparser-5.0.1/README-PYTHON3 2011-01-10 04:25:07.000000000 +0000 +++ feedparser-5.1.2/README-PYTHON3 1970-01-01 00:00:00.000000000 +0000 @@ -1,36 +0,0 @@ -Universal Feed Parser -Parse RSS and Atom feeds in Python. 4000 unit tests. Open source. - -Copyright (c) 2002-2008, Mark Pilgrim -Copyright (c) 2008-2010, multiple authors -open source, see LICENSE file for details - ------ - -If you're using Python 3, feedparser can be converted to work with Python 3 by -the 2to3 tool! To convert it, you can either run the convert_to_py3.sh, or you -can run the following command manually if there's a problem: - - $ 2to3 -w feedparser.py feedparsertest.py - -Unfortunately, sgmllib.py was deprecated in Python 2 and is no longer included -in Python 3. If a copy of sgmllib.py - ported to Python 3 - was not included -in your feedparser download, simply grab a copy from your Python 2 system -library (preferably from the Python 2.7 series) and run the 2to3 tool on it: - - $ 2to3 -w sgmllib.py - -You'll additionally need to edit the resulting file to remove the `warnpy3k` -lines at the top of the file. There should be four lines at the top of the file -that you can delete. - -If your feedparser download included the sgmllib.py file, it's probably called -sgmllib3.py, and you can simply rename the file to sgmllib.py. (The "3" -prevents the file from conflicting with your system's existing sgmllib if -you're using Python 2.) - -Because sgmllib is a part of the Python codebase, it's licensed under the -Python Software Foundation License. You can find a copy of that license at -python.org: - - http://docs.python.org/license.html diff -Nru feedparser-5.0.1/README-TESTS feedparser-5.1.2/README-TESTS --- feedparser-5.0.1/README-TESTS 2010-11-19 05:34:13.000000000 +0000 +++ feedparser-5.1.2/README-TESTS 1970-01-01 00:00:00.000000000 +0000 @@ -1,15 +0,0 @@ -Universal Feed Parser -Parse RSS and Atom feeds in Python. 
4000 unit tests. Open source. - -Copyright (c) 2002-2008, Mark Pilgrim -open source, see LICENSE file for details - ------ - -To run test suite: -$ python feedparsertest.py - -Test suite files are available in the tests/ directory, or online at -http://feedparser.org/tests/ - -Full documentation is available online at http://feedparser.org/docs/ diff -Nru feedparser-5.0.1/convert_to_py3.sh feedparser-5.1.2/convert_to_py3.sh --- feedparser-5.0.1/convert_to_py3.sh 2011-01-10 04:25:07.000000000 +0000 +++ feedparser-5.1.2/convert_to_py3.sh 1970-01-01 00:00:00.000000000 +0000 @@ -1,15 +0,0 @@ -#!/bin/sh - -# If the sgmllib3.py file exists, copy it to sgmllib.py -if [ -e feedparser/sgmllib3.py ]; then - echo "Copying feedparser/sgmllib3.py to feedparser/sgmllib.py" - cp feedparser/sgmllib3.py feedparser/sgmllib.py -fi - -echo "Using the 2to3 tool to convert feedparser.py and feedparsertest.py to Python 3" -2to3 -w feedparser/feedparser.py feedparser/feedparsertest.py - -if [ ! -e feedparser/sgmllib3.py ]; then - echo "No Python 3 version of sgmllib was found. This is a required library." - echo "See README-PYTHON3 for more details" -fi diff -Nru feedparser-5.0.1/debian/changelog feedparser-5.1.2/debian/changelog --- feedparser-5.0.1/debian/changelog 2011-04-04 17:43:50.000000000 +0000 +++ feedparser-5.1.2/debian/changelog 2012-12-27 15:30:52.000000000 +0000 @@ -1,3 +1,93 @@ +feedparser (5.1.2-1ubuntu1~oneiric0) oneiric; urgency=low + + * Automatic build for oneiric + + -- Thomas Perl Thu, 27 Dec 2012 16:30:52 +0100 + +feedparser (5.1.2-1ubuntu1) quantal; urgency=low + + * Merge with Debian. Remaining Ubuntu changes: + + Build for Python 3. + + debian/rules: + - Run the test suite during build + + Convert to dh_python2. + * Drop debian/patches/broken-tests.patch and d/p/CVE-2012-2921.patch + since both have been applied upstream by 5.1.2 + * Drop Build-Depends on python-chardet, python-libxml2, and + python-utidylib as these cause test suite failures. 
Now the Python 2 + build dependencies match the Python 3 build dependencies. (LP: #1056820) + + -- Barry Warsaw Thu, 27 Sep 2012 13:59:14 -0400 + +feedparser (5.1.2-1) unstable; urgency=high + + * New upstream release. (Closes: #674167) + * debian/control + - Homepage updated. (Closes: #649855) + - Standards-Version updated to 3.9.3.1 + * debian/watch fixed. + * debian/rules + - Migrated to dh_python2. (Closes: #646718) + - lintian debian-rules-missing-recommended-target warning fixed + + -- Carlos Galisteo Tue, 29 May 2012 09:54:36 +0200 + +feedparser (5.1-0ubuntu4) quantal; urgency=low + + * SECURITY UPDATE: Prevent ENTITY declarations from hiding in encoded + documents + - debian/patches/CVE-2012-2921.patch: normalize encoding then replace + DOCTYPE and ENTITY declarations + - CVE-2012-2921 + + -- Jamie Strandboge Tue, 22 May 2012 11:07:29 -0500 + +feedparser (5.1-0ubuntu3) precise; urgency=low + + * Remove python3-chardet from Build-Dep and Recommends. In comments here: + http://www.wefearchange.org/2012/01/debian-package-for-python-2-and-3.html + upstream indicates that chardet is essentially deprecated for + feedparser. Because python3-chardet needs a MIR, but isn't really + necessary, I'm removing it from the Recommends of + python3-feedparser, but not changing things for python-feedparser. + + -- Barry Warsaw Mon, 23 Jan 2012 10:27:25 -0500 + +feedparser (5.1-0ubuntu2) precise; urgency=low + + * Build for Python 3. + + -- Barry Warsaw Wed, 11 Jan 2012 15:18:57 +0100 + +feedparser (5.1-0ubuntu1) UNRELEASED; urgency=low + + * New upstream release. + + -- Barry Warsaw Wed, 11 Jan 2012 10:27:06 +0100 + +feedparser (5.0.1-1ubuntu3) precise; urgency=low + + * debian/control: add python-libxml2, python-chardet, and python-utidylib + to Build-Depends so that the test suite is testing what people are + actually installing. python-chardet pulls in 8 extra tests in the test + suite and the others are conditionally used in feedparser if they are + present. 
+ + -- Jamie Strandboge Fri, 18 Nov 2011 08:40:00 -0600 + +feedparser (5.0.1-1ubuntu2) precise; urgency=low + + * debian/rules: + - Run the test suite during build + + -- Michael Terry Fri, 18 Nov 2011 08:34:04 -0500 + +feedparser (5.0.1-1ubuntu1) precise; urgency=low + + * Convert to dh_python2. + + -- Chuck Short Fri, 21 Oct 2011 11:55:13 -0400 + feedparser (5.0.1-1) unstable; urgency=low [ Carlos Galisteo ] diff -Nru feedparser-5.0.1/debian/compat feedparser-5.1.2/debian/compat --- feedparser-5.0.1/debian/compat 2011-02-15 21:28:10.000000000 +0000 +++ feedparser-5.1.2/debian/compat 2012-09-27 17:54:46.000000000 +0000 @@ -1 +1 @@ -5 +8 diff -Nru feedparser-5.0.1/debian/control feedparser-5.1.2/debian/control --- feedparser-5.0.1/debian/control 2011-04-04 19:08:00.000000000 +0000 +++ feedparser-5.1.2/debian/control 2012-09-27 21:05:10.000000000 +0000 @@ -1,23 +1,40 @@ Source: feedparser Section: python Priority: optional -Maintainer: Carlos Galisteo +Maintainer: Ubuntu Developers +XSBC-Original-Maintainer: Carlos Galisteo Uploaders: Debian Python Modules Team -Build-Depends: debhelper (>= 5.0.37.2), python, python-support +Build-Depends: debhelper (>= 8), + python (>= 2.6.6-3~), + python-setuptools, + python3, + python3-setuptools Vcs-Svn: svn://svn.debian.org/python-modules/packages/feedparser/trunk Vcs-Browser: http://svn.debian.org/viewsvn/python-modules/packages/feedparser/trunk/ -XS-Python-Version: >= 2.1 -Standards-Version: 3.9.1 -Homepage: http://www.feedparser.org +X-Python-Version: >= 2.6 +X-Python3-Version: >= 3.2 +Standards-Version: 3.9.3.1 +Homepage: https://code.google.com/p/feedparser/ Package: python-feedparser Architecture: all -Depends: ${python:Depends}, ${misc:Depends} -Recommends: python-libxml2, python-chardet, python-utidylib -XB-Python-Version: ${python:Versions} +Depends: ${misc:Depends}, ${python:Depends} +Recommends: python-chardet, python-libxml2, python-utidylib Description: Universal Feed Parser for Python Python module for downloading 
and parsing syndicated feeds. It can handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom, and CDF feeds. . It provides the same API to all formats, and sanitizes URIs and HTML. + +Package: python3-feedparser +Architecture: all +Depends: ${misc:Depends}, ${python3:Depends} +Description: Universal Feed Parser for Python + Python module for downloading and parsing syndicated feeds. It can + handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS + 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom, and CDF feeds. + . + It provides the same API to all formats, and sanitizes URIs and HTML. + . + This is the Python 3 version of the package. diff -Nru feedparser-5.0.1/debian/patches/series feedparser-5.1.2/debian/patches/series --- feedparser-5.0.1/debian/patches/series 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/patches/series 2012-09-27 21:03:45.000000000 +0000 @@ -0,0 +1 @@ +sgmllib3.patch diff -Nru feedparser-5.0.1/debian/patches/sgmllib3.patch feedparser-5.1.2/debian/patches/sgmllib3.patch --- feedparser-5.0.1/debian/patches/sgmllib3.patch 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/patches/sgmllib3.patch 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1,48 @@ +Description: Python 3 does not ship an sgmllib.py module. Upstream feedparser + ships a file called feedparser/sgmllib3.py and suggests installing this on + your sys.path. Rather than that, the Debian packaging installs the file as + feedparser_sgmllib3.py, so extend the import to look for this. +Author: Barry Warsaw + +--- a/feedparser/feedparser.py ++++ b/feedparser/feedparser.py +@@ -201,22 +201,30 @@ + # sgmllib is not available by default in Python 3; if the end user doesn't have + # it available then we'll lose illformed XML parsing, content santizing, and + # microformat support (at least while feedparser depends on BeautifulSoup). 
++_SGML_AVAILABLE = 0 + try: + import sgmllib + except ImportError: +- # This is probably Python 3, which doesn't include sgmllib anymore +- _SGML_AVAILABLE = 0 ++ # Debian installs the upstream sgmllib3.py into this location. ++ try: ++ import feedparser_sgmllib3 as sgmllib ++ except ImportError: ++ # This is probably Python 3, which doesn't include sgmllib anymore ++ _SGML_AVAILABLE = 0 + +- # Mock sgmllib enough to allow subclassing later on +- class sgmllib(object): +- class SGMLParser(object): +- def goahead(self, i): +- pass +- def parse_starttag(self, i): +- pass ++ # Mock sgmllib enough to allow subclassing later on ++ class sgmllib(object): ++ class SGMLParser(object): ++ def goahead(self, i): ++ pass ++ def parse_starttag(self, i): ++ pass ++ else: ++ _SGML_AVAILABLE = 1 + else: + _SGML_AVAILABLE = 1 + ++if _SGML_AVAILABLE: + # sgmllib defines a number of module-level regular expressions that are + # insufficient for the XML parsing feedparser needs. Rather than modify + # the variables directly in sgmllib, they're defined here using the same diff -Nru feedparser-5.0.1/debian/pycompat feedparser-5.1.2/debian/pycompat --- feedparser-5.0.1/debian/pycompat 2011-02-15 21:28:10.000000000 +0000 +++ feedparser-5.1.2/debian/pycompat 1970-01-01 00:00:00.000000000 +0000 @@ -1 +0,0 @@ -2 diff -Nru feedparser-5.0.1/debian/python-feedparser.install feedparser-5.1.2/debian/python-feedparser.install --- feedparser-5.0.1/debian/python-feedparser.install 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/python-feedparser.install 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1 @@ +usr/lib/python2.*/*-packages/* diff -Nru feedparser-5.0.1/debian/python3-feedparser.install feedparser-5.1.2/debian/python3-feedparser.install --- feedparser-5.0.1/debian/python3-feedparser.install 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/python3-feedparser.install 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1 @@ +usr/lib/python3/*-packages/* diff -Nru 
feedparser-5.0.1/debian/python3-feedparser.lintian-overrides feedparser-5.1.2/debian/python3-feedparser.lintian-overrides --- feedparser-5.0.1/debian/python3-feedparser.lintian-overrides 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/python3-feedparser.lintian-overrides 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1,4 @@ +# lintian needs to be taught about python3-feedparser, which legitimately +# contains an embedded version of feedparser, obviously. :) + +python3-feedparser: embedded-feedparser-library diff -Nru feedparser-5.0.1/debian/rules feedparser-5.1.2/debian/rules --- feedparser-5.0.1/debian/rules 2011-04-04 17:43:50.000000000 +0000 +++ feedparser-5.1.2/debian/rules 2012-09-27 17:58:39.000000000 +0000 @@ -1,41 +1,39 @@ #!/usr/bin/make -f export DH_VERBOSE=1 --include /usr/share/python/python.mk -PREFIX := debian/python-feedparser/usr +%: + dh $@ --with python2,python3 -clean: - dh_testdir - dh_testroot - python ./setup.py clean - rm -rf build - dh_clean feedparser/*.pyc - -build: - -install: build - dh_testdir - dh_testroot - dh_installdirs - python ./setup.py install --prefix $(PREFIX) --no-compile $(py_setup_install_args) - -binary-arch: build install - -binary-indep: build install - dh_testdir - dh_testroot - dh_installchangelogs - #FIXME: Skipping 18MB of tests. 
- dh_installdocs -Xtests - dh_installexamples - dh_compress - dh_fixperms - dh_pysupport - dh_installdeb - dh_gencontrol - dh_md5sums - dh_builddeb - -binary: binary-indep binary-arch +override_dh_auto_build: + dh_auto_build + set -ex; for python in $(shell py3versions -r); do \ + $$python setup.py build; \ + done; + +override_dh_auto_install: + dh_auto_install + set -ex; for python in $(shell py3versions -r); do \ + $$python setup.py install --root=$(CURDIR)/debian/tmp --install-layout=deb; \ + done; + cp feedparser/sgmllib3.py $(CURDIR)/debian/tmp/usr/lib/python3/dist-packages/feedparser_sgmllib3.py + +override_dh_auto_clean: + dh_auto_clean + rm -rf build .*egg-info + +override_dh_auto_test: + # Add back data files which were missing in upstream tarball for 5.1 + cp $(CURDIR)/debian/upstream/*.z feedparser/tests/compression + cp $(CURDIR)/debian/upstream/*.gz feedparser/tests/compression +ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS))) + cd feedparser && python ./feedparsertest.py + # FIXME: run tests for python3 too. This requires running 2to3 over + # feedparsertest.py also, and then running it against the already + # converted feedparser.py, but still being able to find the test + # directory. 
+else + @echo "nocheck set, not running tests" +endif -.PHONY: build clean binary-indep binary-arch binary install configure +override_dh_installdocs: + dh_installdocs -Xtests diff -Nru feedparser-5.0.1/debian/source/include-binaries feedparser-5.1.2/debian/source/include-binaries --- feedparser-5.0.1/debian/source/include-binaries 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/source/include-binaries 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1,5 @@ +debian/upstream/deflate-error.z +debian/upstream/deflate.z +debian/upstream/gzip.gz +debian/upstream/gzip-not-gzipped.gz +debian/upstream/gzip-struct-error.gz diff -Nru feedparser-5.0.1/debian/upstream/README feedparser-5.1.2/debian/upstream/README --- feedparser-5.0.1/debian/upstream/README 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/upstream/README 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1,10 @@ +These files were missing from the upstream 5.1 tarball. They were retrieved +from the upstream Subversion repository here: + +svn checkout http://feedparser.googlecode.com/svn/trunk/ feedparser-read-only + +Here is the upstream issue: + +http://code.google.com/p/feedparser/issues/detail?id=313 + +which will be fixed in 5.2 diff -Nru feedparser-5.0.1/debian/upstream/deflate-error.z feedparser-5.1.2/debian/upstream/deflate-error.z --- feedparser-5.0.1/debian/upstream/deflate-error.z 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/upstream/deflate-error.z 2012-09-27 17:54:46.000000000 +0000 @@ -0,0 +1 @@ +error \ No newline at end of file Binary files /tmp/ilzdKFcrnn/feedparser-5.0.1/debian/upstream/deflate.z and /tmp/Hp4IxLK2PF/feedparser-5.1.2/debian/upstream/deflate.z differ diff -Nru feedparser-5.0.1/debian/upstream/gzip-not-gzipped.gz feedparser-5.1.2/debian/upstream/gzip-not-gzipped.gz --- feedparser-5.0.1/debian/upstream/gzip-not-gzipped.gz 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/debian/upstream/gzip-not-gzipped.gz 2012-09-27 17:54:46.000000000 
+0000 @@ -0,0 +1 @@ +error \ No newline at end of file Binary files /tmp/ilzdKFcrnn/feedparser-5.0.1/debian/upstream/gzip-struct-error.gz and /tmp/Hp4IxLK2PF/feedparser-5.1.2/debian/upstream/gzip-struct-error.gz differ Binary files /tmp/ilzdKFcrnn/feedparser-5.0.1/debian/upstream/gzip.gz and /tmp/Hp4IxLK2PF/feedparser-5.1.2/debian/upstream/gzip.gz differ diff -Nru feedparser-5.0.1/debian/watch feedparser-5.1.2/debian/watch --- feedparser-5.0.1/debian/watch 2011-02-15 21:28:10.000000000 +0000 +++ feedparser-5.1.2/debian/watch 2012-09-27 17:57:38.000000000 +0000 @@ -1,2 +1,2 @@ version=3 -http://code.google.com/p/feedparser/downloads/list http://feedparser.googlecode.com/files/feedparser-([\d.]+)\.tar.gz +http://code.google.com/p/feedparser/downloads/list?can=1 .*/feedparser-(\d[\d\.]*)\.(?:tar\.gz|tar\.bz2|tar\.xz) diff -Nru feedparser-5.0.1/docs/_static/feedparser.css feedparser-5.1.2/docs/_static/feedparser.css --- feedparser-5.0.1/docs/_static/feedparser.css 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/_static/feedparser.css 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,5 @@ +.pre, .pre * { + font-style: normal; + font-family: monospace; + white-space: pre; +} diff -Nru feedparser-5.0.1/docs/add_custom_css.py feedparser-5.1.2/docs/add_custom_css.py --- feedparser-5.0.1/docs/add_custom_css.py 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/add_custom_css.py 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,3 @@ +# Makes Sphinx create a <link> to feedparser.css in the HTML output +def setup(app): + app.add_stylesheet('feedparser.css') diff -Nru feedparser-5.0.1/docs/advanced.rst feedparser-5.1.2/docs/advanced.rst --- feedparser-5.0.1/docs/advanced.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/advanced.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,38 @@ +Advanced Features +################# + +.. 
toctree:: + :maxdepth: 2 + + date-parsing + html-sanitization + content-normalization + namespace-handling + resolving-relative-links + version-detection + character-encoding + bozo + + + + + + + + + + + + + +.. COMMENT:
+ + + + + <para>xxx</para> + </abstract> + </sectioninfo> + <title>Language Detection + xxx +
diff -Nru feedparser-5.0.1/docs/annotated-atom03.rst feedparser-5.1.2/docs/annotated-atom03.rst --- feedparser-5.0.1/docs/annotated-atom03.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/annotated-atom03.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,87 @@ +.. _annotated.atom03: + +Atom 0.3 +======== + +This is a sample Atom 0.3 feed, annotated with links that show how each value +can be accessed once the feed is parsed. + +.. caution:: + + Even though many of these elements are required according to the specification, + real-world feeds may be missing any element. If an element is not present in + the feed, it will not be present in the parsed results. You should not rely on + any particular element being present. + + +.. rubric:: Annotated Atom 0.3 feed + +.. container:: pre + + `"?> + + + :ref:`Sample Feed <reference.feed.title>` + + + :ref:`For documentation <em>only</em> ` + + + + :ref:`<p>Copyright 2004, Mark Pilgrim</p>< ` + + + :ref:`Sample Toolkit ` + + \ :ref:`tag:feedparser.org,2004-04-20:/docs/examples/atom03.xml `\ + \ :ref:`2004-04-20T11:56:34Z `\ + + :ref:`\
\

This is an Atom syndication feed.\

\
` +
+ + \ :ref:`First entry title <reference.entry.title>`\ + + + + \ :ref:`tag:feedparser.org,2004-04-20:/docs/examples/atom03.xml:3 `\ + \ :ref:`2004-04-19T07:45:00Z `\ + \ :ref:`2004-04-20T00:23:47Z `\ + \ :ref:`2004-04-20T11:56:34Z `\ + + \ :ref:`Mark Pilgrim `\ + \ :ref:`http://diveintomark.org/ `\ + \ :ref:`mark@example.org `\ + + + \ :ref:`Joe `\ + \ :ref:`http://example.org/joe/ `\ + \ :ref:`joe@example.org `\ + + + \ :ref:`Sam `\ + \ :ref:`http://example.org/sam/ `\ + \ :ref:`sam@example.org `\ + + + :ref:`Watch out for nasty tricks ` + + + :ref:`\
Watch out for \ nasty tricks\\
` +
+
+
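Reviewer note on the annotated Atom 0.3 page above: its caution says that any element may be missing from a real-world feed and that missing elements are simply absent from the parsed results. A minimal sketch of defensive access under that assumption. Plain dicts stand in for parsed entries here (feedparser's result objects are dict subclasses, so `.get()` applies to them as well); the helper `describe_entry` and the sample data are made up for illustration.

```python
def describe_entry(entry, default="(missing)"):
    """Format an entry without assuming that any key is present."""
    title = entry.get("title", default)
    link = entry.get("link", default)
    return "%s <%s>" % (title, link)


# Hypothetical stand-ins for parsed entries, from complete to empty.
entries = [
    {"title": "First entry title", "link": "http://example.org/entry/3"},
    {"title": "Entry without a link"},
    {},
]

for entry in entries:
    print(describe_entry(entry))
```

Using `.get()` with a default avoids a KeyError when a feed omits an element that its specification marks as required.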
diff -Nru feedparser-5.0.1/docs/annotated-atom10.rst feedparser-5.1.2/docs/annotated-atom10.rst --- feedparser-5.0.1/docs/annotated-atom10.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/annotated-atom10.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,87 @@ +.. _annotated.atom10: + +Atom 1.0 +======== + +This is a sample Atom 1.0 feed, annotated with links that show how each value +can be accessed once the feed is parsed. + +.. caution:: + + Even though many of these elements are required according to the specification, + real-world feeds may be missing any element. If an element is not present in + the feed, it will not be present in the parsed results. You should not rely on + any particular element being present. + +.. rubric:: Annotated Atom 1.0 feed + +.. container:: pre + + `"?> + + + :ref:`Sample Feed <reference.feed.title>` + + + :ref:`For documentation <em>only</em> ` + + + + + :ref:`<p>Copyright 2005, Mark Pilgrim</p> ` + + + :ref:`Sample Toolkit ` + + \ :ref:`tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml `\ + \ :ref:`2005-11-09T11:56:34Z `\ + + \ :ref:`First entry title <reference.entry.title>`\ + + + + + \ :ref:`tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3 `\ + \ :ref:`2005-11-09T00:23:47Z `\ + \ :ref:`2005-11-09T11:56:34Z `\ + + \ :ref:`Mark Pilgrim `\ + \ :ref:`http://diveintomark.org/ `\ + \ :ref:`mark@example.org `\ + + + \ :ref:`Joe `\ + \ :ref:`http://example.org/joe/ `\ + \ :ref:`joe@example.org `\ + + + \ :ref:`Sam `\ + \ :ref:`http://example.org/sam/ `\ + \ :ref:`sam@example.org `\ + + + :ref:`Watch out for nasty tricks ` + + \ :ref:`\
Watch out for + \ + nasty tricks\\
` +
+
+
diff -Nru feedparser-5.0.1/docs/annotated-examples.rst feedparser-5.1.2/docs/annotated-examples.rst --- feedparser-5.0.1/docs/annotated-examples.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/annotated-examples.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,14 @@ +.. _annotated: + +Annotated Examples +################## + +.. toctree:: + :maxdepth: 2 + + annotated-atom10 + annotated-atom03 + annotated-rss20 + annotated-rss20-dc + annotated-rss10 + diff -Nru feedparser-5.0.1/docs/annotated-rss10.rst feedparser-5.1.2/docs/annotated-rss10.rst --- feedparser-5.0.1/docs/annotated-rss10.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/annotated-rss10.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,55 @@ +.. _annotated.rss10: + +:abbr:`RSS (Rich Site Summary)` 1.0 +=================================== + +This is a sample :abbr:`RSS (Rich Site Summary)` 1.0 feed, annotated with links that show how each value can be accessed once the feed is parsed. + +.. caution:: + + Even though many of these elements are required according to the specification, + real-world feeds may be missing any element. If an element is not present in + the feed, it will not be present in the parsed results. You should not rely on + any particular element being present. + +.. rubric:: Annotated :abbr:`RSS (Rich Site Summary)` 1.0 feed + +.. container:: pre + + `"?> + + + \ :ref:`Sample Feed <reference.feed.title>`\ + \ :ref:`http://www.example.org/ `\ + \ :ref:`For documentation only `\ + \ :ref:`en `\ + + \ :ref:`Mark Pilgrim ` (:ref:`mark@example.org `) + \ :ref:`2004-06-04T17:40:33-05:00 `\ + + + + + + + + + + \ :ref:`First of all <reference.entry.title>`\ + \ :ref:`http://example.org/archives/2002/09/04.html#first_of_all `\ + + :ref:`Americans are fat. Smokers are stupid. People who don't speak Perl are irrelevant. ` + + \ :ref:`Quotes `\ + \ :ref:`2004-05-30T14:23:54-06:00 `\ + Ian Hickson\: \\ + Americans are fat. Smokers are stupid. 
People who don't speak Perl are irrelevant. + \\]]> ` + + + diff -Nru feedparser-5.0.1/docs/annotated-rss20-dc.rst feedparser-5.1.2/docs/annotated-rss20-dc.rst --- feedparser-5.0.1/docs/annotated-rss20-dc.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/annotated-rss20-dc.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,54 @@ +.. _annotated.rss20dc: + +RSS 2.0 with Namespaces +======================= + +This is a sample :abbr:`RSS (Rich Site Summary)` 2.0 feed that uses several +allowable extension modules in namespaces. The feed is annotated with links +that show how each value can be accessed once the feed is parsed. + +.. caution:: + + Even though many of these elements are required according to the specification, + real-world feeds may be missing any element. If an element is not present in + the feed, it will not be present in the parsed results. You should not rely on + any particular element being present. + +.. rubric:: Annotated :abbr:`RSS (Rich Site Summary)` 2.0 feed with namespaces + +.. container:: pre + + + `"?> + + + \ :ref:`Sample Feed <reference.feed.title>`\ + \ :ref:`http://example.org/ `\ + \ :ref:`For documentation only `\ + \ :ref:`en-us `\ + \ :ref:`Mark Pilgrim ` (:ref:`mark@example.org `) + \ :ref:`Copyright 2004 Mark Pilgrim `\ + \ :ref:`2004-06-04T17:40:33-05:00 `\ + + + + \ :ref:`First of all <reference.entry.title>`\ + \ :ref:`http://example.org/archives/2002/09/04.html#first_of_all `\ + \ :ref:`1983@example.org `\ + + :ref:`Americans are fat. Smokers are stupid. People who don't speak Perl are irrelevant. 
` + + \ :ref:`Quotes `\ + \ :ref:`2002-09-04T13:54:20-05:00 `\ + Ian Hickson\: \\ + + + \ :ref:`Sample Feed <reference.feed.title>`\ + \ :ref:`For documentation <em>only</em> `\ + \ :ref:`http://example.org/ `\ + \ :ref:`en `\ + \ :ref:`Copyright 2004, Mark Pilgrim `\ + \ :ref:`editor@example.org `\ + \ :ref:`webmaster@example.org `\ + \ :ref:`Sat, 07 Sep 2002 0:00:01 GMT `\ + \ :ref:`Examples `\ + \ :ref:`Sample Toolkit `\ + \ :ref:`http://feedvalidator.org/docs/rss2.html `\ + + \ :ref:`60 `\ + + \ :ref:`http://example.org/banner.png `\ + \ :ref:`Example banner <reference.feed.image.title>`\ + \ :ref:`http://example.org/ `\ + \ :ref:`80 `\ + \ :ref:`15 `\ + + + \ :ref:`Search <reference.feed.textinput.title>`\ + \ :ref:`Search this site: `\ + \ :ref:`q `\ + \ :ref:`http://example.org/mt/mt-search.cgi `\ + + + \ :ref:`First item title <reference.entry.title>`\ + \ :ref:`http://example.org/item/1 `\ + \ :ref:`Watch out for + <span style="background: url(javascript:window.location='http://example.org/')"> + nasty tricks</span> ` + + \ :ref:`mark@example.org `\ + \ :ref:`Miscellaneous `\ + \ :ref:`http://example.org/comments/1 `\ + + \ :ref:`http://example.org/guid/1 `\ + \ :ref:`Thu, 05 Sep 2002 0:00:01 GMT `\ + + + diff -Nru feedparser-5.0.1/docs/atom-detail.rst feedparser-5.1.2/docs/atom-detail.rst --- feedparser-5.0.1/docs/atom-detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/atom-detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,49 @@ +Getting Detailed Information on Atom Elements +============================================= + +Several Atom elements share the Atom content model: title, subtitle, rights, +summary, and of course content. (Atom 0.3 also had an info element which +shared this content model.) :program:`Universal Feed Parser` captures all +relevant metadata about these elements, most importantly the content type and +the value itself. 
+ +Detailed Information on Feed Elements +------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.feed.title_detail + {'type': u'text/plain', + 'base': u'http://example.org/', + 'language': u'en', + 'value': u'Sample Feed'} + >>> d.feed.subtitle_detail + {'type': u'text/html', + 'base': u'http://example.org/', + 'language': u'en', + 'value': u'For documentation only'} + >>> d.feed.rights_detail + {'type': u'text/html', + 'base': u'http://example.org/', + 'language': u'en', + 'value': u'

Copyright 2004, Mark Pilgrim

'} + >>> d.entries[0].title_detail + {'type': u'text/plain', + 'base': u'http://example.org/', + 'language': u'en', + 'value': u'First entry title'} + >>> d.entries[0].summary_detail + {'type': u'text/plain', + 'base': u'http://example.org/', + 'language': u'en', + 'value': u'Watch out for nasty tricks'} + >>> len(d.entries[0].content) + 1 + >>> d.entries[0].content[0] + {'type': u'application/xhtml+xml', + 'base': u'http://example.org/entry/3', + 'language': u'en-US', + 'value': u'
Watch out for nasty tricks
'} + diff -Nru feedparser-5.0.1/docs/basic-existence.rst feedparser-5.1.2/docs/basic-existence.rst --- feedparser-5.0.1/docs/basic-existence.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/basic-existence.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,26 @@ +Testing for Existence +===================== + +Feeds in the real world may be missing elements, even elements that are +required by the specification. You should always test for the existence of an +element before getting its value. Never assume an element is present. + +Use standard :program:`Python` dictionary functions such as ``has_key`` to test +whether an element exists. + +Testing if elements are present +------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.feed.has_key('title') + True + >>> d.feed.has_key('ttl') + False + >>> d.feed.get('title', 'No title') + u'Sample Feed' + >>> d.feed.get('ttl', 60) + 60 + diff -Nru feedparser-5.0.1/docs/basic.rst feedparser-5.1.2/docs/basic.rst --- feedparser-5.0.1/docs/basic.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/basic.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,14 @@ +Basic Features +############## + +.. toctree:: + :maxdepth: 2 + + introduction + common-rss-elements + common-atom-elements + atom-detail + uncommon-rss + uncommon-atom + basic-existence + diff -Nru feedparser-5.0.1/docs/bozo.rst feedparser-5.1.2/docs/bozo.rst --- feedparser-5.0.1/docs/bozo.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/bozo.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,36 @@ +.. _advanced.bozo: + +Bozo Detection +============== + +:program:`Universal Feed Parser` can parse feeds whether they are well-formed +:abbr:`XML (Extensible Markup Language)` or not. 
However, since some +applications may wish to reject or warn users about non-well-formed feeds, +:program:`Universal Feed Parser` sets the ``bozo`` bit when it detects that a +feed is not well-formed. Thanks to `Tim Bray +`_ for +suggesting this terminology. + +Detecting a non-well-formed feed +-------------------------------- + +:: + + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.bozo + 0 + >>> d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml') + >>> d.bozo + 1 + >>> d.bozo_exception + + >>> exc = d.bozo_exception + >>> exc.getMessage() + "expected '>'\\n" + >>> exc.getLineNumber() + 6 + + +There are many reasons an :abbr:`XML (Extensible Markup Language)` document +could be non-well-formed besides this example (incomplete end tags). See +:ref:`advanced.encoding` for some other ways to trip the bozo bit. diff -Nru feedparser-5.0.1/docs/changes-26.rst feedparser-5.1.2/docs/changes-26.rst --- feedparser-5.0.1/docs/changes-26.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-26.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,36 @@ +Changes in version 2.6 +====================== + +:program:`Ultra-liberal Feed Parser` 2.6 was released on January 1, 2004. + +- dc:author support (MarekK) + +- fixed bug tracking nested divs within content (JohnD) + +- fixed missing :file:`sys` import (JohanS) + +- fixed regular expression to capture :abbr:`XML (Extensible Markup Language)` character encoding (Andrei) + +- added support for Atom 0.3-style links + +- fixed bug with textInput tracking + +- added support for cloud (MartijnP) + +- added support for multiple category/dc:subject (MartijnP) + +- normalize content model: ``description`` gets description (which can come from ````, ``
``, or full content if no ````), ``content`` gets dict of ``base``/``language``/``type``/``value`` (which can come from ````, ````, ````, or ````) + +- fixed bug matching arbitrary Userland namespaces + +- added xml:base and xml:lang tracking + +- fixed bug tracking unknown tags + +- fixed bug tracking content when ```` element is not in default namespace (like Pocketsoap feed) + +- resolve relative URLs in ````, ````, ````, ````, ````, ````, ```` + +- resolve relative :abbr:`URI (Uniform Resource Identifier)`s within embedded :abbr:`HTML (HyperText Markup Language)` markup in ````, ````, ````, ````, ````, ``<subtitle>``, ``<summary>``, ``<info>``, ``<tagline>``, and ``<copyright>`` + +- added support for pingback and trackback namespaces diff -Nru feedparser-5.0.1/docs/changes-27.rst feedparser-5.1.2/docs/changes-27.rst --- feedparser-5.0.1/docs/changes-27.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-27.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,70 @@ +Changes in version 2.7.x +======================== + +The 2.7 series was a brief but necessary transition towards some of the core ideas in version 3.0. + +:program:`Ultra-liberal Feed Parser` 2.7.6 was released on January 16, 2004. + +- fixed bug with :file:`StringIO` importing + + +:program:`Ultra-liberal Feed Parser` 2.7.5 was released on January 15, 2004. + +- added workaround for malformed DOCTYPE (seen on many ``blogspot.com`` sites) + +- added ``_debug`` variable + + +:program:`Ultra-liberal Feed Parser` 2.7.4 was released on January 14, 2004. + +- added workaround for improperly formed <br/> tags in encoded :abbr:`HTML (HyperText Markup Language)` (skadz) + +- fixed unicode handling in normalize_attrs (ChrisL) + +- fixed relative :abbr:`URI (Uniform Resource Identifier)` processing for guid (skadz) + +- added ICBM support + +- added :file:`base64` support + + +:program:`Ultra-liberal Feed Parser` 2.7.3 was released on January 14, 2004. 
+ +- reverted all changes made in 2.7.2 + + +:program:`Ultra-liberal Feed Parser` 2.7.2 was released on January 13, 2004. + +- "Version 2.7.2 of my feed parser, released today, will by default refuse to parse `this feed <http://intertwingly.net/stories/2004/01/12/broken.rss>`_. It does a first-pass check for wellformedness, and when that fails it sets the 'bozo' bit in the result to ``1`` and immediately terminates. You can revert to the previous behavior by passing ``disableWellFormedCheck=1``, but it will print arrogant warning messages to stderr to the effect that anyone who can't create a well-formed :abbr:`XML (Extensible Markup Language)` feed is a bozo and an incompetent fool." `source <http://intertwingly.net/blog/2004/01/12/Scientific-Method#c1074047818>`_ + + +:program:`Ultra-liberal Feed Parser` 2.7.1 was released on January 9, 2004. + +- fixed bug handling " and ' + +- fixed memory leak not closing url opener (JohnD) + +- added dc:publisher support (MarekK) + +- added admin:errorReportsTo support (MarekK) + +- :program:`Python` 2.1 ``dict`` support (MarekK) + + +:program:`Ultra-liberal Feed Parser` 2.7 was released on January 5, 2004. 
+ +- really added support for trackback and pingback namespaces, as opposed to 2.6 when I said I did but didn't really + +- sanitize :abbr:`HTML (HyperText Markup Language)` markup within some elements + +- added :file:`mxTidy` support (if installed) to tidy :abbr:`HTML (HyperText Markup Language)` markup within some elements + +- fixed indentation bug in ``_parse_date`` (FazalM) + +- use ``socket.setdefaulttimeout`` if available (FazalM) + +- universal date parsing and normalization (FazalM): ``created``, ``modified``, ``issued`` are parsed into 9-tuple date format and stored in ``created_parsed``, ``modified_parsed``, and ``issued_parsed`` + +- ``date`` is duplicated in ``modified`` and vice-versa + +- ``date_parsed`` is duplicated in ``modified_parsed`` and vice-versa diff -Nru feedparser-5.0.1/docs/changes-30.rst feedparser-5.1.2/docs/changes-30.rst --- feedparser-5.0.1/docs/changes-30.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-30.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,226 @@ +Changes in version 3.0 +====================== + + +:program:`Universal Feed Parser` 3.0 was released on June 21, 2004. + +- don't try ``iso-8859-1`` (can't distinguish between ``iso-8859-1`` and ``windows-1252`` anyway, and most incorrectly marked feeds are ``windows-1252``) + +- fixed regression that could cause the same encoding to be tried twice (even if it failed the first time) + + +:program:`Universal Feed Parser` 3.0fc3 was released on June 18, 2004. 
+ +- fixed bug in ``_changeEncodingDeclaration`` that failed to parse UTF-16 encoded feeds + +- made ``source`` into a FeedParserDict + +- duplicate admin:generatorAgent/@rdf:resource in ``generator_detail.url`` + +- added support for image + +- refactored ``parse()`` fallback logic to try other encodings if SAX parsing fails (previously it would only try other encodings if re-encoding failed) + +- remove ``unichr`` madness in normalize_attrs now that we're properly tracking encoding in and out of BaseHTMLProcessor + +- set ``feed.language`` from root-level xml:lang + +- set ``entry.id`` from rdf:about + +- send ``Accept`` header + + +:program:`Universal Feed Parser` 3.0fc2 was released on May 10, 2004. + +- added and passed Sam's amp tests + +- added and passed my blink tag tests + + +:program:`Universal Feed Parser` 3.0fc1 was released on April 23, 2004. + +- made ``results.entries[0].links[0]`` and ``results.entries[0].enclosures[0]`` into FeedParserDict + +- fixed typo that could cause the same encoding to be tried twice (even if it failed the first time) + +- fixed DOCTYPE stripping when DOCTYPE contained entity declarations + +- better textinput and image tracking in illformed :abbr:`RSS (Rich Site Summary)` 1.0 feeds + + +:program:`Universal Feed Parser` 3.0b23 was released on April 21, 2004. + +- fixed ``UnicodeDecodeError`` for feeds that contain high-bit characters in attributes in embedded :abbr:`HTML (HyperText Markup Language)` in description (thanks Thijs van de Vossen) + +- moved ``guid``, ``date``, and ``date_parsed`` to mapped keys in FeedParserDict + +- tweaked FeedParserDict.has_key to return ``True`` if asking about a mapped key + + +:program:`Universal Feed Parser` 3.0b22 was released on April 19, 2004. 
+ +- changed ``channel`` to ``feed``, ``item`` to ``entries`` in ``results`` dict + +- changed ``results`` dict to allow getting values with ``results.key`` as well as ``results[key]`` + +- work around embedded illformed :abbr:`HTML (HyperText Markup Language)` with half a DOCTYPE + +- work around malformed ``Content-Type`` header + +- if character encoding is wrong, try several common ones before falling back to regexes (if this works, ``bozo_exception`` is set to ``CharacterEncodingOverride``) + +- fixed character encoding issues in BaseHTMLProcessor by tracking encoding and converting from Unicode to raw strings before feeding data to sgmllib.SGMLParser + +- convert each value in results to Unicode (if possible), even if using regex-based parsing + + +:program:`Universal Feed Parser` 3.0b21 was released on April 14, 2004. + +- added Hot RSS support + + +:program:`Universal Feed Parser` 3.0b20 was released on April 7, 2004. + +- added :abbr:`CDF (Channel Definition Format)` support + + +:program:`Universal Feed Parser` 3.0b19 was released on March 15, 2004. + +- fixed bug exploding author information when author name was in parentheses + +- removed ultra-problematic :file:`mxTidy` support + +- patch to work around crash in PyXML/expat when encountering invalid entities (MarkMoraes) + +- support for textinput/textInput + + +:program:`Universal Feed Parser` 3.0b18 was released on February 17, 2004. + +- always map description to ``summary_detail`` (Andrei) + +- use :file:`libxml2` (if available) + + +:program:`Universal Feed Parser` 3.0b17 was released on February 13, 2004. + +- determine character encoding as per `RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_ + + +:program:`Universal Feed Parser` 3.0b16 was released on February 12, 2004. + +- fixed support for :abbr:`RSS (Rich Site Summary)` 0.90 (broken in b15) + + +:program:`Universal Feed Parser` 3.0b15 was released on February 11, 2004. 
+ +- fixed bug resolving relative links in wfw:commentRSS + +- fixed bug capturing author and contributor :abbr:`URI (Uniform Resource Identifier)` + +- fixed bug resolving relative links in author and contributor :abbr:`URI (Uniform Resource Identifier)` + +- fixed bug resolving relative links in generator :abbr:`URI (Uniform Resource Identifier)` + +- added support for recognizing :abbr:`RSS (Rich Site Summary)` 1.0 + +- passed Simon Fell's namespace tests, and included them permanently in the test suite with his permission + +- fixed namespace handling under :program:`Python` 2.1 + + +:program:`Universal Feed Parser` 3.0b14 was released on February 8, 2004. + +- fixed CDATA handling in non-wellformed feeds under :program:`Python` 2.1 + + +:program:`Universal Feed Parser` 3.0b13 was released on February 8, 2004. + +- better handling of empty :abbr:`HTML (HyperText Markup Language)` tags (br, hr, img, etc.) in embedded markup, in either :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML (Extensible HyperText Markup Language)` form (<br>, <br/>, <br />) + + +:program:`Universal Feed Parser` 3.0b12 was released on February 6, 2004. + +- fiddled with ``decodeEntities`` (still not right) + +- added support to Atom 0.2 subtitle + +- added support for Atom content model in copyright + +- better sanitizing of dangerous :abbr:`HTML (HyperText Markup Language)` elements with end tags (script, frameset) + + +:program:`Universal Feed Parser` 3.0b11 was released on February 2, 2004. + +- added rights to list of elements that can contain dangerous markup + +- fiddled with ``decodeEntities`` (not right) + +- liberalized date parsing even further + + +:program:`Universal Feed Parser` 3.0b10 was released on January 31, 2004. + +- incorporated ISO-8601 date parsing routines from :file:`xml.util.iso8601` + + +:program:`Universal Feed Parser` 3.0b9 was released on January 29, 2004. 
+ +- fixed check for presence of ``dict`` function + +- added support for summary + + +:program:`Universal Feed Parser` 3.0b8 was released on January 28, 2004. + +- added support for contributor + + +:program:`Universal Feed Parser` 3.0b7 was released on January 28, 2004. + +- support Atom-style author element in ``author_detail`` (dictionary of ``name``, ``url``, ``email``) + +- map ``author`` to ``author_detail`` if ``author`` contains name + email address + + +:program:`Universal Feed Parser` 3.0b6 was released on January 27, 2004. + +- added feed type and version detection, ``result['version']`` will be one of ``SUPPORTED_VERSIONS.keys()`` or empty string if unrecognized + +- added support for creativeCommons:license and cc:license + +- added support for full Atom content model in title, tagline, info, copyright, summary + +- fixed bug with gzip encoding (not always telling server we support it when we do) + + +:program:`Universal Feed Parser` 3.0b5 was released on January 26, 2004. + +- fixed bug parsing multiple links at feed level + + +:program:`Universal Feed Parser` 3.0b4 was released on January 26, 2004. + +- fixed xml:lang inheritance + +- fixed multiple bugs tracking xml:base :abbr:`URI (Uniform Resource Identifier)`, one for documents that don't define one explicitly and one for documents that define an outer and an inner xml:base that goes out of scope before the end of the document + + +:program:`Universal Feed Parser` 3.0b3 was released on January 23, 2004. + +- parse entire feed with real :abbr:`XML (Extensible Markup Language)` parser (if available) + +- added several new supported namespaces + +- fixed bug tracking naked markup in description + +- added support for enclosure + +- added support for source + +- re-added support for cloud which got dropped somehow + +- added support for expirationDate + + +:program:`Universal Feed Parser` 3.0b2 and 3.0b1 have been lost in the mists of time. 
diff -Nru feedparser-5.0.1/docs/changes-301.rst feedparser-5.1.2/docs/changes-301.rst --- feedparser-5.0.1/docs/changes-301.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-301.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,23 @@ +Changes in version 3.0.1 +======================== + + + + +:program:`Universal Feed Parser` 3.0.1 was released on June 21, 2004. + +- default to ``us-ascii`` for all text/* content types + +- recover from malformed ``content-type`` header parameter with no equals sign ("text/xml; charset:iso-8859-1") + +- docs: added :file:`reference-feed.html` and :file:`reference-entry.html` (bug #977723) + +- docs: fixed ``entry[i]`` in documentation (should be ``entries[i]``) (bug #977722) + +- docs: added note about Unicode string usage (bug #977716) + +- docs: added :file:`basic-existence.html` (bug #977704) + +- docs: fixed description of feed title (bug #977685) + +- docs: fixed typo in annotated :abbr:`RSS (Rich Site Summary)` 1.0 feed (bug #977682) \ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-31.rst feedparser-5.1.2/docs/changes-31.rst --- feedparser-5.0.1/docs/changes-31.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-31.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,25 @@ +Changes in version 3.1 +====================== + + + + +:program:`Universal Feed Parser` 3.1 was released on June 28, 2004. 
+ +- added and passed tests for converting :abbr:`HTML (HyperText Markup Language)` entities to Unicode equivalents in illformed feeds (aaronsw) + +- added and passed tests for converting character entities to Unicode equivalents in illformed feeds (aaronsw) + +- test for valid parsers when setting ``XML_AVAILABLE`` + +- make version and encoding available when server returns a ``304`` + +- add ``handlers`` parameter to pass arbitrary :file:`urllib2` handlers (like digest auth or proxy support) + +- add code to parse username/password out of url and send as basic authentication + +- expose downloading-related exceptions in ``bozo_exception`` (aaronsw) + +- added __contains__ method to FeedParserDict (aaronsw) + +- added ``publisher_detail`` (aaronsw) \ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-32.rst feedparser-5.1.2/docs/changes-32.rst --- feedparser-5.0.1/docs/changes-32.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-32.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,33 @@ +Changes in version 3.2 +====================== + + + + +:program:`Universal Feed Parser` 3.2 was released on July 3, 2004. 
+ +- use :file:`cjkcodecs` and :file:`iconv_codec` if available + +- always convert feed to UTF-8 before passing to :abbr:`XML (Extensible Markup Language)` parser + +- completely revamped logic for determining character encoding and attempting :abbr:`XML (Extensible Markup Language)` parsing (much faster) + +- increased default timeout to 20 seconds + +- test for presence of ``Location`` header on redirects + +- added tests for many alternate character encodings + +- support various :abbr:`EBCDIC` encodings + +- support UTF-16BE and UTF-16LE with or without a :abbr:`BOM (Byte Order Mark)` + +- support UTF-8 with a :abbr:`BOM (Byte Order Mark)` + +- support UTF-32BE and UTF-32LE with or without a :abbr:`BOM (Byte Order Mark)` + +- fixed crashing bug if no :abbr:`XML (Extensible Markup Language)` parsers are available + +- added support for ``Content-encoding: deflate`` + +- send blank ``Accept-encoding`` header if neither :file:`gzip` nor :file:`zlib` modules are available diff -Nru feedparser-5.0.1/docs/changes-33.rst feedparser-5.1.2/docs/changes-33.rst --- feedparser-5.0.1/docs/changes-33.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-33.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,35 @@ +Changes in version 3.3 +====================== + + + + +:program:`Universal Feed Parser` 3.3 was released on July 15, 2004. 
+ +- optimized :abbr:`EBCDIC` to :abbr:`ASCII` conversion + +- fixed obscure problem tracking xml:base and xml:lang if element declares it, child doesn't, first grandchild redeclares it, and second grandchild doesn't + +- refactored date parsing + +- defined public ``registerDateHandler`` so callers can add support for additional date formats at runtime + +- added support for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1) + +- added ``zopeCompatibilityHack()`` which turns FeedParserDict into a regular dictionary, required for :program:`Zope` compatibility, and also makes command-line debugging easier because pprint module formats real dictionaries better than dictionary-like objects + +- added NonXMLContentType exception, which is stored in ``bozo_exception`` when a feed is served with a non-:abbr:`XML (Extensible Markup Language)` media type such as ``'text/plain'`` + +- respect ``Content-Language`` as default language if no xml:lang is present + +- ``cloud`` dict is now FeedParserDict + +- generator dict is now FeedParserDict + +- better tracking of xml:lang, including support for xml:lang='' to unset the current language + +- recognize :abbr:`RSS (Rich Site Summary)` 1.0 feeds even when :abbr:`RSS (Rich Site Summary)` 1.0 namespace is not the default namespace + +- don't overwrite final status on redirects (scenarios: redirecting to a :abbr:`URI (Uniform Resource Identifier)` that returns ``304``, redirecting to a :abbr:`URI (Uniform Resource Identifier)` that redirects to another :abbr:`URI (Uniform Resource Identifier)` with a different type of redirect) + +- add support for ``HTTP 303`` redirects \ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-40.rst feedparser-5.1.2/docs/changes-40.rst --- feedparser-5.0.1/docs/changes-40.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-40.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,27 @@ +Changes in version 4.0 +====================== + + + + 
+:program:`Universal Feed Parser` 4.0 was released on December 23, 2005. + +- Support for :ref:`annotated.atom10`. + +- Support for :program:`iTunes` extensions. + +- Support for dc:contributor. + +- :program:`Universal Feed Parser` now captures the feed's :ref:`reference.namespaces`. See :ref:`advanced.namespaces` for details. + +- Lots of things have been renamed to match Atom 1.0 terminology. issued is now :ref:`reference.entry.published`, modified is now :ref:`reference.entry.updated`, and url is now href everywhere. You can still access these elements with the old names, so you shouldn't need to change any existing code, but don't be surprised if you can't find them during debugging. + +- category and categories have been replaced by tags, see :ref:`reference.feed.tags` and :ref:`reference.entry.tags`. The old names still work. + +- mode is gone from all detail and content dictionaries. It was never terribly useful, since :program:`Universal Feed Parser` unescapes content automatically. + +- :ref:`reference.entry.source` is now a dictionary of feed metadata as per section 4.2.11 of RFC 4287. :program:`Universal Feed Parser` no longer supports the :abbr:`RSS (Rich Site Summary)` 2.0's source element. + +- Content in unknown namespaces is no longer discarded (`bug 993305 <http://sourceforge.net/tracker/index.php?func=detail&aid=993305&group_id=112328&atid=661937>`_) + +- Lots of other bug fixes. \ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-401.rst feedparser-5.1.2/docs/changes-401.rst --- feedparser-5.0.1/docs/changes-401.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-401.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,9 @@ +Changes in version 4.0.1 +======================== + + + + +:program:`Universal Feed Parser` 4.0.1 was released on December 24, 2005. + +- bug fixes for :program:`Python` 2.1 compatibility. 
\ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-402.rst feedparser-5.1.2/docs/changes-402.rst --- feedparser-5.0.1/docs/changes-402.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-402.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,9 @@ +Changes in version 4.0.2 +======================== + + + + +:program:`Universal Feed Parser` 4.0.2 was released on December 24, 2005. + +- cleared ``_debug`` flag. \ No newline at end of file diff -Nru feedparser-5.0.1/docs/changes-41.rst feedparser-5.1.2/docs/changes-41.rst --- feedparser-5.0.1/docs/changes-41.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-41.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,8 @@ +Changes in version 4.1 +====================== + +:program:`Universal Feed Parser` 4.1 was released on January 11, 2006. + +- Support for the `Universal Encoding Detector <http://chardet.feedparser.org/>`_ to autodetect character encoding of feeds that declare their encoding incorrectly or don't declare it at all. See :ref:`advanced.encoding` for details of when this gets called. + +- :program:`Universal Feed Parser` no longer sets a default socket timeout (SourceForge bug `1392140 <http://sourceforge.net/tracker/index.php?func=detail&aid=1392140&group_id=112328&atid=661937>`_). If you were relying on this feature, you will need to call socket.setdefaulttimeout(TIMEOUT_IN_SECONDS) yourself. diff -Nru feedparser-5.0.1/docs/changes-42.rst feedparser-5.1.2/docs/changes-42.rst --- feedparser-5.0.1/docs/changes-42.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-42.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,22 @@ +Changes in version 4.2 +====================== + +:program:`Universal Feed Parser` 4.2 was released on 2008-03-12. 
+ +- Support for :ref:`parsing microformats <advanced.microformats>`, including :ref:`rel=enclosure <advanced.microformats.relenclosure>`, :ref:`rel=tag <advanced.microformats.reltag>`, :ref:`XFN <advanced.microformats.xfn>`, and :ref:`hCard <advanced.microformats.hcard>`. + +- Updated the whitelist of :ref:`acceptable HTML elements and attributes <advanced.sanitization.html>` based on the latest draft of the :abbr:`HTML (HyperText Markup Language)` 5 specification. + +- Support for :ref:`advanced.sanitization.css`. (Previous versions of :program:`Universal Feed Parser` simply stripped all inline styles.) Many thanks to Sam Ruby for implementing this, despite my insistence that it was impossible. + +- Support for :ref:`advanced.sanitization.svg`. + +- Support for :ref:`advanced.sanitization.mathml`. Many thanks to Jacques Distler for patiently debugging this feature. + +- :abbr:`IRI (Internationalized Resource Identifier)` support for every element that can contain a :abbr:`URI (Uniform Resource Identifier)`. + +- Ability to :ref:`disable relative URI resolution <advanced.base.disable>`. + +- Command-line arguments and alternate serializers, for manipulating :program:`Universal Feed Parser` from shell scripts or other non-Python sources. + +- More robust parsing of author email addresses, misencoded win-1252 content, rel=self links, and better detection of HTML content in elements with ambiguous content types. diff -Nru feedparser-5.0.1/docs/changes-early.rst feedparser-5.1.2/docs/changes-early.rst --- feedparser-5.0.1/docs/changes-early.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/changes-early.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,113 @@ +Changes in earlier versions +=========================== + + + + +:program:`Universal Feed Parser` began as an "ultra-liberal RSS parser" named :file:`rssparser.py`. It was written as a weapon for battles that no one remembers, to work around problems that no longer exist.
+ +:program:`Ultra-liberal Feed Parser` 2.5.3 was released on August 3, 2003. + +- track whether we're inside an image or textInput (TvdV) + +- return the character encoding, if specified + + +:program:`Ultra-liberal Feed Parser` 2.5.2 was released on July 28, 2003. + +- entity-decode inline :abbr:`XML (Extensible Markup Language)` properly + +- added support for inline <xhtml:body> and <xhtml:div> as used in some :abbr:`RSS (Rich Site Summary)` 2.0 feeds + + +:program:`Ultra-liberal Feed Parser` 2.5.1 was released on July 26, 2003. + +- clear ``opener.addheaders`` so we only send our custom ``User-Agent`` (otherwise :file:`urllib2` sends two, which confuses some servers) (RMK) + + +:program:`Ultra-liberal Feed Parser` 2.5 was released on July 25, 2003. + +- changed to :program:`Python` license (all contributors agree) + +- removed unnecessary :file:`urllib` code -- :file:`urllib2` should always be available anyway + +- return actual ``url``, ``status``, and full :abbr:`HTTP (Hypertext Transfer Protocol)` headers (as ``result['url']``, ``result['status']``, and ``result['headers']``) if parsing a remote feed over :abbr:`HTTP (Hypertext Transfer Protocol)`. This should pass all the `Aggregator client HTTP tests <http://diveintomark.org/tests/client/http/>`_. + +- added the latest namespace-of-the-week for :abbr:`RSS (Rich Site Summary)` 2.0 + + +:program:`Ultra-liberal Feed Parser` 2.4 was released on July 9, 2003. + +- added preliminary Pie/Atom/Echo support based on `Sam Ruby's snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>`_ + +- changed project name + + +:program:`Ultra-liberal RSS Parser` 2.3.1 was released on June 12, 2003. + +- if item has both link and guid, return both as-is + + +:program:`Ultra-liberal RSS Parser` 2.3 was released on June 11, 2003.
+ +- added ``USER_AGENT`` for default (if caller doesn't specify) + +- make sure we send the ``User-Agent`` even if :file:`urllib2` isn't available + +- Match any variation of ``backend.userland.com/rss`` namespace + + +:program:`Ultra-liberal RSS Parser` 2.2 was released on January 27, 2003. + +- added attribute support and admin:generatorAgent. start_admingeneratoragent is an example of how to handle elements with only attributes, no content. + + +:program:`Ultra-liberal RSS Parser` 2.1 was released on November 14, 2002. + +- added gzip support + + +:program:`Ultra-liberal RSS Parser` 2.0.2 was released on October 21, 2002. + +- added the ``inchannel`` to the ``if`` statement, otherwise it's useless. Fixes the problem JD was addressing by adding it. (JB) + + +:program:`Ultra-liberal RSS Parser` 2.0.1 was released on October 21, 2002. + +- changed ``parse()`` so that if we don't get anything because of ``etag``/``modified``, return the old ``etag``/``modified`` to the caller to indicate why nothing is being returned + + +:program:`Ultra-liberal RSS Parser` 2.0 was released on October 19, 2002. + +- use ``inchannel`` to watch out for image and textinput elements which can also contain title, link, and description elements (JD) + +- check for isPermaLink='false' attribute on guid elements (JD) + +- replaced ``openAnything`` with ``open_resource`` supporting ``ETag`` and ``If-Modified-Since`` request headers (JD) + +- ``parse`` now accepts ``etag``, ``modified``, ``agent``, and ``referrer`` optional arguments (JD) + +- modified ``parse`` to return a dictionary instead of a tuple so that any ``etag`` or ``modified`` information can be returned and cached by the caller + + +:program:`Ultra-liberal RSS Parser` 1.1 was released on September 27, 2002. + +- fixed infinite loop on incomplete CDATA sections + + +:program:`Ultra-liberal RSS Parser` 1.0 was released on September 27, 2002. 
+ +- fixed namespace processing on prefixed :abbr:`RSS (Rich Site Summary)` 2.0 elements + +- added Simon Fell's namespace test suite + + +:program:`Ultra-liberal RSS Parser` was first released on August 13, 2002. + +`Announcement <http://diveintomark.org/archives/2002/08/13/ultraliberal_rss_parser>`_: + + Aaron Swartz has been looking for an ultra-liberal :abbr:`RSS (Rich Site Summary)` parser. Now that I'm experimenting with a homegrown :abbr:`RSS (Rich Site Summary)`-to-email news aggregator, so am I. You see, most :abbr:`RSS (Rich Site Summary)` feeds suck. Invalid characters, unescaped ampersands (Blogger feeds), invalid entities (Radio feeds), unescaped and invalid HTML (The Register's feed most days). Or just a bastardized mix of :abbr:`RSS (Rich Site Summary)` 0.9x elements with :abbr:`RSS (Rich Site Summary)` 1.0 elements (Movable Type feeds). + + Then there are feeds, like Aaron's feed, which are too bleeding edge. He puts an excerpt in the description element but puts the full text in the content:encoded element (as CDATA). This is valid :abbr:`RSS (Rich Site Summary)` 1.0, but nobody actually uses it (except Aaron), few news aggregators support it, and many parsers choke on it. Other parsers are confused by the new elements (guid) in :abbr:`RSS (Rich Site Summary)` 0.94 (see Dave Winer's feed for an example). And then there's Jon Udell's feed, with the fullitem element that he just sort of made up. + + :file:`rssparser.py`. GPL-licensed. Tested on 5000 active feeds. diff -Nru feedparser-5.0.1/docs/character-encoding.rst feedparser-5.1.2/docs/character-encoding.rst --- feedparser-5.0.1/docs/character-encoding.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/character-encoding.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,134 @@ +.. _advanced.encoding: + +Character Encoding Detection +============================ + +.. tip:: + + Feeds may be published in any character encoding. 
:program:`Python` + supports only a few character encodings by default. To support the maximum + number of character encodings (and be able to parse the maximum number of + feeds), you should install :file:`cjkcodecs` and :file:`iconv_codec`. Both are + available at `http://cjkpython.i18n.org/ <http://cjkpython.i18n.org/>`_. + +`RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_ defines the interaction +between :abbr:`XML (Extensible Markup Language)` and :abbr:`HTTP (Hypertext Transfer Protocol)` +as it relates to character encoding. :abbr:`XML (Extensible Markup Language)` +and :abbr:`HTTP (Hypertext Transfer Protocol)` have different ways of +specifying character encoding and different defaults in case no encoding is +specified, and determining which value takes precedence depends on a variety of +factors. + + +Introduction to Character Encoding +---------------------------------- + +In :abbr:`XML (Extensible Markup Language)`, the character encoding is optional +and may be given in the :abbr:`XML (Extensible Markup Language)` declaration in +the first line of the document, like this: + +.. sourcecode:: xml + + <?xml version="1.0" encoding="utf-8"?> + +If no encoding is given, :abbr:`XML (Extensible Markup Language)` supports the +use of a Byte Order Mark to identify the document as some flavor of UTF-32, +UTF-16, or UTF-8. `Section F of the XML specification <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_ +outlines the process for determining the character encoding based on unique +properties of the Byte Order Mark in the first two to four bytes of the +document. + +If no encoding is specified and no Byte Order Mark is present, :abbr:`XML (Extensible Markup Language)` +defaults to UTF-8. 
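The Byte Order Mark check described above can be sketched in a few lines of :program:`Python`. This is only an illustration of the idea, not the code :program:`Universal Feed Parser` actually uses; the function name ``sniff_bom`` and the table ``_BOMS`` are hypothetical, and the sketch ignores the BOM-less byte patterns that Section F also covers.

```python
import codecs

# Hypothetical lookup table for this sketch. The 4-byte UTF-32 marks are
# listed first because the UTF-32-LE mark begins with the UTF-16-LE mark.
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_bom(data, default="utf-8"):
    """Guess an encoding from a leading Byte Order Mark in raw bytes.

    Falls back to ``default`` (UTF-8, the XML default) when no mark is found.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default
```

Checking the longer marks first matters: a document that starts with ``FF FE 00 00`` would otherwise be misread as UTF-16 little-endian.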
+ +:abbr:`HTTP (Hypertext Transfer Protocol)` uses :abbr:`MIME` to define a method +of specifying the character encoding, as part of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` +header, which looks like this: + +:: + + Content-Type: text/html; charset="utf-8" + + +If no charset is specified, :abbr:`HTTP (Hypertext Transfer Protocol)` defaults +to iso-8859-1, but only for text/* media types. For other media types, the +default encoding is undefined, which is where :abbr:`RFC (Request For Comments)` 3023 comes in. + +According to :abbr:`RFC (Request For Comments)` 3023, if the media type given +in the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header is +application/xml, application/xml-dtd, application/xml-external-parsed-entity, +or any one of the subtypes of application/xml such as application/atom+xml or +application/rss+xml or even application/rdf+xml, then the encoding is + + +#. the encoding given in the ``charset`` parameter of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header, or + +#. the encoding given in the encoding attribute of the :abbr:`XML (Extensible Markup Language)` declaration within the document, or + +#. utf-8. + + +On the other hand, if the media type given in the Content-Type +:abbr:`HTTP (Hypertext Transfer Protocol)` header is text/xml, +text/xml-external-parsed-entity, or a subtype like text/AnythingAtAll+xml, then +the encoding attribute of the :abbr:`XML (Extensible Markup Language)` +declaration within the document is ignored completely, and the encoding is + + +#. the encoding given in the charset parameter of the Content-Type :abbr:`HTTP (Hypertext Transfer Protocol)` header, or + +#. us-ascii. + + +Handling Incorrectly-Declared Encodings +--------------------------------------- + +:program:`Universal Feed Parser` initially uses the rules specified in +:abbr:`RFC (Request For Comments)` 3023 to determine the character encoding of +the feed. If parsing succeeds, then that's that. 
If parsing fails, +:program:`Universal Feed Parser` sets the ``bozo`` bit to ``1`` and sets +``bozo_exception`` to ``feedparser.CharacterEncodingOverride``. Then it tries +to reparse the feed with the following character encodings: + + +#. the encoding specified in the :abbr:`XML (Extensible Markup Language)` declaration + +#. the encoding sniffed from the first four bytes of the document (as per `Section F <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_) + +#. the encoding auto-detected by the `Universal Encoding Detector <http://chardet.feedparser.org/>`_, if installed + +#. utf-8 + +#. windows-1252 + + +If the character encoding can not be determined, :program:`Universal Feed Parser` +sets the ``bozo`` bit to ``1`` and sets ``bozo_exception`` to +``feedparser.CharacterEncodingUnknown``. In this case, parsed values will be +strings, not Unicode strings. + + +Handling Incorrectly-Declared Media Types +----------------------------------------- + +:abbr:`RFC (Request For Comments)` 3023 only applies when the feed is served +over :abbr:`HTTP (Hypertext Transfer Protocol)` with a Content-Type that +declares the feed to be some kind of :abbr:`XML (Extensible Markup Language)`. +However, some web servers are severely misconfigured and serve feeds with a +Content-Type of text/plain, application/octet-stream, or some completely bogus +media type. + +:program:`Universal Feed Parser` will attempt to parse such feeds, but it will +set the ``bozo`` bit to ``1`` and set ``bozo_exception`` to +``feedparser.NonXMLContentType``. + + +.. 
seealso:: + + * `RFC 3023 <http://www.ietf.org/rfc/rfc3023.txt>`_ + + * `Section F of the XML specification <http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info>`_ + + * `On the well-formedness of XML documents served as text/plain <http://www.imc.org/atom-syntax/mail-archive/msg05575.html>`_ + + * `CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>`_ diff -Nru feedparser-5.0.1/docs/common-atom-elements.rst feedparser-5.1.2/docs/common-atom-elements.rst --- feedparser-5.0.1/docs/common-atom-elements.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/common-atom-elements.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,130 @@ +Common Atom Elements +==================== + +Atom feeds generally contain more information than :abbr:`RSS (Rich Site Summary)` +feeds (because more elements are required), but the most commonly used elements +are still title, link, subtitle/description, various dates, and ID. + +This sample Atom feed is at `http://feedparser.org/docs/examples/atom10.xml +<http://feedparser.org/docs/examples/atom10.xml>`_. + +.. sourcecode:: xml + + <?xml version="1.0" encoding="utf-8"?> + <feed xmlns="http://www.w3.org/2005/Atom" + xml:base="http://example.org/" + xml:lang="en"> + <title type="text">Sample Feed + + For documentation <em>only</em> + + + + + <p>Copyright 2005, Mark Pilgrim</p>< + + tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml + + Sample Toolkit + + 2005-11-09T11:56:34Z + + First entry title + + + + + tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3 + 2005-11-09T00:23:47Z + 2005-11-09T11:56:34Z + Watch out for nasty tricks + +
Watch out for + + nasty tricks
+
+
+ + +The feed elements are available in ``d.feed``. + +Accessing Common Feed Elements +------------------------------ + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.feed.title + u'Sample Feed' + >>> d.feed.link + u'http://example.org/' + >>> d.feed.subtitle + u'For documentation only' + >>> d.feed.updated + u'2005-11-09T11:56:34Z' + >>> d.feed.updated_parsed + (2005, 11, 9, 11, 56, 34, 2, 313, 0) + >>> d.feed.id + u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml' + +Entries are available in ``d.entries``, which is a list. You access entries in +the order in which they appear in the original feed, so the first entry is +``d.entries[0]``. + +Accessing Common Entry Elements +------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.entries[0].title + u'First entry title' + >>> d.entries[0].link + u'http://example.org/entry/3' + >>> d.entries[0].id + u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3' + >>> d.entries[0].published + u'2005-11-09T00:23:47Z' + >>> d.entries[0].published_parsed + (2005, 11, 9, 0, 23, 47, 2, 313, 0) + >>> d.entries[0].updated + u'2005-11-09T11:56:34Z' + >>> d.entries[0].updated_parsed + (2005, 11, 9, 11, 56, 34, 2, 313, 0) + >>> d.entries[0].summary + u'Watch out for nasty tricks' + >>> d.entries[0].content + [{'type': u'application/xhtml+xml', + 'base': u'http://example.org/entry/3', + 'language': u'en-US', + 'value': u'
Watch out for nasty tricks
'}] + +.. note:: + + The parsed summary and content are not the same as they appear in the + original feed. The original elements contained dangerous :abbr:`HTML + (HyperText Markup Language)` markup which was sanitized. See + :ref:`advanced.sanitization` for details. + +Because Atom entries can have more than one content element, +``d.entries[0].content`` is a list of dictionaries. Each dictionary contains +metadata about a single content element. The two most important values in the +dictionary are the content type, in ``d.entries[0].content[0].type``, and the +actual content value, in ``d.entries[0].content[0].value``. + +You can get this level of detail on other Atom elements too. diff -Nru feedparser-5.0.1/docs/common-rss-elements.rst feedparser-5.1.2/docs/common-rss-elements.rst --- feedparser-5.0.1/docs/common-rss-elements.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/common-rss-elements.rst 2012-05-03 13:36:49.000000000 +0000 @@ -0,0 +1,81 @@ +Common :abbr:`RSS (Rich Site Summary)` Elements +=============================================== + +The most commonly used elements in :abbr:`RSS (Rich Site Summary)` feeds +(regardless of version) are title, link, description, publication date, and entry +ID. The publication date comes from the pubDate element, and the entry ID comes +from the guid element. + +This sample :abbr:`RSS (Rich Site Summary)` feed is at +`http://feedparser.org/docs/examples/rss20.xml +<http://feedparser.org/docs/examples/rss20.xml>`_. + +.. sourcecode:: xml + + + + + Sample Feed + For documentation <em>only</em> + http://example.org/ + Sat, 07 Sep 2002 00:00:01 GMT + + + First entry title + http://example.org/entry/3 + Watch out for <span style="background-image: + url(javascript:window.location='http://example.org/')">nasty + tricks</span> + Thu, 05 Sep 2002 00:00:01 GMT + http://example.org/entry/3 + + + + + + +The channel elements are available in ``d.feed``.
+ +Accessing Common Channel Elements +--------------------------------- +:: + + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml') + >>> d.feed.title + u'Sample Feed' + >>> d.feed.link + u'http://example.org/' + >>> d.feed.description + u'For documentation only' + >>> d.feed.published + u'Sat, 07 Sep 2002 00:00:01 GMT' + >>> d.feed.published_parsed + (2002, 9, 7, 0, 0, 1, 5, 250, 0) + + +The items are available in ``d.entries``, which is a list. You access items in the list in the same order in which they appear in the original feed, so the first item is available in ``d.entries[0]``. + +Accessing Common Item Elements +------------------------------ +:: + + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml') + >>> d.entries[0].title + u'First item title' + >>> d.entries[0].link + u'http://example.org/item/1' + >>> d.entries[0].description + u'Watch out for nasty tricks' + >>> d.entries[0].published + u'Thu, 05 Sep 2002 00:00:01 GMT' + >>> d.entries[0].published_parsed + (2002, 9, 5, 0, 0, 1, 3, 248, 0) + >>> d.entries[0].id + u'http://example.org/guid/1' + + +.. tip:: You can also access data from :abbr:`RSS (Rich Site Summary)` feeds using Atom terminology. See :ref:`advanced.normalization` for details. 
diff -Nru feedparser-5.0.1/docs/conf.py feedparser-5.1.2/docs/conf.py --- feedparser-5.0.1/docs/conf.py 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/conf.py 2012-05-03 13:44:10.000000000 +0000 @@ -0,0 +1,19 @@ +# project information +project = u'feedparser' +copyright = u'2004-8, Mark Pilgrim' +version = u'5.1.2' +release = u'5.1.2' +language = u'en' + +# documentation options +master_doc = 'index' +exclude_patterns = ['_build'] + +# use a custom extension to make Sphinx add a <link> to feedparser.css +import sys, os.path +sys.path.append(os.path.dirname(os.path.abspath(__file__))) +extensions = ['add_custom_css'] + +# customize the html +# files in html_static_path will be copied into _static/ when compiled +html_static_path = ['_static'] diff -Nru feedparser-5.0.1/docs/content-normalization.rst feedparser-5.1.2/docs/content-normalization.rst --- feedparser-5.0.1/docs/content-normalization.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/content-normalization.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,74 @@ +.. _advanced.normalization: + +Content Normalization +===================== + +:program:`Universal Feed Parser` can parse many different types of feeds: Atom, +:abbr:`CDF (Channel Definition Format)`, and nine different versions of +:abbr:`RSS (Rich Site Summary)`. You should not be forced to learn the +differences between these formats. :program:`Universal Feed Parser` does its +best to ensure that you can treat all feeds the same way, regardless of format +or version. + +You can access the basic elements of an Atom feed using :abbr:`RSS (Rich Site Summary)` terminology.
+ +Accessing an Atom feed as an :abbr:`RSS (Rich Site Summary)` feed +----------------------------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d['channel']['title'] + u'Sample Feed' + >>> d['channel']['link'] + u'http://example.org/' + >>> d['channel']['description'] + u'For documentation only' + >>> len(d['items']) + 1 + >>> e = d['items'][0] + >>> e['title'] + u'First entry title' + >>> e['link'] + u'http://example.org/entry/3' + >>> e['description'] + u'Watch out for nasty tricks' + >>> e['author'] + u'Mark Pilgrim (mark@example.org)' + + +The same thing works in reverse: you can access :abbr:`RSS (Rich Site Summary)` feeds as if they were Atom feeds. + +Accessing an :abbr:`RSS (Rich Site Summary)` feed as an Atom feed +----------------------------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml') + >>> d.feed.subtitle_detail + {'type': 'text/html', + 'base': 'http://feedparser.org/docs/examples/rss20.xml', + 'language': None, + 'value': u'For documentation only'} + >>> len(d.entries) + 1 + >>> e = d.entries[0] + >>> e.links + [{'rel': 'alternate', + 'type': 'text/html', + 'href': u'http://example.org/item/1'}] + >>> e.summary_detail + {'type': 'text/html', + 'base': 'http://feedparser.org/docs/examples/rss20.xml', + 'language': u'en', + 'value': u'Watch out for nasty tricks'} + >>> e.updated_parsed + (2002, 9, 5, 0, 0, 1, 3, 248, 0) + + +.. note:: + + For more examples of how :program:`Universal Feed Parser` normalizes + content from different formats, see :ref:`annotated`. diff -Nru feedparser-5.0.1/docs/date-parsing.rst feedparser-5.1.2/docs/date-parsing.rst --- feedparser-5.0.1/docs/date-parsing.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/date-parsing.rst 2012-02-19 23:50:53.000000000 +0000 @@ -0,0 +1,177 @@ +..
_advanced.date: + +Date Parsing +============ + +Different feed types and versions use wildly different date formats. +:program:`Universal Feed Parser` will attempt to auto-detect the date format +used in any date element, and parse it into a standard :program:`Python` +9-tuple, as documented in `the Python time module `_. + +The following elements are parsed as dates: + +- :ref:`reference.feed.updated` is parsed into :ref:`reference.feed.updated_parsed`. + +- :ref:`reference.entry.published` is parsed into :ref:`reference.entry.published_parsed`. + +- :ref:`reference.entry.updated` is parsed into :ref:`reference.entry.updated_parsed`. + +- :ref:`reference.entry.created` is parsed into :ref:`reference.entry.created_parsed`. + +- :ref:`reference.entry.expired` is parsed into :ref:`reference.entry.expired_parsed`. + + +History of Date Formats +----------------------- + + +Here is a brief history of feed date formats: + +- :abbr:`CDF (Channel Definition Format)` states that all date values must + conform to ISO 8601:1988. ISO 8601:1988 is not a freely + available specification, but a brief (non-normative) description of the date + formats it describes is available here: `ISO 8601:1988 Date/Time Representations `_. + +- :abbr:`RSS (Rich Site Summary)` 0.90 has no date elements. + +- Netscape :abbr:`RSS (Rich Site Summary)` 0.91 does not specify a date format, + but examples within the specification show :abbr:`RFC (Request For Comments)` + 822-style dates with 4-digit years. + +- Userland :abbr:`RSS (Rich Site Summary)` 0.91 states, "All date-times in + :abbr:`RSS (Rich Site Summary)` conform to the Date and Time Specification of + :abbr:`RFC (Request For Comments)` 822." `RFC 822 `_ + mandates 2-digit years; it does not allow 4-digit years. + +- :abbr:`RSS (Rich Site Summary)` 1.0 states that all date elements must + conform to `W3CDTF `_, + which is a profile of ISO 8601:1988. 
+ +- :abbr:`RSS (Rich Site Summary)` 2.0 states, "All date-times in :abbr:`RSS (Rich Site Summary)` conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred)." + +- Atom 0.3 states that all date elements must conform to + `W3CDTF `_. + +- Atom 1.0 states that all date elements "MUST conform to the date-time + production in `RFC 3339 `_. + In addition, an uppercase T character MUST be used to separate date and time, + and an uppercase Z character MUST be present in the absence of a numeric time + zone offset." + + +Recognized Date Formats +----------------------- + +Here is a representative list of the formats that :program:`Universal Feed +Parser` can recognize in any date element: + + +Recognized Date Formats + + +============================================ ================================= ===================================== +Description Example Parsed Value +============================================ ================================= ===================================== +valid RFC 822 (2-digit year) Thu, 01 Jan 04 19:48:21 GMT (2004, 1, 1, 19, 48, 21, 3, 1, 0) +valid RFC 822 (4-digit year) Thu, 01 Jan 2004 19:48:21 GMT (2004, 1, 1, 19, 48, 21, 3, 1, 0) +invalid RFC 822 (no time) 01 Jan 2004 (2004, 1, 1, 0, 0, 0, 3, 1, 0) +invalid RFC 822 (no seconds) 01 Jan 2004 00:00 GMT (2004, 1, 1, 0, 0, 0, 3, 1, 0) +valid W3CDTF (numeric timezone) 2003-12-31T10:14:55-08:00 (2003, 12, 31, 18, 14, 55, 2, 365, 0) +valid W3CDTF (UTC timezone) 2003-12-31T10:14:55Z (2003, 12, 31, 10, 14, 55, 2, 365, 0) +valid W3CDTF (yyyy) 2003 (2003, 1, 1, 0, 0, 0, 2, 1, 0) +valid W3CDTF (yyyy-mm) 2003-12 (2003, 12, 1, 0, 0, 0, 0, 335, 0) +valid W3CDTF (yyyy-mm-dd) 2003-12-31 (2003, 12, 31, 0, 0, 0, 2, 365, 0) +valid ISO 8601 (yyyymmdd) 20031231 (2003, 12, 31, 0, 0, 0, 2, 365, 0) +valid ISO 8601 (-yy-mm) -03-12 (2003, 12, 1, 0, 0, 0, 0, 335, 0) +valid ISO 8601 (-yymm) -0312 (2003, 12, 1, 
0, 0, 0, 0, 335, 0) +valid ISO 8601 (-yy-mm-dd) -03-12-31 (2003, 12, 31, 0, 0, 0, 2, 365, 0) +valid ISO 8601 (yymmdd) 031231 (2003, 12, 31, 0, 0, 0, 2, 365, 0) +valid ISO 8601 (yyyy-o) 2003-335 (2003, 12, 1, 0, 0, 0, 0, 335, 0) +valid ISO 8601 (yyo) 03335 (2003, 12, 1, 0, 0, 0, 0, 335, 0) +valid asctime Sun Jan 4 16:29:06 PST 2004 (2004, 1, 5, 0, 29, 6, 0, 5, 0) +bogus RFC 822 (invalid day/month) Thu, 31 Jun 2004 19:48:21 GMT (2004, 7, 1, 19, 48, 21, 3, 183, 0) +bogus RFC 822 (invalid month) Mon, 26 January 2004 16:31:00 EST (2004, 1, 26, 21, 31, 0, 0, 26, 0) +bogus RFC 822 (invalid timezone) Mon, 26 Jan 2004 16:31:00 ET (2004, 1, 26, 21, 31, 0, 0, 26, 0) +bogus W3CDTF (invalid hour) 2003-12-31T25:14:55Z (2004, 1, 1, 1, 14, 55, 3, 1, 0) +bogus W3CDTF (invalid minute) 2003-12-31T10:61:55Z (2003, 12, 31, 11, 1, 55, 2, 365, 0) +bogus W3CDTF (invalid second) 2003-12-31T10:14:61Z (2003, 12, 31, 10, 15, 1, 2, 365, 0) +bogus (MSSQL) 2004-07-08 23:56:58.0 (2004, 7, 8, 14, 56, 58, 3, 190, 0) +bogus (MSSQL-ish, without fractional second) 2004-07-08 23:56:58 (2004, 7, 8, 14, 56, 58, 3, 190, 0) +bogus (Korean) 2004-05-25 오후 11:23:17 (2004, 5, 25, 14, 23, 17, 1, 146, 0) +bogus (Greek) Κυρ, 11 Ιούλ 2004 12:00:00 EST (2004, 7, 11, 17, 0, 0, 6, 193, 0) +bogus (Hungarian) július-13T9:15-05:00 (2004, 7, 13, 14, 15, 0, 1, 195, 0) +============================================ ================================= ===================================== + + +:program:`Universal Feed Parser` recognizes all character-based timezone +abbreviations defined in :abbr:`RFC (Request For Comments)` 822.
In addition, +:program:`Universal Feed Parser` recognizes the following invalid timezones: + + +- ``AT`` is treated as ``AST`` + +- ``ET`` is treated as ``EST`` + +- ``CT`` is treated as ``CST`` + +- ``MT`` is treated as ``MST`` + +- ``PT`` is treated as ``PST`` + + + +Supporting Additional Date Formats +---------------------------------- + +:program:`Universal Feed Parser` supports many different date formats, but +there are probably many more in the wild that are still unsupported. If you +find other date formats, you can support them by registering them with +``registerDateHandler``. It takes a single argument, a callback function. The +callback function should take a single argument, a string, and return a single +value, a 9-tuple :program:`Python` date in UTC. + + +Registering a third-party date handler +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + import feedparser + import re + + _my_date_pattern = re.compile( + r'(\d{,2})/(\d{,2})/(\d{4}) (\d{,2}):(\d{2}):(\d{2})') + + def myDateHandler(aDateString): + """parse a UTC date in MM/DD/YYYY HH:MM:SS format""" + month, day, year, hour, minute, second = \ + _my_date_pattern.search(aDateString).groups() + return (int(year), int(month), int(day), \ + int(hour), int(minute), int(second), 0, 0, 0) + + feedparser.registerDateHandler(myDateHandler) + d = feedparser.parse(...) + + + +Your newly-registered date handler will be tried before all the other date +handlers built into :program:`Universal Feed Parser`. (More specifically, all +date handlers are tried in "last in, first out" order; i.e. the last handler to +be registered is the first one tried, and so on in reverse order of +registration.) + + +If your date handler returns ``None``, or anything other than a +:program:`Python` 9-tuple date, or raises an exception of any kind, the error +will be silently ignored and the other registered date handlers will be tried +in order. 
If no date handlers succeed, then the date is not parsed, and the +\*_parsed value will not be present in the results dictionary. The original +date string will still be available in the appropriate element in the results +dictionary. + + +.. tip:: + + If you write a new date handler, you are encouraged (but not required) to + `submit a patch `_ so it can be + integrated into the next version of :program:`Universal Feed Parser`. diff -Nru feedparser-5.0.1/docs/history.rst feedparser-5.1.2/docs/history.rst --- feedparser-5.0.1/docs/history.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/history.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,19 @@ +Revision history +################ + +.. toctree:: + :maxdepth: 2 + + changes-42 + changes-41 + changes-402 + changes-401 + changes-40 + changes-33 + changes-32 + changes-31 + changes-301 + changes-30 + changes-27 + changes-26 + changes-early diff -Nru feedparser-5.0.1/docs/html-sanitization.rst feedparser-5.1.2/docs/html-sanitization.rst --- feedparser-5.0.1/docs/html-sanitization.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/html-sanitization.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,744 @@ +.. _advanced.sanitization: + +Sanitization +============ + +Most feeds embed :abbr:`HTML (HyperText Markup Language)` markup within feed +elements. Some feeds even embed other types of markup, such as :abbr:`SVG +(Scalable Vector Graphics)` or :abbr:`MathML (Mathematical Markup Language)`. +Since many feed aggregators use a web browser (or browser component) to display +content, :program:`Universal Feed Parser` sanitizes embedded markup to remove +things that could pose security risks. + +These elements are sanitized by default: + +* :ref:`reference.entry.content` +* :ref:`reference.entry.summary` +* :ref:`reference.entry.title` +* :ref:`reference.feed.info` +* :ref:`reference.feed.rights` +* :ref:`reference.feed.subtitle` +* :ref:`reference.feed.title` + + +.. 
note:: + + If the content is declared to be (or is determined to be) + :mimetype:`text/plain`, it will not be sanitized. This is to avoid data loss. + It is recommended that you check the content type in e.g. + :py:attr:`entries[i].summary_detail.type`. If it is :mimetype:`text/plain` then + it has not been sanitized (and you should perform HTML escaping before + rendering the content). + + +.. _advanced.sanitization.html: + +:abbr:`HTML (HyperText Markup Language)` Sanitization +----------------------------------------------------- + +The following :abbr:`HTML (HyperText Markup Language)` elements are allowed by +default (all others are stripped): + +.. hlist:: + :columns: 3 + + * a + * abbr + * acronym + * address + * area + * article + * aside + * audio + * b + * big + * blockquote + * br + * button + * canvas + * caption + * center + * cite + * code + * col + * colgroup + * command + * datagrid + * datalist + * dd + * del + * details + * dfn + * dialog + * dir + * div + * dl + * dt + * em + * event-source + * fieldset + * figure + * font + * footer + * form + * h1 + * h2 + * h3 + * h4 + * h5 + * h6 + * header + * hr + * i + * img + * input + * ins + * kbd + * keygen + * label + * legend + * li + * m + * map + * menu + * meter + * multicol + * nav + * nextid + * noscript + * ol + * optgroup + * option + * output + * p + * pre + * progress + * q + * s + * samp + * section + * select + * small + * sound + * source + * spacer + * span + * strike + * strong + * sub + * sup + * table + * tbody + * td + * textarea + * tfoot + * th + * thead + * time + * tr + * tt + * u + * ul + * var + * video + + +The following :abbr:`HTML (HyperText Markup Language)` attributes are allowed +by default (all others are stripped): + +.. 
hlist:: + :columns: 3 + + * abbr + * accept + * accept-charset + * accesskey + * action + * align + * alt + * autocomplete + * autofocus + * autoplay + * axis + * background + * balance + * bgcolor + * bgproperties + * border + * bordercolor + * bordercolordark + * bordercolorlight + * bottompadding + * cellpadding + * cellspacing + * ch + * challenge + * char + * charoff + * charset + * checked + * choff + * cite + * class + * clear + * color + * cols + * colspan + * compact + * contenteditable + * coords + * data + * datafld + * datapagesize + * datasrc + * datetime + * default + * delay + * dir + * disabled + * draggable + * dynsrc + * enctype + * end + * face + * for + * form + * frame + * galleryimg + * gutter + * headers + * height + * hidden + * hidefocus + * high + * href + * hreflang + * hspace + * icon + * id + * inputmode + * ismap + * keytype + * label + * lang + * leftspacing + * list + * longdesc + * loop + * loopcount + * loopend + * loopstart + * low + * lowsrc + * max + * maxlength + * media + * method + * min + * multiple + * name + * nohref + * noshade + * nowrap + * open + * optimum + * pattern + * ping + * point-size + * pqg + * prompt + * radiogroup + * readonly + * rel + * repeat-max + * repeat-min + * replace + * required + * rev + * rightspacing + * rows + * rowspan + * rules + * scope + * selected + * shape + * size + * span + * src + * start + * step + * summary + * suppress + * tabindex + * target + * template + * title + * toppadding + * type + * unselectable + * urn + * usemap + * valign + * value + * variable + * volume + * vrml + * vspace + * width + * wrap + * xml:lang + + +.. _advanced.sanitization.svg: + +:abbr:`SVG (Scalable Vector Graphics)` Sanitization +--------------------------------------------------- + +The following SVG elements are allowed by default (all others are stripped): + +.. 
hlist:: + :columns: 3 + + * a + * animate + * animateColor + * animateMotion + * animateTransform + * circle + * defs + * desc + * ellipse + * font-face + * font-face-name + * font-face-src + * foreignObject + * g + * glyph + * hkern + * line + * linearGradient + * marker + * metadata + * missing-glyph + * mpath + * path + * polygon + * polyline + * radialGradient + * rect + * set + * stop + * svg + * switch + * text + * title + * tspan + * use + + +The following :abbr:`SVG (Scalable Vector Graphics)` attributes are allowed by +default (all others are stripped): + +.. hlist:: + :columns: 3 + + * accent-height + * accumulate + * additive + * alphabetic + * arabic-form + * ascent + * attributeName + * attributeType + * baseProfile + * bbox + * begin + * by + * calcMode + * cap-height + * class + * color + * color-rendering + * content + * cx + * cy + * d + * descent + * display + * dur + * dx + * dy + * end + * fill + * fill-opacity + * fill-rule + * font-family + * font-size + * font-stretch + * font-style + * font-variant + * font-weight + * from + * fx + * fy + * g1 + * g2 + * glyph-name + * gradientUnits + * hanging + * height + * horiz-adv-x + * horiz-origin-x + * id + * ideographic + * k + * keyPoints + * keySplines + * keyTimes + * lang + * marker-end + * marker-mid + * marker-start + * markerHeight + * markerUnits + * markerWidth + * mathematical + * max + * min + * name + * offset + * opacity + * orient + * origin + * overline-position + * overline-thickness + * panose-1 + * path + * pathLength + * points + * preserveAspectRatio + * r + * refX + * refY + * repeatCount + * repeatDur + * requiredExtensions + * requiredFeatures + * restart + * rotate + * rx + * ry + * slope + * stemh + * stemv + * stop-color + * stop-opacity + * strikethrough-position + * strikethrough-thickness + * stroke + * stroke-dasharray + * stroke-dashoffset + * stroke-linecap + * stroke-linejoin + * stroke-miterlimit + * stroke-opacity + * stroke-width + * systemLanguage + * target + * 
text-anchor + * to + * transform + * type + * u1 + * u2 + * underline-position + * underline-thickness + * unicode + * unicode-range + * units-per-em + * values + * version + * viewBox + * visibility + * width + * widths + * x + * x-height + * x1 + * x2 + * xlink:actuate + * xlink:arcrole + * xlink:href + * xlink:role + * xlink:show + * xlink:title + * xlink:type + * xml:base + * xml:lang + * xml:space + * xmlns + * xmlns:xlink + * y + * y1 + * y2 + * zoomAndPan + + +.. _advanced.sanitization.mathml: + +:abbr:`MathML (Mathematical Markup Language)` Sanitization +---------------------------------------------------------- + +The following :abbr:`MathML (Mathematical Markup Language)` elements are +allowed by default (all others are stripped): + +.. hlist:: + :columns: 3 + + * annotation + * annotation-xml + * maction + * math + * merror + * mfenced + * mfrac + * mi + * mmultiscripts + * mn + * mo + * mover + * mpadded + * mphantom + * mprescripts + * mroot + * mrow + * mspace + * msqrt + * mstyle + * msub + * msubsup + * msup + * mtable + * mtd + * mtext + * mtr + * munder + * munderover + * none + * semantics + + +The following :abbr:`MathML (Mathematical Markup Language)` attributes are +allowed by default (all others are stripped): + +.. hlist:: + :columns: 3 + + * actiontype + * align + * close + * columnalign + * columnlines + * columnspacing + * columnspan + * depth + * display + * displaystyle + * encoding + * equalcolumns + * equalrows + * fence + * fontstyle + * fontweight + * frame + * height + * linethickness + * lspace + * mathbackground + * mathcolor + * mathvariant + * maxsize + * minsize + * open + * other + * rowalign + * rowlines + * rowspacing + * rowspan + * rspace + * scriptlevel + * selection + * separator + * separators + * stretchy + * width + * xlink:href + * xlink:show + * xlink:type + * xmlns + * xmlns:xlink + + +.. 
_advanced.sanitization.css: + +:abbr:`CSS (Cascading Style Sheets)` Sanitization +------------------------------------------------- + +The following :abbr:`CSS (Cascading Style Sheets)` properties are allowed by +default in style attributes (all others are stripped): + +.. hlist:: + :columns: 3 + + * azimuth + * background-color + * border-bottom-color + * border-collapse + * border-color + * border-left-color + * border-right-color + * border-top-color + * clear + * color + * cursor + * direction + * display + * elevation + * float + * font + * font-family + * font-size + * font-style + * font-variant + * font-weight + * height + * letter-spacing + * line-height + * overflow + * pause + * pause-after + * pause-before + * pitch + * pitch-range + * richness + * speak + * speak-header + * speak-numeral + * speak-punctuation + * speech-rate + * stress + * text-align + * text-decoration + * text-indent + * unicode-bidi + * vertical-align + * voice-family + * volume + * white-space + * width + + +.. note:: + + Not all possible CSS values are allowed for these properties. The + allowable values are restricted by a whitelist and a regular expression that + allows color values and lengths. :abbr:`URI (Uniform Resource Identifier)`\s + are not allowed, to prevent `platypus attacks `_. + See the _HTMLSanitizer class for more details. + + +Whitelist, Don't Blacklist +-------------------------- + +I am often asked why :program:`Universal Feed Parser` is so hard-assed about +:abbr:`HTML (HyperText Markup Language)` and :abbr:`CSS (Cascading Style +Sheets)` sanitizing. 
To illustrate the problem, here is an incomplete list of +potentially dangerous :abbr:`HTML (HyperText Markup Language)` tags and +attributes: + +* script, which can contain malicious script +* applet, embed, and object, which can automatically download and execute malicious code +* meta, which can contain malicious redirects +* onload, onunload, and all other on* attributes, which can contain malicious script +* style, link, and the style attribute, which can contain malicious script + +*style?* Yes, style. :abbr:`CSS (Cascading Style Sheets)` definitions can contain executable code. + + +Embedding Javascript in :abbr:`CSS (Cascading Style Sheets)` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This sample is taken from `http://feedparser.org/docs/examples/rss20.xml `_: + +.. sourcecode:: html + + + Watch out for + <span style="background: url(javascript:window.location='http://example.org/')"> + nasty tricks</span> + + +This sample is more advanced, and does not contain the keyword javascript: that +many naive :abbr:`HTML (HyperText Markup Language)` sanitizers scan for: + +.. sourcecode:: html + + Watch out for + <span style="any: expression(window.location='http://example.org/')"> + nasty tricks</span> + + +Internet Explorer for Windows will execute the Javascript in both of these examples. + +Now consider that in :abbr:`HTML (HyperText Markup Language)`, attribute values may be entity-encoded in several different ways. + + +Embedding encoded Javascript in :abbr:`CSS (Cascading Style Sheets)` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To a browser, this: + +.. sourcecode:: html + + + + +is the same as this (without the line breaks): + +.. sourcecode:: html + + + + +which is the same as this (without the line breaks): + +.. sourcecode:: html + + + + +And so on, plus several other variations, plus every combination of every +variation. 
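The equivalence of differently encoded attribute values can be checked with a short sketch. It assumes Python 3 and uses the standard library's ``html.unescape`` on a harmless fragment, ``expression(``, rather than a real attack string. The names ``variants``, ``naive_hits`` and ``decoded`` are illustrative only, and the substring test stands in for a hypothetical blacklist; none of this is code from :program:`Universal Feed Parser`.

```python
from html import unescape  # Python 3 standard library

# Four spellings of the same attribute text: plain, decimal reference,
# hexadecimal reference, and zero-padded decimal reference.
variants = [
    "expression(",
    "&#101;xpression(",
    "&#x65;xpression(",
    "&#0000101;xpression(",
]

# A naive blacklist that scans the raw markup only catches the first one.
naive_hits = ["expression(" in v for v in variants]

# After entity decoding, which is what a browser does before interpreting
# the attribute, all four spellings collapse to the same string.
decoded = {unescape(v) for v in variants}

print(naive_hits)
print(decoded)
```

Only the unencoded spelling is caught by the raw-text scan, which is the practical argument for allowing known-safe markup instead of hunting for known-bad strings.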
+ +The more I investigate, the more cases I find where Internet Explorer for +Windows will treat seemingly innocuous markup as code and blithely execute it. +This is why :program:`Universal Feed Parser` uses a whitelist and not a +blacklist. I am reasonably confident that none of the elements or attributes on +the whitelist are security risks. I am not at all confident about elements or +attributes that I have not explicitly investigated. And I have no confidence at +all in my ability to detect strings within attribute values that Internet +Explorer for Windows will treat as executable code. + +.. seealso:: + + `How to consume RSS safely `_ + Explains the platypus attack. diff -Nru feedparser-5.0.1/docs/http-authentication.rst feedparser-5.1.2/docs/http-authentication.rst --- feedparser-5.0.1/docs/http-authentication.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http-authentication.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,135 @@ +Password-Protected Feeds +======================== + +:program:`Universal Feed Parser` supports downloading and parsing +password-protected feeds that are protected by :abbr:`HTTP (Hypertext Transfer Protocol)` +authentication. Both basic and digest authentication are supported. + + +Downloading a feed protected by basic authentication (the easy way) +------------------------------------------------------------------- + +The easiest way is to embed the username and password in the feed +:abbr:`URL (Uniform Resource Locator)` itself. + +In this example, the username is test and the password is basic. + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://test:basic@feedparser.org/docs/examples/basic_auth.xml') + >>> d.feed.title + u'Sample Feed' + +The same technique works for digest authentication. 
(Technically, +:program:`Universal Feed Parser` will attempt basic authentication first, but +if that fails and the server indicates that it requires digest authentication, +:program:`Universal Feed Parser` will automatically re-request the feed with +the appropriate digest authentication headers. *This means that this technique +will send your password to the server in an easily decoded form.*) + + +.. _example.auth.inline.digest: + +Downloading a feed protected by digest authentication (the easy but horribly insecure way) +------------------------------------------------------------------------------------------ + +In this example, the username is test and the password is digest. + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://test:digest@feedparser.org/docs/examples/digest_auth.xml') + >>> d.feed.title + u'Sample Feed' + + + +You can also construct an HTTPBasicAuthHandler that contains the password +information, then pass that as a handler to the ``parse`` function. +HTTPBasicAuthHandler is part of the standard `urllib2 `_ module. + +Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` basic authentication (the hard way) +-------------------------------------------------------------------------------------------------------------- + +:: + + import urllib2, feedparser + + # Construct the authentication handler + auth = urllib2.HTTPBasicAuthHandler() + + # Add password information: realm, host, user, password. + # A single handler can contain passwords for multiple sites; + # urllib2 will sort out which passwords get sent to which sites + # based on the realm and host of the URL you're retrieving + auth.add_password('BasicTest', 'feedparser.org', 'test', 'basic') + + # Pass the authentication handler to the feed parser.
+ # handlers is a list because there might be more than one + # type of handler (urllib2 defines lots of different ones, + # and you can build your own) + d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml', + handlers=[auth]) + + + +Digest authentication is handled in much the same way, by constructing an +HTTPDigestAuthHandler and populating it with the necessary realm, host, user, +and password information. This is more secure than +:ref:`stuffing the username and password in the URL `, +since only a hash derived from the password is sent to the server. + + +Downloading a feed protected by :abbr:`HTTP (Hypertext Transfer Protocol)` digest authentication (the secure way) +----------------------------------------------------------------------------------------------------------------- + +:: + + import urllib2, feedparser + + auth = urllib2.HTTPDigestAuthHandler() + auth.add_password('DigestTest', 'feedparser.org', 'test', 'digest') + d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml', + handlers=[auth]) + + +The examples so far have assumed that you know in advance that the feed is +password-protected. But what if you don't know? + +If you try to download a password-protected feed without sending all the proper +password information, the server will return an +:abbr:`HTTP (Hypertext Transfer Protocol)` status code ``401``. +:program:`Universal Feed Parser` makes this status code available in +``d.status``. + +Details on the authentication scheme are in ``d.headers['www-authenticate']``. +:program:`Universal Feed Parser` does not do any further parsing on this field; +you will need to parse it yourself. Everything before the first space is the +type of authentication (probably ``Basic`` or ``Digest``), which controls which +type of handler you'll need to construct. The realm name is given as +realm="foo" -- so foo would be your first argument to auth.add_password.
Other +information in the www-authenticate header is probably safe to ignore; the +:file:`urllib2` module will handle it for you. + + +Determining that a feed is password-protected +--------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/basic_auth.xml') + >>> d.status + 401 + >>> d.headers['www-authenticate'] + 'Basic realm="Use test/basic"' + >>> d = feedparser.parse('http://feedparser.org/docs/examples/digest_auth.xml') + >>> d.status + 401 + >>> d.headers['www-authenticate'] + 'Digest realm="DigestTest", + nonce="+LV/uLLdAwA=5d77397291261b9ef256b034e19bcb94f5b7992a", + algorithm=MD5, + qop="auth"' + diff -Nru feedparser-5.0.1/docs/http-etag.rst feedparser-5.1.2/docs/http-etag.rst --- feedparser-5.0.1/docs/http-etag.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http-etag.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,90 @@ +.. _http.etag: + +ETag and Last-Modified Headers +============================== + +ETags and Last-Modified headers are two ways that feed publishers can save +bandwidth, but they only work if clients take advantage of them. +:program:`Universal Feed Parser` gives you the ability to take advantage of +these features, but you must use them properly. + +The basic concept is that a feed publisher may provide a special +:abbr:`HTTP (Hypertext Transfer Protocol)` header, called an ETag, when it +publishes a feed. You should send this ETag back to the server on subsequent +requests. If the feed has not changed since the last time you requested it, +the server will return a special :abbr:`HTTP (Hypertext Transfer Protocol)` +status code (``304``) and no feed data. 
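The bookkeeping this implies can be sketched as follows. This is a minimal illustration under stated assumptions, not part of :program:`Universal Feed Parser`: the function name ``poll`` and the ``cache`` dictionary are hypothetical, and ``parse`` is assumed to be ``feedparser.parse``, passed in as an argument so the sketch can be exercised without a network. It also saves ``modified``, the Last-Modified counterpart of the ETag that this section covers alongside it.

```python
def poll(url, cache, parse):
    """Fetch `url`, sending back any validators saved from the last poll.

    `cache` is a plain dict kept between runs (hypothetical storage);
    `parse` is assumed to be feedparser.parse.
    """
    saved = cache.get(url, {})
    d = parse(url, etag=saved.get('etag'), modified=saved.get('modified'))
    if getattr(d, 'status', None) == 304:
        # Unchanged: the server sent no feed data, keep the saved validators.
        return None
    # Remember whichever validators the server supplied for the next poll.
    cache[url] = {
        'etag': getattr(d, 'etag', None),
        'modified': getattr(d, 'modified', None),
    }
    return d
```

A caller would keep ``cache`` in a database or configuration file and treat a ``None`` return as "nothing new".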
+ +Using ETags to reduce bandwidth +------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.etag + '"6c132-941-ad7e3080"' + >>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', etag=d.etag) + >>> d2.status + 304 + >>> d2.feed + {} + >>> d2.entries + [] + >>> d2.debug_message + 'The feed has not changed since you last checked, so + the server sent no data. This is a feature, not a bug!' + +There is a related concept which accomplishes the same thing, but slightly +differently. In this case, the server publishes the last-modified date of the +feed in the :abbr:`HTTP (Hypertext Transfer Protocol)` header. You can send +this back to the server on subsequent requests, and if the feed has not +changed, the server will return :abbr:`HTTP (Hypertext Transfer Protocol)` +status code ``304`` and no feed data. + + +Using Last-Modified headers to reduce bandwidth +----------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d.modified + (2004, 6, 11, 23, 0, 34, 4, 163, 0) + >>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', modified=d.modified) + >>> d2.status + 304 + >>> d2.feed + {} + >>> d2.entries + [] + >>> d2.debug_message + 'The feed has not changed since you last checked, so + the server sent no data. This is a feature, not a bug!' + +Clients should support both ETag and Last-Modified headers, as some servers support one but not the other. + + +.. important:: + + If you do not support ETag and Last-Modified headers, you will repeatedly + download feeds that have not changed. This wastes your bandwidth and the + publisher's bandwidth, and the publisher may ban you from accessing their + server. + + +.. 
note:: + + You can control the behaviour of :abbr:`HTTP (Hypertext Transfer Protocol)` + caches between your application and the origin server by using the + ``extra_headers`` parameter. For example, you may want to send + ``Cache-control: max-age=60`` to make the caches revalidate against the + origin server unless their cached copy is less than a minute old. Again, + this should be used with consideration. + + +.. seealso:: + + * `HTTP Conditional Get For RSS Hackers `_ + * `HTTP Web Services `_ diff -Nru feedparser-5.0.1/docs/http-other.rst feedparser-5.1.2/docs/http-other.rst --- feedparser-5.0.1/docs/http-other.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http-other.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,40 @@ +Other :abbr:`HTTP (Hypertext Transfer Protocol)` Headers +======================================================== + +You can specify extra :abbr:`HTTP (Hypertext Transfer Protocol)` request +headers as a dictionary. When you download a feed from a remote web server, +:program:`Universal Feed Parser` exposes the complete set of +:abbr:`HTTP (Hypertext Transfer Protocol)` response headers as a dictionary. + + +.. 
_example.http.headers.request: + +Sending custom :abbr:`HTTP (Hypertext Transfer Protocol)` request headers +------------------------------------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom03.xml', + extra_headers={'Cache-control': 'max-age=0'}) + + +Accessing other :abbr:`HTTP (Hypertext Transfer Protocol)` response headers +--------------------------------------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom03.xml') + >>> d.headers + {'date': 'Fri, 11 Jun 2004 23:57:50 GMT', + 'server': 'Apache/2.0.49 (Debian GNU/Linux)', + 'last-modified': 'Fri, 11 Jun 2004 23:00:34 GMT', + 'etag': '"6c132-941-ad7e3080"', + 'accept-ranges': 'bytes', + 'vary': 'Accept-Encoding,User-Agent', + 'content-encoding': 'gzip', + 'content-length': '883', + 'connection': 'close', + 'content-type': 'application/xml'} + diff -Nru feedparser-5.0.1/docs/http-redirect.rst feedparser-5.1.2/docs/http-redirect.rst --- feedparser-5.0.1/docs/http-redirect.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http-redirect.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,82 @@ +:abbr:`HTTP (Hypertext Transfer Protocol)` Redirects +==================================================== + +When you download a feed from a remote web server, :program:`Universal Feed Parser` +exposes the :abbr:`HTTP (Hypertext Transfer Protocol)` status code. You need +to understand the different codes, including permanent and temporary redirects, +and feeds that have been marked "gone". + +When a feed has temporarily moved to a new location, the web server will return +a ``302`` status code. :program:`Universal Feed Parser` makes this available +in ``d.status``. 
+ +There is nothing special you need to do with temporary redirects; by the time +you learn about it, :program:`Universal Feed Parser` has already followed the +redirect to the new location (available in ``d.href``), downloaded the feed, +and parsed it. Since the redirect is temporary, you should continue requesting +the original :abbr:`URL (Uniform Resource Locator)` the next time you want to +parse the feed. + + +Noticing temporary redirects +---------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/temporary.xml') + >>> d.status + 302 + >>> d.href + 'http://feedparser.org/docs/examples/atom10.xml' + >>> d.feed.title + u'Sample Feed' + +When a feed has permanently moved to a new location, the web server will return +a ``301`` status code. Again, :program:`Universal Feed Parser` makes this +available in ``d.status``. + + +If you are polling a feed on a regular basis, it is very important to check the +status code (``d.status``) every time you download. If the feed has been +permanently redirected, you should update your database or configuration file +with the new address (``d.href``). Repeatedly requesting the original address +of a feed that has been permanently redirected is very rude, and may get you +banned from the server. + + +Noticing permanent redirects +---------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/permanent.xml') + >>> d.status + 301 + >>> d.href + 'http://feedparser.org/docs/examples/atom10.xml' + >>> d.feed.title + u'Sample Feed' + + +When a feed has been permanently deleted, the web server will return a ``410`` +status code. If you ever receive a ``410``, you should stop polling the feed +and inform the end user that the feed is gone for good. + + +Repeatedly requesting a feed that has been marked as "gone" is very rude, and +may get you banned from the server. 
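The three cases above (temporary redirect, permanent redirect, gone) can be folded into one decision function. This is a sketch only: ``next_action`` and its return values are hypothetical names, and ``d`` is assumed to be the object returned by ``feedparser.parse``, of which only ``status`` and ``href`` are read.

```python
def next_action(d, original_url):
    """Decide what a poller should do after one download.

    Returns a (verb, url) pair; the verbs are illustrative only.
    """
    # status may be missing, e.g. when no HTTP request was made.
    status = getattr(d, 'status', None)
    if status == 410:
        # Gone: stop polling and tell the end user.
        return ('stop', None)
    if status == 301:
        # Permanent redirect: store the new address for future polls.
        return ('update', d.href)
    # 302 and everything else: keep requesting the original address.
    return ('keep', original_url)
```

Checking ``410`` and ``301`` explicitly, and treating everything else as "keep", mirrors the advice that temporary redirects need no special handling.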
+ + +Noticing feeds marked "gone" +---------------------------- + +:: + + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/gone.xml') + >>> d.status + 410 + diff -Nru feedparser-5.0.1/docs/http-useragent.rst feedparser-5.1.2/docs/http-useragent.rst --- feedparser-5.0.1/docs/http-useragent.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http-useragent.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,57 @@ +User-Agent and Referer Headers +============================== + +:program:`Universal Feed Parser` sends a default User-Agent string when it +requests a feed from a web server. + + +The default User-Agent string looks like this: + +:: + + UniversalFeedParser/5.0.1 +http://feedparser.org/ + +If you are embedding :program:`Universal Feed Parser` in a larger application, +you should change the User-Agent to your application name and +:abbr:`URL (Uniform Resource Locator)`. + + +Customizing the User-Agent +-------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', + agent='MyApp/1.0 +http://example.com/') + +You can also set the User-Agent once, globally, and then call the ``parse`` +function normally. + + +Customizing the User-Agent permanently +-------------------------------------- + +:: + + >>> import feedparser + >>> feedparser.USER_AGENT = "MyApp/1.0 +http://example.com/" + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + + +:program:`Universal Feed Parser` also lets you set the referrer when you +download a feed from a web server. This is discouraged, because it is a +violation of `RFC 2616 `_. +The default behavior is to send a blank referrer, and you should never need to +override this. 
+ + +Customizing the referrer +------------------------ + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', + referrer='http://example.com/') + diff -Nru feedparser-5.0.1/docs/http.rst feedparser-5.1.2/docs/http.rst --- feedparser-5.0.1/docs/http.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/http.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,11 @@ +:abbr:`HTTP (Hypertext Transfer Protocol)` Features +################################################### + +.. toctree:: + :maxdepth: 2 + + http-etag + http-useragent + http-redirect + http-authentication + http-other diff -Nru feedparser-5.0.1/docs/index.rst feedparser-5.1.2/docs/index.rst --- feedparser-5.0.1/docs/index.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/index.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,25 @@ +============= +Documentation +============= + +This documentation claims to describe the behavior of :program:`Universal Feed +Parser` |version|. It does not claim to describe the behavior of any other version. + +This documentation lives at `http://packages.python.org/feedparser/ +`_. If you're reading it somewhere else, you may +not have the latest version. + +This documentation is provided by the author "as is" without any express or +implied warranties. See :ref:`the documentation license ` for more details. + +.. toctree:: + :maxdepth: 2 + + basic + advanced + http + annotated-examples + history + microformats + reference + license diff -Nru feedparser-5.0.1/docs/introduction.rst feedparser-5.1.2/docs/introduction.rst --- feedparser-5.0.1/docs/introduction.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/introduction.rst 2012-02-19 23:50:31.000000000 +0000 @@ -0,0 +1,78 @@ +Introduction +============ + +:program:`Universal Feed Parser` is a :program:`Python` module for downloading +and parsing syndicated feeds. 
It can handle :abbr:`RSS (Rich Site Summary)` +0.90, Netscape :abbr:`RSS (Rich Site Summary)` 0.91, Userland :abbr:`RSS (Rich +Site Summary)` 0.91, :abbr:`RSS (Rich Site Summary)` 0.92, :abbr:`RSS (Rich +Site Summary)` 0.93, :abbr:`RSS (Rich Site Summary)` 0.94, :abbr:`RSS (Rich +Site Summary)` 1.0, :abbr:`RSS (Rich Site Summary)` 2.0, Atom 0.3, Atom 1.0, +and :abbr:`CDF (Channel Definition Format)` feeds. It also parses several +popular extension modules, including Dublin Core and Apple's :program:`iTunes` +extensions. + +To use :program:`Universal Feed Parser`, you will need :program:`Python` 2.4 or +later (Python 3 is supported). :program:`Universal Feed Parser` is not meant +to run standalone; it is a module for you to use as part of a larger +:program:`Python` program. + +:program:`Universal Feed Parser` is easy to use; the module is self-contained +in a single file, :file:`feedparser.py`, and it has one primary public +function, ``parse``. ``parse`` takes a number of arguments, but only one is +required, and it can be a :abbr:`URL (Uniform Resource Locator)`, a local +filename, or a raw string containing feed data in any format. + + +Parsing a feed from a remote :abbr:`URL (Uniform Resource Locator)` +------------------------------------------------------------------- +:: + + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') + >>> d['feed']['title'] + u'Sample Feed' + + +The following example assumes you are on Windows, and that you have saved a feed at :file:`c:\\incoming\\atom10.xml`. + +.. note:: + + :program:`Universal Feed Parser` works on any platform that can run + :program:`Python`; use the path syntax appropriate for your platform. + +Parsing a feed from a local file +-------------------------------- +:: + + + >>> import feedparser + >>> d = feedparser.parse(r'c:\incoming\atom10.xml') + >>> d['feed']['title'] + u'Sample Feed' + + +:program:`Universal Feed Parser` can also parse a feed in memory. 
+ +Parsing a feed from a string +---------------------------- +:: + + + >>> import feedparser + >>> rawdata = """ + + Sample Feed + + """ + >>> d = feedparser.parse(rawdata) + >>> d['feed']['title'] + u'Sample Feed' + + +Values are returned as :program:`Python` Unicode strings (except when they're +not -- see :ref:`advanced.encoding` for all the gory details). + +.. seealso:: + + `Introduction to Python Unicode strings `_ diff -Nru feedparser-5.0.1/docs/license.rst feedparser-5.1.2/docs/license.rst --- feedparser-5.0.1/docs/license.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/license.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,28 @@ +.. _license: + +Documentation license +===================== + +Copyright 2004-2008 Mark Pilgrim. All rights reserved. + +Redistribution and use in source (Sphinx ReST) and "compiled" forms (HTML, PDF, +PostScript, RTF and so forth) with or without modification, are permitted +provided that the following conditions are met: + +* Redistributions of source code (Sphinx ReST) must retain the above copyright + notice, this list of conditions and the following disclaimer. +* Redistributions in compiled form (converted to HTML, PDF, PostScript, RTF and + other formats) must reproduce the above copyright notice, this list of + conditions and the following disclaimer in the documentation and/or other + materials provided with the distribution. + +THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS +IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff -Nru feedparser-5.0.1/docs/microformats.rst feedparser-5.1.2/docs/microformats.rst --- feedparser-5.0.1/docs/microformats.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/microformats.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,230 @@ +.. _advanced.microformats: + +Microformats +============ + +An emerging trend in feed syndication is the inclusion of `microformats`_. +Besides the semantics defined by individual feed formats, publishers can add +additional semantics using rel and class attributes in embedded +:abbr:`HTML (HyperText Markup Language)` content. + +.. _microformats: http://microformats.org/ + +.. note:: + + To parse microformats, :program:`Universal Feed Parser` relies on a + third-party library called `Beautiful Soup`_, which is distributed + separately. If Beautiful Soup is not installed, + :program:`Universal Feed Parser` will silently skip microformats parsing. + +.. _Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/ + + +The following elements are parsed for microformats: + +* :ref:`reference.entry.summary_detail.value` +* :ref:`reference.entry.content.value` + + + +.. _advanced.microformats.relenclosure: + +rel=enclosure +------------- + +The `rel=enclosure`_ microformat provides a way for embedded +:abbr:`HTML (HyperText Markup Language)` content to specify that a certain link +should be treated as an :ref:`enclosure `.
+:program:`Universal Feed Parser` looks for links within embedded markup that +meet any of the following conditions: + +.. _rel=enclosure: http://microformats.org/wiki/rel-enclosure + +* rel attribute contains enclosure (note: rel attributes can contain a list of space-separated values) +* type attribute starts with audio/ +* type attribute starts with video/ +* type attribute starts with application/ but does not end with xml +* href attribute ends with one of the following file extensions: + :file:`.7z`, + :file:`.avi`, + :file:`.bin`, + :file:`.bz2`, + :file:`.bz2`, + :file:`.deb`, + :file:`.dmg`, + :file:`.exe`, + :file:`.gz`, + :file:`.hqx`, + :file:`.img`, + :file:`.iso`, + :file:`.jar`, + :file:`.m4a`, + :file:`.m4v`, + :file:`.mp2`, + :file:`.mp3`, + :file:`.mp4`, + :file:`.msi`, + :file:`.ogg`, + :file:`.ogm`, + :file:`.rar`, + :file:`.rpm`, + :file:`.sit`, + :file:`.sitx`, + :file:`.tar`, + :file:`.tbz2`, + :file:`.tgz`, + :file:`.wma`, + :file:`.wmv`, + :file:`.z`, + :file:`.zip` + + +When :program:`Universal Feed Parser` finds a link that satisfies any of these +conditions, it adds it to :ref:`reference.entry.enclosures`. + + +.. rubric:: Parsing embedded enclosures + +.. sourcecode:: python + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-enclosure.xml') + >>> d.entries[0].enclosures + [{u'href': u'http://example.com/movie.mp4', 'title': u'awesome movie'}] + + + +.. _advanced.microformats.reltag: + +rel=tag +------- + +The `rel=tag`_ microformat allows you to define +:ref:`tags ` within embedded +:abbr:`HTML (HyperText Markup Language)` content. +:program:`Universal Feed Parser` looks for these attribute values in embedded +markup and maps them to :ref:`reference.entry.tags`. + +.. _rel=tag: http://microformats.org/wiki/rel-tag + + +.. rubric:: Parsing embedded tags + +.. 
sourcecode:: python + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-tag.xml') + >>> d.entries[0].tags + [{'term': u'tech', 'scheme': u'http://del.icio.us/tag/', 'label': u'Technology'}] + + + +.. _advanced.microformats.xfn: + +:abbr:`XFN (XHTML Friends Network)` +----------------------------------- + + +The `XFN`_ microformat allows you to define human relationships between +:abbr:`URI (Uniform Resource Identifier)`\s. For example, you could link from +your weblog to your spouse's weblog with the ``rel="spouse"`` relation. It is +intended primarily for "blogrolls" or other static lists of links, but the +relations can occur anywhere in :abbr:`HTML (HyperText Markup Language)` +content. If found, :program:`Universal Feed Parser` will return the +:abbr:`XFN (XHTML Friends Network)` information in :ref:`reference.entry.xfn`. + +.. _XFN: http://microformats.org/wiki/XFN + +:program:`Universal Feed Parser` supports all of the relationships listed in +the `XFN 1.1 profile`_, as well as the following variations: + +.. _XFN 1.1 profile: http://gmpg.org/xfn/11 + +* ``coworker`` in addition to ``co-worker`` +* ``coresident`` in addition to ``co-resident`` +* ``relative`` in addition to ``kin`` +* ``brother`` and ``sister`` in addition to ``sibling`` +* ``husband`` and ``wife`` in addition to ``spouse`` + + + + +.. rubric:: Parsing :abbr:`XFN (XHTML Friends Network)` relationships + +.. sourcecode:: python + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/xfn.xml') + >>> person = d.entries[0].xfn[0] + >>> person.name + u'John Doe' + >>> person.href + u'http://example.com/johndoe' + >>> person.relationships + [u'coworker', u'friend'] + + + +.. _advanced.microformats.hcard: + +hCard +----- + +The `hCard`_ microformat allows you to embed address book information within +:abbr:`HTML (HyperText Markup Language)` content. 
If +:program:`Universal Feed Parser` finds an hCard within supported elements, it +converts it into an RFC 2426-compliant vCard and returns it in +:ref:`reference.entry.vcard`. + +.. _hCard: http://microformats.org/wiki/hcard + + +.. rubric:: Converting embedded hCard markup into a vCard + +.. sourcecode:: python + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/hcard.xml') + >>> print d.entries[0].vcard + BEGIN:vCard + VERSION:3.0 + FN:Frank Dawson + N:Dawson;Frank + ADR;TYPE=work,postal,parcel:;;6544 Battleford Drive;Raleigh;NC;27613-3502;U + .S.A. + TEL;TYPE=WORK,VOICE,MSG:+1-919-676-9515 + TEL;TYPE=WORK,FAX:+1-919-676-9564 + EMAIL;TYPE=internet,pref:Frank_Dawson at Lotus.com + EMAIL;TYPE=internet:fdawson at earthlink.net + ORG:Lotus Development Corporation + URL:http://home.earthlink.net/~fdawson + END:vCard + BEGIN:vCard + VERSION:3.0 + FN:Tim Howes + N:Howes;Tim + ADR;TYPE=work:;;501 E. Middlefield Rd.;Mountain View;CA;94043;U.S.A. + TEL;TYPE=WORK,VOICE,MSG:+1-415-937-3419 + TEL;TYPE=WORK,FAX:+1-415-528-4164 + EMAIL;TYPE=internet:howes at netscape.com + ORG:Netscape Communications Corp. + END:vCard + + + +.. note:: + + There are a growing number of microformats, and + :program:`Universal Feed Parser` does not parse all of them. However, both the + rel and class attributes survive :ref:`HTML sanitizing `, + so applications built on :program:`Universal Feed Parser` that wish to parse + additional microformat content are free to do so. + + +.. seealso:: + + * `Microformats.org `_ + * `rel=enclosure specification `_ + * `rel=tag specification `_ + * `XFN specification `_ + * `hCard specification `_ diff -Nru feedparser-5.0.1/docs/namespace-handling.rst feedparser-5.1.2/docs/namespace-handling.rst --- feedparser-5.0.1/docs/namespace-handling.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/namespace-handling.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,137 @@ +.. 
_advanced.namespaces: + +Namespace Handling +================== + +:program:`Universal Feed Parser` attempts to expose all possible data in feeds, +including elements in extension namespaces. + +Some common namespaced elements are mapped to core elements. For further +information about these mappings, see :ref:`reference`. + +Other namespaced elements are available as ``prefix_element``. + +The namespaces defined in the feed are available in the parsed results as +``namespaces``, a dictionary of {prefix: namespaceURI}. If the feed defines a +default namespace, it is listed as ``namespaces['']``. + + +Accessing namespaced elements +----------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/prism.rdf') + >>> d.feed.prism_issn + u'0028-0836' + >>> d.namespaces + {'': u'http://purl.org/rss/1.0/', + 'prism': u'http://prismstandard.org/namespaces/1.2/basic/', + 'rdf': u'http://www.w3.org/1999/02/22-rdf-syntax-ns#'} + + +The prefix used to construct the variable name is not guaranteed to be the same +as the prefix of the namespaced element in the original feed. If +:program:`Universal Feed Parser` recognizes the namespace, it will use the +namespace's preferred prefix to construct the variable name. It will also list +the namespace in the ``namespaces`` dictionary using the namespace's preferred +prefix. + +In the previous example, the namespace +(http://prismstandard.org/namespaces/1.2/basic/) was defined with the +namespace's preferred prefix (prism), so the prism:issn element was accessible +as the variable ``d.feed.prism_issn``. However, if the namespace is defined +with a non-standard prefix, :program:`Universal Feed Parser` will still +construct the variable name using the preferred prefix, *not* the actual prefix +that is used in the feed. + +This will become clear with an example.
+ + +Accessing namespaced elements with non-standard prefixes +-------------------------------------------------------- + +:: + + >>> import feedparser + >>> d = feedparser.parse('http://feedparser.org/docs/examples/nonstandard_prefix.rdf') + >>> d.feed.prism_issn + u'0028-0836' + >>> d.feed.foo_issn + Traceback (most recent call last): + File "", line 1, in ? + File "feedparser.py", line 158, in __getattr__ + raise AttributeError, "object has no attribute '%s'" % key + AttributeError: object has no attribute 'foo_issn' + >>> d.namespaces + {'': u'http://purl.org/rss/1.0/', + 'prism': u'http://prismstandard.org/namespaces/1.2/basic/', + 'rdf': u'http://www.w3.org/1999/02/22-rdf-syntax-ns#'} + + +This is the complete list of namespaces that :program:`Universal Feed Parser` +recognizes and uses to construct the variable names for data in these +namespaces: + +=============== ===================================================== +Prefix Namespace +=============== ===================================================== +admin http://webns.net/mvcb/ +ag http://purl.org/rss/1.0/modules/aggregation/ +annotate http://purl.org/rss/1.0/modules/annotate/ +audio http://media.tangent.org/rss/1.0/ +blogChannel http://backend.userland.com/blogChannelModule +cc http://web.resource.org/cc/ +co http://purl.org/rss/1.0/modules/company +content http://purl.org/rss/1.0/modules/content/ +cp http://my.theinfo.org/changed/1.0/rss/ +creativeCommons http://backend.userland.com/creativeCommonsRssModule +dc http://purl.org/dc/elements/1.1/ +dcterms http://purl.org/dc/terms/ +email http://purl.org/rss/1.0/modules/email/ +ev http://purl.org/rss/1.0/modules/event/ +feedburner http://rssnamespace.org/feedburner/ext/1.0 +fm http://freshmeat.net/rss/fm/ +foaf http://xmlns.com/foaf/0.1/ +geo http://www.w3.org/2003/01/geo/wgs84_pos# +icbm http://postneo.com/icbm/ +image http://purl.org/rss/1.0/modules/image/ +itunes http://example.com/DTDs/PodCast-1.0.dtd +itunes 
http://www.itunes.com/DTDs/PodCast-1.0.dtd +l http://purl.org/rss/1.0/modules/link/ +media http://search.yahoo.com/mrss +pingback http://madskills.com/public/xml/rss/module/pingback/ +prism http://prismstandard.org/namespaces/1.2/basic/ +rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# +rdfs http://www.w3.org/2000/01/rdf-schema# +ref http://purl.org/rss/1.0/modules/reference/ +reqv http://purl.org/rss/1.0/modules/richequiv/ +search http://purl.org/rss/1.0/modules/search/ +slash http://purl.org/rss/1.0/modules/slash/ +soap http://schemas.xmlsoap.org/soap/envelope/ +ss http://purl.org/rss/1.0/modules/servicestatus/ +str http://hacks.benhammersley.com/rss/streaming/ +sub http://purl.org/rss/1.0/modules/subscription/ +sy http://purl.org/rss/1.0/modules/syndication/ +szf http://schemas.pocketsoap.com/rss/myDescModule/ +taxo http://purl.org/rss/1.0/modules/taxonomy/ +thr http://purl.org/rss/1.0/modules/threading/ +ti http://purl.org/rss/1.0/modules/textinput/ +trackback http://madskills.com/public/xml/rss/module/trackback/ +wfw http://wellformedweb.org/CommentAPI/ +wiki http://purl.org/rss/1.0/modules/wiki/ +xhtml http://www.w3.org/1999/xhtml +xlink http://www.w3.org/1999/xlink +xml http://www.w3.org/XML/1998/namespace +=============== ===================================================== + +.. note:: + + :program:`Universal Feed Parser` treats namespaces as case-insensitive to + match the behavior of certain versions of :program:`iTunes`. + +.. warning:: + + Data from namespaced elements is not :ref:`sanitized ` + (even if it contains :abbr:`HTML (HyperText Markup Language)` markup). diff -Nru feedparser-5.0.1/docs/reference-bozo.rst feedparser-5.1.2/docs/reference-bozo.rst --- feedparser-5.0.1/docs/reference-bozo.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-bozo.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,18 @@ +:py:attr:`bozo` +=============== + +An integer, either ``1`` or ``0``. 
Set to ``1`` if the feed is not well-formed +:abbr:`XML (Extensible Markup Language)`, and ``0`` otherwise. + +See :ref:`advanced.bozo` for more details on the :py:attr:`bozo` bit. + +.. tip:: + + :py:attr:`bozo` may not be present. Some platforms, such as Mac OS X 10.2 and some + versions of FreeBSD, do not include an :abbr:`XML (Extensible Markup Language)` + parser in their :program:`Python` distributions. :program:`Universal Feed Parser` + will still work on these platforms, but it will not be able to detect whether a + feed is well-formed. However, it *can* detect whether a feed's character + encoding is incorrectly declared. (This is done in :program:`Python`, not by + the :abbr:`XML (Extensible Markup Language)` parser.) See + :ref:`advanced.encoding` for details. diff -Nru feedparser-5.0.1/docs/reference-bozo_exception.rst feedparser-5.1.2/docs/reference-bozo_exception.rst --- feedparser-5.0.1/docs/reference-bozo_exception.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-bozo_exception.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,10 @@ +:py:attr:`bozo_exception` +========================= + +The exception raised when attempting to parse a non-well-formed feed. + +See :ref:`advanced.bozo` for more details. + +.. tip:: + + :py:attr:`bozo_exception` will only be present if :py:attr:`bozo` is ``1``. diff -Nru feedparser-5.0.1/docs/reference-encoding.rst feedparser-5.1.2/docs/reference-encoding.rst --- feedparser-5.0.1/docs/reference-encoding.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-encoding.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,16 @@ +.. _reference.encoding: + +:py:attr:`encoding` +=================== + +The character encoding that was used to parse the feed. + +.. note:: + + The process by which :program:`Universal Feed Parser` determines the character + encoding of the feed is explained in :ref:`advanced.encoding`. + +.. 
tip:: + + This element always exists, although it may be an empty string if the character + encoding cannot be determined. diff -Nru feedparser-5.0.1/docs/reference-entry-author.rst feedparser-5.1.2/docs/reference-entry-author.rst --- feedparser-5.0.1/docs/reference-entry-author.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-author.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,20 @@ +.. _reference.entry.author: + +:py:attr:`entries[i].author` +============================ + +The author of this entry. + +.. seealso:: + + * :ref:`reference.entry.author_detail` + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:author +* /atom03:feed/atom03:entry/atom03:author +* /rss/channel/item/dc:creator +* /rss/channel/item/dc:author +* /rss/channel/itunes:author +* /rdf:RDF/rdf:item/dc:creator +* /rdf:RDF/rdf:item/dc:author diff -Nru feedparser-5.0.1/docs/reference-entry-author_detail.rst feedparser-5.1.2/docs/reference-entry-author_detail.rst --- feedparser-5.0.1/docs/reference-entry-author_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-author_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,48 @@ +.. _reference.entry.author_detail: + +:py:attr:`entries[i].author_detail` +=================================== + +A dictionary with details about the author of this entry. + +.. seealso:: + + * :ref:`reference.entry.author` + + +.. _reference.entry.author_detail.name: + +:py:attr:`entries[i].author_detail.name` +---------------------------------------- + +The name of this entry's author. + + +.. _reference.entry.author_detail.href: + +:py:attr:`entries[i].author_detail.href` +---------------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of this entry's author. This can be +the author's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. 
+ + +.. _reference.entry.author_detail.email: + +:py:attr:`entries[i].author_detail.email` +----------------------------------------- + +The email address of this entry's author. + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:author +* /atom03:feed/atom03:entry/atom03:author +* /rss/channel/item/dc:creator +* /rss/channel/item/dc:author +* /rss/channel/itunes:author +* /rdf:RDF/rdf:item/dc:creator +* /rdf:RDF/rdf:item/dc:author diff -Nru feedparser-5.0.1/docs/reference-entry-comments.rst feedparser-5.1.2/docs/reference-entry-comments.rst --- feedparser-5.0.1/docs/reference-entry-comments.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-comments.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,14 @@ +.. _reference.entry.comments: + +:py:attr:`entries[i].comments` +============================== + +A :abbr:`URL (Uniform Resource Locator)` of the :abbr:`HTML (HyperText Markup Language)` +comment submission page associated with this entry. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + +.. rubric:: Comes from + +* /rss/channel/item/comments diff -Nru feedparser-5.0.1/docs/reference-entry-content.rst feedparser-5.1.2/docs/reference-entry-content.rst --- feedparser-5.0.1/docs/reference-entry-content.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-content.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,107 @@ +.. _reference.entry.content: + +:py:attr:`entries[i].content` +============================= + +A list of dictionaries with details about the full content of the entry. + +Atom feeds may contain multiple content elements. Clients should render as +many of them as possible, based on the type and the client's abilities. + + +.. _reference.entry.content.value: + +:py:attr:`entries[i].content[j].value` +-------------------------------------- + +The value of this piece of content. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements +within this value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. +If so, they are :ref:`resolved according to a set of rules `. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it will be +:ref:`parsed for microformats `. + + +.. _reference.entry.content.type: + +:py:attr:`entries[i].content[j].type` +------------------------------------- + +The content type of this piece of content. + +Most likely values for `type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For +:abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by +inspecting the content, and defaults to :mimetype:`text/html`. Note that this +may cause silent data loss if the value contains plain text with angle +brackets. There is nothing I can do about this problem; it is a limitation of +:abbr:`RSS (Rich Site Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +.. _reference.entry.content.language: + +:py:attr:`entries[i].content[j].language` +----------------------------------------- + +The language of this piece of content. + +:py:attr:`~entries[i].content[j].language` is supposed to be a language code, +as specified by :rfc:`3066`, but publishers have been known to publish random +values like "English" or "German". 
:program:`Universal Feed Parser` does not +do any parsing or normalization of language codes. + +:py:attr:`~entries[i].content[j].language` may come from the element's xml:lang +attribute, or it may inherit from a parent element's xml:lang, or the +:mailheader:`Content-Language` :abbr:`HTTP (Hypertext Transfer Protocol)` +header. If the feed does not specify a language, +:py:attr:`~entries[i].content[j].language` will be ``None``, the +:program:`Python` null value. + + +.. _reference.entry.content.base: + +:py:attr:`entries[i].content[j].base` +------------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +this piece of content. + +:py:attr:`~entries[i].content[j].base` is only useful in rare situations and +can usually be ignored. It is the original base +:abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the +element's xml:base attribute, or a parent element's xml:base, or the +appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the +:abbr:`URI (Uniform Resource Identifier)` of the feed. (See +:ref:`advanced.base` for more details.) By the time you see it, +:program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + +.. 
rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:content +* /atom10:feed/atom10:entry/atom10:content +* /rdf:RDF/rdf:item/content:encoded +* /rss/channel/item/body +* /rss/channel/item/content:encoded +* /rss/channel/item/fullitem +* /rss/channel/item/xhtml:body diff -Nru feedparser-5.0.1/docs/reference-entry-contributors.rst feedparser-5.1.2/docs/reference-entry-contributors.rst --- feedparser-5.0.1/docs/reference-entry-contributors.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-contributors.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,39 @@ +:py:attr:`entries[i].contributors` +================================== + +A list of contributors (secondary authors) to this entry. + + +.. _reference.entry.contributors.name: + +:py:attr:`entries[i].contributors[j].name` +------------------------------------------ + +The name of this contributor. + + +.. _reference.entry.contributors.href: + +:py:attr:`entries[i].contributors[j].href` +------------------------------------------ + +The :abbr:`URL (Uniform Resource Locator)` of this contributor. This can be +the contributor's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.entry.contributors.email: + +:py:attr:`entries[i].contributors[j].email` +------------------------------------------- + +The email address of this contributor. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:contributor +* /atom10:feed/atom10:entry/atom10:contributor +* /rss/channel/item/dc:contributor diff -Nru feedparser-5.0.1/docs/reference-entry-created.rst feedparser-5.1.2/docs/reference-entry-created.rst --- feedparser-5.0.1/docs/reference-entry-created.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-created.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,22 @@ +.. 
_reference.entry.created: + +:py:attr:`entries[i].created` +============================= + +The date this entry was first created (drafted), as a string in the same format +as it was published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.entry.created_parsed`. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:created +* /rdf:RDF/rdf:item/dcterms:created +* /rss/channel/item/dcterms:created + + +.. seealso:: + + * :ref:`reference.entry.created_parsed` diff -Nru feedparser-5.0.1/docs/reference-entry-created_parsed.rst feedparser-5.1.2/docs/reference-entry-created_parsed.rst --- feedparser-5.0.1/docs/reference-entry-created_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-created_parsed.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,19 @@ +.. _reference.entry.created_parsed: + +:py:attr:`entries[i].created_parsed` +==================================== + +The date this entry was first created (drafted), as a standard +:program:`Python` 9-tuple. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:created +* /rdf:RDF/rdf:item/dcterms:created +* /rss/channel/item/dcterms:created + + +.. seealso:: + + * :ref:`reference.entry.created` diff -Nru feedparser-5.0.1/docs/reference-entry-enclosures.rst feedparser-5.1.2/docs/reference-entry-enclosures.rst --- feedparser-5.0.1/docs/reference-entry-enclosures.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-enclosures.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,48 @@ +.. _reference.entry.enclosures: + +:py:attr:`entries[i].enclosures` +================================ + +A list of links to external files associated with this entry. + +Some aggregators automatically download enclosures (although this technique has +`known problems `_). Some aggregators +render each enclosure as a link. Most aggregators ignore them.
+ +The :abbr:`RSS (Rich Site Summary)` specification states that there can be at +most one enclosure per item. However, because some feeds break this rule, +:program:`Universal Feed Parser` captures all of them and makes them available +as a list. + +.. rubric:: Comes from + +- /atom10:feed/atom10:entry/atom10:link[@rel="enclosure"] +- /rss/channel/item/enclosure +- additionally, :ref:`certain links within embedded markup ` + + +.. _reference.entry.enclosures.href: + +:py:attr:`entries[i].enclosures[j].href` +---------------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the linked file. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.entry.enclosures.length: + +:py:attr:`entries[i].enclosures[j].length` +------------------------------------------ + +The length of the linked file. + + +.. _reference.entry.enclosures.type: + +:py:attr:`entries[i].enclosures[j].type` +---------------------------------------- + +The content type of the linked file. diff -Nru feedparser-5.0.1/docs/reference-entry-expired.rst feedparser-5.1.2/docs/reference-entry-expired.rst --- feedparser-5.0.1/docs/reference-entry-expired.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-expired.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,24 @@ +.. _reference.entry.expired: + +:py:attr:`entries[i].expired` +============================= + +The date this entry is set to expire, as a string in the same format as it was +published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.entry.expired_parsed`. + +This element is rare. It only existed in :abbr:`RSS (Rich Site Summary)` 0.93, +and it was never widely implemented by publishers. Most clients ignore it in +favor of user-defined expiration algorithms. + + +.. rubric:: Comes from + +* /rss/channel/item/expirationDate + + +..
seealso:: + + * :ref:`reference.entry.expired_parsed` diff -Nru feedparser-5.0.1/docs/reference-entry-expired_parsed.rst feedparser-5.1.2/docs/reference-entry-expired_parsed.rst --- feedparser-5.0.1/docs/reference-entry-expired_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-expired_parsed.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,20 @@ +.. _reference.entry.expired_parsed: + +:py:attr:`entries[i].expired_parsed` +==================================== + +The date this entry is set to expire, as a standard :program:`Python` 9-tuple. + +This element is rare. It only existed in :abbr:`RSS (Rich Site Summary)` 0.93, +and it was never widely implemented by publishers. Most clients ignore it in +favor of user-defined expiration algorithms. + + +.. rubric:: Comes from + +* /rss/channel/item/expirationDate + + +.. seealso:: + + * :ref:`reference.entry.expired` diff -Nru feedparser-5.0.1/docs/reference-entry-id.rst feedparser-5.1.2/docs/reference-entry-id.rst --- feedparser-5.0.1/docs/reference-entry-id.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-id.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,17 @@ +.. _reference.entry.id: + +:py:attr:`entries[i].id` +======================== + +A globally unique identifier for this entry. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:id +* /atom10:feed/atom10:entry/atom10:id +* /rdf:RDF/rdf:item/@rdf:about +* /rss/channel/item/guid diff -Nru feedparser-5.0.1/docs/reference-entry-license.rst feedparser-5.1.2/docs/reference-entry-license.rst --- feedparser-5.0.1/docs/reference-entry-license.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-license.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,16 @@ +.. 
_reference.entry.license: + +:py:attr:`entries[i].license` +============================= + +A :abbr:`URL (Uniform Resource Locator)` of the license under which this entry +is distributed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:link[@rel="license"]/@href +* /rdf:RDF/rdf:item/cc:license/@rdf:resource +* /rss/channel/item/creativeCommons:license diff -Nru feedparser-5.0.1/docs/reference-entry-link.rst feedparser-5.1.2/docs/reference-entry-link.rst --- feedparser-5.0.1/docs/reference-entry-link.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-link.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,33 @@ +.. _reference.entry.link: + +:py:attr:`entries[i].link` +========================== + +The primary link of this entry. Most feeds use this as the permanent link to +the entry in the site's archives. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + +Some :abbr:`RSS (Rich Site Summary)` feeds use guid when they mean link. guid +can also be used as an opaque identifier that has nothing to do with links. If +an :abbr:`RSS (Rich Site Summary)` feed uses guid as the entry link and no link +is present, :program:`Universal Feed Parser` detects this and makes the guid +available in :py:attr:`entries[i].link`. + +In other words, you can always use :py:attr:`entries[i].link` to get the entry +link, regardless of how the feed is actually structured. + + +.. rubric:: Comes from + +- /atom03:feed/atom03:entry/atom03:link[@rel="alternate"]/@href +- /atom10:feed/atom10:entry/atom10:link[@rel="alternate"]/@href +- /atom10:feed/atom10:entry/atom10:link[not(@rel)]/@href +- /rdf:RDF/rdf:item/rdf:link +- /rss/channel/item/link + + +.. 
seealso:: + + * :ref:`reference.entry.links` diff -Nru feedparser-5.0.1/docs/reference-entry-links.rst feedparser-5.1.2/docs/reference-entry-links.rst --- feedparser-5.0.1/docs/reference-entry-links.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-links.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,67 @@ +.. _reference.entry.links: + +:py:attr:`entries[i].links` +=========================== + +A list of dictionaries with details on the links associated with the feed. +Each link has a rel (relationship), type (content type), and href (the +:abbr:`URL (Uniform Resource Locator)` that the link points to). Some links +may also have a title. + + +.. _reference.entry.links.rel: + +:py:attr:`entries[i].links[j].rel` +---------------------------------- + +The relationship of this entry link. + +Atom 1.0 defines five standard link relationships and describes the process for +registering others. Here are the five standard rel values: + +* `alternate` +* `enclosure` +* `related` +* `self` +* `via` + + +.. _reference.entry.links.type: + +:py:attr:`entries[i].links[j].type` +----------------------------------- + +The content type of the page that this entry link points to. + + +.. _reference.entry.links.href: + +:py:attr:`entries[i].links[j].href` +----------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the page that this entry link +points to. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.entry.links.title: + +:py:attr:`entries[i].links[j].title` +------------------------------------ + +The title of this entry link. + + +.. rubric:: Comes from + +- /atom03:feed/atom03:entry/atom03:link +- /atom10:feed/atom10:entry/atom10:link +- /rdf:RDF/rdf:item/rdf:link +- /rss/channel/item/link + + +.. 
seealso:: + + * :ref:`reference.entry.link` diff -Nru feedparser-5.0.1/docs/reference-entry-published.rst feedparser-5.1.2/docs/reference-entry-published.rst --- feedparser-5.0.1/docs/reference-entry-published.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-published.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,24 @@ +.. _reference.entry.published: + +:py:attr:`entries[i].published` +=============================== + +The date this entry was first published, as a string in the same format as it +was published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.entry.published_parsed`. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:published +* /atom03:feed/atom03:entry/atom03:issued +* /rss/channel/item/dcterms:issued +* /rss/channel/item/pubDate +* /rdf:RDF/rdf:item/dcterms:issued + + +.. seealso:: + + * :ref:`reference.entry.published_parsed` diff -Nru feedparser-5.0.1/docs/reference-entry-published_parsed.rst feedparser-5.1.2/docs/reference-entry-published_parsed.rst --- feedparser-5.0.1/docs/reference-entry-published_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-published_parsed.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,21 @@ +.. _reference.entry.published_parsed: + +:py:attr:`entries[i].published_parsed` +====================================== + +The date this entry was first published, as a standard :program:`Python` +9-tuple. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:published +* /atom03:feed/atom03:entry/atom03:issued +* /rss/channel/item/dcterms:issued +* /rdf:RDF/rdf:item/dcterms:issued +* /rss/channel/item/pubDate + + +.. 
seealso:: + + * :ref:`reference.entry.published` diff -Nru feedparser-5.0.1/docs/reference-entry-publisher.rst feedparser-5.1.2/docs/reference-entry-publisher.rst --- feedparser-5.0.1/docs/reference-entry-publisher.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-publisher.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,18 @@ +.. _reference.entry.publisher: + +:py:attr:`entries[i].publisher` +=============================== + +The publisher of the entry. + + +.. rubric:: Comes from + +* /rss/item/dc:publisher +* /rss/item/itunes:owner +* /rdf:RDF/rdf:item/dc:publisher + + +.. seealso:: + + * :ref:`reference.entry.publisher_detail` diff -Nru feedparser-5.0.1/docs/reference-entry-publisher_detail.rst feedparser-5.1.2/docs/reference-entry-publisher_detail.rst --- feedparser-5.0.1/docs/reference-entry-publisher_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-publisher_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,42 @@ +.. _reference.entry.publisher_detail: + +:py:attr:`entries[i].publisher_detail` +====================================== + +A dictionary with details about the entry publisher. + + +:py:attr:`entries[i].publisher_detail.name` +------------------------------------------- + +The name of this entry's publisher. + + +.. _reference.entry.publisher_detail.href: + +:py:attr:`entries[i].publisher_detail.href` +------------------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of this entry's publisher. This can +be the publisher's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].publisher_detail.email` +-------------------------------------------- + +The email address of this entry's publisher. + + +.. 
rubric:: Comes from + +* /rss/item/dc:publisher +* /rss/item/itunes:owner +* /rdf:RDF/rdf:item/dc:publisher + + +.. seealso:: + + * :ref:`reference.entry.publisher` diff -Nru feedparser-5.0.1/docs/reference-entry-source.rst feedparser-5.1.2/docs/reference-entry-source.rst --- feedparser-5.0.1/docs/reference-entry-source.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-source.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,482 @@ +.. _reference.entry.source: + +:py:attr:`entries[i].source` +============================ + +A dictionary with details about the source of the entry. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:source + + +:py:attr:`entries[i].source.author` +----------------------------------- + +The author of the source of this entry. + + +:py:attr:`entries[i].source.author_detail` +------------------------------------------ + +A dictionary containing details about the author of the source of this entry. + + +:py:attr:`entries[i].source.author_detail.name` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The name of the author of the source of this entry. + + +.. _reference.entry.source.author_detail.href: + +:py:attr:`entries[i].source.author_detail.href` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :abbr:`URL (Uniform Resource Locator)` of the author of the source of this +entry. This can be the author's home page, or a contact page with a webmail +form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.author_detail.email` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The email address of the author of the source of this entry. + + + +:py:attr:`entries[i].source.contributors` +----------------------------------------- + +A list of contributors to the source of this entry. 
+ + +:py:attr:`entries[i].source.contributors[j].name` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The name of a contributor to the source of this entry. + + +.. _reference.entry.source.contributors.href: + +:py:attr:`entries[i].source.contributors[j].href` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :abbr:`URL (Uniform Resource Locator)` of a contributor to the source of +this entry. This can be the contributor's home page, or a contact page with a +webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.contributors[j].email` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The email address of a contributor to the source of this entry. + + + +:py:attr:`entries[i].source.icon` +--------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of an icon representing the source +of this entry. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + + +:py:attr:`entries[i].source.id` +------------------------------- + +A globally unique identifier for the source of this entry. + + + +:py:attr:`entries[i].source.link` +--------------------------------- + +The primary permanent link of the source of this entry + + + +:py:attr:`entries[i].source.links` +---------------------------------- + +A list of all links defined by the source of this entry. + + +:py:attr:`entries[i].source.links[j].rel` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The relationship of a link defined by the source of this entry. + +Atom 1.0 defines five standard link relationships and describes the process for +registering others. 
Here are the five standard rel values: + +* ``alternate`` +* ``self`` +* ``related`` +* ``via`` +* ``enclosure`` + + +:py:attr:`entries[i].source.links[j].type` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The content type of the page pointed to by a link defined by the source of this +entry. + + +.. _reference.entry.source.links.href: + +:py:attr:`entries[i].source.links[j].href` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :abbr:`URL (Uniform Resource Locator)` of the page pointed to by a link +defined by the source of this entry. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.links[j].title` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The title of a link defined by the source of this entry. + + + +:py:attr:`entries[i].source.logo` +--------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of a logo representing the source of +this entry. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + + +.. _reference.entry.source.rights: + +:py:attr:`entries[i].source.rights` +----------------------------------- + +A human-readable copyright statement for the source of this entry. + + + +:py:attr:`entries[i].source.rights_detail` +------------------------------------------ + +A dictionary containing details about the copyright statement for the source of +this entry. + + +:py:attr:`entries[i].source.rights_detail.value` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Same as :ref:`reference.entry.source.rights`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements +within this value may contain relative +:abbr:`URI (Uniform Resource Identifier)`\s. If so, they are +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.rights_detail.type` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The content type of the copyright statement for the source of this entry. + +Most likely values for :py:attr:`~entries[i].source.rights_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For +:abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by +inspecting the content, and defaults to :mimetype:`text/html`. Note that this +may cause silent data loss if the value contains plain text with angle +brackets. There is nothing I can do about this problem; it is a limitation of +:abbr:`RSS (Rich Site Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`entries[i].source.rights_detail.language` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The language of the copyright statement for the source of this entry. + +:py:attr:`~entries[i].source.rights_detail.language` is supposed to be a +language code, as specified by `RFC 3066`_, but publishers have been known to +publish random values like "English" or "German". +:program:`Universal Feed Parser` does not do any parsing or normalization of +language codes. + +.. 
_RFC 3066: http://www.ietf.org/rfc/rfc3066.txt + +:py:attr:`~entries[i].source.rights_detail.language` may come from the +element's xml:lang attribute, or it may inherit from a parent element's +xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` +header. If the feed does not specify a language, +:py:attr:`~entries[i].source.rights_detail.language` will be ``None``, the +:program:`Python` null value. + + +:py:attr:`entries[i].source.rights_detail.base` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the copyright statement for the source of this entry. + +:py:attr:`entries[i].source.rights_detail.base` is only useful in rare +situations and can usually be ignored. It is the original base +:abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the +element's xml:base attribute, or a parent element's xml:base, or the +appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the +:abbr:`URI (Uniform Resource Identifier)` of the feed. (See +:ref:`advanced.base` for more details.) By the time you see it, +:program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + + +.. _reference.entry.source.subtitle: + +:py:attr:`entries[i].source.subtitle` +------------------------------------- + +A subtitle, tagline, slogan, or other short description of the source of this +entry. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements +within this value may contain relative +:abbr:`URI (Uniform Resource Identifier)`\s. If so, they are +:ref:`resolved according to a set of rules `. 
+ + + +:py:attr:`entries[i].source.subtitle_detail` +-------------------------------------------- + +A dictionary containing details about the subtitle for the source of this +entry. + + +:py:attr:`entries[i].source.subtitle_detail.value` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Same as :ref:`reference.entry.source.subtitle`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements +within this value may contain relative +:abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.subtitle_detail.type` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The content type of the subtitle of the source of this entry. + +Most likely values for :py:attr:`~entries[i].source.subtitle_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For +:abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by +inspecting the content, and defaults to :mimetype:`text/html`. Note that this +may cause silent data loss if the value contains plain text with angle +brackets. There is nothing I can do about this problem; it is a limitation of +:abbr:`RSS (Rich Site Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. 
+ + +:py:attr:`entries[i].source.subtitle_detail.language` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The language of the subtitle of the source of this entry. + +:py:attr:`~entries[i].source.subtitle_detail.language` is supposed to be a +language code, as specified by `RFC 3066`_, but publishers have been known to +publish random values like "English" or "German". +:program:`Universal Feed Parser` does not do any parsing or normalization of +language codes. + +:py:attr:`~entries[i].source.subtitle_detail.language` may come from the +element's xml:lang attribute, or it may inherit from a parent element's +xml:lang, or the Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` +header. If the feed does not specify a language, +:py:attr:`~entries[i].source.subtitle_detail.language` will be ``None``, the +:program:`Python` null value. + + +:py:attr:`entries[i].source.subtitle_detail.base` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the subtitle of the source of this entry. + +:py:attr:`entries[i].source.subtitle_detail.base` is only useful in rare +situations and can usually be ignored. It is the original base +:abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the +element's xml:base attribute, or a parent element's xml:base, or the +appropriate :abbr:`HTTP (Hypertext Transfer Protocol)` header, or the +:abbr:`URI (Uniform Resource Identifier)` of the feed. (See +:ref:`advanced.base` for more details.) By the time you see it, +:program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + + +.. _reference.entry.source.title: + +:py:attr:`entries[i].source.title` +---------------------------------- + +The title of the source of this entry. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + + +:py:attr:`entries[i].source.title_detail` +----------------------------------------- + +A dictionary containing details about the title for the source of this entry. + + +:py:attr:`entries[i].source.title_detail.value` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Same as :ref:`reference.entry.source.title`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, it is +:ref:`sanitized ` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].source.title_detail.type` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The content type of the title of the source of this entry. + +Most likely values for :py:attr:`entries[i].source.title_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For +:abbr:`RSS (Rich Site Summary)` feeds, the content type is auto-determined by +inspecting the content, and defaults to :mimetype:`text/html`. Note that this +may cause silent data loss if the value contains plain text with angle +brackets. 
There is nothing I can do about this problem; it is a limitation of +:abbr:`RSS (Rich Site Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`entries[i].source.title_detail.language` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The language of the title of the source of this entry. + +:py:attr:`~entries[i].source.title_detail.language` is supposed to be a +language code, as specified by `RFC 3066`_, but publishers have been known to +publish random values like "English" or "German". +:program:`Universal Feed Parser` does not do any parsing or normalization of language codes. + +:py:attr:`~entries[i].source.title_detail.language` may come from the element's +xml:lang attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the +feed does not specify a language, +:py:attr:`~entries[i].source.title_detail.language` will be ``None``, the +:program:`Python` null value. + + +:py:attr:`entries[i].source.title_detail.base` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the title of the source of this entry. + +:py:attr:`entries[i].source.title_detail.base` is only useful in rare +situations and can usually be ignored. It is the original base +:abbr:`URI (Uniform Resource Identifier)` for this value, as specified by the element's +xml:base attribute, or a parent element's xml:base, or the appropriate +:abbr:`HTTP (Hypertext Transfer Protocol)` header, or the +:abbr:`URI (Uniform Resource Identifier)` of the feed. (See :ref:`advanced.base` for more +details.) By the time you see it, :program:`Universal Feed Parser` has already +resolved relative links in all values where it makes sense to do so. 
*Clients +should never need to manually resolve relative links.* + + +:py:attr:`entries[i].source.updated` +------------------------------------ + +The date the source of this entry was last updated, as a string in the same +format as it was published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.entry.source.updated_parsed`. + + + +.. _reference.entry.source.updated_parsed: + +:py:attr:`entries[i].source.updated_parsed` +------------------------------------------- + +The date the source of this entry was last updated, as a standard :program:`Python` 9-tuple. diff -Nru feedparser-5.0.1/docs/reference-entry-summary.rst feedparser-5.1.2/docs/reference-entry-summary.rst --- feedparser-5.0.1/docs/reference-entry-summary.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-summary.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,43 @@ +.. _reference.entry.summary: + +:py:attr:`entries[i].summary` +============================= + +A summary of the entry. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + +Some publishing systems auto-generate this value from the first few words or +first paragraph of the entry. Other publishing systems misuse it to include +the full content. In the latter cases, :program:`Universal Feed Parser` ought +to detect it and put the value in :ref:`reference.entry.content` instead, but +it doesn't. + + +.. note:: + + Some feeds include both a summary and description element for each entry. 
In + this case, the first element will be available in ``entry['summary']`` and the + second will be available in ``entry['content'][0]``. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:summary +* /atom03:feed/atom03:entry/atom03:summary +* /rss/channel/item/description +* /rss/channel/item/dc:description +* /rdf:RDF/rdf:item/rdf:description +* /rdf:RDF/rdf:item/dc:description + + +.. seealso:: + + * :ref:`reference.entry.summary_detail` diff -Nru feedparser-5.0.1/docs/reference-entry-summary_detail.rst feedparser-5.1.2/docs/reference-entry-summary_detail.rst --- feedparser-5.0.1/docs/reference-entry-summary_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-summary_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,105 @@ +.. _reference.entry.summary_detail: + +:py:attr:`entries[i].summary_detail` +==================================== + +A dictionary with details about the entry summary. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:summary +* /atom03:feed/atom03:entry/atom03:summary +* /rss/channel/item/description +* /rss/channel/item/dc:description +* /rdf:RDF/rdf:item/rdf:description +* /rdf:RDF/rdf:item/dc:description + + +.. seealso:: + + * :ref:`reference.entry.summary` + + +.. _reference.entry.summary_detail.value: + +:py:attr:`entries[i].summary_detail.value` +------------------------------------------ + +Same as :ref:`reference.entry.summary`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it will be :ref:`parsed for +microformats `. + + +.. _reference.entry.summary_detail.type: + +:py:attr:`entries[i].summary_detail.type` +----------------------------------------- + +The content type of the entry summary. + +Most likely values for :py:attr:`~entries[i].summary_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`entries[i].summary_detail.language` +--------------------------------------------- + +The language of the entry summary. + +:py:attr:`~entries[i].summary_detail.language` is supposed to be a language +code, as specified by `RFC 3066`_, but publishers have been known to +publish random values like "English" or "German". :program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +.. _RFC 3066: http://www.ietf.org/rfc/rfc3066.txt + +:py:attr:`~entries[i].summary_detail.language` may come from the element's +xml:lang attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. 
If the +feed does not specify a language, +:py:attr:`~entries[i].summary_detail.language` will be ``None``, the +:program:`Python` null value. + + +:py:attr:`entries[i].summary_detail.base` +----------------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the entry summary. + +:py:attr:`~entries[i].summary_detail.base` is only useful in rare situations +and can usually be ignored. It is the original base :abbr:`URI (Uniform +Resource Identifier)` for this value, as specified by the element's xml:base +attribute, or a parent element's xml:base, or the appropriate :abbr:`HTTP +(Hypertext Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource +Identifier)` of the feed. (See :ref:`advanced.base` for more details.) By the +time you see it, :program:`Universal Feed Parser` has already resolved relative +links in all values where it makes sense to do so. *Clients should never need +to manually resolve relative links.* diff -Nru feedparser-5.0.1/docs/reference-entry-tags.rst feedparser-5.1.2/docs/reference-entry-tags.rst --- feedparser-5.0.1/docs/reference-entry-tags.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-tags.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,46 @@ +.. _reference.entry.tags: + +:py:attr:`entries[i].tags` +========================== + +A list of dictionaries that contain details of the categories for the entry. + + +.. note:: + + Prior to version 4.0, :program:`Universal Feed Parser` exposed categories in + ``feed.category`` (the primary category) and ``feed.categories`` (a list of + tuples containing the domain and term of each category). These uses are still + supported for backward compatibility, but you will not see them in the parsed + results unless you explicitly ask for them. + + +.. _reference.entry.tags.term: + +:py:attr:`entries[i].tags[j].term` +---------------------------------- + +The category term (keyword). 
+ + +:py:attr:`entries[i].tags[j].scheme` +------------------------------------ + +The category scheme (domain). + + +:py:attr:`entries[i].tags[j].label` +----------------------------------- + +A human-readable label for the category. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/category +* /atom03:feed/atom03:entry/dc:subject +* /rss/channel/item/category +* /rss/channel/item/dc:subject +* /rss/channel/item/itunes:category +* /rss/channel/item/itunes:keywords +* /rdf:RDF/rdf:channel/rdf:item/dc:subject diff -Nru feedparser-5.0.1/docs/reference-entry-title.rst feedparser-5.1.2/docs/reference-entry-title.rst --- feedparser-5.0.1/docs/reference-entry-title.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-title.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,30 @@ +.. _reference.entry.title: + +:py:attr:`entries[i].title` +=========================== + +The title of the entry. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:title +* /atom10:feed/atom10:entry/atom10:title +* /rdf:RDF/rdf:item/dc:title +* /rdf:RDF/rdf:item/rdf:title +* /rss/channel/item/dc:title +* /rss/channel/item/title + + +.. 
seealso:: + + * :ref:`reference.entry.title_detail` diff -Nru feedparser-5.0.1/docs/reference-entry-title_detail.rst feedparser-5.1.2/docs/reference-entry-title_detail.rst --- feedparser-5.0.1/docs/reference-entry-title_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-title_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,98 @@ +.. _reference.entry.title_detail: + +:py:attr:`entries[i].title_detail` +================================== + +A dictionary with details about the entry title. + + +.. _reference.entry.title_detail.value: + +:py:attr:`entries[i].title_detail.value` +---------------------------------------- + +Same as :ref:`reference.entry.title`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].title_detail.type` +--------------------------------------- + +The content type of the entry title. + +Most likely values for :py:attr:`~entries[i].title_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. 
+ +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`entries[i].title_detail.language` +------------------------------------------- + +The language of the entry title. + +:py:attr:`~entries[i].title_detail.language` is supposed to be a language code, +as specified by `RFC 3066`_, but publishers have been known to +publish random values like "English" or "German". :program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +.. _RFC 3066: http://www.ietf.org/rfc/rfc3066.txt + +:py:attr:`~entries[i].title_detail.language` may come from the element's +xml:lang attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the +feed does not specify a language, :py:attr:`~entries[i].title_detail.language` +will be ``None``, the :program:`Python` null value. + + +:py:attr:`entries[i].title_detail.base` +--------------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the entry title. + +:py:attr:`~entries[i].title_detail.base` is only useful in rare situations and +can usually be ignored. It is the original base :abbr:`URI (Uniform Resource +Identifier)` for this value, as specified by the element's xml:base attribute, +or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext +Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of +the feed. (See :ref:`advanced.base` for more details.) By the time you see +it, :program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + +.. 
rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:title +* /atom03:feed/atom03:entry/atom03:title +* /rss/channel/item/title +* /rss/channel/item/dc:title +* /rdf:RDF/rdf:item/rdf:title +* /rdf:RDF/rdf:item/dc:title + + +.. seealso:: + + * :ref:`reference.entry.title` diff -Nru feedparser-5.0.1/docs/reference-entry-updated.rst feedparser-5.1.2/docs/reference-entry-updated.rst --- feedparser-5.0.1/docs/reference-entry-updated.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-updated.rst 2012-02-19 21:50:48.000000000 +0000 @@ -0,0 +1,41 @@ +.. _reference.entry.updated: + +:py:attr:`entries[i].updated` +============================= + +The date this entry was last updated, as a string in the same format as it was +published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.entry.updated_parsed`. + + +.. note:: + + As of version 5.1.1, if this key doesn't exist but + :py:attr:`entries[i].published` does, the value of + :py:attr:`entries[i].published` will be returned. + + In the past the RSS pubDate element was stored in `updated`, but this incorrect + behavior was reported in issue 310. However, developers may have come to rely + on this incorrect behavior -- as was reported in issue 328 -- so to help avoid + hurting their users' experience, this mapping from `updated` to `published` was + temporarily introduced to give developers time to update their software, and to + give users time to upgrade. + + This mapping is temporary and will be removed in a future version of + feedparser. + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:modified +* /atom10:feed/atom10:entry/atom10:updated +* /rdf:RDF/rdf:item/dc:date +* /rdf:RDF/rdf:item/dcterms:modified +* /rss/channel/item/dc:date +* /rss/channel/item/dcterms:modified + + +.. 
seealso:: + + * :ref:`reference.entry.updated_parsed` diff -Nru feedparser-5.0.1/docs/reference-entry-updated_parsed.rst feedparser-5.1.2/docs/reference-entry-updated_parsed.rst --- feedparser-5.0.1/docs/reference-entry-updated_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-updated_parsed.rst 2012-02-19 21:50:48.000000000 +0000 @@ -0,0 +1,38 @@ +.. _reference.entry.updated_parsed: + +:py:attr:`entries[i].updated_parsed` +==================================== + +The date this entry was last updated, as a standard :program:`Python` 9-tuple. + + +.. note:: + + As of version 5.1.1, if this key doesn't exist but + :py:attr:`entries[i].published_parsed` does, the value of + :py:attr:`entries[i].published_parsed` will be returned. + + In the past the RSS pubDate element was stored in `updated`, but this incorrect + behavior was reported in issue 310. However, developers may have come to rely + on this incorrect behavior -- as was reported in issue 328 -- so to help avoid + hurting their users' experience, this mapping from `updated_parsed` to + `published_parsed` was temporarily introduced to give developers time to update + their software, and to give users time to upgrade. + + This mapping is temporary and will be removed in a future version of + feedparser. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:entry/atom10:updated +* /atom03:feed/atom03:entry/atom03:modified +* /rss/channel/item/dc:date +* /rss/channel/item/dcterms:modified +* /rdf:RDF/rdf:item/dc:date +* /rdf:RDF/rdf:item/dcterms:modified + + +.. seealso:: + + * :ref:`reference.entry.updated` diff -Nru feedparser-5.0.1/docs/reference-entry-vcard.rst feedparser-5.1.2/docs/reference-entry-vcard.rst --- feedparser-5.0.1/docs/reference-entry-vcard.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-vcard.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,25 @@ +.. 
_reference.entry.vcard: + +:py:attr:`entries[i].vcard` +=========================== + +An RFC 2426-compliant vCard derived from :ref:`hCard information +` found in this entry's :abbr:`HTML (HyperText +Markup Language)` content. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:content +* /atom03:feed/atom03:entry/atom03:summary +* /atom10:feed/atom10:entry/atom10:content +* /atom10:feed/atom10:entry/atom10:summary +* /rdf:RDF/rdf:item/content:encoded +* /rdf:RDF/rdf:item/dc:description +* /rdf:RDF/rdf:item/rdf:description +* /rss/channel/item/body +* /rss/channel/item/content:encoded +* /rss/channel/item/dc:description +* /rss/channel/item/description +* /rss/channel/item/fullitem +* /rss/channel/item/xhtml:body diff -Nru feedparser-5.0.1/docs/reference-entry-xfn.rst feedparser-5.1.2/docs/reference-entry-xfn.rst --- feedparser-5.0.1/docs/reference-entry-xfn.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry-xfn.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,52 @@ +.. _reference.entry.xfn: + +:py:attr:`entries[i].xfn` +========================= + +A list of :ref:`XFN relationships ` found in this +entry's :abbr:`HTML (HyperText Markup Language)` content. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry/atom03:content +* /atom03:feed/atom03:entry/atom03:summary +* /atom10:feed/atom10:entry/atom10:content +* /atom10:feed/atom10:entry/atom10:summary +* /rdf:RDF/rdf:item/content:encoded +* /rdf:RDF/rdf:item/dc:description +* /rdf:RDF/rdf:item/rdf:description +* /rss/channel/item/body +* /rss/channel/item/content:encoded +* /rss/channel/item/dc:description +* /rss/channel/item/description +* /rss/channel/item/fullitem +* /rss/channel/item/xhtml:body + +entries[i].xfn is a list. Each list item represents a single person and may +contain the following values: + + +:py:attr:`entries[i].xfn[j].relationships` +------------------------------------------ + +A list of relationships for this person. 
Each list item is a string, either +one of the constants defined in the `XFN 1.1 profile`_ or :ref:`one of these +variations `. + +.. _XFN 1.1 profile: http://gmpg.org/xfn/11 + + +:py:attr:`entries[i].xfn[j].href` +--------------------------------- + +The :abbr:`URI (Uniform Resource Identifier)` for this person. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`entries[i].xfn[j].name` +--------------------------------- + +The name of this person, a string. diff -Nru feedparser-5.0.1/docs/reference-entry.rst feedparser-5.1.2/docs/reference-entry.rst --- feedparser-5.0.1/docs/reference-entry.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-entry.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,18 @@ +:py:attr:`entries` +================== + +A list of dictionaries. Each dictionary contains data from a different entry. +Entries are listed in the order in which they appear in the original feed. + + +.. tip:: + + This element always exists, although it may be an empty list. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:entry +* /atom10:feed/atom10:entry +* /rdf:RDF/rdf:item +* /rss/channel/item diff -Nru feedparser-5.0.1/docs/reference-etag.rst feedparser-5.1.2/docs/reference-etag.rst --- feedparser-5.0.1/docs/reference-etag.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-etag.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,13 @@ +:py:attr:`etag` +=============== + +The ETag of the feed, as specified in the :abbr:`HTTP (Hypertext Transfer Protocol)` headers. + +The purpose of :py:attr:`etag` is explained more fully in :ref:`http.etag`. + +.. tip:: + + :py:attr:`etag` will only be present if the feed was retrieved from a web server, and + only if the web server provided an ETag :abbr:`HTTP (Hypertext Transfer Protocol)` + header for the feed. 
If the feed was parsed from a local file or from a string + in memory, :py:attr:`etag` will not be present. diff -Nru feedparser-5.0.1/docs/reference-feed-author.rst feedparser-5.1.2/docs/reference-feed-author.rst --- feedparser-5.0.1/docs/reference-feed-author.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-author.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,23 @@ +.. _reference.feed.author: + +:py:attr:`feed.author` +====================== + +The author of this feed. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:author +* /atom10:feed/atom10:author +* /rdf:RDF/rdf:channel/dc:author +* /rdf:RDF/rdf:channel/dc:creator +* /rss/channel/dc:author +* /rss/channel/dc:creator +* /rss/channel/itunes:author +* /rss/channel/managingEditor + + +.. seealso:: + + * :ref:`reference.feed.author_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-author_detail.rst feedparser-5.1.2/docs/reference-feed-author_detail.rst --- feedparser-5.0.1/docs/reference-feed-author_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-author_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,51 @@ +.. _reference.feed.author_detail: + +:py:attr:`feed.author_detail` +============================= + +A dictionary with details about the feed author. + + +.. _reference.feed.author_detail.name: + +:py:attr:`feed.author_detail.name` +---------------------------------- + +The name of the feed author. + + +.. _reference.feed.author_detail.href: + +:py:attr:`feed.author_detail.href` +---------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the feed author. This can be the +author's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. 
_reference.feed.author_detail.email: + +:py:attr:`feed.author_detail.email` +----------------------------------- + +The email address of the feed author. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:author +* /atom10:feed/atom10:author +* /rdf:RDF/rdf:channel/dc:author +* /rdf:RDF/rdf:channel/dc:creator +* /rss/channel/dc:author +* /rss/channel/dc:creator +* /rss/channel/itunes:author +* /rss/channel/managingEditor + + +.. seealso:: + + * :ref:`reference.feed.author` diff -Nru feedparser-5.0.1/docs/reference-feed-cloud.rst feedparser-5.1.2/docs/reference-feed-cloud.rst --- feedparser-5.0.1/docs/reference-feed-cloud.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-cloud.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,65 @@ +:py:attr:`feed.cloud` +===================== + +No one really knows what a cloud is. It is vaguely documented in `:abbr:`SOAP +(Simple Object Access Protocol)` meets :abbr:`RSS (Rich Site Summary)` +`_. + + +.. _reference.feed.cloud.domain: + +:py:attr:`feed.cloud.domain` +---------------------------- + +The domain of the cloud. Should be just the domain name, not including the +http:// protocol. All clouds are presumed to operate over :abbr:`HTTP +(Hypertext Transfer Protocol)`. The cloud specification does not support +secure clouds over :abbr:`HTTPS`, nor can clouds operate over other protocols. + + +.. _reference.feed.cloud.port: + +:py:attr:`feed.cloud.port` +-------------------------- + +The port of the cloud. Should be an integer, but :program:`Universal Feed +Parser` currently returns it as a string. + + +.. _reference.feed.cloud.path: + +:py:attr:`feed.cloud.path` +-------------------------- + +The :abbr:`URL (Uniform Resource Locator)` path of the cloud. + + +.. _reference.feed.cloud.registerProcedure: + +:py:attr:`feed.cloud.registerProcedure` +--------------------------------------- + +The name of the procedure to call on the cloud. + + +.. 
_reference.feed.cloud.protocol: + +:py:attr:`feed.cloud.protocol` +------------------------------ + +The protocol of the cloud. Documentation differs on what the acceptable values +are. Acceptable values definitely include xml-rpc and soap, although only in +lowercase, despite both being acronyms. + +There is no way for a publisher to specify the version number of the protocol +to use. soap refers to :abbr:`SOAP (Simple Object Access Protocol)` 1.1; the +cloud interface does not support :abbr:`SOAP (Simple Object Access Protocol)` +1.0 or 1.2. + +post or http-post might also be acceptable values; nobody really knows for +sure. + + +.. rubric:: Comes from + +* /rss/channel/cloud diff -Nru feedparser-5.0.1/docs/reference-feed-contributors.rst feedparser-5.1.2/docs/reference-feed-contributors.rst --- feedparser-5.0.1/docs/reference-feed-contributors.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-contributors.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,35 @@ +:py:attr:`feed.contributors` +============================ + +A list of contributors (secondary authors) to this feed. + + +:py:attr:`feed.contributors[i].name` +------------------------------------ + +The name of this contributor. + + +.. _reference.feed.contributors.href: + +:py:attr:`feed.contributors[i].href` +------------------------------------ + +The :abbr:`URL (Uniform Resource Locator)` of this contributor. This can be +the contributor's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`feed.contributors[i].email` +------------------------------------- + +The email address of this contributor. + + +.. 
rubric:: Comes from + +* /atom03:feed/atom03:contributor +* /atom10:feed/atom10:contributor +* /rss/channel/dc:contributor diff -Nru feedparser-5.0.1/docs/reference-feed-docs.rst feedparser-5.1.2/docs/reference-feed-docs.rst --- feedparser-5.0.1/docs/reference-feed-docs.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-docs.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,20 @@ +.. _reference.feed.docs: + +:py:attr:`feed.docs` +==================== + +A :abbr:`URL (Uniform Resource Locator)` pointing to the specification which +this feed conforms to. + +This element is rare. The reasoning was that in 25 years, someone will stumble +on an :abbr:`RSS (Rich Site Summary)` feed and not know what it is, so we +should waste everyone's bandwidth with useless links until then. Most +publishers skip it, and all clients ignore it. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /rss/channel/docs diff -Nru feedparser-5.0.1/docs/reference-feed-errorreportsto.rst feedparser-5.1.2/docs/reference-feed-errorreportsto.rst --- feedparser-5.0.1/docs/reference-feed-errorreportsto.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-errorreportsto.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,10 @@ +.. _reference.feed.errorreportsto: + +:py:attr:`feed.errorreportsto` +============================== + +An email address for reporting errors in the feed itself. + +.. rubric:: Comes from + +* /rdf:RDF/admin:errorReportsTo/@rdf:resource diff -Nru feedparser-5.0.1/docs/reference-feed-generator.rst feedparser-5.1.2/docs/reference-feed-generator.rst --- feedparser-5.0.1/docs/reference-feed-generator.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-generator.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,19 @@ +.. 
_reference.feed.generator: + +:py:attr:`feed.generator` +========================= + +A human-readable name of the application used to generate the feed. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:generator +* /atom10:feed/atom10:generator +* /rdf:RDF/rdf:channel/admin:generatorAgent/@rdf:resource +* /rss/channel/generator + + +.. seealso:: + + * :ref:`reference.feed.generator_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-generator_detail.rst feedparser-5.1.2/docs/reference-feed-generator_detail.rst --- feedparser-5.0.1/docs/reference-feed-generator_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-generator_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,48 @@ +.. _reference.feed.generator_detail: + +:py:attr:`feed.generator_detail` +================================ + +A dictionary with details about the feed generator. + + + +:py:attr:`feed.generator_detail.name` +------------------------------------- + +Same as :ref:`reference.feed.generator`. + + +.. _reference.feed.generator_detail.href: + +:py:attr:`feed.generator_detail.href` +------------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the application used to generate +the feed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.feed.generator_detail.version: + +:py:attr:`feed.generator_detail.version` +---------------------------------------- + +The version number of the application used to generate the feed. There is no +required format for this, but most applications use a MAJOR.MINOR version +number. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:generator +* /atom10:feed/atom10:generator +* /rdf:RDF/rdf:channel/admin:generatorAgent/@rdf:resource +* /rss/channel/generator + + +.. 
seealso:: + + * :ref:`reference.feed.generator` diff -Nru feedparser-5.0.1/docs/reference-feed-icon.rst feedparser-5.1.2/docs/reference-feed-icon.rst --- feedparser-5.0.1/docs/reference-feed-icon.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-icon.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,12 @@ +:py:attr:`feed.icon` +==================== + +A URL to a small icon representing the feed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:icon diff -Nru feedparser-5.0.1/docs/reference-feed-id.rst feedparser-5.1.2/docs/reference-feed-id.rst --- feedparser-5.0.1/docs/reference-feed-id.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-id.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,15 @@ +.. _reference.feed.id: + +:py:attr:`feed.id` +================== + +A globally unique identifier for this feed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:id +* /atom10:feed/atom10:id diff -Nru feedparser-5.0.1/docs/reference-feed-image.rst feedparser-5.1.2/docs/reference-feed-image.rst --- feedparser-5.0.1/docs/reference-feed-image.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-image.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,107 @@ +:py:attr:`feed.image` +===================== + +A dictionary with details about the feed image. A feed image can be a logo, +banner, or a picture of the author. + + +.. _reference.feed.image.title: + +:py:attr:`feed.image.title` +--------------------------- + +The alternate text of the feed image, which would go in the alt attribute if +you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img +element. + + +.. 
_reference.feed.image.href: + +:py:attr:`feed.image.href` +-------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the feed image itself, which +would go in the src attribute if you rendered the feed image as an :abbr:`HTML +(HyperText Markup Language)` img element. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.feed.image.link: + +:py:attr:`feed.image.link` +-------------------------- + +The :abbr:`URL (Uniform Resource Locator)` which the feed image would point to. +If you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` +img element, you would wrap it in an a element and put this in the href +attribute. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.feed.image.width: + +:py:attr:`feed.image.width` +--------------------------- + +The width of the feed image, which would go in the width attribute if you +rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img +element. + + +.. _reference.feed.image.height: + +:py:attr:`feed.image.height` +---------------------------- + +The height of the feed image, which would go in the height attribute if you +rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img +element. + + +:py:attr:`feed.image.description` +--------------------------------- + +A short description of the feed image, which would go in the title attribute if +you rendered the feed image as an :abbr:`HTML (HyperText Markup Language)` img +element. This element is rare; it was available in Netscape :abbr:`RSS (Rich +Site Summary)` 0.91 but was dropped from Userland :abbr:`RSS (Rich Site +Summary)` 0.91. + + +.. 
rubric:: Annotated example + +This is a feed image: +:: + + + <image> + <title>Feed logo</title> + <url>http://example.org/logo.png</url> + <link>http://example.org/</link> + <width>80</width> + <height>15</height> + <description>Visit my home page</description> + </image> + + +This feed image could be rendered in :abbr:`HTML (HyperText Markup Language)` as this: +:: + + + <a href="http://example.org/">
+ <img src="http://example.org/logo.png" width="80" height="15" alt="Feed logo" title="Visit my home page"> + </a> + + +.. rubric:: Comes from + +* /rdf:RDF/rdf:image +* /rss/channel/image diff -Nru feedparser-5.0.1/docs/reference-feed-info-detail.rst feedparser-5.1.2/docs/reference-feed-info-detail.rst --- feedparser-5.0.1/docs/reference-feed-info-detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-info-detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,94 @@ +.. _reference.feed.info_detail: + +:py:attr:`feed.info_detail` +=========================== + +A dictionary with details about the feed info. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:info + + +.. seealso:: + + * :ref:`reference.feed.info` + + +.. _reference.feed.info_detail.value: + +:py:attr:`feed.info_detail.value` +--------------------------------- + +Same as :ref:`reference.feed.info`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. _reference.feed.info_detail.type: + +:py:attr:`feed.info_detail.type` +-------------------------------- + +The content type of the feed info. + +Most likely values for :py:attr:`~feed.info_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. 
There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`feed.info_detail.language` +------------------------------------ + +The language of the feed info. + +:py:attr:`~feed.info_detail.language` is supposed to be a language code, as +specified by `:abbr:`RFC (Request For Comments)` 3066 +<http://www.ietf.org/rfc/rfc3066.txt>`_, but publishers have been known to +publish random values like "English" or "German". :program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +:py:attr:`~feed.info_detail.language` may come from the element's xml:lang +attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the +feed does not specify a language, :py:attr:`~feed.info_detail.language` will be +``None``, the :program:`Python` null value. + + +:py:attr:`feed.info_detail.base` +-------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the feed info. + +:py:attr:`~feed.info_detail.base` is only useful in rare situations and can +usually be ignored. It is the original base :abbr:`URI (Uniform Resource +Identifier)` for this value, as specified by the element's xml:base attribute, +or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext +Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of +the feed. (See :ref:`advanced.base` for more details.) By the time you see +it, :program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. 
*Clients should never need to manually +resolve relative links.* diff -Nru feedparser-5.0.1/docs/reference-feed-info.rst feedparser-5.1.2/docs/reference-feed-info.rst --- feedparser-5.0.1/docs/reference-feed-info.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-info.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,28 @@ +.. _reference.feed.info: + +:py:attr:`feed.info` +==================== + +Free-form human-readable description of the feed format itself. Intended for +people who view the feed in a browser, to explain what they just clicked on. +This element is generally ignored by feed readers. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:info +* /rss/channel/feedburner:browserFriendly + + +.. seealso:: + + * :ref:`reference.feed.info_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-language.rst feedparser-5.1.2/docs/reference-feed-language.rst --- feedparser-5.0.1/docs/reference-feed-language.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-language.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,15 @@ +.. _reference.feed.language: + +:py:attr:`feed.language` +======================== + +The primary language of the feed. + + +.. 
rubric:: Comes from + +* /atom03:feed/@xml:lang +* /atom10:feed/@xml:lang +* /rdf:RDF/rdf:channel/dc:language +* /rss/channel/dc:language +* /rss/channel/language diff -Nru feedparser-5.0.1/docs/reference-feed-license.rst feedparser-5.1.2/docs/reference-feed-license.rst --- feedparser-5.0.1/docs/reference-feed-license.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-license.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,17 @@ +.. _reference.feed.license: + +:py:attr:`feed.license` +======================= + +A :abbr:`URL (Uniform Resource Locator)` of the license under which this feed +is distributed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:link[@rel="license"]/@href +* /rdf:RDF/cc:license/@rdf:resource +* /rss/channel/creativeCommons:license diff -Nru feedparser-5.0.1/docs/reference-feed-link.rst feedparser-5.1.2/docs/reference-feed-link.rst --- feedparser-5.0.1/docs/reference-feed-link.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-link.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,29 @@ +.. _reference.feed.link: + +:py:attr:`feed.link` +==================== + +The :abbr:`URL (Uniform Resource Locator)` of the :abbr:`HTML (HyperText Markup +Language)` page associated with this feed. + +For site feeds, this is probably the home page of the site. For category +feeds, this is probably the category's archive page. For search feeds, this is +probably the web page that displays the search results for the given search +parameters. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. 
rubric:: Comes from + +* /atom03:feed/atom03:link[@rel="alternate"]/@href +* /atom10:feed/atom10:link[@rel="alternate"]/@href +* /atom10:feed/atom10:link[not(@rel)]/@href +* /rdf:RDF/rdf:channel/rdf:link +* /rss/channel/link + + +.. seealso:: + + * :ref:`reference.feed.links` diff -Nru feedparser-5.0.1/docs/reference-feed-links.rst feedparser-5.1.2/docs/reference-feed-links.rst --- feedparser-5.0.1/docs/reference-feed-links.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-links.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,65 @@ +.. _reference.feed.links: + +:py:attr:`feed.links` +===================== + +A list of dictionaries with details on the links associated with the feed. +Each link has a rel (relationship), type (content type), and href (the +:abbr:`URL (Uniform Resource Locator)` that the link points to). Some links +may also have a title. + + +.. _reference.feed.links.rel: + +:py:attr:`feed.links[i].rel` +---------------------------- + +The relationship of this feed link. + +Atom 1.0 defines five standard link relationships and describes the process for +registering others. Here are the five standard rel values: + +- `alternate` +- `enclosure` +- `related` +- `self` +- `via` + + +.. _reference.feed.links.type: + +:py:attr:`feed.links[i].type` +----------------------------- + +The content type of the page that this feed link points to. + + +.. _reference.feed.links.href: + +:py:attr:`feed.links[i].href` +----------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of the page that this feed link +points to. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`feed.links[i].title` +------------------------------ + +The title of this feed link. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:link +* /atom10:feed/atom10:link +* /rdf:RDF/rdf:channel/rdf:link +* /rss/channel/link + + +.. 
seealso:: + + * :ref:`reference.feed.link` diff -Nru feedparser-5.0.1/docs/reference-feed-logo.rst feedparser-5.1.2/docs/reference-feed-logo.rst --- feedparser-5.0.1/docs/reference-feed-logo.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-logo.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,12 @@ +:py:attr:`feed.logo` +==================== + +A URL to a graphic representing a logo for the feed. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom10:feed/atom10:logo diff -Nru feedparser-5.0.1/docs/reference-feed-published.rst feedparser-5.1.2/docs/reference-feed-published.rst --- feedparser-5.0.1/docs/reference-feed-published.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-published.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,20 @@ +.. _reference.feed.published: + +:py:attr:`feed.published` +========================= + +The date the feed was published, as a string in the same format as it was +published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.feed.published_parsed`. + + +.. rubric:: Comes from + +* /rss/channel/pubDate + + +.. seealso:: + + * :ref:`reference.feed.published_parsed` diff -Nru feedparser-5.0.1/docs/reference-feed-published_parsed.rst feedparser-5.1.2/docs/reference-feed-published_parsed.rst --- feedparser-5.0.1/docs/reference-feed-published_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-published_parsed.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,16 @@ +.. _reference.feed.published_parsed: + +:py:attr:`feed.published_parsed` +================================ + +The date the feed was published, as a standard :program:`Python` 9-tuple. + + +.. rubric:: Comes from + +* /rss/channel/pubDate + + +.. 
seealso:: + + * :ref:`reference.feed.published` diff -Nru feedparser-5.0.1/docs/reference-feed-publisher.rst feedparser-5.1.2/docs/reference-feed-publisher.rst --- feedparser-5.0.1/docs/reference-feed-publisher.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-publisher.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,19 @@ +.. _reference.feed.publisher: + +:py:attr:`feed.publisher` +========================= + +The publisher of the feed. + + +.. rubric:: Comes from + +* /rdf:RDF/rdf:channel/dc:publisher +* /rss/channel/dc:publisher +* /rss/channel/itunes:owner +* /rss/channel/webMaster + + +.. seealso:: + + * :ref:`reference.feed.publisher_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-publisher_detail.rst feedparser-5.1.2/docs/reference-feed-publisher_detail.rst --- feedparser-5.0.1/docs/reference-feed-publisher_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-publisher_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,43 @@ +.. _reference.feed.publisher_detail: + +:py:attr:`feed.publisher_detail` +================================ + +A dictionary with details about the feed publisher. + + +:py:attr:`feed.publisher_detail.name` +------------------------------------- + +The name of this feed's publisher. + + +.. _reference.feed.publisher_detail.href: + +:py:attr:`feed.publisher_detail.href` +------------------------------------- + +The :abbr:`URL (Uniform Resource Locator)` of this feed's publisher. This can +be the publisher's home page, or a contact page with a webmail form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +:py:attr:`feed.publisher_detail.email` +-------------------------------------- + +The email address of this feed's publisher. + + +.. 
rubric:: Comes from + +* /rdf:RDF/rdf:channel/dc:publisher +* /rss/channel/dc:publisher +* /rss/channel/itunes:owner +* /rss/channel/webMaster + + +.. seealso:: + + * :ref:`reference.feed.publisher` diff -Nru feedparser-5.0.1/docs/reference-feed-rights.rst feedparser-5.1.2/docs/reference-feed-rights.rst --- feedparser-5.0.1/docs/reference-feed-rights.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-rights.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,34 @@ +.. _reference.feed.rights: + +:py:attr:`feed.rights` +====================== + +A human-readable copyright statement for the feed. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. note:: + + For machine-readable copyright information, see :ref:`reference.feed.license`. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:copyright +* /atom10:feed/atom10:rights +* /rdf:RDF/rdf:channel/dc:rights +* /rss/channel/copyright +* /rss/channel/dc:rights + + +.. seealso:: + + * :ref:`reference.feed.rights_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-rights_detail.rst feedparser-5.1.2/docs/reference-feed-rights_detail.rst --- feedparser-5.0.1/docs/reference-feed-rights_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-rights_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,99 @@ +.. _reference.feed.rights_detail: + +:py:attr:`feed.rights_detail` +============================= + +A dictionary with details on the feed copyright. + + +.. 
_reference.feed.rights_detail.value: + +:py:attr:`feed.rights_detail.value` +----------------------------------- + +Same as :ref:`reference.feed.rights`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. _reference.feed.rights_detail.type: + +:py:attr:`feed.rights_detail.type` +---------------------------------- + +The content type of the feed copyright. + +Most likely values for :py:attr:`~feed.rights_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + + +:py:attr:`feed.rights_detail.language` +-------------------------------------- + +The language of the feed copyright. + +:py:attr:`~feed.rights_detail.language` is supposed to be a language code, as +specified by `:abbr:`RFC (Request For Comments)` 3066 +`_, but publishers have been known to +publish random values like "English" or "German". 
:program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +:py:attr:`~feed.rights_detail.language` may come from the element's xml:lang +attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. If the +feed does not specify a language, :py:attr:`~feed.rights_detail.language` will +be ``None``, the :program:`Python` null value. + + +:py:attr:`feed.rights_detail.base` +---------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the feed copyright. + +:py:attr:`~feed.rights_detail.base` is only useful in rare situations and can +usually be ignored. It is the original base :abbr:`URI (Uniform Resource +Identifier)` for this value, as specified by the element's xml:base attribute, +or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext +Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of +the feed. (See :ref:`advanced.base` for more details.) By the time you see +it, :program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + +.. rubric:: Comes from + +* /atom03:feed/atom03:copyright +* /atom10:feed/atom10:rights +* /rdf:RDF/rdf:channel/dc:rights +* /rss/channel/copyright +* /rss/channel/dc:rights + + +.. seealso:: + + * :ref:`reference.feed.rights` diff -Nru feedparser-5.0.1/docs/reference-feed-subtitle.rst feedparser-5.1.2/docs/reference-feed-subtitle.rst --- feedparser-5.0.1/docs/reference-feed-subtitle.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-subtitle.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,31 @@ +.. _reference.feed.subtitle: + +:py:attr:`feed.subtitle` +======================== + +A subtitle, tagline, slogan, or other short description of the feed. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:tagline +* /atom10:feed/atom10:subtitle +* /rdf:RDF/rdf:channel/dc:description +* /rdf:RDF/rdf:channel/rdf:description +* /rss/channel/dc:description +* /rss/channel/description +* /rss/channel/itunes:subtitle + + +.. seealso:: + + * :ref:`reference.feed.subtitle_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-subtitle_detail.rst feedparser-5.1.2/docs/reference-feed-subtitle_detail.rst --- feedparser-5.0.1/docs/reference-feed-subtitle_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-subtitle_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,100 @@ +.. _reference.feed.subtitle_detail: + +:py:attr:`feed.subtitle_detail` +=============================== + +A dictionary with details about the feed subtitle. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:tagline +* /atom10:feed/atom10:subtitle +* /rdf:RDF/rdf:channel/dc:description +* /rdf:RDF/rdf:channel/rdf:description +* /rss/channel/dc:description +* /rss/channel/description +* /rss/channel/itunes:subtitle + + +.. seealso:: + + * :ref:`reference.feed.subtitle` + + +.. _reference.feed.subtitle_detail.value: + +:py:attr:`feed.subtitle_detail.value` +------------------------------------- + +Same as :ref:`reference.feed.subtitle`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. _reference.feed.subtitle_detail.type: + +:py:attr:`feed.subtitle_detail.type` +------------------------------------ + +The content type of the feed subtitle. + +Most likely values for :py:attr:`~feed.subtitle_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +:py:attr:`feed.subtitle_detail.language` +---------------------------------------- + +The language of the feed subtitle. + +:py:attr:`~feed.subtitle_detail.language` is supposed to be a language code, as +specified by `:abbr:`RFC (Request For Comments)` 3066 +`_, but publishers have been known to +publish random values like "English" or "German". :program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +:py:attr:`~feed.subtitle_detail.language` may come from the element's xml:lang +attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. 
If the +feed does not specify a language, :py:attr:`~feed.subtitle_detail.language` +will be ``None``, the :program:`Python` null value. + + +:py:attr:`feed.subtitle_detail.base` +------------------------------------ + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the feed subtitle. + +:py:attr:`~feed.subtitle_detail.base` is only useful in rare situations and can +usually be ignored. It is the original base :abbr:`URI (Uniform Resource +Identifier)` for this value, as specified by the element's xml:base attribute, +or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext +Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of +the feed. (See :ref:`advanced.base` for more details.) By the time you see +it, :program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* diff -Nru feedparser-5.0.1/docs/reference-feed-tags.rst feedparser-5.1.2/docs/reference-feed-tags.rst --- feedparser-5.0.1/docs/reference-feed-tags.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-tags.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,46 @@ +.. _reference.feed.tags: + +:py:attr:`feed.tags` +==================== + +A list of dictionaries that contain details of the categories for the feed. + + +.. note:: + + Prior to version 4.0, :program:`Universal Feed Parser` exposed categories in + ``feed.category`` (the primary category) and ``feed.categories`` (a list of + tuples containing the domain and term of each category). These uses are still + supported for backward compatibility, but you will not see them in the parsed + results unless you explicitly ask for them. + + +.. _reference.feed.tags.term: + +:py:attr:`feed.tags[i].term` +---------------------------- + +The category term (keyword). 
+ + +:py:attr:`feed.tags[i].scheme` +------------------------------ + +The category scheme (domain). + + +:py:attr:`feed.tags[i].label` +----------------------------- + +A human-readable label for the category. + + +.. rubric:: Comes from + +* /atom03:feed/dc:subject +* /atom10:feed/category +* /rdf:RDF/rdf:channel/dc:subject +* /rss/channel/category +* /rss/channel/dc:subject +* /rss/channel/itunes:category +* /rss/channel/itunes:keywords diff -Nru feedparser-5.0.1/docs/reference-feed-textinput.rst feedparser-5.1.2/docs/reference-feed-textinput.rst --- feedparser-5.0.1/docs/reference-feed-textinput.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-textinput.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,75 @@ +:py:attr:`feed.textinput` +========================= + +A text input form. No one actually uses this. Why are you? + + +.. _reference.feed.textinput.title: + +:py:attr:`feed.textinput.title` +------------------------------- + +The title of the text input form, which would go in the value attribute of the +form's submit button. + + +.. _reference.feed.textinput.link: + +:py:attr:`feed.textinput.link` +------------------------------ + +The link of the script which processes the text input form, which would go in +the action attribute of the form. + +If this is a relative :abbr:`URI (Uniform Resource Identifier)`, it is +:ref:`resolved according to a set of rules `. + + +.. _reference.feed.textinput.name: + +:py:attr:`feed.textinput.name` +------------------------------ + +The name of the text input box in the form, which would go in the name +attribute of the form's input box. + + +.. _reference.feed.textinput.description: + +:py:attr:`feed.textinput.description` +------------------------------------- + +A short description of the text input form, which would go in the label element +of the form. + + +.. rubric:: Annotated example + +This is a text input in a feed: +:: + + + + Go! 
+ http://example.org/search + keyword + Search this site: + + + +This is how it could be rendered in :abbr:`HTML (HyperText Markup Language)`: +:: + + +
+ + + +
+ + +.. rubric:: Comes from + +* /rdf:RDF/rdf:textinput +* /rss/channel/textInput +* /rss/channel/textinput diff -Nru feedparser-5.0.1/docs/reference-feed-title.rst feedparser-5.1.2/docs/reference-feed-title.rst --- feedparser-5.0.1/docs/reference-feed-title.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-title.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,30 @@ +.. _reference.feed.title: + +:py:attr:`feed.title` +===================== + +The title of the feed. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:title +* /atom10:feed/atom10:title +* /rdf:RDF/rdf:channel/dc:title +* /rdf:RDF/rdf:channel/rdf:title +* /rss/channel/dc:title +* /rss/channel/title + + +.. seealso:: + + * :ref:`reference.feed.title_detail` diff -Nru feedparser-5.0.1/docs/reference-feed-title_detail.rst feedparser-5.1.2/docs/reference-feed-title_detail.rst --- feedparser-5.0.1/docs/reference-feed-title_detail.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-title_detail.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,101 @@ +.. _reference.feed.title_detail: + +:py:attr:`feed.title_detail` +============================ + +A dictionary with details about the feed title. + + +.. _reference.feed.title_detail.value: + +:py:attr:`feed.title_detail.value` +---------------------------------- + +Same as :ref:`reference.feed.title`. + +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, it is :ref:`sanitized +` by default. 
+ +If this contains :abbr:`HTML (HyperText Markup Language)` or :abbr:`XHTML +(Extensible HyperText Markup Language)`, certain (X)HTML elements within this +value may contain relative :abbr:`URI (Uniform Resource Identifier)`\s. If so, +they are :ref:`resolved according to a set of rules `. + + +.. _reference.feed.title_detail.type: + +:py:attr:`feed.title_detail.type` +--------------------------------- + +The content type of the feed title. + +Most likely values for :py:attr:`~feed.title_detail.type`: + +* :mimetype:`text/plain` +* :mimetype:`text/html` +* :mimetype:`application/xhtml+xml` + +For Atom feeds, the content type is taken from the type attribute, which +defaults to :mimetype:`text/plain` if not specified. For :abbr:`RSS (Rich Site +Summary)` feeds, the content type is auto-determined by inspecting the content, +and defaults to :mimetype:`text/html`. Note that this may cause silent data +loss if the value contains plain text with angle brackets. There is nothing I +can do about this problem; it is a limitation of :abbr:`RSS (Rich Site +Summary)`. + +Future enhancement: some versions of :abbr:`RSS (Rich Site Summary)` clearly +specify that certain values default to :mimetype:`text/plain`, and +:program:`Universal Feed Parser` should respect this, but it doesn't yet. + + +.. _reference.feed.title_detail.language: + +:py:attr:`feed.title_detail.language` +------------------------------------- + +The language of the feed title. + +:py:attr:`~feed.title_detail.language` is supposed to be a language code, as +specified by `:abbr:`RFC (Request For Comments)` 3066 +`_, but publishers have been known to +publish random values like "English" or "German". :program:`Universal Feed +Parser` does not do any parsing or normalization of language codes. + +:py:attr:`~feed.title_detail.language` may come from the element's xml:lang +attribute, or it may inherit from a parent element's xml:lang, or the +Content-Language :abbr:`HTTP (Hypertext Transfer Protocol)` header. 
If the +feed does not specify a language, :py:attr:`~feed.title_detail.language` will +be ``None``, the :program:`Python` null value. + + +:py:attr:`feed.title_detail.base` +--------------------------------- + +The original base :abbr:`URI (Uniform Resource Identifier)` for links within +the feed title. + +:py:attr:`~feed.title_detail.base` is only useful in rare situations and can +usually be ignored. It is the original base :abbr:`URI (Uniform Resource +Identifier)` for this value, as specified by the element's xml:base attribute, +or a parent element's xml:base, or the appropriate :abbr:`HTTP (Hypertext +Transfer Protocol)` header, or the :abbr:`URI (Uniform Resource Identifier)` of +the feed. (See :ref:`advanced.base` for more details.) By the time you see +it, :program:`Universal Feed Parser` has already resolved relative links in all +values where it makes sense to do so. *Clients should never need to manually +resolve relative links.* + + +.. rubric:: Comes from + +* /atom03:feed/atom03:title +* /atom10:feed/atom10:title +* /rdf:RDF/rdf:channel/dc:title +* /rdf:RDF/rdf:channel/rdf:title +* /rss/channel/dc:title +* /rss/channel/title + + +.. seealso:: + + * :ref:`reference.feed.title` diff -Nru feedparser-5.0.1/docs/reference-feed-ttl.rst feedparser-5.1.2/docs/reference-feed-ttl.rst --- feedparser-5.0.1/docs/reference-feed-ttl.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-ttl.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,22 @@ +.. _reference.feed.ttl: + +:py:attr:`feed.ttl` +=================== + +According to the :abbr:`RSS (Rich Site Summary)` specification, "None" + +No one is quite sure what this means, and no one publishes feeds via +file-sharing networks. 
+ +Some clients have interpreted this element to be some sort of inline caching +mechanism, albeit one that completely ignores the underlying :abbr:`HTTP +(Hypertext Transfer Protocol)` protocol, its robust caching mechanisms, and the +huge amount of :abbr:`HTTP (Hypertext Transfer Protocol)`-savvy network +infrastructure that understands them. Given the vague documentation, it is +impossible to say that this interpretation is any more ridiculous than the +element itself. + + +.. rubric:: Comes from + +* /rss/channel/ttl diff -Nru feedparser-5.0.1/docs/reference-feed-updated.rst feedparser-5.1.2/docs/reference-feed-updated.rst --- feedparser-5.0.1/docs/reference-feed-updated.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-updated.rst 2012-02-19 21:50:48.000000000 +0000 @@ -0,0 +1,41 @@ +.. _reference.feed.updated: + +:py:attr:`feed.updated` +======================= + +The date the feed was last updated, as a string in the same format as it was +published in the original feed. + +This element is :ref:`parsed as a date ` and stored in +:ref:`reference.feed.updated_parsed`. + + +.. note:: + + As of version 5.1.1, if this key doesn't exist but + :py:attr:`feed.published` does, the value of + :py:attr:`feed.published` will be returned. + + In the past the RSS pubDate element was stored in `updated`, but this incorrect + behavior was reported in issue 310. However, developers may have come to rely + on this incorrect behavior -- as was reported in issue 328 -- so to help avoid + hurting their users' experience, this mapping from `updated` to `published` was + temporarily introduced to give developers time to update their software, and to + give users time to upgrade. + + This mapping is temporary and will be removed in a future version of + feedparser. + + +.. 
rubric:: Comes from + +* /atom03:feed/atom03:modified +* /atom10:feed/atom10:updated +* /rdf:RDF/rdf:channel/dc:date +* /rdf:RDF/rdf:channel/dcterms:modified +* /rss/channel/dc:date + + +.. seealso:: + + * :ref:`reference.feed.updated_parsed` diff -Nru feedparser-5.0.1/docs/reference-feed-updated_parsed.rst feedparser-5.1.2/docs/reference-feed-updated_parsed.rst --- feedparser-5.0.1/docs/reference-feed-updated_parsed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed-updated_parsed.rst 2012-02-19 21:50:48.000000000 +0000 @@ -0,0 +1,37 @@ +.. _reference.feed.updated_parsed: + +:py:attr:`feed.updated_parsed` +============================== + +The date the feed was last updated, as a standard :program:`Python` 9-tuple. + + +.. note:: + + As of version 5.1.1, if this key doesn't exist but + :py:attr:`feed.published_parsed` does, the value of + :py:attr:`feed.published_parsed` will be returned. + + In the past the RSS pubDate element was stored in `updated`, but this incorrect + behavior was reported in issue 310. However, developers may have come to rely + on this incorrect behavior -- as was reported in issue 328 -- so to help avoid + hurting their users' experience, this mapping from `updated_parsed` to + `published_parsed` was temporarily introduced to give developers time to update + their software, and to give users time to upgrade. + + This mapping is temporary and will be removed in a future version of + feedparser. + + +.. rubric:: Comes from + +* /atom03:feed/atom03:modified +* /atom10:feed/atom10:updated +* /rdf:RDF/rdf:channel/dc:date +* /rdf:RDF/rdf:channel/dcterms:modified +* /rss/channel/dc:date + + +.. 
seealso:: + + * :ref:`reference.feed.updated` diff -Nru feedparser-5.0.1/docs/reference-feed.rst feedparser-5.1.2/docs/reference-feed.rst --- feedparser-5.0.1/docs/reference-feed.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-feed.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,17 @@ +:py:attr:`feed` +=============== + +A dictionary of data about the feed. + + +.. rubric:: Comes from + +* /atom03:feed +* /atom10:feed +* /rdf:RDF/rdf:channel +* /rss/channel + + +.. tip:: + + This element always exists, although it may be an empty dictionary. diff -Nru feedparser-5.0.1/docs/reference-headers.rst feedparser-5.1.2/docs/reference-headers.rst --- feedparser-5.0.1/docs/reference-headers.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-headers.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,11 @@ +:py:attr:`headers` +================== + +A dictionary of all the :abbr:`HTTP (Hypertext Transfer Protocol)` headers +received from the web server when retrieving the feed. + +.. tip:: + + :py:attr:`headers` will only be present if the feed was retrieved from a web + server. If the feed was parsed from a local file or from a string in memory, + :py:attr:`headers` will not be present. diff -Nru feedparser-5.0.1/docs/reference-href.rst feedparser-5.1.2/docs/reference-href.rst --- feedparser-5.0.1/docs/reference-href.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-href.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,13 @@ +:py:attr:`href` +=============== + +The final :abbr:`URL (Uniform Resource Locator)` of the feed that was parsed. + +If the feed was redirected from the original requested address, :py:attr:`href` +will contain the final (redirected) address. + +.. tip:: + + :py:attr:`href` will only be present if the feed was retrieved from a web + server. If the feed was parsed from a local file or from a string in memory, + :py:attr:`href` will not be present. 
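Reader's note (commentary between file diffs, not part of the patch): the reference pages above say that `feed` always exists, possibly as an empty dictionary, while `headers` and `href` are present only when the feed was retrieved from a web server. The sketch below shows one way to read those keys defensively. The helper name `fetch_summary` is hypothetical, and `result` is assumed to be the dictionary-like object returned by `feedparser.parse()`; a plain dict stands in for it so the snippet runs without the library installed.

```python
def fetch_summary(result):
    """Summarise the optional HTTP-related keys of a parsed result.

    `result` is assumed to behave like the object returned by
    feedparser.parse(); any mapping with the same keys works.
    """
    # `feed` is documented as always present, but it may be empty.
    feed = result.get("feed", {})
    return {
        "has_feed_data": bool(feed),
        # `href` is only present for feeds fetched from a web server,
        # so .get() is used instead of indexing.
        "final_url": result.get("href"),
        # `headers` is likewise absent for local files and strings.
        "from_web": "headers" in result,
    }


# A result parsed from a string in memory: no `href`, no `headers`.
local = fetch_summary({"feed": {}})

# A result fetched over HTTP (values are made up for illustration).
remote = fetch_summary({
    "feed": {"title": "Example"},
    "href": "http://example.org/feed.xml",
    "headers": {},
})
```

Using `.get()` and `in` keeps the same code path working for feeds parsed from a local file, where indexing `result["href"]` would raise `KeyError`.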
diff -Nru feedparser-5.0.1/docs/reference-modified.rst feedparser-5.1.2/docs/reference-modified.rst --- feedparser-5.0.1/docs/reference-modified.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-modified.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,15 @@ +:py:attr:`modified` +=================== + +The last-modified date of the feed, as specified in the +:abbr:`HTTP (Hypertext Transfer Protocol)` headers. + +The purpose of :py:attr:`modified` is explained more fully in :ref:`http.etag`. + +.. tip:: + + :py:attr:`modified` will only be present if the feed was retrieved from a web + server, and only if the web server provided a Last-Modified + :abbr:`HTTP (Hypertext Transfer Protocol)` header for the feed. If the feed + was parsed from a local file or from a string in memory, :py:attr:`modified` + will not be present. diff -Nru feedparser-5.0.1/docs/reference-namespaces.rst feedparser-5.1.2/docs/reference-namespaces.rst --- feedparser-5.0.1/docs/reference-namespaces.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-namespaces.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,19 @@ +.. _reference.namespaces: + +:py:attr:`namespaces` +===================== + +A dictionary of all :abbr:`XML (Extensible Markup Language)` namespaces defined +in the feed, as ``{prefix: namespaceURI}``. + +.. note:: + + The prefixes listed in the :py:attr:`namespaces` dictionary may not match the + prefixes defined in the original feed. See :ref:`advanced.namespaces` for more + details. + +.. tip:: + + This element always exists, although it may be an empty dictionary if the feed + does not define any namespaces (such as an :abbr:`RSS (Rich Site Summary)` 2.0 + feed with no extensions). 
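Reader's note (commentary between file diffs, not part of the patch): the two pages above document `modified`, which is present only when the server sent a Last-Modified header, and `namespaces`, whose prefixes may differ from those in the original feed. The sketch below illustrates both points. The helper names `conditional_kwargs` and `declares_namespace` are hypothetical, and `previous` and `result` are assumed to be dictionary-like results from earlier `feedparser.parse()` calls; plain dicts stand in for them here. Passing the stored value back as a `modified` keyword argument on the next `feedparser.parse()` call is an assumption based on the library's ETag and Last-Modified support, so check it against the version in use.

```python
def conditional_kwargs(previous):
    """Build keyword arguments for a follow-up conditional fetch.

    Returns an empty dict when the earlier result carries no
    `modified` value (local file, string, or no Last-Modified header).
    """
    kwargs = {}
    modified = previous.get("modified")
    if modified is not None:
        kwargs["modified"] = modified
    return kwargs


def declares_namespace(result, namespace_uri):
    """Check for a namespace by URI rather than by prefix.

    The prefixes in `namespaces` may not match the original feed,
    so only the URI values are compared.
    """
    return namespace_uri in result.get("namespaces", {}).values()


DC_URI = "http://purl.org/dc/elements/1.1/"

no_date = conditional_kwargs({})
with_date = conditional_kwargs({"modified": "Sat, 07 Sep 2002 00:00:01 GMT"})
has_dc = declares_namespace({"namespaces": {"dc": DC_URI}}, DC_URI)
plain_rss = declares_namespace({"namespaces": {}}, DC_URI)
```

Comparing namespace URIs instead of prefixes follows the note in the `namespaces` page; a lookup such as `"dc" in result["namespaces"]` could miss a feed that binds the same URI to another prefix.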
diff -Nru feedparser-5.0.1/docs/reference-status.rst feedparser-5.1.2/docs/reference-status.rst --- feedparser-5.0.1/docs/reference-status.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-status.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,22 @@ +:py:attr:`status` +================= + +The :abbr:`HTTP (Hypertext Transfer Protocol)` status code that was returned by +the web server when the feed was fetched. + +If the feed was redirected from its original :abbr:`URL (Uniform Resource Locator)`, +:py:attr:`status` will contain the redirect status code, not the final status +code. + +If :py:attr:`status` is ``301``, the feed was permanently redirected to a new +:abbr:`URL (Uniform Resource Locator)`. Clients should update their address +book to request the new :abbr:`URL (Uniform Resource Locator)` from now on. + +If :py:attr:`status` is ``410``, the feed is gone. Clients should stop polling the +feed. + +.. tip:: + + :py:attr:`status` will only be present if the feed was retrieved from a web + server. If the feed was parsed from a local file or from a string in memory, + :py:attr:`status` will not be present. diff -Nru feedparser-5.0.1/docs/reference-version.rst feedparser-5.1.2/docs/reference-version.rst --- feedparser-5.0.1/docs/reference-version.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference-version.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,39 @@ +.. _reference.version: + +:py:attr:`version` +================== + +The format and version of the feed. 
+ +Here is the complete list of known feed types and versions that may be returned in :py:attr:`version`: + +============ ==================================================================================== +``atom`` Atom (unknown or unrecognized version) +``atom01`` `Atom 0.1 `_ +``atom02`` `Atom 0.2 `_ +``atom03`` `Atom 0.3 `_ +``atom10`` `Atom 1.0 `_ +``cdf`` `CDF `_ +``rss`` :abbr:`RSS (Rich Site Summary)` (unknown or unrecognized version) +``rss090`` `RSS 0.90 `_ +``rss091n`` `Netscape RSS 0.91 `_ +``rss091u`` `Userland RSS 0.91 `_ +``rss092`` `RSS 0.92 `_ +``rss093`` `RSS 0.93 `_ +``rss094`` :abbr:`RSS (Rich Site Summary)` 0.94 (no accurate specification is known to exist) +``rss10`` `RSS 1.0 `_ +``rss20`` `RSS 2.0 `_ +============ ==================================================================================== + +If the feed type is completely unknown, :py:attr:`version` will be an empty string. + +.. tip:: + + This element always exists, although it may be an empty string if the version + cannot be determined. + +.. seealso:: + + `The Myth of RSS compatibility `_ + Mark Pilgrim's excellent analysis of the extraordinary variety of + incompatibilities each version of "RSS" introduced. diff -Nru feedparser-5.0.1/docs/reference.rst feedparser-5.1.2/docs/reference.rst --- feedparser-5.0.1/docs/reference.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/reference.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,10 @@ +.. _reference: + +Reference +######### + +.. toctree:: + :maxdepth: 2 + :glob: + + reference-* diff -Nru feedparser-5.0.1/docs/resolving-relative-links.rst feedparser-5.1.2/docs/resolving-relative-links.rst --- feedparser-5.0.1/docs/resolving-relative-links.rst 1970-01-01 00:00:00.000000000 +0000 +++ feedparser-5.1.2/docs/resolving-relative-links.rst 2012-02-16 07:29:25.000000000 +0000 @@ -0,0 +1,268 @@ +.. 
_advanced.base: + +Relative Link Resolution +======================== + +Many feed elements and attributes are :abbr:`URI (Uniform Resource Identifier)`\s. +:program:`Universal Feed Parser` resolves relative :abbr:`URI (Uniform Resource Identifier)`\s +according to the `XML:Base `_ specification. We'll see how +that works in a minute, but first let's talk about which values are treated as +:abbr:`URI (Uniform Resource Identifier)`\s. + + +Which Values Are :abbr:`URI (Uniform Resource Identifier)`\s +------------------------------------------------------------ + +These feed elements are treated as :abbr:`URI (Uniform Resource Identifier)`\s, +and resolved if they are relative: + +* :ref:`reference.entry.author_detail.href` +* :ref:`reference.entry.comments` +* :ref:`reference.entry.contributors.href` +* :ref:`reference.entry.enclosures.href` +* :ref:`reference.entry.id` +* :ref:`reference.entry.license` +* :ref:`reference.entry.link` +* :ref:`reference.entry.links.href` +* :ref:`reference.entry.publisher_detail.href` +* :ref:`reference.entry.source.author_detail.href` +* :ref:`reference.entry.source.contributors.href` +* :ref:`reference.entry.source.links.href` +* :ref:`reference.feed.author_detail.href` +* :ref:`reference.feed.contributors.href` +* :ref:`reference.feed.docs` +* :ref:`reference.feed.generator_detail.href` +* :ref:`reference.feed.id` +* :ref:`reference.feed.image.href` +* :ref:`reference.feed.image.link` +* :ref:`reference.feed.license` +* :ref:`reference.feed.link` +* :ref:`reference.feed.links.href` +* :ref:`reference.feed.publisher_detail.href` +* :ref:`reference.feed.textinput.link` + +In addition, several feed elements may contain :abbr:`HTML (HyperText Markup Language)` +or :abbr:`XHTML (Extensible HyperText Markup Language)` markup. 
Certain attributes of +:abbr:`HTML (HyperText Markup Language)` elements can contain relative +:abbr:`URI (Uniform Resource Identifier)`\s, and :program:`Universal Feed Parser` will +resolve these :abbr:`URI (Uniform Resource Identifier)`\s according to the same rules +as the feed elements listed above. + + +These feed elements may contain :abbr:`HTML (HyperText Markup Language)` or +:abbr:`XHTML (Extensible HyperText Markup Language)` markup. In Atom feeds, +whether these elements are treated as :abbr:`HTML (HyperText Markup Language)` +depends on the value of the ``type`` attribute. In :abbr:`RSS (Rich Site Summary)` +feeds, these values are always treated as :abbr:`HTML (HyperText Markup Language)`. + + +* :ref:`reference.entry.content.value` +* :ref:`reference.entry.summary` (:ref:`reference.entry.summary_detail.value`) +* :ref:`reference.entry.title` (:ref:`reference.entry.title_detail.value`) +* :ref:`reference.feed.info` (:ref:`reference.feed.info_detail.value`) +* :ref:`reference.feed.rights` (:ref:`reference.feed.rights_detail.value`) +* :ref:`reference.feed.subtitle` (:ref:`reference.feed.subtitle_detail.value`) +* :ref:`reference.feed.title` (:ref:`reference.feed.title_detail.value`) + + +When any of these feed elements contains :abbr:`HTML (HyperText Markup Language)` +or :abbr:`XHTML (Extensible HyperText Markup Language)` markup, the +following :abbr:`HTML (HyperText Markup Language)` elements are treated as +:abbr:`URI (Uniform Resource Identifier)`\s and are resolved if they are +relative: + + +* +* +* +*
+* +* +*
+* +* +* +*