Bogus SystemID in XHTML catalog makes org.apache.xml.resolver fail

Bug #400259 reported by Dominique Hazael-Massieux
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
w3c-dtd-xhtml (Debian)    Fix Released  Unknown
w3c-dtd-xhtml (Ubuntu)    New           Wishlist    Unassigned

Bug Description

Binary package hint: w3c-dtd-xhtml

The XML catalog file that points to the local XHTML DTDs provided by the package uses the following doctype:
<!DOCTYPE catalog PUBLIC "-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN"
  "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd">

The URL used as the system identifier of that doctype ("http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd") no longer exists.
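To check a catalog for this problem without touching the network, one can read the DOCTYPE identifiers with a parser that does not fetch the external DTD. The Python sketch below does this with xml.dom.minidom on the prolog quoted above; the empty <catalog> root element and the helper names doctype_ids and needs_network are assumptions for illustration, not part of any package.

```python
from urllib.parse import urlsplit
from xml.dom import minidom

# Prolog quoted in this report; the empty root element is an assumed stand-in.
CATALOG = """<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN"
  "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
"""


def doctype_ids(xml_text):
    """Return (public_id, system_id) of the DOCTYPE, or (None, None).

    minidom records the DOCTYPE identifiers but does not load the external
    DTD subset, so this runs offline.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return None, None
    return doc.doctype.publicId, doc.doctype.systemId


def needs_network(system_id):
    """True if dereferencing system_id literally would need an HTTP(S) fetch."""
    return system_id is not None and urlsplit(system_id).scheme in ("http", "https")


if __name__ == "__main__":
    public_id, system_id = doctype_ids(CATALOG)
    print(public_id)
    print(system_id, "(remote)" if needs_network(system_id) else "(local)")
```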

When /etc/xml/catalog is used as the catalog for DTD resolution with org.apache.xml.resolver, local resolution fails:
        java -cp /usr/share/java/xml-commons-resolver-1.1.jar org.apache.xml.resolver.apps.resolver -d 2 -c /etc/xml/catalog -p "-//W3C//DTD XHTML 1.0 Strict//EN" public
        Cannot find CatalogManager.properties
        Loading catalog: ./xcatalog
        Loading catalog: /etc/xml/catalog
        Resolve PUBLIC (publicid, systemid):
          public id: -//W3C//DTD XHTML 1.0 Strict//EN
        Switching to delegated catalog(s):
         file:/etc/xml/w3c-dtd-xhtml.xml
        Loading catalog: file:/etc/xml/w3c-dtd-xhtml.xml
        Switching to delegated catalog(s):
         file:/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml
        Loading catalog: file:/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml
        Exception in thread "main" java.net.UnknownHostException: globaltranscorp.org
        [...]

This means that any XML application that relies on org.apache.xml.resolver to resolve DTDs (i.e. many of them) will not resolve them locally, and will thus hit the W3C Web site (cf. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic ).
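The trace follows two levels of delegation before it loads the package's own catalog, and that load is where the dead DOCTYPE bites. The Python sketch below models only that observable behaviour, not org.apache.xml.resolver's code: the three catalog paths come from the trace, while the entries, the delegation prefixes, the target file name xhtml1-strict.dtd, the HostUnreachable exception and the fetch_dtd switch are all assumptions for illustration.

```python
# Hypothetical catalog contents; only the three paths appear in the trace.
CATALOGS = {
    "/etc/xml/catalog": {
        "system_id": None,  # assumed: no problematic DOCTYPE here
        "entries": [("delegatePublic", "-//W3C//DTD XHTML",
                     "file:/etc/xml/w3c-dtd-xhtml.xml")],
    },
    "file:/etc/xml/w3c-dtd-xhtml.xml": {
        "system_id": None,  # assumed
        "entries": [("delegatePublic", "-//W3C//DTD XHTML 1.0",
                     "file:/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml")],
    },
    "file:/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml": {
        "system_id": "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd",
        "entries": [("public", "-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml1-strict.dtd")],
    },
}


class HostUnreachable(Exception):
    """Stands in for java.net.UnknownHostException in this model."""


def load_catalog(name, fetch_dtd):
    catalog = CATALOGS[name]
    sid = catalog["system_id"]
    # A loader that insists on reading the catalog's own DTD fails here when
    # the system identifier names a host that no longer exists.
    if fetch_dtd and sid is not None and sid.startswith(("http://", "https://")):
        raise HostUnreachable(sid)
    return catalog["entries"]


def resolve_public(public_id, catalog_name, fetch_dtd=True):
    entries = load_catalog(catalog_name, fetch_dtd)
    for kind, key, target in entries:
        if kind == "public" and key == public_id:
            return target
    # Delegation: matching prefixes are tried longest first; once delegation
    # happens, only the delegated catalogs are consulted.
    delegates = sorted(
        ((key, target) for kind, key, target in entries
         if kind == "delegatePublic" and public_id.startswith(key)),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    for _, target in delegates:
        found = resolve_public(public_id, target, fetch_dtd)
        if found is not None:
            return found
    return None
```

With fetch_dtd=True the lookup for "-//W3C//DTD XHTML 1.0 Strict//EN" raises HostUnreachable while loading the third catalog, as in the trace; with fetch_dtd=False it completes locally.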

A simple fix for this problem is to replace the current system identifier with "/usr/share/xml/schema/xml-core/tr9401.dtd" (provided by the xml-core package).
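The proposed change amounts to a one-string substitution in the catalog's DOCTYPE. A minimal Python sketch, assuming the catalog is UTF-8 text and that the quoted URL appears exactly as in the report (fix_system_id and fix_catalog_file are hypothetical helpers):

```python
from pathlib import Path

OLD_SYSTEM_ID = "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd"
# Local copy of the DTD shipped by the xml-core package, per the report.
NEW_SYSTEM_ID = "/usr/share/xml/schema/xml-core/tr9401.dtd"


def fix_system_id(text):
    """Swap the dead system identifier for the local one; leave the rest alone."""
    return text.replace('"' + OLD_SYSTEM_ID + '"', '"' + NEW_SYSTEM_ID + '"')


def fix_catalog_file(path):
    """Rewrite a catalog file in place; return True if anything changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    fixed = fix_system_id(original)
    if fixed != original:
        p.write_text(fixed, encoding="utf-8")
    return fixed != original
```

As a local workaround this could be run as root on /usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml (the path from the trace), but a hand-edited file owned by a package may be overwritten on upgrade, so the durable fix belongs in the package.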

(This might be the same bug as #390604, but I'm not sure.)

Dominique Hazael-Massieux (dominique-hazael-massieux) wrote:
Changed in w3c-dtd-xhtml (Ubuntu):
importance: Undecided → Wishlist
Dominique Hazael-Massieux (dominique-hazael-massieux) wrote:

Err, sorry: why Wishlist? This is a bug report, not an enhancement request.

Dominique Hazael-Massieux (dominique-hazael-massieux) wrote:

Again, could this bug's importance please be reclassified? It is not a wishlist item; it is an actual bug report (with a patch, too).

ljs (ljs) wrote:

The file: URL in the patch has too many slashes. I would suggest the plain path "/usr/share/xml/schema/xml-core/tr9401.dtd" as the system identifier.
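The slash count matters because a generic URL parser treats whatever follows "//" as a host name. The Python sketch below uses urllib.parse to show how the same path parses with three, four and two slashes, and how a plain absolute path resolves against the catalog's own base URI (taken from the trace). The example URLs are illustrative, not copied from the patch, and file_url_parts is a hypothetical helper.

```python
from urllib.parse import urljoin, urlsplit

DTD = "/usr/share/xml/schema/xml-core/tr9401.dtd"
CATALOG_BASE = "file:/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml"


def file_url_parts(url):
    """Split a file: URL into (host, path) the way a generic URL parser does."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


# Three slashes: empty host, absolute path (the usual form).
print(file_url_parts("file://" + DTD))
# Four slashes: host still empty, but the path now begins with '//'.
print(file_url_parts("file:///" + DTD))
# Two slashes: 'usr' is taken as the host name.
print(file_url_parts("file:/" + DTD))
# A plain absolute path is resolved against the catalog's base URI.
print(urljoin(CATALOG_BASE, DTD))
```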

Arguably this is a bug in org.apache.xml.resolver (cf. http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html#s.bootstrap): the resolver should be able to parse catalog files without needing to resolve external entities.
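As a rough illustration of the behaviour described above (not of how org.apache.xml.resolver is implemented), the Python sketch below reads entries from a catalog that still carries the dead DOCTYPE while telling the SAX parser not to load external entities. With CPython's expat-based reader this also means the external DTD named in the DOCTYPE is never requested. PublicEntries and read_public_entries are hypothetical names, and the <public> entry is an assumed example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes

# Prolog from the report; the <public> entry is an assumed example.
CATALOG = b"""<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//GlobalTransCorp//DTD XML Catalogs V1.0-Based Extension V1.0//EN"
  "http://globaltranscorp.org/oasis/catalog/xml/tr9401.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN" uri="xhtml1-strict.dtd"/>
</catalog>
"""


class PublicEntries(ContentHandler):
    """Collects publicId -> uri pairs from <public> elements."""

    def __init__(self):
        super().__init__()
        self.entries = {}

    def startElement(self, name, attrs):
        if name == "public":
            self.entries[attrs.get("publicId")] = attrs.get("uri")


def read_public_entries(data):
    parser = xml.sax.make_parser()
    # Skip external general and parameter entities; the expat reader then
    # also skips the external DTD subset instead of fetching it.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = PublicEntries()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.entries
```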

jablko (ms419) wrote:
Bertails (bertails) wrote:

This bug is still there in Ubuntu 10.04.

Could someone please apply the patch?

Bruce Merry (bmerry) wrote:

This still affects Ubuntu 12.04.3. I've attached a patch against w3c-dtd-xhtml_1.1-5ubuntu1.diff, which should make this easier to fix.

I also agree with comment #2: regardless of which package is at fault, resolution fails, so it is a bug in the Ubuntu system.

Bruce Merry (bmerry) wrote:

This bug also affects 13.10, and is causing the maven-xml-plugin to fail (which wasn't the case in 12.04). I've attached a patch against the source tree for version 1.2-4.

Changed in w3c-dtd-xhtml (Debian):
status: Unknown → New
Changed in w3c-dtd-xhtml (Debian):
status: New → Fix Released