restricted librarian urls give a 404 if normalised (e.g. by apache, chromium, often shows up on private PPA build logs)

Bug #677270 reported by Alex Chiang
This bug affects 17 people
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: William Grant

Bug Description

The restricted librarian generates URLs in non-canonical form, which canonicalising clients and intermediaries can then change. Changing a restricted librarian URL causes the token to no longer match, and a 404 (file not found) is returned to the client.

Apache without the nocanon config option will canonicalise, and some browsers like Chrome are known to canonicalise too.

Fairly simple file names - 'foo+bar.txt' - will show this problem.
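
To make the failure mode concrete, here is a minimal Python sketch. It assumes the generated URL percent-encodes the '+' (as the stored paths quoted later in this report suggest) and that a canonicalising client rewrites the escape back to the literal character; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

filename = "foo+bar.txt"

# Escape every reserved character, '+' included (safe="" also disables
# the default exemption for "/").
escaped = quote(filename, safe="")

# A canonicalising client or proxy may rewrite the escape back to the
# literal character before sending the request on.
normalised = unquote(escaped)

print(escaped)
print(normalised)
# Both spellings name the same file, but they are different strings, so
# anything that compares the request path byte-for-byte sees a mismatch.
print(escaped == normalised)
```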

Workarounds
===========

Use Firefox, or run Apache with nocanon on the ProxyPass rules. We are currently doing the latter in the Canonical datacentre.

Proposed solutions
==================

* Change the URL generation in Launchpad to emit canonicalised URLs; canonicalisation will then not change the URL and things will Just Work.
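
A rough sketch of what generating already-canonical URLs could look like, in Python. This is not the fix that landed in Launchpad: `canonical_segment` is a hypothetical helper, and the choice of "canonical" here (leave unescaped every character that RFC 3986 allows literally in a path segment, escape the rest) is an assumption about what normalising clients converge on.

```python
from urllib.parse import quote, unquote

# Characters RFC 3986 allows unescaped in a path segment besides the
# unreserved set: the sub-delims plus ":" and "@".
PCHAR_EXTRAS = "!$&'()*+,;=:@"


def canonical_segment(segment: str) -> str:
    """Return one assumed canonical spelling of a URL path segment.

    Hypothetical helper: decode any existing escapes, then re-escape only
    the characters that may not appear literally in a path segment.
    """
    return quote(unquote(segment), safe=PCHAR_EXTRAS)


# Both spellings of the same file name collapse to one form.
print(canonical_segment("foo%2Bbar.txt"))
print(canonical_segment("foo+bar.txt"))
```

Because the function is idempotent, a client that applies the same normalisation again leaves the URL unchanged, which is the property the proposal relies on.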

Revision history for this message
Robert Collins (lifeless) wrote :

I see a 404 with your URL. (Note that the i12345 URLs are time limited, and will give anyone who can copy them access to the content until the time expires, which is 24 hours at the moment.)

Try putting the URL into wget and see if it works any better. I suspect that the librarian is sending wonky content-encoding headers for some reason.

affects: launchpad → launchpad-foundations
Changed in launchpad-foundations:
importance: Undecided → High
Revision history for this message
William Grant (wgrant) wrote :

It's a 404 page labelled as gzip-encoded, when it's in fact not compressed at all.
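
For illustration, a small Python sketch of why a mislabelled body produces a decoding error in clients that honour the header. The 404 body and the `decode_body` / `decoding_fails` helpers are hypothetical; only the standard `gzip` module is used.

```python
import gzip

# Plain, uncompressed 404 page (hypothetical body for illustration).
body = b"<html><body>404 Not Found</body></html>"


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decode a response body the way a client that trusts the header would."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


def decoding_fails(raw: bytes, content_encoding: str) -> bool:
    """Return True if the body cannot be decoded as labelled."""
    try:
        decode_body(raw, content_encoding)
    except (OSError, EOFError):  # gzip.BadGzipFile is a subclass of OSError
        return True
    return False


# Uncompressed bytes labelled as gzip cannot be decoded.
print(decoding_fails(body, "gzip"))
# Genuinely compressed bytes with the same label decode fine.
print(decoding_fails(gzip.compress(body), "gzip"))
```

A client such as wget that does not act on the header would simply show the underlying 404.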

Curtis Hovey (sinzui)
Changed in launchpad-foundations:
status: New → Triaged
Revision history for this message
Rick McBride (rmcbride) wrote :

I'm seeing the same behavior; I get:
Error 330 (net::ERR_CONTENT_DECODING_FAILED): Unknown error.

wget returns a 404

Revision history for this message
Francis J. Lacoste (flacoste) wrote :

We turned the public restricted librarian feature off for now. This is an Apache config-level problem; I'll file an RT to get it sorted out before we re-enable the feature.

Changed in launchpad-foundations:
status: Triaged → In Progress
assignee: nobody → Canonical LOSAs (canonical-losas)
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 677270] [NEW] restricted librarian broken, content decoding error

Thanks for analyzing this. I'm a little surprised that it's
apache...what's causing the issue?

Revision history for this message
Francis J. Lacoste (flacoste) wrote : Re: restricted librarian broken, content decoding error

RT #42560 tracks the LOSA side of things.

Revision history for this message
Robert Collins (lifeless) wrote :

It seems to work on production atm in testing with spm.

Revision history for this message
Robert Collins (lifeless) wrote :

This isn't apache.

Revision history for this message
Robert Collins (lifeless) wrote :

Or rather, it's not entirely Apache. Let me expand.

We see a 404 on the files that choose to break. The 404 has bad content-encoding headers, and we should start doing those headers in the librarian not in apache. I'll file a separate bug on that.

The failing URLs all have % escaped characters. The tokens that are being stored have
a stored path that fails:
/35262589/buildlog_ubuntu-karmic-i386.openproj_1.4-2%2Bpx1_FAILEDTOBUILD.txt.gz
original url
https://launchpad.net/~soyuz-team/+archive/ppa/+build/1334368/+files/buildlog_ubuntu-karmic-i386.openproj_1.4-2%2Bpx1_FAILEDTOBUILD.txt.gz

the librarian log shows
GET /35262589/buildlog_ubuntu-karmic-i386.openproj_1.4-2+px1_FAILEDTOBUILD.txt.gz?token=
(note the %2B is gone, + is present instead)

But the DB table has:

19:42 < spm> /35262589/buildlog_ubuntu-karmic-i386.openproj_1.4-2%2Bpx1_FAILEDTOBUILD.txt.gz | xxxxxxxxxxxxxxxxxxxxxxxxxx | 2010-11-26 06:26:33.243417

so the lookup fails, and *boom*. We shouldn't decode the URL and query, because that's a liability with various escaping tricks that folk can play.

We've checked whether direct queries to the librarian with the right host, token and the %2B URLs work, and they do. So it's squid or apache breaking things.
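
The mismatch described above can be reproduced in a few lines of Python. This is only a sketch: the token value, the in-memory `token_table` and the `is_authorised` helper are hypothetical stand-ins for the real DB table and librarian lookup, which this report does not show. The path is the one quoted above.

```python
from urllib.parse import unquote

# Path as stored alongside the token (taken from the comment above).
STORED_PATH = (
    "/35262589/buildlog_ubuntu-karmic-i386.openproj_1.4-2%2Bpx1"
    "_FAILEDTOBUILD.txt.gz"
)
TOKEN = "example-token"  # hypothetical value; the real token is redacted

# Hypothetical stand-in for the token table: (path, token) pairs.
token_table = {(STORED_PATH, TOKEN)}


def is_authorised(request_path: str, token: str) -> bool:
    """Exact string match on path and token, with no URL decoding."""
    return (request_path, token) in token_table


# What an intermediary that decodes %2B forwards to the librarian.
rewritten_path = unquote(STORED_PATH)

print(is_authorised(STORED_PATH, TOKEN))     # direct request, path untouched
print(is_authorised(rewritten_path, TOKEN))  # request after rewriting
```

Keeping the comparison exact, rather than decoding both sides, matches the concern above about escaping tricks; the sketch just shows why any rewriting in between then turns into a 404.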

summary: - restricted librarian broken, content decoding error
+ apache/squid breaks restricted librarian on urls with percent encoded
+ characters.
description: updated
Revision history for this message
Robert Collins (lifeless) wrote : Re: apache/squid breaks restricted librarian on urls with percent encoded characters.
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 677270] Re: apache/squid breaks restricted librarian on urls with percent encoded characters.

Looks like using RewriteRule rather than proxypass, and setting the B
flag will work, per
http://httpd.apache.org/docs/current/mod/mod_rewrite.html.

Revision history for this message
Robert Collins (lifeless) wrote : Re: apache/squid breaks restricted librarian on urls with percent encoded characters.

I need to leave this - https://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12 is very relevant.

It seems like Apache is known to be broken here, and there have been multiple attempts to fix it, but it's fundamentally not designed as a proxy.

summary: - apache/squid breaks restricted librarian on urls with percent encoded
+ apache breaks restricted librarian on urls with percent encoded
characters.
description: updated
Revision history for this message
Robert Collins (lifeless) wrote : Re: apache breaks restricted librarian on urls with percent encoded characters.

elmo suggests that 'nocanon' in the proxypass rule will do what we need.

summary: - apache breaks restricted librarian on urls with percent encoded
- characters.
+ restricted librarian urls give a 404 if normalised (e.g. by apache,
+ chromium, often shows up on private PPA build logs)
Changed in launchpad:
assignee: Canonical LOSAs (canonical-losas) → nobody
status: In Progress → Triaged
description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Any fix in sight? This is marked "High".

Revision history for this message
Robert Collins (lifeless) wrote :

We're currently burning down a backlog of criticals; this is unlikely to get any attention before then. If you would like to submit a patch for this we'd be happy to find you a mentor for it.

Revision history for this message
Robert Collins (lifeless) wrote :

(Note that this really reflects a bug in Chromium, but one we can work around while also being more in line with the RFC.)

tags: added: easy
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Do you have a ticket for the chromium bug? They do make new releases frequently and we could perhaps get someone's attention over there.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 677270] Re: restricted librarian urls give a 404 if normalised (e.g. by apache, chromium, often shows up on private PPA build logs)

On Thu, Jan 26, 2012 at 2:35 AM, Andreas Hasenack <email address hidden> wrote:
> Do you have a ticket for the chromium bug? They do make new releases
> frequently and we could perhaps get someone's attention over there.

AIUI it is deliberate on Chromium's part, so we haven't opened a
ticket. As noted, we can work around it here by issuing canonicalised
URLs, though that shouldn't be needed.

Would you like to open the ticket?

-Rob

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Sure, can you summarize the problem and say what chrome/chromium is doing differently from firefox? I'll be happy to open a ticket upstream and reference this bug even.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
William Grant (wgrant)
tags: added: qa-ok
removed: qa-needstesting
Colin Watson (cjwatson)
Changed in launchpad:
status: Fix Committed → Fix Released