Bug #605775 “Loggerhead doesn't support linking to the raw conte...” : Bugs : loggerhead

John A Meinel (jameinel) on 2010-07-15

Changed in launchpad-code:
importance:	Undecided → Medium
status:	New → Confirmed
affects:	launchpad-code → loggerhead

Max Kanat-Alexander (mkanat) on 2010-12-03

Changed in loggerhead:
assignee:	nobody → Max Kanat-Alexander (mkanat)
status:	Confirmed → In Progress

Max Kanat-Alexander (mkanat) wrote on 2010-12-03:

#1

I've added a "raw" controller that does its best to serve files with the right MIME type. It's remarkably fast--about 0.05 seconds to get a file's entire content and all associated information, even on a large branch like Launchpad.

What I haven't done yet is add XSS protection, which I may do in a follow-up bug, since the work required for this was complex enough as it is.

Robert Collins (lifeless) wrote on 2010-12-03: Re: [Bug 605775] Re: Loggerhead doesn't support linking to the raw content

#2

The security implications are pretty big; this may not fly.
Considerations: browser content sniffing and hostile content. Access
to private branch data and so forth.

I can't dig into it now, but I urge you to socialise the design and
implications with folk on the larger Launchpad team - they've learnt
the hard lessons, the hard way.

Max Kanat-Alexander (mkanat) wrote on 2010-12-03:

#3

Hey Robert. I'm totally familiar with the security implications--the Bugzilla Project had a bug on this before most people on the Internet were even aware that it could possibly be a problem. What I'm saying is that the security implications will be dealt with in a follow-up bug, and in this bug they will not be.

This is not going to be deployed on Launchpad until it's secure for Launchpad. It is just going into loggerhead trunk, which is not going onto Launchpad.

Note, BTW, that most loggerhead installations have no concerns whatsoever about XSS. In most situations, there's nothing dangerous you could do to loggerhead itself. LP happens to have private branches and credentials, so that's different.

Martin Pool (mbp) wrote on 2010-12-07:

#4

Some discussion on the MP: https://code.launchpad.net/~mkanat/loggerhead/raw-controller/+merge/42675

I think we should (or should have) split the two parts of this bug.

1- URLs based only on path are clearly useful for many URLs
2- downloads that are not attachments: we should be clearer about how this is supposed to be used.

Max Kanat-Alexander (mkanat) wrote on 2010-12-07:

#5

The only URL that still uses file ids by default is the download URL. The raw controller in that MP uses paths, like all of the other controllers, and once that MP goes in, it will be very simple to make the download URL use paths as well. (The general architecture of loggerhead also still allows using file ids in the query string parameters if you want.)

As far as how the raw view is supposed to be used, I suppose there are two cases:

1) Somebody wants to see the raw content of the file quickly without any of the view or annotation issues.
2) Somebody wants to use loggerhead to serve some content.

At first I thought that #2 wasn't going to be feasible, but now I've discovered that the raw view is so fast that it could actually be done.

For #1, you could certainly say, "just serve everything as text/plain", but that doesn't actually solve the problem of XSS, because IE 7 and below will still sniff the content and render it as whatever the browser *thinks* it is. I believe it's only IE 8 and above that support X-Content-Type-Options.

So I figured I'd go with the most logical choice and attempt to serve the content with its actual, correct MIME type. That's particularly valuable for binary files like images or other media, which couldn't have a raw view otherwise.
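As a rough illustration of that approach (not the code from the MP), here is a minimal Python sketch that guesses a MIME type from the file path and also sends X-Content-Type-Options for browsers that honor it. The function name `raw_response_headers` and the fallback type are assumptions made for this example.

```python
import mimetypes


def raw_response_headers(path, default_type="application/octet-stream"):
    """Return (name, value) header pairs for serving a file's raw bytes.

    Hypothetical helper for illustration; not part of loggerhead.
    """
    content_type, encoding = mimetypes.guess_type(path)
    # Compressed files (e.g. .tar.gz) and unknown extensions are served
    # as opaque binary rather than guessing at what they contain.
    if content_type is None or encoding is not None:
        content_type = default_type
    return [
        ("Content-Type", content_type),
        # Asks IE 8+ (and other browsers that support it) not to sniff.
        # Older browsers ignore this header, so it is not XSS protection
        # on its own.
        ("X-Content-Type-Options", "nosniff"),
    ]
```

For example, `raw_response_headers("logo.png")` would carry an image type, while a file with no recognizable extension falls back to the binary default.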

Another advantage is that it lets people rapidly get a single file out of bzr without having to check out an entire repository.

For the most part, controlling the MIME type of a file is only the illusion of security, which is worse than no security (because it makes people believe that they are secure when they are not).

The solution to the XSS problem is very much doable: it would just involve having a secondary domain for serving raw content. I started to implement it as part of the above MP, but it turned out to be more complicated than I was expecting, so I wanted to save it for a second patch. Patches should generally be small and focused so that they can be polished and debugged appropriately (among many other important reasons to keep changes small and focused).
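
To make the secondary-domain idea concrete, here is a minimal sketch of how a link to raw content might be rewritten to point at a separate host. Because browsers apply the same-origin policy per host, a script inside hostile content served from that host cannot read cookies scoped to the main site. The function name, the host names, and the `/raw/` URL layout are all assumptions for illustration, not what loggerhead or the MP actually does.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def raw_content_url(page_url, raw_host, branch_path, file_path):
    """Build a URL for a file's raw content on a separate domain.

    Hypothetical helper: keeps the scheme of the page being viewed but
    swaps in `raw_host`, so raw content never shares an origin with the
    main site.
    """
    scheme = urlsplit(page_url).scheme
    # quote() leaves "/" alone by default, so nested paths stay readable.
    path = "/".join(
        ["", branch_path.strip("/"), "raw", quote(file_path.strip("/"))]
    )
    return urlunsplit((scheme, raw_host, path, "", ""))
```

Usage, with made-up hosts: `raw_content_url("https://code.example.com/proj/trunk/files", "raw.example.net", "proj/trunk", "docs/read me.txt")` gives a URL on `raw.example.net` with the space percent-encoded.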