RuntimeError in the private xmlrpc server

Bug #198368 reported by Diogo Matsubara
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Barry Warsaw

Bug Description

A RuntimeError was raised by the internal xmlrpc server.
The exception value is "Failed to create unique file name in /var/tmp/launchpad_mailqueue/tmp, are we under a DoS attack?"

OOPS-793XMLP1
OOPS-830XMLP15

Changed in launchpad:
importance: Undecided → High
Barry Warsaw (barry) wrote :

Matsubara,

This looks like a lower-level problem in Zope. I looked at the code (in zope/app/mail/maildir.py) and it looks pretty unlikely that we'd have a legitimate name collision. The code tries 1000 times with a random number appended to the timestamp, pid, and host.

One problem is that Zope is being too greedy on catching OSErrors and too stingy on logging exactly what error is happening. So we don't really know what's happening but I strongly suspect a permissions or ownership issue.
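
To make the point about over-broad OSError handling concrete, here is a minimal sketch of a retry loop that treats only a genuine name collision as retryable and lets every other error surface. It is not the code in zope/app/mail/maildir.py; the function name create_unique_file and the file name format are assumptions made for illustration.

```python
import errno
import os
import random
import socket
import time


def create_unique_file(tmp_dir, attempts=1000):
    """Create a uniquely named file in tmp_dir and return (fd, path).

    Hypothetical helper: the name format below only mimics the
    "timestamp, pid, host, random number" scheme described above.
    """
    timestamp = int(time.time())
    pid = os.getpid()
    host = socket.gethostname()
    for _ in range(attempts):
        name = "%d.%d.%s.%d" % (timestamp, pid, host, random.randrange(10**6))
        path = os.path.join(tmp_dir, name)
        try:
            # O_EXCL makes the create fail if the name is already taken.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except OSError as error:
            if error.errno == errno.EEXIST:
                continue  # genuine collision, try another name
            raise  # permissions, missing directory, fd exhaustion, ...
        return fd, path
    raise RuntimeError("Failed to create unique file name in %s" % tmp_dir)
```

With this shape, a permissions or ownership problem shows up as its own OSError with a meaningful errno instead of being reported as a thousand failed attempts.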

Barry Warsaw (barry) wrote :

Looks like it was a permission problem after all. Herb fixed the perms on asuka and we got no more oopses after I created a new mailing list on edge.

However, I did not get the notification email on edge. Maris is looking into that problem.

Changed in launchpad:
assignee: nobody → barry
status: New → Fix Released
Diogo Matsubara (matsubara) wrote : Re: [Bug 198368] Re: RuntimeError in the private xmlrpc server

On Tue, Mar 04, 2008 at 03:36:47PM -0000, Barry Warsaw wrote:
> Looks like it was a permission problem after all. Herb fixed the perms
> on asuka and we got no more oopses after I created a new mailing list on
> edge.

I think Barry means: s/asuka/arsenic/ :-)

--
Diogo M. Matsubara

Diogo Matsubara (matsubara) wrote :

This one was triggered again yesterday: OOPS-821XMLP1

Was the permission reset?

Diogo Matsubara (matsubara) wrote :

Hi Barry, I'm re-opening as we had another occurrence of the OOPS. I've asked Herb to check arsenic's permissions and they're correct.
Can you check this out?
Thanks

Changed in launchpad:
status: Fix Released → Confirmed
Barry Warsaw (barry) wrote :

I'll take a look asap, though we might not be able to find out exactly what's going on without hacking the Zope code. This error comes from deep inside Zope and it just doesn't log enough information to pinpoint the error. I don't know what our policy is on that, but we may have to cowboy something in to get more information.

description: updated
Barry Warsaw (barry) wrote :

I'm raising the importance because it's now clear that this is blocking mailing lists on production. All the mailman side issues appear to have been fixed with the latest cherry picks, but because this bug is happening in xmlrpc-private, LP is not being informed by mailman that all is well. We are going to have to cowboy in some changes to Zope to improve the logging and then figure out what's going on from there.

Changed in launchpad:
importance: High → Critical
milestone: none → 1.2.4
Barry Warsaw (barry)
Changed in launchpad:
status: Confirmed → In Progress
Barry Warsaw (barry) wrote :

Okay, after running a hacked maildir.py and watching what happens, I have a new hypothesis. The enhanced log shows that we are running out of file descriptors.

Maildir.newMessage() opens files and passes them to MaildirMessageWriter, but the latter does not close the files until commit() or abort(). I added some extra debugging and it's clear that for some list notifications, we're seeing tons of opens before we see a close. Well, what if we're trying to notify more people than we have file descriptors available, and every notification generates a new maildir file? Zope will just happily exhaust fds before it hits two-phase commit or abort. This seems like a fundamental flaw somewhere <wink>.
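
The pattern described above can be reproduced in miniature. The sketch below is not Zope code: HoldOpenWriter and notify_all are hypothetical stand-ins that only mimic "open on creation, close at commit", and the counter shows that the number of simultaneously open files grows with the number of recipients.

```python
import os


class HoldOpenWriter:
    """Hypothetical writer that keeps its file open until commit()."""

    open_count = 0  # files currently held open across all writers

    def __init__(self, path):
        self._file = open(path, "w")
        HoldOpenWriter.open_count += 1

    def write(self, data):
        self._file.write(data)

    def commit(self):
        self._file.close()
        HoldOpenWriter.open_count -= 1


def notify_all(recipients, directory):
    """Queue one message per recipient and commit only at the end.

    Returns the peak number of files that were open at the same time.
    """
    writers = []
    peak = 0
    for index, recipient in enumerate(recipients):
        writer = HoldOpenWriter(os.path.join(directory, "msg-%d" % index))
        writer.write("To: %s\n" % recipient)
        writers.append(writer)
        peak = max(peak, HoldOpenWriter.open_count)
    for writer in writers:
        writer.commit()
    return peak
```

Because nothing is closed until the final loop, the peak equals the number of recipients, so a large enough list will hit the per-process file descriptor limit before any commit runs.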

I can think of a couple of ways out of this, though I'm not sure yet what's feasible in Zope. You could chunk up the transactions to a small number of recipients, deliver to that chunk, and then commit(). This could mean that other errors will cause some recipients to get notifications and others not, so we'd have to worry about duplicates during recovery.
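
A rough sketch of the chunking idea, under stated assumptions: queue_message and commit are hypothetical callables supplied by the caller (they are not Zope or Launchpad APIs), and delivered is a set the caller keeps around so that a retry after a failure skips recipients whose chunk was already committed.

```python
def deliver_in_chunks(recipients, size, queue_message, commit, delivered):
    """Queue messages in chunks of `size`, committing after each chunk.

    `delivered` is updated in place after every successful commit, so
    calling this again after an error does not re-send to anyone whose
    chunk was already committed.
    """
    pending = [r for r in recipients if r not in delivered]
    for start in range(0, len(pending), size):
        chunk = pending[start:start + size]
        for recipient in chunk:
            queue_message(recipient)
        commit()  # at most `size` messages are outstanding at this point
        delivered.update(chunk)
```

The trade-off is the one noted above: delivery is no longer all-or-nothing, so the caller has to persist the delivered set somewhere and abort the current transaction before retrying.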

Maybe we have to change the protocol so that the fd is closed sooner than commit() or abort(), since the close is always going to happen anyway. The client of MaildirMessageWriter has to know when that will happen, though.
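
One possible shape for such a protocol, as a sketch only: the writer gets an explicit close() that the client calls as soon as the message body is written, and commit() or abort() then only renames or removes the already-closed file. EarlyCloseWriter and its constructor arguments are hypothetical, and this is not necessarily how the Zope change works.

```python
import os


class EarlyCloseWriter:
    """Hypothetical writer whose fd can be released before commit()."""

    def __init__(self, tmp_path, final_path):
        self._tmp_path = tmp_path
        self._final_path = final_path
        self._file = open(tmp_path, "w")

    def write(self, data):
        self._file.write(data)

    def close(self):
        # Safe to call more than once; releases the fd immediately.
        if not self._file.closed:
            self._file.close()

    def commit(self):
        self.close()
        # Publish the finished message by moving it out of the tmp area.
        os.rename(self._tmp_path, self._final_path)

    def abort(self):
        self.close()
        if os.path.exists(self._tmp_path):
            os.unlink(self._tmp_path)
```

A client that calls close() after writing each message holds at most one descriptor at a time, no matter how many messages are waiting for the transaction to finish.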

Seems like we're probably going to have to patch Zope though to fix this.

Barry Warsaw (barry) wrote :

Looks like Zope trunk addressed this very issue in r76463:

http://svn.zope.org/zope.sendmail/trunk/src/zope/sendmail/maildir.py?rev=76463&view=log

Barry Warsaw (barry) wrote :

This Zope change has been backported, reviewed, and is now working its way through pqm.

Barry Warsaw (barry)
Changed in launchpad:
status: In Progress → Fix Committed
Barry Warsaw (barry)
Changed in launchpad:
status: Fix Committed → Fix Released
Curtis Hovey (sinzui)
visibility: private → public
This report contains Public Security information  
Everyone can see this security related information.
