bzr+ssh on launchpad should fork, not exec

Bug #660264 reported by Michael Hudson-Doyle
Affects: Launchpad itself
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

The bzr+ssh service should fork from an already-running process rather than exec a new one, so that it starts up more quickly.

Code is in db-stable; it needs staging testing with init scripts and then an RT to enable it in prod (after the Nov rollout).
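
For readers unfamiliar with the distinction in the title, here is a minimal Python sketch of the fork approach. It is not the Launchpad or bzr implementation; `run_in_fork` and the handler are hypothetical names used only for illustration. The idea it shows is that a long-lived parent has already paid the interpreter and import cost, so a forked child can start work immediately, whereas an exec'd process would start a fresh interpreter and re-import everything.

```python
import os


def run_in_fork(handler, *args):
    """Run handler(*args) in a forked child and return the bytes it produced.

    Hypothetical helper: the child inherits every module the parent has
    already imported, so it skips the startup cost an exec would incur.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, send the result back, then exit
        # without running the parent's cleanup handlers.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                out.write(handler(*args))
            status = 0
        finally:
            os._exit(status)
    # Parent process: read the child's output until EOF, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("child exited abnormally")
    return data
```

A real forking service also has to handle concurrent clients, reaping children, and graceful shutdown, none of which this sketch attempts.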

Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
tags: added: qa-needstesting
Changed in launchpad-code:
assignee: nobody → John A Meinel (jameinel)
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

04:08 < lifeless> mwhudson: http://launchpad.net/bugs/660264 is ok right, in the absence of config changes?
04:10 * mwhudson looks
04:11 < mwhudson> lifeless: it's ok in the sense of 'do no harm' indeed

tags: added: qa-ok
removed: qa-needstesting
tags: removed: qa-ok
description: updated
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
tags: added: qa-ok
removed: qa-needstesting
description: updated
Revision history for this message
Martin Pool (mbp) wrote :

I believe that the next tasks here are

 * losas must arrange for the lp-serve daemon to run on qastaging <https://rt.admin.canonical.com/Ticket/Display.html?id=42199>
 * once we're happy with that, change a configuration switch to make the conch server use it

Later, we want monitoring that the service continues running <https://rt.admin.canonical.com/Ticket/Display.html?id=41791> - perhaps we should split that to another bug too.

Revision history for this message
Martin Pool (mbp) wrote :

John tells me Francis put this on his 'to track list' and at priority 87.

Curtis Hovey (sinzui)
Changed in launchpad-code:
status: Fix Committed → Fix Released
Revision history for this message
John A Meinel (jameinel) wrote :

The code has landed and is in production, but it hasn't actually been activated. Should this actually be considered Released? If so, then should I be starting a new bug indicating that the fixes for this bug are not actually active in production yet?

Revision history for this message
Aaron Bentley (abentley) wrote :

I think that we mean "deployed" when we say "released".

Revision history for this message
John A Meinel (jameinel) wrote :

Perhaps, but then you are missing a state. Namely "in use" rather than "the code is sitting dormant on the production server".

IOW, the *bug* listed here: "bzr+ssh on launchpad should fork, not exec" has not been fixed. It is still execing. There is code which could be used to do this, but it is not active... I'm fine putting it in another bug if that seems more appropriate.

Revision history for this message
Aaron Bentley (abentley) wrote :

Yes, we are missing a state, but the "released but dormant" state is rare enough that I'm not sure adding that state would be a good idea.

Revision history for this message
Tim Penhey (thumper) wrote :

So... what do we need to do to test this on staging?

Changed in launchpad-code:
status: Fix Released → Triaged
importance: Undecided → High
Revision history for this message
John A Meinel (jameinel) wrote :

Get enough losa time to finish off:
https://rt.admin.canonical.com/Ticket/Display.html?id=42199
(private link)

We've been doing a bit of back and forth, but the lag time is usually measured in multiple days between responses. Though I think with a couple of hours of dedicated time we could finish it. It just requires stuff that I'm not familiar with on machines that I don't have access to (sysadmin work on the qastaging server). Not to mention that production setup isn't the same as development setup, etc.

Also, last week the losas were understaffed, so it got pushed back further. I haven't gotten any responses from losa ping today, but hopefully since I responded mthaddon will get another chance to look at it tonight.

Revision history for this message
John A Meinel (jameinel) wrote :

This is active on qastaging, but still waiting on LOSA time to roll it out into production. Not sure if that counts as "Fix Released" if it isn't deployed.

Changed in launchpad:
status: Triaged → In Progress
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 660264] Re: bzr+ssh on launchpad should fork, not exec

What's the RT for activating it in production?

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/30/2011 5:16 PM, Robert Collins wrote:
> Whats the RT for activating it in production?
>

rt #43393 is about setting it up on staging, but I was going to co-opt
that one to make it about production. Still waiting on SPM to have time
to actually get to it, though.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk1HIhwACgkQJdeBCYSNAAMY5gCgl9mzuWlgztX2SC9YnT8CrHdi
2RsAnjqsaNyE9ZuDxO6GFFUZ5EgV0SIM
=PSnJ
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Feb 1, 2011 at 9:57 AM, John A Meinel <email address hidden> wrote:

> rt #43393 is about setting it up on staging, but I was going to co-opt
> that one to make it about production. Still waiting on SPM to have time
> to actually get to it, though.

The priority to fix things on staging vs prod is very different; I
would file a new RT for sure.

Revision history for this message
Martin Pool (mbp) wrote :

I think you should really file a new ticket; please cc me when you file it.

Revision history for this message
Martin Pool (mbp) wrote :

rt 43743 asks for this to be done on production too; obviously it depends on doing it on staging first and then testing that.

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Feb 1, 2011 at 1:55 PM, Martin Pool <email address hidden> wrote:
> rt 43743 asks for this to be done on production too; obviously it
> depends on doing it on staging first and then testing that

Huh? It's been tested on qastaging. There's no need to pipeline this
after staging.

Revision history for this message
Martin Pool (mbp) wrote :

What Robert meant there is that we don't need to test on staging; it can go direct to lpnet from here.

So, we should still do both these rts, but there is no ordering dependency between them.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/31/2011 9:49 PM, Martin Pool wrote:
> What Robert meant there is that we don't need to test on staging; it can
> go direct to lpnet from here.
>
> So, we should still do both these rts, but there is no ordering
> dependency between them.
>

Since it all seems to be blocked on losa time, I didn't think there was
a need for a separate rt. It needs to be deployed in both places anyway.

However, since you felt it was warranted, rt #43756. You've been CC'd
and it is now in the queue.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk1IITQACgkQJdeBCYSNAAOY0wCgormoBxi3o+Y1N1ueGN5aD2uO
jPEAnj09wA16jMiFcxUxCi1zpRbK+J+m
=sKir
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

This was deployed last week, but it seemed to start spinning and it had to be shut down.

https://wiki.canonical.com/IncidentReports/2011-02-11-Codehosting-Forking-Service-Down

The code is still in-tree but the feature is disabled at the moment, and it seems stable like that. We need to address the issues in that incident report to determine what caused the problem and how to test for it on qastaging before this goes live again.

William Grant (wgrant)
tags: removed: qa-ok
Revision history for this message
Martin Pool (mbp) wrote :

<lifeless> poolie: haproxy is only half-live, we're not actually in nodowntime mode yet.

This is also blocked on bug 819604 and bug 795025.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Bugs 819604 and 795025 have been fixed in bzr, and I'm hoping to land that new bzr in Launchpad soon.

Bug 795025 ("no way to gracefully disconnect clients and shut down the bzr serve") also requires some launchpad changes though.

tags: added: codehosting-ssh inventory performance
removed: lp-code
Curtis Hovey (sinzui)
Changed in launchpad:
assignee: John A Meinel (jameinel) → nobody
status: In Progress → Triaged