Bazaar

Bug #257217
Comment #12

Martin Pool (mbp) wrote on 2011-04-21: Re: [Bug 257217] Re: closing terminal causes stale lock

On 20 April 2011 22:49, John Arbash Meinel <email address hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/20/2011 11:44 AM, Martin Pool wrote:
>> On 20 April 2011 19:15, John A Meinel <email address hidden> wrote:
>>> How about we land SIGHUP changes, and just go with it for now. We can
>>> always open another bug if we need to.
>>
>> I think you're right: catching sighup is a step forward and will fix
>> some cases. What I saw before led me to believe that when the
>> window's killed, ssh may also be killed without us getting a chance to
>> do anything about it. (We could try to prevent that by e.g. changing
>> process group etc., but that may lead to knock-on problems.)
>>
>> Probably we should just
>> * merge that branch and close this bug
>> * reliably release locks server-side over ssh (maybe we do?)
>
> For commands where we can, we do. However, we still have a "write-lock
> the remote object because I'm going to do multiple operations" RPC,
> where we hand the client a lock token, which it is required to send to
> the server for each operation. So the server doesn't know when the last
> write action is going to be received. (If ever.)
> Again, we try to be stateless, so just having the connection drop
> "doesn't count". (Since HTTP wouldn't be able to tell you that.)

We shouldn't be doctrinaire about this. If the ssh connection drops,
it is very likely the whole client has terminated.
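
As a rough sketch of what "release locks when the ssh connection drops"
could look like on the server side: the LockTable and Connection names
below are invented for illustration and are not bzrlib's actual API. The
idea is only that each session remembers the tokens it handed out and
gives them back when it sees EOF.

```python
class LockTable:
    """Toy in-memory lock table (hypothetical, not bzrlib code)."""

    def __init__(self):
        self._held = {}   # object name -> token currently holding the lock
        self._counter = 0

    def lock_write(self, name):
        if name in self._held:
            raise RuntimeError("%s is already locked" % name)
        self._counter += 1
        token = "token-%d" % self._counter
        self._held[name] = token
        return token

    def unlock(self, name, token):
        if self._held.get(name) != token:
            raise RuntimeError("token mismatch for %s" % name)
        del self._held[name]

    def is_locked(self, name):
        return name in self._held


class Connection:
    """One ssh session; remembers the tokens it handed to the client."""

    def __init__(self, table):
        self._table = table
        self._tokens = {}  # object name -> token issued on this session

    def lock_write(self, name):
        token = self._table.lock_write(name)
        self._tokens[name] = token
        return token

    def unlock(self, name, token):
        self._table.unlock(name, token)
        self._tokens.pop(name, None)

    def connection_lost(self):
        """Assumed hook, called on EOF: release what this session holds."""
        for name, token in list(self._tokens.items()):
            try:
                self._table.unlock(name, token)
            except RuntimeError:
                # Someone already broke the lock; nothing left to release.
                pass
        self._tokens.clear()
```

This only helps transports where the server can see the connection go
away; it does nothing for HTTP.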

Perhaps in the future we should teach the client to start a new ssh
connection and pick up over that, but I'm not sure that's a really
important case. It's fairly rare for an ssh connection to just drop by
itself and then be immediately restartable, though this can sometimes
happen when, for example, switching from a wireless to a wired network.
A more likely reason is that the whole client died or was interrupted,
or that there was an extended network outage, in which case it's not
safe to assume that no one has broken the lock.

For SSH dropouts I think the best thing is probably just to make it
easy, safe and fast to resume an interrupted transfer in a second run
of the program. Secondarily we could make it try to restart the SSH
connection, but in that case it should probably be prepared for the
objects no longer to be locked.
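
To illustrate "be prepared for the objects no longer to be locked", here
is a minimal sketch of the check a client might make after reconnecting.
FakeServer, token_for and can_resume are made-up names for the example,
not part of the real smart protocol.

```python
class FakeServer:
    """Stand-in for the remote end (hypothetical, not a Bazaar API)."""

    def __init__(self):
        self.tokens = {}  # object name -> token currently holding the lock

    def token_for(self, name):
        return self.tokens.get(name)


def can_resume(server, name, our_token):
    """After reconnecting, True only if our token still holds the lock."""
    return our_token is not None and server.token_for(name) == our_token
```

If can_resume() is false, the lock was broken or retaken while we were
away, so the client should take the lock again and restart the operation
rather than carry on with the old token.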

Over HTTP it's true we can't tell when the client has disappeared, but
that is a much less common case.

>> * just cope with stale locks over sftp
>>
>> I'll have a look at this tomorrow.

That's bug https://bugs.launchpad.net/bugs/220464, which I am actually
working on.

Martin
