stale lock prevents apport runs

Bug #137567 reported by C de-Avillez
Affects: apport (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: apport

Today I succeeded in crashing Evolution-Data-Server, and a crash report was created in /var/crash. Nevertheless, apport-retrace refused to retrace this crash, stating that a required keyword was missing from the report.

Looking at /var/log/apport I see an entry for the crash with this message: "another apport instance is already running, aborting", so this is the reason for the missing keyword, I guess.

The problems:

1. there was *no* other instance of apport running at the time of the crash;
2. the /var/crash/.lock lock file was dated Sep 5th 2007. Since then this machine has been rebooted several times.

So... stale lock. Since the .lock file was owned by root, it might well be that my apport run (being run as myself) did not have the necessary privilege to delete a file owned by root. I do not know; I have not had time to look at the code.

Although I do understand the need to throttle simultaneous apport runs, I also think some sort of cleanup should be implemented; this cleanup would have to take into consideration that the lock file may be owned by another user.

C de-Avillez (hggdh2) wrote :

bah, the lock file was not dated Sep 5th... it was Aug 5th. Sorry.

Daniel Hahler (blueyed) wrote :

I have the same issue. It seems like an X server crash left the stale lock file around. (I'm using Option "NoTrapSignals" "true" in xorg.conf, so apport gets to know about the X server crashes.)

Changed in apport:
status: New → Confirmed
Daniel Hahler (blueyed) wrote :

This debdiff makes apport use /var/lock/apport as a lockfile instead.
It also makes it exit with code 1 in case the lockfile could not be created (bug 147237).

I've also added a "/bin/rm -f /var/lock/apport" to the init script, in case apport gets restarted.

In fact, the lockfile would not have to be moved to /var/lock: deleting it in the init script's "start" action would be enough. But having lockfiles in /var/lock seems reasonable.

Martin Pitt (pitti) wrote :

The .lock file is not stale at all. Its existence does not matter, since the lock is taken using flock(2), not by merely testing whether the file exists. So the patch would not help in any way, and the reason for this must be something entirely different. Can you please attach your /var/log/apport.log?
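To illustrate the point about flock(2): the lock lives on an open file descriptor, not in the existence of the file, so a leftover .lock file on disk does not block anything. The following is only a minimal sketch, not apport's actual code; the helper name try_lock and the temporary path are made up for the demonstration. It uses fcntl.flock rather than the fcntl.lockf call seen elsewhere in this thread, because lockf locks are per-process and would not conflict inside a single demo process.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open (creating if needed) the lock file and try a non-blocking
    exclusive lock. Hypothetical helper, not part of apport.

    Returns the open file descriptor on success, or None if the lock
    is already held through another open file description.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


# Use a throwaway directory instead of /var/crash so no root is needed.
lock_path = os.path.join(tempfile.mkdtemp(), ".lock")

first = try_lock(lock_path)    # lock acquired; the file now exists
second = try_lock(lock_path)   # refused: the lock is held via `first`
os.close(first)                # closing the descriptor releases the lock
leftover_exists = os.path.exists(lock_path)  # the file itself stays on disk
third = try_lock(lock_path)    # succeeds even though the file was left behind
```

The kernel also drops the lock when the holding process exits or crashes, which is why an old file date or a reboot says nothing about whether the lock is held.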

Changed in apport:
status: Confirmed → Incomplete
Daniel Hahler (blueyed) wrote :

I guess, then, that the reason was that the lockfile owned by "root" was left behind, so that already the call
  fd = os.open(lockfile, os.O_WRONLY|os.O_CREAT|os.O_NOFOLLOW)
failed.

I've just had X crash again, and the lock file was left behind, owned by root.

Martin Pitt (pitti) wrote :

Daniel, no, that's not it. First, apport is always run as root. Second, if the os.open() fails, apport writes "cannot create lock file" into apport.log and exits. If you got "another apport instance is already running, aborting", then the flock() call failed. If it usually works, then I guess there really was another apport instance running at that time. If it generally fails, it should be reproducible with

  python -c "import os, fcntl; fd = os.open('/var/lock/.crash', os.O_WRONLY|os.O_CREAT|os.O_NOFOLLOW); fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)"

Does that work for you?

Martin Pitt (pitti) wrote :

Whoops, sorry. Of course I meant

  sudo python -c "import os, fcntl; fd = os.open('/var/crash/.lock', os.O_WRONLY|os.O_CREAT|os.O_NOFOLLOW); fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)"
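The two failure modes described in this thread (open fails versus lock refused) could be told apart roughly as follows. This is a hedged sketch under stated assumptions, not apport's real implementation: the function name acquire and the temporary paths are hypothetical, only the two message strings are taken from the comments above, and fcntl.flock is used in place of fcntl.lockf so that the conflict can be shown within one process.

```python
import fcntl
import os
import tempfile


def acquire(lockfile):
    """Return (fd, None) on success or (None, reason) on failure.

    Hypothetical helper; the reason strings mirror the log messages
    quoted in this bug report.
    """
    try:
        fd = os.open(lockfile, os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW, 0o600)
    except OSError:
        # Opening or creating the file failed (permissions, missing dir, ...).
        return None, "cannot create lock file"
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The file could be opened, but someone else holds the lock.
        os.close(fd)
        return None, "another apport instance is already running, aborting"
    return fd, None


workdir = tempfile.mkdtemp()
held_fd, held_err = acquire(os.path.join(workdir, ".lock"))
_, busy_err = acquire(os.path.join(workdir, ".lock"))
_, open_err = acquire(os.path.join(workdir, "missing-dir", ".lock"))
```

Here the second call is refused because the first descriptor still holds the lock, while the third fails earlier, at os.open(), because its parent directory does not exist.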

Daniel Hahler (blueyed) wrote :

Yes, the code snippet works for me. I believe I created confusion here by not knowing about flock(2) and that apport is always run as root, sorry.

hggdh, can you give more information? Has the problem happened to you again?

C de-Avillez (hggdh2) wrote :

Since then I have been monitoring the logs and /var/crash. I never saw it happen again.

So, sorry, no new data. I guess we could close it as invalid then.

Martin Pitt (pitti) wrote :

Thanks, hggdh, for reporting back. Let's close this then until it happens again.

Changed in apport:
status: Incomplete → Invalid