devkit-disks-daemon crashed with SIGSEGV in g_str_hash() with encrypted devices

Bug #414407 reported by Dagfinn Ilmari Mannsåker
174
This bug affects 42 people
Affects Status Importance Assigned to Milestone
obsolete
Fix Released
Medium
devicekit-disks (Ubuntu)
Fix Released
High
Martin Pitt

Bug Description

Binary package hint: devicekit-disks

Crash occurred on reboot after system upgrade.

ProblemType: Crash
Architecture: i386
Date: Sun Aug 16 12:42:25 2009
DistroRelease: Ubuntu 9.10
ExecutablePath: /usr/lib/devicekit-disks/devkit-disks-daemon
Package: devicekit-disks 005-0ubuntu6
ProcCmdline: /usr/lib/devicekit-disks/devkit-disks-daemon
ProcEnviron:

ProcVersionSignature: Ubuntu 2.6.31-5.24-generic
SegvAnalysis:
 Segfault happened at: 0x957f67 <g_str_hash+7>: movsbl (%edx),%eax
 PC (0x00957f67) ok
 source "(%edx)" (0x00000000) not located in a known VMA region (needed readable region)!
 destination "%eax" ok
SegvReason: reading NULL VMA
Signal: 11
SourcePackage: devicekit-disks
StacktraceTop:
 g_str_hash () from /usr/lib/libglib-2.0.so.0
 ?? () from /usr/lib/libglib-2.0.so.0
 ?? ()
 g_udev_marshal_VOID__STRING_OBJECT ()
 g_closure_invoke () from /usr/lib/libgobject-2.0.so.0
Title: devkit-disks-daemon crashed with SIGSEGV in g_str_hash()
Uname: Linux 2.6.31-5-generic i686
UserGroups:

Revision history for this message
Dagfinn Ilmari Mannsåker (ilmari) wrote :
Revision history for this message
Apport retracing service (apport) wrote : Stacktrace.txt (retraced)

StacktraceTop:IA__g_str_hash (v=0x0) at /build/buildd/glib2.0-2.21.4/glib/gstring.c:99
g_hash_table_remove_internal (hash_table=0x8227e90,
device_remove (daemon=0x8231c08, d=<value optimized out>)
g_udev_marshal_VOID__STRING_OBJECT ()
IA__g_closure_invoke (closure=0x8238d38, return_value=0x0,

Revision history for this message
Apport retracing service (apport) wrote : ThreadStacktrace.txt (retraced)
Changed in devicekit-disks (Ubuntu):
importance: Undecided → Medium
tags: removed: need-i386-retrace
Revision history for this message
In , Martin Pitt (pitti) wrote :

We got several duplicate reports about dk-disks crashing in device_remove(), apparently when trying to remove a NULL key from the hash tables:

#0 IA__g_str_hash (v=0x0) at /build/buildd/glib2.0-2.21.4/glib/gstring.c:99
 p = (const signed char *) 0x0
 h = <value optimized out>
#1 0x00926d97 in g_hash_table_remove_internal (hash_table=0x8227e90,
    key=0x0, notify=1) at /build/buildd/glib2.0-2.21.4/glib/ghash.c:195
 node = <value optimized out>
 node_index = <value optimized out>
 __PRETTY_FUNCTION__ = "g_hash_table_remove_internal"
#2 0x0804dc53 in device_remove (daemon=0x8231c08, d=<value optimized out>)
    at devkit-disks-daemon.c:748
 device = <value optimized out>
 native_path = <value optimized out>
 __PRETTY_FUNCTION__ = "device_remove"
[...]
(See http://launchpadlibrarian.net/30388984/ThreadStacktrace.txt for the full trace).

So it seems that one of device->priv->{native_path,device_file,object_path,dev} is NULL. Unfortunately the stack trace isn't precise enough to tell which one in particular. Most duplicates reported that this happens on trying to mount an encrypted drive.

One could just paper over this by checking for NULL keys before trying to remove from the hash table. It's probably a good idea for robustness anyway, but it would paper over the fact that something in the internal state keeping is wrong. These properties aren't queried on device removal, but on addition, so it doesn't sound like a race condition on already lost data on removal.

native_path is already set in devkit_disks_device_new(), so it's unlikely to be NULL. However, object_path is set in register_disks_device(), and device_file and dev in update_info(). So there is one possible path which could lead to this:

        devkit_disks_device_set_device_file (device, g_udev_device_get_device_file (device->priv->d));
        if (device->priv->device_file == NULL) {
                g_warning ("No device file for %s", device->priv->native_path);
                goto out;
        }

So perhaps device_add() shouldn't add such invalid ones to the hash tables, or devices which don't have all four data fields should be discarded at all?

Revision history for this message
Martin Pitt (pitti) wrote : Re: devkit-disks-daemon crashed with SIGSEGV in g_str_hash()

Does anyone have a recipe how to reproduce this reliably?

visibility: private → public
Changed in devicekit-disks (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti)
summary: - devkit-disks-daemon crashed with SIGSEGV in g_str_hash()
+ devkit-disks-daemon crashed with SIGSEGV in g_str_hash() with encrypted
+ devices
Changed in devicekit:
status: Unknown → Confirmed
Revision history for this message
maxstirner (philipp-d) wrote :

this is a bit annoying, luks prompting on gnome used to work :(

upon first boot, when adding encrypted usb hdds it gives a "DBUS timeout" message after the enter encryption password prompt, then cannot mount the volume manually via cryptsetup until the usb drive is taken out and plugged back in (cannot access device).

Martin Pitt (pitti)
Changed in devicekit-disks (Ubuntu):
importance: Medium → High
assignee: nobody → Martin Pitt (pitti)
status: Confirmed → Triaged
Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=30207)
don't remove NULL values from hash tables

Since the duplicates of this crash keep coming (11 already), I robustified device_remove() to not attempt to remove NULL values from the hash tables, since that's guaranteed to blow up. It does not fix the underlying status bookkeeping problem, of course, so I added a TODO comment.

IMHO the patch is good and should be applied anyway, but it's certainly not complete yet.

I tried to replicate this under various conditions with an encrypted USB drive (all of the reports are for encrypted drives); yanking it out right after plugging in, entering correct password and yanking it out, entering wrong password, or none at all, killing/restarting daemon in between, but I never got any of the values to be NULL. So this remains a bit of a mystery to me.

There was one recent comment in the downstream bug, after I asked for a reproduction recipe:

  "upon first boot, when adding encrypted usb hdds it gives a "DBUS timeout" message after the enter encryption password prompt, then cannot mount the volume manually via cryptsetup until the usb drive is taken out and plugged back in (cannot access device)."

But a d-bus timeout rather sounds like a consequence of dk-disks crashing than the reason for the crash.

Revision history for this message
Martin Pitt (pitti) wrote :

Patch sent to upstream for review.

Changed in devicekit-disks (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package devicekit-disks - 007-1ubuntu1

---------------
devicekit-disks (007-1ubuntu1) karmic; urgency=low

  * Upload current Debian git head to pick up recent bug fixes.
  * debian/rules: Enable quilt patch system. Add quilt build dependency.
  * Add 01-mkfs-tempdir.patch: Daemon does not create /var/run/DeviceKit-disks/,
    so mkfs jobs fail. Just create the directory in /tmp, this is what /tmp is
    for, after all. (See https://bugs.freedesktop.org/show_bug.cgi?id=24265)
  * Add 00git-fix-inhibit.patch: Actually make Inhibit() work again. Taken
    from upstream git head. (LP: #428133)
  * Add 02-unlock-CD-trays-after-mounting.patch: Unlike in the hal world, we
    do not have a daemon polling CD drives for eject button presses. In order
    to make hardware tray eject buttons work, unlock the tray after
    mounting a CD. This is pretty much equivalent to yanking out USB sticks,
    which we already handle reasonably (detecting disappeared device,
    force-unmounting). (https://bugs.freedesktop.org/show_bug.cgi?id=24052,
    LP: #397734)
  * Add 03-fix-subsystem-check-for-firewire.patch: Firewire subsystem is
    called "ieee1394" in current Linux. Now check for both "ieee1394" and
    "firewire". This fixes firewire drives to not be considered system
    internal any more. (https://bugs.freedesktop.org/show_bug.cgi?id=24351,
    LP: #442604)
  * Add 04-mount-vfat-with-shortname-mixed-by-default.patch: The previous
    default, shortname=lower, breaks all-uppercase file names ("touch
    FOO" creates "foo"), thus breaks rsync, and Windows compatibility. The
    default was changed in the Linux kernel for 2.6.32 as well.
    (https://bugs.freedesktop.org/show_bug.cgi?id=24129, LP: #428174)
  * Add 05-dont-remove-NULL-values-from-hash-tables.patch: In device_remove(),
    device_file, object_path, or dev are sometimes NULL. Do not attempt to
    remove NULL from the hash tables, since this crashes. This is mainly a
    robustification patch, it does not really fix the underlying
    state bookkeeping. (http://bugs.freedesktop.org/show_bug.cgi?id=24264,
    LP: #414407)

 -- Martin Pitt <email address hidden> Fri, 09 Oct 2009 09:55:41 +0200

Changed in devicekit-disks (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
In , Zeuthen (zeuthen) wrote :

(In reply to comment #0)
> However, object_path is set in register_disks_device(), and device_file
> and dev in update_info(). So there is one possible path which could lead to
> this:
>
> devkit_disks_device_set_device_file (device,
> g_udev_device_get_device_file (device->priv->d));
> if (device->priv->device_file == NULL) {
> g_warning ("No device file for %s", device->priv->native_path);
> goto out;
> }
>
> So perhaps device_add() shouldn't add such invalid ones to the hash tables, or
> devices which don't have all four data fields should be discarded at all?

Yeah, I think there's some race with 'change', then 'remove' and our process handles the 'change' event just before the 'remove' event (e.g. the 'remove' event happened in udev already - it's just queued up in our process). So that brings us here.

Actually we don't handle neither the udev 'move' event nor the situation where the device file is changing. To handle this we need to keep the global hash tables up to date when processing a 'change' event. I've committed a patch to do this (hold on for the URL) and it should fix this problem as well.

Revision history for this message
In , Zeuthen (zeuthen) wrote :

Here's the patch

http://cgit.freedesktop.org/DeviceKit/DeviceKit-disks/commit/?id=7d9be0143e8d5b34364c5b7dc27cf735da84b558

Martin, any chance you can get the upstream reporters to test it? Thanks.

Please reopen the bug if there's still a crash.

Changed in devicekit:
status: Confirmed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

I just uploaded a newer version 007-2 with the upstream patch, which should now fix it properly. I really appreciate if people who could reproduce this will give this version a good beating and report positive or negative feedback here. I personally could never reproduce it, and I tried just about any possible combination of wrong/right/no passwords, yanking out the device, and daemon restart..

Thanks all for your reports!

Revision history for this message
maxstirner (philipp-d) wrote :

Thanks for your work on this. I am fully updated on karmic, nonetheless this is still occurring. Unfortunately I can't fully reproduce it myself. Why was the whole gnome luks system changed again? It used to work rather well..

Revision history for this message
Kazuhiro Okamoto (linuxian) wrote :

 2009-1-15 daily ,tested

Changed in devicekit:
importance: Unknown → Medium
Changed in devicekit:
importance: Medium → Unknown
Changed in devicekit:
importance: Unknown → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.