Comment 8 for bug 1930359

Matthew Ruffell (mruffell) wrote : Re: gdm fails to start in a VMware Horizon VDI environment with latest mutter 3.36.9-0ubuntu0.20.04.1 in focal-updates

Hi Daniel,

Yes, I am sure this is the same issue they are experiencing there, and I now believe the problem lies in glib rather than mutter.

Installing mutter-common triggers the libglib2.0-0 hook that recompiles the GSettings schemas.

The customer provided me with a tarball of their /usr/share/glib-2.0/schemas directory, and I have spent the day looking at it.

I deleted all the schemas from a test 20.04 VM, and extracted the tarball of their schemas in place, and rebooted the VM.

From there, the exact same problems occurred. Every program failed to load the compiled schema file and hit a breakpoint in the glib library:

Jul 2 13:41:04 ubuntu tracker-miner-f[1235]: No GSettings schemas are installed on the system
Jul 2 13:41:04 ubuntu tracker-extract[1234]: No GSettings schemas are installed on the system
Jul 2 13:41:04 ubuntu kernel: [ 13.280095] show_signal: 7 callbacks suppressed
Jul 2 13:41:04 ubuntu kernel: [ 13.280097] traps: tracker-miner-f[1235] trap int3 ip:7fb6202ac295 sp:7fff0d5c7cd0 error:0 in libglib-2.0.so.0.6400.6[7fb620270000+84000]
Jul 2 13:41:04 ubuntu kernel: [ 13.281163] traps: tracker-extract[1234] trap int3 ip:7f8718ac3295 sp:7ffe774d1c40 error:0 in libglib-2.0.so.0.6400.6[7f8718a87000+84000]

Jul 2 13:41:00 ubuntu gnome-session[1175]: gnome-session-binary[1175]: GLib-GIO-ERROR: No GSettings schemas are installed on the system
Jul 2 13:41:00 ubuntu gnome-session[1175]: aborting...
Jul 2 13:41:00 ubuntu gnome-session-binary[1175]: GLib-GIO-ERROR: No GSettings schemas are installed on the system#012aborting...
Jul 2 13:41:00 ubuntu gdm3: GdmDisplay: Session never registered, failing
Jul 2 13:41:00 ubuntu gdm3: GdmLocalDisplayFactory: maximum number of X display failures reached: check X server log for errors
Jul 2 13:41:00 ubuntu gdm3: Child process -1157 was already dead.

Now, looking closer, we see that their gschemas.compiled file exists. So we are not dealing with a missing file that failed to be re-created, but with a corrupted gschemas.compiled file.

I rebuilt the file with:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas/

and rebooted, and the system came up normally. Very interesting.

From there, I rebuilt the file several times, checking its sha256 sum each time. The sum was identical every time, so the compile process appears to be deterministic.

I then did a binary diff between the corrupted gschemas.compiled file and a freshly rebuilt one.

Exactly two bytes differ (cmp -l counts bytes from 1, so these are offsets 0x376E and 0x3770 in the 0-based xxd dump below):

$ cmp -l ~/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
0000376F E3 25
00003771 A4 65
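As a cross-check on the cmp/gawk pipeline, here is a minimal C sketch that compares two buffers and reports the differing bytes with 0-based offsets, so they line up directly with the xxd addresses. The helper names diff_offsets and print_diffs are hypothetical and not part of any tool. The test data is the 32 bytes of the two dump rows at 0x3760 and 0x3770.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Record the 0-based offsets of bytes that differ between a and b.
 * Up to max_out offsets are stored in out; the return value is the total
 * number of differing bytes, which may be larger than max_out. */
static size_t diff_offsets(const unsigned char *a, const unsigned char *b,
                           size_t len, size_t *out, size_t max_out)
{
    size_t count = 0;

    assert(out != NULL || max_out == 0);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (count < max_out)
                out[count] = i;
            count++;
        }
    }
    return count;
}

/* Print "offset old new" for each differing byte, like the gawk pipeline,
 * but 0-based and shifted by base (the file offset of a[0]). */
static void print_diffs(const unsigned char *a, const unsigned char *b,
                        size_t len, size_t base)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            printf("%08zX %02X %02X\n", base + i,
                   (unsigned)a[i], (unsigned)b[i]);
    }
}
```

Calling print_diffs(corrupt, working, 32, 0x3760) on those two rows prints 0000376E E3 25 and 00003770 A4 65. Each offset is one less than the corresponding cmp offset.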

$ xxd ~/schemas/gschemas.compiled > ~/corrupt.bin
$ xxd /usr/share/glib-2.0/schemas/gschemas.compiled > ~/working.bin
$ diff ~/corrupt.bin ~/working.bin
887,888c887,888
< 00003760: 0515 0000 ffff ffff 7837 0000 0000 e300 ........x7......
< 00003770: a455 0000 0000 0000 6f72 672e 676e 6f6d .U......org.gnom
---
> 00003760: 0515 0000 ffff ffff 7837 0000 0000 2500 ........x7....%.
> 00003770: 6555 0000 0000 0000 6f72 672e 676e 6f6d eU......org.gnom
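gschemas.compiled is a GVDB file, and on this x86 system its integers are stored little-endian. It can therefore help to read the changed bytes as parts of 32-bit words. This sketch assumes the words in this region are 4-byte aligned, which I have not verified. read_u32_le and dump_words are hypothetical helpers.

Under that assumption, the two changes look like this:

- The word at 0x376C reads 0x00E30000 in the corrupted file and 0x00250000 in the working one.
- The word at 0x3770 reads 0x000055A4 in the corrupted file and 0x00005565 in the working one.

For comparison, the word at 0x3768 decodes to 0x3778, which is exactly where the "org.gnom" text begins. That hints, but does not prove, that this region holds file offsets.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble a little-endian 32-bit value from 4 consecutive bytes. */
static uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Print each 4-byte word of buf, labelled with its file offset
 * (base is the file offset of buf[0]). Trailing bytes that do not
 * fill a whole word are skipped. */
static void dump_words(const unsigned char *buf, size_t len, uint32_t base)
{
    for (size_t i = 0; i + 4 <= len; i += 4)
        printf("%08" PRIX32 ": %08" PRIX32 "\n",
               (uint32_t)(base + i), read_u32_le(buf + i));
}
```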

I need to determine exactly how these two bytes ended up different.

I think we are chasing two bugs here:

1) A bug which generates a corrupted gschemas.compiled file.
2) A bug where glib does not handle a corrupted gschemas.compiled file gracefully.

Since my VM was generating coredumps for each of these processes, I took a look. I downloaded the debug symbols for glib2.0 on 20.04 and opened one of the crash dumps in gdb.

(gdb) bt
#0 _g_log_abort (breakpoint=1) at ../../../glib/gmessages.c:554
#1 0x00007f635e381579 in g_logv (log_domain=0x7f635e6006ff "GLib-GIO", log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7ffe83d1e730) at ../../../glib/gmessages.c:1373
#2 0x00007f635e381743 in g_log (log_domain=log_domain@entry=0x7f635e6006ff "GLib-GIO", log_level=log_level@entry=G_LOG_LEVEL_ERROR,
    format=format@entry=0x7f635e6217b8 "No GSettings schemas are installed on the system") at ../../../glib/gmessages.c:1415
#3 0x00007f635e5ad1fa in g_settings_set_property (object=<optimized out>, prop_id=2, value=<optimized out>, pspec=<optimized out>) at ../../../gio/gsettings.c:591
#4 0x00007f635e46b681 in object_set_property (nqueue=0x55a285fd8e20, value=0x7ffe83d1e910, pspec=0x55a285fd4570, object=0x55a285fe3570) at ../../../gobject/gobject.c:1565
#5 g_object_new_internal (class=class@entry=0x55a285fee870, params=params@entry=0x7ffe83d1e9b0, n_params=n_params@entry=1) at ../../../gobject/gobject.c:1971
#6 0x00007f635e46d378 in g_object_new_valist (object_type=<optimized out>, first_property_name=<optimized out>, var_args=var_args@entry=0x7ffe83d1eb00) at ../../../gobject/gobject.c:2262
#7 0x00007f635e46d6cd in g_object_new (object_type=<optimized out>, first_property_name=<optimized out>) at ../../../gobject/gobject.c:1780
#8 0x000055a285196a5c in ?? ()
#9 0x000055a28517cfe6 in ?? ()
#10 0x00007f635e0180b3 in __libc_start_main (main=0x55a28517c8d0, argc=4, argv=0x7ffe83d1edc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe83d1edb8)
    at ../csu/libc-start.c:308
#11 0x000055a28517d21e in ?? ()

Okay, so gnome-session, gdm, nautilus and all the other programs crash for exactly the same reason. glib2.0 tries to parse the binary gschemas.compiled file and fails for some reason. g_settings_schema_source_get_default() then returns NULL to its caller, g_settings_set_property():

 544 static void
 545 g_settings_set_property (GObject      *object,
 546                          guint         prop_id,
 547                          const GValue *value,
 548                          GParamSpec   *pspec)
 549 {
...
 588             default_source = g_settings_schema_source_get_default ();
 589
 590             if (default_source == NULL)
 591               g_error ("No GSettings schemas are installed on the system");
...

Now, g_error() logs the message, which is how it ends up in syslog above. g_logv() then calls _g_log_abort(), which executes a breakpoint instruction (the "trap int3" lines in the kernel log). Since no debugger is attached to catch the breakpoint, the kernel delivers SIGTRAP. The default action for SIGTRAP is to write a coredump and terminate the process.

I followed the logic in g_settings_schema_source_get_default(). It allocates a buffer for the binary file, reads the file in, and then attempts to build a table by parsing it (gschemas.compiled is a GVDB database). Interestingly, it explicitly marks the system schemas as "trusted". The documentation comment for g_settings_schema_source_new_from_directory() even warns that a trusted file that happens to be corrupted can cause crashes:

/**
 * g_settings_schema_source_new_from_directory:
...
 * If @trusted is %TRUE then `gschemas.compiled` is trusted not to be
 * corrupted. This assumption has a performance advantage, but can result
 * in crashes or inconsistent behaviour in the case of a corrupted file.
 * Generally, you should set @trusted to %TRUE for files installed by the
 * system and to %FALSE for files in the home directory.
 *
 * In either case, an empty file or some types of corruption in the file will
 * result in %G_FILE_ERROR_INVAL being returned.
...
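To make the "some types of corruption" remark concrete, here is a hedged C sketch of the kind of cheap, up-front header check a reader can do before trusting anything else in the file. It assumes the GVDB header layout as I understand it from glib's gvdb-format.h, with all integers little-endian:

- the 8 bytes "GVariant";
- a 32-bit version, expected to be 0;
- 32-bit options;
- a root pointer made of 32-bit start and end offsets.

check_gvdb_header and its status codes are hypothetical names, and byteswapped (big-endian) files are not handled. A check like this would presumably pass on the customer's file. Only two bytes differ from the freshly rebuilt file, both far past the 24-byte header, so the headers are identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum header_status {
    HEADER_OK,
    HEADER_TOO_SHORT,
    HEADER_BAD_SIGNATURE,
    HEADER_BAD_VERSION,
    HEADER_BAD_ROOT
};

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumed 24-byte header: "GVariant", u32 version, u32 options (ignored
 * here), then the root pointer as u32 start and u32 end offsets. */
static enum header_status check_gvdb_header(const unsigned char *buf,
                                            size_t size)
{
    assert(buf != NULL);
    if (size < 24)
        return HEADER_TOO_SHORT;
    if (memcmp(buf, "GVariant", 8) != 0)
        return HEADER_BAD_SIGNATURE;
    if (read_u32_le(buf + 8) != 0)
        return HEADER_BAD_VERSION;

    uint32_t start = read_u32_le(buf + 16);
    uint32_t end = read_u32_le(buf + 20);

    /* The root table must lie inside the file and start 4-byte aligned. */
    if (start > end || end > size || (start & 3u) != 0)
        return HEADER_BAD_ROOT;
    return HEADER_OK;
}
```

Corruption deep inside the file, like the two bytes here, is invisible to a check of this shape. It would only surface when the affected table or value is actually read.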

I did some quick tests. If only one of the two bytes was corrupted at a time, things worked without issue, so both bytes need to be changed to trigger the crash.
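The one-byte-at-a-time test can be reproduced mechanically. This sketch builds an image that matches the freshly rebuilt file everywhere except at one offset, where it takes the byte from the corrupted file. apply_single_byte and write_image are hypothetical helpers; how the image is then installed as gschemas.compiled is left to the reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy good into out, then take the byte at offset from bad, so out
 * carries exactly one corrupted byte. out must have room for len bytes. */
static void apply_single_byte(const unsigned char *good,
                              const unsigned char *bad, size_t len,
                              size_t offset, unsigned char *out)
{
    assert(offset < len);
    memcpy(out, good, len);
    out[offset] = bad[offset];
}

/* Write an image to path. Returns 0 on success and -1 on any I/O failure. */
static int write_image(const char *path, const unsigned char *buf, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(buf, 1, len, f);
    int closed = fclose(f);
    return (written == len && closed == 0) ? 0 : -1;
}
```

For this file, calling apply_single_byte once with offset 0x376E and once with 0x3770 yields the two half-corrupted variants described above.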

At the moment, I am reading up on the compiled schema binary format, how glib parses it, and why it errors out on this corruption.

I tried the same corrupted gschemas.compiled file on a fresh Impish install, and the newer glib version there crashes as well.