Applications have a dead open dbus connection (since image #96)

Bug #1334676 reported by Brendan Donegan
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Autopilot
Invalid
Critical
Christopher Lee
qtubuntu-sensors
Fix Released
Critical
Ricardo Mendoza

Bug Description

Without any apparent changes to autopilot itself, an issue has been introduced in launching applications since image #96 (see http://people.canonical.com/~lzemczak/landing-team/96.commitlog). This manifests itself in the logs as:

12:58:11.249 INFO _launcher:255 - Launching process: ['/usr/lib/arm-linux-gnueabihf/qt5/bin/qmlscene', '-testability', '-I/usr/lib/modules', '/tmp/tmpBppkbb.qml', '--desktop_file_hint=/home/phablet/.local/share/applications/tmpbs7ZaX.desktop']
12:58:40.820 ERROR proxies:410 - Introspect error on :1.810:/com/canonical/Autopilot/Introspection: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
12:58:40.840 DEBUG proxies:413 - Executing introspect queue due to error

This adds 25 seconds to the length of every test, and causes some tests to fail as they depend on the application being in a certain state when the test is run (e.g. https://bugs.launchpad.net/mediaplayer-app/+bug/1334186).

description: updated
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

This also happens with python3-autopilot. My versions below:

ii python-autopilot 1.4.1+14.10.20140430-0ubuntu4
ii python3-autopilot 1.5.0+14.10.20140613-0ubuntu1

Changed in autopilot:
assignee: nobody → Christopher Lee (veebers)
importance: Undecided → Critical
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

That code snippet is a red herring - even when it works well (in #95), dbus_application_name is not set, so it's nothing to do with that. This is the call that actually hangs:

362 -> proxy_obj = get_proxy_object_for_existing_process(
363 pid=pid,
364 process=process,
365 dbus_bus=dbus_bus,
366 emulator_base=launcher_instance.emulator_base,
367 application_name=application_name,

Martin Pitt (pitti)
description: updated
Revision history for this message
Christopher Lee (veebers) wrote :

I dug a little further into this issue and it isn't an issue with autopilot (or libautopilot-qt etc.).

What I've observed is that there is an extra dbus connection being created when an application is being launched that does nothing (i.e. is registered but that's it, doesn't respond to queries).
This means that when we attempt to query this connection to see if it's the one that we want we hang for ~25 seconds as we wait for the "Error: org.freedesktop.DBus.Error.NoReply" failure.

I've been able to observe this happening using these steps: http://pastebin.ubuntu.com/7709477/
(Note that all this is done using a mako device, not desktop)
In this example the app is launched without testability/autopilot support but we see the same behaviour if it is.

When I run these steps on image #95 I only get 2 connections for the app and querying both gives me a quick response.

I've logged with both dbus-monitor, using G_DBUS_DEBUG=all and QDBUS_DEBUG=1 with the launched application.
You can also see in this run [1] (this is the application log with QDBUS_DEBUG=1) that when I kill the app with Ctrl-C there are a bunch of assertion failure messages from dbus-connection.c.

Here are some other example logs, but I doubt they will be super useful.

http://paste.ubuntu.com/7709432/ (this is an app run with G_DBUS_DEBUG=all)
Connections for this run for this app pid were: :1.847, 1.848 & 1.849
The non responsive connection in this run is :1.849 but doesn't appear in this log.

http://paste.ubuntu.com/7709408/ (this is the output from dbus-monitor)
Connections were :1.746, :1.747 & :1.748 (the hanging one being :1.748)
On line 97 you can see the connection being registered.

To quote Martin Pitt:
"it's not gdbus/not qdbus, but libdbus directly, not doing anything on the connection except registering it, connection is dead, and these assertions from libdbus"

[1] http://paste.ubuntu.com/7709407/

Changed in autopilot:
status: New → Invalid
summary: - Dbus errors when launching an application since image #96
+ Applications have a dead open dbus connection (since image #96)
Revision history for this message
Martin Pitt (pitti) wrote :

This should still have an open bug task. Tentatively assigning to ubuntu-sdk-libs, but of course that's too broad.

Suggestions how to continue debugging:

 * Bisect http://people.canonical.com/~lzemczak/landing-team/96.commitlog and see which new component actually introduced this regression and then assing to that, and possibly revert

 * This assertion:

   process 7262: arguments to dbus_connection_close() were incorrect, assertion "connection->generation == _dbus_current_generation" failed in file ../../dbus/dbus-connection.c line 2890.

  seems to be the best hint we have at the moment, and it's consistent with having a bus connection which is still registered (dbus-monitor doesn't show it as closed), but already dead as the application tried to close it in an invalid way. This is coming from libdbus-1-3, so it's not anything using gdbus or qdbus. Suggesting to add a breakpoint there and when the assertion hits see where/what in the stack tries to close that connection.

Changed in ubuntu-touch-meta (Ubuntu):
importance: Undecided → Critical
I Ahmad (iahmad)
no longer affects: messaging-app
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

So the bisect concludes:

1.) Install image #95
2.) Update qtubuntu-sensors to 0.1+14.10.20140624-0ubuntu1
3.) Run the autopilot test. You will see the following errors happen quickly:

10:48:20.537 ERROR proxies:410 - Introspect error on :1.140:/com/canonical/Autopilot/Introspection: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message bus)
10:48:20.539 DEBUG proxies:413 - Executing introspect queue due to error
10:48:20.551 INFO __init__:387 - dbus.DBusException while attempting to get PID for :1.139: DBusException("Could not get PID of name ':1.139': no such name",)
10:48:20.560 INFO __init__:387 - dbus.DBusException while attempting to get PID for :1.138: DBusException("Could not get PID of name ':1.138': no such name",)
10:48:21.565 INFO _launcher:368 - waiting for process to exit.

4.) Update ubuntu-application-api2-touch to 2.1.0+14.10.20140624.4-0ubuntu1
5.) Run the test again. The errors no longer appear, but the hang is now present.

affects: ubuntu-touch-meta (Ubuntu) → qtubuntu-sensors
Changed in qtubuntu-sensors:
assignee: nobody → Ricardo Mendoza (ricmm)
Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Sorry to play devils advocate, but what is the reason for autopilot to scan the whole bus? It seems like a vulnerability in the architecture to allow a problem in some random component to impact so badly on autopilot itself.

Revision history for this message
Sergio Schvezov (sergiusens) wrote :

How can this be explained?

phablet@ubuntu-phablet:~$ for X in $(qdbus); do echo "$(qdbus org.freedesktop.DBus / GetConnectionUnixProcessID $X) -> $X"; done > /tmp/pid-bus.log
phablet@ubuntu-phablet:~$ grep $(pgrep dialer) /tmp/pid-bus.log4861 -> :1.422
phablet@ubuntu-phablet:~$ qdbus :1.422
Error: org.freedesktop.DBus.Error.NoReply
Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
phablet@ubuntu-phablet:~$ qdbus :1.422
/
/dialer_2dapp
/org
/org/freedesktop
/org/freedesktop/Telepathy
/org/freedesktop/Telepathy/Client
/org/freedesktop/Telepathy/Client/TelephonyPluginObserver

Revision history for this message
Sergio Schvezov (sergiusens) wrote :

Seems we are dealing with the anonymous name the client creates to connect to com.canonical.usensord

usensord does error out when called (although I'm fixing it to report an interface error instead of unknown method)

sergiusens@rivendell:~/projects/go/src/launchpad.net/usensord$ time qdbus com.canonical.usensord
Error: org.freedesktop.DBus.Error.UnknownObject
Unknown object path /

real 0m0.013s
user 0m0.007s
sys 0m0.005s
sergiusens@rivendell:~/projects/go/src/launchpad.net/usensord$ time qdbus com.canonical.usensord /com/canonical/usensord/haptic
Error: org.freedesktop.DBus.Error.UnknownMethod
Unknown method

real 0m0.013s
user 0m0.007s
sys 0m0.005s

Revision history for this message
Christopher Lee (veebers) wrote :

Hi Brendan,

Autopilot need to find the right connection to send all it's queries to so to do so we query the available buses based on the query criteria passed to get_proxy_object_for_existing_process (launch_test_application does this under the hood).
Most times the only time that autopilot scans the whole bus is to find which connections are for PID x. For all the connections for PID x it then tries to find the autopilot introspection interface.

In this bug we are seeing 3 bus connections for the application under test, when we attempt to find the bus that provides the AP introspection we encounter one that hangs for 25 seconds.

Revision history for this message
Ricardo Mendoza (ricmm) wrote :

Chris,

So the issue lies with dbus-cpp not starting a bus listener instance on the client if there is no explicit request from the client to be exposed on the bus. The client "connection" therefore cannot serve messages pushed to it.

I don't know if this follows the DBus specification to the letter so I will now proceed to add something to platform-api to manually start the bus listener while dbus-cpp itself gains the capabilities. However, perhaps we should check if it is indeed expected by the spec to have the client listen if it hasn't exported itself explicitly on the bus, otherwise Autopilot may be making an incorrect assumption.

Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

This landed in silo 007 and is fixed now - autopilot is still vulnerable to this type of behaviour though so it might be worth thinking about re-opening the autopilot task and figuring out any way that autopilot could protect itself in these situations.

Changed in qtubuntu-sensors:
status: New → Fix Released
Revision history for this message
Sergio Schvezov (sergiusens) wrote :

Can't autopilot instead of poll, wait for a org.freedesktop.DBus.ObjectManager.InterfacesAdded signal?

Revision history for this message
Thomi Richards (thomir-deactivatedaccount) wrote : Re: [Bug 1334676] Re: Applications have a dead open dbus connection (since image #96)

On Thu, Jul 3, 2014 at 9:03 AM, Sergio Schvezov <
<email address hidden>> wrote:

> Can't autopilot instead of poll, wait for a
> org.freedesktop.DBus.ObjectManager.InterfacesAdded signal?
>

No, we'd still get signals for dead interfaces.

While I understand the desire to protect autopilot against similar issues
in the future, I think we have more important matters to attend to. This
has happened exactly once in the three years that we've been deploying
autopilot, and it's due to clients not following the dbus spec. As such,
I'm not willing to commit resources to fix a problem that probably won't
happen ever again, and can be avoided by clients behaving nicely on the
dbus bus.

Having said that, well-tested patches are most welcome.

--
Thomi Richards
<email address hidden>

Revision history for this message
Sergio Schvezov (sergiusens) wrote :

@thomi this is actually part of the problem; I really can't find anywhere in the dbus spec a MUST implement, only a MAY and that means it's optional; hence I ask; to which section of the SPEC are you referring to when you say the dbus spec is not being followed?

Revision history for this message
Antti Kaijanmäki (kaijanmaki) wrote :

====
Martin Pitt (pitti) wrote on 2014-06-27:
 * This assertion:

   process 7262: arguments to dbus_connection_close() were incorrect, assertion "connection->generation == _dbus_current_generation" failed in file ../../dbus/dbus-connection.c line 2890.
====

Just to document, this assertion is probably due to:
bug #1422304

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.