avahi-daemon takes 100% CPU right after boot and at every restart of CUPS

Bug #1059286 reported by Till Kamppeter
32
This bug affects 7 people
Affects Status Importance Assigned to Milestone
avahi (Debian)
Fix Released
Unknown
avahi (Ubuntu)
Fix Released
Critical
Till Kamppeter

Bug Description

When I boot the system and log in avahi-daemon takes 100% CPU and Avahi/mDNS/Bonjour does not work at all, avahi-discover does not get access to anything and does not open its window, until it reaches a timeout and shows an error dialog.

When I restart avahi-daemon with

sudo restart avahi-daemon

avahi-daemon usually gets back to normal not taking significant CPU percentage and answering to avahi-discover correctly. If it takes 100% CPU again I simply do another restart to get it to normal.

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: avahi-daemon 0.6.31-1ubuntu1
ProcVersionSignature: Ubuntu 3.5.0-16.25-generic 3.5.4
Uname: Linux 3.5.0-16-generic x86_64
ApportVersion: 2.5.3-0ubuntu1
Architecture: amd64
Date: Mon Oct 1 00:05:30 2012
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120724.2)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: avahi
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.avahi.avahi.daemon.conf: 2012-09-18T17:25:04.112674

Related branches

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I have found out what spins up the avahi-daemon. It is the starting of CUPS during boot. So I can reproduce the bug without needing to reboot. I simply run

sudo restart cups

on the command line, repeating this command in the case the avahi daemon does not spin up (it actually spins up in 99% of the cases).

I have 18 print queues which I am sharing and CUPS registers them all with Avahi. One sees them in the "avahi-discover" window under "Internet Printer".

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Attaching strace to the running avahi-daemon (<PID> is process ID of avahi-daemon)

sudo strace -s 65535 -p <PID> 2>&1 | tee log

and stopping CUPS with

sudo stop cups

shows that avahi-daemon still sends out (or at least tries) broadcast packets with the print queues:

sendmsg(13, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=ine
t_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\31\0\0\0\0\0\0\31Generic PostSc
ript @ till\4_ipp\4_tcp\5local\0\0\377\0\1%HP Photosmart Pro B8800 series @ till
\300&\0\377\0\1!HP Photosmart C8100 series @ till\10_printer\300+\0\377\0\1!HP P
hotosmart C8100 series @ till\5_ipps\300+\0\377\0\1!HP Photosmart C8100 series @
 till\300&\0\377\0\1!HP Photosmart C5200 series @ till\300\211\0\377\0\1!HP Phot
osmart C5200 series @ till\300\272\0\377\0\1!HP Photosmart C5200 series @ till\3
00&\0\377\0\1\35EPSON Stylus COLOR 880 @ till\300\211\0\377\0\1\35EPSON Stylus C
OLOR 880 @ till\300\272\0\377\0\1\35EPSON Stylus COLOR 880 @ till\300&\0\377\0\1
\36HP_Color_LaserJet_2600n @ till\300\211\0\377\0\1\36HP_Color_LaserJet_2600n @
till\300\272\0\377\0\1\36HP_Color_LaserJet_2600n @ till\300&\0\377\0\1\rHP Fax @
 till\300\211\0\377\0\1\rHP Fax @ till\300\272\0\377\0\1\rHP Fax @ till\300&\0\3
77\0\1'Hewlett-Packard HP LaserJet 3390 @ till\300\211\0\377\0\1'Hewlett-Packard
 HP LaserJet 3390 @ till\300\272\0\377\0\1'Hewlett-Packard HP LaserJet 3390 @ ti
ll\300&\0\377\0\1'Hewlett-Packard HP LaserJet 1020 @ till\300\211\0\377\0\1'Hewl
ett-Packard HP LaserJet 1020 @ till\300\272\0\377\0\1'Hewlett-Packard HP LaserJe
t 1020 @ till\300&\0\377\0\0013Hewlett-Packard HP Color LaserJet CM3530 MFP @ ti
ll\300\211\0\377\0\0013Hewlett-Packard HP Color LaserJet CM3530 MFP @ till\300\2
72\0\377\0\1", 1029}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_
type=, ...}, msg_flags=0}, 0) = 1029

Remote machines do not see any Avahi broadcast from the machine where avahi-daemon is spinning on 100% CPU.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

After the update to cups 1.5.3-0ubuntu10 nothing changed. avahi-daemon still spins up to 100% (but the DNS-SD broadcasting part of the new CUPS version did not change anyway).

Changed in avahi (Ubuntu):
importance: High → Critical
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I am attaching an strace log. For this I restart avahi-daemon to assure that it is not on 100% CPU. CUPS is running. Then I run

sudo strace -ftvvs 65535 -p PID 2>&1 | tee log

with PID being the process ID of avahi-daemon. I have restarted CUPS at 23:58:50, then I have restarted CUPS at 23:59:31 and the restart command took until 23:59:38. I stopped CUPS at 00:01:24 and it took until 00:01:26. The log ends at 00:03:25.

Note that avahi-daemon did not spin up this time. It seems only to spin up if it is not coupled with strace. strace seems to slow down avahi-daemon and this way avahi-daemon does not go wild when CUPS is registering its print queues. After decoupling strace and starting cups again not under observation by strace, avahi-daemon spins up.

So we should run avahi-daemon under strace by default, simply piping the log to /dev/null ...

For me it seems a timing problem.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Right after writing comment #5 I coupled strace to the spinning avahi-daemon, producing log like crazy. Attached is the log of some seconds of this, more or less the time of writing this paragraph.

Sorry, have to compress the log this time.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

As long as avahi-daemon is coupled with strace it behaves absolutely correct. I can restart CUPS as often as I want and avahi-daemon does not spin up. After that, independent whether I leave strace attached or not, avahi-daemon continues behaving normally. Only if I restart CUPS while strace is not cuppled to avahi-daemon, avahi-daemon spins up.

The problem seems to have started with CUPS 1.6.x which ships Avahi-based Bonjour support from upstream in contrary to CUPS 1.5.3 which uses a patch from Tim Waugh to implement Avahi-based Bonjour support.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Tim, do you know anything about the differences between your Avahi support patch in CUPS and the upstream-supplied method in CUPS 1.6.x?

summary: - avahi-daemon takes 100% CPU right after boot
+ avahi-daemon takes 100% CPU right after boot and at every restart of
+ CUPS
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

avahi-damon also does not spin up when one starts cups and there are no local print queues. Probably the problem occurs only when there is a certain number of queues.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Ran avahi-daemon under gdb. After restarting cups and avahi-daemon spinning up I interrupted the running avahi-daemon with Ctrl+C and took a backtrace:

 (gdb) bt
#0 0x00007ffff7bd2fef in avahi_unescape_label (
    name=name@entry=0x7fffffffe408,
    dest=dest@entry=0x7fffffffe410 "HP Photosmart C5200 series @ till",
    size=size@entry=64) at domain.c:100
#1 0x00007ffff7bd3674 in avahi_domain_equal (
    a=0x6757a0 "\\032C5200\\032series\\032\\064\\032till._printer._tcp.local",
    b=0x64d9c0 "Hewlett-Packard\\032HP\\032Color\\032LaserJet\\032CM3530\\032MFP\\032\\064\\032till._ipp._tcp.local") at domain.c:239
#2 0x00007ffff79b1bdf in packet_add_probe_query (s=s@entry=0x62b190,
    p=p@entry=0x674be0, pj=0x66e880, pj@entry=0x673380) at probe-sched.c:201
#3 0x00007ffff79b1ece in elapse_callback (e=<optimized out>,
    data=<optimized out>) at probe-sched.c:281
#4 0x00007ffff79a2093 in expiration_event (timeout=<optimized out>,
    userdata=0x623ad0) at timeeventq.c:94
#5 0x00007ffff7bd56c8 in avahi_simple_poll_dispatch (s=0x6229c0)
    at simple-watch.c:570
#6 0x00007ffff7bd5785 in avahi_simple_poll_iterate (s=<optimized out>,
    timeout=timeout@entry=-1) at simple-watch.c:605
#7 0x000000000040676c in run_server (c=<optimized out>) at main.c:1217
#8 main (argc=<optimized out>, argv=<optimized out>) at main.c:1635
(gdb)

Revision history for this message
Tim Waugh (twaugh) wrote :

There were quite a few differences with the upstream approach. The upstream Avahi support looked more correct as the changes were going in, but I haven't tested it out.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Repeating the test from comment #10 I get always the same backtrace, only with different queue names.

Revision history for this message
Tim Waugh (twaugh) wrote :

Till: try "finish" until it hangs. That will tell you how far down the stack the hang originates. Then you can step through it.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :
Download full text (13.5 KiB)

Tim, I have tried it twice, both times I seem to run into a bug of gdb:

till@till:~$ sudo gdb avahi-daemon
[sudo] password for till:
GNU gdb (GDB) 7.5-ubuntu
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/avahi-daemon...Reading symbols from /usr/lib/debug/.build-id/47/c734e7af8803ff790794348387df57e5843b67.debug...done.
done.
(gdb) finish
The program is not being run.
(gdb) help finish
Execute until selected stack frame returns.
Upon return, the value returned is printed and put in the value history.
(gdb) run
Starting program: /usr/sbin/avahi-daemon
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Process 29509 died: No such process; trying to remove PID file. (/var/run/avahi-daemon//pid)
Found user 'avahi' (UID 111) and group 'avahi' (GID 120).
Successfully dropped root privileges.
avahi-daemon 0.6.31 starting up.
Successfully called chroot().
Successfully dropped remaining capabilities.
Loading service file /services/nut.service.
Loading service file /services/udisks.service.
Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
New relevant interface virbr0.IPv4 for mDNS.
Joining mDNS multicast group on interface eth2.IPv4 with address 192.168.178.26.
New relevant interface eth2.IPv4 for mDNS.
Network interface enumeration completed.
Registering new address record for fe80::fc54:ff:feaa:6be4 on vnet1.*.
Registering new address record for fe80::fc54:ff:fe36:4ab on vnet0.*.
Registering new address record for 192.168.122.1 on virbr0.IPv4.
Registering new address record for 2002:5ddb:93a1:0:222:4dff:fe84:44d7 on eth2.*.
Registering new address record for 2002:5ddb:93a1:0:e904:5d60:fd49:58c8 on eth2.*.
Registering new address record for 192.168.178.26 on eth2.IPv4.
Server startup complete. Host name is till.local. Local service cookie is 3725523792.
Registering HINFO record with values 'X86_64'/'LINUX'.
Received response from host 192.168.122.1 with invalid source port 1048 on interface 'virbr0.0'
Received response from host 192.168.122.1 with invalid source port 1049 on interface 'virbr0.0'
^C
Program received signal SIGINT, Interrupt.
avahi_utf8_valid (
    str=str@entry=0x7fffffffe410 "Hewlett-Packard HP LaserJet 1020 @ till")
    at utf8.c:59
59 utf8.c: No such file or directory.
(gdb) bt
#0 avahi_utf8_valid (
    str=str@entry=0x7fffffffe410 "Hewlett-Packard HP LaserJet 1020 @ till")
    at utf8.c:59
#1 0x00007ffff7bd30f8 in avahi_unescape_label (
    name=name@entry=0x7fffffffe408,
    dest=dest@entry=0x7fffffffe410 "Hewlett-Packard HP LaserJet 1020 @ till",
    size=size@entry=64) at domain.c:109
#2 0x00007ffff7bd3674 in avahi_domain_equal (a=0x6770ba "_ipp._tcp.local",
    b=0x64bd00 "Hewlett-Packard\\032HP\\032Colo...

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

For me it looks like that avahi-daemon falls into an infinite loop sending out certain messages repeatedly (there are several 100 messages per second announcing the available CUPS queues) and does not listen any more to requests (loop does not go through the code for listening). Therefore avahi-discover cannot connect and also CUPS is not able to unregister its printers on shutdown. This leaves avahi-daemon continue to spin and announce CUPS queues.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Now I did a test of booting the SSD of my PC in my laptop (same software configuration but different hardware) and the same problem occurs, so the problem is not caused by any particular hardware.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

The only network traffic with avahi-daemon is the following:

till@till:~$ sudo lsof -p 1024 -a -i
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/till/gvfs
      Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
avahi-dae 1024 avahi 13u IPv4 9767 0t0 UDP *:mdns
avahi-dae 1024 avahi 14u IPv4 9768 0t0 UDP *:57043
till@till:~$

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I have found a fix on the Avahi upstream mailing list:

http://lists.freedesktop.org/archives/avahi/2012-September/002188.html

I have rebuilt the package with this patch applied and the problem went away.

Citing the posting:

From: "Lee, Chun-Yi" <jlee at suse.com>

When we tested put a lot of airprint service files(have 45 to 60 flies), found there have cpu loadinghigh problem when start avahi-daemon with those service files.

After traced source code, there have problem in probe-sched.c::elapse_callback causes doesn't have any probe job set to DONE so the daemon unlimited send out DNS package.
The root cause is when compare with the free package space in packet_add_probe_query before attach job key, the free package space doesn't include any record data that will attached after all keys attached. This defect causes whole DNS package is filled by job key, but doesn't remain enough space for any rdata. Then, that means have no job set to DONE.

This patch add a new res_size member to AvahiDnsPacket, it used to sum the reserve size for record data the will attached after all keys attached. It can avoid keys consume whole size until p->size larger then p->max_size.

Changed in avahi (Ubuntu):
status: New → In Progress
assignee: nobody → Till Kamppeter (till-kamppeter)
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

debdiff with the fix is attached.

I have tested the fixed avahi-daemon by restarting CUPS several times and avahi-daemon does not spin up any more. Printer broadcasting still works correctly (checked with avahi-discover and "lpinfo -v" on remote machine).

Changed in avahi (Ubuntu):
status: In Progress → Fix Committed
tags: added: patch
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package avahi - 0.6.31-1ubuntu2

---------------
avahi (0.6.31-1ubuntu2) quantal-proposed; urgency=low

  * debian/patches/avahi-core-reserve-space-for-record-data-when-size-estimate.patch:
    avahi-core: reserve space for record data when size estimate. This
    prevents avahi-daemon from falling into an invalid loop when many CUPS/IPP
    printer shares get registered (LP: #1059286,
    http://lists.freedesktop.org/archives/avahi/2012-September/002188.html).
 -- Till Kamppeter <email address hidden> Mon, 8 Oct 2012 22:23:00 +0200

Changed in avahi (Ubuntu):
status: Fix Committed → Fix Released
Changed in avahi (Debian):
status: Unknown → New
Changed in avahi (Debian):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.