WiFi malfunction after suspend & resume stress - sudo wpa_cli scan required to fix it.

Bug #1585863 reported by Shih-Yuan Lee
This bug affects 191 people
Affects Status Importance Assigned to Milestone
OEM Priority Project
Fix Released
Critical
Unassigned
Xenial
Fix Released
Critical
Unassigned
network-manager (Ubuntu)
Fix Released
High
Unassigned

Bug Description

HOW TO REPRODUCE:
1. Install fwts by `sudo apt-get install fwts`.
2. Run the suspend & resume stress test.
sudo fwts s3 --s3-multiple=30 --s3-min-delay=5 --s3-max-delay=5 --s3-delay-delta=5

RESULT:
The WiFi cannot connect to any access point, and we have to run `sudo wpa_cli scan` manually to make it work again.
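
The manual recovery step could be scripted roughly as follows. This is only a sketch: it assumes that an empty access-point list from `nmcli` is how the stall shows up on your machine (some reporters instead see a short, stale list), and the function names `count_visible_aps` and `rescan_if_stalled` are made up for illustration. Run it as root so that `wpa_cli scan` can talk to the supplicant.

```shell
#!/bin/bash
# Sketch only: function names are illustrative, not part of any package.

# Count the non-blank lines on stdin (one SSID per line).
count_visible_aps() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c -v '^[[:space:]]*$' || true
}

# Ask the supplicant to scan again if NetworkManager lists no access points.
# Assumption: an empty list is the symptom of the stalled scan.
rescan_if_stalled() {
    local visible
    visible=$(nmcli -t -f SSID device wifi list | count_visible_aps)
    if [ "$visible" -eq 0 ]; then
        echo "no access points visible, requesting a scan"
        wpa_cli scan
    else
        echo "$visible access point(s) visible"
    fi
}
```

After the fwts run finishes, calling `rescan_if_stalled` as root would perform the same step that otherwise has to be typed by hand.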

WORKAROUND:
(http://askubuntu.com/questions/761180/wifi-doesnt-work-after-suspend-after-16-04-upgrade)

SYSTEM INFO:
Description: Ubuntu Yakkety Yak (development branch)
Release: 16.10
Packages:
libnm-glib-vpn1:amd64 1.2.2-0ubuntu2
libnm-glib4:amd64 1.2.2-0ubuntu2
libnm-util2:amd64 1.2.2-0ubuntu2
libnm0:amd64 1.2.2-0ubuntu2
network-manager 1.2.2-0ubuntu2

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :
tags: added: xenial yakkety
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "network-manager_1.2.2-0ubuntu3.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
V字龍(Vdragon) (vdragon) wrote :

@fourdollars
The xenial debdiff seems to be tainted by color codes...?

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

@vdragon: Thx.

Mathew Hodson (mhodson)
Changed in network-manager (Ubuntu):
importance: Undecided → High
Kent Lin (kent-jclin)
Changed in oem-priority:
importance: Undecided → Critical
Revision history for this message
Sebastien Bacher (seb128) wrote :

Thanks Shih-Yuan, could you upstream your change so it's reviewed by somebody who knows the codebase?

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

@seb128, the bug comes from Ubuntu's own patch, so the fix cannot be upstreamed.

Revision history for this message
Michael Terry (mterry) wrote :

Shih-Yuan, I think it would make more sense to just modify wifi-Signal-on-the-wifi-device-when-its-supplicant-i.patch in place, rather than creating a patch to patch our patch. :)

But I've asked cyphermox to check what's happening here, since it seems odd that his change to src/supplicant-manager/nm-supplicant-interface.c would be entirely unneeded?

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

That seems wrong. This change was done because you might otherwise get stray SCAN_DONE signals when the scan is in progress -- we should still be getting a SCAN_DONE signal later from the supplicant when we get the scan results. The issue here is that the scan may not be done correctly on return from suspend, but I doubt that forcing a SCAN_DONE to look at the scan results again is the right way to go about fixing this.

Since the issue can be resolved by asking the driver to scan again (via wpa or via iw dev wlan0 scan) and only seems to apply for iwlwifi (as far as I've heard), it would probably really point to either a driver or a wpasupplicant bug.

Tony, what are your thoughts on this? I know you recently spent a lot of time looking at the scanning/scan results logic?

Revision history for this message
Dave Chiluk (chiluk) wrote :

This really should have been worked via one of the myriad of other older open bugs related to this, like 1576747. I'll start duping against this bug.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Revision history for this message
Dave Chiluk (chiluk) wrote :

The above patches do not resolve this issue for my machine. de-duping 1576747.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Hi Mathieu,

I found this issue on ath9k.

04:00.0 Network controller [0280]: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter [168c:0036] (rev 01)
        Subsystem: Dell QCA9565 / AR9565 Wireless Network Adapter [1028:020e]

This patch simply reverts to the original upstream implementation.
Removing wifi-Signal-on-the-wifi-device-when-its-supplicant-i.patch can also fix this issue.

BTW, I can also reproduce this issue on my own USB WiFi adapter.
Bus 003 Device 003: ID 07b8:3072 AboCom Systems Inc 802.11n/b/g Mini Wireless LAN USB2.0 Adapter
Driver=rt2800usb (Ralink RT2800 USB Wireless LAN driver.)

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

This patch is for xenial.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

I can also reproduce the same issue on my own Lenovo ThinkPad X200.

03:00.0 Ethernet controller [0200]: Qualcomm Atheros AR242x / AR542x Wireless Network Adapter (PCI-Express) [168c:001c] (rev 01)
 Subsystem: Qualcomm Atheros AR242x / AR542x Wireless Network Adapter (PCI-Express) [168c:0035]
...
 Kernel driver in use: ath5k
 Kernel modules: ath5k

description: updated
Revision history for this message
Hans Deragon (deragon) wrote :

Looks like Bug #1448555 is a duplicate of this one.

Revision history for this message
Hans Deragon (deragon) wrote :

Bug #1380480 is not the same, but until someone finds the proper fix for both issues, why not introduce a script under /etc/pm/sleep.d that restarts the network manager upon resume? Lets get something working for the non technical people / consumer quickly. Such a script would "solve" the problem for both issues.

summary: - WiFi malfunction after suspend & resume stress
+ WiFi malfunction after suspend & resume stress - sudo wpa_cli scan
+ required to fix it.
Revision history for this message
auspex (auspex) wrote :

Because restarting network-manager from sleep.d isn't even a workaround, let alone a fix. With that script in place, my network reconnects about as often as it does with nothing in sleep.d.

Which, apparently, would be because /etc/pm/sleep.d is never invoked.

Revision history for this message
Cristiano Gavião (cvgaviao) wrote :

I don't know about others, but my Dell notebook is not only losing WiFi; after sleep it no longer wakes up at all. I need to power it off...
In the log I think I only see this: /lib/systemd/system-sleep/wpasupplicant failed with error code 255.

Revision history for this message
auspex (auspex) wrote :

I finally worked around my problem by adding a script in /lib/systemd/system-sleep/:

    $ cat /lib/systemd/system-sleep/12_wifi
    #!/bin/bash

    case $1 in
      "post")
        # disable/enable wifi
        rfkill block wifi; rfkill unblock wifi
        logger "reenabled wifi"
      ;;
    esac

Revision history for this message
Samuel W (poshul) wrote :

I can confirm that this also happens when toggling the radio killswitch on my x230 with Xenial:

Jul 26 09:48:19 Host NetworkManager[2979]: <info> [1469540899.8669] manager: WiFi now enabled by radio killswitch
Jul 26 09:48:19 Host kernel: [71025.599893] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Jul 26 09:48:19 Host wpa_supplicant[3344]: dbus: wpa_dbus_get_object_properties: failed to get object properties: (none) none
Jul 26 09:48:19 Host wpa_supplicant[3344]: dbus: Failed to construct signal
Jul 26 09:48:19 Host NetworkManager[2979]: <info> [1469540899.9049] device (wlan0): supplicant interface state: starting -> ready
Jul 26 09:48:19 Host NetworkManager[2979]: <info> [1469540899.9050] device (wlan0): state change: unavailable -> disconnected (reason 'supplicant-available') [20 30 42]
Jul 26 09:48:19 Host kernel: [71025.639368] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready

Revision history for this message
Hans Deragon (deragon) wrote :

Bug #1455097 "/etc/pm/sleep.d/ is no more processed" confirms that any solution involving "/etc/pm/sleep.d/" will not work since systemd took over. auspex's solution works (thank you), although in my case I simply restart the Network Manager (systemctl restart network-manager) to be really sure that WiFi will come up (see below).

That said, this bug is a shame to Ubuntu. Any consumer expects wifi/networking to come up upon resume. It is basic. Nobody at Canonical suffers from this problem?

If it is too hard to find a solution at the heart of the problem in a timely manner, we should quickly make a patch to force a rescan or restart the Network Manager. Who on this bug list is entitled to take a decision and package a script under '/lib/systemd/system-sleep'? Anyone? What is the process to get things done?

Following is the reason why I prefer a restart of the Network Manager. There were times that the Network Manager (at least in 14.04 for sure, 16.04, not so sure) would not even come out from sleep. This is why I prefer to restart it upon resume; it fixes multiple problems such as coming out of sleep and Wifi scanning. It does not cost much and I do not find any downsides, except for the nm-applet disappearing and reappearing quickly upon resume.

Revision history for this message
teo1978 (teo8976) wrote :

> Nobody at Canonical suffers from this problem?

LOL I don't think anybody at Canonical actually uses it, otherwise it couldn't possibly be as broken as it is.

Revision history for this message
Aleve Sicofante (sicofante) wrote :

@auspex: Your script doesn't work here. I just created it and gave it execution permissions, rebooted and tried a sleep/resume cycle. No dice.

I'm just amazed no one from Canonical is chiming in. This has been happening from the very moment I installed 16.04 on its release day and it happens in the two laptops I use (Lenovo T400 and Dell Studio 1537, both with Intel wireless cards). It's so obvious I didn't even try to file a bug understanding it would be naturally solved by 16.04.1 at the latest... I'm truly amazed in the worst sense.

Revision history for this message
monte (monte3) wrote :

yeah, canonical, is anyone there?!

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

This is for xenial.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

This is for yakkety.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Hi,

I made a testing PPA at ppa:fourdollars/lp1585863.
Please help to check if it can fix this issue for you.
If not, your problem may not be related to this issue.

Revision history for this message
Aron Xu (happyaron) wrote :

btw, n-m/1.2.2 is in queue for Xenial SRU, fix for this issue can be integrated once it's in -proposed.

Revision history for this message
auspex (auspex) wrote : Re: [Bug 1585863] Re: WiFi malfunction after suspend & resume stress - sudo wpa_cli scan required to fix it.

You can try having the script do "modprobe -r" and "modprobe" on your wifi
module. That should always work, but seemed like overkill in my case. In
any case, these are workarounds, not fixes.
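
The module reload suggested here could be sketched like this. The module name is an assumption: `iwlwifi` is only a placeholder default, so substitute the driver that `lspci -k` reports as "Kernel driver in use" for your adapter. The file name and the helper name `reload_wifi_module` are hypothetical.

```shell
#!/bin/bash
# Hypothetical hook, e.g. /lib/systemd/system-sleep/13_reload-wifi
# Assumption: replace iwlwifi with the driver your adapter actually uses.
WIFI_MODULE="${WIFI_MODULE:-iwlwifi}"

reload_wifi_module() {
    # Unload the driver, then load it again; skip the load if unloading fails.
    modprobe -r "$WIFI_MODULE" && modprobe "$WIFI_MODULE"
}

# systemd-sleep passes "pre" before suspending and "post" after resuming.
case "${1:-}" in
    post)
        reload_wifi_module
        logger "reloaded $WIFI_MODULE after resume"
        ;;
esac
```

As with the rfkill variant, the file has to be executable for systemd-sleep to run it.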

On 4 Aug 2016 6:01 a.m., "Aleve Sicofante" <email address hidden> wrote:

> @auspex: Your script doesn't work here. I just created it and gave it
> execution permissions, rebooted and tried a sleep/resume cycle. No dice.
>
> I'm just amazed no one from Canonical is chiming in. This has been
> happening from the very moment I installed 16.04 on its release day and
> it happens in the two laptops I use (Lenovo T400 and Dell Studio 1537,
> both with Intel wireless cards). It's so obvious I didn't even try to
> file a bug understanding it would be naturally solved by 16.04.1 at the
> latest... I'm truly amazed in the worst sense.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Hi Aron,

I saw n-m/1.2.2 at https://launchpad.net/ubuntu/xenial/+queue?queue_state=1.
When do you expect it will be accepted?

Revision history for this message
Aron Xu (happyaron) wrote :

@fourdollars, I've pinged the release team twice without a response. Personally, I'd like it to be accepted ASAP.

Revision history for this message
Mario Olmedo (molmedo1) wrote :

same issue here on two laptops. This workaround seems to work for me until I get a real fix.

http://askubuntu.com/questions/761180/wifi-doesnt-work-after-suspend-after-16-04-upgrade

-Mario

Revision history for this message
JaSauders (jasauders) wrote :

I noticed that NM 1.2.2 hit proposed yesterday. I pulled down proposed but did not see a change with NM 1.2.2 (i.e. I was still seeing the issue), though after re-reading Aron's message, it sounds like the fix is not 1.2.2, but the fix can be applied to 1.2.2 once it becomes available. Anyway, after this I added the testing PPA fourdollars provided to my main laptop (1 of 5 wireless devices I have that are seeing this issue). I've had it running for the last day or so. As of now I have not seen the issue come up after adding the PPA. Whatever was changed in the PPA (which I can't lie, I'm rather curious to know) seems to have done the trick.

While I could instigate the wireless issue before by closing/opening my laptop lid only a few times (this issue was a quite simple roll of the dice, not difficult to notice, further confusing me as to why it wasn't tackled before release but that's another conversation), the real culprit that seemed to always cause it was when my laptop was suspended for several hours. Even after an all-night suspend, the laptop resumed from suspend without issue and connected to wifi this morning.

Thank you folks for the testing PPA. Fingers crossed this issue can be fixed soon!

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

This is for 1.2.2 of xenial.

Revision history for this message
XiaoLe.S (e89021) wrote :
Revision history for this message
XiaoLe.S (e89021) wrote :

#36 is for 1.2.2 of yakkety!!

Revision history for this message
Joakim Koed (vooze) wrote :

Thank you for the patches, glad there is some progress, since this https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1589401 seems to be "closed" now :/ - However, it's now 20 days later, and I keep seeing more and more people on askubuntu, IRC, reddit etc. who have this issue. How can we move forward?

I have just downloaded the network-manager packages from proposed and built them with fourdollars's xenial patch.

On the first two tries after a full reboot, it showed up as a WiFi connection (no Ethernet logo), but there were still no WiFi networks in the nm-applet list. On the third try it showed the Ethernet logo again. So it does not seem to be working.

So no luck. Feel free to ask me to test more patches, I can build myself etc.

Revision history for this message
Aleve Sicofante (sicofante) wrote :

I installed Fourdollars' PPA but I can see current version is higher than that (1.2.4 vs 1.2.2 in the PPA).

I'm using yakkety right now.

How can I test the PPA's version?

Mathew Hodson (mhodson)
tags: added: suspend-resume
Revision history for this message
Hans Deragon (deragon) wrote :

This serious issue is dragging on too long. People want computers that "just work". If pinpointing the source of the problem is difficult and few resources are available, why not package a workaround that restarts NetworkManager upon resume?

Attached is my workaround. Works nicely. Finally, I have a computer that "just works" and it's not a Mac (well, I have other issues but this one is solved).

Simply move the script under /lib/systemd/system-sleep and ensure it is executable.
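
A resume hook along the lines described here might look like the sketch below. It is not the attached script, which is not reproduced in this thread; the file name `50_restart-nm` and the function name `on_sleep_event` are hypothetical. systemd-sleep calls each executable in that directory with `pre` or `post` as the first argument and the sleep type (for example `suspend`) as the second.

```shell
#!/bin/bash
# Hypothetical hook, e.g. /lib/systemd/system-sleep/50_restart-nm
# Sketch only: restarts NetworkManager after every resume.

on_sleep_event() {
    # $1 is "pre" or "post"; $2 is the sleep type, e.g. "suspend".
    case "${1:-}" in
        post)
            systemctl restart network-manager
            logger "restarted network-manager after ${2:-resume}"
            ;;
    esac
}

on_sleep_event "$@"
```

Nothing happens on the `pre` call, so the restart only runs once the machine has woken up.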

Aron Xu (happyaron)
no longer affects: network-manager
tags: added: papercuts2017
Revision history for this message
taiebot65 (dedreuil) wrote :

Sorry #1659058

Revision history for this message
Tony Espy (awe) wrote :

@taiebot65

Thanks. I'll take a look.

Regarding this bug, it turns out that when I was testing my version of NM with the dropped 'ScanDone' patch (wifi-Signal-on-the-wifi-device...), I'd been doing so on top of the newly re-based 1.2.6, and it turns out there was an actual fix in 1.2.6 which seems to fix the bug:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-2&id=b749f7d31b7e5f57f18026a14b6a444e59a488cf

This new version of NM hasn't yet been SRU'd for xenial, as version 1.2.4-0ubuntu0.16.04.1 is still in -proposed. I'll see if maybe we can expedite this landing and skip 1.2.4 altogether.

Note, I've managed to run 200 cycles of S3 on my Thinkpad without hitting the bug. This included about 25 cycles of manual suspend/resume, and 175 iterations using fwts.

Revision history for this message
Tony Espy (awe) wrote :

Ugh, and then I wake my Thinkpad 410s just now, and I hit the bug on the 201st cycle. ;(-

Guess I'll go back to dropping the patch again from 1.2.6 and see how many cycles I can run on it.

Revision history for this message
cascagrossa (cascagrossa-cascao) wrote :

Just to confirm, about #60 and #65, the workaround does not work.
I also confirm that the script "/lib/systemd/system-sleep/wpasupplicant" never runs, it is certainly a Systemd error.

Revision history for this message
Marco Pedrazzi (pedrazzi2009) wrote :

I confirm: sometimes suspend fails and syslog says:
(my Ubuntu is an upgrade from 15.10 to 16.04; model: asus-n551vw)

Jan 29 19:46:39 asus-n551v systemd[1]: Reached target Sleep.
Jan 29 19:46:39 asus-n551v systemd[1]: Starting Suspend...
Jan 29 19:46:39 asus-n551v systemd-sleep[16200]: Failed to connect to non-global ctrl_ifname: (nil) error: No such file or directory
Jan 29 19:46:39 asus-n551v systemd-sleep[16201]: /lib/systemd/system-sleep/wpasupplicant failed with error code 255.
Jan 29 19:46:39 asus-n551v systemd-sleep[16200]: Suspending system…

and then the PC restarts!!

Jan 29 20:39:14 asus-n551v kernel: [ 0.000000] Linux version 4.9.4-040904-generic (kernel@tangerine) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) )

I tried several kernel versions, such as 4.7.3/4.8.0/4.8.3/4.9.0, but each of them sometimes fails!

Revision history for this message
Dan Dascalescu (ddascalescu+launchpad) wrote :

"after resume, only a couple of wifi networks will be listed at most, and never the one I use" - that's exactly the symptom I see after resuming my DELL E7450. Also, the Wi-Fi icon is replaced with an "arrow up arrow down" one. `sudo service network-manager restart` reconnects most of the time, but sometimes that wrong icon stays on.

I've just modified `/lib/systemd/system-sleep/wpasupplicant` as described in #64, and will be testing for the next several days. Is a reboot necessary for that modification to take effect?

Revision history for this message
Kevin Brubeck Unhammer (unhammer) wrote :

#64 wrote "could this suggest this might be better fixed with a systemd dependency?", well, http://man7.org/linux/man-pages/man8/systemd-sleep.8.html says

       Note that scripts or binaries dropped in
       /usr/lib/systemd/system-sleep/ are intended for local use only and
       should be considered hacks. If applications want to be notified of
       system suspend/hibernation and resume, there are much nicer
       interfaces available.

(I can't find from that man-page what those interfaces are though.)

Revision history for this message
Tony Espy (awe) wrote :

@Kevin

NetworkManager already has code to monitor system signals related to suspend/resume, so no, adding additional scripts to /usr/lib/systemd/system-sleep isn't the answer.

@Dan

Different bug... this bug is caused by NetworkManager's WiFi scanning logic stalling due to a race condition. You can tell this by running 'sudo wpa_cli' and watching for scan events. If you don't see any, then you've hit the bug.

I've also unfortunately confirmed that dropping the original patch from 1.2.6 doesn't fix the problem either. I tried a cycle of 100 with my version of 1.2.6 with the original ScanDone patch dropped and I still tripped the bug.

Revision history for this message
auspex (auspex) wrote :

Seems to me that if NM stalls due to a race condition, then restarting NM
*is* a workaround, so yes, adding additional scripts to systemd is a
solution, but not the "answer".

derek

On Fri, Feb 3, 2017 at 1:11 AM, Tony Espy <email address hidden>
wrote:

> @Kevin
>
> NetworkManager already has code to monitor system signals related to
> suspend/resume, so no adding additional scripts to /usr/lib/systemd
> /system-sleep isn't the answer.
>
> @Dan
>
> Different bug... this bug is caused by NetworkManager's WiFi scanning
> logic stalling due to a race condition. You can tell this by running
> 'sudo wpa_cli' and watching for scan events. If you don't see any, then
> you've hit the bug.
>
> I've also unfortunately confirmed that dropping the original patch from
> 1.2.6 doesn't fix the problem either. I tried a cycle of 100 with my
> version of 1.2.6 with the original ScanDone patch dropped and I still
> tripped the bug.

description: updated
Changed in oem-priority:
status: New → Confirmed
Cyrus Lien (cyruslien)
Changed in oem-priority:
assignee: nobody → Shih-Yuan Lee (fourdollars)
tags: added: somerville
tags: removed: somerville
Changed in oem-priority:
assignee: Shih-Yuan Lee (fourdollars) → nobody
Changed in oem-priority:
status: Confirmed → Triaged
Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

I cannot reproduce this issue on Ubuntu 17.04.
It looks like network-manager 1.4.4 has fixed this issue.

Revision history for this message
Hans Deragon (deragon) wrote :

If network-manager 1.4.4 has fixed this issue, it then needs to be backported to 16.04 LTS and 14.04 LTS.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

This issue has been fixed by the following four commits.

From 1b925c0028cdaaf14d4ebd1f07848ba5640915c6 Mon Sep 17 00:00:00 2001
From: Tony Espy <email address hidden>
Date: Thu, 16 Jun 2016 15:07:33 -0400
Subject: [PATCH] wifi: clear WiFi requested_scan if suppl exits

It's possible for wpa_supplicant to exit with an
outstanding requested_scan pending. This can lead
to a stall condition where scanning no longer occurs.

https://mail.gnome.org/archives/networkmanager-list/2016-June/msg00117.html
(cherry picked from commit 899d7e5cb1eb3bddaf92de3644c49c9f634b675e)
---

From eed8fd2e43d244caa856d9993e750ff19ba62fd7 Mon Sep 17 00:00:00 2001
From: Tony Espy <email address hidden>
Date: Thu, 16 Jun 2016 15:07:32 -0400
Subject: [PATCH] wifi: clear WiFi requested_scan if suppl goes INACTIVE

It's possible for wpa_supplicant to transition to INACTIVE
state with an outstanding requested_scan pending. This can
lead to a stall condition where scanning no longer occurs.

[<email address hidden>: added break statement to avoid fall-through]

https://mail.gnome.org/archives/networkmanager-list/2016-June/msg00116.html
---

From 788583d9fd35f9a83c932c5fa6ca059e19fcd7c6 Mon Sep 17 00:00:00 2001
From: Thomas Haller <email address hidden>
Date: Wed, 6 Jul 2016 09:30:46 +0200
Subject: [PATCH] wifi: fix missing pending-action-remove for "scan"

    <warn> [1467730406.7343] device (wlp3s0): add_pending_action (2): scan already pending
    file devices/nm-device.c: line 10443 (nm_device_add_pending_action): should not be reached

Fixes: eed8fd2e43d244caa856d9993e750ff19ba62fd7
---

From f270bc34b4e503d5ba79d6aad1129fb4f49fee05 Mon Sep 17 00:00:00 2001
From: Thomas Haller <email address hidden>
Date: Tue, 14 Feb 2017 15:10:36 +0100
Subject: [PATCH] device/wifi: block autoconnect while scanning is in progress

We should only start autoconnecting after the scan is complete.
Otherwise, we might activate a shared connection or pick a
connection based on an incomplete scan list.

https://bugzilla.gnome.org/show_bug.cgi?id=770938
(cherry picked from commit 2ab2254dd7336b9b7baa03ea1eb1f1c72f7ab6a8)
---

Revision history for this message
Rik Shaw (rik-shaw) wrote :

@fourdollars does this commit get applied to 16.04 proposed and then to backports? I (and I am sure others) would be happy to test this for xenial but I am unclear as to the path the patches follow to get to the main repositories.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

I made a PPA at https://launchpad.net/~fourdollars/+archive/ubuntu/lp1585863 to include comment #84.
You can try it and give some feedback.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Oops, sorry. The package failed to build.
I need to revise the patch again.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

I missed this commit. It should work now.

From 6eaded9071fbf868476255adb8ee5f416e7ad134 Mon Sep 17 00:00:00 2001
From: Thomas Haller <email address hidden>
Date: Tue, 14 Feb 2017 12:45:38 +0100
Subject: [PATCH] device: add get_autoconnect_allowed() virtual function

It allows derived classes to override the autoconnect-allowed
state.

We already have

- NM_DEVICE_AUTOCONNECT property, which is two parts:
  - NMDevicePrivate::autoconnect_user, which is settable via
    D-Bus by the user, to allow the device to autoconnect.
  - NMDevicePrivate::autoconnect_intern, which is set by
    internal decision.
- NM_DEVICE_AUTOCONNECT_ALLOWED signal, where other devices can
  subscribe to block autoconnect. Currently that is only used
  by NMDeviceOlpcMesh.

These two make up for nm_device_autoconnect_allowed().

Add another way to allow derived classes to disable autoconnect
temporarily. This could also be achieved by having the device
subscribe to NM_DEVICE_AUTOCONNECT_ALLOWED of self, or by adding
a signal slot. But a plain function pointer seems easier.

Revision history for this message
monte (monte3) wrote :

Seems it works for my Lenovo x230t. Thanks for your effort!

Revision history for this message
Rik Shaw (rik-shaw) wrote :

The fix from the fourdollars ppa is also working for me: Lenovo x230.

I ran the stress test before applying the fix and confirmed that wifi was non-functioning after the 30 suspend / wake cycles and needed the "sudo wpa_cli scan" to activate it again.

I then added the ppa and applied the update to network-manager and then rebooted. Then I ran the stress test again and after the 30 suspend / wake cycles wifi was still functioning correctly! This is using Ubuntu 16.04.

Revision history for this message
Vaclav Rehak (vaclav-n) wrote :

For me the problem seems to be fixed with today's update to network-manager 1.2.6 in xenial-proposed. So far I have done some 5-7 suspend/resume cycles in different locations and everything seems to work.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

I tested network-manager 1.2.6-0ubuntu0.16.04.1 from xenial-proposed for 100 cycles, and this issue didn't happen again.

Revision history for this message
cascagrossa (cascagrossa-cascao) wrote :

Also tested network-manager 1.2.6-0ubuntu0.16.04.1 from xenial-proposed for 15 days, and it didn't happen again.
Looks like the error is fixed in that version of network-manager.

Revision history for this message
cascagrossa (cascagrossa-cascao) wrote :

Since network-manager 1.2.6-0ubuntu0.16.04.1 was published in the xenial-updates repository, I think the status can be set to fixed for xenial.

code:
apt show network-manager 2>/dev/null | grep -E 'Version|APT-Sources'
Version: 1.2.6-0ubuntu0.16.04.1
APT-Sources: http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages

At least for my configuration Dell Inspiron 15 (5557).
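
The manual check above could be turned into a small script. This sketch compares a version string against 1.2.6-0ubuntu0.16.04.1, the version this comment reports as working; `has_fix` and `FIXED_VERSION` are illustrative names, and whether that version fully resolves the bug is still debated later in this thread.

```shell
#!/bin/bash
# Sketch only: FIXED_VERSION is the version reported as working in this
# comment, not an authoritative statement of where the bug was fixed.
FIXED_VERSION="1.2.6-0ubuntu0.16.04.1"

# Succeeds if the given package version is at least FIXED_VERSION,
# using Debian version ordering.
has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Example use on an installed system:
#   installed=$(dpkg-query -W -f='${Version}' network-manager)
#   if has_fix "$installed"; then echo "at or above $FIXED_VERSION"; fi
```

`dpkg --compare-versions` is used rather than a plain string comparison because Debian version strings such as `1.2.6-0ubuntu0.16.04.1` do not sort correctly as text.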

Changed in oem-priority:
status: Triaged → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Could anyone make sure that this bug is also fixed for yakkety (the original bug target) and zesty? In that case I suppose we could finally close this bug and remove it from the sponsorship queue.

Aron Xu (happyaron)
Changed in network-manager (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Nikolaj Hansen (barnabasdk) wrote :

Confirmed Lenovo T440P

Revision history for this message
PabloAB (pabloab777) wrote :

Also here Ubuntu 16.04, kernel 4.11.3, Asus UX303UB.

Probably related to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574125

Changed in oem-priority:
status: Fix Released → Triaged
status: Triaged → In Progress
status: In Progress → Triaged
Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

oem-priority needs the SRU for xenial.
happyaron has finished the major work on https://launchpad.net/~happyaron/+archive/ubuntu/nm-oem/+packages.
Please help to finish the remaining SRU process.

Revision history for this message
Olivier Tilloy (osomon) wrote :

While I don't have a xenial machine handy to test the bug and fix, I had a look at the debdiff for the packages in Aron's PPA, for some sanity checks.
The following patches have been added:

# fix for LP: #1585863
shared-add-nm_auto_close-and-nm_auto_fclose.patch
device-wwan-use-nm_auto_close-instead-of-gs_fd_close.patch
platform-add-a-new-function-nmp_utils_open_sysctl.patch
platform-refactor-wifi_utils_is_wifi-not-to-pass-sys.patch
platform-wifi-use-nmp_utils_open_sysctl-to-check-if-.patch
platform-refactor-nmp_utils_sysctl_open_netdir.patch
all-use-O_CLOEXEC-for-file-descriptors.patch
core-add-utils-for-file-handling.patch

The patches (which as far as I can tell are all cherrypicks from upstream commits) don't appear to be applied in chronological order (for instance platform-refactor-nmp_utils_sysctl_open_netdir.patch clearly uses a new function that was added by core-add-utils-for-file-handling.patch). Not that big of a deal if they apply cleanly, but applying them in chronological order would make it easier to review and maintain them.
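
One rough way to see the chronological order is to sort the patch files by their `Date:` header, as in the sketch below. It assumes the patches keep the `git format-patch` style header shown elsewhere in this thread, that GNU `date` is available, and that author date is a good enough proxy for upstream commit order; `patches_by_date` is a made-up helper name.

```shell
#!/bin/bash
# Sketch only: print the *.patch files in a directory, oldest Date: header first.
patches_by_date() {
    local dir="$1" f d
    for f in "$dir"/*.patch; do
        # Take the first Date: header of the patch.
        d=$(sed -n 's/^Date: //p' "$f" | head -n 1)
        # Emit "<epoch seconds><TAB><file name>" so the list can be sorted.
        printf '%s\t%s\n' "$(date -d "$d" +%s)" "$(basename "$f")"
    done | sort -n | cut -f2
}

# Example use from an unpacked source package:
#   patches_by_date debian/patches
```

The output can then be compared by eye with the order in which the patches are listed for application.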

Revision history for this message
Olivier Tilloy (osomon) wrote :

@fourdollars: have you observed yourself the bug with 1.2.6-0ubuntu0.16.04.1 on xenial? I'd like to make double-sure Aron's patches are really needed. Comment #92 by you states that the bug was fixed with 1.2.6.

Revision history for this message
Tony Espy (awe) wrote :

This bug was marked FixReleased based on the upload of network-manager 1.2.6-0ubuntu0.16.04.1 to xenial updates and the comments from many people that the issue was resolved.

Recently there were two comments (#96 and #97) that claim that the bug still exists. Comment #96 doesn't even list which release, and #97 has very little detail. For this bug, the specific case is that after S3, NM failed to restart scanning. So for someone to confirm they've hit this bug, they need to use 'sudo wpa_cli' to check whether NM is scanning or not. If NM's scan logic is working, you'll see CTRL-EVENT-SCAN-RESULTS messages output by wpa_cli.
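
The check described above could be applied to a saved capture of a `wpa_cli` session, as in this sketch. How the capture file is produced is left open (for example, by pasting the terminal output of an interactive `sudo wpa_cli` session into a file); the function names and the wording of the verdicts are illustrative only.

```shell
#!/bin/bash
# Sketch only: inspect a saved wpa_cli capture for scan result events.

# Succeeds if the capture file contains at least one scan-results event.
saw_scan_results() {
    grep -q 'CTRL-EVENT-SCAN-RESULTS' "$1"
}

# Print a one-line verdict for a capture file.
classify_capture() {
    if saw_scan_results "$1"; then
        echo "scan results seen: NM's scan logic appears to be working"
    else
        echo "no scan results seen: consistent with this bug"
    fi
}
```

A capture with no such events only says something if it covers a long enough period after resume, so a short capture should not be read as confirmation.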

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Sorry. Comment #92 is not a good test.
I only tested it on my own laptop.
I still received some reports from my colleagues that it doesn't really fix the problem after that comment.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

We have tested Aron's patch on many OEM projects. The result is good and it does fix the problem. At least it passed all QA tests.

Revision history for this message
Tony Espy (awe) wrote :

No evidence (e.g. syslogs, package versions, output of wpa_cli) has been provided that would be a basis for re-opening the bug.

Also, please point us to the *exact* patch that is supposed to fix the problem.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

There are many duplicate bugs.
Including this one, they all point to race conditions, and Aron's patch set is all about fixing those race conditions.
If we want to identify every race condition and collect the logs, that will be an incredibly tedious workload.
Imagine that we need to de-duplicate those bugs, categorize them, and provide a fix for each one.

Revision history for this message
Tony Espy (awe) wrote :

My point is that we shouldn't release these patches as an SRU without being able to reproduce the bugs that the patches claim to fix first. Otherwise we risk introducing additional regressions.

So if we want to consider an SRU for NM with one or more of Aaron's patches, then yes we need to go through each one of his patches, try to re-create the problem, validate that the patch fixes it, and then submit the patch upstream for comment as well. I'll put this bug back on the agenda for next week's network/telephony meeting.

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

IIRC, Aron's patches are from upstream. That means we don't need to submit those patches upstream.

Revision history for this message
Olivier Tilloy (osomon) wrote :

Indeed, all the patches in Aaron's set are cherry-picked from upstream.

However I agree with Tony, we lack evidence that version 1.2.6-0ubuntu0.16.04.1 in xenial is still affected by the original bug.

> I still received some reports from my colleagues that it doesn't
> really fix the problem after that comment.

@fourdollars: Can you get your colleagues to confirm the bug following Tony's instructions in comment #101?

Revision history for this message
Shih-Yuan Lee (fourdollars) wrote :

Hi, I will close the oem-priority SRU request for this issue and check whether we can apply Aron's patches to other network-manager bugs for SRU.

Changed in oem-priority:
status: Triaged → Fix Released
Revision history for this message
Olivier Tilloy (osomon) wrote :

For future reference, this is the list of upstream commits corresponding to Aaron's patches, in the order they are being applied:

312cea870dfbc363da44074bd6f56ccd283c5420
  shared: add nm_auto_close and nm_auto_fclose

ed299cc8605a8291a61b3a514f8dc20390b18c77
  device/wwan: use nm_auto_close instead of gs_fd_close

713c74f6e4a88f874cf3e9908b3fb153f2ea5b83
  platform: add a new function nmp_utils_open_sysctl()

e714a20bc2464dc97492731a0d656e8c6bab65aa
  platform: refactor wifi_utils_is_wifi() not to pass sysfs_path

b95556eb781a18ee1c96470f40b9e1e162b0ee60
  platform: wifi: use nmp_utils_open_sysctl() to check if device is wifi

76876e896c242fd82d048743ffcf2c0481442dc5
  platform: refactor nmp_utils_sysctl_open_netdir()

4bdee37771ae741f4f9548b52c1db53ddf080fe8
  all: use O_CLOEXEC for file descriptors

dcc8de16b2acc43b2a9155fcfb91fa2602f3a401
  core: add utils for file handling

As far as I can tell, none of them have been backported to the 1.2 branch of NetworkManager.
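
One rough way to re-check this is to compare commit subjects rather than hashes, since a backport made by cherry-pick gets a new hash. The sketch below assumes the subject lines of the stable branch have already been collected (for example with `git log --format=%s` on that branch; the branch name is not given here and would have to be filled in). The function name `missing_from_branch` is hypothetical.

```python
# Sketch: find which wanted commit subjects do not appear on a branch.
# Assumption: `branch_subjects` holds one subject line per commit on the
# stable branch. Matching by subject is a heuristic; a backport with a
# reworded subject would be reported as missing.


def missing_from_branch(branch_subjects, wanted_subjects):
    """Return the wanted subjects that are absent from the branch."""
    present = {s.strip() for s in branch_subjects}
    return [s for s in wanted_subjects if s.strip() not in present]


wanted = [
    "shared: add nm_auto_close and nm_auto_fclose",
    "all: use O_CLOEXEC for file descriptors",
]
# Made-up branch contents for illustration only.
branch = ["all: use O_CLOEXEC for file descriptors", "release: bump version"]
still_missing = missing_from_branch(branch, wanted)
```

With the made-up data above, only the first subject is reported as missing.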

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Is it feasible to put NM 1.4 into -backports pocket?

Revision history for this message
Mark (mago90) wrote :

I am on Lubuntu 16.04.03 and I am having trouble with WiFi reconnection after suspending the system. I opened https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1719731 but without much luck. Is there any chance that my issue is related to, or a dup of, this one?
Has the fix been released for 16.04.3? There are various tickets reporting the same issue after suspending, but so far no luck with fixes.
