linux-image-5.4.0-97.110 freezes by accessing cifs shares

Bug #1959665 reported by Sebastian Berner
This bug affects 4 people
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid       Undecided   Unassigned
Focal           Fix Released  Medium      Tim Gardner

Bug Description

Ubuntu 20.04 with kernel image 5.4.0-97.110 randomly freezes when we access cifs shares. The system needs a hard reset after the freeze.

The previous kernel image (5.4.0-96.109) does not show this behavior. Multiple Ubuntu 20.04 systems running kernel image 5.4.0-97.110 are affected at our sites.

The cifs shares are accessed via autofs using Kerberos authentication.

The following is the only error message we managed to capture, on a different system, before it froze:

general protection fault: 0000 [#3] SMP PTI
CPU: 0 PID: 21294 Comm: automount Tainted: G D OE 5.4.0-97-generic #110-Ubuntu
Hardware name: Dell Inc. Precision 7530/03RV2M, BIOS 1.14.4 10/21/2020
RIP: 0010:kmem_cache_alloc_trace+0x8c/0x240
Code: 08 65 4c 03 05 1d e8 f7 6e 49 83 78 10 00 4d 8b 38 0f 84 92 01 00 00 4d 85 ff 0f 84 89 01 00 00 41 8b 41 20 49 8b 39 4c 01 f8 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 f8 48 0f c9 48 31 cb
RSP: 0018:ffffa6ca860f7ba0 EFLAGS: 00010286
RAX: fdd44d7c219190e1 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000d558 RSI: 0000000000000cc0 RDI: 0000000000035060
RBP: ffffa6ca860f7bd0 R08: ffff99233c035060 R09: ffff992338c079c0
R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000cc0
R13: 000000000000000b R14: ffff992338c079c0 R15: fdd44d7c219190e1
FS: 00007fdc24e29700(0000) GS:ffff99233c000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f15a80ee010 CR3: 00000007e36b2006 CR4: 00000000003606f0
Call Trace:
 ? proc_self_get_link+0x70/0xd0
 proc_self_get_link+0x70/0xd0
 link_path_walk.part.0+0x478/0x550
 ? trailing_symlink+0x1d1/0x280
 path_openat+0xb7/0x290
 do_filp_open+0x91/0x100
 ? unuse_pde+0x30/0x30
 do_sys_open+0x17e/0x290
 __x64_sys_openat+0x20/0x30
 __orig_openat+0x71/0xc0 [eset_rtp]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __x64_sys_futex+0x13f/0x170
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 __x64_sys_ertp_openat+0x29/0x60 [eset_rtp]
 do_syscall_64+0x57/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdc28a41f24
Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 56 68 f8 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 88 68 f8 ff 8b 44
RSP: 002b:00007fdc24e25980 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007fdc040008d0 RCX: 00007fdc28a41f24
RDX: 0000000000080000 RSI: 0000564d4410a8c8 RDI: 00000000ffffff9c
RBP: 0000564d4410a8c8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080000
R13: 0000564d4410e28a R14: 00007fdc24e25b40 R15: 00007fdc24e28fc0
Modules linked in: cmac nls_utf8 cifs fscache libdes ccm vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) eset_rtp(OE) xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ip>
 snd_seq_midi coretemp snd_seq_midi_event snd_rawmidi kvm_intel snd_seq kvm dell_wmi snd_seq_device rapl dell_smbios snd_timer dcdbas intel_cstate input_leds iwlwifi serio_raw snd intel_wmi_thunderbolt rtsx_pci_ms dell_wmi_descriptor wmi_bmof processor_thermal_device ucsi_acp>
 pinctrl_cannonlake video pinctrl_intel
---[ end trace 80828f22da45a19b ]---
RIP: 0010:__slab_free+0x199/0x360
Code: 00 48 89 c7 fa 66 0f 1f 44 00 00 f0 49 0f ba 2c 24 00 72 79 4d 3b 6c 24 20 74 11 49 0f ba 34 24 00 57 9d 0f 1f 44 00 00 eb 9f <0f> 0b 49 3b 5c 24 28 75 e8 48 8b 44 24 28 49 89 4c 24 28 49 89 44
RSP: 0018:ffffa6ca85a8f8c0 EFLAGS: 00010246
RAX: ffff9921db223620 RBX: 0000000080800046 RCX: ffff9921db223620
RDX: ffff9921db223620 RSI: fffff5c79c6c88c0 RDI: ffff992338c07800
RBP: ffffa6ca85a8f968 R08: 0000000000000001 R09: ffffffffc1514620
R10: ffff9921db223620 R11: 0000000000000001 R12: fffff5c79c6c88c0
R13: ffff9921db223620 R14: ffff992338c07800 R15: ffff9922e0c4a800
FS: 00007fdc24e29700(0000) GS:ffff99233c000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-97-generic 5.4.0-97.110
ProcVersionSignature: Ubuntu 5.4.0-97.110-generic 5.4.162
Uname: Linux 5.4.0-97-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gdm 1562 F.... pulseaudio
                      seko7335 2504 F.... pulseaudio
CasperMD5CheckResult: skip
Date: Tue Feb 1 11:38:40 2022
HibernationDevice: RESUME=UUID=8ad0aa7e-9033-401d-82ab-ba2f280cde29
InstallationDate: Installed on 2020-04-24 (648 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 8087:0025 Intel Corp.
 Bus 001 Device 002: ID 0a5c:5831 Broadcom Corp. 5880
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
     |__ Port 10: Dev 2, If 0, Class=Application Specific Interface, Driver=, 12M
     |__ Port 14: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
     |__ Port 14: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
MachineType: Dell Inc. Precision 7530
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-97-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-97-generic N/A
 linux-backports-modules-5.4.0-97-generic N/A
 linux-firmware 1.187.25
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/09/2021
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.19.0
dmi.board.name: 0425K7
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.19.0:bd12/09/2021:svnDellInc.:pnPrecision7530:pvr:rvnDellInc.:rn0425K7:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: Precision
dmi.product.name: Precision 7530
dmi.product.sku: 0831
dmi.sys.vendor: Dell Inc.
modified.conffile..etc.default.apport: [modified]
mtime.conffile..etc.default.apport: 2021-02-03T11:40:27.470104

---
- fixed typo in kernel version 9.4.0-97.110 -> 5.4.0-97.110 (thanks @kaiszy)

kaiszy (kai-szymanski) wrote :

We can confirm that. Our machines (8 VMs on a vSphere cluster) also freeze after the update to 5.4.0-97 #110.

kaiszy (kai-szymanski) wrote :

After rolling back to 5.4.0-80 #90 (thanks to Puppet ;) the machines are stable again.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
kaiszy (kai-szymanski) wrote :

I have now also tried kernel 5.4.0-96 #109. As Sebastian says, it runs stably as well.

summary: - linux-image-9.4.0-97.110 freezes by accessing cifs shares
+ linux-image-5.4.0-97.110 freezes by accessing cifs shares
description: updated
kaiszy (kai-szymanski) wrote :

```
Servers:
Number of credits: 185 Dialect 0x300
1) Name: a.b.c.d Uses: 1 Capability: 0x300067 Session Status: 1 TCP status: 1 Instance: 1
        Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0 SessionId: 0xc1be04000435
        Shares:
        0) IPC: \\xxx\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
        PathComponentMax: 0 Status: 1 type: 0 Serial Number: 0x0
        Share Capabilities: None Share Flags: 0x30
        tid: 0x1 Maximal Access: 0x11f01ff

        1) \\xxx\yyy Mounts: 1 DevInfo: 0x60020 Attributes: 0xc700ff
        PathComponentMax: 255 Status: 1 type: DISK Serial Number: 0xaaf791c3
        Share Capabilities: DFS, Aligned, Partition Aligned, TRIM-support, Share Flags: 0x803
        tid: 0x5 Optimal sector size: 0x200 Maximal Access: 0x1200a9

        MIDs:

        Server interfaces: 1
        0) Speed: 10000000000 bps
                Capabilities:
                IPv4: e.f.g.h

 Debuginfo:

[ 710.383873] kernel BUG at mm/slub.c:307!
[ 710.384582] invalid opcode: 0000 [#1] SMP PTI
[ 710.385116] CPU: 1 PID: 3089 Comm: umount Not tainted 5.4.0-97-generic #110-Ubuntu
[ 710.385775] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 710.386903] RIP: 0010:__slab_free+0x199/0x360
[ 710.387470] Code: 00 48 89 c7 fa 66 0f 1f 44 00 00 f0 49 0f ba 2c 24 00 72 79 4d 3b 6c 24 20 74 11 49 0f ba 34 24 00 57 9d 0f 1f 44 00 00 eb 9f <0f> 0b 49 3b 5c 24 28 75 e8 48 8b 44 24 28 49 89 4c 24 28 49 89 44
[ 710.389229] RSP: 0018:ffffb94541417ce0 EFLAGS: 00010246
[ 710.389809] RAX: ffff90cfb5fab120 RBX: 000000008080007f RCX: ffff90cfb5fab120
[ 710.390406] RDX: ffff90cfb5fab120 RSI: ffffec7d84d7eac0 RDI: ffff90cfbb003880
[ 710.390993] RBP: ffffb94541417d90 R08: 0000000000000001 R09: ffffffffc08ed620
[ 710.391574] R10: ffff90cfb5fab120 R11: 0000000000000001 R12: ffffec7d84d7eac0
[ 710.392147] R13: ffff90cfb5fab120 R14: ffff90cfbb003880 R15: 0000000000000000
[ 710.392709] FS: 00007f7b219e5840(0000) GS:ffff90cfbba80000(0000) knlGS:0000000000000000
[ 710.393274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 710.393820] CR2: 000055a3da30b9d0 CR3: 000000012658a006 CR4: 00000000007606e0
[ 710.394545] PKRU: 55555554
[ 710.395093] Call Trace:
[ 710.395760] ? fscache_free_cookie+0x39/0x60 [fscache]
[ 710.396480] ? cifs_cleanup_volume_info_contents+0x30/0x60 [cifs]
[ 710.397026] kfree+0x231/0x250
[ 710.397593] cifs_cleanup_volume_info_contents+0x30/0x60 [cifs]
[ 710.398159] dfs_cache_del_vol+0xec/0x1a0 [cifs]
[ 710.398720] cifs_umount+0x9e/0xd0 [cifs]
[ 710.399270] cifs_kill_sb+0x1f/0x30 [cifs]
[ 710.399807] deactivate_locked_super+0x3b/0x80
[ 710.400358] deactivate_super+0x3e/0x50
[ 710.400882] cleanup_mnt+0x109/0x160
[ 710.401403] __cleanup_mnt+0x12/0x20
[ 710.401919] task_work_run+0x8f/0xb0
[ 710.402456] exit_to_usermode_loop+0x131/0x160
[ 710.402956] do_syscall_64+0x163/0x190
[ 710.403448] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 710.403922] RIP: 0033:0x7f7b21c4a2cb
[ 710.404380] Code: 8b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6...

```

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
kaiszy (kai-szymanski) wrote :

Hi,

thanks for the fast reply! I have just installed it:

:~$ uname -a
Linux foo.bar 5.4.0-98-generic #111~lp1959665.1 SMP Tue Feb 1 19:50:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

I mounted all shares and started my tests (ncdu and a simple bash test script). With 5.4.0-97 #110 it takes 30 seconds to crash a machine. With 5.4.0-98 #111 the scripts keep working (I aborted the test after 60 minutes), and I do not get a kernel panic with this kernel. Maybe Sebastian can confirm this?

Thomas Hansen (thomasphansen) wrote :

Hello,

Thanks for the help! :)

Are these cifs patches also present in 5.13.0-27? I got similar issues with 5.4.0-97 and 5.13.0-27, and moving back to the previous release fixed the issues in both cases! :)

Cheers!

Thomas

Sebastian Berner (sebastian-berner) wrote :

Hi,

thanks for the quick reply.

I can confirm that while running 5.4.0-98-generic #111~lp1959665.1 I was able to access all of our cifs shares without issues.

A script that creates, reads, modifies and deletes files on different cifs shares also works without any issue.

Removing the latest cifs patches from kernel 5.4.0-97.110 seems to fix our issue.

I will keep testing on 5.4.0-98-generic #111~lp1959665.1.

Best regards,
Sebastian
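
For readers who want to run a comparable check, here is a minimal sketch of a create/read/modify/delete loop. It is not the script used in this report; the function name `exercise_share`, the file naming and the cycle count are assumptions made for illustration. Point it at a directory on a mounted cifs share to exercise that share.

```python
import os
import tempfile


def exercise_share(directory: str, cycles: int = 10) -> int:
    """Create, read, modify and delete one file per cycle in `directory`.

    Returns the number of completed cycles. The name and behaviour are
    hypothetical; this is not the reporter's script.
    """
    completed = 0
    for i in range(cycles):
        path = os.path.join(directory, f"cifs-test-{os.getpid()}-{i}.tmp")

        # Create the file.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("created\n")

        # Read it back and verify the content.
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() != "created\n":
                raise RuntimeError(f"unexpected content after create: {path}")

        # Modify it by appending a second line, then verify again.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write("modified\n")
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() != "created\nmodified\n":
                raise RuntimeError(f"unexpected content after modify: {path}")

        # Delete it.
        os.remove(path)
        completed += 1
    return completed


if __name__ == "__main__":
    # Assumption: a local temporary directory, so the sketch runs anywhere.
    # Replace it with a path on the share under test.
    with tempfile.TemporaryDirectory() as tmp:
        print(exercise_share(tmp, cycles=5))
```

Such a loop only signals a problem indirectly: on an affected kernel the machine is expected to freeze rather than raise an error.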

Sebastian Berner (sebastian-berner) wrote :

Just a quick heads-up on my tests with different kernel versions. I have installed all available updates for Ubuntu 20.04 (including samba-*= 2:4.13.17~dfsg-0ubuntu0.21.04.1; CVE-2022-0336). These are the results:

GA:
5.4.0-98.111~lp1959665.1 -> no freezes in 3 hours of testing
5.4.0-97.110 -> still keeps freezing

HWE:
5.13.0-27.29~20.04.1 -> no freezes in 1 hour of testing
5.13.0-28.31~20.04.1 -> no freezes in 1 hour of testing

Tim Gardner (timg-tpi) wrote :

Hi all - thanks for the quick testing. We are evaluating the best course of action. In the short term that will likely be a respin with these patches reverted. Longer term I will work to figure out the error and find a correction.

rtg

kaiszy (kai-szymanski) wrote :

Hi Tim,

thanks for your effort!

Best regards,
  Kai.

Tim Gardner (timg-tpi) wrote :

The final decision was to respin:

linux (5.4.0-99.112) focal; urgency=medium

  * focal/linux: 5.4.0-99.112 -proposed tracker (LP: #1959817)

  * linux-image-5.4.0-97.110 freezes by accessing cifs shares (LP: #1959665)
    - Revert "cifs: To match file servers, make sure the server hostname matches"
    - Revert "cifs: set a minimum of 120s for next dns resolution"
    - Revert "cifs: use the expiry output of dns_query to schedule next
      resolution"

 -- Stefan Bader <email address hidden> Wed, 02 Feb 2022 17:21:18 +0100

This affects all Focal and Bionic 5.4 based derivatives as well.
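
As a rough aid for checking machines, the sketch below maps a generic 5.4 kernel release string (as printed by `uname -r`) to what was reported in this bug. The function name `classify_generic_kernel` and the table are assumptions for illustration. The table only encodes ABI numbers mentioned in this thread, and derivative kernels (which use different ABI numbers) are deliberately reported as unknown.

```python
import platform
import re

# ABI numbers of the generic 5.4 kernel as reported in this bug only.
# Anything not listed here is treated as "unknown" rather than guessed.
REPORTED = {
    80: "stable",   # 5.4.0-80.90, rollback target reported stable
    96: "stable",   # 5.4.0-96.109, reported stable
    97: "freezes",  # 5.4.0-97.110, the affected kernel
    99: "fixed",    # 5.4.0-99.112, respin with the cifs patches reverted
}


def classify_generic_kernel(release: str) -> str:
    """Classify a release string such as '5.4.0-97-generic'.

    Hypothetical helper: returns 'stable', 'freezes', 'fixed' or 'unknown'.
    """
    match = re.fullmatch(r"5\.4\.0-(\d+)-generic", release.strip())
    if match is None:
        # Not a generic 5.4 kernel (for example a derivative or HWE kernel).
        return "unknown"
    return REPORTED.get(int(match.group(1)), "unknown")


if __name__ == "__main__":
    release = platform.release()
    print(release, "->", classify_generic_kernel(release))
```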

kaiszy (kai-szymanski) wrote :

Great news, thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-99.112

---------------
linux (5.4.0-99.112) focal; urgency=medium

  * focal/linux: 5.4.0-99.112 -proposed tracker (LP: #1959817)

  * linux-image-5.4.0-97.110 freezes by accessing cifs shares (LP: #1959665)
    - Revert "cifs: To match file servers, make sure the server hostname matches"
    - Revert "cifs: set a minimum of 120s for next dns resolution"
    - Revert "cifs: use the expiry output of dns_query to schedule next
      resolution"

 -- Stefan Bader <email address hidden> Wed, 02 Feb 2022 17:21:18 +0100

Changed in linux (Ubuntu Focal):
status: In Progress → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.4.0-1069.72 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Tim Gardner (timg-tpi) wrote :

@sebastian-berner or @kai-szymanski - please confirm that this fixes your cifs issue.

Sebastian Berner (sebastian-berner) wrote :

Hi Tim,

I can confirm that 5.4.0-99.112 fixes the cifs issue.

Testing in our environment does not show any freezes accessing cifs shares while using 5.4.0-99.112.

Best Regards,
Sebastian

kaiszy (kai-szymanski) wrote :

Hi Tim,

I can also confirm that 5.4.0-99.112 fixes the bug. The test script ran for more than an hour with this kernel without freezing the machine.

Thanks!

tags: added: verification-done-focal
removed: verification-needed-focal