openafs gives segfault on kernel 2.6.22-13

Bug #150469 reported by Joachim Dahl
Affects: openafs (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: openafs-modules-source

With the latest kernel upgrade on Gutsy, the openafs modules build, but "klog" authentication
gives a segmentation fault, and

/etc/init.d/openafs restart

hangs.

Achim Bohnet (allee) wrote :

I can't reproduce it. klog works fine with my AFS user.

Linux allee 2.6.22-13-generic #1 SMP Thu Oct 4 17:18:44 GMT 2007 i686 GNU/Linux

ach@allee(0) ~ $ dpkg -l \*afs\* | grep ^i
ii libpam-openafs-kaserver 1.4.4.dfsg1-7 AFS distributed filesystem kaserver PAM module
ii openafs-client 1.4.4.dfsg1-7 AFS distributed filesystem client support
ii openafs-kpasswd 1.4.4.dfsg1-7 AFS distributed filesystem old password changing
ii openafs-modules-2.6.22-12-generic 1.4.4.dfsg1-7+2.6.22-12.39 AFS distributed filesystem kernel module
ii openafs-modules-2.6.22-13-generic 1.4.4.dfsg1-7+2.6.22-13.40 AFS distributed filesystem kernel module
ii openafs-modules-generic 2.6.22.13 AFS kernel driver for -generic kernel
ii openafs-modules-source 1.4.4.dfsg1-7 AFS distributed filesystem kernel module source

Joachim Dahl (jdahl) wrote : Re: [Bug 150469] Re: openafs gives segfault on kernel 2.6.22-13

this is the output from my machine:

joachim@jod-nb:~$ dpkg -l \*afs\* | grep ^i
ii openafs-client 1.4.4.dfsg1-7 AFS distributed filesystem client support
ii openafs-modules-2.6.22-10-generic 1.4.4.dfsg1-6+2.6.22-10.30 AFS distributed filesystem kernel module
ii openafs-modules-2.6.22-11-generic 1.4.4.dfsg1-7+2.6.22-11.32 AFS distributed filesystem kernel module
ii openafs-modules-2.6.22-12-generic 1.4.4.dfsg1-7+2.6.22-12.39 AFS distributed filesystem kernel module
ii openafs-modules-2.6.22-13-generic 1.4.4.dfsg1-7+2.6.22-13.40 AFS distributed filesystem kernel module
ii openafs-modules-2.6.22-9-generic 1.4.4.dfsg1-6+2.6.22-9.25 AFS distributed filesystem kernel module
ii openafs-modules-source 1.4.4.dfsg1-7 AFS distributed filesystem kernel module sou

joachim@jod-nb:~$ lsmod | grep afs
openafs 558524 4

joachim@jod-nb:~$ klog
Password:
Segmentation fault


Achim Bohnet (allee) wrote :

Sorry, can't help. I tried to reproduce and could not.
Just want to add: I'm authenticating against a kerberos4
server.

Maybe strace output or running under gdb gives an idea?

Joachim Dahl (jdahl) wrote :

I purged everything related to openafs and tried to rebuild using the steps
from /usr/share/openafs-client/README.modules:

   module-assistant auto-build openafs-modules

The problem remains:

root@jod-nb:/home/joachim# dpkg -i /usr/src/openafs-modules-2.6.22-13-generic_1.4.4.dfsg1-7+2.6.22-13.40_i386.deb
Selecting previously deselected package openafs-modules-2.6.22-13-generic.
(Reading database ... 187010 files and directories currently installed.)
Unpacking openafs-modules-2.6.22-13-generic (from .../openafs-modules-2.6.22-13-generic_1.4.4.dfsg1-7+2.6.22-13.40_i386.deb) ...
Setting up openafs-modules-2.6.22-13-generic (1.4.4.dfsg1-7+2.6.22-13.40) ...

root@jod-nb:/home/joachim# insmod /lib/modules/2.6.22-13-generic/fs/openafs.ko
root@jod-nb:/home/joachim# /etc/init.d/openafs-client start
Starting AFS services: afsd.
afsd: All AFS daemons started.
fs: Invalid argument.

I've never had problems like this before... I am not sure what authentication
openafs uses by default.

- joachim


Joachim Dahl (jdahl) wrote :

Achim Bohnet wrote:
> Sorry, can't help. I tried to reproduce and could not.
> Just want to add: I'm authenticating against a kerberos4
> server.
>
> Maybe strace output or running under gdb gives an idea?
>
>
The output of
sudo strace /etc/init.d/openafs-client start

gives me

stat64("/usr/bin/fs", {st_mode=S_IFREG|0755, st_size=310812, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7d936f8) = 25053
wait4(-1, fs: Invalid argument.
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 25053
--- SIGCHLD (Child exited) @ 0 (0) ---
read(10, "", 8192) = 0
exit_group(0) = ?
Process 25021 detached

Joachim

Björn Torkelsson (torkel) wrote :

Can you please run:

sh -x /etc/init.d/openafs-client start

I'm wondering why fs says Invalid argument.

/torkel

Joachim Dahl (jdahl) wrote :

Björn Torkelsson wrote:
> Can you please run:
>
> sh -x /etc/init.d/openafs-client start
>
> I'm wondering why fs says Invalid argument.
>
> /torkel
>
>
I ran

joachim@jod-nb:~$ sudo sh -x /etc/init.d/openafs-client force-start

and the relevant part of the output is pasted below:

+ start-stop-daemon --start --quiet --exec /sbin/afsd -- -afsdb -fakestat
afsd: All AFS daemons started.
+ is_on true
+ [ xtrue = xtrue ]
+ return 0
+ fs setcrypt on
fs: Invalid argument.
+ [ -n ]
+

thanks,
joachim

Russ Allbery (rra-debian) wrote :

The error message that you're getting is consistent with the kernel module failing to register the AFS system calls. The AFS client programs are then trying to make system calls that don't exist, which result in various odd errors such as the ones you're seeing.

I'm a little confused by the output you're seeing, given that loading the AFS kernel module should result in a message saying that it taints the kernel and starting afsd should produce information about the cache, neither of which is happening. Are you just not showing all of the output that you're getting, or are you missing some output? If you have an old kernel module installed that was never fully removed from the kernel, things like this can happen. Sometimes AFS doesn't want to unload cleanly and the machine has to be rebooted to get back to a consistent state, although that's fairly rare these days.
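
As a rough sketch of how one might check for a module that is still resident before testing a rebuilt one: the helper below reads lsmod-style text and reports whether a given module is listed. The name module_listed is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named module appears in lsmod-style
# text on stdin (the module name is the first column of each line).
module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (not run here):
#   if lsmod | module_listed openafs; then
#       echo "openafs is already loaded; unload it or reboot before testing a rebuilt module"
#   fi
```

If the module is listed before the client has been started, the copy in the kernel may not be the one that was just built.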

dmesg output may also be useful, in particular the lines about searching for the system call table. Again, I don't understand why you're not seeing that output when you start the AFS client for the first time after boot.
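
A minimal sketch of pulling the relevant lines out of the kernel log, assuming the messages look like the ones quoted elsewhere in this report (taint notice, system call table search, cache scan, Oops). The function name afs_kernel_lines is hypothetical, and the pattern list is a guess that may need extending.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that are likely to matter
# for an OpenAFS module problem. Reads log text on stdin, so it works with
# live dmesg output or with a saved log file.
afs_kernel_lines() {
    grep -i -E 'openafs|system call table|AFS cache scan|NULL pointer dereference|Oops'
}

# Usage (not run here; assumes dmesg is readable by the current user):
#   dmesg | afs_kernel_lines
```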

Joachim Dahl (jdahl) wrote :

I only have one kernel installed (I removed the previous kernel before I
noticed the afs problem), so it shouldn't be an issue of loading a tainted
version of the kernel module.

I am attaching the full output from starting the client, as well as the
dmesg output, below.

Thanks
joachim

joachim@jod-nb:~$ sudo sh -x /etc/init.d/openafs-client force-start
+ PATH=/bin:/usr/bin:/sbin:/usr/sbin
+ CACHEINFO=/etc/openafs/cacheinfo
+ uname -r
+ MODULEDIR=/lib/modules/2.6.22-13-generic/fs
+ exec
+ exec
+ [ -f /etc/openafs/afs.conf ]
+ . /etc/openafs/afs.conf
+ test -f /etc/openafs/afs.conf.client
+ . /etc/openafs/afs.conf.client
+ AFS_CLIENT=true
+ AFS_AFSDB=true
+ AFS_CRYPT=true
+ AFS_DYNROOT=false
+ AFS_FAKESTAT=true
+ VERBOSE=
+ OPTIONS=AUTOMATIC
+ AFS_POST_INIT=
+ AFS_PRE_SHUTDOWN=
+ test -x /sbin/afsd
+ echo -n Starting AFS services:
Starting AFS services:+ load_client
+ [ -z ]
+ choose_client
+ uname -v
+ set X #1 SMP Thu Oct 4 17:18:44 GMT 2007
+ shift
+ MP=.mp
+ [ -n .mp -a -f /lib/modules/2.6.22-13-generic/fs/openafs.mp.o ]
+ [ -n .mp -a -f /lib/modules/2.6.22-13-generic/fs/openafs.mp.ko ]
+ [ -f /lib/modules/2.6.22-13-generic/fs/openafs.ko ]
+ MP=
+ LIBAFS=openafs.ko
+ [ ! -f /lib/modules/2.6.22-13-generic/fs/openafs.ko ]
+ /sbin/lsmod
+ fgrep openafs
+ LOADED=
+ [ -z ]
+ modprobe openafs
+ status=0
+ [ 0 = 0 ]
+ echo -n openafs
 openafs+ return 0
+ start_client
+ pidof /sbin/afsd
+ pidof /usr/sbin/afsd
+ choose_afsd_options
+ [ -z AUTOMATIC ]
+ [ AUTOMATIC = AUTOMATIC ]
+ AFSD_OPTIONS=
+ is_on true
+ [ xtrue = xtrue ]
+ return 0
+ AFSD_OPTIONS= -afsdb
+ is_on false
+ [ xfalse = xtrue ]
+ return 1
+ is_on true
+ [ xtrue = xtrue ]
+ return 0
+ AFSD_OPTIONS= -afsdb -fakestat
+ echo afsd.
 afsd.
+ start-stop-daemon --start --quiet --exec /sbin/afsd -- -afsdb -fakestat
afsd: All AFS daemons started.
+ is_on true
+ [ xtrue = xtrue ]
+ return 0
+ fs setcrypt on
fs: Invalid argument.
+ [ -n ]
+

Output from dmesg after klog gives segfault:
.
.
.
[10736.308000] openafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
[10736.440000] Found system call table at 0xc02fc540 (pattern scan)
[10736.564000] Starting AFS cache scan...found 1776 non-empty cache files (56%).
[11019.800000] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[11019.800000] printing eip:
[11019.800000] f9b33c3c
[11019.800000] *pde = 00000000
[11019.800000] Oops: 0000 [#1]
[11019.800000] SMP
[11019.800000] Modules linked in: openafs(P) tun af_packet binfmt_misc i915 drm rfcomm hidp hid l2cap ppdev ipv6 sbs bay video battery button container ac dock cpufreq_stats cpufreq_ondemand freq_table cpufreq_powersave cpufreq_userspace cpufreq_conservative lp joydev arc4 ecb blkcipher snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq iwl4965 pcmcia irda iwlwifi_mac80211 hci_usb snd_timer snd_seq_device bluetooth cfg80211 crc_ccitt parport_pc parport sky2 pcspkr psmouse snd soundcore snd_page_alloc yenta_socket rsrc_nonstatic pcmcia_core shpchp pci_hotplug sdhci mmc_core serio_raw intel_agp agpgart evdev sr_mod cdrom ext3 jbd mbcache sg sd_mod ata...


Joachim Dahl (jdahl) wrote :

Russ Allbery wrote:
> dmesg output may also be useful, in particular the lines about searching
> for the system call table. Again, I don't understand why you're not
> seeing that output when you start the AFS client for the first time
> after boot.
>
>
I should mention that after experiencing these problems I edited the
"start" section of /etc/init.d/openafs-client to

case "$1" in
start)
    ;;

force-start)

so that afs problems don't stall my machine during startup; that's
why I use

/etc/init.d/openafs-client force-start

to start openafs. Now, whenever I start afs, a reboot is necessary to
shut down my laptop, which I don't want to do unnecessarily.

Russ Allbery (rra-debian) wrote :

> [11019.800000] BUG: unable to handle kernel NULL pointer dereference at
> virtual address 00000000

Aha. There's the actual problem.

I'm not sure what's causing it, though. It looks very similar to the problem for AMD64 that was fixed in -5, but you already have -7 and this isn't an AMD64 system from the looks of it. So I'm not sure off-hand what's causing this problem.

I'll forward this to the dev list and see if anyone has any ideas. It's odd that it's not working for you when it's working fine for other people with the same kernel.

Joachim Dahl (jdahl) wrote :

I realized that gcc incorrectly symlinks to a beta version of gcc-4.3, which I
installed from the unofficial Ubuntu toolchain repository.

I will rebuild the kernel modules and report back, hopefully to cancel
the bug report, once I correct all the nuked symlinks...

My apologies
Joachim


Russ Allbery (rra-debian) wrote :

JDahl <email address hidden> writes:

> I realized that gcc incorrectly symlinks to a beta version of gcc-4.3
> which I installed from the unofficial ubuntu toolchain repository.

> I will rebuild the kernel-modules and report back to hopefully cancel
> the bugreport once I correct all the nuked symlinks...

Oh! Yes, there is a bug in OpenAFS when built with gcc 4.2 or later.
That's not normally how it's manifested, but that could well be related.
This will be fixed (hopefully) in the next Debian package upload.

--
Russ Allbery (<email address hidden>) <http://www.eyrie.org/~eagle/>

Joachim Dahl (jdahl) wrote :

After fixing the symlinks and rebuilding the modules with gcc-4.1,
everything works again,
so this thread can be disregarded.

Joachim
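
For anyone hitting something similar, here is a rough sketch of how a compiler mismatch like this could be spotted before rebuilding the modules. It assumes the 2.6-era /proc/version format, which contains the text "gcc version X.Y.Z"; the helper names kernel_gcc_series and series_of are hypothetical.

```shell
#!/bin/sh
# Print the "major.minor" of the gcc that built the running kernel.
# Assumption: stdin is /proc/version text containing "gcc version X.Y.Z".
kernel_gcc_series() {
    sed -n 's/.*gcc version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Reduce a full version such as "4.3.0" to its series, "4.3".
series_of() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Usage (not run here):
#   readlink -f "$(command -v gcc)"     # shows where the gcc symlink really points
#   kernel=$(kernel_gcc_series < /proc/version)
#   current=$(series_of "$(gcc -dumpversion)")
#   [ "$kernel" = "$current" ] || echo "gcc $current differs from the kernel's gcc $kernel"
```

A difference between the two does not prove the module is broken, but it is a cheap thing to rule out before digging further.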


Achim Bohnet (allee)
Changed in openafs:
status: New → Invalid