mount.ocfs2: join errors when node with kernel >= 2.6.37 joins with nodes with kernels < 2.6.37

Bug #756894 reported by iMac
This bug affects 2 people
Affects               Status     Importance  Assigned to  Milestone
linux (Ubuntu)        Confirmed  Medium      Unassigned
ocfs2-tools (Ubuntu)  Invalid    Undecided   Unassigned

Bug Description

Binary package hint: ocfs2-tools

I have two iSCSI volumes (2GB and 4GB). Both are mounted using ocfs2-tools (1.4.4-3) on node 0 (192.168.79.14), which runs Debian 6.0 Squeeze.

On Natty (node 1), o2cb status appears to indicate connectivity with the cluster, and blkid shows that the block devices are available. (Output from both is attached.)

Console output:

:~# mount /export/data
mount.ocfs2: Protocol not available while mounting /dev/sdb on /export/data. Check 'dmesg' for more information on this error.

Dmesg output:

[502823.151300] o2net: connected to node media (num 0) at 192.168.79.14:7777
[502827.215736] (mount.ocfs2,23552,0):dlm_send_nodeinfo:1233 ERROR: node mismatch -92, node 0
[502827.215754] (mount.ocfs2,23552,0):dlm_try_to_join_domain:1616 ERROR: status = -92
[502827.215955] (mount.ocfs2,23552,0):dlm_join_domain:1877 ERROR: status = -92
[502827.216067] (mount.ocfs2,23552,0):dlm_register_domain:2143 ERROR: status = -92
[502827.216113] (mount.ocfs2,23552,0):o2cb_cluster_connect:313 ERROR: status = -92
[502827.216124] (mount.ocfs2,23552,0):ocfs2_dlm_init:3086 ERROR: status = -92
[502827.216157] (mount.ocfs2,23552,0):ocfs2_mount_volume:1899 ERROR: status = -92
[502827.216216] ocfs2: Unmounting device (8,16) on (node 0)
[502829.170039] o2net: no longer connected to node media (num 0) at 192.168.79.14:7777
[502833.211031] o2net: connected to node media (num 0) at 192.168.79.14:7777
[502835.268255] (mount.ocfs2,23564,0):dlm_send_nodeinfo:1233 ERROR: node mismatch -92, node 0
[502835.268274] (mount.ocfs2,23564,0):dlm_try_to_join_domain:1616 ERROR: status = -92
[502835.268444] (mount.ocfs2,23564,0):dlm_join_domain:1877 ERROR: status = -92
[502835.268553] (mount.ocfs2,23564,0):dlm_register_domain:2143 ERROR: status = -92
[502835.268602] (mount.ocfs2,23564,0):o2cb_cluster_connect:313 ERROR: status = -92
[502835.268614] (mount.ocfs2,23564,0):ocfs2_dlm_init:3086 ERROR: status = -92
[502835.268645] (mount.ocfs2,23564,0):ocfs2_mount_volume:1899 ERROR: status = -92
[502835.268704] ocfs2: Unmounting device (8,16) on (node 0)
[502837.230036] o2net: no longer connected to node media (num 0) at 192.168.79.14:7777
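
The repeated "status = -92" in the dmesg lines above is a negated errno value. A minimal Python sketch, assuming it runs on Linux (where errno 92 is ENOPROTOOPT), that extracts the status codes from lines like these and maps them to symbolic names. The function name decode_status_lines is hypothetical and not part of any ocfs2 tool:

```python
import errno
import os
import re

# Matches the "status = -NN" suffix seen in the o2cb/dlm dmesg lines.
STATUS_RE = re.compile(r"status = -(\d+)")


def decode_status_lines(lines):
    """Return (code, symbolic name, message) for each distinct status code."""
    codes = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            codes.add(int(match.group(1)))
    return [
        (code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
        for code in sorted(codes)
    ]


# Two lines copied from the dmesg output in this report.
sample = [
    "[502827.215754] (mount.ocfs2,23552,0):dlm_try_to_join_domain:1616 ERROR: status = -92",
    "[502827.215955] (mount.ocfs2,23552,0):dlm_join_domain:1877 ERROR: status = -92",
]

for code, name, message in decode_status_lines(sample):
    print(code, name, message)
```

On a Linux host the symbolic name for 92 is ENOPROTOOPT, whose message text is the same "Protocol not available" string that mount.ocfs2 printed on the console.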

Fstab:

UUID=d34223f3-5d47-49c3-81b2-59a942de351b /export/data ocfs2 defaults,acl,_netdev,noatime 0 0

blkid:

/dev/sdb: LABEL="data_volume" UUID="d34223f3-5d47-49c3-81b2-59a942de351b" TYPE="ocfs2"

o2cb status:

:~# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

On node 0 (Debian 6) I see the following in dmesg:

o2net: accepted connection from node ibm-main (num 1) at 192.168.79.77:7777
o2net: no longer connected to node ibm-main (num 1) at 192.168.79.77:7777

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: ocfs2-tools 1.6.3-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-7.39-server 2.6.38
Uname: Linux 2.6.38-7-server x86_64
Architecture: amd64
Date: Sun Apr 10 17:26:52 2011
ProcEnviron: SHELL=/bin/bash
SourcePackage: ocfs2-tools
UpgradeStatus: Upgraded to natty on 2011-04-05 (5 days ago)
---
AlsaDevices:
 total 0
 crw------- 1 root root 116, 1 Jul 5 23:09 seq
 crw------- 1 root root 116, 33 Jul 5 23:09 timer
AplayDevices: aplay: device_list:240: no soundcards found...
Architecture: amd64
ArecordDevices: arecord: device_list:240: no soundcards found...
CurrentDmesg: Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
DistroRelease: Ubuntu 11.04
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: linux (not installed)
ProcEnviron:
 PATH=(custom, no user)
 SHELL=/bin/bash
ProcKernelCmdLine: root=/dev/xvda ro
ProcVersionSignature: Ubuntu 2.6.38-8.42-server 2.6.38.2
Tags: natty
UdevDb: Error: [Errno 2] No such file or directory
Uname: Linux 2.6.38-8-server x86_64
UpgradeStatus: Upgraded to natty on 2011-04-05 (106 days ago)
UserGroups: Domain Users admin
---
AlsaDevices:
 total 0
 crw------- 1 root root 116, 1 2011-07-05 23:09 seq
 crw------- 1 root root 116, 33 2011-07-05 23:09 timer
AplayDevices: aplay: device_list:240: no soundcards found...
Architecture: amd64
ArecordDevices: arecord: device_list:240: no soundcards found...
DistroRelease: Ubuntu 11.04
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: linux (not installed)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: root=/dev/xvda ro
ProcVersionSignature: Ubuntu 2.6.38-8.42-server 2.6.38.2
Tags: natty
Uname: Linux 2.6.38-8-server x86_64
UpgradeStatus: Upgraded to natty on 2011-04-05 (106 days ago)
UserGroups:

Revision history for this message
iMac (imac-netstatz) wrote :

my cluster.conf

node:
 ip_port = 7777
 ip_address = 192.168.79.14
 number = 0
 name = media
 cluster = ocfs2

node:
 ip_port = 7777
 ip_address = 192.168.79.77
 number = 1
 name = ibm-main
 cluster = ocfs2

node:
 ip_port = 7777
 ip_address = 192.168.79.79
 number = 2
 name = imac-lap
 cluster = ocfs2

cluster:
 node_count = 3
 name = ocfs2
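
As a quick sanity check on a file like the one above, here is a minimal Python sketch that parses the stanza layout shown (a "node:" or "cluster:" header followed by indented "key = value" lines) and verifies that node_count matches the number of node stanzas and that node numbers are unique. The function names parse_cluster_conf and check_cluster_conf are hypothetical, and the parser only assumes the layout visible in this comment, not the full o2cb grammar:

```python
def parse_cluster_conf(text):
    """Parse stanzas of the form 'type:' followed by 'key = value' lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Stanza header such as "node:" or "cluster:".
            current = {"type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(stanzas):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    nodes = [s for s in stanzas if s["type"] == "node"]
    for cluster in (s for s in stanzas if s["type"] == "cluster"):
        name = cluster.get("name")
        members = [n for n in nodes if n.get("cluster") == name]
        if int(cluster.get("node_count", -1)) != len(members):
            problems.append(f"node_count mismatch in cluster {name}")
        numbers = [n.get("number") for n in members]
        if len(numbers) != len(set(numbers)):
            problems.append(f"duplicate node number in cluster {name}")
    return problems


# The configuration posted in this comment.
CONF = """
node:
 ip_port = 7777
 ip_address = 192.168.79.14
 number = 0
 name = media
 cluster = ocfs2

node:
 ip_port = 7777
 ip_address = 192.168.79.77
 number = 1
 name = ibm-main
 cluster = ocfs2

node:
 ip_port = 7777
 ip_address = 192.168.79.79
 number = 2
 name = imac-lap
 cluster = ocfs2

cluster:
 node_count = 3
 name = ocfs2
"""

print(check_cluster_conf(parse_cluster_conf(CONF)))
```

The posted file passes both checks, which is consistent with the later finding that the configuration was not the cause of the join failure.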

description: updated
Revision history for this message
iMac (imac-netstatz) wrote :

my modules
# lsmod | grep ocfs
ocfs2 788119 0
quota_tree 18308 1 ocfs2
ocfs2_dlmfs 27697 1
ocfs2_stack_o2cb 13322 0
ocfs2_dlm 242947 1 ocfs2_stack_o2cb
ocfs2_nodemanager 225790 12 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue 17200 3 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb
configfs 35105 2 ocfs2_nodemanager

Revision history for this message
iMac (imac-netstatz) wrote :

On node 0 (Debian 6, 1.4.4) the heartbeat parameters have the same values; a difference there was my initial thought for the cause of the "node mismatch" error, given that ocfs2 1.6 is supposed to be fully compatible with 1.4.

The o2cb status output on node 0 is as follows; note that the heartbeat is active because the volumes are mounted.

:~# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi There,

Thank you for reporting bugs and trying to make Ubuntu better.

This actually seems like an issue with communication rather than mounting, and appears to be related to the kernel rather than the ocfs2-tools package (which would make the bug belong to the kernel rather than this package).

Could you please take a look at the following thread [1], see if it sounds familiar to you, and let us know?

For now, I'll be marking this bug report as incomplete until further information is provided.

[1]: http://<email address hidden>/msg04710.html

Changed in ocfs2-tools (Ubuntu):
status: New → Incomplete
Revision history for this message
iMac (imac-netstatz) wrote :

Thanks Andres, this is exactly it: my kernels are within the scope of this issue and show exactly those symptoms. Following the thread, I noted that the upstream patch is committed and that the scope is specific to the o2cb cluster stack.

http://oss.oracle.com/pipermail/ocfs2-devel/2011-April/007996.html

It is part of a set of patches for 2.6.39, but does impact 2.6.37.

http://oss.oracle.com/pipermail/ocfs2-devel/2011-April/007998.html
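
To spot the kernel mix named in the bug title, a minimal Python sketch that compares `uname -r` style release strings against 2.6.37. The helper names are hypothetical. The release string 2.6.38-8-server comes from this report, while 2.6.32-5-amd64 is an assumed example for the Debian 6 node, whose exact release string is not given here. The sketch only looks at version numbers and cannot tell whether a kernel already carries the fix:

```python
import re

# First kernel version on the "newer" side of the split, per the bug title.
FIRST_AFFECTED = (2, 6, 37)


def kernel_tuple(release):
    """Extract the leading three-part numeric version from a release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_mixed_cluster(releases):
    """True when some nodes run >= 2.6.37 and others run an older kernel."""
    sides = {kernel_tuple(release) >= FIRST_AFFECTED for release in releases}
    return len(sides) == 2


# "2.6.32-5-amd64" is an assumed value for the Debian 6 node.
print(is_mixed_cluster(["2.6.32-5-amd64", "2.6.38-8-server"]))
```

Tuples are compared element by element, so (2, 6, 38) sorts after (2, 6, 37) without any string handling.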

iMac (imac-netstatz)
affects: ocfs2-tools → ocfs2
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in ocfs2-tools (Ubuntu):
status: Incomplete → Invalid
summary: - mount.ocfs2 protocol not available mounting error
+ mount.ocfs2: join errors when kernel >= 2.6.37 joins with nodes having
+ kernels < 2.6.37.
summary: - mount.ocfs2: join errors when kernel >= 2.6.37 joins with nodes having
- kernels < 2.6.37.
+ mount.ocfs2: join errors when node with kernel >= 2.6.37 joins with
+ nodes with kernels < 2.6.37
Revision history for this message
iMac (imac-netstatz) wrote :

I successfully applied the attached patch against the current linux-image-2.6.38-8-server (.42) Ubuntu source. I grabbed it from this post and have attached it here separately.

http://oss.oracle.com/pipermail/ocfs2-devel/2011-April/007996.html

It seems like an easily reviewed merge to current 2.6.37 as it appears to simply correct an error in the code where the locking protocol version is evaluated.

I also confirmed with Oracle that this is pending for the next bundle into upstream 2.6.39 (i.e. it is not in the 2.6.39-rc2 changelog yet).

[ 67.429903] scsi0 : iSCSI Initiator over TCP/IP
[ 67.724378] scsi 0:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
[ 67.724627] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 67.735449] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 67.735606] sd 0:0:0:0: [sda] 7814045696 512-byte logical blocks: (4.00 TB/3.63 TiB)
[ 67.735745] sd 0:0:0:0: [sda] Write Protect is off
[ 67.735750] sd 0:0:0:0: [sda] Mode Sense: 77 00 00 08
[ 67.736025] sd 0:0:0:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[ 67.737612] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 67.749812] sda: unknown partition table
[ 67.751647] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 67.752307] sd 0:0:0:0: [sda] Attached SCSI disk
[ 72.030876] o2net: connected to node media (num 0) at 192.168.79.14:7777
[ 76.156603] OCFS2 1.5.0
[ 76.169929] o2dlm: Nodes in domain D34223F35D4749C381B259A942DE351B: 0 1
[ 76.195126] JBD: Ignoring recovery information on journal
[ 76.300303] ocfs2: Mounting device (8,0) on (node 1, slot 1) with ordered data mode.

root@ibm-main:~# uname -a
Linux ibm-main 2.6.38-8-server #42 SMP Wed Apr 13 10:32:23 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
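
For readers unfamiliar with this class of bug, here is a minimal Python sketch of how a protocol version check can go wrong in a mixed cluster. It is not the kernel code and makes no claim about the exact contents of the attached patch: the version numbers, the threshold, and all names are hypothetical. The assumed pattern is that a node decides whether to send a newer message type based on its own version rather than on the version negotiated with its peer:

```python
from typing import NamedTuple, Optional


class ProtoVersion(NamedTuple):
    major: int
    minor: int


def negotiate(local: ProtoVersion, remote: ProtoVersion) -> Optional[ProtoVersion]:
    """Majors must match; the agreed minor is the lower of the two."""
    if local.major != remote.major:
        return None
    return ProtoVersion(local.major, min(local.minor, remote.minor))


# Hypothetical version at which an optional message type was introduced.
OPTIONAL_MSG_MIN = ProtoVersion(1, 1)


def should_send_buggy(local: ProtoVersion, negotiated: ProtoVersion) -> bool:
    # Bug pattern: gates on what this node supports, not on what was agreed.
    return local >= OPTIONAL_MSG_MIN


def should_send_fixed(local: ProtoVersion, negotiated: ProtoVersion) -> bool:
    # Gates on the negotiated version, so older peers never see the message.
    return negotiated >= OPTIONAL_MSG_MIN


new_node = ProtoVersion(1, 1)  # hypothetical newer node
old_node = ProtoVersion(1, 0)  # hypothetical older node
agreed = negotiate(new_node, old_node)

print(agreed, should_send_buggy(new_node, agreed), should_send_fixed(new_node, agreed))
```

With the buggy gate the newer node would send a message the older peer has no handler for, which is one plausible way to end up with a "Protocol not available" style rejection during a join.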

iMac (imac-netstatz)
Changed in ocfs2:
status: New → Invalid
Revision history for this message
Stefan Bader (smb) wrote :

I agree, the patch looks reasonable. However, it would be much preferable to ask the patch author about the upstream submission and, once there is a commit in the upstream tree, to think about SRUing it. (In fact, when asking, it would be best if the upstream submission had a cc: <email address hidden> in the signed-off-by area to mark it as a fix for the 2.6.38.y stable tree.) That way it won't become a potential problem in 11.10 again.

Revision history for this message
iMac (imac-netstatz) wrote :

I asked this question two weeks ago, noting the Oracle team signed off on it per the thread.

Expect to see it show up anytime now in 2.6.39.

-------- Forwarded Message --------
From: Sunil Mushran <email address hidden>
Cc: Ocfs2-Devel <email address hidden>
Subject: Re: OSS o2dlm Bug reference
Date: Tue, 12 Apr 2011 17:17:55 -0700

This patch was posted a week ago and still is not in the upstream tree.
Typically Joel waits for some time to collect patches before pushing them
upstream.

On 04/12/2011 09:14 AM, Ian B. MacDonald wrote:
> Sunil,
>
> Quick question,
>
> Is there an easy way to find this patch in upstream kernel/git? i
> browsed the changelog for 2.6.39-rcX .. lots of ocfs2 stuff, but I
> didn't see specific reference to this patch set.
>
> Is that because it is still under review?
>
> cheers,
> Ian

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 756894

and then change the status of the bug back to 'New'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
iMac (imac-netstatz) wrote : ProcCpuinfo.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
iMac (imac-netstatz) wrote : ProcCpuinfo_.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote : ProcInterrupts.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote : ProcModules.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote : UdevLog.txt

apport information

description: updated
Revision history for this message
iMac (imac-netstatz) wrote : BootDmesg.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote : CurrentDmesg.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote : UdevDb.txt

apport information

Revision history for this message
iMac (imac-netstatz) wrote :

I'll run apport for Brad on the patched server. Unfortunately I can't bring down the OCFS2 cluster, so hopefully all the upstream data, including the Oracle patch, is sufficient.

Revision history for this message
iMac (imac-netstatz) wrote :

Scratch that, this headless machine barfed at apport even after oauth.

The authorization page:
 (https://launchpad.net/+authorize-token?oauth_token=-=-&allow_permission=DESKTOP_INTEGRATION)
should be opening in your browser. Use your browser to authorize
this program to access Launchpad on your behalf.

Waiting to hear from Launchpad about your decision...
ERROR: hook /usr/share/apport/package-hooks//source_linux-meta.py crashed:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 656, in add_hooks_info
    symb['add_info'](self, ui)
  File "/usr/share/apport/package-hooks//source_linux-meta.py", line 42, in add_info
    attach_alsa(report)
  File "/usr/lib/python2.7/dist-packages/apport/hookutils.py", line 220, in attach_alsa
    report['PciMultimedia'] = pci_devices(PCI_MULTIMEDIA)
  File "/usr/lib/python2.7/dist-packages/apport/hookutils.py", line 422, in pci_devices
    key, value = line.split(':',1)
ValueError: need more than 1 value to unpack

Revision history for this message
iMac (imac-netstatz) wrote :

This is now resolved in the current 11.10 packages, as well as in the new 12.04 Beta.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch submitted by oracle to upstream 2.6.39" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch