Hotswapping doesn't work on 2.6.24-19-server kernel with SAS1064

Bug #259164 reported by behemot
Affects          Status   Importance  Assigned to  Milestone
Linux            Expired  High
linux (Ubuntu)   Invalid  Undecided   Unassigned

Bug Description

Hi.

I'm experiencing a strange problem with hot-swapping on my Linux box, which has a SAS1064 SAS controller and runs the 2.6.24-19 kernel. First I remove a disk from the system like this:

# systool -b scsi
Bus = "scsi"

 Device = "0:0:0:0"
 Device = "0:0:1:0"
 Device = "0:0:2:0"
 Device = "0:0:3:0"
 Device = "1:0:0:0"

# echo 1 > /sys/class/scsi_host/host0/device/port-0\:0/end_device-0\:0/target0\:0\:0/0\:0\:0\:0/delete
# systool -b scsi
Bus = "scsi"

 Device = "0:0:1:0"
 Device = "0:0:2:0"
 Device = "0:0:3:0"
 Device = "1:0:0:0"

That works: the disk is successfully removed. Then I rescan the bus without physically removing the disk, like this:

# echo "- - -" > /sys/class/scsi_host/host0/scan

That works too, and the disk is successfully reattached:

# systool -b scsi
Bus = "scsi"

 Device = "0:0:0:0"
 Device = "0:0:1:0"
 Device = "0:0:2:0"
 Device = "0:0:3:0"
 Device = "1:0:0:0"

But when I detach the disk from the system, physically remove it from the box, replace it with another disk, and try to attach the new one with:

# echo "- - -" > /sys/class/scsi_host/host0/scan

Nothing happens: Linux doesn't see any new hard drive attached to this host.
Interestingly, if I swap the drives again (reinserting the original one) and rescan the bus, it works again and the disk is successfully attached.
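The delete/rescan cycle above can be wrapped in a few shell functions so it is easy to repeat while swapping drives. This is a minimal sketch, not part of the original report: the function names and the `SYSFS_ROOT` override are hypothetical, and it uses the shorter `/sys/bus/scsi/devices/<H:C:T:L>/delete` path, which refers to the same device as the long `port-0:0/end_device-0:0/...` path used above. In the scan string, each `-` is a wildcard for channel, target and LUN respectively.

```shell
# Hypothetical helpers for the delete/rescan cycle; source this file and run as root.
# SYSFS_ROOT is an assumed override (default /sys) so the functions can be
# exercised against a scratch directory instead of the real sysfs.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Detach one SCSI device, addressed as H:C:T:L (e.g. 0:0:0:0).
scsi_delete() {
    echo 1 > "$SYSFS_ROOT/bus/scsi/devices/$1/delete"
}

# Ask one host adapter (e.g. host0) to rescan all channels, targets and LUNs;
# each "-" in the written string is a wildcard.
scsi_rescan_host() {
    echo "- - -" > "$SYSFS_ROOT/class/scsi_host/$1/scan"
}

# Print the H:C:T:L address of every SCSI device currently on the bus,
# i.e. the entries `systool -b scsi` reports as Device = "...".
# Host and target entries (host0, target0:0:0) have fewer colons and are skipped.
scsi_list() {
    for dev in "$SYSFS_ROOT"/bus/scsi/devices/*:*:*:*; do
        if [ -e "$dev" ]; then
            basename "$dev"
        fi
    done
}
```

For example, `scsi_delete 0:0:0:0` followed by `scsi_rescan_host host0` repeats the steps in the report; comparing `scsi_list` output before and after the rescan shows whether the swapped-in disk was picked up.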

I tried the 2.6.24-19 and 2.6.26-5 kernels from Hardy and Intrepid and got the same result with both. But with vanilla 2.6.24 or 2.6.26 kernels it worked as expected; they even handled hotplug events. The 2.6.22-14-server kernel from Gutsy worked fine too.

behemot (vlad-seliverstov) wrote :
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

behemot (vlad-seliverstov) wrote :

Hi again.
I've done many tests of this case and found that hotplug works fine on the SAS1064ET (the PCI Express version of the SAS1064) with vanilla 2.6.24 and 2.6.26. But on the PCI-X SAS1064 it doesn't work with any of the kernels I've tried, which were:
- Vanilla 2.6.24-4
- Ubuntu 2.6.24-4
- Vanilla 2.6.26
- Vanilla 2.6.26-2
- Vanilla 2.6.26-3
- Vanilla v2.6.27-rc6
- Ubuntu-2.6.27-3.4

To be completely sure, I tried this on different boxes with different SAS1064 controllers and got the same result.
So it doesn't work with the 2.6.27 kernel either.

behemot (vlad-seliverstov) wrote :

I've found a kernel where it works: vanilla v2.6.20. With vanilla v2.6.22 it doesn't work at all. I've also bisected the linux-2.6 git tree; the git-bisect log is attached. The two skips in the middle are where the kernel failed to build, and I skipped some commits to get a kernel that builds. Finally, I marked that bisection step as bad.
I hope this helps.
So, here is the winner:

$ git bisect bad
df9e062ad994c4db683377b108c0dbed4690e4b0 is first bad commit
commit df9e062ad994c4db683377b108c0dbed4690e4b0
Author: Eric Moore <email address hidden>
Date: Mon Jan 29 09:46:21 2007 -0700

    [SCSI] fusion - serialize target resets in mptsas.c

    Fusion firmware requires target reset following hotplug removal event,
    with purpose to flush target outstanding request in fw. Current implementation
    does the target resets from delayed work tasks, that in heavy load
    conditions, take too long to be invoked, resulting in command time outs
    This patch will issue target reset immediately from ISR context, and will
    queue remaining target resets to be issued after the previous one completes.
    The delayed work tasks are spawned during the target reset completion.

    Signed-off-by: Eric Moore <email address hidden>
    Signed-off-by: James Bottomley <email address hidden>

:040000 040000 e14002c1c3c13fedd7aa6793030db82cea882eea e1da7950b3dd71f8d56b38c405799140aac5af1c M drivers
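The good/bad/skip bookkeeping described above can be rehearsed on a throwaway repository before committing to a series of kernel builds. This is a hypothetical demo, not the reporter's actual session: commit 4 stands in for a revision that fails to build, and the check command exits 125, which tells `git bisect run` to skip that revision, much as the reporter skipped kernels that would not build. On a real kernel tree each step still needs a build, a boot and a manual hotswap test, so marking revisions by hand with `git bisect good`, `git bisect bad` and `git bisect skip` is the practical route there.

```shell
# Hypothetical bisection rehearsal in a scratch repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
# Commits 1-3 and 5 "work", commit 4 cannot be tested ("nobuild"),
# commits 6-8 are broken, so commit 6 is the first bad one.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -eq 4 ]; then
        echo "$i nobuild" > state
    elif [ "$i" -ge 6 ]; then
        echo "$i fail" > state
    else
        echo "$i pass" > state
    fi
    git add state
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q -m "commit $i"
done

# Newest commit is bad, oldest is good (like v2.6.22 and v2.6.20 in the report).
git bisect start HEAD HEAD~7
# Exit status 0 = good, 1 = bad, 125 = skip (the "failed to build" case).
git bisect run sh -c 'case "$(cat state)" in
    *pass) exit 0 ;;
    *nobuild) exit 125 ;;
    *) exit 1 ;;
esac'
git bisect log > bisect.log    # the kind of log the reporter attached
# After bisection, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "first bad commit: $first_bad"
```

On the real tree, the equivalent start would be `git bisect start v2.6.22 v2.6.20`, followed by one `git bisect good`, `git bisect bad` or `git bisect skip` per tested kernel.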

behemot (vlad-seliverstov) wrote :

Here is the kernel config I used for all the bisection builds.

Leann Ogasawara (leannogasawara) wrote :

Hi Behemot,

Thanks for doing the bisection, I think this would be good to share with upstream. Care to also open an upstream bug report at http://bugzilla.kernel.org ? Thanks.

behemot (vlad-seliverstov) wrote :

I've already done it. Here it is: http://bugzilla.kernel.org/show_bug.cgi?id=11619

Changed in linux:
status: Unknown → Confirmed
kernel-janitor (kernel-janitor) wrote :

Hi behemot,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 259164

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux:
importance: Unknown → High
Changed in linux:
status: Confirmed → Expired