iscsid tries to reconnect existing session at startup, failing to do so and hanging the system

Bug #850960 reported by Stéphane Graber
This bug affects 5 people
Affects              Status        Importance  Assigned to     Milestone
open-iscsi (Ubuntu)  Fix Released  Undecided   Unassigned
Precise              Won't Fix     Medium      Ante Karamatić

Bug Description

[Impact]
This bug affects iSCSI when acting as an initiator only.

Works: everything when not using an iSCSI root fs.
Works: an iSCSI root fs when not using iSCSI for any other mounts after the root fs is mounted.
Doesn't work: further iSCSI mounts after using an iSCSI root fs. For example: OpenStack won't work on a node using an iSCSI root fs, since OpenStack uses further iSCSI mounts.

[Original Description]

When open-iscsi starts with a session already established (by iscsistart), iscsid tries to reconnect it and fails (wrong AuthMethod).

Before Oneiric, a bug prevented iscsid from starting, which made things "work" with root on iSCSI, as long as you didn't need to mount another LUN.

In Oneiric, this bug got fixed, exposing the open-iscsi bug. The workaround for now (bug 838809) is to exit the open-iscsi init script when detecting we already have a session established from the initramfs.

Ideally, open-iscsi should be able to start, detect that a session is already established and either not touch it at all or be able to reconnect it with the right settings.
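The detection step described above can be sketched roughly as follows. This is only an illustration, not the code in the open-iscsi init script or the upstream fix. It assumes the kernel exposes established sessions as `sessionN` entries under `/sys/class/iscsi_session`; the function names `established_sessions` and `should_skip_login` are hypothetical.

```python
import os

# Assumption: the iSCSI transport class lists one "sessionN" entry per
# established session in this sysfs directory.
DEFAULT_SYSFS_ROOT = "/sys/class/iscsi_session"


def established_sessions(sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the sorted names of the session entries found under sysfs_root.

    An unreadable or missing directory is treated as "no sessions".
    """
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        return []
    return sorted(name for name in entries if name.startswith("session"))


def should_skip_login(sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return True if at least one session already exists.

    Hypothetical policy: leave a session created in the initramfs alone
    instead of trying to log in to it again.
    """
    return bool(established_sessions(sysfs_root))


if __name__ == "__main__":
    sessions = established_sessions()
    if sessions:
        print("existing sessions, leaving them untouched:", ", ".join(sessions))
    else:
        print("no existing sessions found")
```

A check like this only says that some session exists. It does not tell iscsid which settings the session was established with, which is the part the proper fix has to handle.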

Tags: server-o-ro
Dave Walker (davewalker)
tags: added: server-o-ro
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in open-iscsi (Ubuntu):
status: New → Confirmed
Ante Karamatić (ivoks) wrote :

Unmarking it as a duplicate, because it's not. This workaround is not needed with open-iscsi 2.0.873 and newer. The workaround also means that iscsid won't be running, making these systems unusable as OpenStack compute nodes.

Robie Basak (racb) wrote :

From IRC, Ante reports that this is fixed in the newest Quantal version, but needs an SRU for Precise.

Changed in open-iscsi (Ubuntu):
status: Confirmed → Fix Released
Robie Basak (racb) wrote :

Current status as I understand it (from Ante on IRC):

Fixed properly in Quantal.

Precise has a workaround that just prevents iscsid from starting, but then one cannot mount any other target, which means the system won't work with nova compute. So the fix that went into Quantal needs to be SRU'd.

The upstream commit is possibly https://github.com/mikechristie/open-iscsi/commit/5383b4b373bdea6cc50b2099201dde33de80d145

Robie Basak (racb)
Changed in open-iscsi (Ubuntu Precise):
status: New → Triaged
status: Triaged → In Progress
milestone: none → ubuntu-12.04.1
assignee: nobody → Ante Karamatić (ivoks)
Robie Basak (racb)
description: updated
Changed in open-iscsi (Ubuntu Precise):
importance: Undecided → Medium
Robie Basak (racb)
Changed in open-iscsi (Ubuntu Precise):
milestone: ubuntu-12.04.1 → precise-updates
Stephen Gran (sgran) wrote :

Hello,

We're still seeing iscsid not starting on a node with an iSCSI root. A quick test of the version in Quantal suggests that it starts iscsid and consequently handles things like target failovers properly.

We have diskless nodes that we would like to use as OpenStack compute nodes. While this bug is unresolved in Precise, that's going to be difficult. How much effort would it be to just backport 2.0.873 from Quantal?

Cheers,

Robie Basak (racb) wrote :

Anything but cherry-picking a fix would most likely violate SRU policy (https://wiki.ubuntu.com/StableReleaseUpdates). This is the only way a fix can go into precise-updates.

If you want a full backport, then this would need to go into precise-backports. Users would need to enable backports to make use of it, since it isn't the default. See https://wiki.ubuntu.com/UbuntuBackports for details of this process.

Right now, I think the easiest way for this bug to make progress is for someone to figure out what to cherry-pick, test it and then go through the SRU procedure at https://wiki.ubuntu.com/StableReleaseUpdates#Procedure

Stephen Gran (sgran) wrote :

Hello,

That seems to go against what is being said by both racb and stgraber above. The quick hack of not starting the daemon breaks using Precise, an LTS release, as a compute node for OpenStack when iSCSI is in use for block storage. This seems a bit extreme to me.

Can you please reconsider?

Thanks,

Robie Basak (racb) wrote :

Please reconsider what?

Robie Basak (racb) wrote :

This bug having been "In Progress" for over a year, I presume it's not actually in progress any more.

Changed in open-iscsi (Ubuntu Precise):
status: In Progress → Triaged
Stephen Gran (sgran) wrote :

Adding this fix into a stable release update for precise.

Robie Basak (racb) wrote :

Yes we can, if someone can put forward a suitable patch. I said above:

"Right now, I think the easiest way for this bug to make progress is for someone to figure out what to cherry-pick, test it and then go through the SRU procedure at https://wiki.ubuntu.com/StableReleaseUpdates#Procedure"

I don't understand how you came to the conclusion that we couldn't do this. Backporting 2.0.873 from Quantal, as you requested, is likely to violate SRU policy (see the SRU policy itself for rationale). However, cherry-picking a suitable patch should be fine.

Stephen Gran (sgran) wrote :

So, the diffstat between the version I know doesn't work and the version I know does work is:
 335 files changed, 70473 insertions(+), 6135 deletions(-)

If what you're saying is, "you figure it out, that sounds like work to me", that's ok, I understand that. We have a working backport of the new version, and it works for us. I'm not going to figure out which of the 70k lines of code churn made the difference, sorry.

I was just trying to say that the bug that was band-aided to get open-iscsi out the door for precise is now fixed. If you prefer the band-aid at the expense of arguing for a backport, that is of course your prerogative.

Cheers,

Robie Basak (racb) wrote :

We will not backport 335 changed files to precise-updates. Not knowing what changes are in there introduces a significant chance of regression for existing open-iscsi users who are not affected by this bug. I'm certain that the SRU team would find this an entirely unacceptable risk. It defeats the whole point of having a stable release, and I understand that the development release is already fixed (and thus so will the next LTS).

Other routes are available, such as the backports repository, but backporting such churn and making it available to users automatically is not an option due to the risk of regression. Only a minimal patch is acceptable for an automatic update. I hope you understand this reasoning.

This bug remains open for Precise. Volunteers to drive any of these options are welcome.

Stephen Gran (sgran) wrote :

Have you actually asked the SRU team about this? The version of open-iscsi in precise has a bad hack that means that you can't use diskless nodes as openstack compute hosts. I would have thought that fixing a piece of software used by the UEC platform in an LTS release would qualify as something worth doing. If you don't want to ask yourself, can you point me to who to ask?

Robie Basak (racb) wrote :

> The version of open-iscsi in precise has a bad hack that means that you can't use diskless nodes as openstack compute hosts. I would have thought that fixing a piece of software used by the UEC platform in an LTS release would qualify as something worth doing.

Absolutely - it is definitely worth doing. I've said this multiple times, and at no point have I said otherwise. The *way* this needs to be done is with a minimal patch, so as to minimise regressions for other users, as per SRU policy. But nobody has brought such a patch forward.

I have not asked the SRU team. Instead, I'm going on my familiarity with SRU procedure, which is well documented (https://wiki.ubuntu.com/StableReleaseUpdates). But if you would like to consult them, then please do. Exceptions can be made. For the venue? Try https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel or https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss as you feel is appropriate.

Zollner Robert (wolfit-ro) wrote :

Has this bug been fixed in 12.04.4? (Or should I install it from source?)

- apt cache reports Version: 2.0.873-3ubuntu5~ubuntu12.04.1
- and apps are reporting: iscsiadm version 2.0-871
- the init script still exits when iSCSI was run from the initramfs

Thanks.

voice06 (voice06) wrote :

This issue still occurs, and it prevents my server, which uses a dedicated iSCSI HBA, from booting successfully after a fresh install.

The short of it is as follows:
- The OS installs via the dedicated iSCSI HBA
- open-iscsi gets installed, then gets configured to autostart and use the same LUN the HBA is using
- open-iscsi takes the LUN from the HBA during boot, and the system locks up completely

The installer for the server versions of 14.04.1 and 14.10 doesn't seem to offer a choice about installing open-iscsi; it just blindly installs the package and auto-configures it. At the very least there should be an option to skip installing open-iscsi, or, if a dedicated HBA is detected, it should be installed by default so that it does not start at boot.

The hardware is a Dell PowerEdge 810 with a QLogic isp4032-based HBA.

Steve Langasek (vorlon) wrote :

Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

Changed in open-iscsi (Ubuntu Precise):
status: Triaged → Won't Fix