If maas-enlist fails to reach a DNS server, the node will be named ";; connection timed out; no servers could be reached"

Bug #1081660 reported by Raphaël Badin
This bug affects 10 people
Affects               Status        Importance  Assigned to       Milestone
MAAS                  Fix Released  Critical    Andres Rodriguez
maas-enlist (Ubuntu)  Invalid       Critical    Andres Rodriguez

Bug Description

Maybe having a network without a proper DNS setup is wrong, but maas-enlist could definitely deal with that more gracefully.

Gavin Panella (allenap)
no longer affects: maas
Changed in maas-enlist (Ubuntu):
status: New → Triaged
importance: Undecided → Critical
Changed in maas-enlist (Ubuntu):
assignee: nobody → Andres Rodriguez (andreserl)
Scott Moser (smoser) wrote :

This bug appears only when using precise daily images.

Raphaël Badin (rvb) wrote :

@Scott. This bug appears every time the DNS config is wrong and the query issued an enlisting node times out.

Raphaël Badin (rvb) wrote :

issued by* an enlisting node

Scott Moser (smoser) wrote :

This was broken in the daily images after a newer version of open-iscsi was uploaded to precise-updates.
That new version did not have fixes for 2 bugs that we had in the ephemeral images:
  * update resolvconf with settings found by ipconfig for the
    interface that contains the iscsi-root filesystem (LP: #1050487)
  * support files written by klibc ipconfig to be found in /tmp or
    /run. copy files in /tmp to /run (LP: #1047722)

To fix this, I have uploaded open-iscsi at version 2.0.871-0ubuntu9.12.04.2~maasppa0 to the maas ephemeral ppa (https://launchpad.net/~maas-maintainers/+archive/maas-ephemeral-images)

The *real* fix is to get these two changes SRU'd.

Scott Moser (smoser) wrote :

I have also uploaded to precise-proposed the same version that is in the maas-ephemeral-images archive.

mahmoh (mahmoh) wrote :

This occurred on just one of thirty nodes, raring maas (1.3+bzr1461+dfsg-0ubuntu2.2) with precise commissioning on armhf, for me.

Diogo Matsubara (matsubara) wrote :

@mahmoh I had a similar issue in the QA lab, where just one of the nodes exhibited the symptom described here (hostname set to "connection timeout"). It turned out that another DHCP server (not the one configured on the cluster controller) was running on that subnet and causing this odd behavior.

rowez (info-rowez) wrote :

Running MAAS server on Saucy, with Saucy as the default distro for commissioning and deployment.

Using an Arcadyan modem for DHCP with range: 192.168.2.2-192.168.2.139, and the Cluster Interfaces set to manage DHCP and DNS with range: 192.168.150-192.168.2.167, broadcast IP: 192.168.2.255, and the router address set to the modem.

I have set the zone "cloud" in the field "Default domain for new nodes" (MAAS/setting).

But /etc/bind/maas is the zone master, with IP range: 192.168.2.0-192.168.2.255.

Is the directory /etc/bind/maas created by MAAS? I don't think so!

Jeroen T. Vermeulen (jtv) wrote :

See bug 1284964 for a possible explanation: the enlist_userdata calls "dig" to look up a hostname, but does not check for errors — and "dig" prints the error message to stdout, not stderr, so it ends up in the hostname field.
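One way to guard against this, as a minimal sketch rather than the actual maas-enlist code: check the text returned by "dig" before using it as a hostname. The helper name hostname_from_dig is hypothetical, and the rules it applies (reject empty output, reject lines starting with ";;", allow only letters, digits, dots, and hyphens) are assumptions about what a sane hostname looks like.

```shell
#!/bin/sh
# Hypothetical helper (not part of maas-enlist): print a usable hostname
# taken from raw "dig +short -x" output, or print nothing and return 1.
hostname_from_dig() {
    raw=$1
    # Only the first line of the answer is considered.
    first=$(printf '%s\n' "$raw" | head -n 1)
    case $first in
        ''|';;'*) return 1 ;;   # empty answer, or a dig diagnostic line
    esac
    # Strip the trailing dot of the fully qualified name.
    name=${first%.}
    [ -n "$name" ] || return 1
    # Reject anything containing characters outside letters, digits, '.', '-'.
    case $name in
        *[!A-Za-z0-9.-]*) return 1 ;;
    esac
    printf '%s\n' "$name"
}

# Possible usage (the address is only an example):
#   host=$(hostname_from_dig "$(dig +short -x 10.0.0.175)") || host=""
```

With this shape, a timeout message is never passed on as a hostname; the caller gets an empty value and a non-zero status instead.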

Leonardo Borda (lborda) wrote :

Setting the correct dns-search in the interfaces file fixes the issue for me.

Anders (eddiedog988)
Changed in maas-enlist (Ubuntu):
status: Triaged → Confirmed
Julian Edwards (julian-edwards) wrote :

maas-enlist is now in maas's source tree, adjusting tasks appropriately.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Changed in maas-enlist (Ubuntu):
status: Confirmed → Invalid
Changed in maas:
milestone: none → 1.6.0
Changed in maas:
milestone: 1.6.0 → 1.6.1
Andres Rodriguez (andreserl) wrote :

shell script (copy/paste from maas_enlist.sh):

#!/bin/sh -x

dig_output=""
ip=$(ifconfig wlan0 | awk '$1 == "inet" { sub("addr:","",$2); print $2; }') &&
     [ -n "${ip}" ] && dig_output=$(dig +short -x $ip) && host=${dig_output%.}
echo $host

-----------------------------

roaksoax@unleashed:~$ ./test.sh
+ dig_output=
+ ifconfig wlan0+
awk $1 == "inet" { sub("addr:","",$2); print $2; }
+ ip=10.0.0.175
+ [ -n 10.0.0.175 ]
+ dig +short -x 10.0.0.175
+ dig_output=;; connection timed out; no servers could be reached
+ echo

The $host variable ends up as an empty string even though dig_output is ";; connection timed out; no servers could be reached". I think this is fixed?
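A likely reason the trace above stops before setting host: in POSIX sh, an assignment from a command substitution takes the exit status of the substituted command, and dig exits non-zero when no server replies, so the && chain ends there. The sketch below reproduces that with a stand-in function; fake_dig and its exit code of 9 are assumptions used for illustration, and no real dig is run.

```shell
#!/bin/sh
# Stand-in for a "dig" call that times out (hypothetical; real dig is not run).
fake_dig() {
    echo ";; connection timed out; no servers could be reached"
    return 9
}

host=""
dig_output=""
# The assignment's exit status is that of the command substitution,
# so a failing command stops the && chain before host is assigned.
dig_output=$(fake_dig) && host=${dig_output%.}

echo "dig_output=${dig_output}"
echo "host=${host}"
```

Here dig_output still holds the error text, but host stays empty, which matches the behaviour seen in the trace.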

Changed in maas:
status: Triaged → Fix Committed
Christian Reis (kiko)
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
Changed in maas:
status: Fix Committed → Fix Released