1) The 'No compute node record for host phanpy: ComputeHostNotFound_Remote: Compute host phanpy could not be found.' message is benign; it appears on first start of the `nova-compute` service. It keeps appearing in the log here because the service fails to register its available resources. See 3).
2) Technically, the compute hosts are partially registered with `nova`:
$ nova service-list
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| 1 | nova-conductor | juju-302a0a-2-lxd-2 | internal | enabled | up | 2018-06-29T10:28:00.000000 | - |
| 14 | nova-scheduler | juju-302a0a-2-lxd-2 | internal | enabled | up | 2018-06-29T10:28:01.000000 | - |
| 15 | nova-compute | phanpy | nova | enabled | up | 2018-06-29T10:28:01.000000 | - |
| 16 | nova-compute | aurorus | nova | enabled | up | 2018-06-29T10:28:05.000000 | - |
| 26 | nova-compute | zygarde | nova | enabled | up | 2018-06-29T10:28:05.000000 | - |
+----+----------------+---------------------+----------+---------+-------+----------------------------+-----------------+
3) However, the compute hosts do not have any resources. No resources appear in `nova` because the `nova-compute` service hits a traceback during initial host registration:
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager Traceback (most recent call last):
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 7277, in update_available_resource_for_node
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 664, in update_available_resource
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6438, in get_available_resource
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager self._get_pci_passthrough_devices()
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5945, in _get_pci_passthrough_devices
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5906, in _get_pcidev_info
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager device.update(_get_device_capabilities(device, address))
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5877, in _get_device_capabilities
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager pcinet_info = self._get_pcinet_info(address)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5820, in _get_pcinet_info
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager virtdev = self._host.device_lookup_by_name(devname)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/host.py", line 838, in device_lookup_by_name
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager return self.get_connection().nodeDeviceLookupByName(name)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager rv = execute(f, *args, **kwargs)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager six.reraise(c, e, tb)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager rv = meth(*args, **kwargs)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4177, in nodeDeviceLookupByName
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
2018-06-29 06:25:57.161 35528 ERROR nova.compute.manager libvirtError: Node device not found: no node device with matching name 'net_enP2p1s0f2_02_0f_b7_00_00_01'
As both the above analysis and the referenced mailing list thread suggest, the hardware/driver in question has a peculiar operating mode in that it is not possible to query which physical device a virtual function belongs to:
$ sudo virsh nodedev-list|grep net
<empty>
$ grep libvirtd /var/log/syslog|head -20
Jun 29 08:07:00 phanpy libvirtd[2487]: 2018-06-29 08:07:00.616+0000: 2487: error : virNetSocketReadWire:1811 : End of file while reading data: Input/output error
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.946+0000: 110362: info : libvirt version: 4.0.0, package: 1ubuntu8.2 (Marc Deslauriers <email address hidden> Wed, 23 May 2018 13:23:01 -0400)
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.946+0000: 110362: info : hostname: phanpy
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.946+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.946+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.958+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.958+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.973+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:37 phanpy libvirtd[110330]: 2018-06-29 08:18:37.974+0000: 110362: error : virCPUGetHost:457 : this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Jun 29 08:18:38 phanpy libvirtd[110330]: 2018-06-29 08:18:38.189+0000: 110368: error : virNetDevGetPhysicalFunction:1434 : internal error: The PF device for VF enP2p1s0f1 has no network device name
Jun 29 08:18:38 phanpy libvirtd[110330]: 2018-06-29 08:18:38.193+0000: 110368: error : virNetDevGetPhysicalFunction:1434 : internal error: The PF device for VF enP2p1s0f2 has no network device name
Jun 29 08:18:38 phanpy libvirtd[110330]: 2018-06-29 08:18:38.197+0000: 110368: error : virNetDevGetPhysicalFunction:1434 : internal error: The PF device for VF enP2p1s0f3 has no network device name
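To list the affected VFs from a longer log, the `virNetDevGetPhysicalFunction` errors can be filtered with a short script. This is only a sketch: the function name `affected_vfs` is made up for illustration, and the pattern assumes the message wording shown in the excerpt above.

```python
import re

# Matches the libvirtd error text seen in the syslog excerpt above.
VF_ERROR = re.compile(r"The PF device for VF (\S+) has no network device name")


def affected_vfs(lines):
    """Return the VF interface names mentioned in matching log lines, in first-seen order."""
    seen = []
    for line in lines:
        match = VF_ERROR.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    with open("/var/log/syslog") as log:
        for vf in affected_vfs(log):
            print(vf)
```

Run as a script, it reads `/var/log/syslog` and prints one VF name per line; `affected_vfs` accepts any iterable of lines, so it can also be fed a saved log excerpt.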
I have collected a run of libvirtd with debugging enabled and will attach a log from that.
The issue has its roots in the hardware/driver operating differently from what both libvirt and Nova expect: it is not possible to query which PF a VF belongs to. What is still unclear to me is why this would behave differently on Xenial compared to Bionic.
In conclusion, I would suggest that the `libvirt` driver in `nova` itself could handle this differently by extending the exception check in `_get_pci_passthrough_devices()` in `nova/virt/libvirt/driver.py`.
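A minimal sketch of that idea, not the actual `nova` code: the loop below skips a device whose lookup fails with "node device not found" instead of letting the exception abort the whole resource update. `NodeDeviceNotFound`, `collect_pci_devices` and `fake_lookup` are hypothetical stand-ins so the example runs without libvirt. In the real driver the exception would be `libvirt.libvirtError`, and the check would presumably compare `get_error_code()` against a constant such as `libvirt.VIR_ERR_NO_NODE_DEVICE`; which error code libvirt reports in this case is an assumption that would need to be confirmed.

```python
class NodeDeviceNotFound(Exception):
    """Hypothetical stand-in for a libvirtError carrying a 'no node device' error code."""


def collect_pci_devices(names, lookup):
    """Look up each device; return (found, skipped) instead of failing on a missing one."""
    found, skipped = [], []
    for name in names:
        try:
            found.append(lookup(name))
        except NodeDeviceNotFound:
            # The device is listed but cannot be resolved: record it and carry on,
            # so the remaining host resources can still be reported.
            skipped.append(name)
    return found, skipped


def fake_lookup(name):
    """Hypothetical lookup that fails for the device named in the traceback above."""
    if name == "net_enP2p1s0f2_02_0f_b7_00_00_01":
        raise NodeDeviceNotFound(name)
    return {"name": name}


if __name__ == "__main__":
    # "pci_0002_01_00_1" is an invented device name used only for illustration.
    devices = ["pci_0002_01_00_1", "net_enP2p1s0f2_02_0f_b7_00_00_01"]
    found, skipped = collect_pci_devices(devices, fake_lookup)
    print("found:", found)
    print("skipped:", skipped)
```

Skipping only this one error class keeps other libvirt failures visible, while letting the compute host register the resources it can enumerate.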