System hangs when using NFS storage backend with loopback mounts

Bug #1646896 reported by Mihail
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Low
Assigned to: Unassigned

Bug Description

Description
===========
When using high-speed disks and NFS as the storage backend, the NFS mounts hang indefinitely under high load.

Steps to reproduce
==================
A chronological list of steps that reproduce the issue:
* Spin up a VM with a mounted cinder volume from an NFS backend
* Generate some read/write load (a sketch follows this list)
* Occasionally the loopback NFS mounts will hang. The machine and everything else using that mount will hang as well.
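
A minimal load-generator sketch of this kind (the mount point /mnt/vol and the sizes are placeholders, not values from the affected setup):

# Sketch: generate sustained synchronous read/write traffic on the
# attached volume. /mnt/vol is a placeholder for the volume mount point.
import os

PATH = '/mnt/vol/loadtest.bin'
BLOCK = b'\0' * (4 * 1024 * 1024)  # 4 MiB per write

def generate_load(passes=1000):
    for _ in range(passes):
        fd = os.open(PATH, os.O_CREAT | os.O_WRONLY | os.O_SYNC)
        try:
            for _ in range(64):   # ~256 MiB of synchronous writes per pass
                os.write(fd, BLOCK)
            os.fsync(fd)          # force writeback through the NFS client
        finally:
            os.close(fd)
        with open(PATH, 'rb') as f:  # read everything back
            while f.read(8 * 1024 * 1024):
                pass

if __name__ == '__main__':
    generate_load()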

Expected result
===============
The system should run stably.

Actual result
=============
Occasionally, usually under higher load, the system will hang.

Environment
===========
1. Exact version of OpenStack you are running:
OpenStack Kilo

openstack-nova-compute-2015.1.1-1.el7.noarch
openstack-nova-cert-2015.1.1-1.el7.noarch
python-nova-2015.1.1-1.el7.noarch
openstack-nova-console-2015.1.1-1.el7.noarch
openstack-nova-novncproxy-2015.1.1-1.el7.noarch
openstack-nova-common-2015.1.1-1.el7.noarch
python-novaclient-2.23.0-1.el7.noarch
openstack-nova-scheduler-2015.1.1-1.el7.noarch
openstack-nova-api-2015.1.1-1.el7.noarch
openstack-nova-conductor-2015.1.1-1.el7.noarch

2. Which hypervisor did you use?
   Libvirt + KVM

3. Which storage type did you use?
   NFS

4. Which networking type did you use?
   Neutron with Open vSwitch

Logs & Configs
==============

Nova.conf:
[DEFAULT]
notification_driver=ceilometer.compute.nova_notifier
notification_driver=nova.openstack.common.notifier.rpc_notifier
notification_driver =
notification_topics=notifications
rpc_backend=rabbit
internal_service_availability_zone=internal
default_availability_zone=nova
notify_api_faults=False
state_path=/openstack/nova
report_interval=10
enabled_apis=ec2,osapi_compute,metadata
ec2_listen=0.0.0.0
ec2_workers=2
osapi_compute_listen=0.0.0.0
osapi_compute_workers=2
metadata_listen=0.0.0.0
metadata_workers=2
compute_manager=nova.compute.manager.ComputeManager
service_down_time=60
rootwrap_config=/etc/nova/rootwrap.conf
auth_strategy=keystone
use_forwarded_for=False
novncproxy_host=192.168.0.1
novncproxy_port=6080
allow_resize_to_same_host=true
block_device_allocate_retries=1560
heal_instance_info_cache_interval=60
reserved_host_memory_mb=512
network_api_class=nova.network.neutronv2.api.API
default_floating_pool=public
force_snat_range=0.0.0.0/0
metadata_host=192.168.0.1
dhcp_domain=novalocal
security_group_api=neutron
debug=True
verbose=True
log_dir=/var/log/nova
use_syslog=False
cpu_allocation_ratio=16.0
ram_allocation_ratio=1.5
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
compute_driver=libvirt.LibvirtDriver
vif_plugging_is_fatal=True
vif_plugging_timeout=300
firewall_driver=nova.virt.firewall.NoopFirewallDriver
remove_unused_base_images=true
force_raw_images=True
novncproxy_base_url=http://0.0.0.0:6080/vnc_auto.html
vncserver_listen=192.168.0.1
vncserver_proxyclient_address=127.0.0.1
vnc_enabled=True
vnc_keymap=en-us
volume_api_class=nova.volume.cinder.API
amqp_durable_queues=False
sql_connection=mysql:XXXXXXXXXXX
lock_path=/openstack/nova/tmp
osapi_volume_listen=0.0.0.0
[api_database]
[barbican]
[cells]
[cinder]
[conductor]
workers=2
[database]
[ephemeral_storage_encryption]
[glance]
api_servers=192.168.0.1:9292
[guestfs]
[hyperv]
[image_file_url]
[ironic]
[keymgr]
[keystone_authtoken]
auth_uri=http://192.168.0.1:5000/v2.0
identity_uri=http://192.168.0.1:35357
admin_user=nova
admin_password=XXXXXXx
[libvirt]
virt_type=kvm
inject_password=False
inject_key=False
inject_partition=-1
live_migration_uri=qemu+tcp://nova@%s/system
cpu_mode=host-model
disk_cachemodes=file=writethrough,block=writethrough
nfs_mount_options=rw,hard,intr,nolock,vers=4.1,timeo=10
vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver
[metrics]
[neutron]
......

Cinder.conf:

[nfs_ssd]
nfs_used_ratio=0.95
nfs_oversub_ratio=10.0
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/nfs_shares_ssd.conf
volume_backend_name=nfs_ssd
quota_volumes = -1
nfs_mount_options=rw,hard,intr,nolock
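
For reference, the shares file named above (/etc/cinder/nfs_shares_ssd.conf) lists one export per line; the address and path here are placeholders:

192.168.0.1:/srv/nfs/ssd

When such an address belongs to the compute host itself, the resulting mount is a loopback NFS mount.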

- No notable output in the nova log

- System log (dmesg) after a hang:

Nov 24 04:10:41 openstack1.itgix.com kernel: INFO: task qemu-kvm:11726 blocked for more than 120 seconds.
Nov 24 04:10:41 openstack1.itgix.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 24 04:10:41 openstack1.itgix.com kernel: qemu-kvm D ffff88118b1b1f60 0 11726 1 0x00000080
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77c40 0000000000000082 ffff881184b86780 ffff880da4c77fd8
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77fd8 ffff880da4c77fd8 ffff881184b86780 ffff88118b1b1f58
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff88118b1b1f5c ffff881184b86780 00000000ffffffff ffff88118b1b1f60
Nov 24 04:10:41 openstack1.itgix.com kernel: Call Trace:
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163bf29>] schedule_preempt_disabled+0x29/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81639c25>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff812fbff4>] ? timerqueue_del+0x24/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163908f>] mutex_lock+0x1f/0x2f
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8116b60a>] generic_file_aio_write+0x4a/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffffa06dc03b>] nfs_file_write+0xbb/0x1d0 [nfs]
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811dde5d>] do_sync_write+0x8d/0xd0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811de67d>] vfs_write+0xbd/0x1e0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811df2d2>] SyS_pwrite64+0x92/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b

Revision history for this message
Mihail (mihail.itgix) wrote :

I will post how we mitigated the problem once I have formulated it.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not sure what nova can do about this. It might also be a cinder issue. I'm not sure if there are qemu or NFS configs to adjust for this, or image metadata / flavor / volume type extra specs to use to tune this.

tags: added: nfs volumes
Revision history for this message
Mihail (mihail.itgix) wrote :

We have reason to believe that the root cause is the NFS client and server hitting a deadlock, because they run on the same machine and compete for the same memory. This is due to the loopback NFS mounts: the way nova operates, it mounts NFS even for volumes that are local to the host.

More information can be found in this article: https://lwn.net/Articles/595652/

While this is purely an NFS issue, there are ways to mitigate it in nova, as a fix in the NFS packages has been nowhere in sight for years.

Our proposal is the following: add a check in nova to determine whether the volume comes from a local cinder storage node. If the storage is local, do a "bind" mount instead of a normal NFS client mount. This works around the NFS loopback deadlock problem.
The imperfection of the change is that a bind mount cannot be discovered by the is_mounted method and will therefore be mounted again on every request. It also currently does not pick up a local mount if the nfs_shares address is a hostname instead of an IP address.

Here is the diff of the proposed change:

49,55d48
< """Misho : Needed for nfs client to recognize local shares """
< import socket
< import fcntl
< import struct
< import netifaces
< import pprint
<
937d929
<         LOG.debug("Checking if the mount path is mounted: " + mount_path)
945,964c937
<         address = []
<
<         for iface in netifaces.interfaces():
<             try:
<                 alladdr = netifaces.ifaddresses(iface)[netifaces.AF_INET]
<                 # e.g. [{'peer': '127.0.0.1', 'netmask': '255.0.0.0', 'addr': '127.0.0.1'}, ...]
<                 for theaddr in alladdr:
<                     address.append(theaddr['addr'])  # e.g. '192.168.0.110'
<             except KeyError:  # interface has no IPv4 address
<                 LOG.debug("Couldn't get address for: " + iface)
< ...
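
For readers of the truncated diff above, here is a self-contained sketch of the approach (not the submitted patch: mount_share and _local_addresses are illustrative names, it assumes shares are given as "<ip>:<export>" with the export path directly accessible on the local filesystem, and it shells out via subprocess where the real patch would go through nova's rootwrap):

# Sketch of the proposed logic, not the actual patch. Assumes the
# share is "<ip>:<export>" (an IP, not a hostname, per the limitation
# noted above) and that <export> is a real local path on this host.
import subprocess

import netifaces

def _local_addresses():
    """Collect every IPv4 address configured on this host."""
    addresses = []
    for iface in netifaces.interfaces():
        try:
            for entry in netifaces.ifaddresses(iface)[netifaces.AF_INET]:
                addresses.append(entry['addr'])
        except KeyError:
            pass  # interface has no IPv4 address
    return addresses

def mount_share(share, mount_path):
    """Bind-mount local exports; NFS-mount remote ones."""
    host, export = share.split(':', 1)
    if host in _local_addresses():
        # Local export: a bind mount sidesteps the loopback NFS deadlock.
        cmd = ['mount', '--bind', export, mount_path]
    else:
        cmd = ['mount', '-t', 'nfs', share, mount_path]
    subprocess.check_call(cmd)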


Revision history for this message
Sean Dague (sdague) wrote :

Please propose the fix via gerrit, and we can figure out from there if it's invasive enough to need a spec.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Mihail (mihail.itgix) wrote :

Submitted under https://review.openstack.org/422041: proposed fix for bug 1646896 - System hangs when using NFS storage backend ...

Revision history for this message
Mihail (mihail.itgix) wrote :

Added a mitigation for the issue with duplicate bind mounts (a sketch of the approach follows below).
Please review the latest patch set: https://review.openstack.org/422041
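
For context, one way to avoid duplicate mounts is to check the mount table directly, since a bind mount does not appear with the NFS device string that is_mounted matches on. A rough sketch (is_path_mounted is an illustrative name):

# Sketch: detect any existing mount at mount_path by scanning
# /proc/mounts instead of matching on the NFS device string.
def is_path_mounted(mount_path):
    with open('/proc/mounts') as mounts:
        for line in mounts:
            # Fields: device mountpoint fstype options dump pass.
            # Note: /proc/mounts octal-escapes spaces in paths (\040).
            fields = line.split()
            if len(fields) >= 2 and fields[1] == mount_path:
                return True
    return False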

Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version kilo in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.kilo
Sean Dague (sdague)
Changed in nova:
status: Confirmed → In Progress