High load average on Lucid for nominal/idle system use

Bug #635181 reported by Chris
This bug affects 7 people
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

This is a VBox guest. It has been updated to the latest kernel.

Running rsync on a folder that contains a large amount of data in subfolders, with an NFS-mounted device as the destination, makes the load average rise above 6 and freezes the console. An attempt to log in to the guest via ssh waits about 10 minutes before the TCP session times out. After rebooting and trying again, the system did not freeze, but the load average stayed high. iftop, iotop and top show no process consuming system resources (iftop and iotop report zero I/O). I could not Ctrl-C out of rsync and had to kill the rsync PID.

Copying large files within a VDI produces a similar load average. Running dd from /dev/zero in the virtual system to a USB-mounted device also triggers the issue above.

The parent OS is also Lucid with the latest patches. Moving large files from one device to another on it causes a similar load average and an identical response.

After the #42 kernel, load averages went from 50 to 5, but the lack of usability remains. Although the load average numbers are more realistic, it seems unreasonable for the load average to climb this high for what should be a modest request.

Likewise, when "idle" the systems show load averages above 0.50 with only a terminal open (all VMs shut down, no browser or other applications active).
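
One possible way to investigate a high load average with no visible CPU consumer: on Linux the load average counts tasks in uninterruptible sleep (state D, typically waiting on disk or NFS I/O) as well as runnable tasks. The following is a minimal Python sketch, not part of the original report, that reads /proc/loadavg and lists processes in state D. The function names parse_loadavg and tasks_in_state are hypothetical, and the scan covers processes only, not individual threads.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a small dict."""
    fields = text.split()
    # Fourth field is "runnable/total" scheduling entities.
    running, total = fields[3].split("/")
    return {
        "lavg1": float(fields[0]),
        "lavg5": float(fields[1]),
        "lavg15": float(fields[2]),
        "runnable": int(running),
        "total_tasks": int(total),
    }


def tasks_in_state(state="D", proc_root="/proc"):
    """Return sorted (pid, command) pairs for processes in the given state."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                stat = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name sits in parentheses and may itself contain
        # spaces or parentheses, so split on the last ")".
        open_paren = stat.find("(")
        close_paren = stat.rfind(")")
        if open_paren == -1 or close_paren == -1:
            continue
        command = stat[open_paren + 1:close_paren]
        rest = stat[close_paren + 1:].split()
        if rest and rest[0] == state:
            found.append((int(entry), command))
    return sorted(found)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as fh:
        print(parse_loadavg(fh.read()))
    for pid, command in tasks_in_state("D"):
        print(pid, command)
```

If rsync or an NFS-related task shows up in state D while top shows little CPU use, that would be consistent with the load average being driven by blocked I/O rather than by computation.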

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-24-generic 2.6.32-24.42
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-24.42-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: I82801AAICH [Intel 82801AA-ICH], device 0: Intel ICH [Intel 82801AA-ICH]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: oc010000 1399 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'I82801AAICH'/'Intel 82801AA-ICH with STAC9700,83,84 at irq 21'
   Mixer name : 'SigmaTel STAC9700,83,84'
   Components : 'AC97a:83847600'
   Controls : 34
   Simple ctrls : 24
Date: Fri Sep 10 12:42:38 2010
HibernationDevice: RESUME=UUID=9a57a528-17c6-4d64-ad60-955cefbb142e
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.

 eth1 no wireless extensions.
Lsusb:
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: innotek GmbH VirtualBox
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-24-generic root=UUID=86816374-789f-498d-ba54-99093a90f759 ro quiet splash
ProcEnviron:
 LC_TIME=en_DK.utf8
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.1
RfKill:

SourcePackage: linux
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Chris (nakota07) wrote :

I am uploading sar -P (load average) output for another VM I have, Violet. Violet was kept idle, with a terminal as the only running application, for the whole time the sar captures were taken. I would expect lavg1 to be around 0.00 or 0.10 on such a system.
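
A rough way to compare a series of 1-minute load averages against that idle expectation is sketched below in Python. This is an illustration only: summarize_lavg1 is a hypothetical helper, the 0.10 ceiling is simply the figure quoted above, and the sample values are made up rather than taken from the attached capture.

```python
def summarize_lavg1(samples, idle_ceiling=0.10):
    """Summarize 1-minute load averages against an idle expectation."""
    if not samples:
        raise ValueError("no samples to summarize")
    above = [s for s in samples if s > idle_ceiling]
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
        # Share of samples that exceed what an idle system is expected to show.
        "fraction_above": len(above) / len(samples),
    }


# Hypothetical values for illustration only; not taken from the attachment.
example = [0.00, 0.05, 0.62, 1.10]
print(summarize_lavg1(example))
```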

Chris (nakota07) wrote :

Data collected from violet.

Chris (nakota07) wrote :

This is a fork from bug ID 574910 to deal with non EC high load averages on idle systems.

Alvin (alvind) wrote :

Are you sure there is no I/O? You say iotop doesn't show anything, but iotop is broken in Lucid.

Chris (nakota07) wrote :

(sigh) Another drive-by comment for karma points.

In the NFS situation I would expect NFS tasks to 'float high' in regular top. No such items are seen at the top of top. The 'meta' information in iotop does not seem to be totally broken. Although the Total Disk Read and Disk Write stats may not be accurate in their totals, they still reflect activity under normal operation. The "zero" I speak of above is that line showing zero to a few bytes of activity while rsync sends the LA1 to 6 or more.

The issue manifests itself in NFS copying via rsync, in dd from /dev/zero to a USB device, and in SATA-to-SATA moves of large VDI files. If a simple dd if=/dev/zero of=/path/to/usb/file bs=1024 count=1M sends the load average sky high (on two different guests on two different parents, one of which is running 10.04), something is wrong on 10.04. It did not operate this way before. I should not need to render my parent or guests useless while moving 30 GB VDI images from one disk spindle to another. Nor should I need to "put up with" load averages on idle systems that, for no apparent reason, drift to 1.00 or more (as seen in the sar output attached to the case). If you had read my other posts on 574910 you would have noted that regressing the kernel to a 9.10 version improves the issue, but does not totally resolve it.
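
The dd reproduction described above can be approximated in Python while recording the load average as the write progresses. This is a hedged sketch, not the reporter's procedure: write_zeros and its parameters are hypothetical, the defaults write only 1 MiB rather than the roughly 1 GiB that bs=1024 count=1M implies, and unlike plain dd it calls fsync at the end.

```python
import os


def write_zeros(path, block_size=1024, count=1024, sample_every=256):
    """Write `count` zero-filled blocks to `path`, sampling the load average.

    Returns a list of (blocks_written, lavg1) tuples.
    """
    block = bytes(block_size)  # a block of zero bytes, like reading /dev/zero
    samples = []
    with open(path, "wb") as out:
        for written in range(1, count + 1):
            out.write(block)
            if written % sample_every == 0 or written == count:
                try:
                    lavg1 = os.getloadavg()[0]
                except OSError:
                    lavg1 = None  # load average not obtainable on this platform
                samples.append((written, lavg1))
        out.flush()
        os.fsync(out.fileno())  # push the data to the device before returning
    return samples
```

Calling write_zeros("/path/to/usb/file", block_size=1024, count=1024 * 1024) would match the size of the dd command quoted above; the returned samples show how lavg1 moved during the write.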

There are more instance types. See my posts in 574910.

NB: As to the "broken" state of iotop, only CONFIG_TASK_DELAY_ACCT seems to be disabled in Lucid. CONFIG_TASK_DELAY_ACCT is used for intra-task prioritization [1], and disabling it seems to affect only the SwapIn and IO % columns. So, unless I have missed other posts stating that iotop is totally broken, the package is still valid for meta information. (BTW, if CONFIG_TASK_DELAY_ACCT is disabled due to "performance" issues, why not disable SMP on all kernels too, since that is also a performance hit for all non-SMP hosts?)

[1] https://lists.ubuntu.com/archives/kernel-team/2009-December/008029.html
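
Whether CONFIG_TASK_DELAY_ACCT is enabled on a given kernel can be checked against the kernel config file, which Ubuntu installs as /boot/config-<kernel release>. A minimal Python sketch follows; kernel_option is a hypothetical helper name, and the script assumes the config file is present and readable at that path.

```python
import os
import platform


def kernel_option(config_text, name):
    """Return the value assigned to a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or marked "is not set".
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# {} is not set".format(name):
            return None
        # Match "NAME=" exactly so CONFIG_FOO does not match CONFIG_FOO_BAR.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Assumed location of the running kernel's config on Ubuntu.
    config_path = "/boot/config-" + platform.release()
    if os.path.exists(config_path):
        with open(config_path) as fh:
            print(kernel_option(fh.read(), "CONFIG_TASK_DELAY_ACCT"))
```

A result of None would mean the option is unset or missing in that config file.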

Chris (nakota07) wrote :

Close the case.

imagine (imagine-de) wrote :

Hi Chris,

why should it be closed? Did you find a solution?

Chris (nakota07) wrote : Re: [Bug 635181] Re: High load average on Lucid for nominal/idle system use

Yes, the solution is: Fedora Core 13 or 14.

Since I installed FC13 XFCE on the parent host (home and work), and
using GNOME FC13/14 on guests things have been normal.

Today, 11/24, I rzipped several 4-9GB files in the 'parent' os. No
problem. No sluggish display. No guests "freezing" for several
seconds. Load averages of 4 and not ++(++(++10)).

I may take a gander at Ubuntu when the next LTS release comes out. That
may give me enough healing time to see that +5 months of my life wasted
is not such a long time. Heck it is better than 5 months in jail.

10.04 is/was a colossal waste of (my) time and energy. The problem may
persist, but for what I care at this point...

After giving up on Ubuntu in late September, I had enough free time, not
to mention faster hosts, in October/November to download and play with
Hercules, vm370\mvs-380 and music/sp on my home system (guest FC14). I
have had better support in the yahoo group for those products than the
'drive by' RTFM shouts on here. I might even try z/Linux. It took
almost two months for anyone (you) to notice on here to close the
case. I need to redo the mapping in my brain for apt-get install to
yum -y install. Otherwise the transition is less than it takes to get
used to AIX from Solaris.

Have a nice day

On 2010-11-24 02:29, imagine wrote:
> Hi Chris,
>
> why should it be closed? Did you find a solution?
>

--
------------------------------------------------------------------------
Chris | <email address hidden>
------------------------------------------------------------------------
This account for level two spam guarding.

penalvch (penalvch) wrote :

Chris, this bug report is being closed because you requested it in your last comment: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/635181/comments/7 For future reference, you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status from the drop-down box that appears. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: New → Invalid