Comment 9 for bug 1570195

Christian Ehrhardt  (paelzer) wrote :

There is no obvious run-until-success loop in any of the involved code.
The only candidate is this spin loop in virtnet_send_command:
/* Spin for a response, the kick causes an ioport write, trapping
 * into the hypervisor, so the request should be handled immediately.
 */
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
       !virtqueue_is_broken(vi->cvq))
	cpu_relax();
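To see why this loop would show up as a hang with cpu_relax at the top of the stack, here is a minimal userspace model of it. This is an illustrative sketch only: `struct fake_cvq`, `fake_get_buf()` and `spin_for_reply()` are hypothetical names, not kernel APIs. They mimic the loop's two exit conditions, `virtqueue_get_buf()` returning a buffer or `virtqueue_is_broken()` becoming true. The `max_polls` cap is an assumption added for the model; the kernel loop has no such cap, so if the device never hands the buffer back and the queue is never marked broken, the CPU keeps spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the control virtqueue (vi->cvq).
 * This is NOT the kernel's struct virtqueue. */
struct fake_cvq {
    int polls_until_reply; /* empty polls before the "device" answers; < 0 means never */
    bool broken;           /* plays the role of virtqueue_is_broken() */
    long polls;            /* how many times the queue was polled */
};

/* Models virtqueue_get_buf(): returns non-NULL once the device has answered. */
void *fake_get_buf(struct fake_cvq *q)
{
    static int token;

    q->polls++;
    if (q->polls_until_reply >= 0 && q->polls > q->polls_until_reply)
        return &token;
    return NULL;
}

enum spin_result { SPIN_GOT_REPLY, SPIN_BROKEN, SPIN_GAVE_UP };

/* Same exit conditions, checked in the same order as the kernel loop
 * (buffer first, then broken), plus an iteration cap that the kernel
 * loop does not have, so a missing reply becomes SPIN_GAVE_UP instead
 * of spinning forever. */
enum spin_result spin_for_reply(struct fake_cvq *q, long max_polls)
{
    assert(max_polls > 0);
    for (;;) {
        if (fake_get_buf(q))
            return SPIN_GOT_REPLY;
        if (q->broken)
            return SPIN_BROKEN;
        if (q->polls >= max_polls)
            return SPIN_GAVE_UP;
        /* the kernel calls cpu_relax() here and tries again */
    }
}
```

For example, a queue with `polls_until_reply = 3` returns SPIN_GOT_REPLY on the fourth poll, while one with `polls_until_reply = -1` and `broken = false` only stops at the cap; in the real loop that case never stops at all.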

We need to find out who calls whom, and how often, to get a better idea of what is going on when the system gets stuck.
The interesting frames from the stack are:

cpu_relax
virtnet_send_command
virtnet_set_queues
virtnet_set_channels
ethtool_set_channels
dev_ethtool

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo function_graph > current_tracer
tail -f trace
# in separate shells, follow the global trace and one per-CPU trace for each of our 4 CPUs (per_cpu/cpu[0-3]/trace)
echo 1 > tracing_on
ethtool -L eth1 combined 3

The system gets stuck so hard that all of these hang immediately, without reporting anything.
I need to go deeper with debugging, but that will probably have to wait until Monday.