Network performance drops between VMs in different Azure regions

Bug #1521053 reported by Seyeong Kim
This bug affects 2 people
Affects          Status        Importance  Assigned to   Milestone
linux (Ubuntu)   Invalid       Medium      Unassigned
  Vivid          Fix Released  Medium      Seyeong Kim

Bug Description

[Impact]

Ubuntu VMs in different Azure regions, in this case North Europe and West Europe, have a network performance issue.
Throughput between them should be around 100 MB/s, but it falls to around 0.3 MB/s when the drop happens.

[Fix]

Upstream commit:
0d158852a8089099a6959ae235b20f230871982f ("hv_netvsc: Clean up two unused variables")

Kernels from 3.19.0-28-generic onward (ubuntu-vivid) are affected.

[Testcase]

Create two VMs, one in North Europe and one in West Europe.
Then run the test commands below.

NE VM

- netcat & nload
 while true; do netcat -l 8080 < /dev/zero; done;
 nload -u M eth0   # run in a second terminal; needs the nload package

- iperf
 iperf -s -f M

WE VM

- netcat
 for i in {1..1000}
 do
  timeout 30s nc NE_HOST 8080 > /dev/null
 done

- iperf
 iperf -c NE_HOST -f M

Throughput drops can be seen frequently.
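To make the drops easier to spot than by watching nload, the iperf client run can be repeated and each result checked against a threshold. The sketch below is a minimal POSIX shell version under stated assumptions: the function names, the 10 MBytes/sec threshold and the default of 20 runs are hypothetical and not part of the original test case. It only reuses the iperf flags already shown in this report (-c, -p, -f M, -n).

```shell
#!/bin/sh
# Sketch: repeat the iperf client from the test case and report slow runs.
# Assumptions (not from the bug report): the function names, the default
# 10 MBytes/sec threshold and the default of 20 runs.

# Print the bandwidth value from an iperf2 "-f M" summary line, e.g.
# "[  3]  0.0-377.9 sec   100 MBytes  0.26 MBytes/sec" -> 0.26
parse_iperf_mbytes_per_sec() {
    printf '%s\n' "$1" | awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "MBytes/sec") print $(i - 1)
    }'
}

# Succeed (exit status 0) when the rate is below the threshold (default 10).
is_degraded() {
    awk -v r="$1" -v t="${2:-10}" 'BEGIN { exit !((r + 0) < (t + 0)) }'
}

# Run the client RUNS times against HOST:PORT, print each slow run and a summary.
run_iperf_series() {
    local host="$1" port="${2:-5001}" runs="${3:-20}" i=1 slow=0 out rate
    while [ "$i" -le "$runs" ]; do
        out=$(iperf -c "$host" -p "$port" -f M -n 100M 2>&1)
        rate=$(parse_iperf_mbytes_per_sec "$out")
        if [ -z "$rate" ] || is_degraded "$rate"; then
            slow=$((slow + 1))
            echo "run $i: ${rate:-no result} MBytes/sec (degraded)"
        fi
        i=$((i + 1))
    done
    echo "$slow of $runs runs degraded"
}

# Usage on the WE VM (5001 is the iperf2 default port):
#   run_iperf_series NE_HOST 5001 20
```

Counting degraded runs, rather than averaging them, keeps short intermittent drops visible.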

More Tests
http://pastebin.ubuntu.com/13657083/

Seyeong Kim (seyeongkim)
tags: added: sts vivid
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1521053

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Vivid):
status: New → In Progress
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a Vivid test kernel with a cherry pick of commit 0d158852a. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1521053/

Can you test this kernel and see if it resolves this bug? If it does, we can submit an SRU request for that commit.

Thanks in advance!

Seyeong Kim (seyeongkim) wrote :

Hello Joseph

I tested your test kernel, and confirmed it's working fine.

Thanks

Changed in linux (Ubuntu):
status: In Progress → Confirmed
Changed in linux (Ubuntu Vivid):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Vivid):
assignee: Joseph Salisbury (jsalisbury) → nobody
Seyeong Kim (seyeongkim) wrote :

It can also be reproduced using iperf.

Seyeong Kim (seyeongkim) wrote :
Seyeong Kim (seyeongkim) wrote :

Environment
- Ubuntu trusty 14.04.3 (ubuntu-vivid kernel)
- DS2, West Europe <-> North Europe, Azure
- test apps: netcat + nload, iperf

Logs
1. ===================================================================================================================
The customer provided an analysis of which kernel versions work and which do not:

Works
ii linux-image-3.16.0-52-generic 3.16.0-52.71~14.04.1 amd64 Linux kernel image for version 3.16.0 on 64 bit x86 SMP
ii linux-image-3.19.0-18-generic 3.19.0-18.18~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-20-generic 3.19.0-20.20~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-21-generic 3.19.0-21.21~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-22-generic 3.19.0-22.22~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-23-generic 3.19.0-23.24~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-25-generic 3.19.0-25.26~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-26-generic 3.19.0-26.28~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP

Doesn't work
ii linux-image-3.19.0-28-generic 3.19.0-28.30~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-30-generic 3.19.0-30.34~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-31-generic 3.19.0-31.36~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-32-generic 3.19.0-32.37~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
======================================================================================================================
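The boundary in the list above can be turned into a quick check of a VM's kernel. This is a sketch under stated assumptions: the function name and its output labels are hypothetical, and it treats every 3.19.0 ABI from -28 up to the 3.19.0-47.53 fix as likely affected, although only -28, -30, -31 and -32 were actually tested as failing and -27 was not tested at all.

```shell
#!/bin/sh
# Sketch: classify a 3.19.0 (lts-vivid) kernel release string against the
# boundaries reported in this bug. Assumptions: the function name and labels,
# and that every ABI from -28 up to (but not including) -47 lacks the fix.
classify_vivid_kernel() {
    local release="${1:-$(uname -r)}" abi
    case "$release" in
        3.19.0-[0-9]*) ;;
        *) echo "not-3.19.0"; return 0 ;;
    esac
    abi=${release#3.19.0-}    # e.g. "28-generic"
    abi=${abi%%[!0-9]*}       # keep only the leading ABI digits, e.g. "28"
    if [ "$abi" -le 26 ]; then
        echo "predates-regression"   # -18 .. -26 were listed as working
    elif [ "$abi" -eq 27 ]; then
        echo "untested"
    elif [ "$abi" -lt 47 ]; then
        echo "likely-affected"       # -28 onward, before the fix
    else
        echo "contains-fix"          # fix released in 3.19.0-47.53
    fi
}

# Usage: classify_vivid_kernel                     (checks the running kernel)
#        classify_vivid_kernel 3.19.0-32-generic
```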

2.====================================================================================================================
Fail (dropping)
----------------------------------------------------------------------------------------------------------------------
After bisecting between them,
I found that the drop starts with the commit below:

commit 1826dae15f7b5d4742bd54c0392b2280cad0ef60
Author: Haiyang Zhang <email address hidden>
Date: Mon Apr 13 16:34:35 2015 -0700

hv_netvsc: Implement partial copy into send buffer

BugLink: http://bugs.launchpad.net/bugs/1454892

If remaining space in a send buffer slot is too small for the whole message,
we only copy the RNDIS header and PPI data into send buffer, so we can batch
one more packet each time. It reduces the vmbus per-message overhead.

Signed-off-by: Haiyang Zhang <email address hidden>
Reviewed-by: K. Y. Srinivasan <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
(cherry picked from commit aa0a34be68290aa9aa071c0691fb8b6edda38358)
Signed-off-by: Joseph Salisbury <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Acked-by: Brad Figg <email address hidden>
Signed-off-by: Brad Figg <email address hidden>
============================================================...
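Bisecting a kernel on an Azure VM means building, installing and booting each candidate before rerunning the test, so a fully scripted git bisect run is awkward here. A small helper can still turn a measured iperf rate into the verdict to pass to git bisect. This is a minimal sketch: the function name and the 10 MBytes/sec threshold are assumptions, and the tag names in the usage comment are assumed, not taken from this report.

```shell
#!/bin/sh
# Sketch: map a measured iperf rate (MBytes/sec) to a git bisect verdict.
# Assumptions: the function name, the default 10 MBytes/sec threshold,
# and the Ubuntu tag names in the usage comments below.
bisect_mark() {
    if [ -z "$1" ]; then
        echo skip   # no measurement, e.g. the candidate kernel did not boot
    elif awk -v r="$1" -v t="${2:-10}" 'BEGIN { exit !((r + 0) >= (t + 0)) }'; then
        echo good   # throughput at or above the threshold
    else
        echo bad    # throughput dropped
    fi
}

# Usage in the kernel tree (tag names are assumptions):
#   git bisect start Ubuntu-3.19.0-28.30 Ubuntu-3.19.0-26.28   # bad first, then good
#   ...build, install and boot the candidate, rerun iperf, note the rate...
#   git bisect "$(bisect_mark 0.26)"                           # runs "git bisect bad"
```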


description: updated
Seyeong Kim (seyeongkim)
description: updated
description: updated
description: updated
description: updated
Seyeong Kim (seyeongkim)
description: updated
Seyeong Kim (seyeongkim)
Changed in linux (Ubuntu Vivid):
status: Confirmed → In Progress
assignee: nobody → Seyeong Kim (xtrusia)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Seyeong Kim (seyeongkim) wrote :

perf stat on the unpatched kernel:

[  3] local 10.13.0.4 port 36554 connected with 104.40.129.48 port 8042
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-377.9 sec   100 MBytes  0.26 MBytes/sec

 Performance counter stats for 'iperf -c 104.40.129.48 -f M -p 8042 -n 100M':

        452.813718      task-clock (msec)         #    0.001 CPUs utilized
             38287      context-switches          #    0.085 M/sec
                11      cpu-migrations            #    0.024 K/sec
               151      page-faults               #    0.333 K/sec
                 0      cycles                    #    0.000 GHz
                 0      stalled-cycles-frontend   #    0.00% frontend cycles idle
                 0      stalled-cycles-backend    #    0.00% backend cycles idle
                 0      instructions
                 0      branches                  #    0.000 K/sec
                 0      branch-misses             #    0.000 K/sec

     377.932572800 seconds time elapsed

######################################################

perf stat on the patched kernel:

[  3] local 10.23.0.4 port 45723 connected with 40.115.34.62 port 8042
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 0.9 sec   100 MBytes   106 MBytes/sec

 Performance counter stats for 'iperf -c 40.115.34.62 -f M -p 8042 -n 100M':

         30.442426      task-clock (msec)         #    0.031 CPUs utilized
               206      context-switches          #    0.007 M/sec
                 3      cpu-migrations            #    0.099 K/sec
               154      page-faults               #    0.005 M/sec
                 0      cycles                    #    0.000 GHz
                 0      stalled-cycles-frontend   #    0.00% frontend cycles idle
                 0      stalled-cycles-backend    #    0.00% backend cycles idle
                 0      instructions
                 0      branches                  #    0.000 K/sec
                 0      branch-misses             #    0.000 K/sec

       0.997563300 seconds time elapsed

######################################################
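The two runs can be compared numerically from saved perf stat reports. A minimal sketch, assuming each report was saved to a file (perf stat writes to stderr, so e.g. `perf stat -o before.txt iperf ...`) and using hypothetical function names. With the figures above, the elapsed-time ratio comes out at about 379 and the context-switch ratio (38287 vs 206) at about 186.

```shell
#!/bin/sh
# Sketch: extract counters from saved "perf stat" reports and compare them.
# Assumptions: the function names and that each report was saved as text.

# Print the first column of the first line matching LABEL,
# with any thousands separators removed.
perf_stat_field() {   # $1 = report text, $2 = label such as "context-switches"
    printf '%s\n' "$1" |
        awk -v label="$2" '$0 ~ label { gsub(",", "", $1); print $1; exit }'
}

# Print a / b with one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Usage:
#   before=$(cat before.txt); after=$(cat after.txt)
#   ratio "$(perf_stat_field "$before" 'seconds time elapsed')" \
#         "$(perf_stat_field "$after" 'seconds time elapsed')"
```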

Tim Gardner (timg-tpi) wrote :

Seyeong, can you try reverting 'hv_netvsc: Implement partial copy into send buffer'? This backport was dropped from Utopic because it caused regressions. Perhaps the same is true of Vivid.

Seyeong Kim (seyeongkim) wrote :

@timg-tpi

I reverted it on the latest ubuntu-vivid, but other related commits depend on the variables it added, so I applied the patch below instead.

It is better, but the drop is still there:

original: 100 -> 0.3 MB/s
with the patch below: 100 -> 15~20 MB/s

Thanks.

####################################################################

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bf2604b..ad73121 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -132,8 +132,6 @@ struct hv_netvsc_packet {
         struct hv_device *device;
         bool is_data_pkt;
         bool xmit_more; /* from skb */
-        bool cp_partial; /* partial copy into send buffer */
-
         u16 vlan_tci;
 
         u16 q_idx;
@@ -148,9 +146,6 @@ struct hv_netvsc_packet {
         /* This points to the memory after page_buf */
         struct rndis_message *rndis_msg;
 
-        u32 rmsg_size; /* RNDIS header and PPI size */
-        u32 rmsg_pgcnt; /* page count of RNDIS header and PPI */
-
         u32 total_data_buflen;
         /* Points to the send/receive buffer where the ethernet frame is */
         void *data;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index b15041b..20102cd 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -703,18 +703,15 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
         u32 msg_size = 0;
         u32 padding = 0;
         u32 remain = packet->total_data_buflen % net_device->pkt_align;
-        u32 page_count = packet->cp_partial ? packet->rmsg_pgcnt :
-                packet->page_buf_cnt;
 
         /* Add padding */
-        if (packet->is_data_pkt && packet->xmit_more && remain &&
-            !packet->cp_partial) {
+        if (packet->is_data_pkt && packet->xmit_more && remain) {
                 padding = net_device->pkt_align - remain;
                 packet->rndis_msg->msg_len += padding;
                 packet->total_data_buflen += padding;
         }
 
-        for (i = 0; i < page_count; i++) {
+        for (i = 0; i < packet->page_buf_cnt; i++) {
                 char *src = phys_to_virt(packet->page_buf[i].pfn << PAGE_SHIFT);
                 u32 offset = packet->page_buf[i].offset;
                 u32 len = packet->page_buf[i].len;
@@ -742,7 +739,6 @@ static inline int netvsc_send_pkt(
         struct net_device *ndev = net_device->ndev;
         u64 req_id;
         int ret;
-        struct hv_page_buffer *pgbuf;
         u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
 
         nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
@@ -781,10 +777,8 @@ static inline int netvsc_send_pkt(
                 packet->xmit_more = false;
 
         if (packet->page_buf_cnt) {
-                pgbuf = packet->cp_partial ? packet->page_buf +
-                        packet->rmsg_pgcnt : packet->page_buf;
                 ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
-                                                      pgbuf,
+                                                      packet->page_buf,
                                                       packet->page_buf_cnt,
                                                       &nvmsg,
 ...


Dexuan Cui (decui) wrote :

When the issue happens (it appears to be caused somehow by the layout of the struct), can you try the small workaround patch at
https://patchwork.ozlabs.org/patch/518469/ ?

I've pasted it below:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88a0069..7233790 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
  * used.
  */

-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
+#if IS_ENABLED(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 224
+#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
 # if defined(CONFIG_MAC80211_MESH)
 # define LL_MAX_HEADER 128
 # else

If this works, please use the formal fixes from KY, which are already in linux-next:
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/?qt=grep&q=hv_netvsc (see the patches from the past week)
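Before trying the workaround, it can help to confirm that the kernel actually builds the Hyper-V network driver, since the patch only changes LL_MAX_HEADER when IS_ENABLED(CONFIG_HYPERV_NET) is true, i.e. when the option is built in (=y) or a module (=m). A sketch, assuming Ubuntu's /boot/config-<release> location and a hypothetical function name; it does not detect whether the patch itself has been applied.

```shell
#!/bin/sh
# Sketch: succeed if the kernel config enables hv_netvsc (CONFIG_HYPERV_NET=y or =m).
# Assumptions: the function name, and that the config for the running kernel
# is installed at /boot/config-$(uname -r), as on Ubuntu.
hyperv_net_enabled() {
    local config="${1:-/boot/config-$(uname -r)}"
    grep -Eq '^CONFIG_HYPERV_NET=(y|m)$' "$config"
}

# Usage: hyperv_net_enabled && echo "CONFIG_HYPERV_NET is enabled"
```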

Dexuan Cui (decui) wrote :

BTW, I'm not sure whether comment #10 will help or not; just FYI. :-)

Seyeong Kim (seyeongkim) wrote :

@decui

Thanks, I confirmed it works.

tags: added: kernel-da-key kernel-hyper-v
Andy Whitcroft (apw)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
Seyeong Kim (seyeongkim) wrote :

No more drops with the -proposed kernel.

tags: added: verification-done-vivid
removed: verification-needed-vivid
Andy Whitcroft (apw) wrote :

Fix released in 3.19.0-47.53

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released