[Hyper-V] Add infiniband support for Azure HPC

Bug #1701744 reported by Long Li
This bug affects 2 people
Affects               Status        Importance  Assigned to    Milestone
linux-azure (Ubuntu)  Fix Released  Medium      Marcelo Cerri
Xenial                Fix Released  Medium      Marcelo Cerri

Bug Description

This is the infiniband driver for Azure HPC.

The Windows Azure Linux Agent (WALA) will provision an image for running infiniband RDMA via DAPL when "OS.EnableRDMA=y" is set in waagent.conf.

Note: The Ubuntu image needs to load rdma_ucm at boot to expose the RDMA CM interface to the user-mode library.
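For anyone verifying an image against the two requirements above, here is a minimal sketch in Python. It assumes waagent.conf uses plain key=value lines with "#" comments and that the loaded-module list has the /proc/modules layout (module name in the first column). The helper names parse_waagent_conf, rdma_enabled and module_loaded are made up for illustration and are not part of the agent.

```python
def parse_waagent_conf(text):
    """Return a dict of key -> value from waagent.conf-style text.

    Assumption: one key=value pair per line, '#' starts a comment.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def rdma_enabled(conf_text):
    """True if OS.EnableRDMA is set to 'y' in the given config text."""
    return parse_waagent_conf(conf_text).get("OS.EnableRDMA", "n").lower() == "y"


def module_loaded(proc_modules_text, name):
    """True if `name` appears as a module in /proc/modules-style text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

On a real system the inputs would typically come from reading /etc/waagent.conf and /proc/modules; the functions take text so they can be checked without touching the machine.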

Long Li (longli) wrote :
Long Li (longli) wrote :
Joshua R. Poulson (jrp) wrote :

This is for the linux-azure kernel.

summary: - Add infiniband support for Azure HPC
+ [Hyper-V] Add infiniband support for Azure HPC
Joseph Salisbury (jsalisbury) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1701744

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joshua R. Poulson (jrp) wrote :

No log files are required; this is a kernel patch for IB on Azure. Microsoft will test the proposed kernels.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: patch
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Marcelo Cerri (mhcerri)
no longer affects: linux (Ubuntu)
Changed in linux-azure (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Marcelo Cerri (mhcerri)
Joshua R. Poulson (jrp) wrote :

Remember, this driver has to be built as a loadable module, so that different host OS versions can load different drivers once they exist. Current host versions are OS142 and OS144.

Long Li (longli) wrote :

Please build the driver as a kernel module package. The package name should include something that matches the hosting cluster's ND version. Currently we have 142 and 144.

For this driver, please build two kernel module packages:
<package-name>.142.deb and <package-name>.144.deb (those two packages are binary identical because ND142 and ND144 use the same infiniband interface)

Then build a master infiniband package that contains those two packages. Running apt install <master-package-name> copies the kernel module packages to /opt/Microsoft.

e.g.

apt install <infiniband-master-package>

will create the following deb files in /opt/Microsoft

<package-name>.142.deb
<package-name>.144.deb

WALA will determine which deb package to install based on the ND version passed by the host.
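To make the selection step concrete, here is a rough sketch in Python of how a package could be picked by ND version from the files in /opt/Microsoft. This is only an illustration of the naming scheme described above, not WALA's actual logic; the function name select_module_package and the "pkg" file names in the usage note are hypothetical.

```python
def select_module_package(nd_version, filenames):
    """Pick the deb whose name ends in '.<nd_version>.deb'.

    nd_version: the ND version reported by the host, e.g. "142" or 144.
    filenames:  names of the files found in the package directory.
    Returns the matching file name, or None if nothing matches.
    """
    suffix = ".{}.deb".format(nd_version)
    # Sort so the result is deterministic if several files match.
    matches = sorted(name for name in filenames if name.endswith(suffix))
    return matches[0] if matches else None
```

For example, given the placeholder names "pkg.142.deb" and "pkg.144.deb", a host reporting ND version 144 would select "pkg.144.deb"; an unknown version returns None so the caller can decide how to fail.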

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu):
status: Triaged → In Progress
Marcelo Cerri (mhcerri) wrote :
Marcelo Cerri (mhcerri) wrote :

Inclusion of the minor host OS version in the module names:

https://lists.ubuntu.com/archives/kernel-team/2017-July/085920.html

Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Xenial):
status: New → In Progress
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
importance: Undecided → Medium
Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu):
status: Fix Committed → Fix Released
Changed in linux-azure (Ubuntu Xenial):
status: Fix Committed → Fix Released