Comment 4 for bug 256216

Roland Dreier (roland.dreier) wrote :

Fair questions, although it probably would have been better to look at this before the other group "rdma" changes went into libibverbs as part of hardy (see bug #225788). And I wish we could have had this discussion two months ago, rather than two weeks before the Intrepid release.

Anyway, I'll try to answer the questions:

 - RDMA stands for "remote direct memory access," and it is a type of high-performance networking implemented by InfiniBand and some 10 GbE adapters. Part of RDMA is "kernel bypass," which gives userspace processes direct access to hardware registers to reduce latency and CPU overhead in performing RDMA operations. http://en.wikipedia.org/wiki/RDMA has a more complete overview.

 - Users that are running high-performance jobs would need access to these device nodes; it makes sense to me that administrators would not necessarily want to allow all users to have direct access to do things that might interfere with other jobs on a high-performance network.

 - The device nodes in this particular bug are actually virtual devices that are used for connection setup; the actual direct-access nodes have permissions covered by the udev rules in the libibverbs1 package. In any case, RDMA hardware is generally a PCI Express or PCI-X card (basically a high-end NIC), although some systems have hardware directly on a system bus (AMD HyperTransport, or the IBM System p GX bus). The hardware is only pluggable via something like PCI hot-swap, which is generally a high-end server feature. It's definitely not something that a user on a multi-seat system is going to plug into a USB port.

 - Not sure what it would mean for users to use the devices directly -- obviously device access is through software (rather than poking solder balls with a wire or something like that). For the rdma_cm node that this bug is specifically about, a typical user will link their application to librdmacm and use the library to establish RDMA connections. Users will then run their application directly (or possibly through a job submission queue on large shared clusters).

 - As I said before, the rdma_cm device nodes should be usable by non-administrator users, but the administrator probably wants the ability to restrict access to only certain users.
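To make the group-based policy above concrete, here is a minimal Python sketch of the classic Unix owner/group/other check applied to a device node. It is only an illustration: the function name may_open_rw and the numeric gid 120 for the "rdma" group are hypothetical, and the sketch ignores ACLs, capabilities, and anything else the kernel may consult.

```python
import stat


def may_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a process with this uid and these group ids would get
    read+write access under the plain owner/group/other permission bits."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need


# Hypothetical gid for the "rdma" group; the real value is system-specific.
RDMA_GID = 120

# Node owned by root:rdma with mode 0660: only members of "rdma" get in.
member = may_open_rw(0o660, 0, RDMA_GID, uid=1000, gids={1000, RDMA_GID})
outsider = may_open_rw(0o660, 0, RDMA_GID, uid=1000, gids={1000})
```

With mode 0660 the administrator restricts access simply by choosing who is in the group; changing the mode to 0666 would open the node to every user.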

Let me ask one fundamental question of my own: if upstreams are shirking responsibility by suggesting that standard group permissions be used by administrators to set policy, what do you feel is a better way for upstreams to provide this mechanism?