diff -Nru hwloc-2.7.0/config/hwloc_internal.m4 hwloc-2.7.1/config/hwloc_internal.m4
--- hwloc-2.7.0/config/hwloc_internal.m4 2021-12-06 12:19:05.000000000 +0000
+++ hwloc-2.7.1/config/hwloc_internal.m4 2022-03-24 12:33:56.000000000 +0000
@@ -104,7 +104,7 @@
 # CUDA install path (and NVML and OpenCL)
 AC_ARG_WITH([cuda], AS_HELP_STRING([--with-cuda=
-Some operating systems only support binding threads or processes to a single PU. Others allow binding to larger sets such as entire Cores or Packages or even random sets of invididual PUs. In such operating system, the scheduler is free to run the task on one of these PU, then migrate it to another PU, etc. It is often useful to call hwloc_bitmap_singlify() on the target CPU set before passing it to the binding function to avoid these expensive migrations. See the documentation of hwloc_bitmap_singlify() for details.
+Some operating systems only support binding threads or processes to a single PU. Others allow binding to larger sets such as entire Cores or Packages or even random sets of individual PUs. In such operating system, the scheduler is free to run the task on one of these PU, then migrate it to another PU, etc. It is often useful to call hwloc_bitmap_singlify() on the target CPU set before passing it to the binding function to avoid these expensive migrations. See the documentation of hwloc_bitmap_singlify() for details.
Some operating systems do not provide all hwloc-supported mechanisms to bind processes, threads, etc. hwloc_topology_get_support() may be used to query about the actual CPU binding support in the currently used operating system.
When the requested binding operation is not available and the HWLOC_CPUBIND_STRICT flag was passed, the function returns -1. errno is set to ENOSYS when it is not possible to bind the requested kind of object processes/threads. errno is set to EXDEV when the requested cpuset can not be enforced (e.g. some systems only allow one CPU, and some other systems only allow one NUMA node).
If HWLOC_CPUBIND_STRICT was not passed, the function may fail as well, or the operating system may use a slightly different operation (with side-effects, smaller binding set, etc.) when the requested operation is not exactly supported.
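The errno rules above can be turned into a small diagnostic helper. The following is only a sketch: `describe_bind_error` is a hypothetical name, not part of hwloc, and the message strings are illustrative. It assumes the caller saved `errno` right after a binding function returned -1.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not an hwloc function): map the errno left by a
 * failed binding call to a short explanation, following the two cases
 * documented above. Any other value falls back to strerror(). */
const char *describe_bind_error(int err)
{
    assert(err != 0); /* only meaningful after a failure */

    switch (err) {
    case ENOSYS:
        return "binding this kind of object is not supported";
    case EXDEV:
        return "the requested cpuset cannot be enforced";
    default:
        return strerror(err);
    }
}
```

A caller would typically copy `errno` into a local variable immediately after the failing call, then pass that copy to the helper before calling anything else that may overwrite `errno`.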
The most portable version that should be preferred over the others, whenever possible, is the following one which just binds the current program, assuming it is single-threaded:
If the program may be multithreaded, the following one should be preferred to only bind the current thread:
Bind current thread of current process.
Request for strict binding from the OS.
-By default, when the designated CPUs are all busy while other CPUs are idle, operating systems may execute the thread/process on those other CPUs instead of the designated CPUs, to let them progress anyway. Strict binding means that the thread/process will _never_ execute on other cpus than the designated CPUs, even when those are busy with other tasks and other CPUs are idle.
+By default, when the designated CPUs are all busy while other CPUs are idle, operating systems may execute the thread/process on those other CPUs instead of the designated CPUs, to let them progress anyway. Strict binding means that the thread/process will _never_ execute on other CPUs than the designated CPUs, even when those are busy with other tasks and other CPUs are idle.
When retrieving the binding of a process, this flag checks whether all its threads actually have the same binding. If the flag is not given, the binding of each thread will be accumulated.
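The difference between the strict and non-strict retrieval described above can be illustrated with plain bitmasks. This is a simplified model, not hwloc code: each thread binding is an `unsigned long` where bit i stands for PU i, `combine_thread_bindings` is a hypothetical name, and returning -1 in the strict case is an assumption about how a mismatch would be reported.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only: thread bindings are modeled as plain bitmasks
 * (bit i set means PU i), not as real hwloc bitmaps.
 * Returns 0 and stores the result in *out, or -1 when strict is nonzero
 * and the threads do not all share the same binding. */
int combine_thread_bindings(const unsigned long *masks, size_t n,
                            int strict, unsigned long *out)
{
    unsigned long acc = 0;
    size_t i;

    assert(masks != NULL && out != NULL && n > 0);

    for (i = 0; i < n; i++) {
        if (strict && masks[i] != masks[0])
            return -1;      /* strict: bindings differ between threads */
        acc |= masks[i];    /* non-strict: accumulate the union */
    }
    *out = acc;
    return 0;
}
```

With three threads bound to PUs {0}, {2} and {0}, the non-strict call yields the union {0, 2}, while the strict call reports a mismatch.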
-Bind current process or thread on cpus given in physical bitmap set.
+Bind current process or thread on CPUs given in physical bitmap set.
-Bind a process pid on cpus given in physical bitmap set.
+Bind a process pid on CPUs given in physical bitmap set.
hwloc_pid_t is pid_t on Unix platforms, and HANDLE on native Windows platforms.
-Bind a thread thread on cpus given in physical bitmap set.
+Bind a thread thread on CPUs given in physical bitmap set.
hwloc_thread_t is pthread_t on Unix platforms, and HANDLE on native Windows platforms.
flags
. Flags should be given to hwloc_topology_set_flags(). They may also be returned by hwloc_topology_get_flags().
Enumerator | |
---|---|
HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED | Detect the whole system, ignore reservations, include disallowed objects. -Gather all resources, even if some were disabled by the administrator. For instance, ignore Linux Cgroup/Cpusets and gather all processors and memory nodes. +Gather all online resources, even if some were disabled by the administrator. For instance, ignore Linux Cgroup/Cpusets and gather all processors and memory nodes. However offline PUs and NUMA nodes are still ignored. When this flag is not set, PUs and NUMA nodes that are disallowed are not added to the topology. Parent objects (package, core, cache, etc.) are added only if some of their children are allowed. All existing PUs and NUMA nodes in the topology are allowed. hwloc_topology_get_allowed_cpuset() and hwloc_topology_get_allowed_nodeset() are equal to the root object cpuset and nodeset. When this flag is set, the actual sets of allowed PUs and NUMA nodes are given by hwloc_topology_get_allowed_cpuset() and hwloc_topology_get_allowed_nodeset(). They may be smaller than the root object cpuset and nodeset. If the current topology is exported to XML and reimported later, this flag should be set again in the reimported topology so that disallowed resources are reimported as well.
@@ -227,6 +227,7 @@
Get OR'ed flags of a topology. Get the OR'ed set of hwloc_topology_flags_e of a topology.
+If hwloc_topology_set_flags() was not called earlier, no flags are set (
Set OR'ed flags to non-yet-loaded topology. Set a OR'ed set of hwloc_topology_flags_e onto a topology that was not yet loaded.
-If this function is called multiple times, the last invokation will erase and replace the set of flags that was previously set.
-The flags set in a topology may be retrieved with hwloc_topology_get_flags()
+If this function is called multiple times, the last invocation will erase and replace the set of flags that was previously set.
+By default, no flags are set ( The flags set in a topology may be retrieved with hwloc_topology_get_flags().
diff -Nru hwloc-2.7.0/doc/doxygen-doc/html/a00206.html hwloc-2.7.1/doc/doxygen-doc/html/a00206.html
--- hwloc-2.7.0/doc/doxygen-doc/html/a00206.html 2021-12-06 12:22:58.000000000 +0000
+++ hwloc-2.7.1/doc/doxygen-doc/html/a00206.html 2022-03-24 12:33:56.000000000 +0000
@@ -19,7 +19,7 @@
|
Hardware Locality (hwloc)
- 2.7.0
+ 2.7.1
|
|
int | hwloc_memattr_set_value (hwloc_topology_t topology, hwloc_memattr_id_t attribute, hwloc_obj_t target_node, struct hwloc_location *initiator, unsigned long flags, hwloc_uint64_t value) |
-int | hwloc_memattr_get_targets (hwloc_topology_t topology, hwloc_memattr_id_t attribute, struct hwloc_location *initiator, unsigned long flags, unsigned *nrp, hwloc_obj_t *targets, hwloc_uint64_t *values) |
+int | hwloc_memattr_get_targets (hwloc_topology_t topology, hwloc_memattr_id_t attribute, struct hwloc_location *initiator, unsigned long flags, unsigned *nr, hwloc_obj_t *targets, hwloc_uint64_t *values) |
int | hwloc_memattr_get_initiators (hwloc_topology_t topology, hwloc_memattr_id_t attribute, hwloc_obj_t target_node, unsigned long flags, unsigned *nr, struct hwloc_location *initiators, hwloc_uint64_t *values) |
flags must be 0 for now.
-initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when refering to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
+initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when referring to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
@@ -401,7 +401,7 @@
If the attribute does not relate to a specific initiator (it does not have the flag HWLOC_MEMATTR_FLAG_NEED_INITIATOR), location initiator is ignored and may be NULL.
The initiator will be copied into the topology, the caller should free anything allocated to store the initiator, for instance the cpuset.
flags must be 0 for now.
-initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when refering to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
+initiator should be of type HWLOC_LOCATION_TYPE_CPUSET when referring to accesses performed by CPU cores. HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs.
Bind a thread tid on cpus given in cpuset set.
The behavior is exactly the same as the Linux sched_setaffinity system call, but uses a hwloc cpuset.
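For reference, the underlying Linux call can be exercised directly with a glibc `cpu_set_t` instead of a hwloc cpuset. The sketch below is Linux-specific and uses only libc; `bind_self_to_first_allowed_cpu` is a hypothetical name chosen for this illustration. It binds the calling thread to the first CPU of its current affinity mask.

```c
#define _GNU_SOURCE /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>

/* Bind the calling thread to one CPU taken from its current affinity mask.
 * Returns the chosen CPU index, or -1 on failure. Linux/glibc only. */
int bind_self_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;
    int cpu;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    /* Find the first CPU the thread is currently allowed to run on. */
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed))
            break;
    }
    if (cpu == CPU_SETSIZE)
        return -1;

    CPU_ZERO(&target);
    CPU_SET(cpu, &target);

    /* A pid of 0 designates the calling thread. */
    if (sched_setaffinity(0, sizeof(target), &target) != 0)
        return -1;
    return cpu;
}
```

Picking a CPU from the current mask rather than hard-coding CPU 0 keeps the sketch usable inside cpusets or containers where CPU 0 may not be allowed.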
-CPUs covered by this object.
This is the set of CPUs for which there are PU objects in the topology under this object, i.e. which are known to be physically contained in this object and known how (the children path between this object and the PU objects).
-If the HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set, some of these CPUs may not be allowed for binding, see hwloc_topology_get_allowed_cpuset().
+If the HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set, some of these CPUs may be online but not allowed for binding, see hwloc_topology_get_allowed_cpuset().
NUMA nodes covered by this object or containing this object.
This is the set of NUMA nodes for which there are NUMA node objects in the topology under or above this object, i.e. which are known to be physically contained in this object or containing it and known how (the children path between this object and the NUMA node objects).
In the end, these nodes are those that are close to the current object. Function hwloc_get_local_numanode_objs() may be used to list those NUMA nodes more precisely.
-If the HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set, some of these nodes may not be allowed for allocation, see hwloc_topology_get_allowed_nodeset().
+If the HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set, some of these nodes may be online but not allowed for allocation, see hwloc_topology_get_allowed_nodeset().
If there are no NUMA nodes in the machine, all the memory is close to this object, so only the first bit may be set in nodeset.
-Bridge specific Object Attribues.
+Bridge specific Object Attributes.
These info attributes are attached to OS device objects specified in parentheses.
N*0xX where N is the number of queues in the group, and 0xX is the hexadecimal bitmask of ze_command_queue_group_property_flag_t listing properties of those queues).
Functions consulting memory attributes in hwloc/memattrs.h are thread-safe except if the topology was recently modified (because memory attributes may involve objects that were removed).
-Whenever the topology is modified (see above), hwloc_topology_refresh() should be called in the same thread-safe context to force the refresh of internal memory attribute structures. A call to hwloc_memattr_get_value() or hwloc_memattr_get_targets() may also refresh internal structures for a given memory attribute.
+Whenever the topology is modified (see above), hwloc_topology_refresh() should be called in the same thread-safe context to force the refresh of internal memory attribute structures. A call to hwloc_memattr_get_value() or hwloc_memattr_get_targets() may also refresh internal structures for a given memory attribute.
Once this refresh has been performed, multiple functions consulting memory attributes may then be performed concurrently by multiple threads.
If pkg-config does not work, passing --with-cuda=/path/to/cuda to the configure script is another way to define the corresponding library and header paths. Finally, these paths may also be set through environment variables such as LIBRARY_PATH and C_INCLUDE_PATH.
These paths, either detected by pkg-config or given manually, will also be used to detect NVML and OpenCL libraries and enable their hwloc backends.
To find out whether CUDA was detected and enabled, look in Probe / display I/O devices at the end of the configure script output. Passing --enable-cuda will also cause configure to fail if CUDA could not be found and enabled in hwloc.
Note that --with-cuda=/nonexisting may be used to disable all dependencies that are installed by CUDA, i.e. the CUDA, NVML and NVIDIA OpenCL backends, since the given directory does not exist.
Intel Xeon Phi processors introduced a new memory architecture by possibly having two distinct local memories: some normal memory (DDR) and some high-bandwidth on-package memory (MCDRAM). Processors can be configured in various clustering modes to have up to 4 Clusters. Moreover, each Cluster (quarter, half or whole processor) of the processor may have its own local parts of the DDR and of the MCDRAM. This memory and clustering configuration may be probed by looking at MemoryMode and ClusterMode attributes, see Hardware Platform Information and doc/examples/get-knl-modes.c in the source directory.
@@ -330,7 +331,7 @@
The NetBSD (and FreeBSD) backend uses x86-specific topology discovery (through the x86 component). This implementation requires CPU binding so as to query topology information from each individual processor. This means that hwloc cannot find any useful topology information unless user-level process binding is allowed by the NetBSD kernel. The security.models.extensions.user_set_cpu_affinity sysctl variable must be set to 1 to do so. Otherwise, only the number of processors will be detected.
-The AIX operating system requires specific user capabilities for attaching processes to resource sets (CAP_NUMA_ATTACH). Otherwise functions such as hwloc_set_cpubind() fail (return -1 with errno set to EPERM).
+The AIX operating system requires specific user capabilities for attaching processes to resource sets (CAP_NUMA_ATTACH). Otherwise functions such as hwloc_set_cpubind() fail (return -1 with errno set to EPERM).
This capability must also be inherited (through the additional CAP_PROPAGATE capability) if you plan to bind a process before forking another process, for instance with hwloc-bind.
These capabilities may be given by the administrator with:
chuser "capabilities=CAP_PROPAGATE,CAP_NUMA_ATTACH" <username>
diff -Nru hwloc-2.7.0/doc/doxygen-doc/html/a00410.html hwloc-2.7.1/doc/doxygen-doc/html/a00410.html
--- hwloc-2.7.0/doc/doxygen-doc/html/a00410.html 2021-12-06 12:22:58.000000000 +0000
+++ hwloc-2.7.1/doc/doxygen-doc/html/a00410.html 2022-03-24 12:33:56.000000000 +0000
@@ -19,7 +19,7 @@
Bugs should be reported in the tracker (https://github.com/open-mpi/hwloc/issues). Opening a new issue automatically displays lots of hints about how to debug and report issues.
Questions may be sent to the users or developers mailing lists (https://www.open-mpi.org/community/lists/hwloc.php).
-There is also a #hwloc IRC channel on Freenode (irc.freenode.net).
+There is also a #hwloc IRC channel on Libera Chat (irc.libera.chat).