https://launchpad.net/ubuntu/+source/nvidia-nccl/2.18.5-1-2/+build/26988716

RUN: /usr/share/launchpad-buildd/bin/builder-prep
Kernel version: Linux bos02-ppc64el-018 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:18 UTC 2023 ppc64le
Buildd toolchain package versions: launchpad-buildd_235~645~ubuntu20.04.1 python3-lpbuildd_235~645~ubuntu20.04.1 sbuild_0.79.0-1ubuntu1 git-build-recipe_0.3.6 git_1:2.25.1-1ubuntu3.11 dpkg-dev_1.19.7ubuntu3.2 python3-debian_0.1.36ubuntu1.1.
Syncing the system clock with the buildd NTP service...
12 Nov 05:08:48 ntpdate[2081]: adjust time server 10.211.37.1 offset -0.003830 sec
RUN: /usr/share/launchpad-buildd/bin/in-target unpack-chroot --backend=chroot --series=noble --arch=ppc64el PACKAGEBUILD-26988716 --image-type chroot /home/buildd/filecache-default/d42c54d6e204d222772cc047f01d282d9c30a0e5
Creating target for build PACKAGEBUILD-26988716
RUN: /usr/share/launchpad-buildd/bin/in-target mount-chroot --backend=chroot --series=noble --arch=ppc64el PACKAGEBUILD-26988716
Starting target for build PACKAGEBUILD-26988716
RUN: /usr/share/launchpad-buildd/bin/in-target override-sources-list --backend=chroot --series=noble --arch=ppc64el PACKAGEBUILD-26988716 'deb http://ftpmaster.internal/ubuntu noble main restricted universe multiverse' 'deb http://ftpmaster.internal/ubuntu noble-security main restricted universe multiverse' 'deb http://ftpmaster.internal/ubuntu noble-updates main restricted universe multiverse' 'deb http://ftpmaster.internal/ubuntu noble-proposed main restricted universe multiverse'
Overriding sources.list in build-PACKAGEBUILD-26988716
RUN: /usr/share/launchpad-buildd/bin/in-target update-debian-chroot --backend=chroot --series=noble --arch=ppc64el PACKAGEBUILD-26988716
Updating target for build PACKAGEBUILD-26988716
Get:1 http://ftpmaster.internal/ubuntu noble InRelease [240 kB]
Get:2 http://ftpmaster.internal/ubuntu noble-security InRelease [74.9 kB]
Get:3 http://ftpmaster.internal/ubuntu noble-updates InRelease [74.9 kB]
Get:4 http://ftpmaster.internal/ubuntu noble-proposed InRelease [102 kB]
Get:5 http://ftpmaster.internal/ubuntu noble/main ppc64el Packages [1351 kB]
Get:6 http://ftpmaster.internal/ubuntu noble/main Translation-en [517 kB]
Get:7 http://ftpmaster.internal/ubuntu noble/universe ppc64el Packages [14.6 MB]
Get:8 http://ftpmaster.internal/ubuntu noble/universe Translation-en [6007 kB]
Get:9 http://ftpmaster.internal/ubuntu noble/multiverse ppc64el Packages [183 kB]
Get:10 http://ftpmaster.internal/ubuntu noble/multiverse Translation-en [115 kB]
Get:11 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el Packages [167 kB]
Get:12 http://ftpmaster.internal/ubuntu noble-proposed/main Translation-en [59.2 kB]
Get:13 http://ftpmaster.internal/ubuntu noble-proposed/restricted ppc64el Packages [3128 B]
Get:14 http://ftpmaster.internal/ubuntu noble-proposed/restricted Translation-en [6796 B]
Get:15 http://ftpmaster.internal/ubuntu noble-proposed/universe ppc64el Packages [1755 kB]
Get:16 http://ftpmaster.internal/ubuntu noble-proposed/universe Translation-en [600 kB]
Get:17 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el Packages [17.8 kB]
Get:18 http://ftpmaster.internal/ubuntu noble-proposed/multiverse Translation-en [9824 B]
Fetched 25.9 MB in 5s (5374 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
The following package was automatically installed and is no longer required:
  libunistring2
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  libunistring5
The following packages will be upgraded:
  apt apt-utils base-files base-passwd bash-completion binutils
  binutils-common binutils-powerpc64le-linux-gnu cpp-13 debianutils diffutils
  dpkg dpkg-dev fakeroot g++-13 gcc-13 gcc-13-base grep libapparmor1
  libapt-pkg6.0 libargon2-1 libasan8 libatomic1 libaudit-common libaudit1
  libbinutils libc-bin libc-dev-bin libc6 libc6-dev libcap-ng0 libcc1-0
  libctf-nobfd0 libctf0 libdb5.3 libdpkg-perl libfakeroot libgcc-13-dev
  libgcc-s1 libgnutls30 libgomp1 libidn2-0 libitm1 liblsan0 liblzma5
  libncursesw6 libnsl-dev libnsl2 libpng16-16 libquadmath0 libseccomp2
  libselinux1 libsemanage-common libsemanage2 libsframe1 libsqlite3-0 libssl3
  libstdc++-13-dev libstdc++6 libsystemd-shared libsystemd0 libtinfo6 libtsan2
  libubsan1 libudev1 libxxhash0 libzstd1 mawk ncurses-base ncurses-bin openssl
  optipng systemd systemd-dev systemd-sysv xz-utils
76 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 85.1 MB of archives.
After this operation, 2034 kB of additional disk space will be used.
Get:1 http://ftpmaster.internal/ubuntu noble/main ppc64el libnsl-dev ppc64el 1.3.0-3 [79.2 kB] Get:2 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libc6-dev ppc64el 2.38-3ubuntu1 [2083 kB] Get:3 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libc-dev-bin ppc64el 2.38-3ubuntu1 [21.0 kB] Get:4 http://ftpmaster.internal/ubuntu noble/main ppc64el libnsl2 ppc64el 1.3.0-3 [47.3 kB] Get:5 http://ftpmaster.internal/ubuntu noble/main ppc64el libcc1-0 ppc64el 13.2.0-6ubuntu1 [48.7 kB] Get:6 http://ftpmaster.internal/ubuntu noble/main ppc64el gcc-13-base ppc64el 13.2.0-6ubuntu1 [44.3 kB] Get:7 http://ftpmaster.internal/ubuntu noble/main ppc64el libgcc-s1 ppc64el 13.2.0-6ubuntu1 [38.5 kB] Get:8 http://ftpmaster.internal/ubuntu noble/main ppc64el libgomp1 ppc64el 13.2.0-6ubuntu1 [155 kB] Get:9 http://ftpmaster.internal/ubuntu noble/main ppc64el libitm1 ppc64el 13.2.0-6ubuntu1 [32.6 kB] Get:10 http://ftpmaster.internal/ubuntu noble/main ppc64el libatomic1 ppc64el 13.2.0-6ubuntu1 [10.6 kB] Get:11 http://ftpmaster.internal/ubuntu noble/main ppc64el libasan8 ppc64el 13.2.0-6ubuntu1 [2837 kB] Get:12 http://ftpmaster.internal/ubuntu noble/main ppc64el liblsan0 ppc64el 13.2.0-6ubuntu1 [1233 kB] Get:13 http://ftpmaster.internal/ubuntu noble/main ppc64el libtsan2 ppc64el 13.2.0-6ubuntu1 [2657 kB] Get:14 http://ftpmaster.internal/ubuntu noble/main ppc64el libubsan1 ppc64el 13.2.0-6ubuntu1 [1134 kB] Get:15 http://ftpmaster.internal/ubuntu noble/main ppc64el libquadmath0 ppc64el 13.2.0-6ubuntu1 [156 kB] Get:16 http://ftpmaster.internal/ubuntu noble/main ppc64el g++-13 ppc64el 13.2.0-6ubuntu1 [11.2 MB] Get:17 http://ftpmaster.internal/ubuntu noble/main ppc64el libstdc++-13-dev ppc64el 13.2.0-6ubuntu1 [2475 kB] Get:18 http://ftpmaster.internal/ubuntu noble/main ppc64el libgcc-13-dev ppc64el 13.2.0-6ubuntu1 [1578 kB] Get:19 http://ftpmaster.internal/ubuntu noble/main ppc64el gcc-13 ppc64el 13.2.0-6ubuntu1 [19.5 MB] Get:20 http://ftpmaster.internal/ubuntu 
noble/main ppc64el cpp-13 ppc64el 13.2.0-6ubuntu1 [9736 kB] Get:21 http://ftpmaster.internal/ubuntu noble/main ppc64el libstdc++6 ppc64el 13.2.0-6ubuntu1 [872 kB] Get:22 http://ftpmaster.internal/ubuntu noble/main ppc64el libzstd1 ppc64el 1.5.5+dfsg2-2 [390 kB] Get:23 http://ftpmaster.internal/ubuntu noble/main ppc64el libctf0 ppc64el 2.41-6ubuntu1 [111 kB] Get:24 http://ftpmaster.internal/ubuntu noble/main ppc64el libctf-nobfd0 ppc64el 2.41-6ubuntu1 [111 kB] Get:25 http://ftpmaster.internal/ubuntu noble/main ppc64el libsframe1 ppc64el 2.41-6ubuntu1 [15.9 kB] Get:26 http://ftpmaster.internal/ubuntu noble/main ppc64el libbinutils ppc64el 2.41-6ubuntu1 [694 kB] Get:27 http://ftpmaster.internal/ubuntu noble/main ppc64el binutils-common ppc64el 2.41-6ubuntu1 [228 kB] Get:28 http://ftpmaster.internal/ubuntu noble/main ppc64el binutils ppc64el 2.41-6ubuntu1 [3078 B] Get:29 http://ftpmaster.internal/ubuntu noble/main ppc64el binutils-powerpc64le-linux-gnu ppc64el 2.41-6ubuntu1 [2495 kB] Get:30 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libc6 ppc64el 2.38-3ubuntu1 [3245 kB] Get:31 http://ftpmaster.internal/ubuntu noble/main ppc64el base-files ppc64el 13ubuntu4 [74.0 kB] Get:32 http://ftpmaster.internal/ubuntu noble/main ppc64el debianutils ppc64el 5.14 [89.5 kB] Get:33 http://ftpmaster.internal/ubuntu noble/main ppc64el diffutils ppc64el 1:3.10-1 [200 kB] Get:34 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el liblzma5 ppc64el 5.4.4-0.1 [156 kB] Get:35 http://ftpmaster.internal/ubuntu noble/main ppc64el libapparmor1 ppc64el 4.0.0~alpha2-0ubuntu6 [52.8 kB] Get:36 http://ftpmaster.internal/ubuntu noble/main ppc64el libaudit-common all 1:3.1.1-1build1 [5510 B] Get:37 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libcap-ng0 ppc64el 0.8.3-1build3 [16.2 kB] Get:38 http://ftpmaster.internal/ubuntu noble/main ppc64el libaudit1 ppc64el 1:3.1.1-1build1 [51.4 kB] Get:39 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el 
libseccomp2 ppc64el 2.5.4-2ubuntu1 [61.9 kB] Get:40 http://ftpmaster.internal/ubuntu noble/main ppc64el libselinux1 ppc64el 3.5-1build1 [97.2 kB] Get:41 http://ftpmaster.internal/ubuntu noble/main ppc64el libssl3 ppc64el 3.0.10-1ubuntu2.1 [2149 kB] Get:42 http://ftpmaster.internal/ubuntu noble/main ppc64el systemd-sysv ppc64el 253.5-1ubuntu7 [11.5 kB] Get:43 http://ftpmaster.internal/ubuntu noble/main ppc64el systemd-dev all 253.5-1ubuntu7 [78.5 kB] Get:44 http://ftpmaster.internal/ubuntu noble/main ppc64el systemd ppc64el 253.5-1ubuntu7 [3271 kB] Get:45 http://ftpmaster.internal/ubuntu noble/main ppc64el libsystemd-shared ppc64el 253.5-1ubuntu7 [2086 kB] Get:46 http://ftpmaster.internal/ubuntu noble/main ppc64el libsystemd0 ppc64el 253.5-1ubuntu7 [490 kB] Get:47 http://ftpmaster.internal/ubuntu noble/main ppc64el libudev1 ppc64el 253.5-1ubuntu7 [183 kB] Get:48 http://ftpmaster.internal/ubuntu noble/main ppc64el libxxhash0 ppc64el 0.8.2-2 [30.4 kB] Get:49 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libapt-pkg6.0 ppc64el 2.7.6 [1006 kB] Get:50 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el dpkg ppc64el 1.22.1ubuntu2 [1465 kB] Get:51 http://ftpmaster.internal/ubuntu noble/main ppc64el grep ppc64el 3.11-3 [172 kB] Get:52 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el ncurses-bin ppc64el 6.4+20231016-1 [200 kB] Get:53 http://ftpmaster.internal/ubuntu noble/main ppc64el base-passwd ppc64el 3.6.2 [52.4 kB] Get:54 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libc-bin ppc64el 2.38-3ubuntu1 [745 kB] Get:55 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el ncurses-base all 6.4+20231016-1 [24.7 kB] Get:56 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libdb5.3 ppc64el 5.3.28+dfsg2-4 [853 kB] Get:57 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el apt ppc64el 2.7.6 [1390 kB] Get:58 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el apt-utils ppc64el 2.7.6 [224 kB] Get:59 
http://ftpmaster.internal/ubuntu noble/main ppc64el libunistring5 ppc64el 1.1-2 [556 kB] Get:60 http://ftpmaster.internal/ubuntu noble/main ppc64el libidn2-0 ppc64el 2.3.4-1build1 [67.8 kB] Get:61 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libgnutls30 ppc64el 3.8.1-4ubuntu3 [1044 kB] Get:62 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libsemanage-common all 3.5-1build1 [9982 B] Get:63 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libsemanage2 ppc64el 3.5-1build1 [113 kB] Get:64 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libncursesw6 ppc64el 6.4+20231016-1 [181 kB] Get:65 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libtinfo6 ppc64el 6.4+20231016-1 [132 kB] Get:66 http://ftpmaster.internal/ubuntu noble/main ppc64el mawk ppc64el 1.3.4.20230808-1 [136 kB] Get:67 http://ftpmaster.internal/ubuntu noble/main ppc64el libargon2-1 ppc64el 0~20190702+dfsg-4 [27.1 kB] Get:68 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libsqlite3-0 ppc64el 3.44.0-1 [787 kB] Get:69 http://ftpmaster.internal/ubuntu noble/main ppc64el openssl ppc64el 3.0.10-1ubuntu2.1 [1208 kB] Get:70 http://ftpmaster.internal/ubuntu noble/main ppc64el bash-completion all 1:2.11-8 [180 kB] Get:71 http://ftpmaster.internal/ubuntu noble/main ppc64el libpng16-16 ppc64el 1.6.40-2 [239 kB] Get:72 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el xz-utils ppc64el 5.4.4-0.1 [275 kB] Get:73 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el dpkg-dev all 1.22.1ubuntu2 [1148 kB] Get:74 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libdpkg-perl all 1.22.1ubuntu2 [285 kB] Get:75 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libfakeroot ppc64el 1.32.2-1 [34.0 kB] Get:76 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el fakeroot ppc64el 1.32.2-1 [69.0 kB] Get:77 http://ftpmaster.internal/ubuntu noble/main ppc64el optipng ppc64el 0.7.7-3 [90.2 kB] Preconfiguring 
packages ... Fetched 85.1 MB in 6s (13.3 MB/s) (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../0-libnsl-dev_1.3.0-3_ppc64el.deb ... Unpacking libnsl-dev:ppc64el (1.3.0-3) over (1.3.0-2build2) ... Preparing to unpack .../1-libc6-dev_2.38-3ubuntu1_ppc64el.deb ... Unpacking libc6-dev:ppc64el (2.38-3ubuntu1) over (2.38-1ubuntu6) ... Preparing to unpack .../2-libc-dev-bin_2.38-3ubuntu1_ppc64el.deb ... Unpacking libc-dev-bin (2.38-3ubuntu1) over (2.38-1ubuntu6) ... Preparing to unpack .../3-libnsl2_1.3.0-3_ppc64el.deb ... Unpacking libnsl2:ppc64el (1.3.0-3) over (1.3.0-2build2) ... Preparing to unpack .../4-libcc1-0_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libcc1-0:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../5-gcc-13-base_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking gcc-13-base:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Setting up gcc-13-base:ppc64el (13.2.0-6ubuntu1) ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../libgcc-s1_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libgcc-s1:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Setting up libgcc-s1:ppc64el (13.2.0-6ubuntu1) ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../00-libgomp1_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libgomp1:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../01-libitm1_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libitm1:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../02-libatomic1_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libatomic1:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../03-libasan8_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libasan8:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../04-liblsan0_13.2.0-6ubuntu1_ppc64el.deb ... 
Unpacking liblsan0:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../05-libtsan2_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libtsan2:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../06-libubsan1_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libubsan1:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../07-libquadmath0_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libquadmath0:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../08-g++-13_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking g++-13 (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../09-libstdc++-13-dev_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libstdc++-13-dev:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../10-libgcc-13-dev_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libgcc-13-dev:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../11-gcc-13_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking gcc-13 (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../12-cpp-13_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking cpp-13 (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Preparing to unpack .../13-libstdc++6_13.2.0-6ubuntu1_ppc64el.deb ... Unpacking libstdc++6:ppc64el (13.2.0-6ubuntu1) over (13.2.0-4ubuntu3) ... Setting up libstdc++6:ppc64el (13.2.0-6ubuntu1) ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../libzstd1_1.5.5+dfsg2-2_ppc64el.deb ... Unpacking libzstd1:ppc64el (1.5.5+dfsg2-2) over (1.5.5+dfsg2-1ubuntu2) ... Setting up libzstd1:ppc64el (1.5.5+dfsg2-2) ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../0-libctf0_2.41-6ubuntu1_ppc64el.deb ... Unpacking libctf0:ppc64el (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../1-libctf-nobfd0_2.41-6ubuntu1_ppc64el.deb ... Unpacking libctf-nobfd0:ppc64el (2.41-6ubuntu1) over (2.41-5ubuntu1) ... 
Preparing to unpack .../2-libsframe1_2.41-6ubuntu1_ppc64el.deb ... Unpacking libsframe1:ppc64el (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../3-libbinutils_2.41-6ubuntu1_ppc64el.deb ... Unpacking libbinutils:ppc64el (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../4-binutils-common_2.41-6ubuntu1_ppc64el.deb ... Unpacking binutils-common:ppc64el (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../5-binutils_2.41-6ubuntu1_ppc64el.deb ... Unpacking binutils (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../6-binutils-powerpc64le-linux-gnu_2.41-6ubuntu1_ppc64el.deb ... Unpacking binutils-powerpc64le-linux-gnu (2.41-6ubuntu1) over (2.41-5ubuntu1) ... Preparing to unpack .../7-libc6_2.38-3ubuntu1_ppc64el.deb ... Unpacking libc6:ppc64el (2.38-3ubuntu1) over (2.38-1ubuntu6) ... Setting up libc6:ppc64el (2.38-3ubuntu1) ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../base-files_13ubuntu4_ppc64el.deb ... Unpacking base-files (13ubuntu4) over (13ubuntu3) ... Setting up base-files (13ubuntu4) ... Installing new version of config file /etc/issue ... Installing new version of config file /etc/issue.net ... Installing new version of config file /etc/lsb-release ... (Reading database ... 13550 files and directories currently installed.) Preparing to unpack .../debianutils_5.14_ppc64el.deb ... Unpacking debianutils (5.14) over (5.8-1) ... Setting up debianutils (5.14) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../diffutils_1%3a3.10-1_ppc64el.deb ... Unpacking diffutils (1:3.10-1) over (1:3.8-4) ... Setting up diffutils (1:3.10-1) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../liblzma5_5.4.4-0.1_ppc64el.deb ... Unpacking liblzma5:ppc64el (5.4.4-0.1) over (5.4.1-0.2) ... Setting up liblzma5:ppc64el (5.4.4-0.1) ... (Reading database ... 
13549 files and directories currently installed.) Preparing to unpack .../libapparmor1_4.0.0~alpha2-0ubuntu6_ppc64el.deb ... Unpacking libapparmor1:ppc64el (4.0.0~alpha2-0ubuntu6) over (4.0.0~alpha2-0ubuntu5) ... Preparing to unpack .../libaudit-common_1%3a3.1.1-1build1_all.deb ... Unpacking libaudit-common (1:3.1.1-1build1) over (1:3.1.1-1) ... Setting up libaudit-common (1:3.1.1-1build1) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libcap-ng0_0.8.3-1build3_ppc64el.deb ... Unpacking libcap-ng0:ppc64el (0.8.3-1build3) over (0.8.3-1build2) ... Setting up libcap-ng0:ppc64el (0.8.3-1build3) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libaudit1_1%3a3.1.1-1build1_ppc64el.deb ... Unpacking libaudit1:ppc64el (1:3.1.1-1build1) over (1:3.1.1-1) ... Setting up libaudit1:ppc64el (1:3.1.1-1build1) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libseccomp2_2.5.4-2ubuntu1_ppc64el.deb ... Unpacking libseccomp2:ppc64el (2.5.4-2ubuntu1) over (2.5.4-1ubuntu3) ... Setting up libseccomp2:ppc64el (2.5.4-2ubuntu1) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libselinux1_3.5-1build1_ppc64el.deb ... Unpacking libselinux1:ppc64el (3.5-1build1) over (3.5-1) ... Setting up libselinux1:ppc64el (3.5-1build1) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libssl3_3.0.10-1ubuntu2.1_ppc64el.deb ... Unpacking libssl3:ppc64el (3.0.10-1ubuntu2.1) over (3.0.10-1ubuntu2) ... Preparing to unpack .../systemd-sysv_253.5-1ubuntu7_ppc64el.deb ... Unpacking systemd-sysv (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Preparing to unpack .../systemd-dev_253.5-1ubuntu7_all.deb ... Unpacking systemd-dev (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Setting up libssl3:ppc64el (3.0.10-1ubuntu2.1) ... (Reading database ... 
13549 files and directories currently installed.) Preparing to unpack .../systemd_253.5-1ubuntu7_ppc64el.deb ... Unpacking systemd (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Preparing to unpack .../libsystemd-shared_253.5-1ubuntu7_ppc64el.deb ... Unpacking libsystemd-shared:ppc64el (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Preparing to unpack .../libsystemd0_253.5-1ubuntu7_ppc64el.deb ... Unpacking libsystemd0:ppc64el (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Setting up libsystemd0:ppc64el (253.5-1ubuntu7) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libudev1_253.5-1ubuntu7_ppc64el.deb ... Unpacking libudev1:ppc64el (253.5-1ubuntu7) over (253.5-1ubuntu6) ... Setting up libudev1:ppc64el (253.5-1ubuntu7) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libxxhash0_0.8.2-2_ppc64el.deb ... Unpacking libxxhash0:ppc64el (0.8.2-2) over (0.8.1-1) ... Setting up libxxhash0:ppc64el (0.8.2-2) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../libapt-pkg6.0_2.7.6_ppc64el.deb ... Unpacking libapt-pkg6.0:ppc64el (2.7.6) over (2.7.3) ... Setting up libapt-pkg6.0:ppc64el (2.7.6) ... (Reading database ... 13549 files and directories currently installed.) Preparing to unpack .../dpkg_1.22.1ubuntu2_ppc64el.deb ... Unpacking dpkg (1.22.1ubuntu2) over (1.22.0ubuntu1) ... Setting up dpkg (1.22.1ubuntu2) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../grep_3.11-3_ppc64el.deb ... Unpacking grep (3.11-3) over (3.11-2) ... Setting up grep (3.11-3) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../ncurses-bin_6.4+20231016-1_ppc64el.deb ... Unpacking ncurses-bin (6.4+20231016-1) over (6.4+20230625-2) ... Setting up ncurses-bin (6.4+20231016-1) ... (Reading database ... 13547 files and directories currently installed.) 
Preparing to unpack .../base-passwd_3.6.2_ppc64el.deb ... Unpacking base-passwd (3.6.2) over (3.6.1) ... Setting up base-passwd (3.6.2) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../libc-bin_2.38-3ubuntu1_ppc64el.deb ... Unpacking libc-bin (2.38-3ubuntu1) over (2.38-1ubuntu6) ... Setting up libc-bin (2.38-3ubuntu1) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../ncurses-base_6.4+20231016-1_all.deb ... Unpacking ncurses-base (6.4+20231016-1) over (6.4+20230625-2) ... Setting up ncurses-base (6.4+20231016-1) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../libdb5.3_5.3.28+dfsg2-4_ppc64el.deb ... Unpacking libdb5.3:ppc64el (5.3.28+dfsg2-4) over (5.3.28+dfsg2-2) ... Setting up libdb5.3:ppc64el (5.3.28+dfsg2-4) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../archives/apt_2.7.6_ppc64el.deb ... Unpacking apt (2.7.6) over (2.7.3) ... Setting up apt (2.7.6) ... (Reading database ... 13547 files and directories currently installed.) Preparing to unpack .../apt-utils_2.7.6_ppc64el.deb ... Unpacking apt-utils (2.7.6) over (2.7.3) ... Selecting previously unselected package libunistring5:ppc64el. Preparing to unpack .../libunistring5_1.1-2_ppc64el.deb ... Unpacking libunistring5:ppc64el (1.1-2) ... Setting up libunistring5:ppc64el (1.1-2) ... (Reading database ... 13552 files and directories currently installed.) Preparing to unpack .../libidn2-0_2.3.4-1build1_ppc64el.deb ... Unpacking libidn2-0:ppc64el (2.3.4-1build1) over (2.3.4-1) ... Setting up libidn2-0:ppc64el (2.3.4-1build1) ... (Reading database ... 13552 files and directories currently installed.) Preparing to unpack .../libgnutls30_3.8.1-4ubuntu3_ppc64el.deb ... Unpacking libgnutls30:ppc64el (3.8.1-4ubuntu3) over (3.8.1-4ubuntu1) ... Setting up libgnutls30:ppc64el (3.8.1-4ubuntu3) ... (Reading database ... 
13553 files and directories currently installed.) Preparing to unpack .../libsemanage-common_3.5-1build1_all.deb ... Unpacking libsemanage-common (3.5-1build1) over (3.5-1) ... Setting up libsemanage-common (3.5-1build1) ... (Reading database ... 13553 files and directories currently installed.) Preparing to unpack .../libsemanage2_3.5-1build1_ppc64el.deb ... Unpacking libsemanage2:ppc64el (3.5-1build1) over (3.5-1) ... Setting up libsemanage2:ppc64el (3.5-1build1) ... (Reading database ... 13553 files and directories currently installed.) Preparing to unpack .../libncursesw6_6.4+20231016-1_ppc64el.deb ... Unpacking libncursesw6:ppc64el (6.4+20231016-1) over (6.4+20230625-2) ... Preparing to unpack .../libtinfo6_6.4+20231016-1_ppc64el.deb ... Unpacking libtinfo6:ppc64el (6.4+20231016-1) over (6.4+20230625-2) ... Setting up libtinfo6:ppc64el (6.4+20231016-1) ... (Reading database ... 13553 files and directories currently installed.) Preparing to unpack .../00-mawk_1.3.4.20230808-1_ppc64el.deb ... Unpacking mawk (1.3.4.20230808-1) over (1.3.4.20230730-1) ... Preparing to unpack .../01-libargon2-1_0~20190702+dfsg-4_ppc64el.deb ... Unpacking libargon2-1:ppc64el (0~20190702+dfsg-4) over (0~20190702+dfsg-3) ... Preparing to unpack .../02-libsqlite3-0_3.44.0-1_ppc64el.deb ... Unpacking libsqlite3-0:ppc64el (3.44.0-1) over (3.42.0-1) ... Preparing to unpack .../03-openssl_3.0.10-1ubuntu2.1_ppc64el.deb ... Unpacking openssl (3.0.10-1ubuntu2.1) over (3.0.10-1ubuntu2) ... Preparing to unpack .../04-bash-completion_1%3a2.11-8_all.deb ... Unpacking bash-completion (1:2.11-8) over (1:2.11-7) ... Preparing to unpack .../05-libpng16-16_1.6.40-2_ppc64el.deb ... Unpacking libpng16-16:ppc64el (1.6.40-2) over (1.6.40-1) ... Preparing to unpack .../06-xz-utils_5.4.4-0.1_ppc64el.deb ... Unpacking xz-utils (5.4.4-0.1) over (5.4.1-0.2) ... Preparing to unpack .../07-dpkg-dev_1.22.1ubuntu2_all.deb ... Unpacking dpkg-dev (1.22.1ubuntu2) over (1.22.0ubuntu1) ... 
Preparing to unpack .../08-libdpkg-perl_1.22.1ubuntu2_all.deb ... Unpacking libdpkg-perl (1.22.1ubuntu2) over (1.22.0ubuntu1) ... Preparing to unpack .../09-libfakeroot_1.32.2-1_ppc64el.deb ... Unpacking libfakeroot:ppc64el (1.32.2-1) over (1.32.1-1) ... Preparing to unpack .../10-fakeroot_1.32.2-1_ppc64el.deb ... Unpacking fakeroot (1.32.2-1) over (1.32.1-1) ... Preparing to unpack .../11-optipng_0.7.7-3_ppc64el.deb ... Unpacking optipng (0.7.7-3) over (0.7.7-2build1) ... Setting up libapparmor1:ppc64el (4.0.0~alpha2-0ubuntu6) ... Setting up apt-utils (2.7.6) ... Setting up cpp-13 (13.2.0-6ubuntu1) ... Setting up libargon2-1:ppc64el (0~20190702+dfsg-4) ... Setting up libsqlite3-0:ppc64el (3.44.0-1) ... Setting up binutils-common:ppc64el (2.41-6ubuntu1) ... Setting up libctf-nobfd0:ppc64el (2.41-6ubuntu1) ... Setting up systemd-dev (253.5-1ubuntu7) ... Setting up libgomp1:ppc64el (13.2.0-6ubuntu1) ... Setting up libsframe1:ppc64el (2.41-6ubuntu1) ... Setting up libfakeroot:ppc64el (1.32.2-1) ... Setting up fakeroot (1.32.2-1) ... Setting up bash-completion (1:2.11-8) ... Setting up xz-utils (5.4.4-0.1) ... Setting up libquadmath0:ppc64el (13.2.0-6ubuntu1) ... Setting up libpng16-16:ppc64el (1.6.40-2) ... Setting up libatomic1:ppc64el (13.2.0-6ubuntu1) ... Setting up libsystemd-shared:ppc64el (253.5-1ubuntu7) ... Setting up libncursesw6:ppc64el (6.4+20231016-1) ... Setting up libdpkg-perl (1.22.1ubuntu2) ... Setting up libubsan1:ppc64el (13.2.0-6ubuntu1) ... Setting up libasan8:ppc64el (13.2.0-6ubuntu1) ... Setting up libnsl2:ppc64el (1.3.0-3) ... Setting up mawk (1.3.4.20230808-1) ... Setting up libtsan2:ppc64el (13.2.0-6ubuntu1) ... Setting up libbinutils:ppc64el (2.41-6ubuntu1) ... Setting up libc-dev-bin (2.38-3ubuntu1) ... Setting up openssl (3.0.10-1ubuntu2.1) ... Setting up libcc1-0:ppc64el (13.2.0-6ubuntu1) ... Setting up liblsan0:ppc64el (13.2.0-6ubuntu1) ... Setting up libitm1:ppc64el (13.2.0-6ubuntu1) ... Setting up libctf0:ppc64el (2.41-6ubuntu1) ... 
Setting up systemd (253.5-1ubuntu7) ...
Initializing machine ID from random generator.
Setting up optipng (0.7.7-3) ...
Setting up libgcc-13-dev:ppc64el (13.2.0-6ubuntu1) ...
Setting up libnsl-dev:ppc64el (1.3.0-3) ...
Setting up libc6-dev:ppc64el (2.38-3ubuntu1) ...
Setting up binutils-powerpc64le-linux-gnu (2.41-6ubuntu1) ...
Setting up libstdc++-13-dev:ppc64el (13.2.0-6ubuntu1) ...
Setting up systemd-sysv (253.5-1ubuntu7) ...
Setting up binutils (2.41-6ubuntu1) ...
Setting up dpkg-dev (1.22.1ubuntu2) ...
Setting up gcc-13 (13.2.0-6ubuntu1) ...
Setting up g++-13 (13.2.0-6ubuntu1) ...
Processing triggers for libc-bin (2.38-3ubuntu1) ...
RUN: /usr/share/launchpad-buildd/bin/sbuild-package PACKAGEBUILD-26988716 ppc64el noble-proposed -c chroot:build-PACKAGEBUILD-26988716 --arch=ppc64el --dist=noble-proposed --nolog nvidia-nccl_2.18.5-1-2.dsc
Initiating build PACKAGEBUILD-26988716 with 4 jobs across 4 processor cores.
Kernel reported to sbuild: 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:18 UTC 2023 ppc64le
sbuild (Debian sbuild) 0.79.0 (05 February 2020) on bos02-ppc64el-018.buildd

+==============================================================================+
| nvidia-nccl 2.18.5-1-2 (ppc64el)             Sun, 12 Nov 2023 05:09:16 +0000 |
+==============================================================================+

Package: nvidia-nccl
Version: 2.18.5-1-2
Source Version: 2.18.5-1-2
Distribution: noble-proposed
Machine Architecture: ppc64el
Host Architecture: ppc64el
Build Architecture: ppc64el
Build Type: any

I: NOTICE: Log filtering will replace 'home/buildd/build-PACKAGEBUILD-26988716/chroot-autobuild' with '<<CHROOT>>'
I: NOTICE: Log filtering will replace 'build/nvidia-nccl-Y3LpE1/resolver-aQB3yq' with '<<RESOLVERDIR>>'

+------------------------------------------------------------------------------+
| Fetch source files                                                           |
+------------------------------------------------------------------------------+

Local sources
-------------

nvidia-nccl_2.18.5-1-2.dsc exists in .; copying to chroot

I: NOTICE: Log filtering will replace 'build/nvidia-nccl-Y3LpE1/nvidia-nccl-2.18.5-1' with '<<PKGBUILDDIR>>'
I: NOTICE: Log filtering will replace 'build/nvidia-nccl-Y3LpE1' with '<<BUILDDIR>>'

+------------------------------------------------------------------------------+
| Install package build dependencies                                           |
+------------------------------------------------------------------------------+

Setup apt archive
-----------------

Merged Build-Depends: debhelper-compat (= 13), nvidia-cuda-toolkit-gcc, build-essential, fakeroot
Filtered Build-Depends: debhelper-compat (= 13), nvidia-cuda-toolkit-gcc, build-essential, fakeroot
dpkg-deb: building package 'sbuild-build-depends-main-dummy' in '/<<RESOLVERDIR>>/apt_archive/sbuild-build-depends-main-dummy.deb'.
Ign:1 copy:/<<RESOLVERDIR>>/apt_archive ./ InRelease
Get:2 copy:/<<RESOLVERDIR>>/apt_archive ./ Release [957 B]
Ign:3 copy:/<<RESOLVERDIR>>/apt_archive ./ Release.gpg
Get:4 copy:/<<RESOLVERDIR>>/apt_archive ./ Sources [385 B]
Get:5 copy:/<<RESOLVERDIR>>/apt_archive ./ Packages [469 B]
Fetched 1811 B in 0s (129 kB/s)
Reading package lists...
Reading package lists...

Install main build dependencies (apt-based resolver)
----------------------------------------------------

Installing build dependencies
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  apt-utils bash-completion ca-certificates debconf-i18n krb5-locales
  libgpg-error-l10n libgpm2 liblocale-gettext-perl libnss-nis libnss-nisplus
  libtext-charwidth-perl libtext-iconv-perl libtext-wrapi18n-perl
  libunistring2 openssl psmisc uuid-runtime
Use 'apt autoremove' to remove them.
The following additional packages will be installed: autoconf automake autopoint autotools-dev cpp-12 debhelper debugedit dh-autoreconf dh-strip-nondeterminism dwz file g++-12 gcc-12 gcc-12-base gettext gettext-base groff-base intltool-debian libaccinj64-12.0 libarchive-zip-perl libcu++-dev libcub-dev libcublas12 libcublaslt12 libcudart12 libcufft11 libcufftw11 libcuinj64-12.0 libcupti-dev libcupti12 libcurand10 libcusolver11 libcusolvermg11 libcusparse12 libdebhelper-perl libdw1 libelf1 libfile-stripnondeterminism-perl libgcc-12-dev libicu72 libmagic-mgc libmagic1 libnppc12 libnppial12 libnppicc12 libnppidei12 libnppif12 libnppig12 libnppim12 libnppist12 libnppisu12 libnppitc12 libnpps12 libnvblas12 libnvidia-ml-dev libnvjitlink12 libnvjpeg12 libnvrtc-builtins12.0 libnvrtc12 libnvtoolsext1 libnvvm4 libpipeline1 libstdc++-12-dev libsub-override-perl libthrust-dev libtool libuchardet0 libxml2 m4 man-db nvidia-cuda-dev nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc nvidia-opencl-dev nvidia-profiler ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers po-debconf Suggested packages: autoconf-archive gnu-standards autoconf-doc gcc-12-locales cpp-12-doc dh-make gcc-12-doc gettext-doc libasprintf-dev libgettextpo-dev groff libstdc++-12-doc libtool-doc gfortran | fortran95-compiler gcj-jdk m4-doc apparmor less www-browser opencl-icd opencl-clhpp-headers-doc libmail-box-perl Recommended packages: curl | wget | lynx libcupti-doc libarchive-cpio-perl libtbb-dev libltdl-dev libgl-dev libvdpau-dev libnvcuvid1 nvidia-cuda-toolkit-doc nvidia-cuda-gdb nvidia-visual-profiler nsight-compute nsight-systems nvidia-opencl-icd libmail-sendmail-perl The following NEW packages will be installed: autoconf automake autopoint autotools-dev cpp-12 debhelper debugedit dh-autoreconf dh-strip-nondeterminism dwz file g++-12 gcc-12 gcc-12-base gettext gettext-base groff-base intltool-debian libaccinj64-12.0 libarchive-zip-perl libcu++-dev libcub-dev libcublas12 libcublaslt12 
libcudart12 libcufft11 libcufftw11 libcuinj64-12.0 libcupti-dev libcupti12 libcurand10 libcusolver11 libcusolvermg11 libcusparse12 libdebhelper-perl libdw1 libelf1 libfile-stripnondeterminism-perl libgcc-12-dev libicu72 libmagic-mgc libmagic1 libnppc12 libnppial12 libnppicc12 libnppidei12 libnppif12 libnppig12 libnppim12 libnppist12 libnppisu12 libnppitc12 libnpps12 libnvblas12 libnvidia-ml-dev libnvjitlink12 libnvjpeg12 libnvrtc-builtins12.0 libnvrtc12 libnvtoolsext1 libnvvm4 libpipeline1 libstdc++-12-dev libsub-override-perl libthrust-dev libtool libuchardet0 libxml2 m4 man-db nvidia-cuda-dev nvidia-cuda-toolkit nvidia-cuda-toolkit-gcc nvidia-opencl-dev nvidia-profiler ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers po-debconf sbuild-build-depends-main-dummy 0 upgraded, 81 newly installed, 0 to remove and 0 not upgraded. Need to get 1387 MB of archives. After this operation, 4664 MB of additional disk space will be used. Get:1 copy:/<>/apt_archive ./ sbuild-build-depends-main-dummy 0.invalid.0 [680 B] Get:2 http://ftpmaster.internal/ubuntu noble/main ppc64el libelf1 ppc64el 0.189-4 [66.3 kB] Get:3 http://ftpmaster.internal/ubuntu noble/main ppc64el libicu72 ppc64el 72.1-3ubuntu3 [11.2 MB] Get:4 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el libxml2 ppc64el 2.9.14+dfsg-1.3build1 [826 kB] Get:5 http://ftpmaster.internal/ubuntu noble/main ppc64el libmagic-mgc ppc64el 1:5.45-2 [307 kB] Get:6 http://ftpmaster.internal/ubuntu noble/main ppc64el libmagic1 ppc64el 1:5.45-2 [106 kB] Get:7 http://ftpmaster.internal/ubuntu noble/main ppc64el file ppc64el 1:5.45-2 [22.6 kB] Get:8 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el gettext-base ppc64el 0.21-13build1 [42.6 kB] Get:9 http://ftpmaster.internal/ubuntu noble/main ppc64el libuchardet0 ppc64el 0.0.7-1build2 [80.4 kB] Get:10 http://ftpmaster.internal/ubuntu noble/main ppc64el groff-base ppc64el 1.23.0-3 [1108 kB] Get:11 http://ftpmaster.internal/ubuntu noble/main 
ppc64el libpipeline1 ppc64el 1.5.7-1 [25.8 kB] Get:12 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el man-db ppc64el 2.12.0-1 [1268 kB] Get:13 http://ftpmaster.internal/ubuntu noble/main ppc64el m4 ppc64el 1.4.19-4 [275 kB] Get:14 http://ftpmaster.internal/ubuntu noble/main ppc64el autoconf all 2.71-3 [339 kB] Get:15 http://ftpmaster.internal/ubuntu noble/main ppc64el autotools-dev all 20220109.1 [44.9 kB] Get:16 http://ftpmaster.internal/ubuntu noble/main ppc64el automake all 1:1.16.5-1.3 [558 kB] Get:17 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el autopoint all 0.21-13build1 [422 kB] Get:18 http://ftpmaster.internal/ubuntu noble/universe ppc64el gcc-12-base ppc64el 12.3.0-11ubuntu1 [44.1 kB] Get:19 http://ftpmaster.internal/ubuntu noble/universe ppc64el cpp-12 ppc64el 12.3.0-11ubuntu1 [10.1 MB] Get:20 http://ftpmaster.internal/ubuntu noble/main ppc64el libdebhelper-perl all 13.11.7ubuntu1 [85.8 kB] Get:21 http://ftpmaster.internal/ubuntu noble/universe ppc64el libgcc-12-dev ppc64el 12.3.0-11ubuntu1 [1512 kB] Get:22 http://ftpmaster.internal/ubuntu noble/universe ppc64el gcc-12 ppc64el 12.3.0-11ubuntu1 [19.8 MB] Get:23 http://ftpmaster.internal/ubuntu noble/main ppc64el libtool all 2.4.7-7 [166 kB] Get:24 http://ftpmaster.internal/ubuntu noble/main ppc64el dh-autoreconf all 20 [16.1 kB] Get:25 http://ftpmaster.internal/ubuntu noble/main ppc64el libarchive-zip-perl all 1.68-1 [90.2 kB] Get:26 http://ftpmaster.internal/ubuntu noble/main ppc64el libsub-override-perl all 0.09-4 [8706 B] Get:27 http://ftpmaster.internal/ubuntu noble/main ppc64el libfile-stripnondeterminism-perl all 1.13.1-1 [18.1 kB] Get:28 http://ftpmaster.internal/ubuntu noble/main ppc64el dh-strip-nondeterminism all 1.13.1-1 [5362 B] Get:29 http://ftpmaster.internal/ubuntu noble/main ppc64el libdw1 ppc64el 0.189-4 [292 kB] Get:30 http://ftpmaster.internal/ubuntu noble/main ppc64el debugedit ppc64el 1:5.0-5 [51.1 kB] Get:31 http://ftpmaster.internal/ubuntu noble/main 
ppc64el dwz ppc64el 0.15-1 [139 kB] Get:32 http://ftpmaster.internal/ubuntu noble-proposed/main ppc64el gettext ppc64el 0.21-13build1 [974 kB] Get:33 http://ftpmaster.internal/ubuntu noble/main ppc64el intltool-debian all 0.35.0+20060710.6 [23.2 kB] Get:34 http://ftpmaster.internal/ubuntu noble/main ppc64el po-debconf all 1.0.21+nmu1 [233 kB] Get:35 http://ftpmaster.internal/ubuntu noble/main ppc64el debhelper all 13.11.7ubuntu1 [940 kB] Get:36 http://ftpmaster.internal/ubuntu noble/universe ppc64el libstdc++-12-dev ppc64el 12.3.0-11ubuntu1 [2303 kB] Get:37 http://ftpmaster.internal/ubuntu noble/universe ppc64el g++-12 ppc64el 12.3.0-11ubuntu1 [11.4 MB] Get:38 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcupti12 ppc64el 12.0.146~12.0.1-3 [8468 kB] Get:39 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libaccinj64-12.0 ppc64el 12.0.146~12.0.1-3 [840 kB] Get:40 http://ftpmaster.internal/ubuntu noble/universe ppc64el libcu++-dev all 1.9.0-3 [540 kB] Get:41 http://ftpmaster.internal/ubuntu noble/universe ppc64el libcub-dev all 2.0.1-2 [241 kB] Get:42 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcublaslt12 ppc64el 12.0.2.224~12.0.1-3 [148 MB] Get:43 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcublas12 ppc64el 12.0.2.224~12.0.1-3 [50.6 MB] Get:44 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcudart12 ppc64el 12.0.146~12.0.1-3 [180 kB] Get:45 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcufft11 ppc64el 11.0.1.95~12.0.1-3 [44.5 MB] Get:46 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcufftw11 ppc64el 11.0.1.95~12.0.1-3 [456 kB] Get:47 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcuinj64-12.0 ppc64el 12.0.146~12.0.1-3 [1031 kB] Get:48 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcurand10 ppc64el 11.1.1+~10.3.1.124~12.0.1-3 [41.8 MB] Get:49 
http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvjitlink12 ppc64el 12.0.140~12.0.1-3 [15.1 MB] Get:50 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcusolver11 ppc64el 11.4.3.1~12.0.1-3 [35.7 MB] Get:51 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcusolvermg11 ppc64el 11.4.3.1~12.0.1-3 [22.8 MB] Get:52 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcusparse12 ppc64el 12.0.1.140~12.0.1-3 [108 MB] Get:53 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppc12 ppc64el 12.0.1.104~12.0.1-3 [463 kB] Get:54 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppial12 ppc64el 12.0.1.104~12.0.1-3 [5601 kB] Get:55 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppicc12 ppc64el 12.0.1.104~12.0.1-3 [2462 kB] Get:56 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppidei12 ppc64el 12.0.1.104~12.0.1-3 [2784 kB] Get:57 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppif12 ppc64el 12.0.1.104~12.0.1-3 [45.1 MB] Get:58 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppig12 ppc64el 12.0.1.104~12.0.1-3 [15.8 MB] Get:59 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppim12 ppc64el 12.0.1.104~12.0.1-3 [3157 kB] Get:60 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppist12 ppc64el 12.0.1.104~12.0.1-3 [15.9 MB] Get:61 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppisu12 ppc64el 12.0.1.104~12.0.1-3 [173 kB] Get:62 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnppitc12 ppc64el 12.0.1.104~12.0.1-3 [1493 kB] Get:63 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnpps12 ppc64el 12.0.1.104~12.0.1-3 [7361 kB] Get:64 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvblas12 ppc64el 12.0.2.224~12.0.1-3 [186 kB] Get:65 
http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvidia-ml-dev ppc64el 12.0.140~12.0.1-3 [83.3 kB] Get:66 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvjpeg12 ppc64el 12.0.1.102~12.0.1-3 [1947 kB] Get:67 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvrtc-builtins12.0 ppc64el 12.0.140~12.0.1-3 [147 kB] Get:68 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvrtc12 ppc64el 12.0.140~12.0.1-3 [17.3 MB] Get:69 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvvm4 ppc64el 12.0.140~12.0.1-3 [8701 kB] Get:70 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el nvidia-profiler ppc64el 12.0.146~12.0.1-3 [2212 kB] Get:71 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libnvtoolsext1 ppc64el 12.0.140~12.0.1-3 [33.9 kB] Get:72 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el libcupti-dev ppc64el 12.0.146~12.0.1-3 [118 kB] Get:73 http://ftpmaster.internal/ubuntu noble/universe ppc64el libthrust-dev all 2.0.1-2 [418 kB] Get:74 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el nvidia-cuda-dev ppc64el 12.0.146~12.0.1-3 [626 MB] Get:75 http://ftpmaster.internal/ubuntu noble/universe ppc64el opencl-c-headers all 3.0~2023.04.17-1 [45.5 kB] Get:76 http://ftpmaster.internal/ubuntu noble/universe ppc64el opencl-clhpp-headers all 3.0~2023.04.17-2ubuntu1 [49.4 kB] Get:77 http://ftpmaster.internal/ubuntu noble/universe ppc64el ocl-icd-libopencl1 ppc64el 2.3.2-1 [41.7 kB] Get:78 http://ftpmaster.internal/ubuntu noble/universe ppc64el ocl-icd-opencl-dev ppc64el 2.3.2-1 [2440 B] Get:79 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el nvidia-opencl-dev ppc64el 12.0.140~12.0.1-3 [23.7 kB] Get:80 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el nvidia-cuda-toolkit ppc64el 12.0.140~12.0.1-3 [85.8 MB] Get:81 http://ftpmaster.internal/ubuntu noble-proposed/multiverse ppc64el 
nvidia-cuda-toolkit-gcc ppc64el 12.0.1-3 [16.4 kB] Preconfiguring packages ... Fetched 1387 MB in 1min 6s (21.0 MB/s) Selecting previously unselected package libelf1:ppc64el. (Reading database ... 13556 files and directories currently installed.) Preparing to unpack .../00-libelf1_0.189-4_ppc64el.deb ... Unpacking libelf1:ppc64el (0.189-4) ... Selecting previously unselected package libicu72:ppc64el. Preparing to unpack .../01-libicu72_72.1-3ubuntu3_ppc64el.deb ... Unpacking libicu72:ppc64el (72.1-3ubuntu3) ... Selecting previously unselected package libxml2:ppc64el. Preparing to unpack .../02-libxml2_2.9.14+dfsg-1.3build1_ppc64el.deb ... Unpacking libxml2:ppc64el (2.9.14+dfsg-1.3build1) ... Selecting previously unselected package libmagic-mgc. Preparing to unpack .../03-libmagic-mgc_1%3a5.45-2_ppc64el.deb ... Unpacking libmagic-mgc (1:5.45-2) ... Selecting previously unselected package libmagic1:ppc64el. Preparing to unpack .../04-libmagic1_1%3a5.45-2_ppc64el.deb ... Unpacking libmagic1:ppc64el (1:5.45-2) ... Selecting previously unselected package file. Preparing to unpack .../05-file_1%3a5.45-2_ppc64el.deb ... Unpacking file (1:5.45-2) ... Selecting previously unselected package gettext-base. Preparing to unpack .../06-gettext-base_0.21-13build1_ppc64el.deb ... Unpacking gettext-base (0.21-13build1) ... Selecting previously unselected package libuchardet0:ppc64el. Preparing to unpack .../07-libuchardet0_0.0.7-1build2_ppc64el.deb ... Unpacking libuchardet0:ppc64el (0.0.7-1build2) ... Selecting previously unselected package groff-base. Preparing to unpack .../08-groff-base_1.23.0-3_ppc64el.deb ... Unpacking groff-base (1.23.0-3) ... Selecting previously unselected package libpipeline1:ppc64el. Preparing to unpack .../09-libpipeline1_1.5.7-1_ppc64el.deb ... Unpacking libpipeline1:ppc64el (1.5.7-1) ... Selecting previously unselected package man-db. Preparing to unpack .../10-man-db_2.12.0-1_ppc64el.deb ... Unpacking man-db (2.12.0-1) ... 
Selecting previously unselected package m4. Preparing to unpack .../11-m4_1.4.19-4_ppc64el.deb ... Unpacking m4 (1.4.19-4) ... Selecting previously unselected package autoconf. Preparing to unpack .../12-autoconf_2.71-3_all.deb ... Unpacking autoconf (2.71-3) ... Selecting previously unselected package autotools-dev. Preparing to unpack .../13-autotools-dev_20220109.1_all.deb ... Unpacking autotools-dev (20220109.1) ... Selecting previously unselected package automake. Preparing to unpack .../14-automake_1%3a1.16.5-1.3_all.deb ... Unpacking automake (1:1.16.5-1.3) ... Selecting previously unselected package autopoint. Preparing to unpack .../15-autopoint_0.21-13build1_all.deb ... Unpacking autopoint (0.21-13build1) ... Selecting previously unselected package gcc-12-base:ppc64el. Preparing to unpack .../16-gcc-12-base_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking gcc-12-base:ppc64el (12.3.0-11ubuntu1) ... Selecting previously unselected package cpp-12. Preparing to unpack .../17-cpp-12_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking cpp-12 (12.3.0-11ubuntu1) ... Selecting previously unselected package libdebhelper-perl. Preparing to unpack .../18-libdebhelper-perl_13.11.7ubuntu1_all.deb ... Unpacking libdebhelper-perl (13.11.7ubuntu1) ... Selecting previously unselected package libgcc-12-dev:ppc64el. Preparing to unpack .../19-libgcc-12-dev_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking libgcc-12-dev:ppc64el (12.3.0-11ubuntu1) ... Selecting previously unselected package gcc-12. Preparing to unpack .../20-gcc-12_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking gcc-12 (12.3.0-11ubuntu1) ... Selecting previously unselected package libtool. Preparing to unpack .../21-libtool_2.4.7-7_all.deb ... Unpacking libtool (2.4.7-7) ... Selecting previously unselected package dh-autoreconf. Preparing to unpack .../22-dh-autoreconf_20_all.deb ... Unpacking dh-autoreconf (20) ... Selecting previously unselected package libarchive-zip-perl. 
Preparing to unpack .../23-libarchive-zip-perl_1.68-1_all.deb ... Unpacking libarchive-zip-perl (1.68-1) ... Selecting previously unselected package libsub-override-perl. Preparing to unpack .../24-libsub-override-perl_0.09-4_all.deb ... Unpacking libsub-override-perl (0.09-4) ... Selecting previously unselected package libfile-stripnondeterminism-perl. Preparing to unpack .../25-libfile-stripnondeterminism-perl_1.13.1-1_all.deb ... Unpacking libfile-stripnondeterminism-perl (1.13.1-1) ... Selecting previously unselected package dh-strip-nondeterminism. Preparing to unpack .../26-dh-strip-nondeterminism_1.13.1-1_all.deb ... Unpacking dh-strip-nondeterminism (1.13.1-1) ... Selecting previously unselected package libdw1:ppc64el. Preparing to unpack .../27-libdw1_0.189-4_ppc64el.deb ... Unpacking libdw1:ppc64el (0.189-4) ... Selecting previously unselected package debugedit. Preparing to unpack .../28-debugedit_1%3a5.0-5_ppc64el.deb ... Unpacking debugedit (1:5.0-5) ... Selecting previously unselected package dwz. Preparing to unpack .../29-dwz_0.15-1_ppc64el.deb ... Unpacking dwz (0.15-1) ... Selecting previously unselected package gettext. Preparing to unpack .../30-gettext_0.21-13build1_ppc64el.deb ... Unpacking gettext (0.21-13build1) ... Selecting previously unselected package intltool-debian. Preparing to unpack .../31-intltool-debian_0.35.0+20060710.6_all.deb ... Unpacking intltool-debian (0.35.0+20060710.6) ... Selecting previously unselected package po-debconf. Preparing to unpack .../32-po-debconf_1.0.21+nmu1_all.deb ... Unpacking po-debconf (1.0.21+nmu1) ... Selecting previously unselected package debhelper. Preparing to unpack .../33-debhelper_13.11.7ubuntu1_all.deb ... Unpacking debhelper (13.11.7ubuntu1) ... Selecting previously unselected package libstdc++-12-dev:ppc64el. Preparing to unpack .../34-libstdc++-12-dev_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking libstdc++-12-dev:ppc64el (12.3.0-11ubuntu1) ... Selecting previously unselected package g++-12. 
Preparing to unpack .../35-g++-12_12.3.0-11ubuntu1_ppc64el.deb ... Unpacking g++-12 (12.3.0-11ubuntu1) ... Selecting previously unselected package libcupti12:ppc64el. Preparing to unpack .../36-libcupti12_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking libcupti12:ppc64el (12.0.146~12.0.1-3) ... Selecting previously unselected package libaccinj64-12.0:ppc64el. Preparing to unpack .../37-libaccinj64-12.0_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking libaccinj64-12.0:ppc64el (12.0.146~12.0.1-3) ... Selecting previously unselected package libcu++-dev. Preparing to unpack .../38-libcu++-dev_1.9.0-3_all.deb ... Unpacking libcu++-dev (1.9.0-3) ... Selecting previously unselected package libcub-dev. Preparing to unpack .../39-libcub-dev_2.0.1-2_all.deb ... Unpacking libcub-dev (2.0.1-2) ... Selecting previously unselected package libcublaslt12:ppc64el. Preparing to unpack .../40-libcublaslt12_12.0.2.224~12.0.1-3_ppc64el.deb ... Unpacking libcublaslt12:ppc64el (12.0.2.224~12.0.1-3) ... Selecting previously unselected package libcublas12:ppc64el. Preparing to unpack .../41-libcublas12_12.0.2.224~12.0.1-3_ppc64el.deb ... Unpacking libcublas12:ppc64el (12.0.2.224~12.0.1-3) ... Selecting previously unselected package libcudart12:ppc64el. Preparing to unpack .../42-libcudart12_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking libcudart12:ppc64el (12.0.146~12.0.1-3) ... Selecting previously unselected package libcufft11:ppc64el. Preparing to unpack .../43-libcufft11_11.0.1.95~12.0.1-3_ppc64el.deb ... Unpacking libcufft11:ppc64el (11.0.1.95~12.0.1-3) ... Selecting previously unselected package libcufftw11:ppc64el. Preparing to unpack .../44-libcufftw11_11.0.1.95~12.0.1-3_ppc64el.deb ... Unpacking libcufftw11:ppc64el (11.0.1.95~12.0.1-3) ... Selecting previously unselected package libcuinj64-12.0:ppc64el. Preparing to unpack .../45-libcuinj64-12.0_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking libcuinj64-12.0:ppc64el (12.0.146~12.0.1-3) ... 
Selecting previously unselected package libcurand10:ppc64el. Preparing to unpack .../46-libcurand10_11.1.1+~10.3.1.124~12.0.1-3_ppc64el.deb ... Unpacking libcurand10:ppc64el (11.1.1+~10.3.1.124~12.0.1-3) ... Selecting previously unselected package libnvjitlink12:ppc64el. Preparing to unpack .../47-libnvjitlink12_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvjitlink12:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package libcusolver11:ppc64el. Preparing to unpack .../48-libcusolver11_11.4.3.1~12.0.1-3_ppc64el.deb ... Unpacking libcusolver11:ppc64el (11.4.3.1~12.0.1-3) ... Selecting previously unselected package libcusolvermg11:ppc64el. Preparing to unpack .../49-libcusolvermg11_11.4.3.1~12.0.1-3_ppc64el.deb ... Unpacking libcusolvermg11:ppc64el (11.4.3.1~12.0.1-3) ... Selecting previously unselected package libcusparse12:ppc64el. Preparing to unpack .../50-libcusparse12_12.0.1.140~12.0.1-3_ppc64el.deb ... Unpacking libcusparse12:ppc64el (12.0.1.140~12.0.1-3) ... Selecting previously unselected package libnppc12:ppc64el. Preparing to unpack .../51-libnppc12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppc12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppial12:ppc64el. Preparing to unpack .../52-libnppial12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppial12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppicc12:ppc64el. Preparing to unpack .../53-libnppicc12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppicc12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppidei12:ppc64el. Preparing to unpack .../54-libnppidei12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppidei12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppif12:ppc64el. Preparing to unpack .../55-libnppif12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppif12:ppc64el (12.0.1.104~12.0.1-3) ... 
Selecting previously unselected package libnppig12:ppc64el. Preparing to unpack .../56-libnppig12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppig12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppim12:ppc64el. Preparing to unpack .../57-libnppim12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppim12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppist12:ppc64el. Preparing to unpack .../58-libnppist12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppist12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppisu12:ppc64el. Preparing to unpack .../59-libnppisu12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppisu12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnppitc12:ppc64el. Preparing to unpack .../60-libnppitc12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnppitc12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnpps12:ppc64el. Preparing to unpack .../61-libnpps12_12.0.1.104~12.0.1-3_ppc64el.deb ... Unpacking libnpps12:ppc64el (12.0.1.104~12.0.1-3) ... Selecting previously unselected package libnvblas12:ppc64el. Preparing to unpack .../62-libnvblas12_12.0.2.224~12.0.1-3_ppc64el.deb ... Unpacking libnvblas12:ppc64el (12.0.2.224~12.0.1-3) ... Selecting previously unselected package libnvidia-ml-dev:ppc64el. Preparing to unpack .../63-libnvidia-ml-dev_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvidia-ml-dev:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package libnvjpeg12:ppc64el. Preparing to unpack .../64-libnvjpeg12_12.0.1.102~12.0.1-3_ppc64el.deb ... Unpacking libnvjpeg12:ppc64el (12.0.1.102~12.0.1-3) ... Selecting previously unselected package libnvrtc-builtins12.0:ppc64el. Preparing to unpack .../65-libnvrtc-builtins12.0_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvrtc-builtins12.0:ppc64el (12.0.140~12.0.1-3) ... 
Selecting previously unselected package libnvrtc12:ppc64el. Preparing to unpack .../66-libnvrtc12_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvrtc12:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package libnvvm4:ppc64el. Preparing to unpack .../67-libnvvm4_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvvm4:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package nvidia-profiler. Preparing to unpack .../68-nvidia-profiler_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking nvidia-profiler (12.0.146~12.0.1-3) ... Selecting previously unselected package libnvtoolsext1:ppc64el. Preparing to unpack .../69-libnvtoolsext1_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking libnvtoolsext1:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package libcupti-dev:ppc64el. Preparing to unpack .../70-libcupti-dev_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking libcupti-dev:ppc64el (12.0.146~12.0.1-3) ... Selecting previously unselected package libthrust-dev. Preparing to unpack .../71-libthrust-dev_2.0.1-2_all.deb ... Unpacking libthrust-dev (2.0.1-2) ... Selecting previously unselected package nvidia-cuda-dev:ppc64el. Preparing to unpack .../72-nvidia-cuda-dev_12.0.146~12.0.1-3_ppc64el.deb ... Unpacking nvidia-cuda-dev:ppc64el (12.0.146~12.0.1-3) ... Selecting previously unselected package opencl-c-headers. Preparing to unpack .../73-opencl-c-headers_3.0~2023.04.17-1_all.deb ... Unpacking opencl-c-headers (3.0~2023.04.17-1) ... Selecting previously unselected package opencl-clhpp-headers. Preparing to unpack .../74-opencl-clhpp-headers_3.0~2023.04.17-2ubuntu1_all.deb ... Unpacking opencl-clhpp-headers (3.0~2023.04.17-2ubuntu1) ... Selecting previously unselected package ocl-icd-libopencl1:ppc64el. Preparing to unpack .../75-ocl-icd-libopencl1_2.3.2-1_ppc64el.deb ... Unpacking ocl-icd-libopencl1:ppc64el (2.3.2-1) ... Selecting previously unselected package ocl-icd-opencl-dev:ppc64el. 
Preparing to unpack .../76-ocl-icd-opencl-dev_2.3.2-1_ppc64el.deb ... Unpacking ocl-icd-opencl-dev:ppc64el (2.3.2-1) ... Selecting previously unselected package nvidia-opencl-dev:ppc64el. Preparing to unpack .../77-nvidia-opencl-dev_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking nvidia-opencl-dev:ppc64el (12.0.140~12.0.1-3) ... Selecting previously unselected package nvidia-cuda-toolkit. Preparing to unpack .../78-nvidia-cuda-toolkit_12.0.140~12.0.1-3_ppc64el.deb ... Unpacking nvidia-cuda-toolkit (12.0.140~12.0.1-3) ... Selecting previously unselected package nvidia-cuda-toolkit-gcc. Preparing to unpack .../79-nvidia-cuda-toolkit-gcc_12.0.1-3_ppc64el.deb ... Unpacking nvidia-cuda-toolkit-gcc (12.0.1-3) ... Selecting previously unselected package sbuild-build-depends-main-dummy. Preparing to unpack .../80-sbuild-build-depends-main-dummy_0.invalid.0_ppc64el.deb ... Unpacking sbuild-build-depends-main-dummy (0.invalid.0) ... Setting up libpipeline1:ppc64el (1.5.7-1) ... Setting up libnppc12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libnppial12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libcudart12:ppc64el (12.0.146~12.0.1-3) ... Setting up libicu72:ppc64el (72.1-3ubuntu3) ... Setting up libnvidia-ml-dev:ppc64el (12.0.140~12.0.1-3) ... Setting up libnppig12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libnppidei12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libnppif12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libmagic-mgc (1:5.45-2) ... Setting up libarchive-zip-perl (1.68-1) ... Setting up libcupti12:ppc64el (12.0.146~12.0.1-3) ... Setting up libcu++-dev (1.9.0-3) ... Setting up libdebhelper-perl (13.11.7ubuntu1) ... Setting up libcupti-dev:ppc64el (12.0.146~12.0.1-3) ... Setting up libnppicc12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libmagic1:ppc64el (1:5.45-2) ... Setting up gettext-base (0.21-13build1) ... Setting up m4 (1.4.19-4) ... Setting up file (1:5.45-2) ... Setting up gcc-12-base:ppc64el (12.3.0-11ubuntu1) ... 
Setting up libnppisu12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up autotools-dev (20220109.1) ... Setting up libnpps12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libnvjitlink12:ppc64el (12.0.140~12.0.1-3) ... Setting up libgcc-12-dev:ppc64el (12.3.0-11ubuntu1) ... Setting up autopoint (0.21-13build1) ... Setting up libnppim12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libcufft11:ppc64el (11.0.1.95~12.0.1-3) ... Setting up libnppitc12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up opencl-c-headers (3.0~2023.04.17-1) ... Setting up libnvrtc-builtins12.0:ppc64el (12.0.140~12.0.1-3) ... Setting up libnvjpeg12:ppc64el (12.0.1.102~12.0.1-3) ... Setting up autoconf (2.71-3) ... Setting up libcublaslt12:ppc64el (12.0.2.224~12.0.1-3) ... Setting up ocl-icd-libopencl1:ppc64el (2.3.2-1) ... Setting up libuchardet0:ppc64el (0.0.7-1build2) ... Setting up libsub-override-perl (0.09-4) ... Setting up libnvrtc12:ppc64el (12.0.140~12.0.1-3) ... Setting up libnvvm4:ppc64el (12.0.140~12.0.1-3) ... Setting up libnvtoolsext1:ppc64el (12.0.140~12.0.1-3) ... Setting up libcub-dev (2.0.1-2) ... Setting up libelf1:ppc64el (0.189-4) ... Setting up libxml2:ppc64el (2.9.14+dfsg-1.3build1) ... Setting up libnppist12:ppc64el (12.0.1.104~12.0.1-3) ... Setting up libcurand10:ppc64el (11.1.1+~10.3.1.124~12.0.1-3) ... Setting up automake (1:1.16.5-1.3) ... update-alternatives: using /usr/bin/automake-1.16 to provide /usr/bin/automake (automake) in auto mode Setting up libfile-stripnondeterminism-perl (1.13.1-1) ... Setting up libcusparse12:ppc64el (12.0.1.140~12.0.1-3) ... Setting up libdw1:ppc64el (0.189-4) ... Setting up cpp-12 (12.3.0-11ubuntu1) ... Setting up gettext (0.21-13build1) ... Setting up libthrust-dev (2.0.1-2) ... Setting up libstdc++-12-dev:ppc64el (12.3.0-11ubuntu1) ... Setting up libaccinj64-12.0:ppc64el (12.0.146~12.0.1-3) ... Setting up libcufftw11:ppc64el (11.0.1.95~12.0.1-3) ... Setting up libtool (2.4.7-7) ... Setting up libcuinj64-12.0:ppc64el (12.0.146~12.0.1-3) ... 
Setting up libcublas12:ppc64el (12.0.2.224~12.0.1-3) ... Setting up nvidia-profiler (12.0.146~12.0.1-3) ... Setting up intltool-debian (0.35.0+20060710.6) ... Setting up dh-autoreconf (20) ... Setting up gcc-12 (12.3.0-11ubuntu1) ... Setting up libcusolver11:ppc64el (11.4.3.1~12.0.1-3) ... Setting up opencl-clhpp-headers (3.0~2023.04.17-2ubuntu1) ... Setting up libcusolvermg11:ppc64el (11.4.3.1~12.0.1-3) ... Setting up ocl-icd-opencl-dev:ppc64el (2.3.2-1) ... Setting up libnvblas12:ppc64el (12.0.2.224~12.0.1-3) ... Setting up dh-strip-nondeterminism (1.13.1-1) ... Setting up dwz (0.15-1) ... Setting up groff-base (1.23.0-3) ... Setting up debugedit (1:5.0-5) ... Setting up g++-12 (12.3.0-11ubuntu1) ... Setting up po-debconf (1.0.21+nmu1) ... Setting up nvidia-cuda-dev:ppc64el (12.0.146~12.0.1-3) ... Setting up man-db (2.12.0-1) ... Not building database; man-db/auto-update is not 'true'. Created symlink /etc/systemd/system/timers.target.wants/man-db.timer → /lib/systemd/system/man-db.timer. Setting up nvidia-cuda-toolkit (12.0.140~12.0.1-3) ... Setting up nvidia-opencl-dev:ppc64el (12.0.140~12.0.1-3) ... Setting up nvidia-cuda-toolkit-gcc (12.0.1-3) ... Setting up debhelper (13.11.7ubuntu1) ... Setting up sbuild-build-depends-main-dummy (0.invalid.0) ... Processing triggers for libc-bin (2.38-3ubuntu1) ... 
+------------------------------------------------------------------------------+
| Check architectures |
+------------------------------------------------------------------------------+

Arch check ok (ppc64el included in amd64 arm64 ppc64el)

+------------------------------------------------------------------------------+
| Build environment |
+------------------------------------------------------------------------------+

Kernel: Linux 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:18 UTC 2023 ppc64el (ppc64le)

Toolchain package versions: binutils_2.41-6ubuntu1 dpkg-dev_1.22.1ubuntu2 g++-12_12.3.0-11ubuntu1 g++-13_13.2.0-6ubuntu1 gcc-12_12.3.0-11ubuntu1 gcc-13_13.2.0-6ubuntu1 libc6-dev_2.38-3ubuntu1 libstdc++-12-dev_12.3.0-11ubuntu1 libstdc++-13-dev_13.2.0-6ubuntu1 libstdc++6_13.2.0-6ubuntu1 linux-libc-dev_6.5.0-9.9

Package versions: adduser_3.137ubuntu1 advancecomp_2.5-1 apt_2.7.6 apt-utils_2.7.6 autoconf_2.71-3 automake_1:1.16.5-1.3 autopoint_0.21-13build1 autotools-dev_20220109.1 base-files_13ubuntu4 base-passwd_3.6.2 bash_5.2.15-2ubuntu1 bash-completion_1:2.11-8 binutils_2.41-6ubuntu1 binutils-common_2.41-6ubuntu1 binutils-powerpc64le-linux-gnu_2.41-6ubuntu1 bsdextrautils_2.39.1-4ubuntu2 bsdutils_1:2.39.1-4ubuntu2 build-essential_12.10ubuntu1 bzip2_1.0.8-5build1 ca-certificates_20230311ubuntu1 coreutils_9.1-1ubuntu2 cpp_4:13.2.0-1ubuntu1 cpp-12_12.3.0-11ubuntu1 cpp-13_13.2.0-6ubuntu1 dash_0.5.12-6ubuntu1 debconf_1.5.82 debconf-i18n_1.5.82 debhelper_13.11.7ubuntu1 debianutils_5.14 debugedit_1:5.0-5 dh-autoreconf_20 dh-strip-nondeterminism_1.13.1-1 diffutils_1:3.10-1 dpkg_1.22.1ubuntu2 dpkg-dev_1.22.1ubuntu2 dwz_0.15-1 e2fsprogs_1.47.0-2ubuntu1 fakeroot_1.32.2-1 file_1:5.45-2 findutils_4.9.0-5 g++_4:13.2.0-1ubuntu1 g++-12_12.3.0-11ubuntu1 g++-13_13.2.0-6ubuntu1 gcc_4:13.2.0-1ubuntu1 gcc-12_12.3.0-11ubuntu1 gcc-12-base_12.3.0-11ubuntu1 gcc-13_13.2.0-6ubuntu1 gcc-13-base_13.2.0-6ubuntu1 gettext_0.21-13build1 gettext-base_0.21-13build1 gpg_2.2.40-1.1ubuntu1
gpg-agent_2.2.40-1.1ubuntu1 gpgconf_2.2.40-1.1ubuntu1 gpgv_2.2.40-1.1ubuntu1 grep_3.11-3 groff-base_1.23.0-3 gzip_1.12-1ubuntu1 hostname_3.23+nmu1ubuntu1 init_1.65.2ubuntu1 init-system-helpers_1.65.2ubuntu1 intltool-debian_0.35.0+20060710.6 krb5-locales_1.20.1-3ubuntu1 libaccinj64-12.0_12.0.146~12.0.1-3 libacl1_2.3.1-3 libapparmor1_4.0.0~alpha2-0ubuntu6 libapt-pkg6.0_2.7.6 libarchive-zip-perl_1.68-1 libargon2-1_0~20190702+dfsg-4 libasan8_13.2.0-6ubuntu1 libassuan0_2.5.6-1 libatomic1_13.2.0-6ubuntu1 libattr1_1:2.5.1-4 libaudit-common_1:3.1.1-1build1 libaudit1_1:3.1.1-1build1 libbinutils_2.41-6ubuntu1 libblkid1_2.39.1-4ubuntu2 libbz2-1.0_1.0.8-5build1 libc-bin_2.38-3ubuntu1 libc-dev-bin_2.38-3ubuntu1 libc6_2.38-3ubuntu1 libc6-dev_2.38-3ubuntu1 libcap-ng0_0.8.3-1build3 libcap2_1:2.66-4ubuntu1 libcc1-0_13.2.0-6ubuntu1 libcom-err2_1.47.0-2ubuntu1 libcrypt-dev_1:4.4.36-2 libcrypt1_1:4.4.36-2 libcryptsetup12_2:2.6.1-4ubuntu3 libctf-nobfd0_2.41-6ubuntu1 libctf0_2.41-6ubuntu1 libcu++-dev_1.9.0-3 libcub-dev_2.0.1-2 libcublas12_12.0.2.224~12.0.1-3 libcublaslt12_12.0.2.224~12.0.1-3 libcudart12_12.0.146~12.0.1-3 libcufft11_11.0.1.95~12.0.1-3 libcufftw11_11.0.1.95~12.0.1-3 libcuinj64-12.0_12.0.146~12.0.1-3 libcupti-dev_12.0.146~12.0.1-3 libcupti12_12.0.146~12.0.1-3 libcurand10_11.1.1+~10.3.1.124~12.0.1-3 libcusolver11_11.4.3.1~12.0.1-3 libcusolvermg11_11.4.3.1~12.0.1-3 libcusparse12_12.0.1.140~12.0.1-3 libdb5.3_5.3.28+dfsg2-4 libdebconfclient0_0.270ubuntu1 libdebhelper-perl_13.11.7ubuntu1 libdevmapper1.02.1_2:1.02.185-2ubuntu1 libdpkg-perl_1.22.1ubuntu2 libdw1_0.189-4 libelf1_0.189-4 libext2fs2_1.47.0-2ubuntu1 libfakeroot_1.32.2-1 libfdisk1_2.39.1-4ubuntu2 libffi8_3.4.4-1 libfile-stripnondeterminism-perl_1.13.1-1 libgcc-12-dev_12.3.0-11ubuntu1 libgcc-13-dev_13.2.0-6ubuntu1 libgcc-s1_13.2.0-6ubuntu1 libgcrypt20_1.10.2-3ubuntu1 libgdbm-compat4_1.23-3 libgdbm6_1.23-3 libgmp10_2:6.3.0+dfsg-2ubuntu4 libgnutls30_3.8.1-4ubuntu3 libgomp1_13.2.0-6ubuntu1 libgpg-error-l10n_1.47-2 
libgpg-error0_1.47-2 libgpm2_1.20.7-10build1 libgssapi-krb5-2_1.20.1-3ubuntu1 libhogweed6_3.9.1-2 libicu72_72.1-3ubuntu3 libidn2-0_2.3.4-1build1 libip4tc2_1.8.9-2ubuntu2 libisl23_0.26-3 libitm1_13.2.0-6ubuntu1 libjansson4_2.14-2 libjson-c5_0.17-1 libk5crypto3_1.20.1-3ubuntu1 libkeyutils1_1.6.3-2 libkmod2_30+20230519-1ubuntu3 libkrb5-3_1.20.1-3ubuntu1 libkrb5support0_1.20.1-3ubuntu1 liblocale-gettext-perl_1.07-6 liblockfile-bin_1.17-1build2 liblockfile1_1.17-1build2 liblsan0_13.2.0-6ubuntu1 liblz4-1_1.9.4-1 liblzma5_5.4.4-0.1 libmagic-mgc_1:5.45-2 libmagic1_1:5.45-2 libmd0_1.1.0-1 libmount1_2.39.1-4ubuntu2 libmpc3_1.3.1-1 libmpfr6_4.2.1-1 libncursesw6_6.4+20231016-1 libnettle8_3.9.1-2 libnppc12_12.0.1.104~12.0.1-3 libnppial12_12.0.1.104~12.0.1-3 libnppicc12_12.0.1.104~12.0.1-3 libnppidei12_12.0.1.104~12.0.1-3 libnppif12_12.0.1.104~12.0.1-3 libnppig12_12.0.1.104~12.0.1-3 libnppim12_12.0.1.104~12.0.1-3 libnppist12_12.0.1.104~12.0.1-3 libnppisu12_12.0.1.104~12.0.1-3 libnppitc12_12.0.1.104~12.0.1-3 libnpps12_12.0.1.104~12.0.1-3 libnpth0_1.6-3build2 libnsl-dev_1.3.0-3 libnsl2_1.3.0-3 libnss-nis_3.1-0ubuntu6 libnss-nisplus_1.3-0ubuntu6 libnvblas12_12.0.2.224~12.0.1-3 libnvidia-ml-dev_12.0.140~12.0.1-3 libnvjitlink12_12.0.140~12.0.1-3 libnvjpeg12_12.0.1.102~12.0.1-3 libnvrtc-builtins12.0_12.0.140~12.0.1-3 libnvrtc12_12.0.140~12.0.1-3 libnvtoolsext1_12.0.140~12.0.1-3 libnvvm4_12.0.140~12.0.1-3 libp11-kit0_0.25.0-4ubuntu1 libpam-modules_1.5.2-6ubuntu1 libpam-modules-bin_1.5.2-6ubuntu1 libpam-runtime_1.5.2-6ubuntu1 libpam0g_1.5.2-6ubuntu1 libpcre2-8-0_10.42-4 libperl5.36_5.36.0-9ubuntu1 libpipeline1_1.5.7-1 libpng16-16_1.6.40-2 libproc2-0_2:4.0.3-1ubuntu1 libquadmath0_13.2.0-6ubuntu1 libreadline8_8.2-1.3 libseccomp2_2.5.4-2ubuntu1 libselinux1_3.5-1build1 libsemanage-common_3.5-1build1 libsemanage2_3.5-1build1 libsepol2_3.5-1 libsframe1_2.41-6ubuntu1 libsmartcols1_2.39.1-4ubuntu2 libsqlite3-0_3.44.0-1 libss2_1.47.0-2ubuntu1 libssl3_3.0.10-1ubuntu2.1 
libstdc++-12-dev_12.3.0-11ubuntu1 libstdc++-13-dev_13.2.0-6ubuntu1 libstdc++6_13.2.0-6ubuntu1 libsub-override-perl_0.09-4 libsystemd-shared_253.5-1ubuntu7 libsystemd0_253.5-1ubuntu7 libtasn1-6_4.19.0-3 libtext-charwidth-perl_0.04-11 libtext-iconv-perl_1.7-8 libtext-wrapi18n-perl_0.06-10 libthrust-dev_2.0.1-2 libtinfo6_6.4+20231016-1 libtirpc-common_1.3.3+ds-1 libtirpc-dev_1.3.3+ds-1 libtirpc3_1.3.3+ds-1 libtool_2.4.7-7 libtsan2_13.2.0-6ubuntu1 libubsan1_13.2.0-6ubuntu1 libuchardet0_0.0.7-1build2 libudev1_253.5-1ubuntu7 libunistring2_1.0-2 libunistring5_1.1-2 libuuid1_2.39.1-4ubuntu2 libxml2_2.9.14+dfsg-1.3build1 libxxhash0_0.8.2-2 libzstd1_1.5.5+dfsg2-2 linux-libc-dev_6.5.0-9.9 lockfile-progs_0.1.19build1 login_1:4.13+dfsg1-1ubuntu1 logsave_1.47.0-2ubuntu1 lto-disabled-list_43 m4_1.4.19-4 make_4.3-4.1build1 man-db_2.12.0-1 mawk_1.3.4.20230808-1 mount_2.39.1-4ubuntu2 ncurses-base_6.4+20231016-1 ncurses-bin_6.4+20231016-1 nvidia-cuda-dev_12.0.146~12.0.1-3 nvidia-cuda-toolkit_12.0.140~12.0.1-3 nvidia-cuda-toolkit-gcc_12.0.1-3 nvidia-opencl-dev_12.0.140~12.0.1-3 nvidia-profiler_12.0.146~12.0.1-3 ocl-icd-libopencl1_2.3.2-1 ocl-icd-opencl-dev_2.3.2-1 opencl-c-headers_3.0~2023.04.17-1 opencl-clhpp-headers_3.0~2023.04.17-2ubuntu1 openssl_3.0.10-1ubuntu2.1 optipng_0.7.7-3 passwd_1:4.13+dfsg1-1ubuntu1 patch_2.7.6-7build2 perl_5.36.0-9ubuntu1 perl-base_5.36.0-9ubuntu1 perl-modules-5.36_5.36.0-9ubuntu1 pinentry-curses_1.2.1-1ubuntu1 pkgbinarymangler_154 po-debconf_1.0.21+nmu1 policyrcd-script-zg2_0.1-3.1 procps_2:4.0.3-1ubuntu1 psmisc_23.6-1 readline-common_8.2-1.3 rpcsvc-proto_1.4.2-0ubuntu6 sbuild-build-depends-main-dummy_0.invalid.0 sed_4.9-1 sensible-utils_0.0.20 systemd_253.5-1ubuntu7 systemd-dev_253.5-1ubuntu7 systemd-sysv_253.5-1ubuntu7 sysvinit-utils_3.07-1ubuntu1 tar_1.34+dfsg-1.2ubuntu1 tzdata_2023c-9ubuntu1 ubuntu-keyring_2021.03.26 usrmerge_35ubuntu1 util-linux_2.39.1-4ubuntu2 uuid-runtime_2.39.1-4ubuntu2 xz-utils_5.4.4-0.1 zlib1g_1:1.2.13.dfsg-1ubuntu5 
+------------------------------------------------------------------------------+ | Build | +------------------------------------------------------------------------------+ Unpack source ------------- -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Format: 3.0 (quilt) Source: nvidia-nccl Binary: libnccl2, libnccl-dev Architecture: amd64 arm64 ppc64el Version: 2.18.5-1-2 Maintainer: Debian NVIDIA Maintainers Uploaders: Mo Zhou Homepage: https://github.com/NVIDIA/nccl Standards-Version: 4.6.2 Vcs-Browser: https://salsa.debian.org/nvidia-team/nvidia-nccl Vcs-Git: https://salsa.debian.org/nvidia-team/nvidia-nccl.git Build-Depends: debhelper-compat (= 13), nvidia-cuda-toolkit-gcc Package-List: libnccl-dev deb contrib/libdevel optional arch=amd64,arm64,ppc64el libnccl2 deb contrib/libs optional arch=amd64,arm64,ppc64el Checksums-Sha1: aeaef5d6b8cfd0cd22010a28f98cdd99c745c4f0 362611 nvidia-nccl_2.18.5-1.orig.tar.gz 50721d341785322983fc6f62cab34de5240c6c4e 5016 nvidia-nccl_2.18.5-1-2.debian.tar.xz Checksums-Sha256: b4f5d7d9eea2c12e32e7a06fe138b2cfc75969c6d5c473aa6f819a792db2fc96 362611 nvidia-nccl_2.18.5-1.orig.tar.gz 2ae0d151024c52faac933b01bbd779e0eb079cf1d4be3fea9f4abe9f62da3983 5016 nvidia-nccl_2.18.5-1-2.debian.tar.xz Files: ead8749c3cd685282492056ae06216b3 362611 nvidia-nccl_2.18.5-1.orig.tar.gz 649ee2190a6d8e1fe328c3627b57da24 5016 nvidia-nccl_2.18.5-1-2.debian.tar.xz -----BEGIN PGP SIGNATURE----- iQJEBAEBCAAuFiEE6/MKMKjZxjvaRMaUX7M/k1np7QgFAmVQLLMQHGFuYmVAZGVi aWFuLm9yZwAKCRBfsz+TWentCGKgEACB77cVz7SLVMEjn1jdrgJCQ3WEv2svy0/e oxoVfo0k/4/HFwEC/uG0oktsXYr1VjYh7VFTJpgQe1ogVdvqodsrQnNSvv8svUJO iODLPITkOAVIO+KkrEtTbR1/PbcIQ3T6IpmjvBIcFoR2QgaNHutT+Qk2kA3lwz2D 9xqREXeeu7+6dgrkLPtb4NVARySquIjKYle4fVOTyjdxIPlw+RVx+TqP3IJGtO4a eDu9Tyi7U6hbsg9CSWV0nUEFpZWffkEAkic9s9rH3hTh34iD1ibdvbZPLjwREoWI mi7OUppwQseVmibzlG5kHJBSNWTjoOP8p1NH+98nrajG0UzeTdKFcKVVN1tNmtFh AZwO44alixZ3yyQro7Aja+iZyfCT1MQgxKW28RoLZuvI8nK2rNcx65qwuLtpeWut 
8g0DINkCcfjkBe0WTp+/f2add/W//2kHurGIMtlEUfHd6pgvk01/8WTYADGvsUQV CV1LEpotS6ybE0BTZtp3s99oAEWpKq7exGNFJHJJlPD6aaIH+zikJfWkN0uHh0JV pvTSz9uhFZQ42jYCcZbvidvON7+VRgiZsddGSERMX3F167it79D9hjmBzK6ewuH2 kZ/AAiVF8C9cc0Zr7J+KolbeliPPKQGcG0wDIPIlfspOgL/oG641/btHIWEJ+msC /BnfMQs4Nw== =NhLP -----END PGP SIGNATURE----- gpgv: Signature made Sun Nov 12 01:38:59 2023 UTC gpgv: using RSA key EBF30A30A8D9C63BDA44C6945FB33F9359E9ED08 gpgv: issuer "anbe@debian.org" gpgv: Can't check signature: No public key dpkg-source: warning: cannot verify inline signature for ./nvidia-nccl_2.18.5-1-2.dsc: no acceptable signature found dpkg-source: info: extracting nvidia-nccl in /<> dpkg-source: info: unpacking nvidia-nccl_2.18.5-1.orig.tar.gz dpkg-source: info: unpacking nvidia-nccl_2.18.5-1-2.debian.tar.xz dpkg-source: info: using patch list from debian/patches/series dpkg-source: info: applying hardening.patch Check disk space ---------------- Sufficient free space for build User Environment ---------------- APT_CONFIG=/var/lib/sbuild/apt.conf DEB_BUILD_OPTIONS=parallel=4 HOME=/sbuild-nonexistent LANG=C.UTF-8 LC_ALL=C.UTF-8 LOGNAME=buildd PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games SCHROOT_ALIAS_NAME=build-PACKAGEBUILD-26988716 SCHROOT_CHROOT_NAME=build-PACKAGEBUILD-26988716 SCHROOT_COMMAND=env SCHROOT_GID=2501 SCHROOT_GROUP=buildd SCHROOT_SESSION_ID=build-PACKAGEBUILD-26988716 SCHROOT_UID=2001 SCHROOT_USER=buildd SHELL=/bin/sh TERM=unknown USER=buildd V=1 dpkg-buildpackage ----------------- Command: dpkg-buildpackage -us -uc -mLaunchpad Build Daemon -B -rfakeroot dpkg-buildpackage: info: source package nvidia-nccl dpkg-buildpackage: info: source version 2.18.5-1-2 dpkg-buildpackage: info: source distribution unstable dpkg-source --before-build . 
dpkg-buildpackage: info: host architecture ppc64el debian/rules clean dh clean dh_auto_clean make -j4 clean make[1]: Entering directory '/<>' make -C src clean BUILDDIR=/<>/build make -C pkg clean BUILDDIR=/<>/build make[2]: Entering directory '/<>/src' make[2]: Entering directory '/<>/pkg' make -C debian clean make -C txz clean make[3]: Entering directory '/<>/pkg/debian' make[3]: Entering directory '/<>/pkg/txz' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 rm -Rf /<>/build/debian /<>/build/pkg/deb/ make[3]: Leaving directory '/<>/pkg/debian' rm -Rf /<>/build/txz /<>/build/pkg/txz/ make[3]: Leaving directory '/<>/pkg/txz' make[2]: Leaving directory '/<>/pkg' make -C collectives/device clean make[3]: Entering directory '/<>/src/collectives/device' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 rm -f /<>/build/obj/collectives/device/sendrecv_sum_i8.o /<>/build/obj/collectives/device/sendrecv_sum_u8.o /<>/build/obj/collectives/device/sendrecv_sum_i32.o 
/<>/build/obj/collectives/device/sendrecv_sum_u32.o /<>/build/obj/collectives/device/sendrecv_sum_i64.o /<>/build/obj/collectives/device/sendrecv_sum_u64.o /<>/build/obj/collectives/device/sendrecv_sum_f16.o /<>/build/obj/collectives/device/sendrecv_sum_f32.o /<>/build/obj/collectives/device/sendrecv_sum_f64.o /<>/build/obj/collectives/device/sendrecv_sum_bf16.o /<>/build/obj/collectives/device/sendrecv_prod_i8.o /<>/build/obj/collectives/device/sendrecv_prod_u8.o /<>/build/obj/collectives/device/sendrecv_prod_i32.o /<>/build/obj/collectives/device/sendrecv_prod_u32.o /<>/build/obj/collectives/device/sendrecv_prod_i64.o /<>/build/obj/collectives/device/sendrecv_prod_u64.o /<>/build/obj/collectives/device/sendrecv_prod_f16.o /<>/build/obj/collectives/device/sendrecv_prod_f32.o /<>/build/obj/collectives/device/sendrecv_prod_f64.o /<>/build/obj/collectives/device/sendrecv_prod_bf16.o /<>/build/obj/collectives/device/sendrecv_min_i8.o /<>/build/obj/collectives/device/sendrecv_min_u8.o /<>/build/obj/collectives/device/sendrecv_min_i32.o /<>/build/obj/collectives/device/sendrecv_min_u32.o /<>/build/obj/collectives/device/sendrecv_min_i64.o /<>/build/obj/collectives/device/sendrecv_min_u64.o /<>/build/obj/collectives/device/sendrecv_min_f16.o /<>/build/obj/collectives/device/sendrecv_min_f32.o /<>/build/obj/collectives/device/sendrecv_min_f64.o /<>/build/obj/collectives/device/sendrecv_min_bf16.o /<>/build/obj/collectives/device/sendrecv_max_i8.o /<>/build/obj/collectives/device/sendrecv_max_u8.o /<>/build/obj/collectives/device/sendrecv_max_i32.o /<>/build/obj/collectives/device/sendrecv_max_u32.o /<>/build/obj/collectives/device/sendrecv_max_i64.o /<>/build/obj/collectives/device/sendrecv_max_u64.o /<>/build/obj/collectives/device/sendrecv_max_f16.o /<>/build/obj/collectives/device/sendrecv_max_f32.o /<>/build/obj/collectives/device/sendrecv_max_f64.o /<>/build/obj/collectives/device/sendrecv_max_bf16.o /<>/build/obj/collectives/device/sendrecv_premulsum_i8.o 
/<>/build/obj/collectives/device/sendrecv_premulsum_u8.o /<>/build/obj/collectives/device/sendrecv_premulsum_i32.o /<>/build/obj/collectives/device/sendrecv_premulsum_u32.o /<>/build/obj/collectives/device/sendrecv_premulsum_i64.o /<>/build/obj/collectives/device/sendrecv_premulsum_u64.o /<>/build/obj/collectives/device/sendrecv_premulsum_f16.o /<>/build/obj/collectives/device/sendrecv_premulsum_f32.o /<>/build/obj/collectives/device/sendrecv_premulsum_f64.o /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_reduce_sum_i8.o /<>/build/obj/collectives/device/all_reduce_sum_u8.o /<>/build/obj/collectives/device/all_reduce_sum_i32.o /<>/build/obj/collectives/device/all_reduce_sum_u32.o /<>/build/obj/collectives/device/all_reduce_sum_i64.o /<>/build/obj/collectives/device/all_reduce_sum_u64.o /<>/build/obj/collectives/device/all_reduce_sum_f16.o /<>/build/obj/collectives/device/all_reduce_sum_f32.o /<>/build/obj/collectives/device/all_reduce_sum_f64.o /<>/build/obj/collectives/device/all_reduce_sum_bf16.o /<>/build/obj/collectives/device/all_reduce_prod_i8.o /<>/build/obj/collectives/device/all_reduce_prod_u8.o /<>/build/obj/collectives/device/all_reduce_prod_i32.o /<>/build/obj/collectives/device/all_reduce_prod_u32.o /<>/build/obj/collectives/device/all_reduce_prod_i64.o /<>/build/obj/collectives/device/all_reduce_prod_u64.o 
/<>/build/obj/collectives/device/all_reduce_prod_f16.o /<>/build/obj/collectives/device/all_reduce_prod_f32.o /<>/build/obj/collectives/device/all_reduce_prod_f64.o /<>/build/obj/collectives/device/all_reduce_prod_bf16.o /<>/build/obj/collectives/device/all_reduce_min_i8.o /<>/build/obj/collectives/device/all_reduce_min_u8.o /<>/build/obj/collectives/device/all_reduce_min_i32.o /<>/build/obj/collectives/device/all_reduce_min_u32.o /<>/build/obj/collectives/device/all_reduce_min_i64.o /<>/build/obj/collectives/device/all_reduce_min_u64.o /<>/build/obj/collectives/device/all_reduce_min_f16.o /<>/build/obj/collectives/device/all_reduce_min_f32.o /<>/build/obj/collectives/device/all_reduce_min_f64.o /<>/build/obj/collectives/device/all_reduce_min_bf16.o /<>/build/obj/collectives/device/all_reduce_max_i8.o /<>/build/obj/collectives/device/all_reduce_max_u8.o /<>/build/obj/collectives/device/all_reduce_max_i32.o /<>/build/obj/collectives/device/all_reduce_max_u32.o /<>/build/obj/collectives/device/all_reduce_max_i64.o /<>/build/obj/collectives/device/all_reduce_max_u64.o /<>/build/obj/collectives/device/all_reduce_max_f16.o /<>/build/obj/collectives/device/all_reduce_max_f32.o /<>/build/obj/collectives/device/all_reduce_max_f64.o /<>/build/obj/collectives/device/all_reduce_max_bf16.o /<>/build/obj/collectives/device/all_reduce_premulsum_i8.o /<>/build/obj/collectives/device/all_reduce_premulsum_u8.o /<>/build/obj/collectives/device/all_reduce_premulsum_i32.o /<>/build/obj/collectives/device/all_reduce_premulsum_u32.o /<>/build/obj/collectives/device/all_reduce_premulsum_i64.o /<>/build/obj/collectives/device/all_reduce_premulsum_u64.o /<>/build/obj/collectives/device/all_reduce_premulsum_f16.o /<>/build/obj/collectives/device/all_reduce_premulsum_f32.o /<>/build/obj/collectives/device/all_reduce_premulsum_f64.o /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.o 
/<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_gather_sum_i8.o /<>/build/obj/collectives/device/all_gather_sum_u8.o /<>/build/obj/collectives/device/all_gather_sum_i32.o /<>/build/obj/collectives/device/all_gather_sum_u32.o /<>/build/obj/collectives/device/all_gather_sum_i64.o /<>/build/obj/collectives/device/all_gather_sum_u64.o /<>/build/obj/collectives/device/all_gather_sum_f16.o /<>/build/obj/collectives/device/all_gather_sum_f32.o /<>/build/obj/collectives/device/all_gather_sum_f64.o /<>/build/obj/collectives/device/all_gather_sum_bf16.o /<>/build/obj/collectives/device/all_gather_prod_i8.o /<>/build/obj/collectives/device/all_gather_prod_u8.o /<>/build/obj/collectives/device/all_gather_prod_i32.o /<>/build/obj/collectives/device/all_gather_prod_u32.o /<>/build/obj/collectives/device/all_gather_prod_i64.o /<>/build/obj/collectives/device/all_gather_prod_u64.o /<>/build/obj/collectives/device/all_gather_prod_f16.o /<>/build/obj/collectives/device/all_gather_prod_f32.o /<>/build/obj/collectives/device/all_gather_prod_f64.o /<>/build/obj/collectives/device/all_gather_prod_bf16.o /<>/build/obj/collectives/device/all_gather_min_i8.o /<>/build/obj/collectives/device/all_gather_min_u8.o /<>/build/obj/collectives/device/all_gather_min_i32.o /<>/build/obj/collectives/device/all_gather_min_u32.o /<>/build/obj/collectives/device/all_gather_min_i64.o /<>/build/obj/collectives/device/all_gather_min_u64.o 
/<>/build/obj/collectives/device/all_gather_min_f16.o /<>/build/obj/collectives/device/all_gather_min_f32.o /<>/build/obj/collectives/device/all_gather_min_f64.o /<>/build/obj/collectives/device/all_gather_min_bf16.o /<>/build/obj/collectives/device/all_gather_max_i8.o /<>/build/obj/collectives/device/all_gather_max_u8.o /<>/build/obj/collectives/device/all_gather_max_i32.o /<>/build/obj/collectives/device/all_gather_max_u32.o /<>/build/obj/collectives/device/all_gather_max_i64.o /<>/build/obj/collectives/device/all_gather_max_u64.o /<>/build/obj/collectives/device/all_gather_max_f16.o /<>/build/obj/collectives/device/all_gather_max_f32.o /<>/build/obj/collectives/device/all_gather_max_f64.o /<>/build/obj/collectives/device/all_gather_max_bf16.o /<>/build/obj/collectives/device/all_gather_premulsum_i8.o /<>/build/obj/collectives/device/all_gather_premulsum_u8.o /<>/build/obj/collectives/device/all_gather_premulsum_i32.o /<>/build/obj/collectives/device/all_gather_premulsum_u32.o /<>/build/obj/collectives/device/all_gather_premulsum_i64.o /<>/build/obj/collectives/device/all_gather_premulsum_u64.o /<>/build/obj/collectives/device/all_gather_premulsum_f16.o /<>/build/obj/collectives/device/all_gather_premulsum_f32.o /<>/build/obj/collectives/device/all_gather_premulsum_f64.o /<>/build/obj/collectives/device/all_gather_premulsum_bf16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.o 
/<>/build/obj/collectives/device/broadcast_sum_i8.o /<>/build/obj/collectives/device/broadcast_sum_u8.o /<>/build/obj/collectives/device/broadcast_sum_i32.o /<>/build/obj/collectives/device/broadcast_sum_u32.o /<>/build/obj/collectives/device/broadcast_sum_i64.o /<>/build/obj/collectives/device/broadcast_sum_u64.o /<>/build/obj/collectives/device/broadcast_sum_f16.o /<>/build/obj/collectives/device/broadcast_sum_f32.o /<>/build/obj/collectives/device/broadcast_sum_f64.o /<>/build/obj/collectives/device/broadcast_sum_bf16.o /<>/build/obj/collectives/device/broadcast_prod_i8.o /<>/build/obj/collectives/device/broadcast_prod_u8.o /<>/build/obj/collectives/device/broadcast_prod_i32.o /<>/build/obj/collectives/device/broadcast_prod_u32.o /<>/build/obj/collectives/device/broadcast_prod_i64.o /<>/build/obj/collectives/device/broadcast_prod_u64.o /<>/build/obj/collectives/device/broadcast_prod_f16.o /<>/build/obj/collectives/device/broadcast_prod_f32.o /<>/build/obj/collectives/device/broadcast_prod_f64.o /<>/build/obj/collectives/device/broadcast_prod_bf16.o /<>/build/obj/collectives/device/broadcast_min_i8.o /<>/build/obj/collectives/device/broadcast_min_u8.o /<>/build/obj/collectives/device/broadcast_min_i32.o /<>/build/obj/collectives/device/broadcast_min_u32.o /<>/build/obj/collectives/device/broadcast_min_i64.o /<>/build/obj/collectives/device/broadcast_min_u64.o /<>/build/obj/collectives/device/broadcast_min_f16.o /<>/build/obj/collectives/device/broadcast_min_f32.o /<>/build/obj/collectives/device/broadcast_min_f64.o /<>/build/obj/collectives/device/broadcast_min_bf16.o /<>/build/obj/collectives/device/broadcast_max_i8.o /<>/build/obj/collectives/device/broadcast_max_u8.o /<>/build/obj/collectives/device/broadcast_max_i32.o /<>/build/obj/collectives/device/broadcast_max_u32.o /<>/build/obj/collectives/device/broadcast_max_i64.o /<>/build/obj/collectives/device/broadcast_max_u64.o /<>/build/obj/collectives/device/broadcast_max_f16.o 
/<>/build/obj/collectives/device/broadcast_max_f32.o /<>/build/obj/collectives/device/broadcast_max_f64.o /<>/build/obj/collectives/device/broadcast_max_bf16.o /<>/build/obj/collectives/device/broadcast_premulsum_i8.o /<>/build/obj/collectives/device/broadcast_premulsum_u8.o /<>/build/obj/collectives/device/broadcast_premulsum_i32.o /<>/build/obj/collectives/device/broadcast_premulsum_u32.o /<>/build/obj/collectives/device/broadcast_premulsum_i64.o /<>/build/obj/collectives/device/broadcast_premulsum_u64.o /<>/build/obj/collectives/device/broadcast_premulsum_f16.o /<>/build/obj/collectives/device/broadcast_premulsum_f32.o /<>/build/obj/collectives/device/broadcast_premulsum_f64.o /<>/build/obj/collectives/device/broadcast_premulsum_bf16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_sum_i8.o /<>/build/obj/collectives/device/reduce_sum_u8.o /<>/build/obj/collectives/device/reduce_sum_i32.o /<>/build/obj/collectives/device/reduce_sum_u32.o /<>/build/obj/collectives/device/reduce_sum_i64.o /<>/build/obj/collectives/device/reduce_sum_u64.o /<>/build/obj/collectives/device/reduce_sum_f16.o /<>/build/obj/collectives/device/reduce_sum_f32.o /<>/build/obj/collectives/device/reduce_sum_f64.o /<>/build/obj/collectives/device/reduce_sum_bf16.o /<>/build/obj/collectives/device/reduce_prod_i8.o /<>/build/obj/collectives/device/reduce_prod_u8.o /<>/build/obj/collectives/device/reduce_prod_i32.o 
/<>/build/obj/collectives/device/reduce_prod_u32.o /<>/build/obj/collectives/device/reduce_prod_i64.o /<>/build/obj/collectives/device/reduce_prod_u64.o /<>/build/obj/collectives/device/reduce_prod_f16.o /<>/build/obj/collectives/device/reduce_prod_f32.o /<>/build/obj/collectives/device/reduce_prod_f64.o /<>/build/obj/collectives/device/reduce_prod_bf16.o /<>/build/obj/collectives/device/reduce_min_i8.o /<>/build/obj/collectives/device/reduce_min_u8.o /<>/build/obj/collectives/device/reduce_min_i32.o /<>/build/obj/collectives/device/reduce_min_u32.o /<>/build/obj/collectives/device/reduce_min_i64.o /<>/build/obj/collectives/device/reduce_min_u64.o /<>/build/obj/collectives/device/reduce_min_f16.o /<>/build/obj/collectives/device/reduce_min_f32.o /<>/build/obj/collectives/device/reduce_min_f64.o /<>/build/obj/collectives/device/reduce_min_bf16.o /<>/build/obj/collectives/device/reduce_max_i8.o /<>/build/obj/collectives/device/reduce_max_u8.o /<>/build/obj/collectives/device/reduce_max_i32.o /<>/build/obj/collectives/device/reduce_max_u32.o /<>/build/obj/collectives/device/reduce_max_i64.o /<>/build/obj/collectives/device/reduce_max_u64.o /<>/build/obj/collectives/device/reduce_max_f16.o /<>/build/obj/collectives/device/reduce_max_f32.o /<>/build/obj/collectives/device/reduce_max_f64.o /<>/build/obj/collectives/device/reduce_max_bf16.o /<>/build/obj/collectives/device/reduce_premulsum_i8.o /<>/build/obj/collectives/device/reduce_premulsum_u8.o /<>/build/obj/collectives/device/reduce_premulsum_i32.o /<>/build/obj/collectives/device/reduce_premulsum_u32.o /<>/build/obj/collectives/device/reduce_premulsum_i64.o /<>/build/obj/collectives/device/reduce_premulsum_u64.o /<>/build/obj/collectives/device/reduce_premulsum_f16.o /<>/build/obj/collectives/device/reduce_premulsum_f32.o /<>/build/obj/collectives/device/reduce_premulsum_f64.o /<>/build/obj/collectives/device/reduce_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.o 
/<>/build/obj/collectives/device/reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sum_i8.o /<>/build/obj/collectives/device/reduce_scatter_sum_u8.o /<>/build/obj/collectives/device/reduce_scatter_sum_i32.o /<>/build/obj/collectives/device/reduce_scatter_sum_u32.o /<>/build/obj/collectives/device/reduce_scatter_sum_i64.o /<>/build/obj/collectives/device/reduce_scatter_sum_u64.o /<>/build/obj/collectives/device/reduce_scatter_sum_f16.o /<>/build/obj/collectives/device/reduce_scatter_sum_f32.o /<>/build/obj/collectives/device/reduce_scatter_sum_f64.o /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_prod_i8.o /<>/build/obj/collectives/device/reduce_scatter_prod_u8.o /<>/build/obj/collectives/device/reduce_scatter_prod_i32.o /<>/build/obj/collectives/device/reduce_scatter_prod_u32.o /<>/build/obj/collectives/device/reduce_scatter_prod_i64.o /<>/build/obj/collectives/device/reduce_scatter_prod_u64.o /<>/build/obj/collectives/device/reduce_scatter_prod_f16.o /<>/build/obj/collectives/device/reduce_scatter_prod_f32.o /<>/build/obj/collectives/device/reduce_scatter_prod_f64.o /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.o /<>/build/obj/collectives/device/reduce_scatter_min_i8.o /<>/build/obj/collectives/device/reduce_scatter_min_u8.o /<>/build/obj/collectives/device/reduce_scatter_min_i32.o /<>/build/obj/collectives/device/reduce_scatter_min_u32.o /<>/build/obj/collectives/device/reduce_scatter_min_i64.o 
/<>/build/obj/collectives/device/reduce_scatter_min_u64.o /<>/build/obj/collectives/device/reduce_scatter_min_f16.o /<>/build/obj/collectives/device/reduce_scatter_min_f32.o /<>/build/obj/collectives/device/reduce_scatter_min_f64.o /<>/build/obj/collectives/device/reduce_scatter_min_bf16.o /<>/build/obj/collectives/device/reduce_scatter_max_i8.o /<>/build/obj/collectives/device/reduce_scatter_max_u8.o /<>/build/obj/collectives/device/reduce_scatter_max_i32.o /<>/build/obj/collectives/device/reduce_scatter_max_u32.o /<>/build/obj/collectives/device/reduce_scatter_max_i64.o /<>/build/obj/collectives/device/reduce_scatter_max_u64.o /<>/build/obj/collectives/device/reduce_scatter_max_f16.o /<>/build/obj/collectives/device/reduce_scatter_max_f32.o /<>/build/obj/collectives/device/reduce_scatter_max_f64.o /<>/build/obj/collectives/device/reduce_scatter_max_bf16.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.o 
/<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o /<>/build/obj/collectives/device/functions.o /<>/build/obj/collectives/device/onerank_reduce.o /<>/build/obj/collectives/device/devlink.o /<>/build/obj/collectives/device/all_reduce.d /<>/build/obj/collectives/device/broadcast.d /<>/build/obj/collectives/device/reduce.d /<>/build/obj/collectives/device/all_gather.d /<>/build/obj/collectives/device/reduce_scatter.d /<>/build/obj/collectives/device/sendrecv.d /<>/build/obj/collectives/device/onerank_reduce.d /<>/build/obj/collectives/device/functions.d /<>/build/obj/collectives/device/all_reduce.dep /<>/build/obj/collectives/device/broadcast.dep /<>/build/obj/collectives/device/reduce.dep /<>/build/obj/collectives/device/all_gather.dep /<>/build/obj/collectives/device/reduce_scatter.dep /<>/build/obj/collectives/device/sendrecv.dep /<>/build/obj/collectives/device/onerank_reduce.dep /<>/build/obj/collectives/device/functions.dep /<>/build/obj/collectives/device/Makefile.rules /<>/build/obj/collectives/device/colldevice.a make[3]: Leaving directory '/<>/src/collectives/device' rm -rf /<>/build/include /<>/build/lib /<>/build/lib/pkgconfig /<>/build/obj make[2]: Leaving directory '/<>/src' make[1]: Leaving directory '/<>' dh_clean debian/rules binary-arch dh binary-arch dh_update_autotools_config -a dh_autoreconf -a dh_auto_configure -a debian/rules override_dh_auto_build make[1]: Entering directory '/<>' dh_auto_build -- src.build make -j4 "INSTALL=install --strip-program=true" src.build make[2]: Entering directory '/<>' make -C src build BUILDDIR=/<>/build make[3]: Entering directory '/<>/src' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 
-gencode=arch=compute_90,code=compute_90 mkdir -p /<>/build/include Generating nccl.h.in > /<>/build/include/nccl.h Grabbing include/nccl_net.h > /<>/build/include/nccl_net.h sed -e "s/\${nccl:Major}/2/g" \ -e "s/\${nccl:Minor}/18/g" \ -e "s/\${nccl:Patch}/3/g" \ -e "s/\${nccl:Suffix}//g" \ -e "s/\${nccl:Version}/21803/g" \ nccl.h.in > /<>/build/include/nccl.h mkdir -p /<>/build/include mkdir -p /<>/build/lib/pkgconfig install -m 644 include/nccl_net.h /<>/build/include/nccl_net.h Generating nccl.pc.in > /<>/build/lib/pkgconfig/nccl.pc sed -e 's|${nccl:Prefix}|\/usr/local|g' \ -e "s/\${nccl:Major}/2/g" \ -e "s/\${nccl:Minor}/18/g" \ -e "s/\${nccl:Patch}/3/g" \ nccl.pc.in > /<>/build/lib/pkgconfig/nccl.pc Compiling init.cc > /<>/build/obj/init.o mkdir -p `dirname /<>/build/obj/init.o` Compiling init_nvtx.cc > /<>/build/obj/init_nvtx.o cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c init.cc -o /<>/build/obj/init.o mkdir -p `dirname /<>/build/obj/init_nvtx.o` Compiling channel.cc > /<>/build/obj/channel.o mkdir -p `dirname /<>/build/obj/channel.o` Compiling bootstrap.cc > /<>/build/obj/bootstrap.o cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c init_nvtx.cc -o /<>/build/obj/init_nvtx.o mkdir -p `dirname /<>/build/obj/bootstrap.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c channel.cc -o /<>/build/obj/channel.o cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c bootstrap.cc -o /<>/build/obj/bootstrap.o In file included from include/transport.h:10, from include/comm.h:10, from include/channel.h:9, from init.cc:8: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/transport.h:10, from include/comm.h:10, from include/channel.h:9, from channel.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int 
ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from init_nvtx.cc:2: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ init_nvtx.cc:10:1: warning: missing initializer for member ‘nvtxPayloadEnum_t::isFlag’ [-Wmissing-field-initializers] 10 | }; | ^ init_nvtx.cc:10:1: warning: missing initializer for member ‘nvtxPayloadEnum_t::isFlag’ [-Wmissing-field-initializers] init_nvtx.cc:10:1: warning: missing initializer for member ‘nvtxPayloadEnum_t::isFlag’ [-Wmissing-field-initializers] init_nvtx.cc:10:1: warning: missing initializer for member ‘nvtxPayloadEnum_t::isFlag’ [-Wmissing-field-initializers] init_nvtx.cc:10:1: warning: missing initializer for member ‘nvtxPayloadEnum_t::isFlag’ [-Wmissing-field-initializers] In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/core.h:62, from bootstrap.cc:8: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/transport.h:10, from include/comm.h:10, from include/bootstrap.h:11, from bootstrap.cc:10: include/devcomm.h: In 
function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclChannelPeer; size_t = long unsigned int]’: channel.cc:29:7: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ init.cc: In function ‘ncclResult_t commGetSplitInfo(ncclComm*, ncclComm*, int, int, int*, int*, int*)’: init.cc:1271:55: warning: unused parameter ‘comm’ [-Wunused-parameter] 1271 | static ncclResult_t commGetSplitInfo(struct ncclComm* comm, struct ncclComm* parent, int color, int key, int* nRanksRet, int* myRankRet, int* parentRanksRet) { | ~~~~~~~~~~~~~~~~~^~~~ init.cc: At global scope: init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 1622 | }; | ^ init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member 
‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] init.cc:1622:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] init.cc: In function ‘ncclResult_t ncclCommInitAll(ncclComm**, int, const int*)’: init.cc:1649:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 1649 | }; | ^ init.cc:1649:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] init.cc:1649:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] init.cc:1649:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] init.cc:1649:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] init.cc: In function ‘const char* ncclGetLastError(ncclComm_t)’: init.cc:2067:41: warning: unused parameter ‘comm’ [-Wunused-parameter] 2067 | const char* ncclGetLastError(ncclComm_t comm) { | ~~~~~~~~~~~^~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = long unsigned int; size_t = long unsigned int]’: init.cc:348:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, 
const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSharedResources; size_t = long unsigned int]’: init.cc:356:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = int; size_t = long unsigned int]’: init.cc:360:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ bootstrap.cc: In function ‘ncclResult_t bootstrapCreateRoot(ncclBootstrapHandle*, bool)’: bootstrap.cc:169:75: warning: unused parameter ‘idFromEnv’ [-Wunused-parameter] 169 | ncclResult_t bootstrapCreateRoot(struct ncclBootstrapHandle* handle, bool idFromEnv) { | ~~~~~^~~~~~~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = collNetTrySetup(ncclComm_t, ncclComm_t, ncclTopoGraph*)::collnetShareInfo; size_t = long unsigned int]’: init.cc:571:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** 
ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclCollNetSharedRes; size_t = long unsigned int]’: init.cc:627:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = unsigned char [4][10]; size_t = long unsigned int]’: init.cc:658:7: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclPeerInfo; size_t = long unsigned int]’: init.cc:789:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of 
‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = initTransportsRank(ncclComm*, ncclComm*)::allGatherInfo; size_t = long unsigned int]’: init.cc:923:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclNodeRanks; size_t = long unsigned int]’: init.cc:957:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclTopoRanks*; size_t = long unsigned int]’: init.cc:991:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclComm; size_t = long unsigned int]’: init.cc:1583:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, 
size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = unsigned int; size_t = long unsigned int]’: init.cc:1585:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclCommInitRankAsyncJob; size_t = long unsigned int]’: init.cc:1592:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclCommFinalizeAsyncJob; size_t = long unsigned int]’: init.cc:1794:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ bootstrap.cc: In function ‘ncclResult_t 
bootstrapInit(ncclBootstrapHandle*, ncclComm*)’: bootstrap.cc:235:29: warning: missing initializer for member ‘extInfo::nranks’ [-Wmissing-field-initializers] 235 | struct extInfo info = { 0 }; | ^ bootstrap.cc:235:29: warning: missing initializer for member ‘extInfo::extAddressListenRoot’ [-Wmissing-field-initializers] bootstrap.cc:235:29: warning: missing initializer for member ‘extInfo::extAddressListen’ [-Wmissing-field-initializers] include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = gdr_mem_desc; size_t = long unsigned int]’: include/gdrwrap.h:218:3: required from ‘ncclResult_t ncclGdrCudaCalloc(T**, T**, size_t, void**) [with T = ncclWork; size_t = long unsigned int]’ init.cc:407:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSocketAddress; size_t = long unsigned int]’: bootstrap.cc:107:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSocket; size_t = long unsigned int]’: bootstrap.cc:174:3: required from here include/alloc.h:44:65: warning: unused parameter 
‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = bootstrapRootArgs; size_t = long unsigned int]’: bootstrap.cc:179:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = bootstrapState; size_t = long unsigned int]’: bootstrap.cc:237:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = unexConn; size_t = long unsigned int]’: bootstrap.cc:478:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, 
int line) { | ~~~~^~~~ Compiling transport.cc > /<>/build/obj/transport.o mkdir -p `dirname /<>/build/obj/transport.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport.cc -o /<>/build/obj/transport.o In file included from include/transport.h:10, from include/comm.h:10, from transport.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ Compiling enqueue.cc > /<>/build/obj/enqueue.o mkdir -p `dirname /<>/build/obj/enqueue.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c enqueue.cc -o /<>/build/obj/enqueue.o In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclConnect; size_t = long unsigned int]’: transport.cc:252:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclTransportCollNetSetup(ncclComm*, ncclTopoGraph*, ncclChannel*, int, int, int, int)::<unnamed struct>; size_t = long unsigned int]’: transport.cc:255:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from enqueue.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’
[-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling group.cc > /<>/build/obj/group.o mkdir -p `dirname /<>/build/obj/group.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c group.cc -o /<>/build/obj/group.o In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/transport.h:10, from include/comm.h:10, from include/group.h:11, from group.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ enqueue.cc: In function ‘ncclResult_t ncclInitKernelsForDevice(int, size_t*)’: enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::constSizeBytes’ [-Wmissing-field-initializers] 114 | cudaFuncAttributes attr = {0}; | ^ enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::localSizeBytes’ [-Wmissing-field-initializers] 
enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::maxThreadsPerBlock’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::numRegs’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::ptxVersion’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::binaryVersion’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::cacheModeCA’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::maxDynamicSharedSizeBytes’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::preferredShmemCarveout’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::clusterDimMustBeSet’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::requiredClusterWidth’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::requiredClusterHeight’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::requiredClusterDepth’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::clusterSchedulingPolicyPreference’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::nonPortableClusterSizeAllowed’ [-Wmissing-field-initializers] enqueue.cc:114:35: warning: missing initializer for member ‘cudaFuncAttributes::reserved’ [-Wmissing-field-initializers] enqueue.cc: In function ‘ncclResult_t addP2pToPlan(ncclComm*, ncclKernelPlan*, int*, bool, int, int, void*, size_t, bool)’: enqueue.cc:361:3: warning: missing 
initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 361 | }; | ^ enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] enqueue.cc:361:3: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::proto’ [-Wmissing-field-initializers] 375 | struct ncclWorkElemP2p elem = {0}; | ^ enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::p2pType’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::nWarps’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::warpStart’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::ngroups’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::buffHi32’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for 
member ‘ncclWorkElemP2p::buffLo32’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::countHi32’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::countLo32’ [-Wmissing-field-initializers] enqueue.cc:375:35: warning: missing initializer for member ‘ncclWorkElemP2p::chunkSize’ [-Wmissing-field-initializers] enqueue.cc: In function ‘ncclResult_t ncclLaunchKernel(ncclComm*, ncclKernelPlan*)’: enqueue.cc:1052:41: warning: missing initializer for member ‘cudaLaunchConfig_st::blockDim’ [-Wmissing-field-initializers] 1052 | cudaLaunchConfig_t launchConfig = {0}; | ^ enqueue.cc:1052:41: warning: missing initializer for member ‘cudaLaunchConfig_st::dynamicSmemBytes’ [-Wmissing-field-initializers] enqueue.cc:1052:41: warning: missing initializer for member ‘cudaLaunchConfig_st::stream’ [-Wmissing-field-initializers] enqueue.cc:1052:41: warning: missing initializer for member ‘cudaLaunchConfig_st::attrs’ [-Wmissing-field-initializers] enqueue.cc:1052:41: warning: missing initializer for member ‘cudaLaunchConfig_st::numAttrs’ [-Wmissing-field-initializers] In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclPreconnectJob; size_t = long unsigned int]’: group.cc:266:7: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | 
ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling debug.cc > /<>/build/obj/debug.o mkdir -p `dirname /<>/build/obj/debug.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c debug.cc -o /<>/build/obj/debug.o In function ‘addP2pToPlan(ncclComm*, ncclKernelPlan*, int*, bool, int, int, void*, unsigned long, bool)’, inlined from ‘scheduleP2pTasksToPlan(ncclComm*, ncclKernelPlan*, int*)’ at enqueue.cc:686:13: enqueue.cc:387:20: warning: ‘fuseOk’ may be used uninitialized [-Wmaybe-uninitialized] 387 | appendWorkElemP2p(comm, plan, channelId, &elem, fuseOk); | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ enqueue.cc: In function ‘scheduleP2pTasksToPlan(ncclComm*, ncclKernelPlan*, int*)’: enqueue.cc:639:8: note: ‘fuseOk’ was declared here 639 | bool fuseOk; | ^~~~~~ In file included from include/core.h:62, from debug.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ Compiling proxy.cc > /<>/build/obj/proxy.o mkdir -p `dirname /<>/build/obj/proxy.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c proxy.cc -o /<>/build/obj/proxy.o Compiling net.cc > /<>/build/obj/net.o mkdir -p `dirname /<>/build/obj/net.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c net.cc -o /<>/build/obj/net.o In file included from include/transport.h:10, from include/comm.h:10, from proxy.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/transport.h:10, from include/comm.h:10, from include/net.h:12, from net.cc:1: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { 
return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ proxy.cc: In function ‘void ncclDumpProxyState(int)’: proxy.cc:787:29: warning: unused parameter ‘signal’ [-Wunused-parameter] 787 | void ncclDumpProxyState(int signal) { | ~~~~^~~~~~ proxy.cc: In function ‘ncclResult_t ncclProxyConnect(ncclComm*, int, int, int, ncclProxyConnector*)’: proxy.cc:1037:35: warning: missing initializer for member ‘ncclProxyInitReq::send’ [-Wmissing-field-initializers] 1037 | struct ncclProxyInitReq req = {0}; | ^ proxy.cc:1037:35: warning: missing initializer for member ‘ncclProxyInitReq::tpLocalRank’ [-Wmissing-field-initializers] proxy.cc:1037:35: warning: missing initializer for member ‘ncclProxyInitReq::tpRank’ [-Wmissing-field-initializers] proxy.cc:1037:35: warning: missing initializer for member ‘ncclProxyInitReq::sameProcess’ [-Wmissing-field-initializers] proxy.cc:1044:37: warning: missing initializer for member ‘ncclProxyInitResp::devShmPath’ [-Wmissing-field-initializers] 1044 | struct ncclProxyInitResp resp = {0}; | ^ proxy.cc: In function ‘ncclResult_t ncclProxyClientConvertFdBlocking(ncclComm*, ncclProxyConnector*, int, int*)’: proxy.cc:1070:38: warning: missing initializer for member ‘ncclIpcSocket::socketName’ [-Wmissing-field-initializers] 1070 | struct ncclIpcSocket ipcSock = { 0 }; | ^ proxy.cc:1070:38: warning: missing initializer for member ‘ncclIpcSocket::abortFlag’ [-Wmissing-field-initializers] proxy.cc: In function ‘ncclResult_t proxyConvertFd(ncclProxyLocalPeer*, void*, ncclProxyState*, int)’: proxy.cc:1288:38: warning: missing initializer for member ‘ncclIpcSocket::socketName’ [-Wmissing-field-initializers] 1288 | struct ncclIpcSocket ipcSock = { 0 }; | ^ proxy.cc:1288:38: warning: missing 
initializer for member ‘ncclIpcSocket::abortFlag’ [-Wmissing-field-initializers] In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclExpectedProxyResponse; size_t = long unsigned int]’: proxy.cc:81:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyPool; size_t = long unsigned int]’: proxy.cc:193:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyConnection; size_t = long unsigned int]’: proxy.cc:943:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSocket; size_t = long unsigned int]’: proxy.cc:1021:5: 
required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyOps; size_t = long unsigned int]’: proxy.cc:1022:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = void*; size_t = long unsigned int]’: proxy.cc:1023:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyAsyncOp; size_t = long unsigned int]’: proxy.cc:1357:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t 
ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = char; size_t = long unsigned int]’: proxy.cc:1365:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyState; size_t = long unsigned int]’: proxy.cc:1552:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ net.cc: In function ‘ncclResult_t ncclNet_v4_as_v6_isend(void*, void*, int, int, void*, void**)’: net.cc:37:86: warning: unused parameter ‘tag’ [-Wunused-parameter] 37 | static ncclResult_t ncclNet_v4_as_v6_isend(void* sendComm, void* data, int size, int tag, void* mhandle, void** request) { | ~~~~^~~ net.cc: In function ‘ncclResult_t ncclNet_v4_as_v6_irecv(void*, int, void**, int*, int*, void**, void**)’: net.cc:41:97: warning: unused parameter ‘tags’ [-Wunused-parameter] 41 | 
static ncclResult_t ncclNet_v4_as_v6_irecv(void* recvComm, int n, void** data, int* sizes, int* tags, void** mhandles, void** request) { | ~~~~~^~~~ Compiling misc/cudawrap.cc > /<>/build/obj/misc/cudawrap.o mkdir -p `dirname /<>/build/obj/misc/cudawrap.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/cudawrap.cc -o /<>/build/obj/misc/cudawrap.o Compiling misc/nvmlwrap.cc > /<>/build/obj/misc/nvmlwrap.o mkdir -p `dirname /<>/build/obj/misc/nvmlwrap.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/nvmlwrap.cc -o /<>/build/obj/misc/nvmlwrap.o Compiling misc/ibvsymbols.cc > /<>/build/obj/misc/ibvsymbols.o mkdir -p `dirname /<>/build/obj/misc/ibvsymbols.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/ibvsymbols.cc -o /<>/build/obj/misc/ibvsymbols.o In file included from include/core.h:62, from misc/ibvsymbols.cc:64: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ Compiling misc/ibvwrap.cc > /<>/build/obj/misc/ibvwrap.o mkdir -p `dirname /<>/build/obj/misc/ibvwrap.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/ibvwrap.cc -o /<>/build/obj/misc/ibvwrap.o In file included from include/core.h:62, from include/ibvwrap.h:21, from misc/ibvwrap.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ Compiling misc/gdrwrap.cc > /<>/build/obj/misc/gdrwrap.o mkdir -p `dirname /<>/build/obj/misc/gdrwrap.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/gdrwrap.cc -o /<>/build/obj/misc/gdrwrap.o In file included from include/core.h:62, from misc/gdrwrap.cc:10: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ Compiling misc/utils.cc > /<>/build/obj/misc/utils.o mkdir -p `dirname /<>/build/obj/misc/utils.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/utils.cc -o /<>/build/obj/misc/utils.o Compiling misc/argcheck.cc > /<>/build/obj/misc/argcheck.o mkdir -p `dirname /<>/build/obj/misc/argcheck.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/argcheck.cc -o /<>/build/obj/misc/argcheck.o In file included from include/core.h:62, from misc/utils.cc:8: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/core.h:62, from include/argcheck.h:10, from misc/argcheck.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/info.h:11, from include/argcheck.h:11: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling misc/socket.cc > /<>/build/obj/misc/socket.o mkdir -p `dirname /<>/build/obj/misc/socket.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/socket.cc -o /<>/build/obj/misc/socket.o Compiling misc/shmutils.cc > /<>/build/obj/misc/shmutils.o mkdir -p `dirname /<>/build/obj/misc/shmutils.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/shmutils.cc -o /<>/build/obj/misc/shmutils.o Compiling misc/profiler.cc > /<>/build/obj/misc/profiler.o mkdir -p `dirname /<>/build/obj/misc/profiler.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/profiler.cc -o /<>/build/obj/misc/profiler.o Compiling misc/param.cc > /<>/build/obj/misc/param.o mkdir -p `dirname /<>/build/obj/misc/param.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/param.cc -o /<>/build/obj/misc/param.o In file included from include/proxy.h:10, from include/profiler.h:10, from misc/profiler.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling misc/strongstream.cc > /<>/build/obj/misc/strongstream.o mkdir -p `dirname /<>/build/obj/misc/strongstream.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/strongstream.cc -o /<>/build/obj/misc/strongstream.o Compiling misc/ipcsocket.cc > /<>/build/obj/misc/ipcsocket.o mkdir -p `dirname /<>/build/obj/misc/ipcsocket.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c misc/ipcsocket.cc -o /<>/build/obj/misc/ipcsocket.o Compiling transport/p2p.cc > /<>/build/obj/transport/p2p.o mkdir -p `dirname /<>/build/obj/transport/p2p.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/p2p.cc -o /<>/build/obj/transport/p2p.o In file included from include/core.h:62, from include/info.h:13, from include/proxy.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ misc/profiler.cc: In function ‘ncclResult_t ncclProfilingRecord(ncclProxyArgs*, int, int, int)’: misc/profiler.cc:113:56: warning: unused parameter ‘args’ [-Wunused-parameter] 113 | ncclResult_t ncclProfilingRecord(struct ncclProxyArgs* args, int sub, int step, int state) { return ncclSuccess; } | ~~~~~~~~~~~~~~~~~~~~~~^~~~ misc/profiler.cc:113:66: warning: unused parameter ‘sub’ [-Wunused-parameter] 113 | ncclResult_t ncclProfilingRecord(struct ncclProxyArgs* args, int sub, int step, int state) { return ncclSuccess; } | ~~~~^~~ misc/profiler.cc:113:75: warning: unused parameter ‘step’ [-Wunused-parameter] 113 | ncclResult_t ncclProfilingRecord(struct ncclProxyArgs* args, int sub, int step, int state) { return ncclSuccess; } | ~~~~^~~~ misc/profiler.cc:113:85: warning: unused parameter ‘state’ [-Wunused-parameter] 113 | ncclResult_t ncclProfilingRecord(struct ncclProxyArgs* args, int sub, int 
step, int state) { return ncclSuccess; } | ~~~~^~~~~ In file included from include/transport.h:10, from include/comm.h:10, from transport/p2p.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling transport/shm.cc > /<>/build/obj/transport/shm.o mkdir -p `dirname /<>/build/obj/transport/shm.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/shm.cc -o /<>/build/obj/transport/shm.o Compiling transport/net.cc > /<>/build/obj/transport/net.o mkdir -p `dirname /<>/build/obj/transport/net.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/net.cc -o /<>/build/obj/transport/net.o In file included from include/transport.h:10, from include/comm.h:10, from transport/shm.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ transport/p2p.cc: In function ‘ncclResult_t p2pCanConnect(int*, ncclTopoSystem*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*)’: transport/p2p.cc:103:89: warning: unused parameter ‘graph’ [-Wunused-parameter] 103 | ncclResult_t p2pCanConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/p2p.cc: In function ‘ncclResult_t p2pRecvSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/p2p.cc:395:71: warning: unused parameter ‘channelId’ [-Wunused-parameter] 395 | struct ncclConnect* connectInfo, struct ncclConnector * recv, int channelId, int connIndex) { | ~~~~^~~~~~~~~ transport/p2p.cc: In function ‘ncclResult_t 
p2pSendConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/p2p.cc:445:96: warning: unused parameter ‘nranks’ [-Wunused-parameter] 445 | static ncclResult_t p2pSendConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* send) { | ~~~~^~~~~~ transport/p2p.cc: In function ‘ncclResult_t p2pRecvConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/p2p.cc:481:89: warning: unused parameter ‘nranks’ [-Wunused-parameter] 481 | ncclResult_t p2pRecvConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* recv) { | ~~~~^~~~~~ transport/p2p.cc: In function ‘ncclResult_t p2pRecvProxySetup(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/p2p.cc:600:102: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 600 | static ncclResult_t p2pRecvProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ transport/p2p.cc: In function ‘ncclResult_t p2pSendProxyConnect(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/p2p.cc:620:104: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 620 | static ncclResult_t p2pSendProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ transport/p2p.cc:620:150: warning: unused parameter ‘respBuff’ [-Wunused-parameter] 620 | static ncclResult_t p2pSendProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~~^~~~~~~~ transport/p2p.cc:620:164: warning: unused parameter ‘respSize’ [-Wunused-parameter] 620 | static ncclResult_t p2pSendProxyConnect(struct 
ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~^~~~~~~~ transport/p2p.cc:620:179: warning: unused parameter ‘done’ [-Wunused-parameter] 620 | static ncclResult_t p2pSendProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ transport/p2p.cc: In function ‘ncclResult_t p2pSendProxyFree(ncclProxyConnection*, ncclProxyState*)’: transport/p2p.cc:634:101: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 634 | static ncclResult_t p2pSendProxyFree(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ transport/p2p.cc: In function ‘ncclResult_t p2pRecvProxyFree(ncclProxyConnection*, ncclProxyState*)’: transport/p2p.cc:666:101: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 666 | static ncclResult_t p2pRecvProxyFree(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ In file included from include/transport.h:10, from include/comm.h:10, from transport/net.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = p2pResources; size_t = long unsigned int]’: transport/p2p.cc:332:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 
44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = p2pShmProxyInfo; size_t = long unsigned int]’: transport/p2p.cc:562:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = p2pCuMemProxyInfo; size_t = long unsigned int]’: transport/p2p.cc:589:7: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ transport/shm.cc: In function ‘ncclResult_t shmCanConnect(int*, ncclTopoSystem*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*)’: transport/shm.cc:50:96: warning: unused 
parameter ‘graph’ [-Wunused-parameter] 50 | static ncclResult_t shmCanConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/shm.cc: In function ‘ncclResult_t shmSendSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/shm.cc:76:79: warning: unused parameter ‘graph’ [-Wunused-parameter] 76 | static ncclResult_t shmSendSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* send, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/shm.cc:76:226: warning: unused parameter ‘connIndex’ [-Wunused-parameter] 76 | static ncclResult_t shmSendSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* send, int channelId, int connIndex) { | ~~~~^~~~~~~~~ transport/shm.cc: In function ‘ncclResult_t shmRecvSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/shm.cc:99:79: warning: unused parameter ‘graph’ [-Wunused-parameter] 99 | static ncclResult_t shmRecvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/shm.cc:99:107: warning: unused parameter ‘myInfo’ [-Wunused-parameter] 99 | static ncclResult_t shmRecvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~~ transport/shm.cc:99:136: warning: unused parameter 
‘peerInfo’ [-Wunused-parameter] 99 | static ncclResult_t shmRecvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ transport/shm.cc:99:211: warning: unused parameter ‘channelId’ [-Wunused-parameter] 99 | static ncclResult_t shmRecvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~^~~~~~~~~ transport/shm.cc:99:226: warning: unused parameter ‘connIndex’ [-Wunused-parameter] 99 | static ncclResult_t shmRecvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~^~~~~~~~~ transport/shm.cc: In function ‘ncclResult_t shmSendConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/shm.cc:161:130: warning: missing initializer for member ‘shmProxyInfo::step’ [-Wmissing-field-initializers] 161 | struct shmProxyInfo proxyInfo = { NULL, NULL, send->conn.buffs[NCCL_PROTO_SIMPLE], resources->hostMem, resources->remHostMem }; | ^ transport/shm.cc:161:130: warning: missing initializer for member ‘shmProxyInfo::stream’ [-Wmissing-field-initializers] transport/shm.cc:161:130: warning: missing initializer for member ‘shmProxyInfo::events’ [-Wmissing-field-initializers] transport/shm.cc:135:96: warning: unused parameter ‘nranks’ [-Wunused-parameter] 135 | static ncclResult_t shmSendConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* send) { | ~~~~^~~~~~ transport/shm.cc:135:108: warning: unused parameter ‘rank’ [-Wunused-parameter] 135 | static ncclResult_t shmSendConnect(struct ncclComm* comm, struct 
ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* send) { | ~~~~^~~~ transport/shm.cc: In function ‘ncclResult_t shmRecvConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/shm.cc:191:130: warning: missing initializer for member ‘shmProxyInfo::step’ [-Wmissing-field-initializers] 191 | struct shmProxyInfo proxyInfo = { NULL, NULL, recv->conn.buffs[NCCL_PROTO_SIMPLE], resources->remHostMem, resources->hostMem }; | ^ transport/shm.cc:191:130: warning: missing initializer for member ‘shmProxyInfo::stream’ [-Wmissing-field-initializers] transport/shm.cc:191:130: warning: missing initializer for member ‘shmProxyInfo::events’ [-Wmissing-field-initializers] transport/shm.cc:170:96: warning: unused parameter ‘nranks’ [-Wunused-parameter] 170 | static ncclResult_t shmRecvConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* recv) { | ~~~~^~~~~~ transport/shm.cc:170:108: warning: unused parameter ‘rank’ [-Wunused-parameter] 170 | static ncclResult_t shmRecvConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* recv) { | ~~~~^~~~ transport/shm.cc: In function ‘ncclResult_t shmSendProxyConnect(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/shm.cc:219:179: warning: unused parameter ‘done’ [-Wunused-parameter] 219 | static ncclResult_t shmSendProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ transport/shm.cc: In function ‘ncclResult_t shmRecvProxyConnect(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/shm.cc:237:179: warning: unused parameter ‘done’ [-Wunused-parameter] 237 | static ncclResult_t shmRecvProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* 
done) { | ~~~~~^~~~ transport/shm.cc: In function ‘ncclResult_t shmSendProxyFree(ncclProxyConnection*, ncclProxyState*)’: transport/shm.cc:255:101: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 255 | static ncclResult_t shmSendProxyFree(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ transport/shm.cc: In function ‘ncclResult_t shmRecvProxyFree(ncclProxyConnection*, ncclProxyState*)’: transport/shm.cc:270:101: warning: unused parameter ‘proxyState’ [-Wunused-parameter] 270 | static ncclResult_t shmRecvProxyFree(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = shmSendResources; size_t = long unsigned int]’: transport/shm.cc:78:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = shmRecvResources; size_t = long unsigned int]’: transport/shm.cc:101:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) 
{ | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = shmProxyInfo; size_t = long unsigned int]’: transport/shm.cc:221:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ transport/net.cc: In function ‘ncclResult_t canConnect(int*, ncclTopoSystem*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*)’: transport/net.cc:138:93: warning: unused parameter ‘graph’ [-Wunused-parameter] 138 | static ncclResult_t canConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/net.cc: In function ‘ncclResult_t sendSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::tpLocalRank’ [-Wmissing-field-initializers] 165 | struct setupReq req = { 0 }; | ^ transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::tpRemoteRank’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::shared’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::netDev’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::useGdr’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing 
initializer for member ‘setupReq::needFlush’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::channelId’ [-Wmissing-field-initializers] transport/net.cc:165:29: warning: missing initializer for member ‘setupReq::connIndex’ [-Wmissing-field-initializers] transport/net.cc: In function ‘ncclResult_t recvSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::tpLocalRank’ [-Wmissing-field-initializers] 203 | struct setupReq req = { 0 }; | ^ transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::tpRemoteRank’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::shared’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::netDev’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::useGdr’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::needFlush’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::channelId’ [-Wmissing-field-initializers] transport/net.cc:203:29: warning: missing initializer for member ‘setupReq::connIndex’ [-Wmissing-field-initializers] transport/net.cc: In function ‘ncclResult_t sendConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/net.cc:270:93: warning: unused parameter ‘nranks’ [-Wunused-parameter] 270 | static ncclResult_t sendConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* send) { | ~~~~^~~~~~ transport/net.cc:270:105: warning: unused parameter ‘rank’ [-Wunused-parameter] 270 | static ncclResult_t sendConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct 
ncclConnector* send) { | ~~~~^~~~ transport/net.cc: In function ‘ncclResult_t recvConnect(ncclComm*, ncclConnect*, int, int, ncclConnector*)’: transport/net.cc:346:93: warning: unused parameter ‘nranks’ [-Wunused-parameter] 346 | static ncclResult_t recvConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* recv) { | ~~~~^~~~~~ transport/net.cc:346:105: warning: unused parameter ‘rank’ [-Wunused-parameter] 346 | static ncclResult_t recvConnect(struct ncclComm* comm, struct ncclConnect* connectInfo, int nranks, int rank, struct ncclConnector* recv) { | ~~~~^~~~ transport/net.cc: In function ‘ncclResult_t sendProxySetup(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/net.cc:499:145: warning: unused parameter ‘respBuff’ [-Wunused-parameter] 499 | static ncclResult_t sendProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~~^~~~~~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = connectMap; size_t = long unsigned int]’: transport/net.cc:278:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyPeer*; size_t = long unsigned int]’: transport/net.cc:427:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char 
*filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclProxyPeer; size_t = long unsigned int]’: transport/net.cc:431:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = sendResources; size_t = long unsigned int]’: transport/net.cc:504:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = recvResources; size_t = long unsigned int]’: transport/net.cc:532:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t 
ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSharedNetComms; size_t = long unsigned int]’: transport/net.cc:577:9: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = gdr_mem_desc; size_t = long unsigned int]’: include/gdrwrap.h:218:3: required from ‘ncclResult_t ncclGdrCudaCalloc(T**, T**, size_t, void**) [with T = long unsigned int; size_t = long unsigned int]’ transport/net.cc:650:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling transport/net_socket.cc > /<>/build/obj/transport/net_socket.o mkdir -p `dirname /<>/build/obj/transport/net_socket.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/net_socket.cc -o /<>/build/obj/transport/net_socket.o In file included from include/transport.h:10, from include/comm.h:10, from transport/net_socket.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketInit(ncclDebugLogger_t)’: transport/net_socket.cc:38:50: warning: unused parameter ‘logFunction’ [-Wunused-parameter] 38 | ncclResult_t ncclNetSocketInit(ncclDebugLogger_t logFunction) { | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketRegMr(void*, void*, int, int, void**)’: transport/net_socket.cc:535:39: warning: unused parameter ‘comm’ [-Wunused-parameter] 535 | ncclResult_t ncclNetSocketRegMr(void* comm, void* data, int size, int type, void** mhandle) { | ~~~~~~^~~~ transport/net_socket.cc:535:51: warning: unused parameter ‘data’ [-Wunused-parameter] 535 | ncclResult_t ncclNetSocketRegMr(void* comm, void* data, int size, int type, void** mhandle) { | ~~~~~~^~~~ 
transport/net_socket.cc:535:61: warning: unused parameter ‘size’ [-Wunused-parameter] 535 | ncclResult_t ncclNetSocketRegMr(void* comm, void* data, int size, int type, void** mhandle) { | ~~~~^~~~ transport/net_socket.cc:535:84: warning: unused parameter ‘mhandle’ [-Wunused-parameter] 535 | ncclResult_t ncclNetSocketRegMr(void* comm, void* data, int size, int type, void** mhandle) { | ~~~~~~~^~~~~~~ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketDeregMr(void*, void*)’: transport/net_socket.cc:538:41: warning: unused parameter ‘comm’ [-Wunused-parameter] 538 | ncclResult_t ncclNetSocketDeregMr(void* comm, void* mhandle) { return ncclSuccess; } | ~~~~~~^~~~ transport/net_socket.cc:538:53: warning: unused parameter ‘mhandle’ [-Wunused-parameter] 538 | ncclResult_t ncclNetSocketDeregMr(void* comm, void* mhandle) { return ncclSuccess; } | ~~~~~~^~~~~~~ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketIsend(void*, void*, int, int, void*, void**)’: transport/net_socket.cc:540:75: warning: unused parameter ‘tag’ [-Wunused-parameter] 540 | ncclResult_t ncclNetSocketIsend(void* sendComm, void* data, int size, int tag, void* mhandle, void** request) { | ~~~~^~~ transport/net_socket.cc:540:86: warning: unused parameter ‘mhandle’ [-Wunused-parameter] 540 | ncclResult_t ncclNetSocketIsend(void* sendComm, void* data, int size, int tag, void* mhandle, void** request) { | ~~~~~~^~~~~~~ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketIrecv(void*, int, void**, int*, int*, void**, void**)’: transport/net_socket.cc:546:86: warning: unused parameter ‘tags’ [-Wunused-parameter] 546 | ncclResult_t ncclNetSocketIrecv(void* recvComm, int n, void** data, int* sizes, int* tags, void** mhandles, void** request) { | ~~~~~^~~~ transport/net_socket.cc:546:99: warning: unused parameter ‘mhandles’ [-Wunused-parameter] 546 | ncclResult_t ncclNetSocketIrecv(void* recvComm, int n, void** data, int* sizes, int* tags, void** mhandles, void** request) { | 
~~~~~~~^~~~~~~~ transport/net_socket.cc: In function ‘ncclResult_t ncclNetSocketIflush(void*, int, void**, int*, void**, void**)’: transport/net_socket.cc:553:40: warning: unused parameter ‘recvComm’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~~~^~~~~~~~ transport/net_socket.cc:553:54: warning: unused parameter ‘n’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~^ transport/net_socket.cc:553:64: warning: unused parameter ‘data’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~~~~^~~~ transport/net_socket.cc:553:75: warning: unused parameter ‘sizes’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~~^~~~~ transport/net_socket.cc:553:89: warning: unused parameter ‘mhandles’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~~~~^~~~~~~~ transport/net_socket.cc:553:106: warning: unused parameter ‘request’ [-Wunused-parameter] 553 | ncclResult_t ncclNetSocketIflush(void* recvComm, int n, void** data, int* sizes, void** mhandles, void** request) { | ~~~~~~~^~~~~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclNetSocketListenComm; size_t = long unsigned int]’: transport/net_socket.cc:291:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ 
[-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclNetSocketComm; size_t = long unsigned int]’: transport/net_socket.cc:320:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclSocket; size_t = long unsigned int]’: transport/net_socket.cc:370:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclNetSocketTask; size_t = long unsigned int]’: transport/net_socket.cc:432:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling transport/net_ib.cc > /<>/build/obj/transport/net_ib.o mkdir -p `dirname /<>/build/obj/transport/net_ib.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/net_ib.cc -o /<>/build/obj/transport/net_ib.o Compiling transport/coll_net.cc > /<>/build/obj/transport/coll_net.o mkdir -p `dirname /<>/build/obj/transport/coll_net.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/coll_net.cc -o /<>/build/obj/transport/coll_net.o In file included from include/transport.h:10, from include/comm.h:10, from transport/coll_net.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from transport/net_ib.cc:8: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/transport.h:10, from include/comm.h:10, from 
include/net.h:12, from transport/net_ib.cc:10: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ transport/net_ib.cc: In function ‘ncclResult_t ncclIbInit(ncclDebugLogger_t)’: transport/net_ib.cc:159:43: warning: unused parameter ‘logFunction’ [-Wunused-parameter] 159 | ncclResult_t ncclIbInit(ncclDebugLogger_t logFunction) { | ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ transport/net_ib.cc: In function ‘ncclResult_t ncclIbGdrSupport(int)’: transport/net_ib.cc:281:35: warning: unused parameter ‘ibDev’ [-Wunused-parameter] 281 | ncclResult_t ncclIbGdrSupport(int ibDev) { | ~~~~^~~~~ transport/net_ib.cc: In function ‘ncclResult_t ncclIbRegMrDmaBuf(void*, void*, size_t, int, uint64_t, int, void**)’: transport/net_ib.cc:879:73: warning: unused parameter ‘type’ [-Wunused-parameter] 879 | ncclResult_t ncclIbRegMrDmaBuf(void* comm, void* data, size_t size, int type, uint64_t offset, int fd, void** mhandle) { | ~~~~^~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclIbListenComm; size_t = long unsigned int]’: transport/net_ib.cc:598:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int 
line) { | ~~~~^~~~ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ transport/coll_net.cc: In function ‘ncclResult_t canConnect(int*, ncclTopoSystem*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*)’: transport/coll_net.cc:134:65: warning: unused parameter ‘topo’ [-Wunused-parameter] 134 | static ncclResult_t canConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~ transport/coll_net.cc:134:93: warning: unused parameter ‘graph’ [-Wunused-parameter] 134 | static ncclResult_t canConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/coll_net.cc:134:121: warning: unused parameter ‘info1’ [-Wunused-parameter] 134 | static ncclResult_t canConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/coll_net.cc:134:149: warning: unused parameter ‘info2’ [-Wunused-parameter] 134 | static ncclResult_t canConnect(int* ret, struct ncclTopoSystem* topo, struct ncclTopoGraph* graph, struct ncclPeerInfo* info1, struct ncclPeerInfo* info2) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~ transport/coll_net.cc: In function ‘ncclResult_t sendSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/coll_net.cc:151:29: warning: missing initializer for member ‘setupReq::useGdr’ [-Wmissing-field-initializers] 151 | struct setupReq req = { 0 }; | ^ transport/coll_net.cc:151:29: warning: missing initializer for member ‘setupReq::needFlush’ 
[-Wmissing-field-initializers] transport/coll_net.cc:151:29: warning: missing initializer for member ‘setupReq::collNet’ [-Wmissing-field-initializers] transport/coll_net.cc:150:133: warning: unused parameter ‘peerInfo’ [-Wunused-parameter] 150 | static ncclResult_t sendSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* send, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ transport/coll_net.cc:150:163: warning: unused parameter ‘connectInfo’ [-Wunused-parameter] 150 | static ncclResult_t sendSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* send, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ transport/coll_net.cc: In function ‘ncclResult_t recvSetup(ncclComm*, ncclTopoGraph*, ncclPeerInfo*, ncclPeerInfo*, ncclConnect*, ncclConnector*, int, int)’: transport/coll_net.cc:171:29: warning: missing initializer for member ‘setupReq::useGdr’ [-Wmissing-field-initializers] 171 | struct setupReq req = { 0 }; | ^ transport/coll_net.cc:171:29: warning: missing initializer for member ‘setupReq::needFlush’ [-Wmissing-field-initializers] transport/coll_net.cc:171:29: warning: missing initializer for member ‘setupReq::collNet’ [-Wmissing-field-initializers] transport/coll_net.cc:170:133: warning: unused parameter ‘peerInfo’ [-Wunused-parameter] 170 | static ncclResult_t recvSetup(struct ncclComm* comm, struct ncclTopoGraph* graph, struct ncclPeerInfo* myInfo, struct ncclPeerInfo* peerInfo, struct ncclConnect* connectInfo, struct ncclConnector* recv, int channelId, int connIndex) { | ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ transport/coll_net.cc: In function ‘ncclResult_t sendFree(ncclConnector*)’: transport/coll_net.cc:278:52: warning: unused parameter ‘send’ [-Wunused-parameter] 278 | static ncclResult_t 
sendFree(struct ncclConnector* send) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~ transport/coll_net.cc: In function ‘ncclResult_t recvFree(ncclConnector*)’: transport/coll_net.cc:282:52: warning: unused parameter ‘recv’ [-Wunused-parameter] 282 | static ncclResult_t recvFree(struct ncclConnector* recv) { | ~~~~~~~~~~~~~~~~~~~~~~^~~~ transport/coll_net.cc: In function ‘ncclResult_t sendProxySetup(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/coll_net.cc:286:145: warning: unused parameter ‘respBuff’ [-Wunused-parameter] 286 | static ncclResult_t sendProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~~^~~~~~~~ transport/coll_net.cc:286:159: warning: unused parameter ‘respSize’ [-Wunused-parameter] 286 | static ncclResult_t sendProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~^~~~~~~~ transport/coll_net.cc:286:174: warning: unused parameter ‘done’ [-Wunused-parameter] 286 | static ncclResult_t sendProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ transport/coll_net.cc: In function ‘ncclResult_t recvProxySetup(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/coll_net.cc:394:174: warning: unused parameter ‘done’ [-Wunused-parameter] 394 | static ncclResult_t recvProxySetup(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ transport/coll_net.cc: In function ‘ncclResult_t sendProxyConnect(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/coll_net.cc:419:176: warning: unused parameter ‘done’ [-Wunused-parameter] 419 | static ncclResult_t 
sendProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ transport/coll_net.cc: In function ‘ncclResult_t recvProxyConnect(ncclProxyConnection*, ncclProxyState*, void*, int, void*, int, int*)’: transport/coll_net.cc:493:176: warning: unused parameter ‘done’ [-Wunused-parameter] 493 | static ncclResult_t recvProxyConnect(struct ncclProxyConnection* connection, struct ncclProxyState* proxyState, void* reqBuff, int reqSize, void* respBuff, int respSize, int* done) { | ~~~~~^~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = sendResources; size_t = long unsigned int]’: transport/coll_net.cc:291:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = sharedResources; size_t = long unsigned int]’: transport/coll_net.cc:314:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = char (*)[128]; size_t = long unsigned int]’: transport/coll_net.cc:327:5: required from here 
include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = recvResources; size_t = long unsigned int]’: transport/coll_net.cc:399:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = gdr_mem_desc; size_t = long unsigned int]’: include/gdrwrap.h:218:3: required from ‘ncclResult_t ncclGdrCudaCalloc(T**, T**, size_t, void**) [with T = long unsigned int; size_t = long unsigned int]’ transport/coll_net.cc:452:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling transport/nvls.cc > /<>/build/obj/transport/nvls.o mkdir -p `dirname /<>/build/obj/transport/nvls.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c transport/nvls.cc -o /<>/build/obj/transport/nvls.o In file included from include/transport.h:10, from include/comm.h:10, from transport/nvls.cc:9: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling collectives/sendrecv.cc > /<>/build/obj/collectives/sendrecv.o mkdir -p `dirname /<>/build/obj/collectives/sendrecv.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/sendrecv.cc -o /<>/build/obj/collectives/sendrecv.o In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/sendrecv.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ transport/nvls.cc: In function ‘ncclResult_t ncclNvlsSetup(ncclComm*, ncclComm*)’: transport/nvls.cc:410:45: warning: unused parameter ‘comm’ [-Wunused-parameter] 410 | ncclResult_t ncclNvlsSetup(struct ncclComm* comm, struct ncclComm* parent) { | ~~~~~~~~~~~~~~~~~^~~~ transport/nvls.cc:410:68: warning: unused parameter ‘parent’ [-Wunused-parameter] 410 | ncclResult_t ncclNvlsSetup(struct ncclComm* comm, struct ncclComm* parent) { | ~~~~~~~~~~~~~~~~~^~~~~~ transport/nvls.cc: In function ‘ncclResult_t ncclNvlsFree(ncclComm*)’: transport/nvls.cc:414:44: warning: unused parameter ‘comm’ [-Wunused-parameter] 414 | ncclResult_t ncclNvlsFree(struct ncclComm* comm) { | ~~~~~~~~~~~~~~~~~^~~~ In file included from include/core.h:62, from include/info.h:13, from 
include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 18 | }; | ^ collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/sendrecv.cc:18:1: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/sendrecv.cc: In function ‘ncclResult_t ncclSend(const void*, size_t, ncclDataType_t, int, ncclComm_t, cudaStream_t)’: collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 29 | 1, 1 }; | ^ collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] 
collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/sendrecv.cc:29:10: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] collectives/sendrecv.cc: In function ‘ncclResult_t ncclRecv(void*, size_t, ncclDataType_t, int, ncclComm_t, cudaStream_t)’: collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 46 | 1, 1 }; | ^ collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ 
[-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/sendrecv.cc:46:10: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] Compiling collectives/all_reduce.cc > /<>/build/obj/collectives/all_reduce.o mkdir -p `dirname /<>/build/obj/collectives/all_reduce.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/all_reduce.cc -o /<>/build/obj/collectives/all_reduce.o Compiling collectives/all_gather.cc > /<>/build/obj/collectives/all_gather.o mkdir -p `dirname /<>/build/obj/collectives/all_gather.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/all_gather.cc -o /<>/build/obj/collectives/all_gather.o Compiling collectives/broadcast.cc > /<>/build/obj/collectives/broadcast.o mkdir -p `dirname /<>/build/obj/collectives/broadcast.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/broadcast.cc -o /<>/build/obj/collectives/broadcast.o In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/all_reduce.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/all_gather.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling collectives/reduce.cc > /<>/build/obj/collectives/reduce.o mkdir -p `dirname /<>/build/obj/collectives/reduce.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/reduce.cc -o /<>/build/obj/collectives/reduce.o In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/broadcast.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/all_reduce.cc: In function ‘ncclResult_t ncclAllReduce(const void*, void*, size_t, ncclDataType_t, ncclRedOp_t, ncclComm*, cudaStream_t)’: collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 23 | }; | ^ collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/all_reduce.cc:23:3: warning: 
missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/all_reduce.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 29 | ALLREDUCE_CHUNKSTEPS, ALLREDUCE_SLICESTEPS }; | ^ collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/all_reduce.cc:29:48: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/reduce.cc:7: include/devcomm.h: In function ‘constexpr int 
ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/all_gather.cc: In function ‘ncclResult_t ncclAllGather(const void*, void*, size_t, ncclDataType_t, ncclComm_t, cudaStream_t)’: collectives/all_gather.cc:17:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 17 | }; | ^ collectives/all_gather.cc:17:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/all_gather.cc:17:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] collectives/all_gather.cc:17:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/all_gather.cc:17:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 23 | ALLGATHER_CHUNKSTEPS, ALLGATHER_SLICESTEPS }; | ^ collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: 
warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/all_gather.cc:23:48: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/broadcast.cc: In function ‘ncclResult_t ncclBroadcast(const void*, void*, size_t, ncclDataType_t, int, ncclComm_t, cudaStream_t)’: collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 21 | }; | ^ collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] 
collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/broadcast.cc:21:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 27 | BROADCAST_CHUNKSTEPS, BROADCAST_SLICESTEPS }; | ^ collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/broadcast.cc:27:48: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] In file 
included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/reduce.cc: In function ‘ncclResult_t ncclReduce(const void*, void*, size_t, ncclDataType_t, ncclRedOp_t, int, ncclComm_t, cudaStream_t)’: collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 25 | }; | ^ collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/reduce.cc:25:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 31 | REDUCE_CHUNKSTEPS, REDUCE_SLICESTEPS }; | ^ collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] 
collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/reduce.cc:31:42: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] Compiling collectives/reduce_scatter.cc > /<>/build/obj/collectives/reduce_scatter.o mkdir -p `dirname /<>/build/obj/collectives/reduce_scatter.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c collectives/reduce_scatter.cc -o /<>/build/obj/collectives/reduce_scatter.o Compiling graph/topo.cc > /<>/build/obj/graph/topo.o mkdir -p `dirname /<>/build/obj/graph/topo.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/topo.cc -o /<>/build/obj/graph/topo.o In file included from include/transport.h:10, from include/comm.h:10, from include/enqueue.h:10, from collectives/reduce_scatter.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ Compiling graph/paths.cc > /<>/build/obj/graph/paths.o mkdir -p `dirname /<>/build/obj/graph/paths.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/paths.cc -o /<>/build/obj/graph/paths.o Compiling graph/search.cc > /<>/build/obj/graph/search.o mkdir -p `dirname /<>/build/obj/graph/search.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/search.cc -o /<>/build/obj/graph/search.o In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ collectives/reduce_scatter.cc: In function ‘ncclResult_t ncclReduceScatter(const void*, void*, size_t, ncclDataType_t, ncclRedOp_t, ncclComm*, cudaStream_t)’: collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::description’ [-Wmissing-field-initializers] 23 | }; | ^ collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::arrayOrUnionDetail’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::offset’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::semantics’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:23:3: warning: missing initializer for member ‘nvtxPayloadSchemaEntry_t::reserved’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: 
missing initializer for member ‘ncclInfo::opFull’ [-Wmissing-field-initializers] 29 | REDUCESCATTER_CHUNKSTEPS, REDUCESCATTER_SLICESTEPS }; | ^ collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::algorithm’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::protocol’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::pattern’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::nChannels’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::nThreads’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::nBytes’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::nstepsPerLoop’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::nchunksPerLoop’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::chunkSize’ [-Wmissing-field-initializers] collectives/reduce_scatter.cc:29:56: warning: missing initializer for member ‘ncclInfo::channelId’ [-Wmissing-field-initializers] In file included from include/core.h:62, from graph/topo.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/graph.h:11, from graph/topo.cc:8: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ 
include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from graph/paths.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/graph.h:11, from graph/paths.cc:8: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ graph/topo.cc: In function ‘ncclResult_t pciPathToInt64(char*, int, int, int64_t*)’: graph/topo.cc:31:57: warning: unused parameter ‘minOffset’ [-Wunused-parameter] 31 | ncclResult_t pciPathToInt64(char* path, int offset, int minOffset, int64_t* id) { | ~~~~^~~~~~~~~ graph/topo.cc: In function ‘ncclResult_t ncclTopoAddGpu(ncclXmlNode*, ncclTopoSystem*, ncclTopoNode*)’: graph/topo.cc:363:80: warning: unused parameter ‘system’ [-Wunused-parameter] 363 | ncclResult_t ncclTopoAddGpu(struct ncclXmlNode* xmlGpu, struct ncclTopoSystem* system, struct ncclTopoNode* gpu) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = long int; size_t = long unsigned int]’: graph/topo.cc:188:7: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ 
[-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclTopoSystem; size_t = long unsigned int]’: graph/topo.cc:544:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclXml; size_t = long unsigned int]’: graph/topo.cc:597:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = int; size_t = long unsigned int]’: graph/topo.cc:687:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ 
include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = long unsigned int; size_t = long unsigned int]’: graph/topo.cc:711:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclTopoLinkList; size_t = long unsigned int]’: graph/paths.cc:36:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = int; size_t = long unsigned int]’: graph/paths.cc:620:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = long int; size_t = long unsigned int]’: graph/paths.cc:621:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ 
[-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ In file included from include/core.h:62, from graph/search.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from include/graph.h:11, from graph/search.cc:8: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ graph/search.cc: In function ‘float getTotalBw(ncclTopoSystem*, ncclTopoNode*)’: graph/search.cc:27:48: warning: unused parameter ‘system’ [-Wunused-parameter] 27 | static float getTotalBw(struct ncclTopoSystem* system, struct ncclTopoNode* gpu) { | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = int; size_t = long unsigned int]’: graph/search.cc:377:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int 
line) { | ~~~~^~~~ include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = ncclXml; size_t = long unsigned int]’: graph/search.cc:819:5: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling graph/connect.cc > /<>/build/obj/graph/connect.o mkdir -p `dirname /<>/build/obj/graph/connect.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/connect.cc -o /<>/build/obj/graph/connect.o In file included from include/transport.h:10, from include/comm.h:10, from graph/connect.cc:7: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from include/info.h:13, from include/graph.h:114, from include/transport.h:11: include/nvtx.h: At global scope: include/nvtx.h:66:21: warning: missing initializer for 
member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ graph/connect.cc: In function ‘ncclResult_t connectTrees(ncclComm*, int*, int*, int*, int*)’: graph/connect.cc:130:119: warning: unused parameter ‘treePatterns’ [-Wunused-parameter] 130 | static ncclResult_t connectTrees(struct ncclComm* comm, int* treeToParent, int* treeToChild0, int* treeToChild1, int* treePatterns) { | ~~~~~^~~~~~~~~~~~ In file included from include/core.h:59: include/alloc.h: In instantiation of ‘ncclResult_t ncclCallocDebug(T**, size_t, const char*, int) [with T = int; size_t = long unsigned int]’: graph/connect.cc:174:3: required from here include/alloc.h:44:65: warning: unused parameter ‘filefunc’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~~~~~~~~~^~~~~~~~ include/alloc.h:44:79: warning: unused parameter ‘line’ [-Wunused-parameter] 44 | ncclResult_t ncclCallocDebug(T** ptr, size_t nelem, const char *filefunc, int line) { | ~~~~^~~~ Compiling graph/rings.cc > /<>/build/obj/graph/rings.o mkdir -p `dirname /<>/build/obj/graph/rings.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/rings.cc -o /<>/build/obj/graph/rings.o In file included from include/core.h:62, from graph/rings.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ graph/rings.cc: In function ‘ncclResult_t ncclBuildRings(int, int*, int, int, int*, int*)’: graph/rings.cc:22:80: warning: unused parameter ‘prev’ [-Wunused-parameter] 22 | ncclResult_t ncclBuildRings(int nrings, int* rings, int rank, int nranks, int* prev, int* next) { | ~~~~~^~~~ Compiling graph/trees.cc > /<>/build/obj/graph/trees.o mkdir -p `dirname /<>/build/obj/graph/trees.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/trees.cc -o /<>/build/obj/graph/trees.o Compiling graph/tuning.cc > /<>/build/obj/graph/tuning.o mkdir -p `dirname /<>/build/obj/graph/tuning.o` cuda-g++ -I. -I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/tuning.cc -o /<>/build/obj/graph/tuning.o Compiling graph/xml.cc > /<>/build/obj/graph/xml.o mkdir -p `dirname /<>/build/obj/graph/xml.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c graph/xml.cc -o /<>/build/obj/graph/xml.o In file included from include/core.h:62, from graph/tuning.cc:7: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ In file included from graph/tuning.cc:8: include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollBytes(int)’: include/devcomm.h:350:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 350 | __host__ __device__ constexpr int ncclNvlsUnrollBytes(int cudaArch = NCCL_CUDA_ARCH) { return 4*16; } | ^ include/devcomm.h: In function ‘constexpr int ncclNvlsUnrollInsns(int)’: include/devcomm.h:351:59: warning: unused parameter ‘cudaArch’ [-Wunused-parameter] 351 | __host__ __device__ constexpr int ncclNvlsUnrollInsns(int cudaArch = NCCL_CUDA_ARCH) { return 16; } | ^ In file included from include/core.h:62, from graph/xml.cc:12: include/nvtx.h:66:21: warning: missing initializer for member ‘nvtxPayloadSchemaAttr_t::schemaId’ [-Wmissing-field-initializers] 66 | nullptr, 0, 0, 0}; | ^ graph/xml.cc: In function ‘ncclResult_t ncclTopoGetXmlFromCpu(ncclXmlNode*, ncclXml*)’: graph/xml.cc:370:81: warning: unused parameter ‘xml’ [-Wunused-parameter] 370 | ncclResult_t ncclTopoGetXmlFromCpu(struct ncclXmlNode* cpuNode, struct ncclXml* xml) { | ~~~~~~~~~~~~~~~~^~~ Compiling enhcompat.cc > /<>/build/obj/enhcompat.o mkdir -p `dirname /<>/build/obj/enhcompat.o` cuda-g++ -I. 
-I/<>/build/include -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -Iinclude -c enhcompat.cc -o /<>/build/obj/enhcompat.o make -C collectives/device make[4]: Entering directory '/<>/src/collectives/device' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 Generating rules > /<>/build/obj/collectives/device/Makefile.rules NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_u64.cu 
Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_f16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_bf16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sum_bf16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_u64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_f16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_prod_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_bf16.cu cp sendrecv.cu 
/<>/build/obj/collectives/device/sendrecv_prod_bf16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_u64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_f16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_bf16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_min_bf16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u32.cu cp sendrecv.cu 
/<>/build/obj/collectives/device/sendrecv_max_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_u64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_f16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_bf16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_max_bf16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_u64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_f16.cu 
Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.cu cp sendrecv.cu /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.cu Copying sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.cu cp sendrecv.cu 
/<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_i8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sum_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_i8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i32.cu cp all_reduce.cu 
/<>/build/obj/collectives/device/all_reduce_prod_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_f64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_prod_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_i8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u64.cu cp all_reduce.cu 
/<>/build/obj/collectives/device/all_reduce_min_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_f64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_min_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_i8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f64.cu cp all_reduce.cu 
/<>/build/obj/collectives/device/all_reduce_max_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_max_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_i8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_i32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.cu 
Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.cu Copying all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.cu cp all_reduce.cu /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_u8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_i32.cu Copying all_gather.cu > 
/<>/build/obj/collectives/device/all_gather_sum_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_u32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_u64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_f64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sum_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_u8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_i32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_u32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_u64.cu Copying all_gather.cu > 
/<>/build/obj/collectives/device/all_gather_prod_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_f64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_prod_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_u8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_i32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_u32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_u64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_f64.cu Copying all_gather.cu > 
/<>/build/obj/collectives/device/all_gather_min_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_min_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_u8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_i32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_u32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_u64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_f64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_max_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_u8.cu Copying all_gather.cu > 
/<>/build/obj/collectives/device/all_gather_premulsum_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_i32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_u32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_u64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_f64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_premulsum_bf16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.cu Copying all_gather.cu > 
/<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.cu Copying all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.cu cp all_gather.cu /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_u32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_f16.cu 
Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_bf16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sum_bf16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_u32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_f16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_bf16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_prod_bf16.cu Copying broadcast.cu > 
/<>/build/obj/collectives/device/broadcast_min_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_u32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_f16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_bf16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_min_bf16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u32.cu Copying broadcast.cu > 
/<>/build/obj/collectives/device/broadcast_max_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_f16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_bf16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_max_bf16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_u32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f16.cu cp broadcast.cu 
/<>/build/obj/collectives/device/broadcast_premulsum_f16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_bf16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_premulsum_bf16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.cu Copying broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.cu cp 
broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.cu cp broadcast.cu /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_u64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_f32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sum_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sum_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u32.cu cp reduce.cu 
/<>/build/obj/collectives/device/reduce_prod_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_u64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_f32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_prod_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_prod_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_i32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_u32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_u64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_f32.cu Copying reduce.cu > 
/<>/build/obj/collectives/device/reduce_min_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_min_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_min_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_i32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_u32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_u64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_f32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_max_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_max_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i32.cu cp reduce.cu 
/<>/build/obj/collectives/device/reduce_premulsum_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_u32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_u64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_f32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_premulsum_bf16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.cu Copying reduce.cu > 
/<>/build/obj/collectives/device/reduce_sumpostdiv_f16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.cu Copying reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.cu cp reduce.cu /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_u8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_i64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f64.cu cp 
reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_u8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.cu Copying reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_min_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_u8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_i64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_f64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_min_bf16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_u8.cu Copying reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_max_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_i64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_max_bf16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.cu cp reduce_scatter.cu 
/<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.cu cp reduce_scatter.cu 
/<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.cu Copying reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.cu cp reduce_scatter.cu /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.cu Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_i8.cu -o /<>/build/obj/collectives/device/sendrecv_sum_i8.o Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_u8.cu -o /<>/build/obj/collectives/device/sendrecv_sum_u8.o ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_i32.cu -o /<>/build/obj/collectives/device/sendrecv_sum_i32.o Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_u32.cu -o /<>/build/obj/collectives/device/sendrecv_sum_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 152 bytes spill stores, 212 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 200 bytes stack frame, 200 bytes spill stores, 204 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 152 bytes spill stores, 212 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 200 bytes stack frame, 200 bytes spill stores, 204 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_i64.cu -o /<>/build/obj/collectives/device/sendrecv_sum_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_u64.cu -o /<>/build/obj/collectives/device/sendrecv_sum_u64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 152 bytes spill stores, 212 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 200 bytes stack frame, 200 bytes spill stores, 204 bytes spill loads Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_f16.cu -o /<>/build/obj/collectives/device/sendrecv_sum_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 120 bytes stack frame, 144 bytes spill stores, 208 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 224 bytes stack frame, 228 bytes spill stores, 236 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_f32.cu -o /<>/build/obj/collectives/device/sendrecv_sum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 256 bytes stack frame, 596 bytes spill stores, 952 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 352 bytes stack frame, 600 bytes spill stores, 864 bytes spill loads ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_f64.cu -o /<>/build/obj/collectives/device/sendrecv_sum_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sum_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_sum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_tP11ncclDevCommmP8ncclWork 168 bytes stack frame, 516 bytes spill stores, 788 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z44ncclFunction_SendRecv_RING_SIMPLE_Sum_int8_tv 264 bytes stack frame, 396 bytes spill stores, 552 bytes spill loads ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 
-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_i8.cu -o /<>/build/obj/collectives/device/sendrecv_prod_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_u8.cu -o /<>/build/obj/collectives/device/sendrecv_prod_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_i32.cu -o /<>/build/obj/collectives/device/sendrecv_prod_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_u32.cu -o /<>/build/obj/collectives/device/sendrecv_prod_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_i64.cu -o /<>/build/obj/collectives/device/sendrecv_prod_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_u64.cu -o /<>/build/obj/collectives/device/sendrecv_prod_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_f16.cu -o /<>/build/obj/collectives/device/sendrecv_prod_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_f32.cu -o /<>/build/obj/collectives/device/sendrecv_prod_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_f64.cu -o /<>/build/obj/collectives/device/sendrecv_prod_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_prod_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_prod_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_prod_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_i8.cu -o /<>/build/obj/collectives/device/sendrecv_min_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_u8.cu -o /<>/build/obj/collectives/device/sendrecv_min_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_i32.cu -o /<>/build/obj/collectives/device/sendrecv_min_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_u32.cu -o /<>/build/obj/collectives/device/sendrecv_min_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_i64.cu -o /<>/build/obj/collectives/device/sendrecv_min_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_u64.cu -o /<>/build/obj/collectives/device/sendrecv_min_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_f16.cu -o /<>/build/obj/collectives/device/sendrecv_min_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_f32.cu -o /<>/build/obj/collectives/device/sendrecv_min_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_f64.cu -o /<>/build/obj/collectives/device/sendrecv_min_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_min_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_min_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_i8.cu -o /<>/build/obj/collectives/device/sendrecv_max_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_u8.cu -o /<>/build/obj/collectives/device/sendrecv_max_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_i32.cu -o /<>/build/obj/collectives/device/sendrecv_max_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_u32.cu -o /<>/build/obj/collectives/device/sendrecv_max_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_i64.cu -o /<>/build/obj/collectives/device/sendrecv_max_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_u64.cu -o /<>/build/obj/collectives/device/sendrecv_max_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_f16.cu -o /<>/build/obj/collectives/device/sendrecv_max_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_f32.cu -o /<>/build/obj/collectives/device/sendrecv_max_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_f64.cu -o /<>/build/obj/collectives/device/sendrecv_max_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_max_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_max_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_i8.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_u8.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_i32.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_u32.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_i64.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_u64.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_f16.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_f32.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_f64.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling sendrecv.cu > /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_i8.cu -o /<>/build/obj/collectives/device/all_reduce_sum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_u8.cu -o /<>/build/obj/collectives/device/all_reduce_sum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_i32.cu -o /<>/build/obj/collectives/device/all_reduce_sum_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_u32.cu -o /<>/build/obj/collectives/device/all_reduce_sum_u32.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes 
spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for 
_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for 
_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function 
'_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas 
info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry 
function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 
bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 488 bytes stack frame, 1004 bytes 
spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 
registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 
bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for 
_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for 
_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads 
ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 300 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 512 bytes stack frame, 1196 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 472 bytes stack frame, 992 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 520 bytes stack frame, 368 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 344 bytes stack frame, 536 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 360 bytes stack frame, 588 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 360 bytes stack frame, 588 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 160 bytes stack frame, 316 bytes spill stores, 556 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 560 bytes stack frame, 1184 bytes spill stores, 1684 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 344 bytes stack frame, 388 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 576 bytes stack frame, 428 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 184 bytes stack 
frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 568 bytes stack frame, 1156 bytes spill stores, 1588 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 360 bytes stack frame, 532 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry 
function '_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 160 bytes stack frame, 316 bytes spill stores, 556 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 560 bytes stack frame, 1184 bytes spill stores, 1684 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 344 bytes stack frame, 388 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 576 bytes stack frame, 428 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 568 bytes stack frame, 1156 bytes spill stores, 1588 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 360 bytes stack frame, 532 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas 
info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 576 bytes stack frame, 1196 bytes spill stores, 1792 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes 
spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 520 bytes stack frame, 1044 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 576 bytes stack frame, 1196 bytes spill stores, 1792 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 520 bytes stack frame, 1044 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 316 bytes spill stores, 560 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 688 bytes stack frame, 1996 bytes spill stores, 3516 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 624 bytes stack frame, 880 bytes spill stores, 1840 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 560 bytes stack frame, 416 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 208 bytes stack 
frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 944 bytes stack frame, 1928 bytes spill stores, 2608 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 584 bytes stack frame, 424 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 376 bytes stack frame, 548 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry 
function '_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 316 bytes spill stores, 560 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 688 bytes stack frame, 1996 bytes spill stores, 3516 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 624 bytes stack frame, 880 bytes spill stores, 1840 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 560 bytes stack frame, 416 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 944 bytes stack frame, 1928 bytes spill stores, 2608 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 584 bytes stack frame, 424 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 376 bytes stack frame, 548 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function 
properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 272 bytes spill stores, 444 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int32_tv 440 bytes stack frame, 736 bytes spill stores, 1092 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int32_tv 408 bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 552 bytes stack frame, 1244 bytes spill stores, 1872 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int32_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int32_tv 440 bytes stack 
frame, 928 bytes spill stores, 1288 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_i64.cu -o /<>/build/obj/collectives/device/all_reduce_sum_i64.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function 
properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 272 bytes spill stores, 444 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 440 bytes stack frame, 736 bytes spill stores, 1092 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint32_tv 408 bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 552 bytes stack frame, 1244 bytes spill stores, 1872 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint32_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint32_tv 440 bytes stack frame, 928 bytes spill stores, 1288 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint32_tv 360 
bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_u64.cu -o /<>/build/obj/collectives/device/all_reduce_sum_u64.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 
'sm_90' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 268 bytes spill stores, 488 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 672 bytes stack frame, 2036 bytes spill stores, 3728 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_uint8_tv 416 bytes stack frame, 656 bytes spill stores, 1368 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_uint8_tv 520 bytes stack frame, 372 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_uint8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint8_tv 880 bytes stack frame, 1832 bytes spill stores, 2508 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_uint8_tv 480 bytes stack frame, 272 bytes spill stores, 248 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_uint8_tv 360 bytes stack frame, 532 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 268 bytes spill stores, 488 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 672 bytes stack frame, 2036 bytes spill stores, 3728 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_int8_tv 416 bytes stack frame, 656 bytes spill stores, 1368 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_int8_tv 520 bytes stack frame, 372 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_int8_tv 
184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_int8_tv 880 bytes stack frame, 1832 bytes spill stores, 2508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_int8_tv 480 bytes stack frame, 272 bytes spill stores, 248 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_int8_tv 360 bytes stack frame, 532 bytes spill stores, 744 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_f16.cu -o /<>/build/obj/collectives/device/all_reduce_sum_f16.o Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_f32.cu -o /<>/build/obj/collectives/device/all_reduce_sum_f32.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] 
ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas 
info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 488 bytes stack frame, 984 bytes spill stores, 1300 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 272 bytes stack frame, 304 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 408 bytes stack frame, 652 bytes spill stores, 1000 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes 
spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry 
function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas 
info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 488 bytes stack frame, 1020 bytes spill stores, 1252 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 224 bytes stack frame, 244 bytes spill stores, 232 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 568 bytes stack frame, 436 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 304 bytes stack frame, 412 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 528 bytes stack frame, 388 bytes spill stores, 416 bytes 
spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry 
function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 568 bytes stack frame, 1228 bytes spill stores, 1736 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 304 bytes stack frame, 380 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas 
info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 452 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 488 bytes stack frame, 996 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 272 bytes stack frame, 304 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 408 bytes stack frame, 652 bytes spill stores, 1000 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes 
spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 304 bytes spill stores, 512 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 584 bytes stack frame, 1248 bytes spill stores, 1792 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 600 bytes stack frame, 460 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas 
info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 304 bytes spill stores, 512 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 584 bytes stack frame, 1248 bytes spill stores, 1792 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 600 bytes stack frame, 460 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 536 bytes stack frame, 1060 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 352 bytes stack frame, 508 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 520 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 712 bytes stack frame, 1616 bytes spill stores, 2392 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 408 bytes stack frame, 480 bytes spill stores, 1104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 600 bytes stack frame, 472 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 504 bytes stack frame, 1032 bytes spill stores, 1520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas 
info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 160 bytes stack frame, 316 bytes spill stores, 556 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 536 bytes stack frame, 1048 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 288 bytes stack frame, 312 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 624 bytes stack frame, 504 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 320 bytes stack frame, 464 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 520 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 712 bytes stack frame, 1616 bytes spill stores, 2392 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 408 bytes stack frame, 480 bytes spill stores, 1104 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 600 bytes stack frame, 472 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 504 bytes stack frame, 1032 bytes spill stores, 1520 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 
'sm_80' ptxas info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 176 bytes stack frame, 324 bytes spill stores, 524 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 576 bytes stack frame, 1220 bytes spill stores, 1812 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 408 bytes stack frame, 480 bytes spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 504 bytes stack frame, 1028 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_AllReduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_AllReduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 248 bytes spill stores, 460 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_int64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Sum_int64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 696 bytes stack frame, 1564 bytes spill stores, 2340 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Sum_int64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Sum_int64_tv 544 bytes stack frame, 404 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Sum_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Sum_int64_tv 456 bytes stack frame, 976 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Sum_int64_tv 512 bytes stack frame, 348 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Sum_int64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function 
'_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 152 bytes stack frame, 312 bytes spill stores, 548 bytes spill loads ptxas info : Used 
96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 416 bytes stack frame, 488 
bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_f64.cu -o /<>/build/obj/collectives/device/all_reduce_sum_f64.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z51ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function 
'_Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_AllReduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 136 bytes stack frame, 248 bytes spill stores, 460 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Sum_uint64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 696 bytes stack frame, 1564 bytes spill stores, 2340 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Sum_uint64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Sum_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Sum_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Sum_uint64_tv 456 bytes stack frame, 976 bytes spill stores, 1440 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Sum_uint64_tv 512 bytes stack frame, 348 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Sum_uint64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sum_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_sum_bf16.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_AllReduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_AllReduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_AllReduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas 
info : Function properties for _Z38ncclKernel_AllReduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 144 bytes stack frame, 272 bytes spill stores, 444 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_floatv 440 bytes stack frame, 716 bytes spill stores, 1056 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Sum_floatv 408 bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 552 bytes stack frame, 1248 bytes spill stores, 1984 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Sum_floatv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Sum_floatv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Sum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Sum_floatv 440 bytes stack frame, 900 bytes spill stores, 1264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Sum_floatv 496 bytes stack frame, 292 bytes spill stores, 272 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Sum_floatv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_i8.cu -o /<>/build/obj/collectives/device/all_reduce_prod_i8.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 95 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas 
info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_AllReduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_AllReduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' 
for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_AllReduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_AllReduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 136 bytes stack frame, 272 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_halfv 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Sum_halfv 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_AllReduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Sum_halfv 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Sum_halfv 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Sum_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Sum_halfv 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Sum_halfv 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Sum_halfv 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling 
all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_u8.cu -o /<>/build/obj/collectives/device/all_reduce_prod_u8.o ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 
652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling 
entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 95 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties 
for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for 
_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 95 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info 
: Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 216 bytes stack 
frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 528 bytes stack 
frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 152 bytes stack frame, 304 bytes spill stores, 512 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 560 bytes stack frame, 1112 bytes spill stores, 1632 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 288 bytes stack frame, 312 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 600 bytes stack frame, 464 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 352 bytes stack frame, 496 bytes spill stores, 684 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 528 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 144 bytes stack frame, 296 bytes spill stores, 480 bytes spill loads ptxas info : Used 96 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 520 bytes stack frame, 1168 bytes spill stores, 1648 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 288 bytes stack frame, 320 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 528 bytes stack frame, 372 bytes spill 
stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 496 bytes stack frame, 920 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 856 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 152 bytes stack frame, 300 bytes spill stores, 520 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 240 bytes stack frame, 260 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 704 bytes stack frame, 1572 bytes spill stores, 2396 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 368 bytes stack frame, 404 bytes spill stores, 788 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 576 bytes stack frame, 440 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 480 bytes stack frame, 976 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 560 bytes stack frame, 1192 bytes spill stores, 1692 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 344 bytes stack frame, 388 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes 
spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 568 bytes stack frame, 988 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 880 bytes stack frame, 500 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling 
entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 160 bytes stack frame, 316 bytes spill stores, 556 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 496 bytes stack frame, 1012 bytes spill stores, 1240 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 288 bytes stack frame, 308 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 328 bytes stack frame, 440 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 736 bytes stack frame, 324 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 560 bytes stack frame, 1192 bytes spill stores, 1692 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 344 bytes stack frame, 388 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 184 bytes stack frame, 184 
bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 568 bytes stack frame, 988 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 880 bytes stack frame, 500 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_AllReduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllReduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas 
info : Function properties for _Z39ncclKernel_AllReduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllReduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 136 bytes stack frame, 248 bytes spill stores, 460 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum_doublev 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Sum_doublev 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 224 bytes stack frame, 244 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 688 bytes stack frame, 1552 bytes spill stores, 2324 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Sum_doublev 272 bytes stack frame, 292 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Sum_doublev 520 bytes stack frame, 372 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Sum_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Sum_doublev 432 bytes stack frame, 896 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Sum_doublev 496 bytes stack frame, 316 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Sum_doublev 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_i32.cu -o /<>/build/obj/collectives/device/all_reduce_prod_i32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 696 bytes stack frame, 1856 bytes spill stores, 3188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 632 bytes stack frame, 868 bytes spill stores, 1828 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 1120 bytes stack frame, 2572 bytes spill stores, 3260 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 888 bytes stack frame, 504 bytes spill stores, 664 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 384 bytes stack frame, 548 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 152 bytes stack frame, 312 bytes spill stores, 548 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 584 bytes stack frame, 1304 bytes spill stores, 2068 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 544 bytes stack frame, 1084 bytes spill stores, 1596 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 696 bytes stack frame, 1856 bytes spill stores, 3188 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 632 bytes stack frame, 868 bytes spill stores, 1828 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 1120 bytes stack frame, 2572 bytes spill stores, 3260 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 888 bytes stack frame, 504 bytes spill stores, 664 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 384 bytes stack frame, 548 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 344 bytes stack 
frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 672 bytes stack frame, 2024 bytes spill stores, 3728 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_int8_tv 424 bytes stack frame, 648 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_int8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_int8_tv 1040 bytes stack frame, 2524 bytes spill stores, 3268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_int8_tv 808 bytes stack frame, 388 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_int8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_u32.cu -o /<>/build/obj/collectives/device/all_reduce_prod_u32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Compiling entry function '_Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z51ncclKernel_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for 
_Z55ncclKernel_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z56ncclKernel_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers ptxas info : Compiling entry function '_Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_AllReduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 136 bytes stack frame, 272 bytes spill stores, 492 bytes spill loads ptxas info : Used 96 registers ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Sum___nv_bfloat16v 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Sum___nv_bfloat16v 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Sum___nv_bfloat16v 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Sum___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Sum___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Sum___nv_bfloat16v 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill 
loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Sum___nv_bfloat16v 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_i64.cu -o /<>/build/obj/collectives/device/all_reduce_prod_i64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 672 bytes stack frame, 2024 bytes spill stores, 3728 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_uint8_tv 424 bytes stack frame, 648 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_uint8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_uint8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint8_tv 1040 bytes stack frame, 2524 bytes spill stores, 3268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_uint8_tv 808 bytes stack frame, 388 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_uint8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_u64.o mkdir -p 
/<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_u64.cu -o /<>/build/obj/collectives/device/all_reduce_prod_u64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 152 bytes stack frame, 152 bytes spill stores, 
152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function 
properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 360 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 488 bytes stack frame, 996 bytes spill stores, 1200 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 240 bytes stack frame, 268 bytes 
spill stores, 260 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties 
for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 496 bytes stack frame, 1000 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 336 bytes stack frame, 516 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 360 
bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : 
Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 608 bytes stack frame, 1312 bytes spill stores, 2040 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 312 bytes stack frame, 332 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 600 bytes stack frame, 460 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 320 bytes stack frame, 408 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads
Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_f16.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_f16.cu -o /<>/build/obj/collectives/device/all_reduce_prod_f16.o
ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 192 bytes stack
frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 608 bytes stack frame, 1312 bytes spill stores, 2040 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 312 bytes stack frame, 332 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 600 bytes stack frame, 460 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 320 bytes stack frame, 408 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 688 bytes stack frame, 1524 bytes spill stores, 2216 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 408 bytes stack frame, 476 bytes spill stores, 1116 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 584 bytes stack frame, 456 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 480 bytes stack frame, 1000 bytes spill stores, 1424 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 488 bytes stack frame, 984 bytes spill stores, 1300 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 272 bytes stack frame, 304 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : 
Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 408 bytes stack frame, 652 bytes spill stores, 1000 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 
688 bytes stack frame, 1524 bytes spill stores, 2216 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 408 bytes stack frame, 476 bytes spill stores, 1116 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 584 bytes stack frame, 456 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 480 bytes stack frame, 1000 bytes spill stores, 1424 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_f32.cu -o /<>/build/obj/collectives/device/all_reduce_prod_f32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 672 bytes stack frame, 1480 bytes spill stores, 2188 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Prod_int64_tv 312 bytes stack frame, 364 bytes spill stores, 664 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Prod_int64_tv 528 bytes stack frame, 380 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Prod_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Prod_int64_tv 432 bytes stack frame, 936 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Prod_int64_tv 512 bytes stack frame, 348 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Prod_int64_tv 344 bytes stack frame, 496 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > 
/<>/build/obj/collectives/device/all_reduce_prod_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_f64.cu -o /<>/build/obj/collectives/device/all_reduce_prod_f64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 488 bytes stack frame, 1020 bytes spill stores, 1252 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 224 bytes stack frame, 244 bytes spill stores, 232 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 568 bytes stack frame, 436 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 304 bytes stack frame, 412 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 528 bytes stack frame, 388 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 672 bytes stack frame, 1480 bytes spill stores, 2188 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_SIMPLE_Prod_uint64_tv 312 bytes stack frame, 364 bytes spill stores, 664 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL128_Prod_uint64_tv 528 bytes stack frame, 380 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL_Prod_uint64_tv 176 bytes stack frame, 176 bytes 
spill stores, 176 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_SIMPLE_Prod_uint64_tv 432 bytes stack frame, 936 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL128_Prod_uint64_tv 512 bytes stack frame, 348 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL_Prod_uint64_tv 344 bytes stack frame, 496 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_prod_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_prod_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_prod_bf16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 488 bytes stack frame, 996 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 272 bytes stack frame, 304 bytes spill stores, 400 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 408 bytes stack frame, 652 bytes spill stores, 1000 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill 
loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 536 bytes stack frame, 1048 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 288 bytes stack frame, 312 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 624 bytes stack frame, 504 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 320 bytes stack frame, 464 bytes spill stores, 576 bytes spill 
loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 328 bytes stack frame, 532 bytes spill stores, 
620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 536 bytes stack frame, 1060 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 352 bytes stack frame, 508 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 560 bytes stack frame, 1112 bytes spill stores, 1632 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 288 bytes stack frame, 312 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 600 bytes stack frame, 464 bytes spill stores, 584 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 352 bytes stack frame, 496 bytes spill stores, 684 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1152 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Prod_halfv 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Prod_halfv 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Prod_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Prod_halfv 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Prod_halfv 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Prod_halfv 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 576 bytes stack frame, 1220 bytes spill stores, 1812 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 408 bytes stack frame, 480 bytes spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 504 bytes stack frame, 1028 bytes spill stores, 1412 bytes spill 
loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_i8.cu -o /<>/build/obj/collectives/device/all_reduce_min_i8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 240 bytes stack frame, 260 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 704 bytes stack frame, 1572 bytes spill stores, 2396 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 368 bytes stack frame, 404 bytes spill stores, 788 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 576 bytes stack frame, 440 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 480 bytes stack frame, 976 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : 
Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 496 bytes stack frame, 1012 bytes spill stores, 1240 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 288 bytes stack frame, 308 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 328 bytes stack frame, 440 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 736 bytes stack frame, 324 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 240 bytes stack frame, 
268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 552 bytes stack frame, 1248 bytes spill stores, 1984 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Prod_floatv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Prod_floatv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Prod_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Prod_floatv 440 bytes stack frame, 900 bytes spill stores, 1264 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Prod_floatv 496 bytes stack frame, 292 bytes spill stores, 272 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Prod_floatv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 
-gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_u8.cu -o /<>/build/obj/collectives/device/all_reduce_min_u8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 224 bytes stack frame, 244 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 688 bytes stack frame, 1552 bytes spill stores, 2324 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Prod_doublev 272 bytes stack frame, 292 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Prod_doublev 520 bytes stack frame, 372 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Prod_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Prod_doublev 432 bytes stack frame, 896 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Prod_doublev 496 bytes stack frame, 316 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Prod_doublev 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill 
loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_i32.cu -o /<>/build/obj/collectives/device/all_reduce_min_i32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 584 bytes stack frame, 1304 bytes spill stores, 2068 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 544 bytes stack frame, 1084 bytes spill stores, 1596 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_Prod___nv_bfloat16v 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_Prod___nv_bfloat16v 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_Prod___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_Prod___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_Prod___nv_bfloat16v 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_Prod___nv_bfloat16v 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 536 bytes stack frame, 384 bytes spill 
stores, 384 bytes spill loads
ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads
ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads
ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads
ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads
Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u32.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_u32.cu -o /<>/build/obj/collectives/device/all_reduce_min_u32.o
ptxas info : 4 bytes gmem
ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads
ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes 
spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 288 bytes stack frame, 312 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 552 bytes stack frame, 1208 bytes spill stores, 1712 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 400 bytes stack frame, 424 bytes spill stores, 972 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 544 bytes stack frame, 1160 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 896 bytes stack frame, 520 bytes spill stores, 704 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 584 bytes stack frame, 448 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 360 bytes stack frame, 544 bytes spill stores, 756 
bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 288 bytes stack frame, 312 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 552 bytes stack frame, 1172 bytes spill stores, 1636 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 400 bytes stack frame, 432 bytes spill stores, 988 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 536 bytes stack frame, 1004 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 904 bytes stack frame, 528 bytes spill stores, 712 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 376 bytes stack frame, 632 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 688 bytes stack frame, 1992 bytes spill stores, 3500 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 568 bytes stack frame, 828 bytes spill stores, 1748 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 936 bytes stack frame, 2260 bytes spill stores, 2964 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 880 bytes stack frame, 496 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 384 bytes stack frame, 552 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 584 bytes stack frame, 452 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties 
for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 688 bytes stack frame, 1992 bytes spill stores, 3500 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 536 bytes stack frame, 804 bytes spill stores, 1732 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 968 bytes stack frame, 2260 bytes spill stores, 3004 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 880 bytes stack frame, 496 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 384 bytes stack frame, 552 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 584 bytes stack frame, 448 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 
188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 360 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 368 bytes stack frame, 628 bytes spill stores, 1456 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 672 bytes stack frame, 2032 bytes spill stores, 3712 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_int8_tv 432 bytes stack frame, 644 bytes spill stores, 1356 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_int8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_int8_tv 864 bytes stack frame, 2200 bytes spill stores, 2944 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_int8_tv 800 bytes stack frame, 380 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_int8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int32_tv 440 bytes stack frame, 728 bytes spill stores, 1076 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int32_tv 408 bytes stack frame, 648 
bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int32_tv 544 bytes stack frame, 404 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_TREE_LL128_Min_int32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_i64.cu -o /<>/build/obj/collectives/device/all_reduce_min_i64.o Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_u64.cu -o /<>/build/obj/collectives/device/all_reduce_min_u64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 584 bytes stack frame, 452 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 664 bytes stack frame, 2032 bytes spill stores, 3712 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_uint8_tv 432 bytes stack frame, 644 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_uint8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_uint8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_uint8_tv 880 bytes stack frame, 2188 bytes spill stores, 2936 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_uint8_tv 808 bytes stack frame, 380 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_uint8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > 
/<>/build/obj/collectives/device/all_reduce_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_f16.cu -o /<>/build/obj/collectives/device/all_reduce_min_f16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 
140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint32_tv 440 bytes stack frame, 728 bytes spill stores, 1076 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint32_tv 408 bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 496 bytes stack frame, 1004 bytes spill stores, 1336 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 280 bytes stack frame, 316 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads Compiling 
all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_f32.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_f32.cu -o /<>/build/obj/collectives/device/all_reduce_min_f32.o
ptxas info : 4 bytes gmem
ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads
ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 144 bytes stack frame, 140 bytes 
spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 480 bytes stack frame, 964 bytes spill stores, 1300 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 264 bytes stack frame, 288 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 616 bytes stack frame, 496 bytes spill stores, 
648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 552 bytes stack frame, 1156 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 552 bytes stack frame, 1156 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 496 bytes stack frame, 1004 bytes spill stores, 1336 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 280 bytes stack frame, 316 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 648 bytes stack frame, 1424 bytes spill stores, 2124 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 416 bytes stack frame, 496 bytes spill stores, 1072 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 600 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 424 bytes stack frame, 832 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 648 bytes stack frame, 1424 
bytes spill stores, 2124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 416 bytes stack frame, 496 bytes spill stores, 1072 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 600 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 424 bytes stack frame, 832 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas 
info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 528 bytes stack frame, 1032 bytes spill stores, 1404 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 320 bytes stack frame, 344 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 600 bytes stack frame, 468 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 416 bytes stack frame, 676 bytes spill stores, 1036 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 672 bytes stack frame, 572 bytes spill stores, 760 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 192 bytes stack frame, 
188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 536 bytes stack frame, 1060 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 352 bytes stack frame, 508 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_int64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Min_int64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 648 bytes stack frame, 1440 bytes spill stores, 2104 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Min_int64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Min_int64_tv 544 bytes stack frame, 404 bytes spill stores, 508 
bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Min_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Min_int64_tv 368 bytes stack frame, 760 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Min_int64_tv 504 bytes stack frame, 320 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Min_int64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_uint64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Min_uint64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 648 bytes stack frame, 1440 bytes spill stores, 2104 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Min_uint64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Min_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Min_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Min_uint64_tv 368 bytes stack frame, 760 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Min_uint64_tv 504 bytes stack frame, 320 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Min_uint64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 
-std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_f64.cu -o /<>/build/obj/collectives/device/all_reduce_min_f64.o Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_min_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_min_bf16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 
4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 576 bytes stack frame, 1220 bytes spill stores, 1812 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 408 bytes stack frame, 480 bytes 
spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 504 bytes stack frame, 1028 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 528 bytes stack frame, 1056 bytes spill stores, 1456 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 336 bytes stack frame, 448 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 528 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes 
spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_halfv 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Min_halfv 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_halfv 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_halfv 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Min_halfv 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Min_halfv 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Min_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Min_halfv 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Min_halfv 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Min_halfv 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas 
-maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_i8.cu -o /<>/build/obj/collectives/device/all_reduce_max_i8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_floatv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_floatv 552 bytes stack frame, 1248 bytes spill stores, 1984 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Min_floatv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Min_floatv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Min_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Min_floatv 440 bytes stack frame, 900 bytes spill stores, 1264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Min_floatv 496 bytes stack frame, 292 bytes spill stores, 272 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Min_floatv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes 
spill loads
Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u8.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_u8.cu -o /<>/build/obj/collectives/device/all_reduce_max_u8.o
ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for
_Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads 
ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 
bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 544 bytes stack frame, 1068 bytes spill stores, 1484 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 296 bytes stack frame, 316 bytes spill stores, 424 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 600 bytes stack frame, 480 bytes spill stores, 588 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 328 bytes stack frame, 452 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 
bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 680 bytes stack frame, 1452 bytes spill stores, 2212 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 424 bytes stack frame, 516 bytes spill stores, 1228 bytes spill loads 
ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 616 bytes stack frame, 500 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 480 bytes stack frame, 964 bytes spill stores, 1544 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 512 bytes stack frame, 1216 bytes spill stores, 1780 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 296 bytes stack frame, 320 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 488 bytes stack frame, 1028 bytes spill stores, 1476 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 872 bytes stack frame, 456 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 496 bytes stack frame, 1012 bytes spill stores, 1240 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 288 bytes stack frame, 308 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 328 bytes stack frame, 440 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 736 bytes stack frame, 324 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 512 bytes stack frame, 1212 bytes spill stores, 1796 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 536 bytes stack frame, 388 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 488 bytes stack frame, 908 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 880 bytes stack frame, 464 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min_doublev 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min_doublev 664 bytes stack frame, 1412 bytes spill stores, 2176 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Min_doublev 312 bytes stack frame, 340 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Min_doublev 552 bytes stack frame, 420 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Min_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Min_doublev 448 bytes stack frame, 996 bytes spill stores, 1500 bytes spill 
loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Min_doublev 504 bytes stack frame, 332 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Min_doublev 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_i32.cu -o /<>/build/obj/collectives/device/all_reduce_max_i32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 288 bytes stack frame, 312 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 552 bytes stack frame, 1208 bytes spill stores, 1712 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 400 bytes stack frame, 424 bytes spill stores, 972 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 544 bytes stack frame, 1160 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 896 bytes stack frame, 520 bytes spill stores, 704 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 288 bytes stack frame, 312 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 552 bytes stack frame, 1172 bytes spill stores, 1636 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 400 bytes stack frame, 432 bytes spill stores, 988 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 544 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 536 bytes stack frame, 1004 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 904 bytes stack frame, 528 bytes spill stores, 712 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 376 bytes stack frame, 540 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 376 bytes stack frame, 632 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 688 bytes stack frame, 1992 bytes spill stores, 3500 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 568 bytes stack frame, 828 bytes spill stores, 1748 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 936 bytes stack frame, 2260 bytes spill stores, 2964 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 880 bytes stack frame, 496 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 384 bytes stack frame, 552 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Min___nv_bfloat16v 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Min___nv_bfloat16v 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Min___nv_bfloat16v 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_RING_LL_Min___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Min___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Min___nv_bfloat16v 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Min___nv_bfloat16v 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_u32.cu -o /<>/build/obj/collectives/device/all_reduce_max_u32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 376 bytes stack frame, 628 bytes spill stores, 1316 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 688 bytes stack frame, 1992 bytes spill stores, 3500 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 536 bytes stack frame, 804 bytes spill stores, 1732 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 968 bytes stack frame, 2260 bytes spill stores, 3004 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 880 bytes stack frame, 496 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 384 bytes stack frame, 552 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes 
spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 368 bytes stack frame, 628 bytes spill stores, 1456 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 672 bytes stack frame, 2032 bytes spill stores, 3712 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_int8_tv 432 bytes stack frame, 644 bytes spill stores, 1356 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_int8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_AllReduce_RING_LL_Max_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_int8_tv 864 bytes stack frame, 2200 bytes spill stores, 2944 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_int8_tv 800 bytes stack frame, 380 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_int8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_i64.cu -o /<>/build/obj/collectives/device/all_reduce_max_i64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 360 bytes stack frame, 624 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 664 bytes stack frame, 2032 bytes spill stores, 3712 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_uint8_tv 432 bytes stack frame, 644 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_uint8_tv 536 bytes stack frame, 396 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_uint8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_uint8_tv 880 bytes stack frame, 2188 bytes spill stores, 2936 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_uint8_tv 808 bytes stack frame, 380 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_uint8_tv 360 bytes stack frame, 528 bytes spill stores, 748 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_u64.cu -o /<>/build/obj/collectives/device/all_reduce_max_u64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 584 bytes stack frame, 448 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 360 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes 
spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 488 bytes stack frame, 1004 bytes spill stores, 1208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 536 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties 
for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 584 bytes stack frame, 452 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 
bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 544 bytes stack frame, 1156 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 584 bytes stack frame, 448 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 360 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int32_tv 440 bytes stack frame, 728 bytes spill stores, 1076 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int32_tv 408 bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int32_tv 544 bytes stack frame, 404 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 496 bytes stack frame, 992 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 296 bytes stack frame, 368 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 528 bytes stack frame, 384 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_f16.cu -o /<>/build/obj/collectives/device/all_reduce_max_f16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 576 bytes stack frame, 1252 bytes spill stores, 1856 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 408 bytes stack frame, 484 bytes spill stores, 1112 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 584 bytes stack frame, 452 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 520 bytes stack frame, 1048 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 552 bytes stack frame, 1156 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties 
for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 552 bytes stack frame, 1156 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 304 bytes stack frame, 324 bytes spill 
stores, 460 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 352 bytes stack frame, 520 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 496 bytes stack frame, 1004 bytes spill stores, 1336 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 280 bytes stack frame, 316 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 648 bytes stack frame, 1424 bytes spill stores, 2124 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 416 bytes stack frame, 496 bytes spill stores, 1072 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 600 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes 
spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 424 bytes stack frame, 832 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 648 bytes stack frame, 1424 bytes spill stores, 2124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 416 bytes stack frame, 496 bytes spill stores, 1072 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 600 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 424 bytes stack frame, 832 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 568 bytes stack frame, 408 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint32_tv 440 bytes stack frame, 728 bytes spill stores, 1076 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint32_tv 408 
bytes stack frame, 648 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 560 bytes stack frame, 1232 bytes spill stores, 1832 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint32_tv 312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint32_tv 440 bytes stack frame, 932 bytes spill stores, 1256 bytes spill loads ptxas info : Function 
properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint32_tv 488 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint32_tv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_f32.cu -o /<>/build/obj/collectives/device/all_reduce_max_f32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 480 bytes stack frame, 964 bytes spill stores, 1300 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 264 bytes stack frame, 288 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_int64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_SIMPLE_Max_int64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 648 bytes stack frame, 1440 bytes spill stores, 2104 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_SIMPLE_Max_int64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL128_Max_int64_tv 544 bytes stack frame, 404 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL_Max_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_SIMPLE_Max_int64_tv 368 bytes stack frame, 760 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL128_Max_int64_tv 504 bytes stack frame, 320 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL_Max_int64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_uint64_tv 440 bytes stack frame, 712 bytes spill stores, 1052 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_SIMPLE_Max_uint64_tv 408 bytes stack frame, 652 bytes spill stores, 944 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 240 bytes stack frame, 264 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 648 bytes stack frame, 1440 bytes spill stores, 2104 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_SIMPLE_Max_uint64_tv 312 bytes stack frame, 340 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL128_Max_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL_Max_uint64_tv 176 bytes stack frame, 176 bytes spill 
stores, 176 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_SIMPLE_Max_uint64_tv 368 bytes stack frame, 760 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL128_Max_uint64_tv 504 bytes stack frame, 320 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL_Max_uint64_tv 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_f64.cu -o /<>/build/obj/collectives/device/all_reduce_max_f64.o Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_max_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_max_bf16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 496 bytes stack frame, 1004 bytes spill stores, 1336 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 280 bytes stack frame, 316 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 544 bytes stack frame, 412 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 416 bytes stack frame, 652 bytes spill stores, 992 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 616 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 528 bytes stack frame, 1056 bytes spill stores, 1456 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 256 bytes stack frame, 280 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 336 bytes stack frame, 448 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 528 bytes stack frame, 384 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 528 bytes stack frame, 1032 bytes spill stores, 1404 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 320 bytes stack frame, 344 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 600 bytes stack frame, 468 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 416 bytes stack frame, 676 bytes spill stores, 1036 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 672 bytes stack frame, 572 bytes spill stores, 760 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 
144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 496 bytes stack frame, 1016 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 544 bytes stack frame, 396 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 344 bytes stack frame, 500 bytes spill stores, 688 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 528 bytes stack frame, 376 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 344 bytes stack frame, 552 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 520 bytes 
stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 536 bytes stack frame, 1060 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 352 bytes stack frame, 508 bytes spill stores, 668 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 368 bytes stack frame, 532 bytes spill stores, 720 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 
bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 520 bytes stack frame, 1040 bytes spill stores, 1416 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 232 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 520 bytes stack frame, 368 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 328 bytes stack frame, 532 bytes spill stores, 620 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 288 bytes stack frame, 312 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 576 bytes stack frame, 1220 bytes spill stores, 1812 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 408 bytes stack frame, 480 bytes spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 208 bytes 
stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 504 bytes stack frame, 1028 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 384 bytes stack frame, 548 bytes spill stores, 748 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_halfv 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_SIMPLE_Max_halfv 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_halfv 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_halfv 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_SIMPLE_Max_halfv 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_RING_LL128_Max_halfv 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_RING_LL_Max_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_SIMPLE_Max_halfv 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_AllReduce_TREE_LL128_Max_halfv 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_AllReduce_TREE_LL_Max_halfv 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_i8.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_i8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 544 bytes stack frame, 1068 bytes spill stores, 1484 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 296 bytes stack frame, 316 bytes spill stores, 424 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 600 bytes stack frame, 480 bytes spill stores, 588 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 328 bytes stack frame, 452 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 352 bytes stack frame, 512 bytes spill stores, 664 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 448 bytes stack frame, 976 bytes spill stores, 1188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 384 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 304 bytes stack frame, 380 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 688 bytes stack frame, 284 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 344 bytes stack frame, 536 bytes spill stores, 652 
bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_floatv 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_floatv 552 bytes stack frame, 1248 bytes spill stores, 1984 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_SIMPLE_Max_floatv 
312 bytes stack frame, 340 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_RING_LL128_Max_floatv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_RING_LL_Max_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_SIMPLE_Max_floatv 440 bytes stack frame, 900 bytes spill stores, 1264 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_AllReduce_TREE_LL128_Max_floatv 496 bytes stack frame, 292 bytes spill stores, 272 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_AllReduce_TREE_LL_Max_floatv 360 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_u8.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_u8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 296 bytes stack frame, 324 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 680 bytes stack frame, 1452 bytes spill stores, 2212 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 424 bytes stack frame, 516 bytes spill stores, 1228 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 616 bytes stack frame, 500 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 480 bytes stack frame, 964 bytes spill stores, 1544 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 592 bytes stack frame, 436 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 368 bytes stack frame, 536 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 520 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill 
loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 496 bytes stack frame, 1012 bytes spill stores, 1240 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 288 bytes stack frame, 308 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 328 bytes stack frame, 440 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 736 bytes stack frame, 324 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 360 bytes stack frame, 524 bytes spill stores, 696 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max_doublev 240 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max_doublev 664 bytes stack frame, 1412 bytes spill stores, 2176 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_SIMPLE_Max_doublev 312 bytes stack frame, 340 bytes spill stores, 596 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_RING_LL128_Max_doublev 552 bytes stack frame, 420 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_RING_LL_Max_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_SIMPLE_Max_doublev 448 bytes stack frame, 996 bytes spill stores, 1500 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllReduce_TREE_LL128_Max_doublev 504 bytes stack frame, 332 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllReduce_TREE_LL_Max_doublev 344 bytes stack frame, 508 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 556 
bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 520 bytes stack frame, 372 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_i32.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_i32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 528 bytes stack frame, 376 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 256 bytes stack frame, 276 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 584 bytes stack frame, 1300 bytes spill stores, 2044 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 416 bytes stack frame, 488 bytes spill stores, 1012 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 600 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 520 bytes stack frame, 1056 bytes spill stores, 1580 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 592 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties 
for _Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 384 bytes stack frame, 540 bytes spill stores, 736 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 520 bytes stack frame, 380 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 528 bytes stack frame, 376 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 528 bytes stack frame, 376 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 440 bytes stack frame, 708 bytes spill stores, 1028 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_Max___nv_bfloat16v 408 bytes stack frame, 612 bytes spill stores, 880 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 264 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 560 bytes stack frame, 1320 bytes spill stores, 1968 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_Max___nv_bfloat16v 320 bytes stack frame, 348 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_Max___nv_bfloat16v 544 bytes stack frame, 408 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_Max___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_Max___nv_bfloat16v 448 bytes stack frame, 948 bytes spill stores, 1472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_Max___nv_bfloat16v 488 bytes stack frame, 296 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_TREE_LL_Max___nv_bfloat16v 360 bytes stack frame, 528 bytes spill stores, 728 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_u32.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_u32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 520 bytes stack frame, 372 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function 
properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 216 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 520 bytes stack frame, 1156 bytes spill stores, 1672 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 528 bytes stack frame, 376 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 512 bytes stack frame, 1064 bytes spill stores, 1556 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 848 bytes stack frame, 440 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 352 bytes stack frame, 548 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 560 bytes stack frame, 1224 bytes spill stores, 1752 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 344 bytes stack frame, 392 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 584 bytes stack frame, 444 bytes spill stores, 488 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 528 bytes stack frame, 1088 bytes spill stores, 1636 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 880 bytes stack frame, 488 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 384 bytes stack frame, 544 bytes spill stores, 752 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 520 bytes stack frame, 380 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 520 bytes stack frame, 372 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function 
properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 232 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 560 bytes stack frame, 1224 bytes spill stores, 1752 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 344 bytes stack frame, 392 bytes spill stores, 632 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 584 bytes stack frame, 444 bytes spill stores, 488 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 528 bytes stack frame, 1088 bytes spill stores, 1636 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 880 bytes stack frame, 488 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 384 bytes stack frame, 544 bytes spill stores, 752 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 520 bytes stack frame, 372 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : 
Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 552 bytes stack frame, 1160 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 304 bytes stack frame, 324 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 592 bytes stack frame, 448 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 360 bytes stack frame, 580 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 368 bytes stack frame, 544 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 376 bytes stack frame, 644 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 696 bytes stack frame, 2012 bytes spill stores, 3528 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 512 bytes stack frame, 856 bytes spill stores, 1776 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 592 bytes stack frame, 452 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 1200 bytes stack frame, 2784 bytes spill stores, 3452 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 888 bytes stack frame, 496 bytes spill stores, 652 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 392 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties 
for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 496 bytes stack frame, 996 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 328 bytes stack frame, 476 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 520 bytes stack frame, 372 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 352 bytes stack frame, 556 bytes spill stores, 704 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 376 bytes stack frame, 644 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 696 bytes stack frame, 2012 bytes spill stores, 3528 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 512 bytes stack frame, 856 bytes spill stores, 1776 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 592 bytes stack frame, 452 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 1200 bytes stack frame, 2784 bytes spill stores, 3452 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 888 bytes stack frame, 496 bytes spill stores, 652 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 392 bytes stack frame, 548 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function 
properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 288 bytes stack frame, 316 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 608 bytes stack frame, 1320 bytes spill stores, 2040 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 408 bytes stack frame, 480 bytes spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 208 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 536 bytes stack frame, 1120 bytes spill stores, 1604 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 384 bytes stack frame, 552 bytes spill stores, 752 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 368 bytes stack frame, 676 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 672 bytes stack frame, 2124 bytes spill stores, 3804 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int8_tv 424 bytes stack frame, 660 bytes spill stores, 1384 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_int8_tv 528 bytes stack frame, 388 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_int8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int8_tv 1144 bytes stack frame, 2712 bytes spill stores, 3416 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_int8_tv 832 bytes stack frame, 428 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_int8_tv 368 bytes stack frame, 540 bytes spill stores, 780 bytes spill loads Compiling all_reduce.cu > 
/<>/build/obj/collectives/device/all_reduce_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_i64.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_i64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 552 bytes stack frame, 1160 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 304 bytes stack frame, 324 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 592 bytes stack frame, 448 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 360 bytes stack frame, 580 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 368 bytes stack frame, 544 bytes spill stores, 740 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 248 bytes stack frame, 272 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 568 bytes stack frame, 1244 bytes spill stores, 1980 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int32_tv 312 bytes stack frame, 352 bytes spill stores, 628 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int32_tv 528 bytes stack frame, 384 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int32_tv 448 bytes stack frame, 928 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int32_tv 496 bytes stack frame, 324 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int32_tv 368 bytes stack frame, 552 bytes spill stores, 784 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_u64.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_u64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 368 bytes stack frame, 676 bytes spill stores, 1452 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 672 bytes stack frame, 2124 bytes spill stores, 3804 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint8_tv 424 bytes stack frame, 660 bytes spill stores, 1384 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_uint8_tv 528 bytes stack frame, 388 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_uint8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint8_tv 1144 bytes stack frame, 2712 bytes spill stores, 3416 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint8_tv 832 bytes stack frame, 428 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_uint8_tv 368 bytes stack frame, 540 bytes spill stores, 780 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_f16.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_f16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 288 bytes stack frame, 316 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 608 bytes stack frame, 1320 bytes spill stores, 2040 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 408 bytes stack frame, 480 bytes spill stores, 1164 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 208 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 536 bytes stack frame, 1120 bytes spill stores, 1604 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill 
loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 384 bytes stack frame, 552 bytes spill stores, 752 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 520 bytes stack frame, 1092 bytes spill stores, 1292 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 544 bytes stack frame, 396 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 408 bytes stack frame, 664 bytes spill stores, 1040 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 616 bytes stack frame, 508 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 344 bytes stack frame, 544 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 568 bytes stack frame, 1244 bytes spill stores, 1980 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint32_tv 312 bytes stack frame, 352 bytes spill stores, 628 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint32_tv 528 bytes stack frame, 384 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint32_tv 448 bytes stack frame, 928 bytes spill stores, 1320 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint32_tv 496 bytes stack frame, 324 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint32_tv 368 bytes stack frame, 552 bytes spill stores, 784 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_f32.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_f32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : 
Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 
160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 464 bytes stack frame, 964 bytes spill stores, 1156 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 232 bytes stack frame, 248 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 544 bytes stack frame, 408 bytes spill stores, 396 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 336 bytes stack frame, 480 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 528 bytes stack frame, 380 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 344 bytes stack frame, 544 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 496 bytes stack frame, 1012 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 552 bytes stack frame, 412 bytes spill stores, 396 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 536 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 344 bytes stack frame, 552 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 560 bytes stack frame, 1200 bytes spill stores, 1660 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 288 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 392 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 304 bytes stack frame, 404 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 536 bytes stack frame, 392 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 200 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 584 bytes stack frame, 1224 bytes spill stores, 1896 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 312 bytes stack frame, 336 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 600 bytes stack frame, 444 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 576 bytes stack frame, 416 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 360 bytes stack frame, 520 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 496 bytes stack frame, 1012 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 560 bytes stack frame, 408 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 536 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 344 bytes stack frame, 552 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 472 bytes stack frame, 960 bytes spill stores, 1160 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 268 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 544 bytes stack 
frame, 396 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 408 bytes stack frame, 664 bytes spill stores, 1040 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 616 bytes stack frame, 508 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 344 bytes stack frame, 544 bytes spill stores, 652 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 200 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 584 bytes stack frame, 1224 bytes spill stores, 1896 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 312 bytes stack frame, 336 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 600 bytes stack frame, 444 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 312 bytes stack frame, 424 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 576 bytes stack frame, 416 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 360 bytes stack frame, 520 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 304 bytes stack frame, 336 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 712 bytes stack frame, 1584 bytes spill stores, 2448 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 432 bytes stack frame, 540 bytes spill stores, 1332 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 592 bytes stack frame, 440 bytes spill stores, 532 bytes spill loads ptxas 
info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 504 bytes stack frame, 1032 bytes spill stores, 1520 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 576 bytes stack frame, 420 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 376 bytes stack frame, 536 bytes spill stores, 716 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 496 bytes stack frame, 1012 bytes spill stores, 1328 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 560 bytes stack frame, 408 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 328 bytes stack frame, 472 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 536 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 344 bytes stack frame, 552 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 304 bytes stack frame, 336 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 712 bytes stack frame, 1584 bytes spill stores, 2448 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 432 bytes stack frame, 540 bytes spill stores, 1332 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 592 bytes stack frame, 440 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 504 bytes stack frame, 1032 bytes spill stores, 1520 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 576 bytes stack frame, 420 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 376 bytes stack frame, 536 bytes spill stores, 716 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 504 bytes stack frame, 1012 bytes spill stores, 1292 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 296 bytes stack frame, 316 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 600 bytes stack frame, 460 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 336 bytes stack frame, 492 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 368 bytes stack frame, 528 bytes spill stores, 708 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 300 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 696 bytes stack frame, 1456 bytes spill stores, 2288 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_int64_tv 320 bytes stack frame, 372 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_PreMulSum_int64_tv 520 bytes stack frame, 368 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_AllReduce_RING_LL_PreMulSum_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_int64_tv 456 bytes stack frame, 964 bytes spill stores, 1380 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_PreMulSum_int64_tv 512 bytes stack frame, 328 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_PreMulSum_int64_tv 352 bytes stack frame, 516 bytes spill stores, 712 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_f64.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_f64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 544 bytes stack frame, 1072 bytes spill stores, 1424 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 304 bytes stack frame, 324 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 592 bytes stack frame, 448 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 344 bytes stack frame, 496 bytes spill stores, 648 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 368 bytes stack frame, 536 bytes spill stores, 732 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_AllReduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 300 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 696 bytes stack frame, 1456 bytes spill stores, 2288 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_uint64_tv 320 bytes stack frame, 372 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_PreMulSum_uint64_tv 520 bytes stack frame, 368 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_PreMulSum_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_uint64_tv 456 bytes stack frame, 964 bytes spill stores, 1380 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_PreMulSum_uint64_tv 512 bytes stack frame, 328 bytes spill stores, 352 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_AllReduce_TREE_LL_PreMulSum_uint64_tv 352 bytes stack frame, 516 bytes spill stores, 712 bytes spill loads
Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.o
ptxas info : 4 bytes gmem
ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for
_Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 256 bytes stack frame, 284 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 648 bytes stack frame, 1408 bytes spill stores, 2120 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 400 bytes stack frame, 464 bytes spill stores, 1064 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 600 bytes stack frame, 464 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 528 bytes stack frame, 1080 bytes spill stores, 1632 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 376 bytes stack frame, 544 bytes spill stores, 732 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 528 bytes stack frame, 1076 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 536 bytes stack frame, 364 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 536 bytes stack frame, 388 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 
288 bytes stack frame, 316 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 576 bytes stack frame, 1224 bytes spill stores, 1928 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 416 bytes stack frame, 476 bytes spill stores, 1232 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 592 bytes stack frame, 452 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 208 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 544 bytes stack frame, 1136 bytes spill stores, 1620 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 584 bytes stack frame, 432 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 384 bytes stack frame, 552 bytes spill stores, 752 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 260 bytes spill stores, 348 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 608 bytes stack frame, 1344 bytes spill stores, 2016 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_halfv 328 bytes stack frame, 368 bytes spill stores, 656 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_AllReduce_RING_LL128_PreMulSum_halfv 536 bytes stack frame, 396 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_RING_LL_PreMulSum_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_halfv 456 bytes stack frame, 1024 bytes spill stores, 1584 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL128_PreMulSum_halfv 520 bytes stack frame, 344 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllReduce_TREE_LL_PreMulSum_halfv 368 bytes stack frame, 536 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 528 bytes stack frame, 1076 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 528 bytes stack frame, 348 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 536 bytes stack frame, 388 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 
-gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 248 bytes stack frame, 272 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 560 bytes stack frame, 1256 bytes spill stores, 1868 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_floatv 320 bytes stack frame, 360 bytes spill stores, 644 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL128_PreMulSum_floatv 528 bytes stack frame, 384 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_RING_LL_PreMulSum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_floatv 440 bytes stack frame, 940 bytes spill stores, 1404 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL128_PreMulSum_floatv 520 bytes stack frame, 364 bytes spill stores, 424 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllReduce_TREE_LL_PreMulSum_floatv 368 bytes stack frame, 552 bytes spill stores, 784 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 464 bytes stack frame, 996 bytes spill stores, 1220 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 544 bytes stack frame, 396 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 320 bytes stack frame, 460 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 688 bytes stack frame, 304 bytes spill stores, 396 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 344 bytes stack frame, 540 bytes spill stores, 648 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 528 bytes stack frame, 1076 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 248 bytes stack frame, 276 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 528 bytes stack frame, 348 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 328 bytes stack frame, 408 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 536 bytes stack frame, 388 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 336 bytes stack frame, 536 bytes spill stores, 640 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 200 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 560 bytes stack frame, 412 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 544 bytes stack frame, 1116 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties 
for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 576 bytes stack frame, 1216 bytes spill stores, 1776 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 304 bytes stack frame, 324 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 592 bytes stack frame, 436 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 352 bytes stack frame, 496 bytes spill stores, 684 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 600 bytes stack frame, 460 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 360 bytes stack frame, 520 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 560 bytes stack frame, 412 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 544 bytes stack frame, 1112 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 464 bytes stack frame, 996 bytes spill stores, 1220 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 544 bytes stack frame, 396 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 320 bytes stack frame, 460 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 688 bytes stack frame, 304 bytes spill stores, 396 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 344 bytes stack frame, 540 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 200 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 552 bytes stack frame, 396 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 544 bytes stack frame, 1116 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes 
spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 240 bytes stack frame, 260 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 704 bytes stack frame, 1548 bytes spill stores, 2344 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 376 bytes stack frame, 448 bytes spill stores, 820 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 592 bytes stack frame, 440 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 464 bytes stack frame, 948 bytes spill stores, 1460 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 600 bytes stack frame, 460 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 376 bytes stack frame, 536 bytes spill stores, 716 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 552 bytes stack frame, 396 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 544 bytes stack frame, 1112 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 224 bytes stack frame, 244 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 704 bytes stack frame, 1536 bytes spill stores, 2396 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_SIMPLE_PreMulSum_doublev 280 bytes stack frame, 296 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL128_PreMulSum_doublev 512 bytes stack frame, 364 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_RING_LL_PreMulSum_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum_doublev 448 bytes stack frame, 944 bytes spill stores, 1412 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL128_PreMulSum_doublev 504 bytes stack frame, 324 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_AllReduce_TREE_LL_PreMulSum_doublev 352 bytes stack frame, 516 bytes spill stores, 712 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 200 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 552 bytes stack frame, 396 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 544 bytes stack frame, 1116 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 464 bytes stack frame, 996 bytes spill stores, 1220 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 544 bytes stack frame, 396 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 320 bytes stack frame, 460 bytes spill stores, 624 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 688 bytes stack frame, 304 bytes spill stores, 396 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 344 bytes stack frame, 540 bytes spill stores, 648 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 512 bytes stack frame, 1300 bytes spill stores, 1908 bytes spill loads ptxas info : Function properties 
for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 412 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 552 bytes stack frame, 396 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 544 bytes stack frame, 1112 bytes spill stores, 1680 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 864 bytes stack frame, 436 bytes spill stores, 608 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 352 bytes stack frame, 552 bytes spill stores, 684 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 504 bytes stack frame, 1072 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 312 bytes stack frame, 408 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 256 bytes stack frame, 276 bytes spill stores, 356 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 560 bytes stack frame, 1240 bytes spill stores, 1700 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 384 bytes stack frame, 448 bytes spill stores, 828 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 600 bytes stack frame, 460 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 568 bytes stack frame, 1204 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 888 bytes stack frame, 516 bytes spill stores, 672 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 384 bytes stack frame, 544 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 256 bytes stack frame, 284 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 560 bytes stack frame, 1244 bytes spill stores, 1704 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 384 bytes stack frame, 452 bytes spill stores, 832 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 600 bytes stack frame, 460 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 576 bytes stack frame, 1208 bytes spill stores, 1760 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 888 bytes stack frame, 516 bytes spill stores, 672 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 384 bytes stack frame, 544 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 504 bytes stack frame, 1072 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 312 bytes stack frame, 408 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 208 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 504 bytes stack frame, 1040 bytes spill stores, 1324 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 296 bytes stack frame, 316 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 592 bytes stack frame, 452 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 344 bytes stack frame, 492 bytes spill stores, 
640 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 744 bytes stack frame, 348 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 368 bytes stack frame, 524 bytes spill stores, 712 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 392 bytes stack frame, 700 bytes spill stores, 1432 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 688 bytes stack frame, 2060 bytes spill stores, 3584 
bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 536 bytes stack frame, 904 bytes spill stores, 1848 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 600 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 1088 bytes stack frame, 2688 bytes spill stores, 3428 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 888 bytes stack frame, 520 bytes spill stores, 676 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 384 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 504 bytes stack frame, 1072 bytes spill stores, 1360 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 248 bytes stack frame, 272 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 312 bytes stack frame, 408 bytes spill stores, 568 bytes spill loads ptxas info : 
Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 256 bytes stack frame, 284 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 664 bytes stack frame, 1432 bytes spill 
stores, 2180 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 400 bytes stack frame, 464 bytes spill stores, 1064 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 600 bytes stack frame, 464 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 528 bytes stack frame, 1080 bytes spill stores, 1624 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 584 bytes stack frame, 428 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 376 bytes stack frame, 544 bytes spill stores, 732 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 384 bytes stack frame, 704 bytes spill stores, 1428 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 680 bytes stack frame, 2060 bytes spill stores, 3640 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 536 bytes stack frame, 904 bytes spill stores, 1848 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 600 bytes stack frame, 460 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 1088 bytes stack 
frame, 2688 bytes spill stores, 3428 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 888 bytes stack frame, 520 bytes spill stores, 676 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 384 bytes stack frame, 544 bytes spill stores, 756 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 560 bytes stack frame, 
1168 bytes spill stores, 1544 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 344 bytes stack frame, 544 bytes spill stores, 696 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 368 bytes stack frame, 536 bytes spill stores, 716 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z63ncclFunction_AllReduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 240 bytes stack frame, 260 bytes spill stores, 348 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_AllReduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 608 bytes stack frame, 1344 bytes spill stores, 2016 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_AllReduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 328 bytes stack frame, 368 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_RING_LL128_PreMulSum___nv_bfloat16v 536 bytes stack frame, 396 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_LL_PreMulSum___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_AllReduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 456 bytes stack frame, 1024 bytes spill stores, 1584 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_TREE_LL128_PreMulSum___nv_bfloat16v 520 bytes stack frame, 344 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_LL_PreMulSum___nv_bfloat16v 368 bytes stack frame, 536 bytes spill stores, 756 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 376 bytes stack frame, 684 bytes spill stores, 1444 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 672 bytes stack frame, 2100 bytes spill stores, 3780 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int8_tv 440 bytes stack frame, 672 bytes spill stores, 1436 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_RING_LL128_SumPostDiv_int8_tv 544 bytes stack frame, 400 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_RING_LL_SumPostDiv_int8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int8_tv 1024 bytes stack frame, 2684 bytes spill stores, 3472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int8_tv 824 bytes stack frame, 392 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_AllReduce_TREE_LL_SumPostDiv_int8_tv 368 bytes stack frame, 520 bytes spill stores, 796 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 296 bytes stack frame, 328 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 576 bytes stack frame, 1284 bytes spill stores, 1896 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 408 bytes stack frame, 460 bytes spill stores, 1136 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 608 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 520 bytes stack frame, 1020 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 384 bytes stack frame, 552 bytes spill stores, 744 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 368 bytes stack frame, 676 bytes spill stores, 1496 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 664 bytes stack frame, 2096 bytes spill stores, 3832 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint8_tv 448 bytes stack frame, 680 bytes spill stores, 1436 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint8_tv 544 bytes stack frame, 400 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_uint8_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint8_tv 1024 bytes stack frame, 2680 bytes spill stores, 3468 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint8_tv 824 bytes stack frame, 392 bytes spill stores, 
560 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint8_tv 368 bytes stack frame, 520 bytes spill stores, 796 bytes spill loads Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 504 bytes stack frame, 1068 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 312 bytes stack frame, 404 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas 
info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 552 bytes stack frame, 1180 bytes spill stores, 1752 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 328 bytes stack frame, 356 bytes spill stores, 644 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 320 bytes stack frame, 412 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 544 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 384 bytes stack frame, 516 bytes spill stores, 692 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 256 bytes stack frame, 304 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 552 bytes stack frame, 1264 bytes spill stores, 1840 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int32_tv 328 bytes stack frame, 384 bytes spill stores, 684 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int32_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int32_tv 440 bytes stack frame, 908 bytes spill stores, 1228 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int32_tv 488 bytes stack frame, 308 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int32_tv 368 bytes stack frame, 544 bytes spill stores, 776 bytes spill loads Compiling all_reduce.cu > 
/<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 568 bytes stack frame, 1208 bytes spill stores, 1664 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 296 bytes stack frame, 332 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 304 bytes stack frame, 384 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 544 bytes stack frame, 396 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 368 bytes stack frame, 532 bytes spill stores, 656 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 504 bytes stack frame, 1068 bytes spill stores, 1352 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 312 bytes stack frame, 404 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 552 bytes stack frame, 1180 bytes spill stores, 1752 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 328 bytes stack frame, 356 bytes spill stores, 644 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 320 bytes stack frame, 412 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 544 bytes stack frame, 432 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 384 bytes stack frame, 516 bytes spill stores, 692 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 568 bytes stack frame, 1208 bytes spill stores, 
1664 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 296 bytes stack frame, 332 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 304 bytes stack frame, 384 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 544 bytes stack frame, 396 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 368 bytes stack frame, 532 bytes spill stores, 656 bytes spill loads ptxas info : 0 bytes gmem Compiling all_reduce.cu > /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 504 bytes stack frame, 1068 bytes spill stores, 1352 bytes spill loads ptxas info : Function 
properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 248 bytes stack frame, 276 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 552 bytes stack frame, 412 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 312 bytes stack frame, 404 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 520 bytes stack frame, 364 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 344 bytes stack frame, 560 bytes spill stores, 672 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_i8.cu -o /<>/build/obj/collectives/device/all_gather_sum_i8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 552 bytes stack frame, 1180 bytes spill stores, 1752 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 328 bytes stack frame, 356 bytes spill stores, 644 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 320 bytes stack frame, 412 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 544 bytes stack frame, 432 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 384 bytes stack frame, 516 bytes spill stores, 692 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads 
ptxas info : Function properties for _Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 272 bytes stack frame, 296 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 360 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 568 bytes stack frame, 1208 bytes spill stores, 1664 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 296 bytes stack frame, 332 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 304 bytes stack frame, 384 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 544 bytes stack frame, 396 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 368 bytes stack frame, 532 bytes spill stores, 656 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' 
for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 272 bytes stack frame, 296 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 360 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 560 bytes stack frame, 1168 bytes spill stores, 1544 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 296 bytes stack frame, 320 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 600 bytes stack frame, 472 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 344 bytes stack frame, 536 bytes spill stores, 672 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 576 bytes stack frame, 420 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 368 bytes stack frame, 536 bytes spill stores, 716 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : 
Function properties for _Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 300 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 272 bytes stack frame, 296 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 360 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 224 bytes stack frame, 220 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 600 bytes stack frame, 1288 bytes spill stores, 1824 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 352 bytes stack frame, 384 bytes spill stores, 620 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 224 bytes stack frame, 216 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 352 bytes stack frame, 504 bytes spill stores, 652 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 568 bytes stack frame, 412 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 400 bytes stack frame, 548 bytes spill stores, 844 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 304 bytes stack frame, 312 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 320 bytes stack frame, 348 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 440 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 216 bytes stack frame, 212 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 600 bytes stack frame, 1236 bytes spill stores, 1840 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 352 bytes stack frame, 380 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 600 bytes stack frame, 460 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 328 bytes stack frame, 428 bytes spill stores, 600 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 568 bytes stack frame, 416 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 376 bytes stack frame, 512 bytes spill stores, 696 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 92 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 448 bytes stack frame, 532 bytes spill stores, 744 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 440 bytes stack frame, 628 bytes spill stores, 1340 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 440 bytes stack frame, 256 bytes spill stores, 252 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 4 bytes gmem ptxas info : Function properties for 
_Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 296 bytes stack frame, 360 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 576 bytes stack frame, 1240 bytes spill stores, 1960 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 424 bytes stack frame, 512 bytes spill stores, 1196 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 608 bytes stack frame, 476 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 208 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 520 bytes stack frame, 1024 bytes spill stores, 1392 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 576 bytes stack frame, 416 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 384 bytes stack frame, 552 bytes spill stores, 744 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_AllGather_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllGather_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_AllGather_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry 
function '_Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_AllGather_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllGather_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 91 registers ptxas info : Compiling entry function '_Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_AllGather_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z50ncclFunction_AllGather_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllGather_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_AllGather_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_NVLS_SIMPLE_Sum_int8_tv 432 bytes stack frame, 532 bytes spill stores, 836 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllGather_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllGather_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllGather_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllGather_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_AllGather_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_RING_SIMPLE_Sum_int8_tv 376 bytes stack frame, 512 bytes spill stores, 964 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_RING_LL128_Sum_int8_tv 400 bytes stack frame, 212 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_RING_LL_Sum_int8_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_AllGather_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_AllGather_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_AllGather_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_u8.cu -o /<>/build/obj/collectives/device/all_gather_sum_u8.o ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 368 bytes stack frame, 424 bytes spill stores, 836 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 720 bytes stack frame, 1716 bytes spill stores, 2588 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 424 bytes stack frame, 496 bytes spill stores, 1140 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 608 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 232 bytes stack frame, 220 bytes spill stores, 216 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 504 bytes stack frame, 1048 bytes spill stores, 1544 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 584 bytes stack frame, 432 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 400 bytes stack frame, 548 bytes spill stores, 848 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_i32.cu -o /<>/build/obj/collectives/device/all_gather_sum_i32.o ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 368 bytes stack frame, 412 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 720 bytes stack frame, 1668 bytes spill stores, 2468 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 416 bytes stack frame, 484 bytes spill stores, 1120 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 608 bytes stack frame, 472 bytes spill stores, 604 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 208 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 504 bytes stack frame, 1044 bytes spill stores, 1540 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 584 bytes stack frame, 432 bytes spill stores, 532 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 376 bytes stack frame, 520 bytes spill stores, 764 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_u32.cu -o /<>/build/obj/collectives/device/all_gather_sum_u32.o ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 280 bytes stack frame, 308 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 552 bytes stack frame, 1268 bytes spill stores, 2032 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint32_tv 320 bytes stack frame, 360 bytes spill stores, 664 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint32_tv 208 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint32_tv 440 bytes stack frame, 908 bytes spill stores, 1228 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint32_tv 488 bytes stack frame, 308 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint32_tv 368 bytes stack frame, 544 bytes spill stores, 776 bytes spill loads ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin 
-compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_i64.cu -o /<>/build/obj/collectives/device/all_gather_sum_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 336 bytes stack frame, 372 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z63ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 712 bytes stack frame, 1676 bytes spill stores, 2552 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_int64_tv 368 bytes stack frame, 432 bytes spill stores, 888 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_RING_LL128_SumPostDiv_int64_tv 544 bytes stack frame, 404 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_RING_LL_SumPostDiv_int64_tv 232 bytes stack frame, 220 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_int64_tv 480 bytes stack frame, 1060 bytes spill stores, 1572 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_AllReduce_TREE_LL128_SumPostDiv_int64_tv 536 bytes stack frame, 344 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_AllReduce_TREE_LL_SumPostDiv_int64_tv 424 bytes stack frame, 536 bytes spill stores, 808 bytes spill loads Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_u64.cu -o /<>/build/obj/collectives/device/all_gather_sum_u64.o ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_f16.cu -o /<>/build/obj/collectives/device/all_gather_sum_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_f32.cu -o /<>/build/obj/collectives/device/all_gather_sum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_f64.cu -o /<>/build/obj/collectives/device/all_gather_sum_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sum_bf16.cu -o /<>/build/obj/collectives/device/all_gather_sum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 4 bytes gmem ptxas info : Function properties for _Z59ncclFunction_AllReduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_AllReduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_AllReduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_AllReduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 312 bytes stack frame, 340 bytes spill stores, 536 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_AllReduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_AllReduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_AllReduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 712 bytes stack frame, 1624 bytes spill stores, 2448 bytes spill loads ptxas info : Function properties for 
_Z63ncclFunction_AllReduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_AllReduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_RING_SIMPLE_SumPostDiv_uint64_tv 328 bytes stack frame, 356 bytes spill stores, 656 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_RING_LL128_SumPostDiv_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_RING_LL_SumPostDiv_uint64_tv 216 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_AllReduce_TREE_SIMPLE_SumPostDiv_uint64_tv 480 bytes stack frame, 1012 bytes spill stores, 1488 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_AllReduce_TREE_LL128_SumPostDiv_uint64_tv 504 bytes stack frame, 300 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_AllReduce_TREE_LL_SumPostDiv_uint64_tv 384 bytes stack frame, 520 bytes spill stores, 784 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_i8.cu -o /<>/build/obj/collectives/device/all_gather_prod_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_u8.cu -o /<>/build/obj/collectives/device/all_gather_prod_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_i32.cu -o /<>/build/obj/collectives/device/all_gather_prod_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_u32.cu -o /<>/build/obj/collectives/device/all_gather_prod_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_i64.cu -o /<>/build/obj/collectives/device/all_gather_prod_i64.o ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_u64.cu -o /<>/build/obj/collectives/device/all_gather_prod_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_f16.cu -o /<>/build/obj/collectives/device/all_gather_prod_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_f32.cu -o /<>/build/obj/collectives/device/all_gather_prod_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_f64.cu -o /<>/build/obj/collectives/device/all_gather_prod_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_prod_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_prod_bf16.cu -o /<>/build/obj/collectives/device/all_gather_prod_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_i8.cu -o /<>/build/obj/collectives/device/all_gather_min_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_u8.cu -o /<>/build/obj/collectives/device/all_gather_min_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_i32.cu -o /<>/build/obj/collectives/device/all_gather_min_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_u32.cu -o /<>/build/obj/collectives/device/all_gather_min_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_i64.cu -o /<>/build/obj/collectives/device/all_gather_min_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_u64.cu -o /<>/build/obj/collectives/device/all_gather_min_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_f16.cu -o /<>/build/obj/collectives/device/all_gather_min_f16.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_f32.cu -o /<>/build/obj/collectives/device/all_gather_min_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_f64.cu -o /<>/build/obj/collectives/device/all_gather_min_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_min_bf16.cu -o /<>/build/obj/collectives/device/all_gather_min_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_i8.cu -o /<>/build/obj/collectives/device/all_gather_max_i8.o ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_u8.cu -o /<>/build/obj/collectives/device/all_gather_max_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_i32.cu -o /<>/build/obj/collectives/device/all_gather_max_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_u32.cu -o /<>/build/obj/collectives/device/all_gather_max_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_i64.cu -o /<>/build/obj/collectives/device/all_gather_max_i64.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_u64.cu -o /<>/build/obj/collectives/device/all_gather_max_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_f16.cu -o /<>/build/obj/collectives/device/all_gather_max_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_f32.cu -o /<>/build/obj/collectives/device/all_gather_max_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_f64.cu -o /<>/build/obj/collectives/device/all_gather_max_f64.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_max_bf16.cu -o /<>/build/obj/collectives/device/all_gather_max_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_i8.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_u8.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_i32.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_i32.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_u32.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_i64.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_u64.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_f16.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_f16.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_f32.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_f64.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_premulsum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_premulsum_bf16.cu -o /<>/build/obj/collectives/device/all_gather_premulsum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.o Compiling all_gather.cu > /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_i8.cu -o /<>/build/obj/collectives/device/broadcast_sum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_u8.cu -o /<>/build/obj/collectives/device/broadcast_sum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_i32.cu -o /<>/build/obj/collectives/device/broadcast_sum_i32.o Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_u32.cu -o /<>/build/obj/collectives/device/broadcast_sum_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 264 bytes stack frame, 284 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 352 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem 
ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 344 bytes cmem[0], 4 bytes cmem[2] 
ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 264 bytes stack frame, 284 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 352 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_i64.cu -o /<>/build/obj/collectives/device/broadcast_sum_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_u64.cu -o /<>/build/obj/collectives/device/broadcast_sum_u64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_f16.cu -o /<>/build/obj/collectives/device/broadcast_sum_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 344 bytes cmem[0], 4 
bytes cmem[2] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 264 bytes stack frame, 284 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 352 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 86 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 280 bytes stack frame, 300 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 432 bytes stack frame, 240 bytes spill stores, 236 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_f32.cu -o /<>/build/obj/collectives/device/broadcast_sum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_f64.cu -o /<>/build/obj/collectives/device/broadcast_sum_f64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sum_bf16.cu -o /<>/build/obj/collectives/device/broadcast_sum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 424 bytes stack frame, 508 bytes spill stores, 980 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 424 bytes stack frame, 240 bytes spill stores, 236 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 
bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_i8.cu -o /<>/build/obj/collectives/device/broadcast_prod_i8.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_Broadcast_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_Broadcast_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_Broadcast_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_Broadcast_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z50ncclFunction_Broadcast_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Broadcast_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Broadcast_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_Broadcast_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Broadcast_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Broadcast_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Broadcast_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Broadcast_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Broadcast_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_RING_SIMPLE_Sum_int8_tv 344 bytes stack frame, 384 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_RING_LL128_Sum_int8_tv 400 bytes stack frame, 216 bytes spill stores, 216 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_RING_LL_Sum_int8_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Broadcast_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Broadcast_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Broadcast_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u8.o mkdir -p 
/<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_u8.cu -o /<>/build/obj/collectives/device/broadcast_prod_u8.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_i32.cu -o /<>/build/obj/collectives/device/broadcast_prod_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_u32.cu -o /<>/build/obj/collectives/device/broadcast_prod_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 
-Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_i64.cu -o /<>/build/obj/collectives/device/broadcast_prod_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_u64.cu -o /<>/build/obj/collectives/device/broadcast_prod_u64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f16.o mkdir -p /<>/build/obj/collectives/device ptxas info : 0 bytes gmem /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_f16.cu -o /<>/build/obj/collectives/device/broadcast_prod_f16.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_f32.cu -o /<>/build/obj/collectives/device/broadcast_prod_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 
-Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_f64.cu -o /<>/build/obj/collectives/device/broadcast_prod_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_prod_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_prod_bf16.cu -o /<>/build/obj/collectives/device/broadcast_prod_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_i8.cu -o /<>/build/obj/collectives/device/broadcast_min_i8.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_u8.cu -o /<>/build/obj/collectives/device/broadcast_min_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v 
-Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_i32.cu -o /<>/build/obj/collectives/device/broadcast_min_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_u32.cu -o /<>/build/obj/collectives/device/broadcast_min_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_i64.cu -o /<>/build/obj/collectives/device/broadcast_min_i64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_u64.cu -o /<>/build/obj/collectives/device/broadcast_min_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_f16.cu -o /<>/build/obj/collectives/device/broadcast_min_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_f32.cu -o /<>/build/obj/collectives/device/broadcast_min_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_f64.cu -o /<>/build/obj/collectives/device/broadcast_min_f64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_min_bf16.cu -o /<>/build/obj/collectives/device/broadcast_min_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_i8.cu -o /<>/build/obj/collectives/device/broadcast_max_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_u8.cu -o /<>/build/obj/collectives/device/broadcast_max_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_i32.cu -o /<>/build/obj/collectives/device/broadcast_max_i32.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_u32.cu -o /<>/build/obj/collectives/device/broadcast_max_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_i64.cu -o /<>/build/obj/collectives/device/broadcast_max_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_u64.cu -o /<>/build/obj/collectives/device/broadcast_max_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_f16.cu -o /<>/build/obj/collectives/device/broadcast_max_f16.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_f32.cu -o /<>/build/obj/collectives/device/broadcast_max_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_f64.cu -o /<>/build/obj/collectives/device/broadcast_max_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_max_bf16.cu -o /<>/build/obj/collectives/device/broadcast_max_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_i8.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_i8.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_u8.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 
-Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_i32.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_i32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_u32.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_i64.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_i64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_u64.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all 
-O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_f16.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_f16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_f32.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_f64.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_f64.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_premulsum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_premulsum_bf16.cu -o /<>/build/obj/collectives/device/broadcast_premulsum_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin 
-compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin 
-compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.o ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v 
-Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling broadcast.cu > /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_i8.cu -o /<>/build/obj/collectives/device/reduce_sum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_u8.cu -o /<>/build/obj/collectives/device/reduce_sum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_i32.cu -o /<>/build/obj/collectives/device/reduce_sum_i32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry 
function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ 
-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_u32.cu -o /<>/build/obj/collectives/device/reduce_sum_u32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 
bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function 
'_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 504 bytes stack frame, 368 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function 
'_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 184 bytes stack frame, 184 
bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : 
Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 79 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 560 bytes stack frame, 436 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 112 bytes stack frame, 112 
bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 79 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 560 bytes stack frame, 436 bytes spill stores, 432 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 
registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 544 bytes stack frame, 408 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : 
Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 
bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 544 bytes stack frame, 408 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for 
_Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers 
ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_int8_tv 360 bytes stack frame, 448 bytes spill stores, 748 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_int8_tv 528 bytes stack frame, 368 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_int8_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_uint8_tv 360 bytes stack frame, 448 bytes spill stores, 748 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_uint8_tv 528 bytes stack frame, 368 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_uint8_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_i64.cu -o /<>/build/obj/collectives/device/reduce_sum_i64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_u64.cu -o /<>/build/obj/collectives/device/reduce_sum_u64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function 
properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_f16.cu -o /<>/build/obj/collectives/device/reduce_sum_f16.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : 
Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 
bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ 
-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_f32.cu -o /<>/build/obj/collectives/device/reduce_sum_f32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 
registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info 
: Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas 
info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for 
_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads 
ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes 
cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : 
Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function 
properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 328 bytes stack frame, 364 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : 
Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 328 bytes stack frame, 364 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 600 bytes stack frame, 476 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 376 bytes cmem[0] ptxas info : 
Compiling entry function '_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_Reduce_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers ptxas info : Compiling entry function '_Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z37ncclKernel_Reduce_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Sum_int64_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Sum_int64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Sum_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for 
_Z43ncclKernel_Reduce_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_Reduce_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_Reduce_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers ptxas info : Compiling entry function '_Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z38ncclKernel_Reduce_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Sum_uint64_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Sum_uint64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function 
properties for _Z40ncclFunction_Reduce_RING_LL_Sum_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_f64.cu -o /<>/build/obj/collectives/device/reduce_sum_f64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 79 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sum_bf16.cu -o /<>/build/obj/collectives/device/reduce_sum_bf16.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z39ncclKernel_Reduce_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z34ncclKernel_Reduce_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas 
info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z34ncclKernel_Reduce_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers ptxas info : Compiling entry function '_Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z34ncclKernel_Reduce_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Sum_halfv 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Sum_halfv 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Sum_halfv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 
bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 88 bytes stack frame, 84 bytes spill 
stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z40ncclKernel_Reduce_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z35ncclKernel_Reduce_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for 
_Z35ncclKernel_Reduce_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers ptxas info : Compiling entry function '_Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z35ncclKernel_Reduce_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Sum_floatv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Sum_floatv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Sum_floatv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_i8.cu -o /<>/build/obj/collectives/device/reduce_prod_i8.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_u8.cu -o /<>/build/obj/collectives/device/reduce_prod_u8.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes 
spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 76 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function 
properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes 
gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 208 bytes stack frame, 
200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 560 bytes stack frame, 448 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 75 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : 
Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 272 bytes stack frame, 288 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 544 bytes stack frame, 408 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function 
'_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 584 bytes stack frame, 464 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_Reduce_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function 
'_Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers ptxas info : Compiling entry function '_Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z36ncclKernel_Reduce_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Sum_doublev 216 bytes stack frame, 212 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Sum_doublev 520 bytes stack frame, 352 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Sum_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_i32.cu -o /<>/build/obj/collectives/device/reduce_prod_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 584 bytes stack frame, 452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 584 bytes stack frame, 452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 79 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 528 bytes stack frame, 408 bytes spill stores, 356 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_int8_tv 344 bytes stack frame, 440 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_int8_tv 560 bytes stack frame, 432 bytes spill 
stores, 440 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_uint8_tv 344 bytes stack frame, 440 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_uint8_tv 560 bytes stack frame, 432 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_uint8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_u32.cu -o /<>/build/obj/collectives/device/reduce_prod_u32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_Reduce_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z52ncclKernel_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_Reduce_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers ptxas info : Compiling entry function 
'_Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_Reduce_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Sum___nv_bfloat16v 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Sum___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Sum___nv_bfloat16v 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_i64.cu -o /<>/build/obj/collectives/device/reduce_prod_i64.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_u64.cu -o /<>/build/obj/collectives/device/reduce_prod_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 520 bytes stack frame, 396 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 528 bytes stack frame, 408 bytes spill stores, 356 bytes spill loads ptxas info : 
Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 520 bytes stack frame, 400 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 520 bytes stack frame, 400 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 520 bytes stack frame, 396 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 520 bytes stack frame, 396 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : Function 
properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 520 bytes stack frame, 396 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : 
Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int32_tv 544 bytes stack frame, 400 bytes spill stores, 
404 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_f16.cu -o /<>/build/obj/collectives/device/reduce_prod_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 336 bytes stack frame, 372 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 552 bytes stack frame, 420 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 336 bytes stack frame, 372 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill 
stores, 444 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Prod_int64_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Prod_int64_tv 528 bytes stack frame, 356 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Prod_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_Reduce_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_SIMPLE_Prod_uint64_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL128_Prod_uint64_tv 528 bytes stack frame, 356 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL_Prod_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin 
cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_f32.cu -o /<>/build/obj/collectives/device/reduce_prod_f32.o
Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_f64.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_f64.cu -o /<>/build/obj/collectives/device/reduce_prod_f64.o
Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_prod_bf16.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I..
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_prod_bf16.cu -o /<>/build/obj/collectives/device/reduce_prod_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill 
loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 600 bytes stack frame, 476 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 560 bytes stack frame, 448 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 272 bytes stack frame, 288 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 544 bytes stack frame, 408 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Prod_halfv 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Prod_halfv 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Prod_halfv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_i8.cu -o /<>/build/obj/collectives/device/reduce_min_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 200 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 584 bytes stack frame, 464 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Prod_doublev 216 bytes stack frame, 212 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Prod_doublev 520 bytes stack frame, 352 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Prod_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Prod_floatv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Prod_floatv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Prod_floatv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_u8.cu -o /<>/build/obj/collectives/device/reduce_min_u8.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_i32.cu -o /<>/build/obj/collectives/device/reduce_min_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties 
for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 264 bytes stack 
frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z46ncclFunction_Reduce_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_Prod___nv_bfloat16v 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_Prod___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_Prod___nv_bfloat16v 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_u32.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_u32.cu -o /<>/build/obj/collectives/device/reduce_min_u32.o
ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function
properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas 
info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info 
: Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 296 bytes stack frame, 332 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 296 bytes stack frame, 332 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill 
loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 584 bytes stack frame, 452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 584 bytes stack frame, 452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 120 bytes stack frame, 116 bytes spill 
stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_int8_tv 344 bytes stack frame, 436 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_int8_tv 560 bytes stack frame, 432 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_i64.cu -o /<>/build/obj/collectives/device/reduce_min_i64.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_u64.cu -o /<>/build/obj/collectives/device/reduce_min_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_uint8_tv 344 bytes stack frame, 440 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_uint8_tv 560 bytes stack frame, 432 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_uint8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ 
-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_f16.cu -o /<>/build/obj/collectives/device/reduce_min_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_f32.cu -o /<>/build/obj/collectives/device/reduce_min_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Min_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Min_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 
bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 
Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_RING_LL128_Min_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Min_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Min_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for 
_Z36ncclFunction_Reduce_RING_LL_Min_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 320 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 320 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Min_halfv 592 bytes stack frame, 472 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Min_halfv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Min_int64_tv 264 bytes stack frame, 272 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Min_int64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Min_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z40ncclFunction_Reduce_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Min_uint64_tv 264 bytes stack frame, 272 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Min_uint64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Min_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > 
/<>/build/obj/collectives/device/reduce_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_f64.cu -o /<>/build/obj/collectives/device/reduce_min_f64.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_min_bf16.cu -o /<>/build/obj/collectives/device/reduce_min_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Min_halfv 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Min_halfv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties 
for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Min_halfv 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Min_halfv 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Min_halfv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_i8.cu -o /<>/build/obj/collectives/device/reduce_max_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Min_floatv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Min_floatv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Min_floatv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ 
-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_u8.cu -o /<>/build/obj/collectives/device/reduce_max_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 560 bytes stack frame, 448 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 336 bytes stack frame, 372 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 544 bytes stack frame, 408 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_RING_LL_Min_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 584 bytes stack frame, 464 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min_doublev 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Min_doublev 272 bytes stack frame, 300 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Min_doublev 512 bytes stack frame, 344 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Min_doublev 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 296 bytes stack frame, 332 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_i32.cu -o /<>/build/obj/collectives/device/reduce_max_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 296 bytes stack frame, 332 bytes spill stores, 388 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 584 bytes stack frame, 
452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 376 bytes stack frame, 464 bytes spill stores, 812 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 584 bytes stack frame, 452 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Min___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Min___nv_bfloat16v 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_u32.cu -o /<>/build/obj/collectives/device/reduce_max_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_int8_tv 344 bytes stack frame, 436 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_int8_tv 560 bytes stack frame, 432 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 
-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_i64.cu -o /<>/build/obj/collectives/device/reduce_max_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_uint8_tv 344 bytes stack frame, 440 bytes spill stores, 784 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_uint8_tv 560 bytes stack frame, 432 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_uint8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_u64.cu -o /<>/build/obj/collectives/device/reduce_max_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 512 bytes stack frame, 376 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 
96 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 176 bytes stack frame, 176 
bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 504 bytes stack frame, 360 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_f16.cu -o /<>/build/obj/collectives/device/reduce_max_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 560 bytes stack frame, 456 bytes spill stores, 464 bytes spill loads ptxas info : Function 
properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 320 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 320 bytes stack frame, 360 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 560 bytes stack frame, 444 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint32_tv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_f32.cu -o /<>/build/obj/collectives/device/reduce_max_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_SIMPLE_Max_int64_tv 264 bytes stack frame, 272 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL128_Max_int64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL_Max_int64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 528 bytes stack frame, 404 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > 
/<>/build/obj/collectives/device/reduce_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_f64.cu -o /<>/build/obj/collectives/device/reduce_max_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_SIMPLE_Max_uint64_tv 264 bytes stack frame, 272 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL128_Max_uint64_tv 536 bytes stack frame, 376 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL_Max_uint64_tv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda 
-Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_max_bf16.cu -o /<>/build/obj/collectives/device/reduce_max_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 520 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 544 bytes stack frame, 416 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_doublev 520 bytes stack frame, 392 bytes spill stores, 344 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_RING_LL_Max_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 592 bytes stack frame, 472 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 512 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_doublev 512 bytes stack frame, 380 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_doublev 88 bytes stack frame, 84 bytes spill stores, 84 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 160 bytes stack frame, 160 bytes 
spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_doublev 560 bytes stack frame, 448 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_doublev 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z46ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_SIMPLE_Max_halfv 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_RING_LL128_Max_halfv 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_RING_LL_Max_halfv 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z39ncclFunction_Reduce_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z36ncclFunction_Reduce_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_i8.cu -o /<>/build/obj/collectives/device/reduce_premulsum_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 312 bytes stack frame, 332 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 536 bytes stack frame, 408 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 336 bytes stack frame, 372 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_LL128_Max_doublev 544 bytes stack frame, 408 
bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_RING_SIMPLE_Max_floatv 264 bytes stack frame, 276 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_RING_LL128_Max_floatv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_RING_LL_Max_floatv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z40ncclFunction_Reduce_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z37ncclFunction_Reduce_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_u8.cu -o /<>/build/obj/collectives/device/reduce_premulsum_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 584 bytes stack frame, 464 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_SIMPLE_Max_doublev 272 bytes stack frame, 300 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_Reduce_RING_LL128_Max_doublev 512 bytes stack frame, 344 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_RING_LL_Max_doublev 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_Reduce_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z38ncclFunction_Reduce_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_i32.cu -o /<>/build/obj/collectives/device/reduce_premulsum_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 264 bytes stack frame, 276 bytes spill stores, 276 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 584 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function 
properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 200 bytes stack 
frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 528 bytes stack frame, 400 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_Max___nv_bfloat16v 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_Max___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 428 bytes spill loads ptxas info : Function 
properties for _Z45ncclFunction_Reduce_RING_LL_Max___nv_bfloat16v 104 bytes stack frame, 104 bytes spill stores, 104 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_u32.cu -o /<>/build/obj/collectives/device/reduce_premulsum_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 520 bytes stack frame, 408 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 568 bytes stack frame, 464 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 512 bytes stack frame, 384 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 520 bytes stack frame, 408 bytes 
spill stores, 352 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 528 bytes stack frame, 400 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 576 bytes stack frame, 468 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 520 bytes stack frame, 408 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 376 bytes stack frame, 464 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 584 bytes stack frame, 460 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 216 bytes stack frame, 208 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 568 bytes stack frame, 464 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 520 bytes stack frame, 408 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes 
gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 320 bytes stack frame, 336 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 576 bytes stack frame, 480 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 376 bytes stack frame, 464 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 584 bytes stack frame, 460 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int8_tv 368 bytes stack frame, 452 bytes spill stores, 752 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_int8_tv 544 bytes stack frame, 408 bytes spill 
stores, 424 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_int8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_i64.cu -o /<>/build/obj/collectives/device/reduce_premulsum_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 576 bytes stack frame, 468 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int32_tv 272 bytes stack frame, 280 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int32_tv 544 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int32_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > 
/<>/build/obj/collectives/device/reduce_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_u64.cu -o /<>/build/obj/collectives/device/reduce_premulsum_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint8_tv 368 bytes stack frame, 452 bytes spill stores, 752 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_uint8_tv 544 bytes stack frame, 408 bytes spill stores, 424 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_uint8_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 544 bytes stack frame, 408 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 88 bytes 
stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_f16.cu -o /<>/build/obj/collectives/device/reduce_premulsum_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 320 bytes stack frame, 336 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 576 bytes stack frame, 480 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 544 bytes stack frame, 408 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 536 bytes stack frame, 392 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint32_tv 544 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint32_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ 
-gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_f32.cu -o /<>/build/obj/collectives/device/reduce_premulsum_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 552 bytes stack frame, 428 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 536 bytes stack frame, 392 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 536 bytes stack frame, 392 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 536 bytes stack frame, 392 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 536 bytes stack frame, 412 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 528 bytes stack frame, 424 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 584 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 584 bytes stack frame, 460 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 520 bytes stack frame, 408 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 96 bytes stack frame, 96 bytes spill 
stores, 96 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 552 bytes stack frame, 428 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 336 bytes stack frame, 360 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 576 bytes stack frame, 444 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 336 bytes stack frame, 360 bytes spill stores, 612 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 576 bytes stack frame, 444 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 192 bytes stack frame, 188 bytes spill stores, 188 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 520 bytes stack frame, 408 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 
208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 592 bytes stack frame, 460 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_PreMulSum_int64_tv 288 bytes stack frame, 316 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 416 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_PreMulSum_int64_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 
-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_f64.cu -o /<>/build/obj/collectives/device/reduce_premulsum_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_PreMulSum_uint64_tv 288 bytes stack frame, 316 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 416 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_PreMulSum_uint64_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 576 bytes stack frame, 468 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_premulsum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_premulsum_bf16.cu -o /<>/build/obj/collectives/device/reduce_premulsum_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 272 bytes stack frame, 284 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 592 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 184 bytes stack frame, 180 bytes spill 
stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 536 bytes stack frame, 392 bytes spill stores, 364 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 320 bytes stack frame, 336 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 576 bytes stack frame, 480 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL128_PreMulSum_halfv 560 bytes stack frame, 436 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_RING_LL_PreMulSum_halfv 
112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_Reduce_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 536 bytes stack frame, 380 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_SIMPLE_PreMulSum_floatv 272 bytes stack frame, 280 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL128_PreMulSum_floatv 544 bytes stack frame, 392 bytes spill stores, 412 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_RING_LL_PreMulSum_floatv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_Reduce_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 536 bytes stack frame, 424 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 536 bytes stack frame, 380 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 208 bytes stack frame, 204 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 576 bytes stack frame, 456 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 536 bytes stack frame, 424 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 272 bytes stack frame, 284 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 568 bytes stack frame, 440 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes 
gmem ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_Reduce_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_SIMPLE_PreMulSum_doublev 216 
bytes stack frame, 204 bytes spill stores, 204 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL128_PreMulSum_doublev 536 bytes stack frame, 368 bytes spill stores, 412 bytes spill loads
ptxas info : Function properties for _Z44ncclFunction_Reduce_RING_LL_PreMulSum_doublev 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads
ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z44ncclFunction_Reduce_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.o
ptxas info : 0 bytes gmem
ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info :
Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 536 bytes stack frame, 424 bytes spill stores, 392 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 240 bytes stack frame, 256 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 560 bytes stack frame, 424 bytes spill stores, 400 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 280 bytes stack frame, 296 bytes spill stores, 348 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 592 bytes stack frame, 456 bytes spill stores, 492 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function 
properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 176 
bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 528 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 280 bytes stack frame, 304 bytes spill stores, 356 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 592 bytes stack frame, 456 bytes spill stores, 492 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 576 bytes stack frame, 496 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 520 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 424 bytes stack frame, 484 bytes spill stores, 884 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 592 bytes stack frame, 460 bytes 
spill stores, 516 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 272 bytes stack frame, 284 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 592 bytes stack frame, 476 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 424 bytes stack frame, 484 bytes spill stores, 884 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 592 bytes stack frame, 460 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 520 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int8_tv 376 bytes stack frame, 484 bytes spill stores, 788 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_RING_LL128_SumPostDiv_int8_tv 568 bytes stack frame, 428 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_RING_LL_SumPostDiv_int8_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_Reduce_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_Reduce_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_Reduce_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_Reduce_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_Reduce_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_RING_SIMPLE_PreMulSum___nv_bfloat16v 240 bytes stack frame, 252 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_RING_LL128_PreMulSum___nv_bfloat16v 560 bytes stack frame, 436 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_LL_PreMulSum___nv_bfloat16v 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint8_tv 376 bytes stack frame, 484 bytes spill stores, 788 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_uint8_tv 576 bytes stack frame, 436 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_uint8_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 320 bytes stack frame, 348 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 528 bytes stack frame, 392 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 208 bytes stack frame, 204 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 528 bytes stack frame, 396 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 528 bytes stack frame, 396 bytes spill stores, 356 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int32_tv 272 bytes stack 
frame, 304 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 520 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 208 bytes stack frame, 204 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 520 bytes stack frame, 376 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 208 bytes stack frame, 204 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 520 bytes stack frame, 376 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 112 bytes stack frame, 108 bytes spill stores, 108 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 184 
bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 520 bytes stack frame, 380 bytes spill stores, 324 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 520 bytes stack frame, 388 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 240 bytes stack frame, 240 bytes spill stores, 236 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 448 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 576 bytes stack frame, 460 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 224 bytes stack frame, 220 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 436 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 104 bytes stack frame, 100 bytes spill stores, 100 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 368 bytes stack frame, 400 bytes spill stores, 760 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 456 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 312 bytes stack frame, 340 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 560 bytes stack frame, 448 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 328 bytes stack frame, 348 bytes spill stores, 520 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 424 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 112 bytes stack frame, 112 bytes spill stores, 112 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_int64_tv 320 bytes stack frame, 352 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_RING_LL128_SumPostDiv_int64_tv 544 bytes stack frame, 420 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_RING_LL_SumPostDiv_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_Reduce_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_Reduce_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function 
properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint32_tv 264 bytes stack frame, 292 bytes spill stores, 348 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint32_tv 544 bytes stack frame, 400 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint32_tv 120 bytes stack frame, 120 bytes spill stores, 120 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.o Compiling reduce.cu > /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_Reduce_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_Reduce_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_Reduce_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_NVLS_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z60ncclFunction_Reduce_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_Reduce_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_Reduce_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_Reduce_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_Reduce_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_Reduce_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_RING_SIMPLE_SumPostDiv_uint64_tv 288 bytes stack frame, 308 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_RING_LL128_SumPostDiv_uint64_tv 544 bytes stack frame, 396 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_RING_LL_SumPostDiv_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_Reduce_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_Reduce_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_Reduce_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem Compiling reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_sum_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_i8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_u8.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_i32.o Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_u32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 
registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 536 bytes stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : 
Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 536 bytes stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 536 bytes stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for 
_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 144 bytes stack 
frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] 
ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] 
ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 536 bytes 
stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for 
_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 536 bytes stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] 
ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 
344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 8 bytes cmem[2] ptxas info : Compiling entry function 
'_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 304 bytes stack frame, 332 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 536 bytes stack frame, 376 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry 
function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 92 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 296 bytes stack frame, 308 
bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 592 bytes stack frame, 444 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function 
properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 92 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 592 bytes stack frame, 444 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 480 bytes stack frame, 728 bytes spill stores, 1228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 584 bytes stack frame, 432 bytes spill stores, 488 bytes spill loads ptxas info : 
Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas 
info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 480 bytes stack frame, 728 bytes spill stores, 1228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 584 bytes stack frame, 432 bytes spill stores, 488 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for 
_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 
376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 91 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_int8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Used 30 registers ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int8_tv 408 bytes stack frame, 524 bytes spill stores, 836 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_int8_tv 544 bytes stack frame, 396 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_i64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 91 registers ptxas info : Compiling entry function 
'_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_uint8_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint8_tv 408 bytes stack frame, 524 bytes spill stores, 836 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_uint8_tv 544 bytes stack frame, 396 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_u64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers ptxas info : Compiling entry function 
'_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 
'sm_90' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint32_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint32_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_f16.o Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_f32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 
85 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 
344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 85 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 544 bytes stack frame, 404 bytes spill 
stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 
344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas 
info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 85 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function 
properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas 
info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 272 
bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' 
for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 85 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 89 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : 
Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for 
_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes 
cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function 
'_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 352 bytes stack frame, 360 
bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 408 bytes stack frame, 452 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 616 bytes stack frame, 484 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 
376 bytes cmem[0] ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 408 bytes stack frame, 452 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 616 bytes stack frame, 484 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 
'sm_70' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 296 bytes stack frame, 324 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 624 bytes stack frame, 508 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z49ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for 
_Z44ncclKernel_ReduceScatter_NVLS_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_RING_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers ptxas info : Compiling entry function '_Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z44ncclKernel_ReduceScatter_TREE_LL_Sum_int64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_int64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Sum_int64_tv 320 bytes stack frame, 344 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Sum_int64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Sum_int64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Sum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 
'sm_80' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 96 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_f64.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_NVLS_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z54ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_RING_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers ptxas 
info : Compiling entry function '_Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z45ncclKernel_ReduceScatter_TREE_LL_Sum_uint64_tP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_uint64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Sum_uint64_tv 320 bytes stack frame, 344 bytes spill stores, 504 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Sum_uint64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Sum_uint64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Sum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=0 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 95 registers, 376 bytes 
cmem[0] ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z47ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_NVLS_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function 
'_Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_RING_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 94 registers ptxas info : Compiling entry function '_Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z42ncclKernel_ReduceScatter_TREE_LL_Sum_floatP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_floatv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_NVLS_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Sum_floatv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Sum_floatv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Sum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Sum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 4 bytes 
cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_i8.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z46ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_NVLS_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z51ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_RING_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers ptxas info : Compiling entry function 
'_Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z41ncclKernel_ReduceScatter_TREE_LL_Sum_halfP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_halfv 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Sum_halfv 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Sum_halfv 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Sum_halfv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Sum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_u8.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_50' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for 
_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 120 bytes stack frame, 116 
bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function 
'_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 88 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_60' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 4 bytes cmem[2] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 296 bytes stack frame, 320 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 128 bytes stack 
frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Used 22 registers, 344 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 87 registers, 344 bytes cmem[0], 12 bytes cmem[2] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_61' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 22 registers, 344 bytes cmem[0] ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 296 bytes stack frame, 328 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 536 bytes stack frame, 372 bytes spill stores, 428 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : 
Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 360 bytes stack frame, 392 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 600 bytes stack frame, 456 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 624 bytes 
stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 624 bytes stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] 
ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 93 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_70' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 272 bytes stack frame, 292 bytes spill stores, 
288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 608 bytes stack frame, 468 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z48ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_NVLS_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z52ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork' 
for 'sm_90' ptxas info : Function properties for _Z53ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_RING_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 91 registers ptxas info : Compiling entry function '_Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z43ncclKernel_ReduceScatter_TREE_LL_Sum_doubleP11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum_doublev 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Sum_doublev 272 bytes stack frame, 284 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Sum_doublev 552 bytes stack frame, 408 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Sum_doublev 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Sum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 480 bytes stack frame, 676 bytes spill stores, 1248 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : 
Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 95 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_80' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 376 bytes cmem[0] ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 616 bytes stack frame, 
472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 480 bytes stack frame, 676 bytes spill stores, 1248 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int8_tv 416 bytes stack frame, 528 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_int8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 
-Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_u32.o ptxas info : 0 bytes gmem ptxas info : Compiling entry function '_Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z55ncclKernel_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_NVLS_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z59ncclKernel_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z60ncclKernel_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_RING_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 90 registers ptxas info : Compiling entry function '_Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork' for 'sm_90' ptxas info : Function properties for _Z50ncclKernel_ReduceScatter_TREE_LL_Sum___nv_bfloat16P11ncclDevCommmP8ncclWork 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Sum___nv_bfloat16v 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Sum___nv_bfloat16v 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Sum___nv_bfloat16v 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Sum___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Sum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint8_tv 416 bytes stack frame, 528 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_uint8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_i64.o Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 552 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 
272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 560 bytes stack frame, 424 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 
552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 552 bytes stack frame, 420 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 304 bytes stack frame, 328 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 
168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 304 bytes stack frame, 328 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f16.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_f16.o
ptxas info : 0 bytes gmem
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads
ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill
stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 416 bytes stack frame, 492 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 592 bytes stack frame, 440 bytes spill stores, 540 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 416 bytes stack frame, 492 bytes spill stores, 808 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 592 bytes stack frame, 440 bytes spill stores, 540 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function 
properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_int64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Prod_int64_tv 320 bytes stack frame, 356 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Prod_int64_tv 544 bytes stack frame, 400 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Prod_int64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Prod_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_uint64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_RING_SIMPLE_Prod_uint64_tv 320 bytes stack frame, 356 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL128_Prod_uint64_tv 544 bytes stack frame, 400 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL_Prod_uint64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL128_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL_Prod_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_f32.o Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_f64.o Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=1 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 
bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem 
ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 
272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 544 bytes stack frame, 404 
bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 296 bytes stack frame, 324 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 624 bytes stack frame, 508 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : 
Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 296 bytes stack frame, 320 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 560 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_halfv 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Prod_halfv 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Prod_halfv 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Prod_halfv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Prod_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 360 bytes stack frame, 392 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 600 bytes stack frame, 456 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_min_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 272 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function 
properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 608 bytes stack frame, 468 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_doublev 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Prod_doublev 272 bytes stack frame, 284 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Prod_doublev 552 bytes stack frame, 408 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Prod_doublev 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Prod_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod_floatv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Prod_floatv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Prod_floatv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Prod_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Prod_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_i32.o mkdir -p /<>/build/obj/collectives/device ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 
Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_Prod___nv_bfloat16v 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_Prod___nv_bfloat16v 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_Prod___nv_bfloat16v 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_Prod___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_Prod___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u32.o mkdir -p 
/<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 128 bytes stack 
frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas 
info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 400 bytes stack frame, 444 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 624 bytes stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 400 bytes stack frame, 444 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 624 bytes stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 488 bytes stack frame, 684 
bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 608 bytes 
stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 480 bytes stack frame, 676 bytes spill stores, 1248 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 
176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_int8_tv 424 bytes stack frame, 532 bytes spill stores, 844 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_int8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler 
-Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_ReduceScatter_NVLS_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint8_tv 424 bytes stack frame, 532 bytes spill stores, 852 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_uint8_tv 576 bytes stack frame, 436 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_uint8_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 
544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 232 bytes 
stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 
272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 544 bytes stack frame, 404 bytes spill 
stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 128 bytes stack frame, 124 bytes spill stores, 124 
bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 408 bytes stack frame, 448 bytes spill stores, 776 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 608 bytes stack frame, 468 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 232 bytes 
stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 408 bytes stack frame, 448 bytes spill stores, 776 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 608 bytes stack frame, 468 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 312 bytes stack frame, 344 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 616 bytes stack frame, 484 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_int64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Min_int64_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Min_int64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Min_int64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Min_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_min_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_uint64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Min_uint64_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Min_uint64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Min_uint64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Min_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 616 bytes stack frame, 472 bytes spill stores, 552 
bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_min_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=2 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_min_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_min_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas 
info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_halfv 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Min_halfv 288 bytes stack 
frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Min_halfv 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Min_halfv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Min_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_floatv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Min_floatv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Min_floatv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Min_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Min_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 
128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 304 bytes stack frame, 332 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function 
properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 408 bytes stack frame, 456 bytes spill stores, 796 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 592 bytes stack frame, 460 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 280 bytes stack frame, 308 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 552 bytes stack frame, 392 bytes spill stores, 436 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 
bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 272 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 608 bytes stack frame, 468 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Min_doublev 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Min_doublev 328 bytes stack frame, 364 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Min_doublev 544 bytes stack frame, 392 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Min_doublev 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Min_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_i32.o mkdir -p 
/<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 400 bytes stack frame, 444 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 624 bytes stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 400 bytes stack frame, 444 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 624 bytes stack frame, 500 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 176 bytes stack 
frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties 
for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 488 bytes stack frame, 684 bytes spill stores, 1256 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 480 bytes stack frame, 676 bytes spill stores, 1248 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 616 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Min___nv_bfloat16v 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Min___nv_bfloat16v 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Min___nv_bfloat16v 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Min___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Min___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z45ncclFunction_ReduceScatter_NVLS_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_int8_tv 424 bytes stack frame, 532 bytes spill stores, 844 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_int8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_int8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_int8_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint8_tv 424 bytes stack frame, 532 bytes spill stores, 852 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_uint8_tv 576 bytes stack frame, 436 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_uint8_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads 
ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 248 bytes stack frame, 272 bytes spill stores, 260 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 544 bytes stack frame, 404 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 
-Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 408 bytes stack frame, 448 bytes spill stores, 776 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 608 bytes stack frame, 468 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes 
gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 264 bytes 
stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 408 bytes stack frame, 448 bytes spill stores, 776 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 608 bytes stack frame, 468 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint32_tv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint32_tv 
568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_int64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_SIMPLE_Max_int64_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL128_Max_int64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL_Max_int64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_SIMPLE_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL128_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL_Max_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_uint64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL_Max_uint64_tv 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_SIMPLE_Max_uint64_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL128_Max_uint64_tv 568 bytes stack frame, 440 bytes spill stores, 516 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL_Max_uint64_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_SIMPLE_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL128_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL_Max_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling 
reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_max_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=3 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_max_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_max_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 560 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for 
_Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 264 bytes stack frame, 296 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill 
loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 312 bytes stack frame, 344 bytes spill stores, 380 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 616 bytes stack frame, 484 bytes spill stores, 556 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 544 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 248 bytes stack frame, 276 bytes spill stores, 264 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 552 bytes stack frame, 412 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 120 bytes stack frame, 116 bytes spill stores, 116 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties 
for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 312 bytes stack frame, 336 bytes spill stores, 372 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 600 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 304 bytes stack frame, 332 bytes spill stores, 352 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 600 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function 
properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 168 bytes stack frame, 164 bytes spill stores, 164 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_halfv 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_NVLS_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_NVLS_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_halfv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_SIMPLE_Max_halfv 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads
ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_RING_LL128_Max_halfv 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads
ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_RING_LL_Max_halfv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads
ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_SIMPLE_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z46ncclFunction_ReduceScatter_TREE_LL128_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z43ncclFunction_ReduceScatter_TREE_LL_Max_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.o
mkdir -p /<>/build/obj/collectives/device
/usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.o
ptxas info : 0 bytes gmem
ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads
ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for
_Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 200 bytes stack frame, 196 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 568 bytes stack frame, 436 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 128 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for 
_Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 408 bytes stack frame, 452 bytes spill stores, 792 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 608 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 408 bytes stack frame, 456 bytes spill stores, 796 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 592 bytes stack frame, 460 bytes spill stores, 524 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_floatv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_NVLS_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_NVLS_LL_Max_floatv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_SIMPLE_Max_floatv 320 bytes stack frame, 344 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_RING_LL128_Max_floatv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_RING_LL_Max_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_SIMPLE_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z47ncclFunction_ReduceScatter_TREE_LL128_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_ReduceScatter_TREE_LL_Max_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 272 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 608 bytes stack frame, 468 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_SIMPLE_Max_doublev 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_NVLS_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_NVLS_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z57ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_SIMPLE_Max_doublev 328 bytes stack frame, 364 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_RING_LL128_Max_doublev 544 bytes stack frame, 392 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_RING_LL_Max_doublev 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_SIMPLE_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z48ncclFunction_ReduceScatter_TREE_LL128_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_ReduceScatter_TREE_LL_Max_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 384 bytes stack frame, 448 bytes spill stores, 660 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function 
properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 568 bytes stack frame, 420 bytes spill stores, 448 bytes spill 
loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_Max___nv_bfloat16v 288 bytes stack frame, 300 bytes spill stores, 320 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_Max___nv_bfloat16v 288 bytes stack frame, 304 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_Max___nv_bfloat16v 576 bytes stack frame, 436 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_Max___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_Max___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 288 bytes stack frame, 300 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 600 bytes stack frame, 456 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes 
spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 568 bytes stack frame, 420 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 144 bytes stack frame, 
144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 264 bytes stack frame, 280 bytes spill stores, 268 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 304 bytes stack frame, 340 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 536 bytes stack frame, 368 bytes spill stores, 416 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : 
Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 224 bytes stack frame, 216 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 320 bytes stack frame, 348 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 608 bytes stack frame, 464 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 424 bytes stack frame, 480 bytes spill stores, 652 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 480 bytes stack frame, 708 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 608 bytes stack frame, 464 bytes spill stores, 512 bytes spill loads 
ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 288 bytes stack frame, 300 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 344 bytes stack frame, 388 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 600 bytes stack frame, 456 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 280 bytes stack frame, 308 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 568 bytes stack frame, 428 bytes 
spill stores, 452 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 320 bytes stack frame, 328 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 416 bytes stack frame, 464 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int8_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int8_tv 352 bytes stack frame, 412 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int8_tv 392 bytes stack frame, 536 bytes spill stores, 860 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int8_tv 560 bytes stack 
frame, 408 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_int8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 424 bytes stack frame, 480 bytes spill stores, 652 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 480 bytes stack frame, 708 bytes spill stores, 1280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 608 bytes stack frame, 464 bytes spill stores, 512 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 224 bytes stack frame, 216 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 320 bytes stack frame, 348 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 608 bytes stack frame, 464 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int32_tv 256 bytes stack frame, 264 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int32_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int32_tv 560 bytes stack frame, 416 bytes spill stores, 460 bytes spill loads 
ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 320 bytes stack frame, 328 bytes spill stores, 332 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 416 bytes stack frame, 464 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint8_tv 352 bytes stack frame, 412 bytes spill stores, 552 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint8_tv 392 bytes stack frame, 536 bytes spill stores, 860 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint8_tv 560 bytes stack frame, 408 bytes spill stores, 456 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=6 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint32_tv 256 bytes stack frame, 264 bytes spill stores, 256 bytes spill loads ptxas info : 
Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint32_tv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint32_tv 560 bytes stack frame, 416 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint32_tv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 208 bytes stack frame, 204 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 264 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info 
: Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 272 bytes stack frame, 304 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 224 bytes stack frame, 224 bytes spill stores, 212 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 272 bytes stack frame, 304 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 240 bytes stack frame, 240 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 292 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 552 bytes stack frame, 376 bytes spill stores, 448 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 272 bytes stack frame, 304 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 264 bytes stack frame, 268 bytes spill stores, 248 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 312 bytes stack frame, 336 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 600 bytes stack frame, 444 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 264 bytes stack frame, 268 bytes spill stores, 248 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 312 bytes stack frame, 336 bytes spill stores, 360 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 600 bytes stack frame, 444 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 208 bytes stack frame, 204 bytes spill stores, 196 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 264 bytes spill stores, 256 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 208 bytes stack frame, 208 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 272 bytes stack frame, 304 bytes spill stores, 316 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 568 bytes stack frame, 428 bytes spill stores, 452 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 320 bytes stack frame, 328 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 432 bytes stack frame, 484 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 608 bytes stack frame, 452 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 320 bytes stack frame, 328 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 432 bytes stack frame, 484 bytes spill stores, 924 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 608 bytes stack frame, 452 bytes spill stores, 560 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 296 bytes stack frame, 324 bytes spill stores, 328 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 616 bytes stack frame, 476 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 224 bytes stack frame, 216 bytes spill stores, 208 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 320 bytes stack frame, 348 bytes spill stores, 384 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 608 bytes stack frame, 464 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_int64_tv 248 bytes stack frame, 252 bytes spill stores, 240 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_int64_tv 344 bytes stack frame, 380 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_PreMulSum_int64_tv 544 bytes stack frame, 412 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_PreMulSum_int64_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_TREE_LL_PreMulSum_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 304 bytes stack frame, 316 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 416 bytes stack frame, 464 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 616 bytes stack frame, 472 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties 
for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_uint64_tv 248 bytes stack frame, 252 bytes spill stores, 240 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_uint64_tv 344 bytes stack frame, 380 bytes spill stores, 636 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_PreMulSum_uint64_tv 544 bytes stack frame, 412 bytes spill stores, 484 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_PreMulSum_uint64_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_PreMulSum_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 304 bytes stack frame, 312 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 408 bytes stack frame, 464 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 624 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > 
/<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=4 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 232 bytes stack frame, 232 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 256 bytes stack frame, 288 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 544 bytes stack frame, 376 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_floatv 248 bytes stack frame, 256 bytes spill stores, 244 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_floatv 328 bytes stack frame, 348 bytes spill stores, 476 bytes spill loads ptxas info : Function properties for 
_Z53ncclFunction_ReduceScatter_RING_LL128_PreMulSum_floatv 560 bytes stack frame, 416 bytes spill stores, 460 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_RING_LL_PreMulSum_floatv 160 bytes stack frame, 160 bytes spill stores, 160 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_ReduceScatter_TREE_LL_PreMulSum_floatv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=0 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_halfv 248 bytes stack frame, 252 bytes spill stores, 240 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_halfv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_halfv 304 bytes stack frame, 324 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL128_PreMulSum_halfv 560 bytes stack frame, 424 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_RING_LL_PreMulSum_halfv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z49ncclFunction_ReduceScatter_TREE_LL_PreMulSum_halfv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=1 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 232 bytes stack frame, 232 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 256 bytes stack frame, 288 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 544 bytes stack frame, 376 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info 
: Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 568 bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 232 bytes stack frame, 232 bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 256 bytes stack frame, 288 bytes spill stores, 284 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 544 bytes stack frame, 376 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function 
properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 568 bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 248 bytes stack frame, 240 bytes spill stores, 232 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 304 bytes stack frame, 328 bytes spill stores, 336 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 600 bytes stack frame, 436 bytes spill stores, 552 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads 
ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 568 bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 568 bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 144 bytes stack frame, 144 bytes 
spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 312 bytes stack frame, 320 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 384 bytes stack frame, 448 bytes spill stores, 616 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 616 bytes stack frame, 468 bytes 
spill stores, 568 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 568 bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 280 bytes stack frame, 296 bytes spill stores, 280 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 320 bytes stack frame, 392 bytes spill stores, 496 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 568 
bytes stack frame, 416 bytes spill stores, 444 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum_doublev 248 bytes stack frame, 252 bytes spill stores, 240 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_NVLS_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum_doublev 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum_doublev 288 bytes stack frame, 300 bytes spill stores, 348 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL128_PreMulSum_doublev 544 bytes stack frame, 404 bytes spill stores, 472 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_RING_LL_PreMulSum_doublev 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL128_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z51ncclFunction_ReduceScatter_TREE_LL_PreMulSum_doublev 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=2 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 
-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 368 bytes stack frame, 432 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 616 bytes stack frame, 476 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 296 bytes stack frame, 308 bytes spill stores, 296 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 368 bytes stack frame, 432 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 624 bytes stack frame, 484 bytes spill stores, 556 bytes spill 
loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 192 bytes stack frame, 192 bytes spill stores, 192 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 208 bytes stack frame, 200 bytes spill stores, 200 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 560 bytes stack frame, 404 bytes spill stores, 440 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties 
for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : 
Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 512 bytes stack frame, 708 bytes spill stores, 1272 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 624 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 184 bytes stack frame, 184 bytes spill stores, 184 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 432 bytes stack frame, 492 bytes spill stores, 708 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 504 bytes stack frame, 708 bytes spill 
stores, 1272 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 624 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 248 bytes stack frame, 240 bytes spill stores, 232 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 272 bytes stack frame, 296 bytes spill stores, 304 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 608 bytes stack frame, 464 bytes spill stores, 548 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z60ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int8_tv 448 bytes stack frame, 572 bytes spill stores, 920 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z52ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z55ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z52ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=3 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint8_tv 424 bytes stack frame, 512 bytes spill stores, 768 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint8_tv 448 bytes stack frame, 572 bytes spill stores, 920 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint8_tv 568 bytes stack frame, 428 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint8_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem 
ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 552 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=4 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 304 bytes stack frame, 312 bytes spill stores, 300 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 408 bytes stack frame, 464 bytes spill stores, 780 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 624 bytes stack frame, 480 bytes spill stores, 568 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 240 bytes stack frame, 232 
bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 320 bytes stack frame, 344 bytes spill stores, 376 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 608 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 552 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 280 bytes stack frame, 312 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 136 bytes stack frame, 132 bytes spill stores, 132 
bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_NVLS_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_SIMPLE_PreMulSum___nv_bfloat16v 248 bytes stack frame, 252 bytes spill stores, 240 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z70ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z72ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z71ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_RING_SIMPLE_PreMulSum___nv_bfloat16v 304 bytes stack frame, 324 bytes spill stores, 408 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_RING_LL128_PreMulSum___nv_bfloat16v 560 bytes stack frame, 424 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_LL_PreMulSum___nv_bfloat16v 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_TREE_SIMPLE_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_TREE_LL128_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_LL_PreMulSum___nv_bfloat16v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=5 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 
-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 424 bytes stack frame, 492 bytes spill stores, 832 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 616 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 552 bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas 
info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 280 bytes stack frame, 312 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int32_tv 328 bytes stack frame, 356 bytes spill stores, 488 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int32_tv 568 bytes 
stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int32_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 264 bytes stack frame, 296 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=6 -ccbin 
cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 280 bytes stack frame, 312 bytes spill stores, 340 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 136 bytes stack frame, 132 bytes spill stores, 132 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 272 bytes stack frame, 300 bytes spill stores, 312 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 552 
bytes stack frame, 404 bytes spill stores, 420 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 264 bytes stack frame, 296 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for 
_Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 360 bytes stack frame, 400 bytes spill stores, 480 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 608 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 312 bytes stack frame, 336 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 608 bytes stack frame, 464 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=7 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 232 bytes stack frame, 236 bytes spill stores, 228 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 264 bytes stack frame, 296 bytes spill stores, 288 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 560 bytes stack frame, 420 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 128 bytes stack frame, 128 bytes spill stores, 128 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 408 bytes stack frame, 460 bytes spill stores, 840 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 616 bytes stack frame, 484 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 416 bytes stack frame, 484 bytes spill stores, 824 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 616 bytes stack frame, 476 bytes spill stores, 528 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 240 bytes stack frame, 232 bytes spill stores, 224 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 352 bytes stack frame, 388 bytes spill stores, 468 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 608 bytes stack frame, 464 bytes spill stores, 564 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=8 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o ptxas info : 0 bytes gmem ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z61ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_int64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z65ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z66ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_int64_tv 352 bytes stack frame, 376 bytes spill stores, 580 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_int64_tv 568 bytes stack frame, 440 bytes spill stores, 540 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_RING_LL_SumPostDiv_int64_tv 176 bytes stack frame, 172 bytes spill stores, 172 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z56ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z53ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_int64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint32_tv 272 bytes stack frame, 280 bytes spill stores, 292 bytes spill loads ptxas info 
: Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint32_tv 328 bytes stack frame, 368 bytes spill stores, 500 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint32_tv 568 bytes stack frame, 432 bytes spill stores, 508 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint32_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint32_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads Compiling reduce_scatter.cu > /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o mkdir -p /<>/build/obj/collectives/device /usr/bin/nvcc -DNCCL_OP=5 -DNCCL_TYPE=9 -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.cu -o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o Compiling functions.cu > /<>/build/obj/collectives/device/functions.o mkdir -p `dirname /<>/build/obj/collectives/device/functions.o` /usr/bin/nvcc -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. 
-I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc functions.cu -o /<>/build/obj/collectives/device/functions.o ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 352 bytes stack frame, 360 bytes spill stores, 368 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function 
properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 400 bytes stack frame, 436 bytes spill stores, 816 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 616 bytes stack frame, 484 bytes spill stores, 576 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 176 bytes stack frame, 176 bytes spill stores, 176 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 0 bytes gmem Compiling onerank_reduce.cu > 
/<>/build/obj/collectives/device/onerank_reduce.o mkdir -p `dirname /<>/build/obj/collectives/device/onerank_reduce.o` /usr/bin/nvcc -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dc onerank_reduce.cu -o /<>/build/obj/collectives/device/onerank_reduce.o ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_NVLS_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z62ncclFunction_ReduceScatter_NVLS_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z59ncclFunction_ReduceScatter_NVLS_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_NVLS_SIMPLE_SumPostDiv_uint64_tv 280 bytes stack frame, 288 bytes spill stores, 308 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_NVLS_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for 
_Z54ncclFunction_ReduceScatter_NVLS_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_CHAIN_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z66ncclFunction_ReduceScatter_COLLNET_CHAIN_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z63ncclFunction_ReduceScatter_COLLNET_CHAIN_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z68ncclFunction_ReduceScatter_COLLNET_DIRECT_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z67ncclFunction_ReduceScatter_COLLNET_DIRECT_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z64ncclFunction_ReduceScatter_COLLNET_DIRECT_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_RING_SIMPLE_SumPostDiv_uint64_tv 312 bytes stack frame, 328 bytes spill stores, 404 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_RING_LL128_SumPostDiv_uint64_tv 568 bytes stack frame, 440 bytes spill stores, 540 bytes spill loads ptxas info : Function properties for _Z54ncclFunction_ReduceScatter_RING_LL_SumPostDiv_uint64_tv 184 bytes stack frame, 180 bytes spill stores, 180 bytes spill loads ptxas info : Function properties for _Z58ncclFunction_ReduceScatter_TREE_SIMPLE_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z57ncclFunction_ReduceScatter_TREE_LL128_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 
Function properties for _Z54ncclFunction_ReduceScatter_TREE_LL_SumPostDiv_uint64_tv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 43288 bytes gmem ptxas info : Function properties for _Z25ncclWorkaroundClangD55580v 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_OneRankReduce_PreMulSum_halfv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 64 bytes stack 
frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_OneRankReduce_PreMulSum_halfv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for 
_Z41ncclFunction_OneRankReduce_PreMulSum_halfv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 64 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 136 bytes stack frame, 136 bytes spill stores, 136 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 80 bytes stack frame, 80 bytes spill stores, 80 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_OneRankReduce_PreMulSum_halfv 72 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : 
Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 72 bytes stack frame, 72 bytes spill stores, 72 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 144 bytes stack frame, 140 bytes spill stores, 140 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_OneRankReduce_PreMulSum_halfv 144 bytes stack frame, 144 bytes spill stores, 144 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 152 bytes stack frame, 148 bytes spill stores, 148 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 256 bytes stack frame, 256 
bytes spill stores, 220 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 256 bytes stack frame, 256 bytes spill stores, 220 bytes spill loads ptxas info : 0 bytes gmem ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_doublev 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z42ncclFunction_OneRankReduce_PreMulSum_floatv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z50ncclFunction_OneRankReduce_PreMulSum___nv_bfloat16v 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z41ncclFunction_OneRankReduce_PreMulSum_halfv 152 bytes stack frame, 152 bytes spill stores, 152 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int64_tv 160 bytes stack frame, 156 bytes spill stores, 156 bytes spill loads ptxas info : Function properties for _Z45ncclFunction_OneRankReduce_PreMulSum_uint32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_int32_tv 168 bytes stack frame, 168 bytes spill stores, 168 bytes spill loads ptxas info : Function properties for _Z44ncclFunction_OneRankReduce_PreMulSum_uint8_tv 232 bytes stack frame, 224 bytes spill stores, 204 bytes spill loads ptxas info : Function properties for _Z43ncclFunction_OneRankReduce_PreMulSum_int8_tv 232 bytes stack frame, 224 bytes spill stores, 204 bytes spill loads /usr/bin/nvcc -ccbin cuda-g++ -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -std=c++11 --expt-extended-lambda -Xptxas -maxrregcount=96 -Xfatbin -compress-all -O3 -Xptxas -v -Xcompiler -Wall,-Wextra,-Wno-unused-parameter -I. -I.. -I/<>/build/include -I../../include --compiler-options "-fPIC -fvisibility=hidden" -dlink /<>/build/obj/collectives/device/sendrecv_sum_i8.o /<>/build/obj/collectives/device/sendrecv_sum_u8.o /<>/build/obj/collectives/device/sendrecv_sum_i32.o /<>/build/obj/collectives/device/sendrecv_sum_u32.o /<>/build/obj/collectives/device/sendrecv_sum_i64.o /<>/build/obj/collectives/device/sendrecv_sum_u64.o /<>/build/obj/collectives/device/sendrecv_sum_f16.o /<>/build/obj/collectives/device/sendrecv_sum_f32.o /<>/build/obj/collectives/device/sendrecv_sum_f64.o /<>/build/obj/collectives/device/sendrecv_sum_bf16.o /<>/build/obj/collectives/device/sendrecv_prod_i8.o /<>/build/obj/collectives/device/sendrecv_prod_u8.o /<>/build/obj/collectives/device/sendrecv_prod_i32.o /<>/build/obj/collectives/device/sendrecv_prod_u32.o /<>/build/obj/collectives/device/sendrecv_prod_i64.o /<>/build/obj/collectives/device/sendrecv_prod_u64.o /<>/build/obj/collectives/device/sendrecv_prod_f16.o /<>/build/obj/collectives/device/sendrecv_prod_f32.o /<>/build/obj/collectives/device/sendrecv_prod_f64.o /<>/build/obj/collectives/device/sendrecv_prod_bf16.o /<>/build/obj/collectives/device/sendrecv_min_i8.o /<>/build/obj/collectives/device/sendrecv_min_u8.o /<>/build/obj/collectives/device/sendrecv_min_i32.o /<>/build/obj/collectives/device/sendrecv_min_u32.o /<>/build/obj/collectives/device/sendrecv_min_i64.o /<>/build/obj/collectives/device/sendrecv_min_u64.o /<>/build/obj/collectives/device/sendrecv_min_f16.o /<>/build/obj/collectives/device/sendrecv_min_f32.o /<>/build/obj/collectives/device/sendrecv_min_f64.o /<>/build/obj/collectives/device/sendrecv_min_bf16.o /<>/build/obj/collectives/device/sendrecv_max_i8.o /<>/build/obj/collectives/device/sendrecv_max_u8.o 
/<>/build/obj/collectives/device/sendrecv_max_i32.o /<>/build/obj/collectives/device/sendrecv_max_u32.o /<>/build/obj/collectives/device/sendrecv_max_i64.o /<>/build/obj/collectives/device/sendrecv_max_u64.o /<>/build/obj/collectives/device/sendrecv_max_f16.o /<>/build/obj/collectives/device/sendrecv_max_f32.o /<>/build/obj/collectives/device/sendrecv_max_f64.o /<>/build/obj/collectives/device/sendrecv_max_bf16.o /<>/build/obj/collectives/device/sendrecv_premulsum_i8.o /<>/build/obj/collectives/device/sendrecv_premulsum_u8.o /<>/build/obj/collectives/device/sendrecv_premulsum_i32.o /<>/build/obj/collectives/device/sendrecv_premulsum_u32.o /<>/build/obj/collectives/device/sendrecv_premulsum_i64.o /<>/build/obj/collectives/device/sendrecv_premulsum_u64.o /<>/build/obj/collectives/device/sendrecv_premulsum_f16.o /<>/build/obj/collectives/device/sendrecv_premulsum_f32.o /<>/build/obj/collectives/device/sendrecv_premulsum_f64.o /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_reduce_sum_i8.o /<>/build/obj/collectives/device/all_reduce_sum_u8.o /<>/build/obj/collectives/device/all_reduce_sum_i32.o /<>/build/obj/collectives/device/all_reduce_sum_u32.o /<>/build/obj/collectives/device/all_reduce_sum_i64.o /<>/build/obj/collectives/device/all_reduce_sum_u64.o /<>/build/obj/collectives/device/all_reduce_sum_f16.o 
/<>/build/obj/collectives/device/all_reduce_sum_f32.o /<>/build/obj/collectives/device/all_reduce_sum_f64.o /<>/build/obj/collectives/device/all_reduce_sum_bf16.o /<>/build/obj/collectives/device/all_reduce_prod_i8.o /<>/build/obj/collectives/device/all_reduce_prod_u8.o /<>/build/obj/collectives/device/all_reduce_prod_i32.o /<>/build/obj/collectives/device/all_reduce_prod_u32.o /<>/build/obj/collectives/device/all_reduce_prod_i64.o /<>/build/obj/collectives/device/all_reduce_prod_u64.o /<>/build/obj/collectives/device/all_reduce_prod_f16.o /<>/build/obj/collectives/device/all_reduce_prod_f32.o /<>/build/obj/collectives/device/all_reduce_prod_f64.o /<>/build/obj/collectives/device/all_reduce_prod_bf16.o /<>/build/obj/collectives/device/all_reduce_min_i8.o /<>/build/obj/collectives/device/all_reduce_min_u8.o /<>/build/obj/collectives/device/all_reduce_min_i32.o /<>/build/obj/collectives/device/all_reduce_min_u32.o /<>/build/obj/collectives/device/all_reduce_min_i64.o /<>/build/obj/collectives/device/all_reduce_min_u64.o /<>/build/obj/collectives/device/all_reduce_min_f16.o /<>/build/obj/collectives/device/all_reduce_min_f32.o /<>/build/obj/collectives/device/all_reduce_min_f64.o /<>/build/obj/collectives/device/all_reduce_min_bf16.o /<>/build/obj/collectives/device/all_reduce_max_i8.o /<>/build/obj/collectives/device/all_reduce_max_u8.o /<>/build/obj/collectives/device/all_reduce_max_i32.o /<>/build/obj/collectives/device/all_reduce_max_u32.o /<>/build/obj/collectives/device/all_reduce_max_i64.o /<>/build/obj/collectives/device/all_reduce_max_u64.o /<>/build/obj/collectives/device/all_reduce_max_f16.o /<>/build/obj/collectives/device/all_reduce_max_f32.o /<>/build/obj/collectives/device/all_reduce_max_f64.o /<>/build/obj/collectives/device/all_reduce_max_bf16.o /<>/build/obj/collectives/device/all_reduce_premulsum_i8.o /<>/build/obj/collectives/device/all_reduce_premulsum_u8.o /<>/build/obj/collectives/device/all_reduce_premulsum_i32.o 
/<>/build/obj/collectives/device/all_reduce_premulsum_u32.o /<>/build/obj/collectives/device/all_reduce_premulsum_i64.o /<>/build/obj/collectives/device/all_reduce_premulsum_u64.o /<>/build/obj/collectives/device/all_reduce_premulsum_f16.o /<>/build/obj/collectives/device/all_reduce_premulsum_f32.o /<>/build/obj/collectives/device/all_reduce_premulsum_f64.o /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_gather_sum_i8.o /<>/build/obj/collectives/device/all_gather_sum_u8.o /<>/build/obj/collectives/device/all_gather_sum_i32.o /<>/build/obj/collectives/device/all_gather_sum_u32.o /<>/build/obj/collectives/device/all_gather_sum_i64.o /<>/build/obj/collectives/device/all_gather_sum_u64.o /<>/build/obj/collectives/device/all_gather_sum_f16.o /<>/build/obj/collectives/device/all_gather_sum_f32.o /<>/build/obj/collectives/device/all_gather_sum_f64.o /<>/build/obj/collectives/device/all_gather_sum_bf16.o /<>/build/obj/collectives/device/all_gather_prod_i8.o /<>/build/obj/collectives/device/all_gather_prod_u8.o /<>/build/obj/collectives/device/all_gather_prod_i32.o /<>/build/obj/collectives/device/all_gather_prod_u32.o /<>/build/obj/collectives/device/all_gather_prod_i64.o /<>/build/obj/collectives/device/all_gather_prod_u64.o /<>/build/obj/collectives/device/all_gather_prod_f16.o 
/<>/build/obj/collectives/device/all_gather_prod_f32.o /<>/build/obj/collectives/device/all_gather_prod_f64.o /<>/build/obj/collectives/device/all_gather_prod_bf16.o /<>/build/obj/collectives/device/all_gather_min_i8.o /<>/build/obj/collectives/device/all_gather_min_u8.o /<>/build/obj/collectives/device/all_gather_min_i32.o /<>/build/obj/collectives/device/all_gather_min_u32.o /<>/build/obj/collectives/device/all_gather_min_i64.o /<>/build/obj/collectives/device/all_gather_min_u64.o /<>/build/obj/collectives/device/all_gather_min_f16.o /<>/build/obj/collectives/device/all_gather_min_f32.o /<>/build/obj/collectives/device/all_gather_min_f64.o /<>/build/obj/collectives/device/all_gather_min_bf16.o /<>/build/obj/collectives/device/all_gather_max_i8.o /<>/build/obj/collectives/device/all_gather_max_u8.o /<>/build/obj/collectives/device/all_gather_max_i32.o /<>/build/obj/collectives/device/all_gather_max_u32.o /<>/build/obj/collectives/device/all_gather_max_i64.o /<>/build/obj/collectives/device/all_gather_max_u64.o /<>/build/obj/collectives/device/all_gather_max_f16.o /<>/build/obj/collectives/device/all_gather_max_f32.o /<>/build/obj/collectives/device/all_gather_max_f64.o /<>/build/obj/collectives/device/all_gather_max_bf16.o /<>/build/obj/collectives/device/all_gather_premulsum_i8.o /<>/build/obj/collectives/device/all_gather_premulsum_u8.o /<>/build/obj/collectives/device/all_gather_premulsum_i32.o /<>/build/obj/collectives/device/all_gather_premulsum_u32.o /<>/build/obj/collectives/device/all_gather_premulsum_i64.o /<>/build/obj/collectives/device/all_gather_premulsum_u64.o /<>/build/obj/collectives/device/all_gather_premulsum_f16.o /<>/build/obj/collectives/device/all_gather_premulsum_f32.o /<>/build/obj/collectives/device/all_gather_premulsum_f64.o /<>/build/obj/collectives/device/all_gather_premulsum_bf16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.o 
/<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.o /<>/build/obj/collectives/device/broadcast_sum_i8.o /<>/build/obj/collectives/device/broadcast_sum_u8.o /<>/build/obj/collectives/device/broadcast_sum_i32.o /<>/build/obj/collectives/device/broadcast_sum_u32.o /<>/build/obj/collectives/device/broadcast_sum_i64.o /<>/build/obj/collectives/device/broadcast_sum_u64.o /<>/build/obj/collectives/device/broadcast_sum_f16.o /<>/build/obj/collectives/device/broadcast_sum_f32.o /<>/build/obj/collectives/device/broadcast_sum_f64.o /<>/build/obj/collectives/device/broadcast_sum_bf16.o /<>/build/obj/collectives/device/broadcast_prod_i8.o /<>/build/obj/collectives/device/broadcast_prod_u8.o /<>/build/obj/collectives/device/broadcast_prod_i32.o /<>/build/obj/collectives/device/broadcast_prod_u32.o /<>/build/obj/collectives/device/broadcast_prod_i64.o /<>/build/obj/collectives/device/broadcast_prod_u64.o /<>/build/obj/collectives/device/broadcast_prod_f16.o /<>/build/obj/collectives/device/broadcast_prod_f32.o /<>/build/obj/collectives/device/broadcast_prod_f64.o /<>/build/obj/collectives/device/broadcast_prod_bf16.o /<>/build/obj/collectives/device/broadcast_min_i8.o /<>/build/obj/collectives/device/broadcast_min_u8.o /<>/build/obj/collectives/device/broadcast_min_i32.o /<>/build/obj/collectives/device/broadcast_min_u32.o /<>/build/obj/collectives/device/broadcast_min_i64.o /<>/build/obj/collectives/device/broadcast_min_u64.o /<>/build/obj/collectives/device/broadcast_min_f16.o /<>/build/obj/collectives/device/broadcast_min_f32.o 
/<>/build/obj/collectives/device/broadcast_min_f64.o /<>/build/obj/collectives/device/broadcast_min_bf16.o /<>/build/obj/collectives/device/broadcast_max_i8.o /<>/build/obj/collectives/device/broadcast_max_u8.o /<>/build/obj/collectives/device/broadcast_max_i32.o /<>/build/obj/collectives/device/broadcast_max_u32.o /<>/build/obj/collectives/device/broadcast_max_i64.o /<>/build/obj/collectives/device/broadcast_max_u64.o /<>/build/obj/collectives/device/broadcast_max_f16.o /<>/build/obj/collectives/device/broadcast_max_f32.o /<>/build/obj/collectives/device/broadcast_max_f64.o /<>/build/obj/collectives/device/broadcast_max_bf16.o /<>/build/obj/collectives/device/broadcast_premulsum_i8.o /<>/build/obj/collectives/device/broadcast_premulsum_u8.o /<>/build/obj/collectives/device/broadcast_premulsum_i32.o /<>/build/obj/collectives/device/broadcast_premulsum_u32.o /<>/build/obj/collectives/device/broadcast_premulsum_i64.o /<>/build/obj/collectives/device/broadcast_premulsum_u64.o /<>/build/obj/collectives/device/broadcast_premulsum_f16.o /<>/build/obj/collectives/device/broadcast_premulsum_f32.o /<>/build/obj/collectives/device/broadcast_premulsum_f64.o /<>/build/obj/collectives/device/broadcast_premulsum_bf16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_sum_i8.o /<>/build/obj/collectives/device/reduce_sum_u8.o /<>/build/obj/collectives/device/reduce_sum_i32.o 
/<>/build/obj/collectives/device/reduce_sum_u32.o /<>/build/obj/collectives/device/reduce_sum_i64.o /<>/build/obj/collectives/device/reduce_sum_u64.o /<>/build/obj/collectives/device/reduce_sum_f16.o /<>/build/obj/collectives/device/reduce_sum_f32.o /<>/build/obj/collectives/device/reduce_sum_f64.o /<>/build/obj/collectives/device/reduce_sum_bf16.o /<>/build/obj/collectives/device/reduce_prod_i8.o /<>/build/obj/collectives/device/reduce_prod_u8.o /<>/build/obj/collectives/device/reduce_prod_i32.o /<>/build/obj/collectives/device/reduce_prod_u32.o /<>/build/obj/collectives/device/reduce_prod_i64.o /<>/build/obj/collectives/device/reduce_prod_u64.o /<>/build/obj/collectives/device/reduce_prod_f16.o /<>/build/obj/collectives/device/reduce_prod_f32.o /<>/build/obj/collectives/device/reduce_prod_f64.o /<>/build/obj/collectives/device/reduce_prod_bf16.o /<>/build/obj/collectives/device/reduce_min_i8.o /<>/build/obj/collectives/device/reduce_min_u8.o /<>/build/obj/collectives/device/reduce_min_i32.o /<>/build/obj/collectives/device/reduce_min_u32.o /<>/build/obj/collectives/device/reduce_min_i64.o /<>/build/obj/collectives/device/reduce_min_u64.o /<>/build/obj/collectives/device/reduce_min_f16.o /<>/build/obj/collectives/device/reduce_min_f32.o /<>/build/obj/collectives/device/reduce_min_f64.o /<>/build/obj/collectives/device/reduce_min_bf16.o /<>/build/obj/collectives/device/reduce_max_i8.o /<>/build/obj/collectives/device/reduce_max_u8.o /<>/build/obj/collectives/device/reduce_max_i32.o /<>/build/obj/collectives/device/reduce_max_u32.o /<>/build/obj/collectives/device/reduce_max_i64.o /<>/build/obj/collectives/device/reduce_max_u64.o /<>/build/obj/collectives/device/reduce_max_f16.o /<>/build/obj/collectives/device/reduce_max_f32.o /<>/build/obj/collectives/device/reduce_max_f64.o /<>/build/obj/collectives/device/reduce_max_bf16.o /<>/build/obj/collectives/device/reduce_premulsum_i8.o /<>/build/obj/collectives/device/reduce_premulsum_u8.o 
/<>/build/obj/collectives/device/reduce_premulsum_i32.o /<>/build/obj/collectives/device/reduce_premulsum_u32.o /<>/build/obj/collectives/device/reduce_premulsum_i64.o /<>/build/obj/collectives/device/reduce_premulsum_u64.o /<>/build/obj/collectives/device/reduce_premulsum_f16.o /<>/build/obj/collectives/device/reduce_premulsum_f32.o /<>/build/obj/collectives/device/reduce_premulsum_f64.o /<>/build/obj/collectives/device/reduce_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sum_i8.o /<>/build/obj/collectives/device/reduce_scatter_sum_u8.o /<>/build/obj/collectives/device/reduce_scatter_sum_i32.o /<>/build/obj/collectives/device/reduce_scatter_sum_u32.o /<>/build/obj/collectives/device/reduce_scatter_sum_i64.o /<>/build/obj/collectives/device/reduce_scatter_sum_u64.o /<>/build/obj/collectives/device/reduce_scatter_sum_f16.o /<>/build/obj/collectives/device/reduce_scatter_sum_f32.o /<>/build/obj/collectives/device/reduce_scatter_sum_f64.o /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_prod_i8.o /<>/build/obj/collectives/device/reduce_scatter_prod_u8.o /<>/build/obj/collectives/device/reduce_scatter_prod_i32.o /<>/build/obj/collectives/device/reduce_scatter_prod_u32.o /<>/build/obj/collectives/device/reduce_scatter_prod_i64.o /<>/build/obj/collectives/device/reduce_scatter_prod_u64.o 
/<>/build/obj/collectives/device/reduce_scatter_prod_f16.o /<>/build/obj/collectives/device/reduce_scatter_prod_f32.o /<>/build/obj/collectives/device/reduce_scatter_prod_f64.o /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.o /<>/build/obj/collectives/device/reduce_scatter_min_i8.o /<>/build/obj/collectives/device/reduce_scatter_min_u8.o /<>/build/obj/collectives/device/reduce_scatter_min_i32.o /<>/build/obj/collectives/device/reduce_scatter_min_u32.o /<>/build/obj/collectives/device/reduce_scatter_min_i64.o /<>/build/obj/collectives/device/reduce_scatter_min_u64.o /<>/build/obj/collectives/device/reduce_scatter_min_f16.o /<>/build/obj/collectives/device/reduce_scatter_min_f32.o /<>/build/obj/collectives/device/reduce_scatter_min_f64.o /<>/build/obj/collectives/device/reduce_scatter_min_bf16.o /<>/build/obj/collectives/device/reduce_scatter_max_i8.o /<>/build/obj/collectives/device/reduce_scatter_max_u8.o /<>/build/obj/collectives/device/reduce_scatter_max_i32.o /<>/build/obj/collectives/device/reduce_scatter_max_u32.o /<>/build/obj/collectives/device/reduce_scatter_max_i64.o /<>/build/obj/collectives/device/reduce_scatter_max_u64.o /<>/build/obj/collectives/device/reduce_scatter_max_f16.o /<>/build/obj/collectives/device/reduce_scatter_max_f32.o /<>/build/obj/collectives/device/reduce_scatter_max_f64.o /<>/build/obj/collectives/device/reduce_scatter_max_bf16.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.o 
/<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o /<>/build/obj/collectives/device/functions.o /<>/build/obj/collectives/device/onerank_reduce.o -o /<>/build/obj/collectives/device/devlink.o
Archiving objects > /<>/build/obj/collectives/device/colldevice.a
ar cr /<>/build/obj/collectives/device/colldevice.a /<>/build/obj/collectives/device/sendrecv_sum_i8.o /<>/build/obj/collectives/device/sendrecv_sum_u8.o /<>/build/obj/collectives/device/sendrecv_sum_i32.o /<>/build/obj/collectives/device/sendrecv_sum_u32.o /<>/build/obj/collectives/device/sendrecv_sum_i64.o /<>/build/obj/collectives/device/sendrecv_sum_u64.o /<>/build/obj/collectives/device/sendrecv_sum_f16.o /<>/build/obj/collectives/device/sendrecv_sum_f32.o /<>/build/obj/collectives/device/sendrecv_sum_f64.o /<>/build/obj/collectives/device/sendrecv_sum_bf16.o /<>/build/obj/collectives/device/sendrecv_prod_i8.o /<>/build/obj/collectives/device/sendrecv_prod_u8.o /<>/build/obj/collectives/device/sendrecv_prod_i32.o /<>/build/obj/collectives/device/sendrecv_prod_u32.o /<>/build/obj/collectives/device/sendrecv_prod_i64.o /<>/build/obj/collectives/device/sendrecv_prod_u64.o /<>/build/obj/collectives/device/sendrecv_prod_f16.o /<>/build/obj/collectives/device/sendrecv_prod_f32.o /<>/build/obj/collectives/device/sendrecv_prod_f64.o 
/<>/build/obj/collectives/device/sendrecv_prod_bf16.o /<>/build/obj/collectives/device/sendrecv_min_i8.o /<>/build/obj/collectives/device/sendrecv_min_u8.o /<>/build/obj/collectives/device/sendrecv_min_i32.o /<>/build/obj/collectives/device/sendrecv_min_u32.o /<>/build/obj/collectives/device/sendrecv_min_i64.o /<>/build/obj/collectives/device/sendrecv_min_u64.o /<>/build/obj/collectives/device/sendrecv_min_f16.o /<>/build/obj/collectives/device/sendrecv_min_f32.o /<>/build/obj/collectives/device/sendrecv_min_f64.o /<>/build/obj/collectives/device/sendrecv_min_bf16.o /<>/build/obj/collectives/device/sendrecv_max_i8.o /<>/build/obj/collectives/device/sendrecv_max_u8.o /<>/build/obj/collectives/device/sendrecv_max_i32.o /<>/build/obj/collectives/device/sendrecv_max_u32.o /<>/build/obj/collectives/device/sendrecv_max_i64.o /<>/build/obj/collectives/device/sendrecv_max_u64.o /<>/build/obj/collectives/device/sendrecv_max_f16.o /<>/build/obj/collectives/device/sendrecv_max_f32.o /<>/build/obj/collectives/device/sendrecv_max_f64.o /<>/build/obj/collectives/device/sendrecv_max_bf16.o /<>/build/obj/collectives/device/sendrecv_premulsum_i8.o /<>/build/obj/collectives/device/sendrecv_premulsum_u8.o /<>/build/obj/collectives/device/sendrecv_premulsum_i32.o /<>/build/obj/collectives/device/sendrecv_premulsum_u32.o /<>/build/obj/collectives/device/sendrecv_premulsum_i64.o /<>/build/obj/collectives/device/sendrecv_premulsum_u64.o /<>/build/obj/collectives/device/sendrecv_premulsum_f16.o /<>/build/obj/collectives/device/sendrecv_premulsum_f32.o /<>/build/obj/collectives/device/sendrecv_premulsum_f64.o /<>/build/obj/collectives/device/sendrecv_premulsum_bf16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u8.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_u32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_i64.o 
/<>/build/obj/collectives/device/sendrecv_sumpostdiv_u64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f16.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f32.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_f64.o /<>/build/obj/collectives/device/sendrecv_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_reduce_sum_i8.o /<>/build/obj/collectives/device/all_reduce_sum_u8.o /<>/build/obj/collectives/device/all_reduce_sum_i32.o /<>/build/obj/collectives/device/all_reduce_sum_u32.o /<>/build/obj/collectives/device/all_reduce_sum_i64.o /<>/build/obj/collectives/device/all_reduce_sum_u64.o /<>/build/obj/collectives/device/all_reduce_sum_f16.o /<>/build/obj/collectives/device/all_reduce_sum_f32.o /<>/build/obj/collectives/device/all_reduce_sum_f64.o /<>/build/obj/collectives/device/all_reduce_sum_bf16.o /<>/build/obj/collectives/device/all_reduce_prod_i8.o /<>/build/obj/collectives/device/all_reduce_prod_u8.o /<>/build/obj/collectives/device/all_reduce_prod_i32.o /<>/build/obj/collectives/device/all_reduce_prod_u32.o /<>/build/obj/collectives/device/all_reduce_prod_i64.o /<>/build/obj/collectives/device/all_reduce_prod_u64.o /<>/build/obj/collectives/device/all_reduce_prod_f16.o /<>/build/obj/collectives/device/all_reduce_prod_f32.o /<>/build/obj/collectives/device/all_reduce_prod_f64.o /<>/build/obj/collectives/device/all_reduce_prod_bf16.o /<>/build/obj/collectives/device/all_reduce_min_i8.o /<>/build/obj/collectives/device/all_reduce_min_u8.o /<>/build/obj/collectives/device/all_reduce_min_i32.o /<>/build/obj/collectives/device/all_reduce_min_u32.o /<>/build/obj/collectives/device/all_reduce_min_i64.o /<>/build/obj/collectives/device/all_reduce_min_u64.o /<>/build/obj/collectives/device/all_reduce_min_f16.o /<>/build/obj/collectives/device/all_reduce_min_f32.o /<>/build/obj/collectives/device/all_reduce_min_f64.o /<>/build/obj/collectives/device/all_reduce_min_bf16.o /<>/build/obj/collectives/device/all_reduce_max_i8.o 
/<>/build/obj/collectives/device/all_reduce_max_u8.o /<>/build/obj/collectives/device/all_reduce_max_i32.o /<>/build/obj/collectives/device/all_reduce_max_u32.o /<>/build/obj/collectives/device/all_reduce_max_i64.o /<>/build/obj/collectives/device/all_reduce_max_u64.o /<>/build/obj/collectives/device/all_reduce_max_f16.o /<>/build/obj/collectives/device/all_reduce_max_f32.o /<>/build/obj/collectives/device/all_reduce_max_f64.o /<>/build/obj/collectives/device/all_reduce_max_bf16.o /<>/build/obj/collectives/device/all_reduce_premulsum_i8.o /<>/build/obj/collectives/device/all_reduce_premulsum_u8.o /<>/build/obj/collectives/device/all_reduce_premulsum_i32.o /<>/build/obj/collectives/device/all_reduce_premulsum_u32.o /<>/build/obj/collectives/device/all_reduce_premulsum_i64.o /<>/build/obj/collectives/device/all_reduce_premulsum_u64.o /<>/build/obj/collectives/device/all_reduce_premulsum_f16.o /<>/build/obj/collectives/device/all_reduce_premulsum_f32.o /<>/build/obj/collectives/device/all_reduce_premulsum_f64.o /<>/build/obj/collectives/device/all_reduce_premulsum_bf16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i8.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/all_gather_sum_i8.o /<>/build/obj/collectives/device/all_gather_sum_u8.o /<>/build/obj/collectives/device/all_gather_sum_i32.o /<>/build/obj/collectives/device/all_gather_sum_u32.o /<>/build/obj/collectives/device/all_gather_sum_i64.o 
/<>/build/obj/collectives/device/all_gather_sum_u64.o /<>/build/obj/collectives/device/all_gather_sum_f16.o /<>/build/obj/collectives/device/all_gather_sum_f32.o /<>/build/obj/collectives/device/all_gather_sum_f64.o /<>/build/obj/collectives/device/all_gather_sum_bf16.o /<>/build/obj/collectives/device/all_gather_prod_i8.o /<>/build/obj/collectives/device/all_gather_prod_u8.o /<>/build/obj/collectives/device/all_gather_prod_i32.o /<>/build/obj/collectives/device/all_gather_prod_u32.o /<>/build/obj/collectives/device/all_gather_prod_i64.o /<>/build/obj/collectives/device/all_gather_prod_u64.o /<>/build/obj/collectives/device/all_gather_prod_f16.o /<>/build/obj/collectives/device/all_gather_prod_f32.o /<>/build/obj/collectives/device/all_gather_prod_f64.o /<>/build/obj/collectives/device/all_gather_prod_bf16.o /<>/build/obj/collectives/device/all_gather_min_i8.o /<>/build/obj/collectives/device/all_gather_min_u8.o /<>/build/obj/collectives/device/all_gather_min_i32.o /<>/build/obj/collectives/device/all_gather_min_u32.o /<>/build/obj/collectives/device/all_gather_min_i64.o /<>/build/obj/collectives/device/all_gather_min_u64.o /<>/build/obj/collectives/device/all_gather_min_f16.o /<>/build/obj/collectives/device/all_gather_min_f32.o /<>/build/obj/collectives/device/all_gather_min_f64.o /<>/build/obj/collectives/device/all_gather_min_bf16.o /<>/build/obj/collectives/device/all_gather_max_i8.o /<>/build/obj/collectives/device/all_gather_max_u8.o /<>/build/obj/collectives/device/all_gather_max_i32.o /<>/build/obj/collectives/device/all_gather_max_u32.o /<>/build/obj/collectives/device/all_gather_max_i64.o /<>/build/obj/collectives/device/all_gather_max_u64.o /<>/build/obj/collectives/device/all_gather_max_f16.o /<>/build/obj/collectives/device/all_gather_max_f32.o /<>/build/obj/collectives/device/all_gather_max_f64.o /<>/build/obj/collectives/device/all_gather_max_bf16.o /<>/build/obj/collectives/device/all_gather_premulsum_i8.o 
/<>/build/obj/collectives/device/all_gather_premulsum_u8.o /<>/build/obj/collectives/device/all_gather_premulsum_i32.o /<>/build/obj/collectives/device/all_gather_premulsum_u32.o /<>/build/obj/collectives/device/all_gather_premulsum_i64.o /<>/build/obj/collectives/device/all_gather_premulsum_u64.o /<>/build/obj/collectives/device/all_gather_premulsum_f16.o /<>/build/obj/collectives/device/all_gather_premulsum_f32.o /<>/build/obj/collectives/device/all_gather_premulsum_f64.o /<>/build/obj/collectives/device/all_gather_premulsum_bf16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i8.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u8.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_i64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_u64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f16.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f32.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_f64.o /<>/build/obj/collectives/device/all_gather_sumpostdiv_bf16.o /<>/build/obj/collectives/device/broadcast_sum_i8.o /<>/build/obj/collectives/device/broadcast_sum_u8.o /<>/build/obj/collectives/device/broadcast_sum_i32.o /<>/build/obj/collectives/device/broadcast_sum_u32.o /<>/build/obj/collectives/device/broadcast_sum_i64.o /<>/build/obj/collectives/device/broadcast_sum_u64.o /<>/build/obj/collectives/device/broadcast_sum_f16.o /<>/build/obj/collectives/device/broadcast_sum_f32.o /<>/build/obj/collectives/device/broadcast_sum_f64.o /<>/build/obj/collectives/device/broadcast_sum_bf16.o /<>/build/obj/collectives/device/broadcast_prod_i8.o /<>/build/obj/collectives/device/broadcast_prod_u8.o /<>/build/obj/collectives/device/broadcast_prod_i32.o /<>/build/obj/collectives/device/broadcast_prod_u32.o /<>/build/obj/collectives/device/broadcast_prod_i64.o /<>/build/obj/collectives/device/broadcast_prod_u64.o 
/<>/build/obj/collectives/device/broadcast_prod_f16.o /<>/build/obj/collectives/device/broadcast_prod_f32.o /<>/build/obj/collectives/device/broadcast_prod_f64.o /<>/build/obj/collectives/device/broadcast_prod_bf16.o /<>/build/obj/collectives/device/broadcast_min_i8.o /<>/build/obj/collectives/device/broadcast_min_u8.o /<>/build/obj/collectives/device/broadcast_min_i32.o /<>/build/obj/collectives/device/broadcast_min_u32.o /<>/build/obj/collectives/device/broadcast_min_i64.o /<>/build/obj/collectives/device/broadcast_min_u64.o /<>/build/obj/collectives/device/broadcast_min_f16.o /<>/build/obj/collectives/device/broadcast_min_f32.o /<>/build/obj/collectives/device/broadcast_min_f64.o /<>/build/obj/collectives/device/broadcast_min_bf16.o /<>/build/obj/collectives/device/broadcast_max_i8.o /<>/build/obj/collectives/device/broadcast_max_u8.o /<>/build/obj/collectives/device/broadcast_max_i32.o /<>/build/obj/collectives/device/broadcast_max_u32.o /<>/build/obj/collectives/device/broadcast_max_i64.o /<>/build/obj/collectives/device/broadcast_max_u64.o /<>/build/obj/collectives/device/broadcast_max_f16.o /<>/build/obj/collectives/device/broadcast_max_f32.o /<>/build/obj/collectives/device/broadcast_max_f64.o /<>/build/obj/collectives/device/broadcast_max_bf16.o /<>/build/obj/collectives/device/broadcast_premulsum_i8.o /<>/build/obj/collectives/device/broadcast_premulsum_u8.o /<>/build/obj/collectives/device/broadcast_premulsum_i32.o /<>/build/obj/collectives/device/broadcast_premulsum_u32.o /<>/build/obj/collectives/device/broadcast_premulsum_i64.o /<>/build/obj/collectives/device/broadcast_premulsum_u64.o /<>/build/obj/collectives/device/broadcast_premulsum_f16.o /<>/build/obj/collectives/device/broadcast_premulsum_f32.o /<>/build/obj/collectives/device/broadcast_premulsum_f64.o /<>/build/obj/collectives/device/broadcast_premulsum_bf16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i8.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u8.o 
/<>/build/obj/collectives/device/broadcast_sumpostdiv_i32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_i64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_u64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f16.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f32.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_f64.o /<>/build/obj/collectives/device/broadcast_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_sum_i8.o /<>/build/obj/collectives/device/reduce_sum_u8.o /<>/build/obj/collectives/device/reduce_sum_i32.o /<>/build/obj/collectives/device/reduce_sum_u32.o /<>/build/obj/collectives/device/reduce_sum_i64.o /<>/build/obj/collectives/device/reduce_sum_u64.o /<>/build/obj/collectives/device/reduce_sum_f16.o /<>/build/obj/collectives/device/reduce_sum_f32.o /<>/build/obj/collectives/device/reduce_sum_f64.o /<>/build/obj/collectives/device/reduce_sum_bf16.o /<>/build/obj/collectives/device/reduce_prod_i8.o /<>/build/obj/collectives/device/reduce_prod_u8.o /<>/build/obj/collectives/device/reduce_prod_i32.o /<>/build/obj/collectives/device/reduce_prod_u32.o /<>/build/obj/collectives/device/reduce_prod_i64.o /<>/build/obj/collectives/device/reduce_prod_u64.o /<>/build/obj/collectives/device/reduce_prod_f16.o /<>/build/obj/collectives/device/reduce_prod_f32.o /<>/build/obj/collectives/device/reduce_prod_f64.o /<>/build/obj/collectives/device/reduce_prod_bf16.o /<>/build/obj/collectives/device/reduce_min_i8.o /<>/build/obj/collectives/device/reduce_min_u8.o /<>/build/obj/collectives/device/reduce_min_i32.o /<>/build/obj/collectives/device/reduce_min_u32.o /<>/build/obj/collectives/device/reduce_min_i64.o /<>/build/obj/collectives/device/reduce_min_u64.o /<>/build/obj/collectives/device/reduce_min_f16.o /<>/build/obj/collectives/device/reduce_min_f32.o /<>/build/obj/collectives/device/reduce_min_f64.o /<>/build/obj/collectives/device/reduce_min_bf16.o 
/<>/build/obj/collectives/device/reduce_max_i8.o /<>/build/obj/collectives/device/reduce_max_u8.o /<>/build/obj/collectives/device/reduce_max_i32.o /<>/build/obj/collectives/device/reduce_max_u32.o /<>/build/obj/collectives/device/reduce_max_i64.o /<>/build/obj/collectives/device/reduce_max_u64.o /<>/build/obj/collectives/device/reduce_max_f16.o /<>/build/obj/collectives/device/reduce_max_f32.o /<>/build/obj/collectives/device/reduce_max_f64.o /<>/build/obj/collectives/device/reduce_max_bf16.o /<>/build/obj/collectives/device/reduce_premulsum_i8.o /<>/build/obj/collectives/device/reduce_premulsum_u8.o /<>/build/obj/collectives/device/reduce_premulsum_i32.o /<>/build/obj/collectives/device/reduce_premulsum_u32.o /<>/build/obj/collectives/device/reduce_premulsum_i64.o /<>/build/obj/collectives/device/reduce_premulsum_u64.o /<>/build/obj/collectives/device/reduce_premulsum_f16.o /<>/build/obj/collectives/device/reduce_premulsum_f32.o /<>/build/obj/collectives/device/reduce_premulsum_f64.o /<>/build/obj/collectives/device/reduce_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i8.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f16.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_sumpostdiv_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sum_i8.o /<>/build/obj/collectives/device/reduce_scatter_sum_u8.o /<>/build/obj/collectives/device/reduce_scatter_sum_i32.o /<>/build/obj/collectives/device/reduce_scatter_sum_u32.o /<>/build/obj/collectives/device/reduce_scatter_sum_i64.o /<>/build/obj/collectives/device/reduce_scatter_sum_u64.o 
/<>/build/obj/collectives/device/reduce_scatter_sum_f16.o /<>/build/obj/collectives/device/reduce_scatter_sum_f32.o /<>/build/obj/collectives/device/reduce_scatter_sum_f64.o /<>/build/obj/collectives/device/reduce_scatter_sum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_prod_i8.o /<>/build/obj/collectives/device/reduce_scatter_prod_u8.o /<>/build/obj/collectives/device/reduce_scatter_prod_i32.o /<>/build/obj/collectives/device/reduce_scatter_prod_u32.o /<>/build/obj/collectives/device/reduce_scatter_prod_i64.o /<>/build/obj/collectives/device/reduce_scatter_prod_u64.o /<>/build/obj/collectives/device/reduce_scatter_prod_f16.o /<>/build/obj/collectives/device/reduce_scatter_prod_f32.o /<>/build/obj/collectives/device/reduce_scatter_prod_f64.o /<>/build/obj/collectives/device/reduce_scatter_prod_bf16.o /<>/build/obj/collectives/device/reduce_scatter_min_i8.o /<>/build/obj/collectives/device/reduce_scatter_min_u8.o /<>/build/obj/collectives/device/reduce_scatter_min_i32.o /<>/build/obj/collectives/device/reduce_scatter_min_u32.o /<>/build/obj/collectives/device/reduce_scatter_min_i64.o /<>/build/obj/collectives/device/reduce_scatter_min_u64.o /<>/build/obj/collectives/device/reduce_scatter_min_f16.o /<>/build/obj/collectives/device/reduce_scatter_min_f32.o /<>/build/obj/collectives/device/reduce_scatter_min_f64.o /<>/build/obj/collectives/device/reduce_scatter_min_bf16.o /<>/build/obj/collectives/device/reduce_scatter_max_i8.o /<>/build/obj/collectives/device/reduce_scatter_max_u8.o /<>/build/obj/collectives/device/reduce_scatter_max_i32.o /<>/build/obj/collectives/device/reduce_scatter_max_u32.o /<>/build/obj/collectives/device/reduce_scatter_max_i64.o /<>/build/obj/collectives/device/reduce_scatter_max_u64.o /<>/build/obj/collectives/device/reduce_scatter_max_f16.o /<>/build/obj/collectives/device/reduce_scatter_max_f32.o /<>/build/obj/collectives/device/reduce_scatter_max_f64.o /<>/build/obj/collectives/device/reduce_scatter_max_bf16.o 
/<>/build/obj/collectives/device/reduce_scatter_premulsum_i8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u8.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_i64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_u64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f16.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f32.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_f64.o /<>/build/obj/collectives/device/reduce_scatter_premulsum_bf16.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u8.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_i64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_u64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f16.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f32.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_f64.o /<>/build/obj/collectives/device/reduce_scatter_sumpostdiv_bf16.o /<>/build/obj/collectives/device/functions.o /<>/build/obj/collectives/device/onerank_reduce.o /<>/build/obj/collectives/device/devlink.o make[4]: Leaving directory '/<>/src/collectives/device' Linking libnccl.so.2.18.3 > /<>/build/lib/libnccl.so.2.18.3 mkdir -p /<>/build/lib Archiving libnccl_static.a > /<>/build/lib/libnccl_static.a mkdir -p /<>/build/lib cuda-g++ -DCUDA_MAJOR=12 -DCUDA_MINOR=0 -fPIC -fvisibility=hidden -Wall -Wno-unused-function -Wno-sign-compare -std=c++11 -Wvla -I /usr/include -Wdate-time -D_FORTIFY_SOURCE=2 -g -O3 -ffile-prefix-map=/<>=. 
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fdebug-prefix-map=/<>=/usr/src/nvidia-nccl-2.18.5-1-2 -O3 -g -Wall -Wextra -DPROFAPI -shared -Wl,--no-as-needed -Wl,-soname,libnccl.so.2 -o /<>/build/lib/libnccl.so.2.18.3 /<>/build/obj/init.o /<>/build/obj/init_nvtx.o /<>/build/obj/channel.o /<>/build/obj/bootstrap.o /<>/build/obj/transport.o /<>/build/obj/enqueue.o /<>/build/obj/group.o /<>/build/obj/debug.o /<>/build/obj/proxy.o /<>/build/obj/net.o /<>/build/obj/misc/cudawrap.o /<>/build/obj/misc/nvmlwrap.o /<>/build/obj/misc/ibvsymbols.o /<>/build/obj/misc/ibvwrap.o /<>/build/obj/misc/gdrwrap.o /<>/build/obj/misc/utils.o /<>/build/obj/misc/argcheck.o /<>/build/obj/misc/socket.o /<>/build/obj/misc/shmutils.o /<>/build/obj/misc/profiler.o /<>/build/obj/misc/param.o /<>/build/obj/misc/strongstream.o /<>/build/obj/misc/ipcsocket.o /<>/build/obj/transport/p2p.o /<>/build/obj/transport/shm.o /<>/build/obj/transport/net.o /<>/build/obj/transport/net_socket.o /<>/build/obj/transport/net_ib.o /<>/build/obj/transport/coll_net.o /<>/build/obj/transport/nvls.o /<>/build/obj/collectives/sendrecv.o /<>/build/obj/collectives/all_reduce.o /<>/build/obj/collectives/all_gather.o /<>/build/obj/collectives/broadcast.o /<>/build/obj/collectives/reduce.o /<>/build/obj/collectives/reduce_scatter.o /<>/build/obj/graph/topo.o /<>/build/obj/graph/paths.o /<>/build/obj/graph/search.o /<>/build/obj/graph/connect.o /<>/build/obj/graph/rings.o /<>/build/obj/graph/trees.o /<>/build/obj/graph/tuning.o /<>/build/obj/graph/xml.o /<>/build/obj/enhcompat.o /<>/build/obj/collectives/device/colldevice.a -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now -L/usr/lib64 -lcudart_static -lpthread -lrt -ldl printf "create /<>/build/lib/libnccl_static.a\naddlib /<>/build/obj/collectives/device/colldevice.a\naddmod 
/<>/build/obj/init.o,/<>/build/obj/init_nvtx.o,/<>/build/obj/channel.o,/<>/build/obj/bootstrap.o,/<>/build/obj/transport.o,/<>/build/obj/enqueue.o,/<>/build/obj/group.o,/<>/build/obj/debug.o,/<>/build/obj/proxy.o,/<>/build/obj/net.o,/<>/build/obj/misc/cudawrap.o,/<>/build/obj/misc/nvmlwrap.o,/<>/build/obj/misc/ibvsymbols.o,/<>/build/obj/misc/ibvwrap.o,/<>/build/obj/misc/gdrwrap.o,/<>/build/obj/misc/utils.o,/<>/build/obj/misc/argcheck.o,/<>/build/obj/misc/socket.o,/<>/build/obj/misc/shmutils.o,/<>/build/obj/misc/profiler.o,/<>/build/obj/misc/param.o,/<>/build/obj/misc/strongstream.o,/<>/build/obj/misc/ipcsocket.o,/<>/build/obj/transport/p2p.o,/<>/build/obj/transport/shm.o,/<>/build/obj/transport/net.o,/<>/build/obj/transport/net_socket.o,/<>/build/obj/transport/net_ib.o,/<>/build/obj/transport/coll_net.o,/<>/build/obj/transport/nvls.o,/<>/build/obj/collectives/sendrecv.o,/<>/build/obj/collectives/all_reduce.o,/<>/build/obj/collectives/all_gather.o,/<>/build/obj/collectives/broadcast.o,/<>/build/obj/collectives/reduce.o,/<>/build/obj/collectives/reduce_scatter.o,/<>/build/obj/graph/topo.o,/<>/build/obj/graph/paths.o,/<>/build/obj/graph/search.o,/<>/build/obj/graph/connect.o,/<>/build/obj/graph/rings.o,/<>/build/obj/graph/trees.o,/<>/build/obj/graph/tuning.o,/<>/build/obj/graph/xml.o,/<>/build/obj/enhcompat.o\nsave\nend" | ar -M transport/net.cc:59:8: warning: type ‘struct connectMapMem’ violates the C++ One Definition Rule [-Wodr] 59 | struct connectMapMem{ | ^ transport/coll_net.cc:70:8: note: a different type is defined in another translation unit 70 | struct connectMapMem{ | ^ transport/net.cc:63:15: note: the first difference of corresponding definitions is field ‘ipcDesc’ 63 | ncclIpcDesc ipcDesc; | ^ transport/coll_net.cc:70:8: note: a type with different number of fields is defined in another translation unit 70 | struct connectMapMem{ | ^ transport/net.cc:69:8: warning: type ‘struct connectMap’ violates the C++ One Definition Rule [-Wodr] 69 | struct 
connectMap { | ^ transport/coll_net.cc:76:8: note: a different type is defined in another translation unit 76 | struct connectMap { | ^ transport/net.cc:70:7: note: the first difference of corresponding definitions is field ‘sameProcess’ 70 | int sameProcess; | ^ transport/coll_net.cc:77:7: note: a field with different name is defined in another translation unit 77 | int shared; | ^ transport/net.cc:108:8: warning: type ‘struct recvResources’ violates the C++ One Definition Rule [-Wodr] 108 | struct recvResources { | ^ transport/coll_net.cc:113:8: note: a different type is defined in another translation unit 113 | struct recvResources { | ^ transport/net.cc:109:21: note: the first difference of corresponding definitions is field ‘map’ 109 | struct connectMap map; | ^ transport/coll_net.cc:114:21: note: a field of same name but different type is defined in another translation unit 114 | struct connectMap map; | ^ transport/net.cc:69:8: note: type ‘struct connectMap’ itself violates the C++ One Definition Rule 69 | struct connectMap { | ^ transport/coll_net.cc:76:8: note: the incompatible type is defined here 76 | struct connectMap { | ^ transport/net.cc:83:8: warning: type ‘struct sendResources’ violates the C++ One Definition Rule [-Wodr] 83 | struct sendResources { | ^ transport/coll_net.cc:93:8: note: a different type is defined in another translation unit 93 | struct sendResources { | ^ transport/net.cc:84:21: note: the first difference of corresponding definitions is field ‘map’ 84 | struct connectMap map; | ^ transport/coll_net.cc:94:21: note: a field of same name but different type is defined in another translation unit 94 | struct connectMap map; | ^ transport/net.cc:69:8: note: type ‘struct connectMap’ itself violates the C++ One Definition Rule 69 | struct connectMap { | ^ transport/coll_net.cc:76:8: note: the incompatible type is defined here 76 | struct connectMap { | ^ transport/net.cc:150:8: warning: type ‘struct setupReq’ violates the C++ One 
Definition Rule [-Wodr] 150 | struct setupReq { | ^ transport/coll_net.cc:140:8: note: a different type is defined in another translation unit 140 | struct setupReq { | ^ transport/net.cc:151:7: note: the first difference of corresponding definitions is field ‘tpRank’ 151 | int tpRank; | ^ transport/coll_net.cc:141:7: note: a field with different name is defined in another translation unit 141 | int netDev; | ^ graph/search.cc: In function ‘ncclTopoCompute’: graph/search.cc:823:5: warning: ‘nChannels’ may be used uninitialized [-Wmaybe-uninitialized] 823 | INFO(NCCL_GRAPH, "Search %d : %d channels loaded from XML graph", graph->id, nChannels); | ^ graph/search.cc:821:9: note: ‘nChannels’ was declared here 821 | int nChannels; | ^ In function ‘addP2pToPlan’, inlined from ‘scheduleP2pTasksToPlan’ at enqueue.cc:686:13: enqueue.cc:387:20: warning: ‘fuseOk’ may be used uninitialized [-Wmaybe-uninitialized] 387 | appendWorkElemP2p(comm, plan, channelId, &elem, fuseOk); | ^ enqueue.cc: In function ‘scheduleP2pTasksToPlan’: enqueue.cc:639:8: note: ‘fuseOk’ was declared here 639 | bool fuseOk; | ^ ln -sf libnccl.so.2 /<>/build/lib/libnccl.so ln -sf libnccl.so.2.18.3 /<>/build/lib/libnccl.so.2 make[3]: Leaving directory '/<>/src' make[2]: Leaving directory '/<>' make[1]: Leaving directory '/<>' create-stamp debian/debhelper-build-stamp dh_prep -a debian/rules override_dh_auto_install make[1]: Entering directory '/<>' PREFIX=/<>/debian/tmp dh_auto_install make -j4 install DESTDIR=/<>/debian/tmp AM_UPDATE_INFO_DIR=no "INSTALL=install --strip-program=true" make[2]: Entering directory '/<>' make -C src install BUILDDIR=/<>/build make[3]: Entering directory '/<>/src' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 make -C collectives/device make[4]: Entering 
directory '/<>/src/collectives/device' NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 make[4]: Nothing to be done for 'all'. make[4]: Leaving directory '/<>/src/collectives/device' mkdir -p /<>/debian/tmp/lib mkdir -p /<>/debian/tmp/lib/pkgconfig mkdir -p /<>/debian/tmp/include cp -P -v /<>/build/lib/lib* /<>/debian/tmp/lib/ '/<>/build/lib/libnccl.so' -> '/<>/debian/tmp/lib/libnccl.so' '/<>/build/lib/libnccl.so.2' -> '/<>/debian/tmp/lib/libnccl.so.2' '/<>/build/lib/libnccl.so.2.18.3' -> '/<>/debian/tmp/lib/libnccl.so.2.18.3' '/<>/build/lib/libnccl_static.a' -> '/<>/debian/tmp/lib/libnccl_static.a' cp -P -v /<>/build/lib/pkgconfig/* /<>/debian/tmp/lib/pkgconfig/ '/<>/build/lib/pkgconfig/nccl.pc' -> '/<>/debian/tmp/lib/pkgconfig/nccl.pc' cp -v /<>/build/include/* /<>/debian/tmp/include/ '/<>/build/include/nccl.h' -> '/<>/debian/tmp/include/nccl.h' '/<>/build/include/nccl_net.h' -> '/<>/debian/tmp/include/nccl_net.h' make[3]: Leaving directory '/<>/src' make[2]: Leaving directory '/<>' make[1]: Leaving directory '/<>' dh_install -a dh_installdocs -a dh_installchangelogs -a dh_perl -a dh_link -a dh_strip_nondeterminism -a dh_compress -a dh_fixperms -a dh_missing -a dh_dwz -a dh_strip -a c00973f1c782264d1212ee4f69dd22ac285936bb dh_makeshlibs -a dh_shlibdeps -a dh_installdeb -a dh_gencontrol -a dpkg-gencontrol: warning: Depends field of package libnccl-dev: substitution variable ${shlibs:Depends} used, but is not defined dh_md5sums -a dh_builddeb -a INFO: pkgstriptranslations version 154 INFO: pkgstriptranslations version 154 INFO: pkgstriptranslations version 154 pkgstriptranslations: processing libnccl2 (in debian/libnccl2); do_strip: , oemstrip: pkgstriptranslations: processing libnccl-dev (in debian/libnccl-dev); do_strip: , oemstrip: 
pkgstriptranslations: processing libnccl2-dbgsym (in debian/.debhelper/libnccl2/dbgsym-root); do_strip: , oemstrip: pkgmaintainermangler: Maintainer field overridden to "Ubuntu Developers " pkgmaintainermangler: Maintainer field overridden to "Ubuntu Developers " pkgstripfiles: processing control file: debian/libnccl2/DEBIAN/control, package libnccl2, directory debian/libnccl2 pkgstripfiles: Running PNG optimization (using 4 cpus) for package libnccl2 ... pkgstripfiles: No PNG files. dpkg-deb: building package 'libnccl2' in '../libnccl2_2.18.5-1-2_ppc64el.deb'. pkgstripfiles: processing control file: debian/.debhelper/libnccl2/dbgsym-root/DEBIAN/control, package libnccl2-dbgsym, directory debian/.debhelper/libnccl2/dbgsym-root dpkg-deb: building package 'libnccl2-dbgsym' in 'debian/.debhelper/scratch-space/build-libnccl2/libnccl2-dbgsym_2.18.5-1-2_ppc64el.deb'. Renaming libnccl2-dbgsym_2.18.5-1-2_ppc64el.deb to libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb pkgmaintainermangler: Maintainer field overridden to "Ubuntu Developers " pkgstripfiles: processing control file: debian/libnccl-dev/DEBIAN/control, package libnccl-dev, directory debian/libnccl-dev Searching for duplicated docs in dependency libnccl2... symlinking changelog.Debian.gz in libnccl-dev to file in libnccl2 pkgstripfiles: Running PNG optimization (using 4 cpus) for package libnccl-dev ... pkgstripfiles: No PNG files. dpkg-deb: building package 'libnccl-dev' in '../libnccl-dev_2.18.5-1-2_ppc64el.deb'. dpkg-genbuildinfo --build=any -O../nvidia-nccl_2.18.5-1-2_ppc64el.buildinfo dpkg-genchanges --build=any -mLaunchpad Build Daemon -O../nvidia-nccl_2.18.5-1-2_ppc64el.changes dpkg-genchanges: info: binary-only arch-specific upload (source code and arch-indep packages not included) dpkg-source --after-build . 
dpkg-buildpackage: info: binary-only upload (no source included)
--------------------------------------------------------------------------------
Build finished at 2023-11-12T05:50:39Z

Finished
--------

I: Built successfully

+------------------------------------------------------------------------------+
| Changes                                                                      |
+------------------------------------------------------------------------------+

nvidia-nccl_2.18.5-1-2_ppc64el.changes:
---------------------------------------

Format: 1.8
Date: Sun, 12 Nov 2023 02:15:42 +0100
Source: nvidia-nccl
Binary: libnccl-dev libnccl2
Built-For-Profiles: noudeb
Architecture: ppc64el
Version: 2.18.5-1-2
Distribution: noble-proposed
Urgency: medium
Maintainer: Launchpad Build Daemon
Changed-By: Andreas Beckmann
Description:
 libnccl-dev - NVIDIA Optimized primitives for inter-GPU communication (developm
 libnccl2 - NVIDIA Optimized primitives for inter-GPU communication
Changes:
 nvidia-nccl (2.18.5-1-2) unstable; urgency=medium
 .
   * Team upload.
   * ci: Let blhc ignore some nonverbose lines.
   * Add upstream metadata.
   * With debhelper-compat 13 there is no more need for dh-exec.
   * Bump Standards-Version to 4.6.2, no changes needed.
   * Fix or override some issues found by Lintian.
   * Enable more hardening flags.
Checksums-Sha1: 86487892a64d52ac9ea9edd6ca65237cf73841dd 98335988 libnccl-dev_2.18.5-1-2_ppc64el.deb a2f47947f1e351f2e00d367fab83a9e882bb1a48 1067182 libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 8fdee438391d29616a7407b7bf35b08bfc96dbfe 100329902 libnccl2_2.18.5-1-2_ppc64el.deb 61ff185672b3ababf77ec3c8d7cbd64d32f052c5 7795 nvidia-nccl_2.18.5-1-2_ppc64el.buildinfo Checksums-Sha256: 592b596187f954f13459b7a56059266141f4895052f1281f4518ddedaf533318 98335988 libnccl-dev_2.18.5-1-2_ppc64el.deb 264bedc57ab06b4887df53e0fbfaea91e1f0a297ebb943a2ece78ba856e4525d 1067182 libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 48b6d227d9ed421d143cf1c33f5531225d6c3b086077c6f48efbd714a9aca14b 100329902 libnccl2_2.18.5-1-2_ppc64el.deb 9dea3ded6be5ae248bd2605cd889daf72cc55e4ddaf64efc592311a23d244097 7795 nvidia-nccl_2.18.5-1-2_ppc64el.buildinfo Files: 8fbbc54ef187eaddab71023f4a5297a0 98335988 contrib/libdevel optional libnccl-dev_2.18.5-1-2_ppc64el.deb 05e3baf0a4d3628a890f0c681cc881eb 1067182 contrib/debug optional libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 6eca7d4b40643afc113d9aa5be922bfc 100329902 contrib/libs optional libnccl2_2.18.5-1-2_ppc64el.deb ebe3700b67c0ea0fc8ec4f6fc573901b 7795 contrib/libs optional nvidia-nccl_2.18.5-1-2_ppc64el.buildinfo /<>/nvidia-nccl_2.18.5-1-2_ppc64el.changes.new could not be renamed to /<>/nvidia-nccl_2.18.5-1-2_ppc64el.changes: Illegal seek Distribution field may be wrong!!! 
+------------------------------------------------------------------------------+ | Buildinfo | +------------------------------------------------------------------------------+ Format: 1.0 Source: nvidia-nccl Binary: libnccl-dev libnccl2 libnccl2-dbgsym Architecture: ppc64el Version: 2.18.5-1-2 Checksums-Md5: 8fbbc54ef187eaddab71023f4a5297a0 98335988 libnccl-dev_2.18.5-1-2_ppc64el.deb 05e3baf0a4d3628a890f0c681cc881eb 1067182 libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 6eca7d4b40643afc113d9aa5be922bfc 100329902 libnccl2_2.18.5-1-2_ppc64el.deb Checksums-Sha1: 86487892a64d52ac9ea9edd6ca65237cf73841dd 98335988 libnccl-dev_2.18.5-1-2_ppc64el.deb a2f47947f1e351f2e00d367fab83a9e882bb1a48 1067182 libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 8fdee438391d29616a7407b7bf35b08bfc96dbfe 100329902 libnccl2_2.18.5-1-2_ppc64el.deb Checksums-Sha256: 592b596187f954f13459b7a56059266141f4895052f1281f4518ddedaf533318 98335988 libnccl-dev_2.18.5-1-2_ppc64el.deb 264bedc57ab06b4887df53e0fbfaea91e1f0a297ebb943a2ece78ba856e4525d 1067182 libnccl2-dbgsym_2.18.5-1-2_ppc64el.ddeb 48b6d227d9ed421d143cf1c33f5531225d6c3b086077c6f48efbd714a9aca14b 100329902 libnccl2_2.18.5-1-2_ppc64el.deb Build-Origin: Ubuntu Build-Architecture: ppc64el Build-Date: Sun, 12 Nov 2023 05:50:36 +0000 Build-Path: /<> Build-Tainted-By: merged-usr-via-aliased-dirs usr-local-has-programs Installed-Build-Depends: autoconf (= 2.71-3), automake (= 1:1.16.5-1.3), autopoint (= 0.21-13build1), autotools-dev (= 20220109.1), base-files (= 13ubuntu4), base-passwd (= 3.6.2), bash (= 5.2.15-2ubuntu1), binutils (= 2.41-6ubuntu1), binutils-common (= 2.41-6ubuntu1), binutils-powerpc64le-linux-gnu (= 2.41-6ubuntu1), bsdextrautils (= 2.39.1-4ubuntu2), bsdutils (= 1:2.39.1-4ubuntu2), build-essential (= 12.10ubuntu1), bzip2 (= 1.0.8-5build1), coreutils (= 9.1-1ubuntu2), cpp (= 4:13.2.0-1ubuntu1), cpp-12 (= 12.3.0-11ubuntu1), cpp-13 (= 13.2.0-6ubuntu1), dash (= 0.5.12-6ubuntu1), debconf (= 1.5.82), debhelper (= 13.11.7ubuntu1), debianutils (= 5.14), 
debugedit (= 1:5.0-5), dh-autoreconf (= 20), dh-strip-nondeterminism (= 1.13.1-1), diffutils (= 1:3.10-1), dpkg (= 1.22.1ubuntu2), dpkg-dev (= 1.22.1ubuntu2), dwz (= 0.15-1), file (= 1:5.45-2), findutils (= 4.9.0-5), g++ (= 4:13.2.0-1ubuntu1), g++-12 (= 12.3.0-11ubuntu1), g++-13 (= 13.2.0-6ubuntu1), gcc (= 4:13.2.0-1ubuntu1), gcc-12 (= 12.3.0-11ubuntu1), gcc-12-base (= 12.3.0-11ubuntu1), gcc-13 (= 13.2.0-6ubuntu1), gcc-13-base (= 13.2.0-6ubuntu1), gettext (= 0.21-13build1), gettext-base (= 0.21-13build1), grep (= 3.11-3), groff-base (= 1.23.0-3), gzip (= 1.12-1ubuntu1), hostname (= 3.23+nmu1ubuntu1), init-system-helpers (= 1.65.2ubuntu1), intltool-debian (= 0.35.0+20060710.6), libaccinj64-12.0 (= 12.0.146~12.0.1-3), libacl1 (= 2.3.1-3), libarchive-zip-perl (= 1.68-1), libasan8 (= 13.2.0-6ubuntu1), libatomic1 (= 13.2.0-6ubuntu1), libattr1 (= 1:2.5.1-4), libaudit-common (= 1:3.1.1-1build1), libaudit1 (= 1:3.1.1-1build1), libbinutils (= 2.41-6ubuntu1), libblkid1 (= 2.39.1-4ubuntu2), libbz2-1.0 (= 1.0.8-5build1), libc-bin (= 2.38-3ubuntu1), libc-dev-bin (= 2.38-3ubuntu1), libc6 (= 2.38-3ubuntu1), libc6-dev (= 2.38-3ubuntu1), libcap-ng0 (= 0.8.3-1build3), libcap2 (= 1:2.66-4ubuntu1), libcc1-0 (= 13.2.0-6ubuntu1), libcom-err2 (= 1.47.0-2ubuntu1), libcrypt-dev (= 1:4.4.36-2), libcrypt1 (= 1:4.4.36-2), libctf-nobfd0 (= 2.41-6ubuntu1), libctf0 (= 2.41-6ubuntu1), libcu++-dev (= 1.9.0-3), libcub-dev (= 2.0.1-2), libcublas12 (= 12.0.2.224~12.0.1-3), libcublaslt12 (= 12.0.2.224~12.0.1-3), libcudart12 (= 12.0.146~12.0.1-3), libcufft11 (= 11.0.1.95~12.0.1-3), libcufftw11 (= 11.0.1.95~12.0.1-3), libcuinj64-12.0 (= 12.0.146~12.0.1-3), libcupti-dev (= 12.0.146~12.0.1-3), libcupti12 (= 12.0.146~12.0.1-3), libcurand10 (= 11.1.1+~10.3.1.124~12.0.1-3), libcusolver11 (= 11.4.3.1~12.0.1-3), libcusolvermg11 (= 11.4.3.1~12.0.1-3), libcusparse12 (= 12.0.1.140~12.0.1-3), libdb5.3 (= 5.3.28+dfsg2-4), libdebconfclient0 (= 0.270ubuntu1), libdebhelper-perl (= 13.11.7ubuntu1), libdpkg-perl (= 
1.22.1ubuntu2), libdw1 (= 0.189-4), libelf1 (= 0.189-4), libfile-stripnondeterminism-perl (= 1.13.1-1), libgcc-12-dev (= 12.3.0-11ubuntu1), libgcc-13-dev (= 13.2.0-6ubuntu1), libgcc-s1 (= 13.2.0-6ubuntu1), libgcrypt20 (= 1.10.2-3ubuntu1), libgdbm-compat4 (= 1.23-3), libgdbm6 (= 1.23-3), libgmp10 (= 2:6.3.0+dfsg-2ubuntu4), libgomp1 (= 13.2.0-6ubuntu1), libgpg-error0 (= 1.47-2), libgssapi-krb5-2 (= 1.20.1-3ubuntu1), libicu72 (= 72.1-3ubuntu3), libisl23 (= 0.26-3), libitm1 (= 13.2.0-6ubuntu1), libjansson4 (= 2.14-2), libk5crypto3 (= 1.20.1-3ubuntu1), libkeyutils1 (= 1.6.3-2), libkrb5-3 (= 1.20.1-3ubuntu1), libkrb5support0 (= 1.20.1-3ubuntu1), liblsan0 (= 13.2.0-6ubuntu1), liblz4-1 (= 1.9.4-1), liblzma5 (= 5.4.4-0.1), libmagic-mgc (= 1:5.45-2), libmagic1 (= 1:5.45-2), libmd0 (= 1.1.0-1), libmount1 (= 2.39.1-4ubuntu2), libmpc3 (= 1.3.1-1), libmpfr6 (= 4.2.1-1), libnppc12 (= 12.0.1.104~12.0.1-3), libnppial12 (= 12.0.1.104~12.0.1-3), libnppicc12 (= 12.0.1.104~12.0.1-3), libnppidei12 (= 12.0.1.104~12.0.1-3), libnppif12 (= 12.0.1.104~12.0.1-3), libnppig12 (= 12.0.1.104~12.0.1-3), libnppim12 (= 12.0.1.104~12.0.1-3), libnppist12 (= 12.0.1.104~12.0.1-3), libnppisu12 (= 12.0.1.104~12.0.1-3), libnppitc12 (= 12.0.1.104~12.0.1-3), libnpps12 (= 12.0.1.104~12.0.1-3), libnsl-dev (= 1.3.0-3), libnsl2 (= 1.3.0-3), libnvblas12 (= 12.0.2.224~12.0.1-3), libnvidia-ml-dev (= 12.0.140~12.0.1-3), libnvjitlink12 (= 12.0.140~12.0.1-3), libnvjpeg12 (= 12.0.1.102~12.0.1-3), libnvrtc-builtins12.0 (= 12.0.140~12.0.1-3), libnvrtc12 (= 12.0.140~12.0.1-3), libnvtoolsext1 (= 12.0.140~12.0.1-3), libnvvm4 (= 12.0.140~12.0.1-3), libpam-modules (= 1.5.2-6ubuntu1), libpam-modules-bin (= 1.5.2-6ubuntu1), libpam-runtime (= 1.5.2-6ubuntu1), libpam0g (= 1.5.2-6ubuntu1), libpcre2-8-0 (= 10.42-4), libperl5.36 (= 5.36.0-9ubuntu1), libpipeline1 (= 1.5.7-1), libquadmath0 (= 13.2.0-6ubuntu1), libseccomp2 (= 2.5.4-2ubuntu1), libselinux1 (= 3.5-1build1), libsframe1 (= 2.41-6ubuntu1), libsmartcols1 (= 2.39.1-4ubuntu2), 
libssl3 (= 3.0.10-1ubuntu2.1), libstdc++-12-dev (= 12.3.0-11ubuntu1), libstdc++-13-dev (= 13.2.0-6ubuntu1), libstdc++6 (= 13.2.0-6ubuntu1), libsub-override-perl (= 0.09-4), libsystemd0 (= 253.5-1ubuntu7), libthrust-dev (= 2.0.1-2), libtinfo6 (= 6.4+20231016-1), libtirpc-common (= 1.3.3+ds-1), libtirpc-dev (= 1.3.3+ds-1), libtirpc3 (= 1.3.3+ds-1), libtool (= 2.4.7-7), libtsan2 (= 13.2.0-6ubuntu1), libubsan1 (= 13.2.0-6ubuntu1), libuchardet0 (= 0.0.7-1build2), libudev1 (= 253.5-1ubuntu7), libunistring5 (= 1.1-2), libuuid1 (= 2.39.1-4ubuntu2), libxml2 (= 2.9.14+dfsg-1.3build1), libzstd1 (= 1.5.5+dfsg2-2), linux-libc-dev (= 6.5.0-9.9), login (= 1:4.13+dfsg1-1ubuntu1), lto-disabled-list (= 43), m4 (= 1.4.19-4), make (= 4.3-4.1build1), man-db (= 2.12.0-1), mawk (= 1.3.4.20230808-1), ncurses-base (= 6.4+20231016-1), ncurses-bin (= 6.4+20231016-1), nvidia-cuda-dev (= 12.0.146~12.0.1-3), nvidia-cuda-toolkit (= 12.0.140~12.0.1-3), nvidia-cuda-toolkit-gcc (= 12.0.1-3), nvidia-opencl-dev (= 12.0.140~12.0.1-3), nvidia-profiler (= 12.0.146~12.0.1-3), ocl-icd-libopencl1 (= 2.3.2-1), ocl-icd-opencl-dev (= 2.3.2-1), opencl-c-headers (= 3.0~2023.04.17-1), opencl-clhpp-headers (= 3.0~2023.04.17-2ubuntu1), patch (= 2.7.6-7build2), perl (= 5.36.0-9ubuntu1), perl-base (= 5.36.0-9ubuntu1), perl-modules-5.36 (= 5.36.0-9ubuntu1), po-debconf (= 1.0.21+nmu1), rpcsvc-proto (= 1.4.2-0ubuntu6), sed (= 4.9-1), sensible-utils (= 0.0.20), sysvinit-utils (= 3.07-1ubuntu1), tar (= 1.34+dfsg-1.2ubuntu1), util-linux (= 2.39.1-4ubuntu2), xz-utils (= 5.4.4-0.1), zlib1g (= 1:1.2.13.dfsg-1ubuntu5) Environment: DEB_BUILD_OPTIONS="parallel=4" DEB_BUILD_PROFILES="noudeb" LANG="C.UTF-8" LC_ALL="C.UTF-8" SOURCE_DATE_EPOCH="1699751742" +------------------------------------------------------------------------------+ | Package contents | +------------------------------------------------------------------------------+ libnccl-dev_2.18.5-1-2_ppc64el.deb ---------------------------------- new Debian package, version 
2.0. size 98335988 bytes: control archive=998 bytes. 1099 bytes, 23 lines control 343 bytes, 5 lines md5sums Package: libnccl-dev Source: nvidia-nccl Version: 2.18.5-1-2 Architecture: ppc64el Maintainer: Ubuntu Developers Original-Maintainer: Debian NVIDIA Maintainers Installed-Size: 294328 Depends: libnccl2 (= 2.18.5-1-2) Provides: libnccl.so Section: contrib/libdevel Priority: optional Homepage: https://github.com/NVIDIA/nccl Description: NVIDIA Optimized primitives for inter-GPU communication (development) NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVswitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications. . This package contains the development files. 
drwxr-xr-x root/root 0 2023-11-12 01:15 ./
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/include/
-rw-r--r-- root/root 17843 2023-11-12 01:15 ./usr/include/nccl.h
-rw-r--r-- root/root 17159 2023-11-12 01:15 ./usr/include/nccl_net.h
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/lib/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/
lrwxrwxrwx root/root 0 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/libnccl.so -> libnccl.so.2
-rw-r--r-- root/root 301336776 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/libnccl_static.a
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/pkgconfig/
-rw-r--r-- root/root 245 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/pkgconfig/nccl.pc
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/doc/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/doc/libnccl-dev/
lrwxrwxrwx root/root 0 2023-11-12 01:15 ./usr/share/doc/libnccl-dev/changelog.Debian.gz -> ../libnccl2/changelog.Debian.gz
-rw-r--r-- root/root 4498 2023-11-12 01:15 ./usr/share/doc/libnccl-dev/copyright

libnccl2_2.18.5-1-2_ppc64el.deb
-------------------------------

 new Debian package, version 2.0.
 size 100329902 bytes: control archive=1413 bytes.
 1112 bytes, 23 lines control
 226 bytes, 3 lines md5sums
 33 bytes, 1 lines shlibs
 1643 bytes, 56 lines symbols
 75 bytes, 2 lines triggers
 Package: libnccl2
 Source: nvidia-nccl
 Version: 2.18.5-1-2
 Architecture: ppc64el
 Maintainer: Ubuntu Developers
 Original-Maintainer: Debian NVIDIA Maintainers
 Installed-Size: 285859
 Depends: libc6 (>= 2.38), libgcc-s1 (>= 3.3.1), libstdc++6 (>= 4.8)
 Provides: libnccl.so.2
 Section: contrib/libs
 Priority: optional
 Homepage: https://github.com/NVIDIA/nccl
 Description: NVIDIA Optimized primitives for inter-GPU communication
  NCCL (pronounced "Nickel") is a stand-alone library of standard
  communication routines for GPUs, implementing all-reduce, all-gather,
  reduce, broadcast, reduce-scatter, as well as any send/receive based
  communication pattern. It has been optimized to achieve high bandwidth on
  platforms using PCIe, NVLink, NVswitch, as well as networking using
  InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of
  GPUs installed in a single node or across multiple nodes, and can be used
  in either single- or multi-process (e.g., MPI) applications.
  .
  This package contains the shared objects.

drwxr-xr-x root/root 0 2023-11-12 01:15 ./
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/lib/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/
lrwxrwxrwx root/root 0 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/libnccl.so.2 -> libnccl.so.2.18.3
-rw-r--r-- root/root 292700080 2023-11-12 01:15 ./usr/lib/powerpc64le-linux-gnu/libnccl.so.2.18.3
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/doc/
drwxr-xr-x root/root 0 2023-11-12 01:15 ./usr/share/doc/libnccl2/
-rw-r--r-- root/root 701 2023-11-12 01:15 ./usr/share/doc/libnccl2/changelog.Debian.gz
-rw-r--r-- root/root 4498 2023-11-12 01:15 ./usr/share/doc/libnccl2/copyright

+------------------------------------------------------------------------------+
| Post Build                                                                   |
+------------------------------------------------------------------------------+

+------------------------------------------------------------------------------+
| Cleanup                                                                      |
+------------------------------------------------------------------------------+

Purging /<>
Not removing build depends: as requested

+------------------------------------------------------------------------------+
| Summary                                                                      |
+------------------------------------------------------------------------------+

Build Architecture: ppc64el
Build Type: any
Build-Space: 2621224
Build-Time: 2389
Distribution: noble-proposed
Host Architecture: ppc64el
Install-Time: 94
Job: nvidia-nccl_2.18.5-1-2.dsc
Machine Architecture: ppc64el
Package: nvidia-nccl
Package-Time: 2483
Source-Version: 2.18.5-1-2
Space: 2621224
Status: successful
Version: 2.18.5-1-2
--------------------------------------------------------------------------------
Finished at 2023-11-12T05:50:39Z
Build needed 00:41:23, 2621224k disk space
RUN: /usr/share/launchpad-buildd/bin/in-target scan-for-processes --backend=chroot --series=noble --arch=ppc64el PACKAGEBUILD-26988716
Scanning for processes to kill in build PACKAGEBUILD-26988716