nvidia-cutlass 3.1.0+ds-2 source package in Ubuntu

Changelog

nvidia-cutlass (3.1.0+ds-2) unstable; urgency=medium

  * Upload to unstable.

 -- Mo Zhou <email address hidden>  Wed, 28 Feb 2024 12:07:52 -0500

Upload details

Uploaded by:
Debian NVIDIA Maintainers
Uploaded to:
Sid
Original maintainer:
Debian NVIDIA Maintainers
Architectures:
all
Section:
misc
Urgency:
Medium

Publishing

Series Pocket Component Section
Oracular release multiverse misc
Noble release multiverse misc

Builds

Noble: [FULLYBUILT] amd64

Downloads

File Size SHA-256 Checksum
nvidia-cutlass_3.1.0+ds-2.dsc 2.0 KiB 4b0c7a34b417bc3ca6bf84f3297b4b8033a1a5fd7f7e8cf78af7f628972b3a40
nvidia-cutlass_3.1.0+ds.orig.tar.xz 11.6 MiB 21771ad5a3ee51083e5ce38f079e9efbf9d5cb34d297756d3cbfb0673b93da59
nvidia-cutlass_3.1.0+ds-2.debian.tar.xz 3.1 KiB e5222b546da1f34191ce9bdf2d12490e9279d4263443616cbd72feb3aceb9da1

No changes file available.

Binary packages built by this source

libcutlass-dev: CUDA Templates for Linear Algebra Subroutines

 CUTLASS is a collection of CUDA C++ template abstractions for implementing
 high-performance matrix-matrix multiplication (GEMM) and related computations
 at all levels and scales within CUDA. It incorporates strategies for
 hierarchical decomposition and data movement similar to those used to implement
 cuBLAS and cuDNN. CUTLASS decomposes these "moving parts" into reusable,
 modular software components abstracted by C++ template classes. Primitives for
 different levels of a conceptual parallelization hierarchy can be specialized
 and tuned via custom tiling sizes, data types, and other algorithmic policies.
 The resulting flexibility simplifies their use as building blocks within custom
 kernels and applications.
 .
 To support a wide variety of applications, CUTLASS provides extensive support
 for mixed-precision computations, providing specialized data-movement and
 multiply-accumulate abstractions for half-precision floating point (FP16),
 BFloat16 (BF16), Tensor Float 32 (TF32), single-precision floating point
 (FP32), FP32 emulation via tensor core instructions, double-precision
 floating point (FP64) types, integer data types (4b and 8b), and binary
 data types (1b). CUTLASS demonstrates warp-synchronous matrix multiply
 operations targeting the programmable, high-throughput Tensor Cores
 implemented by NVIDIA's Volta, Turing, Ampere, and Hopper architectures.
 .
 This is a header-only library.
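
As a sketch of how the headers shipped in libcutlass-dev are typically used, the snippet below instantiates CUTLASS's device-level GEMM template for single-precision, column-major operands and launches C = alpha*A*B + beta*C. It follows the pattern of the upstream basic_gemm example; the wrapper function name and the error mapping are illustrative, and compiling it requires nvcc, a CUDA-capable GPU, and the CUTLASS include path.

```cpp
#include <cutlass/gemm/device/gemm.h>

// Instantiate a device-wide GEMM for FP32 inputs/outputs in column-major
// layout. Tiling sizes and other algorithmic policies take their defaults
// here, but can be overridden via further template arguments.
using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::ColumnMajor,   // element/layout of A
    float, cutlass::layout::ColumnMajor,   // element/layout of B
    float, cutlass::layout::ColumnMajor>;  // element/layout of C

// Illustrative wrapper (not part of CUTLASS): computes
// C = alpha * A * B + beta * C on device pointers.
cudaError_t run_sgemm(int M, int N, int K,
                      float alpha,
                      float const *A, int lda,
                      float const *B, int ldb,
                      float beta,
                      float *C, int ldc) {
  Gemm gemm_op;

  // Arguments: problem size, tensor refs for A, B, C (source), C (dest),
  // and the epilogue scalars {alpha, beta}.
  cutlass::Status status = gemm_op({{M, N, K},
                                    {A, lda},
                                    {B, ldb},
                                    {C, ldc},
                                    {C, ldc},
                                    {alpha, beta}});

  return status == cutlass::Status::kSuccess ? cudaSuccess
                                             : cudaErrorUnknown;
}
```

Because the library is header-only, no linking step is needed beyond the CUDA runtime; a typical build line would resemble `nvcc -I/usr/include example.cu` with the Debian package installed.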