Building¶
Kokkos Kernels is a stand-alone library in the Kokkos Ecosystem, as well as a package within the Trilinos Project.
Building Kokkos Kernels as a stand-alone library requires CMake. Using Kokkos Kernels as a package within Trilinos additionally requires the TriBITS build system.
General Requirements¶
Compatible versions of Kokkos and Kokkos Kernels, cloned or downloaded from https://github.com/kokkos/kokkos and https://github.com/kokkos/kokkos-kernels, respectively.
Supported compiler and computing hardware - see Kokkos’ README for what is currently tested.
CUDA builds require use of the nvcc_wrapper script provided by Kokkos, unless using Clang-CUDA.
Basic steps for building stand-alone Kokkos Kernels:¶
Create a build directory <BUILD_DIR> (different from the source and install locations).
> mkdir <BUILD_DIR>
> cd <BUILD_DIR>
Run cmake -S ${SOURCE_DIR} <...>, where SOURCE_DIR is the location of the Kokkos Kernels source and <...> is a list of CMake options given as -D{OPTION}={VALUE}.
Build and install the library, depending on the generator (make is default):
> make install
or
> ninja install
To use the Ninja build system, add -G Ninja to the cmake command.
A full listing of CMake options is given below. Another way to get a list of options and their documentation is to use the ccmake utility:
> ccmake <SOURCE_DIR>
This brings up a user interface listing all the options, their default values, and associated documentation.
Sample CMake¶
Below is a list of example CMake configurations. Kokkos Kernels requires first building Kokkos (or including it as a subproject). To see a full list of options for building Kokkos, see [BUILD](https://github.com/kokkos/kokkos/blob/master/BUILD.md)
OpenMP backend, g++ compiler, Intel Skylake architecture:¶
First install Kokkos:
> cmake \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_INSTALL_PREFIX=${HOME}/kokkos-install \
-DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ARCH_SKX=ON \
<KOKKOS_SOURCE>
> make install
Then build Kokkos Kernels, pointing to Kokkos:
> cmake \
-S <KOKKOS_KERNELS_SOURCE> \
-B <KOKKOS_KERNELS_BUILD_DIRECTORY> \
-DKokkos_ROOT=${HOME}/kokkos-install \
-DCMAKE_CXX_COMPILER=g++
> cmake --build <KOKKOS_KERNELS_BUILD_DIRECTORY> --parallel
CUDA and Serial backends, nvcc_wrapper compiler, Power8 and Volta sm_70 architectures, various compilation flags:¶
First install Kokkos:
> cmake \
-S <KOKKOS_SOURCE> \
-B <KOKKOS_BUILD_DIR> \
-DCMAKE_CXX_COMPILER=<KOKKOS_SOURCE>/bin/nvcc_wrapper \
-DCMAKE_INSTALL_PREFIX=${HOME}/kokkos-install \
-DKokkos_ENABLE_CUDA=ON \
-DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ARCH_VOLTA70=ON \
-DKokkos_ARCH_POWER8=ON
> cmake --build <KOKKOS_BUILD_DIR> --parallel
> cmake --install <KOKKOS_BUILD_DIR>
Then build Kokkos Kernels, pointing to Kokkos:
> cmake \
-S <KOKKOS_KERNELS_SOURCE> \
-B <KOKKOS_KERNELS_BUILD_DIR> \
-DKokkos_ROOT=${HOME}/kokkos-install \
-DCMAKE_CXX_COMPILER=${HOME}/kokkos-install/bin/nvcc_wrapper
> cmake --build <KOKKOS_KERNELS_BUILD_DIR>
If you wish to enable certain CUDA third-party libraries (TPLs), you can also configure with
> cmake \
-S <KOKKOS_KERNELS_SOURCE> \
-B <KOKKOS_KERNELS_BUILD_DIR> \
-DKokkos_ROOT=${HOME}/kokkos-install \
-DCMAKE_CXX_COMPILER=${HOME}/kokkos-install/bin/nvcc_wrapper \
-DKokkosKernels_ENABLE_TPL_CUSPARSE=ON
Required¶
CMake >= 3.10
Compatible compiler and hardware
Trilinos¶
If building with Trilinos, the same set of CMake options applies. The only difference is that you must enable KokkosKernels:
> cmake \
-D Trilinos_ENABLE_KokkosKernels:BOOL=ON \
...
Running tests¶
Note that no tests will be available unless -DKokkosKernels_ENABLE_TESTS=ON is in your cmake command.
To run the tests, simply execute a CMake build and then run:
> make test
To limit the tests, one can cd into either unit_test or perf_test and run make test there. To show full detail of all tests, you can run ctest --extra-verbose.
You can filter exactly which tests are run based on regular expressions with
> ctest -R <match_string>
Tests are grouped into individual executables. You can run the executable for one of the enabled backends based on your configuration, for example if OpenMP is enabled:
> ./KokkosKernels_UnitTest_OpenMP
To run a specific test in the executable, use the --gtest_filter flag:
> ./KokkosKernels_UnitTest_OpenMP --gtest_filter=openmp.dot_double
Explicit Template Instantiation (ETI) in KokkosKernels¶
Explicit Template Instantiation (ETI) in KokkosKernels is a performance and compile-time optimization strategy. It controls where and how much template code gets instantiated, which is especially important because KokkosKernels is a heavily templated C++ library whose kernels depend on execution space, memory space, and scalar type.
Why ETI matters¶
Avoids redundant instantiations across translation units, which otherwise lead to long compile times, excessive object file sizes, and link-time bloat.
Improves performance by instantiating only specific kernels for combinations of ExecutionSpace, MemorySpace, ScalarType, etc., that you actually use.
How it works¶
ETI splits compilation into:
Library mode: Precompile and explicitly instantiate selected template combinations in the KokkosKernels library itself (using .cpp files). These go into the final .a or .so library.
Non-ETI (header-only) mode: Templates are instantiated wherever they are used. This works but explodes compile/link time.
In ETI mode:
You tell CMake which combinations (e.g., Kokkos::CudaSpace + float, Kokkos::OpenMP + double) you want pre-instantiated.
Those combinations are explicitly instantiated in special .cpp files like:
template struct KokkosSparse::spmv_struct<Kokkos::CudaSpace, float>;
CMake compiles only these .cpp files for your platform.
Your app reuses the precompiled kernels, skipping redundant template instantiation.
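The mechanism described above can be sketched in plain C++ without Kokkos. This is a minimal illustration only: dot_struct and the two helper functions are hypothetical names, not Kokkos Kernels classes, and the "header" and "library" parts are placed in one file purely so the sketch compiles on its own. In a real ETI build the explicit instantiation definitions would sit in separate .cpp files compiled once into the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Header" part: a generic kernel template.
// dot_struct is a hypothetical stand-in, not a Kokkos Kernels class.
template <typename Scalar>
struct dot_struct {
  // Returns the dot product of two equally sized vectors.
  static Scalar apply(const std::vector<Scalar>& x,
                      const std::vector<Scalar>& y) {
    assert(x.size() == y.size());
    Scalar sum = Scalar(0);
    for (std::size_t i = 0; i < x.size(); ++i) {
      sum += x[i] * y[i];
    }
    return sum;
  }
};

// Explicit instantiation declarations (would live in the header):
// translation units that see these do not instantiate the
// double/float versions themselves and rely on the library instead.
extern template struct dot_struct<double>;
extern template struct dot_struct<float>;

// Explicit instantiation definitions (would live in the library's
// .cpp files): the code for these types is generated exactly once.
template struct dot_struct<double>;
template struct dot_struct<float>;

// Application code: reuses the pre-instantiated double kernel.
inline double dot_double_example() {
  return dot_struct<double>::apply({1.0, 2.0, 3.0}, {4.0, 5.0, 6.0});
}

// int is not in the pre-instantiated list, so the compiler
// instantiates dot_struct<int> implicitly at the point of use.
inline int dot_int_example() {
  return dot_struct<int>::apply({1, 2, 3}, {4, 5, 6});
}
```

Called from a main function, both helpers return 32. The int case shows why a type outside the pre-instantiated list can still work when the template definition is visible: it simply falls back to ordinary implicit instantiation in the calling translation unit.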
How to use it in CMake¶
Relevant flags:
KokkosKernels_ENABLE_EXPLICIT_INSTANTIATION=ON
KokkosKernels_INST_DOUBLE=ON, KokkosKernels_INST_FLOAT=OFF, etc.
KokkosKernels_INST_CUDA=ON, KokkosKernels_INST_OPENMP=OFF, etc.
These together tell the build system:
“Only explicitly instantiate KokkosKernels templates for combinations of (execution space × scalar type × layout) that I need.”
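To make the "×" in that statement concrete, the following sketch enumerates the cross product of the selected options. It assumes nothing about the real build system: eti_combinations is a hypothetical helper, and the strings passed to it are plain labels for illustration, not CMake option names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lists every (execution space x scalar type x layout) combination
// for the given selections. Hypothetical helper for illustration only.
std::vector<std::string> eti_combinations(
    const std::vector<std::string>& exec_spaces,
    const std::vector<std::string>& scalars,
    const std::vector<std::string>& layouts) {
  std::vector<std::string> combos;
  combos.reserve(exec_spaces.size() * scalars.size() * layouts.size());
  for (const auto& e : exec_spaces) {
    for (const auto& s : scalars) {
      for (const auto& l : layouts) {
        combos.push_back(e + " x " + s + " x " + l);
      }
    }
  }
  // The number of combinations is the product of the list sizes.
  assert(combos.size() ==
         exec_spaces.size() * scalars.size() * layouts.size());
  return combos;
}
```

For example, two execution spaces, two scalar types, and one layout give four combinations. Each additional enabled option multiplies the count, which is why restricting the selection to what an application needs keeps the amount of pre-instantiated code small.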
When to disable ETI?¶
Set KokkosKernels_ENABLE_EXPLICIT_INSTANTIATION=OFF if:
You’re experimenting with types or devices that aren’t in the prebuilt list
You want to keep everything header-only for simplicity or portability
You’re on an exotic architecture or compiler and want to test things locally
How ETI appears in the code¶
You’ll see files like:
src/impl/generated_specializations_cpp/spmv_double_int_int_execspace_MemSpace.cpp
Each file contains explicit template instantiations for the pre-instantiated types.