blaspp
BLAS++ provides a modern C++ interface to the Basic Linear Algebra Subroutines (BLAS), supporting both CPU and GPU execution.
Features
Type-generic templates: Single API for float, double, std::complex<float>, std::complex<double>
Multiple backends: Reference C++, vendor BLAS (MKL, OpenBLAS), GPU (cuBLAS, rocBLAS, SYCL)
Modern C++: C++11/14 features, strong typing, std::complex
Performance counters: Optional PAPI integration
Device support: Asynchronous GPU operations
Organization
BLAS++ operations are organized by level:
Level 1: Vector-vector operations (axpy, dot, nrm2, scal, etc.)
Level 2: Matrix-vector operations (gemv, ger, trmv, etc.)
Level 3: Matrix-matrix operations (gemm, trmm, herk, etc.)
Quick Reference
Level 1 BLAS (Vector-Vector)
| Function | Operation |
|---|---|
| asum | Sum of absolute values: \(\sum_i \lvert x_i \rvert\) |
| axpy | Vector plus scaled vector: \(y = \alpha x + y\) |
| copy | Copy vector: \(y = x\) |
| dot | Dot product: \(x^H y\) (conjugated for complex; equals \(x^T y\) for real) |
| dotu | Unconjugated dot product: \(x^T y\) |
| iamax | Index of max absolute value |
| nrm2 | Euclidean norm: \(\lVert x \rVert_2\) |
| scal | Scale vector: \(x = \alpha x\) |
| swap | Swap vectors: \(x \leftrightarrow y\) |
| rot | Apply plane rotation |
| rotg | Generate plane rotation |
| rotm | Apply modified plane rotation |
| rotmg | Generate modified plane rotation |
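Level 1 routines address their vectors through a stride (the incx and incy arguments that appear in the usage examples). As a rough illustration of what that means for axpy, here is a plain C++ loop. It is a sketch, not BLAS++ code: the name ref_axpy is hypothetical, and it assumes positive strides only.

```cpp
#include <cstdint>

// Hypothetical reference loop (not BLAS++ code): y = alpha*x + y,
// visiting every incx-th element of x and every incy-th element of y.
// This sketch assumes incx > 0 and incy > 0.
void ref_axpy(int64_t n, double alpha,
              const double* x, int64_t incx,
              double* y, int64_t incy)
{
    for (int64_t i = 0; i < n; ++i)
        y[i*incy] += alpha * x[i*incx];
}
```

With incx = 2, the loop reads x[0], x[2], x[4], ..., which is how a row of a column-major matrix or every other entry of a buffer can be treated as a vector without copying.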
Level 2 BLAS (Matrix-Vector)
| Function | Operation |
|---|---|
| gemv | General matrix-vector multiply: \(y = \alpha Ax + \beta y\) |
| ger | General rank-1 update: \(A = \alpha x y^H + A\) (conjugated for complex) |
| geru | General rank-1 update, unconjugated: \(A = \alpha x y^T + A\) |
| hemv | Hermitian matrix-vector multiply |
| her | Hermitian rank-1 update |
| her2 | Hermitian rank-2 update |
| symv | Symmetric matrix-vector multiply |
| syr | Symmetric rank-1 update |
| syr2 | Symmetric rank-2 update |
| trmv | Triangular matrix-vector multiply |
| trsv | Triangular solve: \(x = A^{-1}x\) |
Level 3 BLAS (Matrix-Matrix)
| Function | Operation |
|---|---|
| gemm | General matrix multiply: \(C = \alpha AB + \beta C\) |
| hemm | Hermitian matrix multiply |
| herk | Hermitian rank-k update: \(C = \alpha AA^H + \beta C\) |
| her2k | Hermitian rank-2k update |
| symm | Symmetric matrix multiply |
| syrk | Symmetric rank-k update |
| syr2k | Symmetric rank-2k update |
| trmm | Triangular matrix multiply |
| trsm | Triangular solve: \(X = \alpha A^{-1}B\) |
Basic Usage
CPU (Host) Operations
#include <blas.hh>
// Matrix-matrix multiply: C = alpha*A*B + beta*C
blas::gemm(
    blas::Layout::ColMajor,
    blas::Op::NoTrans, blas::Op::NoTrans,
    m, n, k,
    alpha, A, lda,
           B, ldb,
    beta,  C, ldc);
// Vector operations
blas::axpy(n, alpha, x, incx, y, incy); // y = alpha*x + y
double norm = blas::nrm2(n, x, incx); // ||x||_2
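To make the roles of m, n, k and the leading dimensions concrete, the following plain C++ loop computes the same result as the gemm call above for column-major, non-transposed operands. It is only a sketch: ref_gemm is a hypothetical name, not a BLAS++ function, and it skips the special cases an optimized library handles (for example, not reading C when beta is zero).

```cpp
#include <cstdint>

// Hypothetical reference loop (not part of BLAS++): C = alpha*A*B + beta*C
// for column-major storage with no transposes.
// A is m-by-k, B is k-by-n, C is m-by-n; element (i, j) of a column-major
// matrix with leading dimension ld lives at index i + j*ld.
void ref_gemm(int64_t m, int64_t n, int64_t k,
              double alpha, const double* A, int64_t lda,
                            const double* B, int64_t ldb,
              double beta,        double* C, int64_t ldc)
{
    for (int64_t j = 0; j < n; ++j) {
        for (int64_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int64_t l = 0; l < k; ++l)
                sum += A[i + l*lda] * B[l + j*ldb];
            C[i + j*ldc] = alpha*sum + beta*C[i + j*ldc];
        }
    }
}
```

The leading dimension is the distance in memory between consecutive columns, so lda >= m here; passing a larger value lets the same loop operate on a submatrix of a bigger allocation.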
GPU (Device) Operations
#include <blas.hh>
// Create device queue
blas::Queue queue(device_id);
// Device matrix multiply (d_A, d_B, d_C are device pointers)
blas::gemm(
    blas::Layout::ColMajor,
    blas::Op::NoTrans, blas::Op::NoTrans,
    m, n, k,
    alpha, d_A, lda,
           d_B, ldb,
    beta,  d_C, ldc,
    queue);
// Wait for completion
queue.sync();
Common Parameters
Layout
Specifies matrix storage order:
Layout::ColMajor - Column-major (Fortran-style)
Layout::RowMajor - Row-major (C-style)
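The two layouts differ only in how the pair (i, j) maps to a linear offset. A minimal sketch of that mapping, using hypothetical helper names that are not part of BLAS++:

```cpp
#include <cstdint>

// Hypothetical helpers (not BLAS++ functions): offset of element (i, j)
// in a matrix stored with leading dimension ld.
// Column-major: ld is the stride between columns (ld >= number of rows).
inline int64_t index_col_major(int64_t i, int64_t j, int64_t ld) { return i + j*ld; }
// Row-major: ld is the stride between rows (ld >= number of columns).
inline int64_t index_row_major(int64_t i, int64_t j, int64_t ld) { return i*ld + j; }
```

For the 2-by-3 matrix with rows (1, 2, 3) and (4, 5, 6), column-major storage with ld = 2 is 1, 4, 2, 5, 3, 6, while row-major storage with ld = 3 is 1, 2, 3, 4, 5, 6.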
Op
Specifies transpose operation:
Op::NoTrans - No transpose: \(A\)
Op::Trans - Transpose: \(A^T\)
Op::ConjTrans - Conjugate transpose: \(A^H\)
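One way to read these options is as a rule for fetching element (i, j) of the operand without forming the transpose in memory. The sketch below assumes column-major storage and uses a stand-in enum, OpSketch, so that it compiles without BLAS++; neither OpSketch nor op_elem is part of the library.

```cpp
#include <complex>
#include <cstdint>

// Stand-in enum for this sketch only; BLAS++ provides its own blas::Op.
enum class OpSketch { NoTrans, Trans, ConjTrans };

// Element (i, j) of op(A), where A is column-major with leading dimension lda.
std::complex<double> op_elem(OpSketch op, const std::complex<double>* A,
                             int64_t lda, int64_t i, int64_t j)
{
    switch (op) {
        case OpSketch::NoTrans:   return A[i + j*lda];             // A(i, j)
        case OpSketch::Trans:     return A[j + i*lda];             // A(j, i)
        case OpSketch::ConjTrans: return std::conj(A[j + i*lda]);  // conj(A(j, i))
    }
    return {};  // not reached for valid enum values
}
```

For real matrices Trans and ConjTrans give the same result, since conjugation is the identity on real numbers.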
Uplo
Specifies triangular/symmetric matrix part:
Uplo::Upper - Upper triangle
Uplo::Lower - Lower triangle
Diag
Specifies diagonal type for triangular matrices:
Diag::NonUnit - Diagonal elements are arbitrary
Diag::Unit - Diagonal elements are 1
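Uplo and Diag together determine which stored entries a triangular routine actually reads. The following sketch of an upper triangular solve shows the idea: entries below the diagonal are never touched, and with a unit diagonal the stored diagonal is ignored as well. The function ref_trsv_upper and its bool flag are hypothetical simplifications, not the BLAS++ interface.

```cpp
#include <cstdint>

// Hypothetical reference (not BLAS++ code): solve A*x = b in place for an
// upper triangular, column-major A. On entry x holds b; on exit, the solution.
// Only entries on or above the diagonal are read; with unit_diag == true the
// diagonal is not read either and is treated as all ones.
void ref_trsv_upper(bool unit_diag, int64_t n,
                    const double* A, int64_t lda, double* x)
{
    // Column-oriented back substitution, last unknown first.
    for (int64_t j = n - 1; j >= 0; --j) {
        if (!unit_diag)
            x[j] /= A[j + j*lda];
        for (int64_t i = 0; i < j; ++i)
            x[i] -= x[j] * A[i + j*lda];
    }
}
```

Because the unused triangle is never read, it may hold unrelated data, which is why factorizations can pack two triangular factors into one array.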
Side
Specifies matrix position in operation:
Side::Left - Matrix on left: \(AB\)
Side::Right - Matrix on right: \(BA\)
Data Types
BLAS++ functions are templated on scalar type:
float - Single precision real
double - Double precision real
std::complex<float> - Single precision complex
std::complex<double> - Double precision complex
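As an illustration of how one template can serve both real and complex types, here is a small sketch of a conjugated dot product. The names conj_sketch and ref_dot are hypothetical and the code is not taken from BLAS++; it only shows the overloading pattern, with unit strides assumed.

```cpp
#include <complex>
#include <cstdint>

// Hypothetical helpers for this sketch (not the BLAS++ API).
// conj_sketch is the identity for real types and std::conj for complex ones.
template <typename T>
T conj_sketch(const T& x) { return x; }

template <typename T>
std::complex<T> conj_sketch(const std::complex<T>& x) { return std::conj(x); }

// Conjugated dot product, sum of conj(x_i) * y_i, with unit strides.
template <typename T>
T ref_dot(int64_t n, const T* x, const T* y)
{
    T sum = T(0);
    for (int64_t i = 0; i < n; ++i)
        sum += conj_sketch(x[i]) * y[i];
    return sum;
}
```

The same ref_dot body then works for float, double, std::complex<float>, and std::complex<double>; the more specialized conj_sketch overload is selected automatically for complex arguments.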
Header Files
Main header (includes everything):
#include <blas.hh>
Individual operation headers:
#include <blas/gemm.hh>
#include <blas/axpy.hh>
// etc.
Utility headers:
#include <blas/util.hh> // Enumerations, error handling
#include <blas/device.hh> // Device queue, memory management
#include <blas/counter.hh> // Performance counters (PAPI)
#include <blas/flops.hh> // FLOP counting
Error Handling
BLAS++ uses exceptions for errors:
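// Requires <iostream> in addition to <blas.hh>.
try {
    blas::gemm(...);
}
catch (const blas::Error& e) {
    // Catch by const reference to avoid copying the exception object.
    std::cerr << "BLAS++ error: " << e.what() << std::endl;
}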