Performance Counting
PAPI performance counters and FLOP/bandwidth calculations.
PAPI Counter Integration
-
class counter
Performance counter integration for BLAS++.
This class provides integration with PAPI (Performance API) for counting BLAS operations and computing floating-point operation counts. It uses the Meyers singleton pattern (a function-local static) for thread-safe initialization.
The counter system tracks:
Number of calls to each BLAS routine
Dimensions and parameters for each call
Total floating-point operations performed
Usage (when PAPI is available):
// Insert operation into counting set
counter::gemm_type op = {transA, transB, m, n, k};
counter::insert(op, counter::Id::gemm);
// Get total flop count
long long flops = counter::get_flop_count(&atomic_var);
Note
This is essentially a namespace: all public functions are static.
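A minimal sketch of the singleton-plus-counting-set idea described above, without PAPI. `CallRegistry`, `GemmDims`, and the 2*m*n*k flop formula are hypothetical stand-ins chosen for illustration; they are not BLAS++'s `counter` internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for counter::gemm_type (only the sizes are kept).
struct GemmDims {
    std::int64_t m, n, k;
};

// Illustrative Meyers singleton; not BLAS++'s implementation.
class CallRegistry {
public:
    // The function-local static is constructed once, on first call;
    // since C++11 that initialization is thread-safe.
    static CallRegistry& instance() {
        static CallRegistry registry;
        return registry;
    }

    // Record one gemm call in the counting set.
    void insert_gemm(const GemmDims& dims) {
        std::lock_guard<std::mutex> lock(mutex_);
        gemm_calls_.push_back(dims);
    }

    // Number of recorded gemm calls.
    std::size_t gemm_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return gemm_calls_.size();
    }

    // Total real flops over recorded calls, assuming 2*m*n*k per gemm.
    double gemm_flops() {
        std::lock_guard<std::mutex> lock(mutex_);
        double total = 0;
        for (const GemmDims& d : gemm_calls_)
            total += 2.0 * d.m * d.n * d.k;
        return total;
    }

private:
    CallRegistry() = default;
    std::mutex mutex_;
    std::vector<GemmDims> gemm_calls_;
};
```

The mutex protects the recorded list after construction; the singleton guarantee alone only covers the one-time initialization.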
-
struct axpy_type
Parameters for Level 1 BLAS operations (vector length only).
Used by: axpy, scal, copy, swap, dot, dotu, nrm2, asum, iamax, rot, rotm.
Public Members
-
int64_t n
Vector length.
-
struct gemv_type
Parameters for gemv (general matrix-vector multiply).
-
struct hemv_type
Parameters for Hermitian/symmetric matrix-vector operations.
Used by: hemv, symv, her, her2, syr, syr2.
-
struct trmv_type
Parameters for triangular matrix-vector operations.
Used by: trmv, trsv.
-
struct ger_type
Parameters for rank-1 update operations.
Used by: ger, geru, gerc.
-
struct gemm_type
Parameters for gemm (general matrix-matrix multiply).
-
struct hemm_type
Parameters for Hermitian/symmetric matrix-matrix multiply.
Used by: hemm, symm.
-
struct herk_type
Parameters for Hermitian/symmetric rank-k and rank-2k updates.
Used by: herk, syrk, syr2k, her2k.
-
struct trmm_type
Parameters for triangular matrix-matrix operations.
Used by: trmm, trsm.
-
struct dev_batch_gemm_type
Parameters for batch gemm on device.
FLOP Calculations
-
template<typename T>
class Gflop
Floating-point operation counting in gigaflops.
Template class for computing FLOPs (floating-point operations) for BLAS routines. Accounts for both multiplies and adds, properly handling complex arithmetic via FlopTraits.
Example usage:
// For single-precision real gemm
double gflops = Gflop<float>::gemm(m, n, k);
// For single-precision complex gemm
double gflops_complex = Gflop<std::complex<float>>::gemm(m, n, k);
- Template Parameters:
T – Scalar type (float, double, std::complex<float>, std::complex<double>)
Subclassed by lapack::Gflop< T >
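As a sketch of how a per-routine Gflop count can be assembled, assume each routine has a multiply count and an add count, each weighted by the real-op cost that FlopTraits supplies (`mul_ops`, `add_ops`). `OpTraits`, `gemm_flops`, and `gemm_gflop` are hypothetical names, and the m*n*k multiplies plus m*n*k adds for gemm is the conventional count, assumed here rather than copied from BLAS++.

```cpp
#include <complex>

// Simplified stand-in for FlopTraits: real ops per multiply and per add.
template <typename T>
struct OpTraits {
    static constexpr double mul_ops = 1;
    static constexpr double add_ops = 1;
};

// One complex multiply = 6 real ops; one complex add = 2 real ops.
template <typename T>
struct OpTraits<std::complex<T>> {
    static constexpr double mul_ops = 6;
    static constexpr double add_ops = 2;
};

// Assumed gemm counts: m*n*k multiplies and m*n*k adds.
constexpr double gemm_muls(double m, double n, double k) { return m * n * k; }
constexpr double gemm_adds(double m, double n, double k) { return m * n * k; }

// Real flop count for gemm on scalar type T.
template <typename T>
constexpr double gemm_flops(double m, double n, double k) {
    return gemm_muls(m, n, k) * OpTraits<T>::mul_ops
         + gemm_adds(m, n, k) * OpTraits<T>::add_ops;
}

// Same count scaled to giga-flops, analogous to Gflop<T>::gemm.
template <typename T>
constexpr double gemm_gflop(double m, double n, double k) {
    return 1e-9 * gemm_flops<T>(m, n, k);
}
```

Under these assumptions, complex gemm costs four times as many real flops as real gemm of the same shape.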
Public Static Functions
-
static inline double asum(double n)
Giga-FLOPs for asum (sum of absolute values).
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double axpy(double n)
Giga-FLOPs for axpy (y = alpha*x + y).
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double copy(double n)
Giga-FLOPs for copy (no arithmetic operations).
- Parameters:
n – [in] Vector length
- Returns:
0 (copy has no FLOPs)
-
static inline double iamax(double n)
Giga-FLOPs for iamax (index of maximum absolute value).
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double nrm2(double n)
Giga-FLOPs for nrm2 (Euclidean norm).
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double dot(double n)
Giga-FLOPs for dot product.
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double scal(double n)
Giga-FLOPs for scal (vector scaling).
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double swap(double n)
Giga-FLOPs for swap (no arithmetic operations).
- Parameters:
n – [in] Vector length
- Returns:
0 (swap has no FLOPs)
-
static inline double rot(double n)
Giga-FLOPs for Givens rotation.
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
-
static inline double rotm(double n)
Giga-FLOPs for modified Givens rotation.
- Parameters:
n – [in] Vector length
- Returns:
Gigaflops
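The Level 1 counts can be sketched as (multiplies, adds) pairs. The values below follow a common convention and are assumptions, not BLAS++'s source (for example, whether nrm2 counts its square root, or how complex asum is costed, may differ); `OpCount` and the `*_ops` functions are hypothetical.

```cpp
// Hypothetical (multiplies, adds) counts for Level 1 BLAS routines.
struct OpCount {
    double muls;
    double adds;
    // Total for real types, where each multiply and each add is one flop.
    constexpr double real_flops() const { return muls + adds; }
};

constexpr OpCount axpy_ops(double n) { return {n, n}; }      // y[i] += alpha*x[i]
constexpr OpCount dot_ops(double n)  { return {n, n - 1}; }  // sum of n products
constexpr OpCount scal_ops(double n) { return {n, 0}; }      // x[i] *= alpha
constexpr OpCount nrm2_ops(double n) { return {n, n - 1}; }  // sum of squares; sqrt not counted
constexpr OpCount asum_ops(double n) { return {0, n - 1}; }  // sum of |x[i]|
constexpr OpCount copy_ops(double)   { return {0, 0}; }      // data movement only
```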
-
static inline double gemv(double m, double n)
Giga-FLOPs for gemv (general matrix-vector multiply).
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
- Returns:
Gigaflops
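A sketch of the gemv count, assuming one multiply and one add per entry of A and ignoring the alpha/beta scaling. `gemv_flops` is a hypothetical helper that takes the per-op weights explicitly (1 and 1 for real types, 6 and 2 for complex types, per FlopTraits).

```cpp
// Assumed counts for y = alpha*A*x + beta*y with A of size m-by-n:
// each entry of A contributes one multiply and one add.
constexpr double gemv_muls(double m, double n) { return m * n; }
constexpr double gemv_adds(double m, double n) { return m * n; }

// Real flops given the real-op cost of one multiply and one add.
constexpr double gemv_flops(double m, double n, double mul_ops, double add_ops) {
    return gemv_muls(m, n) * mul_ops + gemv_adds(m, n) * add_ops;
}
```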
-
static inline double symv(double n)
Giga-FLOPs for symv (symmetric matrix-vector multiply).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops
-
static inline double hemv(double n)
Giga-FLOPs for hemv (Hermitian matrix-vector multiply).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops
-
static inline double trmv(double n)
Giga-FLOPs for trmv (triangular matrix-vector multiply).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops
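For trmv, a plausible count follows from the row lengths of the triangle. The sketch assumes a non-unit triangle and the usual convention; the helper names are hypothetical.

```cpp
// Assumed counts for x = A*x with A an n-by-n non-unit triangular matrix.
// Row i (0-based) of the triangle holds i+1 entries: i+1 multiplies, i adds.
constexpr double trmv_muls(double n) { return 0.5 * n * (n + 1); }
constexpr double trmv_adds(double n) { return 0.5 * n * (n - 1); }

// Real-type total: n*(n+1)/2 + n*(n-1)/2 = n^2, half of a square gemv (2n^2).
constexpr double trmv_real_flops(double n) { return trmv_muls(n) + trmv_adds(n); }
```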
-
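static inline double trsv(double n)
Giga-FLOPs for trsv (triangular solve, same as trmv).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops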
-
static inline double her(double n)
Giga-FLOPs for her (Hermitian rank-1 update).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops
-
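static inline double syr(double n)
Giga-FLOPs for syr (symmetric rank-1 update, same as her).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops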
-
static inline double ger(double m, double n)
Giga-FLOPs for ger (general rank-1 update).
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
- Returns:
Gigaflops
-
static inline double her2(double n)
Giga-FLOPs for her2 (Hermitian rank-2 update).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops
-
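static inline double syr2(double n)
Giga-FLOPs for syr2 (symmetric rank-2 update, same as her2).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigaflops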
-
static inline double gemm(double m, double n, double k)
Giga-FLOPs for gemm (C = alpha*op(A)*op(B) + beta*C).
- Parameters:
m – [in] Number of rows of C
n – [in] Number of columns of C
k – [in] Inner dimension
- Returns:
Gigaflops
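Gflop<T>::gemm is documented as a count, not a rate; turning it into Gflop/s means dividing by measured time. The sketch below assumes the conventional 2*m*n*k real flops and uses `std::chrono`; `gemm_gflop_real` and `gflop_rate` are hypothetical helpers, and the timed gemm call is left elided.

```cpp
#include <chrono>

// Real-type gemm giga-flops, assuming m*n*k multiplies plus m*n*k adds.
double gemm_gflop_real(double m, double n, double k) {
    return 1e-9 * 2.0 * m * n * k;
}

// Achieved rate in Gflop/s for a measured interval.
double gflop_rate(double gflop, std::chrono::duration<double> elapsed) {
    return gflop / elapsed.count();
}

// Typical use (the gemm call itself is elided):
//   auto t0 = std::chrono::steady_clock::now();
//   // ... call gemm(m, n, k) here ...
//   auto t1 = std::chrono::steady_clock::now();
//   double rate = gflop_rate(gemm_gflop_real(m, n, k), t1 - t0);
```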
-
static inline double gbmm(double m, double n, double k, double kl, double ku)
Giga-FLOPs for gbmm (banded matrix-matrix multiply).
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
k – [in] Inner dimension
kl – [in] Lower bandwidth
ku – [in] Upper bandwidth
- Returns:
Gigaflops
-
static inline double hemm(blas::Side side, double m, double n)
Giga-FLOPs for hemm (Hermitian matrix-matrix multiply).
- Parameters:
side – [in] Side where Hermitian matrix appears
m – [in] Number of rows of C
n – [in] Number of columns of C
- Returns:
Gigaflops
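A sketch of why hemm takes `side`: assuming the conventional count, the Hermitian matrix is m-by-m on the left and n-by-n on the right, so hemm costs the same as a gemm with k = m or k = n. `Side` here is a local stand-in for `blas::Side`, and `hemm_real_flops` is hypothetical; symm follows the same pattern.

```cpp
enum class Side { Left, Right };  // local stand-in for blas::Side

// Assumed real-type flops for C = alpha*A*B + beta*C (Left)
// or C = alpha*B*A + beta*C (Right), A Hermitian.
constexpr double hemm_real_flops(Side side, double m, double n) {
    return side == Side::Left ? 2.0 * m * m * n   // like gemm with k = m
                              : 2.0 * m * n * n;  // like gemm with k = n
}
```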
-
static inline double hbmm(double m, double n, double kd)
Giga-FLOPs for hbmm (Hermitian banded matrix-matrix multiply).
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
kd – [in] Bandwidth
- Returns:
Gigaflops
-
static inline double symm(blas::Side side, double m, double n)
Giga-FLOPs for symm (symmetric matrix-matrix multiply).
- Parameters:
side – [in] Side where symmetric matrix appears
m – [in] Number of rows of C
n – [in] Number of columns of C
- Returns:
Gigaflops
-
static inline double herk(double n, double k)
Giga-FLOPs for herk (Hermitian rank-k update).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigaflops
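Because herk computes only one triangle of C, its cost is roughly half that of gemm(n, n, k). A sketch under the conventional count (assumed; helper names hypothetical):

```cpp
// Assumed counts for C = alpha*A*A^H + beta*C, C n-by-n, A n-by-k:
// only one triangle of C (n*(n+1)/2 entries) is computed, and each
// entry is counted as k multiplies and k adds.
constexpr double herk_muls(double n, double k) { return 0.5 * k * n * (n + 1); }
constexpr double herk_adds(double n, double k) { return 0.5 * k * n * (n + 1); }

// Real-type total, close to half of gemm's 2*n*n*k for large n.
constexpr double herk_real_flops(double n, double k) {
    return herk_muls(n, k) + herk_adds(n, k);
}
```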
-
static inline double syrk(double n, double k)
Giga-FLOPs for syrk (symmetric rank-k update, same as herk).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigaflops
-
static inline double her2k(double n, double k)
Giga-FLOPs for her2k (Hermitian rank-2k update).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigaflops
-
static inline double syr2k(double n, double k)
Giga-FLOPs for syr2k (symmetric rank-2k update, same as her2k).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigaflops
-
static inline double trmm(blas::Side side, double m, double n)
Giga-FLOPs for trmm (triangular matrix-matrix multiply).
- Parameters:
side – [in] Side where triangular matrix appears
m – [in] Number of rows of B
n – [in] Number of columns of B
- Returns:
Gigaflops
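trmm can be viewed as trmv applied to each column (Left) or row (Right) of B, which gives the following sketch. The counts are assumed rather than taken from BLAS++, and `Side` is a local stand-in for `blas::Side`.

```cpp
enum class Side { Left, Right };  // local stand-in for blas::Side

// Assumed counts for B = alpha*op(A)*B (Left) or B = alpha*B*op(A) (Right),
// B m-by-n, A non-unit triangular of order m (Left) or n (Right).
constexpr double trmm_muls(Side side, double m, double n) {
    return side == Side::Left ? 0.5 * n * m * (m + 1) : 0.5 * m * n * (n + 1);
}
constexpr double trmm_adds(Side side, double m, double n) {
    return side == Side::Left ? 0.5 * n * m * (m - 1) : 0.5 * m * n * (n - 1);
}

// Real-type total: n*m^2 (Left) or m*n^2 (Right).
constexpr double trmm_real_flops(Side side, double m, double n) {
    return trmm_muls(side, m, n) + trmm_adds(side, m, n);
}
```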
-
static inline double trsm(blas::Side side, double m, double n)
Giga-FLOPs for trsm (triangular solve, same as trmm).
- Parameters:
side – [in] Side where triangular matrix appears
m – [in] Number of rows of B
n – [in] Number of columns of B
- Returns:
Gigaflops
Public Static Attributes
-
static double mul_ops = FlopTraits<T>::mul_ops
Number of real ops per multiply for type T.
-
static double add_ops = FlopTraits<T>::add_ops
Number of real ops per add for type T.
-
template<typename T>
class FlopTraits
Traits for counting operations per multiply and add.
For real types, one multiply = 1 op, one add = 1 op. For complex types, one complex multiply = 6 real ops (4 muls + 2 adds), one complex add = 2 real ops.
- Template Parameters:
T – Scalar type
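The complex costs follow from writing out the product and sum of two complex numbers. Applying them to gemm, under the conventional assumption of m n k multiplies and m n k adds, gives a factor of four over real gemm:

```latex
\begin{aligned}
(a + bi)(c + di) &= (ac - bd) + (ad + bc)\,i
  && \text{4 real multiplies} + \text{2 real adds} = 6 \\
(a + bi) + (c + di) &= (a + c) + (b + d)\,i
  && \text{2 real adds} = 2 \\
\text{complex gemm} &= 6\,mnk + 2\,mnk = 8\,mnk = 4 \times (2\,mnk)
\end{aligned}
```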
Bandwidth Calculations
-
template<typename T>
class Gbyte
Data transfer counting in gigabytes.
Template class for computing data transfer (in gigabytes) for BLAS operations. Accounts for reading and writing matrices/vectors based on operation semantics.
Example usage:
double gb = Gbyte<float>::gemm(m, n, k);
double gb_complex = Gbyte<std::complex<float>>::gemm(m, n, k);
- Template Parameters:
T – Scalar type (e.g., float, double, std::complex<float>)
Subclassed by lapack::Gbyte< T >
Public Static Functions
-
static inline double asum(double n)
Data transfer for asum (sum of absolute values).
Reads vector x.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double axpy(double n)
Data transfer for axpy (y = alpha*x + y).
Reads x and y, writes y.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double copy(double n)
Data transfer for copy (y = x).
Reads x, writes y.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double iamax(double n)
Data transfer for iamax (index of max absolute value).
Reads vector x.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double nrm2(double n)
Data transfer for nrm2 (Euclidean norm).
Reads vector x.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double dot(double n)
Data transfer for dot product.
Reads vectors x and y.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double scal(double n)
Data transfer for scal (x = alpha*x).
Reads and writes vector x.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double swap(double n)
Data transfer for swap (exchange x and y).
Reads and writes vectors x and y.
- Parameters:
n – [in] Vector length
- Returns:
Gigabytes transferred
-
static inline double gemv(double m, double n)
Data transfer for gemv (y = alpha*A*x + beta*y).
Reads matrix A, vectors x and y, writes y.
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
- Returns:
Gigabytes transferred
-
static inline double hemv(double n)
Data transfer for hemv (Hermitian matrix-vector multiply).
Reads Hermitian matrix A (triangle), vector x, writes y.
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double symv(double n)
Data transfer for symv (symmetric matrix-vector multiply, same as hemv).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double trmv(double n)
Data transfer for trmv/trsv (triangular matrix-vector ops).
Reads triangular matrix A, vector x, writes x.
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double trsv(double n)
Data transfer for trsv (same as trmv).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double ger(double m, double n)
Data transfer for ger (rank-1 update A = A + alpha*x*y^T).
Reads A, x, y, writes A.
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
- Returns:
Gigabytes transferred
-
static inline double her(double n)
Data transfer for her/syr (Hermitian/symmetric rank-1 update).
Reads triangular A, vector x, writes triangular A.
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double syr(double n)
Data transfer for syr (symmetric rank-1 update, same as her).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double her2(double n)
Data transfer for her2/syr2 (Hermitian/symmetric rank-2 update).
Reads triangular A, vectors x and y, writes triangular A.
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double syr2(double n)
Data transfer for syr2 (symmetric rank-2 update, same as her2).
- Parameters:
n – [in] Matrix dimension
- Returns:
Gigabytes transferred
-
static inline double copy_2d(double m, double n)
Data transfer for 2D matrix copy.
Reads matrix A, writes matrix B.
- Parameters:
m – [in] Number of rows
n – [in] Number of columns
- Returns:
Gigabytes transferred
-
static inline double gemm(double m, double n, double k)
Data transfer for gemm (C = alpha*A*B + beta*C).
Reads A, B, C, writes C.
- Parameters:
m – [in] Number of rows of C
n – [in] Number of columns of C
k – [in] Inner dimension
- Returns:
Gigabytes transferred
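Combining this traffic with the 2*m*n*k flop count gives gemm's arithmetic intensity, which suggests why large gemm tends to be compute-bound while Level 1 routines are bandwidth-bound. A sketch assuming one pass over each operand (real caches and blocking change actual traffic); the names are hypothetical:

```cpp
// Assumed element traffic for C = alpha*A*B + beta*C:
// read A (m*k), B (k*n), C (m*n); write C (m*n).
constexpr double gemm_elems(double m, double n, double k) {
    return m * k + k * n + 2 * m * n;
}

// Flops per byte for real gemm: 2*m*n*k flops over the bytes moved.
template <typename T>
constexpr double gemm_intensity(double m, double n, double k) {
    return (2 * m * n * k) / (gemm_elems(m, n, k) * sizeof(T));
}
```

For square N-by-N double-precision matrices this works out to N/16 flops per byte, e.g. 62.5 for N = 1000.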
-
static inline double hemm(blas::Side side, double m, double n)
Data transfer for hemm (Hermitian matrix-matrix multiply).
Reads Hermitian A, matrices B and C, writes C.
- Parameters:
side – [in] Side where Hermitian matrix appears
m – [in] Number of rows of C
n – [in] Number of columns of C
- Returns:
Gigabytes transferred
-
static inline double symm(blas::Side side, double m, double n)
Data transfer for symm (symmetric matrix-matrix multiply, same as hemm).
- Parameters:
side – [in] Side where symmetric matrix appears
m – [in] Number of rows of C
n – [in] Number of columns of C
- Returns:
Gigabytes transferred
-
static inline double herk(double n, double k)
Data transfer for herk (Hermitian rank-k update).
Reads matrix A, Hermitian C, writes C.
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigabytes transferred
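A sketch of the herk traffic described above, counting one triangle of C for both the read and the write (an assumption; helper names hypothetical):

```cpp
// Assumed element traffic for herk/syrk: read A (n*k), then read and
// write one triangle of C (n*(n+1)/2 entries each way).
constexpr double herk_elems(double n, double k) {
    return n * k + n * (n + 1);
}

// Gigabytes for scalar type T.
template <typename T>
constexpr double herk_gbyte(double n, double k) {
    return 1e-9 * herk_elems(n, k) * sizeof(T);
}
```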
-
static inline double syrk(double n, double k)
Data transfer for syrk (symmetric rank-k update, same as herk).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigabytes transferred
-
static inline double her2k(double n, double k)
Data transfer for her2k (Hermitian rank-2k update).
Reads matrices A and B, Hermitian C, writes C.
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigabytes transferred
-
static inline double syr2k(double n, double k)
Data transfer for syr2k (symmetric rank-2k update, same as her2k).
- Parameters:
n – [in] Dimension of C
k – [in] Inner dimension
- Returns:
Gigabytes transferred
-
static inline double trmm(blas::Side side, double m, double n)
Data transfer for trmm/trsm (triangular matrix-matrix ops).
Reads triangular A, matrix B, writes B.
- Parameters:
side – [in] Side where triangular matrix appears
m – [in] Number of rows of B
n – [in] Number of columns of B
- Returns:
Gigabytes transferred
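A sketch of the trmm/trsm traffic, where `side` selects the order of the triangle. `Side` is a local stand-in for `blas::Side`, and the counts follow the description above rather than BLAS++'s source:

```cpp
enum class Side { Left, Right };  // local stand-in for blas::Side

// Assumed element traffic: read one triangle of A (order m when Left,
// n when Right), read B (m*n), write B (m*n).
constexpr double trmm_elems(Side side, double m, double n) {
    return (side == Side::Left ? 0.5 * m * (m + 1) : 0.5 * n * (n + 1))
         + 2 * m * n;
}
```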
-
static inline double trsm(blas::Side side, double m, double n)
Data transfer for trsm (same as trmm).
- Parameters:
side – [in] Side where triangular matrix appears
m – [in] Number of rows of B
n – [in] Number of columns of B
- Returns:
Gigabytes transferred