Performance Counters

FLOP (floating-point operation) and bandwidth counters for LAPACK routines.

Overview

LAPACK++ provides theoretical operation counts based on LAWN 41 formulas to:

  • Measure computational complexity

  • Estimate performance

  • Calculate theoretical peak FLOP rates

  • Analyze memory bandwidth requirements

All counts are templated on scalar type (float, double, complex<float>, complex<double>) to account for real vs. complex arithmetic overhead.

FLOP Counting

template<typename T>
class Gflop : public blas::Gflop<T>

Public Static Functions

static inline double gesv(double n, double nrhs)
static inline double getrf(double m, double n)
static inline double getri(double n)
static inline double getrs(double n, double nrhs)
static inline double posv(double n, double nrhs)
static inline double potrf(double n)
static inline double potri(double n)
static inline double potrs(double n, double nrhs)
static inline double pbsv(double n, double nrhs, double k)
static inline double pbtrf(double n, double k)
static inline double pbtrs(double n, double nrhs, double k)
static inline double sysv(double n, double nrhs)
static inline double sytrf(double n)
static inline double sytri(double n)
static inline double sytrs(double n, double nrhs)
static inline double hesv(double n, double nrhs)
static inline double hetrf(double n)
static inline double hetri(double n)
static inline double hetrs(double n, double nrhs)
static inline double geqrf(double m, double n)
static inline double geqrt(double m, double n)
static inline double geqlf(double m, double n)
static inline double gerqf(double m, double n)
static inline double gelqf(double m, double n)
static inline double ungqr(double m, double n, double k)
static inline double orgqr(double m, double n, double k)
static inline double ungql(double m, double n, double k)
static inline double orgql(double m, double n, double k)
static inline double ungrq(double m, double n, double k)
static inline double orgrq(double m, double n, double k)
static inline double unglq(double m, double n, double k)
static inline double orglq(double m, double n, double k)
static inline double unmqr(lapack::Side side, double m, double n, double k)
static inline double ormqr(lapack::Side side, double m, double n, double k)
static inline double unmql(lapack::Side side, double m, double n, double k)
static inline double ormql(lapack::Side side, double m, double n, double k)
static inline double unmrq(lapack::Side side, double m, double n, double k)
static inline double ormrq(lapack::Side side, double m, double n, double k)
static inline double unmlq(lapack::Side side, double m, double n, double k)
static inline double ormlq(lapack::Side side, double m, double n, double k)
static inline double gels(double m, double n, double nrhs)
static inline double trtri(double n)
static inline double gehrd(double n)
static inline double hetrd(double n)
static inline double sytrd(double n)
static inline double gebrd(double m, double n)
static inline double larfg(double n)
static inline double geadd(double m, double n)
static inline double lauum(double n)
static inline double lange(lapack::Norm norm, double m, double n)
static inline double lanhe(lapack::Norm norm, double n)
static inline double lansy(lapack::Norm norm, double n)

Bandwidth Analysis

template<typename T>
class Gbyte : public blas::Gbyte<T>

Usage Example

// Count FLOPs for double-precision LU factorization
double m = 1000, n = 1000;
double flops = lapack::Gflop<double>::getrf(m, n);
std::cout << "LU factorization: " << flops << " GFLOPs" << std::endl;

// Count FLOPs for complex QR factorization
double gflops_complex = lapack::Gflop<std::complex<double>>::geqrf(m, n);

// Data transfer for Cholesky
double bytes = lapack::Gbyte<double>::potrf(n);
std::cout << "Cholesky bandwidth: " << bytes << " GB" << std::endl;

Supported Operations

Linear Systems: - LU: gesv, getrf, getrs, getri - Cholesky: posv, potrf, potrs, potri - Band Cholesky: pbsv, pbtrf, pbtrs - Symmetric: sysv, sytrf, sytrs, sytri, hesv, hetrf, hetrs, hetri

QR Factorizations: - geqrf, geqlf, gerqf, gelqf, geqrt - ungqr/orgqr, ungql/orgql, ungrq/orgrq, unglq/orglq - unmqr/ormqr, unmql/ormql, unmrq/ormrq, unmlq/ormlq - gels (least squares)

Reductions: - gehrd (Hessenberg) - hetrd/sytrd (tridiagonal) - gebrd (bidiagonal)

Utilities: - trtri (triangular inverse) - lauum (U^H*U or L*L^T) - larfg (Householder reflector) - geadd (matrix addition)

Norms: - lange (general matrix) - lanhe/lansy (Hermitian/symmetric)

Notes

  • Formulas assume standard LAPACK calling conventions

  • Some counts may be inaccurate for edge cases (m, n, or k = 0)

  • Complex arithmetic counted as: 1 complex multiply = 6 real ops, 1 complex add = 2 real ops

  • Based on LAWN 41: “Operation Count for the QR and Cholesky Factorizations” by Demmel and Hida