Performance Counters
FLOP (floating-point operation) and bandwidth counters for LAPACK routines.
Overview
LAPACK++ provides theoretical operation counts based on LAWN 41 formulas to:
Measure computational complexity
Estimate performance
Calculate theoretical peak FLOP rates
Analyze memory bandwidth requirements
All counts are templated on scalar type (float, double, complex<float>, complex<double>) to account for real vs. complex arithmetic overhead.
FLOP Counting
-
template<typename T>
class Gflop : public blas::Gflop<T> Public Static Functions
-
static inline double gesv(double n, double nrhs)
-
static inline double getrf(double m, double n)
-
static inline double getri(double n)
-
static inline double getrs(double n, double nrhs)
-
static inline double posv(double n, double nrhs)
-
static inline double potrf(double n)
-
static inline double potri(double n)
-
static inline double potrs(double n, double nrhs)
-
static inline double pbsv(double n, double nrhs, double k)
-
static inline double pbtrf(double n, double k)
-
static inline double pbtrs(double n, double nrhs, double k)
-
static inline double sysv(double n, double nrhs)
-
static inline double sytrf(double n)
-
static inline double sytri(double n)
-
static inline double sytrs(double n, double nrhs)
-
static inline double hesv(double n, double nrhs)
-
static inline double hetrf(double n)
-
static inline double hetri(double n)
-
static inline double hetrs(double n, double nrhs)
-
static inline double geqrf(double m, double n)
-
static inline double geqrt(double m, double n)
-
static inline double geqlf(double m, double n)
-
static inline double gerqf(double m, double n)
-
static inline double gelqf(double m, double n)
-
static inline double ungqr(double m, double n, double k)
-
static inline double orgqr(double m, double n, double k)
-
static inline double ungql(double m, double n, double k)
-
static inline double orgql(double m, double n, double k)
-
static inline double ungrq(double m, double n, double k)
-
static inline double orgrq(double m, double n, double k)
-
static inline double unglq(double m, double n, double k)
-
static inline double orglq(double m, double n, double k)
-
static inline double unmqr(lapack::Side side, double m, double n, double k)
-
static inline double ormqr(lapack::Side side, double m, double n, double k)
-
static inline double unmql(lapack::Side side, double m, double n, double k)
-
static inline double ormql(lapack::Side side, double m, double n, double k)
-
static inline double unmrq(lapack::Side side, double m, double n, double k)
-
static inline double ormrq(lapack::Side side, double m, double n, double k)
-
static inline double unmlq(lapack::Side side, double m, double n, double k)
-
static inline double ormlq(lapack::Side side, double m, double n, double k)
-
static inline double gels(double m, double n, double nrhs)
-
static inline double trtri(double n)
-
static inline double gehrd(double n)
-
static inline double hetrd(double n)
-
static inline double sytrd(double n)
-
static inline double gebrd(double m, double n)
-
static inline double larfg(double n)
-
static inline double geadd(double m, double n)
-
static inline double lauum(double n)
-
static inline double lange(lapack::Norm norm, double m, double n)
-
static inline double lanhe(lapack::Norm norm, double n)
-
static inline double lansy(lapack::Norm norm, double n)
-
static inline double gesv(double n, double nrhs)
Bandwidth Analysis
Usage Example
// Count FLOPs for double-precision LU factorization
double m = 1000, n = 1000;
double flops = lapack::Gflop<double>::getrf(m, n);
std::cout << "LU factorization: " << flops << " GFLOPs" << std::endl;
// Count FLOPs for complex QR factorization
double gflops_complex = lapack::Gflop<std::complex<double>>::geqrf(m, n);
// Data transfer for Cholesky
double bytes = lapack::Gbyte<double>::potrf(n);
std::cout << "Cholesky bandwidth: " << bytes << " GB" << std::endl;
Supported Operations
Linear Systems: - LU: gesv, getrf, getrs, getri - Cholesky: posv, potrf, potrs, potri - Band Cholesky: pbsv, pbtrf, pbtrs - Symmetric: sysv, sytrf, sytrs, sytri, hesv, hetrf, hetrs, hetri
QR Factorizations: - geqrf, geqlf, gerqf, gelqf, geqrt - ungqr/orgqr, ungql/orgql, ungrq/orgrq, unglq/orglq - unmqr/ormqr, unmql/ormql, unmrq/ormrq, unmlq/ormlq - gels (least squares)
Reductions: - gehrd (Hessenberg) - hetrd/sytrd (tridiagonal) - gebrd (bidiagonal)
Utilities: - trtri (triangular inverse) - lauum (U^H*U or L*L^T) - larfg (Householder reflector) - geadd (matrix addition)
Norms: - lange (general matrix) - lanhe/lansy (Hermitian/symmetric)
Notes
Formulas assume standard LAPACK calling conventions
Some counts may be inaccurate for edge cases (m, n, or k = 0)
Complex arithmetic counted as: 1 complex multiply = 6 real ops, 1 complex add = 2 real ops
Based on LAWN 41: “Operation Count for the QR and Cholesky Factorizations” by Demmel and Hida