ScaLAPACK Compatibility
SLATE provides compatibility APIs to ease the transition of existing LAPACK and ScaLAPACK applications. This chapter covers both APIs and migration strategies.
Overview
SLATE provides two compatibility layers:
LAPACK Compatibility API: For single-node LAPACK applications
ScaLAPACK Compatibility API: For distributed ScaLAPACK applications
Both APIs allow existing code to use SLATE with minimal changes.
LAPACK Compatibility API
The LAPACK compatibility API provides routines with a slate_ prefix that match standard
LAPACK interfaces.
Using the API
C Example:
// Compile with:
// mpicc -o example example.c -lslate_lapack_api
// Original LAPACK call
dgetrf_(&m, &n, A, &lda, ipiv, &info);
// SLATE equivalent
slate_dgetrf_(&m, &n, A, &lda, ipiv, &info);
Fortran Example:
!! Compile with:
!! mpif90 -o example example.f90 -lslate_lapack_api
!! Original LAPACK call
call dgetrf(m, n, A, lda, ipiv, info)
!! SLATE equivalent
call slate_dgetrf(m, n, A, lda, ipiv, info)
How It Works
The compatibility library:
1. Creates a SLATE matrix from the LAPACK data using fromLAPACK
2. Sets the execution target from environment variables
3. Calls the corresponding SLATE routine
4. Returns the results in the original LAPACK arrays
Configuration
Set execution target via environment variables:
# CPU execution (default)
export SLATE_LAPACK_TARGET=HostTask
# GPU execution
export SLATE_LAPACK_TARGET=Devices
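As a minimal sketch, an environment-driven target choice like this can be resolved with a single lookup. The variable name and the HostTask/Devices values come from this chapter; the lookup logic below is an assumption about how such a layer might behave, not SLATE's actual implementation.

```cpp
#include <cstdlib>
#include <cstring>

// Illustrative sketch only: resolve the execution target from the
// SLATE_LAPACK_TARGET environment variable, defaulting to CPU execution.
const char* resolve_target()
{
    const char* t = std::getenv("SLATE_LAPACK_TARGET");
    if (t == nullptr || *t == '\0')
        return "HostTask";   // CPU execution is the default
    return t;                // e.g. "Devices" for GPU execution
}
```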
Tile Size
SLATE divides LAPACK matrices into tiles. The tile size can be configured:
export SLATE_LAPACK_NB=256
Available Routines
All routines have slate_ prefix with standard LAPACK naming:
Linear systems: slate_dgesv, slate_dgetrf, slate_dgetrs
Cholesky: slate_dposv, slate_dpotrf, slate_dpotrs
Least squares: slate_dgels
Eigenvalues: slate_dsyev, slate_zheev
SVD: slate_dgesvd
BLAS: slate_dgemm, slate_dtrsm, etc.
ScaLAPACK Compatibility API
The ScaLAPACK compatibility API is link-time compatible with standard ScaLAPACK, using identical function names and parameters.
Using the API
Link with the SLATE ScaLAPACK library before the actual ScaLAPACK:
C Example:
// Compile with:
// mpicc -o example example.c -lslate_scalapack_api -lscalapack
// Standard ScaLAPACK call - automatically intercepted by SLATE
pdgetrf_(&m, &n, A, &ia, &ja, descA, ipiv, &info);
Fortran Example:
!! Compile with:
!! mpif90 -o example example.f90 -lslate_scalapack_api -lscalapack
!! Standard ScaLAPACK call - automatically intercepted by SLATE
call pdgetrf(m, n, A, ia, ja, descA, ipiv, info)
How It Works
The compatibility library intercepts ScaLAPACK function calls:
1. Intercepts standard ScaLAPACK routine names (pdgemm, PDGEMM, pdgemm_, etc.)
2. Maps ScaLAPACK descriptors to SLATE matrices using fromScaLAPACK
3. Uses the ScaLAPACK blocking factor as the SLATE tile size
4. Calls the SLATE implementation
5. Routines not implemented in SLATE fall through to the actual ScaLAPACK
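To make the descriptor mapping concrete, the sketch below computes how many rows (or columns) of a block-cyclically distributed global dimension land on a given process, assuming the distribution starts at process 0. This numroc-style arithmetic is the kind of calculation a descriptor-to-matrix mapping relies on; the helper name is illustrative, not a SLATE API.

```cpp
#include <cstdint>

// Illustrative numroc-style helper (not a SLATE API): how many rows (or
// columns) of a global dimension n, distributed block-cyclically with block
// size nb over nprocs processes (source process 0), land on process iproc.
int64_t local_count(int64_t n, int64_t nb, int64_t iproc, int64_t nprocs)
{
    int64_t nblocks = n / nb;                  // number of full blocks
    int64_t count   = (nblocks / nprocs) * nb; // whole rounds of blocks
    int64_t extra   = nblocks % nprocs;        // leftover full blocks
    if (iproc < extra)
        count += nb;        // this process gets one extra full block
    else if (iproc == extra)
        count += n % nb;    // this process gets the trailing partial block
    return count;
}

// Example: a 1000 x 1000 matrix with nb = 64 on a 2-row process grid splits
// its rows 512 / 488 between the two process rows.
```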
Configuration
Set execution target via environment variables:
# CPU execution (default)
export SLATE_SCALAPACK_TARGET=HostTask
# GPU execution
export SLATE_SCALAPACK_TARGET=Devices
Supported Routines
Currently implemented ScaLAPACK routines:
BLAS:
pdgemm, psgemm, pzgemm, pcgemm
pdsymm, pzhemm, etc.
pdsyrk, pzherk, etc.
pdtrsm, pstrsm, etc.
Linear Systems:
pdgesv, psgesv, etc.
pdgetrf, pdgetrs
pdposv, pdpotrf, pdpotrs
Eigenvalues:
pdsyev, pzheev
SVD:
pdgesvd, pzgesvd
Routines not in the list pass through to the original ScaLAPACK.
Matrix Layout Compatibility
SLATE natively supports the ScaLAPACK 2D block-cyclic layout.
Creating from ScaLAPACK Data
// ScaLAPACK-style allocation
int myrow = mpi_rank % p;
int mycol = mpi_rank / p;
int64_t mlocal = slate::num_local_rows_cols(m, nb, myrow, 0, p);
int64_t nlocal = slate::num_local_rows_cols(n, nb, mycol, 0, q);
int64_t lld = mlocal; // Local leading dimension
double* A_data = new double[lld * nlocal];
// Create SLATE matrix from ScaLAPACK data
auto A = slate::Matrix<double>::fromScaLAPACK(
m, n, // Global dimensions
A_data, // Local data array
lld, // Local leading dimension
nb, nb, // Block size
slate::GridOrder::Col, // Column-major grid (ScaLAPACK default)
p, q, // Process grid
MPI_COMM_WORLD);
Grid Order
ScaLAPACK typically uses column-major process grids:
// Column-major (ScaLAPACK default)
slate::GridOrder::Col
// Process 0 at (0,0), Process 1 at (1,0), ...
// Row-major
slate::GridOrder::Row
// Process 0 at (0,0), Process 1 at (0,1), ...
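The two orderings above can be written down as simple rank-to-coordinate maps. The helpers below are illustrative, not SLATE APIs; they show where an MPI rank lands on a p x q grid under each ordering.

```cpp
// Illustrative helpers (not SLATE APIs): map an MPI rank to its (row, col)
// position on a p x q process grid under each grid ordering.
struct Coords { int row, col; };

Coords col_major(int rank, int p)   // ScaLAPACK default
{
    return { rank % p, rank / p };  // ranks fill each grid column first
}

Coords row_major(int rank, int q)
{
    return { rank / q, rank % q };  // ranks fill each grid row first
}

// Example on a 2 x 3 grid: rank 1 sits at (1,0) in column-major order
// but at (0,1) in row-major order.
```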
Migration Strategies
Gradual Migration
1. Start with the compatibility API: Link with slate_scalapack_api
2. Verify correctness: Run tests to ensure results match
3. Benchmark performance: Compare SLATE vs ScaLAPACK
4. Enable GPU: Set SLATE_SCALAPACK_TARGET=Devices
5. Migrate to native API: For new code or performance-critical sections
Direct Migration
Convert ScaLAPACK calls directly to SLATE:
Before (ScaLAPACK):
call descinit(descA, m, n, nb, nb, 0, 0, ictxt, lld, info)
call pdgetrf(m, n, A, 1, 1, descA, ipiv, info)
call pdgetrs('N', n, nrhs, A, 1, 1, descA, ipiv, &
             B, 1, 1, descB, info)
After (SLATE C++):
auto A = slate::Matrix<double>::fromScaLAPACK(
m, n, A_data, lld, nb, nb,
slate::GridOrder::Col, p, q, MPI_COMM_WORLD);
auto B = slate::Matrix<double>::fromScaLAPACK(...);
slate::Pivots pivots;
slate::lu_factor(A, pivots);
slate::lu_solve_using_factor(A, pivots, B);
Performance Considerations
Tile Size
ScaLAPACK blocking factor becomes SLATE tile size. Consider:
SLATE performs better with larger tiles (especially on GPU)
If ScaLAPACK uses a small nb, performance may be suboptimal
For best performance, use a larger nb (256-1024)
GPU Acceleration
The compatibility API can run on GPUs:
export SLATE_SCALAPACK_TARGET=Devices
# Run existing ScaLAPACK code on GPU
mpirun -n 4 ./my_scalapack_app
This provides GPU acceleration without code changes.
Limitations
Not all ScaLAPACK routines are implemented in SLATE
Some ScaLAPACK features (workspace queries) may behave differently
Performance depends on tile size chosen in original code
Complex workspace management may not translate directly
Troubleshooting
Linker errors with compatibility API:
Ensure correct link order (SLATE before ScaLAPACK):
-lslate_scalapack_api -lscalapack -lblas
Results differ from ScaLAPACK:
Different algorithms may give different (but equally valid) results due to:
Different pivoting strategies
Different numerical ordering
Check that error is within tolerance, not for exact match.
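One way to check "within tolerance" is to compare the relative difference of the two solutions against a threshold scaled by machine epsilon and the problem size. The helper below and the 3*n scaling factor are illustrative conventions, not a prescribed SLATE test.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative acceptance check: accept results that agree to a tolerance
// proportional to machine epsilon and the problem size n, rather than
// requiring a bit-for-bit match.
bool agrees_within_tolerance(const std::vector<double>& x,
                             const std::vector<double>& x_ref,
                             std::size_t n)
{
    double diff2 = 0.0, ref2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        diff2 += (x[i] - x_ref[i]) * (x[i] - x_ref[i]);
        ref2  += x_ref[i] * x_ref[i];
    }
    double rel = std::sqrt(diff2) / std::sqrt(ref2);
    return rel <= 3.0 * double(n) * std::numeric_limits<double>::epsilon();
}
```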
Routine not intercepted:
Routine may not be implemented in SLATE. Check available routines or use native SLATE API.