ScaLAPACK Compatibility

SLATE provides compatibility APIs for easy transition from ScaLAPACK and LAPACK applications. This chapter covers both APIs and migration strategies.

Overview

SLATE provides two compatibility layers:

  1. LAPACK Compatibility API: For single-node LAPACK applications

  2. ScaLAPACK Compatibility API: For distributed ScaLAPACK applications

Both APIs allow existing code to use SLATE with minimal changes.

LAPACK Compatibility API

The LAPACK compatibility API provides routines with a slate_ prefix that match standard LAPACK interfaces.

Using the API

C Example:

// Compile with:
// mpicc -o example example.c -lslate_lapack_api

// Original LAPACK call
dgetrf_(&m, &n, A, &lda, ipiv, &info);

// SLATE equivalent
slate_dgetrf_(&m, &n, A, &lda, ipiv, &info);

Fortran Example:

!! Compile with:
!! mpif90 -o example example.f90 -lslate_lapack_api

!! Original LAPACK call
call dgetrf(m, n, A, lda, ipiv, info)

!! SLATE equivalent
call slate_dgetrf(m, n, A, lda, ipiv, info)

How It Works

The compatibility library:

  1. Creates a SLATE matrix from the LAPACK data using fromLAPACK

  2. Sets execution target from environment variables

  3. Calls the corresponding SLATE routine

  4. Returns results in the original LAPACK arrays

Configuration

Set execution target via environment variables:

# CPU execution (default)
export SLATE_LAPACK_TARGET=HostTask

# GPU execution
export SLATE_LAPACK_TARGET=Devices

Tile Size

SLATE divides LAPACK matrices into tiles. The tile size can be configured:

export SLATE_LAPACK_NB=256

Available Routines

All routines carry the slate_ prefix followed by the standard LAPACK name:

  • Linear systems: slate_dgesv, slate_dgetrf, slate_dgetrs

  • Cholesky: slate_dposv, slate_dpotrf, slate_dpotrs

  • Least squares: slate_dgels

  • Eigenvalues: slate_dsyev, slate_zheev

  • SVD: slate_dgesvd

  • BLAS: slate_dgemm, slate_dtrsm, etc.

ScaLAPACK Compatibility API

The ScaLAPACK compatibility API is link-time compatible with standard ScaLAPACK, using identical function names and parameters.

Using the API

Link with the SLATE ScaLAPACK library before the actual ScaLAPACK:

C Example:

// Compile with:
// mpicc -o example example.c -lslate_scalapack_api -lscalapack

// Standard ScaLAPACK call - automatically intercepted by SLATE
pdgetrf_(&m, &n, A, &ia, &ja, descA, ipiv, &info);

Fortran Example:

!! Compile with:
!! mpif90 -o example example.f90 -lslate_scalapack_api -lscalapack

!! Standard ScaLAPACK call - automatically intercepted by SLATE
call pdgetrf(m, n, A, ia, ja, descA, ipiv, info)

How It Works

The compatibility library intercepts ScaLAPACK function calls:

  1. Intercepts standard ScaLAPACK routine names (pdgemm, PDGEMM, pdgemm_, etc.)

  2. Maps ScaLAPACK descriptors to SLATE matrices using fromScaLAPACK

  3. Uses ScaLAPACK blocking factor as SLATE tile size

  4. Calls the SLATE implementation

  5. Routines not implemented in SLATE fall through to actual ScaLAPACK

Configuration

Set execution target via environment variables:

# CPU execution (default)
export SLATE_SCALAPACK_TARGET=HostTask

# GPU execution
export SLATE_SCALAPACK_TARGET=Devices

Supported Routines

Currently implemented ScaLAPACK routines:

BLAS:

  • pdgemm, psgemm, pzgemm, pcgemm

  • pdsymm, pzhemm, etc.

  • pdsyrk, pzherk, etc.

  • pdtrsm, pstrsm, etc.

Linear Systems:

  • pdgesv, psgesv, etc.

  • pdgetrf, pdgetrs

  • pdposv, pdpotrf, pdpotrs

Eigenvalues:

  • pdsyev, pzheev

SVD:

  • pdgesvd, pzgesvd

Routines not in the list pass through to the original ScaLAPACK.

Matrix Layout Compatibility

SLATE natively supports the ScaLAPACK 2D block-cyclic layout.

Creating from ScaLAPACK Data

// ScaLAPACK-style allocation
int myrow = mpi_rank % p;
int mycol = mpi_rank / p;
int64_t mlocal = slate::num_local_rows_cols(m, nb, myrow, 0, p);
int64_t nlocal = slate::num_local_rows_cols(n, nb, mycol, 0, q);
int64_t lld = mlocal;  // Local leading dimension
double* A_data = new double[lld * nlocal];

// Create SLATE matrix from ScaLAPACK data
auto A = slate::Matrix<double>::fromScaLAPACK(
    m, n,                    // Global dimensions
    A_data,                  // Local data array
    lld,                     // Local leading dimension
    nb, nb,                  // Block size
    slate::GridOrder::Col,   // Column-major grid (ScaLAPACK default)
    p, q,                    // Process grid
    MPI_COMM_WORLD);

Grid Order

ScaLAPACK typically uses column-major process grids:

// Column-major (ScaLAPACK default)
slate::GridOrder::Col
// Process 0 at (0,0), Process 1 at (1,0), ...

// Row-major
slate::GridOrder::Row
// Process 0 at (0,0), Process 1 at (0,1), ...

Migration Strategies

Gradual Migration

  1. Start with compatibility API: Link with slate_scalapack_api

  2. Verify correctness: Run tests to ensure results match

  3. Benchmark performance: Compare SLATE vs ScaLAPACK

  4. Enable GPU: Set SLATE_SCALAPACK_TARGET=Devices

  5. Migrate to native API: For new code or performance-critical sections

Direct Migration

Convert ScaLAPACK calls directly to SLATE:

Before (ScaLAPACK):

call descinit(descA, m, n, nb, nb, 0, 0, ictxt, lld, info)
call pdgetrf(m, n, A, 1, 1, descA, ipiv, info)
call pdgetrs('N', n, nrhs, A, 1, 1, descA, ipiv, &
             B, 1, 1, descB, info)

After (SLATE C++):

auto A = slate::Matrix<double>::fromScaLAPACK(
    m, n, A_data, lld, nb, nb,
    slate::GridOrder::Col, p, q, MPI_COMM_WORLD);
auto B = slate::Matrix<double>::fromScaLAPACK(...);

slate::Pivots pivots;
slate::lu_factor(A, pivots);
slate::lu_solve_using_factor(A, pivots, B);

Performance Considerations

Tile Size

The ScaLAPACK blocking factor becomes the SLATE tile size. Consider:

  • SLATE performs better with larger tiles (especially on GPU)

  • If ScaLAPACK uses small nb, performance may be suboptimal

  • For best performance, use larger nb (256-1024)

GPU Acceleration

The compatibility API can run on GPUs:

export SLATE_SCALAPACK_TARGET=Devices

# Run existing ScaLAPACK code on GPU
mpirun -n 4 ./my_scalapack_app

This provides GPU acceleration without code changes.

Limitations

  • Not all ScaLAPACK routines are implemented in SLATE

  • Some ScaLAPACK features (such as workspace queries) may behave differently

  • Performance depends on tile size chosen in original code

  • Complex workspace management may not translate directly

Troubleshooting

Linker errors with compatibility API:

Ensure correct link order (SLATE before ScaLAPACK):

-lslate_scalapack_api -lscalapack -lblas

Results differ from ScaLAPACK:

Different algorithms may give different (but equally valid) results due to:

  • Different pivoting strategies

  • Different numerical ordering

Check that the error is within tolerance rather than requiring an exact match.

Routine not intercepted:

Routine may not be implemented in SLATE. Check available routines or use native SLATE API.