SLATE

USER GUIDE

  • Overview
    • Design Goals
    • Software Requirements
    • Project Resources
  • Getting Started
    • Example Program: LU Solve
    • Understanding the Example
      • MPI Initialization
      • Creating Matrices
      • Allocating Tiles
      • Solving the System
      • Execution Options
    • Building the Example
    • Running the Example
    • Simplifying Assumptions
    • Next Steps
  • Installation
    • Prerequisites
    • Downloading Source
    • Makefile Build
      • Basic Configuration
      • Build and Install
      • Configuration Options
    • CMake Build
      • Basic Build
      • CMake Options
      • Example with MKL and CUDA
    • Spack Installation
      • Install Spack
      • Install SLATE
      • Loading SLATE
    • GPU-Aware MPI
    • Verifying Installation
    • Troubleshooting
      • Common Issues
      • Getting Help
    • Module Environment (HPC Systems)
  • Matrices in SLATE
    • Matrix Hierarchy
      • Matrix Types
    • Creating Matrices
      • Basic Creation
      • Creating from Existing Matrix Structure
    • Allocating Tile Memory
      • CPU Host Memory
      • GPU Device Memory
      • User-Provided Memory
      • ScaLAPACK Layout
    • Accessing Matrix Elements
      • Tile Access
      • Element Access
      • Matrix Properties
    • Matrix Conversions
    • Transpose and Conjugate Transpose
    • Submatrices and Slices
      • Tile-Based Submatrices
      • Element-Based Slices
    • Copying Matrices
      • Deep Copy
      • Precision Conversion
    • Tile Layout and Distribution
      • Data Distribution
      • Tile Sizes
      • Memory Management
    • Best Practices
  • SLATE Operations
    • Execution Options
      • Common Options
    • Matrix Norms
    • Level 3 BLAS Operations
      • Matrix Multiply (gemm)
      • Hermitian/Symmetric Matrix Multiply (hemm/symm)
      • Rank-k Update (herk/syrk)
      • Rank-2k Update (her2k/syr2k)
      • Triangular Multiply (trmm)
      • Triangular Solve (trsm)
    • Linear Systems
      • LU Factorization (General Matrices)
      • LU Pivoting Options
      • Cholesky Factorization (Positive Definite)
      • Indefinite Solve (Aasen’s Algorithm)
      • Band Matrix Solve
    • Mixed-Precision Iterative Refinement
    • Least Squares
    • QR and LQ Factorizations
    • Eigenvalue Problems
      • Hermitian/Symmetric Eigenvalues
      • Eigenvalue Methods
      • Generalized Eigenvalue Problem
    • Singular Value Decomposition
    • Auxiliary Operations
      • Matrix Add
      • Matrix Copy
      • Matrix Set
      • Matrix Scale
      • Condition Number Estimate
      • Matrix Print
    • Naming Conventions
  • Testing and Tuning
    • SLATE Tester
      • Basic Usage
      • Single-Process Testing
      • Multi-Process Testing
      • Tester Parameters
      • Example Tester Output
    • Accuracy Verification
      • Without Reference (--ref=n)
      • With Reference (--ref=y)
      • Norm Verification
    • Full Testing Suite
      • Custom Test Commands
    • Performance Tuning
      • Tile Size
      • Process Grid
      • Lookahead
      • Panel Threads
      • Multi-threaded MPI Broadcast
      • GPU-Aware MPI
      • Performance Examples
    • Unit Tests
    • Benchmark Suite
    • Troubleshooting Tests
      • Common Issues
      • Debugging
  • ScaLAPACK Compatibility
    • Overview
    • LAPACK Compatibility API
      • Using the API
      • How It Works
      • Configuration
      • Tile Size
      • Available Routines
    • ScaLAPACK Compatibility API
      • Using the API
      • How It Works
      • Configuration
      • Supported Routines
    • Matrix Layout Compatibility
      • Creating from ScaLAPACK Data
      • Grid Order
    • Migration Strategies
      • Gradual Migration
      • Direct Migration
    • Performance Considerations
      • Tile Size
      • GPU Acceleration
    • Limitations
    • Troubleshooting

API REFERENCE

  • slate
    • Quick Reference
    • Contents
      • BLAS and Auxiliary Operations
        • Matrix Multiplication
        • Triangular Matrix Multiplication
        • Rank Updates
        • Triangular Solve
        • Auxiliary Routines
        • Standard BLAS Routines
      • Linear Systems
        • LU Factorization (General)
        • Cholesky Factorization (Positive Definite)
        • Indefinite Factorization (Symmetric/Hermitian)
      • Least Squares
        • Least Squares Solve
        • QR Factorization
        • LQ Factorization
      • Eigenvalue Problems
        • Symmetric/Hermitian Eigenvalue Problems
        • Generalized Symmetric/Hermitian Eigenvalue Problems
        • Tridiagonal Eigenvalue Solvers
      • Singular Value Decomposition
        • SVD Functions
        • Notes
      • Matrix Classes
        • General Matrices
        • Structured Matrices
        • Band Matrices
        • Tiles
        • Auxiliary Classes
      • Enumerations and Types
        • Core Enumerations
        • Options
        • Method Enumerations
        • Type Traits
    • Data Types
    • Common Parameters
      • routine()
    • Error Handling
    • Header Files
    • C and Fortran APIs
    • See Also
  • blaspp
    • Features
    • Organization
    • Contents
      • Level 1 BLAS (Vector Operations)
        • Operations
      • Level 2 BLAS (Matrix-Vector Operations)
        • Operations
      • Level 3 BLAS (Matrix-Matrix Operations)
        • Operations
      • Utilities and Types
        • Enumerations
        • Error Handling
        • Type Traits and Safety Functions
      • Device (GPU) Operations
        • Queue Management
        • Device Memory and Batch Operations
      • Performance Counting
        • PAPI Counter Integration
        • FLOP Calculations
        • Bandwidth Calculations
    • Quick Reference
      • Level 1 BLAS (Vector-Vector)
      • Level 2 BLAS (Matrix-Vector)
      • Level 3 BLAS (Matrix-Matrix)
    • Basic Usage
      • CPU (Host) Operations
      • GPU (Device) Operations
    • Common Parameters
      • Layout
      • Op
      • Uplo
      • Diag
      • Side
    • Data Types
    • Header Files
    • Error Handling
    • See Also
  • lapackpp
    • Utilities and Enumerations
      • Error Handling
        • lapack::Error
      • Enumerations
      • Version Information
    • Device (GPU) Operations
      • Queue Management
        • lapack::Queue
      • GPU Operations
        • Cholesky Factorization
        • LU Factorization
        • QR Factorization
        • Eigenvalue Decomposition
      • Device Memory Types
    • Performance Counters
      • Overview
      • FLOP Counting
        • lapack::Gflop
      • Bandwidth Analysis
        • lapack::Gbyte
      • Usage Example
      • Supported Operations
      • Notes
    • Matrix Factorizations
      • LU Factorization
      • Cholesky Factorization
      • QR Factorization
      • LQ Factorization
      • QL Factorization
      • RQ Factorization
      • Symmetric/Hermitian Factorizations
      • Triangular Factorizations
      • Bidiagonal Reduction
      • Tridiagonal Reduction
      • Hessenberg Reduction
      • Example Usage
      • Factorization Algorithms
      • Storage Efficiency
      • Performance Characteristics
      • Blocking and Level-3 BLAS
      • Choosing a Factorization
      • See Also
    • Linear Systems
      • General (Non-Symmetric) Systems
      • Symmetric/Hermitian Positive Definite
      • Symmetric/Hermitian Indefinite
      • Triangular Systems
      • Band Matrices
      • Tridiagonal Systems
      • Symmetric Positive Definite Tridiagonal
      • See Also
    • Least Squares and Linear Regression
      • Overview
      • QR-Based Least Squares
      • QR Factorization
      • LQ Factorization
      • QL and RQ Factorizations
      • Orthogonal Matrix Generation
      • Multiplication by Q
      • Complete Orthogonal Factorization
      • Constrained Least Squares
      • Rank Estimation
      • See Also
    • Eigenvalue Problems
      • Overview
      • Standard Eigenvalue Problems
        • Symmetric/Hermitian Eigenvalues
        • Non-Symmetric Eigenvalues
        • Tridiagonal/Banded Eigenproblems
      • Generalized Eigenvalue Problems
        • Symmetric/Hermitian
        • Non-Symmetric (Generalized Schur)
      • Reduction to Standard Form
      • Auxiliary Reductions
      • Schur Form Manipulation
      • Eigenvector Computation
      • Balancing
      • Utility Functions
      • See Also
    • Singular Value Decomposition (SVD)
      • Overview
      • Standard SVD
      • Specialized SVD Routines
      • Bidiagonal SVD
      • Bidiagonal Reduction
      • Generalized SVD (GSVD)
      • CS Decomposition (CSD)
      • Utility Functions
      • Rank and Pseudoinverse
      • Applications
      • See Also
    • Auxiliary Functions
      • Matrix Norms
      • Condition Number Estimation
      • Equilibration and Scaling
      • Matrix Initialization and Copying
      • Matrix Addition
      • Householder Transformations
      • Givens Rotations
      • Matrix Multiplication (Triangular)
      • Special Matrix Operations
      • Permutations and Sorting
      • Workspace Queries
      • Error Checking and Diagnostics
      • See Also
    • Overview
    • Key Features
    • See Also

EXAMPLES

  • Overview
  • Building Examples
    • Option 1: Makefile
    • Option 2: CMake
  • Example 01: Matrix Construction
    • Key Concepts
    • C++ Example
  • Example 02: Matrix Type Conversion
    • Key Concepts
    • C++ Example
  • Example 03: Submatrices and Slicing
    • Key Concepts
    • C++ Example
  • Example 04: Matrix Norms
    • Key Concepts
    • C++ Example
  • Example 05: BLAS Operations
    • Key Concepts
    • C++ Example
    • C API Example
    • Fortran API Example
  • Example 06: Linear Systems (LU)
    • Key Concepts
    • C++ Example
    • C API Example
  • Example 07: Linear Systems (Cholesky)
    • Key Concepts
    • C++ Example
  • Example 08: Linear Systems (Indefinite)
    • Key Concepts
    • C++ Example
  • Example 09: Least Squares
    • Key Concepts
    • C++ Example
  • Example 10: Singular Value Decomposition (SVD)
    • Key Concepts
    • C++ Example
  • Example 11: Hermitian Eigenvalue Problems
    • Key Concepts
    • C++ Example
  • Example 12: Generalized Hermitian Eigenvalues
    • Key Concepts
    • C++ Example
  • Example 13: Non-uniform Block Sizes
    • Key Concepts
    • C++ Example
  • Example 14: ScaLAPACK Compatibility
    • Key Concepts
    • C++ Example
  • Example 15: Setting Matrix Elements
    • Key Concepts
    • C++ Example

DEVELOPER GUIDE

  • Introduction
    • Revision Notes
  • API Layers
    • Drivers
    • Computational Routines
      • Comments on the Code
      • Template Dispatch
      • Executing Multiple Internal Routines on Devices
    • Internal Routines for Major, Parallel Tasks
      • Batched GPU Tasks
    • Tile Operations for Small, Sequential Tasks
    • BLAS++, Batched BLAS++, and LAPACK++
    • Work Routines for Actual OpenMP Work
  • Matrix Storage
    • Tile Management
  • Handling of Side, Uplo, Trans, etc.
  • Handling of Precisions
  • Parallelism Model
  • Message Passing Communication
  • MOSI Coherency Protocol
    • Coherency Control
    • Tile States
    • MOSI API
    • Data Transfer
    • Developer Hints
  • Column Major and Row Major Layout
    • Layout Representation and API
    • Layout Conversion
      • Layout Conversion of Extended Tiles
  • Bibliography

ADDITIONAL INFO

  • About SLATE
    • Key Features
    • Project Information
    • Version
  • Changelog
  • License
    • BSD 3-Clause License

© Copyright 2017-2025, Innovative Computing Laboratory, University of Tennessee.
