Example 01: Matrix Construction =============================== This example demonstrates the fundamental operations for creating and managing SLATE matrices. Key Concepts ------------ 1. **Constructors**: Creating empty matrices with specific dimensions and distribution. 2. **Tile Insertion**: Allocating memory for tiles on Host (CPU) or Devices (GPU). 3. **User Data**: Wrapping existing data pointers (e.g., from ScaLAPACK) into a SLATE matrix. 4. **Transposition**: Creating transposed views of matrices without copying data. 5. **Element Access**: Iterating over tiles and elements efficiently. C++ Example ----------- **Matrix Construction (Lines 18-44)** .. code-block:: cpp // Create an empty matrix (2D block cyclic layout, p x q grid, // no tiles allocated, square nb x nb tiles) slate::Matrix A( m, n, nb, grid_p, grid_q, MPI_COMM_WORLD ); Here, we construct a standard general matrix `A`. Note that: - `m` and `n` are the global dimensions (rows, columns). - `nb` is the block size (tile size). - `grid_p` and `grid_q` define the MPI process grid dimensions. - `MPI_COMM_WORLD` is the communicator. - **Important**: This constructor defines the matrix *structure* and *distribution*, but does **not** allocate memory for the matrix elements (tiles). The matrix is initially empty. We can also create matrices with rectangular tiles (specifying `mb` and `nb`) or triangular matrices (`slate::TriangularMatrix`). `A.emptyLike()` (line 42) is a convenient way to create a new matrix `A2` with the same structure and distribution as `A` but without any data allocated. **Allocating Memory (Lines 47-68)** .. code-block:: cpp // Insert tiles on the CPU host. A.insertLocalTiles( slate::Target::Host ); Since the constructor creates an empty shell, we must explicitly allocate memory for the tiles that belong to this MPI rank. `insertLocalTiles` is the standard way to do this. - `Target::Host` allocates memory on the CPU. - `Target::Devices` (seen in lines 71-92) allocates memory on GPU devices if available. The loop shown in lines 63-66 demonstrates what `insertLocalTiles` effectively does under the hood: it iterates over the matrix structure, checks if a tile is local to the current rank, and if so, inserts (allocates) it. **Wrapping Existing Data (Lines 97-132)** .. code-block:: cpp // Attach user allocated tiles, from pointers in data( i, j ) // with local stride lld between columns. for (int64_t j = 0; j < A.nt(); ++j) { for (int64_t i = 0; i < A.mt(); ++i) { if (A.tileIsLocal( i, j )) A.tileInsert( i, j, data( i, j ), lld ); } } If you already have data allocated (e.g., from a legacy application), you can wrap it into a SLATE matrix without copying. The `data` lambda in this example simulates calculating the pointer to the start of each tile in a ScaLAPACK-style array. `tileInsert` is then called with the pointer and leading dimension (`lld`). **ScaLAPACK Conversion (Lines 135-162)** .. code-block:: cpp auto A = slate::Matrix::fromScaLAPACK( ... ); This is the simplified, preferred way to wrap legacy ScaLAPACK data. It handles all the index calculations and tile insertions automatically. **Transposition (Lines 165-185)** .. code-block:: cpp auto AT = transpose( A ); auto AH = conj_transpose( A ); SLATE operations like transposition are typically metadata operations. `AT` is a **view** of `A`. No data is copied or moved. - `AT` has its operation flag set to `Op::Trans`. - Accessing `AT(i, j)` effectively accesses `A(j, i)` with transposition applied on the fly. - This allows efficient passing of transposed arguments to BLAS routines (like `gemm`) without overhead. **Element Access (Lines 189-254)** .. code-block:: cpp if (A.tileIsLocal( i, j )) { // ... slate::Tile T = A( i, j, slate::HostNum ); for (int64_t jj = 0; jj < T.nb(); ++jj) { for (int64_t ii = 0; ii < T.mb(); ++ii) { T.at( ii, jj ) = ... } } } Direct element access in distributed memory requires care: 1. We iterate over global tiles `(i, j)`. 2. We check `A.tileIsLocal(i, j)` to ensure the current rank owns the data. 3. We acquire the tile `T` on the host. 4. We iterate within the tile using local indices `ii`, `jj`. 5. `T.at(ii, jj)` provides safe access (with bounds checking in debug mode), while `T.data()` (shown in `elements2`) provides raw pointer access for maximum performance in inner loops. .. literalinclude:: ../../../examples/ex01_matrix.cc :language: cpp :linenos: