Example 05: BLAS Operations
===========================

This example demonstrates how to perform Basic Linear Algebra Subprograms (BLAS) operations in SLATE.

Key Concepts
------------

1. **Matrix Multiplication**: Using ``slate::multiply`` (gemm, hemm, symm) for matrix products.
2. **Rank Updates**: Performing rank-k (herk, syrk) and rank-2k (her2k, syr2k) updates.
3. **Triangular Operations**: Triangular matrix multiplication (trmm) and solving triangular systems (trsm).
4. **Simplified vs. Traditional API**: Comparing the descriptive ``multiply`` API with the traditional BLAS-named API.

C++ Example
-----------

**General Matrix Multiplication (GEMM) (Lines 36-40)**

.. code-block:: cpp

   // C = alpha A B + beta C
   slate::multiply( alpha, A, B, beta, C );  // simplified API
   slate::gemm( alpha, A, B, beta, C );      // traditional API

Here we perform the standard operation :math:`C = \alpha A B + \beta C`:

- ``A`` is an ``m``-by-``k`` matrix.
- ``B`` is a ``k``-by-``n`` matrix.
- ``C`` is an ``m``-by-``n`` matrix.

SLATE provides both a descriptive ``multiply`` routine and the traditional BLAS-named ``gemm``; they are equivalent.

**GPU Execution with Options (Lines 43-52)**

.. code-block:: cpp

   if (blas::get_device_count() > 0) {
       slate::Options opts = {
           { slate::Option::Lookahead, 2 },
           { slate::Option::Target, slate::Target::Devices },
       };
       slate::multiply( alpha, A, B, beta, C, opts );
   }

Most SLATE routines accept an ``Options`` map as the final argument. Here we:

- Set ``Target::Devices`` to offload computation to GPUs.
- Set ``Lookahead`` to 2 to overlap communication and computation.

**Transposed Multiplication (Lines 77-83)**

.. code-block:: cpp

   auto AT = transpose( A );
   auto BH = conj_transpose( B );
   slate::multiply( alpha, AT, BH, beta, C );

To compute :math:`C = \alpha A^T B^H + \beta C`, we simply create transposed views ``AT`` and ``BH`` and pass them to the multiply routine. SLATE detects the transposition flags on the views and handles the logic internally.
**Symmetric/Hermitian Multiplication (SYMM/HEMM) (Lines 97-118)**

.. code-block:: cpp

   slate::multiply( alpha, A, B, beta, C );                 // simplified
   slate::symm( slate::Side::Left, alpha, A, B, beta, C );  // traditional

When ``A`` is a ``SymmetricMatrix`` (or ``HermitianMatrix``), ``multiply`` automatically dispatches to the efficient symmetric/Hermitian algorithm (``symm``/``hemm``).

- ``Side::Left`` means :math:`C = \alpha A B + \beta C`.
- ``Side::Right`` means :math:`C = \alpha B A + \beta C` (demonstrated in lines 141-147).

**Rank-k Updates (SYRK/HERK) (Lines 230-241)**

.. code-block:: cpp

   slate::rank_k_update( alpha, A, beta, C );  // simplified
   slate::syrk( alpha, A, beta, C );           // traditional

Computes :math:`C = \alpha A A^T + \beta C`, where ``C`` is symmetric. Only the designated triangle of ``C`` (lower or upper) is updated.

**Triangular Operations (TRMM/TRSM) (Lines 299-310)**

.. code-block:: cpp

   // B = alpha A B
   slate::triangular_multiply( alpha, A, B );  // trmm

   // B = alpha A^{-1} B (solve A X = B)
   slate::triangular_solve( alpha, A, B );     // trsm

For triangular matrices, we can multiply (``trmm``) or solve (``trsm``). The simplified API names make the intent clear ("multiply" vs. "solve").

.. literalinclude:: ../../../examples/ex05_blas.cc
   :language: cpp
   :linenos:

C API Example
-------------

.. literalinclude:: ../../../examples/c_api/ex05_blas.c
   :language: c
   :linenos:

Fortran API Example
-------------------

.. literalinclude:: ../../../examples/fortran/ex05_blas.f90
   :language: fortran
   :linenos: