uxlfoundation
diff --git a/source/elements/oneMKL/source/architecture/api_design.inc.rst b/source/elements/oneMKL/source/architecture/api_design.inc.rst
diff --git a/source/elements/oneMKL/source/domains/blas/blas-like-extensions.rst b/source/elements/oneMKL/source/domains/blas/blas-like-extensions.rst
diff --git a/source/elements/oneMKL/source/domains/blas/imatcopy.rst b/source/elements/oneMKL/source/domains/blas/imatcopy.rst
@@ -99,10 +99,10 @@ Each enumeration value comes with two names: A single-character name (the tradit
               -  Do not transpose or conjugate the matrix.
             * -  ``transpose::T``
               -  ``transpose::trans``
-              -  Transpose the matrix.
+              -  Transpose the matrix (without complex conjugation).
             * -  ``transpose::C``
               -  ``transpose::conjtrans``
-              -  Perform Hermitian transpose (transpose and conjugate). Only applicable to complex matrices.
+              -  Perform Hermitian transpose (transpose and conjugate). Equivalent to ``transpose::trans`` for real matrices.
 
       .. _onemkl_enum_uplo:
 
 
@@ -37,6 +37,20 @@ BLAS-like Extensions
                  only the upper or lower triangular part of the result matrix.
          * -     :ref:`onemkl_blas_gemm_bias`   
            -     Computes a matrix-matrix product using general integer matrices with bias
+         * -     :ref:`onemkl_blas_imatcopy`
+           -     Computes an in-place matrix transposition or copy.
+         * -     :ref:`onemkl_blas_omatcopy`
+           -     Computes an out-of-place matrix transposition or copy.
+         * -     :ref:`onemkl_blas_omatcopy2`
+           -     Computes a two-strided out-of-place matrix transposition or copy.
+         * -     :ref:`onemkl_blas_omatadd`
+           -     Computes scaled matrix addition with possibly transposed arguments.
+         * -     :ref:`onemkl_blas_imatcopy_batch`
+           -     Computes groups of in-place matrix transposition or copy operations.
+         * -     :ref:`onemkl_blas_omatcopy_batch`
+           -     Computes groups of out-of-place matrix transposition or copy operations.
+         * -     :ref:`onemkl_blas_omatadd_batch`
+           -     Computes groups of scaled matrix additions.
 
 
 
@@ -55,5 +69,12 @@ BLAS-like Extensions
     trsm_batch
     gemmt
     gemm_bias
+    imatcopy
+    omatcopy
+    omatcopy2
+    omatadd
+    imatcopy_batch
+    omatcopy_batch
+    omatadd_batch
 
 **Parent topic:** :ref:`onemkl_blas`
@@ -0,0 +1,299 @@
+.. SPDX-FileCopyrightText: 2022 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+.. _onemkl_blas_imatcopy:
+
+imatcopy
+========
+
+Computes an in-place scaled matrix transpose or copy operation
+using a general dense matrix.
+
+.. _onemkl_blas_imatcopy_description:
+
+.. rubric:: Description
+
+The ``imatcopy`` routine performs an in-place scaled
+matrix copy or transposition.
+
+The operation is defined as:
+
+.. math::
+
+      C \leftarrow \alpha * op(C)
+
+where:
+
+op(X) is one of op(X) = X, or op(X) = X\ :sup:`T`, or op(X) = X\ :sup:`H`,
+
+``alpha`` is a scalar,
+
+``C`` is a matrix to be transformed in place,
+
+and ``C`` is ``m`` x ``n`` on input. On output, ``C`` is ``n`` x ``m``
+if op transposes the matrix, and ``m`` x ``n`` otherwise.
+
+``imatcopy`` supports the following precisions:
+
+   .. list-table::
+      :header-rows: 1
+
+      * -  T 
+      * -  ``float`` 
+      * -  ``double`` 
+      * -  ``std::complex<float>`` 
+      * -  ``std::complex<double>`` 
+
+.. _onemkl_blas_imatcopy_buffer:
+
+imatcopy (Buffer Version)
+-------------------------
+
+.. rubric:: Syntax
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::column_major {
+       void imatcopy(sycl::queue &queue,
+                     oneapi::mkl::transpose trans,
+                     std::int64_t m,
+                     std::int64_t n,
+                     T alpha,
+                     sycl::buffer<T, 1> &matrix_in_out,
+                     std::int64_t ld_in,
+                     std::int64_t ld_out);
+   }
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::row_major {
+       void imatcopy(sycl::queue &queue,
+                     oneapi::mkl::transpose trans,
+                     std::int64_t m,
+                     std::int64_t n,
+                     T alpha,
+                     sycl::buffer<T, 1> &matrix_in_out,
+                     std::int64_t ld_in,
+                     std::int64_t ld_out);
+   }
+
+.. container:: section
+
+   .. rubric:: Input Parameters
+
+   queue
+      The queue where the routine should be executed.
+
+   trans
+      Specifies op(``C``), the transposition operation applied to the
+      matrix ``C``. See :ref:`onemkl_datatypes` for more details.
+
+   m
+      Number of rows of ``C`` on input. Must be at least zero.
+
+   n
+      Number of columns of ``C`` on input. Must be at least zero.
+
+   alpha
+      Scaling factor for the matrix transposition or copy.
+
+   matrix_in_out
+      Buffer holding the input/output matrix ``C``. Must have size as follows:
+
+      .. list-table::
+         :header-rows: 1
+     
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``n``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``n``, ``ld_out`` * ``m``)
+         * - Row major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``m``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``m``, ``ld_out`` * ``n``)
+
+   ld_in
+      The leading dimension of the matrix ``C`` on input. It must be
+      positive, and must be at least ``m`` if column major layout is
+      used, and at least ``n`` if row major layout is used.
+
+   ld_out
+      The leading dimension of the matrix ``C`` on output. It must be positive
+      and satisfy the following bounds:
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - ``ld_out`` must be at least ``m``.
+           - ``ld_out`` must be at least ``n``.
+         * - Row major
+           - ``ld_out`` must be at least ``n``.
+           - ``ld_out`` must be at least ``m``.
+
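The leading-dimension bounds and the minimum size of `matrix_in_out` given in the tables above can be collected into one host-side check. This is a minimal sketch under stated assumptions: `layout` and `imatcopy_min_size` are hypothetical names, and `std::invalid_argument` stands in for `oneapi::mkl::invalid_argument` so the example compiles without oneMKL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical layout tag (oneMKL selects the layout by namespace instead).
enum class layout { col_major, row_major };

// Returns the minimum element count of matrix_in_out, or throws if an
// argument violates the documented bounds.
inline std::int64_t imatcopy_min_size(layout l, bool transposed,
                                      std::int64_t m, std::int64_t n,
                                      std::int64_t ld_in, std::int64_t ld_out) {
    if (m < 0 || n < 0)
        throw std::invalid_argument("m and n must be at least zero");
    if (ld_in <= 0 || ld_out <= 0)
        throw std::invalid_argument("leading dimensions must be positive");

    // in_lead: extent covered by the leading dimension on input.
    // in_count: number of columns (column major) or rows (row major) stored.
    const std::int64_t in_lead = (l == layout::col_major) ? m : n;
    const std::int64_t in_count = (l == layout::col_major) ? n : m;

    // A transposition swaps the two extents on output.
    const std::int64_t out_lead = transposed ? in_count : in_lead;
    const std::int64_t out_count = transposed ? in_lead : in_count;

    if (ld_in < in_lead)
        throw std::invalid_argument("ld_in is too small");
    if (ld_out < out_lead)
        throw std::invalid_argument("ld_out is too small");

    // The array must hold both the input and the output footprint.
    return std::max(ld_in * in_count, ld_out * out_count);
}
```

Without transposition the two counts are equal, so the result reduces to max(`ld_in`, `ld_out`) times `n` (column major) or `m` (row major), as in the first column of the size table.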
+.. container:: section
+
+   .. rubric:: Output Parameters
+
+   matrix_in_out
+      Output buffer, overwritten by ``alpha`` * op(``C``).
+
+.. container:: section
+
+   .. rubric:: Throws
+
+   This routine shall throw the following exceptions if the associated
+   condition is detected. An implementation may throw additional
+   implementation-specific exception(s) in case of error conditions
+   not covered here.
+
+   :ref:`oneapi::mkl::invalid_argument<onemkl_exception_invalid_argument>`
+       
+   
+   :ref:`oneapi::mkl::unsupported_device<onemkl_exception_unsupported_device>`
+       
+
+   :ref:`oneapi::mkl::host_bad_alloc<onemkl_exception_host_bad_alloc>`
+       
+
+   :ref:`oneapi::mkl::device_bad_alloc<onemkl_exception_device_bad_alloc>`
+       
+
+   :ref:`oneapi::mkl::unimplemented<onemkl_exception_unimplemented>`
+      
+
+.. _onemkl_blas_imatcopy_usm:
+
+imatcopy (USM Version)
+----------------------
+
+.. rubric:: Syntax
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::column_major {
+       sycl::event imatcopy(sycl::queue &queue,
+                            oneapi::mkl::transpose trans,
+                            std::int64_t m,
+                            std::int64_t n,
+                            T alpha,
+                            T *matrix_in_out,
+                            std::int64_t ld_in,
+                            std::int64_t ld_out,
+                            const std::vector<sycl::event> &dependencies = {});
+   }
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::row_major {
+       sycl::event imatcopy(sycl::queue &queue,
+                            oneapi::mkl::transpose trans,
+                            std::int64_t m,
+                            std::int64_t n,
+                            T alpha,
+                            T *matrix_in_out,
+                            std::int64_t ld_in,
+                            std::int64_t ld_out,
+                            const std::vector<sycl::event> &dependencies = {});
+   }
+
+.. container:: section
+
+   .. rubric:: Input Parameters
+
+   queue
+      The queue where the routine will be executed.
+
+   trans
+      Specifies op(``C``), the transposition operation applied to the
+      matrix ``C``. See :ref:`onemkl_datatypes` for more details.
+
+   m
+      Number of rows for the matrix ``C`` on input. Must be at least zero.
+
+   n
+      Number of columns for the matrix ``C`` on input. Must be at least zero.
+
+   alpha
+      Scaling factor for the matrix transpose or copy operation.
+
+   matrix_in_out
+      Pointer to input/output matrix ``C``. Must have size as follows:
+
+      .. list-table::
+         :header-rows: 1
+     
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``n``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``n``, ``ld_out`` * ``m``)
+         * - Row major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``m``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``m``, ``ld_out`` * ``n``)
+
+   ld_in
+      Leading dimension of the matrix ``C`` on input. If matrices are stored
+      using column major layout, ``ld_in`` must be at least ``m``. If matrices
+      are stored using row major layout, ``ld_in`` must be at least ``n``. 
+      Must be positive.
+
+   ld_out
+      Leading dimension of the matrix ``C`` on output. Must be positive.
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - ``ld_out`` must be at least ``m``.
+           - ``ld_out`` must be at least ``n``.
+         * - Row major
+           - ``ld_out`` must be at least ``n``.
+           - ``ld_out`` must be at least ``m``.
+
+   dependencies
+      List of events to wait for before starting computation, if any.
+      If omitted, defaults to no dependencies.
+
+.. container:: section
+
+   .. rubric:: Output Parameters
+
+   matrix_in_out
+      Pointer to output matrix ``C`` overwritten by ``alpha`` * op(``C``).
+
+.. container:: section
+      
+   .. rubric:: Return Values
+
+   Output event to wait on to ensure computation is complete.
+
+.. container:: section
+
+   .. rubric:: Throws
+
+   This routine shall throw the following exceptions if the associated
+   condition is detected. An implementation may throw additional
+   implementation-specific exception(s) in case of error conditions
+   not covered here.
+
+   :ref:`oneapi::mkl::invalid_argument<onemkl_exception_invalid_argument>`
+
+
+   :ref:`oneapi::mkl::unsupported_device<onemkl_exception_unsupported_device>`
+       
+
+   :ref:`oneapi::mkl::host_bad_alloc<onemkl_exception_host_bad_alloc>`
+       
+
+   :ref:`oneapi::mkl::device_bad_alloc<onemkl_exception_device_bad_alloc>`
+       
+
+   :ref:`oneapi::mkl::unimplemented<onemkl_exception_unimplemented>`
+      
+
+   **Parent topic:** :ref:`blas-like-extensions`
+