Commit 956f4ab

[oneMKL][BLAS] Add dense matrix transpose routines to BLAS-like extensions (#420)
1 parent bfd7e34 commit 956f4ab

9 files changed: +3287, -2 lines changed
source/elements/oneMKL/source/architecture/api_design.inc.rst

Lines changed: 2 additions & 2 deletions
@@ -99,10 +99,10 @@ Each enumeration value comes with two names: A single-character name (the tradit
         - Do not transpose or conjugate the matrix.
       * - ``transpose::T``
         - ``transpose::trans``
-        - Transpose the matrix.
+        - Transpose the matrix (without complex conjugation).
       * - ``transpose::C``
         - ``transpose::conjtrans``
-        - Perform Hermitian transpose (transpose and conjugate). Only applicable to complex matrices.
+        - Perform Hermitian transpose (transpose and conjugate). Is the same as ``transpose::trans`` for real matrices.
 
 .. _onemkl_enum_uplo:
 
source/elements/oneMKL/source/domains/blas/blas-like-extensions.rst

Lines changed: 21 additions & 0 deletions
@@ -37,6 +37,20 @@ BLAS-like Extensions
           only the upper or lower triangular part of the result matrix.
       * - :ref:`onemkl_blas_gemm_bias`
         - Computes a matrix-matrix product using general integer matrices with bias
+      * - :ref:`onemkl_blas_imatcopy`
+        - Computes an in-place matrix transposition or copy.
+      * - :ref:`onemkl_blas_omatcopy`
+        - Computes an out-of-place matrix transposition or copy.
+      * - :ref:`onemkl_blas_omatcopy2`
+        - Computes a two-strided out-of-place matrix transposition or copy.
+      * - :ref:`onemkl_blas_omatadd`
+        - Computes scaled matrix addition with possibly transposed arguments.
+      * - :ref:`onemkl_blas_imatcopy_batch`
+        - Computes groups of in-place matrix transposition or copy operations.
+      * - :ref:`onemkl_blas_omatcopy_batch`
+        - Computes groups of out-of-place matrix transposition or copy operations.
+      * - :ref:`onemkl_blas_omatadd_batch`
+        - Computes groups of scaled matrix additions.
 
 
 
@@ -55,5 +69,12 @@ BLAS-like Extensions
     trsm_batch
     gemmt
     gemm_bias
+    imatcopy
+    omatcopy
+    omatcopy2
+    omatadd
+    imatcopy_batch
+    omatcopy_batch
+    omatadd_batch
 
 **Parent topic:** :ref:`onemkl_blas`
Lines changed: 299 additions & 0 deletions
@@ -0,0 +1,299 @@
+.. SPDX-FileCopyrightText: 2022 Intel Corporation
+..
+.. SPDX-License-Identifier: CC-BY-4.0
+
+.. _onemkl_blas_imatcopy:
+
+imatcopy
+========
+
+Computes an in-place scaled matrix transpose or copy operation
+using a general dense matrix.
+
+.. _onemkl_blas_imatcopy_description:
+
+.. rubric:: Description
+
+The ``imatcopy`` routine performs an in-place scaled
+matrix copy or transposition.
+
+The operation is defined as:
+
+.. math::
+
+      C \leftarrow \alpha * op(C)
+
+where:
+
+op(X) is one of op(X) = X, or op(X) = X\ :sup:`T`, or op(X) = X\ :sup:`H`,
+
+``alpha`` is a scalar,
+
+``C`` is a matrix to be transformed in place,
+
+and ``C`` is ``m`` x ``n`` on input.
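The definition above can be made concrete with a small host-side reference. This is only a sketch of the semantics for column-major storage, not the oneMKL implementation: ``transpose_ref``, ``conj_ref`` and ``imatcopy_ref`` are hypothetical names introduced here, and the routine takes a temporary copy of the input for clarity, which a real in-place implementation need not do.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::mkl::transpose so the sketch builds without oneMKL.
enum class transpose_ref { nontrans, trans, conjtrans };

// Conjugation helper: identity for real types, std::conj for complex types.
template <typename T>
T conj_ref(T x) { return x; }
template <typename T>
std::complex<T> conj_ref(std::complex<T> x) { return std::conj(x); }

// Reference semantics of C <- alpha * op(C) for column-major storage.
// `c` holds the m x n input with leading dimension ld_in and is overwritten
// with the result, stored with leading dimension ld_out.
// The caller is assumed to have sized `c` for both the input and the output.
template <typename T>
void imatcopy_ref(transpose_ref trans, std::int64_t m, std::int64_t n, T alpha,
                  std::vector<T> &c, std::int64_t ld_in, std::int64_t ld_out) {
    assert(ld_in > 0 && ld_out > 0);
    const std::vector<T> in(c);  // snapshot of the input (for clarity only)
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            const T x = (trans == transpose_ref::conjtrans)
                            ? conj_ref(in[i + j * ld_in])
                            : in[i + j * ld_in];
            if (trans == transpose_ref::nontrans)
                c[i + j * ld_out] = alpha * x;  // element (i, j) stays at (i, j)
            else
                c[j + i * ld_out] = alpha * x;  // element (i, j) moves to (j, i)
        }
    }
}
```

For a 2 x 3 input stored as ``{1, 2, 3, 4, 5, 6}`` with ``ld_in = 2``, calling ``imatcopy_ref`` with ``transpose_ref::trans``, ``alpha = 2.0`` and ``ld_out = 3`` leaves ``{2, 6, 10, 4, 8, 12}``, the scaled 3 x 2 transpose.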
+
+``imatcopy`` supports the following precisions:
+
+   .. list-table::
+      :header-rows: 1
+
+      * - T
+      * - ``float``
+      * - ``double``
+      * - ``std::complex<float>``
+      * - ``std::complex<double>``
+
+.. _onemkl_blas_imatcopy_buffer:
+
+imatcopy (Buffer Version)
+-------------------------
+
+.. rubric:: Syntax
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::column_major {
+       void imatcopy(sycl::queue &queue,
+                     oneapi::mkl::transpose trans,
+                     std::int64_t m,
+                     std::int64_t n,
+                     T alpha,
+                     sycl::buffer<T, 1> &matrix_in_out,
+                     std::int64_t ld_in,
+                     std::int64_t ld_out);
+   }
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::row_major {
+       void imatcopy(sycl::queue &queue,
+                     oneapi::mkl::transpose trans,
+                     std::int64_t m,
+                     std::int64_t n,
+                     T alpha,
+                     sycl::buffer<T, 1> &matrix_in_out,
+                     std::int64_t ld_in,
+                     std::int64_t ld_out);
+   }
+
+.. container:: section
+
+   .. rubric:: Input Parameters
+
+   queue
+      The queue where the routine should be executed.
+
+   trans
+      Specifies op(``C``), the transposition operation applied to the
+      matrix ``C``. See :ref:`onemkl_datatypes` for more details.
+
+   m
+      Number of rows of ``C`` on input. Must be at least zero.
+
+   n
+      Number of columns of ``C`` on input. Must be at least zero.
+
+   alpha
+      Scaling factor for the matrix transposition or copy.
+
+   matrix_in_out
+      Buffer holding the input/output matrix ``C``. Must have size as follows:
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``n``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``n``, ``ld_out`` * ``m``)
+         * - Row major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``m``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``m``, ``ld_out`` * ``n``)
+
+   ld_in
+      The leading dimension of the matrix ``C`` on input. It must be
+      positive, and must be at least ``m`` if column major layout is
+      used, and at least ``n`` if row major layout is used.
+
+   ld_out
+      The leading dimension of the matrix ``C`` on output. It must be positive
+      and must satisfy the following minimum:
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - ``ld_out`` must be at least ``m``.
+           - ``ld_out`` must be at least ``n``.
+         * - Row major
+           - ``ld_out`` must be at least ``n``.
+           - ``ld_out`` must be at least ``m``.
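The size and leading-dimension rules in the tables above can be collapsed into two small helpers. This is a sketch that simply restates the tables; ``layout_ref``, ``transpose_ref``, ``imatcopy_dims_ok`` and ``imatcopy_min_size`` are hypothetical names introduced for illustration and are not part of oneMKL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the layout namespaces and oneapi::mkl::transpose.
enum class layout_ref { column_major, row_major };
enum class transpose_ref { nontrans, trans, conjtrans };

// True when m, n, ld_in and ld_out satisfy the documented constraints.
inline bool imatcopy_dims_ok(layout_ref layout, transpose_ref trans,
                             std::int64_t m, std::int64_t n,
                             std::int64_t ld_in, std::int64_t ld_out) {
    if (m < 0 || n < 0 || ld_in <= 0 || ld_out <= 0) return false;
    const bool col = (layout == layout_ref::column_major);
    const bool plain = (trans == transpose_ref::nontrans);
    // Input: ld_in >= m for column major, ld_in >= n for row major.
    const std::int64_t min_in = col ? m : n;
    // Output keeps the input shape for nontrans and swaps it otherwise.
    const std::int64_t min_out = (col == plain) ? m : n;
    return ld_in >= min_in && ld_out >= min_out;
}

// Minimum number of elements matrix_in_out must hold, per the size table.
inline std::int64_t imatcopy_min_size(layout_ref layout, transpose_ref trans,
                                      std::int64_t m, std::int64_t n,
                                      std::int64_t ld_in, std::int64_t ld_out) {
    const bool col = (layout == layout_ref::column_major);
    // Number of strided vectors on input: columns (column major) or rows (row major).
    const std::int64_t in_count = col ? n : m;
    if (trans == transpose_ref::nontrans)
        return std::max(ld_in, ld_out) * in_count;
    // Transposed output has the other dimension as its vector count.
    const std::int64_t out_count = col ? m : n;
    return std::max(ld_in * in_count, ld_out * out_count);
}
```

A caller could run ``imatcopy_dims_ok`` before allocating and then size the buffer with ``imatcopy_min_size``; for example, a column-major transpose with ``m = 2``, ``n = 5``, ``ld_in = 4``, ``ld_out = 5`` needs at least 20 elements.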
+
+.. container:: section
+
+   .. rubric:: Output Parameters
+
+   matrix_in_out
+      Output buffer, overwritten by ``alpha`` * op(``C``).
+
+.. container:: section
+
+   .. rubric:: Throws
+
+   This routine shall throw the following exceptions if the associated
+   condition is detected. An implementation may throw additional
+   implementation-specific exception(s) in case of error conditions
+   not covered here.
+
+   :ref:`oneapi::mkl::invalid_argument<onemkl_exception_invalid_argument>`
+
+
+   :ref:`oneapi::mkl::unsupported_device<onemkl_exception_unsupported_device>`
+
+
+   :ref:`oneapi::mkl::host_bad_alloc<onemkl_exception_host_bad_alloc>`
+
+
+   :ref:`oneapi::mkl::device_bad_alloc<onemkl_exception_device_bad_alloc>`
+
+
+   :ref:`oneapi::mkl::unimplemented<onemkl_exception_unimplemented>`
+
+
+.. _onemkl_blas_imatcopy_usm:
+
+imatcopy (USM Version)
+----------------------
+
+.. rubric:: Syntax
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::column_major {
+       sycl::event imatcopy(sycl::queue &queue,
+                            oneapi::mkl::transpose trans,
+                            std::int64_t m,
+                            std::int64_t n,
+                            T alpha,
+                            T *matrix_in_out,
+                            std::int64_t ld_in,
+                            std::int64_t ld_out,
+                            const std::vector<sycl::event> &dependencies = {});
+   }
+
+.. code-block:: cpp
+
+   namespace oneapi::mkl::blas::row_major {
+       sycl::event imatcopy(sycl::queue &queue,
+                            oneapi::mkl::transpose trans,
+                            std::int64_t m,
+                            std::int64_t n,
+                            T alpha,
+                            T *matrix_in_out,
+                            std::int64_t ld_in,
+                            std::int64_t ld_out,
+                            const std::vector<sycl::event> &dependencies = {});
+   }
+
+.. container:: section
+
+   .. rubric:: Input Parameters
+
+   queue
+      The queue where the routine will be executed.
+
+   trans
+      Specifies op(``C``), the transposition operation applied to the
+      matrix ``C``. See :ref:`onemkl_datatypes` for more details.
+
+   m
+      Number of rows for the matrix ``C`` on input. Must be at least zero.
+
+   n
+      Number of columns for the matrix ``C`` on input. Must be at least zero.
+
+   alpha
+      Scaling factor for the matrix transpose or copy operation.
+
+   matrix_in_out
+      Pointer to input/output matrix ``C``. Must have size as follows:
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``n``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``n``, ``ld_out`` * ``m``)
+         * - Row major
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in``, ``ld_out``) * ``m``
+           - Size of array ``matrix_in_out`` must be at least max(``ld_in`` * ``m``, ``ld_out`` * ``n``)
+
+   ld_in
+      Leading dimension of the matrix ``C`` on input. If matrices are stored
+      using column major layout, ``ld_in`` must be at least ``m``. If matrices
+      are stored using row major layout, ``ld_in`` must be at least ``n``.
+      Must be positive.
+
+   ld_out
+      Leading dimension of the matrix ``C`` on output. Must be positive.
+
+      .. list-table::
+         :header-rows: 1
+
+         * -
+           - ``trans`` = ``transpose::nontrans``
+           - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
+         * - Column major
+           - ``ld_out`` must be at least ``m``.
+           - ``ld_out`` must be at least ``n``.
+         * - Row major
+           - ``ld_out`` must be at least ``n``.
+           - ``ld_out`` must be at least ``m``.
+
+   dependencies
+      List of events to wait for before starting computation, if any.
+      If omitted, defaults to no dependencies.
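Because the USM variant reads and writes through the same pointer, it may help to see the row-major transposed case spelled out on a raw array. The following is a host-only sketch for real types under the constraints listed above; ``imatcopy_trans_row_major_ref`` is a hypothetical name, it involves no SYCL queue or events, and it uses a temporary snapshot that a real implementation need not use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference: row-major C <- alpha * C^T on a raw pointer.
// `c` must point to at least max(ld_in * m, ld_out * n) writable elements,
// with ld_in >= n and ld_out >= m, as required for the row-major transposed case.
template <typename T>
void imatcopy_trans_row_major_ref(std::int64_t m, std::int64_t n, T alpha,
                                  T *c, std::int64_t ld_in, std::int64_t ld_out) {
    assert(ld_in >= n && ld_out >= m && ld_in > 0 && ld_out > 0);
    // Snapshot the input footprint: m rows, each ld_in elements apart.
    const std::vector<T> in(c, c + m * ld_in);
    for (std::int64_t i = 0; i < m; ++i) {
        for (std::int64_t j = 0; j < n; ++j) {
            // Input element (i, j) becomes output element (j, i).
            c[j * ld_out + i] = alpha * in[i * ld_in + j];
        }
    }
}
```

With the 2 x 3 row-major input ``{1, 2, 3, 4, 5, 6}``, ``ld_in = 3``, ``ld_out = 2`` and ``alpha = 2``, the array becomes ``{2, 8, 4, 10, 6, 12}``, the scaled 3 x 2 transpose in row-major order.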
+
+.. container:: section
+
+   .. rubric:: Output Parameters
+
+   matrix_in_out
+      Pointer to output matrix ``C`` overwritten by ``alpha`` * op(``C``).
+
+.. container:: section
+
+   .. rubric:: Return Values
+
+   Output event to wait on to ensure computation is complete.
+
+.. container:: section
+
+   .. rubric:: Throws
+
+   This routine shall throw the following exceptions if the associated
+   condition is detected. An implementation may throw additional
+   implementation-specific exception(s) in case of error conditions
+   not covered here.
+
+   :ref:`oneapi::mkl::invalid_argument<onemkl_exception_invalid_argument>`
+
+
+   :ref:`oneapi::mkl::unsupported_device<onemkl_exception_unsupported_device>`
+
+
+   :ref:`oneapi::mkl::host_bad_alloc<onemkl_exception_host_bad_alloc>`
+
+
+   :ref:`oneapi::mkl::device_bad_alloc<onemkl_exception_device_bad_alloc>`
+
+
+   :ref:`oneapi::mkl::unimplemented<onemkl_exception_unimplemented>`
+
+
+**Parent topic:** :ref:`blas-like-extensions`
+
