compiler: Avoid int32 overflow in linearized host-device transfer size by gaoflow · Pull Request #2939 · devitocodes/devito

gaoflow · 2026-05-29T02:07:52Z

Description

When a host↔device data transfer is linearized, its array section size is emitted as a product of the Function's per-dimension sizes, for example:

#pragma acc enter data copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])

The u_vec->size[i] fields are 32-bit C ints, so the product size[0]*size[1]*size[2]*size[3] is evaluated in 32-bit arithmetic. For a Function with more than ~2**31 elements (e.g. the reporter's 1295**3 ≈ 2.17e9 points, ~24.5 GB) the product overflows int before it is used as the transfer bound, producing a bogus size (the reporter saw 18446744065653020036) and a corrupt / failed device transfer — independent of index-mode=int64/linearize=True, because those control the kernel index type, not the type of the transfer-clause arithmetic.

As @mloubout noted on the issue, the fix is to perform the size multiplication in 64-bit. This casts each factor of a product section bound to a 64-bit integer:

#pragma acc enter data copyin(u[0:(long)(u_vec->size[0])*(long)(u_vec->size[1])*(long)(u_vec->size[2])*(long)(u_vec->size[3])])

Casting the whole product ((long)(a*b*c)) would be too late — the overflow would already have happened in 32-bit — so each factor is cast individually, which forces every multiplication to be 64-bit regardless of operand ordering.

The change lives in PragmaTransfer._generate, so it is scoped to host-device transfer clauses only. Non-product bounds (a single dimension size, an offset, a constant) cannot overflow and are left untouched, addressing the concern that there is "no reason to use long for all of those". Non-transfer expressions (e.g. free-space guards, TMA descriptors) are unaffected.

Reproduction

from devito import Eq, Grid, Operator, TimeFunction

grid = Grid(shape=(4, 5, 6))
u = TimeFunction(name='u', grid=grid)
op = Operator(Eq(u.forward, u + 1), platform='nvidiaX', language='openacc',
              opt=('advanced', {'linearize': True}))
print(op.body.maps[0].ccode.value)

Before:

acc enter data copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])

After:

acc enter data copyin(u[0:(long)(u_vec->size[0])*(long)(u_vec->size[1])*(long)(u_vec->size[2])*(long)(u_vec->size[3])])

Verification

The fix applies to both backends (openacc copyin/copyout/delete and openmp map(to:/release:)) and to 2D/3D Functions.
The non-linearized transfer path (separate per-dimension sections [0:s0][0:s1]...) is unchanged — there is no product there, hence no overflow.
Added TestPassesOptional::test_linearize_transfer_no_overflow asserting each size[i] factor of a linearized transfer is cast to long and that no bare 32-bit product remains.
Updated the existing test_gpu_openmp.py expectations (test_basic, test_multiple_eqns) whose OpenMP transfers use the flattened product form.
Host (CPU) operators, including linearize=True, build and run unchanged (no device transfers emitted). flake8 clean on the changed files.

Note: the GPU test modules are skipif(['nodevice']), so the codegen assertions run on the GPU CI runners. They were validated locally by forcing platform='nvidiaX'.

When a host-device data transfer is linearized, its array section size is emitted as a product of the Function's per-dimension sizes, e.g. `copyin(u[0:u_vec->size[0]*u_vec->size[1]*u_vec->size[2]*u_vec->size[3]])`. The `size[i]` fields are 32-bit C ints, so for a Function with more than ~2**31 elements the product overflows `int` before it is used as the transfer bound, yielding a bogus size and a corrupt/failed device transfer. Cast each factor of the product to a 64-bit integer so the multiplication is carried out in 64-bit arithmetic. Casting the whole product would be too late (the overflow would already have occurred), so each factor is cast individually. Non-product bounds (a single size, an offset, a constant) cannot overflow and are left untouched, as are non-transfer expressions. Fixes devitocodes#2777

FabioLuporini · 2026-05-29T08:31:09Z

        return self._make_parallel(graph, sync_mapper=graph.sync_mapper)


+def _avoid_overflow(expr):


there's an as_long helper in symbolics/manipulation.py that you should use instead of this function (which, btw, wouldn't even logically belong to this module -- besides its implementation being quite hacky)

Thanks! Done — switched to as_long and dropped _avoid_overflow.

One thing worth flagging: as_long as written was a no-op here. The transfer size factors (vec->size[i]) are IndexedPointers, but as_long substituted only what retrieve_symbols returns (plain Symbols), so the product came out un-cast and the overflow remained. I changed as_long to use retrieve_terminals instead, so Indexed/IndexedPointer leaves are cast too (single-Symbol callers are unaffected, and numeric literals aren't touched). I kept the call scoped to Mul products in PragmaTransfer._generate so non-linearized multi-dimensional sections aren't needlessly upcast. test_symbolics/test_gpu_common green locally; the OpenMP exact-string expectations were updated for the new factor order.

Address review: replace the ad-hoc _avoid_overflow helper with the existing as_long. as_long only substituted plain Symbols (retrieve_symbols), so it was a no-op on the IndexedPointer size factors (vec->size[i]) of a linearized transfer bound; extend it to retrieve_terminals so Indexed/IndexedPointer leaves are cast too. Keep the cast scoped to Mul products in PragmaTransfer so non-linearized multi-dimensional sections are not needlessly upcast.

FabioLuporini requested changes May 29, 2026

View reviewed changes

mloubout added the compiler label May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: Avoid int32 overflow in linearized host-device transfer size#2939

compiler: Avoid int32 overflow in linearized host-device transfer size#2939
gaoflow wants to merge 2 commits into
devitocodes:mainfrom
gaoflow:fix-2777-transfer-size-overflow

gaoflow commented May 29, 2026

Uh oh!

FabioLuporini May 29, 2026

Uh oh!

gaoflow May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		return self._make_parallel(graph, sync_mapper=graph.sync_mapper)


		def _avoid_overflow(expr):

Conversation

gaoflow commented May 29, 2026

Description

Reproduction

Verification

Uh oh!

FabioLuporini May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gaoflow May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants