[pauthabielf64] Define R_AARCH64_AUTH_TLSDESC_CALL.#395
Conversation
The TLSDESC sequence for accessing an authenticated pointer is similar to the traditional TLSDESC sequence. Pointer signing must be done at run-time, so there are far fewer opportunities to relax a TLSDESC AUTH sequence. We therefore introduce an R_AARCH64_AUTH_TLSDESC_CALL relocation that permits a static linker to transform the blraa into a nop when relaxation is possible.
A new relocation code has been introduced rather than reusing R_AARCH64_TLSDESC_CALL. This lets a static linker assume that the destination symbol is signed without having to derive it from other R_AARCH64_TLSDESC_* relocations.
TLSDESC sequence
adrp x0, :tlsdesc:v                   // R_AARCH64_TLSDESC_ADR_PAGE21
ldr  x1, [x0, #:tlsdesc_lo12:v]       // R_AARCH64_TLSDESC_LD64_LO12
add  x0, x0, #:tlsdesc_lo12:v         // R_AARCH64_TLSDESC_ADD_LO12
.tlsdesccall v                        // R_AARCH64_TLSDESC_CALL
blr  x1
TLSDESC AUTH sequence
adrp x0, :tlsdesc_auth:v              // R_AARCH64_AUTH_TLSDESC_ADR_PAGE21
ldr  x16, [x0, #:tlsdesc_auth_lo12:v] // R_AARCH64_AUTH_TLSDESC_LD64_LO12
add  x0, x0, #:tlsdesc_auth_lo12:v    // R_AARCH64_AUTH_TLSDESC_ADD_LO12
.tlsdescauthcall v                    // R_AARCH64_AUTH_TLSDESC_CALL
blraa x16, x0
fixes ARM-software#393
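The following is a minimal C model of what both TLSDESC sequences compute, intended to make them easier to compare. It is a sketch under stated assumptions. The struct name `tls_descriptor`, the helper names, and passing TP in as a parameter are all hypothetical. Pointer signing is not modelled. The only AUTH difference shown is in a comment: the resolver word is loaded into x16 and called with `blraa x16, x0`, which authenticates it using the descriptor address in x0 as the modifier.

```c
#include <stdint.h>

/* Hypothetical model of a TLS descriptor: a pair of GOT words.
   Word 0 holds the resolver; word 1 is private to the resolver. */
struct tls_descriptor {
    uintptr_t (*resolver)(const struct tls_descriptor *desc);
    uintptr_t arg;
};

/* Example resolver for a variable whose TP offset is already known:
   it simply returns the offset stored in the second word. */
static uintptr_t static_offset_resolver(const struct tls_descriptor *desc) {
    return desc->arg;
}

/* Models the whole access:
     adrp + add     -> x0 = &desc
     ldr            -> x1 (x16 in the AUTH sequence) = desc->resolver
     blr x1         -> x0 = resolver(&desc)
                       (AUTH: blraa x16, x0 authenticates x16 with x0 as the
                        modifier; this model performs no authentication)
     mrs TPIDR_EL0, then add -> address = TP + x0 */
static uintptr_t tlsdesc_address(uintptr_t tp, const struct tls_descriptor *desc) {
    uintptr_t offset = desc->resolver(desc);
    return tp + offset;
}
```

For example, a descriptor `{ static_offset_resolver, 16 }` with TP = 0x10000 yields the address 0x10010.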
Thanks @smithp35! Just for transparency, PRs implementing support in llvm-project:
ldr x16, [x0, :tlsdesc_auth_lo12:undefined_weak]  // R_AARCH64_AUTH_TLSDESC_LD64_LO12
add x0, x0, :tlsdesc_auth_lo12:undefined_weak     // R_AARCH64_AUTH_TLSDESC_ADD_LO12
.tlsdescauthcall undefined_weak                   // R_AARCH64_AUTH_TLSDESC_CALL
autia x0, x8
#393 (comment) gives this as blraa x16, x0 instead?
Thanks for spotting. Will update.
// After relaxation, assuming undefined_weak is known to be 0 at static-link time.
mov x0, #0x0
If the resolver is returning a (potentially between distinct allocations) offset from TP, wouldn't this cause a change in behaviour from giving "NULL" to giving "TP"? At least within FreeBSD our normal AArch64 TLSDESC resolver for undefined weak symbols is to return -TP(+A) so adding it to TP gives NULL(+A). I know undefined weak TLS objects are historically very cursed and break in all kinds of ways, but this would be a regression over non-PAuth TLSDESC, I think.
(This also loses the addend entirely)
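The behaviour difference raised above can be sketched in C. This is a simplified model, not the FreeBSD, glibc, or musl code. The function names are hypothetical, TP is passed as a parameter instead of being read from TPIDR_EL0, and the addend A is passed directly. The sketch shows that a resolver returning -TP + A makes the final TP + offset equal NULL + A. Relaxing the call to `mov x0, #0x0` instead produces TP itself and drops A.

```c
#include <stdint.h>

/* Resolver convention for an undefined weak TLS symbol, as described in the
   thread: return -TP + A. Unsigned arithmetic wraps modulo 2^N, matching the
   AArch64 add that follows the call. */
static uintptr_t undef_weak_offset(uintptr_t tp, uintptr_t addend) {
    return addend - tp;
}

/* The instructions after the call: mrs x8, TPIDR_EL0, then TP + offset. */
static uintptr_t tp_plus_offset(uintptr_t tp, uintptr_t offset) {
    return tp + offset;
}

/* The proposed relaxation: the whole sequence becomes mov x0, #0x0. */
static uintptr_t relaxed_offset(void) {
    return 0;
}
```

With TP = 0x10000, the resolver path gives 0 (or 8 for A = 8), while the relaxed path gives 0x10000.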
This is a good point, @kovdan01 is your TLSDESC resolver function for an undefined weak capable of returning -TP? I believe AArch64 glibc does this too for undefined weak TLS symbols.
I think this would be preferable to relaxing the sequence to 0 as currently that 0 would get added to TP, which would not result in a value of 0 for an undefined weak.
There's always the possibility of altering the TLSDESC sequence to check for 0 before adding TP, as 0 isn't a valid offset (the first offset is the TCB size plus alignment padding, which is a minimum of 16). However, this seems worse than the resolver.
As an aside, I've not been able to create a TLSDESC sequence with a non-zero addend from some simple compiled code. The TLSDESC sequence always seems to calculate the address of the symbol and then do an addition, or a load with an immediate offset, instead.
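For comparison, the alternative mentioned above (checking for 0 before adding TP) might look roughly like this. It is a hypothetical sketch only, and no such sequence is proposed in the PR. It relies on 0 never being a valid TP offset, because the first offset is at least the TCB size plus padding (a minimum of 16).

```c
#include <stdint.h>

/* Hypothetical alternative: treat an offset of 0 as "undefined weak" and
   yield NULL instead of TP. Every valid offset is >= 16, so 0 is free to
   use as a marker. */
static uintptr_t tls_address_checked(uintptr_t tp, uintptr_t offset) {
    if (offset == 0)
        return 0;
    return tp + offset;
}
```

With this check in place, a relaxed `mov x0, #0x0` would produce NULL rather than TP. The cost is presumably an extra compare and branch in every TLS access sequence, and an addend still cannot be represented.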
Oh, that may be true in practice, just like GOT entries typically don't actually have addends (except when turned to relative ones, of course) as it's annoying to have potentially multiple entries per symbol to track in the linker.
@kovdan01 is your TLSDESC resolver function for an undefined weak capable of returning -TP? I believe AArch64 glibc does this too for undefined weak TLS symbols.
@smithp35 Yes, glibc does this, and we match that behavior in our PoC reference musl implementation - see https://github.com/access-softek/musl/blob/v1.2.5-pauth-rev2025-11-21/src/ldso/aarch64/tlsdesc.S#L63-L70
The R_AARCH64_AUTH_TLSDESC_CALL is introduced to allow linker relaxation of AUTH TLSDESC call sequences for non-preemptible undefined weak symbols. The lld patch introducing the relaxation: #194636 Corresponding ARM docs PR: ARM-software/abi-aa#395
…reloc See specification ARM-software/abi-aa#395
* Use blraa as in the commit message.
* Mention that a value of 0, when added to the thread pointer, is invalid.
I've fixed the typos and added a note that, when 0 is added to the TP, the result points at the thread control block. Do we still need this? EDIT: yes, as other platforms may permit other relaxations.
Reading through https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt again, there is a relaxable sequence that could be used when static linking (so we know that the weak reference won't be defined). In effect, this inlines the resolver function that returns -TP.
.. note::
Relocation code ``R_AARCH64_AUTH_TLSDESC_CALL`` is needed to permit
I think I'll move the details about the relaxation to a separate document in the design-documents folder.
I think it is a platform-specific choice whether both fields of the TLS descriptor are signed. If only the resolver function address is signed, then more relaxations are possible.
It is likely to be next week before I can do that.
Well, it's not about whether the second word of the descriptor is signed. The generated code never uses that, either it passes a pointer to it as an opaque blob to the resolver or it relaxes the entire sequence so there is no descriptor to sign? The question is whether the TLS data is being signed like globals can be, and therefore whether &tls_var - TP is the same for all threads or differs in the high bits due to signing. If the data isn't signed, you can just relax to that constant (which will be the same as non-PAuth, and is either an LE immediate or an IE run-time constant), it's only if the data is signed that IE/LE are fundamentally broken?
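The following is a toy illustration of the distinction drawn above. The `toy_sign` function is a made-up stand-in that copies address-derived bits into the top 16 bits. It is NOT the real pointer authentication algorithm, and all names here are hypothetical. The point is narrow: the unsigned `&tls_var - TP` is the same for every thread. If the signature depends on the full address, however, `signed(&tls_var) - TP` differs between threads whose TLS blocks sit at different addresses, so it cannot be a single IE/LE constant.

```c
#include <stdint.h>

/* Toy signing: copy bits 4..19 of the address into bits 48..63.
   NOT real PAC; used only to make the "signature" depend on the address. */
static uint64_t toy_sign(uint64_t addr) {
    return addr | (((addr >> 4) & 0xFFFFu) << 48);
}

/* Offset of a TLS variable from a thread's TP when the data is unsigned. */
static uint64_t plain_offset(uint64_t tp, uint64_t var_addr) {
    return var_addr - tp;
}

/* The same offset if the variable's address were signed (toy model). */
static uint64_t signed_offset(uint64_t tp, uint64_t var_addr) {
    return toy_sign(var_addr) - tp;
}
```

For two threads with TP = 0x10000 and TP = 0x20000 and a variable at offset 0x20, `plain_offset` is 0x20 in both cases. The toy `signed_offset` values differ only in their top 16 bits.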
Thanks for the comment. I'll probably just take the rationale/relaxation bits out of the main document for now until I've got the time to work through this slowly.
Reading through the initial issue again #393. I think I've put too much weight on the comment:
Broader relaxations (such as GD->IE or GD->LE with non-statically-known-NULL symbols) are not possible because pointer authentication requires signing at program start-up.
In my haste, I missed the "non-statically-known-NULL symbols" part when reading it. I had been trying to reconcile why other relaxations weren't possible with the code sequence the compiler uses for TLSDESC and a signed GOT.
As an aside:
Empirically using: clang --target=aarch64-linux -march=armv8.3-a -S -O2 tlsdesc.c -o - -fptrauth-elf-got -mabi=pauthtest with a trivial __thread int x; int val() { return x; }
I get:
pacibsp
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
adrp x0, :tlsdesc_auth:x
ldr x16, [x0, :tlsdesc_auth_lo12:x]
add x0, x0, :tlsdesc_auth_lo12:x
blraa x16, x0
mrs x8, TPIDR_EL0
ldr w0, [x8, x0]
ldp x29, x30, [sp], #16 // 16-byte Folded Reload
retab
If I'm reading that correctly, the return value from the TLS resolver function isn't signed. Nor is the value of x. It looks like the only thing that is signed is the descriptor in the GOT.
That looks relaxable in principle, although perhaps not to initial exec in practice. If it were initial exec, I'd expect the (&tls_var - TP) value to be signed in the GOT. There are enough spare instructions and registers to extract the unsigned (&tls_var - TP). However, there aren't enough spare instructions to test whether the authentication failed, which I believe is a requirement for -fptrauth-traps (for systems without FEAT_FPAC).
* Clarify that TLSDESC refers to the dialect.
* Clarify that signing the parameter to a resolver function is a contract between the dynamic linker and the resolver function.
I've split out the commentary/rationale about TLS into a separate design document. It is based on my understanding of how TLS is used today and what could be done in a different signing-schema (defined separately from the PAuthABI).
jrtc27 left a comment
Thanks, a lot clearer to me now as to what this is actually trying to achieve and how. Some comments inline.
dialects. The "traditional" dialect and the "descriptor" dialect. In
the traditional dialect global and local dynamic TLS use the
``R_<CLS>_TLSGD`` and ``R_<CLS>_TLSLD`` prefixed relocations. These
create a pair of GOT entries relocated by ``R_<CLS>_TLS_DTPMOD``. In
TLS are handled the same way in both dialects.
The `PAUTHABIELF64`_ only supports the descriptor based dialect,
primarily because clang only supports the "descriptor" based dialect.
I think a stronger reason these days is that the traditional dialect is legacy, so a new ABI should just follow the new approach? Clang supports the traditional dialect for a bunch of architectures, so adding it for AArch64 would be straightforward.
defined for TLSDESC, but not for Initial Exec.
Local exec TLS does not use the GOT so it can be handled by the
``R_<CLS>_TLSLE`` prefixed relocations defined in `AAELF64`_.
Maybe reiterate "from the base ABI" like a few paragraphs above to be clearer it's unchanged?
signing-schema for the platform. For example a signing-schema may only
sign GOT entries containing code-pointers, which would permit Initial
Exec TLS using the ``R_<CLS>_TLSIE`` prefixed relocations defined in
`AAELF64`_. Alternatively a signing-schema may sign all GOT entries.
"..., which would require AUTH variant static and dynamic relocations to be defined for Initial Exec"?
ACK, have added that in.
The static linker may relax a more general TLS model to a more
constrained model when TLS variables meet the requirements for using
the constrained model, and the relaxed sequence is permitted by the
I feel this comma hurts legibility, as the "and" relates to the "when" rather than the whole clause? At least, I had to backtrack in my head to parse it properly.
.. code
adrp x0, :gottprel:v // R_AARCH64_AUTH_TLSIE_ADR_GOTTPREL_PAGE21 v
For this hypothetical support would you not need auth somewhere in the modifiers(?) to get the right relocation, like :got_auth(_lo12): (presumably :gottprel_auth(_lo12):)?
Yes, I've added in auth to match the other ones.
linker optimization of TLS descriptor code sequences involving
authenticated pointers, when undefined weak non-preemptible symbols
are known to resolve to 0; this can only be done if all relevant uses
of TLS descriptors are marked to permit accurate relaxation.
This should probably have a very brief outline of the cases discussed in the design doc and refer to it, rather than only list undefined weak non-preemptible symbols as the case that can be relaxed?
I've removed the part about undefined weak, thought I'd done that already but my eyes were deceiving me.
I've replaced it with a reference to the design document.
* Add DTPREL dynamic relocation for traditional dialect.
* Emphasised traditional dialect is legacy.
* Removed comma making sentence difficult to parse.
* Added auth to theoretical initial exec.
* Reference design doc from main PAuthABI, further simplifying the section on relaxation.
smithp35 left a comment
Thanks very much for the review, and apologies for the delay in responding. I'll upload a new version.
Relocation code ``R_AARCH64_AUTH_TLSDESC_CALL`` is needed to permit
linker optimization of TLS descriptor code sequences involving
signed GOT entries. Further information, including possible
relaxations is available in the `PAUTHABITLS`_ design document.
nit:
- relaxations is available in the `PAUTHABITLS`_ design document.
+ relaxations, is available in the `PAUTHABITLS`_ design document.
ACK, have adopted.