+ "details": "### Summary\n\nA timing side-channel was discovered in the Decompose algorithm, which is used during ML-DSA signing to generate hints for the signature.\n\n### Details\n\nThe analysis was performed using a constant-time analyzer that examines compiled assembly code for instructions with data-dependent timing behavior. The analyzer flags:\n\n- **UDIV/SDIV instructions**: Hardware division instructions have early-termination optimizations, so their execution time depends on operand values.\n\nThe `decompose` function used a hardware division instruction to compute `r1.0 / TwoGamma2::U32`. This function is called during signing through `high_bits()` and `low_bits()`, which process values derived from secret key components:\n\n- `(&w - &cs2).low_bits()`, where `cs2` is derived from secret key component `s2`\n- `Hint::new()`, which calls `high_bits()` on values derived from secret key component `t0`\n\n**Original Code**:\n```rust\nfn decompose<TwoGamma2: Unsigned>(self) -> (Elem, Elem) {\n    // ...\n    let mut r1 = r_plus - r0;\n    r1.0 /= TwoGamma2::U32; // Variable-time division on secret-derived data\n    (r1, r0)\n}\n```\n\n### PoC\n\nI do not currently have an exploit for this.\n\n### Impact\n\nThe dividend (`r1.0`) is derived from secret key material. An attacker able to take precise timing measurements could extract information about the signing key by observing timing variations in the division operation.\n\n### Mitigation\n\nReplacing the division with a constant-time, Barrett-style multiplication by a precomputed reciprocal mitigates this risk. 
Since `TwoGamma2` is a compile-time constant, we precompute the multiplicative inverse:\n\n```patch\ndiff --git a/ml-dsa/src/algebra.rs b/ml-dsa/src/algebra.rs\nindex 559b68a..bb126ce 100644\n--- a/ml-dsa/src/algebra.rs\n+++ b/ml-dsa/src/algebra.rs\n@@ -54,8 +54,50 @@ pub(crate) trait Decompose {\n fn decompose<TwoGamma2: Unsigned>(self) -> (Elem, Elem);\n }\n \n+/// Constant-time division by a compile-time constant divisor.\n+///\n+/// This trait provides a constant-time alternative to the hardware division\n+/// instruction, which has variable timing based on operand values.\n+/// Uses Barrett reduction to compute `x / M` where M is a compile-time constant.\n+pub(crate) trait ConstantTimeDiv: Unsigned {\n+ /// Bit shift for Barrett reduction, chosen to provide sufficient precision\n+ const CT_DIV_SHIFT: usize;\n+ /// Precomputed multiplier: ceil(2^SHIFT / M)\n+ const CT_DIV_MULTIPLIER: u64;\n+\n+ /// Perform constant-time division of x by Self::U32\n+ /// Requires: x < Q (the field modulus, ~2^23)\n+ #[inline(always)]\n+ fn ct_div(x: u32) -> u32 {\n+ // Barrett reduction: q = (x * MULTIPLIER) >> SHIFT\n+ // This gives us floor(x / M) for x < 2^SHIFT / MULTIPLIER * M\n+ let x64 = u64::from(x);\n+ let quotient = (x64 * Self::CT_DIV_MULTIPLIER) >> Self::CT_DIV_SHIFT;\n+ quotient as u32\n+ }\n+}\n+\n+impl<M> ConstantTimeDiv for M\n+where\n+ M: Unsigned,\n+{\n+ // Use a shift that provides enough precision for the ML-DSA field (Q ~ 2^23)\n+ // We need SHIFT > log2(Q) + log2(M) to ensure accuracy\n+ // With Q < 2^24 and M < 2^20, SHIFT = 48 is sufficient\n+ const CT_DIV_SHIFT: usize = 48;\n+\n+ // Precompute the multiplier at compile time\n+ // We add (M-1) before dividing to get ceiling division, ensuring we never underestimate\n+ #[allow(clippy::integer_division_remainder_used)]\n+ const CT_DIV_MULTIPLIER: u64 = ((1u64 << Self::CT_DIV_SHIFT) + M::U64 - 1) / M::U64;\n+}\n+\n impl Decompose for Elem {\n // Algorithm 36 Decompose\n+ //\n+ // This implementation uses 
constant-time division to avoid timing side-channels.\n+ // The original algorithm used hardware division which has variable timing based\n+ // on operand values, potentially leaking secret information during signing.\n fn decompose<TwoGamma2: Unsigned>(self) -> (Elem, Elem) {\n let r_plus = self.clone();\n let r0 = r_plus.mod_plus_minus::<TwoGamma2>();\n@@ -63,8 +105,9 @@ impl Decompose for Elem {\n if r_plus - r0 == Elem::new(BaseField::Q - 1) {\n (Elem::new(0), r0 - Elem::new(1))\n } else {\n- let mut r1 = r_plus - r0;\n- r1.0 /= TwoGamma2::U32;\n+ let diff = r_plus - r0;\n+ // Use constant-time division instead of hardware division\n+ let r1 = Elem::new(TwoGamma2::ct_div(diff.0));\n (r1, r0)\n }\n }\n```\n\nSee our blog post on [how we avoided side-channels in our Go implementation of ML-DSA](https://blog.trailofbits.com/2025/11/14/how-we-avoided-side-channels-in-our-new-post-quantum-go-cryptography-libraries/) for more information.",
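The reciprocal-multiplication step in the patch above can be checked exhaustively, since the input range is small. The following is a minimal sketch, not the crate's code: `Divisor` and `matches_hardware_division` are hypothetical names, const generics stand in for the `Unsigned` type parameter, and the two divisors are assumed to be the `2 * gamma2` values `(Q - 1) / 44` and `(Q - 1) / 16`. It reuses the patch's shift of 48 and ceiling multiplier, and compares the result with ordinary division for every `x < Q`.

```rust
/// ML-DSA field modulus.
const Q: u32 = 8_380_417;
/// Same shift as the patch uses.
const SHIFT: u32 = 48;

/// Hypothetical stand-in for the patch's `ConstantTimeDiv` trait,
/// using a const generic divisor instead of a typenum `Unsigned`.
struct Divisor<const M: u32>;

impl<const M: u32> Divisor<M> {
    /// ceil(2^SHIFT / M), evaluated at compile time on the public constant M.
    const MULTIPLIER: u64 = ((1u64 << SHIFT) + M as u64 - 1) / M as u64;

    /// Division by M using only a multiply and a shift.
    /// Precondition (assumed, as in the patch): x < Q, so the product fits in u64.
    fn ct_div(x: u32) -> u32 {
        ((u64::from(x) * Self::MULTIPLIER) >> SHIFT) as u32
    }
}

/// Compare against ordinary division for every possible field element.
/// The `/` here runs only in the check, never on secret data.
fn matches_hardware_division<const M: u32>() -> bool {
    (0..Q).all(|x| Divisor::<M>::ct_div(x) == x / M)
}

fn main() {
    // Assumed divisors: 2 * gamma2 = (Q - 1) / 44 and (Q - 1) / 16.
    assert!(matches_hardware_division::<190_464>());
    assert!(matches_hardware_division::<523_776>());
    println!("ct_div agrees with x / M for all x < Q");
}
```

An exhaustive check like this only confirms arithmetic correctness. Whether the compiled code is free of division instructions still has to be verified on the generated assembly, as the analyzer described in the advisory does.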