Commit c61a4b6

grodranlorthayasin-a
authored and committed
[X86] Use shift+add/sub for vXi8 splat multiplies (llvm#174110)
Fixes llvm#164200

~~I will create a separate PR to the `llvm-test-suite` repo for the microbenchmark for this change.~~ The benchmark is in llvm/llvm-test-suite#316

In my experiments on an EC2 `c6i.4xl`, the change gives a small improvement for the `x86-64`, `x86-64-v2`, and `x86-64-v3` targets. It regresses performance on `x86-64-v4` (in particular, when the constant decomposes into two shifts).

The performance summary follows:

```
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v1.json results-opt-generic-v1.json |tail -n1
OVERALL_GEOMEAN -0.2846 -0.2846 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v2.json results-opt-generic-v2.json |tail -n1
OVERALL_GEOMEAN -0.0907 -0.0907 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v3.json results-opt-generic-v3.json |tail -n1
OVERALL_GEOMEAN -0.1821 -0.1821 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v4.json results-opt-generic-v4.json |tail -n1
OVERALL_GEOMEAN +0.0190 +0.0190 0 0 0 0
```
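As a scalar illustration of the arithmetic behind the title, the sketch below checks that an 8-bit multiply by a constant of the form 2^m ± 2^n matches two shifts plus an add or sub, modulo 256. The function names `mulBy6`, `mulBy14`, and `matchesPlainMultiply` are hypothetical and exist only for this example. The actual lowering operates on vectors inside the backend, so this is not the code the compiler emits.

```cpp
#include <cstdint>

// Hypothetical helper: x * 6 computed as (x << 2) + (x << 1), since 6 = 2^2 + 2^1.
// The cast truncates the promoted int result back to 8 bits (arithmetic modulo 256).
static uint8_t mulBy6(uint8_t x) {
  return static_cast<uint8_t>((x << 2) + (x << 1));
}

// Hypothetical helper: x * 14 computed as (x << 4) - (x << 1), since 14 = 2^4 - 2^1.
static uint8_t mulBy14(uint8_t x) {
  return static_cast<uint8_t>((x << 4) - (x << 1));
}

// Exhaustively compare both decompositions against a plain 8-bit multiply.
static bool matchesPlainMultiply() {
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    if (mulBy6(x) != static_cast<uint8_t>(x * 6))
      return false;
    if (mulBy14(x) != static_cast<uint8_t>(x * 14))
      return false;
  }
  return true;
}
```

Because the results wrap modulo 256 in the same way on both sides, the identity holds for every input byte, which is why an exhaustive loop over 256 values is enough here.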
1 parent 1ca5045 commit c61a4b6

2 files changed: +1781 -0 lines changed

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 12 additions & 0 deletions
@@ -3497,6 +3497,18 @@ bool X86TargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT,
   if (!ISD::isConstantSplatVector(C.getNode(), MulC))
     return false;
 
+  if (VT.isVector() && VT.getScalarSizeInBits() == 8) {
+    // Check whether a vXi8 multiply can be decomposed into two shifts
+    // (decomposing 2^m ± 2^n as 2^(a+b) ± 2^b). Similar to
+    // DAGCombiner::visitMUL, consider the constant `2` decomposable as
+    // (2^0 + 1).
+    APInt ShiftedMulC = MulC.abs();
+    unsigned TZeros = ShiftedMulC == 2 ? 0 : ShiftedMulC.countr_zero();
+    ShiftedMulC.lshrInPlace(TZeros);
+    if ((ShiftedMulC - 1).isPowerOf2() || (ShiftedMulC + 1).isPowerOf2())
+      return true;
+  }
+
   // Find the type this will be legalized too. Otherwise we might prematurely
   // convert this to shl+add/sub and then still have to type legalize those ops.
   // Another choice would be to defer the decision for illegal types until
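
To make the added check easier to follow outside of LLVM, here is a minimal sketch of the same test on a plain 8-bit constant. It assumes the splat constant is an 8-bit signed value and uses ordinary integers in place of `APInt`. The names `isPowerOfTwo` and `isShiftAddDecomposable` are hypothetical and not part of any LLVM API.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when x has exactly one bit set.
static bool isPowerOfTwo(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

// Sketch of the vXi8 decomposability test from the hunk above.
static bool isShiftAddDecomposable(int8_t mulC) {
  // Magnitude of the constant; -128 maps to 128, as an 8-bit abs() would
  // when the result is read as unsigned.
  int v = mulC;
  unsigned c = static_cast<unsigned>(v < 0 ? -v : v);

  // Count trailing zero bits, except that the constant 2 is kept whole so it
  // is treated as (2^0 + 1), mirroring the special case in the patch.
  unsigned tz = 0;
  if (c != 2)
    while (c != 0 && ((c >> tz) & 1u) == 0)
      ++tz;

  // What remains must be one away from a power of two, i.e. the original
  // magnitude has the form 2^(a+b) + 2^b or 2^(a+b) - 2^b.
  unsigned shifted = c >> tz;
  return isPowerOfTwo(shifted - 1) || isPowerOfTwo(shifted + 1);
}
```

For instance, 12 = 2^3 + 2^2 and 7 = 2^3 - 2^0 pass the test, while 11 and 100 do not, because after stripping trailing zeros (11 and 25 respectively) neither neighbour is a power of two.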
