Commit c61a4b6
[X86] Use shift+add/sub for vXi8 splat multiplies (llvm#174110)
Fixes llvm#164200
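As a rough illustration of the idea in the title: x86 has no packed 8-bit multiply instruction, so multiplying every byte lane by the same constant can be rewritten as shifts combined with an add or subtract. The sketch below models this on a plain 16-byte array. The names `V16i8`, `mulSplat`, `shlAdd`, and `shlSub` are hypothetical and are not taken from the LLVM sources; this is not the lowering code from the commit, only the arithmetic identity it relies on.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a v16i8 vector; not an LLVM type.
using V16i8 = std::array<std::uint8_t, 16>;

// Reference: multiply every lane by the splat constant c, wrapping modulo 256.
V16i8 mulSplat(const V16i8 &x, std::uint8_t c) {
  V16i8 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = static_cast<std::uint8_t>(x[i] * c);
  return r;
}

// x * (2^hi + 2^lo). With lo == 0 this is one shift plus an add (3 = 2 + 1);
// with lo > 0 the constant decomposes into two shifts (10 = 8 + 2).
V16i8 shlAdd(const V16i8 &x, unsigned hi, unsigned lo) {
  V16i8 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = static_cast<std::uint8_t>((x[i] << hi) + (x[i] << lo));
  return r;
}

// x * (2^hi - 2^lo), for example 7 = 8 - 1.
V16i8 shlSub(const V16i8 &x, unsigned hi, unsigned lo) {
  V16i8 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = static_cast<std::uint8_t>((x[i] << hi) - (x[i] << lo));
  return r;
}
```

For example, `shlAdd(x, 1, 0)` matches `mulSplat(x, 3)`, `shlSub(x, 3, 0)` matches `mulSplat(x, 7)`, and `shlAdd(x, 3, 1)` matches `mulSplat(x, 10)` for every byte value, because the truncation back to 8 bits wraps both sides modulo 256.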
The microbenchmark for this change is in llvm/llvm-test-suite#316.
In my experiments on an EC2 `c6i.4xl` instance, the change gives a small
improvement on the `x86-64`, `x86-64-v2`, and `x86-64-v3` targets. It
regresses performance on `x86-64-v4`, in particular when the constant
decomposes into two shifts. The performance summary follows:
```
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v1.json results-opt-generic-v1.json |tail -n1
OVERALL_GEOMEAN -0.2846 -0.2846 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v2.json results-opt-generic-v2.json |tail -n1
OVERALL_GEOMEAN -0.0907 -0.0907 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v3.json results-opt-generic-v3.json |tail -n1
OVERALL_GEOMEAN -0.1821 -0.1821 0 0 0 0
$ ../MicroBenchmarks/libs/benchmark/tools/compare.py benchmarks results-baseline-generic-v4.json results-opt-generic-v4.json |tail -n1
OVERALL_GEOMEAN +0.0190 +0.0190 0 0 0 0
```
1 parent 1ca5045 commit c61a4b6
File tree: 2 files changed, +1781 -0 lines changed
- llvm
  - lib/Target/X86
  - test/CodeGen/X86