feat(smp): re-add an anti-spurious-wakeup feature #2470
Force-pushed from 54cfef7 to b69fae2.
Benchmark Results
| Benchmark | Current: c88e434 | Previous: 30cb3f9 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 77.61 s | 78.59 s | 0.99 ❗ |
| startup_benchmark File Size | 0.76 MB | 0.76 MB | 1.00 ❗ |
| Startup Time - 1 core | 0.74 s (±0.02 s) | 0.73 s (±0.02 s) | 1.02 |
| Startup Time - 2 cores | 0.73 s (±0.02 s) | 0.75 s (±0.02 s) | 0.98 |
| Startup Time - 4 cores | 0.75 s (±0.02 s) | 0.76 s (±0.02 s) | 1.00 |
| multithreaded_benchmark Build Time | 80.67 s | 80.39 s | 1.00 ❗ |
| multithreaded_benchmark File Size | 0.82 MB | 0.82 MB | 1.01 ❗ |
| Multithreaded Pi Efficiency - 2 Threads | 90.93 % (±6.08 %) | 89.59 % (±5.95 %) | 1.02 |
| Multithreaded Pi Efficiency - 4 Threads | 44.49 % (±3.62 %) | 43.86 % (±2.44 %) | 1.01 |
| Multithreaded Pi Efficiency - 8 Threads | 26.16 % (±1.37 %) | 25.65 % (±1.39 %) | 1.02 |
| micro_benchmarks Build Time | 87.03 s | 87.27 s | 1.00 ❗ |
| micro_benchmarks File Size | 0.83 MB | 0.82 MB | 1.01 ❗ |
| Scheduling time - 1 thread | 59.80 ticks (±1.77 ticks) | 64.58 ticks (±2.95 ticks) | 0.93 ❗ |
| Scheduling time - 2 threads | 33.31 ticks (±2.54 ticks) | 35.27 ticks (±2.44 ticks) | 0.94 |
| Micro - Time for syscall (getpid) | 2.80 ticks (±0.20 ticks) | 2.72 ticks (±0.18 ticks) | 1.03 |
| Memcpy speed - (built_in) block size 4096 | 84981.71 MByte/s (±58667.63 MByte/s) | 84336.36 MByte/s (±58124.47 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 30873.90 MByte/s (±25037.42 MByte/s) | 30954.90 MByte/s (±25149.11 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 27912.52 MByte/s (±23154.88 MByte/s) | 27618.34 MByte/s (±22854.11 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 4096 | 85091.75 MByte/s (±58745.19 MByte/s) | 84961.36 MByte/s (±58506.83 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 1048576 | 31757.79 MByte/s (±25600.09 MByte/s) | 31697.81 MByte/s (±25573.78 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 28360.23 MByte/s (±23342.98 MByte/s) | 28408.60 MByte/s (±23353.48 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 4096 | 75285.07 MByte/s (±52498.42 MByte/s) | 74740.31 MByte/s (±52172.06 MByte/s) | 1.01 |
| Memcpy speed - (rust) block size 1048576 | 31130.50 MByte/s (±25332.13 MByte/s) | 30989.85 MByte/s (±25131.37 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 27938.15 MByte/s (±23241.99 MByte/s) | 27766.19 MByte/s (±22932.85 MByte/s) | 1.01 |
| Memset speed - (rust) block size 4096 | 75500.38 MByte/s (±52636.73 MByte/s) | 75196.89 MByte/s (±52449.92 MByte/s) | 1.00 |
| Memset speed - (rust) block size 1048576 | 31927.34 MByte/s (±25795.65 MByte/s) | 31748.80 MByte/s (±25574.00 MByte/s) | 1.01 |
| Memset speed - (rust) block size 16777216 | 28478.15 MByte/s (±23491.83 MByte/s) | 28552.59 MByte/s (±23427.10 MByte/s) | 1.00 |
| alloc_benchmarks Build Time | 80.85 s | 81.63 s | 0.99 ❗ |
| alloc_benchmarks File Size | 0.84 MB | 0.84 MB | 1.00 ❗ |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 4686.19 Ticks (±1512.86 Ticks) | 5722.73 Ticks (±63.96 Ticks) | 0.82 |
| Allocations - Average Allocation time (no fail) | 4686.19 Ticks (±1512.86 Ticks) | 5722.73 Ticks (±63.96 Ticks) | 0.82 |
| Allocations - Average Deallocation time | 860.91 Ticks (±149.07 Ticks) | 1530.42 Ticks (±212.44 Ticks) | 0.56 ❗ |
| mutex_benchmark Build Time | 103.71 s | 80.99 s | 1.28 ❗ |
| mutex_benchmark File Size | 0.83 MB | 0.82 MB | 1.01 ❗ |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 12.02 ns (±0.32 ns) | 12.10 ns (±0.36 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 13.66 ns (±0.51 ns) | 17.00 ns (±3.01 ns) | 0.80 ❗ |
This comment was automatically generated by workflow using github-action-benchmark.
Force-pushed from b69fae2 to 8e50433.
This one has tons of comments to convince myself it works. I am relatively convinced, but concurrent processing is hard, you know?
Force-pushed from 8e50433 to c88e434.
mkroening left a comment
Thanks for this PR! :)
This does fix the laplace performance issue on my nested Intel VM (Hermit VM started with 4 cores).
Regarding the code:
- Could you move this into a `sleep_state.rs` file and call it `SleepState`? That way, the gates should be less scattered, and the name might be clearer.
- Please also don't use `#[inline(always)]`. For reference, see "When to `#[inline]`" in the Standard library developers Guide. `#[inline]` is okay if the function is very small.
- Regarding the atomics: relaxed ordering should be fine here, no? We don't use the atomics to build a synchronization primitive that guards memory access; we are only interested in the values to make decisions.
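To make the three review points concrete, here is a minimal sketch of what such a type could look like. It is not the code from this PR: the `SleepState` layout, the `State` enum, and the method names `enter_sleep` and `request_wakeup` are all hypothetical, and the sketch uses `std` rather than the kernel's `no_std` environment so that it runs on its own. It also assumes the reviewer's premise holds, namely that any data the wakeup decision depends on is synchronized by other means, so the flag itself only needs `Ordering::Relaxed`.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical per-core state; the variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum State {
    Awake = 0,
    Sleeping = 1,
}

/// Hypothetical `SleepState`, kept in one place (e.g. `sleep_state.rs`).
pub struct SleepState(AtomicU8);

impl SleepState {
    pub const fn new() -> Self {
        Self(AtomicU8::new(State::Awake as u8))
    }

    /// Marks the core as sleeping. Plain `#[inline]` because the body is
    /// tiny; `#[inline(always)]` is deliberately avoided.
    #[inline]
    pub fn enter_sleep(&self) {
        // Assumption: Relaxed is enough because only the flag value matters.
        self.0.store(State::Sleeping as u8, Ordering::Relaxed);
    }

    /// Marks the core as awake and reports whether it was sleeping before.
    /// A caller would send a wakeup only when this returns `true`, which
    /// avoids waking a core that is already running.
    #[inline]
    pub fn request_wakeup(&self) -> bool {
        self.0.swap(State::Awake as u8, Ordering::Relaxed) == State::Sleeping as u8
    }
}

fn main() {
    let state = SleepState::new();
    println!("wakeup needed while awake: {}", state.request_wakeup());
    state.enter_sleep();
    println!("wakeup needed while sleeping: {}", state.request_wakeup());
}
```

The atomic `swap` means that of several concurrent wakers only one observes `Sleeping`, so at most one wakeup is sent per sleep. Whether relaxed ordering is actually sufficient depends on how the rest of the scheduler orders its queue accesses relative to this flag, which this sketch does not model.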
As a follow-up to #2468, try to re-implement the logic I removed previously, but this time avoiding any potential race condition (hopefully??).