-
-
Notifications
You must be signed in to change notification settings - Fork 34.5k
gh-139109: A new tracing JIT compiler frontend for CPython #140310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 117 commits
795ef49
40bf6c1
13188a9
e63de39
fba9d2d
7192671
07542dd
021fc44
f886c43
3066963
20b283b
36554a5
92bba64
2e3ddc1
aada168
7b5c655
3e9f782
9a66605
108ab7f
fac8c74
fd3bb48
96b7bb2
396818b
57f417e
2c603cc
02f1fb4
dc414a3
0ffc2dd
2032b9c
299a068
8e0fb21
95eee89
6936a38
a274451
f55129e
71bd27b
e834c88
cae8f10
39bc819
ff92937
2589eb0
a2e92a6
cbb3ad2
9910b65
5102ab6
c0c14b4
7d4f866
1981f50
b879dab
093578c
d114944
e50ff65
54f6cd6
e3f18e6
608772f
1872715
460fb39
8d6f1db
a8762c2
72c2242
d76dc85
87c0b72
24cd7f9
8ae2e4c
f38ef69
960d647
1798ab1
681485f
9bb03a8
4b26cde
d820e22
00c81fa
ba64a5b
b00252e
754b3b7
6045a67
ec2971f
d49e367
55892a4
dd0e16f
d18c1a1
7d17741
8ebb6cb
c23e591
a62fe40
e4f1624
a7fcf24
eb73378
2b5fe3a
3385420
cedd7af
676faf8
cdcce30
1a3f129
e8fff00
7b2a8ca
abb1757
0fee4e9
ccc7893
eb970e0
abecfd6
8aeabd5
6bd1541
a53ca1d
ce66f3b
5733455
4aab2df
ab7527c
7e7b240
4a4a31f
72e1738
5e17707
1e132f0
bf17539
7ab76a8
86ab7f1
5f39672
440ad03
9cc7999
4b4e857
a5d918e
d601256
425fd51
1f8c3df
bdd2123
5f4f310
b5a9b07
8adaf4d
e918cb2
e9e2bb9
01c2d73
1d3aed1
7bfac26
ac0711d
da66058
692a992
84ee07b
72368ed
8e62fd1
92cc140
af9ea57
3ed1402
0a1bad2
0498568
d44ca1e
2dbc291
82ec8f1
f526688
c78a4e6
aa92d84
253f230
ffa2b72
897edf5
1ef4a37
0e92118
c75d91c
bf1ddab
a64215d
da5e7b7
83c85c3
6c77ee3
3fd3ab5
6429b2f
5cfc7a9
c5de275
b7b3c23
cda3dce
3f212a4
4f29dd3
5af4b0a
46c079a
4aeb4ab
fe3a6a1
10fa14a
0f978c4
3c80446
aaf6873
547f587
a55d766
278bbe6
f547880
7e2bc1d
fa3e285
251e19e
08ec600
f7c26d4
ae1d6fe
0658f1b
f8a764a
b72ab8b
8993e6c
fa45ae8
308ccc0
33cf287
3163d9b
658fd98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,8 +99,13 @@ backoff_counter_triggers(_Py_BackoffCounter counter) | |
| // Must be larger than ADAPTIVE_COOLDOWN_VALUE, otherwise when JIT code is | ||
| // invalidated we may construct a new trace before the bytecode has properly | ||
| // re-specialized: | ||
| #define JUMP_BACKWARD_INITIAL_VALUE 4095 | ||
| #define JUMP_BACKWARD_INITIAL_BACKOFF 12 | ||
| // Note: this should be a prime number-1. This increases the likelihood of | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this comment is true, then we should change the backoff counter to use a table lookup instead of using
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I manually verified it was true on the nqueens benchmark. it was the main reason why the perf used to be so bad for it. |
||
| // finding a "good" loop iteration to trace. | ||
| // For example, 4095 does not work for the nqueens benchmark on pyperformanc | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| // as we always end up tracing the loop iteration's | ||
| // exhaustion iteration. Which aborts our current tracer. | ||
| #define JUMP_BACKWARD_INITIAL_VALUE 4000 | ||
| #define JUMP_BACKWARD_INITIAL_BACKOFF 14 | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| static inline _Py_BackoffCounter | ||
| initial_jump_backoff_counter(void) | ||
| { | ||
|
|
@@ -112,7 +117,7 @@ initial_jump_backoff_counter(void) | |
| * Must be larger than ADAPTIVE_COOLDOWN_VALUE, | ||
| * otherwise when a side exit warms up we may construct | ||
| * a new trace before the Tier 1 code has properly re-specialized. */ | ||
| #define SIDE_EXIT_INITIAL_VALUE 4095 | ||
| #define SIDE_EXIT_INITIAL_VALUE 4000 | ||
| #define SIDE_EXIT_INITIAL_BACKOFF 12 | ||
|
|
||
| static inline _Py_BackoffCounter | ||
|
|
@@ -129,6 +134,14 @@ initial_unreachable_backoff_counter(void) | |
| return make_backoff_counter(0, UNREACHABLE_BACKOFF); | ||
| } | ||
|
|
||
| // Required to not get stuck in infinite specialization loops due to specialization failure. | ||
| // We use 2 here as tnere are a few scenarios: | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| // 1. Freshly specialized from unspecialized, in which case the counter will be 1. | ||
| // 2. Re-specialized from deopt, in which case the counter will be 1. | ||
| // 3. Deopt -> Specialize -> Deopt -> Specialize, in which case the counter will be 2. | ||
| // We do not want the 3rd case. | ||
| #define MAX_SPECIALIZATION_TRIES 2 | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this necessary? If we set the backoff counter to zero in
It is also simple and fast.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. If specialization fails and we get re-specialization, we need to distinguish between a successful re-specialization due to a normal deopt, and an unsuccessful re-specialization due to a deopt that is "real". In both cases, their count would be 1, and both bytecode would still be the outdated specialization form due to exponential backoff/bug in the specializer. Recall that on FT it is possible for specialization to make no progress due to a failed lock. So you can have a correct specialization form, just failed due to lock acquired failure and that would also force a re-specialization on the JIT. on JIT, this may possibly be stuck in re-specialization forever. We need to protect against that on FT.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't follow. A deopt is a deopt. It has the same effect of calling then specialization or decrementing the counter. Whether specializatio succeeds or fails, it will set the counter to non-zero and not repeat.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the only way that FT specialization can fail without setting the counter to non-zero is if the instruction gets asynchronously instrumented. And that is a form of progress as well.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah this doesn't happen for everything. There are multiple bugs in a few opcodes right now where their specialization don't match their
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this an infinite loop? I would expect the following sequence:
Are you going to back to tracing before the instruction completes?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No the sequence is different: STORE_SUBSCR_LIST_INT (tracing) Hence the loop We cant record during a deopt otherwise we have to deal with complicated rewinding of uops.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That doesn't sounds right to me. Otherwise you are recording multiple instructions for a single instruction. Once
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
How do you detect that?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do you detect what?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if it's at all worth instrumenting stats for this? Would it be worth understanding how often we're running up against this? |
||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -757,6 +757,29 @@ struct _Py_unique_id_pool { | |
|
|
||
| typedef _Py_CODEUNIT *(*_PyJitEntryFuncPtr)(struct _PyExecutorObject *exec, _PyInterpreterFrame *frame, _PyStackRef *stack_pointer, PyThreadState *tstate); | ||
|
|
||
| typedef struct _PyJitTracerState { | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| bool dependencies_still_valid; | ||
| bool dynamic_jump_taken; | ||
| bool prev_instr_is_super; | ||
| int code_max_size; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this for? The max code size is
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. It follows the old code where we deduct from the max length whenever we see something that could exit or deopt to reserve space for it later. |
||
| int code_curr_size; | ||
| int initial_stack_depth; | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| int initial_chain_depth; | ||
| int prev_instr_oparg; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Likewise all the "previous" stuff need for trace recording can go in its own struct. |
||
| int prev_instr_stacklevel; | ||
| int specialize_counter; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I said before, you really don't need this. Only specialize once.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To repeat again: that won't work on FT, as locks may spuriously fail and force a re-specialization since we always specialize when we are tracing. |
||
| _PyUOpInstruction *code_buffer; | ||
| _Py_CODEUNIT *insert_exec_instr; | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| _Py_CODEUNIT *close_loop_instr; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code involving this and
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Close loop instruction points to the loop header. insert_exec_instr (now called start_instr) points to the backward jump. If you think about it, both are valid for closing the loop. In fact, there's a higher chance we close the loop if we use the loop header instead of the backward jump as a metric, as there's only one loop header, but multiple backward jumps that could lead the loop header. |
||
| PyCodeObject *initial_code; // Strong | ||
| PyFunctionObject *initial_func; // Strong | ||
| _Py_CODEUNIT *prev_instr; | ||
| PyCodeObject *prev_instr_code; // Strong | ||
| struct _PyExitData *prev_exit; | ||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| _PyInterpreterFrame *prev_instr_frame; | ||
| _PyBloomFilter dependencies; | ||
| } _PyJitTracerState; | ||
|
|
||
|
Fidget-Spinner marked this conversation as resolved.
Outdated
|
||
| /* PyInterpreterState holds the global state for one of the runtime's | ||
| interpreters. Typically the initial (main) interpreter is the only one. | ||
|
|
||
|
|
@@ -932,9 +955,9 @@ struct _is { | |
| struct types_state types; | ||
| struct callable_cache callable_cache; | ||
| PyObject *common_consts[NUM_COMMON_CONSTANTS]; | ||
| _PyJitTracerState jit_state; | ||
| bool jit; | ||
| bool compiling; | ||
| struct _PyUOpInstruction *jit_uop_buffer; | ||
| struct _PyExecutorObject *executor_list_head; | ||
| struct _PyExecutorObject *executor_deletion_list_head; | ||
| struct _PyExecutorObject *cold_executor; | ||
|
|
||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,10 +35,18 @@ typedef struct _PyUOpInstruction{ | |
| #endif | ||
| } _PyUOpInstruction; | ||
|
|
||
| // This is the length of the trace we project initially. | ||
| #define UOP_MAX_TRACE_LENGTH 1200 | ||
| // This is the length of the trace we translate initially. | ||
| #define UOP_MAX_TRACE_LENGTH 3000 | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Based on what you said in the PR comment about seeing a lot of "trace too long" aborts even with the higher limit, I'm wondering if you've benchmarked even higher limits like 5k or 10k?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops, the trace too long aborts are with the old limit (1200). I have not benchmarked the stats for the newer limit. We should definitely gather stats for this in the future and fine-tune it though |
||
| #define UOP_BUFFER_SIZE (UOP_MAX_TRACE_LENGTH * sizeof(_PyUOpInstruction)) | ||
|
|
||
| /* Bloom filter with m = 256 | ||
| * https://en.wikipedia.org/wiki/Bloom_filter */ | ||
| #define _Py_BLOOM_FILTER_WORDS 8 | ||
|
|
||
| typedef struct { | ||
| uint32_t bits[_Py_BLOOM_FILTER_WORDS]; | ||
| } _PyBloomFilter; | ||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.