[bugfix] continuous action precision mismatch by yacineMTB · Pull Request #584 · PufferAI/PufferLib

yacineMTB · 2026-06-13T03:34:31Z

I think the best test of this improvement is a pair of independent sweeps: one before the fix, and one after.

Basically, if precision_t was bf16/fp16, the sampled fp32 action gets rounded, and that messes with things downstream in PPO. I'm not sure exactly how PPO uses the ratio, but before this fix, the logratio would be nonzero even if you did not change the policy. As I understand it, if the policy did not change, the logratio should be zero. With a discrete action space it does stay zero, but with a continuous one it does not.

Discrete doesn't have this problem because it uses integers.
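
To make the mismatch concrete, here is a minimal sketch in plain Python, not the code from this PR. It assumes a one-dimensional Gaussian policy and emulates fp16 storage with the standard library's `struct` half-precision format. The names `round_to_fp16` and `gaussian_logprob` and the numeric values are hypothetical, chosen only for illustration. The "consistent" variant shows one possible way to make the two sides agree (evaluating the rollout logprob on the same rounded action that gets stored); the actual change in `src/pufferlib.cu` may differ.

```python
import math
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def gaussian_logprob(action: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian policy at `action`."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical policy output and sampled action (illustrative values only).
mean, std = 0.1, 0.5
sampled_action = 0.7231947

# The action buffer is assumed to be stored in reduced precision.
stored_action = round_to_fp16(sampled_action)

# Mismatch: rollout logprob uses the unrounded sample, while the training
# pass recomputes the logprob from the rounded action in the buffer.
logp_rollout = gaussian_logprob(sampled_action, mean, std)
logp_train = gaussian_logprob(stored_action, mean, std)
logratio_mismatched = logp_train - logp_rollout

# Consistent: both sides evaluate the same rounded action, so an
# unchanged policy gives a logratio of exactly zero.
logp_rollout_consistent = gaussian_logprob(stored_action, mean, std)
logratio_consistent = logp_train - logp_rollout_consistent

# Small integers survive the fp16 round trip exactly, which is why a
# discrete action index is unaffected.
discrete_action_roundtrip = round_to_fp16(3.0)

print("logratio (mismatched):", logratio_mismatched)
print("logratio (consistent):", logratio_consistent)
```

The mismatched logratio is small but nonzero even though `mean` and `std` never changed, which is the symptom described above.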

yacineMTB commented Jun 13, 2026

Comment thread src/pufferlib.cu Outdated

yacineMTB force-pushed the fix-continuous-action-logprob-4.0 branch from 4dd4990 to af45050 Compare June 13, 2026 03:37

Fix continuous action logprob precision mismatch

21c987b

yacineMTB force-pushed the fix-continuous-action-logprob-4.0 branch from af45050 to 21c987b Compare June 15, 2026 16:31

yacineMTB changed the title ~~[bug] [draft] continuous action logprob precision mismatch~~ [draft] continuous action precision mismatch Jun 15, 2026

yacineMTB changed the title ~~[draft] continuous action precision mismatch~~ [bugfix] continuous action precision mismatch Jun 15, 2026

yacineMTB marked this pull request as ready for review June 15, 2026 16:37

jsuarez5341 merged commit e90b58e into PufferAI:4.0 Jun 15, 2026

jsuarez5341 merged 1 commit into PufferAI:4.0 from yacineMTB:fix-continuous-action-logprob-4.0
