feat: integrate Entire (entire.io) to log AI workflows with consent-gated upload by qyli00 · Pull Request #4 · AISmithLab/aicodinggym-cli

qyli00 · 2026-06-03T20:57:51Z

Summary

Integrates Entire into the CLI to capture AI agent sessions (prompts, responses, tool calls, files touched) as users solve problems, and — only with the user's consent — uploads them for research. Capture is local-only until consent; uploaded data is used solely for research and is de-identified/anonymized before use.

Logging is set up at fetch/download time and uploaded at submit time. Everything degrades gracefully: if the entire binary isn't installed, the core fetch/submit flow is never blocked — instead the user is pointed at aicodinggym configure (which offers to install it).
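A minimal sketch of the "never block the core flow" behaviour described above, assuming the check is a simple lookup of the `entire` binary on `PATH`. The function name `setup_capture` and its `which`/`echo` parameters are hypothetical and exist only to make the example testable; the PR's actual wrapper may be structured differently.

```python
import shutil
from typing import Callable, Optional


def setup_capture(
    which: Callable[[str], Optional[str]] = shutil.which,
    echo: Callable[[str], None] = print,
) -> bool:
    """Return True if capture can be set up, False if the binary is absent.

    A missing binary never raises, so the caller's fetch/submit flow
    continues either way.
    """
    if which("entire") is None:
        # Point the user at configure instead of silently skipping.
        echo("Entire is not installed; run `aicodinggym configure` to install it.")
        return False
    # Hypothetical: the real wrapper would install Entire's hooks here.
    return True
```

Callers would treat the return value as informational only and carry on with the fetch or submit regardless.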

How it works

configure — offers to install the Entire CLI, records the writable submission-repo URL, and accepts --upload-logs/--no-upload-logs to pre-set consent.
swe fetch / cr fetch / mle download — install Entire's hooks so the session is captured locally as the user works. MLE workspaces are git init'd (they aren't repos by default).
swe submit / cr submit / mle submit — consent-gated upload. The first submit (if not already configured) prompts once:

AI Coding Gym can upload this AI coding session (prompts, responses, files changed) for research only. Data is de-identified/anonymized before use.

The choice is saved to ~/.aicodinggym/config.json. Non-interactive sessions never upload without a recorded choice.
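The consent rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `resolve_consent` and its parameters are hypothetical names, and only the `entire_logging_consent` key and the config file location come from the description.

```python
import json
from pathlib import Path
from typing import Callable, Optional

CONSENT_KEY = "entire_logging_consent"  # key name taken from the PR description


def resolve_consent(
    config_path: Path,
    interactive: bool,
    ask: Callable[[str], bool],
) -> bool:
    """Decide whether an upload is allowed for this submit."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    recorded: Optional[bool] = config.get(CONSENT_KEY)
    if recorded is not None:
        # A recorded choice always wins; the user is not asked again.
        return bool(recorded)
    if not interactive:
        # No recorded choice and nobody to ask: never upload.
        return False
    choice = ask("Upload this AI coding session for research only?")
    config[CONSENT_KEY] = choice
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return choice
```

In this sketch a non-interactive run leaves the config untouched, so the prompt still appears the next time the user submits from a terminal.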

Log identification & target

One repo, many branches: all three benchmarks log to the user's single writable repo (recorded from SWE fetch / configure), so each upload is identifiable by its branch even when one repo holds many problems.
Logs land on aicodinggym-logs/<benchmark>/<problem_id> with an aicodinggym-meta.json file at the tip (problem id, benchmark, user, tool, timestamp), injected via git plumbing without touching the working tree or Entire's branch.
CR's cloned PR repo is read-only, so CR/MLE logs target the user's own repo, never the review repo.
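As a rough illustration of the naming scheme above, the branch name and metadata payload could be built like this. The JSON key names are assumptions (the description lists the fields but not their keys), and `log_branch` and `build_meta` are hypothetical helpers; the git plumbing that injects the file is not shown.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def log_branch(benchmark: str, problem_id: str) -> str:
    """Branch that identifies one problem's logs inside the shared repo."""
    return f"aicodinggym-logs/{benchmark}/{problem_id}"


def build_meta(
    problem_id: str,
    benchmark: str,
    user: str,
    tool: str,
    now: Optional[datetime] = None,
) -> str:
    """Serialize the contents of aicodinggym-meta.json (key names assumed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return json.dumps(
        {
            "problem_id": problem_id,
            "benchmark": benchmark,
            "user": user,
            "tool": tool,
            "timestamp": now.isoformat(),
        },
        indent=2,
    )
```

Because the branch carries the benchmark and problem id and the file repeats them, an upload stays identifiable even if only one of the two is inspected.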

MLE code push

mle submit also pushes the user's solution code (notebooks/scripts/CSV; data/ excluded) to a branch named after the competition (e.g. spaceship-titanic), gated by the same log-upload consent. The prediction CSV still goes to the scoring API as before.

Files

entire_logging.py (new) — thin best-effort wrapper around the entire binary: setup, ensure_git_repo, commit_workspace + push_branch (MLE), flush (CR), has_sessions, upload.
config.py — persist entire_logging_consent and submission_repo_url; get_logging_consent/set_logging_consent.
cli.py — wire logging into configure/fetch/download/submit; consent + upload helpers.
README.md — document the feature, consent flow, unified repo, MLE code push, privacy.
Version bump 0.5.1 → 0.6.0.

Testing

All modules byte-compile; every command's --help exits 0.
SWE: upload pushes to the per-problem branch with aicodinggym-meta.json injected and session data carried over.
MLE: ensure_git_repo → commit_workspace → push_branch pushes solution files to the competition branch with data/ excluded.
Config: consent + submission_repo_url round-trip through the field allowlist.

Notes for reviewers

CR/MLE upload depends on submission_repo_url, which is auto-recorded after any SWE fetch, or from configure if the backend /api/configure response includes a repo_url (it currently returns only repo_name). Until then a CR/MLE-only user gets a graceful "no repo configured — pass --logs-remote" note instead of an upload.
The consent prompt centers on the AI session (the privacy-sensitive part) but also governs the MLE code push (per design). Easy to broaden the wording if preferred.

🤖 Generated with Claude Code

The previous implementation stopped at one level of nesting, silently dropping any files in sub-subdirectories. Extract the subdirectory walk into a helper that recurses through the GitHub Contents API listing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
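The recursive walk this commit describes might look like the sketch below. It assumes each listing entry is a dict with `type` ("file" or "dir") and `path`, as in GitHub Contents API responses; `walk_contents` and the injected `list_dir` callable are hypothetical names, used here so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List

Entry = Dict[str, str]


def walk_contents(
    list_dir: Callable[[str], List[Entry]],
    path: str,
) -> Iterator[str]:
    """Yield every file path under `path`, at any nesting depth."""
    for entry in list_dir(path):
        if entry["type"] == "dir":
            # Recurse instead of stopping after one level of nesting.
            yield from walk_contents(list_dir, entry["path"])
        elif entry["type"] == "file":
            yield entry["path"]
```

In real use `list_dir` would wrap one HTTP request per directory, so deep trees cost one call per level visited.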

…ated upload

Capture AI agent sessions (prompts, responses, tool calls, files touched) via the Entire CLI and, only with user consent, upload them for research.

- entire_logging.py: best-effort wrapper around the `entire` binary, with setup (enable hooks + detected agents), ensure_git_repo (init non-git MLE workspaces), commit_workspace + push_branch (MLE code push), flush (CR checkpoint), has_sessions, and upload (inject aicodinggym-meta.json via git plumbing, push entire/checkpoints/v1 to a per-problem branch).
- config.py: persist upload consent (entire_logging_consent) and the writable submission_repo_url; get_logging_consent/set_logging_consent helpers.
- cli.py:
  - configure: offers to install Entire, captures submission repo URL, --upload-logs/--no-upload-logs to pre-set consent.
  - fetch/download (swe/cr/mle): set up local capture. If Entire isn't installed, point the user at `aicodinggym configure` (which offers to install it) instead of silently skipping. MLE inits a git repo.
  - submit (swe/cr/mle): consent-gated upload. First submit prompts once (research-only, de-identified). Non-interactive sessions never upload without recorded consent.
- One repo, many branches: all three benchmarks log to the user's single repo (recorded from SWE fetch / configure), identified by branch aicodinggym-logs/<benchmark>/<problem_id> + an aicodinggym-meta.json file. CR's cloned PR repo (read-only) is never used as a target.
- MLE also pushes the user's solution code (data/ excluded) to a <competition_id> branch, gated by the same log-upload consent.
- README: document the logging feature, consent flow, unified repo, MLE code push, and privacy.
- bump version 0.5.1 -> 0.6.0

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…xpand MLE ignores, add tests

Addresses review feedback on PR #4:

- Ordering: submit commands now resolve logging (incl. the interactive consent prompt) BEFORE printing the success banner, and embed the Logs/Code status line into the summary. The helpers return status text instead of echoing.
- No overwrites: every submission pushes to a unique per-submission branch `aicodinggym-logs/<benchmark>/<problem_id>/<submission-id>` (UTC timestamp + random), so re-submissions and submissions from different directories/machines never clobber previous logs. MLE code goes to `<competition_id>/<submission-id>`; code and logs share one submission id. Dropped the force-push.
- Expanded MLE .gitignore (model weights, checkpoints, caches, archives, venvs) so the pushed code branch stays small.
- Added pytest suite (tests/): entire_logging git behaviour (metadata injection, unique branches, ignore list, has_sessions), config consent round-trip + allowlist persistence, and CLI remote/consent resolution. 27 tests.
- pyproject: dev extra (pytest) + pytest testpaths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
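One way to build the per-submission branches described in this commit is sketched below. The exact id format is an assumption: the commit only says "UTC timestamp + random", so the `%Y%m%dT%H%M%SZ` layout, the 8 hex characters, and all three function names are illustrative.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def new_submission_id(now: Optional[datetime] = None) -> str:
    """UTC timestamp plus a random suffix (format assumed for illustration)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{now.strftime('%Y%m%dT%H%M%SZ')}-{secrets.token_hex(4)}"


def logs_branch(benchmark: str, problem_id: str, submission_id: str) -> str:
    """Unique branch for one submission's session logs."""
    return f"aicodinggym-logs/{benchmark}/{problem_id}/{submission_id}"


def code_branch(competition_id: str, submission_id: str) -> str:
    """Unique branch for one MLE submission's solution code."""
    return f"{competition_id}/{submission_id}"
```

Generating the id once and passing it to both helpers is what lets code and logs share a submission id, and a fresh branch per push is why a force-push is no longer needed.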

…s hermetic

- Reordering logging before the success banner meant an unexpected error in the logging path could suppress the "Successfully submitted" summary. Wrap the pre-banner logging call in _logging_status() so it degrades to a warning and the banner always prints.
- tests/test_cli_logging.py: autouse fixture clears ambient AICODINGGYM_LOGS_REMOTE (resolution reads it first) so the resolver tests are hermetic; add tests for _logging_status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
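The guard this commit describes can be illustrated with a small wrapper. This is a sketch only: the real `_logging_status()` signature is not shown in the PR text, so the `logging_status` name, its callable argument, and the status strings are all assumptions.

```python
from typing import Callable


def logging_status(run_logging: Callable[[], str]) -> str:
    """Run the logging step and always return a status line.

    Any failure is turned into a warning string, so the caller can
    still print its success banner afterwards.
    """
    try:
        return run_logging()
    except Exception as exc:  # deliberately broad: logging is best-effort
        return f"Logs: skipped (warning: {exc})"
```

The submit command would then embed the returned line in its summary rather than letting an exception escape before the banner.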

…output

- mle download: api.mlebench_download_open() exposes Content-Length + a chunk iterator so the CLI drives a click.progressbar (falls back to a running MB counter when the server omits Content-Length). Replaces the silent mlebench_download_info().
- configure: the Entire auto-installer no longer captures output; it streams, and we print "Installing Entire (downloading...)" first, so it no longer looks frozen during the download (the likely cause of the "stuck on configure" report).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
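The Content-Length fallback can be sketched without any dependencies. The commit itself drives a click.progressbar; the version below swaps that for a plain `report` callback so the example stays self-contained, and `consume_with_progress` is a hypothetical name.

```python
from typing import Callable, Iterable, Optional


def consume_with_progress(
    chunks: Iterable[bytes],
    total: Optional[int],
    report: Callable[[str], None] = print,
) -> int:
    """Consume a chunk iterator, reporting progress after every chunk.

    With a known total (Content-Length) report a percentage; otherwise
    fall back to a running megabyte counter.
    """
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if total:
            report(f"{received * 100 // total}%")
        else:
            report(f"{received / 1_000_000:.1f} MB")
    return received
```

A real download loop would also write each chunk to disk; that step is omitted here to keep the focus on the two reporting modes.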

…preserve workspace on re-configure

- fetch/download now print an instruction (when Entire capture is active) to start the AI agent INSIDE the fetched directory, because Claude Code/Codex load capture hooks from the launch dir and fix them for the session; cd-ing in later does not activate them, so the session wouldn't be captured.
- configure: only change workspace_dir when --workspace-dir is explicitly given; on re-configure preserve the existing workspace instead of silently resetting it to the current directory (fall back to cwd only on first-time setup).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
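The workspace precedence in the configure change reduces to a three-way choice, sketched here. `resolve_workspace_dir` and its parameter names are hypothetical; only the ordering (explicit flag, then existing value, then current directory) comes from the commit message.

```python
from typing import Optional


def resolve_workspace_dir(
    explicit: Optional[str],
    existing: Optional[str],
    cwd: str,
) -> str:
    """Pick the workspace directory to store on configure."""
    if explicit is not None:
        # --workspace-dir was given: the only case that changes it.
        return explicit
    if existing is not None:
        # Re-configure: keep what was already recorded.
        return existing
    # First-time setup: fall back to the current directory.
    return cwd
```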

qyli00 and others added 6 commits April 23, 2026 13:22

qyli00 wants to merge 6 commits into main from feat/entire-logging
