feat: integrate Entire (entire.io) to log AI workflows with consent-gated upload #4
Open
qyli00 wants to merge 6 commits into
The previous implementation stopped at one level of nesting, silently dropping any files in sub-subdirectories. Extract the subdirectory walk into a helper that recurses through the GitHub Contents API listing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
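The recursive walk described in the commit above can be sketched as follows. This is a minimal illustration, not the PR's helper: `walk_files`, `list_dir`, and `FAKE_REPO` are hypothetical names, and `list_dir` stands in for whatever call fetches one directory listing. The only thing assumed about the GitHub Contents API is that each listing entry carries a `type` (`"file"` or `"dir"`) and a `path`.

```python
from typing import Callable, Dict, Iterator, List

Entry = Dict[str, str]  # one item from a directory listing: {"type": ..., "path": ...}


def walk_files(list_dir: Callable[[str], List[Entry]], path: str = "") -> Iterator[str]:
    """Yield the path of every file under `path`, at any depth."""
    for entry in list_dir(path):
        if entry["type"] == "dir":
            # Recurse into the subdirectory instead of stopping after one level.
            yield from walk_files(list_dir, entry["path"])
        elif entry["type"] == "file":
            yield entry["path"]


# Offline stand-in for the API (hypothetical data), nested two levels deep.
FAKE_REPO: Dict[str, List[Entry]] = {
    "": [{"type": "file", "path": "README.md"}, {"type": "dir", "path": "src"}],
    "src": [{"type": "file", "path": "src/a.py"}, {"type": "dir", "path": "src/pkg"}],
    "src/pkg": [{"type": "file", "path": "src/pkg/b.py"}],
}


def fake_list_dir(path: str) -> List[Entry]:
    return FAKE_REPO[path]


all_files = sorted(walk_files(fake_list_dir))
```

Injecting `list_dir` keeps the traversal testable without network access; a real implementation would also need to handle rate limits and request errors.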
…ated upload
Capture AI agent sessions (prompts, responses, tool calls, files touched) via
the Entire CLI and, only with user consent, upload them for research.
- entire_logging.py: best-effort wrapper around the `entire` binary:
  setup (enable hooks + detected agents), ensure_git_repo (init non-git MLE
  workspaces), commit_workspace + push_branch (MLE code push), flush (CR
  checkpoint), has_sessions, and upload (inject aicodinggym-meta.json via git
  plumbing, push entire/checkpoints/v1 to a per-problem branch).
- config.py: persist upload consent (entire_logging_consent) and the writable
  submission_repo_url; get_logging_consent/set_logging_consent helpers.
- cli.py:
  - configure: offers to install Entire, captures submission repo URL,
    --upload-logs/--no-upload-logs to pre-set consent.
  - fetch/download (swe/cr/mle): set up local capture. If Entire isn't
    installed, point the user at `aicodinggym configure` (which offers to
    install it) instead of silently skipping. MLE inits a git repo.
  - submit (swe/cr/mle): consent-gated upload. First submit prompts once
    (research-only, de-identified). Non-interactive sessions never upload
    without recorded consent.
- One repo, many branches: all three benchmarks log to the user's single
  repo (recorded from SWE fetch / configure), identified by branch
  aicodinggym-logs/<benchmark>/<problem_id> + an aicodinggym-meta.json file.
  CR's cloned PR repo (read-only) is never used as a target.
- MLE also pushes the user's solution code (data/ excluded) to a
  <competition_id> branch, gated by the same log-upload consent.
- README: document the logging feature, consent flow, unified repo, MLE code
  push, and privacy.
- bump version 0.5.1 -> 0.6.0
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
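The consent rule in the commit message above (the first submit prompts once, and non-interactive sessions never upload without recorded consent) can be sketched as a small decision function. This is an assumption-laden sketch rather than the PR's code: `should_upload`, `ask`, and `save` are hypothetical names, and the real `get_logging_consent`/`set_logging_consent` helpers may have different signatures.

```python
from typing import Callable, Optional


def should_upload(
    recorded: Optional[bool],      # saved consent; None means "never asked"
    interactive: bool,             # True when a user can answer a prompt
    ask: Callable[[], bool],       # hypothetical prompt callback
    save: Callable[[bool], None],  # hypothetical persistence callback
) -> bool:
    """Decide whether session logs may be uploaded."""
    if recorded is not None:
        # A recorded choice always wins; the user is not asked again.
        return recorded
    if not interactive:
        # No recorded consent and nobody to ask: never upload.
        return False
    choice = ask()
    save(choice)  # persist so later submits skip the prompt
    return choice
```

Treating "never asked" (`None`) separately from "declined" (`False`) is what lets the prompt appear exactly once.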
…xpand MLE ignores, add tests

Addresses review feedback on PR #4:
- Ordering: submit commands now resolve logging (incl. the interactive consent prompt) BEFORE printing the success banner, and embed the Logs/Code status line into the summary. The helpers return status text instead of echoing.
- No overwrites: every submission pushes to a unique per-submission branch `aicodinggym-logs/<benchmark>/<problem_id>/<submission-id>` (UTC timestamp + random), so re-submissions and submissions from different directories/machines never clobber previous logs. MLE code goes to `<competition_id>/<submission-id>`; code and logs share one submission id. Dropped the force-push.
- Expanded MLE .gitignore (model weights, checkpoints, caches, archives, venvs) so the pushed code branch stays small.
- Added pytest suite (tests/): entire_logging git behaviour (metadata injection, unique branches, ignore list, has_sessions), config consent round-trip + allowlist persistence, and CLI remote/consent resolution. 27 tests.
- pyproject: dev extra (pytest) + pytest testpaths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
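The per-submission branch naming described in the commit above could look roughly like this. The branch templates come from the commit message; the exact id format (a compact UTC timestamp plus 8 random hex characters) and the function names are assumptions made for illustration.

```python
import secrets
from datetime import datetime, timezone


def new_submission_id() -> str:
    """UTC timestamp plus a random suffix (format is an assumption)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{secrets.token_hex(4)}"


def logs_branch(benchmark: str, problem_id: str, submission_id: str) -> str:
    # aicodinggym-logs/<benchmark>/<problem_id>/<submission-id>
    return f"aicodinggym-logs/{benchmark}/{problem_id}/{submission_id}"


def code_branch(competition_id: str, submission_id: str) -> str:
    # <competition_id>/<submission-id>; shares the id with the logs branch
    return f"{competition_id}/{submission_id}"


sid = new_submission_id()
log_ref = logs_branch("mle", "spaceship-titanic", sid)
code_ref = code_branch("spaceship-titanic", sid)
```

Because every push targets a fresh branch name, a plain (non-forced) push is enough, which fits the commit's note that the force-push was dropped.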
…s hermetic

- Reordering logging before the success banner meant an unexpected error in the logging path could suppress the "Successfully submitted" summary. Wrap the pre-banner logging call in _logging_status() so it degrades to a warning and the banner always prints.
- tests/test_cli_logging.py: autouse fixture clears ambient AICODINGGYM_LOGS_REMOTE (resolution reads it first) so the resolver tests are hermetic; add tests for _logging_status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
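The "degrade to a warning" behaviour described above can be illustrated with a small wrapper. This is only a sketch in the spirit of `_logging_status()`; the name `logging_status`, its signature, and the warning wording are assumptions, not the PR's actual code.

```python
from typing import Callable


def logging_status(run_logging: Callable[[], str]) -> str:
    """Run the logging step and always return a status line.

    Any unexpected error is converted into a warning string so the
    caller can still print its success banner.
    """
    try:
        return run_logging()
    except Exception as exc:  # best-effort: logging must never block submit
        return f"Logs: skipped (warning: {exc})"


def failing_upload() -> str:
    raise RuntimeError("push failed")


ok_line = logging_status(lambda: "Logs: uploaded")
warn_line = logging_status(failing_upload)
```

Returning text instead of printing keeps the ordering decision (status line before or inside the banner) with the caller.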
…output

- mle download: api.mlebench_download_open() exposes Content-Length + a chunk iterator so the CLI drives a click.progressbar (falls back to a running MB counter when the server omits Content-Length). Replaces the silent mlebench_download_info().
- configure: the Entire auto-installer no longer captures output; it streams, and we print "Installing Entire (downloading...)" first, so it no longer looks frozen during the download (the likely cause of the "stuck on configure" report).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
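The Content-Length fallback described above can be sketched with the standard library alone. This version reports plain text rather than driving `click.progressbar`, and `save_with_progress` with its parameters is a hypothetical name, so treat it as an illustration of the branching logic only.

```python
import io
from typing import BinaryIO, Callable, Iterable, List, Optional


def save_with_progress(
    chunks: Iterable[bytes],
    total: Optional[int],  # Content-Length, or None when the server omits it
    out: BinaryIO,
    report: Callable[[str], None] = print,
) -> int:
    """Write chunks to `out`, reporting a percentage or a running MB count."""
    done = 0
    for chunk in chunks:
        out.write(chunk)
        done += len(chunk)
        if total:
            report(f"{done * 100 // total}%")
        else:
            # Size unknown: fall back to a running megabyte counter.
            report(f"{done / 1_000_000:.1f} MB")
    return done


known: List[str] = []
save_with_progress([b"a" * 50, b"b" * 50], 100, io.BytesIO(), known.append)

unknown: List[str] = []
save_with_progress([b"x" * 500_000] * 2, None, io.BytesIO(), unknown.append)
```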
…preserve workspace on re-configure

- fetch/download now print an instruction (when Entire capture is active) to start the AI agent INSIDE the fetched directory, because Claude Code/Codex load capture hooks from the launch dir and fix them for the session. cd-ing in later does not activate them, so the session wouldn't be captured.
- configure: only change workspace_dir when --workspace-dir is explicitly given; on re-configure preserve the existing workspace instead of silently resetting it to the current directory (fall back to cwd only on first-time setup).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
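The workspace precedence described in the second bullet above (explicit flag, then saved value, then current directory) reduces to a short resolver. The function and parameter names here are hypothetical; only the ordering is taken from the commit message.

```python
from typing import Optional


def resolve_workspace_dir(
    flag_value: Optional[str],  # value of --workspace-dir, None if not given
    saved: Optional[str],       # workspace_dir from an earlier configure, if any
    cwd: str,                   # current working directory
) -> str:
    """Pick the workspace directory for `configure`."""
    if flag_value is not None:
        return flag_value  # an explicit flag always wins
    if saved:
        return saved       # re-configure: keep the existing workspace
    return cwd             # first-time setup: fall back to the current directory
```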
Summary
Integrates Entire into the CLI to capture AI agent sessions (prompts, responses, tool calls, files touched) as users solve problems, and — only with the user's consent — uploads them for research. Capture is local-only until consent; uploaded data is used solely for research and is de-identified/anonymized before use.
Logging is set up at fetch/download time and uploaded at submit time. Everything degrades gracefully: if the `entire` binary isn't installed, the core fetch/submit flow is never blocked; instead the user is pointed at `aicodinggym configure` (which offers to install it).
How it works
- `configure`: offers to install the Entire CLI, records the writable submission-repo URL, and accepts `--upload-logs`/`--no-upload-logs` to pre-set consent.
- `swe fetch` / `cr fetch` / `mle download`: install Entire's hooks so the session is captured locally as the user works. MLE workspaces are `git init`'d (they aren't repos by default).
- `swe submit` / `cr submit` / `mle submit`: consent-gated upload. The first submit (if not already configured) prompts once. The choice is saved to `~/.aicodinggym/config.json`. Non-interactive sessions never upload without a recorded choice.
Log identification & target
- Target: all three benchmarks log to the user's single repo (recorded from SWE `fetch`/`configure`), so each upload is identifiable by its branch even when one repo holds many problems.
- Branch: `aicodinggym-logs/<benchmark>/<problem_id>` with an `aicodinggym-meta.json` file at the tip (problem id, benchmark, user, tool, timestamp), injected via git plumbing without touching the working tree or Entire's branch.
MLE code push
`mle submit` also pushes the user's solution code (notebooks/scripts/CSV; `data/` excluded) to a branch named after the competition (e.g. `spaceship-titanic`), gated by the same log-upload consent. The prediction CSV still goes to the scoring API as before.
Files
- `entire_logging.py` (new): thin best-effort wrapper around the `entire` binary: `setup`, `ensure_git_repo`, `commit_workspace` + `push_branch` (MLE), `flush` (CR), `has_sessions`, `upload`.
- `config.py`: persist `entire_logging_consent` and `submission_repo_url`; `get_logging_consent`/`set_logging_consent`.
- `cli.py`: wire logging into configure/fetch/download/submit; consent + upload helpers.
- `README.md`: document the feature, consent flow, unified repo, MLE code push, privacy.
- Version bump: `0.5.1` → `0.6.0`.
Testing
- `--help` exits 0.
- `upload` pushes to the per-problem branch with `aicodinggym-meta.json` injected and session data carried over.
- `ensure_git_repo` → `commit_workspace` → `push_branch` pushes solution files to the competition branch with `data/` excluded.
- `submission_repo_url` round-trip through the field allowlist.
Notes for reviewers
Uploads need `submission_repo_url`, which is auto-recorded after any SWE `fetch`, or from `configure` if the backend `/api/configure` response includes a `repo_url` (it currently returns only `repo_name`). Until then a CR/MLE-only user gets a graceful note that no repo is configured, with a hint to pass `--logs-remote`, instead of an upload.
🤖 Generated with Claude Code