
feat: integrate Entire (entire.io) to log AI workflows with consent-gated upload#4

Open
qyli00 wants to merge 6 commits into main from feat/entire-logging

qyli00 commented Jun 3, 2026

Summary

Integrates Entire into the CLI to capture AI agent sessions (prompts, responses, tool calls, files touched) as users solve problems, and — only with the user's consent — uploads them for research. Capture is local-only until consent; uploaded data is used solely for research and is de-identified/anonymized before use.

Logging is set up at fetch/download time and uploaded at submit time. Everything degrades gracefully: if the entire binary isn't installed, the core fetch/submit flow is never blocked — instead the user is pointed at aicodinggym configure (which offers to install it).
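
The fallback described above can be sketched as follows. This is a minimal illustration, not the code in cli.py: the names `capture_hint`, `fetch_with_capture`, and `CONFIGURE_HINT`, and the hint wording, are assumptions. Only `shutil.which` and the `entire` binary name come from known sources.

```python
import shutil
from typing import Callable, List, Optional

# Hypothetical wording; the real message in cli.py may differ.
CONFIGURE_HINT = (
    "Entire is not installed, so this session will not be captured. "
    "Run `aicodinggym configure` to install it."
)


def capture_hint(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return a hint when the `entire` binary is not on PATH, else None."""
    return None if which("entire") else CONFIGURE_HINT


def fetch_with_capture(
    problem_id: str,
    do_fetch: Callable[[str], str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Run the core fetch first, then append the hint if capture is unavailable."""
    messages = [do_fetch(problem_id)]  # the core flow always runs
    hint = capture_hint(which)
    if hint is not None:
        messages.append(hint)  # inform the user instead of blocking or skipping silently
    return messages
```

Passing `which` as a parameter keeps the check testable without changing PATH.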

How it works

  • configure — offers to install the Entire CLI, records the writable submission-repo URL, and accepts --upload-logs/--no-upload-logs to pre-set consent.

  • swe fetch / cr fetch / mle download — install Entire's hooks so the session is captured locally as the user works. MLE workspaces are git init'd (they aren't repos by default).

  • swe submit / cr submit / mle submit — consent-gated upload. The first submit (if not already configured) prompts once:

    AI Coding Gym can upload this AI coding session (prompts, responses, files changed) for research only. Data is de-identified/anonymized before use.

    The choice is saved to ~/.aicodinggym/config.json. Non-interactive sessions never upload without a recorded choice.
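
A minimal sketch of the consent rules above, assuming the choice is stored under the `entire_logging_consent` key named in the Files section. The function name `resolve_consent` and the `ask` callback are hypothetical stand-ins for the real helpers in cli.py and config.py.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def resolve_consent(config_path: Path, interactive: bool,
                    ask: Callable[[str], bool]) -> bool:
    """Decide whether this submit may upload the captured session."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    recorded: Optional[bool] = config.get("entire_logging_consent")

    if recorded is not None:
        return bool(recorded)  # a recorded choice always wins; never prompt again

    if not interactive:
        return False  # no recorded choice and nobody to ask: never upload

    # First interactive submit: prompt once and persist the answer.
    choice = ask("Upload this AI coding session for research only?")
    config["entire_logging_consent"] = choice
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return choice
```

In this sketch the non-interactive path returns False without writing anything, so a later interactive submit still gets the one-time prompt.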

Log identification & target

  • One repo, many branches: all three benchmarks log to the user's single writable repo (recorded from SWE fetch / configure), so each upload is identifiable by its branch even when one repo holds many problems.
  • Logs land on aicodinggym-logs/<benchmark>/<problem_id> with an aicodinggym-meta.json file at the tip (problem id, benchmark, user, tool, timestamp), injected via git plumbing without touching the working tree or Entire's branch.
  • CR's cloned PR repo is read-only, so CR/MLE logs target the user's own repo, never the review repo.
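
One way the branch naming and metadata injection could look is sketched below. The JSON key names and all function names (`log_branch`, `build_meta`, `inject_meta`) are assumptions, as is the `source_ref` parameter. The git commands are standard plumbing, and `commit-tree` needs a configured git identity. The sketch builds the new commit through a temporary index, so the working tree, the real index, and the source ref stay untouched.

```python
import json
import os
import subprocess
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional

META_FILE = "aicodinggym-meta.json"


def log_branch(benchmark: str, problem_id: str) -> str:
    """Branch that identifies one problem's logs inside the shared repo."""
    return f"aicodinggym-logs/{benchmark}/{problem_id}"


def build_meta(benchmark: str, problem_id: str, user: str, tool: str,
               now: Optional[datetime] = None) -> Dict[str, str]:
    """Metadata payload; the key names here are assumptions."""
    now = now or datetime.now(timezone.utc)
    return {"problem_id": problem_id, "benchmark": benchmark,
            "user": user, "tool": tool, "timestamp": now.isoformat()}


def _git(repo: Path, *args: str, stdin: Optional[str] = None,
         env: Optional[Dict[str, str]] = None) -> str:
    result = subprocess.run(["git", "-C", str(repo), *args], input=stdin, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def inject_meta(repo: Path, source_ref: str, target_branch: str,
                meta: Dict[str, str]) -> str:
    """Create a commit on top of source_ref that adds META_FILE, and point
    target_branch at it. Returns the new commit id."""
    base = _git(repo, "rev-parse", f"{source_ref}^{{commit}}")
    blob = _git(repo, "hash-object", "-w", "--stdin",
                stdin=json.dumps(meta, indent=2) + "\n")
    with tempfile.TemporaryDirectory() as tmp:
        # A throwaway index keeps the user's staged changes out of the picture.
        env = {**os.environ, "GIT_INDEX_FILE": str(Path(tmp) / "index")}
        _git(repo, "read-tree", base, env=env)
        _git(repo, "update-index", "--add", "--cacheinfo",
             f"100644,{blob},{META_FILE}", env=env)
        tree = _git(repo, "write-tree", env=env)
    commit = _git(repo, "commit-tree", tree, "-p", base, "-m", f"Add {META_FILE}")
    _git(repo, "update-ref", f"refs/heads/{target_branch}", commit)
    return commit
```

The resulting branch could then be pushed with an ordinary `git push <remote> <branch>`.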

MLE code push

mle submit also pushes the user's solution code (notebooks/scripts/CSV; data/ excluded) to a branch named after the competition (e.g. spaceship-titanic), gated by the same log-upload consent. The prediction CSV still goes to the scoring API as before.
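
Excluding data/ from the pushed branch can be done with a .gitignore written into the MLE workspace before committing. The sketch below assumes that approach. `ensure_gitignore` and `DEFAULT_PATTERNS` are hypothetical names, and the only pattern taken from the description is `data/`.

```python
from pathlib import Path
from typing import Iterable, List

# Only data/ is stated in the PR description; callers can pass more patterns.
DEFAULT_PATTERNS = ("data/",)


def ensure_gitignore(workspace: Path,
                     patterns: Iterable[str] = DEFAULT_PATTERNS) -> List[str]:
    """Append any missing ignore patterns to <workspace>/.gitignore.

    Existing lines are preserved, and calling this twice adds nothing new.
    Returns the patterns that were added on this call.
    """
    path = workspace / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [p for p in patterns if p not in present]
    if missing:
        path.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Because the function is idempotent, it can run on every `mle download` or `mle submit` without growing the file.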

Files

  • entire_logging.py (new) — thin best-effort wrapper around the entire binary: setup, ensure_git_repo, commit_workspace + push_branch (MLE), flush (CR), has_sessions, upload.
  • config.py — persist entire_logging_consent and submission_repo_url; get_logging_consent/set_logging_consent.
  • cli.py — wire logging into configure/fetch/download/submit; consent + upload helpers.
  • README.md — document the feature, consent flow, unified repo, MLE code push, privacy.
  • Version bump 0.5.1 → 0.6.0.

Testing

  • All modules byte-compile; every command's --help exits 0.
  • SWE: upload pushes to the per-problem branch with aicodinggym-meta.json injected and session data carried over.
  • MLE: ensure_git_repo → commit_workspace → push_branch pushes solution files to the competition branch with data/ excluded.
  • Config: consent + submission_repo_url round-trip through the field allowlist.
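
For reviewers who want to reproduce the config check by hand, a round-trip through an allowlist might look like the sketch below. It assumes the allowlist filters keys on both save and load. `ALLOWED_FIELDS`, `save_config`, and `load_config` are illustrative names, and the real config.py presumably allows more fields than the two shown.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Illustrative subset; only these two field names appear in the PR description.
ALLOWED_FIELDS = frozenset({"entire_logging_consent", "submission_repo_url"})


def save_config(path: Path, values: Dict[str, Any]) -> None:
    """Persist only allowlisted fields, dropping everything else."""
    kept = {k: v for k, v in values.items() if k in ALLOWED_FIELDS}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(kept, indent=2, sort_keys=True))


def load_config(path: Path) -> Dict[str, Any]:
    """Load the config, ignoring unknown keys left by older or newer versions."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    return {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}
```

A field that is missing from the allowlist is dropped on save, which is the failure this kind of test would catch.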

Notes for reviewers

  • CR/MLE upload depends on submission_repo_url, which is auto-recorded after any SWE fetch, or from configure if the backend /api/configure response includes a repo_url (it currently returns only repo_name). Until then a CR/MLE-only user gets a graceful "no repo configured — pass --logs-remote" note instead of an upload.
  • The consent prompt centers on the AI session (the privacy-sensitive part) but also governs the MLE code push (per design). Easy to broaden the wording if preferred.

🤖 Generated with Claude Code

qyli00 and others added 6 commits April 23, 2026 13:22
The previous implementation stopped at one level of nesting, silently
dropping any files in sub-subdirectories. Extract the subdirectory walk
into a helper that recurses through the GitHub Contents API listing.
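
A minimal sketch of such a recursive walk, assuming a hypothetical `list_dir` callable that returns the parsed JSON array of a GitHub Contents API directory listing, where each entry carries `"type"` and `"path"`. The helper name `walk_files` is illustrative and not taken from the commit.

```python
from typing import Any, Callable, Dict, Iterator, List

Listing = List[Dict[str, Any]]


def walk_files(list_dir: Callable[[str], Listing], path: str = "") -> Iterator[str]:
    """Yield every file path under `path`, descending to any depth."""
    for entry in list_dir(path):
        if entry["type"] == "dir":
            # Recurse instead of stopping after one level of nesting.
            yield from walk_files(list_dir, entry["path"])
        elif entry["type"] == "file":
            yield entry["path"]
        # Other entry types (symlink, submodule) are skipped in this sketch.
```

Injecting `list_dir` keeps the traversal separate from HTTP concerns such as authentication and rate limits.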

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ated upload

Capture AI agent sessions (prompts, responses, tool calls, files touched) via
the Entire CLI and, only with user consent, upload them for research.

- entire_logging.py: best-effort wrapper around the `entire` binary —
  setup (enable hooks + detected agents), ensure_git_repo (init non-git MLE
  workspaces), commit_workspace + push_branch (MLE code push), flush (CR
  checkpoint), has_sessions, and upload (inject aicodinggym-meta.json via git
  plumbing, push entire/checkpoints/v1 to a per-problem branch).
- config.py: persist upload consent (entire_logging_consent) and the writable
  submission_repo_url; get_logging_consent/set_logging_consent helpers.
- cli.py:
  - configure: offers to install Entire, captures submission repo URL,
    --upload-logs/--no-upload-logs to pre-set consent.
  - fetch/download (swe/cr/mle): set up local capture. If Entire isn't
    installed, point the user at `aicodinggym configure` (which offers to
    install it) instead of silently skipping. MLE inits a git repo.
  - submit (swe/cr/mle): consent-gated upload. First submit prompts once
    (research-only, de-identified). Non-interactive sessions never upload
    without recorded consent.
  - One repo, many branches: all three benchmarks log to the user's single
    repo (recorded from SWE fetch / configure), identified by branch
    aicodinggym-logs/<benchmark>/<problem_id> + an aicodinggym-meta.json file.
    CR's cloned PR repo (read-only) is never used as a target.
  - MLE also pushes the user's solution code (data/ excluded) to a
    <competition_id> branch, gated by the same log-upload consent.
- README: document the logging feature, consent flow, unified repo, MLE code
  push, and privacy.
- bump version 0.5.1 -> 0.6.0

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xpand MLE ignores, add tests

Addresses review feedback on PR #4:

- Ordering: submit commands now resolve logging (incl. the interactive consent
  prompt) BEFORE printing the success banner, and embed the Logs/Code status
  line into the summary. The helpers return status text instead of echoing.
- No overwrites: every submission pushes to a unique per-submission branch
  `aicodinggym-logs/<benchmark>/<problem_id>/<submission-id>` (UTC timestamp +
  random), so re-submissions and submissions from different directories/machines
  never clobber previous logs. MLE code goes to `<competition_id>/<submission-id>`;
  code and logs share one submission id. Dropped the force-push.
- Expanded MLE .gitignore (model weights, checkpoints, caches, archives, venvs)
  so the pushed code branch stays small.
- Added pytest suite (tests/): entire_logging git behaviour (metadata injection,
  unique branches, ignore list, has_sessions), config consent round-trip +
  allowlist persistence, and CLI remote/consent resolution. 27 tests.
- pyproject: dev extra (pytest) + pytest testpaths.
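
The per-submission naming can be sketched as below. The exact id format is not given in the commit message, so the `YYYYMMDDTHHMMSSZ-<hex>` layout and the function names are assumptions. Only the branch templates come from the text above.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def new_submission_id(now: Optional[datetime] = None,
                      suffix: Optional[str] = None) -> str:
    """UTC timestamp plus a random suffix; the exact format is an assumption."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    suffix = suffix if suffix is not None else secrets.token_hex(4)
    return f"{now.strftime('%Y%m%dT%H%M%SZ')}-{suffix}"


def logs_branch(benchmark: str, problem_id: str, submission_id: str) -> str:
    return f"aicodinggym-logs/{benchmark}/{problem_id}/{submission_id}"


def code_branch(competition_id: str, submission_id: str) -> str:
    return f"{competition_id}/{submission_id}"
```

Generating the id once and passing it to both helpers is what lets code and logs share one submission id. One git detail worth checking during review: a repository cannot hold both a ref named `a/b` and a ref named `a/b/c`, so a branch pushed under the earlier per-problem naming would conflict with these per-submission names in the same repo.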

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s hermetic

- Reordering logging before the success banner meant an unexpected error in the
  logging path could suppress the "Successfully submitted" summary. Wrap the
  pre-banner logging call in _logging_status() so it degrades to a warning and
  the banner always prints.
- tests/test_cli_logging.py: autouse fixture clears ambient AICODINGGYM_LOGS_REMOTE
  (resolution reads it first) so the resolver tests are hermetic; add tests for
  _logging_status.
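
The guard described here might look roughly like the following. The real `_logging_status()` signature is not shown in the PR, so the callable parameter, the message wording, and the `submit_summary` helper are assumptions made for illustration.

```python
from typing import Callable


def logging_status(resolve: Callable[[], str]) -> str:
    """Run the logging step and always return a status line, never raise."""
    try:
        return resolve()
    except Exception as exc:  # deliberately broad: logging is best-effort
        return f"Logs: skipped (warning: {exc})"


def submit_summary(problem_id: str, resolve_logging: Callable[[], str]) -> str:
    """Build the summary; the banner is present even if logging failed."""
    status = logging_status(resolve_logging)
    return f"Successfully submitted {problem_id}\n{status}"
```

For example, `submit_summary("p1", lambda: 1 / 0)` still begins with the success banner and reports the failure on the status line.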

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…output

- mle download: api.mlebench_download_open() exposes Content-Length + a chunk
  iterator so the CLI drives a click.progressbar (falls back to a running MB
  counter when the server omits Content-Length). Replaces the silent
  mlebench_download_info().
- configure: the Entire auto-installer no longer captures output — it streams,
  and we print "Installing Entire (downloading...)" first, so it no longer looks
  frozen during the download (the likely cause of the "stuck on configure"
  report).
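
The two progress modes can be illustrated without any third-party dependency. The sketch below reports through a plain callback rather than `click.progressbar`, and `copy_with_progress` is a hypothetical name. It only shows the decision between a percentage (Content-Length known) and a running MB counter (Content-Length missing).

```python
from typing import BinaryIO, Callable, Iterable, Optional


def copy_with_progress(chunks: Iterable[bytes], total: Optional[int],
                       sink: BinaryIO, report: Callable[[str], None]) -> int:
    """Write chunks to sink, reporting progress after each one.

    total is the Content-Length in bytes, or None when the server omits it.
    Returns the number of bytes written.
    """
    done = 0
    for chunk in chunks:
        sink.write(chunk)
        done += len(chunk)
        if total:
            report(f"{done * 100 // total}%")
        else:
            report(f"{done / 1_000_000:.1f} MB")  # fallback: running counter
    return done
```

In the CLI itself, the percentage branch would be where the progress bar is advanced by `len(chunk)`.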

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…preserve workspace on re-configure

- fetch/download now print an instruction (when Entire capture is active) to
  start the AI agent INSIDE the fetched directory, because Claude Code/Codex
  load capture hooks from the launch dir and fix them for the session — cd-ing
  in later does not activate them, so the session wouldn't be captured.
- configure: only change workspace_dir when --workspace-dir is explicitly given;
  on re-configure preserve the existing workspace instead of silently resetting
  it to the current directory (fall back to cwd only on first-time setup).
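
The workspace rule reduces to a three-way precedence, sketched below with a hypothetical helper name. `explicit` stands for the value of --workspace-dir (None when the flag is absent) and `existing` for the value already stored in the config.

```python
from typing import Optional


def resolve_workspace_dir(explicit: Optional[str], existing: Optional[str],
                          cwd: str) -> str:
    """Pick the workspace directory for `configure`."""
    if explicit is not None:
        return explicit  # the flag was given: honour it
    if existing:
        return existing  # re-configure: keep what the user already had
    return cwd  # first-time setup: fall back to the current directory
```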

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>