☰ 👉 Datalayer GitHub Actions

This repository contains reusable GitHub Actions for Datalayer workflows.

datalayer-evals

The datalayer-evals action runs Datalayer eval reports in CI and produces report artifacts.

It uses the datalayer-core Python API directly (the DatalayerClient and the core eval-report helpers) rather than shelling out to the CLI, so the generated reports contain the full structured failure diagnostics (per-run failure causes, stages, types and detail excerpts). Failures are also aggregated into the GitHub step summary and exposed as action outputs.

It supports two execution modes:

Primary report mode (single evalset).
Comparison mode (primary + secondary evalsets) with a generated summary markdown.

Evalsets can be provided as IDs, or created on the fly from spec files.

Primary report mode produces, for each report:

<output-markdown> and a matching .csv (when export-csv is true)
a <output-markdown>.log artifact containing the full structured report JSON (including every run-level failure cause)
timestamped files report-<timestamp>.md and report-<timestamp>.csv
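
The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the action's actual code: the helper name `derive_report_paths`, the timestamp format, and the choice to place the timestamped files next to the markdown report are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Dict, Optional


def derive_report_paths(
    output_markdown: str,
    export_csv: bool = True,
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Derive the artifact paths for one report from its output-markdown value."""
    md = PurePosixPath(output_markdown)
    # Assumption: the real timestamp format and directory may differ.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    paths = {
        "report": str(md),
        # The log keeps the full markdown name and appends ".log".
        "log": f"{md}.log",
        "timestamped_report": str(md.parent / f"report-{stamp}.md"),
    }
    if export_csv:
        # The CSV replaces the ".md" suffix with ".csv".
        paths["csv"] = str(md.with_suffix(".csv"))
        paths["timestamped_csv"] = str(md.parent / f"report-{stamp}.csv")
    return paths
```

For example, `derive_report_paths("artifacts/evals-report.md")` yields `artifacts/evals-report.csv` and `artifacts/evals-report.md.log` alongside the markdown report.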

The action is implemented in Python and can be consumed from other repositories.

Inputs

evalset-id: target evalset UID; required unless evalset-spec-file is provided
evalset-spec-file: optional, path to primary evalset spec JSON; action creates evalset and reports it
secondary-evalset-id: optional, secondary evalset UID
secondary-evalset-spec-file: optional, path to secondary evalset spec JSON
token: required, Datalayer API token
ai-agents-url: optional, override API URL
account-uid: optional, account/org context
run-limit: optional, default 50
output-markdown: optional, default evals-report.md
secondary-output-markdown: optional, output file for secondary report
comparison-summary-output: optional, output file for secondary-vs-primary summary
export-csv: optional, default true
iam-url: optional, IAM URL override used when creating the optional agent runtime
runtimes-url: optional, Runtimes URL override used when creating the optional agent runtime
agentspec-id: optional, create an agent runtime before reporting using this spec id (default example-simple)
agentspec: optional, URL or local file path to YAML/JSON agent spec; mutually exclusive with agentspec-id
agent-environment-name: optional, default ai-agents-env
agent-given-name: optional runtime name for the created agent runtime
agent-time-reservation: optional runtime reservation in minutes, default 10
billable-account-uid: optional billable account UID used when creating the optional agent runtime

If billable-account-uid is not provided, the action also checks the environment for DATALAYER_BILLABLE_ACCOUNT_UID (for example from a repository secret).
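
A minimal sketch of that fallback, assuming the explicit input takes precedence over the environment variable. The function name `resolve_billable_account_uid` is hypothetical and not part of the action.

```python
import os
from typing import Mapping, Optional


def resolve_billable_account_uid(
    input_value: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit input if set, else DATALAYER_BILLABLE_ACCOUNT_UID, else None."""
    env = os.environ if env is None else env
    explicit = (input_value or "").strip()
    if explicit:
        return explicit
    # Treat an empty environment value the same as an unset one.
    return env.get("DATALAYER_BILLABLE_ACCOUNT_UID") or None
```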

When the action creates a runtime via agentspec-id or agentspec, it automatically tears the runtime down after report generation (including early-exit paths).
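
One common way to guarantee teardown on every exit path is a `try`/`finally` wrapped in a context manager. The sketch below shows the pattern only; `create` and `terminate` are hypothetical callables standing in for whatever the action uses to start and stop a runtime, and no datalayer-core API is implied.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def temporary_runtime(
    create: Optional[Callable[[], str]],
    terminate: Callable[[str], None],
) -> Iterator[Optional[str]]:
    """Create a runtime if requested and always terminate it afterwards."""
    # No agent spec configured means no runtime is created.
    pod_name = create() if create is not None else None
    try:
        yield pod_name
    finally:
        # Runs on normal completion, on exceptions, and on sys.exit().
        if pod_name is not None:
            terminate(pod_name)
```

Report generation would then run inside `with temporary_runtime(...) as pod_name:`, so an early exit from the report step still releases the runtime.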

At least one of evalset-id or evalset-spec-file must be provided.
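
The input constraints described in this section could be checked up front along the following lines. This is an illustrative sketch rather than the action's implementation; `validate_inputs` and its error messages are hypothetical.

```python
from typing import Dict, List


def validate_inputs(inputs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the inputs are consistent."""

    def given(name: str) -> bool:
        # GitHub passes unset inputs as empty strings.
        return bool(inputs.get(name, "").strip())

    problems = []
    if not given("token"):
        problems.append("token is required")
    if not (given("evalset-id") or given("evalset-spec-file")):
        problems.append("provide at least one of evalset-id or evalset-spec-file")
    if given("agentspec-id") and given("agentspec"):
        problems.append("agentspec-id and agentspec are mutually exclusive")
    return problems
```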

Outputs

report-file: markdown report file path
csv-file: CSV report file path (empty when export-csv=false)
log-file: full structured report JSON log file path (captures all failure causes)
timestamped_report_file: timestamped markdown path
timestamped_csv_file: timestamped CSV path
secondary-report-file: secondary markdown report path
secondary-csv-file: secondary CSV report path
secondary-log-file: secondary structured report JSON log
secondary-timestamped-report-file: secondary timestamped markdown
secondary-timestamped-csv-file: secondary timestamped CSV
comparison-summary-file: generated comparison summary markdown
agent-runtime-pod-name: pod name of runtime optionally created through the core client
agent-runtime-ingress: ingress URL of that optional runtime
failed-run-count: total number of failed runs across primary and secondary reports
primary-failed-run-count: number of failed runs in the primary report
secondary-failed-run-count: number of failed runs in the secondary report
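
As a rough illustration of how the three failure counts relate and how a Python action can publish them, the sketch below appends `name=value` lines to the file named by `GITHUB_OUTPUT` and one line to the file named by `GITHUB_STEP_SUMMARY`, the standard GitHub Actions mechanisms. The function name and the summary wording are assumptions, not the action's actual output.

```python
import os


def publish_failure_counts(primary_failed: int, secondary_failed: int = 0) -> int:
    """Write the failure-count outputs and a one-line step summary; return the total."""
    total = primary_failed + secondary_failed
    outputs = {
        "failed-run-count": total,
        "primary-failed-run-count": primary_failed,
        "secondary-failed-run-count": secondary_failed,
    }
    # Both variables are set by the runner; skip quietly when run locally.
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as fh:
            for name, value in outputs.items():
                fh.write(f"{name}={value}\n")
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(
                f"Failed runs: {total} "
                f"(primary {primary_failed}, secondary {secondary_failed})\n"
            )
    return total
```

A consumer workflow can read these values from the step's outputs, for instance to fail the job when `failed-run-count` is greater than zero.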

Use From Another Repository

Example workflow step (single evalset):

    - uses: datalayer/github-actions@v1
      with:
        evalset-id: 01KXXXXXXXXXXXX
        token: ${{ secrets.DATALAYER_API_KEY }}
        run-limit: "50"
        output-markdown: artifacts/evals-report.md
        export-csv: "true"

Example workflow step with runtime bootstrap from spec id before report:

    - uses: datalayer/github-actions@v1
      with:
        evalset-id: 01KXXXXXXXXXXXX
        token: ${{ secrets.DATALAYER_API_KEY }}
        agentspec-id: example-simple
        agent-environment-name: ai-agents-env
        agent-time-reservation: "10"
        billable-account-uid: ${{ secrets.DATALAYER_BILLABLE_ACCOUNT_UID }}
        output-markdown: artifacts/evals-report.md
        export-csv: "true"

Example workflow step (two spec files, one comparison run):

    - uses: datalayer/github-actions@v1
      with:
        evalset-spec-file: .github/evals/no-codemode.evalset.json
        secondary-evalset-spec-file: .github/evals/codemode.evalset.json
        token: ${{ secrets.DATALAYER_API_KEY }}
        output-markdown: artifacts/no-codemode-report.md
        secondary-output-markdown: artifacts/codemode-report.md
        comparison-summary-output: artifacts/comparison-summary.md
        export-csv: "true"

Upload artifacts in the consumer workflow:

    - uses: actions/upload-artifact@v4
      with:
        name: evals-report
        path: |
          artifacts/evals-report.md
          artifacts/evals-report.csv
          artifacts/evals-report.md.log

For two-spec comparison mode, also upload:

	artifacts/no-codemode-report.md
	artifacts/no-codemode-report.csv
	artifacts/no-codemode-report.md.log
	artifacts/codemode-report.md
	artifacts/codemode-report.csv
	artifacts/codemode-report.md.log
	artifacts/comparison-summary.md

Publish New Versions

Commit and push changes to main.
Tag a version.
Push the tag.

Commands:

    git tag -a v1.0.0 -m "datalayer-evals v1.0.0"
    git push origin v1.0.0

Recommended tag strategy:

Maintain a moving major tag for stable consumers.
Example:
- v1.0.0 immutable release tag
- v1 moving major tag

Move major tag:

    git tag -f v1 v1.0.0
    git push -f origin v1

Consumers should reference v1 for stable updates, or pin an immutable tag for strict reproducibility.
