Merged
35 changes: 35 additions & 0 deletions .github/workflows/ci.yml
@@ -47,3 +47,38 @@ jobs:

      - name: Run ruff format check
        run: uv run ruff format --check .

  build-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Set up Python
        run: uv python install 3.12

      - name: Build package
        run: uv build

      - name: Check build artifacts
        run: |
          uv pip install twine
          uv run twine check dist/*

  web-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Bun
        uses: oven-sh/setup-bun@v1

      - name: Install dependencies
        run: bun install
        working-directory: web

      - name: Build website
        run: bun run build
        working-directory: web
43 changes: 43 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,43 @@
name: Release

on:
  push:
    tags:
      - 'v*'

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Mandatory for trusted publishing
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Set up Python
        run: uv python install 3.12

      - name: Build package
        run: uv build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        # with:
        #   repository-url: https://upload.pypi.org/legacy/ # Optional: defaults to PyPI

  github-release:
    runs-on: ubuntu-latest
    needs: publish
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
          files: dist/*
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,36 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2026-05-22

### Added
- First official stable release.
- Comprehensive dataset profiling and health checks.
- Support for multiple report formats: Markdown, JSON, HTML, and PDF.
- Two HTML report themes: `minimal` and `neubrutalism`.
- Automatic suggestion provider for data quality issues.
- Code generation for pandas fix scripts.
- Sklearn preprocessing pipeline generation.
- CLI commands: `scan`, `details`, `report`, `checks`, and `version`.
- Robust error handling for invalid files, empty datasets, and malformed configurations.
- Configuration support via YAML, TOML, and JSON.
- Intelligent sampling for large datasets.
- Dataset drift detection.
- Mutual information and statistical checks.

### Changed
- Refactored report generators for robust lazy loading.
- Improved CLI output with better error messages and color-coded statuses.
- Updated documentation and website for stable release.

### Fixed
- Fixed crash when generating reports for single-row datasets.
- Fixed dependency issues in report generation when optional libraries are missing.
- Fixed non-deterministic order in generated fix scripts.

## [0.1.0b3] - 2026-04-15
- Initial beta release with core features.
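
Because the changelog follows the Keep a Changelog heading layout (`## [version] - date`), the notes for a single release can be pulled out mechanically, for example to paste into a GitHub release. The following is a minimal sketch under that assumption; `extract_release_notes` and `SAMPLE` are illustrative names and are not part of HashPrep.

```python
import re

# Matches release headings such as "## [0.1.0] - 2026-05-22".
HEADING = re.compile(r"^## \[(?P<version>[^\]]+)\](?: - \d{4}-\d{2}-\d{2})?\s*$")


def extract_release_notes(changelog: str, version: str) -> str:
    """Return the body of the changelog section for `version`."""
    body = []
    found = False
    for line in changelog.splitlines():
        match = HEADING.match(line)
        if match:
            if found:
                break  # reached the next release heading
            found = match.group("version") == version
            continue
        if found:
            body.append(line)
    if not found:
        raise KeyError(f"no changelog section for version {version!r}")
    return "\n".join(body).strip()


# A shortened copy of the changelog above, used only for illustration.
SAMPLE = """# Changelog

## [0.1.0] - 2026-05-22

### Added
- First official stable release.

### Fixed
- Fixed crash when generating reports for single-row datasets.

## [0.1.0b3] - 2026-04-15
- Initial beta release with core features.
"""
```

Calling `extract_release_notes(SAMPLE, "0.1.0")` returns the `### Added` and `### Fixed` blocks without the beta entry; an unknown version raises `KeyError`.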
14 changes: 12 additions & 2 deletions README.md
@@ -27,7 +27,7 @@
</div>

> [!NOTE]
-> HashPrep is in **beta** (v0.1.0b3). Core features are fully tested with CI. The API may still evolve based on community feedback.
+> HashPrep is now in its first stable release (v0.1.0). Core features are fully tested with CI.

## Overview

@@ -130,6 +130,7 @@ hashprep report dataset.csv --format html --theme minimal
```

**Options:**
- `--output PATH`, `-o PATH`: Custom output file path
- `--format {md,json,html,pdf}`: Report format (default: md)
- `--theme {minimal,neubrutalism}`: HTML report theme (default: minimal)
- `--with-code`: Generate Python scripts for fixes and pipelines
@@ -153,6 +154,9 @@ hashprep report dataset.csv --format pdf --no-visualizations
# Generate report with automatic fix scripts
hashprep report dataset.csv --with-code

# Generate report with custom output path
hashprep report dataset.csv --format html --output my_reports/analysis.html

# This creates:
# - dataset_hashprep_report.md (or .html/.pdf/.json)
# - dataset_hashprep_report_fixes.py (pandas script)
@@ -162,7 +166,13 @@ hashprep report dataset.csv --with-code
hashprep report train.csv --comparison test.csv --format html
```
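
When the `report` command is driven from a script, the flags shown in this README can be assembled into an argument list before handing it to `subprocess.run`. This is only a sketch: `build_report_command` is a hypothetical helper, not part of HashPrep, and it uses only the flags and choices listed above.

```python
FORMATS = ("md", "json", "html", "pdf")
THEMES = ("minimal", "neubrutalism")


def build_report_command(dataset, fmt="md", theme=None, output=None,
                         with_code=False, comparison=None):
    """Build the argv list for `hashprep report` from documented flags."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if theme is not None and theme not in THEMES:
        raise ValueError(f"unsupported theme: {theme!r}")

    cmd = ["hashprep", "report", str(dataset), "--format", fmt]
    if theme is not None:
        cmd += ["--theme", theme]
    if output is not None:
        cmd += ["--output", str(output)]
    if with_code:
        cmd.append("--with-code")
    if comparison is not None:
        cmd += ["--comparison", str(comparison)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `hashprep` is installed; validating the format and theme first avoids launching a process that would be rejected anyway.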

-#### 4. Version
+#### 4. List Available Checks
+Discover all data quality checks that HashPrep can perform.
+```bash
+hashprep checks
+```
+
+#### 5. Version
Check HashPrep version.
```bash
hashprep version
181 changes: 181 additions & 0 deletions RELEASE_TODO.md
@@ -0,0 +1,181 @@
# HashPrep Official Release TODO

## Release Scope

- [x] Decide the official version target, likely `0.1.0` unless the release should signal `1.0.0`.
- [ ] Freeze public API expectations for `DatasetAnalyzer`.
- [ ] Freeze public API expectations for `HashPrepConfig`.
- [ ] Freeze public API expectations for `load_config`.
- [ ] Freeze public API expectations for `generate_report`.
- [x] Freeze CLI behavior for `scan`, `details`, `report`, and `version`.
- [ ] Define what is stable versus experimental, especially auto-fixes, generated pipelines, report themes, and statistical checks.

## Code Readiness

- [ ] Run the full test suite across Python `3.10`, `3.11`, and `3.12`.
- [ ] Run lint checks.
- [ ] Run format checks.
- [x] Add or verify tests for CLI error handling.
- [ ] Add or verify tests for invalid check names.
- [x] Add or verify tests for report generation in `md`, `json`, `html`, and `pdf`.
- [x] Add or verify tests for generated `fixes.py` code.
- [x] Add or verify tests for generated sklearn pipeline code.
- [x] Add or verify tests for config loading from YAML, TOML, and JSON.
- [ ] Add or verify tests for large dataset sampling behavior.
- [ ] Verify empty CSV behavior.
- [ ] Verify duplicate column behavior.
- [ ] Verify missing target column behavior.
- [ ] Verify non-numeric target behavior for mutual information and statistical checks.
- [ ] Verify datasets with infinities, all-null columns, and mixed types.
- [ ] Verify reports do not crash when plots are disabled.
- [ ] Verify reports do not crash when optional summary data is missing.
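
The items above on generated `fixes.py` and pipeline code come down to two checks: the output parses as Python, and repeated generation yields identical text. A minimal sketch of both follows. `generate_fix_script` is a hypothetical stand-in for the real generator, whose interface this document does not show.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (nothing is executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def is_deterministic(generate, runs: int = 3) -> bool:
    """Return True if calling `generate` repeatedly yields identical text."""
    return len({generate() for _ in range(runs)}) == 1


def generate_fix_script(columns) -> str:
    """Hypothetical generator: emit a median-fill line per column."""
    lines = ["import pandas as pd", "", "def apply_fixes(df):"]
    for col in sorted(columns):  # sorting keeps the emitted order stable
        lines.append(f"    df[{col!r}] = df[{col!r}].fillna(df[{col!r}].median())")
    lines.append("    return df")
    return "\n".join(lines) + "\n"
```

Repeating the call inside one process only catches some ordering bugs, because set and dict iteration order is stable within a single interpreter run. Comparing output across separate processes (for example with different `PYTHONHASHSEED` values) is a stronger check.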

## Functionality Readiness

### Must Improve Before Stable

- [x] Improve CLI error handling for invalid files.
- [x] Improve CLI error handling for empty CSVs.
- [x] Improve CLI error handling for bad target columns.
- [x] Improve CLI error handling for bad config files.
- [x] Improve CLI error handling for unsupported report formats.
- [x] Improve CLI error handling for failed PDF generation.
- [x] Ensure CLI failures produce clear user-facing messages.
- [ ] Ensure HTML reports work when no issues are found.
- [ ] Ensure PDF reports work when no issues are found.
- [ ] Ensure Markdown reports work when no issues are found.
- [ ] Ensure JSON reports work when no issues are found.
- [ ] Ensure all report formats work when plots are disabled.
- [ ] Ensure all report formats work for tiny datasets.
- [ ] Ensure all report formats work for mostly missing datasets.
- [ ] Ensure all report formats work when optional summaries are absent.
- [x] Verify generated fix scripts are deterministic.
- [x] Verify generated fix scripts are valid Python.
- [x] Verify generated sklearn pipeline code is deterministic.
- [x] Verify generated sklearn pipeline code is valid Python.
- [x] Add tests that execute generated fix scripts where practical.
- [x] Add tests that execute generated sklearn pipeline code where practical.
- [ ] Clearly label generated fixes as suggestions if they are heuristic or incomplete.
- [x] Add config validation for unknown keys.
- [x] Add config validation for wrong value types.
- [x] Add clear errors for malformed YAML, TOML, and JSON config files.
- [ ] Confirm threshold behavior is predictable and documented.
- [ ] Decide whether summary dictionary shapes are part of the stable public API.
- [ ] Document stable summary keys if summary dictionaries are part of the public API.
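
The config validation items in this list (unknown keys, wrong value types, malformed files) can be illustrated with a small validator. This is a sketch only: the schema keys, `ConfigError`, `validate_config`, and `load_json_config` are all hypothetical and do not describe HashPrep's actual `load_config` or `HashPrepConfig`.

```python
import json
from typing import Any

# Hypothetical schema: key name -> accepted Python types.
SCHEMA = {
    "sample_size": (int,),
    "missing_threshold": (int, float),
    "checks": (list,),
}


class ConfigError(ValueError):
    """Raised for any user-facing configuration problem."""


def validate_config(raw: dict[str, Any]) -> dict[str, Any]:
    unknown = sorted(set(raw) - set(SCHEMA))
    if unknown:
        raise ConfigError(f"unknown config keys: {', '.join(unknown)}")
    for key, value in raw.items():
        expected = SCHEMA[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            names = " or ".join(t.__name__ for t in expected)
            raise ConfigError(
                f"{key!r} must be {names}, got {type(value).__name__}"
            )
    return dict(raw)


def load_json_config(text: str) -> dict[str, Any]:
    try:
        raw = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ConfigError(
            f"malformed JSON config: {exc.msg} (line {exc.lineno})"
        ) from exc
    if not isinstance(raw, dict):
        raise ConfigError("config must be a JSON object")
    return validate_config(raw)
```

Funnelling every failure into one exception type lets the CLI catch a single error class and print a clear message instead of a traceback.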

### Strongly Recommended

- [x] Add an `--output` option to `hashprep report`.
- [x] Allow `hashprep report data.csv --format html --output reports/data.html`.
- [ ] Add machine-readable JSON output for `hashprep details`.
- [x] Add check discovery through a command such as `hashprep checks`.
- [x] Alternatively add check discovery through an option such as `hashprep scan --list-checks`.
- [ ] Document why issues are classified as `critical` versus `warning`.
- [ ] Review whether PDF/reporting dependencies should move to optional extras in a future release.
- [ ] Document any dependency-extra plan if it is deferred.

### Not For This Stable Release

- [ ] Avoid adding major new check families before the first stable release unless they fix a release blocker.
- [ ] Avoid adding model integrations before the first stable release.
- [ ] Avoid adding dashboard features before the first stable release.
- [ ] Avoid expanding automatic dataset repair workflows before the first stable release.
- [ ] Keep the first stable release focused on hardening existing behavior.

## Packaging

- [x] Update `hashprep/__init__.py` from `0.1.0b3` to the official release version.
- [x] Update the beta note in `README.md`.
- [x] Update the beta support table in `SECURITY.md`.
- [x] Add or verify PyPI classifiers in `pyproject.toml`.
- [x] Add or verify package keywords in `pyproject.toml`.
- [x] Add or verify project URLs in `pyproject.toml`.
- [x] Add or verify Python version classifiers in `pyproject.toml`.
- [x] Add or verify license metadata in `pyproject.toml`.
- [ ] Build source distribution.
- [ ] Build wheel distribution.
- [ ] Inspect built package artifacts.
- [ ] Install the built wheel in a clean environment.
- [ ] Smoke test `import hashprep` from the built wheel.
- [ ] Smoke test `hashprep version` from the built wheel.
- [ ] Smoke test `hashprep scan datasets/train.csv` from the built wheel.
- [ ] Smoke test `hashprep report datasets/train.csv --format html` from the built wheel.
- [ ] Confirm `MANIFEST.in` excludes dev/demo files intentionally.
- [ ] Confirm `MANIFEST.in` includes all runtime files needed by reports and templates.
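
Inspecting a built wheel can be scripted with the standard library, since a wheel is a zip archive that stores its metadata in `<name>-<version>.dist-info/METADATA`. The sketch below reads the declared version and reports required files that are absent. The helper names are illustrative, and any paths passed as `required` are assumptions about the package layout.

```python
import zipfile


def wheel_metadata_version(wheel_path) -> str:
    """Return the Version header from the wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        candidates = [
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA file")
        text = wheel.read(candidates[0]).decode("utf-8")
    for line in text.splitlines():
        if not line:
            break  # headers end at the first blank line
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("METADATA has no Version header")


def missing_files(wheel_path, required) -> list[str]:
    """Return the entries of `required` that are not in the wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = set(wheel.namelist())
    return sorted(set(required) - names)
```

A check like this complements `twine check`: it can confirm that the wheel's version agrees with `hashprep.__version__` and that runtime assets such as report templates were packaged.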

## Documentation

- [x] Replace beta references in `README.md`.
- [x] Replace beta references in `SECURITY.md`.
- [x] Replace beta references in `web/src/lib/components/Hero.svelte`.
- [x] Add `CHANGELOG.md`.
- [ ] Add first official release notes.
- [ ] Document available checks in release notes.
- [ ] Document CLI commands in release notes.
- [ ] Document report formats in release notes.
- [ ] Document known limitations in release notes.
- [ ] Document upgrade notes from beta.
- [ ] Verify README examples run exactly as written.
- [ ] Update the documentation URL in `pyproject.toml` if a dedicated docs page is available.
- [ ] Refresh generated example reports under `examples/reports/` if needed.

## CI/CD

- [x] Add package build validation to CI.
- [x] Add `twine check` or equivalent artifact validation to CI.
- [ ] Add built-wheel install smoke test to CI.
- [ ] Add CLI smoke test against the built wheel to CI.
- [x] Add website build check for `web/`.
- [x] Add or verify release workflow triggered by version tags.
- [ ] Configure PyPI publishing, preferably with trusted publishing.
- [x] Configure GitHub release creation.
- [ ] Consider adding dependency and security scanning for Python dependencies.
- [ ] Consider adding dependency and security scanning for web dependencies.

## Security And Dependencies

- [ ] Review pinned and minimum Python dependency versions.
- [ ] Review pinned and minimum web dependency versions.
- [ ] Confirm heavy dependencies are intentional, especially `weasyprint`, `matplotlib`, `seaborn`, and `scikit-learn`.
- [ ] Decide whether PDF/report dependencies should remain core dependencies or move to optional extras in a future release.
- [ ] Run vulnerability checks for Python dependencies.
- [ ] Run vulnerability checks for web dependencies.
- [x] Update `SECURITY.md` to describe stable release support.

## Website

- [x] Update website beta badges.
- [ ] Update website install examples.
- [ ] Confirm the docs page matches the current README and API.
- [x] Build the static site successfully.
- [ ] Verify PyPI link.
- [ ] Verify GitHub link.
- [ ] Verify docs link.
- [ ] Verify license link.
- [ ] Verify issue tracker link.
- [ ] Decide deployment target for the docs site.
- [ ] Decide release timing for the docs site.

## Release Process

- [ ] Create a release branch.
- [ ] Make version, docs, and changelog updates.
- [ ] Run full validation locally.
- [ ] Merge after CI passes.
- [ ] Tag the release, for example `v0.1.0`.
- [ ] Publish to PyPI.
- [ ] Create GitHub release with release notes.
- [ ] Verify public install with `pip install hashprep`.
- [ ] Verify public CLI with `hashprep version`.
- [ ] Announce the release as the first stable release after alpha and beta.
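
Since the release workflow fires on any tag matching `v*`, a guard that compares the tag against the package version can catch a mistyped tag before publishing. The following is a minimal sketch. The version pattern is a deliberately simplified subset of PEP 440 covering forms like `0.1.0` and `0.1.0b3`, and both function names are hypothetical.

```python
import re

# Simplified subset of PEP 440: X.Y.Z with an optional a/b/rc pre-release.
VERSION = re.compile(r"^\d+\.\d+\.\d+(?:(?:a|b|rc)\d+)?$")


def version_from_tag(tag: str) -> str:
    """Strip the leading 'v' from a release tag and validate the rest."""
    if not tag.startswith("v"):
        raise ValueError(f"release tags must start with 'v': {tag!r}")
    version = tag[1:]
    if not VERSION.match(version):
        raise ValueError(f"tag does not carry a recognised version: {tag!r}")
    return version


def tag_matches_package(tag: str, package_version: str) -> bool:
    """Return True if the tag names exactly the packaged version."""
    return version_from_tag(tag) == package_version
```

The `v*` glob also matches tags such as `vnext`, so rejecting those explicitly is part of the point. In CI the tag would typically come from the `GITHUB_REF_NAME` environment variable and the package version from `hashprep.__version__`.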

## Known Immediate Gaps

- [x] Version is still `0.1.0b3`.
- [x] `README.md` still says beta.
- [x] Website hero still says beta and shows `hashprep-0.1.0b3`.
- [x] `SECURITY.md` only describes beta support.
- [x] `CHANGELOG.md` is not present.
- [x] CI does not currently build or check publish artifacts.
- [x] CI does not currently build the Svelte website.
- [x] No release or publish workflow is present.
6 changes: 3 additions & 3 deletions SECURITY.md
@@ -2,12 +2,12 @@

## Supported Versions

-hashprep is currently in beta (`0.1.0bX`). Only the latest beta release on the `main` branch receives security updates. Older pre-releases are not patched — please upgrade to the newest version to pick up fixes.
+hashprep has reached a stable `0.1.0` release. Only the latest minor release is supported for security updates.

| Version | Supported |
| ---------- | ------------------ |
-| `0.1.0b3` | :white_check_mark: |
-| `< 0.1.0b3`| :x: |
+| `0.1.0` | :white_check_mark: |
+| `< 0.1.0` | :x: |

Once hashprep reaches a stable `0.1.0` release, this table will be updated to reflect supported minor versions.

2 changes: 1 addition & 1 deletion hashprep/__init__.py
@@ -2,4 +2,4 @@
from .core.analyzer import DatasetAnalyzer as DatasetAnalyzer
from .utils.config_loader import load_config as load_config

-__version__ = "0.1.0b3"
+__version__ = "0.1.0"