SynthData

A sandbox for synthetic data generation and evaluation.

It keeps forks of syntheval and synthcity as editable submodules to make it easy to test new features and bug fixes in those libraries. It also contains some early versions of apps, notebooks, and scripts for testing out different synthetic data generation and evaluation techniques.

Quick Start

Clone the repo, initialize submodules, and install the main environment with uv:

git clone https://github.com/childmindresearch/synthdata.git
cd synthdata
git submodule update --init --recursive
uv sync

By default, uv sync installs the dependencies for the newer synthcity and syntheval workflow. Install the optional extras only when you need the older experiment tracks (ydata-synthetic and Presidio):

uv sync --extra ydata
uv sync --extra presidio
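
If the editable installs fail during uv sync, one possible cause is that the submodules were never initialized. The sketch below is a hypothetical helper, not part of this repository: it reads the path entries from .gitmodules and reports any submodule directory that is missing or empty. The function name and the "missing or empty" heuristic are assumptions made for illustration.

```python
import re
from pathlib import Path


def uninitialized_submodules(repo_root: Path) -> list[str]:
    """Return submodule paths from .gitmodules that are missing or empty.

    Hypothetical helper: treats an absent or empty directory as a sign that
    `git submodule update --init --recursive` has not been run yet.
    """
    gitmodules = repo_root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    # Each submodule entry in .gitmodules carries a "path = <dir>" line.
    paths = re.findall(
        r"^\s*path\s*=\s*(.+?)\s*$", gitmodules.read_text(), re.MULTILINE
    )
    missing = []
    for rel in paths:
        target = repo_root / rel
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    pending = uninitialized_submodules(Path.cwd())
    if pending:
        print("Run `git submodule update --init --recursive` for:", ", ".join(pending))
    else:
        print("All submodules listed in .gitmodules look initialized.")
```

Run it from the repository root before uv sync. An empty result only means the directories contain files, not that they are checked out at the expected commit.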

Apps

apps/presidio/presidio_streamlit.py: Presidio's Streamlit app, modified for offline use. For the full version of the anonymizer, see anonymize-pii.

See PRESIDIO APP GUIDE for details.

Notebooks

notebooks/ydata-test.py: Tests the ydata-synthetic library for tabular data synthesis. To run it with marimo:
```
uv run --extra ydata marimo run notebooks/ydata-test.py
```
notebooks/test_hepatitis_data.ipynb: Tests synthcity generators (plus TabPFN) with syntheval and synthcity evaluations on the hepatitis dataset.
notebooks/tabpfn_demo.ipynb: Tests classification and synthetic data generation with TabPFN. To access the TabPFN API, add a TABPFN_TOKEN to a .env file at the project root; optionally add an HF_TOKEN to download Hugging Face models faster.
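
Before opening the TabPFN notebook, it can help to confirm that the tokens are actually visible. The following is a minimal sketch using only the standard library; how the notebook itself loads the .env file is not stated here, so the parsing rules (plain KEY=VALUE lines, no interpolation) and the function names read_env_file and token_status are assumptions made for illustration.

```python
import os
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no interpolation)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def token_status(env_path: Path) -> dict[str, bool]:
    """Report whether each token is set in the .env file or the environment."""
    file_values = read_env_file(env_path)
    return {
        name: bool(file_values.get(name) or os.environ.get(name))
        for name in ("TABPFN_TOKEN", "HF_TOKEN")
    }


if __name__ == "__main__":
    for name, present in token_status(Path(".env")).items():
        # Only presence is printed; the token values are never shown.
        print(f"{name}: {'set' if present else 'missing'}")
```

The check reports presence only, so it is safe to run in a shared terminal. A missing HF_TOKEN is not an error, since that token is optional.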

Scripts

scripts/markdown_parser.py: Early, monolithic version of the markdown parser. For the full version, see headhunter.
