Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Jan 13, 2026 - Python
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
A list of tools for annotating data, managing annotations, etc.
🍳 Recipes for Prodigy, our fully scriptable annotation tool
🧬 A JupyterLab extension for annotating data with Prodigy
The Wearables Development Toolkit - a development environment for activity recognition applications with sensor signals
Social Media Mining Toolkit (SMMT) main repository
🔥 One of the most comprehensive open-source data annotation platforms.
Tornado is an open source Human-in-the-loop machine learning tool. It helps you label your dataset on the fly while training your model through a simple web user interface. It supports all data types: structured, text and image.
Data-centric AI building blocks for computer vision applications
A system for prompted weak supervision. Alfred is a powerful tool that leverages large language models to accelerate data annotation.
PersianDataAnnotations is ASP.NET Core MVC & ASP.NET MVC Custom Localization DataAnnotations (Localized MVC Errors) for the Persian (Farsi) language: Persian localization of the built-in MVC and Core MVC validation errors, for displaying client-side validation
Visualization and Annotation Tool for ROS
Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs, employs Iterative Active Learning for continuous improvement, and integrates CleanLab (Confident Learning) to ensure high-quality datasets and better model performance
Lightweight self-hosted span annotation tool
A free and open-source all-in-one training tool for YOLOv8, YOLO11, and YOLO26. It automates file structure and YAML files and offers auto-labeling with SAM2, a brush system for uninterrupted labeling, a strong modular augmentation system where anybody can write their own filters, and training, all without having to open a terminal.
🧬 A VS Code extension for annotating data with Prodigy
AnnoTheia is a data annotation toolkit that identifies when a person speaks in a scene and transcribes their speech, also offering flexibility to replace modules for different languages.
This is a tool to annotate the focus plane of z-stacked images.
A tool for mapping free-text descriptions of entities to ontology terms