This repository contains datasets for manual annotation projects for the AAPB-CLAMS collaboration.
The American Archive of Public Broadcasting (AAPB) is working with the CLAMS team to develop information extraction systems for digital archives of public media (primarily video and audio from publicly funded TV shows and radio broadcasts). This collaboration will facilitate the research and preservation of significant historical content from this media collection. This repository provides training and evaluation data for the machine learning-based CLAMS apps used in this process.
This repository contains:
- projects/ - Annotation project directories (one per project). See projects/README.md for details on project structure, gold datasets, and a complete list of current projects.
- batches/ - Annotation batch definitions tracking source data selections from the AAPB collection. See batches/README.md for information about batches and AAPB GUIDs.
- Documentation files
For contributors and data managers, see CONTRIBUTING.md for detailed guidelines on creating and maintaining annotation projects.
Progress and other discussions among AAPB, CLAMS, and GBH are tracked in this repository's GitHub Issues, both open and closed.
For other inquiries, please email the CLAMS.ai admin.