This directory contains all data files for the Bishop State Student Success Prediction project.
| File | Description | Records | Size |
|---|---|---|---|
| `De-identified PDP AR Files.xlsx` | Original Excel file with AR data | ~20K | 4 MB |
| `ar_bscc_with_zip.csv` | Bishop State AR data with zip codes | ~4K | 2 MB |
| `bishop_state_cohorts_with_zip.csv` | Student cohort data with zip codes | ~4K | 7 MB |
| `bishop_state_courses.csv` | Course enrollment records | ~100K | 29 MB |

| File | Description | Records | Size |
|---|---|---|---|
| `bishop_state_student_level_with_zip.csv` | Student-level aggregated data | ~4K | 12 MB |

| File | Description | Records | Size |
|---|---|---|---|
| `bishop_state_student_level_with_predictions.csv` | Student-level data with ML predictions | ~4K | 17 MB |
| `bishop_state_merged_with_predictions.csv` | Course-level data with ML predictions | ~100K | 72 MB |
Note: The original KCTCS source files (`kctcs_*.csv`, `ar_kcts_with_zip.csv`) remain in this directory for reference and will be removed in a future cleanup task.
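The approximate record counts listed for the CSV files can be spot-checked after a download or regeneration. The following is a minimal sketch using only the Python standard library. The helper names `count_records` and `within_tolerance` and the 25% tolerance are assumptions made for illustration and are not part of the project code.

```python
import csv
from pathlib import Path


def count_records(csv_path):
    """Return the number of data rows in a CSV file, excluding the header."""
    # utf-8-sig also accepts plain UTF-8 and strips a BOM if one is present.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def within_tolerance(actual, expected, tolerance=0.25):
    """True if `actual` is within `tolerance` (a fraction) of `expected`."""
    return abs(actual - expected) <= tolerance * expected


if __name__ == "__main__":
    # File name taken from this README; ~100K is the approximate count listed for it.
    path = Path("bishop_state_courses.csv")
    if path.exists():
        n = count_records(path)
        print(path.name, n, "ok" if within_tolerance(n, 100_000) else "unexpected size")
```

Because the listed counts are rounded, a loose tolerance is more useful here than an exact comparison.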
- `Student_GUID`: Unique student identifier used across all files

    bishop_state_cohorts_with_zip.csv (1 row per student)
    ├── Merged with → ar_bscc_with_zip.csv (1 row per student)
    └── Merged with → bishop_state_courses.csv (multiple rows per student)
        └── Aggregated to → bishop_state_student_level_with_zip.csv (1 row per student)
            └── ML Pipeline adds predictions → bishop_state_student_level_with_predictions.csv
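The relationships above depend on `Student_GUID` being unique in the one-row-per-student files and repeated in the course file. The sketch below shows one way to verify that and to join the two kinds of rows. It assumes a left join that keeps every cohort row, which may differ from what `ai_model/merge_bishop_state_data.py` actually does. The helper names and the `Cohort` and `Course` columns in the sample data are hypothetical.

```python
import csv
from collections import Counter


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def duplicate_keys(rows, key="Student_GUID"):
    """Return key values that occur more than once.

    An empty list is expected for a one-row-per-student file.
    """
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def left_join(left_rows, right_rows, key="Student_GUID"):
    """Join each left row with every right row sharing its key.

    Left rows without a match are kept once, without the right-hand columns.
    """
    by_key = {}
    for row in right_rows:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in by_key.get(left[key], [{}]):
            joined.append({**left, **right})
    return joined


# Tiny in-memory sample with hypothetical columns, standing in for the real files.
cohorts = [
    {"Student_GUID": "a", "Cohort": "2019"},
    {"Student_GUID": "b", "Cohort": "2020"},
]
courses = [
    {"Student_GUID": "a", "Course": "ENG101"},
    {"Student_GUID": "a", "Course": "MTH100"},
]

# Cohort data should have no repeated students; course data usually will.
print(duplicate_keys(cohorts), duplicate_keys(courses))
print(len(left_join(cohorts, courses)))
```

Running `duplicate_keys` on the cohort and AR files before merging is a cheap guard against a join that silently multiplies rows.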
- Source Files: Start with the raw CSV and Excel files
- Merge: Run `ai_model/merge_bishop_state_data.py` to create the merged dataset
- Aggregate: Create student-level aggregated features
- ML Pipeline: Run `ai_model/complete_ml_pipeline.py` to generate predictions
- Output: Predictions are saved to both the student-level and course-level files
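The two scripts named in the workflow can be run in order from a small driver. This is only a sketch. It assumes both scripts are started from the repository root and take no command-line arguments, neither of which this README confirms. The names `PIPELINE_STEPS`, `missing_scripts`, and `run_pipeline` are hypothetical. No separate script is named for the aggregation step, so none is invoked here.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as named in this README, assumed relative to the repository root.
PIPELINE_STEPS = [
    "ai_model/merge_bishop_state_data.py",
    "ai_model/complete_ml_pipeline.py",
]


def missing_scripts(repo_root, steps=PIPELINE_STEPS):
    """Return the steps whose script file does not exist under repo_root."""
    root = Path(repo_root)
    return [step for step in steps if not (root / step).is_file()]


def run_pipeline(repo_root):
    """Run each step in order, stopping at the first failure."""
    missing = missing_scripts(repo_root)
    if missing:
        raise FileNotFoundError(f"Missing pipeline scripts: {missing}")
    for step in PIPELINE_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the ML step never runs on a failed merge.
        subprocess.run([sys.executable, step], cwd=repo_root, check=True)
```

Calling `run_pipeline(".")` from the repository root would then execute the merge step followed by the ML step.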
- All student data is de-identified
- Missing values are handled during preprocessing
- Zip codes added for geographic analysis
- Course-level data includes one row per course enrollment per student
- Student-level data aggregates all courses into summary statistics
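To illustrate the difference between the course-level and student-level files, the sketch below collapses course enrollment rows into one summary row per student. The column names `credits_attempted` and `credits_earned`, the derived `completion_rate`, and the function name are hypothetical. The real feature set is defined by the project code and `DATA_DICTIONARY.md`. The sketch also assumes missing values have already been filled, since `float("")` would raise an error.

```python
from collections import defaultdict


def aggregate_to_student_level(course_rows, key="Student_GUID"):
    """Collapse course-level rows into one summary row per student."""
    totals = defaultdict(
        lambda: {"courses_taken": 0, "credits_attempted": 0.0, "credits_earned": 0.0}
    )
    for row in course_rows:
        stats = totals[row[key]]
        stats["courses_taken"] += 1
        stats["credits_attempted"] += float(row["credits_attempted"])
        stats["credits_earned"] += float(row["credits_earned"])

    student_rows = []
    for guid, stats in totals.items():
        attempted = stats["credits_attempted"]
        # Avoid division by zero for students with no attempted credits.
        rate = stats["credits_earned"] / attempted if attempted else None
        student_rows.append({key: guid, **stats, "completion_rate": rate})
    return student_rows


# Hypothetical sample: student "a" has two enrollments, student "b" has one.
sample = [
    {"Student_GUID": "a", "credits_attempted": "3", "credits_earned": "3"},
    {"Student_GUID": "a", "credits_attempted": "4", "credits_earned": "0"},
    {"Student_GUID": "b", "credits_attempted": "3", "credits_earned": "3"},
]
for student in aggregate_to_student_level(sample):
    print(student)
```

The output has one row per `Student_GUID`, which matches the shape described for `bishop_state_student_level_with_zip.csv`.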
All data files contain de-identified student information. No personally identifiable information (PII) is included.
For detailed field descriptions, see:
- DATA_DICTIONARY.md: Complete data dictionary
- ML_MODELS_GUIDE.md: ML models documentation
- README.md: Main project README