An end-to-end MLOps project that predicts body fat percentage from anthropometric measurements using support vector regression (SVR). The best model reaches a mean absolute percentage error (MAPE) below 1% (MAE: 0.10, R²: 0.9996).
| Model | MAE | R² | MAPE |
|---|---|---|---|
| SVR (best) | 0.10 | 0.9996 | 0.86% |
| MLPRegressor | 0.24 | 0.9979 | 1.82% |
| LinearRegression | 0.50 | 0.9925 | 3.79% |

Full benchmark (11 models):

| Model | MAE | R² | MAPE |
|---|---|---|---|
| SVR | 0.103 | 0.9996 | 0.87% |
| MLPRegressor | 0.248 | 0.9979 | 1.82% |
| StackingRegressor | 0.319 | 0.9947 | 3.45% |
| LinearRegression | 0.506 | 0.9926 | 3.79% |
| Ridge | 0.525 | 0.9920 | 4.00% |
| GradientBoosting | 1.315 | 0.9469 | 9.06% |
| XGBoost | 1.502 | 0.9134 | 9.75% |
| RandomForest | 1.703 | 0.9104 | 12.77% |
| Lasso | 2.071 | 0.8885 | 16.67% |
| AdaBoost | 2.291 | 0.8571 | 17.80% |
| ElasticNet | 2.861 | 0.7875 | 23.82% |
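
For readers unfamiliar with the MAE, R², and MAPE columns, the following is a minimal pure-Python sketch of how these three metrics are conventionally defined. It is not taken from the project's code; the function names are illustrative, and the project itself most likely relies on the equivalent scikit-learn metric functions.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

MAE is reported in percentage points of body fat, while MAPE is relative to each true value, which is why the two columns can rank models slightly differently.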
| Layer | Tool/Framework |
|---|---|
| Language | Python 3.12 |
| Data Processing | Pandas, Polars, Scikit-learn |
| ML Models | SVR, XGBoost, MLP, Ridge, etc. |
| Feature Engineering | PolynomialFeatures, RFE, PCA |
| Workflow Orchestration | Metaflow |
| Experiment Tracking | MLflow |
| Model Monitoring | Evidently AI |
| HPO | Optuna |
| Feature Store | Featureform (YAML) |
| API | FastAPI |
| Frontend | Streamlit |
| Containerization | Docker + Docker Compose |
| Infrastructure | OpenTofu, K3s/Minikube |
| CI/CD | GitHub Actions |
| Linting | Ruff |
cd Project
# Install dependencies
pip install -r requirements.txt
pip install -e ".[dev,mlops,viz]"
# Train models
make train
# Run API
make api
# -> http://localhost:8000 (web form)
# -> http://localhost:8000/docs (Swagger UI)
# Run tests
make test

cd Project
docker compose up --build # Start API on port 8000
docker compose --profile dev up -d # Include Jupyter
docker compose --profile train run --rm train # Train

curl -X POST "http://localhost:8000/predict/" \
-H "accept: application/json" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "abdomen=85&hip=100&weight=75&thigh=60&knee=38&biceps=32&neck=37"

Project/
├── api/ # FastAPI prediction service
│ ├── main.py # Endpoints: GET /, POST /predict/, GET /health
│ └── templates/form.html # Web form UI
├── modeling/ # ML training & inference
│ ├── train.py # SVR pipeline: PolyFeatures -> RFE -> PCA -> SVR
│ └── predict.py # Batch & single prediction
├── dataset.py # Data loading, feature engineering, outlier removal
├── plots.py # EDA visualizations (distributions, heatmaps, boxplots)
├── config.py # Path configuration
├── tests/ # pytest test suite
│ ├── test_api.py
│ ├── test_dataset.py
│ └── test_modeling.py
├── mlops/ # MLOps pipeline
│ ├── src/flows/ # Metaflow workflows
│ ├── src/monitoring/ # Evidently drift detection
│ ├── src/retraining/ # Automated retraining flow
│ ├── features/ # Featureform YAML definitions
│ ├── dashboards/ # Streamlit dashboards
│ └── infra/ # Terraform/OpenTofu IaC
├── notebooks/ # Streamlit app & analysis
├── docker/ # Dockerfiles (API, Jupyter, Train)
├── docker-compose.yml # Service orchestration
├── Makefile # Dev commands
├── pyproject.toml # Project config, deps, ruff, pytest
├── pixi.toml # Conda environment (cross-platform)
└── requirements.txt # Core pip dependencies
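
The `POST /predict/` endpoint listed under `api/main.py` can also be called from Python. The sketch below mirrors the curl request using only the standard library. The URL, headers, and form fields come from that request; the helper names `build_predict_request` and `predict` are hypothetical, and the assumption that the endpoint returns a JSON body is inferred from the `accept: application/json` header rather than confirmed by the source.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://localhost:8000/predict/"  # local address used in the curl example


def build_predict_request(measurements: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a form-encoded POST request without sending it."""
    body = urllib.parse.urlencode(measurements).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def predict(measurements: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_predict_request(measurements, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, `predict({"abdomen": 85, "hip": 100, "weight": 75, "thigh": 60, "knee": 38, "biceps": 32, "neck": 37})` sends the same payload as the curl command.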
- Source: Kaggle - Body Fat Prediction
- Size: 436 samples, 16 features
- Target: Body fat percentage
- Features: Age, Weight, Height, Neck, Chest, Abdomen, Hip, Thigh, Knee, Ankle, Biceps, Forearm, Wrist, Density
bmi = Weight / (Height/100)^2
waist_to_hip = Abdomen / Hip
waist_to_height = Abdomen / Height
arm_ratio = Forearm / Biceps
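
As an illustration, the four engineered features can be computed for a single record as in the sketch below. The function name and the dict-based record are assumptions made for this example (the project's `dataset.py` may well operate on DataFrames instead), and the BMI formula's `Height/100` term implies height in centimetres and weight in kilograms.

```python
def add_engineered_features(row: dict) -> dict:
    """Return a copy of `row` with the four derived ratio features added.

    Assumes Weight is in kg and Height is in cm, as implied by Height/100.
    """
    out = dict(row)
    out["bmi"] = row["Weight"] / (row["Height"] / 100) ** 2
    out["waist_to_hip"] = row["Abdomen"] / row["Hip"]
    out["waist_to_height"] = row["Abdomen"] / row["Height"]
    out["arm_ratio"] = row["Forearm"] / row["Biceps"]
    return out
```

The input record is left unmodified, so the same raw measurements can be reused for other transformations.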
graph LR
A[CSV Data] --> B[Feature Engineering]
B --> C[Outlier Removal<br/>z-score]
C --> D[PolynomialFeatures<br/>degree=2]
D --> E[RFE<br/>8 features]
E --> F[PCA<br/>5 components]
F --> G[SVR<br/>C=10, rbf]
G --> H[MLflow Tracking]
G --> I[Evidently Reports]
G --> J[FastAPI Serving]
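
The modelling stages in the diagram above (PolynomialFeatures, RFE, PCA, SVR) map naturally onto a scikit-learn `Pipeline`. The sketch below uses the hyperparameters shown in the diagram (degree 2, 8 selected features, 5 components, `C=10`, RBF kernel). Everything else is an assumption: the source does not say which estimator RFE wraps, so `LinearRegression` is used as a stand-in, and `build_pipeline` and `demo_data` are hypothetical names. No scaling step is included because the diagram does not show one.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR


def build_pipeline() -> Pipeline:
    """Chain the four modelling stages from the diagram."""
    return Pipeline(
        steps=[
            ("poly", PolynomialFeatures(degree=2, include_bias=False)),
            # Assumption: a linear model ranks the features for elimination.
            ("rfe", RFE(estimator=LinearRegression(), n_features_to_select=8)),
            ("pca", PCA(n_components=5)),
            ("svr", SVR(C=10, kernel="rbf")),
        ]
    )


def demo_data(n_samples: int = 60, n_features: int = 4, seed: int = 0):
    """Synthetic data for trying the pipeline; not the project's dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n_samples)
    return X, y
```

Calling `build_pipeline().fit(X, y)` on the output of `demo_data()` trains all four stages in order, and the fitted object can then be passed to `joblib.dump` for serving.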
- Combined (all data)
- Male-only subset
- Female-only subset
- Data drift detection (Evidently)
- Target drift detection
- Regression performance reports
- Automated retraining via Metaflow when drift is detected
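
To make the "retrain when drift is detected" step concrete, here is a minimal pure-Python sketch of a drift trigger. It does not use the Evidently API. Instead it substitutes the Population Stability Index (PSI) for a single numeric feature, and the function names and the 0.2 threshold are assumptions chosen for illustration only.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values are constant; PSI is undefined")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range fall in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        eps = 1e-6  # avoid log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(
    reference: Sequence[float], current: Sequence[float], threshold: float = 0.2
) -> bool:
    """Return True when the drift score exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold
```

In a scheduled job, a `True` result from `should_retrain` would be the signal to launch the retraining flow.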
# Lint & format
make lint
make format
# Run full test suite
make test
# Generate EDA plots
make plots
# Process dataset
make data

graph TD
subgraph Client
A1[Web Form] --> A2[FastAPI]
A3[REST Client] --> A2
end
A2 --> B1[ML Model<br/>joblib]
B1 --> B2[MLflow Tracking]
B1 --> B3[Evidently Monitoring]
subgraph MLOps
C1[Metaflow Orchestration]
C2[Drift Detection] --> C3[Auto-Retrain]
C4[Optuna HPO]
end
subgraph Infrastructure
D1[Docker Compose]
D2[GitHub Actions CI/CD]
D3[OpenTofu IaC]
end
| Team Member | Email |
|---|---|
| Alejandro Castellanos | k2jac9@users.noreply.github.com |
| Anna Wong | annawong.qea@gmail.com |
| Faisal Khan | fa.khan@alumni.utoronto.ca |
| Hassan Saade | saadehassan@hotmail.com |
| Igor Bak | baxwork88@gmail.com |
UofT DSI Team-17
BSD License - see LICENSE