k2jac9/LeanAI

LeanAI 2026

Predicting Body Fat Percentage Using Machine Learning

An end-to-end MLOps project that predicts body fat percentage from anthropometric measurements using support vector regression (SVR). The best model reaches a mean absolute percentage error (MAPE) below 1% (MAE: 0.10, R²: 0.9996).


Results

Model             MAE   R²      MAPE
SVR (best)        0.10  0.9996  0.86%
MLPRegressor      0.24  0.9979  1.82%
LinearRegression  0.50  0.9925  3.79%

Full benchmark (11 models; columns are model, MAE, R², MAPE):

SVR                0.103    0.9996    0.87%
MLPRegressor       0.248    0.9979    1.82%
StackingRegressor  0.319    0.9947    3.45%
LinearRegression   0.506    0.9926    3.79%
Ridge              0.525    0.9920    4.00%
GradientBoosting   1.315    0.9469    9.06%
XGBoost            1.502    0.9134    9.75%
RandomForest       1.703    0.9104   12.77%
Lasso              2.071    0.8885   16.67%
AdaBoost           2.291    0.8571   17.80%
ElasticNet         2.861    0.7875   23.82%
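
For reference, the three columns above follow the standard definitions of MAE, R², and MAPE. The sketch below computes them in plain Python on made-up numbers; the function name `regression_metrics` and the sample values are illustrative assumptions, not code or data from this repository.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, R^2, MAPE in percent) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error, in the same units as the target.
    mae = sum(abs(e) for e in errors) / n

    # Coefficient of determination: 1 - residual SS / total SS.
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # Mean absolute percentage error; assumes no true value is zero.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return mae, r2, mape


# Illustrative values only (not taken from the benchmark).
mae, r2, mape = regression_metrics([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
print(f"MAE={mae:.3f}  R2={r2:.4f}  MAPE={mape:.2f}%")
```

Because the target is itself a percentage, MAE here is in percentage points of body fat, while MAPE is relative to each true value.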

Tech Stack

Layer                   Tool/Framework
Language                Python 3.12
Data Processing         Pandas, Polars, Scikit-learn
ML Models               SVR, XGBoost, MLP, Ridge, etc.
Feature Engineering     PolynomialFeatures, RFE, PCA
Workflow Orchestration  Metaflow
Experiment Tracking     MLflow
Model Monitoring        Evidently AI
HPO                     Optuna
Feature Store           Featureform (YAML)
API                     FastAPI
Frontend                Streamlit
Containerization        Docker + Docker Compose
Infrastructure          OpenTofu, K3s/Minikube
CI/CD                   GitHub Actions
Linting                 Ruff

Quick Start

Local Development

cd Project

# Install dependencies
pip install -r requirements.txt
pip install -e ".[dev,mlops,viz]"

# Train models
make train

# Run API
make api
# -> http://localhost:8000 (web form)
# -> http://localhost:8000/docs (Swagger UI)

# Run tests
make test

Docker

cd Project
docker compose up --build    # Start API on port 8000
docker compose --profile dev up -d   # Include Jupyter
docker compose --profile train run --rm train  # Train

API Usage

curl -X POST "http://localhost:8000/predict/" \
     -H "accept: application/json" \
     -H "Content-Type: application/x-www-form-urlencoded" \
     -d "abdomen=85&hip=100&weight=75&thigh=60&knee=38&biceps=32&neck=37"
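
The same request can be sent from Python. This is a minimal sketch using only the standard library; it mirrors the URL, headers, and form fields of the curl call above. The helper names `build_request` and `predict` are illustrative, and the assumption that the endpoint answers with JSON is inferred from the `accept` header rather than confirmed by the source.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/predict/"


def build_request(measurements, url=API_URL):
    """Build a form-encoded POST request equivalent to the curl example."""
    body = urlencode(measurements).encode("ascii")
    headers = {
        "accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")


def predict(measurements, url=API_URL, timeout=10):
    """Send the request and decode the body as JSON (assumed response format)."""
    with urlopen(build_request(measurements, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


measurements = {
    "abdomen": 85, "hip": 100, "weight": 75, "thigh": 60,
    "knee": 38, "biceps": 32, "neck": 37,
}
request = build_request(measurements)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

With the API running locally, `predict(measurements)` sends the request; the snippet above only builds it, so it runs without a server.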

Project Structure

Project/
├── api/                    # FastAPI prediction service
│   ├── main.py             # Endpoints: GET /, POST /predict/, GET /health
│   └── templates/form.html # Web form UI
├── modeling/               # ML training & inference
│   ├── train.py            # SVR pipeline: PolyFeatures -> RFE -> PCA -> SVR
│   └── predict.py          # Batch & single prediction
├── dataset.py              # Data loading, feature engineering, outlier removal
├── plots.py                # EDA visualizations (distributions, heatmaps, boxplots)
├── config.py               # Path configuration
├── tests/                  # pytest test suite
│   ├── test_api.py
│   ├── test_dataset.py
│   └── test_modeling.py
├── mlops/                  # MLOps pipeline
│   ├── src/flows/          # Metaflow workflows
│   ├── src/monitoring/     # Evidently drift detection
│   ├── src/retraining/     # Automated retraining flow
│   ├── features/           # Featureform YAML definitions
│   ├── dashboards/         # Streamlit dashboards
│   └── infra/              # Terraform/OpenTofu IaC
├── notebooks/              # Streamlit app & analysis
├── docker/                 # Dockerfiles (API, Jupyter, Train)
├── docker-compose.yml      # Service orchestration
├── Makefile                # Dev commands
├── pyproject.toml          # Project config, deps, ruff, pytest
├── pixi.toml               # Conda environment (cross-platform)
└── requirements.txt        # Core pip dependencies

Dataset

  • Source: Kaggle - Body Fat Prediction
  • Size: 436 samples, 16 features
  • Target: Body fat percentage
  • Features: Age, Weight, Height, Neck, Chest, Abdomen, Hip, Thigh, Knee, Ankle, Biceps, Forearm, Wrist, Density

Feature Engineering

  • bmi = Weight / (Height/100)^2
  • waist_to_hip = Abdomen / Hip
  • waist_to_height = Abdomen / Height
  • arm_ratio = Forearm / Biceps
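
The four derived features can be sketched as a small function over one record. The column names come from the feature list above; the function name `add_engineered_features`, the use of a plain dict instead of a DataFrame, and the sample values are assumptions for illustration. The `Height/100` term in the BMI formula implies height in centimetres (and therefore weight in kilograms), which the sketch takes as given.

```python
def add_engineered_features(row):
    """Return a copy of `row` with the four ratio features added."""
    height_m = row["Height"] / 100  # assumes Height is in centimetres
    return {
        **row,
        "bmi": row["Weight"] / height_m ** 2,
        "waist_to_hip": row["Abdomen"] / row["Hip"],
        "waist_to_height": row["Abdomen"] / row["Height"],
        "arm_ratio": row["Forearm"] / row["Biceps"],
    }


# Illustrative measurements, not a record from the dataset.
sample = {
    "Weight": 75, "Height": 180, "Abdomen": 85,
    "Hip": 100, "Forearm": 28, "Biceps": 32,
}
features = add_engineered_features(sample)
for name in ("bmi", "waist_to_hip", "waist_to_height", "arm_ratio"):
    print(f"{name}: {features[name]:.3f}")
```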

ML Pipeline

graph LR
  A[CSV Data] --> B[Feature Engineering]
  B --> C[Outlier Removal<br/>z-score]
  C --> D[PolynomialFeatures<br/>degree=2]
  D --> E[RFE<br/>8 features]
  E --> F[PCA<br/>5 components]
  F --> G[SVR<br/>C=10, rbf]
  G --> H[MLflow Tracking]
  G --> I[Evidently Reports]
  G --> J[FastAPI Serving]
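
The modeling stages in the diagram (PolynomialFeatures, RFE, PCA, SVR) map naturally onto a scikit-learn `Pipeline`. The sketch below is one way to wire them with the parameters shown above; it is not the contents of `modeling/train.py`. The base estimator for RFE (`LinearRegression`), the `include_bias=False` setting, the absence of a scaler, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR


def build_pipeline():
    """PolynomialFeatures(degree=2) -> RFE(8) -> PCA(5) -> SVR(C=10, rbf)."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        # RFE needs an estimator exposing coef_; LinearRegression is an assumption.
        ("rfe", RFE(estimator=LinearRegression(), n_features_to_select=8)),
        ("pca", PCA(n_components=5)),
        ("svr", SVR(C=10, kernel="rbf")),
    ])


# Synthetic stand-in data: 60 rows, 4 features (not the project dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=60)

model = build_pipeline().fit(X, y)
predictions = model.predict(X[:3])
print(predictions)
```

Keeping every stage inside one `Pipeline` means the fitted feature selection and PCA are applied identically at prediction time, which matters when the model is serialized and served.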

Models Trained

  • Combined (all data)
  • Male-only subset
  • Female-only subset

Monitoring

  • Data drift detection (Evidently)
  • Target drift detection
  • Regression performance reports
  • Automated retraining via Metaflow when drift is detected
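
To make the drift-then-retrain idea concrete, here is a library-independent sketch. It does not use Evidently or Metaflow; it swaps in a two-sample Kolmogorov-Smirnov test from SciPy as the per-feature drift check. The function names, the 0.05 significance level, and the synthetic data are assumptions, and the project's actual drift rules may differ.

```python
import numpy as np
from scipy.stats import ks_2samp


def drifted_features(reference, current, alpha=0.05):
    """Return names of features whose distributions differ (KS test, p < alpha)."""
    drifted = []
    for name in reference:
        result = ks_2samp(reference[name], current[name])
        if result.pvalue < alpha:
            drifted.append(name)
    return drifted


def should_retrain(reference, current, alpha=0.05):
    """Flag retraining when at least one feature has drifted."""
    return len(drifted_features(reference, current, alpha)) > 0


# Synthetic example: Abdomen shifts upward, Hip is unchanged.
rng = np.random.default_rng(0)
reference = {
    "Abdomen": rng.normal(92, 10, 300),
    "Hip": rng.normal(100, 7, 300),
}
current = {
    "Abdomen": reference["Abdomen"] + 15,
    "Hip": reference["Hip"].copy(),
}

drifted = drifted_features(reference, current)
retrain = should_retrain(reference, current)
print(drifted, retrain)
```

In a scheduled flow, a `True` result from a check like `should_retrain` would be the condition that starts the retraining step.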

Development

# Lint & format
make lint
make format

# Run full test suite
make test

# Generate EDA plots
make plots

# Process dataset
make data

Architecture

graph TD
  subgraph Client
    A1[Web Form] --> A2[FastAPI]
    A3[REST Client] --> A2
  end
  A2 --> B1[ML Model<br/>joblib]
  B1 --> B2[MLflow Tracking]
  B1 --> B3[Evidently Monitoring]
  
  subgraph MLOps
    C1[Metaflow Orchestration]
    C2[Drift Detection] --> C3[Auto-Retrain]
    C4[Optuna HPO]
  end
  
  subgraph Infrastructure
    D1[Docker Compose]
    D2[GitHub Actions CI/CD]
    D3[OpenTofu IaC]
  end

Team

Team Member            Email
Alejandro Castellanos  k2jac9@users.noreply.github.com
Anna Wong              annawong.qea@gmail.com
Faisal Khan            fa.khan@alumni.utoronto.ca
Hassan Saade           saadehassan@hotmail.com
Igor Bak               baxwork88@gmail.com

UofT DSI Team-17


License

BSD License - see LICENSE
