Challenge Framing
Churn modeling often starts in notebooks and becomes difficult to reproduce, compare, or operationalize once preprocessing and training logic spread across experiments.
The project is a reproducible churn workflow designed to bridge notebook experimentation and deployable ML systems.
This project emphasizes engineering discipline as much as model quality. The core work was not just fitting a classifier, but structuring the pipeline so preprocessing, evaluation, and inference all remain inspectable.
Overview
I reworked exploratory notebook experiments into a pipeline architecture that separates data preparation, model training, inference, and artifact tracking, so each run stays reproducible and reviewable.
Approach
I centralized configuration, separated data and training stages, used PySpark for scalable preparation, and logged experiments plus artifacts with MLflow.
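A minimal sketch of what centralized configuration could look like, assuming a single frozen dataclass that every stage imports. The class name, field names, paths, experiment name, and seed are all hypothetical; only the 80/20 split is taken from this write-up.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """One shared settings object for the data, training, and inference stages."""

    raw_path: str = "data/raw/telco_churn.csv"  # hypothetical location
    artifact_dir: str = "artifacts"             # hypothetical location
    experiment_name: str = "churn-baselines"    # hypothetical experiment name
    test_size: float = 0.2                      # the 80/20 train/test split
    seed: int = 42                              # hypothetical fixed seed


def as_logged_params(config: PipelineConfig) -> dict:
    """Flatten the config to strings so a run can record exactly what it was given."""
    return {key: str(value) for key, value in asdict(config).items()}


config = PipelineConfig()
params = as_logged_params(config)
```

Freezing the dataclass means a stage cannot silently change a setting mid-run, and logging the flattened values alongside each experiment keeps runs comparable.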
Project Highlights
ML pipeline engineering, traceability, artifact lineage, experiment management, and model evaluation for imbalanced classification.
Core Stack
Key Features
Cleaning, encoding, scaling, and split generation are orchestrated as pipeline code rather than notebook cells.
Multiple classical baselines stay comparable through consistent evaluation and shared artifact outputs.
MLflow captures model parameters, metrics, processed datasets, and serialized artifacts.
Prediction latency and batch-level analytics are surfaced through a streaming-style inference wrapper.
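The streaming-style inference wrapper mentioned above might look roughly like the following generator, which scores one batch at a time and yields latency plus batch-level statistics. This is a sketch under assumptions: `stream_predictions`, the telemetry field names, and the `toy_predict` stand-in model are hypothetical, and the input rows are synthetic.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Iterator, Sequence


def stream_predictions(
    predict: Callable[[Sequence[dict]], Sequence[int]],
    batches: Iterable[Sequence[dict]],
) -> Iterator[dict]:
    """Score batches one at a time and yield a telemetry record for each."""
    for batch_id, batch in enumerate(batches):
        start = time.perf_counter()
        labels = list(predict(batch))
        latency_ms = (time.perf_counter() - start) * 1000.0
        yield {
            "batch_id": batch_id,
            "rows": len(labels),
            "latency_ms": latency_ms,
            # Share of rows predicted as churn (label 1) in this batch.
            "churn_rate": mean(labels) if labels else 0.0,
            "predictions": labels,
        }


def toy_predict(rows):
    """Hypothetical stand-in for a loaded model: flags short-tenure customers."""
    return [1 if row["tenure"] < 12 else 0 for row in rows]


# Synthetic batches for illustration only.
batches = [
    [{"tenure": 2}, {"tenure": 40}],
    [{"tenure": 5}, {"tenure": 7}, {"tenure": 60}, {"tenure": 30}],
]
reports = list(stream_predictions(toy_predict, batches))
```

Because the wrapper only needs a callable, the same telemetry code could sit in front of any saved model without changes.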
Each layer stays explicit so reviewers can quickly understand where ingestion, orchestration, persistence, and model-serving responsibilities live.
Raw telco data is transformed into consistent train/test artifacts with PySpark-backed cleaning and feature prep.
Classifiers are trained, evaluated, and versioned with a shared experiment workflow.
Saved models and preprocessing assets are reused for prediction and telemetry logging.
The pipeline section keeps the most important engineering steps visible without collapsing them into generic bullet lists.
Load the telco churn dataset and normalize schema-level issues such as incomplete numeric values.
Handle missing values, remove outliers, encode categoricals, and scale key numeric features.
Benchmark classical ML baselines and log metrics, parameters, and artifacts into MLflow.
Load serialized assets for downstream prediction while recording latency and batch-level telemetry.
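The preparation steps above can be sketched in plain Python. The project runs this stage on PySpark; the standard library is used here only so the example is self-contained. The specific choices (median imputation, the 1.5 × IQR outlier rule, min-max scaling, integer category codes) are assumptions, since the write-up does not name the methods, and the rows are synthetic.

```python
from statistics import median, quantiles


def to_float(value):
    """Coerce a raw field to float; blank or malformed strings become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def impute_median(values):
    """Replace missing entries with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def iqr_mask(values, k=1.5):
    """Mark values inside [Q1 - k*IQR, Q3 + k*IQR] as rows to keep."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [low <= v <= high for v in values]


def min_max_scale(values):
    """Rescale to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def encode_categories(values):
    """Map each category to an integer code, sorted so codes are deterministic."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# Synthetic rows for illustration only; not taken from the project's dataset.
raw_charges = ["29.85", " ", "56.95", "53.85", "9000", "42.30", "70.70"]
contracts = ["Month-to-month", "One year", "Two year", "One year",
             "Month-to-month", "Two year", "Month-to-month"]

charges = impute_median([to_float(v) for v in raw_charges])
keep = iqr_mask(charges)
kept_charges = [v for v, ok in zip(charges, keep) if ok]
kept_contracts = [c for c, ok in zip(contracts, keep) if ok]

scaled = min_max_scale(kept_charges)
encoded = encode_categories(kept_contracts)
```

Keeping each step as a small pure function is what makes the stage easy to test and to reorder. In a real pipeline the imputation value, outlier bounds, and scaling range would be fit on the training split only and then saved for reuse at inference time.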
This timeline keeps the implementation story concise: what was framed first, what was hardened next, and what ultimately made the project production-ready.
Moved scattered preparation and evaluation logic into a coherent repository structure.
Created modular data, training, and inference stages with shared configuration.
Added MLflow-based lineage so metrics and model artifacts remain reviewable across runs.
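One way the MLflow lineage step could be wired, shown as a hedged sketch rather than the project's actual code. `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`, and `mlflow.log_artifact` are standard MLflow tracking calls; the helper names, experiment name, and hyperparameters are hypothetical. The only metric value used is the cross-validated F1 reported for the Random Forest baseline.

```python
def build_run_record(model_name, params, metrics):
    """Assemble the payload for one run, rejecting non-numeric metrics early."""
    for name, value in metrics.items():
        if not isinstance(value, (int, float)):
            raise TypeError(
                f"metric {name!r} must be numeric, got {type(value).__name__}"
            )
    return {
        "run_name": model_name,
        "params": {key: str(value) for key, value in params.items()},
        "metrics": {key: float(value) for key, value in metrics.items()},
    }


def log_run(record, artifact_paths=(), experiment="churn-baselines"):
    """Send one run record to MLflow (imported lazily; needs `pip install mlflow`)."""
    import mlflow

    mlflow.set_experiment(experiment)  # hypothetical experiment name
    with mlflow.start_run(run_name=record["run_name"]):
        mlflow.log_params(record["params"])
        mlflow.log_metrics(record["metrics"])
        for path in artifact_paths:
            # For example a serialized model or a processed dataset file.
            mlflow.log_artifact(path)


record = build_run_record(
    "random_forest",
    params={"n_estimators": 200, "max_depth": 8},  # hypothetical hyperparameters
    metrics={"cv_f1": 0.845},                      # value reported in this write-up
)
# log_run(record, artifact_paths=["artifacts/model.pkl"])  # hypothetical path
```

Separating the record from the logging call keeps the payload easy to inspect and test without a tracking server.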
This section is intentionally recruiter-friendly and engineer-friendly at the same time: each challenge is tied to a concrete design choice and a specific outcome.
Challenge
Solution
Centralized logic under scriptable modules and orchestrated pipelines with shared config.
Outcome
Reduced manual repetition and improved auditability.
Challenge
Solution
Used PySpark for deterministic transformation, then converted to pandas for estimator compatibility.
Outcome
Kept the system scalable without sacrificing model tooling.
Challenge
Solution
Elevated F1, precision, and recall beside raw accuracy during evaluation and reporting.
Outcome
Made the model discussion more credible for reviewers and recruiters.
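A short worked example of why F1, precision, and recall belong beside accuracy on imbalanced churn data. The labels are synthetic (20 churners out of 100 customers) and the function name `churn_metrics` is hypothetical; the formulas are the standard confusion-matrix definitions.

```python
def churn_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with churn (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Synthetic labels: 20 churners followed by 80 non-churners.
y_true = [1] * 20 + [0] * 80

# A model that never predicts churn still looks 80% accurate.
never_churn = churn_metrics(y_true, [0] * 100)

# A model that catches 15 of 20 churners at the cost of 10 false alarms.
useful = churn_metrics(y_true, [1] * 15 + [0] * 5 + [1] * 10 + [0] * 70)
```

Here `never_churn` reaches 0.80 accuracy with zero recall, while `useful` scores 0.85 accuracy, 0.60 precision, 0.75 recall, and an F1 of about 0.67. Accuracy alone barely separates the two models; recall and F1 do.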
The emphasis here is signal, not decoration: key numbers, verifiable outcomes, and the context needed to interpret them responsibly.
Best CV F1: 0.845. Random Forest baseline during notebook benchmarking.
Holdout Recall: 0.735. Kept churn detection visible instead of hiding behind accuracy.
Train/Test Split: 80/20. Consistent split strategy for reproducible evaluation.
Pipeline Scope: 4 stages. Data prep, training, evaluation, and inference telemetry.
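A reproducible 80/20 split can be sketched with a seeded, local random generator. Stratifying by class is an assumption added here because churn labels are imbalanced; the write-up only states that the split is consistent. The function name, seed, and labels are hypothetical.

```python
import random


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), sampling test_size from each class separately."""
    rng = random.Random(seed)  # local RNG: same seed, same split, no global state
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):  # fixed class order keeps the split deterministic
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * test_size)
        test_idx.extend(members[:cut])
        train_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels: 25 churners and 75 non-churners.
labels = [1] * 25 + [0] * 75
train_idx, test_idx = stratified_split(labels)
```

With these labels the test set holds 20 rows, 5 of them churners, so the churn rate in the holdout matches the full dataset. Calling the function again with the same seed returns the same indices.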
Key Results
Research + Business Impact
Supports retention-risk analysis with a pipeline that can evolve into batch scoring or service-based predictions.
Shows ML systems maturity through configuration, artifacts, reproducibility, and observability rather than just raw metrics.