MindMirror Personality Forecasting Engine

The engineering challenge here was to serve a causality-safe longitudinal NLP pipeline through an API and background worker model without collapsing into fragile batch-only research code.

Case Study

Project Overview

What I built, the problem, and the solution

Designed a longitudinal personality prediction module that converts raw Reddit behavior into weekly trait-direction forecasts, persists run metadata, and serves multi-trait results through authenticated backend endpoints.

Challenge Framing

Static personality labels fail to capture how user behavior shifts over time, but longitudinal prediction introduces leakage risks, irregular activity gaps, and operational complexity.

Solution Strategy

I built a trait-specific weekly forecasting service with async job orchestration, causal feature generation, versioned artifacts, and authenticated prediction APIs.

Project Highlights

Weekly trait-direction prediction for all Big Five dimensions.
Cold-start inference path for new users with raw Reddit activity only.
Background job execution decoupled from user-facing request latency.
Persisted run metadata, confidence scores, and model lineage for auditability.

Gallery

Product screens and workflow snapshots

Selected screens show the visible product experience and the operational surfaces behind each project.

Weekly trait forecasting pipeline

Raw Reddit activity flows through scoring, weekly aggregation, causal feature construction, and trait-specific classification.

Async inference lifecycle

User signup, job creation, worker execution, and persisted predictions are decoupled for operational safety.

Model lineage surface

Each inference run stores artifact versions, confidence values, and week-level prediction records.

Tech Stack

Built with tools chosen for reliability and iteration speed

The stack covers temporal NLP pipelines, async inference orchestration, feature lineage, and production-style delivery of research-oriented models.

Backend

FastAPI
SQLAlchemy Async

AI / ML

Python 3.11
XGBoost
Transformers
PyTorch

Data

PostgreSQL

DevOps

MLflow

Key Features

Temporal feature pipeline

Transforms raw Reddit posts into weekly aggregates, lagged trait states, gap features, and rolling statistics.
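A minimal sketch of the weekly aggregation and causal shifting described above, using only the standard library. The tuple layout `(author, day, score)`, the function names, and the three-week window are assumptions for illustration, not the project's actual schema. The key property is that every feature for a given week is computed from strictly earlier weeks.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return date.fromordinal(d.toordinal() - d.weekday())


def weekly_means(posts):
    """Average post-level scores per (author, week) bucket."""
    buckets = defaultdict(list)
    for author, day, score in posts:
        buckets[(author, week_start(day))].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}


def causal_features(weekly, author, window=3):
    """Build one row per active week using only strictly earlier weeks."""
    weeks = sorted(w for (a, w) in weekly if a == author)
    rows = []
    for i, wk in enumerate(weeks):
        past = [weekly[(author, w)] for w in weeks[:i]]  # excludes week i itself
        rows.append({
            "week": wk,
            "lag_1": past[-1] if past else None,
            "rolling_mean": mean(past[-window:]) if past else None,
            "gap_weeks": (wk - weeks[i - 1]).days // 7 if i > 0 else None,
        })
    return rows


# Hypothetical sample: two posts in one week, then a one-week and a three-week jump.
posts = [
    ("u1", date(2024, 1, 1), 0.25),
    ("u1", date(2024, 1, 3), 0.75),
    ("u1", date(2024, 1, 8), 1.0),
    ("u1", date(2024, 1, 29), 0.0),
]
rows = causal_features(weekly_means(posts), "u1")
```

The first row has no history, so its lag, rolling, and gap fields stay empty rather than borrowing from the current week.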

Trait-specific inference

Each Big Five dimension resolves its own artifact, label encoder, and version metadata.
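One way to picture per-trait resolution is a small registry keyed by trait name. This is a sketch under assumptions: the `TraitArtifact` fields, the path layout, and the class order are hypothetical and stand in for whatever the artifact store actually records.

```python
from dataclasses import dataclass

TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


@dataclass(frozen=True)
class TraitArtifact:
    trait: str
    model_path: str            # hypothetical path layout
    classes: tuple[str, ...]   # label-encoder order: class index -> label
    version: str


def build_registry(version: str, root: str = "artifacts") -> dict[str, TraitArtifact]:
    """Create one artifact record per trait for a given version tag."""
    return {
        trait: TraitArtifact(
            trait=trait,
            model_path=f"{root}/{trait}/{version}/model.json",
            classes=("down", "neutral", "up"),  # assumed encoding
            version=version,
        )
        for trait in TRAITS
    }


def decode(artifact: TraitArtifact, class_index: int) -> str:
    """Map a classifier's integer output back to its direction label."""
    return artifact.classes[class_index]
```

Keeping the encoder order next to the model path means a prediction can be decoded and attributed to a version without consulting a second lookup.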

Async orchestration

Signup inference creates background jobs so heavy NLP processing stays outside request latency.

Authenticated API delivery

Prediction results are exposed through JWT-protected endpoints for integrated product consumption.

Architecture

System architecture designed as a readable engineering story

Each layer stays explicit so reviewers can quickly understand where interface, orchestration, persistence, and service responsibilities live.

Acquisition + Identity

The application links user accounts to Reddit identities and persists inference requests into a job queue.

FastAPI, JWT auth, PostgreSQL

Feature and Model Layer

Post-level personality signals and text embeddings roll into weekly causal feature frames for each trait.

Transformers, PyTorch, XGBoost

Serving + Lineage

Runs, predictions, and model artifacts are versioned so each response remains traceable.

MLflow, SQLAlchemy Async, PostgreSQL

System Flow

Key stages broken down as a readable execution path

The pipeline section keeps the most important engineering steps visible without collapsing them into generic bullet lists.

Extract

Pull Reddit content, normalize text, and prepare post-level inputs for personality scoring.

Reddit data, Python, emoji handling

Represent

Generate transformer-based signals and weekly aggregates with causal shifting rules.

DistilBERT personality model, bert-base-uncased

Predict

Run trait-specific classifiers for up, neutral, or down direction forecasting.

XGBoost, MLflow

Serve

Persist predictions and expose authenticated endpoints plus job-based status polling.

FastAPI, PostgreSQL, worker process

Timeline

A case-study flow that explains how the system took shape

This timeline keeps the implementation story concise: what was framed first, what was hardened next, and what ultimately made the project production-ready.

Phase 01

Longitudinal formulation

Reframed personality as a weekly direction prediction problem rather than a static classification task.

Phase 02

Cold-start inference path

Built a runtime feature builder that can infer directly from raw Reddit bundles for new users.

Phase 03

Async API integration

Separated request handling from heavy inference using a PostgreSQL-backed job queue and worker process.

Challenges

Technical constraints, decisions, and the reasoning behind them

Each challenge is tied to a concrete design choice and a specific outcome.

Challenge

Leakage risk in longitudinal features.

Solution

Shifted embeddings and derived features, computed thresholds on the training subset only, and used author-aware temporal splits.

Outcome

Preserved causal validity and the credibility of the reported results.
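The leakage controls can be sketched as follows. This is one reading of "author-aware temporal splits" (each author's earliest weeks go to training, the latest to evaluation), and the tercile thresholds, the `delta` field, and the function names are assumptions rather than the project's actual rules. What matters is that the direction thresholds are fitted on training rows only.

```python
from collections import defaultdict
from statistics import quantiles


def author_temporal_split(rows, train_frac=0.8):
    """Split each author's timeline so held-out weeks come after training weeks."""
    by_author = defaultdict(list)
    for row in rows:
        by_author[row["author"]].append(row)
    train, test = [], []
    for items in by_author.values():
        items.sort(key=lambda r: r["week"])
        cut = int(len(items) * train_frac)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


def fit_thresholds(train_rows):
    """Tercile cut points of the week-over-week change, from training rows only."""
    lo, hi = quantiles([r["delta"] for r in train_rows], n=3)
    return lo, hi


def label(delta, lo, hi):
    """Map a trait-score change to a direction class."""
    if delta <= lo:
        return "down"
    if delta >= hi:
        return "up"
    return "neutral"
```

The same `lo` and `hi` are then reused to label held-out rows, so the evaluation set never influences where the class boundaries sit.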

Challenge

Irregular activity gaps in user histories.

Solution

Engineered gap-aware features and robust variance fallbacks to stabilize weekly signals.

Outcome

Made the pipeline more reliable across inconsistent user histories.
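A small sketch of what gap-aware features and a variance fallback might look like. The integer week index, the field names, and the fallback value of 0.0 are assumptions chosen for illustration.

```python
from statistics import stdev


def safe_std(values, fallback=0.0):
    """Sample standard deviation, or a fallback when fewer than two values exist."""
    return stdev(values) if len(values) >= 2 else fallback


def gap_features(week_indices):
    """Describe the spacing between consecutive active weeks.

    week_indices: sorted integer week numbers in which the user posted.
    """
    rows = []
    for i, wk in enumerate(week_indices):
        since_last = wk - week_indices[i - 1] if i > 0 else None
        rows.append({
            "week": wk,
            "weeks_since_last": since_last,
            # True when at least one whole week was skipped before this one.
            "had_gap": since_last is not None and since_last > 1,
        })
    return rows
```

For example, `gap_features([0, 1, 4])` flags only the last week as following a gap, and `safe_std([0.5])` returns the fallback instead of raising on a single observation.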

Challenge

Operational complexity of running heavy NLP inference behind an API.

Solution

Queued jobs in PostgreSQL and processed them through a dedicated worker instead of blocking the API.

Outcome

Kept the product-facing experience responsive.
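The database-backed queue pattern can be sketched with a single jobs table. To keep the example runnable without a server, `sqlite3` stands in for PostgreSQL here; the table layout, status values, and function names are assumptions. With several concurrent workers on PostgreSQL, a row-locking query such as `SELECT ... FOR UPDATE SKIP LOCKED` is a common way to claim jobs safely.

```python
import sqlite3

SCHEMA = """
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    result TEXT
)
"""


def enqueue(conn, username):
    """API side: record the request and return immediately with a job id."""
    cur = conn.execute("INSERT INTO jobs (username) VALUES (?)", (username,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn):
    """Worker side: take the oldest queued job, or None if the queue is empty."""
    row = conn.execute(
        "SELECT id, username FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Guard on status so a job already taken by another worker is not claimed twice.
    updated = conn.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'queued'",
        (row[0],),
    )
    conn.commit()
    return row if updated.rowcount == 1 else None


def complete(conn, job_id, result):
    """Worker side: store the result and mark the job finished."""
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
    )
    conn.commit()


def status(conn, job_id):
    """API side: status polling for a previously created job."""
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None
```

The request handler only calls `enqueue` and `status`, so its latency is independent of how long inference takes in the worker.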

Results

Metrics and outcomes presented for quick technical review

The emphasis here is signal, not decoration: key numbers, verifiable outcomes, and the context needed to interpret them responsibly.

30,116

Weekly Samples

Per-trait dataset scale across processed weekly records.

774

Feature Count

Embeddings, temporal gaps, lag states, and rolling stats.

75.6-76.9%

Trait Accuracy

Across five trait-specific classifiers.

Queue-based

Async Delivery

Signup inference runs through a DB-backed worker flow.

Key Results

Served weekly direction predictions for all five Big Five traits through authenticated APIs.
Engineered a 774-feature temporal NLP pipeline over 30,116 weekly samples per trait.
Achieved 75.6% to 76.9% accuracy across trait-specific classifiers.
Delivered async inference with persisted lineage and background job processing.

Business Impact

Research value

Demonstrates how longitudinal behavioral modeling can move from static personality labeling to production-oriented forecasting.

Product value

Creates a more adaptable input for personalized product experiences and trend-aware user analytics.
