Back To Projects
Module 03

Temporal personality forecasting built for product integration, not just offline experimentation.

MindMirror Personality Forecasting Engine

The engineering challenge here was to serve a causality-safe longitudinal NLP pipeline through an API and background worker model without collapsing into fragile batch-only research code.

Overview

Designed a longitudinal personality prediction module that converts raw Reddit behavior into weekly trait-direction forecasts, persists run metadata, and serves multi-trait results through authenticated backend endpoints.

Problem

Static personality labels fail to capture how user behavior shifts over time, but longitudinal prediction introduces leakage risks, irregular activity gaps, and operational complexity.

Approach

I built a trait-specific weekly forecasting service with async job orchestration, causal feature generation, versioned artifacts, and authenticated prediction APIs.

Project Overview

A modern engineering case study, structured for both recruiters and builders

Designed a longitudinal personality prediction module that converts raw Reddit behavior into weekly trait-direction forecasts, persists run metadata, and serves multi-trait results through authenticated backend endpoints.

Challenge Framing

Static personality labels fail to capture how user behavior shifts over time, but longitudinal prediction introduces leakage risks, irregular activity gaps, and operational complexity.

Solution Strategy

I built a trait-specific weekly forecasting service with async job orchestration, causal feature generation, versioned artifacts, and authenticated prediction APIs.

Project Highlights

  • Weekly trait-direction prediction for all Big Five dimensions.
  • Cold-start inference path for new users with raw Reddit activity only.
  • Background job execution decoupled from user-facing request latency.
  • Persisted run metadata, confidence scores, and model lineage for auditability.
Tech Stack

Built with tools chosen for reliability and iteration speed

Temporal NLP pipelines, async inference orchestration, feature lineage, and production-style delivery of research-oriented models.

Core Stack

  • Python 3.11
  • FastAPI
  • PostgreSQL
  • SQLAlchemy Async
  • MLflow
  • XGBoost
  • Transformers
  • PyTorch

Key Features

Temporal feature pipeline

Transforms raw Reddit posts into weekly aggregates, lagged trait states, gap features, and rolling statistics.

Trait-specific inference

Each Big Five dimension resolves its own artifact, label encoder, and version metadata.

Async orchestration

Signup inference creates background jobs so heavy NLP processing stays outside request latency.

Authenticated API delivery

Prediction results are exposed through JWT-protected endpoints for integrated product consumption.

Architecture

System architecture designed as a readable engineering story

Each layer stays explicit so reviewers can quickly understand where ingestion, orchestration, persistence, and model-serving responsibilities live.

01

Acquisition + Identity

The application links user accounts to Reddit identities and persists inference requests into a job queue.

FastAPIJWT authPostgreSQL
02

Feature and Model Layer

Post-level personality signals and text embeddings roll into weekly causal feature frames for each trait.

TransformersPyTorchXGBoost
03

Serving + Lineage

Runs, predictions, and model artifacts are versioned so each response remains traceable.

MLflowSQLAlchemy AsyncPostgreSQL
AI Pipeline

Pipeline stages broken down as a readable execution path

The pipeline section keeps the most important engineering steps visible without collapsing them into generic bullet lists.

01

Extract

Pull Reddit content, normalize text, and prepare post-level inputs for personality scoring.

Reddit dataPythonemoji handling
02

Represent

Generate transformer-based signals and weekly aggregates with causal shifting rules.

DistilBERT personality modelbert-base-uncased
03

Predict

Run trait-specific classifiers for up, neutral, or down direction forecasting.

XGBoostMLflow
04

Serve

Persist predictions and expose authenticated endpoints plus job-based status polling.

FastAPIPostgreSQLworker process
Timeline

A case-study flow that explains how the system took shape

This timeline keeps the implementation story concise: what was framed first, what was hardened next, and what ultimately made the project production-ready.

Phase 01

Longitudinal formulation

Reframed personality as a weekly direction prediction problem rather than a static classification task.

Phase 02

Cold-start inference path

Built a runtime feature builder that can infer directly from raw Reddit bundles for new users.

Phase 03

Async API integration

Separated request handling from heavy inference using a PostgreSQL-backed job queue and worker process.

Challenges

Technical constraints, decisions, and the reasoning behind them

This section is intentionally recruiter-friendly and engineer-friendly at the same time: each challenge is tied to a concrete design choice and a specific outcome.

Challenge

Avoiding temporal leakage in behavioral forecasting.

Solution

Shifted embeddings and derived features, computed thresholds on the training subset only, and used author-aware temporal splits.

Outcome

Preserved causal validity and reviewer credibility.

Challenge

Handling sparse and irregular posting behavior.

Solution

Engineered gap-aware features and robust variance fallbacks to stabilize weekly signals.

Outcome

Made the pipeline more reliable across inconsistent user histories.

Challenge

Serving heavyweight NLP inference at signup.

Solution

Queued jobs in PostgreSQL and processed them through a dedicated worker instead of blocking the API.

Outcome

Kept the product-facing experience responsive.

Results

Metrics and outcomes presented for quick technical review

The emphasis here is signal, not decoration: key numbers, verifiable outcomes, and the context needed to interpret them responsibly.

Weekly Samples

30,116

Per-trait dataset scale across processed weekly records.

Feature Count

774

Embeddings, temporal gaps, lag states, and rolling stats.

Trait Accuracy

75.6-76.9%

Across five trait-specific classifiers.

Async Delivery

Queue-based

Signup inference runs through a DB-backed worker flow.

Key Results

  • Served weekly direction predictions for all five Big Five traits through authenticated APIs.
  • Engineered a 774-feature temporal NLP pipeline over 30,116 weekly samples per trait.
  • Achieved 75.6% to 76.9% accuracy across trait-specific classifiers.
  • Delivered async inference with persisted lineage and background job processing.

Research + Business Impact

Research value

Demonstrates how longitudinal behavioral modeling can move from static personality labeling to production-oriented forecasting.

Product value

Creates a more adaptable input for personalized product experiences and trend-aware user analytics.