Adam Bandel


Personal Metrics Dashboard

Nov 2025
Type: data-pipeline
Code: 12k lines
Files: 97
Active: Nov 2025 — Nov 2025
Stack:
PythonDuckDBStreamlitPandasAltair
Tags:
dataquantified-selfanalytics

Overview

A comprehensive personal data analytics platform that ingests, unifies, and correlates health and productivity metrics from over 15 different data sources. The system implements a bronze/silver/gold data lake architecture using DuckDB as the analytical database, transforming raw exports from Fitbit, ActivityWatch, RescueTime, Daylio, and other apps into a unified daily metrics table optimized for correlation analysis.

The pipeline processes years of personal data (spanning 2017-2025) including sleep stages, heart rate variability, SpO2, steps, app usage patterns, mood tracking, weight measurements, and location history. A Streamlit-based dashboard provides interactive visualizations, lagged correlation analysis, and a pipeline status monitor.

Screenshots

Pipeline Status

Metrics Visualization

Correlation Analysis

Problem

Personal health and productivity data is fragmented across dozens of apps and devices, each with proprietary export formats. Manually analyzing relationships between sleep quality and next-day productivity, or tracking how exercise affects mood over time, requires tedious data wrangling and loses the temporal context needed for meaningful insights.

This project solves the data unification problem by creating a single source of truth where metrics from different sources can be joined, compared, and analyzed together. It handles the complexity of overlapping data sources (e.g., multiple sleep trackers), timezone-aware daily boundaries, and the challenge of correlating metrics with different sampling frequencies.

Approach

The architecture follows a medallion (bronze/silver/gold) data lake pattern:

Stack

Data Flow

Source Files (CSV/JSON)
        │
        ▼
  ┌─────────────┐
  │   Parsers   │  → raw_data.duckdb (Bronze)
  │  (26 files) │     fb_sleep_logs, aw_events, etc.
  └─────────────┘
        │
        ▼
  ┌─────────────┐
  │   Unify     │  → unified_sleep, unified_usage
  │  (Overlap   │     Source priority: AW > RT > AppUsage
  │  Resolution)│
  └─────────────┘
        │
        ▼
  ┌─────────────┐
  │ Aggregators │  → aggregates.duckdb (Silver)
  │ (12 files)  │     sleep_daily_agg, usage_daily_agg, etc.
  └─────────────┘
        │
        ▼
  ┌─────────────┐
  │  Wide Join  │  → daily_wide_metrics (Gold)
  │             │     One row per day, 150+ columns
  └─────────────┘

Challenges

Outcomes

The unified daily metrics table enables analyses that were previously impossible:

The modular parser architecture makes adding new data sources straightforward - each parser is a self-contained ~200 line script following the same pattern.

Implementation Notes

Daily Wide Table Join Strategy

The daily_wide_metrics.py aggregator uses a chain of FULL OUTER JOIN operations to preserve all dates present in any source table:

SELECT 
    COALESCE(
        base_usage.daily_date,
        sleep.agg_date,
        steps.agg_date,
        ...
    ) AS daily_date,
    usage.total_usage_minutes,
    sleep.daily_total_sleep,
    steps.daily_step_total,
    ...
FROM usage_daily_agg AS base_usage
FULL OUTER JOIN unified_sleep_daily_agg AS sleep
    ON base_usage.daily_date = sleep.agg_date
FULL OUTER JOIN fb_steps_daily_agg AS steps
    ON COALESCE(base_usage.daily_date, sleep.agg_date) = steps.agg_date
...

File Processing Pattern

Every parser follows this incremental processing pattern:

for file_path in source_files:
    file_hash = compute_file_hash(str(file_path))
    file_size = file_path.stat().st_size
    
    if file_already_processed(con, PARSER_NAME, str(file_path), file_hash, file_size):
        continue  # Skip unchanged files
    
    # Parse and insert...
    
    mark_file_as_processed(con, PARSER_NAME, str(file_path), file_hash, file_size)

Data Classifications

The system handles multiple temporal data patterns:

Type Examples Aggregation Strategy
Continuous time series Heart rate, steps Sum/avg within daily boundary
Interval/session data Sleep logs, app usage Split across boundaries, sum durations
Point events Weight, mood entries Last value or average per day
Categorical App categories, mood labels Count/mode per day

Related Posts

No posts yet.