Adam Bandel


Inbox Processor

Jan 2026
Type: web-app
Code: 1k lines
Files: 23
Active: Nov 2025 — Jan 2026
Stack:
Python · FastAPI · React · TypeScript · SQLite · ChromaDB · Docker
Tags:
ai · automation · productivity

Overview

Inbox Processor is a unified “second brain” entry point that aggregates content from multiple sources—Gmail and manual notes—into a single processing pipeline. It standardizes inputs, enforces context assignment, detects duplicates using both hash-based and semantic similarity matching, and intelligently routes items to their final destinations in external systems.

The system addresses a common knowledge management pain point: information arrives from multiple channels but lacks a unified system for organization, deduplication, and routing. Rather than manually sorting emails and notes into different apps, Inbox Processor provides a triage interface that ensures nothing falls through the cracks.

Screenshots

Inbox View

Duplicate Detection

Item Detail Modal

Problem

Information overload is real. Emails containing bookmarks, articles, media recommendations, and notes arrive constantly but end up scattered across inboxes without proper categorization. Manual triage is tedious, duplicates slip through, and items meant for specific systems (like a media feed or note-taking app) require manual copy-paste workflows.

The goal was to create a single intake point that:

- standardizes inputs from every source into one schema
- enforces a context assignment on every item
- catches duplicates with both hash-based and semantic matching
- routes processed items to their final destinations in external systems

Approach

The system uses a hybrid architecture combining traditional CRUD operations with vector embeddings for intelligent deduplication.

Stack

Challenges

Outcomes

The system successfully consolidates the email-to-second-brain workflow into a single interface.

Vector embeddings proved surprisingly effective for catching “soft duplicates” like forwarded emails or slightly reworded bookmarks. The hybrid approach (hash + vector) provides both speed and accuracy.
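A toy illustration of why the hybrid matters (the Jaccard function below is a crude lexical stand-in for the real embedding distance, not anything the project uses): exact hashing treats a lightly reworded forward as a brand-new item, while even a rough similarity measure flags the pair.

```python
import hashlib

def jaccard(x: str, y: str) -> float:
    """Crude lexical similarity; stands in for the real embedding distance."""
    tok = lambda s: {w.strip(".,!?") for w in s.lower().split()}
    xs, ys = tok(x), tok(y)
    return len(xs & ys) / len(xs | ys)

a = "Check out this article on vector databases"
b = "check out this article on vector databases!"  # a lightly reworded forward

# Case and punctuation changes break the hash, but not the similarity score.
hash_match = hashlib.sha256(a.encode()).digest() == hashlib.sha256(b.encode()).digest()
similarity = jaccard(a, b)
```

Here `hash_match` is false while `similarity` is 1.0, which is exactly the "soft duplicate" gap the vector stage closes.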

Implementation Notes

Deduplication Pipeline

New items pass through a two-stage deduplication check:

import hashlib

# Stage 1: Exact hash match over content plus sorted attachment names
content_hash = hashlib.sha256(
    (content + "".join(sorted(attachment_names))).encode()
).hexdigest()

existing = db.query(InboxItem).filter(
    InboxItem.content_hash == content_hash
).first()

# Stage 2: Semantic similarity via ChromaDB
if not existing:
    results = collection.query(
        query_texts=[content],
        n_results=5,
        where={"context": context},
    )
    potential_duplicates = []
    for item_id, distance in zip(results["ids"][0], results["distances"][0]):
        if distance < 0.5:  # similarity threshold
            # Flag as a potential duplicate for manual review
            potential_duplicates.append(item_id)

Data Model

Items flow through defined states with clear transitions:

inbox ─→ processed ─→ archived
  ├──→ duplicate  (linked to original)
  └──→ pmf_failed (routing error captured)
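The transitions above can be sketched as a small lookup table (the dict and function names here are assumptions for illustration, not the project's actual API):

```python
# Hypothetical encoding of the state machine shown in the diagram above.
ALLOWED_TRANSITIONS = {
    "inbox": {"processed", "duplicate", "pmf_failed"},
    "processed": {"archived"},
    "duplicate": set(),       # terminal; linked to its original item
    "pmf_failed": {"inbox"},  # assumption: failed routings can be retried
}

def can_transition(current: str, target: str) -> bool:
    """Return True only for transitions the state machine permits."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Centralizing the table makes illegal moves (say, archiving an unprocessed item) a single guarded check rather than scattered conditionals.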

The InboxItem model tracks source (gmail or manual), source_id for Gmail idempotency, mandatory context assignment, and attachments as JSON for file metadata.
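A dataclass sketch of the fields described above (the real model is an ORM class; field names beyond those mentioned in the text are assumptions):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape of the InboxItem record; illustrative only.
@dataclass
class InboxItem:
    source: str                         # "gmail" or "manual"
    context: str                        # mandatory context assignment
    content: str
    content_hash: str                   # SHA-256 used by the exact-match dedup stage
    source_id: Optional[str] = None     # Gmail message ID, for idempotent ingestion
    attachments: list = field(default_factory=list)  # file metadata stored as JSON
    status: str = "inbox"               # inbox / processed / archived / duplicate / pmf_failed
```

Keeping `source_id` nullable lets manual notes skip it while Gmail imports stay idempotent.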

Frontend Filtering

The sidebar provides multi-dimensional filtering.

Filter state is combined with useMemo so filtering stays fast on the client. Dark mode persists to localStorage, with system-preference detection on first load.


Related Posts

No posts yet.