Personal Media Feed

Jan 2026

Type: web-app

Code: 47k lines

Files: 253

Active: Apr 2025 — Jan 2026

Stack:

Tags:

Overview

Personal Media Feed (PMF) is a sophisticated content aggregation and management system that pulls media from diverse sources—YouTube channels, podcasts, RSS feeds, books, TV shows, games, manga, comics, and more—then enriches each item with AI-generated metadata. The system uses LLMs to extract summaries, keywords, sentiment, and named entities, while ChromaDB provides semantic search across the entire library.

What sets PMF apart is its resolver orchestrator: a unified abstraction over 15+ external APIs (TMDB, IGDB, AniList, MangaDex, iTunes, PodcastIndex, etc.) that automatically identifies content and fetches rich metadata. Combined with smart views, content classification, and Sonarr/Radarr integration, it creates a personalized media dashboard that grows smarter over time.

Screenshots

Problem

Modern content consumption is fragmented across dozens of platforms—YouTube, Spotify, Goodreads, Steam, Crunchyroll—each with its own interface, recommendations, and tracking. There’s no unified view of “what should I consume next?” that respects personal context and history.

Existing solutions either focus on a single medium (Plex for video, Calibre for books) or require manual curation. I wanted a system that could ingest a URL, automatically identify what it is, fetch rich metadata, and organize it alongside everything else I’m tracking.

Approach

The architecture separates concerns into a modular backend with specialized domains, connected to a React frontend via REST and Server-Sent Events.

Stack

FastAPI + Python - Async-first backend with SQLModel ORM for clean data modeling; enables parallel resolver calls and streaming enrichment updates
React 19 + TypeScript - Modern frontend with Zustand for state management and TanStack Query for server state synchronization
ChromaDB - Vector embeddings enable semantic search (“find me something like that documentary about space”)
OpenAI/OpenRouter - LLM interface for extracting titles, summaries, keywords, mood, complexity, and named entities from raw content
SQLite - Simple, portable database with FTS5 for full-text search; separate dev/prod databases prevent accidents

Challenges

Resolver orchestration - Coordinating 15+ external APIs with different rate limits, authentication schemes, and response formats required building a token-bucket rate limiter and unified type system. Each resolver implements a common interface, and the orchestrator routes requests based on content type hints.
Enrichment pipeline - Processing YouTube transcripts, PDF content, and HTML articles through LLMs needed careful prompt engineering and cost tracking. The system extracts content via Trafilatura/BeautifulSoup, chunks it appropriately, and sends structured prompts that return consistent JSON schemas.
Filter synchronization - Keeping filter state consistent across sidebar, filter panel, URL parameters, and smart views required a centralized Zustand store with careful attention to which component owns what. Smart views store filter criteria as JSON that maps directly to store keys.

Outcomes

The system now tracks 1000+ media items across all categories with minimal manual input. Adding a URL triggers automatic identification, metadata resolution, and AI enrichment—typically completing in under 10 seconds for known sources.

Key learnings:

Resolver abstraction pays off - Adding new sources (MangaDex, ComicVine) takes hours, not days, because the orchestrator pattern is established
Semantic search changes discovery - Finding “that video about distributed systems with the funny presenter” actually works via ChromaDB embeddings
Smart views reduce friction - Pre-defined filters like “Unwatched Tech Videos” or “Books Added This Month” make the feed immediately useful

Implementation Notes

Resolver Orchestrator

The system routes metadata requests through a unified orchestrator that selects appropriate providers based on content type:

class ResolverOrchestrator:
    async def resolve(self, url: str, hints: ResolveHints) -> ResolvedMetadata:
        # Identity resolver determines content type from URL patterns
        identity = await self.identity_resolver.identify(url)
        
        # Route to appropriate providers based on type
        providers = self.routing_config.get_providers(identity.content_type)
        
        # Fan out requests with rate limiting
        results = await asyncio.gather(*[
            self.rate_limiters[p].execute(p.fetch, url)
            for p in providers
        ])
        
        # Adjudicator selects best metadata from multiple sources
        return self.adjudicator.merge(results)

Collection Population

YouTube channels use an efficient upload playlist conversion for pagination:

# Convert channel ID to uploads playlist for efficient fetching
# UC... (channel) -> UU... (uploads playlist)
uploads_playlist_id = "UU" + channel_id[2:]

SSE Enrichment Updates

Real-time progress updates stream to the frontend during enrichment:

@router.get("/stream")
async def event_stream():
    async def generate():
        async for event in sse_service.subscribe():
            yield {
                "event": event.type,
                "data": json.dumps(event.payload)
            }
    return EventSourceResponse(generate())

No posts yet.

Adam Bandel