Adam Bandel


News Aggregator

Jan 2026
Type: web-app
Code: 20k lines
Files: 90
Active: Nov 2024 — Jan 2026
Stack:
Python, FastAPI, PostgreSQL, React, TypeScript, Docker
Tags:
ai, data, automation

Overview

News Aggregator is a self-hosted platform designed to combat information overload by intelligently collecting, filtering, and enriching content from multiple sources. It aggregates posts from Reddit, Twitter/X, RSS feeds, and custom websites into a unified interface, then applies LLM-powered analysis to de-sensationalize headlines, extract key concepts, and score content relevance.

The system follows a microservices architecture with specialized scrapers, a central data pipeline, and an LLM enrichment layer. Users can define preference rules to automatically boost or penalize content based on keywords, sentiment, categories, and engagement metrics.

Screenshots

Main feed view

Preference rules configuration

Article detail with AI analysis

Problem

Modern news consumption involves checking multiple platforms (Reddit, Twitter, RSS readers) while being bombarded with sensationalized headlines, duplicate stories, and irrelevant content. There's no unified way to:

- Collect posts from all of these sources into a single feed
- Filter out sensationalized headlines and duplicate stories
- Rank content by personal relevance rather than raw engagement

Approach

The solution uses a distributed microservices architecture where each component has a single responsibility.

Stack

Challenges

Outcomes

The platform aggregates thousands of posts daily while maintaining sub-second query times. The preference scoring system allows fine-grained control over content ranking, and the LLM integration provides genuinely useful summaries that cut through sensationalism.

Key technical wins:

- Sub-second query times while ingesting thousands of posts per day
- Fine-grained, user-defined preference rules for content ranking
- LLM summaries that de-sensationalize headlines without losing substance

Implementation Notes

The recommendation engine normalizes scores across different source types since Reddit and Twitter have vastly different engagement scales:

# From shared/recommendation_utils.py
import math

def calculate_recommended_score(item, source_type, index_diff,
                                consecutive_count, recency_boost,
                                preference_adjustment):
    base = normalize_score(item.trending_score, source_type)

    # Apply source fatigue - sources seen further back get a larger
    # boost, so recently seen sources are effectively penalized
    fatigue_boost = math.log(max(1, index_diff)) * 3600

    # Anti-consecutive penalty prevents 5 Reddit posts in a row
    if consecutive_count > 2:
        base *= 0.7 ** (consecutive_count - 2)

    return base + fatigue_boost + recency_boost + preference_adjustment
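The snippet above relies on `normalize_score` to make Reddit upvotes and Twitter likes comparable. A minimal sketch of how such a function could look, assuming log compression with a per-source divisor (the `SOURCE_SCALES` values are illustrative assumptions, not the project's actual constants):

```python
import math

# Hypothetical per-source divisors - Reddit upvotes and Twitter likes
# live on very different ranges, so each source gets its own scale
SOURCE_SCALES = {"Reddit": 1000.0, "Twitter": 10000.0, "RSS": 1.0}

def normalize_score(trending_score, source_type):
    """Map a raw engagement score onto a roughly comparable 0-10 scale."""
    scale = SOURCE_SCALES.get(source_type, 1.0)
    # log1p compresses heavy-tailed engagement counts, so a viral
    # outlier does not dominate everything else in the feed
    return 10.0 * math.log1p(max(0.0, trending_score) / scale) / math.log1p(10.0)
```

With these scales, 10,000 Reddit upvotes and 100,000 Twitter likes both map to the top of the range, which is the point of normalizing before any boosts are applied.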

Preference rules support complex boolean conditions stored as JSONB:

# Example rule: Boost AI content from Twitter
{
    "conditions": {
        "AND": [
            {"field": "category", "op": "contains", "value": "AI"},
            {"field": "source_type", "op": "==", "value": "Twitter"}
        ]
    },
    "adjustments": {"score": 5}
}

The data flow follows a clear pipeline: Scheduler triggers Scraper -> Scraper POSTs to Ingestor -> Ingestor stores and queues LLM jobs -> LLM Manager enriches top items -> Fetcher serves filtered results to Frontend.
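The pipeline above can be sketched as a single-process simulation. In the real system these stages are separate services talking over HTTP; here the store, queue, and function names are illustrative stand-ins:

```python
from queue import Queue

store = {}           # stands in for PostgreSQL
llm_jobs = Queue()   # stands in for the LLM enrichment job queue

def ingest(post):
    """Ingestor: store the raw post and queue it for enrichment."""
    store[post["id"]] = post
    llm_jobs.put(post["id"])

def enrich_pending(summarize):
    """LLM Manager: drain the queue and attach summaries."""
    while not llm_jobs.empty():
        post = store[llm_jobs.get()]
        post["summary"] = summarize(post["title"])

def fetch(min_score=0):
    """Fetcher: serve filtered, enriched results to the frontend."""
    return [p for p in store.values() if p.get("score", 0) >= min_score]
```

The key property this preserves is that ingestion never blocks on the LLM: posts are stored immediately and enriched later, so the feed stays fresh even when enrichment lags.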


Related Posts

No posts yet.