Adam Bandel


Augur Engine

Jan 2026
Type: data-pipeline
Code: 126k lines
Files: 482
Active: Dec 2025 — Jan 2026
Stack:
Python, Dagster, ClickHouse, PostgreSQL, FastAPI, Polars
Tags:
finance, data, ai, developer-tools

Overview

Augur Engine is a production-grade financial data platform that ingests, transforms, and serves market data for quantitative analysis. Built on Dagster’s orchestration framework, it implements a Medallion Architecture (Bronze/Silver/Gold tiers) to progressively refine raw data from 32 external providers into analytics-ready datasets stored in ClickHouse.
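The following minimal, stdlib-only sketch shows how one record set might move through the three tiers. It is not the project's Dagster code. The sample rows are illustrative values rather than real prices, and the function names `to_silver` and `to_gold` are hypothetical. Bronze keeps raw provider output as-is, Silver casts types and drops invalid rows, and Gold derives an analytics-ready metric (here, a simple daily return).

```python
import math
from datetime import date

# Illustrative Bronze rows (not real prices): values kept exactly as a provider returned them.
BRONZE = [
    {"symbol": "ABC", "date": "2025-12-01", "close": "190.0", "source": "stooq"},
    {"symbol": "ABC", "date": "2025-12-02", "close": "193.8", "source": "stooq"},
    {"symbol": "ABC", "date": "2025-12-03", "close": "n/a", "source": "stooq"},
]


def to_silver(rows: list[dict]) -> list[dict]:
    """Silver: cast raw strings to typed values and drop rows that fail validation."""
    clean = []
    for r in rows:
        try:
            close = float(r["close"])
        except ValueError:
            continue  # unparseable price never reaches Silver
        if not math.isfinite(close) or close <= 0:
            continue  # reject NaN, infinities, and non-positive prices
        clean.append({
            "symbol": r["symbol"],
            "date": date.fromisoformat(r["date"]),
            "close": close,
        })
    return sorted(clean, key=lambda r: (r["symbol"], r["date"]))


def to_gold(rows: list[dict]) -> list[dict]:
    """Gold: derive an analytics-ready metric, here the simple daily return."""
    last_close: dict[str, float] = {}
    out = []
    for r in rows:  # assumes rows sorted by (symbol, date), as to_silver returns them
        prev = last_close.get(r["symbol"])
        if prev is not None:
            out.append({"symbol": r["symbol"], "date": r["date"], "return": r["close"] / prev - 1})
        last_close[r["symbol"]] = r["close"]
    return out


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(len(silver), gold)
```

Keeping each tier a pure function of the tier below means any tier can be rebuilt from Bronze if validation rules change.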

The platform goes beyond data pipelines: it exposes an API surface through which LLM-powered trading agents can test theories, run backtests, and execute paper-trading simulations. Its broad data coverage (SEC filings, Congressional bills, lobbying records, GDELT geopolitical events, macro indicators) lets analyst agents search for alpha signals across structured data, documents, and alternative datasets.
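A paper-trading simulation ultimately needs a ledger that applies agent orders to a simulated account. Below is a minimal, hypothetical version (the `PaperPortfolio` class is not from the project). It uses the BUY/SELL/CLOSE actions that appear in the agent response schema further down this page. It also makes three simplifying assumptions: every order fills at a caller-supplied price with no fees or slippage, short sales are rejected, and orders the cash balance cannot cover are rejected.

```python
from dataclasses import dataclass, field


@dataclass
class PaperPortfolio:
    """Hypothetical paper-trading account: cash plus long-only positions."""
    cash: float
    positions: dict[str, float] = field(default_factory=dict)

    def apply(self, action: str, symbol: str, quantity: float, price: float) -> None:
        """Fill one order at `price` (no fees or slippage modeled)."""
        held = self.positions.get(symbol, 0.0)
        if action == "CLOSE":
            quantity, action = held, "SELL"  # CLOSE liquidates the whole position
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        if action == "BUY":
            cost = quantity * price
            if cost > self.cash:
                raise ValueError("insufficient cash")
            self.cash -= cost
            self.positions[symbol] = held + quantity
        elif action == "SELL":
            if quantity > held:
                raise ValueError("cannot sell more than is held (no shorting)")
            self.cash += quantity * price
            remaining = held - quantity
            if remaining:
                self.positions[symbol] = remaining
            else:
                self.positions.pop(symbol, None)  # drop fully closed positions
        else:
            raise ValueError(f"unknown action {action!r}")

    def equity(self, prices: dict[str, float]) -> float:
        """Mark positions to the supplied prices and add cash."""
        return self.cash + sum(qty * prices[sym] for sym, qty in self.positions.items())


book = PaperPortfolio(cash=10_000.0)
book.apply("BUY", "ABC", 10, 190.0)
book.apply("CLOSE", "ABC", 0, 200.0)  # quantity argument is ignored for CLOSE
print(book.cash, book.positions)
```

A backtest can reuse the same ledger by replaying historical prices instead of live quotes.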

Screenshots

Asset Graph

Market Dashboard

Query Explorer

Problem

Building quantitative trading strategies requires access to diverse, high-quality financial data. Most retail and small institutional traders face fragmented data sources, inconsistent schemas, stale data, and no infrastructure to combine market prices with alternative data like SEC filings, Congressional voting records, or institutional fund flows.

This project consolidates 30+ data sources into a unified, query-ready platform with:

Approach

Stack

Challenges

Outcomes

The platform successfully ingests data from 32 providers into 88 ClickHouse tables, with 182 Dagster assets and 238+ automated quality checks. Key achievements:

Lessons learned:

Implementation Notes

Schema-Driven Architecture

All table definitions live in Python dataclasses, from which the DDL, IO manager configs, and asset checks are generated:

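import polars as pl

# ClickHouseTableSchema and CheckConfig are project-defined dataclasses.
RAW_OHLCV_DAILY = ClickHouseTableSchema(
    name="raw_ohlcv_daily",
    tier="bronze",
    description="Daily OHLCV from Stooq/yFinance/Massive",
    engine="ReplacingMergeTree",  # rows sharing the full ORDER BY key are collapsed on merge
    order_by=("symbol", "date", "source"),
    partition_by="toYYYYMM(date)",  # one partition per calendar month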
    columns={
        "symbol": pl.Utf8,
        "date": pl.Date,
        "open": pl.Float64,
        "high": pl.Float64,
        "low": pl.Float64,
        "close": pl.Float64,
        "volume": pl.UInt64,
        "source": pl.Utf8,
    },
    watermark_column="date",
    check_config=CheckConfig(
        min_row_count=1000,
        min_instrument_count=500,
        value_bounds={"open": (0, 1_000_000), "volume": (0, 1e12)},
    ),
)
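The project's `ClickHouseTableSchema` implementation is not shown here. What follows is a hypothetical, cut-down stand-in (`TableSpec`, `CheckSpec`, and the `PL_TO_CH` type map are illustrative names) showing how one declarative definition can drive both the ClickHouse DDL and simple quality checks. Column types are given as Polars dtype names so the sketch runs without Polars installed. The instrument-count check assumes a `symbol` column, and the bounds are treated as inclusive.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from Polars dtype names to ClickHouse column types.
PL_TO_CH = {"Utf8": "String", "Date": "Date", "Float64": "Float64", "UInt64": "UInt64"}


@dataclass
class CheckSpec:
    min_row_count: int = 0
    min_instrument_count: int = 0
    value_bounds: dict = field(default_factory=dict)  # column -> (lo, hi), inclusive


@dataclass
class TableSpec:
    name: str
    engine: str
    order_by: tuple
    partition_by: str
    columns: dict  # column name -> Polars dtype name
    checks: CheckSpec = field(default_factory=CheckSpec)

    def ddl(self) -> str:
        """Render a ClickHouse CREATE TABLE statement from the spec."""
        cols = ",\n".join(f"    {c} {PL_TO_CH[t]}" for c, t in self.columns.items())
        return (
            f"CREATE TABLE IF NOT EXISTS {self.name}\n(\n{cols}\n)\n"
            f"ENGINE = {self.engine}\n"
            f"PARTITION BY {self.partition_by}\n"
            f"ORDER BY ({', '.join(self.order_by)})"
        )

    def run_checks(self, rows: list[dict]) -> list[str]:
        """Evaluate the declared checks against in-memory rows; return failure messages."""
        c = self.checks
        failures = []
        if len(rows) < c.min_row_count:
            failures.append(f"row count {len(rows)} < {c.min_row_count}")
        n_instruments = len({r["symbol"] for r in rows})  # assumes a 'symbol' column
        if n_instruments < c.min_instrument_count:
            failures.append(f"instrument count {n_instruments} < {c.min_instrument_count}")
        for col, (lo, hi) in c.value_bounds.items():
            bad = sum(1 for r in rows if not lo <= r[col] <= hi)
            if bad:
                failures.append(f"{bad} row(s) with {col} outside [{lo}, {hi}]")
        return failures


spec = TableSpec(
    name="raw_ohlcv_daily",
    engine="ReplacingMergeTree",
    order_by=("symbol", "date", "source"),
    partition_by="toYYYYMM(date)",
    columns={"symbol": "Utf8", "date": "Date", "close": "Float64",
             "volume": "UInt64", "source": "Utf8"},
    checks=CheckSpec(min_row_count=2, min_instrument_count=1,
                     value_bounds={"close": (0, 1_000_000)}),
)
print(spec.ddl())
```

The appeal of this design is that the table and its checks come from the same definition, so they are harder to let drift apart when a column changes.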

LLM Trading Agent Response Schema

Agents return structured JSON containing their reasoning, research queries (tool calls), trades, memory updates, and watchlist updates:

AGENT_RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "research_queries": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"tool": {"type": "string"}, "args": {"type": "object"}}
            }
        },
        "trades": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "action": {"enum": ["BUY", "SELL", "CLOSE"]},
                    "symbol": {"type": "string"},
                    "quantity": {"type": "number"}
                }
            }
        },
        "memories_to_record": {"type": "object"},
        "watchlist_updates": {"type": "object"}
    }
}
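A reply should be checked against this contract before its trades are simulated. With the third-party `jsonschema` package, `jsonschema.validate(instance, AGENT_RESPONSE_SCHEMA)` would handle the structural check. The stdlib-only sketch below instead hand-checks the `trades` array, and it is deliberately stricter than the schema as written, which declares no required fields. Two of its rules are assumptions: every trade must carry a non-empty `symbol`, and BUY and SELL (but not CLOSE) need a positive `quantity`. `parse_trades` is a hypothetical helper.

```python
import json

VALID_ACTIONS = {"BUY", "SELL", "CLOSE"}


def parse_trades(raw: str) -> list[dict]:
    """Parse an agent reply and return its validated trades.

    Raises ValueError on malformed input (json.JSONDecodeError is a ValueError subclass).
    """
    reply = json.loads(raw)
    if not isinstance(reply, dict):
        raise ValueError("response must be a JSON object")
    trades = reply.get("trades", [])
    if not isinstance(trades, list):
        raise ValueError("'trades' must be an array")
    for i, trade in enumerate(trades):
        if not isinstance(trade, dict):
            raise ValueError(f"trade {i} must be an object")
        action = trade.get("action")
        if not isinstance(action, str) or action not in VALID_ACTIONS:
            raise ValueError(f"trade {i}: action must be one of {sorted(VALID_ACTIONS)}")
        symbol = trade.get("symbol")
        if not isinstance(symbol, str) or not symbol:
            raise ValueError(f"trade {i}: symbol must be a non-empty string")
        qty = trade.get("quantity")
        # Assumption: CLOSE exits the whole position, so it needs no quantity.
        if action != "CLOSE" and (
            isinstance(qty, bool) or not isinstance(qty, (int, float)) or qty <= 0
        ):
            raise ValueError(f"trade {i}: quantity must be a positive number")
    return trades


reply = '{"reasoning": "momentum", "trades": [{"action": "BUY", "symbol": "ABC", "quantity": 10}]}'
print(parse_trades(reply))
```

Rejecting the whole reply on the first bad trade keeps the behavior simple. A more forgiving variant could instead drop invalid trades and log them back to the agent.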

Data Provider Coverage

Market Data: Stooq, yFinance, Polygon (Massive), CBOE
SEC Filings: 13F holdings, Form 4 insider transactions, N-PORT fund holdings, FTD (fails-to-deliver) data
Macro: FRED, Treasury, BLS, EIA, World Bank
Alternative: GDELT, Congress.gov, LobbyView, Senate LDA
Reference: GLEIF (LEI), Fama-French factors, FINRA
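Because `source` is part of the ORDER BY key of `raw_ohlcv_daily`, ReplacingMergeTree only collapses rows that agree on symbol, date, and source, and only when data parts merge. Overlapping daily bars from Stooq, yFinance, and Polygon therefore all survive in Bronze. One way a downstream step could pick a single bar per symbol and day is a fixed source-priority order, shown in this stdlib sketch. The priority order, the lowercase source identifiers, and `pick_preferred` are hypothetical, not taken from the project.

```python
# Hypothetical preference order between providers: lower rank wins.
SOURCE_PRIORITY = {"massive": 0, "yfinance": 1, "stooq": 2}


def pick_preferred(rows: list[dict]) -> list[dict]:
    """Keep one row per (symbol, date): the one from the highest-priority source."""
    best: dict[tuple, tuple[int, dict]] = {}
    for r in rows:
        key = (r["symbol"], r["date"])
        # Unknown sources rank after every known one.
        rank = SOURCE_PRIORITY.get(r["source"], len(SOURCE_PRIORITY))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, r)
    ordered = sorted(best.values(), key=lambda item: (item[1]["symbol"], item[1]["date"]))
    return [row for _, row in ordered]


rows = [
    {"symbol": "ABC", "date": "2025-12-01", "close": 190.0, "source": "stooq"},
    {"symbol": "ABC", "date": "2025-12-01", "close": 190.1, "source": "massive"},
    {"symbol": "ABC", "date": "2025-12-02", "close": 193.8, "source": "stooq"},
]
print(pick_preferred(rows))
```

Keeping every provider's bar in Bronze and choosing downstream means the priority order can change later without re-ingesting anything.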
