Adam Bandel


Search Gateway

Dec 2025
Type: api
Code: 19k lines
Files: 107
Active: Nov 2025 — Dec 2025
Stack:
Python, FastAPI, SQLite, Redis, Playwright, Prometheus
Tags:
developer-tools, ai, data, automation

Overview

Search Gateway is a unified REST API that aggregates 27+ search and content extraction providers behind a single endpoint. It enables applications to query Brave, Tavily, DuckDuckGo, arXiv, GitHub, Reddit, YouTube, Wikipedia, and many more through consistent request/response schemas, with automatic fallback chains when providers fail.

The gateway handles the complexity of managing multiple API keys, rate limits, cost tracking, and caching so consumers can focus on their search logic rather than provider integration. It is designed for AI agents, developer tools, and any application requiring reliable, cost-controlled access to diverse web search and content sources.

Screenshots

API Documentation

Metrics Dashboard

Problem

Modern applications often need to search the web, fetch articles, query academic papers, or extract content from URLs. Each provider (Brave, Tavily, DuckDuckGo, etc.) has its own API, rate limits, pricing model, and capabilities. Managing 10+ provider integrations creates a significant maintenance burden.

Search Gateway solves this by providing one API to rule them all, with intelligent routing, automatic retries, and comprehensive observability.

Approach

Stack

Challenges

Outcomes

The gateway successfully abstracts provider complexity, reducing integration effort from weeks to hours for new applications.

A key lesson was the importance of defensive coding when dealing with third-party APIs: providers change schemas, rate limits, and behaviors without notice. The circuit breaker pattern proved essential for graceful degradation.
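
To make the circuit breaker idea concrete, here is a minimal sketch of one possible per-provider breaker. It is an illustration under assumptions, not the gateway's actual code: the class name `CircuitBreaker`, the `failure_threshold` and `reset_after` parameters, and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after N failures -> trial after cool-down."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: only let a trial request through once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cool-down timer.
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before contacting a provider and skip straight to the next provider in the chain when it returns `False`, which is what lets the gateway degrade gracefully instead of waiting on a failing upstream.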

Implementation Notes

Provider Adapter Pattern

Each provider extends BaseAdapter with standardized retry logic:

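from typing import Any, Dict, List

# BaseAdapter, SearchRequestModel, and SearchResult are defined elsewhere in the gateway.
class BraveAdapter(BaseAdapter):
    name = "brave"
    base_url = "https://api.search.brave.com/res/v1"

    def capabilities(self) -> Dict[str, Any]:
        # Declares the operations, filters, and options this provider supports.
        return {
            "ops": ["search:web", "search:news", "ai:grounding"],
            "filters_supported": ["include_domains", "freshness_days"],
            "options_supported": ["max_results", "safesearch"],
        }

    async def search(self, req: SearchRequestModel) -> List[SearchResult]:
        # _request_with_retry is the shared retry helper inherited from BaseAdapter.
        response = await self._request_with_retry(
            "GET", f"{self.base_url}/web/search",
            params={"q": req.query, "count": req.options.max_results}
        )
        # Normalize the provider payload into the gateway's common result schema.
        return self._transform_results(response.json())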
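
The body of `_request_with_retry` is not shown here. As a rough idea of what a standardized retry helper could look like, the sketch below retries an arbitrary coroutine with exponential backoff and jitter. Everything in it is an assumption for illustration: the standalone function `request_with_retry`, the `TransientError` exception, and the `max_attempts` and `base_delay` parameters are hypothetical names, and the real helper takes an HTTP method and URL instead of a callable.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/5xx)."""


async def request_with_retry(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (base, 2*base, 4*base, ...) plus random jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

Only errors marked as transient are retried; anything else propagates immediately, which keeps permanent failures such as an invalid API key from burning through the retry budget.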

Operation-Based Routing

The selector routes by operation category, not just provider name:

selection = selector.select(
    operation="search:academic",  # Routes to arXiv, Semantic Scholar, OpenAlex
    client_id=x_client_id,
    fallback=True,
)
# Returns prioritized list: ["arxiv", "semantic_scholar", "openalex"]
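
A prioritized list like this pairs naturally with a fallback loop. The following is a minimal sketch of how such a loop might work, assuming each adapter can be modeled as an async callable that takes a query string. The names `search_with_fallback`, `ProviderError`, and `Adapter` are hypothetical and not taken from the gateway's code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


# Assumed adapter shape: an async callable mapping a query to a list of results.
Adapter = Callable[[str], Awaitable[List[str]]]


async def search_with_fallback(
    query: str,
    selection: List[str],
    adapters: Dict[str, Adapter],
) -> Tuple[str, List[str]]:
    """Try providers in priority order; return the first success and its provider name."""
    errors: Dict[str, str] = {}
    for name in selection:
        try:
            return name, await adapters[name](query)
        except ProviderError as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    # Every provider in the chain failed: report all collected errors at once.
    raise ProviderError(f"all providers failed: {errors}")
```

Returning the provider name alongside the results lets the caller record which provider actually served the request, which matters for cost tracking and metrics.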

Rate Limiting with Token Bucket

Per-client, per-provider rate limiting with burst allowance:

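import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity  # refill per second, burst size
        self.tokens, self.last = capacity, time.monotonic()  # start with a full bucket

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed, self.last = now - self.last, now  # time since the previous call
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False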
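
The "per-client, per-provider" part can be sketched as a registry that lazily creates one bucket per `(client_id, provider)` pair. This is an illustrative sketch only: `RateLimiter`, `Bucket`, and the injectable `clock` are hypothetical names, and a single shared `rate` and `burst` is assumed for simplicity, whereas real limits would likely come from the provider catalog.

```python
import time
from typing import Callable, Dict, Tuple


class Bucket:
    """Self-contained token bucket with an injectable clock (handy for tests)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Hypothetical registry: one bucket per (client_id, provider) pair."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.buckets: Dict[Tuple[str, str], Bucket] = {}

    def allow(self, client_id: str, provider: str) -> bool:
        key = (client_id, provider)
        if key not in self.buckets:
            # Create the bucket lazily on the first request for this pair.
            self.buckets[key] = Bucket(self.rate, self.burst, self.clock)
        return self.buckets[key].allow()
```

Keying on the pair means one noisy client can exhaust only its own allowance for a given provider, leaving other clients and other providers unaffected.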

Provider Catalog Configuration

All provider metadata lives in provider-catalog.yaml:

providers:
  brave:
    ops: ["search:web", "search:news", "ai:grounding"]
    limits: { rps: 1, monthly_cap: 2000 }
    pricing_usd:
      "search:web": 0.003
    plans:
      free:
        limits: { rps: 1, monthly_cap: 2000 }
      pro_ai:
        limits: { rps: 50 }
        pricing_usd: { "search:web": 0.009 }
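
One way to consume this catalog is to resolve a plan against the provider defaults and estimate cost from the resulting price table. The sketch below mirrors the YAML as a plain dict so it runs without a YAML parser. The override semantics are an assumption: a plan's `limits` or `pricing_usd` section is taken to replace the provider default wholesale, and an operation with no listed price is treated as free. `resolve_plan` and `estimate_cost` are hypothetical helper names.

```python
from typing import Any, Dict

# The catalog entry above, written as a dict for a dependency-free example.
CATALOG: Dict[str, Any] = {
    "providers": {
        "brave": {
            "ops": ["search:web", "search:news", "ai:grounding"],
            "limits": {"rps": 1, "monthly_cap": 2000},
            "pricing_usd": {"search:web": 0.003},
            "plans": {
                "free": {"limits": {"rps": 1, "monthly_cap": 2000}},
                "pro_ai": {
                    "limits": {"rps": 50},
                    "pricing_usd": {"search:web": 0.009},
                },
            },
        }
    }
}


def resolve_plan(catalog: Dict[str, Any], provider: str, plan: str) -> Dict[str, Any]:
    """Assumed semantics: a plan section, when present, replaces the default section."""
    entry = catalog["providers"][provider]
    overrides = entry.get("plans", {}).get(plan, {})
    return {
        "ops": entry["ops"],
        "limits": overrides.get("limits", entry.get("limits", {})),
        "pricing_usd": overrides.get("pricing_usd", entry.get("pricing_usd", {})),
    }


def estimate_cost(resolved: Dict[str, Any], op: str, calls: int) -> float:
    # Assumption: operations without a listed price cost nothing.
    return resolved["pricing_usd"].get(op, 0.0) * calls
```

Under these assumptions the `free` plan inherits the default `search:web` price, while `pro_ai` uses its own price and carries no `monthly_cap`. If the real gateway merges plan sections key by key instead, `resolve_plan` would need a dict merge rather than a replacement.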
