Adam Bandel


List Import Studio

Oct 2025
Type: desktop
Code: 48k lines
Files: 269
Active: Oct 2025 — Oct 2025
Stack:
TypeScript, React 19, Tauri 2, Python, Polars, SQLite
Tags:
data, developer-tools, automation

Overview

List Import Studio is a desktop application for importing, transforming, and enriching tabular data through visual pipelines. Users load CSV/XLSX files, map columns to reference schemas, build transformation graphs with filtering, branching, and derivation nodes, then execute workflows that handle deduplication, fuzzy record matching, and human-in-the-loop review gates.

The application combines a React/TypeScript frontend running in Tauri with a Python sidecar that handles heavy data processing via Polars. Communication happens through JSON-RPC over stdio, avoiding HTTP overhead while enabling rich bidirectional command dispatch.

Screenshots

Plan Canvas

Dataset Mapping

Transform Editor

Problem

Data imports from external sources are messy. Phone numbers come in inconsistent formats, emails need validation, company names require standardization, and duplicate records need detection. Traditional ETL tools are either too complex for one-off imports or too limited for sophisticated matching logic.

The goal was to build a tool that makes these one-off imports easy without giving up sophisticated matching logic.

Approach

Stack

Challenges

Outcomes

The visual pipeline approach proved effective for complex data workflows.

Implementation Notes

JSON-RPC Protocol

Frontend and sidecar communicate via JSON-RPC 2.0 over stdio:

// Frontend: src/services/sidecar/client.ts
import { Command } from '@tauri-apps/plugin-shell';

export async function callRpc<T>(method: string, params: unknown): Promise<T> {
  const request = JSON.stringify({ jsonrpc: '2.0', method, params, id: generateId() });
  // Command.execute() takes no arguments, so the request is passed as a CLI arg
  const response = await Command.create('sidecar', ['--rpc', request]).execute();
  return JSON.parse(response.stdout).result as T;
}
# Sidecar: sidecar/rpc/dispatcher.py
def dispatch(request: dict) -> dict:
    handler = REGISTRY.get(request['method'])
    if handler is None:
        # JSON-RPC 2.0 "Method not found" error
        return {'jsonrpc': '2.0', 'id': request.get('id'),
                'error': {'code': -32601, 'message': 'Method not found'}}
    result = handler(**request.get('params', {}))
    return {'jsonrpc': '2.0', 'result': result, 'id': request['id']}
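The snippets above elide how messages are framed on the wire. One common way to run JSON-RPC over stdio is newline-delimited JSON; the sketch below assumes that framing and a long-lived sidecar process (the module name and `serve`/`handle_line` helpers are illustrative, not from the source):

```python
import json
import sys

def handle_line(line: str, dispatch) -> str:
    """Parse one JSON-RPC request line and return the serialized response."""
    request = json.loads(line)
    return json.dumps(dispatch(request))

def serve(dispatch) -> None:
    """Read newline-delimited requests from stdin, write responses to stdout."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(handle_line(line, dispatch) + '\n')
            sys.stdout.flush()  # flush so the frontend sees the response immediately
```

Newline delimiting keeps the protocol trivially parseable on both sides, at the cost of requiring payloads to contain no raw newlines (JSON escaping guarantees this).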

Plan Execution Engine

The plan engine traverses the node graph, caching results at each stage:

# sidecar/plan_engine.py
def execute_node(node: PlanNode, context: PlanContext) -> DataFrame | dict[str, DataFrame]:
    cache_key = compute_stable_key(node, context.upstream_data)

    if cached := context.cache.get(cache_key):
        return cached

    match node.type:
        case 'transform':
            result = apply_standardizers(context.upstream_data, node.config.operations)
        case 'filter':
            result = apply_filter_conditions(context.upstream_data, node.config.conditions)
        case 'branch':
            # Returns dict of outcome -> DataFrame
            result = route_by_conditions(context.upstream_data, node.config.branches)
        case 'match':
            result = score_candidates(context.upstream_data, node.config.ensemble)
        case _:
            raise ValueError(f'Unknown node type: {node.type}')

    context.cache.set(cache_key, result)
    return result
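`compute_stable_key` is what makes caching safe across re-runs: the same node config applied to the same upstream data must always map to the same key. A plausible sketch, assuming the node config is JSON-serializable and the upstream frame has already been reduced to a digest (the simplified signature here is an assumption, not the project's actual one):

```python
import hashlib
import json

def compute_stable_key(node_type: str, config: dict, upstream_digest: str) -> str:
    """Derive a deterministic cache key from node type, config, and upstream digest.

    sort_keys=True makes the serialization order-independent, so two dicts
    with the same contents always hash identically.
    """
    payload = json.dumps(
        {'type': node_type, 'config': config, 'upstream': upstream_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Because the upstream digest is part of the key, editing any ancestor node automatically invalidates every downstream cache entry.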

Blocking Rules for Matching

Fuzzy matching at scale requires blocking to avoid O(n*m) comparisons:

# sidecar/match/blocking.py
from collections import defaultdict
def build_blocking_index(records: DataFrame, rules: list[BlockingRule]) -> dict:
    """Group records by blocking keys for efficient candidate generation."""
    index = defaultdict(list)
    for row in records.iter_rows(named=True):
        for rule in rules:
            key = normalize_blocking_key(row, rule)
            index[key].append(row['_row_ref'])
    return index
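Once the index is built, candidate pairs are generated only from records that share a blocking key, which is what keeps the comparison count far below n×m. A hypothetical companion function (`generate_candidate_pairs` is illustrative, not from the source):

```python
from itertools import combinations

def generate_candidate_pairs(index: dict) -> set[tuple]:
    """Return unique record-ref pairs that share at least one blocking key."""
    pairs = set()
    for refs in index.values():
        # sorted() + set dedup gives a canonical (a, b) ordering so the same
        # pair found under two different blocking keys is counted once
        for a, b in combinations(sorted(set(refs)), 2):
            pairs.add((a, b))
    return pairs
```

Only these candidate pairs would then be passed to the expensive fuzzy scorer; records that share no blocking key are never compared at all.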

State Persistence

Workspace state auto-saves to SQLite on navigation, enabling seamless resume:

// Frontend: src/features/workspace/hooks/useProjectPersistence.ts
const saveState = useCallback(async () => {
  const snapshot = {
    plan: planState,
    datasets: datasetState,
    mapping: mappingState,
    settings: settingsState,
  };
  await rpc.project.saveState(projectId, snapshot);
}, [projectId, planState, datasetState, mappingState, settingsState]);
