LCAV
May 2025

Overview
LCAV (LLM Code Analysis and Validation System) is a developer tool that provides a safe, transparent way to integrate code generated by Large Language Models into existing software projects. Instead of blindly copy-pasting LLM outputs, developers can paste raw LLM responses into LCAV, which then parses, analyzes, and simulates the changes before any modifications touch the actual codebase.
The system employs a simulation-first architecture: all analysis operates on an in-memory simulated state, comparing the original repository against what it would look like after applying LLM suggestions. This provides multiple layers of validation—textual diffs, linting comparison, semantic analysis, and dependency impact graphs—giving developers confidence before committing changes.
Screenshots
Problem
Working with LLM-generated code presents a fundamental trust problem. When an LLM outputs modified files or code snippets, developers face several risks:
- Silent breaking changes: Code may compile but introduce subtle bugs
- Unintended side effects: Changes may cascade through dependencies in unexpected ways
- Code quality degradation: LLM suggestions may violate project conventions or introduce lint errors
- Manual integration overhead: Parsing multi-file LLM responses and applying changes correctly is tedious and error-prone
LCAV addresses these concerns by creating a validation layer between LLM output and the actual codebase.
Approach
The solution centers on a simulation-based analysis pipeline that never modifies the actual filesystem until the user explicitly approves changes.
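The simulation idea above can be sketched as an in-memory overlay over the repository: reads consult the overlay first, and writes never touch disk. This is a hypothetical minimal version; the real service names and structure differ.

```python
from pathlib import Path

class SimulatedState:
    """In-memory overlay of proposed changes; the filesystem is never written."""

    def __init__(self, repo_root: str):
        self.repo_root = Path(repo_root)
        self.overlay: dict[str, str] = {}  # relative path -> proposed content

    def apply_change(self, rel_path: str, new_content: str) -> None:
        # Record the LLM's proposed content without touching disk
        self.overlay[rel_path] = new_content

    def read(self, rel_path: str) -> str:
        # Prefer the simulated content; fall back to the real file
        if rel_path in self.overlay:
            return self.overlay[rel_path]
        return (self.repo_root / rel_path).read_text()

    def changed_files(self) -> list[str]:
        return sorted(self.overlay)
```

Every downstream analysis (diffing, linting, graph building) reads through this overlay, so "before" and "after" views come from the same interface.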
Stack
- Backend (FastAPI/Python) - Provides the analysis engine with robust parsing via LibCST for Python CST manipulation and Tree-sitter for multi-language support
- Frontend (React/TypeScript) - Modern UI built with Vite, shadcn/ui components, and Cytoscape.js for interactive dependency graphs
- Database (SQLite/SQLModel) - Lightweight persistence for project profiles and cached file metadata
- Analysis Tools - Jedi for Python semantic analysis, Ruff for linting, NetworkX/Rustworkx for graph algorithms
Challenges
- Robust LLM Output Parsing - LLM responses come in many formats: full files, code snippets, unified diffs, or mixed content. Built a multi-tier parser that handles markdown code fences, language detection, and graceful degradation for malformed outputs.
- Smart File Matching - Determining which repository file an LLM code block corresponds to requires exact path matching, fuzzy matching on filenames and content, plus manual override capabilities when automation fails.
- Semantic Analysis at Scale - Building Program Dependency Graphs (PDGs) for each file and understanding cross-file impacts requires careful caching strategies. Implemented a two-tier cache (in-memory with SQLite backing) keyed on content hashes.
- Simulation State Management - Maintaining an accurate simulated state where LLM changes are applied textually without touching disk requires careful orchestration between parsing, matching, and diff services.
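The first tier of the output parser, extracting fenced code blocks from a raw response, can be sketched roughly like this (`extract_blocks` is a simplified stand-in for the real parsing service; later tiers handle diffs and malformed fences):

```python
import re

# Matches ```lang\n...\n``` fences; the language tag is optional
FENCE_RE = re.compile(r"```([\w+-]*)\n(.*?)```", re.DOTALL)

def extract_blocks(llm_dump: str) -> list[dict]:
    """First-tier parse: pull fenced code blocks out of a raw LLM response."""
    blocks = []
    for match in FENCE_RE.finditer(llm_dump):
        lang, body = match.group(1), match.group(2)
        blocks.append({"language": lang or None, "code": body})
    return blocks
```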
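The content-hash-keyed cache can be sketched as follows. This is an assumed simplification of the real design, using stdlib `sqlite3` directly rather than SQLModel: keying on a hash of file contents means identical files always hit the same entry, and a miss in memory falls through to SQLite.

```python
import hashlib
import pickle
import sqlite3

class TwoTierCache:
    """Analysis cache keyed on content hashes: dict in front, SQLite behind."""

    def __init__(self, db_path: str = ":memory:"):
        self.memory: dict[str, object] = {}
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    @staticmethod
    def key_for(content: str) -> str:
        # Identical file contents always map to the same cache entry
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, key: str):
        if key in self.memory:  # tier 1: in-memory
            return self.memory[key]
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:  # tier 2: SQLite
            value = pickle.loads(row[0])
            self.memory[key] = value  # promote to tier 1
            return value
        return None

    def put(self, key: str, value) -> None:
        self.memory[key] = value
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, pickle.dumps(value))
        )
        self.db.commit()
```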
Outcomes
LCAV provides a comprehensive analysis pipeline that transforms the LLM integration workflow from “paste and pray” to “analyze and apply.” Key capabilities include:
- Multi-format LLM response parsing with automatic language detection
- Fuzzy matching of code blocks to repository files with confidence scoring
- Side-by-side diff visualization with syntax highlighting
- Before/after lint comparison to catch introduced issues
- Changed entity detection (functions, classes added/removed/modified)
- Interactive dependency graphs for impact analysis
- Safe Git integration with branch creation and selective change application
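As an illustration of the matching step, here is a minimal two-tier matcher built on stdlib `difflib` (a sketch under assumptions; the actual service also fuzzy-matches on filenames and supports manual overrides):

```python
from difflib import SequenceMatcher
from typing import Optional

def match_block(block_path: Optional[str], block_code: str,
                repo_files: dict[str, str]) -> tuple[Optional[str], float]:
    """Return (best matching repo path, confidence score in [0, 1])."""
    # Tier 1: exact path match stated in the LLM output
    if block_path and block_path in repo_files:
        return block_path, 1.0
    # Tier 2: fuzzy content similarity against every candidate file
    best_path, best_score = None, 0.0
    for path, content in repo_files.items():
        score = SequenceMatcher(None, block_code, content).ratio()
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score
```

Low-confidence matches are where a manual override hook earns its keep: the score gives the UI something concrete to surface.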
Implementation Notes
The analysis pipeline follows a clear data flow:
# Simplified pipeline flow
llm_dump = user_input                                # Raw LLM response
blocks = parsing_service.extract_blocks(llm_dump)    # Parse into code blocks
matches = matching_service.match(blocks, repo)       # Map to repo files
simulated = simulation_service.apply(matches)        # Create simulated state
results = analysis_service.compare(repo, simulated)  # Multi-layer analysis
The frontend-backend API mapping feature is the system dogfooding its own tooling: it uses Tree-sitter to parse the TypeScript frontend and FastAPI route introspection on the backend, then visualizes which frontend API calls map to which backend endpoints:
// Frontend API call detection via Tree-sitter
const apiCalls = parseTypeScriptForApiCalls(sourceFile);
// Backend endpoint introspection
const endpoints = await fetchBackendRoutes();
// Matching logic finds unmapped calls
const unmapped = findUnmappedCalls(apiCalls, endpoints);
The dependency graph visualization uses Cytoscape.js with multiple layout algorithms (Dagre for hierarchical, CoSE-Bilkent for force-directed) and supports backward slicing to trace data flow from any node.
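Backward slicing on a dependency graph reduces to reachability over reversed edges. A minimal stand-alone sketch (the real system runs this on NetworkX/Rustworkx graphs, where `networkx.ancestors` computes the same set):

```python
from collections import deque

def backward_slice(edges: dict[str, list[str]], target: str) -> set[str]:
    """All nodes from which data can flow into `target`
    (reachability on the reversed edge set)."""
    reversed_edges: dict[str, list[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reversed_edges.setdefault(dst, []).append(src)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reversed_edges.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen
```

Clicking a node in the Cytoscape.js view amounts to running this query and highlighting the returned set.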
Related Posts
- May 03, 2025 Devlog: LCAV / Hellcats