LCAV
May 2025

Overview
LCAV (LLM Code Analysis and Validation System) is a developer tool that provides a safe, transparent way to integrate code generated by Large Language Models into existing software projects. Instead of blindly copy-pasting LLM outputs, developers can paste raw LLM responses into LCAV, which then parses, analyzes, and simulates the changes before any modifications touch the actual codebase.
The system employs a simulation-first architecture: all analysis operates on an in-memory simulated state, comparing the original repository against what it would look like after applying LLM suggestions. This provides multiple layers of validation—textual diffs, linting comparison, semantic analysis, and dependency impact graphs—giving developers confidence before committing changes.
Screenshots
Problem
Working with LLM-generated code presents a fundamental trust problem. When an LLM outputs modified files or code snippets, developers face several risks:
- Silent breaking changes: Code may compile but introduce subtle bugs
- Unintended side effects: Changes may cascade through dependencies in unexpected ways
- Code quality degradation: LLM suggestions may violate project conventions or introduce lint errors
- Manual integration overhead: Parsing multi-file LLM responses and applying changes correctly is tedious and error-prone
LCAV addresses these concerns by creating a validation layer between LLM output and the actual codebase.
Approach
The solution centers on a simulation-based analysis pipeline that never modifies the actual filesystem until the user explicitly approves changes.
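The simulation idea above can be sketched as an in-memory overlay over the repository: reads consult the overlay first, and writes never touch disk. This is a hypothetical minimal version; the real service names and structure differ.

```python
from pathlib import Path

class SimulatedState:
    """In-memory overlay of proposed changes; the filesystem is never written."""

    def __init__(self, repo_root: str):
        self.repo_root = Path(repo_root)
        self.overlay: dict[str, str] = {}  # relative path -> proposed content

    def apply_change(self, rel_path: str, new_content: str) -> None:
        # Record the LLM's proposed content without touching disk
        self.overlay[rel_path] = new_content

    def read(self, rel_path: str) -> str:
        # Prefer the simulated content; fall back to the real file
        if rel_path in self.overlay:
            return self.overlay[rel_path]
        return (self.repo_root / rel_path).read_text()

    def changed_files(self) -> list[str]:
        return sorted(self.overlay)
```

Every downstream analysis (diffing, linting, graph building) reads through this overlay, so "before" and "after" views come from the same interface.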
Stack
- Backend (FastAPI/Python) - Provides the analysis engine with robust parsing via LibCST for Python CST manipulation and Tree-sitter for multi-language support
- Frontend (React/TypeScript) - Modern UI built with Vite, shadcn/ui components, and Cytoscape.js for interactive dependency graphs
- Database (SQLite/SQLModel) - Lightweight persistence for project profiles and cached file metadata
- Analysis Tools - Jedi for Python semantic analysis, Ruff for linting, NetworkX/Rustworkx for graph algorithms
Challenges
- Robust LLM Output Parsing - LLM responses come in many formats: full files, code snippets, unified diffs, or mixed content. Built a multi-tier parser that handles markdown code fences, language detection, and graceful degradation for malformed outputs.
- Smart File Matching - Determining which repository file an LLM code block corresponds to requires exact path matching, fuzzy matching on filenames and content, plus manual override capabilities when automation fails.
- Semantic Analysis at Scale - Building Program Dependency Graphs (PDGs) for each file and understanding cross-file impacts requires careful caching strategies. Implemented a two-tier cache (in-memory with SQLite backing) keyed on content hashes.
- Simulation State Management - Maintaining an accurate simulated state where LLM changes are applied textually without touching disk requires careful orchestration between parsing, matching, and diff services.
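The first tier of the output parser, extracting fenced code blocks from a raw response, can be sketched roughly like this (`extract_blocks` is a simplified stand-in for the real parsing service; later tiers handle diffs and malformed fences):

```python
import re

# Matches ```lang\n...\n``` fences; the language tag is optional
FENCE_RE = re.compile(r"```([\w+-]*)\n(.*?)```", re.DOTALL)

def extract_blocks(llm_dump: str) -> list[dict]:
    """First-tier parse: pull fenced code blocks out of a raw LLM response."""
    blocks = []
    for match in FENCE_RE.finditer(llm_dump):
        lang, body = match.group(1), match.group(2)
        blocks.append({"language": lang or None, "code": body})
    return blocks
```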
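The content-hash-keyed cache can be sketched as follows. This is an assumed simplification of the real design, using stdlib `sqlite3` directly rather than SQLModel: keying on a hash of file contents means identical files always hit the same entry, and a miss in memory falls through to SQLite.

```python
import hashlib
import pickle
import sqlite3

class TwoTierCache:
    """Analysis cache keyed on content hashes: dict in front, SQLite behind."""

    def __init__(self, db_path: str = ":memory:"):
        self.memory: dict[str, object] = {}
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    @staticmethod
    def key_for(content: str) -> str:
        # Identical file contents always map to the same cache entry
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, key: str):
        if key in self.memory:  # tier 1: in-memory
            return self.memory[key]
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:  # tier 2: SQLite
            value = pickle.loads(row[0])
            self.memory[key] = value  # promote to tier 1
            return value
        return None

    def put(self, key: str, value) -> None:
        self.memory[key] = value
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, pickle.dumps(value))
        )
        self.db.commit()
```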
Outcomes
LCAV provides a comprehensive analysis pipeline that transforms the LLM integration workflow from “paste and pray” to “analyze and apply.” Key capabilities include:
- Multi-format LLM response parsing with automatic language detection
- Fuzzy matching of code blocks to repository files with confidence scoring
- Side-by-side diff visualization with syntax highlighting
- Before/after lint comparison to catch introduced issues
- Changed entity detection (functions, classes added/removed/modified)
- Interactive dependency graphs for impact analysis
- Safe Git integration with branch creation and selective change application
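As an illustration of the matching step, here is a minimal two-tier matcher built on stdlib `difflib` (a sketch under assumptions; the actual service also fuzzy-matches on filenames and supports manual overrides):

```python
from difflib import SequenceMatcher
from typing import Optional

def match_block(block_path: Optional[str], block_code: str,
                repo_files: dict[str, str]) -> tuple[Optional[str], float]:
    """Return (best matching repo path, confidence score in [0, 1])."""
    # Tier 1: exact path match stated in the LLM output
    if block_path and block_path in repo_files:
        return block_path, 1.0
    # Tier 2: fuzzy content similarity against every candidate file
    best_path, best_score = None, 0.0
    for path, content in repo_files.items():
        score = SequenceMatcher(None, block_code, content).ratio()
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score
```

Low-confidence matches are where a manual override hook earns its keep: the score gives the UI something concrete to surface.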
Implementation Notes
The analysis pipeline follows a clear data flow:
# Simplified pipeline flow
llm_dump = user_input                                # Raw LLM response
blocks = parsing_service.extract_blocks(llm_dump)    # Parse into code blocks
matches = matching_service.match(blocks, repo)       # Map to repo files
simulated = simulation_service.apply(matches)        # Create simulated state
results = analysis_service.compare(repo, simulated)  # Multi-layer analysis
The frontend-backend API mapping feature is the system dogfooding its own tooling: it uses Tree-sitter to parse the TypeScript frontend and FastAPI route introspection on the backend, then visualizes which frontend API calls map to which backend endpoints:
// Frontend API call detection via Tree-sitter
const apiCalls = parseTypeScriptForApiCalls(sourceFile);
// Backend endpoint introspection
const endpoints = await fetchBackendRoutes();
// Matching logic finds unmapped calls
const unmapped = findUnmappedCalls(apiCalls, endpoints);
The dependency graph visualization uses Cytoscape.js with multiple layout algorithms (Dagre for hierarchical, CoSE-Bilkent for force-directed) and supports backward slicing to trace data flow from any node.
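Backward slicing on a dependency graph reduces to reachability over reversed edges. A minimal stand-alone sketch (the real system runs this on NetworkX/Rustworkx graphs, where `networkx.ancestors` computes the same set):

```python
from collections import deque

def backward_slice(edges: dict[str, list[str]], target: str) -> set[str]:
    """All nodes from which data can flow into `target`
    (reachability on the reversed edge set)."""
    reversed_edges: dict[str, list[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reversed_edges.setdefault(dst, []).append(src)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reversed_edges.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen
```

Clicking a node in the Cytoscape.js view amounts to running this query and highlighting the returned set.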
Related Posts
- May 03, 2025 Devlog: LCAV / Hellcats