About the Appropriations Parser
This project analyzes federal appropriations bills and their explanatory statements, converting complex budget language into structured, searchable data. It takes unstructured PDF documents and breaks them down into clean program entries with funding amounts, fiscal years, identifiers, and context, making it far easier to explore how government money is allocated and how those allocations change over time.
How It Works
The system is built around a FastAPI backend, a PostgreSQL database, and MinIO object storage. Uploaded bills are stored in MinIO using secure presigned URLs, while the backend manages parsing, AI labeling, diff generation, and data integrity checks. Parsed tables and their rows are stored in Postgres with full relationships, audits, and deduplication. A Next.js frontend queries the API to display parsed entries, compare fiscal years, render source highlights, and export CSV or PDF summaries.
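To make the fiscal-year comparison more concrete, here is a minimal sketch of how a diff between two years could be computed. It is not the project's actual implementation: the `Change` class, the `diff_fiscal_years` function, and the `{program: amount}` input shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One program whose funding differs between two fiscal years (hypothetical type)."""
    program: str
    old_amount: Optional[int]  # None if the program is new this year
    new_amount: Optional[int]  # None if the program was dropped

    @property
    def delta(self) -> int:
        # Treat a missing amount as zero so new and dropped programs get a delta too.
        return (self.new_amount or 0) - (self.old_amount or 0)


def diff_fiscal_years(old: dict[str, int], new: dict[str, int]) -> list[Change]:
    """Compare two {program: amount} mappings and list every program that changed."""
    changes = []
    for program in sorted(old.keys() | new.keys()):
        before, after = old.get(program), new.get(program)
        if before != after:
            changes.append(Change(program, before, after))
    return changes
```

Keying on the program name keeps the sketch short. A real comparison would more likely key on a stable program identifier, since names can be reworded from one bill to the next.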
Deployment
The entire environment runs inside Docker Compose. Postgres handles relational data, MinIO provides S3-compatible storage for PDFs and rendered images, FastAPI processes uploads and generates structured output, and Nginx serves as the single public entry point. Running each service in its own container keeps the services isolated and their behavior reproducible, and lets the system be deployed on any server that can run Docker Compose.
Parsing Pipeline
When a PDF is uploaded, the frontend obtains a presigned URL and places the file directly into MinIO storage. The backend then pulls the file, extracts program lines and amounts, merges table and prose text, collects bounding boxes, detects the fiscal year, and inserts the cleaned rows into the database. If AI labeling is enabled, the system also generates clearer program names and short human-readable summaries. Each table is automatically audited, and users can then generate comparisons, exports, or highlighted views of specific entries.
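The extraction step can be illustrated with a small sketch that works on text already pulled out of a PDF. It assumes a simple "program name, dot leaders, dollar amount" layout, which is far tidier than real bill text. The regular expressions, the `ProgramEntry` class, the function names, and the sample text are hypothetical and not taken from the project's code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only.
AMOUNT_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*)")
FISCAL_YEAR_RE = re.compile(r"\b(?:fiscal year|FY)\s*(\d{4})\b", re.IGNORECASE)


@dataclass
class ProgramEntry:
    program: str
    amount: int  # whole dollars
    fiscal_year: Optional[int]


def detect_fiscal_year(text: str) -> Optional[int]:
    """Return the first fiscal year mentioned in the text, or None."""
    match = FISCAL_YEAR_RE.search(text)
    return int(match.group(1)) if match else None


def parse_program_lines(text: str) -> list[ProgramEntry]:
    """Turn lines like 'Program name ..... $1,234,000' into entries."""
    fiscal_year = detect_fiscal_year(text)
    entries = []
    for line in text.splitlines():
        match = AMOUNT_RE.search(line)
        if not match:
            continue  # no dollar amount on this line
        # Everything before the amount, minus dot leaders, is the program name.
        program = line[: match.start()].rstrip(" .\t").strip()
        if not program:
            continue
        amount = int(match.group(1).replace(",", ""))
        entries.append(ProgramEntry(program, amount, fiscal_year))
    return entries


# Made-up sample text, not taken from a real bill.
sample = """Explanatory statement for fiscal year 2024
Rural Water Grants ........ $12,500,000
Wildfire Management ....... $3,000,000
"""
entries = parse_program_lines(sample)
```

In this sketch the fiscal year is detected once per document and attached to every row. The bounding boxes, table and prose merging, and auditing described above are left out.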