This project is hosted at https://regobservatory.vercel.app
This tool analyzes the size and scope of federal agency regulations published in the eCFR.
- Historical word count bar chart, optionally filterable by agency.
- Bubble plot showing the relative size of agencies with regard to their regulation word counts.
- Vector search of regulation text, optionally filterable by agency.
- Data is ingested with the help of Inngest.
- Manual jobs may be triggered through their dashboard.
- Ingest runs on Vercel serverless functions.
- Data is stored on a managed Postgres instance hosted by Neon.
- Agencies are loaded by an idempotent manual job, load-agencies.
- Ingest is started via the ingest manual job.
- The ingest begins from the start date of eCFR data (2017-01-01).
- Ingest fetches the agencies, then kicks off a job called process-reference for each agency.
- Fetch the XML for the agency as indicated by the reference.
- Parse the XML for paragraphs, noting the enclosing sections.
- Get the word count and upsert the history.
- If the date being processed is the current date or RUN_UNTIL, generate and store embeddings.
- The ingest job updates the application state once process-reference has finished for all agencies.
- If the date is less than the current date or RUN_UNTIL, the ingest job will kick off another iteration with the next day.
- Runs once a day.
- If the application state indicates ingest is caught up to the current date it will kick off the ingest job for the current date.
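The per-day ingest steps above (count words, decide whether to embed, decide whether to continue with the next day) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and it assumes the end date is RUN_UNTIL when set and the current date otherwise, with all dates handled as UTC `YYYY-MM-DD` strings.

```typescript
// Hypothetical helpers illustrating the ingest loop; names are assumptions.

/** Count whitespace-separated words across a list of paragraph strings. */
function countWords(paragraphs: string[]): number {
  return paragraphs.reduce((total, p) => {
    const trimmed = p.trim();
    return trimmed === "" ? total : total + trimmed.split(/\s+/).length;
  }, 0);
}

/** Return the ISO date (YYYY-MM-DD) one day after the given ISO date, in UTC. */
function nextDay(isoDate: string): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + 1);
  return d.toISOString().slice(0, 10);
}

/** Embeddings are generated only for the final date of the run (assumed rule). */
function shouldEmbed(date: string, endDate: string): boolean {
  // ISO dates in YYYY-MM-DD form compare correctly as plain strings.
  return date >= endDate;
}

/** Date for the next ingest iteration, or null once the end date is reached. */
function nextIngestDate(date: string, endDate: string): string | null {
  return date < endDate ? nextDay(date) : null;
}
```

For example, starting at the first eCFR date, `nextIngestDate("2017-01-01", "2017-01-03")` yields the next date to process, while calling it with the end date itself returns `null`, which is the point where the application state would be marked as caught up.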
The app is built with Next.js + React 18
- pgvector
- AI SDK
- Day.js
- Recharts
- D3.js
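
As an illustration of how the vector search feature might use pgvector, the sketch below builds a parameterized query with an optional agency filter. The table and column names (`paragraphs`, `agency_id`, `embedding`, `text`) and the function name are assumptions for the example, not the project's schema. The `<=>` operator is pgvector's cosine distance operator, so smaller values mean closer matches.

```typescript
// Hypothetical query builder; schema names are assumptions for illustration.
interface SearchQuery {
  text: string;      // SQL with $1, $2, ... placeholders
  values: unknown[]; // parameter values in placeholder order
}

function buildSearchQuery(
  embedding: number[],
  agencyId?: string,
  limit = 10,
): SearchQuery {
  // pgvector accepts a vector literal of the form '[0.1,0.2,0.3]'.
  const values: unknown[] = [`[${embedding.join(",")}]`];

  let where = "";
  if (agencyId !== undefined) {
    values.push(agencyId);
    where = `WHERE agency_id = $${values.length}`;
  }

  values.push(limit);
  const text = [
    "SELECT id, text, embedding <=> $1::vector AS distance",
    "FROM paragraphs",
    where,
    "ORDER BY embedding <=> $1::vector",
    `LIMIT $${values.length}`,
  ]
    .filter((line) => line !== "")
    .join("\n");

  return { text, values };
}
```

The resulting `text` and `values` pair matches the shape most Postgres clients accept for parameterized queries, which keeps the search string and agency filter out of the SQL text itself.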