docs: add comprehensive architecture documentation
Covers full stack, database schema, API endpoints, Celery pipeline, LLM service design, frontend structure, auth, deployment, and feature history through v0.2.0.

Authored-By: Jack Levy
ARCHITECTURE.md (new file, 796 lines)
# PocketVeto — Architecture & Feature Documentation

> **App brand:** PocketVeto
> **Repo:** civicstack
> **Purpose:** Citizen-grade US Congress monitoring with AI-powered bill analysis, per-claim citations, and personalized tracking.

---

## Table of Contents

1. [Overview](#overview)
2. [Tech Stack](#tech-stack)
3. [Infrastructure & Docker](#infrastructure--docker)
4. [Configuration & Environment](#configuration--environment)
5. [Database Schema](#database-schema)
6. [Alembic Migrations](#alembic-migrations)
7. [Backend API](#backend-api)
8. [Celery Workers & Pipeline](#celery-workers--pipeline)
9. [LLM Service](#llm-service)
10. [Frontend](#frontend)
11. [Authentication](#authentication)
12. [Key Architectural Patterns](#key-architectural-patterns)
13. [Feature History](#feature-history)
14. [Deployment](#deployment)

---

## Overview

PocketVeto is a self-hosted, full-stack application that automatically tracks US Congress legislation, fetches bill text, generates AI summaries with per-claim source citations, correlates bills with news and Google Trends, and presents everything through a personalized dashboard. Users follow bills, members of Congress, and policy topics; the system surfaces relevant activity in their feed.

```
Congress.gov API → Poller → DB → Document Fetcher → GovInfo
                                        ↓
                                  LLM Processor
                                        ↓
                                    BillBrief
                                 (cited AI brief)
                                        ↓
                           News Fetcher + Trend Scorer
                                        ↓
                                 Next.js Frontend
```

---

## Tech Stack

| Layer | Technology |
|---|---|
| Reverse Proxy | Nginx (alpine) |
| Backend API | FastAPI + SQLAlchemy (async) |
| Task Queue | Celery 5 + Redis |
| Task Scheduler | Celery Beat + RedBeat (Redis-backed) |
| Database | PostgreSQL 16 |
| Cache / Broker | Redis 7 |
| Frontend | Next.js 15, React, Tailwind CSS, TypeScript |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| LLM | Multi-provider factory: OpenAI, Anthropic, Gemini, Ollama |
| Bill Metadata | Congress.gov API (api.data.gov key) |
| Bill Text | GovInfo API (same api.data.gov key) |
| News | NewsAPI.org (100 req/day free tier) |
| Trends | Google Trends via pytrends |

---

## Infrastructure & Docker

### Services (`docker-compose.yml`)

```
postgres:16-alpine
  DB: pocketveto
  User: congress
  Port: 5432 (internal)

redis:7-alpine
  Port: 6379 (internal)
  Role: Celery broker, result backend, RedBeat schedule store

api (civicstack-api image)
  Port: 8000 (internal)
  Command: alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000
  Depends: postgres (healthy), redis (healthy)

worker (civicstack-worker image)
  Command: celery -A app.workers.celery_app worker -Q polling,documents,llm,news -c 4
  Depends: postgres (healthy), redis (healthy)

beat (civicstack-beat image)
  Command: celery -A app.workers.celery_app beat -S redbeat.RedBeatScheduler
  Depends: redis (healthy)

frontend (civicstack-frontend image)
  Port: 3000 (internal)
  Build: Next.js standalone output

nginx:alpine
  Port: 80 → public
  Routes: /api/* → api:8000 | /* → frontend:3000
```

### Nginx Config (`nginx/nginx.conf`)

- `resolver 127.0.0.11 valid=10s` — re-resolves Docker DNS after container restarts (prevents stale-IP 502s on redeploy)
- `/api/` → FastAPI, 120s read timeout
- `/_next/static/` → frontend with 1-day cache header
- `/` → frontend with WebSocket upgrade support

---

## Configuration & Environment

Copy `.env.example` → `.env` and fill in keys before first run.

```env
# Network
LOCAL_URL=http://localhost
PUBLIC_URL= # optional, e.g. https://yourapp.com

# Auth
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"

# PostgreSQL
POSTGRES_USER=congress
POSTGRES_PASSWORD=congress
POSTGRES_DB=pocketveto

# Redis
REDIS_URL=redis://redis:6379/0

# Congress.gov + GovInfo (shared key from api.data.gov)
DATA_GOV_API_KEY=
CONGRESS_POLL_INTERVAL_MINUTES=30

# LLM — pick one provider
LLM_PROVIDER=openai # openai | anthropic | gemini | ollama
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6
GEMINI_API_KEY=
GEMINI_MODEL=gemini-1.5-pro
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3.1

# News & Trends
NEWSAPI_KEY=
PYTRENDS_ENABLED=true
```

**Runtime overrides:** LLM provider/model and poll interval can be changed live through the Admin page. These values are stored in the `app_settings` table and take precedence over env vars.

---

## Database Schema

### `bills`
Primary key: `bill_id` — natural key in format `{congress}-{type}-{number}` (e.g. `119-hr-1234`).
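
The natural-key format can be illustrated with a pair of small helpers. This is a minimal sketch, not code from the repository: `make_bill_id` and `parse_bill_id` are hypothetical names, and lowercasing the type segment is an assumption based on the `119-hr-1234` example.

```python
def make_bill_id(congress: int, bill_type: str, number: int) -> str:
    """Build a natural key such as '119-hr-1234'."""
    # Assumption: the type segment is stored lowercase, as in the example key.
    return f"{congress}-{bill_type.lower()}-{number}"


def parse_bill_id(bill_id: str) -> tuple[int, str, int]:
    """Split a natural key back into (congress, bill_type, number)."""
    congress, bill_type, number = bill_id.split("-")
    return int(congress), bill_type, int(number)
```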

| Column | Type | Notes |
|---|---|---|
| bill_id | varchar (PK) | |
| congress_number | int | |
| bill_type | varchar | `hr`, `s`, `hjres`, `sjres` (tracked); `hres`, `sres`, `hconres`, `sconres` (not tracked) |
| bill_number | int | |
| title | text | |
| short_title | text | |
| sponsor_id | varchar (FK → members) | bioguide_id |
| introduced_date | date | |
| latest_action_date | date | |
| latest_action_text | text | |
| status | varchar | |
| chamber | varchar | House / Senate |
| congress_url | varchar | congress.gov link |
| govtrack_url | varchar | |
| last_checked_at | timestamptz | |
| actions_fetched_at | timestamptz | |
| created_at / updated_at | timestamptz | |

Indexes: `congress_number`, `latest_action_date`, `introduced_date`, `chamber`, `sponsor_id`

---

### `bill_actions`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| action_date | date | |
| action_text | text | |
| action_type | varchar | |
| chamber | varchar | |
| created_at | timestamptz | |

---

### `bill_documents`
Stores fetched bill text versions from GovInfo.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| doc_type | varchar | `bill_text`, `committee_report`, `amendment` |
| doc_version | varchar | Introduced, Enrolled, etc. |
| govinfo_url | varchar | Source URL on GovInfo |
| raw_text | text | Full extracted text |
| fetched_at | timestamptz | |
| created_at | timestamptz | |

---

### `bill_briefs`
AI-generated analysis. `key_points` and `risks` are JSONB arrays of cited objects.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| document_id | int (FK → bill_documents, SET NULL) | |
| brief_type | varchar | `full` (first version) or `amendment` (diff from prior version) |
| summary | text | 2-4 paragraph plain-language summary |
| key_points | jsonb | `[{text, citation, quote}]` |
| risks | jsonb | `[{text, citation, quote}]` |
| deadlines | jsonb | `[{date, description}]` |
| topic_tags | jsonb | `["healthcare", "taxation", ...]` |
| llm_provider | varchar | Which provider generated this brief |
| llm_model | varchar | Specific model name |
| govinfo_url | varchar (nullable) | Source document URL (from bill_documents) |
| created_at | timestamptz | |

Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries)

**Citation structure** — each `key_points`/`risks` item:
```json
{
  "text": "The bill allocates $50B for defense",
  "citation": "Section 301(a)(2)",
  "quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense..."
}
```
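
One way to sanity-check a cited item against the stored bill text is sketched below. This is an illustration only: nothing in this document says the application performs such a check, `check_citation` is a hypothetical helper, and the 80-word limit is taken from the prompt contract described in the LLM Service section.

```python
def check_citation(item: dict, doc_text: str, max_words: int = 80) -> list[str]:
    """Return a list of problems found in one cited key_points/risks item."""
    problems = []
    for field in ("text", "citation", "quote"):
        if not item.get(field):
            problems.append(f"missing {field}")
    quote = item.get("quote") or ""
    if len(quote.split()) > max_words:
        problems.append("quote too long")
    # A trailing "..." marks an elided excerpt, so compare only the part before it.
    excerpt = quote.rstrip(". ")
    if excerpt and excerpt not in doc_text:
        problems.append("quote not found in source text")
    return problems
```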

---

### `members`
Primary key: `bioguide_id` (Congress.gov canonical identifier).

| Column | Type |
|---|---|
| bioguide_id | varchar (PK) |
| name | varchar |
| first_name / last_name | varchar |
| party | varchar |
| state | varchar |
| chamber | varchar |
| district | varchar (nullable, House only) |
| photo_url | varchar (nullable) |
| created_at / updated_at | timestamptz |

---

### `users`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| email | varchar (unique) | |
| hashed_password | varchar | bcrypt |
| is_admin | bool | First registered user = true |
| notification_prefs | jsonb | Future: ntfy, Telegram, RSS config |
| created_at | timestamptz | |

---

### `follows`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| follow_type | varchar | `bill`, `member`, `topic` |
| follow_value | varchar | bill_id, bioguide_id, or topic name |
| created_at | timestamptz | |

Unique constraint: `(user_id, follow_type, follow_value)`

---

### `news_articles`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| source | varchar | News outlet |
| headline | varchar | |
| url | varchar (unique) | Deduplication key |
| published_at | timestamptz | |
| relevance_score | float | Default 1.0 |
| created_at | timestamptz | |

---

### `trend_scores`
One record per bill per day.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| score_date | date | |
| newsapi_count | int | Articles from NewsAPI (30-day window) |
| gnews_count | int | Articles from Google News RSS |
| gtrends_score | float | Google Trends interest 0–100 |
| composite_score | float | Weighted combination 0–100 |
| created_at | timestamptz | |

**Composite score formula:**
```
newsapi_pts = min(newsapi_count / 20, 1.0) × 40   # saturates at 20 articles
gnews_pts   = min(gnews_count / 50, 1.0) × 30     # saturates at 50 articles
gtrends_pts = (gtrends_score / 100) × 30
composite   = newsapi_pts + gnews_pts + gtrends_pts   # range 0–100
```
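
As a worked example of the formula above, the sketch below transcribes it into Python. The function name `composite_score` is hypothetical; only the weights and saturation points come from the formula. A bill with 10 NewsAPI articles, 25 Google News articles, and a Trends score of 50 earns half of each component, so 20 + 15 + 15 points.

```python
def composite_score(newsapi_count: int, gnews_count: int, gtrends_score: float) -> float:
    """Combine the three signals into a 0-100 score using the weights above."""
    newsapi_pts = min(newsapi_count / 20, 1.0) * 40   # saturates at 20 articles
    gnews_pts = min(gnews_count / 50, 1.0) * 30       # saturates at 50 articles
    gtrends_pts = (gtrends_score / 100) * 30
    return newsapi_pts + gnews_pts + gtrends_pts


half = composite_score(10, 25, 50)        # each component at half weight
saturated = composite_score(40, 80, 100)  # every component at its cap
```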

---

### `committees` / `committee_bills`

| Table | Columns |
|---|---|
| committees | committee_id (PK), name, chamber, type |
| committee_bills | id, committee_id (FK), bill_id (FK), referred_date |

---

### `app_settings`
Key-value store for runtime-configurable settings.

| Key | Purpose |
|---|---|
| `congress_last_polled_at` | ISO timestamp of last successful poll |
| `llm_provider` | Overrides `LLM_PROVIDER` env var |
| `llm_model` | Overrides provider default model |
| `congress_poll_interval_minutes` | Overrides env var |

---

## Alembic Migrations

| File | Description |
|---|---|
| `0001_initial_schema.py` | All initial tables |
| `0002_widen_chamber_party_columns.py` | Wider varchar for Bill.chamber, Member.party |
| `0003_widen_member_state_district.py` | Wider varchar for Member.state, Member.district |
| `0004_add_brief_type.py` | BillBrief.brief_type column (`full`/`amendment`) |
| `0005_add_users_and_user_follows.py` | users table + user_id FK on follows; drops global follows |
| `0006_add_brief_govinfo_url.py` | BillBrief.govinfo_url for frontend source links |

Migrations run automatically on API startup: `alembic upgrade head`.

---

## Backend API

Base URL: `/api`
Auth header: `Authorization: Bearer <jwt>`

### `/api/auth`

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/register` | — | Create account. First user → admin. Returns token + user. |
| POST | `/login` | — | Returns token + user. |
| GET | `/me` | Required | Current user info. |

### `/api/bills`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. |
| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. |
| GET | `/{bill_id}/actions` | — | Action timeline, newest first. |
| GET | `/{bill_id}/news` | — | Related news articles, limit 20. |
| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). |

### `/api/members`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated members. Query: `chamber`, `party`, `state`, `q`, `page`, `per_page`. |
| GET | `/{bioguide_id}` | — | Member detail. |
| GET | `/{bioguide_id}/bills` | — | Member's sponsored bills, paginated. |

### `/api/follows`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current user's follows. |
| POST | `/` | Required | Add follow `{follow_type, follow_value}`. Idempotent. |
| DELETE | `/{id}` | Required | Remove follow (ownership checked). |

### `/api/dashboard`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Personalized feed from followed bills/members/topics + trending. Returns `{feed, trending, follows}`. |

### `/api/search`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Full-text search. Query: `q` (min 2 chars). Returns `{bills, members}`. |

### `/api/settings`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current settings (DB overrides env). |
| PUT | `/` | Admin | Update `{key, value}`. Allowed keys: `llm_provider`, `llm_model`, `congress_poll_interval_minutes`. |
| POST | `/test-llm` | Admin | Test LLM connection. Returns `{status, provider, model, summary_preview}`. |

### `/api/admin`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/users` | Admin | All users with follow counts. |
| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. |
| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). |
| GET | `/stats` | Admin | Pipeline progress: total bills, docs fetched, briefs generated, remaining. |
| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. |
| POST | `/trigger-member-sync` | Admin | Queue member sync. |
| POST | `/trigger-trend-scores` | Admin | Queue trend score calculation. |
| GET | `/task-status/{task_id}` | Admin | Celery task status and result. |

### `/api/health`

| Method | Path | Description |
|---|---|---|
| GET | `/` | Simple health check `{status: "ok", timestamp}`. |
| GET | `/detailed` | Tests PostgreSQL + Redis. Returns per-service status. |

---

## Celery Workers & Pipeline

**Celery app name:** `pocketveto`
**Broker / Backend:** Redis

### Queue Routing

| Queue | Workers | Tasks |
|---|---|---|
| `polling` | worker | `poll_congress_bills`, `sync_members` |
| `documents` | worker | `fetch_bill_documents` |
| `llm` | worker | `process_document_with_llm` |
| `news` | worker | `fetch_news_for_bill`, `fetch_news_for_active_bills`, `calculate_all_trend_scores` |

**Worker settings:**
- `task_acks_late = True` — task removed from queue only after completion, not on pickup
- `worker_prefetch_multiplier = 1` — prevents workers from hoarding LLM tasks
- Serialization: JSON

### Beat Schedule (RedBeat, stored in Redis)

| Schedule | Task | When |
|---|---|---|
| Configurable (default 30 min) | `poll_congress_bills` | Continuous |
| Every 6 hours | `fetch_news_for_active_bills` | Ongoing |
| Daily 2 AM UTC | `calculate_all_trend_scores` | Nightly |

---

### Pipeline Flow

```
1. congress_poller.poll_congress_bills()
   ↳ Fetches bills updated since last poll (fromDateTime param)
   ↳ Filters: only hr, s, hjres, sjres (legislation that can become law)
   ↳ First run: seeds from 60 days back
   ↳ New bills → fetch_bill_documents.delay(bill_id)
   ↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed

2. document_fetcher.fetch_bill_documents(bill_id)
   ↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF)
   ↳ Fetches raw text from GovInfo
   ↳ Idempotent: skips if doc_version already stored
   ↳ Stores BillDocument with govinfo_url + raw_text
   ↳ → process_document_with_llm.delay(document_id)

3. llm_processor.process_document_with_llm(document_id)
   ↳ Rate limited: 10/minute
   ↳ Idempotent: skips if brief exists for document
   ↳ Determines type:
       - No prior brief → "full" brief
       - Prior brief exists → "amendment" brief (diff vs previous)
   ↳ Calls configured LLM provider
   ↳ Stores BillBrief with cited key_points and risks
   ↳ → fetch_news_for_bill.delay(bill_id)

4. news_fetcher.fetch_news_for_bill(bill_id)
   ↳ Queries NewsAPI using bill title + topic_tags
   ↳ Deduplicates by URL
   ↳ Stores NewsArticle records

5. trend_scorer.calculate_all_trend_scores() [nightly]
   ↳ Bills active in last 90 days
   ↳ Skips bills already scored today
   ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score
   ↳ Calculates composite_score (0–100)
   ↳ Stores TrendScore record
```

---

## LLM Service

**File:** `backend/app/services/llm_service.py`

### Provider Factory

```python
get_llm_provider() → LLMProvider
```

Reads the provider name from the `llm_provider` row in `app_settings` (DB) first, falls back to the `LLM_PROVIDER` env var, and instantiates the matching provider class.

| Provider | Class | Key Setting |
|---|---|---|
| `openai` | `OpenAIProvider` | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| `gemini` | `GeminiProvider` | `GEMINI_API_KEY`, `GEMINI_MODEL` |
| `ollama` | `OllamaProvider` | `OLLAMA_BASE_URL`, `OLLAMA_MODEL` |

All providers implement:
```python
generate_brief(doc_text, bill_metadata) → ReverseBrief
generate_amendment_brief(new_text, prev_text, bill_metadata) → ReverseBrief
```
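
The lookup order used by the factory (database setting first, env var second) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `llm_service.py`: `resolve_provider_name` is a hypothetical helper, and both setting sources are passed in as plain mappings so the sketch runs without a database.

```python
from typing import Mapping

# Provider name -> class name, as listed in the table above.
PROVIDER_CLASSES = {
    "openai": "OpenAIProvider",
    "anthropic": "AnthropicProvider",
    "gemini": "GeminiProvider",
    "ollama": "OllamaProvider",
}


def resolve_provider_name(db_settings: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Pick the provider: app_settings value first, then the LLM_PROVIDER env var."""
    name = db_settings.get("llm_provider") or env.get("LLM_PROVIDER")
    if name not in PROVIDER_CLASSES:
        raise ValueError(f"unknown LLM provider: {name!r}")
    return name


# The admin override wins even though the env var names a different provider.
chosen = resolve_provider_name({"llm_provider": "ollama"}, {"LLM_PROVIDER": "openai"})
```

In the application the first mapping would be built from the `app_settings` table and the second from the process environment.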

### ReverseBrief Dataclass

```python
from dataclasses import dataclass


@dataclass
class ReverseBrief:
    summary: str
    key_points: list[dict]   # [{text, citation, quote}]
    risks: list[dict]        # [{text, citation, quote}]
    deadlines: list[dict]    # [{date, description}]
    topic_tags: list[str]
    llm_provider: str
    llm_model: str
```

### Prompt Design

**Full brief prompt** instructs the LLM to produce:
```json
{
  "summary": "2-4 paragraph plain-language explanation",
  "key_points": [
    {"text": "claim", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "risks": [
    {"text": "concern", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}],
  "topic_tags": ["healthcare", "taxation"]
}
```

**Amendment brief prompt** focuses on what changed between document versions.

**Smart truncation:** Bills exceeding the token budget are trimmed — 75% of budget from the start (preamble/purpose), 25% from the end (enforcement/effective dates), with an omission notice in the middle.

**Token budgets:**
- OpenAI / Anthropic / Gemini: 6,000 tokens
- Ollama: 3,000 tokens (local models typically have smaller context windows)
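
The head-and-tail truncation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `truncate_bill_text` and the wording of `OMISSION_NOTICE` are hypothetical, and whitespace-separated words stand in for tokens, which a real tokenizer would count differently.

```python
OMISSION_NOTICE = "[... middle of bill omitted to fit the token budget ...]"  # hypothetical wording


def truncate_bill_text(text: str, budget: int, head_share: float = 0.75) -> str:
    """Keep the start and end of an over-budget text and drop the middle.

    Assumption: one whitespace-separated word counts as one token.
    """
    words = text.split()
    if len(words) <= budget:
        return text                        # within budget: leave untouched
    head_len = int(budget * head_share)    # 75% of the budget from the start
    tail_len = budget - head_len           # remaining 25% from the end
    head = " ".join(words[:head_len])
    tail = " ".join(words[len(words) - tail_len:])
    return f"{head}\n\n{OMISSION_NOTICE}\n\n{tail}"
```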

---

## Frontend

**Framework:** Next.js 15 (App Router), TypeScript, Tailwind CSS
**State:** Zustand (auth), TanStack Query (server state)
**HTTP:** Axios with JWT interceptor

### Pages

| Route | Description |
|---|---|
| `/` | Dashboard — personalized feed + trending bills |
| `/bills` | Browse all bills with search, chamber/topic filters, pagination |
| `/bills/[id]` | Bill detail — brief with § citations, action timeline, news, trend chart |
| `/members` | Browse members of Congress, filter by chamber/party/state |
| `/members/[id]` | Member profile + sponsored bills |
| `/following` | User's followed bills, members, and topics |
| `/topics` | Browse and follow policy topics |
| `/settings` | Admin panel (admin only) |
| `/login` | Email + password sign-in |
| `/register` | Account creation |

### Key Components

**`AIBriefCard.tsx`**
Renders the LLM brief. For cited items (new format), shows a `§ Section X(y)` chip next to each bullet. Clicking the chip expands an inline panel with:
- Blockquoted verbatim excerpt from the bill
- "View source →" link to GovInfo (opens in new tab)

Only one chip can be open at a time per card. Old plain-string briefs render without chips (graceful backward compatibility).

**`AuthGuard.tsx`**
Client component wrapping the entire app. Waits for Zustand hydration, then redirects unauthenticated users to `/login`. Public paths (`/login`, `/register`) bypass the guard.

**`Sidebar.tsx`**
Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom.

**`BillCard.tsx`**
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, and status.

**`TrendChart.tsx`**
Line chart of `composite_score` over time with tooltip breakdown of each data source.

### Utility Functions (`lib/utils.ts`)

```typescript
partyBadgeColor(party) → Tailwind classes
  "Republican" → "bg-red-600 text-white"
  "Democrat" → "bg-blue-600 text-white"
  other → "bg-slate-500 text-white"

partyColor(party) → text color class (used inline)
trendColor(score) → color class based on score thresholds
billLabel(type, number) → "H.R. 1234", "S. 567", etc.
formatDate(date) → "Feb 28, 2026"
```

### Auth Store (`stores/authStore.ts`)

```typescript
interface AuthState {
  token: string | null
  user: { id: number; email: string; is_admin: boolean } | null
  setAuth(token, user): void
  logout(): void
}
// Persisted to localStorage as "pocketveto-auth"
```

---

## Authentication

- **Algorithm:** HS256 JWT, 7-day expiry
- **Storage:** Zustand store persisted to `localStorage` key `pocketveto-auth`
- **Injection:** Axios request interceptor reads from localStorage and adds `Authorization: Bearer <token>` to every request
- **First user:** The first account registered automatically receives `is_admin = true`
- **Admin role:** Required for PUT/POST `/api/settings`, all `/api/admin/*` endpoints, and viewing the Settings page in the UI
- **No email verification:** Accounts are active immediately on registration
- **Public endpoints:** `/api/bills`, `/api/members`, `/api/search`, `/api/health` — no auth required

---

## Key Architectural Patterns

### Idempotent Workers
Every Celery task checks for existing records before processing. Combined with `task_acks_late=True`, this means:
- Tasks can be retried without creating duplicates
- Worker crashes don't lose work (task stays in queue until acknowledged)
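
The check-before-processing pattern can be sketched in a few lines. This is an illustration only: an in-memory dict stands in for the `bill_briefs` table, a counter stands in for the LLM call, and `process_document` is a hypothetical name rather than the real Celery task.

```python
briefs_by_document: dict[int, str] = {}   # stand-in for the bill_briefs table
llm_calls = 0                             # counts the expensive work actually done


def process_document(document_id: int) -> str:
    """Generate a brief once; repeat deliveries of the same task are no-ops."""
    global llm_calls
    if document_id in briefs_by_document:   # existing record: skip the work
        return "skipped"
    llm_calls += 1                          # stand-in for the LLM request
    briefs_by_document[document_id] = f"brief for document {document_id}"
    return "created"


first = process_document(7)    # normal delivery
second = process_document(7)   # redelivery, e.g. after a worker crash
```

Because a late-acknowledged task may be delivered more than once, the second call finding the stored record is what keeps a retry from producing a duplicate brief.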

### Incremental Polling
The Congress.gov poller uses `fromDateTime` to fetch only recently updated bills, tracking the last poll timestamp in `app_settings`. On first run it seeds 60 days back to avoid processing thousands of old bills.
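
Choosing the lower bound for a poll can be sketched as below. This is a minimal illustration, not the poller's code: `poll_start` is a hypothetical helper, and the stored `congress_last_polled_at` value is assumed to be an ISO 8601 string that `datetime.fromisoformat` can parse.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SEED_WINDOW = timedelta(days=60)  # first-run lookback described above


def poll_start(last_polled_at: Optional[str], now: datetime) -> datetime:
    """Return the lower bound to use as fromDateTime for this poll."""
    if last_polled_at is None:          # fresh install: no stored timestamp yet
        return now - SEED_WINDOW
    return datetime.fromisoformat(last_polled_at)


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
first_run = poll_start(None, now)
later_run = poll_start("2026-02-28T12:00:00+00:00", now)
```

Passing `now` in as a parameter keeps the function deterministic, which makes the first-run branch easy to test.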

### Bill Type Filtering
Only tracks legislation that can become law:
- `hr` (House Bill; "H.R." stands for House of Representatives, not House Resolution)
- `s` (Senate Bill)
- `hjres` (House Joint Resolution)
- `sjres` (Senate Joint Resolution)

Excluded (procedural, cannot become law): `hres`, `sres`, `hconres`, `sconres`
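
The filter amounts to a set-membership test. The sketch below is an assumption-laden illustration: `is_tracked` is a hypothetical name, and the record shape with `type` and `number` keys is invented for the example rather than taken from the Congress.gov response format.

```python
TRACKED_BILL_TYPES = frozenset({"hr", "s", "hjres", "sjres"})


def is_tracked(bill_type: str) -> bool:
    """True for the bill types listed above as able to become law."""
    return bill_type.lower() in TRACKED_BILL_TYPES


# Hypothetical poll results; only the type codes come from the lists above.
polled = [
    {"type": "HR", "number": 1234},
    {"type": "sres", "number": 12},
    {"type": "sjres", "number": 5},
]
tracked = [bill for bill in polled if is_tracked(bill["type"])]
```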

### Queue Specialization
Separate queues prevent a flood of LLM tasks from blocking time-sensitive polling tasks. Worker prefetch of 1 prevents any single worker from hoarding slow LLM jobs.

### LLM Provider Abstraction
All LLM providers implement the same interface. Switching providers is a single admin setting change — no code changes, no restart required (the factory reads from DB on each task invocation).

### JSONB for Flexible Brief Storage
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means the schema change from `list[str]` to `list[{text, citation, quote}]` required no migration — only the LLM prompt and application code changed. Old string-format briefs and new cited-object briefs coexist in the same column.
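
Reading a column that holds both formats means handling each entry by type. The sketch below shows one way to do that; `normalize_point` is a hypothetical helper, and returning `None` for the missing citation fields is an assumption made for the example.

```python
from typing import Union


def normalize_point(item: Union[str, dict]) -> dict:
    """Return a cited-object view of a key_points/risks entry in either format."""
    if isinstance(item, str):   # old format: plain string, no citation
        return {"text": item, "citation": None, "quote": None}
    return {
        "text": item.get("text", ""),
        "citation": item.get("citation"),
        "quote": item.get("quote"),
    }


mixed = ["Raises the spending cap", {"text": "Adds reporting", "citation": "Section 2", "quote": "shall report"}]
normalized = [normalize_point(entry) for entry in mixed]
```

A consumer such as the brief card can then decide whether to show a citation chip by checking whether `citation` is set.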

### Redis-backed Beat Schedule (RedBeat)
The Celery Beat schedule is stored in Redis rather than in the beat process's default local schedule file. This means the beat scheduler can be restarted or its container recreated without losing schedule state or double-firing tasks.

### Docker DNS Re-resolution
Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream container IPs are refreshed every 10 seconds. Without this, nginx caches the IP at startup and returns 502 errors after any container is recreated.

---

## Feature History

### v0.1.0 — Foundation
- Docker Compose stack: PostgreSQL, Redis, FastAPI, Celery, Next.js, Nginx
- Congress.gov API integration: bill polling, member sync
- GovInfo document fetching with intelligent truncation
- Multi-provider LLM service (OpenAI, Anthropic, Gemini, Ollama)
- AI brief generation: summary, key points, risks, deadlines, topic tags
- Amendment-aware processing: diffs new bill versions against prior
- NewsAPI + Google News RSS article correlation
- Google Trends (pytrends) scoring
- Composite trend score (0–100) with weighted formula
- Full-text bill search (PostgreSQL tsvector)
- Member of Congress browsing
- Global follows (bill / member / topic)
- Personalized dashboard feed
- Admin settings page (LLM provider selection, data source status)
- Manual Celery task triggers from UI
- Bill type filtering: only legislation that can become law
- 60-day seed window on fresh install

**Multi-User Auth (added to v0.1.0):**
- Email + password registration/login (JWT, bcrypt)
- Per-user follow scoping
- Admin role (first user = admin)
- Admin user management: list, delete, promote/demote
- AuthGuard with login/register pages
- Analysis status dashboard (auto-refresh every 30s)

### v0.2.0 — Citations
- **Per-claim citations on AI briefs:** every key point and risk includes:
  - `citation` — section reference (e.g., "Section 301(a)(2)")
  - `quote` — verbatim excerpt ≤80 words from that section
- `§` citation chip UI on each bullet — click to expand quote + GovInfo source link
- `govinfo_url` stored on `BillBrief` for direct frontend access
- Old briefs (plain strings) render without chips — backward compatible
- Migration 0006: `govinfo_url` column on `bill_briefs`
- Party badges redesigned: solid `red-600` / `blue-600` / `slate-500` with white text, readable in both light and dark mode
- Tailwind content scan extended to include `lib/` directory
- Nginx DNS resolver fix: prevents stale-IP 502s after container restarts

---

## Deployment

### First Deploy

```bash
cp .env.example .env
# Edit .env — add API keys, generate JWT_SECRET_KEY
docker compose up --build -d
```

Migrations run automatically. Navigate to the app, register the first account (it becomes admin).

### Updating

```bash
git pull origin main
docker compose up --build -d
docker compose exec nginx nginx -s reload # if nginx wasn't recreated
```

### Useful Commands

```bash
# Check all service status
docker compose ps

# View logs
docker compose logs api --tail=50
docker compose logs worker --tail=50

# Force a bill poll now
# → Admin page → Manual Controls → Trigger Poll

# Check DB column layout
docker compose exec postgres psql -U congress -d pocketveto -c "\d bill_briefs"

# Tail live worker output
docker compose logs -f worker

# Restart a specific service
docker compose restart worker
```

### Bill Regeneration (Optional)

Existing briefs generated before v0.2.0 use plain strings (no citations). To regenerate with citations:

1. Delete existing `bill_briefs` rows (keeps `bill_documents` intact)
2. Re-queue all documents via a one-off script similar to `queue_docs.py`
3. The worker regenerates each brief using the new cited prompt, rate limited to 10/minute

At that rate, ~1,000 briefs take at least 100 minutes, so allow roughly 2 hours.

This is **optional** — old string briefs render correctly in the UI with no citation chips.