From 795385dcbabb1ecdfc6a0c8e87789ee123636400 Mon Sep 17 00:00:00 2001 From: Jack Levy Date: Sat, 28 Feb 2026 23:07:43 -0500 Subject: [PATCH] docs: add comprehensive architecture documentation Covers full stack, database schema, API endpoints, Celery pipeline, LLM service design, frontend structure, auth, deployment, and feature history through v0.2.0. Authored-By: Jack Levy --- ARCHITECTURE.md | 796 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 796 insertions(+) create mode 100644 ARCHITECTURE.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..8769c6b --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,796 @@ +# PocketVeto — Architecture & Feature Documentation + +> **App brand:** PocketVeto +> **Repo:** civicstack +> **Purpose:** Citizen-grade US Congress monitoring with AI-powered bill analysis, per-claim citations, and personalized tracking. + +--- + +## Table of Contents + +1. [Overview](#overview) +2. [Tech Stack](#tech-stack) +3. [Infrastructure & Docker](#infrastructure--docker) +4. [Configuration & Environment](#configuration--environment) +5. [Database Schema](#database-schema) +6. [Alembic Migrations](#alembic-migrations) +7. [Backend API](#backend-api) +8. [Celery Workers & Pipeline](#celery-workers--pipeline) +9. [LLM Service](#llm-service) +10. [Frontend](#frontend) +11. [Authentication](#authentication) +12. [Key Architectural Patterns](#key-architectural-patterns) +13. [Feature History](#feature-history) +14. [Deployment](#deployment) + +--- + +## Overview + +PocketVeto is a self-hosted, full-stack application that automatically tracks US Congress legislation, fetches bill text, generates AI summaries with per-claim source citations, correlates bills with news and Google Trends, and presents everything through a personalized dashboard. Users follow bills, members of Congress, and policy topics; the system surfaces relevant activity in their feed. 
+ +``` +Congress.gov API → Poller → DB → Document Fetcher → GovInfo + ↓ + LLM Processor + ↓ + BillBrief + (cited AI brief) + ↓ + News Fetcher + Trend Scorer + ↓ + Next.js Frontend +``` + +--- + +## Tech Stack + +| Layer | Technology | +|---|---| +| Reverse Proxy | Nginx (alpine) | +| Backend API | FastAPI + SQLAlchemy (async) | +| Task Queue | Celery 5 + Redis | +| Task Scheduler | Celery Beat + RedBeat (Redis-backed) | +| Database | PostgreSQL 16 | +| Cache / Broker | Redis 7 | +| Frontend | Next.js 15, React, Tailwind CSS, TypeScript | +| Auth | JWT (python-jose) + bcrypt (passlib) | +| LLM | Multi-provider factory: OpenAI, Anthropic, Gemini, Ollama | +| Bill Metadata | Congress.gov API (api.data.gov key) | +| Bill Text | GovInfo API (same api.data.gov key) | +| News | NewsAPI.org (100 req/day free tier) | +| Trends | Google Trends via pytrends | + +--- + +## Infrastructure & Docker + +### Services (`docker-compose.yml`) + +``` +postgres:16-alpine + DB: pocketveto + User: congress + Port: 5432 (internal) + +redis:7-alpine + Port: 6379 (internal) + Role: Celery broker, result backend, RedBeat schedule store + +api (civicstack-api image) + Port: 8000 (internal) + Command: alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000 + Depends: postgres (healthy), redis (healthy) + +worker (civicstack-worker image) + Command: celery -A app.workers.celery_app worker -Q polling,documents,llm,news -c 4 + Depends: postgres (healthy), redis (healthy) + +beat (civicstack-beat image) + Command: celery -A app.workers.celery_app beat -S redbeat.RedBeatScheduler + Depends: redis (healthy) + +frontend (civicstack-frontend image) + Port: 3000 (internal) + Build: Next.js standalone output + +nginx:alpine + Port: 80 → public + Routes: /api/* → api:8000 | /* → frontend:3000 +``` + +### Nginx Config (`nginx/nginx.conf`) + +- `resolver 127.0.0.11 valid=10s` — re-resolves Docker DNS after container restarts (prevents stale-IP 502s on redeploy) +- `/api/` → FastAPI, 120s 
read timeout +- `/_next/static/` → frontend with 1-day cache header +- `/` → frontend with WebSocket upgrade support + +--- + +## Configuration & Environment + +Copy `.env.example` → `.env` and fill in keys before first run. + +```env +# Network +LOCAL_URL=http://localhost +PUBLIC_URL= # optional, e.g. https://yourapp.com + +# Auth +JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))" + +# PostgreSQL +POSTGRES_USER=congress +POSTGRES_PASSWORD=congress +POSTGRES_DB=pocketveto + +# Redis +REDIS_URL=redis://redis:6379/0 + +# Congress.gov + GovInfo (shared key from api.data.gov) +DATA_GOV_API_KEY= +CONGRESS_POLL_INTERVAL_MINUTES=30 + +# LLM — pick one provider +LLM_PROVIDER=openai # openai | anthropic | gemini | ollama +OPENAI_API_KEY= +OPENAI_MODEL=gpt-4o +ANTHROPIC_API_KEY= +ANTHROPIC_MODEL=claude-opus-4-6 +GEMINI_API_KEY= +GEMINI_MODEL=gemini-1.5-pro +OLLAMA_BASE_URL=http://host.docker.internal:11434 +OLLAMA_MODEL=llama3.1 + +# News & Trends +NEWSAPI_KEY= +PYTRENDS_ENABLED=true +``` + +**Runtime overrides:** LLM provider/model and poll interval can be changed live through the Admin page — stored in the `app_settings` table and take precedence over env vars. + +--- + +## Database Schema + +### `bills` +Primary key: `bill_id` — natural key in format `{congress}-{type}-{number}` (e.g. `119-hr-1234`). 
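As a rough illustration of the `{congress}-{type}-{number}` natural key, the sketch below builds and parses a `bill_id`. The `BillKey` dataclass and both helper names are hypothetical and are not taken from the PocketVeto codebase; only the key format and the tracked bill types come from this document.

```python
from dataclasses import dataclass

# Bill types the document lists as tracked.
TRACKED_TYPES = {"hr", "s", "hjres", "sjres"}


@dataclass(frozen=True)
class BillKey:
    """Hypothetical container for the three parts of a bill_id."""
    congress: int
    bill_type: str
    number: int


def make_bill_id(congress: int, bill_type: str, number: int) -> str:
    """Build the natural key, e.g. (119, "hr", 1234) -> "119-hr-1234"."""
    return f"{congress}-{bill_type.lower()}-{number}"


def parse_bill_id(bill_id: str) -> BillKey:
    """Split a bill_id into its parts; raises ValueError on malformed input."""
    congress, bill_type, number = bill_id.split("-")
    return BillKey(int(congress), bill_type, int(number))


def is_tracked(bill_id: str) -> bool:
    """True when the bill type is one of the tracked types."""
    return parse_bill_id(bill_id).bill_type in TRACKED_TYPES
```

A natural key like this lets follows, news articles, and trend scores reference a bill without a lookup on a surrogate id.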
+ +| Column | Type | Notes | +|---|---|---| +| bill_id | varchar (PK) | | +| congress_number | int | | +| bill_type | varchar | `hr`, `s`, `hjres`, `sjres` (tracked); `hres`, `sres`, `hconres`, `sconres` (not tracked) | +| bill_number | int | | +| title | text | | +| short_title | text | | +| sponsor_id | varchar (FK → members) | bioguide_id | +| introduced_date | date | | +| latest_action_date | date | | +| latest_action_text | text | | +| status | varchar | | +| chamber | varchar | House / Senate | +| congress_url | varchar | congress.gov link | +| govtrack_url | varchar | | +| last_checked_at | timestamptz | | +| actions_fetched_at | timestamptz | | +| created_at / updated_at | timestamptz | | + +Indexes: `congress_number`, `latest_action_date`, `introduced_date`, `chamber`, `sponsor_id` + +--- + +### `bill_actions` + +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| bill_id | varchar (FK → bills, CASCADE) | | +| action_date | date | | +| action_text | text | | +| action_type | varchar | | +| chamber | varchar | | +| created_at | timestamptz | | + +--- + +### `bill_documents` +Stores fetched bill text versions from GovInfo. + +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| bill_id | varchar (FK → bills, CASCADE) | | +| doc_type | varchar | `bill_text`, `committee_report`, `amendment` | +| doc_version | varchar | Introduced, Enrolled, etc. | +| govinfo_url | varchar | Source URL on GovInfo | +| raw_text | text | Full extracted text | +| fetched_at | timestamptz | | +| created_at | timestamptz | | + +--- + +### `bill_briefs` +AI-generated analysis. `key_points` and `risks` are JSONB arrays of cited objects. 
+ +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| bill_id | varchar (FK → bills, CASCADE) | | +| document_id | int (FK → bill_documents, SET NULL) | | +| brief_type | varchar | `full` (first version) or `amendment` (diff from prior version) | +| summary | text | 2-4 paragraph plain-language summary | +| key_points | jsonb | `[{text, citation, quote}]` | +| risks | jsonb | `[{text, citation, quote}]` | +| deadlines | jsonb | `[{date, description}]` | +| topic_tags | jsonb | `["healthcare", "taxation", ...]` | +| llm_provider | varchar | Which provider generated this brief | +| llm_model | varchar | Specific model name | +| govinfo_url | varchar (nullable) | Source document URL (from bill_documents) | +| created_at | timestamptz | | + +Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries) + +**Citation structure** — each `key_points`/`risks` item: +```json +{ + "text": "The bill allocates $50B for defense", + "citation": "Section 301(a)(2)", + "quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense..." +} +``` + +--- + +### `members` +Primary key: `bioguide_id` (Congress.gov canonical identifier). 
+ +| Column | Type | +|---|---| +| bioguide_id | varchar (PK) | +| name | varchar | +| first_name / last_name | varchar | +| party | varchar | +| state | varchar | +| chamber | varchar | +| district | varchar (nullable, House only) | +| photo_url | varchar (nullable) | +| created_at / updated_at | timestamptz | + +--- + +### `users` + +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| email | varchar (unique) | | +| hashed_password | varchar | bcrypt | +| is_admin | bool | First registered user = true | +| notification_prefs | jsonb | Future: ntfy, Telegram, RSS config | +| created_at | timestamptz | | + +--- + +### `follows` + +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| user_id | int (FK → users, CASCADE) | | +| follow_type | varchar | `bill`, `member`, `topic` | +| follow_value | varchar | bill_id, bioguide_id, or topic name | +| created_at | timestamptz | | + +Unique constraint: `(user_id, follow_type, follow_value)` + +--- + +### `news_articles` + +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| bill_id | varchar (FK → bills, CASCADE) | | +| source | varchar | News outlet | +| headline | varchar | | +| url | varchar (unique) | Deduplication key | +| published_at | timestamptz | | +| relevance_score | float | Default 1.0 | +| created_at | timestamptz | | + +--- + +### `trend_scores` +One record per bill per day. 
+ +| Column | Type | Notes | +|---|---|---| +| id | int (PK) | | +| bill_id | varchar (FK → bills, CASCADE) | | +| score_date | date | | +| newsapi_count | int | Articles from NewsAPI (30-day window) | +| gnews_count | int | Articles from Google News RSS | +| gtrends_score | float | Google Trends interest 0–100 | +| composite_score | float | Weighted combination 0–100 | +| created_at | timestamptz | | + +**Composite score formula:** +``` +newsapi_pts = min(newsapi_count / 20, 1.0) × 40 # saturates at 20 articles +gnews_pts = min(gnews_count / 50, 1.0) × 30 # saturates at 50 articles +gtrends_pts = (gtrends_score / 100) × 30 +composite = newsapi_pts + gnews_pts + gtrends_pts # range 0–100 +``` + +--- + +### `committees` / `committee_bills` + +| committees | committee_id (PK), name, chamber, type | +|---|---| +| committee_bills | id, committee_id (FK), bill_id (FK), referred_date | + +--- + +### `app_settings` +Key-value store for runtime-configurable settings. + +| Key | Purpose | +|---|---| +| `congress_last_polled_at` | ISO timestamp of last successful poll | +| `llm_provider` | Overrides `LLM_PROVIDER` env var | +| `llm_model` | Overrides provider default model | +| `congress_poll_interval_minutes` | Overrides env var | + +--- + +## Alembic Migrations + +| File | Description | +|---|---| +| `0001_initial_schema.py` | All initial tables | +| `0002_widen_chamber_party_columns.py` | Wider varchar for Bill.chamber, Member.party | +| `0003_widen_member_state_district.py` | Wider varchar for Member.state, Member.district | +| `0004_add_brief_type.py` | BillBrief.brief_type column (`full`/`amendment`) | +| `0005_add_users_and_user_follows.py` | users table + user_id FK on follows; drops global follows | +| `0006_add_brief_govinfo_url.py` | BillBrief.govinfo_url for frontend source links | + +Migrations run automatically on API startup: `alembic upgrade head`. 
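Returning briefly to the composite score formula given for `trend_scores`: the sketch below restates it as a plain Python function so the saturation behaviour is easy to check. The function name is an assumption; the weights and saturation points are the ones stated in the formula.

```python
def composite_score(newsapi_count: int, gnews_count: int, gtrends_score: float) -> float:
    """Weighted 0-100 score, following the documented formula."""
    newsapi_pts = min(newsapi_count / 20, 1.0) * 40  # saturates at 20 articles
    gnews_pts = min(gnews_count / 50, 1.0) * 30      # saturates at 50 articles
    gtrends_pts = (gtrends_score / 100) * 30         # Google Trends is already 0-100
    return newsapi_pts + gnews_pts + gtrends_pts
```

For example, a bill with 10 NewsAPI articles, 25 Google News articles, and a Trends score of 50 sits at half of each component, so its composite score is 20 + 15 + 15 = 50.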
+ +--- + +## Backend API + +Base URL: `/api` +Auth header: `Authorization: Bearer ` + +### `/api/auth` + +| Method | Path | Auth | Description | +|---|---|---|---| +| POST | `/register` | — | Create account. First user → admin. Returns token + user. | +| POST | `/login` | — | Returns token + user. | +| GET | `/me` | Required | Current user info. | + +### `/api/bills` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. | +| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. | +| GET | `/{bill_id}/actions` | — | Action timeline, newest first. | +| GET | `/{bill_id}/news` | — | Related news articles, limit 20. | +| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). | + +### `/api/members` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | — | Paginated members. Query: `chamber`, `party`, `state`, `q`, `page`, `per_page`. | +| GET | `/{bioguide_id}` | — | Member detail. | +| GET | `/{bioguide_id}/bills` | — | Member's sponsored bills, paginated. | + +### `/api/follows` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | Required | Current user's follows. | +| POST | `/` | Required | Add follow `{follow_type, follow_value}`. Idempotent. | +| DELETE | `/{id}` | Required | Remove follow (ownership checked). | + +### `/api/dashboard` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | Required | Personalized feed from followed bills/members/topics + trending. Returns `{feed, trending, follows}`. | + +### `/api/search` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | — | Full-text search. Query: `q` (min 2 chars). Returns `{bills, members}`. 
| + +### `/api/settings` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/` | Required | Current settings (DB overrides env). | +| PUT | `/` | Admin | Update `{key, value}`. Allowed keys: `llm_provider`, `llm_model`, `congress_poll_interval_minutes`. | +| POST | `/test-llm` | Admin | Test LLM connection. Returns `{status, provider, model, summary_preview}`. | + +### `/api/admin` + +| Method | Path | Auth | Description | +|---|---|---|---| +| GET | `/users` | Admin | All users with follow counts. | +| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. | +| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). | +| GET | `/stats` | Admin | Pipeline progress: total bills, docs fetched, briefs generated, remaining. | +| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. | +| POST | `/trigger-member-sync` | Admin | Queue member sync. | +| POST | `/trigger-trend-scores` | Admin | Queue trend score calculation. | +| GET | `/task-status/{task_id}` | Admin | Celery task status and result. | + +### `/api/health` + +| Method | Path | Description | +|---|---|---| +| GET | `/` | Simple health check `{status: "ok", timestamp}`. | +| GET | `/detailed` | Tests PostgreSQL + Redis. Returns per-service status. 
| + +--- + +## Celery Workers & Pipeline + +**Celery app name:** `pocketveto` +**Broker / Backend:** Redis + +### Queue Routing + +| Queue | Workers | Tasks | +|---|---|---| +| `polling` | worker | `poll_congress_bills`, `sync_members` | +| `documents` | worker | `fetch_bill_documents` | +| `llm` | worker | `process_document_with_llm` | +| `news` | worker | `fetch_news_for_bill`, `fetch_news_for_active_bills`, `calculate_all_trend_scores` | + +**Worker settings:** +- `task_acks_late = True` — task removed from queue only after completion, not on pickup +- `worker_prefetch_multiplier = 1` — prevents workers from hoarding LLM tasks +- Serialization: JSON + +### Beat Schedule (RedBeat, stored in Redis) + +| Schedule | Task | When | +|---|---|---| +| Configurable (default 30 min) | `poll_congress_bills` | Continuous | +| Every 6 hours | `fetch_news_for_active_bills` | Ongoing | +| Daily 2 AM UTC | `calculate_all_trend_scores` | Nightly | + +--- + +### Pipeline Flow + +``` +1. congress_poller.poll_congress_bills() + ↳ Fetches bills updated since last poll (fromDateTime param) + ↳ Filters: only hr, s, hjres, sjres (legislation that can become law) + ↳ First run: seeds from 60 days back + ↳ New bills → fetch_bill_documents.delay(bill_id) + ↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed + +2. document_fetcher.fetch_bill_documents(bill_id) + ↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF) + ↳ Fetches raw text from GovInfo + ↳ Idempotent: skips if doc_version already stored + ↳ Stores BillDocument with govinfo_url + raw_text + ↳ → process_document_with_llm.delay(document_id) + +3. 
llm_processor.process_document_with_llm(document_id) + ↳ Rate limited: 10/minute + ↳ Idempotent: skips if brief exists for document + ↳ Determines type: + - No prior brief → "full" brief + - Prior brief exists → "amendment" brief (diff vs previous) + ↳ Calls configured LLM provider + ↳ Stores BillBrief with cited key_points and risks + ↳ → fetch_news_for_bill.delay(bill_id) + +4. news_fetcher.fetch_news_for_bill(bill_id) + ↳ Queries NewsAPI using bill title + topic_tags + ↳ Deduplicates by URL + ↳ Stores NewsArticle records + +5. trend_scorer.calculate_all_trend_scores() [nightly] + ↳ Bills active in last 90 days + ↳ Skips bills already scored today + ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score + ↳ Calculates composite_score (0–100) + ↳ Stores TrendScore record +``` + +--- + +## LLM Service + +**File:** `backend/app/services/llm_service.py` + +### Provider Factory + +```python +get_llm_provider() → LLMProvider +``` + +Reads `LLM_PROVIDER` from AppSetting (DB) then env var. Instantiates the matching provider class. 
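The lookup order described above (DB setting first, then environment variable) might look roughly like the following. This is a minimal sketch, not the contents of `llm_service.py`: the stub provider classes, the `resolve_llm_provider` name, and passing settings in as plain mappings are all assumptions made so the example runs on its own.

```python
import os
from typing import Mapping


class LLMProvider:
    """Stub base class; the real providers would call their vendor APIs."""

    def __init__(self, model: str) -> None:
        self.model = model


class OpenAIProvider(LLMProvider): ...
class AnthropicProvider(LLMProvider): ...
class GeminiProvider(LLMProvider): ...
class OllamaProvider(LLMProvider): ...


PROVIDERS = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
    "gemini": GeminiProvider,
    "ollama": OllamaProvider,
}


def resolve_llm_provider(
    db_settings: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> LLMProvider:
    """Pick a provider: app_settings rows win, env vars are the fallback."""
    name = db_settings.get("llm_provider") or env.get("LLM_PROVIDER", "openai")
    if name not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {name!r}")
    # Model follows the same precedence: DB override, then <PROVIDER>_MODEL.
    model = db_settings.get("llm_model") or env.get(f"{name.upper()}_MODEL", "")
    return PROVIDERS[name](model)
```

Because the lookup happens each time the function is called, a changed `llm_provider` row would take effect on the next task without restarting the worker.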
+ +| Provider | Class | Key Setting | +|---|---|---| +| `openai` | `OpenAIProvider` | `OPENAI_API_KEY`, `OPENAI_MODEL` | +| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` | +| `gemini` | `GeminiProvider` | `GEMINI_API_KEY`, `GEMINI_MODEL` | +| `ollama` | `OllamaProvider` | `OLLAMA_BASE_URL`, `OLLAMA_MODEL` | + +All providers implement: +```python +generate_brief(doc_text, bill_metadata) → ReverseBrief +generate_amendment_brief(new_text, prev_text, bill_metadata) → ReverseBrief +``` + +### ReverseBrief Dataclass + +```python +@dataclass +class ReverseBrief: + summary: str + key_points: list[dict] # [{text, citation, quote}] + risks: list[dict] # [{text, citation, quote}] + deadlines: list[dict] # [{date, description}] + topic_tags: list[str] + llm_provider: str + llm_model: str +``` + +### Prompt Design + +**Full brief prompt** instructs the LLM to produce: +```json +{ + "summary": "2-4 paragraph plain-language explanation", + "key_points": [ + {"text": "claim", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"} + ], + "risks": [ + {"text": "concern", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"} + ], + "deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}], + "topic_tags": ["healthcare", "taxation"] +} +``` + +**Amendment brief prompt** focuses on what changed between document versions. + +**Smart truncation:** Bills exceeding the token budget are trimmed — 75% of budget from the start (preamble/purpose), 25% from the end (enforcement/effective dates), with an omission notice in the middle. 
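The 75/25 split can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it approximates tokens as four characters each (a common rule of thumb, not a real tokenizer), and the function name and notice text are hypothetical.

```python
OMISSION_NOTICE = "\n\n[... middle of bill omitted for length ...]\n\n"


def smart_truncate(text: str, token_budget: int, chars_per_token: int = 4) -> str:
    """Keep the first 75% and last 25% of the budget, dropping the middle."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    max_chars = token_budget * chars_per_token  # assumed chars-per-token ratio
    if len(text) <= max_chars:
        return text  # short bills pass through untouched
    head_chars = int(max_chars * 0.75)   # preamble / purpose
    tail_chars = max_chars - head_chars  # enforcement / effective dates
    return text[:head_chars] + OMISSION_NOTICE + text[-tail_chars:]
```

With a 6,000-token budget this keeps roughly the first 18,000 and last 6,000 characters of a long bill; a production version would more likely count tokens with the provider's own tokenizer.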
+ +**Token budgets:** +- OpenAI / Anthropic / Gemini: 6,000 tokens +- Ollama: 3,000 tokens (local models have smaller context windows) + +--- + +## Frontend + +**Framework:** Next.js 15 (App Router), TypeScript, Tailwind CSS +**State:** Zustand (auth), TanStack Query (server state) +**HTTP:** Axios with JWT interceptor + +### Pages + +| Route | Description | +|---|---| +| `/` | Dashboard — personalized feed + trending bills | +| `/bills` | Browse all bills with search, chamber/topic filters, pagination | +| `/bills/[id]` | Bill detail — brief with § citations, action timeline, news, trend chart | +| `/members` | Browse members of Congress, filter by chamber/party/state | +| `/members/[id]` | Member profile + sponsored bills | +| `/following` | User's followed bills, members, and topics | +| `/topics` | Browse and follow policy topics | +| `/settings` | Admin panel (admin only) | +| `/login` | Email + password sign-in | +| `/register` | Account creation | + +### Key Components + +**`AIBriefCard.tsx`** +Renders the LLM brief. For cited items (new format), shows a `§ Section X(y)` chip next to each bullet. Clicking the chip expands an inline panel with: +- Blockquoted verbatim excerpt from the bill +- "View source →" link to GovInfo (opens in new tab) +- One chip open at a time per card +- Old plain-string briefs render without chips (graceful backward compat) + +**`AuthGuard.tsx`** +Client component wrapping the entire app. Waits for Zustand hydration, then redirects unauthenticated users to `/login`. Public paths (`/login`, `/register`) bypass the guard. + +**`Sidebar.tsx`** +Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom. + +**`BillCard.tsx`** +Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, and status. + +**`TrendChart.tsx`** +Line chart of `composite_score` over time with tooltip breakdown of each data source. 
+ +### Utility Functions (`lib/utils.ts`) + +```typescript +partyBadgeColor(party) → Tailwind classes + "Republican" → "bg-red-600 text-white" + "Democrat" → "bg-blue-600 text-white" + other → "bg-slate-500 text-white" + +partyColor(party) → text color class (used inline) +trendColor(score) → color class based on score thresholds +billLabel(type, number) → "H.R. 1234", "S. 567", etc. +formatDate(date) → "Feb 28, 2026" +``` + +### Auth Store (`stores/authStore.ts`) + +```typescript +interface AuthState { + token: string | null + user: { id: number; email: string; is_admin: boolean } | null + setAuth(token, user): void + logout(): void +} +// Persisted to localStorage as "pocketveto-auth" +``` + +--- + +## Authentication + +- **Algorithm:** HS256 JWT, 7-day expiry +- **Storage:** Zustand store persisted to `localStorage` key `pocketveto-auth` +- **Injection:** Axios request interceptor reads from localStorage and adds `Authorization: Bearer ` to every request +- **First user:** The first account registered automatically receives `is_admin = true` +- **Admin role:** Required for PUT/POST `/api/settings`, all `/api/admin/*` endpoints, and viewing the Settings page in the UI +- **No email verification:** Accounts are active immediately on registration +- **Public endpoints:** `/api/bills`, `/api/members`, `/api/search`, `/api/health` — no auth required + +--- + +## Key Architectural Patterns + +### Idempotent Workers +Every Celery task checks for existing records before processing. Combined with `task_acks_late=True`, this means: +- Tasks can be retried without creating duplicates +- Worker crashes don't lose work (task stays in queue until acknowledged) + +### Incremental Polling +The Congress.gov poller uses `fromDateTime` to fetch only recently updated bills, tracking the last poll timestamp in `app_settings`. On first run it seeds 60 days back to avoid processing thousands of old bills. 
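A minimal sketch of how the poll window could be derived from the stored timestamp, assuming `congress_last_polled_at` holds an ISO 8601 string and that `fromDateTime` is sent in the `YYYY-MM-DDTHH:MM:SSZ` form. The function name is hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SEED_WINDOW_DAYS = 60  # first-run look-back described above


def poll_window_start(last_polled_at: Optional[str], now: datetime) -> str:
    """Return the fromDateTime value for the next Congress.gov poll."""
    if last_polled_at:
        start = datetime.fromisoformat(last_polled_at)
        if start.tzinfo is None:
            # Assumption: timestamps stored without an offset are UTC.
            start = start.replace(tzinfo=timezone.utc)
    else:
        # No previous poll recorded: seed from 60 days back.
        start = now - timedelta(days=SEED_WINDOW_DAYS)
    return start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

After a successful poll, the task would write the new timestamp back to `app_settings` so the next run only asks for bills updated since then.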
+ +### Bill Type Filtering +Only tracks legislation that can become law: +- `hr` (House Bill) +- `s` (Senate Bill) +- `hjres` (House Joint Resolution) +- `sjres` (Senate Joint Resolution) + +Excluded (simple and concurrent resolutions, which cannot become law): `hres`, `sres`, `hconres`, `sconres` + +### Queue Specialization +Separate queues prevent a flood of LLM tasks from blocking time-sensitive polling tasks. Worker prefetch of 1 prevents any single worker from hoarding slow LLM jobs. + +### LLM Provider Abstraction +All LLM providers implement the same interface. Switching providers is a single admin setting change: no code changes and no restart are required, because the factory reads from the DB on each task invocation. + +### JSONB for Flexible Brief Storage +`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means the schema change from `list[str]` to `list[{text, citation, quote}]` required no migration; only the LLM prompt and application code changed. Old string-format briefs and new cited-object briefs coexist in the same column. + +### Redis-backed Beat Schedule (RedBeat) +The Celery Beat schedule is stored in Redis rather than in memory. This means the beat scheduler can restart without losing schedule state or double-firing tasks. + +### Docker DNS Re-resolution +Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream container IPs are refreshed every 10 seconds. Without this, nginx caches the IP at startup and returns 502 errors after any container is recreated. 
+ +--- + +## Feature History + +### v0.1.0 — Foundation +- Docker Compose stack: PostgreSQL, Redis, FastAPI, Celery, Next.js, Nginx +- Congress.gov API integration: bill polling, member sync +- GovInfo document fetching with intelligent truncation +- Multi-provider LLM service (OpenAI, Anthropic, Gemini, Ollama) +- AI brief generation: summary, key points, risks, deadlines, topic tags +- Amendment-aware processing: diffs new bill versions against prior +- NewsAPI + Google News RSS article correlation +- Google Trends (pytrends) scoring +- Composite trend score (0–100) with weighted formula +- Full-text bill search (PostgreSQL tsvector) +- Member of Congress browsing +- Global follows (bill / member / topic) +- Personalized dashboard feed +- Admin settings page (LLM provider selection, data source status) +- Manual Celery task triggers from UI +- Bill type filtering: only legislation that can become law +- 60-day seed window on fresh install + +**Multi-User Auth (added to v0.1.0):** +- Email + password registration/login (JWT, bcrypt) +- Per-user follow scoping +- Admin role (first user = admin) +- Admin user management: list, delete, promote/demote +- AuthGuard with login/register pages +- Analysis status dashboard (auto-refresh every 30s) + +### v0.2.0 — Citations +- **Per-claim citations on AI briefs:** every key point and risk includes: + - `citation` — section reference (e.g., "Section 301(a)(2)") + - `quote` — verbatim excerpt ≤80 words from that section +- `§` citation chip UI on each bullet — click to expand quote + GovInfo source link +- `govinfo_url` stored on `BillBrief` for direct frontend access +- Old briefs (plain strings) render without chips — backward compatible +- Migration 0006: `govinfo_url` column on `bill_briefs` +- Party badges redesigned: solid `red-600` / `blue-600` / `slate-500` with white text, readable in both light and dark mode +- Tailwind content scan extended to include `lib/` directory +- Nginx DNS resolver fix: prevents stale-IP 
502s after container restarts + +--- + +## Deployment + +### First Deploy + +```bash +cp .env.example .env +# Edit .env — add API keys, generate JWT_SECRET_KEY +docker compose up --build -d +``` + +Migrations run automatically. Navigate to the app, register the first account (it becomes admin). + +### Updating + +```bash +git pull origin main +docker compose up --build -d +docker compose exec nginx nginx -s reload # if nginx wasn't recreated +``` + +### Useful Commands + +```bash +# Check all service status +docker compose ps + +# View logs +docker compose logs api --tail=50 +docker compose logs worker --tail=50 + +# Force a bill poll now +# → Admin page → Manual Controls → Trigger Poll + +# Check DB column layout +docker compose exec postgres psql -U congress -d pocketveto -c "\d bill_briefs" + +# Tail live worker output +docker compose logs -f worker + +# Restart a specific service +docker compose restart worker +``` + +### Bill Regeneration (Optional) + +Existing briefs generated before v0.2.0 use plain strings (no citations). To regenerate with citations: + +1. Delete existing `bill_briefs` rows (keeps `bill_documents` intact) +2. Re-queue all documents via a one-off script similar to `queue_docs.py` +3. Worker will regenerate using the new cited prompt at 10/minute +4. ~1,000 briefs ≈ 2 hours + +This is **optional** — old string briefs render correctly in the UI with no citation chips.