PocketVeto/ARCHITECTURE.md
Jack Levy 1e37c99599 feat(phase2): fact/inference labeling, change-driven alerts, admin cleanup
- Add label: cited_fact | inference to LLM brief schema (all 4 providers)
- Inferred badge in AIBriefCard for inference-labeled points
- backfill_brief_labels Celery task: classifies existing cited points in-place
- POST /api/admin/backfill-labels + unlabeled_briefs stat counter
- Expand milestone keywords: markup, conference
- Add is_referral_action() for committee referrals (referred to)
- Two-tier milestone notifications: progress tier (all follow modes) and
  referral tier (pocket_veto/boost only, neutral suppressed)
- Topic followers now receive bill_updated milestone notifications via
  latest brief topic_tags lookup in _update_bill_if_changed()
- Admin Manual Controls: collapsible Maintenance section for backfill tasks
- Update ARCHITECTURE.md and roadmap for Phase 2 completion

Co-Authored-By: Jack Levy
2026-03-01 17:34:45 -05:00

# PocketVeto — Architecture & Feature Documentation
> **App brand:** PocketVeto
> **Repo:** civicstack
> **Purpose:** Citizen-grade US Congress monitoring with AI-powered bill analysis, per-claim citations, and personalized tracking.
---
## Table of Contents
1. [Overview](#overview)
2. [Tech Stack](#tech-stack)
3. [Infrastructure & Docker](#infrastructure--docker)
4. [Configuration & Environment](#configuration--environment)
5. [Database Schema](#database-schema)
6. [Alembic Migrations](#alembic-migrations)
7. [Backend API](#backend-api)
8. [Celery Workers & Pipeline](#celery-workers--pipeline)
9. [LLM Service](#llm-service)
10. [Frontend](#frontend)
11. [Authentication](#authentication)
12. [Key Architectural Patterns](#key-architectural-patterns)
13. [Feature History](#feature-history)
14. [Deployment](#deployment)
---
## Overview
PocketVeto is a self-hosted, full-stack application that automatically tracks US Congress legislation, fetches bill text, generates AI summaries with per-claim source citations, correlates bills with news and Google Trends, and presents everything through a personalized dashboard. Users follow bills, members of Congress, and policy topics; the system surfaces relevant activity in their feed.
```
Congress.gov API → Poller → DB → Document Fetcher → GovInfo
                                        ↓
                                  LLM Processor
                                        ↓
                            BillBrief (cited AI brief)
                                        ↓
                           News Fetcher + Trend Scorer
                                        ↓
                                 Next.js Frontend
```
---
## Tech Stack
| Layer | Technology |
|---|---|
| Reverse Proxy | Nginx (alpine) |
| Backend API | FastAPI + SQLAlchemy (async) |
| Task Queue | Celery 5 + Redis |
| Task Scheduler | Celery Beat + RedBeat (Redis-backed) |
| Database | PostgreSQL 16 |
| Cache / Broker | Redis 7 |
| Frontend | Next.js 15, React, Tailwind CSS, TypeScript |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| LLM | Multi-provider factory: OpenAI, Anthropic, Gemini, Ollama |
| Bill Metadata | Congress.gov API (api.data.gov key) |
| Bill Text | GovInfo API (same api.data.gov key) |
| News | NewsAPI.org (100 req/day free tier) |
| Trends | Google Trends via pytrends |
---
## Infrastructure & Docker
### Services (`docker-compose.yml`)
```
postgres:16-alpine
  DB: pocketveto
  User: congress
  Port: 5432 (internal)

redis:7-alpine
  Port: 6379 (internal)
  Role: Celery broker, result backend, RedBeat schedule store

api (civicstack-api image)
  Port: 8000 (internal)
  Command: alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000
  Depends: postgres (healthy), redis (healthy)

worker (civicstack-worker image)
  Command: celery -A app.workers.celery_app worker -Q polling,documents,llm,news -c 4
  Depends: postgres (healthy), redis (healthy)

beat (civicstack-beat image)
  Command: celery -A app.workers.celery_app beat -S redbeat.RedBeatScheduler
  Depends: redis (healthy)

frontend (civicstack-frontend image)
  Port: 3000 (internal)
  Build: Next.js standalone output

nginx:alpine
  Port: 80 → public
  Routes: /api/* → api:8000 | /* → frontend:3000
```
### Nginx Config (`nginx/nginx.conf`)
- `resolver 127.0.0.11 valid=10s` — re-resolves Docker DNS after container restarts (prevents stale-IP 502s on redeploy)
- `/api/` → FastAPI, 120s read timeout
- `/_next/static/` → frontend with 1-day cache header
- `/` → frontend with WebSocket upgrade support
---
## Configuration & Environment
Copy `.env.example` to `.env` and fill in keys before first run.
```env
# Network
LOCAL_URL=http://localhost
PUBLIC_URL= # optional, e.g. https://yourapp.com
# Auth
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"
# PostgreSQL
POSTGRES_USER=congress
POSTGRES_PASSWORD=congress
POSTGRES_DB=pocketveto
# Redis
REDIS_URL=redis://redis:6379/0
# Congress.gov + GovInfo (shared key from api.data.gov)
DATA_GOV_API_KEY=
CONGRESS_POLL_INTERVAL_MINUTES=30
# LLM — pick one provider
LLM_PROVIDER=openai # openai | anthropic | gemini | ollama
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6
GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.0-flash
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3.1
# News & Trends
NEWSAPI_KEY=
PYTRENDS_ENABLED=true
```
**Runtime overrides:** LLM provider/model and poll interval can be changed live through the Admin page. These values are stored in the `app_settings` table and take precedence over env vars.
---
## Database Schema
### `bills`
Primary key: `bill_id` — natural key in format `{congress}-{type}-{number}` (e.g. `119-hr-1234`).
| Column | Type | Notes |
|---|---|---|
| bill_id | varchar (PK) | |
| congress_number | int | |
| bill_type | varchar | `hr`, `s`, `hjres`, `sjres` (tracked); `hres`, `sres`, `hconres`, `sconres` (not tracked) |
| bill_number | int | |
| title | text | |
| short_title | text | |
| sponsor_id | varchar (FK → members) | bioguide_id |
| introduced_date | date | |
| latest_action_date | date | |
| latest_action_text | text | |
| status | varchar | |
| chamber | varchar | House / Senate |
| congress_url | varchar | congress.gov link |
| govtrack_url | varchar | |
| last_checked_at | timestamptz | |
| actions_fetched_at | timestamptz | |
| created_at / updated_at | timestamptz | |
Indexes: `congress_number`, `latest_action_date`, `introduced_date`, `chamber`, `sponsor_id`
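
The natural key format above can be sketched as a pair of helpers. This is a minimal illustration, not the project's code: `BillKey`, `make_bill_id`, and `parse_bill_id` are hypothetical names, and the sketch assumes none of the three parts ever contains a hyphen.

```python
from typing import NamedTuple


class BillKey(NamedTuple):
    """Parsed components of a bill_id (hypothetical helper type)."""
    congress: int
    bill_type: str
    number: int


def make_bill_id(congress: int, bill_type: str, number: int) -> str:
    # Build the {congress}-{type}-{number} key, e.g. "119-hr-1234".
    return f"{congress}-{bill_type.lower()}-{number}"


def parse_bill_id(bill_id: str) -> BillKey:
    # Assumes exactly three hyphen-separated parts; raises ValueError otherwise.
    congress, bill_type, number = bill_id.split("-")
    return BillKey(int(congress), bill_type, int(number))
```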
---
### `bill_actions`
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| action_date | date | |
| action_text | text | |
| action_type | varchar | |
| chamber | varchar | |
| created_at | timestamptz | |
---
### `bill_documents`
Stores fetched bill text versions from GovInfo.
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| doc_type | varchar | `bill_text`, `committee_report`, `amendment` |
| doc_version | varchar | Introduced, Enrolled, etc. |
| govinfo_url | varchar | Source URL on GovInfo |
| raw_text | text | Full extracted text |
| fetched_at | timestamptz | |
| created_at | timestamptz | |
---
### `bill_briefs`
AI-generated analysis. `key_points` and `risks` are JSONB arrays of cited objects.
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| document_id | int (FK → bill_documents, SET NULL) | |
| brief_type | varchar | `full` (first version) or `amendment` (diff from prior version) |
| summary | text | 2-4 paragraph plain-language summary |
| key_points | jsonb | `[{text, citation, quote}]` |
| risks | jsonb | `[{text, citation, quote}]` |
| deadlines | jsonb | `[{date, description}]` |
| topic_tags | jsonb | `["healthcare", "taxation", ...]` |
| llm_provider | varchar | Which provider generated this brief |
| llm_model | varchar | Specific model name |
| govinfo_url | varchar (nullable) | Source document URL (from bill_documents) |
| created_at | timestamptz | |
Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries)
**Citation structure** — each `key_points`/`risks` item:
```json
{
  "text": "The bill allocates $50B for defense",
  "citation": "Section 301(a)(2)",
  "quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense...",
  "label": "cited_fact"
}
```
`label` is `"cited_fact"` when the claim is explicitly stated in the quoted text, or `"inference"` when it is an analytical interpretation. Old briefs without this field render without a badge (backward compatible).
---
### `members`
Primary key: `bioguide_id` (Congress.gov canonical identifier).
| Column | Type | Notes |
|---|---|---|
| bioguide_id | varchar (PK) | |
| name | varchar | Stored as "Last, First" |
| first_name / last_name | varchar | |
| party | varchar | |
| state | varchar | |
| chamber | varchar | |
| district | varchar (nullable) | House only |
| photo_url | varchar (nullable) | |
| official_url | varchar (nullable) | Member's official website |
| congress_url | varchar (nullable) | congress.gov profile link |
| birth_year | varchar(10) (nullable) | |
| address | varchar (nullable) | DC office address |
| phone | varchar(50) (nullable) | DC office phone |
| terms_json | json (nullable) | Array of `{congress, startYear, endYear, chamber, partyName, stateName, district}` |
| leadership_json | json (nullable) | Array of `{type, congress, current}` |
| sponsored_count | int (nullable) | Total bills sponsored (lifetime) |
| cosponsored_count | int (nullable) | Total bills cosponsored (lifetime) |
| detail_fetched | timestamptz (nullable) | Set when bio detail was enriched from Congress.gov |
| created_at / updated_at | timestamptz | |
Member detail fields (`congress_url` through `detail_fetched`) are populated lazily on first profile view via a Congress.gov detail API call. The `detail_fetched` timestamp is the gate for scheduling member interest scoring.
### `member_trend_scores`
One record per member per day (mirrors `trend_scores` for bills).
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| member_id | varchar (FK → members, CASCADE) | bioguide_id |
| score_date | date | |
| newsapi_count | int | Articles from NewsAPI (30-day window) |
| gnews_count | int | Articles from Google News RSS |
| gtrends_score | float | Google Trends interest 0–100 |
| composite_score | float | Weighted combination 0–100 (same formula as bill trend scores) |
Unique constraint: `(member_id, score_date)`. Indexes: `member_id`, `score_date`, `composite_score`.
### `member_news_articles`
News articles correlated to a specific member of Congress.
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| member_id | varchar (FK → members, CASCADE) | bioguide_id |
| source | varchar | News outlet |
| headline | text | |
| url | varchar | Unique per `(member_id, url)` |
| published_at | timestamptz | |
| relevance_score | float | Default 1.0 |
| created_at | timestamptz | |
---
### `users`
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| email | varchar (unique) | |
| hashed_password | varchar | bcrypt |
| is_admin | bool | First registered user = true |
| notification_prefs | jsonb | ntfy topic URL, ntfy auth token, ntfy enabled, RSS token |
| rss_token | varchar (nullable) | Unique token for personal RSS feed URL |
| created_at | timestamptz | |
---
### `follows`
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| follow_type | varchar | `bill`, `member`, `topic` |
| follow_value | varchar | bill_id, bioguide_id, or topic name |
| follow_mode | varchar | `neutral` \| `pocket_veto` \| `pocket_boost` (default `neutral`) |
| created_at | timestamptz | |
Unique constraint: `(user_id, follow_type, follow_value)`
---
### `news_articles`
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| source | varchar | News outlet |
| headline | varchar | |
| url | varchar | Unique per `(bill_id, url)` — same article can appear across multiple bills |
| published_at | timestamptz | |
| relevance_score | float | Default 1.0 |
| created_at | timestamptz | |
---
### `trend_scores`
One record per bill per day.
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| score_date | date | |
| newsapi_count | int | Articles from NewsAPI (30-day window) |
| gnews_count | int | Articles from Google News RSS |
| gtrends_score | float | Google Trends interest 0–100 |
| composite_score | float | Weighted combination 0–100 |
| created_at | timestamptz | |
**Composite score formula:**
```
newsapi_pts = min(newsapi_count / 20, 1.0) × 40 # saturates at 20 articles
gnews_pts = min(gnews_count / 50, 1.0) × 30 # saturates at 50 articles
gtrends_pts = (gtrends_score / 100) × 30
composite = newsapi_pts + gnews_pts + gtrends_pts # range 0–100
```
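
The same formula as a small Python function, for readers who want to check values by hand. The function name `composite_score` is illustrative; the weights and saturation points are taken directly from the formula above.

```python
def composite_score(newsapi_count: int, gnews_count: int, gtrends_score: float) -> float:
    """Weighted 0-100 score from the three sources described above."""
    newsapi_pts = min(newsapi_count / 20, 1.0) * 40  # saturates at 20 articles
    gnews_pts = min(gnews_count / 50, 1.0) * 30      # saturates at 50 articles
    gtrends_pts = (gtrends_score / 100) * 30
    return newsapi_pts + gnews_pts + gtrends_pts


# 10 NewsAPI articles, 25 Google News articles, Trends interest of 50:
# 20 + 15 + 15 = 50.0
print(composite_score(10, 25, 50))
```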
---
### `committees` / `committee_bills`
| Table | Columns |
|---|---|
| `committees` | `committee_id` (PK), `name`, `chamber`, `type` |
| `committee_bills` | `id`, `committee_id` (FK), `bill_id` (FK), `referred_date` |
---
### `app_settings`
Key-value store for runtime-configurable settings.
| Key | Purpose |
|---|---|
| `congress_last_polled_at` | ISO timestamp of last successful poll |
| `llm_provider` | Overrides `LLM_PROVIDER` env var |
| `llm_model` | Overrides provider default model |
| `congress_poll_interval_minutes` | Overrides env var |
---
### `notifications`
Stores notification events for dispatching to user channels (ntfy, RSS).
| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| bill_id | varchar (FK → bills, SET NULL) | nullable |
| event_type | varchar | `new_document`, `new_amendment`, `bill_updated` |
| payload | jsonb | `{bill_title, bill_label, brief_summary, bill_url, milestone_tier}` |
| dispatched_at | timestamptz (nullable) | NULL = pending dispatch |
| created_at | timestamptz | |
`milestone_tier` in payload: `"progress"` (passed, signed, markup, conference, etc.) or `"referral"` (committee referral). Neutral follows silently skip referral-tier events; pocket_veto and pocket_boost receive them as early warnings.
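
The tier rule above reduces to a small predicate. This is a sketch under the assumption that the dispatcher decides per follow; `should_notify` is a hypothetical name and the actual check in the worker code may be structured differently.

```python
def should_notify(follow_mode: str, milestone_tier: str) -> bool:
    """Return True if a follow in this mode should receive the milestone event."""
    if milestone_tier == "progress":
        # Progress-tier events go to every follow mode.
        return True
    if milestone_tier == "referral":
        # Referral-tier events are early warnings for non-neutral follows only.
        return follow_mode in {"pocket_veto", "pocket_boost"}
    # Unknown tiers are not delivered in this sketch (an assumption).
    return False
```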
---
## Alembic Migrations
| File | Description |
|---|---|
| `0001_initial_schema.py` | All initial tables |
| `0002_widen_chamber_party_columns.py` | Wider varchar for Bill.chamber, Member.party |
| `0003_widen_member_state_district.py` | Wider varchar for Member.state, Member.district |
| `0004_add_brief_type.py` | BillBrief.brief_type column (`full`/`amendment`) |
| `0005_add_users_and_user_follows.py` | users table + user_id FK on follows; drops global follows |
| `0006_add_brief_govinfo_url.py` | BillBrief.govinfo_url for frontend source links |
| `0007_add_member_bio_fields.py` | Member extended bio: `congress_url`, `birth_year`, `address`, `phone`, `terms_json`, `leadership_json`, `sponsored_count`, `cosponsored_count`, `detail_fetched` |
| `0008_add_member_interest_tables.py` | New tables: `member_trend_scores`, `member_news_articles` |
| `0009_fix_news_articles_url_uniqueness.py` | Changed `news_articles.url` from globally unique to per-bill unique `(bill_id, url)` |
| `0010_backfill_bill_congress_urls.py` | Backfill congress_url on existing bill records |
| `0011_add_notifications.py` | `notifications` table + `rss_token` column on users |
Migrations run automatically on API startup: `alembic upgrade head`.
---
## Backend API
Base URL: `/api`
Auth header: `Authorization: Bearer <jwt>`
### `/api/auth`
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/register` | — | Create account. First user → admin. Returns token + user. |
| POST | `/login` | — | Returns token + user. |
| GET | `/me` | Required | Current user info. |
### `/api/bills`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. Includes `has_document` flag per bill via a single batch query. |
| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. |
| GET | `/{bill_id}/actions` | — | Action timeline, newest first. |
| GET | `/{bill_id}/news` | — | Related news articles, limit 20. |
| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). |
| POST | `/{bill_id}/draft-letter` | — | Generate a constituent letter draft via the configured LLM. Body: `{stance, recipient, tone, selected_points, include_citations, zip_code?}`. Returns `{draft: string}`. ZIP code is used in the prompt only — never stored or logged. |
### `/api/members`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated members. Query: `chamber`, `party`, `state`, `q`, `page`, `per_page`. |
| GET | `/{bioguide_id}` | — | Member detail. On first view, lazily enriches bio from Congress.gov and queues member interest scoring. Returns `latest_trend` if scored. |
| GET | `/{bioguide_id}/bills` | — | Member's sponsored bills, paginated. |
| GET | `/{bioguide_id}/trend` | — | Member trend score history. Query: `days` (7–365, default 30). |
| GET | `/{bioguide_id}/news` | — | Member's recent news articles, limit 20. |
### `/api/follows`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current user's follows. |
| POST | `/` | Required | Add follow `{follow_type, follow_value}`. Idempotent. |
| DELETE | `/{id}` | Required | Remove follow (ownership checked). |
### `/api/dashboard`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Personalized feed from followed bills/members/topics + trending. Returns `{feed, trending, follows}`. |
### `/api/search`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Full-text search. Query: `q` (min 2 chars). Returns `{bills, members}`. |
### `/api/settings`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current settings (DB overrides env). |
| PUT | `/` | Admin | Update `{key, value}`. Allowed keys: `llm_provider`, `llm_model`, `congress_poll_interval_minutes`. |
| POST | `/test-llm` | Admin | Test LLM connection with a lightweight ping (max_tokens=20). Returns `{status, provider, model, reply}`. |
| GET | `/llm-models?provider=X` | Admin | Fetch available models from the live provider API. Supports openai, anthropic, gemini, ollama. |
### `/api/notifications`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/settings` | Required | User's notification preferences (ntfy URL/token, ntfy enabled, RSS token). |
| PUT | `/settings` | Required | Update notification preferences. |
| POST | `/settings/rss-reset` | Required | Regenerate RSS token (invalidates old URL). |
| GET | `/feed/{rss_token}.xml` | — | Personal RSS feed of notification events for this user. |
### `/api/admin`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/users` | Admin | All users with follow counts. |
| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. |
| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). |
| GET | `/stats` | Admin | Pipeline counters: total bills, docs fetched, briefs generated, pending LLM, missing metadata/sponsors/actions, uncited briefs, unlabeled briefs (cited objects without a fact/inference label). |
| GET | `/api-health` | Admin | Test each external API in parallel; returns status + latency for Congress.gov, GovInfo, NewsAPI, Google News. |
| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. |
| POST | `/trigger-member-sync` | Admin | Queue member sync. |
| POST | `/trigger-trend-scores` | Admin | Queue trend score calculation. |
| POST | `/trigger-fetch-actions` | Admin | Queue action fetch for recently active bills (last 30 days). |
| POST | `/backfill-all-actions` | Admin | Queue action fetch for ALL bills with no action history (one-time catch-up). |
| POST | `/backfill-sponsors` | Admin | Queue one-off task to populate `sponsor_id` on bills where it is NULL. |
| POST | `/backfill-metadata` | Admin | Fill null `introduced_date`, `chamber`, `congress_url` by re-fetching bill detail. |
| POST | `/backfill-citations` | Admin | Delete pre-citation briefs and re-queue LLM using stored document text. |
| POST | `/backfill-labels` | Admin | Classify existing cited brief points as `cited_fact` or `inference` in-place — one compact LLM call per brief, no re-generation. |
| POST | `/resume-analysis` | Admin | Re-queue LLM for docs with no brief; re-queue doc fetch for bills with no doc. |
| POST | `/bills/{bill_id}/reprocess` | Admin | Queue document + action fetches for a specific bill (debugging). |
| GET | `/task-status/{task_id}` | Admin | Celery task status and result. |
### `/api/health`
| Method | Path | Description |
|---|---|---|
| GET | `/` | Simple health check `{status: "ok", timestamp}`. |
| GET | `/detailed` | Tests PostgreSQL + Redis. Returns per-service status. |
---
## Celery Workers & Pipeline
**Celery app name:** `pocketveto`
**Broker / Backend:** Redis
### Queue Routing
| Queue | Workers | Tasks |
|---|---|---|
| `polling` | worker | `app.workers.congress_poller.*`, `app.workers.notification_dispatcher.*` |
| `documents` | worker | `fetch_bill_documents` |
| `llm` | worker | `process_document_with_llm` |
| `news` | worker | `app.workers.news_fetcher.*`, `app.workers.trend_scorer.*`, `app.workers.member_interest.*` |
**Worker settings:**
- `task_acks_late = True` — task removed from queue only after completion, not on pickup
- `worker_prefetch_multiplier = 1` — prevents workers from hoarding LLM tasks
- Serialization: JSON
### Beat Schedule (RedBeat, stored in Redis)
| Schedule | Task | When |
|---|---|---|
| Configurable (default 30 min) | `poll_congress_bills` | Continuous |
| Every 6 hours | `fetch_news_for_active_bills` | Ongoing |
| Daily 2 AM UTC | `calculate_all_trend_scores` | Nightly |
| Every 12 hours (at :30) | `fetch_news_for_active_members` | Ongoing |
| Daily 3 AM UTC | `calculate_all_member_trend_scores` | Nightly |
| Daily 4 AM UTC | `fetch_actions_for_active_bills` | Nightly |
| Every 5 minutes | `dispatch_notifications` | Continuous |
---
### Pipeline Flow
```
1. congress_poller.poll_congress_bills()
   ↳ Fetches bills updated since last poll (fromDateTime param)
   ↳ Filters: only hr, s, hjres, sjres (legislation that can become law)
   ↳ First run: seeds from 60 days back
   ↳ New bills: fetches bill detail endpoint to get sponsor (list endpoint
     has no sponsor data), upserts Member, sets bill.sponsor_id
   ↳ New bills → fetch_bill_documents.delay(bill_id)
   ↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed
   ↳ Updated bills → emit bill_updated notification if action is a milestone:
       - "progress" tier: passed/failed, signed/vetoed, enacted, markup, conference,
         reported from committee, placed on calendar, cloture, roll call
         → all follow types (bill, sponsor, topic) receive notification
       - "referral" tier: referred to committee
         → pocket_veto and pocket_boost only; neutral follows silently skip

2. document_fetcher.fetch_bill_documents(bill_id)
   ↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF)
   ↳ Fetches raw text from GovInfo
   ↳ Idempotent: skips if doc_version already stored
   ↳ Stores BillDocument with govinfo_url + raw_text
   ↳ → process_document_with_llm.delay(document_id)

3. llm_processor.process_document_with_llm(document_id)
   ↳ Rate limited: 10/minute
   ↳ Idempotent: skips if brief exists for document
   ↳ Determines type:
       - No prior brief → "full" brief
       - Prior brief exists → "amendment" brief (diff vs previous)
   ↳ Calls configured LLM provider
   ↳ Stores BillBrief with cited key_points and risks
   ↳ → fetch_news_for_bill.delay(bill_id)

4. news_fetcher.fetch_news_for_bill(bill_id)
   ↳ Queries NewsAPI + Google News RSS using bill title/number
   ↳ Deduplicates by (bill_id, url); the same article can appear for multiple bills
   ↳ Stores NewsArticle records

5. trend_scorer.calculate_all_trend_scores() [nightly]
   ↳ Bills active in last 90 days
   ↳ Skips bills already scored today
   ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score
   ↳ Calculates composite_score (0–100)
   ↳ Stores TrendScore record

Member interest pipeline (independent of bill pipeline):

6. member_interest.fetch_member_news(bioguide_id) [on first profile view + every 12h]
   ↳ Triggered on first member profile view (non-blocking via .delay())
   ↳ Queries NewsAPI + Google News RSS using member name + title
   ↳ Deduplicates by (member_id, url)
   ↳ Stores MemberNewsArticle records

7. member_interest.calculate_member_trend_score(bioguide_id) [on first profile view + nightly]
   ↳ Triggered on first member profile view (non-blocking via .delay())
   ↳ Only runs if member detail has been fetched (gate: detail_fetched IS NOT NULL)
   ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score
   ↳ Uses the same composite formula as bills
   ↳ Stores MemberTrendScore record
```
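
The milestone check in step 1 could be approximated with simple keyword matching on the action text. The sketch below is an assumption about how such a classifier might look: the keyword tuples are derived from the tier descriptions in the flow above, while `classify_milestone` and the exact matching strategy are hypothetical and may differ from the poller's real logic.

```python
from typing import Optional

# Illustrative keyword lists based on the tiers listed in the pipeline flow.
PROGRESS_KEYWORDS = (
    "passed", "failed", "signed", "vetoed", "enacted", "markup",
    "conference", "reported", "placed on", "cloture", "roll call",
)
REFERRAL_KEYWORDS = ("referred to",)


def classify_milestone(action_text: str) -> Optional[str]:
    """Return "progress", "referral", or None for a non-milestone action."""
    text = action_text.lower()
    if any(keyword in text for keyword in PROGRESS_KEYWORDS):
        return "progress"
    if any(keyword in text for keyword in REFERRAL_KEYWORDS):
        return "referral"
    return None
```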
---
## LLM Service
**File:** `backend/app/services/llm_service.py`
### Provider Factory
```python
get_llm_provider() -> LLMProvider
```
Reads `LLM_PROVIDER` from AppSetting (DB) then env var. Instantiates the matching provider class.
| Provider | Class | Key Setting |
|---|---|---|
| `openai` | `OpenAIProvider` | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| `gemini` | `GeminiProvider` | `GEMINI_API_KEY`, `GEMINI_MODEL` |
| `ollama` | `OllamaProvider` | `OLLAMA_BASE_URL`, `OLLAMA_MODEL` |
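
The "DB first, then env var" lookup can be illustrated with plain mappings. This is a simplified sketch: `resolve_setting` is a hypothetical helper, the `app_settings` rows are modelled as a dict, and the convention that the env var is the upper-cased DB key is inferred from the `llm_provider` / `LLM_PROVIDER` pair.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    db_settings: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the DB override if present, else the env var, else the default."""
    value = db_settings.get(key)
    if value:
        return value
    # Assumed naming convention: DB key "llm_provider" maps to env "LLM_PROVIDER".
    return env.get(key.upper(), default)
```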
All providers implement:
```python
generate_brief(doc_text, bill_metadata) -> ReverseBrief
generate_amendment_brief(new_text, prev_text, bill_metadata) -> ReverseBrief
generate_text(prompt) -> str  # free-form text, used by draft letter generator
```
### ReverseBrief Dataclass
```python
from dataclasses import dataclass


@dataclass
class ReverseBrief:
    summary: str
    key_points: list[dict]  # [{text, citation, quote, label}]
    risks: list[dict]       # [{text, citation, quote, label}]
    deadlines: list[dict]   # [{date, description}]
    topic_tags: list[str]
    llm_provider: str
    llm_model: str
```
### Prompt Design
**Full brief prompt** instructs the LLM to produce:
```json
{
  "summary": "2-4 paragraph plain-language explanation",
  "key_points": [
    {
      "text": "claim",
      "citation": "Section X(y)",
      "quote": "verbatim excerpt ≤80 words",
      "label": "cited_fact"
    }
  ],
  "risks": [
    {
      "text": "concern",
      "citation": "Section X(y)",
      "quote": "verbatim excerpt ≤80 words",
      "label": "inference"
    }
  ],
  "deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}],
  "topic_tags": ["healthcare", "taxation"]
}
```
`label` classification rules baked into the system prompt: `"cited_fact"` if the claim is explicitly stated in the quoted text; `"inference"` if it is an analytical interpretation, projection, or implication not literally stated. The UI shows a neutral "Inferred" badge on inference items only (cited_fact is the clean default).
**Amendment brief prompt** focuses on what changed between document versions.
**Smart truncation:** Bills exceeding the token budget are trimmed — 75% of budget from the start (preamble/purpose), 25% from the end (enforcement/effective dates), with an omission notice in the middle.
**Token budgets:**
- OpenAI / Anthropic / Gemini: 6,000 tokens
- Ollama: 3,000 tokens (local models have smaller context windows)
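
A rough sketch of the 75/25 truncation described above. It approximates tokens with whitespace-separated words, which is an assumption (real tokenizers count differently), and `truncate_for_budget` and the wording of the omission notice are hypothetical. The notice itself adds a few words on top of the budget.

```python
def truncate_for_budget(text: str, budget: int, head_ratio: float = 0.75) -> str:
    """Keep the head and tail of an over-long text with an omission notice between."""
    tokens = text.split()
    if len(tokens) <= budget:
        return text  # within budget: leave untouched
    head_n = int(budget * head_ratio)  # preamble/purpose
    tail_n = budget - head_n           # enforcement/effective dates
    omitted = len(tokens) - head_n - tail_n
    notice = f"[... {omitted} tokens omitted ...]"
    tail = tokens[-tail_n:] if tail_n else []
    return " ".join(tokens[:head_n] + [notice] + tail)
```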
---
## Frontend
**Framework:** Next.js 15 (App Router), TypeScript, Tailwind CSS
**State:** Zustand (auth), TanStack Query (server state)
**HTTP:** Axios with JWT interceptor
### Pages
| Route | Description |
|---|---|
| `/` | Dashboard — personalized feed + trending bills |
| `/bills` | Browse all bills with search, chamber/topic filters, pagination |
| `/bills/[id]` | Bill detail — brief with § citations, action timeline, news, trend chart |
| `/members` | Browse members of Congress, filter by chamber/party/state |
| `/members/[id]` | Member profile — bio, contact info, leadership roles, service history, sponsored bills, public interest trend chart, recent news |
| `/following` | User's followed bills, members, and topics |
| `/topics` | Browse and follow policy topics |
| `/settings` | Admin panel (admin only) |
| `/login` | Email + password sign-in |
| `/register` | Account creation |
### Key Components
**`BriefPanel.tsx`**
Orchestrates AI brief display. If the latest brief is type `amendment`, shows an amber "What Changed" badge. Renders the latest brief via `AIBriefCard`. Below it, a collapsible "Version History" lists all older briefs; clicking one expands an inline `AIBriefCard`.
**`AIBriefCard.tsx`**
Renders the LLM brief. For cited items (new format), shows a `§ Section X(y)` chip next to each bullet. Clicking the chip expands an inline panel with:
- Blockquoted verbatim excerpt from the bill
- "View source →" link to GovInfo (opens in new tab)
- One chip open at a time per card
- Inference items show a neutral "Inferred" badge (analytical interpretation, not a literal quote)
- Old plain-string briefs render without chips (graceful backward compat)
**`ActionTimeline.tsx`**
Renders the legislative action history as a vertical timeline. Accepts optional `latestActionDate`/`latestActionText` fallback props — when `actions` is empty but a latest action exists (actions not yet fetched from Congress.gov), shows a single "latest known action" entry with a note that full history loads in the background.
**`MobileHeader.tsx`**
Top bar shown only on mobile (`md:hidden`). Displays the PocketVeto logo and a hamburger button that opens the slide-in drawer.
**`AuthGuard.tsx`**
Client component wrapping the entire app. Waits for Zustand hydration, then redirects unauthenticated users to `/login`. Public paths (`/login`, `/register`) bypass the guard. Implements the responsive shell: desktop sidebar always-visible (`hidden md:flex`), mobile drawer with backdrop overlay controlled by `drawerOpen` state.
**`Sidebar.tsx`**
Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom. Accepts optional `onClose` prop — when provided (mobile drawer context), renders an X close button in the header and calls `onClose` on every nav link click.
**`DraftLetterPanel.tsx`**
Collapsible panel rendered below `BriefPanel` on the bill detail page (only when a brief exists). Lets users select up to 3 cited points from the brief, choose stance (YES/NO), tone (short/polite/firm), and optionally enter a ZIP code (not stored). Stance auto-populates from the user's follow mode (`pocket_boost` → YES, `pocket_veto` → NO); clears if they unfollow. Recipient (house/senate) is derived from the bill's chamber. Calls `POST /{bill_id}/draft-letter` and renders the plain-text draft in a readonly textarea with a copy-to-clipboard button.
**`BillCard.tsx`**
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, status, and a text availability indicator: `Brief` (green, analysis done) / `Pending` (amber, text retrieved but not yet analysed) / `No text` (muted, nothing published on Congress.gov).
**`TrendChart.tsx`**
Line chart of `composite_score` over time with tooltip breakdown of each data source.
### Utility Functions (`lib/utils.ts`)
```typescript
partyBadgeColor(party) → Tailwind classes
  "Republican" → "bg-red-600 text-white"
  "Democrat"   → "bg-blue-600 text-white"
  other        → "bg-slate-500 text-white"
chamberBadgeColor(chamber) → Tailwind badge classes
  "Senate" → amber/gold (bg-amber-100 text-amber-700 …)
  "House"  → slate/silver (bg-slate-100 text-slate-600 …)
partyColor(party) → text color class (used inline)
trendColor(score) → color class based on score thresholds
billLabel(type, number) → "H.R. 1234", "S. 567", etc.
formatDate(date) → "Feb 28, 2026"
```
### Auth Store (`stores/authStore.ts`)
```typescript
interface AuthState {
  token: string | null
  user: { id: number; email: string; is_admin: boolean } | null
  setAuth(token, user): void
  logout(): void
}
// Persisted to localStorage as "pocketveto-auth"
```
---
## Authentication
- **Algorithm:** HS256 JWT, 7-day expiry
- **Storage:** Zustand store persisted to `localStorage` key `pocketveto-auth`
- **Injection:** Axios request interceptor reads from localStorage and adds `Authorization: Bearer <token>` to every request
- **First user:** The first account registered automatically receives `is_admin = true`
- **Admin role:** Required for PUT/POST `/api/settings`, all `/api/admin/*` endpoints, and viewing the Settings page in the UI
- **No email verification:** Accounts are active immediately on registration
- **Public endpoints:** `/api/bills`, `/api/members`, `/api/search`, `/api/health` — no auth required
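To make the token format concrete, here is a minimal sketch of issuing and verifying an HS256 JWT with a 7-day expiry using only the Python standard library. The function names (`issue_token`, `verify_token`) and the claim layout (`sub`, `iat`, `exp`) are assumptions for illustration; the actual backend may well use a JWT library and different claims.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SEVEN_DAYS = 7 * 24 * 60 * 60  # token lifetime stated above, in seconds


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def issue_token(user_id: int, secret: str, now: Optional[int] = None) -> str:
    """Hypothetical helper: build header.payload.signature for one user."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": str(user_id), "iat": now, "exp": now + SEVEN_DAYS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[int] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the token is unexpired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, sig_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        return None
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        return None
    return payload
```

This sketch skips checks a real deployment would want (for example, validating the `alg` header), so it is a reading aid rather than a drop-in replacement for a maintained JWT library.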
---
## Key Architectural Patterns
### Idempotent Workers
Every Celery task checks for existing records before processing. Combined with `task_acks_late=True`, this means:
- Tasks can be retried without creating duplicates
- Worker crashes don't lose work (task stays in queue until acknowledged)
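The check-before-write pattern can be sketched as follows. This is an illustrative stand-in, not the project's task code: a plain dict plays the role of the database, `store_action` is a hypothetical name, and the natural key `(bill_id, action_date, action_text)` is borrowed from the bill action pipeline described in the feature history.

```python
from typing import Dict, Tuple

ActionKey = Tuple[str, str, str]


def store_action(db: Dict[ActionKey, dict], bill_id: str,
                 action_date: str, action_text: str) -> bool:
    """Insert an action unless an identical one already exists.

    Returns True when a row was created and False when the call was a no-op,
    so a retried or redelivered task cannot create duplicates.
    """
    key = (bill_id, action_date, action_text)
    if key in db:
        return False  # already processed; safe to acknowledge and move on
    db[key] = {"bill_id": bill_id, "date": action_date, "text": action_text}
    return True


# Simulate a task that runs, then is redelivered after a worker crash.
fake_db: Dict[ActionKey, dict] = {}
first_run = store_action(fake_db, "119-hr-1", "2026-02-28", "Passed House")
retry_run = store_action(fake_db, "119-hr-1", "2026-02-28", "Passed House")
```

With late acknowledgement a task may run more than once, which is why the existence check matters: the second run above changes nothing.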
### Incremental Polling
The Congress.gov poller uses `fromDateTime` to fetch only recently updated bills, tracking the last poll timestamp in `app_settings`. On first run it seeds 60 days back to avoid processing thousands of old bills.
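A minimal sketch of how the polling window might be computed, assuming the last poll timestamp is stored as a timezone-aware `datetime` (or is missing on first run). The helper name `poll_window_start` is hypothetical, and the timestamp format is the ISO-8601 `Z` form that `fromDateTime` is assumed to expect.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SEED_WINDOW = timedelta(days=60)  # first-run lookback described above


def poll_window_start(last_polled_at: Optional[datetime], now: datetime) -> str:
    """Return the fromDateTime value for the next poll.

    With no stored timestamp (fresh install) the window starts 60 days back;
    otherwise it starts at the previous poll time.
    """
    start = last_polled_at if last_polled_at is not None else now - SEED_WINDOW
    return start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


now = datetime(2026, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
seed_value = poll_window_start(None, now)
incremental_value = poll_window_start(now - timedelta(hours=1), now)
```

After a successful poll the caller would persist `now` as the new last-poll timestamp, so the next run only sees bills updated since then.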
### Bill Type Filtering
Only tracks legislation that can become law:
- `hr` (House Bill)
- `s` (Senate Bill)
- `hjres` (House Joint Resolution)
- `sjres` (Senate Joint Resolution)
Excluded (procedural, cannot become law): `hres`, `sres`, `hconres`, `sconres`
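The filter above amounts to a set-membership check. A small sketch, with `TRACKED_BILL_TYPES` and `is_tracked` as hypothetical names:

```python
# Bill types that can become law, as listed above.
TRACKED_BILL_TYPES = frozenset({"hr", "s", "hjres", "sjres"})


def is_tracked(bill_type: str) -> bool:
    """Return True for bill types the poller should keep."""
    # Normalizing case and whitespace is an assumption about upstream data.
    return bill_type.strip().lower() in TRACKED_BILL_TYPES


incoming = ["HR", "hres", "s", "sconres", "sjres"]
kept = [t for t in incoming if is_tracked(t)]
```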
### Queue Specialization
Separate queues prevent a flood of LLM tasks from blocking time-sensitive polling tasks. Worker prefetch of 1 prevents any single worker from hoarding slow LLM jobs.
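As a rough illustration, the relevant Celery settings could look like the dictionary below. `task_acks_late`, `worker_prefetch_multiplier`, and `task_routes` are standard Celery setting names; the queue names and task-name patterns are hypothetical placeholders, not taken from the project. The `queue_for` helper only mimics glob-based routing so the example runs without Celery installed.

```python
from fnmatch import fnmatchcase

# Queue names and task-name patterns are hypothetical placeholders.
CELERY_SETTINGS = {
    "task_acks_late": True,            # acknowledge only after the task finishes
    "worker_prefetch_multiplier": 1,   # each worker process reserves one task at a time
    "task_routes": {
        "app.workers.congress_poller.*": {"queue": "polling"},
        "app.workers.llm_processor.*": {"queue": "llm"},
    },
}


def queue_for(task_name: str, default: str = "celery") -> str:
    """Resolve a task name to a queue using the glob patterns above."""
    for pattern, route in CELERY_SETTINGS["task_routes"].items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return default  # "celery" is Celery's default queue name


llm_queue = queue_for("app.workers.llm_processor.process_document")
poll_queue = queue_for("app.workers.congress_poller.poll_bills")
```

Workers can then be started against a single queue each, so a backlog on the LLM queue never delays polling.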
### LLM Provider Abstraction
All LLM providers implement the same interface. Switching providers is a single admin setting change — no code changes, no restart required (the factory reads from DB on each task invocation).
### JSONB for Flexible Brief Storage
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means schema changes (adding `citation`/`quote` in v0.2.0, adding `label` in v0.6.0) required no migrations — only the LLM prompt and application code changed. Old string-format briefs, cited-object briefs without labels, and fully-labelled briefs all coexist in the same column and render correctly at each fidelity level.
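One way to read all three brief formats from the same column is to normalize each point to a common shape before rendering. The sketch below assumes cited points are objects with a `text` field alongside `citation`, `quote`, and `label`; the `text` key and both function names are assumptions.

```python
from typing import Union


def normalize_point(point: Union[str, dict]) -> dict:
    """Map any stored point format to one dict shape.

    - plain string (pre-v0.2.0): no citation, quote, or label
    - cited object (v0.2.0+): citation and quote, label may be absent
    - labelled object (v0.6.0+): all fields present
    """
    if isinstance(point, str):
        return {"text": point, "citation": None, "quote": None, "label": None}
    return {
        "text": point.get("text", ""),
        "citation": point.get("citation"),
        "quote": point.get("quote"),
        "label": point.get("label"),
    }


def show_inferred_badge(point: Union[str, dict]) -> bool:
    """Only points explicitly labelled as inference get the badge."""
    return normalize_point(point)["label"] == "inference"


legacy = "Raises the reporting threshold."
cited = {"text": "Creates a grant program.", "citation": "Section 301(a)(2)",
         "quote": "The Secretary shall establish..."}
labelled = dict(cited, label="inference")
```

Missing fields simply come back as `None`, which is what lets older rows render at lower fidelity without a migration.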
### Redis-backed Beat Schedule (RedBeat)
The Celery Beat schedule is stored in Redis rather than in memory. This means the beat scheduler can restart without losing schedule state or double-firing tasks.
### Docker DNS Re-resolution
Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream container IPs are refreshed every 10 seconds. Without this, nginx caches the IP at startup and returns 502 errors after any container is recreated.
---
## Feature History
### v0.1.0 — Foundation
- Docker Compose stack: PostgreSQL, Redis, FastAPI, Celery, Next.js, Nginx
- Congress.gov API integration: bill polling, member sync
- GovInfo document fetching with intelligent truncation
- Multi-provider LLM service (OpenAI, Anthropic, Gemini, Ollama)
- AI brief generation: summary, key points, risks, deadlines, topic tags
- Amendment-aware processing: diffs new bill versions against prior
- NewsAPI + Google News RSS article correlation
- Google Trends (pytrends) scoring
- Composite trend score (0–100) with weighted formula
- Full-text bill search (PostgreSQL tsvector)
- Member of Congress browsing
- Global follows (bill / member / topic)
- Personalized dashboard feed
- Admin settings page (LLM provider selection, data source status)
- Manual Celery task triggers from UI
- Bill type filtering: only legislation that can become law
- 60-day seed window on fresh install
**Multi-User Auth (added to v0.1.0):**
- Email + password registration/login (JWT, bcrypt)
- Per-user follow scoping
- Admin role (first user = admin)
- Admin user management: list, delete, promote/demote
- AuthGuard with login/register pages
- Analysis status dashboard (auto-refresh every 30s)
### v0.3.0 — Member Profiles & Mobile UI
**Member Interest Tracking:**
- `member_trend_scores` and `member_news_articles` tables (migration 0008)
- `member_interest` Celery worker: `fetch_member_news`, `calculate_member_trend_score`, `fetch_news_for_active_members`, `calculate_all_member_trend_scores`
- Member interest scoring uses the identical composite formula as bills (NewsAPI + GNews + pytrends)
- New beat schedules: member news every 12h, member trend scores nightly at 3 AM UTC
- Lazy enrichment: on first profile view, bio is fetched from Congress.gov detail API and interest scoring is queued non-blocking
- Member detail fields added: `congress_url`, `birth_year`, `address`, `phone`, `terms_json`, `leadership_json`, `sponsored_count`, `cosponsored_count`, `detail_fetched` (migration 0007)
- New API endpoints: `GET /api/members/{id}/trend` and `GET /api/members/{id}/news`
- Member detail page redesigned: photo, bio header with party/state/district/birth year, contact info (address, phone, website, congress.gov), current leadership badges, trend chart ("Public Interest"), news panel, legislation stats (sponsored/cosponsored counts), full service history timeline, all leadership roles history
**News Deduplication Fix:**
- `news_articles.url` changed from globally unique to per-bill unique `(bill_id, url)` (migration 0009)
- The same article can now appear in multiple bills' news panels
- `fetch_news_for_bill` now fetches from both NewsAPI and Google News RSS (previously GNews was volume-signal only)
**Mobile UI:**
- `MobileHeader.tsx` — hamburger + logo top bar, hidden on desktop (`md:hidden`)
- `AuthGuard.tsx` — responsive shell: desktop sidebar always-on, mobile slide-in drawer with backdrop
- `Sidebar.tsx`: `onClose` prop for drawer mode (X button + close on nav click)
- Dashboard grid: `grid-cols-1 md:grid-cols-3` (single column on mobile)
- Members page: `grid-cols-1 sm:grid-cols-2` (single column on mobile, two on tablet+)
- Topics page: `grid-cols-1 sm:grid-cols-2`
### v0.4.0 — Notifications, Admin Health Panel, Bill Action Pipeline
**Notifications (Phase 1 complete):**
- `notifications` table — stores events per user (new_brief, bill_updated, new_action)
- ntfy dispatch — Celery task POSTs to user's ntfy topic URL (self-hosted or ntfy.sh); optional auth token
- RSS feed — tokenized per-user XML feed at `/api/notifications/feed/{token}.xml`
- `dispatch_notifications` beat task — runs every 5 minutes, fans out unsent events to enabled channels
- Notification settings UI — ntfy topic URL, auth token, enable/disable, RSS URL with copy button
**Bill Action Pipeline:**
- `fetch_bill_actions` Celery task — fetches full legislative history from Congress.gov, idempotent on `(bill_id, action_date, action_text)`, updates `Bill.actions_fetched_at`
- `fetch_actions_for_active_bills` nightly batch — queues action fetches for bills active in last 30 days
- `backfill_all_bill_actions` — one-time task to fetch actions for all bills with `actions_fetched_at IS NULL`
- Beat schedule entry at 4 AM UTC
- `ActionTimeline` updated: shows full history when fetched; falls back to `latest_action_date`/`latest_action_text` with "latest known action" label when history not yet loaded
**"What Changed" — BriefPanel:**
- New `BriefPanel.tsx` component wrapping `AIBriefCard`
- When latest brief is type `amendment`: shows amber "What Changed" badge row + date
- Collapsible "Version History" section listing older briefs (date, type badge, truncated summary)
- Clicking a history row expands an inline `AIBriefCard` for that version
**LLM Provider Improvements:**
- Live model picker — `GET /api/settings/llm-models?provider=X` fetches available models from each provider's API (OpenAI SDK, Anthropic REST, Gemini SDK, Ollama tags endpoint)
- DB overrides now fully propagated: `get_llm_provider(provider, model)` accepts explicit params; all call sites read from `app_settings`
- Default Gemini model updated: `gemini-1.5-pro` (deprecated) → `gemini-2.0-flash`
- Test connection replaced with lightweight ping (max_tokens=20, 3-word prompt) instead of full brief generation
**Admin Panel Overhaul:**
- Bill Pipeline section: progress bar + breakdown table (total, text published, no text yet, AI briefs, pending LLM, uncited)
- External API Health: Run Tests button, parallel health checks for Congress.gov / GovInfo / NewsAPI / Google News RSS with latency display
- Manual Controls redesigned as health panel: each action has a status dot (green/red/gray), description, contextual count badge (e.g. "⚠ 12 bills missing metadata"), and Run button
- Task status polling: after triggering a task, button shows spinning icon; polls `/api/admin/task-status/{id}` every 5s; shows task ID prefix + completion/failure state
- New stat fields: `bills_missing_sponsor`, `bills_missing_metadata`, `bills_missing_actions`, `pending_llm`, `no_text_bills`
- New admin tasks: Backfill Dates & Links, Backfill All Action Histories, Resume Analysis
**Chamber Color Badges:**
- `chamberBadgeColor(chamber)` utility: amber/gold for Senate, slate/silver for House
- Applied everywhere chamber is displayed: BillCard, bill detail header
**Bill Detail Page:**
- "No bill text published" state — shown when `has_document=false` and no briefs; includes bill label, date, and congress.gov link
- `has_document` field added to `BillDetailSchema` and `BillDetail` TypeScript type
- `introduced_date` shown conditionally (not rendered when null, preventing "Introduced: —")
- Admin reprocess endpoint: `POST /api/admin/bills/{bill_id}/reprocess`
### v0.5.0 — Follow Modes, Public Browsing & Draft Letter Generator
**Follow Modes:**
- `follow_mode` column on `follows` table: `neutral | pocket_veto | pocket_boost`
- `FollowButton` replaced with a mode-selector dropdown (shield/zap/heart icons, descriptions for each mode)
- `pocket_veto` — alert only on advancement milestones; `pocket_boost` — all changes + action prompts
- Mode stored per-follow; respected by notification dispatcher
**Public Browsing:**
- Unauthenticated guests can browse bills, members, topics, and the trending dashboard
- `AuthModal` gates follow and other interactive actions
- Sidebar and nav adapt to guest state (no email/logout shown)
- All public endpoints already auth-free; guard refactored to allow guest reads
**Draft Constituent Letter Generator (email_gen):**
- `DraftLetterPanel.tsx` — collapsible UI below `BriefPanel` for bills with a brief
- User selects up to 3 cited points from the brief, picks stance (YES/NO), tone, optional ZIP (not stored)
- Stance pre-fills from follow mode; clears on unfollow (ref-tracked, not effect-guarded)
- Recipient derived from bill chamber — no dropdown needed
- `POST /api/bills/{bill_id}/draft-letter` endpoint: reads LLM provider/model from `AppSetting` (respects Settings page), wraps LLM errors with human-readable messages (quota, rate limit, auth)
- `generate_text(prompt) → str` added to `LLMProvider` ABC and all four providers
**Bill Text Status Indicators:**
- `has_document` field added to `BillSchema` (list endpoint) via a single batch `SELECT DISTINCT` — no per-card queries
- `BillCard` shows: `Brief` (green) / `Pending` (amber) / `No text` (muted) based on brief + document state
### v0.6.0 — Phase 2: Change-driven Alerts & Fact/Inference Labeling
**Change-driven Alerts:**
- `notification_utils.py` milestone keyword list expanded: added `"markup"` (markup sessions) and `"conference"` (conference committee)
- New `is_referral_action()` classifier for committee referrals (`"referred to"`)
- Two-tier notification system: `milestone_tier` field in `NotificationEvent.payload`
- `"progress"` — high-signal milestones (passed, signed, markup, etc.): all follow types notified
- `"referral"` — committee referral: pocket_veto and pocket_boost notified; neutral silently dropped
- **Topic followers now receive `bill_updated` milestone notifications** — previously they only received `new_document`/`new_amendment` events. Fixed by querying the bill's latest brief for `topic_tags` inside `_update_bill_if_changed()`
- All three follow types (bill, sponsor, topic) covered for both tiers
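The two-tier rules above can be sketched as a pair of small functions. This is not the contents of `notification_utils.py`: the keyword tuple lists only the milestones named in this document (the real list is longer), the body of `is_referral_action` is a guess based on the `"referred to"` phrase, and `milestone_tier` and `should_notify` are hypothetical names.

```python
from typing import Optional

# Partial keyword list: only the milestones mentioned in this document.
PROGRESS_KEYWORDS = ("passed", "signed", "markup", "conference")


def is_referral_action(action_text: str) -> bool:
    """Assumed implementation: committee referrals contain 'referred to'."""
    return "referred to" in action_text.lower()


def milestone_tier(action_text: str) -> Optional[str]:
    """Classify an action as 'progress', 'referral', or None (no milestone)."""
    text = action_text.lower()
    if any(keyword in text for keyword in PROGRESS_KEYWORDS):
        return "progress"
    if is_referral_action(action_text):
        return "referral"
    return None


def should_notify(follow_mode: str, tier: Optional[str]) -> bool:
    """Progress reaches every follow mode; referrals skip neutral follows."""
    if tier == "progress":
        return True
    if tier == "referral":
        return follow_mode in ("pocket_veto", "pocket_boost")
    return False


tier = milestone_tier("Referred to the Committee on Finance.")
```

Checking the progress keywords first means an action that matches both tiers is treated as the higher-signal one; that ordering is a design choice of this sketch.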
**Fact vs Inference Labeling:**
- `label: "cited_fact" | "inference"` added to every cited key_point and risk in the LLM JSON schema
- System prompt updated for all four providers (OpenAI, Anthropic, Gemini, Ollama)
- UI: neutral "Inferred" badge shown next to inference items in `AIBriefCard`; cited_fact items render cleanly without a badge
- `backfill_brief_labels` Celery task: classifies existing cited points in-place — one compact LLM call per brief (all points batched), updates JSONB with `flag_modified`, no brief re-generation
- `POST /api/admin/backfill-labels` endpoint + "Backfill Fact/Inference Labels" button in Admin panel
- `unlabeled_briefs` counter added to `/api/admin/stats` and pipeline breakdown table
**Admin Panel Cleanup:**
- Manual Controls split into two sections: always-visible recurring controls (Poll, Members, Trends, Actions, Resume) and a collapsible **Maintenance** section for one-time backfill tasks
- Maintenance section header shows "⚠ action needed" when any backfill has a non-zero count
### v0.2.2 — Sponsor Linking & Search Fixes
- **Root cause fixed:** Congress.gov list API does not return sponsor data — only the detail endpoint does. Poller now calls the detail endpoint for each new bill to get the sponsor and populate `bill.sponsor_id`
- **Backfill task:** `backfill_sponsor_ids` Celery task + `/api/admin/backfill-sponsors` endpoint + "Backfill Sponsors" button in Admin UI — fixes existing bills with `NULL` sponsor_id (~10 req/sec, ~3 min for 1,600 bills)
- **Member name search:** members are stored as "Last, First" in the `name` column; search now also matches "First Last" order using PostgreSQL `split_part()` — applied to both the Members page and global search
- **Search spaces:** removed `.trim()` on search `onChange` handlers in Members and Bills pages that was eating spaces as you typed
- **Member bills 500 error:** `get_member_bills` endpoint now eagerly loads `Bill.sponsor` via `selectinload` to prevent SQLAlchemy's `MissingGreenlet` error when Pydantic serializes the response
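The member name search fix in this release can be illustrated in Python. The real change happens in SQL with `split_part()`; the sketch below only mirrors the idea of matching a query against both the stored "Last, First" form and a derived "First Last" form. The function names are hypothetical.

```python
def first_last(stored_name: str) -> str:
    """Turn 'Last, First' into 'First Last'; leave other formats unchanged."""
    last, sep, first = stored_name.partition(",")
    if not sep:
        return stored_name.strip()
    return f"{first.strip()} {last.strip()}"


def name_matches(stored_name: str, query: str) -> bool:
    """Case-insensitive substring match against both name orders."""
    q = " ".join(query.lower().split())  # collapse repeated whitespace
    if not q:
        return False
    return q in stored_name.lower() or q in first_last(stored_name).lower()


stored = "Sanders, Bernard"
reordered = first_last(stored)
```

Note that the query keeps its internal spaces, which is consistent with the `.trim()` fix listed above: typing "bernard s" should still match.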
### v0.2.0 — Citations
- **Per-claim citations on AI briefs:** every key point and risk includes:
- `citation` — section reference (e.g., "Section 301(a)(2)")
- `quote` — verbatim excerpt ≤80 words from that section
- `§` citation chip UI on each bullet — click to expand quote + GovInfo source link
- `govinfo_url` stored on `BillBrief` for direct frontend access
- Old briefs (plain strings) render without chips — backward compatible
- Migration 0006: `govinfo_url` column on `bill_briefs`
- Party badges redesigned: solid `red-600` / `blue-600` / `slate-500` with white text, readable in both light and dark mode
- Tailwind content scan extended to include `lib/` directory
- Nginx DNS resolver fix: prevents stale-IP 502s after container restarts
---
## Deployment
### First Deploy
```bash
cp .env.example .env
# Edit .env — add API keys, generate JWT_SECRET_KEY
docker compose up --build -d
```
Migrations run automatically. Navigate to the app, register the first account (it becomes admin).
### Updating
```bash
git pull origin main
docker compose up --build -d
docker compose exec nginx nginx -s reload # if nginx wasn't recreated
```
### Useful Commands
```bash
# Check all service status
docker compose ps
# View logs
docker compose logs api --tail=50
docker compose logs worker --tail=50
# Force a bill poll now
# → Admin page → Manual Controls → Trigger Poll
# Check DB column layout
docker compose exec postgres psql -U congress -d pocketveto -c "\d bill_briefs"
# Tail live worker output
docker compose logs -f worker
# Restart a specific service
docker compose restart worker
```
### Bill Regeneration (Optional)
Existing briefs generated before v0.2.0 use plain strings (no citations). To regenerate with citations:
1. Delete existing `bill_briefs` rows (keeps `bill_documents` intact)
2. Re-queue all documents via a one-off script similar to `queue_docs.py`
3. Worker will regenerate using the new cited prompt at 10/minute
4. At that rate, ~1,000 briefs take about 100 minutes (just under 2 hours)
This is **optional** — old string briefs render correctly in the UI with no citation chips.