Documents sponsor linking fix, backfill task, member name search split_part approach, search spaces fix, and eager loading fix. Authored-By: Jack Levy
# PocketVeto — Architecture & Feature Documentation

> **App brand:** PocketVeto
> **Repo:** civicstack
> **Purpose:** Citizen-grade US Congress monitoring with AI-powered bill analysis, per-claim citations, and personalized tracking.

---

## Table of Contents

1. [Overview](#overview)
2. [Tech Stack](#tech-stack)
3. [Infrastructure & Docker](#infrastructure--docker)
4. [Configuration & Environment](#configuration--environment)
5. [Database Schema](#database-schema)
6. [Alembic Migrations](#alembic-migrations)
7. [Backend API](#backend-api)
8. [Celery Workers & Pipeline](#celery-workers--pipeline)
9. [LLM Service](#llm-service)
10. [Frontend](#frontend)
11. [Authentication](#authentication)
12. [Key Architectural Patterns](#key-architectural-patterns)
13. [Feature History](#feature-history)
14. [Deployment](#deployment)

---

## Overview

PocketVeto is a self-hosted, full-stack application that automatically tracks US Congress legislation, fetches bill text, generates AI summaries with per-claim source citations, correlates bills with news and Google Trends, and presents everything through a personalized dashboard. Users follow bills, members of Congress, and policy topics; the system surfaces relevant activity in their feed.

```
Congress.gov API → Poller → DB → Document Fetcher → GovInfo
                                        ↓
                                  LLM Processor
                                        ↓
                                    BillBrief
                                 (cited AI brief)
                                        ↓
                            News Fetcher + Trend Scorer
                                        ↓
                                 Next.js Frontend
```

---

## Tech Stack

| Layer | Technology |
|---|---|
| Reverse Proxy | Nginx (alpine) |
| Backend API | FastAPI + SQLAlchemy (async) |
| Task Queue | Celery 5 + Redis |
| Task Scheduler | Celery Beat + RedBeat (Redis-backed) |
| Database | PostgreSQL 16 |
| Cache / Broker | Redis 7 |
| Frontend | Next.js 15, React, Tailwind CSS, TypeScript |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| LLM | Multi-provider factory: OpenAI, Anthropic, Gemini, Ollama |
| Bill Metadata | Congress.gov API (api.data.gov key) |
| Bill Text | GovInfo API (same api.data.gov key) |
| News | NewsAPI.org (100 req/day free tier) |
| Trends | Google Trends via pytrends |

---

## Infrastructure & Docker

### Services (`docker-compose.yml`)

```
postgres:16-alpine
  DB: pocketveto
  User: congress
  Port: 5432 (internal)

redis:7-alpine
  Port: 6379 (internal)
  Role: Celery broker, result backend, RedBeat schedule store

api (civicstack-api image)
  Port: 8000 (internal)
  Command: alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000
  Depends: postgres (healthy), redis (healthy)

worker (civicstack-worker image)
  Command: celery -A app.workers.celery_app worker -Q polling,documents,llm,news -c 4
  Depends: postgres (healthy), redis (healthy)

beat (civicstack-beat image)
  Command: celery -A app.workers.celery_app beat -S redbeat.RedBeatScheduler
  Depends: redis (healthy)

frontend (civicstack-frontend image)
  Port: 3000 (internal)
  Build: Next.js standalone output

nginx:alpine
  Port: 80 → public
  Routes: /api/* → api:8000 | /* → frontend:3000
```

### Nginx Config (`nginx/nginx.conf`)

- `resolver 127.0.0.11 valid=10s` — re-resolves Docker DNS after container restarts (prevents stale-IP 502s on redeploy)
- `/api/` → FastAPI, 120s read timeout
- `/_next/static/` → frontend with 1-day cache header
- `/` → frontend with WebSocket upgrade support

---

## Configuration & Environment

Copy `.env.example` → `.env` and fill in keys before first run.

```env
# Network
LOCAL_URL=http://localhost
PUBLIC_URL= # optional, e.g. https://yourapp.com

# Auth
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"

# PostgreSQL
POSTGRES_USER=congress
POSTGRES_PASSWORD=congress
POSTGRES_DB=pocketveto

# Redis
REDIS_URL=redis://redis:6379/0

# Congress.gov + GovInfo (shared key from api.data.gov)
DATA_GOV_API_KEY=
CONGRESS_POLL_INTERVAL_MINUTES=30

# LLM — pick one provider
LLM_PROVIDER=openai # openai | anthropic | gemini | ollama
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6
GEMINI_API_KEY=
GEMINI_MODEL=gemini-1.5-pro
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3.1

# News & Trends
NEWSAPI_KEY=
PYTRENDS_ENABLED=true
```

**Runtime overrides:** The LLM provider/model and the poll interval can be changed live through the Admin page. The values are stored in the `app_settings` table and take precedence over env vars.

---

## Database Schema

### `bills`
Primary key: `bill_id` — natural key in format `{congress}-{type}-{number}` (e.g. `119-hr-1234`).

| Column | Type | Notes |
|---|---|---|
| bill_id | varchar (PK) | |
| congress_number | int | |
| bill_type | varchar | `hr`, `s`, `hjres`, `sjres` (tracked); `hres`, `sres`, `hconres`, `sconres` (not tracked) |
| bill_number | int | |
| title | text | |
| short_title | text | |
| sponsor_id | varchar (FK → members) | bioguide_id |
| introduced_date | date | |
| latest_action_date | date | |
| latest_action_text | text | |
| status | varchar | |
| chamber | varchar | House / Senate |
| congress_url | varchar | congress.gov link |
| govtrack_url | varchar | |
| last_checked_at | timestamptz | |
| actions_fetched_at | timestamptz | |
| created_at / updated_at | timestamptz | |

Indexes: `congress_number`, `latest_action_date`, `introduced_date`, `chamber`, `sponsor_id`

---

### `bill_actions`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| action_date | date | |
| action_text | text | |
| action_type | varchar | |
| chamber | varchar | |
| created_at | timestamptz | |

---

### `bill_documents`
Stores fetched bill text versions from GovInfo.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| doc_type | varchar | `bill_text`, `committee_report`, `amendment` |
| doc_version | varchar | Introduced, Enrolled, etc. |
| govinfo_url | varchar | Source URL on GovInfo |
| raw_text | text | Full extracted text |
| fetched_at | timestamptz | |
| created_at | timestamptz | |

---

### `bill_briefs`
AI-generated analysis. `key_points` and `risks` are JSONB arrays of cited objects.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| document_id | int (FK → bill_documents, SET NULL) | |
| brief_type | varchar | `full` (first version) or `amendment` (diff from prior version) |
| summary | text | 2-4 paragraph plain-language summary |
| key_points | jsonb | `[{text, citation, quote}]` |
| risks | jsonb | `[{text, citation, quote}]` |
| deadlines | jsonb | `[{date, description}]` |
| topic_tags | jsonb | `["healthcare", "taxation", ...]` |
| llm_provider | varchar | Which provider generated this brief |
| llm_model | varchar | Specific model name |
| govinfo_url | varchar (nullable) | Source document URL (from bill_documents) |
| created_at | timestamptz | |

Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries)

**Citation structure** — each `key_points`/`risks` item:
```json
{
  "text": "The bill allocates $50B for defense",
  "citation": "Section 301(a)(2)",
  "quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense..."
}
```

---

### `members`
Primary key: `bioguide_id` (Congress.gov canonical identifier).

| Column | Type |
|---|---|
| bioguide_id | varchar (PK) |
| name | varchar |
| first_name / last_name | varchar |
| party | varchar |
| state | varchar |
| chamber | varchar |
| district | varchar (nullable, House only) |
| photo_url | varchar (nullable) |
| created_at / updated_at | timestamptz |

---

### `users`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| email | varchar (unique) | |
| hashed_password | varchar | bcrypt |
| is_admin | bool | First registered user = true |
| notification_prefs | jsonb | Future: ntfy, Telegram, RSS config |
| created_at | timestamptz | |

---

### `follows`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| follow_type | varchar | `bill`, `member`, `topic` |
| follow_value | varchar | bill_id, bioguide_id, or topic name |
| created_at | timestamptz | |

Unique constraint: `(user_id, follow_type, follow_value)`

---

### `news_articles`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| source | varchar | News outlet |
| headline | varchar | |
| url | varchar (unique) | Deduplication key |
| published_at | timestamptz | |
| relevance_score | float | Default 1.0 |
| created_at | timestamptz | |

---

### `trend_scores`
One record per bill per day.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| score_date | date | |
| newsapi_count | int | Articles from NewsAPI (30-day window) |
| gnews_count | int | Articles from Google News RSS |
| gtrends_score | float | Google Trends interest 0–100 |
| composite_score | float | Weighted combination 0–100 |
| created_at | timestamptz | |

**Composite score formula:**
```
newsapi_pts = min(newsapi_count / 20, 1.0) × 40   # saturates at 20 articles
gnews_pts   = min(gnews_count / 50, 1.0) × 30     # saturates at 50 articles
gtrends_pts = (gtrends_score / 100) × 30
composite   = newsapi_pts + gnews_pts + gtrends_pts   # range 0–100
```

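The formula above can be sketched in Python as follows. This is a minimal illustration rather than the project's actual scorer; the function name `composite_score` is hypothetical.

```python
def composite_score(newsapi_count: int, gnews_count: int, gtrends_score: float) -> float:
    """Combine the three signals into a 0-100 score using the documented weights."""
    newsapi_pts = min(newsapi_count / 20, 1.0) * 40  # saturates at 20 articles
    gnews_pts = min(gnews_count / 50, 1.0) * 30      # saturates at 50 articles
    gtrends_pts = (gtrends_score / 100) * 30         # Trends interest is already 0-100
    return newsapi_pts + gnews_pts + gtrends_pts


# 10 NewsAPI articles, 25 Google News articles, Trends interest of 50
# contribute 20 + 15 + 15 points.
print(composite_score(10, 25, 50))  # 50.0
```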
---

### `committees` / `committee_bills`

| Table | Columns |
|---|---|
| `committees` | committee_id (PK), name, chamber, type |
| `committee_bills` | id, committee_id (FK), bill_id (FK), referred_date |

---

### `app_settings`
Key-value store for runtime-configurable settings.

| Key | Purpose |
|---|---|
| `congress_last_polled_at` | ISO timestamp of last successful poll |
| `llm_provider` | Overrides `LLM_PROVIDER` env var |
| `llm_model` | Overrides provider default model |
| `congress_poll_interval_minutes` | Overrides env var |

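The "DB overrides env" precedence can be illustrated with a small lookup helper. This is a sketch under stated assumptions: `resolve_setting` and `ENV_FALLBACKS` are hypothetical names, and a plain dict stands in for the `app_settings` table.

```python
import os
from typing import Mapping, Optional

# Maps an app_settings key to the env var it overrides (names taken from this document).
ENV_FALLBACKS = {
    "llm_provider": "LLM_PROVIDER",
    "congress_poll_interval_minutes": "CONGRESS_POLL_INTERVAL_MINUTES",
}


def resolve_setting(
    key: str,
    db_settings: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the DB override when present, otherwise the env var value."""
    env = os.environ if env is None else env
    if key in db_settings:  # a runtime override wins
        return db_settings[key]
    return env.get(ENV_FALLBACKS[key])


# The Admin page switched the provider to ollama; the env var still says openai.
print(resolve_setting("llm_provider", {"llm_provider": "ollama"}, {"LLM_PROVIDER": "openai"}))
```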
---

## Alembic Migrations

| File | Description |
|---|---|
| `0001_initial_schema.py` | All initial tables |
| `0002_widen_chamber_party_columns.py` | Wider varchar for Bill.chamber, Member.party |
| `0003_widen_member_state_district.py` | Wider varchar for Member.state, Member.district |
| `0004_add_brief_type.py` | BillBrief.brief_type column (`full`/`amendment`) |
| `0005_add_users_and_user_follows.py` | users table + user_id FK on follows; drops global follows |
| `0006_add_brief_govinfo_url.py` | BillBrief.govinfo_url for frontend source links |

Migrations run automatically on API startup: `alembic upgrade head`.

---

## Backend API

Base URL: `/api`
Auth header: `Authorization: Bearer <jwt>`

### `/api/auth`

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/register` | — | Create account. First user → admin. Returns token + user. |
| POST | `/login` | — | Returns token + user. |
| GET | `/me` | Required | Current user info. |

### `/api/bills`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. |
| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. |
| GET | `/{bill_id}/actions` | — | Action timeline, newest first. |
| GET | `/{bill_id}/news` | — | Related news articles, limit 20. |
| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). |

### `/api/members`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated members. Query: `chamber`, `party`, `state`, `q`, `page`, `per_page`. |
| GET | `/{bioguide_id}` | — | Member detail. |
| GET | `/{bioguide_id}/bills` | — | Member's sponsored bills, paginated. |

### `/api/follows`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current user's follows. |
| POST | `/` | Required | Add follow `{follow_type, follow_value}`. Idempotent. |
| DELETE | `/{id}` | Required | Remove follow (ownership checked). |

### `/api/dashboard`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Personalized feed from followed bills/members/topics + trending. Returns `{feed, trending, follows}`. |

### `/api/search`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Full-text search. Query: `q` (min 2 chars). Returns `{bills, members}`. |

### `/api/settings`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current settings (DB overrides env). |
| PUT | `/` | Admin | Update `{key, value}`. Allowed keys: `llm_provider`, `llm_model`, `congress_poll_interval_minutes`. |
| POST | `/test-llm` | Admin | Test LLM connection. Returns `{status, provider, model, summary_preview}`. |

### `/api/admin`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/users` | Admin | All users with follow counts. |
| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. |
| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). |
| GET | `/stats` | Admin | Pipeline progress: total bills, docs fetched, briefs generated, remaining. |
| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. |
| POST | `/trigger-member-sync` | Admin | Queue member sync. |
| POST | `/trigger-trend-scores` | Admin | Queue trend score calculation. |
| POST | `/backfill-sponsors` | Admin | Queue one-off task to populate `sponsor_id` on all bills where it is NULL. |
| GET | `/task-status/{task_id}` | Admin | Celery task status and result. |

### `/api/health`

| Method | Path | Description |
|---|---|---|
| GET | `/` | Simple health check `{status: "ok", timestamp}`. |
| GET | `/detailed` | Tests PostgreSQL + Redis. Returns per-service status. |

---

## Celery Workers & Pipeline

**Celery app name:** `pocketveto`
**Broker / Backend:** Redis

### Queue Routing

| Queue | Workers | Tasks |
|---|---|---|
| `polling` | worker | `poll_congress_bills`, `sync_members` |
| `documents` | worker | `fetch_bill_documents` |
| `llm` | worker | `process_document_with_llm` |
| `news` | worker | `fetch_news_for_bill`, `fetch_news_for_active_bills`, `calculate_all_trend_scores` |

**Worker settings:**
- `task_acks_late = True` — task removed from queue only after completion, not on pickup
- `worker_prefetch_multiplier = 1` — prevents workers from hoarding LLM tasks
- Serialization: JSON

### Beat Schedule (RedBeat, stored in Redis)

| Schedule | Task | When |
|---|---|---|
| Configurable (default 30 min) | `poll_congress_bills` | Continuous |
| Every 6 hours | `fetch_news_for_active_bills` | Ongoing |
| Daily 2 AM UTC | `calculate_all_trend_scores` | Nightly |

---

### Pipeline Flow

```
1. congress_poller.poll_congress_bills()
   ↳ Fetches bills updated since last poll (fromDateTime param)
   ↳ Filters: only hr, s, hjres, sjres (legislation that can become law)
   ↳ First run: seeds from 60 days back
   ↳ New bills: fetches bill detail endpoint to get sponsor (list endpoint
     has no sponsor data), upserts Member, sets bill.sponsor_id
   ↳ New bills → fetch_bill_documents.delay(bill_id)
   ↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed

2. document_fetcher.fetch_bill_documents(bill_id)
   ↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF)
   ↳ Fetches raw text from GovInfo
   ↳ Idempotent: skips if doc_version already stored
   ↳ Stores BillDocument with govinfo_url + raw_text
   ↳ → process_document_with_llm.delay(document_id)

3. llm_processor.process_document_with_llm(document_id)
   ↳ Rate limited: 10/minute
   ↳ Idempotent: skips if brief exists for document
   ↳ Determines type:
       - No prior brief → "full" brief
       - Prior brief exists → "amendment" brief (diff vs previous)
   ↳ Calls configured LLM provider
   ↳ Stores BillBrief with cited key_points and risks
   ↳ → fetch_news_for_bill.delay(bill_id)

4. news_fetcher.fetch_news_for_bill(bill_id)
   ↳ Queries NewsAPI using bill title + topic_tags
   ↳ Deduplicates by URL
   ↳ Stores NewsArticle records

5. trend_scorer.calculate_all_trend_scores() [nightly]
   ↳ Bills active in last 90 days
   ↳ Skips bills already scored today
   ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score
   ↳ Calculates composite_score (0–100)
   ↳ Stores TrendScore record
```

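The decision made in step 3 (skip, `full`, or `amendment`) can be sketched as a pure function. The name `choose_brief_type` and its list argument are hypothetical; the real task presumably queries `bill_briefs` instead of receiving a list.

```python
from typing import List, Optional


def choose_brief_type(document_id: int, briefed_document_ids: List[int]) -> Optional[str]:
    """Decide what step 3 should do for a newly fetched document.

    briefed_document_ids holds the IDs of this bill's documents that already
    have a brief. None means the task should skip (idempotency check).
    """
    if document_id in briefed_document_ids:
        return None         # a brief already exists for this document
    if not briefed_document_ids:
        return "full"       # first brief for the bill
    return "amendment"      # a prior brief exists, so diff against it


print(choose_brief_type(7, []))   # full
print(choose_brief_type(8, [7]))  # amendment
print(choose_brief_type(7, [7]))  # None
```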
---

## LLM Service

**File:** `backend/app/services/llm_service.py`

### Provider Factory

```python
def get_llm_provider() -> LLMProvider: ...
```

Reads the provider name from the `llm_provider` AppSetting (DB) first, falls back to the `LLM_PROVIDER` env var, and instantiates the matching provider class.

| Provider | Class | Key Setting |
|---|---|---|
| `openai` | `OpenAIProvider` | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| `gemini` | `GeminiProvider` | `GEMINI_API_KEY`, `GEMINI_MODEL` |
| `ollama` | `OllamaProvider` | `OLLAMA_BASE_URL`, `OLLAMA_MODEL` |

All providers implement:
```python
def generate_brief(doc_text, bill_metadata) -> ReverseBrief: ...
def generate_amendment_brief(new_text, prev_text, bill_metadata) -> ReverseBrief: ...
```

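A registry-based factory is one way such a provider lookup could be structured. The sketch below is an assumption, not the contents of `llm_service.py`: `EchoProvider` and `PROVIDERS` are hypothetical stand-ins so the example runs offline, and the factory takes the provider name as a parameter instead of reading it from the DB or environment.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface; the real classes also implement generate_amendment_brief."""

    @abstractmethod
    def generate_brief(self, doc_text: str, bill_metadata: dict) -> dict: ...


class EchoProvider(LLMProvider):
    """Hypothetical stand-in that needs no API key or network access."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate_brief(self, doc_text: str, bill_metadata: dict) -> dict:
        return {"summary": doc_text[:40], "llm_provider": self.name}


# One entry per supported provider name; real code would map to distinct classes.
PROVIDERS = {name: EchoProvider for name in ("openai", "anthropic", "gemini", "ollama")}


def get_llm_provider(provider_name: str) -> LLMProvider:
    """Instantiate the provider registered under provider_name."""
    if provider_name not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {provider_name}")
    return PROVIDERS[provider_name](provider_name)


provider = get_llm_provider("ollama")
print(provider.generate_brief("A bill to ...", {})["llm_provider"])  # ollama
```

Because callers only depend on `LLMProvider`, adding a provider in this layout means adding one class and one registry entry.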
### ReverseBrief Dataclass

```python
from dataclasses import dataclass


@dataclass
class ReverseBrief:
    summary: str
    key_points: list[dict]  # [{text, citation, quote}]
    risks: list[dict]       # [{text, citation, quote}]
    deadlines: list[dict]   # [{date, description}]
    topic_tags: list[str]
    llm_provider: str
    llm_model: str
```

### Prompt Design

**Full brief prompt** instructs the LLM to produce:
```json
{
  "summary": "2-4 paragraph plain-language explanation",
  "key_points": [
    {"text": "claim", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "risks": [
    {"text": "concern", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}],
  "topic_tags": ["healthcare", "taxation"]
}
```

**Amendment brief prompt** focuses on what changed between document versions.

**Smart truncation:** Bills exceeding the token budget are trimmed: 75% of the budget is taken from the start (preamble/purpose) and 25% from the end (enforcement/effective dates), with an omission notice in the middle.

**Token budgets:**
- OpenAI / Anthropic / Gemini: 6,000 tokens
- Ollama: 3,000 tokens (local models have smaller context windows)

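The 75/25 split described above can be sketched as follows. This is an illustration only: it approximates tokens as four characters each instead of using a real tokenizer, and `smart_truncate`, `CHARS_PER_TOKEN`, and the wording of `OMISSION_NOTICE` are all assumptions.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average, not a real tokenizer
OMISSION_NOTICE = "\n\n[... middle of bill omitted for length ...]\n\n"  # hypothetical wording


def smart_truncate(text: str, token_budget: int) -> str:
    """Keep 75% of the budget from the start of the text and 25% from the end."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    max_chars = token_budget * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text  # fits within the budget, nothing to trim
    head_chars = int(max_chars * 0.75)
    tail_chars = max_chars - head_chars
    return text[:head_chars] + OMISSION_NOTICE + text[-tail_chars:]


bill_text = "A" * 30_000 + "Z" * 10_000       # 40,000 characters
trimmed = smart_truncate(bill_text, 6_000)    # budget of about 24,000 characters
print(len(trimmed) - len(OMISSION_NOTICE))    # 24000
```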
---

## Frontend

**Framework:** Next.js 15 (App Router), TypeScript, Tailwind CSS
**State:** Zustand (auth), TanStack Query (server state)
**HTTP:** Axios with JWT interceptor

### Pages

| Route | Description |
|---|---|
| `/` | Dashboard — personalized feed + trending bills |
| `/bills` | Browse all bills with search, chamber/topic filters, pagination |
| `/bills/[id]` | Bill detail — brief with § citations, action timeline, news, trend chart |
| `/members` | Browse members of Congress, filter by chamber/party/state |
| `/members/[id]` | Member profile + sponsored bills |
| `/following` | User's followed bills, members, and topics |
| `/topics` | Browse and follow policy topics |
| `/settings` | Admin panel (admin only) |
| `/login` | Email + password sign-in |
| `/register` | Account creation |

### Key Components

**`AIBriefCard.tsx`**
Renders the LLM brief. For cited items (new format), shows a `§ Section X(y)` chip next to each bullet. Clicking the chip expands an inline panel with:
- Blockquoted verbatim excerpt from the bill
- "View source →" link to GovInfo (opens in new tab)
- One chip open at a time per card
- Old plain-string briefs render without chips (graceful backward compat)

**`AuthGuard.tsx`**
Client component wrapping the entire app. Waits for Zustand hydration, then redirects unauthenticated users to `/login`. Public paths (`/login`, `/register`) bypass the guard.

**`Sidebar.tsx`**
Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom.

**`BillCard.tsx`**
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, and status.

**`TrendChart.tsx`**
Line chart of `composite_score` over time with tooltip breakdown of each data source.

### Utility Functions (`lib/utils.ts`)

```typescript
partyBadgeColor(party) → Tailwind classes
  "Republican" → "bg-red-600 text-white"
  "Democrat"   → "bg-blue-600 text-white"
  other        → "bg-slate-500 text-white"

partyColor(party) → text color class (used inline)
trendColor(score) → color class based on score thresholds
billLabel(type, number) → "H.R. 1234", "S. 567", etc.
formatDate(date) → "Feb 28, 2026"
```

### Auth Store (`stores/authStore.ts`)

```typescript
interface AuthState {
  token: string | null
  user: { id: number; email: string; is_admin: boolean } | null
  setAuth(token, user): void
  logout(): void
}
// Persisted to localStorage as "pocketveto-auth"
```

---

## Authentication

- **Algorithm:** HS256 JWT, 7-day expiry
- **Storage:** Zustand store persisted to `localStorage` key `pocketveto-auth`
- **Injection:** Axios request interceptor reads from localStorage and adds `Authorization: Bearer <token>` to every request
- **First user:** The first account registered automatically receives `is_admin = true`
- **Admin role:** Required for PUT/POST `/api/settings`, all `/api/admin/*` endpoints, and viewing the Settings page in the UI
- **No email verification:** Accounts are active immediately on registration
- **Public endpoints:** `/api/bills`, `/api/members`, `/api/search`, `/api/health` — no auth required

---

## Key Architectural Patterns

### Idempotent Workers
Every Celery task checks for existing records before processing. Combined with `task_acks_late=True`, this means:
- Tasks can be retried without creating duplicates
- Worker crashes don't lose work (task stays in queue until acknowledged)

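The check-before-write pattern is small enough to show in isolation. In this sketch a module-level dict stands in for the `bill_briefs` table, and `process_document` is a hypothetical name, not one of the project's tasks.

```python
from typing import Callable, Dict

briefs: Dict[int, str] = {}  # hypothetical stand-in for the bill_briefs table


def process_document(document_id: int, generate: Callable[[int], str]) -> bool:
    """Create a brief once; return False when the work was already done."""
    if document_id in briefs:  # existence check makes a retry harmless
        return False
    briefs[document_id] = generate(document_id)
    return True


print(process_document(1, lambda doc_id: f"brief for {doc_id}"))  # True
print(process_document(1, lambda doc_id: f"brief for {doc_id}"))  # False (redelivered task)
```

With late acknowledgement a crashed task is delivered again, so the second call models that redelivery: it finds the existing record and does no extra work.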
### Incremental Polling
The Congress.gov poller uses `fromDateTime` to fetch only recently updated bills, tracking the last poll timestamp in `app_settings`. On first run it seeds 60 days back to avoid processing thousands of old bills.

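Computing the poll window might look like the sketch below. `poll_start` is a hypothetical helper, and the `YYYY-MM-DDTHH:MM:SSZ` output format is an assumption about what the `fromDateTime` parameter expects.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SEED_WINDOW = timedelta(days=60)  # first-run look-back described above


def poll_start(last_polled_at: Optional[str], now: datetime) -> str:
    """Return the fromDateTime value for the next poll.

    last_polled_at is the ISO timestamp stored under congress_last_polled_at,
    or None on a fresh install.
    """
    if last_polled_at is None:
        start = now - SEED_WINDOW
    else:
        start = datetime.fromisoformat(last_polled_at)
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(poll_start(None, now))                         # 2025-12-31T00:00:00Z
print(poll_start("2026-02-28T12:30:00+00:00", now))  # 2026-02-28T12:30:00Z
```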
### Bill Type Filtering
The poller tracks only legislation that can become law:
- `hr` (House Bill)
- `s` (Senate Bill)
- `hjres` (House Joint Resolution)
- `sjres` (Senate Joint Resolution)

Excluded (procedural, cannot become law): `hres`, `sres`, `hconres`, `sconres`

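Because `bill_id` embeds the bill type (`{congress}-{type}-{number}`), the filter can be expressed as a set membership test. `TRACKED_BILL_TYPES` and `is_tracked` are hypothetical names used for illustration.

```python
TRACKED_BILL_TYPES = {"hr", "s", "hjres", "sjres"}


def is_tracked(bill_id: str) -> bool:
    """Check a bill_id in {congress}-{type}-{number} form, e.g. 119-hr-1234."""
    _congress, bill_type, _number = bill_id.split("-")
    return bill_type.lower() in TRACKED_BILL_TYPES


print(is_tracked("119-hr-1234"))   # True
print(is_tracked("119-hres-5"))    # False: simple resolutions are excluded
```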
### Queue Specialization
Separate queues prevent a flood of LLM tasks from blocking time-sensitive polling tasks. Worker prefetch of 1 prevents any single worker from hoarding slow LLM jobs.

### LLM Provider Abstraction
All LLM providers implement the same interface. Switching providers is a single admin setting change — no code changes, no restart required (the factory reads from DB on each task invocation).

### JSONB for Flexible Brief Storage
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means the schema change from `list[str]` to `list[{text, citation, quote}]` required no migration — only the LLM prompt and application code changed. Old string-format briefs and new cited-object briefs coexist in the same column.

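Code that reads these columns has to accept both shapes. A minimal sketch of such a normalizer, assuming the hypothetical helper name `normalize_item`:

```python
from typing import Optional, Union, Dict


def normalize_item(item: Union[str, dict]) -> Dict[str, Optional[str]]:
    """Return the cited-object shape for both old and new key_points/risks items."""
    if isinstance(item, str):
        # Legacy brief: a plain string with no citation data.
        return {"text": item, "citation": None, "quote": None}
    return {
        "text": item["text"],
        "citation": item.get("citation"),
        "quote": item.get("quote"),
    }


print(normalize_item("Raises the reporting threshold"))
print(normalize_item({"text": "claim", "citation": "Section 2(a)", "quote": "..."}))
```

A consumer can then decide whether to render a citation chip by checking whether `citation` is `None`.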
### Redis-backed Beat Schedule (RedBeat)
The Celery Beat schedule is stored in Redis rather than in memory. This means the beat scheduler can restart without losing schedule state or double-firing tasks.

### Docker DNS Re-resolution
Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream container IPs are refreshed every 10 seconds. Without this, nginx caches the IP at startup and returns 502 errors after any container is recreated.

---

## Feature History

### v0.1.0 — Foundation
- Docker Compose stack: PostgreSQL, Redis, FastAPI, Celery, Next.js, Nginx
- Congress.gov API integration: bill polling, member sync
- GovInfo document fetching with intelligent truncation
- Multi-provider LLM service (OpenAI, Anthropic, Gemini, Ollama)
- AI brief generation: summary, key points, risks, deadlines, topic tags
- Amendment-aware processing: diffs new bill versions against prior
- NewsAPI + Google News RSS article correlation
- Google Trends (pytrends) scoring
- Composite trend score (0–100) with weighted formula
- Full-text bill search (PostgreSQL tsvector)
- Member of Congress browsing
- Global follows (bill / member / topic)
- Personalized dashboard feed
- Admin settings page (LLM provider selection, data source status)
- Manual Celery task triggers from UI
- Bill type filtering: only legislation that can become law
- 60-day seed window on fresh install

**Multi-User Auth (added to v0.1.0):**
- Email + password registration/login (JWT, bcrypt)
- Per-user follow scoping
- Admin role (first user = admin)
- Admin user management: list, delete, promote/demote
- AuthGuard with login/register pages
- Analysis status dashboard (auto-refresh every 30s)

### v0.2.2 — Sponsor Linking & Search Fixes
- **Root cause fixed:** The Congress.gov list API does not return sponsor data; only the detail endpoint does. The poller now calls the detail endpoint for each new bill to get the sponsor and populate `bill.sponsor_id`
- **Backfill task:** `backfill_sponsor_ids` Celery task + `/api/admin/backfill-sponsors` endpoint + "Backfill Sponsors" button in Admin UI. Fixes existing bills with `NULL` sponsor_id (~10 req/sec, ~3 min for 1,600 bills)
- **Member name search:** members are stored as "Last, First" in the `name` column; search now also matches "First Last" order using PostgreSQL `split_part()`. Applied to both the Members page and global search
- **Search spaces:** removed the `.trim()` call on search `onChange` handlers in the Members and Bills pages, which was stripping spaces as the user typed
- **Member bills 500 error:** the `get_member_bills` endpoint now eagerly loads `Bill.sponsor` via `selectinload`, which prevents the SQLAlchemy `MissingGreenlet` error that was raised when Pydantic serialization triggered a lazy load

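The effect of the member name fix can be mirrored in plain Python. This sketch does not reproduce the actual SQL; `first_last` and `name_matches` are hypothetical helpers that imitate what splitting a "Last, First" value on the comma achieves.

```python
def first_last(name: str) -> str:
    """Turn 'Last, First' into 'First Last'; leave other shapes unchanged."""
    last, sep, first = name.partition(",")
    if not sep:
        return name  # no comma, nothing to reorder
    return f"{first.strip()} {last.strip()}"


def name_matches(name: str, query: str) -> bool:
    """Case-insensitive substring match against both name orders."""
    q = query.lower()
    return q in name.lower() or q in first_last(name).lower()


print(first_last("Doe, Jane"))             # Jane Doe
print(name_matches("Doe, Jane", "jane d")) # True: only matches the reordered form
```

The second call also shows why the `.trim()` removal matters: a query containing a space can only match once the space survives the input handler.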
### v0.2.0 — Citations
- **Per-claim citations on AI briefs:** every key point and risk includes:
  - `citation` — section reference (e.g., "Section 301(a)(2)")
  - `quote` — verbatim excerpt ≤80 words from that section
- `§` citation chip UI on each bullet — click to expand quote + GovInfo source link
- `govinfo_url` stored on `BillBrief` for direct frontend access
- Old briefs (plain strings) render without chips — backward compatible
- Migration 0006: `govinfo_url` column on `bill_briefs`
- Party badges redesigned: solid `red-600` / `blue-600` / `slate-500` with white text, readable in both light and dark mode
- Tailwind content scan extended to include `lib/` directory
- Nginx DNS resolver fix: prevents stale-IP 502s after container restarts

---

## Deployment

### First Deploy

```bash
cp .env.example .env
# Edit .env — add API keys, generate JWT_SECRET_KEY
docker compose up --build -d
```

Migrations run automatically. Navigate to the app, register the first account (it becomes admin).

### Updating

```bash
git pull origin main
docker compose up --build -d
docker compose exec nginx nginx -s reload # if nginx wasn't recreated
```

### Useful Commands

```bash
# Check all service status
docker compose ps

# View logs
docker compose logs api --tail=50
docker compose logs worker --tail=50

# Force a bill poll now
# → Admin page → Manual Controls → Trigger Poll

# Check DB column layout
docker compose exec postgres psql -U congress -d pocketveto -c "\d bill_briefs"

# Tail live worker output
docker compose logs -f worker

# Restart a specific service
docker compose restart worker
```

### Bill Regeneration (Optional)

Existing briefs generated before v0.2.0 use plain strings (no citations). To regenerate with citations:

1. Delete existing `bill_briefs` rows (keeps `bill_documents` intact)
2. Re-queue all documents via a one-off script similar to `queue_docs.py`
3. Worker will regenerate using the new cited prompt at 10/minute
4. ~1,000 briefs ≈ 2 hours

This is **optional** — old string briefs render correctly in the UI with no citation chips.