docs: add comprehensive architecture documentation
Covers full stack, database schema, API endpoints, Celery pipeline, LLM service design, frontend structure, auth, deployment, and feature history through v0.2.0.

Authored-By: Jack Levy

# PocketVeto — Architecture & Feature Documentation

> **App brand:** PocketVeto
>
> **Repo:** civicstack
>
> **Purpose:** Citizen-grade US Congress monitoring with AI-powered bill analysis, per-claim citations, and personalized tracking.

---

## Table of Contents

1. [Overview](#overview)
2. [Tech Stack](#tech-stack)
3. [Infrastructure & Docker](#infrastructure--docker)
4. [Configuration & Environment](#configuration--environment)
5. [Database Schema](#database-schema)
6. [Alembic Migrations](#alembic-migrations)
7. [Backend API](#backend-api)
8. [Celery Workers & Pipeline](#celery-workers--pipeline)
9. [LLM Service](#llm-service)
10. [Frontend](#frontend)
11. [Authentication](#authentication)
12. [Key Architectural Patterns](#key-architectural-patterns)
13. [Feature History](#feature-history)
14. [Deployment](#deployment)

---

## Overview

PocketVeto is a self-hosted, full-stack application that automatically tracks US Congress legislation, fetches bill text, generates AI summaries with per-claim source citations, correlates bills with news and Google Trends, and presents everything through a personalized dashboard. Users follow bills, members of Congress, and policy topics; the system surfaces relevant activity in their feed.

```
Congress.gov API → Poller → DB → Document Fetcher → GovInfo
                                        ↓
                                  LLM Processor
                                        ↓
                                    BillBrief
                                (cited AI brief)
                                        ↓
                           News Fetcher + Trend Scorer
                                        ↓
                                Next.js Frontend
```

---

## Tech Stack

| Layer | Technology |
|---|---|
| Reverse Proxy | Nginx (alpine) |
| Backend API | FastAPI + SQLAlchemy (async) |
| Task Queue | Celery 5 + Redis |
| Task Scheduler | Celery Beat + RedBeat (Redis-backed) |
| Database | PostgreSQL 16 |
| Cache / Broker | Redis 7 |
| Frontend | Next.js 15, React, Tailwind CSS, TypeScript |
| Auth | JWT (python-jose) + bcrypt (passlib) |
| LLM | Multi-provider factory: OpenAI, Anthropic, Gemini, Ollama |
| Bill Metadata | Congress.gov API (api.data.gov key) |
| Bill Text | GovInfo API (same api.data.gov key) |
| News | NewsAPI.org (100 req/day free tier), Google News RSS |
| Trends | Google Trends via pytrends |

---

## Infrastructure & Docker

### Services (`docker-compose.yml`)

```
postgres:16-alpine
  DB: pocketveto
  User: congress
  Port: 5432 (internal)

redis:7-alpine
  Port: 6379 (internal)
  Role: Celery broker, result backend, RedBeat schedule store

api (civicstack-api image)
  Port: 8000 (internal)
  Command: alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8000
  Depends: postgres (healthy), redis (healthy)

worker (civicstack-worker image)
  Command: celery -A app.workers.celery_app worker -Q polling,documents,llm,news -c 4
  Depends: postgres (healthy), redis (healthy)

beat (civicstack-beat image)
  Command: celery -A app.workers.celery_app beat -S redbeat.RedBeatScheduler
  Depends: redis (healthy)

frontend (civicstack-frontend image)
  Port: 3000 (internal)
  Build: Next.js standalone output

nginx:alpine
  Port: 80 → public
  Routes: /api/* → api:8000 | /* → frontend:3000
```

### Nginx Config (`nginx/nginx.conf`)

- `resolver 127.0.0.11 valid=10s` — re-resolves Docker DNS after container restarts (prevents stale-IP 502s on redeploy)
- `/api/` → FastAPI, 120s read timeout
- `/_next/static/` → frontend with 1-day cache header
- `/` → frontend with WebSocket upgrade support

---

## Configuration & Environment

Copy `.env.example` → `.env` and fill in keys before first run.

```env
# Network
LOCAL_URL=http://localhost
PUBLIC_URL= # optional, e.g. https://yourapp.com

# Auth
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"

# PostgreSQL
POSTGRES_USER=congress
POSTGRES_PASSWORD=congress
POSTGRES_DB=pocketveto

# Redis
REDIS_URL=redis://redis:6379/0

# Congress.gov + GovInfo (shared key from api.data.gov)
DATA_GOV_API_KEY=
CONGRESS_POLL_INTERVAL_MINUTES=30

# LLM — pick one provider
LLM_PROVIDER=openai # openai | anthropic | gemini | ollama
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-opus-4-6
GEMINI_API_KEY=
GEMINI_MODEL=gemini-1.5-pro
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3.1

# News & Trends
NEWSAPI_KEY=
PYTRENDS_ENABLED=true
```

**Runtime overrides:** The LLM provider, LLM model, and poll interval can be changed live from the Admin page. These overrides are stored in the `app_settings` table and take precedence over the corresponding env vars.

---

## Database Schema

### `bills`
Primary key: `bill_id` — natural key in format `{congress}-{type}-{number}` (e.g. `119-hr-1234`).

| Column | Type | Notes |
|---|---|---|
| bill_id | varchar (PK) | |
| congress_number | int | |
| bill_type | varchar | `hr`, `s`, `hjres`, `sjres` (tracked); `hres`, `sres`, `hconres`, `sconres` (not tracked) |
| bill_number | int | |
| title | text | |
| short_title | text | |
| sponsor_id | varchar (FK → members) | bioguide_id |
| introduced_date | date | |
| latest_action_date | date | |
| latest_action_text | text | |
| status | varchar | |
| chamber | varchar | House / Senate |
| congress_url | varchar | congress.gov link |
| govtrack_url | varchar | |
| last_checked_at | timestamptz | |
| actions_fetched_at | timestamptz | |
| created_at / updated_at | timestamptz | |

Indexes: `congress_number`, `latest_action_date`, `introduced_date`, `chamber`, `sponsor_id`
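
The `{congress}-{type}-{number}` natural key can be built and split with two small helpers. This is a sketch only: `make_bill_id`, `parse_bill_id`, and `BillKey` are hypothetical names rather than functions from the codebase, and the code assumes the type segment is stored lowercase, as in the `119-hr-1234` example. Bill type codes never contain a hyphen, so splitting on `-` is unambiguous.

```python
from typing import NamedTuple


class BillKey(NamedTuple):
    """Components of a bill_id such as '119-hr-1234' (hypothetical helper type)."""
    congress: int
    bill_type: str
    number: int


def make_bill_id(congress: int, bill_type: str, number: int) -> str:
    # Lowercase the type so 'HR' and 'hr' produce the same natural key (assumption).
    return f"{congress}-{bill_type.lower()}-{number}"


def parse_bill_id(bill_id: str) -> BillKey:
    # Exactly three parts are expected; unpacking raises ValueError otherwise.
    congress, bill_type, number = bill_id.split("-")
    return BillKey(int(congress), bill_type, int(number))


print(make_bill_id(119, "HR", 1234))   # 119-hr-1234
print(parse_bill_id("119-sjres-7"))    # BillKey(congress=119, bill_type='sjres', number=7)
```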

---

### `bill_actions`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| action_date | date | |
| action_text | text | |
| action_type | varchar | |
| chamber | varchar | |
| created_at | timestamptz | |

---

### `bill_documents`
Stores fetched bill text versions from GovInfo.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| doc_type | varchar | `bill_text`, `committee_report`, `amendment` |
| doc_version | varchar | Introduced, Enrolled, etc. |
| govinfo_url | varchar | Source URL on GovInfo |
| raw_text | text | Full extracted text |
| fetched_at | timestamptz | |
| created_at | timestamptz | |

---

### `bill_briefs`
AI-generated analysis. `key_points` and `risks` are JSONB arrays of cited objects (briefs generated before v0.2.0 store plain strings instead).

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| document_id | int (FK → bill_documents, SET NULL) | |
| brief_type | varchar | `full` (first version) or `amendment` (diff from prior version) |
| summary | text | 2-4 paragraph plain-language summary |
| key_points | jsonb | `[{text, citation, quote}]` |
| risks | jsonb | `[{text, citation, quote}]` |
| deadlines | jsonb | `[{date, description}]` |
| topic_tags | jsonb | `["healthcare", "taxation", ...]` |
| llm_provider | varchar | Which provider generated this brief |
| llm_model | varchar | Specific model name |
| govinfo_url | varchar (nullable) | Source document URL (from bill_documents) |
| created_at | timestamptz | |

Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries)

**Citation structure** — each `key_points`/`risks` item:
```json
{
  "text": "The bill allocates $50B for defense",
  "citation": "Section 301(a)(2)",
  "quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense..."
}
```

---

### `members`
Primary key: `bioguide_id` (Congress.gov canonical identifier).

| Column | Type |
|---|---|
| bioguide_id | varchar (PK) |
| name | varchar |
| first_name / last_name | varchar |
| party | varchar |
| state | varchar |
| chamber | varchar |
| district | varchar (nullable, House only) |
| photo_url | varchar (nullable) |
| created_at / updated_at | timestamptz |

---

### `users`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| email | varchar (unique) | |
| hashed_password | varchar | bcrypt |
| is_admin | bool | First registered user = true |
| notification_prefs | jsonb | Future: ntfy, Telegram, RSS config |
| created_at | timestamptz | |

---

### `follows`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| follow_type | varchar | `bill`, `member`, `topic` |
| follow_value | varchar | bill_id, bioguide_id, or topic name |
| created_at | timestamptz | |

Unique constraint: `(user_id, follow_type, follow_value)`

---

### `news_articles`

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| source | varchar | News outlet |
| headline | varchar | |
| url | varchar (unique) | Deduplication key |
| published_at | timestamptz | |
| relevance_score | float | Default 1.0 |
| created_at | timestamptz | |

---

### `trend_scores`
One record per bill per day.

| Column | Type | Notes |
|---|---|---|
| id | int (PK) | |
| bill_id | varchar (FK → bills, CASCADE) | |
| score_date | date | |
| newsapi_count | int | Articles from NewsAPI (30-day window) |
| gnews_count | int | Articles from Google News RSS |
| gtrends_score | float | Google Trends interest 0–100 |
| composite_score | float | Weighted combination 0–100 |
| created_at | timestamptz | |

**Composite score formula:**
```
newsapi_pts = min(newsapi_count / 20, 1.0) × 40     # saturates at 20 articles
gnews_pts   = min(gnews_count / 50, 1.0) × 30       # saturates at 50 articles
gtrends_pts = (gtrends_score / 100) × 30
composite   = newsapi_pts + gnews_pts + gtrends_pts  # range 0–100
```
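
Transcribed directly into Python, the formula gives a quick way to sanity-check stored scores. The function name `composite_trend_score` is hypothetical, and the trend scorer's real code may differ in details such as rounding; the sketch assumes `gtrends_score` is already on Google Trends' 0 to 100 scale, so only the two article counts need capping.

```python
def composite_trend_score(newsapi_count: int, gnews_count: int, gtrends_score: float) -> float:
    """Weighted 0-100 trend score: 40 pts NewsAPI, 30 pts Google News, 30 pts Trends."""
    newsapi_pts = min(newsapi_count / 20, 1.0) * 40   # full 40 pts at 20+ articles
    gnews_pts = min(gnews_count / 50, 1.0) * 30       # full 30 pts at 50+ articles
    gtrends_pts = (gtrends_score / 100) * 30          # Trends interest is already 0-100
    return newsapi_pts + gnews_pts + gtrends_pts


print(composite_trend_score(10, 25, 50.0))    # 50.0 (half of every signal)
print(composite_trend_score(40, 200, 100.0))  # 100.0 (article counts saturate)
```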

---

### `committees` / `committee_bills`

| Table | Columns |
|---|---|
| `committees` | `committee_id` (PK), `name`, `chamber`, `type` |
| `committee_bills` | `id`, `committee_id` (FK → committees), `bill_id` (FK → bills), `referred_date` |

---

### `app_settings`
Key-value store for runtime-configurable settings.

| Key | Purpose |
|---|---|
| `congress_last_polled_at` | ISO timestamp of last successful poll |
| `llm_provider` | Overrides `LLM_PROVIDER` env var |
| `llm_model` | Overrides provider default model |
| `congress_poll_interval_minutes` | Overrides `CONGRESS_POLL_INTERVAL_MINUTES` env var |

---

## Alembic Migrations

| File | Description |
|---|---|
| `0001_initial_schema.py` | All initial tables |
| `0002_widen_chamber_party_columns.py` | Wider varchar for Bill.chamber, Member.party |
| `0003_widen_member_state_district.py` | Wider varchar for Member.state, Member.district |
| `0004_add_brief_type.py` | BillBrief.brief_type column (`full`/`amendment`) |
| `0005_add_users_and_user_follows.py` | users table + user_id FK on follows; drops global follows |
| `0006_add_brief_govinfo_url.py` | BillBrief.govinfo_url for frontend source links |

Migrations run automatically on API startup: `alembic upgrade head`.

---

## Backend API

Base URL: `/api`

Auth header: `Authorization: Bearer <jwt>`

### `/api/auth`

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/register` | — | Create account. First user → admin. Returns token + user. |
| POST | `/login` | — | Returns token + user. |
| GET | `/me` | Required | Current user info. |

### `/api/bills`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. |
| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. |
| GET | `/{bill_id}/actions` | — | Action timeline, newest first. |
| GET | `/{bill_id}/news` | — | Related news articles, limit 20. |
| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). |

### `/api/members`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated members. Query: `chamber`, `party`, `state`, `q`, `page`, `per_page`. |
| GET | `/{bioguide_id}` | — | Member detail. |
| GET | `/{bioguide_id}/bills` | — | Member's sponsored bills, paginated. |

### `/api/follows`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current user's follows. |
| POST | `/` | Required | Add follow `{follow_type, follow_value}`. Idempotent. |
| DELETE | `/{id}` | Required | Remove follow (ownership checked). |

### `/api/dashboard`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Personalized feed from followed bills/members/topics + trending. Returns `{feed, trending, follows}`. |

### `/api/search`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Full-text search. Query: `q` (min 2 chars). Returns `{bills, members}`. |

### `/api/settings`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | Required | Current settings (DB overrides env). |
| PUT | `/` | Admin | Update `{key, value}`. Allowed keys: `llm_provider`, `llm_model`, `congress_poll_interval_minutes`. |
| POST | `/test-llm` | Admin | Test LLM connection. Returns `{status, provider, model, summary_preview}`. |

### `/api/admin`

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/users` | Admin | All users with follow counts. |
| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. |
| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). |
| GET | `/stats` | Admin | Pipeline progress: total bills, docs fetched, briefs generated, remaining. |
| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. |
| POST | `/trigger-member-sync` | Admin | Queue member sync. |
| POST | `/trigger-trend-scores` | Admin | Queue trend score calculation. |
| GET | `/task-status/{task_id}` | Admin | Celery task status and result. |

### `/api/health`

| Method | Path | Description |
|---|---|---|
| GET | `/` | Simple health check `{status: "ok", timestamp}`. |
| GET | `/detailed` | Tests PostgreSQL + Redis. Returns per-service status. |

---

## Celery Workers & Pipeline

**Celery app name:** `pocketveto`

**Broker / Backend:** Redis

### Queue Routing

| Queue | Workers | Tasks |
|---|---|---|
| `polling` | worker | `poll_congress_bills`, `sync_members` |
| `documents` | worker | `fetch_bill_documents` |
| `llm` | worker | `process_document_with_llm` |
| `news` | worker | `fetch_news_for_bill`, `fetch_news_for_active_bills`, `calculate_all_trend_scores` |

**Worker settings:**
- `task_acks_late = True` — task removed from queue only after completion, not on pickup
- `worker_prefetch_multiplier = 1` — prevents workers from hoarding LLM tasks
- Serialization: JSON
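
The worker settings and queue routing above can be written as a Celery configuration module loaded with `app.config_from_object()`. This is a sketch under stated assumptions: the module name `celeryconfig` and every dotted task path (built from the pipeline's file names, e.g. `app.workers.llm_processor`) are guesses, and `task_annotations` is just one way to express the 10/minute LLM limit; the real task may set `rate_limit` on its decorator instead. The setting names themselves are Celery's standard lowercase settings.

```python
# celeryconfig.py: hypothetical module, loaded via app.config_from_object("celeryconfig").

broker_url = "redis://redis:6379/0"      # same value as REDIS_URL in .env.example
result_backend = "redis://redis:6379/0"

task_acks_late = True            # acknowledge a message only after its task finishes
worker_prefetch_multiplier = 1   # each pool process reserves one message at a time

task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]

# Queue routing from the table above. Dotted task paths are assumptions.
task_routes = {
    "app.workers.congress_poller.poll_congress_bills": {"queue": "polling"},
    "app.workers.congress_poller.sync_members": {"queue": "polling"},  # module assumed
    "app.workers.document_fetcher.fetch_bill_documents": {"queue": "documents"},
    "app.workers.llm_processor.process_document_with_llm": {"queue": "llm"},
    "app.workers.news_fetcher.fetch_news_for_bill": {"queue": "news"},
    "app.workers.news_fetcher.fetch_news_for_active_bills": {"queue": "news"},  # module assumed
    "app.workers.trend_scorer.calculate_all_trend_scores": {"queue": "news"},
}

# One way to apply the 10/minute LLM limit; Celery enforces it per worker instance.
task_annotations = {
    "app.workers.llm_processor.process_document_with_llm": {"rate_limit": "10/m"},
}
```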

### Beat Schedule (RedBeat, stored in Redis)

| Schedule | Task | When |
|---|---|---|
| Configurable (default 30 min) | `poll_congress_bills` | Continuous |
| Every 6 hours | `fetch_news_for_active_bills` | Ongoing |
| Daily 2 AM UTC | `calculate_all_trend_scores` | Nightly |

---

### Pipeline Flow

```
1. congress_poller.poll_congress_bills()
   ↳ Fetches bills updated since last poll (fromDateTime param)
   ↳ Filters: only hr, s, hjres, sjres (legislation that can become law)
   ↳ First run: seeds from 60 days back
   ↳ New bills → fetch_bill_documents.delay(bill_id)
   ↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed

2. document_fetcher.fetch_bill_documents(bill_id)
   ↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF)
   ↳ Fetches raw text from GovInfo
   ↳ Idempotent: skips if doc_version already stored
   ↳ Stores BillDocument with govinfo_url + raw_text
   ↳ → process_document_with_llm.delay(document_id)

3. llm_processor.process_document_with_llm(document_id)
   ↳ Rate limited: 10/minute
   ↳ Idempotent: skips if brief exists for document
   ↳ Determines type:
       - No prior brief → "full" brief
       - Prior brief exists → "amendment" brief (diff vs previous)
   ↳ Calls configured LLM provider
   ↳ Stores BillBrief with cited key_points and risks
   ↳ → fetch_news_for_bill.delay(bill_id)

4. news_fetcher.fetch_news_for_bill(bill_id)
   ↳ Queries NewsAPI using bill title + topic_tags
   ↳ Deduplicates by URL
   ↳ Stores NewsArticle records

5. trend_scorer.calculate_all_trend_scores() [nightly]
   ↳ Bills active in last 90 days
   ↳ Skips bills already scored today
   ↳ Fetches: NewsAPI count + Google News RSS count + Google Trends score
   ↳ Calculates composite_score (0–100)
   ↳ Stores TrendScore record
```
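
Step 2's format preference (XML first, then HTML, then PDF) amounts to ranking the candidate text URLs. A minimal sketch, assuming each candidate is a plain URL whose file extension identifies the format; `pick_text_url` and `FORMAT_RANK` are hypothetical names, and the real fetcher may select on Congress.gov's format metadata rather than on extensions.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlparse

# Lower rank = preferred. XML keeps section structure; PDF is the last resort.
FORMAT_RANK = {".xml": 0, ".htm": 1, ".html": 1, ".pdf": 2}


def pick_text_url(urls: list[str]) -> str | None:
    """Return the most preferred bill-text URL by file extension, or None."""
    ranked = []
    for url in urls:
        # Look only at the URL path, ignoring any query string.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in FORMAT_RANK:
            ranked.append((FORMAT_RANK[suffix], url))
    # min() over (rank, url) tuples picks the lowest rank; ties fall back to URL order.
    return min(ranked)[1] if ranked else None
```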

---

## LLM Service

**File:** `backend/app/services/llm_service.py`

### Provider Factory

```python
def get_llm_provider() -> "LLMProvider": ...
```

Reads the provider name from the `llm_provider` row in `app_settings` first, falling back to the `LLM_PROVIDER` env var, then instantiates the matching provider class.

| Provider | Class | Key Settings |
|---|---|---|
| `openai` | `OpenAIProvider` | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| `gemini` | `GeminiProvider` | `GEMINI_API_KEY`, `GEMINI_MODEL` |
| `ollama` | `OllamaProvider` | `OLLAMA_BASE_URL`, `OLLAMA_MODEL` |
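
The DB-then-env lookup and the name-to-class mapping can be sketched as a small registry. This is illustrative, not the service's code: unlike the real zero-argument `get_llm_provider()`, this version takes the settings as arguments so it runs without a database, the provider classes are empty stand-ins for the SDK-backed ones, and defaulting to `openai` when nothing is set is an assumption taken from `.env.example`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class LLMProvider:
    """Stand-in base class; the real providers wrap their vendor SDKs."""

    def __init__(self, model: str) -> None:
        self.model = model


class OpenAIProvider(LLMProvider): ...
class AnthropicProvider(LLMProvider): ...
class GeminiProvider(LLMProvider): ...
class OllamaProvider(LLMProvider): ...


# Provider name -> (class, env var holding that provider's default model).
PROVIDERS: dict[str, tuple[type[LLMProvider], str]] = {
    "openai": (OpenAIProvider, "OPENAI_MODEL"),
    "anthropic": (AnthropicProvider, "ANTHROPIC_MODEL"),
    "gemini": (GeminiProvider, "GEMINI_MODEL"),
    "ollama": (OllamaProvider, "OLLAMA_MODEL"),
}


def get_llm_provider(db_settings: Mapping[str, str],
                     env: Mapping[str, str] = os.environ) -> LLMProvider:
    """app_settings rows (db_settings) win over env vars, as described above."""
    name = db_settings.get("llm_provider") or env.get("LLM_PROVIDER", "openai")
    try:
        cls, model_var = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name!r}") from None
    # llm_model in app_settings overrides the provider's default model env var.
    model = db_settings.get("llm_model") or env.get(model_var, "")
    return cls(model=model)
```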

All providers implement:
```python
class LLMProvider:
    def generate_brief(self, doc_text, bill_metadata) -> "ReverseBrief": ...
    def generate_amendment_brief(self, new_text, prev_text, bill_metadata) -> "ReverseBrief": ...
```

### ReverseBrief Dataclass

```python
from dataclasses import dataclass


@dataclass
class ReverseBrief:
    summary: str
    key_points: list[dict]  # [{text, citation, quote}]
    risks: list[dict]       # [{text, citation, quote}]
    deadlines: list[dict]   # [{date, description}]
    topic_tags: list[str]
    llm_provider: str
    llm_model: str
```

### Prompt Design

**Full brief prompt** instructs the LLM to produce:
```json
{
  "summary": "2-4 paragraph plain-language explanation",
  "key_points": [
    {"text": "claim", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "risks": [
    {"text": "concern", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
  ],
  "deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}],
  "topic_tags": ["healthcare", "taxation"]
}
```

**Amendment brief prompt** focuses on what changed between document versions.
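
Because the prompt asks for a fixed JSON shape, a response can be checked before it is stored. This is a hypothetical validation step, not something the document says the processor does: `validate_brief_json` collects missing top-level fields, cited items lacking `text`/`citation`/`quote`, and quotes over the 80-word limit, using whitespace-separated words as an approximate word count.

```python
from __future__ import annotations

import json

MAX_QUOTE_WORDS = 80  # mirrors the "verbatim excerpt ≤80 words" prompt instruction
REQUIRED_FIELDS = ("summary", "key_points", "risks", "deadlines", "topic_tags")


def validate_brief_json(raw: str) -> tuple[dict, list[str]]:
    """Parse an LLM brief response and collect problems instead of failing fast."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    problems: list[str] = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in data]
    for field in ("key_points", "risks"):
        for i, item in enumerate(data.get(field, [])):
            if not isinstance(item, dict):
                problems.append(f"{field}[{i}] is not an object")
                continue
            for key in ("text", "citation", "quote"):
                if not item.get(key):
                    problems.append(f"{field}[{i}] missing {key}")
            if len(str(item.get("quote", "")).split()) > MAX_QUOTE_WORDS:
                problems.append(f"{field}[{i}] quote exceeds {MAX_QUOTE_WORDS} words")
    return data, problems
```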

**Smart truncation:** Bills exceeding the token budget are trimmed — 75% of budget from the start (preamble/purpose), 25% from the end (enforcement/effective dates), with an omission notice in the middle.

**Token budgets:**
- OpenAI / Anthropic / Gemini: 6,000 tokens
- Ollama: 3,000 tokens (local models have smaller context windows)
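
A minimal sketch of the 75/25 split, assuming whitespace-separated words as a stand-in for tokens; the service presumably measures its budget differently, and `smart_truncate` plus the notice wording are hypothetical. Note that the kept head and tail are re-joined with single spaces, so original line breaks are not preserved.

```python
def smart_truncate(text: str, budget: int,
                   notice: str = "[... middle of bill text omitted for length ...]") -> str:
    """Keep ~75% of the budget from the start and ~25% from the end."""
    words = text.split()  # words approximate tokens in this sketch
    if len(words) <= budget:
        return text  # fits within the budget: leave the text untouched
    head_n = budget * 3 // 4   # preamble / purpose sections
    tail_n = budget - head_n   # enforcement / effective-date sections
    head = " ".join(words[:head_n])
    tail = " ".join(words[-tail_n:]) if tail_n else ""  # words[-0:] would be everything
    return f"{head}\n\n{notice}\n\n{tail}"
```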

---

## Frontend

**Framework:** Next.js 15 (App Router), TypeScript, Tailwind CSS

**State:** Zustand (auth), TanStack Query (server state)

**HTTP:** Axios with JWT interceptor

### Pages

| Route | Description |
|---|---|
| `/` | Dashboard — personalized feed + trending bills |
| `/bills` | Browse all bills with search, chamber/topic filters, pagination |
| `/bills/[id]` | Bill detail — brief with § citations, action timeline, news, trend chart |
| `/members` | Browse members of Congress, filter by chamber/party/state |
| `/members/[id]` | Member profile + sponsored bills |
| `/following` | User's followed bills, members, and topics |
| `/topics` | Browse and follow policy topics |
| `/settings` | Admin panel (admin only) |
| `/login` | Email + password sign-in |
| `/register` | Account creation |

### Key Components

**`AIBriefCard.tsx`**
Renders the LLM brief. For cited items (new format), it shows a `§ Section X(y)` chip next to each bullet. Clicking the chip expands an inline panel with:
- Blockquoted verbatim excerpt from the bill
- "View source →" link to GovInfo (opens in new tab)

Only one chip is open at a time per card. Old plain-string briefs render without chips (graceful backward compatibility).

**`AuthGuard.tsx`**
Client component wrapping the entire app. Waits for Zustand hydration, then redirects unauthenticated users to `/login`. Public paths (`/login`, `/register`) bypass the guard.

**`Sidebar.tsx`**
Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom.

**`BillCard.tsx`**
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, and status.

**`TrendChart.tsx`**
Line chart of `composite_score` over time with tooltip breakdown of each data source.

### Utility Functions (`lib/utils.ts`)

```typescript
partyBadgeColor(party) → Tailwind classes
  "Republican" → "bg-red-600 text-white"
  "Democrat"   → "bg-blue-600 text-white"
  other        → "bg-slate-500 text-white"

partyColor(party)       → text color class (used inline)
trendColor(score)       → color class based on score thresholds
billLabel(type, number) → "H.R. 1234", "S. 567", etc.
formatDate(date)        → "Feb 28, 2026"
```

### Auth Store (`stores/authStore.ts`)

```typescript
interface AuthState {
  token: string | null
  user: { id: number; email: string; is_admin: boolean } | null
  setAuth(token: string, user: { id: number; email: string; is_admin: boolean }): void
  logout(): void
}
// Persisted to localStorage as "pocketveto-auth"
```

---

## Authentication

- **Algorithm:** HS256 JWT, 7-day expiry
- **Storage:** Zustand store persisted to `localStorage` key `pocketveto-auth`
- **Injection:** Axios request interceptor reads from localStorage and adds `Authorization: Bearer <token>` to every request
- **First user:** The first account registered automatically receives `is_admin = true`
- **Admin role:** Required for `PUT /api/settings`, `POST /api/settings/test-llm`, all `/api/admin/*` endpoints, and viewing the Settings page in the UI
- **No email verification:** Accounts are active immediately on registration
- **Public endpoints (no auth required):** `/api/bills`, `/api/members`, `/api/search`, `/api/health`, plus `/api/auth/register` and `/api/auth/login`
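
The app delegates token handling to python-jose. To make concrete what an HS256 token with a 7-day expiry contains, here is a standard-library-only sketch of the same compact JWT scheme (RFC 7519 with an HMAC-SHA256 signature). It is illustrative only: `encode_hs256` and `decode_hs256` are hypothetical helpers, and production code should keep using a maintained JWT library.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

TOKEN_LIFETIME = 7 * 24 * 60 * 60  # 7 days, in seconds


def _b64url(data: bytes) -> str:
    # JWT segments are base64url-encoded with the '=' padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that _b64url removed before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret: str, now: int | None = None) -> str:
    """Build header.payload.signature with an exp claim 7 days out."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": now + TOKEN_LIFETIME}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_hs256(token: str, secret: str, now: int | None = None) -> dict:
    """Verify alg, signature, and expiry; return the claims or raise ValueError."""
    now = int(time.time()) if now is None else now
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest runs in constant time, so timing does not leak the signature.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```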

---

## Key Architectural Patterns

### Idempotent Workers
Every Celery task checks for existing records before processing. Combined with `task_acks_late=True`, this means:
- Tasks can be retried without creating duplicates
- Worker crashes don't lose work: an unacknowledged task stays in the queue and is redelivered (if only a pool process dies, Celery still acknowledges the task unless `task_reject_on_worker_lost=True` is also set)

### Incremental Polling
The Congress.gov poller uses `fromDateTime` to fetch only recently updated bills, tracking the last poll timestamp in `app_settings`. On first run it seeds 60 days back to avoid processing thousands of old bills.
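
The incremental window comes down to one decision: start from the stored `congress_last_polled_at` value if there is one, otherwise 60 days back. A minimal sketch of that logic; `poll_from_datetime` is a hypothetical helper, and the code assumes naive stored timestamps are UTC and that Congress.gov accepts `fromDateTime` values shaped like `2026-02-28T00:00:00Z`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

SEED_WINDOW = timedelta(days=60)  # look-back used when nothing has been polled yet


def poll_from_datetime(last_polled_at: str | None, now: datetime) -> str:
    """Compute the fromDateTime query value for the next Congress.gov poll.

    last_polled_at is the ISO string stored under congress_last_polled_at
    in app_settings, or None on a fresh install.
    """
    if last_polled_at:
        start = datetime.fromisoformat(last_polled_at)
        if start.tzinfo is None:
            start = start.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    else:
        start = now - SEED_WINDOW
    return start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```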

### Bill Type Filtering
Only tracks legislation that can become law:
- `hr` (House Bill)
- `s` (Senate Bill)
- `hjres` (House Joint Resolution)
- `sjres` (Senate Joint Resolution)

Excluded (simple and concurrent resolutions, which are never sent to the President and cannot become law): `hres`, `sres`, `hconres`, `sconres`
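
The filter is a set-membership check. A minimal sketch, assuming the API may return uppercase codes such as `HR` (hence the lowercasing) and that incoming bills are dicts with a `type` key; both the dict shape and the names `TRACKED_BILL_TYPES`/`is_tracked` are hypothetical.

```python
# Bill types that can become law; everything else is skipped by the poller.
TRACKED_BILL_TYPES = frozenset({"hr", "s", "hjres", "sjres"})


def is_tracked(bill_type: str) -> bool:
    # Normalize case first, in case the API reports codes like "HR" (assumption).
    return bill_type.lower() in TRACKED_BILL_TYPES


# Hypothetical poll results: the simple resolution (HRES) is dropped.
incoming = [
    {"type": "HR", "number": "1234"},
    {"type": "HRES", "number": "56"},
    {"type": "SJRES", "number": "7"},
]
kept = [bill for bill in incoming if is_tracked(bill["type"])]
```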

### Queue Specialization
Separate queues keep time-sensitive polling tasks from waiting behind a backlog of LLM tasks in a shared queue. Worker prefetch of 1 prevents any single worker from hoarding slow LLM jobs. Because the single `worker` service consumes all four queues with `-c 4`, four long-running LLM jobs can still occupy every process; a dedicated worker for the `polling` queue would give strict isolation.

### LLM Provider Abstraction
All LLM providers implement the same interface. Switching providers is a single admin setting change — no code changes, no restart required (the factory reads from DB on each task invocation).

### JSONB for Flexible Brief Storage
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means the schema change from `list[str]` to `list[{text, citation, quote}]` required no migration — only the LLM prompt and application code changed. Old string-format briefs and new cited-object briefs coexist in the same column.
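
Because both shapes live in the same JSONB column, any reader of `key_points`/`risks` has to accept either one. A minimal normalization sketch; `normalize_point` is a hypothetical helper (the frontend's `AIBriefCard.tsx` performs its own check), shown only to make the two formats concrete.

```python
from __future__ import annotations

from typing import Any


def normalize_point(item: Any) -> dict[str, str | None]:
    """Accept a pre-v0.2.0 plain string or a v0.2.0 cited object."""
    if isinstance(item, str):
        # Old format: no citation data, so the UI shows no § chip.
        return {"text": item, "citation": None, "quote": None}
    # New format: empty strings are treated the same as missing citation data.
    return {
        "text": item.get("text", ""),
        "citation": item.get("citation") or None,
        "quote": item.get("quote") or None,
    }
```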

### Redis-backed Beat Schedule (RedBeat)
The Celery Beat schedule is stored in Redis rather than in Celery's default local `celerybeat-schedule` file. This means the beat scheduler can restart without losing schedule state or double-firing tasks.

### Docker DNS Re-resolution
Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream container IPs are refreshed every 10 seconds. Without this, nginx caches the IP at startup and returns 502 errors after any container is recreated. The resolver is only consulted for upstream addresses that `proxy_pass` receives through a variable; a hostname written literally in `proxy_pass` is still resolved once, when nginx loads its configuration.

---

## Feature History

### v0.1.0 — Foundation
- Docker Compose stack: PostgreSQL, Redis, FastAPI, Celery, Next.js, Nginx
- Congress.gov API integration: bill polling, member sync
- GovInfo document fetching with intelligent truncation
- Multi-provider LLM service (OpenAI, Anthropic, Gemini, Ollama)
- AI brief generation: summary, key points, risks, deadlines, topic tags
- Amendment-aware processing: diffs new bill versions against prior
- NewsAPI + Google News RSS article correlation
- Google Trends (pytrends) scoring
- Composite trend score (0–100) with weighted formula
- Full-text bill search (PostgreSQL tsvector)
- Member of Congress browsing
- Global follows (bill / member / topic)
- Personalized dashboard feed
- Admin settings page (LLM provider selection, data source status)
- Manual Celery task triggers from UI
- Bill type filtering: only legislation that can become law
- 60-day seed window on fresh install

**Multi-User Auth (added to v0.1.0):**
- Email + password registration/login (JWT, bcrypt)
- Per-user follow scoping
- Admin role (first user = admin)
- Admin user management: list, delete, promote/demote
- AuthGuard with login/register pages
- Analysis status dashboard (auto-refresh every 30s)

### v0.2.0 — Citations
- **Per-claim citations on AI briefs:** every key point and risk includes:
  - `citation` — section reference (e.g., "Section 301(a)(2)")
  - `quote` — verbatim excerpt ≤80 words from that section
- `§` citation chip UI on each bullet — click to expand quote + GovInfo source link
- `govinfo_url` stored on `BillBrief` for direct frontend access
- Old briefs (plain strings) render without chips — backward compatible
- Migration 0006: `govinfo_url` column on `bill_briefs`
- Party badges redesigned: solid `red-600` / `blue-600` / `slate-500` with white text, readable in both light and dark mode
- Tailwind content scan extended to include `lib/` directory
- Nginx DNS resolver fix: prevents stale-IP 502s after container restarts

---

## Deployment

### First Deploy

```bash
cp .env.example .env
# Edit .env — add API keys, generate JWT_SECRET_KEY
docker compose up --build -d
```

Migrations run automatically on API startup. Open the app and register the first account; it automatically becomes admin.

### Updating

```bash
git pull origin main
docker compose up --build -d
docker compose exec nginx nginx -s reload  # if nginx wasn't recreated
```

### Useful Commands

```bash
# Check all service status
docker compose ps

# View logs
docker compose logs api --tail=50
docker compose logs worker --tail=50

# Force a bill poll now
# → Admin page → Manual Controls → Trigger Poll

# Check DB column layout
docker compose exec postgres psql -U congress -d pocketveto -c "\d bill_briefs"

# Tail live worker output
docker compose logs -f worker

# Restart a specific service
docker compose restart worker
```

### Bill Regeneration (Optional)

Existing briefs generated before v0.2.0 use plain strings (no citations). To regenerate with citations:

1. Delete existing `bill_briefs` rows (keeps `bill_documents` intact)
2. Re-queue all documents via a one-off script similar to `queue_docs.py`
3. The worker regenerates briefs with the new cited prompt, rate-limited to 10/minute
4. At that rate, ~1,000 briefs take about 100 minutes (roughly 2 hours)

This is **optional** — old string briefs render correctly in the UI with no citation chips.