feat(phase2): fact/inference labeling, change-driven alerts, admin cleanup

- Add label: cited_fact | inference to LLM brief schema (all 4 providers)
- Inferred badge in AIBriefCard for inference-labeled points
- backfill_brief_labels Celery task: classifies existing cited points in-place
- POST /api/admin/backfill-labels + unlabeled_briefs stat counter
- Expand milestone keywords: markup, conference
- Add is_referral_action() for committee referrals (referred to)
- Two-tier milestone notifications: progress tier (all follow modes) and
  referral tier (pocket_veto/boost only, neutral suppressed)
- Topic followers now receive bill_updated milestone notifications via
  latest brief topic_tags lookup in _update_bill_if_changed()
- Admin Manual Controls: collapsible Maintenance section for backfill tasks
- Update ARCHITECTURE.md and roadmap for Phase 2 completion

Co-Authored-By: Jack Levy
Jack Levy
2026-03-01 17:34:45 -05:00
parent dc5e756749
commit 1e37c99599
12 changed files with 500 additions and 121 deletions

@@ -238,9 +238,11 @@ Indexes: `bill_id`, `topic_tags` (GIN for JSONB containment queries)
{
"text": "The bill allocates $50B for defense",
"citation": "Section 301(a)(2)",
"quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense..."
"quote": "There is hereby appropriated for fiscal year 2026, $50,000,000,000 for the Department of Defense...",
"label": "cited_fact"
}
```
`label` is `"cited_fact"` when the claim is explicitly stated in the quoted text, or `"inference"` when it is an analytical interpretation. Old briefs without this field render without a badge (backward compatible).
---
@@ -324,6 +326,7 @@ News articles correlated to a specific member of Congress.
| user_id | int (FK → users, CASCADE) | |
| follow_type | varchar | `bill`, `member`, `topic` |
| follow_value | varchar | bill_id, bioguide_id, or topic name |
| follow_mode | varchar | `neutral` \| `pocket_veto` \| `pocket_boost` (default `neutral`) |
| created_at | timestamptz | |
Unique constraint: `(user_id, follow_type, follow_value)`
@@ -397,12 +400,13 @@ Stores notification events for dispatching to user channels (ntfy, RSS).
| id | int (PK) | |
| user_id | int (FK → users, CASCADE) | |
| bill_id | varchar (FK → bills, SET NULL) | nullable |
| event_type | varchar | e.g. `new_brief`, `bill_updated`, `new_action` |
| headline | text | Short description for ntfy title |
| body | text | Longer description for ntfy message / RSS content |
| dispatched_at | timestamptz (nullable) | NULL = not yet sent |
| event_type | varchar | `new_document`, `new_amendment`, `bill_updated` |
| payload | jsonb | `{bill_title, bill_label, brief_summary, bill_url, milestone_tier}` |
| dispatched_at | timestamptz (nullable) | NULL = pending dispatch |
| created_at | timestamptz | |
`milestone_tier` in payload: `"progress"` (passed, signed, markup, conference, etc.) or `"referral"` (committee referral). Neutral follows silently skip referral-tier events; pocket_veto and pocket_boost receive them as early warnings.
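The tier rule above can be sketched as a small predicate. This is a minimal illustration, not the dispatcher's actual code: the function name `should_notify` and the decision to drop unknown tiers are assumptions.

```python
# Hypothetical helper: name, signature, and handling of unknown tiers are
# assumptions made for illustration.
EARLY_WARNING_MODES = {"pocket_veto", "pocket_boost"}


def should_notify(follow_mode: str, milestone_tier: str) -> bool:
    """Decide whether a follow with this mode receives an event of this tier."""
    if milestone_tier == "progress":
        # Progress-tier milestones reach every follow mode.
        return True
    if milestone_tier == "referral":
        # Referral-tier events are early warnings; neutral follows skip them.
        return follow_mode in EARLY_WARNING_MODES
    # Assumption: a tier this sketch does not know about is dropped.
    return False
```

For example, `should_notify("neutral", "referral")` is false, while the same event is delivered to a `pocket_veto` follow.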
---
## Alembic Migrations
@@ -442,11 +446,12 @@ Auth header: `Authorization: Bearer <jwt>`
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. |
| GET | `/` | — | Paginated bill list. Query: `chamber`, `topic`, `sponsor_id`, `q`, `page`, `per_page`, `sort`. Includes `has_document` flag per bill via a single batch query. |
| GET | `/{bill_id}` | — | Full bill detail with sponsor, actions, briefs, news, trend scores. |
| GET | `/{bill_id}/actions` | — | Action timeline, newest first. |
| GET | `/{bill_id}/news` | — | Related news articles, limit 20. |
| GET | `/{bill_id}/trend` | — | Trend score history. Query: `days` (7–365, default 30). |
| POST | `/{bill_id}/draft-letter` | — | Generate a constituent letter draft via the configured LLM. Body: `{stance, recipient, tone, selected_points, include_citations, zip_code?}`. Returns `{draft: string}`. ZIP code is used in the prompt only — never stored or logged. |
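The `has_document` flag on the list endpoint is described as coming from a single batch query rather than one query per bill. A minimal sketch of that pattern follows, using the standard-library `sqlite3` module as a stand-in for the real database. The table name `bill_documents` and both helper names are hypothetical.

```python
import sqlite3


def documents_present(conn: sqlite3.Connection, bill_ids: list[str]) -> set[str]:
    """Return the subset of bill_ids that have at least one stored document."""
    if not bill_ids:
        return set()
    # Only the "?" placeholders are interpolated; the values are bound safely.
    placeholders = ",".join("?" for _ in bill_ids)
    rows = conn.execute(
        f"SELECT DISTINCT bill_id FROM bill_documents WHERE bill_id IN ({placeholders})",
        bill_ids,
    ).fetchall()
    return {row[0] for row in rows}


def attach_has_document(conn: sqlite3.Connection, bills: list[dict]) -> list[dict]:
    """Add a has_document flag to each bill on the page using one query."""
    present = documents_present(conn, [b["bill_id"] for b in bills])
    return [{**b, "has_document": b["bill_id"] in present} for b in bills]


# Demo data: hr-1 has two document versions, hr-3 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill_documents (id INTEGER PRIMARY KEY, bill_id TEXT)")
conn.executemany(
    "INSERT INTO bill_documents (bill_id) VALUES (?)",
    [("hr-1",), ("hr-1",), ("s-2",)],
)
page = attach_has_document(conn, [{"bill_id": "hr-1"}, {"bill_id": "hr-3"}])
```

The number of queries stays at one regardless of page size, which is the property the endpoint description calls out.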
### `/api/members`
@@ -503,7 +508,7 @@ Auth header: `Authorization: Bearer <jwt>`
| GET | `/users` | Admin | All users with follow counts. |
| DELETE | `/users/{id}` | Admin | Delete user (cannot delete self). Cascades follows. |
| PATCH | `/users/{id}/toggle-admin` | Admin | Promote/demote admin status (cannot change self). |
| GET | `/stats` | Admin | Pipeline counters: total bills, docs fetched, briefs generated, pending LLM, missing metadata/sponsors/actions, uncited briefs. |
| GET | `/stats` | Admin | Pipeline counters: total bills, docs fetched, briefs generated, pending LLM, missing metadata/sponsors/actions, uncited briefs, unlabeled briefs (cited objects without a fact/inference label). |
| GET | `/api-health` | Admin | Test each external API in parallel; returns status + latency for Congress.gov, GovInfo, NewsAPI, Google News. |
| POST | `/trigger-poll` | Admin | Queue immediate Congress.gov poll. |
| POST | `/trigger-member-sync` | Admin | Queue member sync. |
@@ -513,6 +518,7 @@ Auth header: `Authorization: Bearer <jwt>`
| POST | `/backfill-sponsors` | Admin | Queue one-off task to populate `sponsor_id` on bills where it is NULL. |
| POST | `/backfill-metadata` | Admin | Fill null `introduced_date`, `chamber`, `congress_url` by re-fetching bill detail. |
| POST | `/backfill-citations` | Admin | Delete pre-citation briefs and re-queue LLM using stored document text. |
| POST | `/backfill-labels` | Admin | Classify existing cited brief points as `cited_fact` or `inference` in-place — one compact LLM call per brief, no re-generation. |
| POST | `/resume-analysis` | Admin | Re-queue LLM for docs with no brief; re-queue doc fetch for bills with no doc. |
| POST | `/bills/{bill_id}/reprocess` | Admin | Queue document + action fetches for a specific bill (debugging). |
| GET | `/task-status/{task_id}` | Admin | Celery task status and result. |
@@ -570,6 +576,12 @@ Auth header: `Authorization: Bearer <jwt>`
has no sponsor data), upserts Member, sets bill.sponsor_id
↳ New bills → fetch_bill_documents.delay(bill_id)
↳ Updated bills → fetch_bill_documents.delay(bill_id) if changed
↳ Updated bills → emit bill_updated notification if action is a milestone:
- "progress" tier: passed/failed, signed/vetoed, enacted, markup, conference,
reported from committee, placed on calendar, cloture, roll call
→ all follow types (bill, sponsor, topic) receive notification
- "referral" tier: referred to committee
→ pocket_veto and pocket_boost only; neutral follows silently skip
2. document_fetcher.fetch_bill_documents(bill_id)
↳ Gets text versions from Congress.gov (XML preferred, falls back to HTML/PDF)
@@ -641,6 +653,7 @@ All providers implement:
```python
generate_brief(doc_text, bill_metadata) → ReverseBrief
generate_amendment_brief(new_text, prev_text, bill_metadata) → ReverseBrief
generate_text(prompt) → str   # free-form text, used by draft letter generator
```
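One way to express this shared interface is an abstract base class. The sketch below assumes synchronous instance methods and uses a trimmed stand-in for `ReverseBrief`. `EchoProvider` is a hypothetical test double, not one of the four real providers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ReverseBrief:
    """Trimmed stand-in for the real dataclass; only two fields are shown."""
    summary: str
    key_points: list[dict] = field(default_factory=list)


class LLMProvider(ABC):
    """Interface every provider implements (method shapes are assumed)."""

    @abstractmethod
    def generate_brief(self, doc_text: str, bill_metadata: dict) -> ReverseBrief: ...

    @abstractmethod
    def generate_amendment_brief(
        self, new_text: str, prev_text: str, bill_metadata: dict
    ) -> ReverseBrief: ...

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...


class EchoProvider(LLMProvider):
    """Hypothetical provider that calls no model; useful only for tests."""

    def generate_brief(self, doc_text, bill_metadata):
        return ReverseBrief(summary=doc_text[:50])

    def generate_amendment_brief(self, new_text, prev_text, bill_metadata):
        return ReverseBrief(summary=new_text[:50])

    def generate_text(self, prompt):
        return prompt
```

Because all three methods are abstract, a provider that forgets one of them fails at instantiation time rather than midway through a task.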
### ReverseBrief Dataclass
@@ -649,8 +662,8 @@ generate_amendment_brief(new_text, prev_text, bill_metadata) → ReverseBrief
@dataclass
class ReverseBrief:
    summary: str
    key_points: list[dict] # [{text, citation, quote}]
    risks: list[dict] # [{text, citation, quote}]
    key_points: list[dict] # [{text, citation, quote, label}]
    risks: list[dict] # [{text, citation, quote, label}]
    deadlines: list[dict] # [{date, description}]
    topic_tags: list[str]
    llm_provider: str
@@ -664,16 +677,28 @@ class ReverseBrief:
{
"summary": "2-4 paragraph plain-language explanation",
"key_points": [
{"text": "claim", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
{
"text": "claim",
"citation": "Section X(y)",
"quote": "verbatim excerpt ≤80 words",
"label": "cited_fact"
}
],
"risks": [
{"text": "concern", "citation": "Section X(y)", "quote": "verbatim excerpt ≤80 words"}
{
"text": "concern",
"citation": "Section X(y)",
"quote": "verbatim excerpt ≤80 words",
"label": "inference"
}
],
"deadlines": [{"date": "YYYY-MM-DD or null", "description": "..."}],
"topic_tags": ["healthcare", "taxation"]
}
```
`label` classification rules baked into the system prompt: `"cited_fact"` if the claim is explicitly stated in the quoted text; `"inference"` if it is an analytical interpretation, projection, or implication not literally stated. The UI shows a neutral "Inferred" badge on inference items only (cited_fact is the clean default).
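The badge rule can be written as a single check that also covers older briefs. The real component is TSX, so this is only a Python rendering of the decision. The function name is hypothetical.

```python
def shows_inferred_badge(point) -> bool:
    """True only for cited objects explicitly labeled as inference."""
    if isinstance(point, str):
        # Oldest format: plain-string point, no citation and no badge.
        return False
    # A cited object without a label, or labeled cited_fact, gets no badge.
    return point.get("label") == "inference"
```

Treating a missing label the same as `cited_fact` is what keeps unlabeled briefs rendering exactly as they did before.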
**Amendment brief prompt** focuses on what changed between document versions.
**Smart truncation:** Bills exceeding the token budget are trimmed — 75% of budget from the start (preamble/purpose), 25% from the end (enforcement/effective dates), with an omission notice in the middle.
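A minimal sketch of the 75/25 split, assuming the budget is measured in characters as a rough proxy for tokens. The function name, the notice wording, and the choice not to count the notice against the budget are all assumptions.

```python
# Hypothetical wording; the real omission notice may differ.
OMISSION_NOTICE = "\n\n[... middle of bill omitted for length ...]\n\n"


def smart_truncate(text: str, budget: int, head_share: float = 0.75) -> str:
    """Keep head_share of the budget from the start and the rest from the end."""
    if len(text) <= budget:
        return text  # fits within the budget, nothing to trim
    head_len = int(budget * head_share)
    tail_len = budget - head_len
    # Guard: text[-0:] would return the whole string, not an empty one.
    tail = text[-tail_len:] if tail_len > 0 else ""
    return text[:head_len] + OMISSION_NOTICE + tail
```

With a budget of 40 characters, the result keeps the first 30 and last 10 characters of the input, with the notice between them.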
@@ -715,6 +740,7 @@ Renders the LLM brief. For cited items (new format), shows a `§ Section X(y)` c
- Blockquoted verbatim excerpt from the bill
- "View source →" link to GovInfo (opens in new tab)
- One chip open at a time per card
- Inference items show a neutral "Inferred" badge (analytical interpretation, not a literal quote)
- Old plain-string briefs render without chips (graceful backward compat)
**`ActionTimeline.tsx`**
@@ -729,8 +755,11 @@ Client component wrapping the entire app. Waits for Zustand hydration, then redi
**`Sidebar.tsx`**
Navigation with: Home, Bills, Members, Following, Topics, Settings (admin only). Shows current user email + logout button at the bottom. Accepts optional `onClose` prop — when provided (mobile drawer context), renders an X close button in the header and calls `onClose` on every nav link click.
**`DraftLetterPanel.tsx`**
Collapsible panel rendered below `BriefPanel` on the bill detail page (only when a brief exists). Lets users select up to 3 cited points from the brief, choose stance (YES/NO), tone (short/polite/firm), and optionally enter a ZIP code (not stored). Stance auto-populates from the user's follow mode (`pocket_boost` → YES, `pocket_veto` → NO); clears if they unfollow. Recipient (house/senate) is derived from the bill's chamber. Calls `POST /{bill_id}/draft-letter` and renders the plain-text draft in a readonly textarea with a copy-to-clipboard button.
**`BillCard.tsx`**
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, and status.
Compact bill preview showing bill ID, title, sponsor with party badge, latest action date, status, and a text availability indicator: `Brief` (green, analysis done) / `Pending` (amber, text retrieved but not yet analysed) / `No text` (muted, nothing published on Congress.gov).
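The three indicator states reduce to a small decision table. The component itself is TSX, so the Python below only illustrates the logic. The function name and the precedence of brief over document are assumptions.

```python
def text_status(has_brief: bool, has_document: bool) -> str:
    """Map brief/document state to the indicator label shown on the card."""
    if has_brief:
        return "Brief"    # analysis done
    if has_document:
        return "Pending"  # text retrieved but not yet analysed
    return "No text"      # nothing published yet
```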
**`TrendChart.tsx`**
Line chart of `composite_score` over time with tooltip breakdown of each data source.
@@ -805,7 +834,7 @@ Separate queues prevent a flood of LLM tasks from blocking time-sensitive pollin
All LLM providers implement the same interface. Switching providers is a single admin setting change — no code changes, no restart required (the factory reads from DB on each task invocation).
### JSONB for Flexible Brief Storage
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means the schema change from `list[str]` to `list[{text, citation, quote}]` required no migration — only the LLM prompt and application code changed. Old string-format briefs and new cited-object briefs coexist in the same column.
`key_points`, `risks`, `deadlines`, `topic_tags` are stored as JSONB. This means schema changes (adding `citation`/`quote` in v0.2.0, adding `label` in v0.6.0) required no migrations — only the LLM prompt and application code changed. Old string-format briefs, cited-object briefs without labels, and fully-labelled briefs all coexist in the same column and render correctly at each fidelity level.
### Redis-backed Beat Schedule (RedBeat)
The Celery Beat schedule is stored in Redis rather than in memory. This means the beat scheduler can restart without losing schedule state or double-firing tasks.
@@ -915,6 +944,55 @@ Nginx uses `resolver 127.0.0.11 valid=10s` (Docker's internal DNS) so upstream c
- `introduced_date` shown conditionally (not rendered when null, preventing "Introduced: —")
- Admin reprocess endpoint: `POST /api/admin/bills/{bill_id}/reprocess`
### v0.5.0 — Follow Modes, Public Browsing & Draft Letter Generator
**Follow Modes:**
- `follow_mode` column on `follows` table: `neutral | pocket_veto | pocket_boost`
- `FollowButton` replaced with a mode-selector dropdown (shield/zap/heart icons, descriptions for each mode)
- `pocket_veto` — alert only on advancement milestones; `pocket_boost` — all changes + action prompts
- Mode stored per-follow; respected by notification dispatcher
**Public Browsing:**
- Unauthenticated guests can browse bills, members, topics, and the trending dashboard
- `AuthModal` gates follow and other interactive actions
- Sidebar and nav adapt to guest state (no email/logout shown)
- All public endpoints already auth-free; guard refactored to allow guest reads
**Draft Constituent Letter Generator (email_gen):**
- `DraftLetterPanel.tsx` — collapsible UI below `BriefPanel` for bills with a brief
- User selects up to 3 cited points from the brief, picks stance (YES/NO), tone, optional ZIP (not stored)
- Stance pre-fills from follow mode; clears on unfollow (ref-tracked, not effect-guarded)
- Recipient derived from bill chamber — no dropdown needed
- `POST /api/bills/{bill_id}/draft-letter` endpoint: reads LLM provider/model from `AppSetting` (respects Settings page), wraps LLM errors with human-readable messages (quota, rate limit, auth)
- `generate_text(prompt) → str` added to `LLMProvider` ABC and all four providers
**Bill Text Status Indicators:**
- `has_document` field added to `BillSchema` (list endpoint) via a single batch `SELECT DISTINCT` — no per-card queries
- `BillCard` shows: `Brief` (green) / `Pending` (amber) / `No text` (muted) based on brief + document state
### v0.6.0 — Phase 2: Change-driven Alerts & Fact/Inference Labeling
**Change-driven Alerts:**
- `notification_utils.py` milestone keyword list expanded: added `"markup"` (markup sessions) and `"conference"` (conference committee)
- New `is_referral_action()` classifier for committee referrals (`"referred to"`)
- Two-tier notification system: `milestone_tier` field in `NotificationEvent.payload`
- `"progress"` — high-signal milestones (passed, signed, markup, etc.): all follow types notified
- `"referral"` — committee referral: pocket_veto and pocket_boost notified; neutral silently dropped
- **Topic followers now receive `bill_updated` milestone notifications** — previously they only received `new_document`/`new_amendment` events. Fixed by querying the bill's latest brief for `topic_tags` inside `_update_bill_if_changed()`
- All three follow types (bill, sponsor, topic) covered for both tiers
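The two-tier classification could look roughly like the sketch below. Only the name `is_referral_action()` and the `"referred to"` match come from the notes above. The keyword tuple is an illustrative subset, and `milestone_tier()`, the case-insensitive substring matching, and checking progress before referral are assumptions.

```python
from typing import Optional

# Illustrative subset; the real list lives in notification_utils.py and may differ.
PROGRESS_KEYWORDS = (
    "passed", "failed", "signed", "vetoed", "enacted", "markup", "conference",
    "reported from committee", "placed on calendar", "cloture", "roll call",
)


def is_referral_action(action_text: str) -> bool:
    """Committee referrals are recognised by the phrase 'referred to'."""
    return "referred to" in action_text.lower()


def milestone_tier(action_text: str) -> Optional[str]:
    """Return 'progress', 'referral', or None for a non-milestone action."""
    text = action_text.lower()
    if any(keyword in text for keyword in PROGRESS_KEYWORDS):
        return "progress"
    if is_referral_action(action_text):
        return "referral"
    return None
```

An action that matches neither tier returns `None`, so it would produce no `bill_updated` notification in this sketch.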
**Fact vs Inference Labeling:**
- `label: "cited_fact" | "inference"` added to every cited key_point and risk in the LLM JSON schema
- System prompt updated for all four providers (OpenAI, Anthropic, Gemini, Ollama)
- UI: neutral "Inferred" badge shown next to inference items in `AIBriefCard`; cited_fact items render cleanly without a badge
- `backfill_brief_labels` Celery task: classifies existing cited points in-place — one compact LLM call per brief (all points batched), updates JSONB with `flag_modified`, no brief re-generation
- `POST /api/admin/backfill-labels` endpoint + "Backfill Fact/Inference Labels" button in Admin panel
- `unlabeled_briefs` counter added to `/api/admin/stats` and pipeline breakdown table
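The in-place backfill and the `unlabeled_briefs` check can be sketched on plain Python data. Both helper names are hypothetical, and the labels are assumed to arrive as a list in the same order as the cited points sent to the model.

```python
VALID_LABELS = {"cited_fact", "inference"}


def is_unlabeled(points: list) -> bool:
    """True if any cited object (dict) in the list lacks a label."""
    return any(isinstance(p, dict) and "label" not in p for p in points)


def apply_labels(points: list, labels: list[str]) -> int:
    """Write labels onto cited objects in place; return how many changed."""
    # Legacy plain-string points are skipped: they carry no citation to classify.
    cited = [p for p in points if isinstance(p, dict)]
    if len(cited) != len(labels):
        raise ValueError("label count does not match cited point count")
    updated = 0
    for point, label in zip(cited, labels):
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        if point.get("label") != label:
            point["label"] = label
            updated += 1
    return updated
```

Because the dicts are mutated in place, SQLAlchemy does not detect the change on a plain JSONB column by default, which is presumably why the task calls `flag_modified` before committing.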
**Admin Panel Cleanup:**
- Manual Controls split into two sections: always-visible recurring controls (Poll, Members, Trends, Actions, Resume) and a collapsible **Maintenance** section for one-time backfill tasks
- Maintenance section header shows "⚠ action needed" when any backfill has a non-zero count
### v0.2.2 — Sponsor Linking & Search Fixes
- **Root cause fixed:** Congress.gov list API does not return sponsor data — only the detail endpoint does. Poller now calls the detail endpoint for each new bill to get the sponsor and populate `bill.sponsor_id`
- **Backfill task:** `backfill_sponsor_ids` Celery task + `/api/admin/backfill-sponsors` endpoint + "Backfill Sponsors" button in Admin UI — fixes existing bills with `NULL` sponsor_id (~10 req/sec, ~3 min for 1,600 bills)