Classification Pipeline Master Plan
Executive Summary
This document tracks the implementation of critical improvements to our three classification pipelines (Domain, URL, Keyword).
STATUS: Sprints 1-5 COMPLETE (December 2024)
Implementation Status
| Sprint | Phase | Status | Notes |
|---|---|---|---|
| Sprint 1 | Remove Bubble-Up, Domain-First | COMPLETE | Removed maybeUpdateDomainClassification(), added domain-first enforcement |
| Sprint 2 | Learning Improvements | COMPLETE | All pipelines now consistently feed Vectorize |
| Sprint 3 | Negative Learning | COMPLETE | classification_corrections table, correction_patterns, API endpoints |
| Sprint 4 | Admin Console | COMPLETE | Enterprise-grade UI with DataTable, KPIs, charts |
| Sprint 5 | Polish & Monitor | COMPLETE | Documentation, mermaid diagrams updated |
Completed Work
Sprint 1: Clean Foundation (COMPLETE)
1.1 Removed Domain Bubble-Up
- Removed `maybeUpdateDomainClassification()` from `backlink-classify-consumer.js`
- Removed all calls at lines 624 and 936
- Removed unused DOMAIN_AGGREGATION_* constants
1.2 Enforce Domain Classification Before URL
- Added domain classification check in `ensureUrl()` in `url-management.js`
- URLs cannot be classified without their domain being classified first
- If domain not classified, URL classification waits or triggers domain classification
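The domain-first rule above can be sketched as a small gate function. This is a minimal illustration, not the actual `ensureUrl()` code: `checkDomainFirst`, `classifiedDomains`, and `domainQueue` are hypothetical names, and the in-memory Set and array stand in for the real domain table and classification queue.

```javascript
// Hypothetical in-memory stand-ins for the domain table and the domain classification queue.
const classifiedDomains = new Set(['example.com']);
const domainQueue = [];

// Decide whether a URL may be classified now or must wait for its domain.
function checkDomainFirst(url) {
  const domain = new URL(url).hostname;
  if (classifiedDomains.has(domain)) {
    return { ready: true, domain };
  }
  // Domain not classified yet: enqueue it once and defer the URL.
  if (!domainQueue.includes(domain)) {
    domainQueue.push(domain);
  }
  return { ready: false, domain };
}
```

A caller would classify the URL only when `ready` is true and otherwise retry after the queued domain classification has finished.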
Sprint 2: Learning Improvements (COMPLETE)
2.1 Domain Pipeline Learning
- Moved learning to Stage 6 for ALL classification sources (not just LLM)
- Imports centralized from `classification-config.js`:
  - `DOMAIN_LEARNING_MIN_CONFIDENCE` (80%)
  - `shouldTriggerLearning(confidence, 'domain')`
- Rules engine matches at 95%+ now feed Vectorize
- LLM results at 80%+ feed Vectorize
2.2 URL Pipeline Learning
- Standardized learning thresholds via `classification-config.js`
- Uses `LEARNING_MIN_CONFIDENCE` (65%) consistently
- Added `shouldTriggerLearning()` checks in `learnFromClassification()`
2.3 Keyword Pipeline
- Already well-designed, learning at 70% threshold
- No changes needed, serves as model for other pipelines
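To illustrate how the three thresholds fit together, here is a minimal sketch of a centralized check. The call shape `shouldTriggerLearning(confidence, pipeline)` and the percentages come from this plan; the `LEARNING_THRESHOLDS` object and the function body are assumptions, not the contents of `classification-config.js`. The 95%+ figure for rules-engine matches is not modeled separately here.

```javascript
// Thresholds quoted in this plan, in percent. The object layout is an assumption.
const LEARNING_THRESHOLDS = {
  domain: 80,  // DOMAIN_LEARNING_MIN_CONFIDENCE
  url: 65,     // LEARNING_MIN_CONFIDENCE
  keyword: 70, // keyword pipeline threshold
};

// Return true when a classification is confident enough to feed Vectorize.
function shouldTriggerLearning(confidence, pipeline) {
  const min = LEARNING_THRESHOLDS[pipeline];
  if (min === undefined) {
    throw new Error(`Unknown pipeline: ${pipeline}`);
  }
  return confidence >= min;
}
```

Keeping the thresholds in one table means a change to a pipeline's cut-off is made in a single place instead of in each classifier.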
Sprint 3: Negative Learning (COMPLETE)
3.1 Database Schema
Created migrations/0119_classification_corrections.sql:
```sql
-- Track classification corrections/feedback
CREATE TABLE classification_corrections (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  entity_type TEXT NOT NULL, -- 'domain', 'url', 'keyword'
  entity_id INTEGER NOT NULL,
  original_dimension TEXT NOT NULL,
  original_value TEXT,
  corrected_value TEXT NOT NULL,
  confidence_before INTEGER,
  notes TEXT,
  created_by TEXT DEFAULT 'system',
  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
  processed_at INTEGER,
  UNIQUE(entity_type, entity_id, original_dimension, corrected_value)
);

-- Track correction patterns for generating rules
CREATE TABLE correction_patterns (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  entity_type TEXT NOT NULL,
  dimension TEXT NOT NULL,
  from_value TEXT,
  to_value TEXT NOT NULL,
  count INTEGER DEFAULT 1,
  suggested_rule TEXT,
  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
  last_seen_at INTEGER NOT NULL DEFAULT (unixepoch()),
  UNIQUE(entity_type, dimension, from_value, to_value)
);
```
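The `UNIQUE` constraint on `correction_patterns` suggests that repeated corrections increment `count` rather than insert new rows. One way this could be done is a SQLite upsert, sketched below. `buildPatternUpsert` is a hypothetical helper, not part of the corrections module; it returns the SQL and bind parameters so the statement can be inspected without a database.

```javascript
// Build an upsert for correction_patterns (hypothetical helper).
// A repeated (entity_type, dimension, from_value, to_value) tuple bumps
// `count` and refreshes `last_seen_at` instead of inserting a new row.
function buildPatternUpsert({ entityType, dimension, fromValue, toValue }) {
  const sql = `
    INSERT INTO correction_patterns (entity_type, dimension, from_value, to_value)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (entity_type, dimension, from_value, to_value)
    DO UPDATE SET count = count + 1, last_seen_at = unixepoch()`;
  // Missing from_value is bound as NULL to match the nullable column.
  return { sql, params: [entityType, dimension, fromValue ?? null, toValue] };
}
```

With a D1 binding (called `db` here as an assumption), the result could be run as `db.prepare(sql).bind(...params).run()`. Note that SQLite treats NULLs as distinct in `UNIQUE` constraints, so rows with a NULL `from_value` never conflict and would accumulate as separate rows. Storing an empty string instead of NULL is one possible workaround.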
3.2 Corrections Module
Created src/lib/classification-corrections.js:
- `recordCorrection()` - Save correction and update entity
- `getPendingCorrections()` - Get unprocessed corrections
- `getSuggestedRules()` - Analyze patterns for auto-rules
- `getCorrectionHistory()` - View correction history
- `getCorrectionStats()` - Dashboard statistics
- `processPendingCorrections()` - Batch process for Vectorize feedback
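As an illustration of the pattern analysis behind suggested rules, the sketch below filters rows shaped like `correction_patterns` by how often they were seen. It is not the module's `getSuggestedRules()`: the name `suggestRules`, the `minCount` default of 3, and the rule text format are all assumptions.

```javascript
// Suggest rules from correction patterns seen at least `minCount` times.
// `minCount` and the rule text format are assumptions for illustration.
function suggestRules(patterns, minCount = 3) {
  return patterns
    .filter((p) => p.count >= minCount)
    .sort((a, b) => b.count - a.count) // most frequent pattern first
    .map((p) => ({
      ...p,
      suggested_rule: `${p.entity_type}.${p.dimension}: "${p.from_value}" -> "${p.to_value}"`,
    }));
}
```

Patterns below the cut-off are dropped, so a single one-off correction does not turn into a rule suggestion.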
3.3 Admin Endpoints
Added to src/endpoints/admin-classifier.js:
- `POST /api/admin/classifier/corrections` - Submit correction
- `GET /api/admin/classifier/corrections/stats` - Dashboard stats
- `GET /api/admin/classifier/corrections/history/:type/:id` - Entity history
- `GET /api/admin/classifier/corrections/patterns` - Suggested rules
- `POST /api/admin/classifier/corrections/learn` - Process pending
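The request body for submitting a correction is not specified in this plan. The sketch below assumes the payload fields mirror the `NOT NULL` columns of `classification_corrections`. `validateCorrection` is a hypothetical helper, not a function in `admin-classifier.js`.

```javascript
const ENTITY_TYPES = ['domain', 'url', 'keyword'];

// Validate a correction payload against the required columns of
// classification_corrections. Field names are assumed to match column names.
// Returns a list of error strings; an empty list means the payload is valid.
function validateCorrection(body) {
  const b = body ?? {};
  const errors = [];
  if (!ENTITY_TYPES.includes(b.entity_type)) {
    errors.push('entity_type must be domain, url, or keyword');
  }
  if (!Number.isInteger(b.entity_id)) {
    errors.push('entity_id must be an integer');
  }
  if (typeof b.original_dimension !== 'string' || b.original_dimension === '') {
    errors.push('original_dimension is required');
  }
  if (typeof b.corrected_value !== 'string' || b.corrected_value === '') {
    errors.push('corrected_value is required');
  }
  return errors;
}
```

An endpoint handler could return a 400 response with these messages when the list is non-empty, and proceed to record the correction otherwise.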
Sprint 4: Admin Console (COMPLETE)
4.1 Enterprise Design System
Complete rewrite of console/css/style.css:
- CSS custom properties for theming
- Dark theme with professional color palette
- KPI cards with trends and colors
- Progress bars and badges
- Toast notifications
- Responsive grid layouts
4.2 Reusable Components
Created console/js/components.js:
- DataTable class with:
  - Sorting (click headers)
  - Pagination (configurable page sizes)
  - Search (debounced)
  - Custom cell renderers
  - Loading/empty states
- Toast notification system
- Format helpers (number, percent, currency, date, relative)
- confidenceBadge() - Color-coded confidence display
- classificationBadge() - Classification value badges
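The sorting, pagination, and badge behavior listed above can be sketched with a few pure functions. These are illustrations under assumptions, not the code in `console/js/components.js`: the function names, the badge cut-offs (80 and 65), and the CSS class names are hypothetical.

```javascript
// Sort rows by one column; returns a new array and leaves the input untouched.
function sortRows(rows, key, direction = 'asc') {
  const sign = direction === 'desc' ? -1 : 1;
  return [...rows].sort((a, b) => {
    if (a[key] < b[key]) return -sign;
    if (a[key] > b[key]) return sign;
    return 0;
  });
}

// Return one page of rows (pages are 1-indexed).
function paginate(rows, page, pageSize) {
  const start = (page - 1) * pageSize;
  return rows.slice(start, start + pageSize);
}

// Map a confidence percentage to a color-coded badge.
// Cut-offs and class names are assumptions for illustration.
function sketchConfidenceBadge(confidence) {
  const level = confidence >= 80 ? 'high' : confidence >= 65 ? 'medium' : 'low';
  return `<span class="badge badge-${level}">${confidence}%</span>`;
}
```

Sorting before paginating keeps page contents stable when the user clicks a column header on any page.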
4.3 Page Upgrades
Main Dashboard (console/index.html):
- New sidebar with logo and nav sections
- System KPIs with trend indicators
- Entity cards with progress bars
- Health table with status badges
- Distribution charts
URLs Page (console/pages/urls.html):
- Tabbed interface (Overview, Distributions, Recent, Needs Review)
- DataTable with full sorting/pagination
- Source filter dropdown
- Confidence distribution charts
Domains Page (console/pages/domains.html):
- Tabbed interface
- Confidence by dimension table
- Domain type distribution charts
- Classify unclassified button
Keywords Page (console/pages/keywords.html):
- 11 dimension charts
- Brand and location stats
- Journey/Intent confidence KPIs
- Random sample loader
Costs Page (console/pages/costs.html):
- Budget tracking with alerts
- Daily cost line chart
- Service breakdown pie charts
- Cost per request efficiency
- Export CSV button
Queues Page (console/pages/queues.html):
- Queue cards grid with status badges
- Processing rate chart
- Failed messages table
- Retry all / Retry individual buttons
- Real-time status indicators
Corrections Page (console/pages/corrections.html):
- NEW page for negative learning
- Correction stats KPIs
- Pattern analysis
- Add correction form
- Correction history table
Architecture Diagrams
Classification Pipeline Flow
Domain Classification Pipeline Detail
Admin Console Architecture
Success Metrics
| Metric | Target | Status |
|---|---|---|
| Domain Classification Coverage | 100% before URLs | COMPLETE |
| Learning Rate (Vectorize) | > 60% of high-confidence classifications | COMPLETE |
| Correction System | Implemented | COMPLETE |
| Admin Console | Enterprise-grade | COMPLETE |
| Review Queue | < 100 items | MONITORING |
Files Modified/Created
Phase 1 (Remove Bubble-Up)
- `src/queue/backlink-classify-consumer.js` - Removed bubble-up function
- `src/lib/url-management.js` - Domain-first enforcement
Phase 2 (Learning)
- `src/lib/domain-classifier.js` - Learning for all stages
- `src/lib/url-classifier.js` - Standardized learning thresholds
- `src/lib/classification-config.js` - Centralized thresholds
Phase 3 (Negative Learning)
- `migrations/0119_classification_corrections.sql` - New tables
- `src/lib/classification-corrections.js` - NEW module
- `src/endpoints/admin-classifier.js` - Correction endpoints
Phase 4 (Console)
- `console/css/style.css` - Enterprise design system
- `console/js/components.js` - DataTable, Toast, Format
- `console/js/api.js` - Correction API methods
- `console/index.html` - Complete redesign
- `console/pages/urls.html` - Enterprise upgrade
- `console/pages/domains.html` - Enterprise upgrade
- `console/pages/keywords.html` - Enterprise upgrade
- `console/pages/costs.html` - Enterprise upgrade
- `console/pages/queues.html` - Enterprise upgrade
- `console/pages/corrections.html` - NEW page
Future Considerations
Index Versioning (Deferred)
- Weekly snapshots of Vectorize to R2
- Version metadata for rollback
- Not urgent, can implement if quality issues arise
Cross-Pipeline Learning (Deferred)
- Keyword brand mentions → domain verification
- URL classification → domain reinforcement
- Always via LLM with human oversight
- Queue for review, never auto-update
LLM Model Selection (Deferred)
- Different models for different confidence levels
- Claude for low-confidence review
- Llama 3.3 70B for standard classification
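Since this item is deferred, the following is only a sketch of what confidence-based routing might look like. The review threshold of 65 and the model labels are assumptions. The labels are placeholders, not real model identifiers.

```javascript
// Confidence below this value is escalated for review (assumed cut-off).
const REVIEW_THRESHOLD = 65;

// Pick a model label for a classification request.
// `confidence` is the result of a previous pass, or null/undefined on the first pass.
function selectModel(confidence) {
  // First pass: no prior confidence, so use the standard model.
  if (confidence === null || confidence === undefined) {
    return 'llama-3.3-70b';
  }
  // Low-confidence results are escalated; everything else stays on the standard model.
  return confidence < REVIEW_THRESHOLD ? 'claude' : 'llama-3.3-70b';
}
```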
Maintenance Notes
Adding New Classification Dimensions
- Update pipeline stage in appropriate classifier
- Add rules in `classifier-rules-engine.js`
- Update Vectorize metadata schema
- Add column to D1 table
- Update admin console charts/tables
Monitoring Classification Quality
- Check admin dashboard KPIs daily
- Review low-confidence queue weekly
- Analyze correction patterns monthly
- Update rules engine based on patterns
Deploying Changes
- Run migrations: `wrangler d1 execute RANKFABRIC_DB --file=migrations/XXXX.sql`
- Deploy worker: `wrangler deploy`
- Deploy console: `wrangler pages deploy console/`
- Verify via admin console