RankFabric Documentation
Welcome to RankFabric: a unified Cloudflare Worker platform that provides integrated SEO intelligence, app store analytics, and marketing data services.
Quick Reference
| I want to... | Go to |
|---|---|
| Get started quickly | Getting Started |
| Understand the architecture | Architecture Overview |
| Learn about classification | Classification Pipeline |
| Set up queues | Queue Infrastructure |
| Debug an issue | Troubleshooting |
| Check API endpoints | API Reference |
| Understand the database | Database Schema |
| Deploy to production | Operations Runbook |
System Architecture
Storage Responsibilities:
| Storage | Purpose | Examples |
|---|---|---|
| D1 | Operational data, classifications | domains, urls, keywords, apps, rankings |
| ClickHouse | Analytics, time-series | keyword_snapshots, serp_runs, app_analytics |
| Base44 | Canonical entities | Projects, Categories, Business Types |
| KV | Run state, caching | DFS_RUNS, budgets, category mappings |
| R2 | Raw payloads | HTML snapshots, crawl data |
| Vectorize | ML embeddings | Classification similarity search |
Documentation by Audience
For Developers
New to the codebase? Start here:
| Document | Description |
|---|---|
| Architecture Overview | System design, project structure, component roles |
| Backend Lifecycle | Worker phases, queue consumers, request flow |
| API Endpoints | Complete API surface with examples |
| React Integration | Frontend workflows and API consumption |
| Client API Guide | How to build API clients |
For Operators
Running the platform? Essential guides:
| Document | Description |
|---|---|
| Operations Runbook | Deployment, monitoring, troubleshooting |
| Cost Tracking | Budget management and cost optimization |
| Crawl Management | Job scheduling, queue management |
| Diagnostics | Debug endpoints and health checks |
For Architects
Understanding the design? Deep dives:
| Document | Description |
|---|---|
| Data Architecture | Storage layer responsibilities, data flow |
| System Overview | High-level system design |
| Workflows | Durable Workflows orchestration |
| Architecture Diagrams | Visual system representations |
Core Products
1. Keyword Research
AI-powered website analysis, category detection, and keyword harvesting.
- Product Guide
- Keyword Classification - Multi-dimensional keyword intelligence
2. SERP Tracking
Daily and on-demand search ranking monitoring with location support.
- Product Guide
- Location-aware tracking with local pack support
3. App Store Crawler
Apple App Store and Google Play catalog discovery and rankings.
- Product Guide
- Multi-platform category rankings
4. Backlink Intelligence
Marketing DNA analysis and backlink profile classification.
5. Domain Onboarding
Complete domain intelligence gathering with automated classification.
Classification Pipeline
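RankFabric classifies domains, URLs, keywords, and backlinks with a multi-stage pipeline designed to balance cost and accuracy: cheaper stages run first, and more expensive stages run only when earlier ones are not confident enough.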
Classification Documentation
| Document | Description |
|---|---|
| Pipeline Master Plan | Implementation status, architecture diagrams |
| Domain Classification | 7-stage domain classification pipeline |
| URL Classification | URL page type classification |
| Keyword Classification | Multi-dimensional keyword intelligence |
| Backlink Classification | Backlink profile analysis |
| Classification Dimensions | Taxonomy and dimension reference |
Key Concepts
- Early Exit: Processing stops at the first stage that reaches 70% confidence, so costlier later stages are skipped
- Self-Learning: High-confidence results feed back to Vectorize
- Domain-First: URLs wait for domain classification before processing
- Negative Learning: Corrections improve future classifications
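The early-exit behavior can be sketched as a loop over stages ordered from cheapest to most expensive. This is a minimal illustration, not RankFabric's classifier. The Stage interface, the stage names, their relative costs, and the heuristics in exampleStages are hypothetical. Only the 70% threshold comes from the description above.

```typescript
// Minimal sketch of an early-exit classification cascade.
// Hypothetical: Stage/StageResult shapes, stage names, costs, heuristics.
// From the docs: stop once a result reaches 70% confidence.

interface StageResult {
  label: string;
  confidence: number; // 0..1
}

interface Stage {
  name: string;
  costUnits: number; // hypothetical relative cost of running the stage
  run(input: string): StageResult | null; // null = stage has no opinion
}

const EARLY_EXIT_CONFIDENCE = 0.7;

interface CascadeOutcome {
  result: StageResult | null;
  stagesRun: string[];
  totalCost: number;
}

function classify(input: string, stages: Stage[]): CascadeOutcome {
  let best: StageResult | null = null;
  const stagesRun: string[] = [];
  let totalCost = 0;

  for (const stage of stages) {
    stagesRun.push(stage.name);
    totalCost += stage.costUnits;
    const r = stage.run(input);
    // Keep the most confident answer seen so far.
    if (r && (best === null || r.confidence > best.confidence)) {
      best = r;
    }
    // Early exit: once any stage is confident enough, skip the rest.
    if (best && best.confidence >= EARLY_EXIT_CONFIDENCE) break;
  }
  return { result: best, stagesRun, totalCost };
}

// Example stages, cheapest first (hypothetical heuristics).
const exampleStages: Stage[] = [
  {
    name: "rules",
    costUnits: 0,
    run: (domain) =>
      domain.startsWith("shop.") ? { label: "ecommerce", confidence: 0.9 } : null,
  },
  { name: "vectorize", costUnits: 1, run: () => ({ label: "blog", confidence: 0.5 }) },
  { name: "llm", costUnits: 10, run: () => ({ label: "saas", confidence: 0.95 }) },
];
```

With these stages, a domain that the rules stage recognizes exits after one free stage. An unrecognized domain falls through to the LLM stage. Ordering the stages by cost is what lets the early exit save money.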
Queue Infrastructure
All heavy processing runs through Cloudflare Queues for reliability and observability.
Queue Overview
| Queue | Consumer | Purpose |
|---|---|---|
| rankfabric-tasks | task-consumer | Main work queue (keyword harvest, SERP tracking) |
| domain-classify | domain-classify-consumer | Domain classification pipeline |
| url-classify | url-classify-consumer | URL/backlink classification |
| keyword-classify | keyword-classify-consumer | Keyword classification |
| clickhouse-ingestion | clickhouse-consumer | Batched analytics writes |
| app-details-fetch | app-details-consumer | App metadata enrichment |
| llm-verify | llm-verify-consumer | Low-confidence verification |
| rankfabric-dlq | - | Dead letter queue for failures |
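Each consumer processes messages in batches. Below is a minimal sketch of the per-message loop. It assumes a Cloudflare-style message interface with body, attempts, ack(), and retry(), declared locally so the example is self-contained. The ClassifyJob shape and the processJob callback are hypothetical.

```typescript
// Sketch of a queue consumer's per-message loop.
// QueueMessage mirrors the subset of Cloudflare's Message interface used
// here; ClassifyJob and processJob are hypothetical.

interface QueueMessage<T> {
  readonly body: T;
  readonly attempts: number;
  ack(): void;
  retry(): void;
}

interface ClassifyJob {
  kind: "domain" | "url" | "keyword";
  id: string;
}

async function handleBatch(
  messages: QueueMessage<ClassifyJob>[],
  processJob: (job: ClassifyJob) => Promise<void>,
): Promise<void> {
  for (const msg of messages) {
    try {
      await processJob(msg.body);
      msg.ack(); // success: remove only this message from the queue
    } catch (err) {
      // Failure: redeliver only this message. After the consumer's
      // max_retries is exhausted, Cloudflare Queues moves it to the
      // configured dead letter queue (rankfabric-dlq in this table).
      console.error(`job ${msg.body.id} failed on attempt ${msg.attempts}`, err);
      msg.retry();
    }
  }
}
```

Acking or retrying each message individually, rather than failing the whole batch, keeps one bad item from forcing redelivery of its neighbors. The dead letter queue itself is set on the consumer binding (max_retries and dead_letter_queue in wrangler.toml).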
Queue Flow Patterns
- User-initiated: HTTP -> KV state -> Queue -> Storage
- Scheduled: Cron -> Queue -> Storage
- Webhook: DataForSEO callback -> Queue -> Storage
- Cascading: Domain queue -> URL queue -> Keyword queue
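The domain-first rule in the cascading pattern amounts to a gate in front of URL classification. The sketch below works under stated assumptions. An in-memory Map stands in for the D1 domains table. The gateUrl function, the DomainRecord shape, and the returned actions are hypothetical.

```typescript
// Sketch of the domain-first gate (domain queue -> URL queue).
// Hypothetical: the Map stands in for the D1 domains table. A real
// consumer would retry or re-enqueue the message instead of returning
// an action object.

type DomainStatus = "pending" | "classified";

interface DomainRecord {
  status: DomainStatus;
  type?: string;
}

type UrlAction =
  | { action: "classify"; domainType: string }
  | { action: "defer"; reason: string };

function gateUrl(url: string, domains: Map<string, DomainRecord>): UrlAction {
  const host = new URL(url).hostname;
  const domain = domains.get(host);
  if (!domain || domain.status !== "classified" || !domain.type) {
    // Domain-first: the URL waits until its domain has been classified.
    return { action: "defer", reason: `domain ${host} not classified yet` };
  }
  // Passing the domain type along as context is an assumption of this sketch.
  return { action: "classify", domainType: domain.type };
}
```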
Workflow Orchestration
Cloudflare Durable Workflows provide visibility, state management, and automatic retries.
| Workflow | Purpose | Trigger |
|---|---|---|
| AssetOnboardWorkflow | Master orchestrator for all assets | POST /api/assets |
| DomainOnboardWorkflow | Domain intelligence gathering | Asset onboard |
| UrlClassifyWorkflow | 4-stage URL classification | Backlink queue |
| KeywordClassifyWorkflow | 5-stage keyword classification | Keyword queue |
| SerpTrackingWorkflow | SERP position tracking | POST /api/keywords/track |
| AppDetailsWorkflow | App store metadata enrichment | Asset onboard |
See Workflows Documentation for detailed flow diagrams.
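The toy runner below illustrates what durable steps provide: it persists each named step's result and retries failing steps. It is not the Cloudflare Workflows API, where a workflow calls step.do() on the WorkflowStep passed to its run() method. ToyStepRunner, onboardDomain, the step names, and the attempt limit are all hypothetical.

```typescript
// Toy model of durable workflow steps. Each named step's result is
// persisted, so a re-run replays completed steps instead of executing them
// again, and a failing step is retried. Not the Cloudflare Workflows API;
// all names here are hypothetical.

class ToyStepRunner {
  // Stand-in for durable storage of step results, keyed by step name.
  constructor(private readonly store = new Map<string, unknown>()) {}

  executions = 0; // counts real executions, to show replay skipping work

  async do<T>(name: string, fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
    if (this.store.has(name)) {
      return this.store.get(name) as T; // completed earlier: replay result
    }
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        this.executions++;
        const result = await fn();
        this.store.set(name, result); // persist before moving on
        return result;
      } catch (err) {
        lastError = err; // transient failure: try again
      }
    }
    throw lastError;
  }
}

// Hypothetical two-step flow in the spirit of DomainOnboardWorkflow.
async function onboardDomain(step: ToyStepRunner, domain: string) {
  const normalized = await step.do("normalize", async () => domain.toLowerCase());
  const type = await step.do("classify", async () =>
    normalized.endsWith(".shop") ? "ecommerce" : "unknown",
  );
  return { domain: normalized, type };
}
```

Suppose the worker crashes after the first step. A re-run with the same store replays normalize from storage and executes only classify. This is why step bodies should be idempotent.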
Vectorize ML System
RankFabric uses Cloudflare Vectorize for semantic similarity classification.
Vectorize Indexes
| Index | Purpose | Embedding Model |
|---|---|---|
| domain-classifier | Domain type similarity | BGE-base |
| backlink-classifier | URL page type similarity | BGE-base |
| keyword-classifier | Keyword intent/funnel similarity | BGE-base |
How It Works
- Training: High-confidence classifications are embedded and stored
- Inference: New items are embedded and compared to known examples
- Feedback: Corrections improve the model over time
See Classification Pipeline for implementation details.
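The training, inference, and feedback loop can be illustrated with an in-memory nearest-neighbor index that uses cosine similarity. This is a sketch, not the Vectorize client. ToyClassifierIndex stands in for the upsert() and query() calls a Worker would make against a Vectorize binding. The vectors are three-dimensional toys rather than BGE-base embeddings. The 0.7 training threshold is an assumption borrowed from the early-exit cutoff.

```typescript
// In-memory stand-in for a Vectorize index, illustrating training,
// inference, and feedback. Hypothetical: class name, toy vectors,
// the 0.7 training threshold, and nearest-neighbor (k = 1) prediction.

interface Example {
  id: string;
  vector: number[];
  label: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class ToyClassifierIndex {
  private examples = new Map<string, Example>();

  // Training: only high-confidence results are stored as examples.
  learn(id: string, vector: number[], label: string, confidence: number): boolean {
    if (confidence < 0.7) return false;
    this.examples.set(id, { id, vector, label });
    return true;
  }

  // Feedback: a correction overwrites the stored label for that item.
  correct(id: string, vector: number[], label: string): void {
    this.examples.set(id, { id, vector, label });
  }

  // Inference: label of the most similar stored example.
  predict(vector: number[]): { label: string; similarity: number } | null {
    let best: { label: string; similarity: number } | null = null;
    for (const e of this.examples.values()) {
      const similarity = cosine(vector, e.vector);
      if (best === null || similarity > best.similarity) {
        best = { label: e.label, similarity };
      }
    }
    return best;
  }
}
```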
Database Schema
D1 Tables (Operational)
| Table | Purpose |
|---|---|
| domains | Domain records with classification |
| urls | URL records with page type classification |
| keywords | Global keyword repository |
| apps | App metadata (Apple/Google) |
| app_category_rankings | App positions in charts |
| brands | Developer/company entities |
| jobs | Job queue tracking |
ClickHouse Tables (Analytics)
| Table | Purpose |
|---|---|
| keyword_snapshots | Historical keyword metrics |
| serp_runs | SERP tracking results |
| app_analytics | App ranking history |
See Database Schema Reference for complete documentation.
Integration Guides
| Integration | Description |
|---|---|
| React Client | Frontend workflows and API consumption |
| Base44 | Entity management and relationships |
| DataForSEO | API endpoints, limits, credentials |
| ClickHouse | Analytics storage, ingestion queue |
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| DataForSEO quota exceeded | Check KV DFS_BUDGETS, adjust limits in wrangler.toml |
| ClickHouse connection failed | Verify secrets, check /test/clickhouse endpoint |
| Classification stuck | Check queue status, review DLQ for errors |
| Apple rate limited | The worker backs off automatically; wait and retry |
| Queue not draining | Check consumer logs, verify bindings |
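The automatic back-off for rate-limited upstreams such as Apple can follow a standard exponential backoff with jitter. Below is a minimal sketch, assuming HTTP 429 signals the rate limit. The base delay, cap, attempt limit, and full-jitter strategy are illustrative choices, not RankFabric's configured values. The response type is reduced to the two fields the loop reads.

```typescript
// Sketch of exponential backoff with "full jitter" for rate-limited calls.
// Hypothetical: base delay, cap, attempt limit, and the 429 check.

interface RateLimitedResponse {
  status: number;
  headers: { get(name: string): string | null };
}

function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  // attempt 0 -> up to 1s, 1 -> up to 2s, 2 -> up to 4s ... capped at capMs
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function fetchWithBackoff<R extends RateLimitedResponse>(
  doFetch: () => Promise<R>,
  maxAttempts = 5,
  sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    // Return on success or once the attempt budget is spent.
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    // Honor a numeric Retry-After (seconds) when the server sends one.
    const retryAfter = Number(res.headers.get("Retry-After"));
    const delay =
      Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : backoffDelayMs(attempt);
    await sleep(delay);
  }
}
```

Jitter spreads retries from many concurrent requests over time, so they do not all hit the rate-limited API at the same moment.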
Debug Endpoints
| Endpoint | Purpose |
|---|---|
| /test/clickhouse | Test ClickHouse connectivity |
| /diagnostics/run/{id} | Inspect run state and errors |
| /api/admin/queues/status | Queue health and depths |
| /api/admin/classifier/stats | Classification statistics |
See Operations Runbook for detailed troubleshooting.
Internal Documentation
Development notes and implementation plans:
| Document | Description |
|---|---|
| D1 Subrequest Audit | Database query optimization |
| Domain Setup | Domain onboarding implementation |
| Workflow Implementation Plan | Workflow development notes |
| Session Notes | Development session notes |
Reference Documentation
| Document | Description |
|---|---|
| API Endpoints | Complete API surface |
| API Public | Public API documentation |
| API Internal | Internal/admin API docs |
| Database Schema | D1 and ClickHouse schemas |
| Data Flows | System data flow documentation |
| Diagnostics | Debug endpoints and health checks |
Additional Resources
Getting Started
Prerequisites
- Cloudflare Account with Workers, D1, KV, R2, Queues, and Vectorize access
- DataForSEO Account for keyword and SERP data
- ClickHouse instance (ClickHouse Cloud recommended)
- Base44 for entity management (optional)
Quick Setup
# Clone the repository
git clone <repo-url>
cd rankfabric-edge-worker/packages/api
# Install dependencies
npm install
# Configure secrets
wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD
wrangler secret put CLICKHOUSE_HOST
wrangler secret put CLICKHOUSE_USER
wrangler secret put CLICKHOUSE_PASSWORD
# Deploy
wrangler deploy
Verify Deployment
# Test ClickHouse connection
curl https://your-worker.workers.dev/test/clickhouse
# Run smoke test
curl -X POST https://your-worker.workers.dev/run \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","project_id":"test"}'
See Operations Runbook for complete deployment guide.
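The same smoke test can be issued from a TypeScript client. This is a minimal sketch. buildRunRequest and smokeTest are hypothetical helpers, and the worker URL is the placeholder from the curl example. The /run response format is not documented here, so smokeTest only returns the HTTP status.

```typescript
// Sketch of the /run smoke test issued from TypeScript instead of curl.
// Hypothetical helpers; the worker URL is a placeholder.

interface RunRequest {
  url: string;
  project_id: string;
}

function buildRunRequest(workerBase: string, body: RunRequest) {
  return {
    // Note: new URL("/run", base) replaces any path already on base.
    url: new URL("/run", workerBase).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

async function smokeTest(workerBase: string): Promise<number> {
  const { url, init } = buildRunRequest(workerBase, {
    url: "https://example.com",
    project_id: "test",
  });
  const res = await fetch(url, init);
  return res.status; // response body format is not assumed here
}
```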
Getting Help
- Operational issues? → Runbook
- API questions? → API Reference
- Data model questions? → Database Schema
- Integration questions? → See integrations/guides
- Classification questions? → Pipeline Plan