Getting Started with RankDisco
Welcome to RankDisco, an SEO intelligence platform built on Cloudflare Workers. This guide will get you from zero to classifying URLs in about 5 minutes.
What is RankDisco?
RankDisco is a backlink and URL classification engine that automatically categorizes domains, URLs, and keywords using a multi-stage AI pipeline. It powers SEO intelligence workflows including:
- Domain Classification - Determine if a site is SaaS, ecommerce, news, forum, etc.
- URL Classification - Identify page types (blog post, product page, forum thread)
- Keyword Classification - Understand search intent, buyer journey stage, and more
- Backlink Intelligence - Analyze backlink profiles with marketing DNA insights
- SERP Tracking - Monitor keyword rankings over time
Who is this for?
- SEO teams building backlink analysis tools
- Marketing platforms needing URL categorization
- Developers integrating classification into their apps
Key Features:
- Cost-optimized multi-stage classification pipelines: 7 stages for domains, 5 for URLs (FREE stages first, paid only when needed)
- Self-learning system via Vectorize embeddings
- Real-time pipeline monitoring via WebSocket
- Admin console for reviewing and correcting classifications
Quick Start
Prerequisites
Before you begin, ensure you have:
- Node.js 20+ installed
- Wrangler CLI: npm install -g wrangler
- Cloudflare account with Workers, D1, Queues, and Vectorize enabled
- DataForSEO account (for backlink and keyword data)
1. Clone and Install
git clone https://github.com/your-org/rankdisco.git
cd rankdisco
# Install all workspace dependencies
npm install
2. Configure Cloudflare
Log in to Cloudflare and create the required resources:
# Authenticate with Cloudflare
wrangler login
# The project uses these Cloudflare resources (already configured in wrangler.toml):
# - D1 Database: rankfabric_db
# - KV Namespaces: LOOKUP_CACHE, PIPELINE_ACTIVITY, BRAND_CACHE, etc.
# - Queues: domain-classify, url-classify, keyword-classify, domain-onboard
# - Vectorize Indexes: domain-classifier, backlink-classifier, keyword-classifier
# - R2 Bucket: rankfabric-payloads
3. Set Up Secrets
Configure your API credentials:
cd packages/api
# DataForSEO credentials (required)
wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD
# Optional integrations
wrangler secret put OPENAI_API_KEY # For enhanced LLM classification
wrangler secret put ZENROWS_API_KEY # For web scraping fallback
4. Run Database Migrations
Apply the D1 schema:
# List available migrations
ls packages/api/migrations/
# Apply all migrations
wrangler d1 execute rankfabric_db --file=packages/api/migrations/0001_initial.sql
# ... apply subsequent migrations in order
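Applying migrations by hand, one file at a time, is easy to get out of order. The sketch below generates the `wrangler d1 execute` commands in filename order. It assumes migration files carry a zero-padded numeric prefix (as `0001_initial.sql` does), so that a plain sort matches the intended apply order. The helper name `migration_commands` is hypothetical and not part of RankDisco.

```python
from pathlib import Path


def migration_commands(filenames, database="rankfabric_db",
                       directory="packages/api/migrations"):
    """Return one `wrangler d1 execute` command per .sql file, sorted by name.

    Assumption: zero-padded numeric prefixes make lexical order the apply order.
    """
    sql_files = sorted(name for name in filenames if name.endswith(".sql"))
    return [
        f"wrangler d1 execute {database} --file={directory}/{name}"
        for name in sql_files
    ]


if __name__ == "__main__":
    migrations_dir = Path("packages/api/migrations")
    names = []
    if migrations_dir.is_dir():
        names = [path.name for path in migrations_dir.glob("*.sql")]
    # Print the commands for review instead of running them directly.
    for command in migration_commands(names):
        print(command)
```

Printing the commands rather than executing them leaves room to inspect the order before anything touches the database.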
5. Start Development Server
# From the repo root
npm run api:dev
# Or from packages/api
cd packages/api
npm run dev
You should see:
Ready on http://localhost:8787
6. Verify It Works
curl http://localhost:8787/health
Expected response:
{
"status": "ok",
"version": "1.0.0",
"timestamp": "2025-01-20T12:00:00Z"
}
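For scripted checks (for example in CI), the health response can be validated rather than read by eye. This is a minimal sketch that relies only on the `status` field shown above. The function name `is_healthy` is an assumption made for illustration.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (proxy error page, empty response) counts as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

Pass it the body returned by `curl -s http://localhost:8787/health`, or by any HTTP client.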
Your First Domain
Let's onboard a domain and fetch its backlink profile. Domain onboarding triggers:
- Backlink fetching from DataForSEO
- Keyword ranking data collection
- Automatic classification of the domain and its backlink sources
Onboard a Domain
curl -X POST http://localhost:8787/api/admin/domains/onboard \
-H "Content-Type: application/json" \
-d '{
"domain": "notion.so",
"options": {
"backlinks_limit": 50,
"keywords_limit": 100
}
}'
Response:
{
"success": true,
"domain_id": 12345,
"workflow_id": "wf_abc123xyz",
"status_url": "/api/admin/workflow/domain-onboard/wf_abc123xyz"
}
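The same onboarding request can be built from a script. The sketch below mirrors the curl call above using only the Python standard library. The helper name `build_onboard_request` and the `base_url` default are assumptions for a local dev server, not part of the RankDisco codebase.

```python
import json
import urllib.request


def build_onboard_request(domain, backlinks_limit=50, keywords_limit=100,
                          base_url="http://localhost:8787"):
    """Build (but do not send) the POST request for domain onboarding."""
    payload = {
        "domain": domain,
        "options": {
            "backlinks_limit": backlinks_limit,
            "keywords_limit": keywords_limit,
        },
    }
    return urllib.request.Request(
        f"{base_url}/api/admin/domains/onboard",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the dev server running, `urllib.request.urlopen(build_onboard_request("notion.so"))` sends the request. Keeping construction separate from sending makes the payload easy to inspect first.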
Check Onboarding Progress
curl http://localhost:8787/api/admin/workflow/domain-onboard/wf_abc123xyz
Response:
{
"id": "wf_abc123xyz",
"status": "running",
"steps": [
{ "name": "initialize", "status": "completed" },
{ "name": "fetch-backlinks", "status": "running" },
{ "name": "fetch-keywords", "status": "pending" },
{ "name": "fetch-summary", "status": "pending" },
{ "name": "finalize", "status": "pending" }
]
}
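Onboarding runs asynchronously, so a client typically polls the status URL. The sketch below summarizes the `steps` array and polls until the workflow is no longer reported as `running`. The guide does not list the terminal status names, so the loop deliberately avoids assuming any. `fetch_status` is a hypothetical callable that you supply to perform the HTTP GET and return the parsed JSON.

```python
import time
from typing import Callable


def summarize_steps(workflow: dict) -> str:
    """Describe progress using the `steps` array of a workflow status response."""
    steps = workflow.get("steps", [])
    done = sum(1 for step in steps if step.get("status") == "completed")
    current = next(
        (step["name"] for step in steps if step.get("status") == "running"), None
    )
    suffix = f", currently at {current}" if current else ""
    return f"{done}/{len(steps)} steps completed{suffix}"


def wait_for_workflow(fetch_status: Callable[[], dict],
                      interval_s: float = 5.0, max_polls: int = 60) -> dict:
    """Poll until the status is no longer "running" or max_polls is reached."""
    workflow = fetch_status()
    for _ in range(max_polls - 1):
        if workflow.get("status") != "running":
            break
        time.sleep(interval_s)
        workflow = fetch_status()
    return workflow
```

The `max_polls` cap keeps a stuck workflow from blocking a script indefinitely. The caller should still inspect the returned status.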
View the Domain
Once complete, retrieve the domain data:
curl "http://localhost:8787/api/admin/domains?domain=notion.so"
Response:
{
"domain": "notion.so",
"tier1_type": "platform",
"domain_type": "saas_product",
"classification_confidence": 95,
"classification_source": "rules",
"backlinks_count": 50,
"keywords_count": 100,
"onboard_status": "complete"
}
Your First Classification
RankDisco can classify individual URLs through a 5-stage pipeline that balances cost and accuracy.
Classify a URL
curl -X POST http://localhost:8787/api/admin/classifier/classify \
-H "Content-Type: application/json" \
-d '{
"url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/"
}'
Response:
{
"url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/",
"domain": "techcrunch.com",
"classification": {
"tier1_type": "information",
"domain_type": "news_publisher",
"page_type": "news_article",
"tactic_type": "pr_funding_announcement",
"confidence": 92
},
"stages_completed": ["rules", "vectorize"],
"cost": 0
}
Classification Stages Explained
| Stage | Name | Cost | What It Does |
|---|---|---|---|
| 0 | Cache | FREE | Check if already classified |
| 1 | Rules | FREE | Pattern matching (7,100+ known domains) |
| 2 | Vectorize | FREE | Semantic similarity to labeled examples |
| 3 | Content | $0.000125 | DataForSEO Instant Pages fetch |
| 4 | LLM | ~$0.0001 | Workers AI fallback |
The pipeline exits early as soon as confidence reaches 70% or higher, so paid stages run only when the free stages are inconclusive.
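The early-exit behavior can be illustrated with a small model of a cost-ordered pipeline. This is a sketch of the idea and not RankDisco's implementation. The `Stage` and `StageResult` types, and the rule that a stage's cost is charged whenever it runs, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StageResult:
    label: str
    confidence: int  # 0-100


@dataclass
class Stage:
    name: str
    cost_usd: float
    run: Callable[[str], Optional[StageResult]]


def classify(url: str, stages: list, threshold: int = 70):
    """Run stages cheapest-first and stop once confidence reaches the threshold."""
    best = None
    completed = []
    total_cost = 0.0
    for stage in stages:
        completed.append(stage.name)
        total_cost += stage.cost_usd  # assumption: cost is incurred on every run
        result = stage.run(url)
        if result and (best is None or result.confidence > best.confidence):
            best = result
        if best and best.confidence >= threshold:
            break  # early exit: later, more expensive stages are skipped
    return best, completed, total_cost
```

When a free stage clears the threshold, the paid stages are never invoked. This matches the sample response above, where `stages_completed` ends at `vectorize` and `cost` is 0.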
Classify a Domain Directly
curl -X POST http://localhost:8787/api/admin/classifier/domain \
-H "Content-Type: application/json" \
-d '{
"domain": "stripe.com"
}'
Response:
{
"domain": "stripe.com",
"classification": {
"tier1_type": "service",
"domain_type": "financial_service",
"confidence": 98
},
"source": "rules",
"stages_run": 1
}
Classify a Keyword
curl -X POST http://localhost:8787/api/admin/classify-keyword \
-H "Content-Type: application/json" \
-d '{
"keyword": "best project management software for startups"
}'
Response:
{
"keyword": "best project management software for startups",
"classification": {
"journey_moment": "option_evaluating",
"intent_type": "commercial_investigation",
"query_specificity": "long_tail",
"expertise_level": "novice",
"has_brand_mention": false,
"confidence": 85
}
}
Exploring the Console
RankDisco includes an admin console for monitoring and managing classifications.
Start the Console
# From repo root
npm run console:dev
# Or from packages/console
cd packages/console
npm run dev
Open http://localhost:3000 in your browser.
Console Pages
| Page | URL | Purpose |
|---|---|---|
| Dashboard | / | System KPIs, health status, queue overview |
| Domains | /pages/domains.html | Domain classifications, confidence charts |
| URLs | /pages/urls.html | URL classifications, recent activity |
| Keywords | /pages/keywords.html | Keyword classifications across 11 dimensions |
| Queues | /pages/queues.html | Queue depths, processing rates, retries |
| Costs | /pages/costs.html | DataForSEO spend, budget tracking |
| Corrections | /pages/corrections.html | Submit corrections, view patterns |
Key Features
- Real-time DAG visualization - Watch domain onboarding progress through the pipeline
- DataTable component - Sort, filter, and paginate large datasets
- Correction workflow - Fix misclassifications to improve the model
Understanding the Pipeline
Here's how data flows through RankDisco:
+------------------+
| Entry Points |
| API / Webhooks |
+--------+---------+
|
+--------------+--------------+
| |
+---------v---------+ +---------v---------+
| ensureDomain() | | ensureUrl() |
| (creates domain) | | (creates URL) |
+---------+---------+ +---------+---------+
| |
| Not classified? | Domain first, then URL
| |
+---------v---------+ +---------v---------+
| DOMAIN_CLASSIFY | | URL_CLASSIFY |
| QUEUE | | QUEUE |
+---------+---------+ +---------+---------+
| |
+---------v---------+ +---------v---------+
| 7-Stage Pipeline | | 5-Stage Pipeline |
| Rules->Vectorize | | Rules->Vectorize |
| ->Content->LLM | | ->Content->LLM |
+---------+---------+ +---------+---------+
| |
| >= 80% confidence | >= 65% confidence
| |
+---------v---------+ +---------v---------+
| Vectorize Learn | | Vectorize Learn |
| (Self-improving) | | (Self-improving) |
+-------------------+ +-------------------+
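Two rules from the diagram can be written down compactly: a URL's domain is queued first if it has not been classified, and results feed back into Vectorize only above a per-pipeline confidence threshold (80 for domains, 65 for URLs). The sketch below is one reading of the diagram and not the project's code. The function names are hypothetical. The queue names come from the resource list in step 2.

```python
from urllib.parse import urlsplit

# Learning thresholds as shown in the diagram.
LEARN_THRESHOLDS = {"domain": 80, "url": 65}


def plan_enqueue(url: str, classified_domains: set) -> list:
    """Return (queue, item) pairs in order: domain first if needed, then the URL."""
    domain = urlsplit(url).hostname or ""
    plan = []
    if domain not in classified_domains:
        plan.append(("domain-classify", domain))
    plan.append(("url-classify", url))
    return plan


def should_learn(kind: str, confidence: int) -> bool:
    """True if a result is confident enough to be stored as a Vectorize example."""
    return confidence >= LEARN_THRESHOLDS[kind]
```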
Domain Classification Dimensions
| Dimension | Description | Example Values |
|---|---|---|
| tier1_type | High-level archetype | platform, commerce, service, information |
| domain_type | Specific business type | saas_product, news_publisher, ecommerce_store |
| channel_bucket | Marketing channel | pr_earned_media, owned_content_marketing |
| quality_tier | Authority level | tier_1 (DR 80+) to tier_5 (DR 0-19) |
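As an illustration of the `quality_tier` dimension, the sketch below maps a DR score to a tier. Only the two endpoints (tier_1 at DR 80+ and tier_5 at DR 0-19) come from the table. The three middle boundaries are assumed to be even 20-point bands and may differ from the real taxonomy, so check the Classification Dimensions reference before relying on them.

```python
# Inclusive lower bound of each tier. The 60/40/20 boundaries are ASSUMED;
# only "DR 80+" (tier_1) and "DR 0-19" (tier_5) appear in the guide.
TIER_FLOORS = [
    (80, "tier_1"),
    (60, "tier_2"),
    (40, "tier_3"),
    (20, "tier_4"),
    (0, "tier_5"),
]


def quality_tier(dr: int) -> str:
    """Map a DR score (0-100) to a quality tier label."""
    if not 0 <= dr <= 100:
        raise ValueError(f"DR must be between 0 and 100, got {dr}")
    for floor, tier in TIER_FLOORS:
        if dr >= floor:
            return tier
    raise AssertionError("unreachable: the last floor is 0")
```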
URL Classification Dimensions
| Dimension | Description | Example Values |
|---|---|---|
| page_type | Type of page | news_article, product_page, forum_thread |
| tactic_type | Marketing tactic | pr_funding_announcement, guest_post_editorial |
| is_money_page | Conversion page? | true / false |
Next Steps
Now that you have RankDisco running, explore these topics:
Core Documentation
- Architecture Overview - System design and data flows
- Classification Dimensions - Complete taxonomy reference
- Domain Onboarding - Deep dive into the onboarding workflow
API Reference
- Public API - Client-facing endpoints
- Internal API - Admin and operational endpoints
- API Endpoints - DataForSEO wrapper endpoints
Integrations
- DataForSEO Integration - API setup and cost management
- ClickHouse Integration - Analytics storage
- React Client Integration - Frontend integration patterns
Operations
- Runbook - Deployment, monitoring, troubleshooting
- Cost Tracking - Managing DataForSEO spend
- Crawl Management - Queue and rate limit configuration
Common Commands Reference
# Development
npm run api:dev # Start API locally
npm run console:dev # Start admin console
npm run docs:dev # Start documentation site
# Deployment
npm run api:deploy # Deploy API to Cloudflare
wrangler pages deploy packages/console # Deploy console
# Database
wrangler d1 execute rankfabric_db --command "SELECT COUNT(*) FROM domains"
wrangler d1 execute rankfabric_db --file=migrations/XXXX.sql
# Secrets
wrangler secret put SECRET_NAME
wrangler secret list
# Logs
wrangler tail # Stream live logs
wrangler tail --format=json | jq . # Structured logs
Troubleshooting
"DATAFORSEO_LOGIN or DATAFORSEO_PASSWORD missing"
Set your DataForSEO credentials:
wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD
Queues Not Processing
Check if queue consumers are enabled in wrangler.toml. Some are disabled by default for maintenance:
# Enable by uncommenting:
[[queues.consumers]]
queue = "domain-classify"
max_batch_size = 20
max_concurrency = 20
Classifications Stuck at Low Confidence
The system learns from high-confidence LLM results. If Vectorize has few examples:
- Manually classify some seed domains via the console
- Lower the confidence threshold temporarily
- Run /api/admin/classifier/corrections/learn to process pending feedback
Local Development Without Cloudflare Resources
For testing without a full Cloudflare setup, the API degrades gracefully:
- Vectorize lookups return empty (falls through to LLM)
- Queue sends become no-ops
- D1 requires at least a local database binding
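The degradation behavior listed above follows a common pattern: substitute a do-nothing stand-in for each optional binding and fail fast on the required one. The sketch below illustrates that pattern in Python. It is not RankDisco's code, and the binding names (`DB`, `VECTORIZE`, `QUEUE`) and class names are hypothetical.

```python
class NullVectorIndex:
    """Stand-in for a missing Vectorize binding: lookups return no matches."""

    def query(self, vector, top_k=5):
        return []


class NullQueue:
    """Stand-in for a missing queue binding: sends are silently dropped."""

    def send(self, message):
        return None


def resolve_bindings(env: dict) -> dict:
    """Fill optional bindings with no-op stand-ins and require the database."""
    # Hypothetical binding names, chosen for illustration only.
    if "DB" not in env:
        raise RuntimeError("a D1 binding is required, even for local development")
    return {
        "DB": env["DB"],
        "VECTORIZE": env.get("VECTORIZE") or NullVectorIndex(),
        "QUEUE": env.get("QUEUE") or NullQueue(),
    }
```

Because the stand-ins expose the same methods as the real bindings in this model, calling code needs no special cases for local development.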
Getting Help
- Documentation: docs.rankdisco.com
- Issues: GitHub Issues for bug reports
- Architecture Questions: See Architecture Overview
Happy classifying!