Getting Started with RankDisco

Welcome to RankDisco, an SEO intelligence platform built on Cloudflare Workers. This guide will get you from zero to classifying URLs in about 5 minutes.


What is RankDisco?

RankDisco is a backlink and URL classification engine that automatically categorizes domains, URLs, and keywords using a multi-stage AI pipeline. It powers SEO intelligence workflows including:

  • Domain Classification - Determine if a site is SaaS, ecommerce, news, forum, etc.
  • URL Classification - Identify page types (blog post, product page, forum thread)
  • Keyword Classification - Understand search intent, buyer journey stage, and more
  • Backlink Intelligence - Analyze backlink profiles with marketing DNA insights
  • SERP Tracking - Monitor keyword rankings over time

Who is this for?

  • SEO teams building backlink analysis tools
  • Marketing platforms needing URL categorization
  • Developers integrating classification into their apps

Key Features:

  • Cost-optimized multi-stage classification pipelines (7 stages for domains, 5 for URLs) that run FREE stages first and paid stages only when needed
  • Self-learning system via Vectorize embeddings
  • Real-time pipeline monitoring via WebSocket
  • Admin console for reviewing and correcting classifications

Quick Start

Prerequisites

Before you begin, ensure you have:

  • Node.js 20+ installed
  • Wrangler CLI: npm install -g wrangler
  • Cloudflare account with Workers, D1, Queues, and Vectorize enabled
  • DataForSEO account (for backlink and keyword data)

1. Clone and Install

git clone https://github.com/your-org/rankdisco.git
cd rankdisco

# Install all workspace dependencies
npm install

2. Configure Cloudflare

Log in to Cloudflare and create the required resources:

# Authenticate with Cloudflare
wrangler login

# The project uses these Cloudflare resources (already configured in wrangler.toml):
# - D1 Database: rankfabric_db
# - KV Namespaces: LOOKUP_CACHE, PIPELINE_ACTIVITY, BRAND_CACHE, etc.
# - Queues: domain-classify, url-classify, keyword-classify, domain-onboard
# - Vectorize Indexes: domain-classifier, backlink-classifier, keyword-classifier
# - R2 Bucket: rankfabric-payloads

3. Set Up Secrets

Configure your API credentials:

cd packages/api

# DataForSEO credentials (required)
wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD

# Optional integrations
wrangler secret put OPENAI_API_KEY # For enhanced LLM classification
wrangler secret put ZENROWS_API_KEY # For web scraping fallback

4. Run Database Migrations

Apply the D1 schema:

# Step 3 left you in packages/api, so paths here are relative to that directory

# List available migrations
ls migrations/

# Apply the initial migration first
wrangler d1 execute rankfabric_db --file=migrations/0001_initial.sql
# ... apply subsequent migrations in order
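
Because the migrations have to be applied in order, it can help to generate the commands rather than type them one by one. The following is a minimal sketch, assuming every migration file uses the zero-padded numeric prefix seen in 0001_initial.sql. The helper name migrationCommands and its dir parameter are hypothetical and not part of RankDisco.

```typescript
// Hypothetical helper, not part of RankDisco: builds one `wrangler d1 execute`
// command per migration file, ordered by numeric prefix (assumed NNNN_name.sql).
function migrationCommands(files: string[], dir: string, db = "rankfabric_db"): string[] {
  return files
    .filter((file) => /^\d+_.+\.sql$/.test(file)) // skip anything that is not a migration
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10)) // numeric, so 0010 sorts after 0002
    .map((file) => `wrangler d1 execute ${db} --file=${dir}/${file}`);
}
```

Pass the directory relative to wherever you run wrangler, print the resulting commands, and review them before executing anything against a real database.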

5. Start Development Server

# From the repo root
npm run api:dev

# Or from packages/api
cd packages/api
npm run dev

You should see:

Ready on http://localhost:8787

6. Verify It Works

curl http://localhost:8787/health

Expected response:

{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2025-01-20T12:00:00Z"
}
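
If you want to run this check from a script instead of by eye, a small type guard is enough. This is a sketch that assumes the response always has the three fields shown above. The names HealthResponse and isHealthy are illustrative and do not come from the RankDisco codebase.

```typescript
// Shape of the /health response as shown in this guide (assumed to be complete).
interface HealthResponse {
  status: string;
  version: string;
  timestamp: string;
}

// Returns true only when the parsed body looks like a healthy response.
function isHealthy(body: unknown): body is HealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const fields = body as Record<string, unknown>;
  return (
    fields.status === "ok" &&
    typeof fields.version === "string" &&
    typeof fields.timestamp === "string"
  );
}
```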

Your First Domain

Let's onboard a domain and fetch its backlink profile. Domain onboarding triggers:

  1. Backlink fetching from DataForSEO
  2. Keyword ranking data collection
  3. Automatic classification of the domain and its backlink sources

Onboard a Domain

curl -X POST http://localhost:8787/api/admin/domains/onboard \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "notion.so",
    "options": {
      "backlinks_limit": 50,
      "keywords_limit": 100
    }
  }'

Response:

{
  "success": true,
  "domain_id": 12345,
  "workflow_id": "wf_abc123xyz",
  "status_url": "/api/admin/workflow/domain-onboard/wf_abc123xyz"
}

Check Onboarding Progress

curl http://localhost:8787/api/admin/workflow/domain-onboard/wf_abc123xyz

Response:

{
  "id": "wf_abc123xyz",
  "status": "running",
  "steps": [
    { "name": "initialize", "status": "completed" },
    { "name": "fetch-backlinks", "status": "running" },
    { "name": "fetch-keywords", "status": "pending" },
    { "name": "fetch-summary", "status": "pending" },
    { "name": "finalize", "status": "pending" }
  ]
}
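
When polling this endpoint from a script, it is handy to reduce the step list to a one-line progress summary. The sketch below assumes the response shape shown above and only the three step statuses that appear there (completed, running, pending). The type and function names are hypothetical.

```typescript
// Assumed shape of the workflow status response, taken from the example above.
interface WorkflowStep {
  name: string;
  status: string; // "completed" | "running" | "pending" in the example
}

interface WorkflowStatus {
  id: string;
  status: string;
  steps: WorkflowStep[];
}

// Counts completed steps and reports which step, if any, is currently running.
function summarizeWorkflow(workflow: WorkflowStatus): {
  completed: number;
  total: number;
  current: string | null;
} {
  const completed = workflow.steps.filter((step) => step.status === "completed").length;
  const running = workflow.steps.find((step) => step.status === "running");
  return { completed, total: workflow.steps.length, current: running ? running.name : null };
}
```

For the response above, the summary reports 1 of 5 steps completed with fetch-backlinks in progress.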

View the Domain

Once complete, retrieve the domain data:

curl "http://localhost:8787/api/admin/domains?domain=notion.so"

Response:

{
  "domain": "notion.so",
  "tier1_type": "platform",
  "domain_type": "saas_product",
  "classification_confidence": 95,
  "classification_source": "rules",
  "backlinks_count": 50,
  "keywords_count": 100,
  "onboard_status": "complete"
}

Your First Classification

RankDisco can classify individual URLs through a 5-stage pipeline that balances cost and accuracy.

Classify a URL

curl -X POST http://localhost:8787/api/admin/classifier/classify \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/"
  }'

Response:

{
  "url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/",
  "domain": "techcrunch.com",
  "classification": {
    "tier1_type": "information",
    "domain_type": "news_publisher",
    "page_type": "news_article",
    "tactic_type": "pr_funding_announcement",
    "confidence": 92
  },
  "stages_completed": ["rules", "vectorize"],
  "cost": 0
}

Classification Stages Explained

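| Stage | Name      | Cost      | What It Does                            |
|-------|-----------|-----------|-----------------------------------------|
| 0     | Cache     | FREE      | Check if already classified             |
| 1     | Rules     | FREE      | Pattern matching (7,100+ known domains) |
| 2     | Vectorize | FREE      | Semantic similarity to labeled examples |
| 3     | Content   | $0.000125 | DataForSEO Instant Pages fetch          |
| 4     | LLM       | ~$0.0001  | Workers AI fallback                     |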

The pipeline exits as soon as a stage reaches 70% confidence or higher, so the paid stages run only when the free ones are inconclusive.
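
The early-exit behavior can be sketched in a few lines. This is an illustration rather than RankDisco's actual implementation: the Stage interface, the classifyWithEarlyExit function, and the synchronous stage callbacks are all assumptions made to keep the example small (real stages would call D1, Vectorize, or an LLM asynchronously). Only the 70% threshold and the idea of ordered stages come from this guide.

```typescript
interface StageResult {
  label: string;
  confidence: number; // 0-100, as in the API responses
}

interface Stage {
  name: string;
  costUsd: number; // 0 for the free stages
  run: (url: string) => StageResult | null; // null means the stage has no answer
}

const EXIT_CONFIDENCE = 70; // threshold stated in this guide

// Runs stages in order, keeps the most confident answer so far,
// and stops as soon as that answer reaches the exit threshold.
function classifyWithEarlyExit(url: string, stages: Stage[]) {
  let best: StageResult | null = null;
  let cost = 0;
  const stagesCompleted: string[] = [];

  for (const stage of stages) {
    const result = stage.run(url);
    cost += stage.costUsd;
    stagesCompleted.push(stage.name);
    if (result && (!best || result.confidence > best.confidence)) {
      best = result;
    }
    if (best && best.confidence >= EXIT_CONFIDENCE) break; // later (paid) stages are skipped
  }
  return { best, cost, stagesCompleted };
}
```

With a rules stage that returns 60% and a Vectorize stage that returns 92%, the function stops after two stages at zero cost, which mirrors the stages_completed and cost fields in the example response above.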

Classify a Domain Directly

curl -X POST http://localhost:8787/api/admin/classifier/domain \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "stripe.com"
  }'

Response:

{
  "domain": "stripe.com",
  "classification": {
    "tier1_type": "service",
    "domain_type": "financial_service",
    "confidence": 98
  },
  "source": "rules",
  "stages_run": 1
}

Classify a Keyword

curl -X POST http://localhost:8787/api/admin/classify-keyword \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "best project management software for startups"
  }'

Response:

{
  "keyword": "best project management software for startups",
  "classification": {
    "journey_moment": "option_evaluating",
    "intent_type": "commercial_investigation",
    "query_specificity": "long_tail",
    "expertise_level": "novice",
    "has_brand_mention": false,
    "confidence": 85
  }
}

Exploring the Console

RankDisco includes an admin console for monitoring and managing classifications.

Start the Console

# From repo root
npm run console:dev

# Or from packages/console
cd packages/console
npm run dev

Open http://localhost:3000 in your browser.

Console Pages

| Page        | URL                     | Purpose                                     |
|-------------|-------------------------|---------------------------------------------|
| Dashboard   | /                       | System KPIs, health status, queue overview  |
| Domains     | /pages/domains.html     | Domain classifications, confidence charts   |
| URLs        | /pages/urls.html        | URL classifications, recent activity        |
| Keywords    | /pages/keywords.html    | Keyword classifications across 11 dimensions |
| Queues      | /pages/queues.html      | Queue depths, processing rates, retries     |
| Costs       | /pages/costs.html       | DataForSEO spend, budget tracking           |
| Corrections | /pages/corrections.html | Submit corrections, view patterns           |

Key Features

  • Real-time DAG visualization - Watch domain onboarding progress through the pipeline
  • DataTable component - Sort, filter, and paginate large datasets
  • Correction workflow - Fix misclassifications to improve the model

Understanding the Pipeline

Here's how data flows through RankDisco:

                    +------------------+
                    |   Entry Points   |
                    |  API / Webhooks  |
                    +--------+---------+
                             |
              +--------------+--------------+
              |                             |
    +---------v---------+         +---------v---------+
    |  ensureDomain()   |         |    ensureUrl()    |
    | (creates domain)  |         |   (creates URL)   |
    +---------+---------+         +---------+---------+
              |                             |
              | Not classified?             | Domain first, then URL
              |                             |
    +---------v---------+         +---------v---------+
    |  DOMAIN_CLASSIFY  |         |   URL_CLASSIFY    |
    |       QUEUE       |         |       QUEUE       |
    +---------+---------+         +---------+---------+
              |                             |
    +---------v---------+         +---------v---------+
    | 7-Stage Pipeline  |         | 5-Stage Pipeline  |
    | Rules->Vectorize  |         | Rules->Vectorize  |
    |  ->Content->LLM   |         |  ->Content->LLM   |
    +---------+---------+         +---------+---------+
              |                             |
              | >= 80% confidence           | >= 65% confidence
              |                             |
    +---------v---------+         +---------v---------+
    |  Vectorize Learn  |         |  Vectorize Learn  |
    | (Self-improving)  |         | (Self-improving)  |
    +-------------------+         +-------------------+
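
The last hop in the diagram, where a result is fed back into Vectorize only above a confidence threshold, can be expressed as a small gate. This is a sketch under the assumption that the two thresholds in the diagram (80% for domains, 65% for URLs) are the only conditions. The names EntityKind, LEARN_THRESHOLD, and shouldLearn are hypothetical.

```typescript
type EntityKind = "domain" | "url";

// Minimum confidence (0-100) before a classification is stored as a
// Vectorize training example, per the diagram above.
const LEARN_THRESHOLD: Record<EntityKind, number> = {
  domain: 80,
  url: 65,
};

// Returns true when a classification is confident enough to learn from.
function shouldLearn(kind: EntityKind, confidence: number): boolean {
  return confidence >= LEARN_THRESHOLD[kind];
}
```

Keeping the thresholds in one lookup table makes it easy to adjust them per entity type without touching the pipeline code.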

Domain Classification Dimensions

| Dimension      | Description            | Example Values                               |
|----------------|------------------------|----------------------------------------------|
| tier1_type     | High-level archetype   | platform, commerce, service, information     |
| domain_type    | Specific business type | saas_product, news_publisher, ecommerce_store |
| channel_bucket | Marketing channel      | pr_earned_media, owned_content_marketing     |
| quality_tier   | Authority level        | tier_1 (DR 80+) to tier_5 (DR 0-19)          |
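
To make the quality_tier dimension concrete, here is a sketch of a DR-to-tier mapping. Only the two endpoints come from this guide (tier_1 is DR 80+, tier_5 is DR 0-19). The boundaries for tier_2 through tier_4 are an assumption of evenly spaced 20-point bands, and the function name qualityTier is hypothetical, so check the real cutoffs before relying on this.

```typescript
// Maps a Domain Rating (0-100) to a quality tier.
// tier_1 and tier_5 boundaries are from the guide; the middle bands are ASSUMED.
function qualityTier(dr: number): string {
  if (!(dr >= 0 && dr <= 100)) {
    throw new RangeError(`DR must be between 0 and 100, got ${dr}`);
  }
  if (dr >= 80) return "tier_1";
  if (dr >= 60) return "tier_2"; // assumed band: 60-79
  if (dr >= 40) return "tier_3"; // assumed band: 40-59
  if (dr >= 20) return "tier_4"; // assumed band: 20-39
  return "tier_5";
}
```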

URL Classification Dimensions

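| Dimension     | Description      | Example Values                                 |
|---------------|------------------|------------------------------------------------|
| page_type     | Type of page     | news_article, product_page, forum_thread       |
| tactic_type   | Marketing tactic | pr_funding_announcement, guest_post_editorial  |
| is_money_page | Conversion page? | true / false                                   |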

Next Steps

Now that you have RankDisco running, explore these topics:

  • Core Documentation
  • API Reference
  • Integrations
  • Operations


Common Commands Reference

# Development
npm run api:dev # Start API locally
npm run console:dev # Start admin console
npm run docs:dev # Start documentation site

# Deployment
npm run api:deploy # Deploy API to Cloudflare
wrangler pages deploy packages/console # Deploy console

# Database
wrangler d1 execute rankfabric_db --command "SELECT COUNT(*) FROM domains"
wrangler d1 execute rankfabric_db --file=migrations/XXXX.sql

# Secrets
wrangler secret put SECRET_NAME
wrangler secret list

# Logs
wrangler tail # Stream live logs
wrangler tail --format=json | jq . # Structured logs

Troubleshooting

"DATAFORSEO_LOGIN or DATAFORSEO_PASSWORD missing"

Set your DataForSEO credentials:

wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD

Queues Not Processing

Check if queue consumers are enabled in wrangler.toml. Some are disabled by default for maintenance:

# Enable by uncommenting:
[[queues.consumers]]
queue = "domain-classify"
max_batch_size = 20
max_concurrency = 20

Classifications Stuck at Low Confidence

The system learns from high-confidence LLM results. If Vectorize has few examples:

  1. Manually classify some seed domains via the console
  2. Lower the confidence threshold temporarily
  3. Run /api/admin/classifier/corrections/learn to process pending feedback

Local Development Without Cloudflare Resources

When you test without the full Cloudflare setup, the API degrades gracefully:

  • Vectorize lookups return empty results, so classification falls through to the LLM
  • Queue sends become no-ops
  • D1 is the exception: it requires at least a local database binding
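
The three rules above can be turned into a quick startup check that tells you which mode you are running in. This is a sketch only: the binding names DB, VECTOR_INDEX, and CLASSIFY_QUEUE are placeholders chosen for illustration (the real names are whatever wrangler.toml defines), and describeLocalMode is a hypothetical helper.

```typescript
// Placeholder binding names for illustration; the real ones live in wrangler.toml.
interface LocalBindings {
  DB?: object; // D1 database
  VECTOR_INDEX?: object; // Vectorize index
  CLASSIFY_QUEUE?: object; // Queue producer
}

// Reports which features are degraded and whether the API can run at all.
function describeLocalMode(env: LocalBindings): { canStart: boolean; notes: string[] } {
  const notes: string[] = [];
  if (!env.VECTOR_INDEX) {
    notes.push("Vectorize missing: lookups return empty and fall through to the LLM");
  }
  if (!env.CLASSIFY_QUEUE) {
    notes.push("Queue missing: sends become no-ops");
  }
  if (!env.DB) {
    notes.push("D1 missing: a local database binding is required");
  }
  // D1 is the only hard requirement listed in this guide.
  return { canStart: Boolean(env.DB), notes };
}
```

Logging the notes once at startup makes it obvious why a local run is skipping stages.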

Getting Help

Happy classifying!