Getting Started with RankDisco

Welcome to RankDisco, an SEO intelligence platform built on Cloudflare Workers. This guide will get you from zero to classifying URLs in about 5 minutes.

What is RankDisco?

RankDisco is a backlink and URL classification engine that automatically categorizes domains, URLs, and keywords using a multi-stage AI pipeline. It powers SEO intelligence workflows including:

Domain Classification - Determine if a site is SaaS, ecommerce, news, forum, etc.
URL Classification - Identify page types (blog post, product page, forum thread)
Keyword Classification - Understand search intent, buyer journey stage, and more
Backlink Intelligence - Analyze backlink profiles with marketing DNA insights
SERP Tracking - Monitor keyword rankings over time

Who is this for?

SEO teams building backlink analysis tools
Marketing platforms needing URL categorization
Developers integrating classification into their apps

Key Features:

Cost-optimized multi-stage classification pipeline: 7 stages for domains, 5 for URLs (FREE stages first, paid only when needed)
Self-learning system via Vectorize embeddings
Real-time pipeline monitoring via WebSocket
Admin console for reviewing and correcting classifications

Quick Start

Prerequisites

Before you begin, ensure you have:

Node.js 20+ installed
Wrangler CLI: npm install -g wrangler
Cloudflare account with Workers, D1, Queues, and Vectorize enabled
DataForSEO account (for backlink and keyword data)

1. Clone and Install

git clone https://github.com/your-org/rankdisco.git
cd rankdisco

# Install all workspace dependencies
npm install

2. Configure Cloudflare

# Authenticate with Cloudflare
wrangler login

# The project uses these Cloudflare resources (already configured in wrangler.toml):
# - D1 Database: rankfabric_db
# - KV Namespaces: LOOKUP_CACHE, PIPELINE_ACTIVITY, BRAND_CACHE, etc.
# - Queues: domain-classify, url-classify, keyword-classify, domain-onboard
# - Vectorize Indexes: domain-classifier, backlink-classifier, keyword-classifier
# - R2 Bucket: rankfabric-payloads

3. Set Up Secrets

Configure your API credentials:

cd packages/api

# DataForSEO credentials (required)
wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD

# Optional integrations
wrangler secret put OPENAI_API_KEY      # For enhanced LLM classification
wrangler secret put ZENROWS_API_KEY     # For web scraping fallback

4. Run Database Migrations

Apply the D1 schema:

# List available migrations
ls packages/api/migrations/

# Apply all migrations
wrangler d1 execute rankfabric_db --file=packages/api/migrations/0001_initial.sql
# ... apply subsequent migrations in order
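
The "apply subsequent migrations in order" step can be scripted. The following is a minimal sketch, assuming every migration file carries a numeric prefix like the one in 0001_initial.sql. The helper name `migrationCommands` is hypothetical, and it only builds the command strings rather than running them:

```typescript
// Hypothetical helper: builds one `wrangler d1 execute` command per migration
// file, ordered by the numeric prefix of the file name.
function migrationCommands(files: string[], database = "rankfabric_db"): string[] {
  const prefix = (name: string): number => {
    // parseInt reads the leading digits, so "0001_initial.sql" gives 1.
    const n = parseInt(name, 10);
    // Assumption: files without a numeric prefix sort last.
    return Number.isNaN(n) ? Number.MAX_SAFE_INTEGER : n;
  };
  return files
    .filter((name) => name.endsWith(".sql")) // ignore anything that is not SQL
    .sort((a, b) => prefix(a) - prefix(b) || a.localeCompare(b))
    .map((name) => `wrangler d1 execute ${database} --file=packages/api/migrations/${name}`);
}
```

Printing the returned strings gives a list you can review before pasting into a shell.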

5. Start Development Server

# From the repo root
npm run api:dev

# Or from packages/api
cd packages/api
npm run dev

You should see:

Ready on http://localhost:8787

6. Verify It Works

curl http://localhost:8787/health

Expected response:

{
  "status": "ok",
  "version": "1.0.0",
  "timestamp": "2025-01-20T12:00:00Z"
}
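
If you want to check the health response from a script instead of by eye, a small type guard is enough. This is a sketch based only on the three fields shown above; the names `HealthResponse` and `isHealthy` are illustrative and not part of RankDisco:

```typescript
// Shape of the /health payload, taken from the expected response above.
interface HealthResponse {
  status: string;
  version: string;
  timestamp: string;
}

// Type guard: true only for an object whose status is "ok" and whose
// version and timestamp are strings.
function isHealthy(body: unknown): body is HealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const record = body as Record<string, unknown>;
  return (
    record["status"] === "ok" &&
    typeof record["version"] === "string" &&
    typeof record["timestamp"] === "string"
  );
}
```

Pass it the parsed JSON body of the `/health` response, for example the result of `JSON.parse` on the curl output.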

Your First Domain

Let's onboard a domain and fetch its backlink profile. Domain onboarding triggers:

Backlink fetching from DataForSEO
Keyword ranking data collection
Automatic classification of the domain and its backlink sources

Onboard a Domain

curl -X POST http://localhost:8787/api/admin/domains/onboard \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "notion.so",
    "options": {
      "backlinks_limit": 50,
      "keywords_limit": 100
    }
  }'

Response:

{
  "success": true,
  "domain_id": 12345,
  "workflow_id": "wf_abc123xyz",
  "status_url": "/api/admin/workflow/domain-onboard/wf_abc123xyz"
}
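
From application code, the same request and response can be handled with two small helpers. This is a sketch that mirrors the curl example and the response fields above; the function names are hypothetical, and the default limits simply repeat the values from the example:

```typescript
// Response fields as shown in the onboarding response above.
interface OnboardResponse {
  success: boolean;
  domain_id: number;
  workflow_id: string;
  status_url: string;
}

// Builds the JSON body for POST /api/admin/domains/onboard.
function buildOnboardBody(domain: string, backlinksLimit = 50, keywordsLimit = 100): string {
  return JSON.stringify({
    domain,
    options: { backlinks_limit: backlinksLimit, keywords_limit: keywordsLimit },
  });
}

// Joins the API base URL with the relative status_url from the response.
function absoluteStatusUrl(baseUrl: string, response: OnboardResponse): string {
  if (!response.success) throw new Error("onboarding request was not accepted");
  // Drop trailing slashes so the joined URL has exactly one separator.
  return baseUrl.replace(/\/+$/, "") + response.status_url;
}
```

The URL returned by `absoluteStatusUrl` is the one to poll for onboarding progress.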

Check Onboarding Progress

curl http://localhost:8787/api/admin/workflow/domain-onboard/wf_abc123xyz

Response:

{
  "id": "wf_abc123xyz",
  "status": "running",
  "steps": [
    { "name": "initialize", "status": "completed" },
    { "name": "fetch-backlinks", "status": "running" },
    { "name": "fetch-keywords", "status": "pending" },
    { "name": "fetch-summary", "status": "pending" },
    { "name": "finalize", "status": "pending" }
  ]
}
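
When polling this endpoint, it helps to reduce the step list to a short progress summary. The sketch below works on the response shape shown above. The `summarizeWorkflow` name is hypothetical, and treating "every step completed" as the finished state is an assumption, since this guide only shows the `completed`, `running`, and `pending` step statuses:

```typescript
interface WorkflowStep {
  name: string;
  status: string; // "completed", "running", or "pending" in the example above
}

interface WorkflowStatus {
  id: string;
  status: string;
  steps: WorkflowStep[];
}

interface Progress {
  completed: number;
  total: number;
  current: string | null; // name of the running step, if any
  done: boolean;
}

function summarizeWorkflow(workflow: WorkflowStatus): Progress {
  const completed = workflow.steps.filter((s) => s.status === "completed").length;
  const running = workflow.steps.find((s) => s.status === "running");
  return {
    completed,
    total: workflow.steps.length,
    current: running ? running.name : null,
    // Assumption: the workflow is finished once every step is "completed".
    done: workflow.steps.length > 0 && completed === workflow.steps.length,
  };
}
```

For the response above, the summary reports 1 of 5 steps completed, with `fetch-backlinks` as the current step.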

View the Domain

Once complete, retrieve the domain data:

curl "http://localhost:8787/api/admin/domains?domain=notion.so"

Response:

{
  "domain": "notion.so",
  "tier1_type": "platform",
  "domain_type": "saas_product",
  "classification_confidence": 95,
  "classification_source": "rules",
  "backlinks_count": 50,
  "keywords_count": 100,
  "onboard_status": "complete"
}

Your First Classification

RankDisco can classify individual URLs through a 5-stage pipeline that balances cost and accuracy.

Classify a URL

curl -X POST http://localhost:8787/api/admin/classifier/classify \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/"
  }'

Response:

{
  "url": "https://techcrunch.com/2025/01/15/ai-startup-raises-100m/",
  "domain": "techcrunch.com",
  "classification": {
    "tier1_type": "information",
    "domain_type": "news_publisher",
    "page_type": "news_article",
    "tactic_type": "pr_funding_announcement",
    "confidence": 92
  },
  "stages_completed": ["rules", "vectorize"],
  "cost": 0
}

Classification Stages Explained

Stage	Name	Cost	What It Does
0	Cache	FREE	Check if already classified
1	Rules	FREE	Pattern matching (7,100+ known domains)
2	Vectorize	FREE	Semantic similarity to labeled examples
3	Content	$0.000125	DataForSEO Instant Pages fetch
4	LLM	~$0.0001	Workers AI fallback

The pipeline exits as soon as a stage reaches 70% confidence or higher, so the paid stages run only when the free stages are inconclusive.
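
The early-exit behavior can be illustrated with a short sketch. It is not RankDisco's implementation: the `Stage` interface and `classifyWithEarlyExit` function are hypothetical, stages run synchronously to keep the example small, and keeping the highest-confidence result seen so far is an assumption:

```typescript
interface StageResult {
  label: string;
  confidence: number; // 0-100
}

interface Stage {
  name: string;
  costUsd: number; // 0 for the FREE stages
  run: (url: string) => StageResult | null; // null when the stage has no answer
}

interface PipelineOutcome {
  result: StageResult | null;
  stagesCompleted: string[];
  cost: number;
}

// Runs stages in order and stops once the best result reaches the threshold.
function classifyWithEarlyExit(url: string, stages: Stage[], threshold = 70): PipelineOutcome {
  let best: StageResult | null = null;
  const stagesCompleted: string[] = [];
  let cost = 0;
  for (const stage of stages) {
    const result = stage.run(url);
    stagesCompleted.push(stage.name);
    cost += stage.costUsd;
    if (result !== null && (best === null || result.confidence > best.confidence)) {
      best = result;
    }
    // Early exit: later (paid) stages never run once we are confident enough.
    if (best !== null && best.confidence >= threshold) break;
  }
  return { result: best, stagesCompleted, cost };
}
```

If the rules stage returns 60% and the Vectorize stage returns 92%, the loop stops after Vectorize, and the outcome lists `rules` and `vectorize` with a cost of 0, which matches the URL example above.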

Classify a Domain Directly

curl -X POST http://localhost:8787/api/admin/classifier/domain \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "stripe.com"
  }'

Response:

{
  "domain": "stripe.com",
  "classification": {
    "tier1_type": "service",
    "domain_type": "financial_service",
    "confidence": 98
  },
  "source": "rules",
  "stages_run": 1
}

Classify a Keyword

curl -X POST http://localhost:8787/api/admin/classify-keyword \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "best project management software for startups"
  }'

Response:

{
  "keyword": "best project management software for startups",
  "classification": {
    "journey_moment": "option_evaluating",
    "intent_type": "commercial_investigation",
    "query_specificity": "long_tail",
    "expertise_level": "novice",
    "has_brand_mention": false,
    "confidence": 85
  }
}

Exploring the Console

RankDisco includes an admin console for monitoring and managing classifications.

Start the Console

# From repo root
npm run console:dev

# Or from packages/console
cd packages/console
npm run dev

Open http://localhost:3000 in your browser.

Console Pages

Page	URL	Purpose
Dashboard	`/`	System KPIs, health status, queue overview
Domains	`/pages/domains.html`	Domain classifications, confidence charts
URLs	`/pages/urls.html`	URL classifications, recent activity
Keywords	`/pages/keywords.html`	Keyword classifications across 11 dimensions
Queues	`/pages/queues.html`	Queue depths, processing rates, retries
Costs	`/pages/costs.html`	DataForSEO spend, budget tracking
Corrections	`/pages/corrections.html`	Submit corrections, view patterns

Key Features

Real-time DAG visualization - Watch domain onboarding progress through the pipeline
DataTable component - Sort, filter, and paginate large datasets
Correction workflow - Fix misclassifications to improve the model

Understanding the Pipeline

Here's how data flows through RankDisco:

                    +------------------+
                    |   Entry Points   |
                    |  API / Webhooks  |
                    +--------+---------+
                             |
              +--------------+--------------+
              |                             |
    +---------v---------+         +---------v---------+
    |   ensureDomain()  |         |    ensureUrl()    |
    |  (creates domain) |         |   (creates URL)   |
    +---------+---------+         +---------+---------+
              |                             |
              |  Not classified?            |  Domain first, then URL
              |                             |
    +---------v---------+         +---------v---------+
    | DOMAIN_CLASSIFY   |         |  URL_CLASSIFY     |
    |     QUEUE         |         |     QUEUE         |
    +---------+---------+         +---------+---------+
              |                             |
    +---------v---------+         +---------v---------+
    | 7-Stage Pipeline  |         | 5-Stage Pipeline  |
    |  Rules->Vectorize |         |  Rules->Vectorize |
    |  ->Content->LLM   |         |  ->Content->LLM   |
    +---------+---------+         +---------+---------+
              |                             |
              |  >= 80% confidence          |  >= 65% confidence
              |                             |
    +---------v---------+         +---------v---------+
    | Vectorize Learn   |         | Vectorize Learn   |
    | (Self-improving)  |         | (Self-improving)  |
    +-------------------+         +-------------------+
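
Two rules from the diagram are easy to express in code: a URL's domain is queued for classification before the URL itself, and a result feeds back into Vectorize only above a per-type confidence threshold. The sketch below uses the queue names and thresholds from this guide, while the function names are hypothetical:

```typescript
type EntityKind = "domain" | "url";

// Learning thresholds from the diagram, in percent confidence.
const LEARN_THRESHOLD: Record<EntityKind, number> = { domain: 80, url: 65 };

// True when a classification is confident enough to be stored as a
// Vectorize example.
function shouldLearn(kind: EntityKind, confidence: number): boolean {
  return confidence >= LEARN_THRESHOLD[kind];
}

// Queues to send to for a URL, in order: domain first, then URL.
function queuesForUrl(domainClassified: boolean, urlClassified: boolean): string[] {
  const queues: string[] = [];
  if (!domainClassified) queues.push("domain-classify");
  if (!urlClassified) queues.push("url-classify");
  return queues;
}
```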

Domain Classification Dimensions

Dimension	Description	Example Values
`tier1_type`	High-level archetype	`platform`, `commerce`, `service`, `information`
`domain_type`	Specific business type	`saas_product`, `news_publisher`, `ecommerce_store`
`channel_bucket`	Marketing channel	`pr_earned_media`, `owned_content_marketing`
`quality_tier`	Authority level	`tier_1` (DR 80+) to `tier_5` (DR 0-19)
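
As an illustration of the `quality_tier` dimension, the sketch below maps a DR score to a tier. Only the end points come from the table (`tier_1` is DR 80+ and `tier_5` is DR 0-19). The even 20-point bands for `tier_2` through `tier_4` and the `qualityTier` name are assumptions; check the Classification Dimensions reference for the real boundaries:

```typescript
// Maps a domain rating (0-100) to a quality tier.
function qualityTier(domainRating: number): string {
  if (!Number.isFinite(domainRating) || domainRating < 0 || domainRating > 100) {
    throw new RangeError("domainRating must be between 0 and 100");
  }
  if (domainRating >= 80) return "tier_1"; // from the table: DR 80+
  if (domainRating >= 60) return "tier_2"; // assumed band: DR 60-79
  if (domainRating >= 40) return "tier_3"; // assumed band: DR 40-59
  if (domainRating >= 20) return "tier_4"; // assumed band: DR 20-39
  return "tier_5"; // from the table: DR 0-19
}
```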

URL Classification Dimensions

Dimension	Description	Example Values
`page_type`	Type of page	`news_article`, `product_page`, `forum_thread`
`tactic_type`	Marketing tactic	`pr_funding_announcement`, `guest_post_editorial`
`is_money_page`	Conversion page?	`true` / `false`

Next Steps

Now that you have RankDisco running, explore these topics:

Core Documentation

Architecture Overview - System design and data flows
Classification Dimensions - Complete taxonomy reference
Domain Onboarding - Deep dive into the onboarding workflow

API Reference

Public API - Client-facing endpoints
Internal API - Admin and operational endpoints
API Endpoints - DataForSEO wrapper endpoints

Integrations

DataForSEO Integration - API setup and cost management
ClickHouse Integration - Analytics storage
React Client Integration - Frontend integration patterns

Operations

Runbook - Deployment, monitoring, troubleshooting
Cost Tracking - Managing DataForSEO spend
Crawl Management - Queue and rate limit configuration

Common Commands Reference

# Development
npm run api:dev          # Start API locally
npm run console:dev      # Start admin console
npm run docs:dev         # Start documentation site

# Deployment
npm run api:deploy       # Deploy API to Cloudflare
wrangler pages deploy packages/console  # Deploy console

# Database
wrangler d1 execute rankfabric_db --command "SELECT COUNT(*) FROM domains"
wrangler d1 execute rankfabric_db --file=migrations/XXXX.sql

# Secrets
wrangler secret put SECRET_NAME
wrangler secret list

# Logs
wrangler tail             # Stream live logs
wrangler tail --format=json | jq .  # Structured logs

Troubleshooting

"DATAFORSEO_LOGIN or DATAFORSEO_PASSWORD missing"

Set your DataForSEO credentials:

wrangler secret put DATAFORSEO_LOGIN
wrangler secret put DATAFORSEO_PASSWORD

Queues Not Processing

Check whether the queue consumers are enabled in wrangler.toml. Some are commented out by default for maintenance:

# Enable by uncommenting:
[[queues.consumers]]
queue = "domain-classify"
max_batch_size = 20
max_concurrency = 20

Classifications Stuck at Low Confidence

The system learns from high-confidence LLM results, so confidence can stay low while Vectorize has few labeled examples. To seed it:

Manually classify some seed domains via the console
Lower the confidence threshold temporarily
Run /api/admin/classifier/corrections/learn to process pending feedback

Local Development Without Cloudflare Resources

When you test without a full Cloudflare setup, the API degrades gracefully:

Vectorize lookups return empty results (classification falls through to the LLM)
Queue sends become no-ops
D1 requires at least a local database binding
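
The three rules above amount to a check on which bindings are present. The following sketch shows the idea. The binding names (`DB`, `VECTORIZE`, `CLASSIFY_QUEUE`) and the `resolveCapabilities` function are hypothetical and are not taken from the project's wrangler.toml:

```typescript
// Hypothetical, simplified view of the Worker environment: every binding
// may be missing in local development.
interface Bindings {
  DB?: object;
  VECTORIZE?: object;
  CLASSIFY_QUEUE?: object;
}

interface Capabilities {
  vectorize: "live" | "returns-empty";
  queues: "live" | "no-op";
}

// Describes how the API behaves with the bindings that are available.
function resolveCapabilities(env: Bindings): Capabilities {
  // D1 is the one hard requirement, even for local development.
  if (env.DB === undefined) {
    throw new Error("a D1 database binding is required");
  }
  return {
    vectorize: env.VECTORIZE !== undefined ? "live" : "returns-empty",
    queues: env.CLASSIFY_QUEUE !== undefined ? "live" : "no-op",
  };
}
```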

Getting Help

Documentation: docs.rankdisco.com
Issues: GitHub Issues for bug reports
Architecture Questions: See Architecture Overview

Happy classifying!
