SERP Tracking System
Daily and on-demand search engine ranking monitoring with historical analytics.
Overview
Track keyword rankings in Google search results for customer projects. Supports organic positions, local pack results, and paid ad positions.
Key Features:
- Daily automated tracking (cron-based)
- On-demand tracking (API-triggered)
- Multi-location support
- Device targeting (desktop/mobile)
- Historical trend analysis
- ClickHouse analytics storage
- Task-based async mode for large batches
Architecture
┌─────────────┐
│ Daily Cron │ 10:00 UTC (2 AM PST)
└──────┬──────┘
│
├─> Query D1: SELECT active keywords
│
├─> For each keyword:
│ └─> QUEUE.send({type: 'serp_tracking'})
│
└─> Queue Consumer (max_concurrency: 20)
├─> Call DataForSEO SERP API
├─> Parse results
├─> Save to D1: serp_runs, serp_positions
└─> Queue to ClickHouse for analytics
Tracking Modes
1. Daily Automated Tracking
Scheduled via cron trigger at 10:00 UTC (2 AM PST).
Flow:
- Cron triggers lib/serp-tracking.handleDailySerpTracking
- Query D1 for active keywords (with project subscriptions)
- Enqueue serp_tracking jobs to main queue
- Queue consumer processes with DataForSEO
- Results stored in D1 + queued to ClickHouse
Database Query:
SELECT k.*, p.domain
FROM keywords k
JOIN projects p ON k.project_id = p.id
JOIN customer_subscriptions cs ON p.id = cs.project_id
WHERE k.is_active = 1
AND cs.subscription_type LIKE '%serp%'
AND cs.is_active = 1
AND (cs.expires_at IS NULL OR cs.expires_at > ?)
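The enqueue step can be sketched as a pure mapping from the rows returned by this query to queue messages. The message shape below mirrors the {type: 'serp_tracking'} payload from the architecture diagram; the function name and the other fields are illustrative assumptions, not the actual lib/serp-tracking code.

```javascript
// Map active-keyword rows (keywords joined with projects, as in the D1
// query above) to the messages sent to the main queue.
function buildSerpTrackingMessages(keywordRows) {
  return keywordRows.map((row) => ({
    type: 'serp_tracking',        // matches the diagram's QUEUE.send payload
    keyword_id: row.id,
    keyword: row.keyword,
    project_id: row.project_id,
    location_code: row.location_code,
    device: row.device,
    domain: row.domain,           // joined from projects in the D1 query
  }));
}
```

Keeping this mapping pure (no bindings, no I/O) makes it easy to unit-test independently of the cron handler.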
Configuration:
- Modify the cron schedule in wrangler.toml:

[triggers]
crons = ["0 10 * * *"] # 10:00 UTC = 2 AM PST
2. On-Demand Tracking
Trigger tracking for specific keywords immediately.
Endpoint: POST /api/rankings/track
{
"project_id": "proj_123",
"keyword": "project management software",
"location_code": 2840,
"device": "desktop"
}
Response:
{
"success": true,
"run_id": "serp_run_abc123",
"positions": [
{
"rank": 3,
"url": "https://example.com/features",
"title": "Best Project Management Software",
"type": "organic"
}
],
"cached": false
}
Deduplication:
- Results cached in D1 for 1 hour (same keyword + location + device)
- Subsequent requests return cached results
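The one-hour dedup rule above amounts to a cache key over keyword + location + device plus a timestamp check. A minimal sketch (the key format and helper names are illustrative, not the server's implementation):

```javascript
const SERP_CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour, per the dedup rule

// Build the dedup key from the three fields that define a unique lookup.
function buildSerpCacheKey(keyword, locationCode, device) {
  return `serp:${keyword.toLowerCase().trim()}:${locationCode}:${device}`;
}

// A cached run is reusable if it was created within the TTL window.
function isCacheFresh(createdAtMs, nowMs = Date.now()) {
  return nowMs - createdAtMs < SERP_CACHE_TTL_MS;
}
```

Normalizing the keyword (lowercase, trimmed) before keying avoids paying twice for "SEO tools" and "seo tools ".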
3. Task-Based Async Mode
For large batches, use task mode to avoid HTTP timeouts.
Step 1: Create Task
Endpoint: POST /api/serp/tasks/create
{
"keywords": [
{
"keyword": "project management",
"location_code": 2840,
"device": "desktop"
},
{
"keyword": "task tracking software",
"location_code": 2840,
"device": "mobile"
}
],
"project_id": "proj_123"
}
Response:
{
"task_id": "task_abc123",
"status": "pending",
"total_keywords": 2
}
Step 2: Poll Status
Endpoint: GET /api/serp/tasks/TASK_ID
{
"task_id": "task_abc123",
"status": "completed",
"completed_keywords": 2,
"total_keywords": 2,
"results": [...]
}
Data Models
D1 Tables
queries
CREATE TABLE queries (
id TEXT PRIMARY KEY,
keyword TEXT NOT NULL,
location_code INTEGER,
language_code TEXT,
device TEXT,
created_at INTEGER NOT NULL
);
serp_runs
CREATE TABLE serp_runs (
id TEXT PRIMARY KEY,
query_id TEXT NOT NULL,
project_id TEXT,
check_url TEXT,
items_count INTEGER,
se_results_count INTEGER,
created_at INTEGER NOT NULL,
FOREIGN KEY (query_id) REFERENCES queries(id)
);
serp_positions
CREATE TABLE serp_positions (
id TEXT PRIMARY KEY,
serp_run_id TEXT NOT NULL,
rank_group INTEGER,
rank_absolute INTEGER,
position TEXT,
xpath TEXT,
domain TEXT,
url TEXT,
title TEXT,
description TEXT,
type TEXT, -- 'organic', 'local_pack', 'paid'
is_paid INTEGER DEFAULT 0,
rating_value REAL,
rating_count INTEGER,
created_at INTEGER NOT NULL,
FOREIGN KEY (serp_run_id) REFERENCES serp_runs(id)
);
ClickHouse Tables
queries
CREATE TABLE queries (
query_id String,
keyword String,
location_code UInt32,
language_code String,
device String,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (keyword, location_code, created_at);
serp_runs
CREATE TABLE serp_runs (
run_id String,
query_id String,
project_id String,
customer_id String,
check_url String,
items_count UInt16,
se_results_count UInt32,
run_date Date,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (project_id, query_id, run_date);
serp_positions
CREATE TABLE serp_positions (
position_id String,
serp_run_id String,
query_id String,
rank_group UInt8,
rank_absolute UInt16,
position String,
domain String,
url String,
title String,
description String,
type String,
is_paid UInt8,
rating_value Float32,
rating_count UInt32,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (query_id, serp_run_id, rank_absolute);
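With the ORDER BY keys above, a per-keyword trend query stays on the sort order of both tables. An illustrative sketch (table and column names match the schemas above; the literals are placeholders):

SELECT
    r.run_date,
    min(p.rank_absolute) AS best_rank
FROM serp_positions AS p
INNER JOIN serp_runs AS r ON p.serp_run_id = r.run_id
WHERE r.project_id = 'proj_123'
  AND p.query_id = 'query_xyz789'
  AND p.type = 'organic'
  AND r.run_date >= today() - 30
GROUP BY r.run_date
ORDER BY r.run_date;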
DataForSEO Integration
Live Mode (Synchronous)
Endpoint: https://api.dataforseo.com/v3/serp/google/organic/live/advanced
Request:
[{
"keyword": "project management software",
"location_code": 2840,
"language_code": "en",
"device": "desktop",
"depth": 100
}]
Use Case: On-demand tracking, immediate results needed
Cost: ~$0.003 per request
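Parsing the live response into rows for serp_positions can be sketched as below. The tasks[0].result[0].items path follows DataForSEO's v3 response envelope; treat the exact item field names as assumptions to verify against the API docs.

```javascript
// Extract position rows from a DataForSEO live/advanced response body.
function parseLiveSerpResponse(body) {
  const result = body?.tasks?.[0]?.result?.[0];
  if (!result) return { items: [], seResultsCount: 0 };
  const items = (result.items || [])
    // Keep only the result types this system tracks.
    .filter((item) => ['organic', 'local_pack', 'paid'].includes(item.type))
    .map((item) => ({
      rank_group: item.rank_group,
      rank_absolute: item.rank_absolute,
      domain: item.domain,
      url: item.url,
      title: item.title,
      description: item.description,
      type: item.type,
      is_paid: item.type === 'paid' ? 1 : 0, // matches the D1 is_paid column
    }));
  return { items, seResultsCount: result.se_results_count || 0 };
}
```

Filtering on type here also drops SERP features (people-also-ask, featured snippets) that the current schema does not store.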
Task Mode (Asynchronous)
Post Task: https://api.dataforseo.com/v3/serp/google/organic/task_post
Get Results: https://api.dataforseo.com/v3/serp/google/organic/task_get/TASK_ID
Use Case: Large batches, cost optimization (cheaper than live mode)
Cost: ~$0.002 per request
API Reference
POST /api/rankings/track
Track keyword ranking immediately.
Request:
{
"project_id": "proj_123",
"keyword": "project management software",
"location_code": 2840,
"device": "desktop",
"check_url": "example.com" // Optional: filter results
}
Response:
{
"success": true,
"run_id": "serp_run_abc123",
"query_id": "query_xyz789",
"positions": [
{
"rank": 3,
"url": "https://example.com/features",
"title": "Best Project Management Software",
"description": "...",
"type": "organic",
"rating_value": 4.5,
"rating_count": 1234
}
],
"cached": false,
"se_results_count": 45000000
}
GET /api/rankings
Get historical rankings for a project.
Query Params:
- project_id (required)
- keyword (optional, filter by keyword)
- start_date (optional, ISO date)
- end_date (optional, ISO date)
Response:
{
"project_id": "proj_123",
"rankings": [
{
"date": "2024-01-15",
"keyword": "project management",
"rank": 3,
"url": "https://example.com/features",
"previous_rank": 5,
"change": 2
}
]
}
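In the response above, change follows previous_rank - rank, so a positive value means the keyword moved up. A small helper makes the convention explicit (illustrative, not the server's implementation):

```javascript
// Compute the change field shown in /api/rankings responses.
// Positive change = improvement (e.g. rank 5 -> 3 gives change 2).
// A keyword with no prior run has no meaningful change, so return null.
function rankChange(previousRank, currentRank) {
  if (previousRank == null || currentRank == null) return null;
  return previousRank - currentRank;
}
```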
POST /api/serp/tasks/create
Create async SERP tracking task.
Request:
{
"keywords": [
{"keyword": "...", "location_code": 2840, "device": "desktop"}
],
"project_id": "proj_123"
}
Response:
{
"task_id": "task_abc123",
"status": "pending",
"total_keywords": 1
}
GET /api/serp/tasks/TASK_ID
Poll task status.
Response:
{
"task_id": "task_abc123",
"status": "completed",
"completed_keywords": 1,
"total_keywords": 1,
"results": [...]
}
GET /api/google-ads-status
Health check for DataForSEO SERP API.
Response:
{
"status": "operational",
"credits_remaining": 9876,
"daily_limit": 10000
}
GET /api/domain-metrics
Get aggregated domain metrics from ClickHouse.
Query Params:
- domain (required)
- start_date (optional)
- end_date (optional)
Response:
{
"domain": "example.com",
"total_keywords": 234,
"avg_rank": 12.5,
"top_10_count": 45,
"top_3_count": 12,
"trending_up": 23,
"trending_down": 8
}
Configuration
Queue Settings
[[queues.consumers]]
queue = "rankfabric-tasks"
max_batch_size = 1
max_batch_timeout = 30
max_retries = 3
max_concurrency = 20 # Balance: Fast processing vs DataForSEO rate limits
Note: DataForSEO has a 30 req/sec limit. Keep max_concurrency ≤ 20 to avoid rate limit errors.
Cron Schedule
[triggers]
crons = [
"0 10 * * *" # 10:00 UTC = 2 AM PST
]
To change tracking time, modify the cron expression:
- 0 2 * * * = 2 AM UTC
- 0 14 * * * = 2 PM UTC (6 AM PST)
Customer Subscriptions
Tracking runs only for customers with active SERP subscriptions.
Database Setup
CREATE TABLE customer_subscriptions (
id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
project_id TEXT NOT NULL,
subscription_type TEXT, -- 'serp_organic', 'serp_local', 'app_tracking'
is_active INTEGER DEFAULT 1,
started_at INTEGER NOT NULL,
expires_at INTEGER,
created_at INTEGER NOT NULL
);
Subscription Types
- serp_organic - Organic rankings
- serp_local - Local pack rankings
- serp_paid - Paid ad positions
Billing Model
- Per keyword per day: Each tracked keyword counts as 1 billable unit
- Retrieval: Query serp_runs to count daily runs per customer
-- Run against the ClickHouse serp_runs table (it carries customer_id and run_date;
-- the D1 table has neither)
SELECT customer_id, COUNT(*) AS billable_units
FROM serp_runs
WHERE run_date = today()
GROUP BY customer_id;
React Integration
Display Current Rankings
const res = await fetch(`/api/rankings?project_id=${projectId}`);
const data = await res.json();
data.rankings.forEach(ranking => {
  const arrow = ranking.change > 0 ? '↑' : '↓';
  console.log(`${ranking.keyword}: #${ranking.rank} (${arrow}${Math.abs(ranking.change)})`);
});
Trigger On-Demand Tracking
const response = await fetch('/api/rankings/track', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    project_id: 'proj_123',
    keyword: 'project management',
    location_code: 2840,
    device: 'desktop'
  })
});
const { positions } = await response.json();
Build Historical Charts
Query ClickHouse via /api/domain-metrics for aggregated trends.
Troubleshooting
Daily tracking not running
Check:
- Cron trigger configured in wrangler.toml
- customer_subscriptions table has active subscriptions
- Worker logs for cron execution:

wrangler tail --format pretty
DataForSEO rate limit errors
Symptoms: 429 errors in logs
Fix: Reduce max_concurrency in queue consumer:
[[queues.consumers]]
max_concurrency = 10 # Lower from 20
Results not appearing in ClickHouse
Check:
- ClickHouse ingestion queue health
- Schema initialized: GET /init/clickhouse
- ClickHouse credentials valid:

curl https://your-worker.workers.dev/test/clickhouse
Duplicate tracking runs
Check: D1 serp_runs table for duplicate query_id + created_at within 1 hour
Fix: Ensure deduplication logic in lib/serp-tracking.js is active
Performance & Costs
DataForSEO Costs
- Live mode: $0.003 per keyword
- Task mode: $0.002 per keyword
- Daily cost example: 1,000 keywords/day = $2-3/day
ClickHouse Ingestion
- Batched inserts (100 rows per batch) reduce request overhead
- Typical ingestion: 1,000 keywords = 10 ClickHouse requests
Queue Processing
- With max_concurrency: 20, expect ~1,200 keywords/hour
- Increase concurrency for faster processing (watch DataForSEO limits)
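The ~1,200 keywords/hour figure follows from concurrency × (3600 / average seconds per job); at max_concurrency: 20 that implies roughly 60 s end to end per job. A rough planning formula, not a measured benchmark:

```javascript
// Estimate queue throughput in keywords/hour given the consumer's
// max_concurrency and an average end-to-end seconds per job
// (DataForSEO call + parsing + D1 write + ClickHouse enqueue).
function estimateThroughputPerHour(maxConcurrency, avgJobSeconds) {
  return Math.floor(maxConcurrency * (3600 / avgJobSeconds));
}
```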
Future Enhancements
- Local pack position tracking (separate endpoint)
- Paid ad position monitoring
- SERP feature detection (featured snippets, people also ask)
- Competitor domain tracking
- Automated alerts for rank changes
- Multi-engine support (Bing, Yahoo)