SERP Tracking System

Daily and on-demand search engine ranking monitoring with historical analytics.


Overview

Track keyword rankings in Google search results for customer projects. Supports organic positions, local pack results, and paid ad positions.

Key Features:

  • Daily automated tracking (cron-based)
  • On-demand tracking (API-triggered)
  • Multi-location support
  • Device targeting (desktop/mobile)
  • Historical trend analysis
  • ClickHouse analytics storage
  • Task-based async mode for large batches

Architecture

┌─────────────┐
│ Daily Cron  │  10:00 UTC (2 AM PST)
└──────┬──────┘
       │
       ├─> Query D1: SELECT active keywords
       │
       ├─> For each keyword:
       │     └─> QUEUE.send({type: 'serp_tracking'})
       │
       └─> Queue Consumer (max_concurrency: 20)
             ├─> Call DataForSEO SERP API
             ├─> Parse results
             ├─> Save to D1: serp_runs, serp_positions
             └─> Queue to ClickHouse for analytics

Tracking Modes

1. Daily Automated Tracking

Scheduled via cron trigger at 10:00 UTC (2 AM PST).

Flow:

  1. Cron triggers lib/serp-tracking.handleDailySerpTracking
  2. Query D1 for active keywords (with project subscriptions)
  3. Enqueue serp_tracking jobs to main queue
  4. Queue consumer processes with DataForSEO
  5. Results stored in D1 + queued to ClickHouse
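The enqueue step (3) can be sketched as follows. This is a minimal illustration, not the actual lib/serp-tracking code; the row and queue-message field names are assumptions based on the schema and queue message type above:

```javascript
// Hypothetical sketch: turn active-keyword rows (from the D1 query below)
// into queue messages. Field names are assumptions, not the real implementation.
function buildSerpJobs(rows) {
  return rows.map(row => ({
    type: 'serp_tracking',
    keyword: row.keyword,
    project_id: row.project_id,
    location_code: row.location_code,
    device: row.device,
    domain: row.domain, // from the projects join, used to match positions
  }));
}

// In the cron handler, each job would then be enqueued:
// for (const job of buildSerpJobs(rows)) await env.QUEUE.send(job);
```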

Database Query:

SELECT k.*, p.domain
FROM keywords k
JOIN projects p ON k.project_id = p.id
JOIN customer_subscriptions cs ON p.id = cs.project_id
WHERE k.is_active = 1
  AND cs.subscription_type LIKE '%serp%'
  AND cs.is_active = 1
  AND (cs.expires_at IS NULL OR cs.expires_at > ?)

Configuration:

  • Modify cron schedule in wrangler.toml:
    [triggers]
    crons = ["0 10 * * *"] # 10:00 UTC = 2 AM PST

2. On-Demand Tracking

Trigger tracking for specific keywords immediately.

Endpoint: POST /api/rankings/track

{
  "project_id": "proj_123",
  "keyword": "project management software",
  "location_code": 2840,
  "device": "desktop"
}

Response:

{
  "success": true,
  "run_id": "serp_run_abc123",
  "positions": [
    {
      "rank": 3,
      "url": "https://example.com/features",
      "title": "Best Project Management Software",
      "type": "organic"
    }
  ],
  "cached": false
}

Deduplication:

  • Results cached in D1 for 1 hour (same keyword + location + device)
  • Subsequent requests return cached results
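The deduplication rule above can be sketched as a cache key plus a freshness check. Helper names and the key format are illustrative assumptions:

```javascript
// Sketch of the 1-hour dedup rule. A cached run is reusable only for the
// same keyword + location + device combination.
const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour

function dedupKey(keyword, locationCode, device) {
  return `${keyword.toLowerCase().trim()}|${locationCode}|${device}`;
}

function isCacheFresh(cachedAtMs, nowMs = Date.now()) {
  return nowMs - cachedAtMs < CACHE_TTL_MS;
}
```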

3. Task-Based Async Mode

For large batches, use task mode to avoid HTTP timeouts.

Step 1: Create Task

Endpoint: POST /api/serp/tasks/create

{
  "keywords": [
    {
      "keyword": "project management",
      "location_code": 2840,
      "device": "desktop"
    },
    {
      "keyword": "task tracking software",
      "location_code": 2840,
      "device": "mobile"
    }
  ],
  "project_id": "proj_123"
}

Response:

{
  "task_id": "task_abc123",
  "status": "pending",
  "total_keywords": 2
}

Step 2: Poll Status

Endpoint: GET /api/serp/tasks/TASK_ID

{
  "task_id": "task_abc123",
  "status": "completed",
  "completed_keywords": 2,
  "total_keywords": 2,
  "results": [...]
}
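A client polling loop for step 2 might look like the sketch below. The fetch function is injected to keep the sketch self-contained; a terminal "failed" status is an assumption (only "pending" and "completed" appear above):

```javascript
// Hypothetical polling loop for GET /api/serp/tasks/TASK_ID.
// fetchJson is injected (e.g. url => fetch(url).then(r => r.json())).
async function pollTask(taskId, fetchJson, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const task = await fetchJson(`/api/serp/tasks/${taskId}`);
    // Stop on any terminal status; "failed" is an assumed terminal state.
    if (task.status === 'completed' || task.status === 'failed') return task;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`task ${taskId} did not finish after ${maxAttempts} polls`);
}
```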

Data Models

D1 Tables

queries

CREATE TABLE queries (
  id TEXT PRIMARY KEY,
  keyword TEXT NOT NULL,
  location_code INTEGER,
  language_code TEXT,
  device TEXT,
  created_at INTEGER NOT NULL
);

serp_runs

CREATE TABLE serp_runs (
  id TEXT PRIMARY KEY,
  query_id TEXT NOT NULL,
  project_id TEXT,
  check_url TEXT,
  items_count INTEGER,
  se_results_count INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (query_id) REFERENCES queries(id)
);

serp_positions

CREATE TABLE serp_positions (
  id TEXT PRIMARY KEY,
  serp_run_id TEXT NOT NULL,
  rank_group INTEGER,
  rank_absolute INTEGER,
  position TEXT,
  xpath TEXT,
  domain TEXT,
  url TEXT,
  title TEXT,
  description TEXT,
  type TEXT, -- 'organic', 'local_pack', 'paid'
  is_paid INTEGER DEFAULT 0,
  rating_value REAL,
  rating_count INTEGER,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (serp_run_id) REFERENCES serp_runs(id)
);

ClickHouse Tables

queries

CREATE TABLE queries (
  query_id String,
  keyword String,
  location_code UInt32,
  language_code String,
  device String,
  created_at DateTime
) ENGINE = MergeTree()
ORDER BY (keyword, location_code, created_at);

serp_runs

CREATE TABLE serp_runs (
  run_id String,
  query_id String,
  project_id String,
  customer_id String,
  check_url String,
  items_count UInt16,
  se_results_count UInt32,
  run_date Date,
  created_at DateTime
) ENGINE = MergeTree()
ORDER BY (project_id, query_id, run_date);

serp_positions

CREATE TABLE serp_positions (
  position_id String,
  serp_run_id String,
  query_id String,
  rank_group UInt8,
  rank_absolute UInt16,
  position String,
  domain String,
  url String,
  title String,
  description String,
  type String,
  is_paid UInt8,
  rating_value Float32,
  rating_count UInt32,
  created_at DateTime
) ENGINE = MergeTree()
ORDER BY (query_id, serp_run_id, rank_absolute);
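The ORDER BY keys above are chosen to make per-keyword trend queries cheap. A sketch of a daily average-rank query against these tables (table and column names come from the schemas above; the join shape and project filter value are illustrative):

```sql
-- Sketch: average organic rank per keyword per day for one project.
SELECT
    q.keyword,
    r.run_date,
    avg(p.rank_absolute) AS avg_rank
FROM serp_positions p
JOIN serp_runs r ON p.serp_run_id = r.run_id
JOIN queries q ON p.query_id = q.query_id
WHERE p.type = 'organic'
  AND r.project_id = 'proj_123'
GROUP BY q.keyword, r.run_date
ORDER BY q.keyword, r.run_date;
```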

DataForSEO Integration

Live Mode (Synchronous)

Endpoint: https://api.dataforseo.com/v3/serp/google/organic/live/advanced

Request:

[{
  "keyword": "project management software",
  "location_code": 2840,
  "language_code": "en",
  "device": "desktop",
  "depth": 100
}]

Use Case: On-demand tracking, immediate results needed

Cost: ~$0.003 per request

Task Mode (Asynchronous)

Post Task: https://api.dataforseo.com/v3/serp/google/organic/task_post

Get Results: https://api.dataforseo.com/v3/serp/google/organic/task_get/TASK_ID

Use Case: Large batches, cost optimization (cheaper than live mode)

Cost: ~$0.002 per request
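The two price points make mode selection a simple calculation. A back-of-envelope helper using the approximate per-request prices quoted above (helper name is illustrative):

```javascript
// Rough DataForSEO cost estimate: ~$0.003/request live, ~$0.002/request task mode.
const COST_PER_REQUEST = { live: 0.003, task: 0.002 };

function estimateDailyCost(keywordsPerDay, mode) {
  return keywordsPerDay * COST_PER_REQUEST[mode];
}
```

At 1,000 keywords/day this gives roughly $3/day in live mode versus $2/day in task mode, which is why task mode is preferred for large batches.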


API Reference

POST /api/rankings/track

Track keyword ranking immediately.

Request:

{
  "project_id": "proj_123",
  "keyword": "project management software",
  "location_code": 2840,
  "device": "desktop",
  "check_url": "example.com" // Optional: filter results
}

Response:

{
  "success": true,
  "run_id": "serp_run_abc123",
  "query_id": "query_xyz789",
  "positions": [
    {
      "rank": 3,
      "url": "https://example.com/features",
      "title": "Best Project Management Software",
      "description": "...",
      "type": "organic",
      "rating_value": 4.5,
      "rating_count": 1234
    }
  ],
  "cached": false,
  "se_results_count": 45000000
}

GET /api/rankings

Get historical rankings for a project.

Query Params:

  • project_id (required)
  • keyword (optional, filter by keyword)
  • start_date (optional, ISO date)
  • end_date (optional, ISO date)

Response:

{
  "project_id": "proj_123",
  "rankings": [
    {
      "date": "2024-01-15",
      "keyword": "project management",
      "rank": 3,
      "url": "https://example.com/features",
      "previous_rank": 5,
      "change": 2
    }
  ]
}
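A request URL for this endpoint can be assembled from the documented query params; the helper name is illustrative:

```javascript
// Sketch: build the GET /api/rankings query string from the documented params.
// project_id is required; the rest are optional filters.
function rankingsUrl({ projectId, keyword, startDate, endDate }) {
  const params = new URLSearchParams({ project_id: projectId });
  if (keyword) params.set('keyword', keyword);
  if (startDate) params.set('start_date', startDate);
  if (endDate) params.set('end_date', endDate);
  return `/api/rankings?${params}`;
}
```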

POST /api/serp/tasks/create

Create async SERP tracking task.

Request:

{
  "keywords": [
    {"keyword": "...", "location_code": 2840, "device": "desktop"}
  ],
  "project_id": "proj_123"
}

Response:

{
  "task_id": "task_abc123",
  "status": "pending",
  "total_keywords": 1
}

GET /api/serp/tasks/TASK_ID

Poll task status.

Response:

{
  "task_id": "task_abc123",
  "status": "completed",
  "completed_keywords": 1,
  "total_keywords": 1,
  "results": [...]
}

GET /api/google-ads-status

Health check for DataForSEO SERP API.

Response:

{
  "status": "operational",
  "credits_remaining": 9876,
  "daily_limit": 10000
}

GET /api/domain-metrics

Get aggregated domain metrics from ClickHouse.

Query Params:

  • domain (required)
  • start_date (optional)
  • end_date (optional)

Response:

{
  "domain": "example.com",
  "total_keywords": 234,
  "avg_rank": 12.5,
  "top_10_count": 45,
  "top_3_count": 12,
  "trending_up": 23,
  "trending_down": 8
}

Configuration

Queue Settings

[[queues.consumers]]
queue = "rankfabric-tasks"
max_batch_size = 1
max_batch_timeout = 30
max_retries = 3
max_concurrency = 20 # Balance: Fast processing vs DataForSEO rate limits

Note: DataForSEO has a 30 req/sec limit. Keep max_concurrency ≤ 20 to avoid rate limit errors.

Cron Schedule

[triggers]
crons = [
  "0 10 * * *" # 10:00 UTC = 2 AM PST
]

To change tracking time, modify the cron expression:

  • 0 2 * * * = 2 AM UTC
  • 0 14 * * * = 2 PM UTC (6 AM PST)

Customer Subscriptions

Tracking runs only for customers with active SERP subscriptions.

Database Setup

CREATE TABLE customer_subscriptions (
  id TEXT PRIMARY KEY,
  customer_id TEXT NOT NULL,
  project_id TEXT NOT NULL,
  subscription_type TEXT, -- 'serp_organic', 'serp_local', 'app_tracking'
  is_active INTEGER DEFAULT 1,
  started_at INTEGER NOT NULL,
  expires_at INTEGER,
  created_at INTEGER NOT NULL
);

Subscription Types

  • serp_organic - Organic rankings
  • serp_local - Local pack rankings
  • serp_paid - Paid ad positions

Billing Model

  • Per keyword per day: Each tracked keyword counts as 1 billable unit
  • Retrieval: Query serp_runs to count daily runs per customer
-- Note: the D1 serp_runs table has no customer_id column (only the ClickHouse
-- copy does), so resolve it via customer_subscriptions. Assumes created_at is
-- a unix-epoch timestamp in seconds.
SELECT cs.customer_id, COUNT(*) AS billable_units
FROM serp_runs sr
JOIN customer_subscriptions cs ON sr.project_id = cs.project_id
WHERE DATE(sr.created_at, 'unixepoch') = DATE('now')
GROUP BY cs.customer_id;

React Integration

Display Current Rankings

const response = await fetch(`/api/rankings?project_id=${projectId}`);
const data = await response.json();

data.rankings.forEach(ranking => {
  const arrow = ranking.change > 0 ? '↑' : ranking.change < 0 ? '↓' : '→';
  console.log(`${ranking.keyword}: #${ranking.rank} (${arrow}${Math.abs(ranking.change)})`);
});

Trigger On-Demand Tracking

const response = await fetch('/api/rankings/track', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    project_id: 'proj_123',
    keyword: 'project management',
    location_code: 2840,
    device: 'desktop'
  })
});

const { positions } = await response.json();

Build Historical Charts

Query ClickHouse via /api/domain-metrics for aggregated trends.


Troubleshooting

Daily tracking not running

Check:

  1. Cron trigger configured in wrangler.toml
  2. customer_subscriptions table has active subscriptions
  3. Worker logs for cron execution
wrangler tail --format pretty

DataForSEO rate limit errors

Symptoms: 429 errors in logs

Fix: Reduce max_concurrency in queue consumer:

[[queues.consumers]]
max_concurrency = 10 # Lower from 20

Results not appearing in ClickHouse

Check:

  1. ClickHouse ingestion queue health
  2. Schema initialized: GET /init/clickhouse
  3. ClickHouse credentials valid
curl https://your-worker.workers.dev/test/clickhouse

Duplicate tracking runs

Check: D1 serp_runs table for duplicate query_id + created_at within 1 hour

Fix: Ensure deduplication logic in lib/serp-tracking.js is active


Performance & Costs

DataForSEO Costs

  • Live mode: $0.003 per keyword
  • Task mode: $0.002 per keyword
  • Daily cost example: 1,000 keywords/day = $2-3/day

ClickHouse Ingestion

  • Batched inserts (100 rows per batch) reduce request overhead
  • Typical ingestion: 1,000 keywords = 10 ClickHouse requests

Queue Processing

  • With max_concurrency: 20, process ~1,200 keywords/hour
  • Increase concurrency for faster processing (watch DataForSEO limits)
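The ~1,200 keywords/hour figure follows directly from concurrency and per-job latency; the ~60-second average job time below is an assumption implied by the numbers above, not a measured value:

```javascript
// Rough queue throughput: concurrent slots x jobs each slot finishes per hour.
function keywordsPerHour(maxConcurrency, avgSecondsPerJob) {
  return maxConcurrency * (3600 / avgSecondsPerJob);
}
```

With max_concurrency 20 and ~60 s per job this gives 20 × 60 = 1,200 keywords/hour, matching the estimate above; halving concurrency (as suggested for rate-limit errors) halves throughput.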

Future Enhancements

  • Local pack position tracking (separate endpoint)
  • Paid ad position monitoring
  • SERP feature detection (featured snippets, people also ask)
  • Competitor domain tracking
  • Automated alerts for rank changes
  • Multi-engine support (Bing, Yahoo)