SERP Tracking System
Daily and on-demand search engine ranking monitoring with historical analytics.
Overview
Track keyword rankings in Google search results for customer projects. Supports organic positions, local pack results, and paid ad positions.
Key Features:
- Daily automated tracking (cron-based)
- On-demand tracking (API-triggered)
- Multi-location support
- Device targeting (desktop/mobile)
- Historical trend analysis
- ClickHouse analytics storage
- Task-based async mode for large batches
Architecture
┌─────────────┐
│ Daily Cron │ 10:00 UTC (2 AM PST)
└──────┬──────┘
│
├─> Query D1: SELECT active keywords
│
├─> For each keyword:
│ └─> QUEUE.send({type: 'serp_tracking'})
│
└─> Queue Consumer (max_concurrency: 20)
├─> Call DataForSEO SERP API
├─> Parse results
├─> Save to D1: serp_runs, serp_positions
└─> Queue to ClickHouse for analytics
Tracking Modes
1. Daily Automated Tracking
Scheduled via cron trigger at 10:00 UTC (2 AM PST).
Flow:
- Cron triggers lib/serp-tracking.handleDailySerpTracking
- Query D1 for active keywords (with project subscriptions)
- Enqueue serp_tracking jobs to main queue
- Queue consumer processes with DataForSEO
- Results stored in D1 + queued to ClickHouse
Database Query:
SELECT k.*, p.domain
FROM keywords k
JOIN projects p ON k.project_id = p.id
JOIN customer_subscriptions cs ON p.id = cs.project_id
WHERE k.is_active = 1
AND cs.subscription_type LIKE '%serp%'
AND cs.is_active = 1
AND (cs.expires_at IS NULL OR cs.expires_at > ?)
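The enqueue step can be sketched as a pure mapping from the rows returned by this query to queue messages. The message shape below mirrors the {type: 'serp_tracking'} payload from the architecture diagram; the function name and the other fields are illustrative assumptions, not the actual lib/serp-tracking code.

```javascript
// Map active-keyword rows (keywords joined with projects, as in the D1
// query above) to the messages sent to the main queue.
function buildSerpTrackingMessages(keywordRows) {
  return keywordRows.map((row) => ({
    type: 'serp_tracking',        // matches the diagram's QUEUE.send payload
    keyword_id: row.id,
    keyword: row.keyword,
    project_id: row.project_id,
    location_code: row.location_code,
    device: row.device,
    domain: row.domain,           // joined from projects in the D1 query
  }));
}
```

Keeping this mapping pure (no bindings, no I/O) makes it easy to unit-test independently of the cron handler.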
Configuration:
- Modify the cron schedule in wrangler.toml:

[triggers]
crons = ["0 10 * * *"] # 10:00 UTC = 2 AM PST
2. On-Demand Tracking
Trigger tracking for specific keywords immediately.
Endpoint: POST /api/rankings/track
{
"project_id": "proj_123",
"keyword": "project management software",
"location_code": 2840,
"device": "desktop"
}
Response:
{
"success": true,
"run_id": "serp_run_abc123",
"positions": [
{
"rank": 3,
"url": "https://example.com/features",
"title": "Best Project Management Software",
"type": "organic"
}
],
"cached": false
}
Deduplication:
- Results cached in D1 for 1 hour (same keyword + location + device)
- Subsequent requests return cached results
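The one-hour dedup rule above amounts to a cache key over keyword + location + device plus a timestamp check. A minimal sketch (the key format and helper names are illustrative, not the server's implementation):

```javascript
const SERP_CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour, per the dedup rule

// Build the dedup key from the three fields that define a unique lookup.
function buildSerpCacheKey(keyword, locationCode, device) {
  return `serp:${keyword.toLowerCase().trim()}:${locationCode}:${device}`;
}

// A cached run is reusable if it was created within the TTL window.
function isCacheFresh(createdAtMs, nowMs = Date.now()) {
  return nowMs - createdAtMs < SERP_CACHE_TTL_MS;
}
```

Normalizing the keyword (lowercase, trimmed) before keying avoids paying twice for "SEO tools" and "seo tools ".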
3. Task-Based Async Mode
For large batches, use task mode to avoid HTTP timeouts.
Step 1: Create Task
Endpoint: POST /api/serp/tasks/create
{
"keywords": [
{
"keyword": "project management",
"location_code": 2840,
"device": "desktop"
},
{
"keyword": "task tracking software",
"location_code": 2840,
"device": "mobile"
}
],
"project_id": "proj_123"
}
Response:
{
"task_id": "task_abc123",
"status": "pending",
"total_keywords": 2
}
Step 2: Poll Status
Endpoint: GET /api/serp/tasks/TASK_ID
{
"task_id": "task_abc123",
"status": "completed",
"completed_keywords": 2,
"total_keywords": 2,
"results": [...]
}
Data Models
D1 Tables
queries
CREATE TABLE queries (
id TEXT PRIMARY KEY,
keyword TEXT NOT NULL,
location_code INTEGER,
language_code TEXT,
device TEXT,
created_at INTEGER NOT NULL
);
serp_runs
CREATE TABLE serp_runs (
id TEXT PRIMARY KEY,
query_id TEXT NOT NULL,
project_id TEXT,
check_url TEXT,
items_count INTEGER,
se_results_count INTEGER,
created_at INTEGER NOT NULL,
FOREIGN KEY (query_id) REFERENCES queries(id)
);
serp_positions
CREATE TABLE serp_positions (
id TEXT PRIMARY KEY,
serp_run_id TEXT NOT NULL,
rank_group INTEGER,
rank_absolute INTEGER,
position TEXT,
xpath TEXT,
domain TEXT,
url TEXT,
title TEXT,
description TEXT,
type TEXT, -- 'organic', 'local_pack', 'paid'
is_paid INTEGER DEFAULT 0,
rating_value REAL,
rating_count INTEGER,
created_at INTEGER NOT NULL,
FOREIGN KEY (serp_run_id) REFERENCES serp_runs(id)
);
ClickHouse Tables
queries
CREATE TABLE queries (
query_id String,
keyword String,
location_code UInt32,
language_code String,
device String,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (keyword, location_code, created_at);
serp_runs
CREATE TABLE serp_runs (
run_id String,
query_id String,
project_id String,
customer_id String,
check_url String,
items_count UInt16,
se_results_count UInt32,
run_date Date,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (project_id, query_id, run_date);
serp_positions
CREATE TABLE serp_positions (
position_id String,
serp_run_id String,
query_id String,
rank_group UInt8,
rank_absolute UInt16,
position String,
domain String,
url String,
title String,
description String,
type String,
is_paid UInt8,
rating_value Float32,
rating_count UInt32,
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (query_id, serp_run_id, rank_absolute);
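With the ORDER BY keys above, a per-keyword trend query stays on the sort order of both tables. An illustrative sketch (table and column names match the schemas above; the literals are placeholders):

SELECT
    r.run_date,
    min(p.rank_absolute) AS best_rank
FROM serp_positions AS p
INNER JOIN serp_runs AS r ON p.serp_run_id = r.run_id
WHERE r.project_id = 'proj_123'
  AND p.query_id = 'query_xyz789'
  AND p.type = 'organic'
  AND r.run_date >= today() - 30
GROUP BY r.run_date
ORDER BY r.run_date;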
DataForSEO Integration
Live Mode (Synchronous)
Endpoint: https://api.dataforseo.com/v3/serp/google/organic/live/advanced
Request:
[{
"keyword": "project management software",
"location_code": 2840,
"language_code": "en",
"device": "desktop",
"depth": 100
}]
Use Case: On-demand tracking, immediate results needed
Cost: ~$0.003 per request
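Parsing the live response into rows for serp_positions can be sketched as below. The tasks[0].result[0].items path follows DataForSEO's v3 response envelope; treat the exact item field names as assumptions to verify against the API docs.

```javascript
// Extract position rows from a DataForSEO live/advanced response body.
function parseLiveSerpResponse(body) {
  const result = body?.tasks?.[0]?.result?.[0];
  if (!result) return { items: [], seResultsCount: 0 };
  const items = (result.items || [])
    // Keep only the result types this system tracks.
    .filter((item) => ['organic', 'local_pack', 'paid'].includes(item.type))
    .map((item) => ({
      rank_group: item.rank_group,
      rank_absolute: item.rank_absolute,
      domain: item.domain,
      url: item.url,
      title: item.title,
      description: item.description,
      type: item.type,
      is_paid: item.type === 'paid' ? 1 : 0, // matches the D1 is_paid column
    }));
  return { items, seResultsCount: result.se_results_count || 0 };
}
```

Filtering on type here also drops SERP features (people-also-ask, featured snippets) that the current schema does not store.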
Task Mode (Asynchronous)
Post Task: https://api.dataforseo.com/v3/serp/google/organic/task_post
Get Results: https://api.dataforseo.com/v3/serp/google/organic/task_get/TASK_ID
Use Case: Large batches, cost optimization (cheaper than live mode)
Cost: ~$0.002 per request
API Reference
POST /api/rankings/track
Track keyword ranking immediately.
Request:
{
"project_id": "proj_123",
"keyword": "project management software",
"location_code": 2840,
"device": "desktop",
"check_url": "example.com" // Optional: filter results
}
Response:
{
"success": true,
"run_id": "serp_run_abc123",
"query_id": "query_xyz789",
"positions": [
{
"rank": 3,
"url": "https://example.com/features",
"title": "Best Project Management Software",
"description": "...",
"type": "organic",
"rating_value": 4.5,
"rating_count": 1234
}
],
"cached": false,
"se_results_count": 45000000
}
GET /api/rankings
Get historical rankings for a project.
Query Params:
- project_id (required)
- keyword (optional, filter by keyword)
- start_date (optional, ISO date)
- end_date (optional, ISO date)
Response:
{
"project_id": "proj_123",
"rankings": [
{
"date": "2024-01-15",
"keyword": "project management",
"rank": 3,
"url": "https://example.com/features",
"previous_rank": 5,
"change": 2
}
]
}
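In the response above, change follows previous_rank - rank, so a positive value means the keyword moved up. A small helper makes the convention explicit (illustrative, not the server's implementation):

```javascript
// Compute the change field shown in /api/rankings responses.
// Positive change = improvement (e.g. rank 5 -> 3 gives change 2).
// A keyword with no prior run has no meaningful change, so return null.
function rankChange(previousRank, currentRank) {
  if (previousRank == null || currentRank == null) return null;
  return previousRank - currentRank;
}
```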
POST /api/serp/tasks/create
Create async SERP tracking task.
Request:
{
"keywords": [
{"keyword": "...", "location_code": 2840, "device": "desktop"}
],
"project_id": "proj_123"
}
Response:
{
"task_id": "task_abc123",
"status": "pending",
"total_keywords": 1
}
GET /api/serp/tasks/TASK_ID
Poll task status.
Response:
{
"task_id": "task_abc123",
"status": "completed",
"completed_keywords": 1,
"total_keywords": 1,
"results": [...]
}
GET /api/google-ads-status
Health check for DataForSEO SERP API.
Response:
{
"status": "operational",
"credits_remaining": 9876,
"daily_limit": 10000
}
GET /api/domain-metrics
Get aggregated domain metrics from ClickHouse.
Query Params:
- domain (required)
- start_date (optional)
- end_date (optional)
Response:
{
"domain": "example.com",
"total_keywords": 234,
"avg_rank": 12.5,
"top_10_count": 45,
"top_3_count": 12,
"trending_up": 23,
"trending_down": 8
}
Configuration
Queue Settings
[[queues.consumers]]
queue = "rankfabric-tasks"
max_batch_size = 1
max_batch_timeout = 30
max_retries = 3
max_concurrency = 20 # Balance: Fast processing vs DataForSEO rate limits
Note: DataForSEO has a 30 req/sec limit. Keep max_concurrency ≤ 20 to avoid rate limit errors.
Cron Schedule
[triggers]
crons = [
"0 10 * * *" # 10:00 UTC = 2 AM PST
]
To change tracking time, modify the cron expression:
- 0 2 * * * = 2 AM UTC
- 0 14 * * * = 2 PM UTC (6 AM PST)
Customer Subscriptions
Tracking runs only for customers with active SERP subscriptions.
Database Setup
CREATE TABLE customer_subscriptions (
id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
project_id TEXT NOT NULL,
subscription_type TEXT, -- 'serp_organic', 'serp_local', 'app_tracking'
is_active INTEGER DEFAULT 1,
started_at INTEGER NOT NULL,
expires_at INTEGER,
created_at INTEGER NOT NULL
);
Subscription Types
- serp_organic - Organic rankings
- serp_local - Local pack rankings
- serp_paid - Paid ad positions
Billing Model
- Per keyword per day: Each tracked keyword counts as 1 billable unit
- Retrieval: Query serp_runs to count daily runs per customer
-- Run against the ClickHouse serp_runs table (it carries customer_id and run_date;
-- the D1 table has neither)
SELECT customer_id, COUNT(*) AS billable_units
FROM serp_runs
WHERE run_date = today()
GROUP BY customer_id;
React Integration
Display Current Rankings
const res = await fetch(`/api/rankings?project_id=${projectId}`);
const data = await res.json();
data.rankings.forEach(ranking => {
  const arrow = ranking.change > 0 ? '↑' : '↓';
  console.log(`${ranking.keyword}: #${ranking.rank} (${arrow}${Math.abs(ranking.change)})`);
});
Trigger On-Demand Tracking
const response = await fetch('/api/rankings/track', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    project_id: 'proj_123',
    keyword: 'project management',
    location_code: 2840,
    device: 'desktop'
  })
});
const { positions } = await response.json();
Build Historical Charts
Query ClickHouse via /api/domain-metrics for aggregated trends.
Troubleshooting
Daily tracking not running
Check:
- Cron trigger configured in wrangler.toml
- customer_subscriptions table has active subscriptions
- Worker logs for cron execution:

wrangler tail --format pretty
DataForSEO rate limit errors
Symptoms: 429 errors in logs
Fix: Reduce max_concurrency in queue consumer:
[[queues.consumers]]
max_concurrency = 10 # Lower from 20
Results not appearing in ClickHouse
Check:
- ClickHouse ingestion queue health
- Schema initialized: GET /init/clickhouse
- ClickHouse credentials valid:

curl https://your-worker.workers.dev/test/clickhouse
Duplicate tracking runs
Check: D1 serp_runs table for duplicate query_id + created_at within 1 hour
Fix: Ensure deduplication logic in lib/serp-tracking.js is active
Performance & Costs
DataForSEO Costs
- Live mode: $0.003 per keyword
- Task mode: $0.002 per keyword
- Daily cost example: 1,000 keywords/day = $2-3/day
ClickHouse Ingestion
- Batched inserts (100 rows per batch) reduce request overhead
- Typical ingestion: 1,000 keywords = 10 ClickHouse requests
Queue Processing
- With max_concurrency: 20, expect ~1,200 keywords/hour
- Increase concurrency for faster processing (watch DataForSEO limits)
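The ~1,200 keywords/hour figure follows from concurrency × (3600 / average seconds per job); at max_concurrency: 20 that implies roughly 60 s end to end per job. A rough planning formula, not a measured benchmark:

```javascript
// Estimate queue throughput in keywords/hour given the consumer's
// max_concurrency and an average end-to-end seconds per job
// (DataForSEO call + parsing + D1 write + ClickHouse enqueue).
function estimateThroughputPerHour(maxConcurrency, avgJobSeconds) {
  return Math.floor(maxConcurrency * (3600 / avgJobSeconds));
}
```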
Future Enhancements
- Local pack position tracking (separate endpoint)
- Paid ad position monitoring
- SERP feature detection (featured snippets, people also ask)
- Competitor domain tracking
- Automated alerts for rank changes
- Multi-engine support (Bing, Yahoo)