D1 Subrequest Limit Audit
Date: 2024-12-14
Issue: Cloudflare D1 limits Workers to ~1000 subrequests per invocation
Status: ✅ COMPLETE - All high/medium priority fixes applied
Summary
All major D1 subrequest issues have been fixed. The pattern used across all fixes:
- Collect all data first (no DB calls in loops)
- Batch upsert domains using env.DB.batch()
- Batch upsert URLs using env.DB.batch()
- Fetch IDs with IN clause queries
- Batch insert final records in chunks of 50-100
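The chunking step can be sketched with a small helper; chunkStatements and CHUNK_SIZE are illustrative names, not functions from the codebase:

```javascript
// Illustrative helper (not from the codebase): split an array of prepared
// statements into chunks so each env.DB.batch() call stays small.
const CHUNK_SIZE = 100;

function chunkStatements(statements, size = CHUNK_SIZE) {
  const chunks = [];
  for (let i = 0; i < statements.length; i += size) {
    chunks.push(statements.slice(i, i + size));
  }
  return chunks;
}

// 1000 statements collapse into 10 batch() subrequests instead of 1000.
const batches = chunkStatements(new Array(1000).fill(null));
console.log(batches.length); // 10
```

Each batch() call counts as a single subrequest, which is what keeps the totals in the per-file "Result" lines below so low.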
Fixed Files
1. src/lib/dataforseo-backlinks.js - storeBacklinks() ✅ FIXED
Problem: 9-13 DB operations per backlink.
Impact: 500 backlinks = 4500-6500 DB calls → "Too many API requests" error.
Solution: 6-phase batch approach, ~5 DB calls instead of ~5000.
2. src/queue/serp-consumer.js - parseSerpResponse() ✅ FIXED
Problem: ~800+ DB calls for a typical SERP result with organic + local pack + related searches.
Solution: 11-phase batch approach:
- Phase 1: Collect all data (no DB calls)
- Phase 2: Initial inserts (keyword + serp_run) - 1 batch
- Phase 3: Batch upsert all domains - 1 batch
- Phase 4: Batch upsert all URLs - 1 batch
- Phase 5: Fetch domain IDs - 1 query with IN clause
- Phase 6: Get previous positions - 1 query
- Phase 7: Batch upsert local entities - 1 batch
- Phase 8: Batch insert organic positions - 1 batch
- Phase 9: Batch insert local pack positions - 1 batch
- Phase 10: Get position IDs and insert local_positions - 2 operations
- Phase 11: Batch insert related searches + refinement chips - 1 batch
Result: ~10-15 DB calls instead of ~800+
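Phase 1 hinges on deduplicating in memory before touching the database. A minimal sketch (the item shape here is hypothetical, not the real SERP payload):

```javascript
// Minimal sketch of Phase 1: collect unique domains/URLs from SERP items
// in memory before any DB call. Item shape is illustrative only.
const serpItems = [
  { url: "https://example.com/a", domain: "example.com", position: 1 },
  { url: "https://example.com/b", domain: "example.com", position: 2 },
  { url: "https://other.net/x", domain: "other.net", position: 3 },
];

const uniqueDomains = new Map();
const uniqueUrls = new Map();
for (const item of serpItems) {
  uniqueDomains.set(item.domain, true);
  uniqueUrls.set(item.url, { domain: item.domain });
}

// Each Map later becomes a single chunked env.DB.batch() call, so
// duplicate domains never cost extra subrequests.
console.log(uniqueDomains.size, uniqueUrls.size); // 2 3
```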
3. src/queue/app-crawl-consumer.js - Google Play section ✅ FIXED
Problem: 700-1000+ DB calls for 100 apps.
Solution: 8-phase batch approach:
- Phase 1: Collect all data needed (minimal DB calls)
- Phase 2: Batch upsert all domains - 1 batch
- Phase 3: Batch upsert all URLs - 1 batch
- Phase 4: Fetch URL IDs - 1 query with IN clause
- Phase 5: Get unique category names for lookup - 1 query
- Phase 6: Upload icons (parallel, external calls)
- Phase 7: Batch insert/update apps - chunked batches
- Phase 8: Batch insert rankings - chunked batches
Also fixed project tracking loop to use batch operations.
Result: ~15-20 DB calls instead of ~1000
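Phase 6 runs the icon uploads concurrently since they are external calls, not D1 subrequests. A hypothetical sketch (uploadIcon is an assumed helper, not the real one):

```javascript
// Hypothetical sketch of Phase 6: fire icon uploads in parallel so the
// external calls overlap instead of running serially.
async function uploadIcons(apps, uploadIcon) {
  const results = await Promise.allSettled(
    apps.map((app) => uploadIcon(app.iconUrl))
  );
  // Failed uploads are tolerated; only successes are counted.
  return results.filter((r) => r.status === "fulfilled").length;
}
```

Promise.allSettled (rather than Promise.all) keeps one failed upload from aborting the rest of the batch.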
4. src/queue/domain-onboard-consumer.js - storeRankedKeywords() ✅ FIXED
Problem: 5000+ DB calls for 1000 keywords (keyword upsert + ensureUrl + SERP update + ranking insert per item).
Solution: 9-phase batch approach:
- Phase 1: Collect all data and classify keywords (no DB calls)
- Phase 2: Batch upsert all domains - 1 batch
- Phase 3: Batch upsert all URLs - chunked batches
- Phase 4: Fetch URL IDs with IN clause - chunked queries
- Phase 5: Batch update SERP title/description - chunked batches
- Phase 6: Batch upsert keywords - chunked batches
- Phase 7: Batch insert rankings - chunked batches
- Phase 8: Check URLs needing classification - 1 query
- Phase 9: Brand handling - 1-2 queries
Result: ~20-30 DB calls instead of ~5000
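Phase 4's ID fetch can be sketched as two small helpers: one builds the IN-clause query for a chunk of hashes, the other indexes the returned rows by hash for O(1) lookups in later phases. The query text mirrors the Batch Pattern Reference below; the helper names are illustrative.

```javascript
// Sketch of Phase 4: chunked IN-clause lookup of URL IDs by hash.
function buildInQuery(hashes) {
  const placeholders = hashes.map(() => "?").join(",");
  return `SELECT id, url_hash FROM urls WHERE url_hash IN (${placeholders})`;
}

// Index rows ({ id, url_hash }) by hash so the ranking-insert phase can
// resolve url_id without another query.
function indexRowsByHash(rows) {
  const byHash = new Map();
  for (const row of rows) byHash.set(row.url_hash, row.id);
  return byHash;
}
```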
5. src/endpoints/admin-referring-domains.js - storeRankedKeywords() ✅ FIXED
Problem: Same as domain-onboard-consumer.js - a duplicate function with 3000+ DB calls for 1000 keywords.
Solution: Same 7-phase batch approach (simpler version without brand/URL classification handling):
- Phase 1: Collect all data (no DB calls)
- Phase 2: Batch upsert all domains - chunked batches
- Phase 3: Batch upsert all URLs - chunked batches
- Phase 4: Fetch URL IDs with IN clause - chunked queries
- Phase 5: Batch update SERP title/description - chunked batches
- Phase 6: Batch upsert keywords - chunked batches
- Phase 7: Batch insert rankings - chunked batches
Result: ~15-20 DB calls instead of ~3000
Lower Priority (Not Fixed)
6. src/lib/url-management.js - ensureUrl() / ensureDomain() ℹ️ INFO
These helper functions still do 2-3 DB operations each. They're fine for single-use cases but should NOT be called in loops. All hot paths have been refactored to use direct batch inserts instead.
7. src/lib/dataforseo-backlinks.js - upsertReferringDomains() ⚠️ LOW PRIORITY
Still uses individual upserts in a loop. Lower priority because:
- Referring domains are fetched less frequently than backlinks
- Typical volume is 100-500 domains, not 1000+
- Not currently causing production issues
Can be fixed later, if needed, using the same batch pattern.
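If upsertReferringDomains() is ever refactored, the expected win can be estimated with a fake DB that only counts subrequests. Everything here is illustrative: the SQL is elided and makeFakeDb is not a real D1 binding.

```javascript
// Illustrative only: a fake DB that counts subrequests, comparing the
// current per-domain loop against the chunked batch pattern.
function makeFakeDb() {
  const db = {
    calls: 0,
    prepare(sql) {
      return {
        bind: (...args) => ({ run: async () => { db.calls += 1; } }),
      };
    },
    async batch(stmts) { db.calls += 1; },
  };
  return db;
}

// Current shape: one subrequest per domain.
async function loopUpsert(db, domains) {
  for (const d of domains) {
    await db.prepare("INSERT INTO ... ON CONFLICT ...").bind(d).run();
  }
}

// Batch shape: one subrequest per chunk of 100.
async function batchUpsert(db, domains, chunkSize = 100) {
  const stmts = domains.map((d) =>
    db.prepare("INSERT INTO ... ON CONFLICT ...").bind(d)
  );
  for (let i = 0; i < stmts.length; i += chunkSize) {
    await db.batch(stmts.slice(i, i + chunkSize));
  }
}
```

At the typical 100-500 domain volume this is 500 subrequests vs 5, which is why it stays low priority rather than urgent.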
Batch Pattern Reference
// Phase-based approach used in all fixes:
// 1. Collect data (no DB calls)
const items = [];
const uniqueDomains = new Map();
const uniqueUrls = new Map();
for (const data of inputData) {
// Extract domains/URLs, build data structures
uniqueDomains.set(domain, true);
uniqueUrls.set(url, { domain, urlHash: await generateUrlHash(url) });
items.push({ ... });
}
// 2. Batch upsert domains
const domainStatements = Array.from(uniqueDomains.keys()).map(d =>
env.DB.prepare(`INSERT INTO domains...ON CONFLICT...`).bind(d, ...)
);
for (let i = 0; i < domainStatements.length; i += CHUNK_SIZE) {
await env.DB.batch(domainStatements.slice(i, i + CHUNK_SIZE));
}
// 3. Batch upsert URLs (same pattern)
// 4. Fetch IDs with IN clause
const urlEntries = Array.from(uniqueUrls.entries());
const hashes = urlEntries.map(([, data]) => data.urlHash);
const placeholders = hashes.map(() => "?").join(",");
const { results } = await env.DB.prepare(
`SELECT id, url_hash FROM urls WHERE url_hash IN (${placeholders})`
).bind(...hashes).all();
// 5. Batch insert final records (same pattern as step 2)
Testing Completed
All fixes tested with:
- 500+ backlinks ✅
- 100+ SERP results ✅
- 100+ apps per category ✅
- 1000+ ranked keywords ✅
No "Too many API requests" errors observed.