Workflows
RankFabric uses Cloudflare Durable Workflows for orchestrating multi-step processes. Workflows provide visibility, state management, and automatic retries.
Workflows = Orchestration (Control Plane)
Queues = Execution (Data Plane)
Workflow Summary
| Workflow | Purpose | Trigger |
|---|---|---|
| AssetOnboardWorkflow | Master orchestrator for all asset types | POST /api/assets |
| DomainOnboardWorkflow | Domain intelligence gathering | Asset onboard, subscriptions |
| UrlClassifyWorkflow | 4-stage URL classification | Backlink queue |
| KeywordClassifyWorkflow | 5-stage keyword classification | Keyword queue |
| DomainClassifyWorkflow | Domain classification | Domain queue |
| SerpTrackingWorkflow | SERP position tracking | POST /api/keywords/track |
| AppDetailsWorkflow | App store metadata enrichment | Asset onboard |
| SubscriptionDispatchWorkflow | Daily cron coordinator | Cron (2 AM UTC) |
| CrawlRunWorkflow | App catalog discovery | Manual trigger |
| LlmVerifyWorkflow | Re-verify low-confidence results | LLM queue |
AssetOnboardWorkflow
Master orchestrator for onboarding any customer asset (website, app, local business).
Binding: ASSET_ONBOARD_WORKFLOW
Trigger: POST /api/assets
DomainOnboardWorkflow
Complete domain intelligence gathering with backlinks, keywords, brand properties, and social account discovery.
Binding: DOMAIN_ONBOARD_WORKFLOW
Limits: 100 backlinks, 100 referring domains, 500 keywords
Note: Backlinks, referring domains, and keywords are fetched in parallel (3 independent API calls).
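The parallel fetch noted above can be sketched with `Promise.all`. This is a minimal illustration rather than the workflow's actual code: the `Fetcher` type, `gatherDomainIntel`, and the three fetcher arguments are hypothetical stand-ins for the real API calls, and only the limits come from this section.

```typescript
// Limits taken from the DomainOnboardWorkflow section above.
const LIMITS = { backlinks: 100, referringDomains: 100, keywords: 500 } as const;

// Hypothetical fetcher signature: (domain, limit) -> list of items.
type Fetcher = (domain: string, limit: number) => Promise<string[]>;

interface DomainIntel {
  backlinks: string[];
  referringDomains: string[];
  keywords: string[];
}

// All three calls are started before any is awaited, so total latency is
// roughly that of the slowest call rather than the sum of all three.
async function gatherDomainIntel(
  domain: string,
  fetchers: { backlinks: Fetcher; referringDomains: Fetcher; keywords: Fetcher },
): Promise<DomainIntel> {
  const [backlinks, referringDomains, keywords] = await Promise.all([
    fetchers.backlinks(domain, LIMITS.backlinks),
    fetchers.referringDomains(domain, LIMITS.referringDomains),
    fetchers.keywords(domain, LIMITS.keywords),
  ]);
  return { backlinks, referringDomains, keywords };
}
```

`Promise.all` rejects as soon as any one call fails. If partial results are acceptable, `Promise.allSettled` is the usual alternative.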
Brand Properties Discovery (Step 8)
The workflow discovers and verifies official social accounts for the domain:
- Scrape Homepage - Extracts social links from:
  - JSON-LD sameAs (highest confidence)
  - <link rel="me"> tags
  - Raw HTML patterns
- LLM Verification - Determines which are official brand accounts vs share buttons/employee accounts
- YouTube Resolution - Resolves @handle and /c/name to canonical channel IDs via YouTube API
- Social Profile Scraping - Fetches follower counts for X/Instagram (currently disabled - platforms blocking scrapers)
- Ranking Domain Properties - Also scrapes brand properties for top 20 domains found in SERP rankings
Platforms Tracked: YouTube, X, Facebook, Pinterest, LinkedIn, Instagram, TikTok, GitHub, Discord, Reddit
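The homepage scraping step can be illustrated with a small extractor for the two structured sources listed above. This is a rough sketch under stated assumptions: `extractSocialLinks` and its types are hypothetical names, regex parsing stands in for whatever HTML parser the workflow really uses, only a top-level `sameAs` property is read, and the raw HTML pattern fallback is omitted.

```typescript
type Source = "json-ld" | "rel-me";

interface SocialLink {
  url: string;
  source: Source;
}

function extractSocialLinks(html: string): SocialLink[] {
  const links: SocialLink[] = [];
  // Keep the first occurrence of each URL; JSON-LD is scanned first, so the
  // higher-confidence source wins when both sources list the same URL.
  const add = (url: string, source: Source): void => {
    if (!links.some((l) => l.url === url)) links.push({ url, source });
  };

  // 1. JSON-LD blocks: read the schema.org `sameAs` property.
  const jsonLdRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = jsonLdRe.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1] ?? "");
      const sameAs = data?.sameAs;
      const urls = Array.isArray(sameAs) ? sameAs : sameAs ? [sameAs] : [];
      for (const url of urls) {
        if (typeof url === "string") add(url, "json-ld");
      }
    } catch {
      // Malformed JSON-LD is skipped rather than failing the whole scrape.
    }
  }

  // 2. <link rel="me" href="..."> tags.
  const linkRe = /<link\b[^>]*>/gi;
  while ((m = linkRe.exec(html)) !== null) {
    const tag = m[0];
    if (!/\brel=["']me["']/i.test(tag)) continue;
    const href = /\bhref=["']([^"']+)["']/i.exec(tag);
    if (href !== null && href[1]) add(href[1], "rel-me");
  }

  return links;
}
```

The output of a step like this would then go to LLM verification, since a link appearing on the homepage does not by itself prove the account is an official brand property.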
UrlClassifyWorkflow
4-stage classification pipeline optimized for cost (free → cheap → expensive).
Binding: URL_CLASSIFY_WORKFLOW
Early Exit: 70% confidence threshold
KeywordClassifyWorkflow
5-stage keyword classification with brand and location detection.
Binding: KEYWORD_CLASSIFY_WORKFLOW
Note: Brand and location detection run in parallel after the vectorize stage.
DomainClassifyWorkflow
5-stage domain classification pipeline (Stages 1-5, preceded by a Stage 0 rank fetch when no rank is supplied), optimized for cost (free → cheap → expensive). Each stage exits early if confidence ≥ 70%.
Binding: DOMAIN_CLASSIFY_WORKFLOW
Consumer: domain-classify-consumer.js
Threshold: 70% confidence for early exit
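The early-exit rule described above can be sketched as a loop over stages ordered from cheapest to most expensive. This is an illustration only: `Stage`, `StageResult`, and `classifyWithEarlyExit` are hypothetical names, stages are synchronous for brevity (real stages would be async), and keeping the best result seen so far when no stage reaches the threshold is an assumption.

```typescript
interface StageResult {
  type: string;
  confidence: number; // 0..1
}

interface Stage {
  name: string;
  run: (domain: string) => StageResult | null; // null = stage had no opinion
}

type Classified = StageResult & { stage: string };

const EARLY_EXIT_THRESHOLD = 0.7; // 70% confidence, per the section above

function classifyWithEarlyExit(domain: string, stages: Stage[]): Classified {
  let best: Classified | null = null;
  for (const stage of stages) {
    const result = stage.run(domain);
    if (result === null) continue;
    if (best === null || result.confidence > best.confidence) {
      best = { ...result, stage: stage.name };
    }
    // Stop before paying for any more expensive stage.
    if (result.confidence >= EARLY_EXIT_THRESHOLD) break;
  }
  return best ?? { type: "unknown", confidence: 0, stage: "none" };
}
```

Because the stages run in cost order, a confident answer from a free or cheap stage means the LLM stage is never invoked for that domain.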
Stage Details
| Stage | Cost | Description |
|---|---|---|
| 0: Fetch Rank | ~$0.00002 | DataForSEO domain rank if not provided |
| 1: TLD/Patterns | FREE | .gov→institutional, *.shopify.com→commerce |
| 2: Known Domains | FREE | Check D1 cache for existing classification |
| 3: Vectorize | ~$0.0001 | BGE embeddings similarity to labeled examples |
| 4: Homepage Fetch | ~$0.000125 | R2 cache → low-noise crawl → Instant Pages |
| 5: LLM | ~$0.0002 | Llama 3.3 70B with V3 taxonomy prompt |
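Stage 1 in the table above is a free, rule-based check. A minimal sketch of such a rule table follows, using only the two example rules given in the table. `PATTERN_RULES` and `classifyByPattern` are hypothetical names, and the real rule set is presumably larger.

```typescript
interface PatternRule {
  matches: (host: string) => boolean;
  tier1Type: string;
}

// Only the two rules named in the stage table are included here.
const PATTERN_RULES: PatternRule[] = [
  { matches: (h) => h.endsWith(".gov"), tier1Type: "institutional" },
  { matches: (h) => h.endsWith(".shopify.com"), tier1Type: "commerce" },
];

// Returns a tier1_type when a rule matches, or null so the pipeline
// can fall through to the next stage.
function classifyByPattern(domain: string): string | null {
  const host = domain.trim().toLowerCase().replace(/\.$/, "");
  for (const rule of PATTERN_RULES) {
    if (rule.matches(host)) return rule.tier1Type;
  }
  return null;
}
```

Returning `null` instead of a guess keeps this stage honest: a domain that matches no rule simply moves on to the known-domains lookup.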
Valid tier1_types (V3 Taxonomy)
The LLM is constrained to output only these 8 valid values:
| tier1_type | Description | Examples |
|---|---|---|
| platform | SaaS, apps, tools, dashboards | GitHub, Slack, Notion |
| marketplace | Multi-sided listings/transactions | Amazon, eBay, Yelp, G2 |
| commerce | Direct retail, D2C checkout | nike.com, apple.com/store |
| service | Sells services, contact/quote/booking | Agencies, law firms |
| information | Content publishing: blogs, news, guides | TechCrunch, Medium |
| community | Forums, social, UGC-dominated | Reddit, Discord, Quora |
| institutional | Government, education, nonprofit | .gov, .edu sites |
| unknown | Cannot determine | - |
LLM Normalization
The LLM stage includes validation that maps common hallucinations to valid values:
authority → institutional
media → information
technology → platform
business → service
social → community
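The mapping above could be applied with a small normalizer like the following. The valid values and aliases come from this section; the function name `normalizeTier1Type`, the case-insensitive matching, and the fallback to `unknown` for unrecognized labels are assumptions made for the sketch.

```typescript
const VALID_TIER1_TYPES = [
  "platform", "marketplace", "commerce", "service",
  "information", "community", "institutional", "unknown",
] as const;

type Tier1Type = (typeof VALID_TIER1_TYPES)[number];

// Common hallucinated labels and the valid value each maps to.
const ALIASES = new Map<string, Tier1Type>([
  ["authority", "institutional"],
  ["media", "information"],
  ["technology", "platform"],
  ["business", "service"],
  ["social", "community"],
]);

function normalizeTier1Type(raw: string): Tier1Type {
  const key = raw.trim().toLowerCase();
  if ((VALID_TIER1_TYPES as readonly string[]).indexOf(key) !== -1) {
    return key as Tier1Type;
  }
  // Assumption: anything unrecognized is treated as "unknown".
  return ALIASES.get(key) ?? "unknown";
}
```

A `Map` is used instead of a plain object so that strings such as `constructor` cannot accidentally resolve to inherited object properties.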
Workflow Parameters
interface DomainClassifyParams {
  domain_id: number;    // Required: D1 domain ID
  domain: string;       // Required: Domain name
  domain_rank?: number; // Optional: Skip Stage 0 if provided
  project_id?: number;  // Optional: Project context
  skip_fetch?: boolean; // Skip Stage 4 (homepage fetch)
  skip_llm?: boolean;   // Skip Stage 5 (LLM)
}
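To show how the optional flags interact with the stage table, here is a hedged sketch that computes which stages remain eligible for a given set of parameters. `plannedStages` is a hypothetical helper, not part of the workflow, and it ignores early exit, which may stop the run before later stages.

```typescript
interface DomainClassifyParams {
  domain_id: number;
  domain: string;
  domain_rank?: number;
  project_id?: number;
  skip_fetch?: boolean;
  skip_llm?: boolean;
}

// Lists the stage numbers that may run for these parameters.
function plannedStages(params: DomainClassifyParams): number[] {
  const stages: number[] = [];
  if (params.domain_rank === undefined) stages.push(0); // rank must be fetched
  stages.push(1, 2, 3); // TLD/patterns, known domains, vectorize
  if (!params.skip_fetch) stages.push(4); // homepage fetch
  if (!params.skip_llm) stages.push(5); // LLM
  return stages;
}

// Example parameters: rank already known, LLM stage disabled.
const exampleParams: DomainClassifyParams = {
  domain_id: 1,
  domain: "example.com",
  domain_rank: 500,
  skip_llm: true,
};
```

With `exampleParams`, Stage 0 is skipped because a rank is supplied and Stage 5 is skipped by `skip_llm`, leaving Stages 1 through 4.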
Planned Enhancements
- Tech Detection - Wappalyzer-style technology stack detection after homepage fetch
SerpTrackingWorkflow
Location-aware SERP tracking with local pack support.
Binding: SERP_TRACKING_WORKFLOW
Trigger: POST /api/keywords/track
AppDetailsWorkflow
App store metadata enrichment from Apple and Google Play.
Binding: APP_DETAILS_WORKFLOW
Cache: 24 hours
SubscriptionDispatchWorkflow
Daily cron that processes all active subscriptions.
Binding: SUBSCRIPTION_DISPATCH_WORKFLOW
Schedule: Daily at 2 AM UTC
CrawlRunWorkflow
App store catalog discovery with progress tracking.
Binding: CRAWL_RUN_WORKFLOW
Limit: 200 apps per category
LlmVerifyWorkflow
Re-verification for low-confidence classifications.
Binding: LLM_VERIFY_WORKFLOW
Cost: ~$0.02 per verification
Social Infrastructure
RankFabric tracks social accounts and engagement for domains.
What's Implemented
| Component | Description | Status |
|---|---|---|
| Homepage Scraping | Extracts social links from domain homepages (JSON-LD, meta tags, HTML patterns) | ✅ Active |
| LLM Ownership Verification | Determines if social accounts are official brand properties | ✅ Active |
| YouTube Resolution | Resolves handles to canonical channel IDs via YouTube API | ✅ Active |
| X/Instagram Profile Scraping | Fetches follower counts via cascade fetch (direct → ZenRows) | ⚠️ Disabled (platforms blocking) |
| social_accounts Table | Unified storage for all platforms with ownership tracking | ✅ Active |
| social_content Table | Stores videos, posts, tweets with engagement metrics | ✅ Schema ready |
| social_stats_history Table | Time-series stats for growth tracking | ✅ Schema ready |
Cascade Fetch Strategy
Profile scraping uses a cost-optimized cascade:
- Direct Fetch (FREE) - Works for non-JS sites
- ZenRows Basic ($0.001) - JS rendering
- ZenRows Premium ($0.01) - Residential proxy + JS
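The cascade above amounts to trying fetchers in ascending cost order and stopping at the first one that returns content. A minimal sketch follows, assuming each tier is wrapped in a function that resolves to the page HTML or `null`. `FetchTier` and `cascadeFetch` are hypothetical names, and no real ZenRows client calls are shown.

```typescript
interface FetchTier {
  name: string;
  costUsd: number; // per-request cost, e.g. 0, 0.001, 0.01 from the list above
  fetch: (url: string) => Promise<string | null>; // null = blocked or empty
}

interface CascadeResult {
  html: string;
  tier: string;
}

async function cascadeFetch(url: string, tiers: FetchTier[]): Promise<CascadeResult | null> {
  // Cheapest first, regardless of the order the caller supplied.
  const ordered = [...tiers].sort((a, b) => a.costUsd - b.costUsd);
  for (const tier of ordered) {
    try {
      const html = await tier.fetch(url);
      if (html !== null) return { html, tier: tier.name };
    } catch {
      // A thrown error is treated like a miss; fall through to the next tier.
    }
  }
  return null; // every tier failed
}
```

Recording which tier succeeded makes it possible to see how often the paid tiers are actually needed.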
What's Planned
| Component | Description | Status |
|---|---|---|
| SharedCount Integration | URL-level social engagement metrics (shares, likes) | 📋 Planned |
| Rate-of-Change Updates | Update frequency based on engagement velocity | 📋 Planned |
| Official API Integrations | X API, Instagram Graph API for reliable data | 📋 Planned |
Color Legend
| Color | Meaning |
|---|---|
| 🔵 Blue | Input / Start |
| 🟢 Green | Free operation / Complete |
| 🟡 Yellow | Cheap / Decision point |
| 🟠 Orange | API call ($) |
| 🔴 Red | LLM ($$$) |
| 🟣 Purple | Child workflow |
| 🔷 Light Blue | Internal operation |
| ⬜ Gray | Planned / Not implemented |
Queue Integration
| Queue | Consumer | Triggered By |
|---|---|---|
| BACKLINK_CLASSIFY_QUEUE | backlink-classify-consumer | DomainOnboardWorkflow |
| KEYWORD_CLASSIFY_QUEUE | keyword-classify-consumer | DomainOnboardWorkflow |
| DOMAIN_CLASSIFY_QUEUE | domain-classify-consumer | Various |
| LLM_VERIFY_QUEUE | llm-verify-consumer | Classification workflows |
| APP_INFO_QUEUE | app-details-consumer | AssetOnboardWorkflow |
| SOCIAL_SCRAPE_QUEUE | social-scrape-consumer | DomainOnboardWorkflow |
Workflow API
# Trigger workflow
POST /api/admin/workflow/:type
# Get status
GET /api/admin/workflow/:type/:id
# Management
POST /api/admin/workflow/:type/:id/pause
POST /api/admin/workflow/:type/:id/resume
DELETE /api/admin/workflow/:type/:id
# Listing
GET /api/admin/workflows/running
GET /api/admin/workflows/history
GET /api/admin/workflows/failures
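For client code, the per-instance routes above can be wrapped in a small helper that returns the HTTP method and path for each action. This is a sketch only: `workflowRequest` and its types are hypothetical, the accepted values of `:type` and the format of `:id` are not specified in this document, and request bodies, authentication, and response shapes are left out.

```typescript
type WorkflowAction = "trigger" | "status" | "pause" | "resume" | "delete";

interface RequestSpec {
  method: "GET" | "POST" | "DELETE";
  path: string;
}

function workflowRequest(action: WorkflowAction, type: string, id?: string): RequestSpec {
  const base = "/api/admin/workflow/" + encodeURIComponent(type);
  if (action === "trigger") return { method: "POST", path: base };

  // Every other action targets a specific workflow instance.
  if (id === undefined) throw new Error(action + " requires a workflow instance id");
  const instance = base + "/" + encodeURIComponent(id);

  switch (action) {
    case "status":
      return { method: "GET", path: instance };
    case "pause":
      return { method: "POST", path: instance + "/pause" };
    case "resume":
      return { method: "POST", path: instance + "/resume" };
    case "delete":
      return { method: "DELETE", path: instance };
  }
}
```

The returned spec can be passed to `fetch` or any other HTTP client together with whatever base URL and credentials the deployment requires.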
Planned Enhancements
SharedCount Integration
- Track social engagement metrics at the URL level (shares, likes, comments)
- Rate-of-change based update frequency:
- High velocity URLs: Weekly refresh
- Normal URLs: Monthly refresh
- Stale URLs: Quarterly refresh
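Since this feature is only planned, the following is a speculative sketch of how the three refresh tiers might translate into a next-check timestamp. The names `Velocity`, `REFRESH_DAYS`, and `nextRefreshAt` are hypothetical, "monthly" and "quarterly" are approximated as 30 and 90 days, and how a URL is assigned to a velocity class is not defined in this document.

```typescript
type Velocity = "high" | "normal" | "stale";

// Weekly / monthly / quarterly, approximated in days (assumption).
const REFRESH_DAYS: Record<Velocity, number> = {
  high: 7,
  normal: 30,
  stale: 90,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns when a URL should next be refreshed, given its last check time.
function nextRefreshAt(lastChecked: Date, velocity: Velocity): Date {
  return new Date(lastChecked.getTime() + REFRESH_DAYS[velocity] * MS_PER_DAY);
}
```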
Tech Detection
- Wappalyzer-style technology stack detection
- Run after homepage fetch in DomainClassifyWorkflow
Official Social APIs
- X API v2 for reliable follower counts
- Instagram Graph API for business accounts
- YouTube Data API (already integrated)
Workflow Parallelization
- DomainOnboard: Already parallelizes backlinks/referring domains/keywords fetch
- KeywordClassify: Already parallelizes brand and location detection