Workflows

RankFabric uses Cloudflare Durable Workflows for orchestrating multi-step processes. Workflows provide visibility, state management, and automatic retries.

Workflows = Orchestration (Control Plane)
Queues = Execution (Data Plane)


Workflow Summary

| Workflow | Purpose | Trigger |
| --- | --- | --- |
| AssetOnboardWorkflow | Master orchestrator for all asset types | POST /api/assets |
| DomainOnboardWorkflow | Domain intelligence gathering | Asset onboard, subscriptions |
| UrlClassifyWorkflow | 4-stage URL classification | Backlink queue |
| KeywordClassifyWorkflow | 5-stage keyword classification | Keyword queue |
| DomainClassifyWorkflow | Domain classification | Domain queue |
| SerpTrackingWorkflow | SERP position tracking | POST /api/keywords/track |
| AppDetailsWorkflow | App store metadata enrichment | Asset onboard |
| SubscriptionDispatchWorkflow | Daily cron coordinator | Cron (2 AM UTC) |
| CrawlRunWorkflow | App catalog discovery | Manual trigger |
| LlmVerifyWorkflow | Re-verify low-confidence results | LLM queue |

AssetOnboardWorkflow

Master orchestrator for onboarding any customer asset (website, app, local business).

Binding: ASSET_ONBOARD_WORKFLOW
Trigger: POST /api/assets


DomainOnboardWorkflow

Complete domain intelligence gathering with backlinks, keywords, brand properties, and social account discovery.

Binding: DOMAIN_ONBOARD_WORKFLOW
Limits: 100 backlinks, 100 referring domains, 500 keywords
Note: Backlinks, referring domains, and keywords fetch in parallel (3 independent API calls)
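
The parallel fetch described in the note can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the `Fetcher` type, `gatherDomainIntel`, and the shape of the fetchers are hypothetical, and only the three limits come from this page.

```typescript
// Hypothetical fetcher signature; the real workflow calls an external API.
type Fetcher<T> = (domain: string, limit: number) => Promise<T[]>;

// Limits taken from the DomainOnboardWorkflow description.
const LIMITS = { backlinks: 100, referringDomains: 100, keywords: 500 } as const;

interface DomainIntel<B, R, K> {
  backlinks: B[];
  referringDomains: R[];
  keywords: K[];
}

async function gatherDomainIntel<B, R, K>(
  domain: string,
  fetchers: {
    backlinks: Fetcher<B>;
    referringDomains: Fetcher<R>;
    keywords: Fetcher<K>;
  },
): Promise<DomainIntel<B, R, K>> {
  // The three calls are independent, so they are started together
  // and awaited as a group instead of one after another.
  const [backlinks, referringDomains, keywords] = await Promise.all([
    fetchers.backlinks(domain, LIMITS.backlinks),
    fetchers.referringDomains(domain, LIMITS.referringDomains),
    fetchers.keywords(domain, LIMITS.keywords),
  ]);

  // Enforce the caps defensively in case a fetcher returns extra rows.
  return {
    backlinks: backlinks.slice(0, LIMITS.backlinks),
    referringDomains: referringDomains.slice(0, LIMITS.referringDomains),
    keywords: keywords.slice(0, LIMITS.keywords),
  };
}
```

With `Promise.all`, total latency is bounded by the slowest of the three calls rather than their sum, and a rejection from any one call rejects the whole group.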

Brand Properties Discovery (Step 8)

The workflow discovers and verifies official social accounts for the domain:

  1. Scrape Homepage - Extracts social links from:

    • JSON-LD sameAs (highest confidence)
    • <link rel="me"> tags
    • Raw HTML patterns
  2. LLM Verification - Determines which are official brand accounts vs share buttons/employee accounts

  3. YouTube Resolution - Resolves @handle and /c/name to canonical channel IDs via YouTube API

  4. Social Profile Scraping - Fetches follower counts for X/Instagram (currently disabled - platforms blocking scrapers)

  5. Ranking Domain Properties - Also scrapes brand properties for top 20 domains found in SERP rankings

Platforms Tracked: YouTube, X, Facebook, Pinterest, LinkedIn, Instagram, TikTok, GitHub, Discord, Reddit
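
The homepage scraping step (step 1 above) might look roughly like the sketch below. It is an assumption-laden illustration: `extractSocialLinks`, `SOCIAL_HOSTS`, and the regular expressions are hypothetical, the host list is only an approximation of the tracked platforms, and only top-level JSON-LD `sameAs` values are read. Sources are checked in order of confidence, so a URL found in JSON-LD keeps that label even if it also appears in the raw HTML.

```typescript
type LinkSource = "json-ld" | "rel-me" | "html-pattern";

interface SocialLink {
  url: string;
  source: LinkSource;
}

// Hypothetical host list approximating the tracked platforms.
const SOCIAL_HOSTS = [
  "youtube.com", "x.com", "twitter.com", "facebook.com", "pinterest.com",
  "linkedin.com", "instagram.com", "tiktok.com", "github.com",
  "discord.com", "discord.gg", "reddit.com",
];

function isSocialUrl(url: string): boolean {
  const match = /^https?:\/\/(?:www\.)?([^\/?#:]+)/i.exec(url);
  if (!match) return false;
  const host = match[1].toLowerCase();
  return SOCIAL_HOSTS.some((h) => host === h || host.endsWith("." + h));
}

function extractSocialLinks(html: string): SocialLink[] {
  const found = new Map<string, SocialLink>();
  const add = (url: string, source: LinkSource): void => {
    // First source wins, so higher-confidence labels are never overwritten.
    if (isSocialUrl(url) && !found.has(url)) found.set(url, { url, source });
  };
  let m: RegExpExecArray | null;

  // 1. JSON-LD sameAs (highest confidence).
  const ldRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  while ((m = ldRe.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]);
      const sameAs = data?.sameAs;
      const urls: unknown[] = Array.isArray(sameAs) ? sameAs : sameAs ? [sameAs] : [];
      for (const u of urls) if (typeof u === "string") add(u, "json-ld");
    } catch {
      // Malformed JSON-LD is skipped rather than failing the whole scrape.
    }
  }

  // 2. <link rel="me"> tags.
  const linkRe = /<link\b[^>]*>/gi;
  while ((m = linkRe.exec(html)) !== null) {
    const tag = m[0];
    if (!/\brel=["'][^"']*\bme\b[^"']*["']/i.test(tag)) continue;
    const href = /\bhref=["']([^"']+)["']/i.exec(tag);
    if (href) add(href[1], "rel-me");
  }

  // 3. Raw HTML patterns: any absolute href pointing at a known social host.
  const hrefRe = /href=["'](https?:\/\/[^"']+)["']/gi;
  while ((m = hrefRe.exec(html)) !== null) add(m[1], "html-pattern");

  return Array.from(found.values());
}
```

The output of this step is only a candidate list. Share buttons and employee accounts will match the same patterns, which is why the LLM verification step follows.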


UrlClassifyWorkflow

4-stage classification pipeline optimized for cost (free → cheap → expensive).

Binding: URL_CLASSIFY_WORKFLOW
Early Exit: 70% confidence threshold
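
The early-exit behaviour can be illustrated with a small sketch. The four stages themselves are not listed on this page, so `ClassifyStage`, `Verdict`, and `classifyUrl` are hypothetical names and the stage list is supplied by the caller. Only the cheapest-first ordering and the 70% threshold come from the description above.

```typescript
interface Verdict {
  label: string;
  confidence: number; // 0..1
}

interface ClassifyStage {
  name: string;
  // Returns null when the stage cannot produce a verdict.
  run: (url: string) => Promise<Verdict | null>;
}

// 70% threshold from the workflow description.
const EARLY_EXIT_CONFIDENCE = 0.7;

async function classifyUrl(
  url: string,
  stages: ClassifyStage[], // expected order: free, then cheap, then expensive
): Promise<{ verdict: Verdict | null; stagesRun: string[] }> {
  let best: Verdict | null = null;
  const stagesRun: string[] = [];

  for (const stage of stages) {
    stagesRun.push(stage.name);
    const verdict = await stage.run(url);

    // Keep the most confident verdict seen so far.
    if (verdict && (best === null || verdict.confidence > best.confidence)) {
      best = verdict;
    }
    // Stop before paying for later stages once the threshold is met.
    if (best !== null && best.confidence >= EARLY_EXIT_CONFIDENCE) break;
  }

  return { verdict: best, stagesRun };
}
```

If no stage reaches the threshold, this sketch returns the best verdict it saw. Whether the real workflow does the same or forwards the item for re-verification is not stated here.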


KeywordClassifyWorkflow

5-stage keyword classification with brand and location detection.

Binding: KEYWORD_CLASSIFY_WORKFLOW
Note: Brand and location detection run in parallel after vectorize


DomainClassifyWorkflow

5-stage domain classification pipeline optimized for cost (free → cheap → expensive), preceded by an optional rank fetch (Stage 0) that runs only when no domain rank is supplied. Each stage exits early if confidence ≥ 70%.

Binding: DOMAIN_CLASSIFY_WORKFLOW
Consumer: domain-classify-consumer.js
Threshold: 70% confidence for early exit

Stage Details

| Stage | Cost | Description |
| --- | --- | --- |
| 0: Fetch Rank | ~$0.00002 | DataForSEO domain rank if not provided |
| 1: TLD/Patterns | FREE | .gov → institutional, *.shopify.com → commerce |
| 2: Known Domains | FREE | Check D1 cache for existing classification |
| 3: Vectorize | ~$0.0001 | BGE embeddings similarity to labeled examples |
| 4: Homepage Fetch | ~$0.000125 | R2 cache → low-noise crawl → Instant Pages |
| 5: LLM | ~$0.0002 | Llama 3.3 70B with V3 taxonomy prompt |
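
To make the cost ordering concrete, the sketch below adds up the approximate per-stage costs listed above for a domain that exits at a given stage. The helper `cumulativeCost` is hypothetical, and the figures are the approximate values from this page, so the results are estimates rather than billing numbers.

```typescript
// Approximate per-stage costs in USD, copied from the stage list above.
const STAGE_COSTS: { stage: string; usd: number }[] = [
  { stage: "0: Fetch Rank", usd: 0.00002 },
  { stage: "1: TLD/Patterns", usd: 0 },
  { stage: "2: Known Domains", usd: 0 },
  { stage: "3: Vectorize", usd: 0.0001 },
  { stage: "4: Homepage Fetch", usd: 0.000125 },
  { stage: "5: LLM", usd: 0.0002 },
];

// Estimated cost of running stages 0..exitStage (inclusive) for one domain.
function cumulativeCost(exitStage: number): number {
  const total = STAGE_COSTS
    .slice(0, exitStage + 1)
    .reduce((sum, s) => sum + s.usd, 0);
  // Round to 6 decimal places to hide floating-point noise.
  return Math.round(total * 1e6) / 1e6;
}

console.log(cumulativeCost(2)); // exits at Known Domains
console.log(cumulativeCost(5)); // runs every stage
```

Under these figures, a domain that exits at Stage 2 costs about $0.00002 (the rank fetch only), while a domain that runs every stage costs about $0.000445, or roughly $445 per million domains.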

Valid tier1_types (V3 Taxonomy)

The LLM is constrained to output only these 8 valid values:

| tier1_type | Description | Examples |
| --- | --- | --- |
| platform | SaaS, apps, tools, dashboards | GitHub, Slack, Notion |
| marketplace | Multi-sided listings/transactions | Amazon, eBay, Yelp, G2 |
| commerce | Direct retail, D2C checkout | nike.com, apple.com/store |
| service | Sells services, contact/quote/booking | Agencies, law firms |
| information | Content publishing: blogs, news, guides | TechCrunch, Medium |
| community | Forums, social, UGC-dominated | Reddit, Discord, Quora |
| institutional | Government, education, nonprofit | .gov, .edu sites |
| unknown | Cannot determine | - |

LLM Normalization

The LLM stage includes validation that maps common hallucinations to valid values:

authority → institutional
media → information
technology → platform
business → service
social → community
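
A normalization step along these lines could be written as below. The eight valid values and the five alias mappings come from this section. The function name `normalizeTier1`, the trimming and lower-casing, and the fallback to `unknown` for unrecognized output are assumptions made for the sketch.

```typescript
const VALID_TIER1 = [
  "platform", "marketplace", "commerce", "service",
  "information", "community", "institutional", "unknown",
] as const;

type Tier1Type = (typeof VALID_TIER1)[number];

const VALID_SET = new Set<string>(VALID_TIER1);

// Common hallucinated labels and the valid value each maps to.
// A Map is used so strings like "constructor" cannot hit inherited properties.
const TIER1_ALIASES = new Map<string, Tier1Type>([
  ["authority", "institutional"],
  ["media", "information"],
  ["technology", "platform"],
  ["business", "service"],
  ["social", "community"],
]);

function normalizeTier1(raw: string): Tier1Type {
  // Assumption: tolerate stray whitespace and casing in model output.
  const value = raw.trim().toLowerCase();
  if (VALID_SET.has(value)) return value as Tier1Type;
  // Assumption: anything unrecognized falls back to "unknown".
  return TIER1_ALIASES.get(value) ?? "unknown";
}
```

For example, `normalizeTier1("Media")` yields `information`, and an unexpected label such as `ecommerce` yields `unknown` under the fallback assumed here.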

Workflow Parameters

interface DomainClassifyParams {
  domain_id: number;    // Required: D1 domain ID
  domain: string;       // Required: Domain name
  domain_rank?: number; // Optional: Skip Stage 0 if provided
  project_id?: number;  // Optional: Project context
  skip_fetch?: boolean; // Optional: Skip Stage 4 (homepage fetch)
  skip_llm?: boolean;   // Optional: Skip Stage 5 (LLM)
}
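
As a sketch of how these parameters affect a run, the hypothetical helper below derives which stage numbers would execute for a given parameter object. `plannedStages` is not part of the workflow. It only restates the comments in the interface: a supplied `domain_rank` skips Stage 0, `skip_fetch` skips Stage 4, and `skip_llm` skips Stage 5.

```typescript
interface DomainClassifyParams {
  domain_id: number;
  domain: string;
  domain_rank?: number;
  project_id?: number;
  skip_fetch?: boolean;
  skip_llm?: boolean;
}

// Stage numbers that could run for these params, before any early exit.
function plannedStages(params: DomainClassifyParams): number[] {
  const stages: number[] = [];
  if (params.domain_rank === undefined) stages.push(0); // rank must be fetched
  stages.push(1, 2, 3); // TLD/patterns, known domains, vectorize
  if (!params.skip_fetch) stages.push(4); // homepage fetch
  if (!params.skip_llm) stages.push(5); // LLM
  return stages;
}

// Example: rank already known and the LLM stage disabled.
const exampleParams: DomainClassifyParams = {
  domain_id: 1, // illustrative ID
  domain: "example.com",
  domain_rank: 500,
  skip_llm: true,
};
console.log(plannedStages(exampleParams)); // [ 1, 2, 3, 4 ]
```

The list is an upper bound: a stage that reaches the 70% confidence threshold ends the run before later stages execute.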

Planned Enhancements

  • Tech Detection - Wappalyzer-style technology stack detection after homepage fetch

SerpTrackingWorkflow

Location-aware SERP tracking with local pack support.

Binding: SERP_TRACKING_WORKFLOW
Trigger: POST /api/keywords/track


AppDetailsWorkflow

App store metadata enrichment from Apple and Google Play.

Binding: APP_DETAILS_WORKFLOW
Cache: 24 hours
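
A 24-hour cache check can be as small as the sketch below. The `CachedAppDetails` shape and `isFresh` helper are hypothetical. Where and how the workflow actually stores cached metadata is not described here.

```typescript
// 24-hour TTL from the workflow description, in milliseconds.
const APP_DETAILS_TTL_MS = 24 * 60 * 60 * 1000;

interface CachedAppDetails {
  fetchedAt: number; // epoch milliseconds when the metadata was fetched
  payload: unknown;  // store metadata, shape not specified here
}

// True when a cached entry exists and is younger than the TTL.
function isFresh(entry: CachedAppDetails | undefined, now: number = Date.now()): boolean {
  return entry !== undefined && now - entry.fetchedAt < APP_DETAILS_TTL_MS;
}
```

A caller would refetch from the app store only when `isFresh` returns false.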


SubscriptionDispatchWorkflow

Daily cron that processes all active subscriptions.

Binding: SUBSCRIPTION_DISPATCH_WORKFLOW
Schedule: Daily at 2 AM UTC


CrawlRunWorkflow

App store catalog discovery with progress tracking.

Binding: CRAWL_RUN_WORKFLOW
Limit: 200 apps per category


LlmVerifyWorkflow

Re-verification for low-confidence classifications.

Binding: LLM_VERIFY_WORKFLOW
Cost: ~$0.02 per verification


Social Infrastructure

RankFabric tracks social accounts and engagement for domains.

What's Implemented

| Component | Description | Status |
| --- | --- | --- |
| Homepage Scraping | Extracts social links from domain homepages (JSON-LD, meta tags, HTML patterns) | ✅ Active |
| LLM Ownership Verification | Determines if social accounts are official brand properties | ✅ Active |
| YouTube Resolution | Resolves handles to canonical channel IDs via YouTube API | ✅ Active |
| X/Instagram Profile Scraping | Fetches follower counts via cascade fetch (direct → ZenRows) | ⚠️ Disabled (platforms blocking) |
| social_accounts Table | Unified storage for all platforms with ownership tracking | ✅ Active |
| social_content Table | Stores videos, posts, tweets with engagement metrics | ✅ Schema ready |
| social_stats_history Table | Time-series stats for growth tracking | ✅ Schema ready |

Cascade Fetch Strategy

Profile scraping uses a cost-optimized cascade:

  1. Direct Fetch (FREE) - Works for non-JS sites
  2. ZenRows Basic ($0.001) - JS rendering
  3. ZenRows Premium ($0.01) - Residential proxy + JS
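
The cascade above can be sketched as a loop over tiers ordered by cost. This is an illustration only: `FetchTier` and `cascadeFetch` are hypothetical names, the tier fetchers are supplied by the caller, and no real ZenRows client calls are shown. Only the tier names and per-request costs come from the list above.

```typescript
interface FetchTier {
  name: string;
  costUsd: number; // approximate per-request cost
  // Resolves to the page HTML, or null if this tier could not get usable content.
  fetch: (url: string) => Promise<string | null>;
}

interface CascadeResult {
  html: string;
  tier: string;        // tier that succeeded
  attempted: string[]; // tiers tried, in order
}

async function cascadeFetch(url: string, tiers: FetchTier[]): Promise<CascadeResult | null> {
  // Try the cheapest tier first, regardless of the order given.
  const ordered = tiers.slice().sort((a, b) => a.costUsd - b.costUsd);
  const attempted: string[] = [];

  for (const tier of ordered) {
    attempted.push(tier.name);
    try {
      const html = await tier.fetch(url);
      if (html) return { html, tier: tier.name, attempted };
    } catch {
      // A failed tier is treated like an empty result: move to the next one.
    }
  }
  return null; // every tier failed
}
```

A caller would pass three tiers matching the list (Direct Fetch at 0, ZenRows Basic at 0.001, ZenRows Premium at 0.01) and pay for the more expensive tiers only when the cheaper ones return nothing.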

What's Planned

| Component | Description | Status |
| --- | --- | --- |
| SharedCount Integration | URL-level social engagement metrics (shares, likes) | 📋 Planned |
| Rate-of-Change Updates | Update frequency based on engagement velocity | 📋 Planned |
| Official API Integrations | X API, Instagram Graph API for reliable data | 📋 Planned |

Color Legend

| Color | Meaning |
| --- | --- |
| 🔵 Blue | Input / Start |
| 🟢 Green | Free operation / Complete |
| 🟡 Yellow | Cheap / Decision point |
| 🟠 Orange | API call ($) |
| 🔴 Red | LLM ($$$) |
| 🟣 Purple | Child workflow |
| 🔷 Light Blue | Internal operation |
| ⬜ Gray | Planned / Not implemented |

Queue Integration

| Queue | Consumer | Triggered By |
| --- | --- | --- |
| BACKLINK_CLASSIFY_QUEUE | backlink-classify-consumer | DomainOnboardWorkflow |
| KEYWORD_CLASSIFY_QUEUE | keyword-classify-consumer | DomainOnboardWorkflow |
| DOMAIN_CLASSIFY_QUEUE | domain-classify-consumer | Various |
| LLM_VERIFY_QUEUE | llm-verify-consumer | Classification workflows |
| APP_INFO_QUEUE | app-details-consumer | AssetOnboardWorkflow |
| SOCIAL_SCRAPE_QUEUE | social-scrape-consumer | DomainOnboardWorkflow |

Workflow API

# Trigger workflow
POST /api/admin/workflow/:type

# Get status
GET /api/admin/workflow/:type/:id

# Management
POST /api/admin/workflow/:type/:id/pause
POST /api/admin/workflow/:type/:id/resume
DELETE /api/admin/workflow/:type/:id

# Listing
GET /api/admin/workflows/running
GET /api/admin/workflows/history
GET /api/admin/workflows/failures
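
The per-instance routes above can be wrapped in a small request builder. This is a sketch under stated assumptions: `workflowRequest` and the action names are hypothetical, the accepted values of `:type` are not documented here (the `domain-classify` value in the usage note is a guess), and request bodies and authentication are not covered.

```typescript
type WorkflowAction = "trigger" | "status" | "pause" | "resume" | "delete";

interface AdminRequest {
  method: "GET" | "POST" | "DELETE";
  path: string;
}

// Maps an action to the method and path listed in the Workflow API section.
function workflowRequest(action: WorkflowAction, type: string, id?: string): AdminRequest {
  const base = `/api/admin/workflow/${encodeURIComponent(type)}`;
  if (action === "trigger") return { method: "POST", path: base };

  // Every other action targets a specific workflow instance.
  if (!id) throw new Error(`"${action}" requires a workflow instance id`);
  const instance = `${base}/${encodeURIComponent(id)}`;

  switch (action) {
    case "status":
      return { method: "GET", path: instance };
    case "pause":
      return { method: "POST", path: `${instance}/pause` };
    case "resume":
      return { method: "POST", path: `${instance}/resume` };
    case "delete":
      return { method: "DELETE", path: instance };
  }
}
```

For example, `workflowRequest("pause", "domain-classify", "abc123")` produces a `POST` to `/api/admin/workflow/domain-classify/abc123/pause`, which can then be sent with `fetch` against the deployment's base URL.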

Planned Enhancements

SharedCount Integration

  • Track social engagement metrics at the URL level (shares, likes, comments)
  • Rate-of-change based update frequency:
    • High velocity URLs: Weekly refresh
    • Normal URLs: Monthly refresh
    • Stale URLs: Quarterly refresh

Tech Detection

  • Wappalyzer-style technology stack detection
  • Run after homepage fetch in DomainClassifyWorkflow

Official Social APIs

  • X API v2 for reliable follower counts
  • Instagram Graph API for business accounts
  • YouTube Data API (already integrated)

Workflow Parallelization

  • DomainOnboard: Already parallelizes backlinks/referring domains/keywords fetch
  • KeywordClassify: Already parallelizes brand and location detection