Domain Classification Taxonomy
This document defines the hierarchical taxonomy for classifying domains. The classification system uses a tree structure where tier1_type is determined first, then domain_type is constrained to valid children of that tier1.
Classification Hierarchy
tier1_type (7 archetypes)
└── domain_type (child of tier1)
    └── industry (vertical)
        └── subcategory (niche)
Key Principle: tier1_type is the trunk; everything else branches from it. Once tier1 is determined with high confidence, it does not change, and child classifications are constrained to the valid options within that tier1.
Tier 1 Types (7 Universal Archetypes)
| tier1_type | Description | Business Model |
|---|---|---|
| platform | Tools you log into and use | Subscription/usage fees |
| marketplace | Connects buyers and sellers | Transaction fees, listings |
| commerce | Sells products directly | Product sales |
| service | Sells services | Service fees, billable hours |
| information | Content is the product | Ads, subscriptions, leads |
| community | Dominated by user-generated content | Ads, premium, donations |
| institutional | Authority/trust-based orgs | Taxes, tuition, donations |
Domain Types by Tier1 (The Tree)
PLATFORM (tools you log into)
| domain_type | Description | Examples |
|---|---|---|
| saas_product | Business software, productivity tools | Slack, Notion, Asana, Salesforce |
| code_repository | Source code hosting | GitHub, GitLab, Bitbucket |
| app_platform | App stores, distribution platforms | App Store, Google Play |
| documentation_portal | Hosted docs, wikis | GitBook, ReadTheDocs, Notion docs |
| messaging_platform | Communication tools | Slack, Discord, WhatsApp |
| social_network | Professional/personal networking | LinkedIn, Facebook, Twitter |
| audio_platform | Podcasts, music streaming | Spotify, Apple Podcasts |
| video_platform | Video hosting/streaming | YouTube, Vimeo, Twitch |
Note: If a domain is a tool that users log into to accomplish tasks, it's a PLATFORM regardless of industry.
MARKETPLACE (multi-sided, connects buyers/sellers)
| domain_type | Description | Examples |
|---|---|---|
| ecommerce_marketplace | Multi-seller retail | Amazon, eBay, Etsy |
| ticket_marketplace | Event tickets | Ticketmaster, StubHub, AXS |
| real_estate_marketplace | Property listings | Zillow, Realtor.com, Redfin |
| job_marketplace | Job listings | Indeed, LinkedIn Jobs, Glassdoor |
| service_marketplace | Freelance/gig services | Upwork, Fiverr, Thumbtack |
| app_marketplace | Software/plugin listings | Chrome Web Store, Salesforce AppExchange |
| review_marketplace | Business reviews + leads | G2, Capterra, Yelp |
| directory_citation | Business directories | Yellow Pages, BBB |
Key distinction: Marketplaces connect multiple sellers to buyers. If a site only sells its own products, it's COMMERCE, not MARKETPLACE.
Example: Ticketmaster is a MARKETPLACE (connects ticket sellers to buyers), not COMMUNITY (even though it's sports-related).
COMMERCE (sells directly, single seller)
| domain_type | Description | Examples |
|---|---|---|
| ecommerce_store | Direct retail/D2C | Nike.com, Apple Store, Warby Parker |
| travel_booking | Airlines, hotels, car rental | Delta, Marriott, Hertz |
| subscription_commerce | Subscription boxes, services | Netflix, Dollar Shave Club |
| product_manufacturer | Brand/manufacturer sites | Apple, Samsung, Ford |
Note: If a company sells its own products directly to consumers, it's COMMERCE. If it hosts other sellers, it's MARKETPLACE.
SERVICE (sells services, not products)
| domain_type | Description | Examples |
|---|---|---|
| agency_provider | Marketing, PR, consulting | Ogilvy, McKinsey, Accenture |
| pr_distribution | Press release wire services | PR Newswire, Business Wire |
| professional_service | Law, accounting, consulting | Baker McKenzie, Deloitte |
| healthcare_provider | Hospitals, clinics, telehealth | Mayo Clinic, Teladoc |
| financial_service | Banks, credit unions, fintech | Chase, PayPal, Stripe |
| legal_service | Law firms, legal services | Baker McKenzie, LegalZoom |
Note: If the primary business is selling human services (billable hours, expertise), it's SERVICE.
INFORMATION (content is the product)
| domain_type | Description | Examples |
|---|---|---|
| news_publisher | Journalism organizations | NYT, TechCrunch, The Verge |
| magazine_publisher | Magazines, periodicals | Wired, Forbes, Inc. |
| blog_publisher | Blog platforms hosting multiple blogs | Medium, Substack |
| content_publisher | Generic content sites | BuzzFeed, HuffPost |
| review_site | Editorial reviews (non-affiliate) | Wirecutter, Consumer Reports |
| affiliate_review_site | Affiliate-driven reviews | NerdWallet, The Points Guy |
| reference_wiki | Reference/encyclopedia | Wikipedia, Investopedia |
Note: The domain's PRIMARY business is creating/distributing content. A SaaS company with a blog is still PLATFORM, not INFORMATION.
COMMUNITY (UGC-dominated)
| domain_type | Description | Examples |
|---|---|---|
| forum_community | Discussion forums | Reddit, Stack Overflow, Discourse |
| gaming_community | Gaming forums, fan sites | Discord servers, IGN forums |
| sports_community | Fan forums, fantasy sports | ESPN forums, Fantasy Pros |
| qna_platform | Q&A sites | Quora, Stack Exchange |
| ugc_video | User-generated video | YouTube (as community), TikTok |
Note: The domain's PRIMARY value comes from user-generated content, not editorial or company content.
Important: Sports media sites like ESPN.com are INFORMATION (they produce content), not COMMUNITY. Fan forums are COMMUNITY.
INSTITUTIONAL (authority-based orgs)
| domain_type | Description | Examples |
|---|---|---|
| government_site | Government agencies | IRS.gov, CDC.gov, FDA.gov |
| education_academic | Universities, schools | Stanford.edu, MIT.edu |
| nonprofit_org | Nonprofits, NGOs | Red Cross, Wikimedia Foundation |
| healthcare_institution | Hospital systems, health orgs | Mayo Clinic, Cleveland Clinic |
| financial_institution | Central banks, financial regulators | Federal Reserve, FDIC |
| legal_institution | Courts, bar associations | Supreme Court, ABA |
| trade_association | Industry associations | IEEE, ACM, NAR |
Note: These are trust-based organizations, often with .gov, .edu, or .org TLDs. Their authority comes from institutional status, not commercial activity.
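The tree above can be sketched as a lookup from each tier1_type to its valid domain_type children. This is only an illustration: the real definitions live in src/lib/classification-constants.js and may be shaped differently, and the names DOMAIN_TYPES_BY_TIER1, validDomainTypes, and isValidPair are assumptions made for this example.

```javascript
// Hypothetical sketch of the tier1 -> domain_type tree from the tables above.
// A Map is used so that arbitrary strings (e.g. "constructor") never match
// inherited object properties.
const DOMAIN_TYPES_BY_TIER1 = new Map([
  ["platform", ["saas_product", "code_repository", "app_platform",
    "documentation_portal", "messaging_platform", "social_network",
    "audio_platform", "video_platform"]],
  ["marketplace", ["ecommerce_marketplace", "ticket_marketplace",
    "real_estate_marketplace", "job_marketplace", "service_marketplace",
    "app_marketplace", "review_marketplace", "directory_citation"]],
  ["commerce", ["ecommerce_store", "travel_booking", "subscription_commerce",
    "product_manufacturer"]],
  ["service", ["agency_provider", "pr_distribution", "professional_service",
    "healthcare_provider", "financial_service", "legal_service"]],
  ["information", ["news_publisher", "magazine_publisher", "blog_publisher",
    "content_publisher", "review_site", "affiliate_review_site",
    "reference_wiki"]],
  ["community", ["forum_community", "gaming_community", "sports_community",
    "qna_platform", "ugc_video"]],
  ["institutional", ["government_site", "education_academic", "nonprofit_org",
    "healthcare_institution", "financial_institution", "legal_institution",
    "trade_association"]],
]);

// Returns the domain_types allowed under a tier1_type, or [] if it is unknown.
function validDomainTypes(tier1Type) {
  return DOMAIN_TYPES_BY_TIER1.get(tier1Type) ?? [];
}

// True only when domainType is a child of tier1Type in the tree.
function isValidPair(tier1Type, domainType) {
  return validDomainTypes(tier1Type).includes(domainType);
}

console.log(isValidPair("marketplace", "ticket_marketplace")); // true
console.log(isValidPair("community", "ticket_marketplace"));   // false
```

Keeping the tree in one structure means the rules engine, the LLM prompt, and any validation step can all read the same list of children instead of repeating it.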
Classification Flow
1. Determine tier1_type FIRST (highest confidence)
   ├── Known domain in database? → Use stored tier1
   ├── TLD rules (.gov → institutional, .edu → institutional)
   ├── Platform patterns (Shopify, WordPress, etc.)
   └── LLM classification (constrained to 7 options)
2. Once tier1 is locked, determine domain_type
   ├── Only show valid domain_types for that tier1
   ├── Use URL patterns, content signals
   └── LLM classification (constrained to tier1's children)
3. Determine industry/subcategory (optional granularity)
   ├── sports, gaming, finance, healthcare, etc.
   └── Stored in industry + subcategory fields
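Step 1 of the flow above is a cascade that stops at the first source able to answer. The sketch below shows one way that ordering could look. Every name in it (KNOWN_DOMAINS, TLD_RULES, determineTier1, patternRules, askLlm) is hypothetical, the platform-pattern rules are injected because this document does not say which tier1 each pattern maps to, and the LLM call is synchronous only to keep the example short.

```javascript
const TIER1_TYPES = ["platform", "marketplace", "commerce", "service",
  "information", "community", "institutional"];

// Illustrative stand-in for the curated domain database (assumption).
const KNOWN_DOMAINS = new Map([
  ["ahrefs.com", "platform"],
  ["espn.com", "information"],
]);

// TLD rules named in the flow above.
const TLD_RULES = [
  { suffix: ".gov", tier1: "institutional" },
  { suffix: ".edu", tier1: "institutional" },
];

// Tries each source in order and reports which one produced the answer.
// patternRules: hypothetical list of { pattern: RegExp, tier1: string }.
// askLlm: hypothetical function (host) => tier1 string.
function determineTier1(hostname, { patternRules = [], askLlm }) {
  const host = hostname.toLowerCase().replace(/^www\./, "");

  const stored = KNOWN_DOMAINS.get(host);
  if (stored) return { tier1_type: stored, source: "database" };

  const tldRule = TLD_RULES.find((rule) => host.endsWith(rule.suffix));
  if (tldRule) return { tier1_type: tldRule.tier1, source: "tld_rule" };

  const patternRule = patternRules.find((rule) => rule.pattern.test(host));
  if (patternRule) {
    return { tier1_type: patternRule.tier1, source: "platform_pattern" };
  }

  // The LLM answer is only accepted if it is one of the 7 archetypes.
  const answer = askLlm(host);
  if (!TIER1_TYPES.includes(answer)) {
    throw new Error(`LLM returned an invalid tier1_type: ${answer}`);
  }
  return { tier1_type: answer, source: "llm" };
}

console.log(determineTier1("irs.gov", { askLlm: () => "service" }));
```

Returning the source alongside the type is a design choice for this sketch; it makes it easier to see why a domain landed where it did when reviewing misclassifications.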
Database Schema
The domains table stores:
| Column | Description |
|---|---|
| tier1_type | One of 7 archetypes (platform, marketplace, etc.) |
| domain_type | Child type within tier1 |
| tier1_confidence | Confidence score for tier1 (0-1) |
| domain_type_confidence | Confidence score for domain_type (0-1) |
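As a rough illustration of the constraints these columns imply, the sketch below checks a record before it is stored. The function name validateDomainRecord and the error messages are assumptions for this example; the actual table may enforce these rules differently or not at all.

```javascript
const TIER1_TYPES = ["platform", "marketplace", "commerce", "service",
  "information", "community", "institutional"];

// Returns a list of problems found in a record; an empty list means it passed.
function validateDomainRecord(record) {
  const errors = [];

  if (!TIER1_TYPES.includes(record.tier1_type)) {
    errors.push(`unknown tier1_type: ${record.tier1_type}`);
  }
  if (typeof record.domain_type !== "string" || record.domain_type.length === 0) {
    errors.push("domain_type must be a non-empty string");
  }
  // Both confidence columns are documented as scores in the range 0-1.
  for (const field of ["tier1_confidence", "domain_type_confidence"]) {
    const value = record[field];
    if (typeof value !== "number" || !(value >= 0 && value <= 1)) {
      errors.push(`${field} must be a number between 0 and 1`);
    }
  }
  return errors;
}

console.log(validateDomainRecord({
  tier1_type: "marketplace",
  domain_type: "ticket_marketplace",
  tier1_confidence: 0.97,
  domain_type_confidence: 0.88,
})); // []
```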
The CSV master file (classification-data/domains/_master.csv) stores:
| Column | Description |
|---|---|
| tier1_type | Archetype |
| domain_type | Granular type |
| industry | Vertical (sports, finance, healthcare) |
| subcategory | Specific niche |
Common Misclassifications to Avoid
| Domain | WRONG | RIGHT | Reason |
|---|---|---|---|
| Ticketmaster | community/SPORTS_MEDIA | marketplace/ticket_marketplace | Connects sellers to buyers |
| Shopify.com | marketplace/ECOMMERCE_MARKETPLACE | platform/saas_product | It's a SaaS tool, not a marketplace |
| Apple.com | platform/saas_product | commerce/product_manufacturer | Sells hardware products |
| InsideEVs.com | commerce/ECOMMERCE_STORE | information/news_publisher | EV news site, not a store |
| Three.js | community/GAMING_PLATFORM | platform/documentation_portal | JavaScript library docs |
| Ahrefs.com | service/telehealth_provider | platform/saas_product | SEO SaaS tool |
| ESPN.com | community/sports_community | information/news_publisher | Produces sports journalism |
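The RIGHT column of this table can double as a small regression fixture. The sketch below is one possible shape for such a check, not an existing test in the repository: EXPECTED and findRegressions are hypothetical names, and classify stands for any function that takes a label from the table and returns an object with tier1_type and domain_type.

```javascript
// Expected classifications, copied from the RIGHT column above.
const EXPECTED = [
  { label: "Ticketmaster", tier1_type: "marketplace", domain_type: "ticket_marketplace" },
  { label: "Shopify.com", tier1_type: "platform", domain_type: "saas_product" },
  { label: "Apple.com", tier1_type: "commerce", domain_type: "product_manufacturer" },
  { label: "InsideEVs.com", tier1_type: "information", domain_type: "news_publisher" },
  { label: "Three.js", tier1_type: "platform", domain_type: "documentation_portal" },
  { label: "Ahrefs.com", tier1_type: "platform", domain_type: "saas_product" },
  { label: "ESPN.com", tier1_type: "information", domain_type: "news_publisher" },
];

// Returns the labels whose classification no longer matches the table.
function findRegressions(classify) {
  return EXPECTED.filter((expected) => {
    const actual = classify(expected.label);
    return actual.tier1_type !== expected.tier1_type ||
      actual.domain_type !== expected.domain_type;
  }).map((expected) => expected.label);
}

// Example: a naive classifier that calls everything a SaaS product.
const alwaysSaas = () => ({ tier1_type: "platform", domain_type: "saas_product" });
console.log(findRegressions(alwaysSaas));
```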
LLM Prompt Guidelines
When asking the LLM to classify a domain:
- Ask for tier1_type FIRST with only 7 options
- Then ask for domain_type constrained to that tier1's children
- Provide clear definitions for each tier1 type
- Give examples of common misclassifications to avoid
Example prompt structure:
First, determine the tier1_type. Choose exactly one:
- platform: Tools users log into (SaaS, apps, dev tools)
- marketplace: Connects buyers and sellers (multi-sided)
- commerce: Sells products directly (single seller)
- service: Sells services (agencies, consulting, healthcare)
- information: Content is the product (news, blogs, reviews)
- community: User-generated content dominated (forums, Q&A)
- institutional: Authority-based orgs (gov, edu, nonprofit)
Then, based on tier1_type, choose the appropriate domain_type...
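The two-step prompting described above could be assembled along these lines. This is a minimal sketch under stated assumptions, not the contents of src/lib/classifier-llm.js: buildTier1Prompt, buildDomainTypePrompt, and parseChoice are hypothetical helpers, and the wording around the option list is illustrative. The tier1 definitions are taken from the example prompt.

```javascript
// Tier1 definitions as worded in the example prompt above.
const TIER1_DEFINITIONS = [
  ["platform", "Tools users log into (SaaS, apps, dev tools)"],
  ["marketplace", "Connects buyers and sellers (multi-sided)"],
  ["commerce", "Sells products directly (single seller)"],
  ["service", "Sells services (agencies, consulting, healthcare)"],
  ["information", "Content is the product (news, blogs, reviews)"],
  ["community", "User-generated content dominated (forums, Q&A)"],
  ["institutional", "Authority-based orgs (gov, edu, nonprofit)"],
];

// Step 1: ask for tier1_type with only the 7 options on offer.
function buildTier1Prompt(domain) {
  const options = TIER1_DEFINITIONS
    .map(([name, description]) => `- ${name}: ${description}`)
    .join("\n");
  return `Classify the domain "${domain}".\n` +
    `First, determine the tier1_type. Choose exactly one:\n${options}`;
}

// Step 2: ask for domain_type, listing only the children of the chosen tier1.
function buildDomainTypePrompt(domain, tier1Type, allowedDomainTypes) {
  const options = allowedDomainTypes.map((name) => `- ${name}`).join("\n");
  return `The domain "${domain}" has tier1_type "${tier1Type}".\n` +
    `Choose exactly one domain_type from this list:\n${options}`;
}

// Normalizes a model answer and rejects anything outside the allowed set.
function parseChoice(answer, allowed) {
  const choice = answer.trim().toLowerCase();
  if (!allowed.includes(choice)) {
    throw new Error(`answer "${choice}" is not one of: ${allowed.join(", ")}`);
  }
  return choice;
}

console.log(buildTier1Prompt("espn.com"));
```

Validating the answer against the same list that was shown in the prompt keeps a stray or free-form response from entering the tree as an invalid type.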
Files
| File | Purpose |
|---|---|
| src/lib/classification-constants.js | Enum definitions and mappings |
| src/data/domain-database.js | Generated from master CSV |
| classification-data/domains/_master.csv | Single source of truth for curated domains |
| src/lib/classifier-llm.js | LLM classification prompts |
| src/lib/classifier-rules-engine.js | Rules-based classification |
Updating the Taxonomy
- Edit the constants in src/lib/classification-constants.js
- Update the master CSV if adding new domain_types
- Run npm run build:domains to regenerate domain-database.js
- Update the LLM prompt in src/lib/classifier-llm.js
- Update this document