
Domain Classification Taxonomy

This document defines the hierarchical taxonomy for classifying domains. The classification system uses a tree structure where tier1_type is determined first, then domain_type is constrained to valid children of that tier1.


Classification Hierarchy

tier1_type (7 archetypes)
└── domain_type (child of tier1)
    └── industry (vertical)
        └── subcategory (niche)

Key Principle: tier1_type is the trunk; everything else branches from it. Once tier1 is determined with high confidence, it does not change, and every child classification is constrained to the valid options within that tier1.


Tier 1 Types (7 Universal Archetypes)

| tier1_type | Description | Business Model |
| --- | --- | --- |
| platform | Tools you log into and use | Subscription/usage fees |
| marketplace | Connects buyers and sellers | Transaction fees, listings |
| commerce | Sells products directly | Product sales |
| service | Sells services | Service fees, billable hours |
| information | Content is the product | Ads, subscriptions, leads |
| community | User-generated content dominated | Ads, premium, donations |
| institutional | Authority/trust-based orgs | Taxes, tuition, donations |

Domain Types by Tier1 (The Tree)

PLATFORM (tools you log into)

| domain_type | Description | Examples |
| --- | --- | --- |
| saas_product | Business software, productivity tools | Slack, Notion, Asana, Salesforce |
| code_repository | Source code hosting | GitHub, GitLab, Bitbucket |
| app_platform | App stores, distribution platforms | App Store, Google Play |
| documentation_portal | Hosted docs, wikis | GitBook, ReadTheDocs, Notion docs |
| messaging_platform | Communication tools | Slack, Discord, WhatsApp |
| social_network | Professional/personal networking | LinkedIn, Facebook, Twitter |
| audio_platform | Podcasts, music streaming | Spotify, Apple Podcasts |
| video_platform | Video hosting/streaming | YouTube, Vimeo, Twitch |

Note: If a domain is a tool that users log into to accomplish tasks, it's a PLATFORM regardless of industry.


MARKETPLACE (multi-sided, connects buyers/sellers)

| domain_type | Description | Examples |
| --- | --- | --- |
| ecommerce_marketplace | Multi-seller retail | Amazon, eBay, Etsy |
| ticket_marketplace | Event tickets | Ticketmaster, StubHub, AXS |
| real_estate_marketplace | Property listings | Zillow, Realtor.com, Redfin |
| job_marketplace | Job listings | Indeed, LinkedIn Jobs, Glassdoor |
| service_marketplace | Freelance/gig services | Upwork, Fiverr, Thumbtack |
| app_marketplace | Software/plugin listings | Chrome Web Store, Salesforce AppExchange |
| review_marketplace | Business reviews + leads | G2, Capterra, Yelp |
| directory_citation | Business directories | Yellow Pages, BBB |

Key distinction: Marketplaces connect multiple sellers to buyers. If a site only sells its own products, it's COMMERCE, not MARKETPLACE.

Example: Ticketmaster is a MARKETPLACE (connects ticket sellers to buyers), not COMMUNITY (even though it's sports-related).


COMMERCE (sells directly, single seller)

| domain_type | Description | Examples |
| --- | --- | --- |
| ecommerce_store | Direct retail/D2C | Nike.com, Apple Store, Warby Parker |
| travel_booking | Airlines, hotels, car rental | Delta, Marriott, Hertz |
| subscription_commerce | Subscription boxes, services | Netflix, Dollar Shave Club |
| product_manufacturer | Brand/manufacturer sites | Apple, Samsung, Ford |

Note: If a company sells its own products directly to consumers, it's COMMERCE. If it hosts other sellers, it's MARKETPLACE.


SERVICE (sells services, not products)

| domain_type | Description | Examples |
| --- | --- | --- |
| agency_provider | Marketing, PR, consulting | Ogilvy, McKinsey, Accenture |
| pr_distribution | Press release wire services | PR Newswire, Business Wire |
| professional_service | Law, accounting, consulting | Baker McKenzie, Deloitte |
| healthcare_provider | Hospitals, clinics, telehealth | Mayo Clinic, Teladoc |
| financial_service | Banks, credit unions, fintech | Chase, PayPal, Stripe |
| legal_service | Law firms, legal services | Baker McKenzie, LegalZoom |

Note: If the primary business is selling human services (billable hours, expertise), it's SERVICE.


INFORMATION (content is the product)

| domain_type | Description | Examples |
| --- | --- | --- |
| news_publisher | Journalism organizations | NYT, TechCrunch, The Verge |
| magazine_publisher | Magazines, periodicals | Wired, Forbes, Inc. |
| blog_publisher | Blog platforms hosting multiple blogs | Medium, Substack |
| content_publisher | Generic content sites | BuzzFeed, HuffPost |
| review_site | Editorial reviews (non-affiliate) | Wirecutter, Consumer Reports |
| affiliate_review_site | Affiliate-driven reviews | NerdWallet, The Points Guy |
| reference_wiki | Reference/encyclopedia | Wikipedia, Investopedia |

Note: The domain's PRIMARY business is creating/distributing content. A SaaS company with a blog is still PLATFORM, not INFORMATION.


COMMUNITY (UGC-dominated)

| domain_type | Description | Examples |
| --- | --- | --- |
| forum_community | Discussion forums | Reddit, Stack Overflow, Discourse |
| gaming_community | Gaming forums, fan sites | Discord servers, IGN forums |
| sports_community | Fan forums, fantasy sports | ESPN forums, FantasyPros |
| qna_platform | Q&A sites | Quora, Stack Exchange |
| ugc_video | User-generated video | YouTube (as community), TikTok |

Note: The domain's PRIMARY value comes from user-generated content, not editorial or company content.

Important: Sports media sites like ESPN.com are INFORMATION (they produce content), not COMMUNITY. Fan forums are COMMUNITY.


INSTITUTIONAL (authority-based orgs)

| domain_type | Description | Examples |
| --- | --- | --- |
| government_site | Government agencies | IRS.gov, CDC.gov, FDA.gov |
| education_academic | Universities, schools | Stanford.edu, MIT.edu |
| nonprofit_org | Nonprofits, NGOs | Red Cross, Wikimedia Foundation |
| healthcare_institution | Hospital systems, health orgs | Mayo Clinic, Cleveland Clinic |
| financial_institution | Banks, credit unions | Federal Reserve, FDIC |
| legal_institution | Courts, bar associations | Supreme Court, ABA |
| trade_association | Industry associations | IEEE, ACM, NAR |

Note: These are trust-based organizations, often with .gov, .edu, or .org TLDs. Their authority comes from institutional status, not commercial activity.
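
The tree above can be written down as a single lookup table. The following is a minimal sketch, not the contents of src/lib/classification-constants.js (that file's real export names and shape are not shown in this document). The names DOMAIN_TYPES_BY_TIER1, isValidDomainType, and tier1ForDomainType are hypothetical; the values are copied from the tables above.

```javascript
// Sketch only: hypothetical names, values taken from the tables in this document.
const DOMAIN_TYPES_BY_TIER1 = Object.freeze({
  platform: [
    "saas_product", "code_repository", "app_platform", "documentation_portal",
    "messaging_platform", "social_network", "audio_platform", "video_platform",
  ],
  marketplace: [
    "ecommerce_marketplace", "ticket_marketplace", "real_estate_marketplace",
    "job_marketplace", "service_marketplace", "app_marketplace",
    "review_marketplace", "directory_citation",
  ],
  commerce: [
    "ecommerce_store", "travel_booking", "subscription_commerce",
    "product_manufacturer",
  ],
  service: [
    "agency_provider", "pr_distribution", "professional_service",
    "healthcare_provider", "financial_service", "legal_service",
  ],
  information: [
    "news_publisher", "magazine_publisher", "blog_publisher",
    "content_publisher", "review_site", "affiliate_review_site",
    "reference_wiki",
  ],
  community: [
    "forum_community", "gaming_community", "sports_community",
    "qna_platform", "ugc_video",
  ],
  institutional: [
    "government_site", "education_academic", "nonprofit_org",
    "healthcare_institution", "financial_institution", "legal_institution",
    "trade_association",
  ],
});

const TIER1_TYPES = Object.keys(DOMAIN_TYPES_BY_TIER1);

// True only when domainType is a listed child of tier1Type.
function isValidDomainType(tier1Type, domainType) {
  const children = DOMAIN_TYPES_BY_TIER1[tier1Type];
  return Array.isArray(children) && children.includes(domainType);
}

// Reverse lookup: the tier1 that owns a domain_type, or null if unknown.
function tier1ForDomainType(domainType) {
  const owner = TIER1_TYPES.find((t) =>
    DOMAIN_TYPES_BY_TIER1[t].includes(domainType)
  );
  return owner ?? null;
}
```

With this shape, isValidDomainType("marketplace", "ticket_marketplace") is true, while isValidDomainType("community", "ticket_marketplace") is false, which is the constraint the tree is meant to enforce.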


Classification Flow

1. Determine tier1_type FIRST (highest confidence)
   ├── Known domain in database? → Use stored tier1
   ├── TLD rules (.gov → institutional, .edu → institutional)
   ├── Platform patterns (Shopify, WordPress, etc.)
   └── LLM classification (constrained to 7 options)

2. Once tier1 is locked, determine domain_type
   ├── Only show valid domain_types for that tier1
   ├── Use URL patterns, content signals
   └── LLM classification (constrained to tier1's children)

3. Determine industry/subcategory (optional granularity)
   ├── sports, gaming, finance, healthcare, etc.
   └── Stored in industry + subcategory fields
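
Step 1 of the flow above is a cascade in which cheaper, more certain signals are tried before the LLM. The sketch below illustrates only the rule-based stages and is not the code in src/lib/classifier-rules-engine.js. The function name determineTier1ByRules, the knownDomains map, and the { matches, tier1_type } pattern shape are assumptions made for illustration; which tier1 a given platform pattern maps to is left to the caller because this document does not specify it.

```javascript
// Sketch only: hypothetical names and data shapes.
// Assumes the hostname has already been extracted from the URL.
const TLD_RULES = new Map([
  ["gov", "institutional"],
  ["edu", "institutional"],
]);

function determineTier1ByRules(hostname, knownDomains, platformPatterns = []) {
  const host = hostname.trim().toLowerCase();

  // 1. Known domain in database? Use the stored tier1.
  const known = knownDomains.get(host);
  if (known) return { tier1_type: known.tier1_type, source: "database" };

  // 2. TLD rules (.gov and .edu map to institutional).
  const tld = host.split(".").pop();
  if (TLD_RULES.has(tld)) {
    return { tier1_type: TLD_RULES.get(tld), source: "tld_rule" };
  }

  // 3. Platform patterns, supplied by the caller.
  const hit = platformPatterns.find((p) => p.matches(host));
  if (hit) return { tier1_type: hit.tier1_type, source: "platform_pattern" };

  // 4. No rule fired: the caller would fall back to LLM classification,
  //    constrained to the seven tier1 options.
  return null;
}
```

Returning the source alongside the tier1_type makes it possible to see later which stage produced a classification; a null result signals that the LLM stage is needed.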

Database Schema

The domains table stores:

| Column | Description |
| --- | --- |
| tier1_type | One of 7 archetypes (platform, marketplace, etc.) |
| domain_type | Child type within tier1 |
| tier1_confidence | Confidence score for tier1 (0-1) |
| domain_type_confidence | Confidence score for domain_type (0-1) |
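
As an illustration of the constraints these columns imply, here is a small sketch of a row check that could run before a write. The function name validateDomainRow and the error messages are hypothetical; the document does not say how the real table is validated.

```javascript
// Sketch only: hypothetical validator for a row of the domains table.
const TIER1_TYPES = [
  "platform", "marketplace", "commerce", "service",
  "information", "community", "institutional",
];

function validateDomainRow(row) {
  const errors = [];

  if (!TIER1_TYPES.includes(row.tier1_type)) {
    errors.push("tier1_type must be one of the 7 archetypes");
  }
  if (typeof row.domain_type !== "string" || row.domain_type === "") {
    errors.push("domain_type must be a non-empty string");
  }
  // Both confidence columns are scores in the closed range [0, 1].
  for (const field of ["tier1_confidence", "domain_type_confidence"]) {
    const value = row[field];
    if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
      errors.push(`${field} must be a number between 0 and 1`);
    }
  }
  return errors; // An empty array means the row passed every check.
}
```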

The CSV master file (classification-data/domains/_master.csv) stores:

| Column | Description |
| --- | --- |
| tier1_type | Archetype |
| domain_type | Granular type |
| industry | Vertical (sports, finance, healthcare) |
| subcategory | Specific niche |
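
Because the CSV is edited by hand, it can be useful to confirm that each row's domain_type is a valid child of its tier1_type. The sketch below is one way to do that under stated assumptions: the file has a header row containing tier1_type and domain_type columns, no cell contains a quoted comma (a real CSV parser would be needed otherwise), and the tree is passed in as a plain object mapping each tier1 to its children. The name findCsvTreeViolations is hypothetical.

```javascript
// Sketch only: naive CSV handling, assumes a header row and no quoted commas.
function findCsvTreeViolations(csvText, tree) {
  const [headerLine, ...lines] = csvText.trim().split(/\r?\n/);
  const header = headerLine.split(",").map((h) => h.trim());
  const tier1Index = header.indexOf("tier1_type");
  const domainTypeIndex = header.indexOf("domain_type");

  const violations = [];
  lines.forEach((line, i) => {
    if (line.trim() === "") return; // Skip blank lines.
    const cells = line.split(",").map((c) => c.trim());
    const tier1 = cells[tier1Index];
    const domainType = cells[domainTypeIndex];
    const hasTier1 = Object.prototype.hasOwnProperty.call(tree, tier1);
    const children = hasTier1 ? tree[tier1] : [];
    if (!children.includes(domainType)) {
      // i + 2: one for the header row, one for 1-based line numbers.
      violations.push({ line: i + 2, tier1_type: tier1, domain_type: domainType });
    }
  });
  return violations;
}
```

A row such as community paired with news_publisher would be reported with its line number, since news_publisher belongs under information.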

Common Misclassifications to Avoid

| Domain | WRONG | RIGHT | Reason |
| --- | --- | --- | --- |
| Ticketmaster | community/SPORTS_MEDIA | marketplace/ticket_marketplace | Connects sellers to buyers |
| Shopify.com | marketplace/ECOMMERCE_MARKETPLACE | platform/saas_product | It's a SaaS tool, not a marketplace |
| Apple.com | platform/saas_product | commerce/product_manufacturer | Sells hardware products |
| InsideEVs.com | commerce/ECOMMERCE_STORE | information/news_publisher | EV news site, not a store |
| Three.js | community/GAMING_PLATFORM | platform/documentation_portal | JavaScript library docs |
| Ahrefs.com | service/telehealth_provider | platform/saas_product | SEO SaaS tool |
| ESPN.com | community/sports_community | information/news_publisher | Produces sports journalism |
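
The RIGHT column above can double as a regression fixture. The sketch below assumes a synchronous classify(domain) function that returns { tier1_type, domain_type }; that function and the names KNOWN_CORRECT and findRegressions are hypothetical, and the real classifier may well be asynchronous.

```javascript
// Sketch only: expected values copied from the RIGHT column of the table above.
const KNOWN_CORRECT = [
  { domain: "Ticketmaster", tier1_type: "marketplace", domain_type: "ticket_marketplace" },
  { domain: "Shopify.com", tier1_type: "platform", domain_type: "saas_product" },
  { domain: "Apple.com", tier1_type: "commerce", domain_type: "product_manufacturer" },
  { domain: "InsideEVs.com", tier1_type: "information", domain_type: "news_publisher" },
  { domain: "Three.js", tier1_type: "platform", domain_type: "documentation_portal" },
  { domain: "Ahrefs.com", tier1_type: "platform", domain_type: "saas_product" },
  { domain: "ESPN.com", tier1_type: "information", domain_type: "news_publisher" },
];

// Runs the supplied classifier over every fixture and returns the mismatches.
function findRegressions(classify) {
  return KNOWN_CORRECT.flatMap((expected) => {
    const actual = classify(expected.domain);
    const matches =
      Boolean(actual) &&
      actual.tier1_type === expected.tier1_type &&
      actual.domain_type === expected.domain_type;
    if (matches) return [];
    return [{
      domain: expected.domain,
      expected: `${expected.tier1_type}/${expected.domain_type}`,
      actual: actual ? `${actual.tier1_type}/${actual.domain_type}` : null,
    }];
  });
}
```

An empty result means none of the listed misclassifications has reappeared.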

LLM Prompt Guidelines

When asking the LLM to classify a domain:

  1. Ask for tier1_type FIRST with only 7 options
  2. Then ask for domain_type constrained to that tier1's children
  3. Provide clear definitions for each tier1 type
  4. Give examples of common misclassifications to avoid

Example prompt structure:

First, determine the tier1_type. Choose exactly one:
- platform: Tools users log into (SaaS, apps, dev tools)
- marketplace: Connects buyers and sellers (multi-sided)
- commerce: Sells products directly (single seller)
- service: Sells services (agencies, consulting, healthcare)
- information: Content is the product (news, blogs, reviews)
- community: User-generated content dominated (forums, Q&A)
- institutional: Authority-based orgs (gov, edu, nonprofit)

Then, based on tier1_type, choose the appropriate domain_type...
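
A two-stage prompt like the one above could be assembled from data rather than written by hand, so the option list cannot drift from the taxonomy. This is a sketch, not the contents of src/lib/classifier-llm.js. The function names, the exact prompt wording beyond the lines quoted above, and the reply-parsing rule are assumptions.

```javascript
// Sketch only: definitions copied from the example prompt above; names are hypothetical.
const TIER1_DEFINITIONS = {
  platform: "Tools users log into (SaaS, apps, dev tools)",
  marketplace: "Connects buyers and sellers (multi-sided)",
  commerce: "Sells products directly (single seller)",
  service: "Sells services (agencies, consulting, healthcare)",
  information: "Content is the product (news, blogs, reviews)",
  community: "User-generated content dominated (forums, Q&A)",
  institutional: "Authority-based orgs (gov, edu, nonprofit)",
};

// Stage 1: ask for tier1_type with only the seven options.
function buildTier1Prompt(domain) {
  const options = Object.entries(TIER1_DEFINITIONS)
    .map(([name, definition]) => `- ${name}: ${definition}`)
    .join("\n");
  return `Domain: ${domain}\n\nFirst, determine the tier1_type. Choose exactly one:\n${options}`;
}

// Stage 2: ask for domain_type, listing only the locked tier1's children.
function buildDomainTypePrompt(domain, tier1Type, children) {
  const options = children.map((child) => `- ${child}`).join("\n");
  return `Domain: ${domain}\ntier1_type: ${tier1Type}\n\nChoose exactly one domain_type:\n${options}`;
}

// Accept a reply only if it is exactly one of the allowed options.
function parseChoice(reply, allowed) {
  const answer = reply.trim().toLowerCase();
  return allowed.includes(answer) ? answer : null;
}
```

Rejecting any reply outside the allowed list (parseChoice returns null) keeps the constraint enforceable even when the model answers with something off-menu.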

Files

| File | Purpose |
| --- | --- |
| src/lib/classification-constants.js | Enum definitions and mappings |
| src/data/domain-database.js | Generated from master CSV |
| classification-data/domains/_master.csv | Single source of truth for curated domains |
| src/lib/classifier-llm.js | LLM classification prompts |
| src/lib/classifier-rules-engine.js | Rules-based classification |

Updating the Taxonomy

  1. Edit the constants in src/lib/classification-constants.js
  2. Update the master CSV if adding new domain_types
  3. Run npm run build:domains to regenerate domain-database.js
  4. Update the LLM prompt in src/lib/classifier-llm.js
  5. Update this document