Database Schema Reference
Complete database schema documentation for D1 and ClickHouse.
Table of Contents
D1 Schema
D1 stores operational data: app metadata, categories, rankings, crawl state, and jobs.
Apps
Primary app metadata table.
CREATE TABLE apps (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL,
title TEXT,
subtitle TEXT,
developer_name TEXT,
developer_id TEXT,
icon_url TEXT,
rating REAL,
rating_count INTEGER,
price REAL,
is_free INTEGER,
primary_category TEXT,
description TEXT,
description_full TEXT,
website_url TEXT,
store_url TEXT,
-- Google Play Enhanced Fields
download_count TEXT,
download_count_exact INTEGER,
in_app_purchase_range TEXT,
iap_price_min REAL,
iap_price_max REAL,
iap_price_avg REAL,
feature_tags TEXT,
-- Apple-Specific Fields
bundle_id TEXT,
version TEXT,
size TEXT,
age_rating TEXT,
screenshots TEXT,
-- Derived Fields
best_rank INTEGER,
total_categories INTEGER,
total_shelves INTEGER,
total_positions INTEGER,
weighted_score REAL,
-- Metadata
discovered_from TEXT,
crawl_project_id TEXT,
created_at INTEGER,
updated_at INTEGER,
last_crawled_at INTEGER
);
CREATE INDEX idx_apps_platform ON apps(platform);
CREATE INDEX idx_apps_primary_category ON apps(primary_category);
CREATE INDEX idx_apps_best_rank ON apps(best_rank);
CREATE INDEX idx_apps_developer ON apps(developer_id);
CREATE INDEX idx_apps_crawl_project ON apps(crawl_project_id);
Field Descriptions:
Core Fields:
id: App identifier (bundle ID for Google Play, numeric ID for Apple)platform:google_playorappletitle: App namesubtitle: Short tagline (Apple) or NULL (Google Play)developer_name: Developer/publisher namedeveloper_id: Developer identifiericon_url: App icon URLrating: Average rating (0-5)rating_count: Number of ratingsprice: App price (0 for free)is_free: Boolean flag (1 = free, 0 = paid)primary_category: Primary category IDdescription: Short description (first 2000 chars)description_full: Complete descriptionwebsite_url: Developer websitestore_url: App store URL
Google Play Enhanced Fields:
download_count: Display string (e.g., "1B+", "500M+")download_count_exact: Actual download count (e.g., 3984720514)in_app_purchase_range: IAP range string (e.g., "$0.05 - $1,209.99 per item")iap_price_min: Minimum IAP priceiap_price_max: Maximum IAP priceiap_price_avg: Average IAP pricefeature_tags: JSON array of feature tags (e.g., ["#1 top grossing social"])
Apple-Specific Fields:
bundle_id: Apple bundle identifierversion: App version numbersize: App file sizeage_rating: Age rating (e.g., "4+", "12+", "17+")screenshots: JSON array of screenshot URLs
Derived Fields:
best_rank: Best position across all categories (lowest number = best)total_categories: Number of categories app appears intotal_shelves: Number of unique shelves (e.g., Top Free, Top Paid)total_positions: Total shelf placements across all categoriesweighted_score: Ranking algorithm score (see Weighted Score)
Metadata:
discovered_from:category,recommendations,searchcrawl_project_id: Associated crawl project IDcreated_at: Unix timestamp (milliseconds)updated_at: Last update timestamplast_crawled_at: Last crawl timestamp
App Store Categories
Category taxonomy for Apple App Store and Google Play.
CREATE TABLE app_store_categories (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
platform TEXT NOT NULL,
device TEXT,
category_type TEXT,
parent_ids TEXT,
primary_parent_id TEXT,
path TEXT,
level INTEGER,
created_at INTEGER,
updated_at INTEGER
);
CREATE INDEX idx_categories_platform ON app_store_categories(platform);
CREATE INDEX idx_categories_parent ON app_store_categories(primary_parent_id);
CREATE INDEX idx_categories_type ON app_store_categories(category_type);
Field Descriptions:
id: Category identifier (e.g.,google_play_weather,apple_games_action)name: Display name (e.g., "Weather", "Action Games")platform:google_playorappledevice:phone,iphone,ipad, or NULL (all devices)category_type:category,subcategory,grouping,room,story,chartparent_ids: JSON array of parent category IDs (source of truth)primary_parent_id: First parent (derived fromparent_ids[0])path: Display path (e.g., "Games > Action")level: Depth in hierarchy (0 = root, 1 = child, etc.)
Category Hierarchy Model:
Categories are organized in a hierarchy with multiple inheritance support:
- Source of Truth:
parent_idsarray contains all parent category IDs - Convenience Field:
primary_parent_idis derived fromparent_ids[0] - Relationships: Stored in
category_relationshipstable for querying
Example:
{
"id": "apple_games_action_adventure",
"name": "Action Adventure",
"parent_ids": ["apple_games", "apple_games_action"],
"primary_parent_id": "apple_games"
}
Category Types:
category: Top-level category (e.g., "Games", "Social")subcategory: Sub-category (e.g., "Action Games")grouping: Apple grouping containerroom: Apple room (curated collection)story: Apple story (editorial feature)chart: Chart type (e.g., "Top Free", "Top Paid")
Category Relationships
Explicit parent-child relationships for category hierarchy.
CREATE TABLE category_relationships (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category_id TEXT NOT NULL,
parent_category_id TEXT NOT NULL,
platform TEXT NOT NULL,
device TEXT,
relationship_type TEXT,
created_at INTEGER,
UNIQUE(category_id, parent_category_id)
);
CREATE INDEX idx_relationships_category ON category_relationships(category_id);
CREATE INDEX idx_relationships_parent ON category_relationships(parent_category_id);
Field Descriptions:
category_id: Child category IDparent_category_id: Parent category IDrelationship_type:primary,secondary,chart,editorial
Usage:
Query all parents of a category:
SELECT parent_category_id
FROM category_relationships
WHERE category_id = 'apple_games_action';
Query all children of a category:
SELECT category_id
FROM category_relationships
WHERE parent_category_id = 'apple_games';
App Category Rankings
App positions in category charts.
CREATE TABLE app_category_rankings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_id TEXT NOT NULL,
platform TEXT NOT NULL,
category_id TEXT NOT NULL,
app_rank INTEGER,
category_position INTEGER,
shelf_name TEXT,
category_name TEXT,
category_type TEXT,
crawl_date INTEGER,
created_at INTEGER,
UNIQUE(app_id, category_id, shelf_name, crawl_date)
);
CREATE INDEX idx_rankings_app ON app_category_rankings(app_id);
CREATE INDEX idx_rankings_category ON app_category_rankings(category_id);
CREATE INDEX idx_rankings_rank ON app_category_rankings(app_rank);
Field Descriptions:
app_id: App identifiercategory_id: Category identifierapp_rank: Position in category (1 = #1)category_position: Position within specific shelfshelf_name: Shelf name (e.g., "Top Free", "Top Paid", "Top Grossing")category_name: Display name of categorycategory_type: Type of categorycrawl_date: Date of crawl (Unix timestamp, milliseconds)
Example:
{
"app_id": "com.instagram.android",
"category_id": "google_play_social",
"app_rank": 2,
"category_position": 2,
"shelf_name": "Top Free",
"category_name": "Social"
}
App Recommendations
Apps recommended on other app pages.
CREATE TABLE app_recommendations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_id TEXT NOT NULL,
recommended_app_id TEXT NOT NULL,
position INTEGER,
title TEXT,
icon_url TEXT,
rating REAL,
platform TEXT,
created_at INTEGER,
UNIQUE(app_id, recommended_app_id, position)
);
CREATE INDEX idx_recommendations_app ON app_recommendations(app_id);
CREATE INDEX idx_recommendations_recommended ON app_recommendations(recommended_app_id);
Field Descriptions:
app_id: App that shows the recommendationrecommended_app_id: App being recommendedposition: Position in recommendations list (1-based)
Usage:
Query recommendations shown on an app page:
SELECT * FROM app_recommendations WHERE app_id = 'com.example.app';
Query how many times an app is recommended:
SELECT COUNT(*) FROM app_recommendations WHERE recommended_app_id = 'com.example.app';
Crawl Management Tables
Crawl Projects
CREATE TABLE crawl_projects (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL,
device TEXT,
status TEXT,
categories_queued INTEGER,
related_app_depth INTEGER,
created_at INTEGER,
completed_at INTEGER
);
Crawl Runs
CREATE TABLE crawl_runs (
id TEXT PRIMARY KEY,
project_id TEXT,
category_id TEXT,
status TEXT,
apps_discovered INTEGER,
error TEXT,
started_at INTEGER,
completed_at INTEGER
);
Jobs
Generic job queue tracking.
CREATE TABLE jobs (
id TEXT PRIMARY KEY,
type TEXT NOT NULL,
status TEXT NOT NULL,
payload TEXT,
result TEXT,
error TEXT,
retries INTEGER,
created_at INTEGER,
started_at INTEGER,
completed_at INTEGER
);
CREATE INDEX idx_jobs_type ON jobs(type);
CREATE INDEX idx_jobs_status ON jobs(status);
Field Descriptions:
id: Job UUIDtype: Job type (e.g.,harvest_keywords,serp_tracking,app_crawl)status:pending,running,completed,failedpayload: JSON job parametersresult: JSON job resulterror: Error message if failedretries: Number of retry attempts
Derived Fields
Weighted Score
The weighted_score is a ranking quality metric calculated as:
weighted_score = 1.0 - (best_rank / 500)
- Apps ranked #1 → score = 0.998
- Apps ranked #50 → score = 0.90
- Apps ranked #500 → score = 0.00
Used for sorting apps by overall ranking strength.
ClickHouse Schema
ClickHouse stores analytics and time-series data.
Keyword Snapshots
Historical keyword metrics.
CREATE TABLE keyword_snapshots (
keyword String,
project_id String,
search_volume UInt32,
difficulty UInt8,
cpc Float32,
competition Float32,
trend Array(UInt32),
captured_at DateTime,
location_code UInt32,
language_code String
) ENGINE = MergeTree()
ORDER BY (project_id, keyword, captured_at);
SERP Runs
SERP tracking results.
CREATE TABLE serp_runs (
run_id String,
keyword String,
project_id String,
url String,
position UInt16,
title String,
snippet String,
tracked_at DateTime,
location_code UInt32,
device String
) ENGINE = MergeTree()
ORDER BY (project_id, keyword, tracked_at);
App Analytics
App ranking history and metrics.
CREATE TABLE app_analytics (
app_id String,
platform String,
category_id String,
rank UInt16,
rating Float32,
rating_count UInt32,
download_count_exact UInt64,
captured_at DateTime
) ENGINE = MergeTree()
ORDER BY (app_id, captured_at);
Data Ownership
Different storage layers own different types of data:
| Store | Purpose | Tables |
|---|---|---|
| Base44 | Canonical entities | Keywords, Categories, Projects, relationships |
| ClickHouse | Analytics, time-series | keyword_snapshots, serp_runs, app_analytics |
| D1 | Operational state | apps, categories, rankings, jobs, crawl state |
| KV | Run state, caching | DFS_RUNS, DFS_BUDGETS, category mappings |
| R2 | Raw payloads | HTML snapshots from scraping |
See Data Architecture for detailed responsibilities.
Migration Notes
- ClickHouse Schema: Initialize with
GET /init/clickhouse - D1 Migrations: Located in
migrations/directory - Migration Status: Track migration completeness via ClickHouse tables
Schema Updates
When adding new fields:
- Create migration file in
migrations/ - Update this documentation
- Update TypeScript types if applicable
- Deploy with
wrangler deploy - Run migration with appropriate endpoint
See Also
- API Endpoints - Complete API reference
- Data Architecture - Storage layer responsibilities
- ClickHouse Integration - ClickHouse setup and usage