Data Model
Hardal's ClickHouse database schema — core raw tables and pre-aggregated analytics tables for fast querying.
Hardal stores your data in ClickHouse across two types of tables: core tables that hold raw event data, and analytics tables that are pre-aggregated for fast querying.
Use the analytics tables whenever possible. They are pre-aggregated and optimized for performance — queries run significantly faster than scanning raw core tables.
Analytics Tables
These tables are pre-aggregated and purpose-built for reporting. Start here.
sessions
Every user session with device, location, and attribution data. One row per event within a session.
| Column | Type | Description |
|---|---|---|
website_id | String | Your signal ID |
user_id | String | Anonymous user identifier |
session_id | String | Session identifier |
event_id | UUID | Unique event identifier |
event_name | String | Event name (e.g. page_view, purchase) |
country | String | Visitor country |
city | String | Visitor city |
device | String | Device type (desktop, mobile, tablet) |
browser | String | Browser name |
os | String | Operating system |
hostname | String | Domain where the event occurred |
referrer_domain | String | Referring domain |
url | String | Full page URL |
event_time | DateTime | When the event occurred |
session_part | UInt32 | Part number within a multi-part session |
-- Page views by country for the last 7 days
SELECT country, count() AS views
FROM sessions
WHERE website_id = 'your_signal_id'
AND event_name = 'page_view'
AND event_time >= now() - INTERVAL 7 DAY
GROUP BY country
ORDER BY views DESC
items
Product-level data from e-commerce events (purchase, add_to_cart, view_item, etc.).
| Column | Type | Description |
|---|---|---|
website_id | String | Your signal ID |
session_id | String | Session identifier |
event_id | UUID | Parent event identifier |
event_name | String | Event that contained the item |
event_time | DateTime | Event timestamp |
transaction_id | String | Order or transaction ID |
transaction_value | Float64 | Total transaction value |
currency | String | Currency code |
item_id | String | Product identifier |
item_name | String | Product name |
item_brand | String | Brand name |
item_category | String | Primary category |
item_category2 | String | Secondary category |
item_category3 | String | Tertiary category |
item_sku | String | SKU code |
item_variant | String | Variant name (e.g. color, size) |
item_variant_id | String | Variant identifier |
item_price | Float64 | Unit price at purchase |
item_first_price | Float64 | Original price before discount |
item_tax | Float64 | Tax amount |
item_discount | Float64 | Discount amount |
item_quantity | UInt32 | Units purchased |
-- Top products by revenue this month
SELECT
item_name,
item_brand,
sum(item_price * item_quantity) AS revenue,
sum(item_quantity) AS units_sold
FROM items
WHERE website_id = 'your_signal_id'
AND event_name = 'purchase'
AND toYYYYMM(event_time) = toYYYYMM(today())
GROUP BY item_name, item_brand
ORDER BY revenue DESC
LIMIT 20
session_attr
Session-level marketing attribution — which source, medium, and campaign drove each session.
| Column | Type | Description |
|---|---|---|
website_id | String | Your signal ID |
session_id | String | Session identifier |
source | String | Traffic source (e.g. google, facebook) |
medium | String | Traffic medium (e.g. cpc, organic) |
campaign | String | UTM campaign value |
channel | String | Resolved channel (e.g. PaidSearch, OrganicSocial) |
channel_score | Float64 | Attribution weight score |
touchpoint_count | UInt32 | Number of touchpoints in session |
-- Conversions by channel this week
SELECT
sa.channel,
count() AS conversions,
sum(pr.revenue) AS revenue
FROM sessions s
JOIN session_attr sa ON s.session_id = sa.session_id AND s.website_id = sa.website_id
JOIN purchase_revenue pr ON s.event_id = pr.event_id AND s.website_id = pr.website_id
WHERE s.website_id = 'your_signal_id'
AND s.event_name = 'purchase'
AND s.event_time >= now() - INTERVAL 7 DAY
GROUP BY sa.channel
ORDER BY revenue DESC
user_attr
User-level attribution — the winning channel across all of a user's sessions.
| Column | Type | Description |
|---|---|---|
website_id | String | Your signal ID |
user_id | String | Anonymous user identifier |
source | String | Attributed source |
medium | String | Attributed medium |
campaign | String | Attributed campaign |
channel | String | Attributed channel |
channel_score | Float64 | Attribution score |
touchpoint_count | UInt32 | Total touchpoints across sessions |
purchase_revenue
Deduplicated revenue per event. Use this to sum revenue without double-counting.
| Column | Type | Description |
|---|---|---|
website_id | String | Your signal ID |
event_id | UUID | Purchase event identifier |
revenue | Float64 | Revenue value (deduped) |
vw_daily_campaign_attribution
Pre-aggregated daily view of campaign performance. Best for campaign reporting queries.
| Column | Type | Description |
|---|---|---|
date | Date | Day |
campaign | String | Campaign name |
source | String | Traffic source |
medium | String | Traffic medium |
channel | String | Resolved channel |
sessions | UInt64 | Unique sessions |
conversions | UInt64 | Conversion events |
conversionRate | Float64 | Conversions ÷ sessions |
revenue | Float64 | Total revenue |
channelScore | Float64 | Average channel attribution score |
-- Campaign performance last 30 days
SELECT
source,
medium,
campaign,
sum(sessions) AS sessions,
sum(conversions) AS conversions,
sum(revenue) AS revenue
FROM vw_daily_campaign_attribution
WHERE date >= today() - 30
GROUP BY source, medium, campaign
ORDER BY revenue DESC
Core Tables
These tables hold the raw event data that powers everything above.
Avoid querying core tables directly for reporting — they are high-volume and not optimized for aggregation queries. Use the analytics tables above instead.
| Table | Engine | Purpose |
|---|---|---|
website_event | MergeTree (partitioned monthly) | Every raw event with full device, browser, URL, and location data |
event_data | MergeTree | Custom event properties stored as key-value pairs |
session_data | ReplacingMergeTree | Session-level properties stored as key-value pairs |
touchpoints | MergeTree (partitioned monthly) | Raw marketing touchpoints with every supported click ID and UTM parameter |
website_event schema
The primary raw event table. Every hardal.track() call writes a row here.
| Column | Type | Description |
|---|---|---|
id | UUID | Event identifier |
website_id | LowCardinality(String) | Signal ID |
session_id | LowCardinality(String) | Session identifier |
hostname | LowCardinality(String) | Domain |
browser / browser_version | String | Browser and version |
os | String | Operating system |
device | String | Device type |
screen / viewport_size | String | Screen and viewport dimensions |
language | String | Browser language |
country / city / region | String | Geo data |
timezone | String | User timezone |
url / url_path / url_query | String | Full URL and components |
referrer / referrer_domain | String | Referrer data |
page_title | String | HTML page title |
event_type | String | pageview or event |
event_name | String | Event name |
created_at | DateTime | Timestamp (UTC) |
event_data schema
Custom parameters passed with events (e.g. value, currency, item_id). Each property is a separate row.
| Column | Type | Description |
|---|---|---|
website_id | String | Signal ID |
session_id | String | Session identifier |
event_id | UUID | Links to website_event.id |
event_name | String | Event name |
data_key | String | Property name |
string_value | Nullable(String) | String value |
number_value | Nullable(Float64) | Numeric value |
date_value | Nullable(DateTime) | Date value |
data_type | UInt32 | 1 = string, 2 = number, 3 = date |