Analytics Database Documentation

Database Overview

The analytics data is stored in a database called default which contains multiple tables. For this documentation, we’ll focus on the analytics table, which can be accessed using the format default.analytics.

Table Structure

The analytics table contains the following columns:

Column NameTypeDescription
idUUIDUnique identifier for each record
event_nameLowCardinality(String)Name of the tracked event
distinct_idLowCardinality(String)Unique identifier for tracking
propertiesStringJSON object containing detailed event data
created_atDateTimeTimestamp of event creation

Properties Structure

The properties column contains a JSON object with rich event data. Here’s the standard structure:

Top-Level Properties

  • distinct: Contains user identification information

  • page: Page-related information

  • screen: Device screen specifications

  • browser: Browser details

  • device_type: Type of device (e.g., desktop, mobile)

  • timezone: User’s timezone

  • timestamp: Event timestamp

  • query_params: URL query parameters

  • events: Array of detailed event information

  • batch_size: Number of events in batch

  • batch_timestamp: Timestamp of the batch

Properties Detail

Page Information

"page": {
  "url": "Full URL",
  "path": "URL path",
  "title": "Page title",
  "protocol": "Protocol used",
  "hostname": "Domain name",
  "hash": "URL hash",
  "referrer": "Referring URL"
}

Screen Information

"screen": {
  "resolution": "Screen resolution",
  "color_depth": "Color depth in bits",
  "pixel_depth": "Pixel depth",
  "viewport_size": "Viewport dimensions",
  "device_pixel_ratio": "Device pixel ratio"
}

Browser Information

"browser": {
  "name": "Browser name",
  "version": "Browser version",
  "language": "Browser language",
  "platform": "Operating platform",
  "vendor": "Browser vendor",
  "user_agent": "User agent string"
}

Querying Data

Basic Query Structure

To query the analytics table:

SELECT * FROM default.analytics

Extracting JSON Properties

The properties column contains nested JSON data that can be accessed using specialized functions:

String Extraction

-- Basic property extraction
JSONExtractString(properties, 'device_type')
JSONExtractString(properties, 'timezone')

-- Nested property extraction
JSONExtractString(properties, 'page', 'url')
JSONExtractString(properties, 'browser', 'name')
JSONExtractString(properties, 'screen', 'resolution')

Numeric Extraction

-- Float values
JSONExtractFloat(properties, 'screen', 'device_pixel_ratio')
JSONExtractFloat(properties, 'screen', 'color_depth')

-- Integer values
JSONExtractInt(properties, 'batch_size')

Working with Events Array

The events array contains detailed event information. To access specific events:

-- Extract event name from first event (array index starts at 1)
JSONExtractString(JSONExtractString(properties, 'events')[1], 'event_name')

-- Extract source from first event
JSONExtractString(JSONExtractString(properties, 'events')[1], 'source')

-- Extract timestamp from first event
JSONExtractString(JSONExtractString(properties, 'events')[1], 'timestamp')

-- Extract nested query parameters from first event
JSONExtractString(
  JSONExtractString(
    JSONExtractString(properties, 'events')[1], 
    'query_params'
  ), 
  'tid'
)

Example Queries

Here’s a complete example that extracts multiple properties:

SELECT
    id,
    event_name,
    JSONExtractString(properties, 'device_type') AS device,
    JSONExtractString(properties, 'page', 'url') AS page_url,
    JSONExtractString(properties, 'browser', 'name') AS browser_name,
    JSONExtractString(properties, 'screen', 'resolution') AS screen_resolution,
    JSONExtractFloat(properties, 'screen', 'device_pixel_ratio') AS pixel_ratio,
    created_at
FROM default.analytics
LIMIT 10;

Best Practices

  1. Always use appropriate JSON extraction functions based on the expected data type:

    • JSONExtractString for text

    • JSONExtractFloat for decimal numbers

    • JSONExtractInt for whole numbers

    • JSONExtractBool for boolean values

  2. When working with arrays, remember that indexing starts at 1, not 0

  3. For nested properties, chain the extraction functions from outer to inner elements

  4. Use meaningful aliases in your SELECT statements to make the output more readable

  5. Consider adding LIMIT clauses during development to avoid processing large amounts of data