SQL Queries Guide for Hardal Analytics

Overview

Hardal Analytics allows you to create custom metrics using SQL queries to analyze your data in ways specific to your business needs. This guide will help you understand how to write effective SQL queries in Hardal’s Analytics Dashboard to extract insights from your collected data.

Hardal uses ClickHouse as its underlying database, which provides high-performance SQL querying capabilities for analytics workloads.

Prerequisites

Before writing SQL queries in Hardal Analytics, you should have:

  • Access to the Hardal Analytics Dashboard
  • Basic understanding of SQL syntax
  • Familiarity with JSON data structures (as much of Hardal’s data is stored in JSON format)
  • Understanding of your data collection setup

Data Structure

Learn details about Hardal Database Structure

Database Overview

The analytics data is stored in a database called default which contains multiple tables. For this documentation, we’ll focus on the analytics table, which can be accessed using the format default.analytics.

Table Structure

The analytics table contains the following columns:

Column NameTypeDescription
idUUIDUnique identifier for each record
event_nameLowCardinality(String)Name of the tracked event
distinct_idLowCardinality(String)Unique identifier for tracking
propertiesStringJSON object containing detailed event data
created_atDateTimeTimestamp of event creation

Properties Structure

The properties column contains a JSON object with rich event data. Here’s the standard structure:

Top-Level Properties

  • distinct: Contains user identification information
  • page: Page-related information
  • screen: Device screen specifications
  • browser: Browser details
  • device_type: Type of device (e.g., desktop, mobile)
  • timezone: User’s timezone
  • timestamp: Event timestamp
  • query_params: URL query parameters
  • events: Array of detailed event information
  • batch_size: Number of events in batch
  • batch_timestamp: Timestamp of the batch

Properties Detail

Page Information
"page": {
  "url": "Full URL",
  "path": "URL path",
  "title": "Page title",
  "protocol": "Protocol used",
  "hostname": "Domain name",
  "hash": "URL hash",
  "referrer": "Referring URL"
}
Screen Information
"screen": {
  "resolution": "Screen resolution",
  "color_depth": "Color depth in bits",
  "pixel_depth": "Pixel depth",
  "viewport_size": "Viewport dimensions",
  "device_pixel_ratio": "Device pixel ratio"
}
Browser Information
"browser": {
  "name": "Browser name",
  "version": "Browser version",
  "language": "Browser language",
  "platform": "Operating platform",
  "vendor": "Browser vendor",
  "user_agent": "User agent string"
}

Writing SQL Queries

To create a custom SQL query:

  1. Navigate to the Analytics Dashboard
  2. Click “Create Metric” button
  3. In the query editor, write your SQL query
  4. Click “Execute Query” to run and visualize the results
  5. Select your preferred visualization type (Table, Line Chart, Bar Chart, or Score Card)

Basic Query Structure

A basic query to retrieve data from Hardal follows this structure:

SELECT 
    [columns]
FROM 
    default.analytics
WHERE 
    [conditions]
GROUP BY 
    [grouping]
ORDER BY 
    [ordering]
LIMIT 
    [limit]

Working with JSON Data

Since much of Hardal’s data is stored in JSON format in the properties column, you’ll need to use special functions to extract values:

String Extraction

-- Basic property extraction
JSONExtractString(properties, 'device_type')
JSONExtractString(properties, 'timezone')

-- Nested property extraction
JSONExtractString(properties, 'page', 'url')
JSONExtractString(properties, 'browser', 'name')
JSONExtractString(properties, 'screen', 'resolution')

Numeric Extraction

-- Float values
JSONExtractFloat(properties, 'screen', 'device_pixel_ratio')
JSONExtractFloat(properties, 'screen', 'color_depth')

-- Integer values
JSONExtractInt(properties, 'batch_size')

Array Extraction

When working with arrays in JSON (like events):

-- Extract event name from first event (array index starts at 1)
JSONExtractString(JSONExtractString(properties, 'events')[1], 'event_name')

-- Extract source from first event
JSONExtractString(JSONExtractString(properties, 'events')[1], 'source')

-- Extract timestamp from first event
JSONExtractString(JSONExtractString(properties, 'events')[1], 'timestamp')

-- Working with arrays using ARRAY JOIN
FROM default.analytics
ARRAY JOIN JSONExtractArrayRaw(properties, 'events') as event

-- Extract nested query parameters from first event
JSONExtractString(
  JSONExtractString(
    JSONExtractString(properties, 'events')[1], 
    'query_params'
  ), 
  'tid'
)

Date and Time Filtering

Important: Hardal requires all queries to include a date range filter using the BETWEEN toDateTime(?) AND toDateTime(?) syntax. This is mandatory to prevent queries from scanning too much data and affecting performance.

-- Required date range format (replace ? with actual dates)
WHERE created_at BETWEEN toDateTime(?) AND toDateTime(?)

-- Example with specific dates
WHERE created_at BETWEEN toDateTime('2025-02-10 00:00:00') AND toDateTime('2025-02-17 23:59:59')

-- You can add additional time filters
WHERE created_at BETWEEN toDateTime('2025-02-17 00:00:00') AND toDateTime('2025-02-18 00:00:00')
  AND JSONExtractString(properties, 'timestamp') LIKE '%18T%'

If you want to filter for the last 7 days, you’ll need to calculate the date range explicitly:

WHERE created_at BETWEEN toDateTime(date_sub(DAY, 7, now())) AND toDateTime(now())

Example Queries

Here are some practical examples of SQL queries you can use in Hardal Analytics:

Basic Property Extraction

SELECT
    id,
    event_name,
    JSONExtractString(properties, 'device_type') AS device,
    JSONExtractString(properties, 'page', 'url') AS page_url,
    JSONExtractString(properties, 'browser', 'name') AS browser_name,
    JSONExtractString(properties, 'screen', 'resolution') AS screen_resolution,
    JSONExtractFloat(properties, 'screen', 'device_pixel_ratio') AS pixel_ratio,
    created_at
FROM default.analytics
WHERE created_at BETWEEN toDateTime('2025-02-10 00:00:00') AND toDateTime('2025-02-17 23:59:59')
LIMIT 10;

Basic Page View Analysis

SELECT
  toDate(created_at) as date,
  JSONExtractString(JSONExtractRaw(properties, 'page'), 'path') as page_path,
  COUNT(*) as view_count
FROM default.analytics
WHERE event_name = 'page_view'
  AND created_at BETWEEN toDateTime(date_sub(DAY, 7, now())) AND toDateTime(now())
GROUP BY
  date,
  page_path
ORDER BY date DESC, view_count DESC
FORMAT TabSeparatedWithNames

Purchase Analysis by City

WITH CityPurchases AS (
    SELECT 
        JSONExtractString(properties, 'city') AS city,
        COUNT(DISTINCT JSONExtractString(properties, 'crmId')) AS unique_customers,
        COUNT(*) AS total_purchases,
        SUM(toFloat64OrNull(JSONExtractString(item, 'price'))) AS total_revenue,
        AVG(toFloat64OrNull(JSONExtractString(item, 'price'))) AS avg_basket_size,
        COUNT(*) / COUNT(DISTINCT JSONExtractString(properties, 'crmId')) AS purchases_per_customer
    FROM default.analytics
    ARRAY JOIN JSONExtractArrayRaw(properties, 'items') AS item
    WHERE created_at BETWEEN toDateTime(date_sub(DAY, 30, now())) AND toDateTime(now())
        AND event_name = 'purchase'
        AND JSONExtractString(properties, 'country') = 'TR'
        AND JSONExtractString(properties, 'city') != ''
    GROUP BY city
    ORDER BY total_purchases DESC
    LIMIT 20
)
SELECT 
    city AS "City",
    unique_customers AS "Unique Customers",
    total_purchases AS "Total Purchases",
    ROUND(total_revenue, 2) AS "Total Revenue",
    ROUND(avg_basket_size, 2) AS "Average Basket Size",
    ROUND(purchases_per_customer, 2) AS "Purchases per Customer"
FROM CityPurchases
ORDER BY "Total Purchases" DESC

Purchases by Platform

SELECT
   toDate(created_at) AS PurchaseDate,
   CASE
       WHEN JSONExtractString(properties, 'user_agent') LIKE '%iPhone%' OR JSONExtractString(properties, 'user_agent') LIKE '%iPad%' THEN 'iOS'
       WHEN JSONExtractString(properties, 'user_agent') LIKE '%Android%' THEN 'Android'
       WHEN JSONExtractString(properties, 'user_agent') LIKE '%Windows%' OR JSONExtractString(properties, 'user_agent') LIKE '%Macintosh%' THEN 'Web'
       ELSE 'Unknown'
   END AS Platform,
   COUNT(*) AS TotalPurchases,
   SUM(toFloat64OrNull(JSONExtractString(item, 'price'))) AS TotalRevenue
FROM default.analytics
ARRAY JOIN JSONExtractArrayRaw(properties, 'items') AS item
WHERE
   created_at BETWEEN toDateTime(date_sub(DAY, 7, now())) AND toDateTime(now())
   AND event_name = 'purchase'
GROUP BY
   PurchaseDate,
   Platform
ORDER BY
   PurchaseDate DESC,
   TotalPurchases DESC

User Journey Analysis

WITH 
base_data AS (
    SELECT
        created_at,
        event_name,
        JSONExtractString(properties, 'distinct', 'server_distinct_id') AS server_distinct_id,
        JSONExtractString(properties, 'page', 'url') AS page_url,
        JSONExtractString(properties, 'page', 'path') AS page_path,
        JSONExtractString(properties, 'page', 'referrer') AS page_referrer,
        JSONExtractString(properties, 'device_type') AS device_type
    FROM default.analytics
    WHERE created_at BETWEEN toDateTime(date_sub(DAY, 2, now())) AND toDateTime(now())
),

session_calcs AS (
    SELECT 
        *,
        MIN(created_at) OVER (PARTITION BY server_distinct_id) as first_seen,
        MAX(created_at) OVER (PARTITION BY server_distinct_id) as last_seen,
        if(dateDiff('minute',
            lagInFrame(created_at) OVER (PARTITION BY server_distinct_id ORDER BY created_at),
            created_at) > 30 OR
            lagInFrame(created_at) OVER (PARTITION BY server_distinct_id ORDER BY created_at) IS NULL,
            1, 0) AS is_new_session
    FROM base_data
)

SELECT 
    server_distinct_id,
    event_name,
    page_url,
    page_path,
    page_referrer,
    device_type,
    first_seen,
    last_seen,
    COUNT(*) as event_count
FROM session_calcs
GROUP BY 
    server_distinct_id,
    event_name,
    page_url,
    page_path,
    page_referrer,
    device_type,
    first_seen,
    last_seen
ORDER BY last_seen DESC
LIMIT 100

Analyzing GA4 Events

SELECT 
    toDate(created_at) as date,
    COUNT(*) as total_purchases,
    JSONExtractString(JSONExtractRaw(e, 'query_params'), 'ep.transaction_id') as transaction_id,
    SUM(CAST(JSONExtractString(JSONExtractRaw(e, 'query_params'), 'epn.value') AS Float64)) as total_value
FROM default.analytics
ARRAY JOIN JSONExtractArrayRaw(properties, 'events') as e
WHERE created_at >= toDateTime('2025-02-17 00:00:00')
    AND created_at < toDateTime('2025-02-19 00:00:00')
    AND event_name = 'network_batch'
    AND JSONExtractString(e, 'source') = 'ga4'
    AND JSONExtractString(e, 'event_name') = 'purchase'
GROUP BY 
    date,
    transaction_id
ORDER BY total_value DESC

Visualization Options

After executing your query, you can visualize the results in several formats:

Table View

Displays the raw data in a tabular format. Best for detailed data exploration or when you need to see individual records.

Bar Chart

Useful for comparing values across categories. Good for queries that group data by dimensions like page, country, or device type.

Line Chart

Ideal for showing trends over time. Best for queries that include date or timestamp columns.

Score Card

Shows a single value, usually a key metric like total purchases, average revenue, or user count. Good for dashboard summary metrics.

Troubleshooting and Optimization

Common Issues

  1. Missing Required Date Range: All queries must include a date range filter:

    • Ensure you include WHERE created_at BETWEEN toDateTime(?) AND toDateTime(?)
    • Replace the ? placeholders with actual date values
  2. Query Timeout: If your query takes too long to execute:

    • Add more specific filters
    • Narrow the date range
    • Add a LIMIT clause
    • Use more efficient JSON extraction
  3. No Results: If your query returns no data:

    • Check your date range - ensure data exists for the period
    • Verify event names and property paths
    • Simplify filters to see if any data exists
  4. JSON Extraction Errors: If you’re having trouble accessing nested data:

    • Use the right extraction function for the data type
    • Check the path exactly
    • Remember array indices start at 1 in ClickHouse, not 0

Performance Tips

  1. Use Specific Date Ranges: Always include a date filter to limit the amount of data scanned.

  2. Filter Early: Apply WHERE clauses as early as possible in your query logic.

  3. Limit Result Sets: Use LIMIT or TOP to return only the data you need.

  4. Use Appropriate JSON Extractors: Match the extractor to the data type (String, Int, Float).

  5. Avoid Complex Subqueries: Use CTEs (WITH clauses) for better readability and performance.

Best Practices

  1. Always Include Date Range: Every query must include the required date range filter using BETWEEN toDateTime(?) AND toDateTime(?).

  2. Document Your Queries: Add comments to explain complex logic.

  3. Use Meaningful Aliases: Give columns clear names that reflect what the data represents.

  4. Format Your Queries: Use consistent indentation and line breaks for readability.

  5. Test Incrementally: Build complex queries step by step, verifying results at each stage.

  6. Reuse Tested Patterns: Save working query patterns for future use.

  7. Handle NULL Values: Always handle potential NULL values in JSON extraction or calculations.

  8. Use WITH Clauses: For complex queries, use CTEs (WITH clauses) to make your logic clearer.

  9. Use Appropriate JSON Extraction Functions:

    • JSONExtractString for text
    • JSONExtractFloat for decimal numbers
    • JSONExtractInt for whole numbers
    • JSONExtractBool for boolean values
  10. Remember Array Indexing: When working with arrays, remember that indexing starts at 1, not 0

  11. Chain Extractions Correctly: For nested properties, chain the extraction functions from outer to inner elements

  12. Use LIMIT During Development: Consider adding LIMIT clauses during development to avoid processing large amounts of data