Data Adapter queries

Use these examples to execute SQL queries using Data Adapter

Overview

Once you have successfully connected your preferred tool to Data Adapter, you should be able to execute your own SQL queries.

Amazon Athena supports a subset of the Data Definition Language (DDL) and Data Manipulation Language (DML) statements, functions, operators, and data types.

📘

AWS Athena SQL reference

For more information on supported statements and caveats, please see the AWS SQL reference for Athena. This includes the full list of supported SQL functions.

This guide covers some common queries and functions to help get you started.

Data dictionary for navigating your Root data

To help you navigate Root's data structure and tables, we have created a comprehensive data dictionary. It offers detailed information on data elements, their types, keys and interrelationships. It also provides hints on how you can use the data to extract meaningful insights for your business.

Click here to view the data dictionary. We hope it helps you maximise the value of the Data Adapter and streamline data analysis processes.

Working with JSON fields

Due to the highly configurable nature of the Root platform, many fields are represented as JSON objects or arrays that are returned as plain VARCHAR strings. The Athena Querying JSON guide contains good examples of how to work with these fields.

📘

Use the API reference to understand the shape of JSON data

Many of the JSON fields you will encounter when using Data Adapter are described in our API reference.

For example, if you want to understand the shape of the created_by JSON field on the payments table, go to the List policy payments endpoint and expand the 200 Success response to see a schema of the payment object.

Here are examples of how to extract data from a JSON field. These examples query the charges JSON array on the policies table. To understand which properties to expect on each charges object, see the policy object on the API reference.

SELECT
    policy_id,
    -- first charge
    JSON_EXTRACT_SCALAR(charges, '$[0].name')
        AS "charge_0_name",
    CAST(JSON_EXTRACT(charges, '$[0].amount') AS INTEGER)
        AS "charge_0_amount",
    -- second charge
    JSON_EXTRACT_SCALAR(charges, '$[1].name')
        AS "charge_1_name",
    CAST(JSON_EXTRACT(charges, '$[1].amount') AS INTEGER)
        AS "charge_1_amount",
    -- last charge
    JSON_ARRAY_GET(charges, -1)
        AS "charges_last",
    JSON_EXTRACT_SCALAR(JSON_ARRAY_GET(charges, -1), '$.name')
        AS "charges_last_name",
    CAST(JSON_EXTRACT(JSON_ARRAY_GET(charges, -1), '$.amount') AS INTEGER)
        AS "charges_last_amount"
FROM
    policies
LIMIT 100
;
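
The indexed lookups above work when you know which array positions to expect. If you would rather have one output row per charge, you can cast the array and unnest it, as described in the Athena Querying JSON guide. Below is a minimal sketch, assuming charges is a VARCHAR column containing a JSON array (as in the example above):

SELECT
    p.policy_id,
    -- one output row per element of the charges array
    JSON_EXTRACT_SCALAR(charge, '$.name') AS "charge_name",
    CAST(JSON_EXTRACT(charge, '$.amount') AS INTEGER) AS "charge_amount"
FROM
    policies AS p
CROSS JOIN UNNEST(
    -- parse the VARCHAR column into a JSON array before unnesting
    CAST(json_parse(p.charges) AS ARRAY(JSON))
) AS t(charge)
LIMIT 100
;

Note that this query only returns rows for policies that have at least one charge, since CROSS JOIN UNNEST produces no rows for an empty array.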

Working with dates

You can find a good overview of working with dates in the PrestoDB documentation. Dates on the Root platform are generally represented as ISO 8601 strings in the UTC timezone.

All entities have a created_at field, allowing you to fetch entities from a specific timeframe. For example, to count how many policies were created in the last month:

SELECT
    count(*)
FROM
    policies
WHERE
    -- specify dates relative to now (eg. not older than 1 month)
    created_at > NOW() - INTERVAL '1' MONTH
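
To break the same count down per calendar month, you can group on Presto's date_trunc function. A sketch, again treating created_at as a timestamp:

SELECT
    -- truncate each timestamp to the first day of its month
    date_trunc('month', created_at) AS "month",
    count(*) AS "policies_created"
FROM
    policies
WHERE
    created_at > NOW() - INTERVAL '1' YEAR
GROUP BY
    date_trunc('month', created_at)
ORDER BY
    date_trunc('month', created_at)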

Below are examples of converting date strings to timestamps. These examples query the schedule_versions JSON array on the policies table. To understand which properties to expect on each schedule version object, refer to the policy object on the API reference.

SELECT
    policy_id,
    JSON_ARRAY_GET(schedule_versions, -1)
        AS "last_schedule_version",
    -- parse string to ensure valid date
    from_iso8601_timestamp(
        JSON_EXTRACT_SCALAR(
            JSON_ARRAY_GET(schedule_versions, -1),
            '$.createdAt'
        )
    ) AS "last_schedule_created",
    -- cast to timestamp for further comparison
    CAST (
        from_iso8601_timestamp(
            JSON_EXTRACT_SCALAR(
                JSON_ARRAY_GET(schedule_versions, -1),
                '$.createdAt'
            )
        )
        AS timestamp
    ) AS "last_schedule_created_timestamp"
FROM
    policies
LIMIT 100

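Once converted to a timestamp, these values can be compared to other date fields using Presto's date functions. For example, this sketch uses date_diff to count the days between a policy's created_at and its most recent schedule version:

SELECT
    policy_id,
    -- days between policy creation and the latest schedule version
    date_diff(
        'day',
        created_at,
        CAST(
            from_iso8601_timestamp(
                JSON_EXTRACT_SCALAR(
                    JSON_ARRAY_GET(schedule_versions, -1),
                    '$.createdAt'
                )
            ) AS timestamp
        )
    ) AS "days_to_last_schedule_version"
FROM
    policies
LIMIT 100
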
If you want to work with dates in a particular timezone, please see Athena's list of supported timezones and available timezone functions. When using the format_datetime function, also have a look at the available formatting options.

Below are examples of working with timezones. These examples convert the created_at date on the policies table to different timezones.

SELECT
    policy_id,
    created_at,
    -- display timestamp in specific timezone
    created_at AT TIME ZONE 'UTC' AS "created_at_utc",
    created_at AT TIME ZONE 'Africa/Johannesburg' AS "created_at_jhb",
    created_at AT TIME ZONE 'America/Los_Angeles' AS "created_at_la",
    -- format timestamp
    format_datetime(
        created_at AT TIME ZONE 'Africa/Johannesburg',
        'E, d MMMM y H:m:s Z'
    ) AS "created_at_format"
FROM
    policies
WHERE
    -- specify range in specific timezone
    created_at
        BETWEEN timestamp '2022-11-01 00:00:00 Africa/Johannesburg'
        AND timestamp '2022-12-01 00:00:00 Africa/Johannesburg'
    -- specify dates relative to now (eg. not older than 1 year)
    AND created_at > NOW() - INTERVAL '1' YEAR
LIMIT 100

Creating views

To simplify analytical queries, you can make use of views to extract data from complex fields.

🚧

Names of views must end with _view

The name of any view created on the Root Data Adapter must end with _view. For example, dinosure_policy_view or my_custom_event_view.

In the example below, we start with a SELECT query that extracts or combines the desired data from the policy_events table:

  • Extracting the policy_id from the persistence key and giving it an alias.
  • Extracting the event type with its own alias.
  • If the event was initiated by a user (and not via API or the system itself), we extract the UUID of that user with the alias requested_by_user_id.

The result of this query makes it easier to find and join events for specific policies, and to perform further analysis.

SELECT
    regexp_replace(
        persistence_key,
        '^.+-policy:',
        ''
    ) AS "policy_id",
    created_at,
    json_extract_scalar(data, '$.type') AS "type",
    sequence,
    version,
    data,
    CASE
        json_extract_scalar(data, '$.metadata.requested_by.type')
        WHEN 'user' THEN json_extract_scalar(
            data,
            '$.metadata.requested_by.id'
        )
    END AS "requested_by_user_id",
    persistence_key
FROM
    policy_events
LIMIT 100

Once we have confirmed that the query is returning the expected results, we can wrap it in a CREATE VIEW statement:

CREATE VIEW policy_events_view AS
    SELECT
        regexp_replace(
            persistence_key,
            '^.+-policy:',
            ''
        ) AS "policy_id",
        created_at,
        json_extract_scalar(data, '$.type') AS "type",
        sequence,
        version,
        data,
        CASE
            json_extract_scalar(data, '$.metadata.requested_by.type')
            WHEN 'user' THEN json_extract_scalar(data, '$.metadata.requested_by.id')
        END AS "requested_by_user_id",
        persistence_key
    FROM
        policy_events

Now, we can use this view in the same way as any other table in further queries. In this case, we are using the view to combine information from the policy_events table with information from the policies and users tables.

SELECT
    pol.policy_id,
    ev.created_at AS "activated_at",
    usr.email AS "activated_by"
FROM policies AS pol
LEFT JOIN policy_events_view AS ev
    ON ev.policy_id = pol.policy_id
    AND ev.type = 'policy_activated'
LEFT JOIN users AS usr
    ON usr.id = ev.requested_by_user_id
LIMIT 100
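
The view also works for simple aggregations. For example, a short sketch counting how many events of each type have been recorded:

SELECT
    type,
    count(*) AS "event_count"
FROM
    policy_events_view
GROUP BY
    type
ORDER BY
    count(*) DESC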

If the view was created for temporary analyses, don't forget to DROP the view when you are done:

DROP VIEW policy_events_view