Data Adapter queries
Use these examples to execute SQL queries using Data Adapter
Overview
Once you have successfully connected your preferred tool to Data Adapter, you should be able to execute your own SQL queries.
Amazon Athena supports a subset of the Data Definition Language (DDL) and Data Manipulation Language (DML) statements, functions, operators, and data types.
AWS Athena SQL reference
For more information on supported statements and caveats, please see the AWS SQL reference for Athena. This includes the full list of supported SQL functions.
This guide covers some common queries and functions to help get you started.
Data dictionary for navigating your Root data
To help you in navigate Root's data structure and tables, we have created a comprehensive data dictionary. It offers detailed information on data elements, their types, keys and interrelationships. It also provides hints on how you can use the data to extract meaningful insights for your business.
Click here to view the data dictionary. We hope it helps you maximise the value of the Data Adapter and streamline data analysis processes.
Working with JSON fields
Due to the highly configurable nature of the Root platform, many fields are represented as JSON objects or arrays that are returned as plain VARCHAR
strings. The Athena Querying JSON guide contains good examples of how to work with these fields.
Use the API reference to understand the shape of JSON data
Many of the JSON fields you will encounter when using Data Adapter are described in our API reference.
For example, if you want to understand the shape of the
created_by
JSON field on thepayments
table, go to the List policy payments endpoint and expand the200 Success
response to see a schema of thepayment
object.
Here are examples of how to extract data from a JSON field. These examples query the charges
JSON array on the policies
table. To understand which properties to expect on each charges
object, see the policy object on the API reference.
SELECT
policy_id,
-- first charge
JSON_EXTRACT_SCALAR(charges, '$[0].name')
AS "charge_0_name",
CAST(JSON_EXTRACT(charges, '$[0].amount') AS INTEGER)
AS "charge_0_amount",
-- second charge
JSON_EXTRACT_SCALAR(charges, '$[1].name')
AS "charge_1_name",
CAST(JSON_EXTRACT(charges, '$[1].amount') AS INTEGER)
AS "charge_1_amount",
-- last charge
JSON_ARRAY_GET(charges, -1)
AS "charges_last",
JSON_EXTRACT_SCALAR(JSON_ARRAY_GET(charges, -1), '$.name')
AS "charges_last_name",
CAST(JSON_EXTRACT(JSON_ARRAY_GET(charges, -1), '$.amount') AS INTEGER)
AS "charges_last_amount"
FROM
policies
LIMIT 100
;
Working with dates
You can find a good overview of working with dates in the PrestoDB documentation. Dates on the Root platform are generally represented as ISO 8601 strings in the UTC timezone.
All entities have a created_at
field, allowing to fetch entities from a certain timeframe, like how many policies were created in the last month:
SELECT
count(*)
FROM
policies
WHERE
-- specify dates relative to now (eg. not older than 1 month)
AND created_at > NOW() - INTERVAL '1' MONTH
Below are examples of converting date strings to timestamps. These examples query the schedule_versions
JSON array on the policies
table. To understand which properties to expect on each schedule version object, refer to the policy object on the API reference.
SELECT
policy_id,
JSON_ARRAY_GET(schedule_versions, -1)
AS "last_schedule_version",
-- parse string to ensure valid date
from_iso8601_timestamp(
JSON_EXTRACT_SCALAR(
JSON_ARRAY_GET(schedule_versions, -1),
'$.createdAt'
)
) AS "last_schedule_created",
-- cast to timestamp for further comparison
CAST (
from_iso8601_timestamp(
JSON_EXTRACT_SCALAR(
JSON_ARRAY_GET(schedule_versions, -1),
'$.createdAt'
)
)
AS timestamp
) AS "last_schedule_created_timestamp"
FROM
policies
LIMIT 100
If you want to work with dates in a particular timezone, please see Athena's list of supported timezones and available timezone functions. When using the format_datetime
function, also have a look at the available formatting options.
Below are examples of working with timezones. These examples convert the created_at
date on the policies
table to different timezones.
SELECT
policy_id,
created_at,
-- display timestamp in specific timezone
created_at AT TIME ZONE 'UTC' AS "created_at_utc",
created_at AT TIME ZONE 'Africa/Johannesburg' AS "created_at_jhb",
created_at AT TIME ZONE 'America/Los_Angeles' AS "created_at_la",
-- format timestamp
format_datetime(
created_at AT TIME ZONE 'Africa/Johannesburg',
'E, d MMMM y H:m:s Z'
) AS "created_at_format"
FROM
policies
WHERE
-- specify range in specific timezone
created_at
BETWEEN timestamp '2022-11-01 00:00:00 Africa/Johannesburg'
AND timestamp '2022-12-01 00:00:00 Africa/Johannesburg'
-- specify dates relative to now (eg. not older than 1 year)
AND created_at > NOW() - INTERVAL '1' YEAR
LIMIT 100
Creating views
To simplify analytical queries, you can make use of views
to extract data from complex fields.
Names of views must end with
_view
The name of any view created on the Root Data adapter must end with
_view
. For example,dinosure_policy_view
ormy_custom_event_view
.
In the example below, we start with aSELECT
query that extracts or combines the desired data from the policy_events
table:
- Extracting the
policy_id
from the persistence key and giving it an alias. - Extracting event
type
and with its own alias. - If the event was initiated by a user (and not via API or the system itself), we extract the UUID of that user with the alias
requested_by_user_id
.
The result of this query will make it easier to find and/or join events for specific policies, and perform further analysis.
SELECT
regexp_replace(
persistence_key,
'^.+-policy:',
''
) AS "policy_id",
created_at,
json_extract_scalar(data, '$.type') AS "type",
sequence,
version,
data,
CASE
json_extract_scalar(data, '$.metadata.requested_by.type')
WHEN 'user' THEN json_extract_scalar(
data,
'$.metadata.requested_by.id'
)
END AS "requested_by_user_id",
persistence_key
FROM
policy_events
LIMIT 100
Once we have confirmed that the query is returning the expected results, we can use the CREATE VIEW
query:
CREATE VIEW policy_events_view AS
SELECT
regexp_replace(
persistence_key,
'^.+-policy:',
''
) AS "policy_id",
created_at,
json_extract_scalar(data, '$.type') AS "type",
sequence,
version,
data,
CASE
json_extract_scalar(data, '$.metadata.requested_by.type')
WHEN 'user' THEN json_extract_scalar(data, '$.metadata.requested_by.id')
END AS "requested_by_user_id",
persistence_key
FROM
policy_events
Now, we can use this view in the same way as any other table in further queries. In this case, we are using the view to combine information from the policy_events
table with information from the policies
and users
tables.
SELECT
pol.policy_id,
ev.created_at AS "activated_at",
usr.email AS "activated_by"
FROM policies AS pol
LEFT JOIN policy_events_view AS ev
ON ev.policy_id = pol.policy_id
AND ev.type = 'policy_activated'
LEFT JOIN users AS usr
ON usr.id = ev.requested_by_user_id
LIMIT 100
If the view was created for temporary analyses, don't forget to DROP
the view when you are done:
DROP TABLE policy_events_view
Updated 6 days ago