Qri & SQL

Rico Gardaphe
4 min readApr 17, 2020

--

Qri now supports SQL. It’s only natural.

We want to live in a world where you can run queries on any dataset on Qri, and we're delighted to say that day is here. We’ve landed support for SQL queries in qri 0.9.7, bringing us full circle to the vision outlined in the original white paper we published in 2017:

The query engine writes the results to IPFS as a Dataset object, writing the Data, Query and Structure hash references into the Dataset, and saving the output Data too.

“With deterministic querying it is entirely possible that, for example, gorilla population data may hash-collide with data on the population of the city of London. While this [example] may seem nonsensical, we believe this form of toll-free comparability will encourage users to conceive of entirely new uses for data that had gone previously undiscovered. Combined with machine learning (particularly random forest decision trees), it is possible to have a computer chase down any number of these data comparison permutations looking for new insights from hard-earned data and queries.”

One day, gorilla population data and London population data may make sense to blend. For now at least, we’re celebrating your ability to run queries on any dataset held by you in your collection AND any dataset published to the qri network. Because Qri is a peer-to-peer network, the limit on the number of datasets you can play with is limited only by the size of the hard drives of qri users. So…big.

It’s still early

We have plans to improve & build upon this crucial feature, and we’re marking it as experimental while we flesh out our SQL implementation. We’ll drop the “experimental” flag when we support a healthy subset of the SQL spec.

Care to see it in action? We’ve been talking about SQL a bunch in recent community calls:

Demo introducing SQL Support:

Qri as a global SQL database:

SQL errors & prepping datasets for querying:

Let’s see an Example

You can use SQL right now on any dataset in your local collection with the command qri sql. Here are some examples using the New York City Popular Baby Names dataset published by the city’s Department of Health and Mental Hygiene:

A simple SELECT with LIMIT

qri sql "SELECT * FROM nyc-open-data-archive/popular-baby-names a LIMIT 10"
+-----------------+----------+-------------+---------------------+---------+--------+
| a.year_of_birth | a.gender | a.ethnicity | a.childs_first_name | a.count | a.rank |
+-----------------+----------+-------------+---------------------+---------+--------+
| 2011 | 'FEMALE' | 'HISPANIC' | 'GERALDINE' | 13 | 75 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'GIA' | 21 | 67 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'GIANNA' | 49 | 42 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'GISELLE' | 38 | 51 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'GRACE' | 36 | 53 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'GUADALUPE' | 26 | 62 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'HAILEY' | 126 | 8 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'HALEY' | 14 | 74 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'HANNAH' | 17 | 71 |
| 2011 | 'FEMALE' | 'HISPANIC' | 'HAYLEE' | 17 | 71 |
+-----------------+----------+-------------+---------------------+---------+--------+

How many records are there for each distinct ethnicity?

qri sql "SELECT a.ethnicity, count(childs_first_name) AS count FROM nyc-open-data-archive/popular-baby-names a GROUP BY a.ethnicity"
+------------------------------+-------+
| a.ethnicity | count |
+------------------------------+-------+
| 'WHITE NON HISP' | 1338 |
| 'HISPANIC' | 5714 |
| 'WHITE NON HISPANIC' | 5473 |
| 'ASIAN AND PACIFIC ISLANDER' | 2693 |
| 'BLACK NON HISPANIC' | 2826 |
| 'ASIAN AND PACI' | 693 |
| 'BLACK NON HISP' | 681 |
+------------------------------+-------+

The feature is young, so a lot of what you can do in other flavors isn’t quite there yet, but simple queries and joins will work. Give it a try, kick the tires, and let us know what features you need!

SQL support is coming to Desktop

We’re designing and coding up wires for SQL support in Qri Desktop, so keep your eye out for our latest releases. Our desktop app auto-updates when we cut new releases, so download today and you’re all set.

--

--

Rico Gardaphe

Head of Business Development for Qri — free and open source dataset versioning software. Former strategy consultant and Obama White House staffer.