query language #9
Comments
ping @jainilajmera @mullr @jhubberts for thoughts/insights

I am now targeting something that looks more like this:

from discussion with Russ about https://docs.auxon.io/speqtr/syntax.html#queries

big chunk of query execution stuff is now merged. Leaving this open as we still do not have variable length array support.

will actually close this and track new query language features in the monster list - that won't be the only one
Background
Today dp3 executes only one kind of query, which you could consider as an unrestricted as-of join on timestamp, or a timestamp-sorted union.
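To make "timestamp-sorted union" concrete, here is a minimal sketch of the idea, not dp3's actual implementation. It assumes each topic is already an iterable sorted by log time; the `Message` type, its field names, and the topic names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    log_time: int  # nanoseconds since epoch (hypothetical field name)
    topic: str
    payload: dict


def sorted_union(*streams: Iterable[Message]) -> Iterator[Message]:
    """Lazily merge per-topic streams, each sorted by log_time, into one sorted stream."""
    return heapq.merge(*streams, key=lambda m: m.log_time)


brake = [Message(10, "/brake", {"hard": True}), Message(40, "/brake", {"hard": False})]
rain = [Message(5, "/rain", {"raining": True}), Message(30, "/rain", {"raining": False})]

merged = list(sorted_union(brake, rain))
```

The merge is lazy and holds one pending message per input stream, which is why this kind of query is cheap compared to joins that need buffering.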
We will want to expand this to more flexibly defined as-of joins on timestamp and, more generally, to provide a query language with some "psql-like" functionality in the CLI, giving users a more interactive and databasey experience.
SQL is a poor language for this. Supporting any SQL sets up the expectation that we will support all of it, which we are not prepared to do efficiently. Even if we did, there are no SQL implementations that express as-of joins in a concise, user-friendly way that SQL non-users will be excited to pick up. People who like SQL will be annoyed that we only support a subset, and people who hate it will be annoyed that our query language resembles it.
Finally, SQL is going to limit our eventual prospects for flexible autocompletion support in the client, because it lists columns before tables.
So, we are not going to implement SQL.
The concern about only implementing part of SQL also applies to the other languages people are trying to standardize, like Kusto or PRQL. We are at best going to take inspiration from them. We aren't going to implement them or advertise support for them.
We do need a few similar features to what all of these provide though:
The intent of the language will be to support easy interactive searching, primarily on small numbers of topics, and to enable the expression of multi-topic conditions like "show me 50 times in the last month when we braked hard while it was raining" (considering in that example separate topics for "hard brake" and "is raining"). These conditions can get complex and incorporate several topics. I would expect that as-of joins bigger than 12-way will be rare and that most will be (well) under 6-way.
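The "braked hard while it was raining" condition is an as-of lookup: for each hard-brake event, find the most recent "is raining" message at or before it. The sketch below illustrates that semantics on plain sorted lists. The function name and sample data are hypothetical, and it says nothing about how the engine would execute the join.

```python
from bisect import bisect_right


def asof_lookup(times, values, t):
    """Return the value whose timestamp is the greatest one <= t, or None if there is none."""
    i = bisect_right(times, t)
    return values[i - 1] if i else None


# "is raining" topic: sorted timestamps and the state reported at each one
rain_times = [0, 100, 200]
rain_values = [False, True, False]

# "hard brake" topic: timestamps of hard-brake events
brake_times = [50, 150, 250]

# Hard brakes that happened while the latest rain report said "raining", capped at 50 results
hits = [t for t in brake_times if asof_lookup(rain_times, rain_values, t)][:50]
```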
We will likely never support other kinds of joins, sorting on fields other than log time, or heavy analytics (at least in this engine). We need to play to the strengths of our system, and the query engine that supports this will only be single-node. A closer analog for what we are targeting would be the query languages of Sumo Logic, Stackdriver, or any of the cloud log search tools, except with a lot more focus on as-of joins. For heavy analytics work users can use Spark.
For purposes of autocomplete, we can assume that we can get both a fast listing of available tables (i.e. topics) and a fast schema listing for a particular table. Both of those are true; we just don't have APIs for them. So if a query leads with a table and follows with the column restrictions, we will be able to autocomplete it (probably glossing over some nuances about the grammar).
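A rough sketch of why table-first ordering helps: once the first token names a table, the completer knows which schema to consult. The catalog contents and the `complete` function are hypothetical stand-ins for the table and schema listing APIs that do not exist yet, and the tokenization ignores the real grammar.

```python
# Hypothetical catalog: topic name -> field names from its schema
CATALOG = {
    "/brake": ["hard", "pressure"],
    "/rain": ["raining", "rate"],
}


def complete(text, catalog=CATALOG):
    """Suggest topics while the first token is being typed, then fields of that topic."""
    tokens = text.split()
    if len(tokens) <= 1 and not text.endswith(" "):
        # Still typing the table name: complete against the topic listing
        prefix = tokens[0] if tokens else ""
        return sorted(t for t in catalog if t.startswith(prefix))
    # Table is known: complete against that table's schema
    table = tokens[0]
    prefix = "" if text.endswith(" ") else tokens[-1]
    return sorted(f for f in catalog.get(table, []) if f.startswith(prefix))
```

With columns listed first, as in SQL, the second branch would have no table to look up until the end of the statement.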
Proposal
Get single topic:
Get two topics joined
Current behavior of topics param - comma operator
Restrict a single topic with scalar subfield
Conventional comparison operators are supported: =, <, <=, ~, *~, <>, etc.
Restrict on a fixed-size array element.
Restrict on variable-sized array element.
We probably do need the ability to address an element of a variable-sized array by index, but I think other queries on variable-sized arrays are going to require some kind of "any", "all", or "none" semantics. For access by index we will use the same syntax as above for fixed-size arrays. For the others we need something special.
It’s possible we won’t need the parentheses, but if we don’t it would be good to support them anyway since they make the structure clearer. In all cases above the argument to “any” could have been parenthesized.
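The intended meaning of the three quantifiers can be sketched with Python's built-ins. This only illustrates the semantics under the assumption that each quantifier applies one predicate to every element. The `matches` helper is hypothetical and is not proposed syntax.

```python
def matches(quantifier, elements, predicate):
    """Evaluate an any/all/none restriction over a variable-sized array."""
    if quantifier == "any":
        return any(predicate(e) for e in elements)
    if quantifier == "all":
        return all(predicate(e) for e in elements)
    if quantifier == "none":
        return not any(predicate(e) for e in elements)
    raise ValueError(f"unknown quantifier: {quantifier}")


readings = [1, 5, 9]
any_high = matches("any", readings, lambda x: x > 8)
all_high = matches("all", readings, lambda x: x > 8)
none_high = matches("none", readings, lambda x: x > 8)
```

One edge case worth deciding explicitly: with these definitions, "all" and "none" are both true for an empty array, and "any" is false.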
Join two restrictions on timestamp
Precedes/succeeds/neighbors operators.
Supports keywords nanoseconds, microseconds, milliseconds, seconds, minutes. We will probably cap at 1 or 5 minutes for now since something is going to buffer that data until/unless we spill queries to disk, which we can defer until requested.
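As an illustration of the unit keywords and the proposed cap, the sketch below converts a window to nanoseconds and pairs up events where one precedes the other within that window. The 5 minute cap is one of the two values floated above. The exact "precedes" semantics (inclusive bounds, all pairs rather than nearest) and all names are assumptions.

```python
UNIT_NS = {
    "nanoseconds": 1,
    "microseconds": 1_000,
    "milliseconds": 1_000_000,
    "seconds": 1_000_000_000,
    "minutes": 60_000_000_000,
}
MAX_WINDOW_NS = 5 * UNIT_NS["minutes"]  # assumed cap; the text suggests 1 or 5 minutes


def window_ns(count, unit):
    """Convert e.g. (10, 'seconds') to nanoseconds, rejecting windows over the cap."""
    ns = count * UNIT_NS[unit]
    if ns > MAX_WINDOW_NS:
        raise ValueError("window exceeds maximum buffered duration")
    return ns


def precedes(a_times, b_times, window):
    """Pairs (a, b) where a is at or before b by at most `window` ns. Inputs are sorted."""
    out = []
    start = 0
    for b in b_times:
        # Drop a-events that fell out of the window; they cannot match later b-events either
        while start < len(a_times) and a_times[start] < b - window:
            start += 1
        i = start
        while i < len(a_times) and a_times[i] <= b:
            out.append((a_times[i], b))
            i += 1
    return out


pairs = precedes(
    [0, 900_000_000, 5_000_000_000],
    [1_000_000_000, 5_500_000_000],
    window_ns(1, "seconds"),
)
```

Only the a-events inside the current window need to stay in memory, which is the buffering cost that motivates the cap.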
As-of semijoin
^ AKA suppression operator
Timestamp restriction
From/to keywords
Descending keyword
Reverse the sort order. This should be used at the end of queries (before limit/offset) but is also valid in subqueries. Most likely when used in subqueries the effect on precedes/succeeds behavior will become confusing.
Limit/offset
Valid at the end of any query/subquery only.
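Over a stream of results, limit/offset amounts to skipping and truncating. A minimal sketch using `itertools.islice` follows. The `limit_offset` helper is hypothetical, and it assumes the offset is applied before the limit.

```python
from itertools import islice


def limit_offset(rows, limit=None, offset=0):
    """Skip `offset` rows, then yield at most `limit` rows (all remaining rows if None)."""
    stop = None if limit is None else offset + limit
    return islice(rows, offset, stop)


page = list(limit_offset(range(10), limit=3, offset=2))
tail = list(limit_offset(range(5), offset=3))
```

Because `islice` stops pulling from its input once the limit is reached, a limit at the end of a query lets the upstream scan terminate early.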