bring startf inline #785

Merged: 4 commits merged on May 31, 2019
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
.DS_Store
.dataset
.qri
qri
/qri
node_modules
coverage.txt
18 changes: 12 additions & 6 deletions Makefile
@@ -1,6 +1,6 @@
GOFILES = $(shell find . -name '*.go' -not -path './vendor/*')
define GOPACKAGES
golang.org/x/text \
github.com/360EntSecGroup-Skylar/excelize \
github.com/briandowns/spinner \
github.com/qri-io/apiutil \
github.com/fatih/color \
@@ -13,6 +13,9 @@ github.com/qri-io/deepdiff \
github.com/qri-io/dsdiff \
github.com/qri-io/varName \
github.com/qri-io/iso8601 \
github.com/qri-io/ioes \
github.com/qri-io/starlib \
github.com/ipfs/go-datastore \
github.com/sergi/go-diff/diffmatchpatch \
github.com/sirupsen/logrus \
github.com/spf13/cobra \
@@ -21,16 +24,20 @@ github.com/theckman/go-flock \
github.com/ugorji/go/codec \
github.com/beme/abide \
github.com/ghodss/yaml \
github.com/qri-io/ioes \
github.com/PuerkitoBio/goquery \
github.com/pkg/errors \
github.com/google/flatbuffers/go
github.com/google/flatbuffers/go \
github.com/ipfs/go-log \
golang.org/x/text \
go.starlark.net/starlark \
go.starlark.net/repl \
go.starlark.net/resolve
endef

define GX_DEP_PACKAGES
github.com/qri-io/registry/regclient \
github.com/qri-io/dag \
github.com/qri-io/qfs \
github.com/qri-io/startf
github.com/qri-io/qfs
endef

default: build
@@ -83,7 +90,6 @@ update-qri-deps: require-gopath
	cd $$GOPATH/src/github.com/qri-io/deepdiff && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/dsdiff && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/jsonschema && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/startf && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/starlib && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/dag && git checkout master && git pull
	cd $$GOPATH/src/github.com/qri-io/ioes && git checkout master && git pull
1 change: 0 additions & 1 deletion README.md
@@ -87,7 +87,6 @@ Qri is comprised of many specialized packages. Below you will find a summary of
| [`repo`](https://github.com/qri-io/qri/tree/master/repo) | [![Go Docs](https://godoc.org/github.com/qri-io/qri?status.svg)](https://godoc.org/github.com/qri-io/qri/repo) | [![report](https://goreportcard.com/badge/github.com/qri-io/qri)](https://goreportcard.com/report/github.com/qri-io/qri) | the repository: saving, removing, and storing datasets, profiles, and the config |
| [`dataset`](https://github.com/qri-io/dataset) | [![Go Docs](https://godoc.org/github.com/qri-io/dataset?status.svg)](https://godoc.org/github.com/qri-io/dataset) | [![report](https://goreportcard.com/badge/github.com/qri-io/dataset)](https://goreportcard.com/report/github.com/qri-io/dataset) | the blueprint for a dataset, the atoms that make up qri |
| [`registry`](https://github.com/qri-io/registry) | [![Go Docs](https://godoc.org/github.com/qri-io/registry?status.svg)](https://godoc.org/github.com/qri-io/registry) | [![report](https://goreportcard.com/badge/github.com/qri-io/registry)](https://goreportcard.com/report/github.com/qri-io/registry) | the blueprint for a registry: the service that allows profiles to be unique and datasets to be searchable |
| [`startf`](https://github.com/qri-io/startf) | [![Go Docs](https://godoc.org/github.com/qri-io/startf?status.svg)](https://godoc.org/github.com/qri-io/startf) | [![report](https://goreportcard.com/badge/github.com/qri-io/startf)](https://goreportcard.com/report/github.com/qri-io/startf) | brings starlark into qri to be used in transforms, adds qri specific functionality |
| [`starlib`](https://github.com/qri-io/starlib) | [![Go Docs](https://godoc.org/github.com/qri-io/starlib?status.svg)](https://godoc.org/github.com/qri-io/starlib) | [![report](https://goreportcard.com/badge/github.com/qri-io/starlib)](https://goreportcard.com/report/github.com/qri-io/starlib) | the starlark standard library available for qri transform scripts |
| [`qfs`](https://github.com/qri-io/qfs) | [![Go Docs](https://godoc.org/github.com/qri-io/qfs?status.svg)](https://godoc.org/github.com/qri-io/qfs) | [![report](https://goreportcard.com/badge/github.com/qri-io/qfs)](https://goreportcard.com/report/github.com/qri-io/qfs) | "qri file system" is Qri's file system abstraction for getting & storing data from different sources |
| [`ioes`](https://github.com/qri-io/ioes) | [![Go Docs](https://godoc.org/github.com/qri-io/ioes?status.svg)](https://godoc.org/github.com/qri-io/ioes) | [![report](https://goreportcard.com/badge/github.com/qri-io/ioes)](https://goreportcard.com/report/github.com/qri-io/ioes) | package to handle in, out, and error streams: gives us better control of where we send output and errors |
2 changes: 1 addition & 1 deletion actions/transform.go
@@ -6,7 +6,7 @@ import (

"github.com/qri-io/dataset"
"github.com/qri-io/qri/p2p"
"github.com/qri-io/startf"
"github.com/qri-io/qri/startf"
)

func mutatedComponentsFunc(dsp *dataset.Dataset) func(path ...string) error {
3 changes: 3 additions & 0 deletions go.mod
@@ -0,0 +1,3 @@
module github.com/qri-io/qri

go 1.12
170 changes: 170 additions & 0 deletions startf/README.md
@@ -0,0 +1,170 @@
# Qri Starlark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. [Starlark](https://github.com/google/starlark-go/blob/master/doc/spec.md) is a scripting language from Google that feels a lot like python. This package implements starlark as a _transformation syntax_. Starlark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical examples of a starlark transformation include:
* combining paginated calls to an API into a single dataset
* downloading unstructured data from the internet to extract structured data from it
* pulling raw data off the web & turning it into a dataset

We're excited about starlark for a few reasons:
* **python syntax** - _many_ people working in data science these days write python, we like that, starlark likes that. dope.
* **deterministic subset of python** - unlike python, starlark removes properties that reduce introspection into code behaviour. things like `while` loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
* **parallel execution** - thanks to this deterministic requirement (and the lack of a global interpreter lock) starlark functions can be executed in parallel. Combined with peer-2-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.


## Getting started
If you're mainly interested in learning how to write starlark transformations, our [documentation](https://qri.io/docs) is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!

The easiest way to see starlark transformations in action is to use [qri](https://github.com/qri-io/qri). This `startf` package powers all the starlark stuff in qri. Assuming you have the [go programming language](https://golang.org/) installed, the following should work from a terminal:

<!--
docrun:
pass: true
-->
```shell
# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf
```

Then run the tests:

<!--
docrun:
pass: true
-->
```shell
$ go test ./...
```

Often the next steps are to install [qri](https://github.com/qri-io/qri), mess with this `startf` package, then rebuild qri with your changes to see them in action within qri itself.

## Starlark Special Functions

_Special Functions_ are the core of a starlark transform script. Here's an example of a simple special function that sets a metadata field of a dataset to a constant:

<!--
docrun:
test:
call: transform(ds, ctx)
actual: ds.get_meta()
expect: {"hello": "world", "qri": "md:0"}
-->
```python
def transform(ds, ctx):
    ds.set_meta("hello", "world")
```

Here's something slightly more complicated (but still very contrived) that modifies a dataset by adding up the lengths of all the elements in the dataset body:

<!--
docrun:
test:
setup: ds.set_body(["a","b","c"])
call: transform(ds, ctx)
actual: ds.get_body()
expect: [{"total": 3.0}]
-->
```python
def transform(ds, ctx):
    body = ds.get_body()
    if body != None:
        count = 0
        for entry in body:
            count += len(entry)
        ds.set_body([{"total": count}])
```

Starlark special functions have a few rules on top of starlark itself:
* Special functions *always* accept a _transformation context_ (the `ctx` arg).
* When you define a special function, qri calls it for you.
* All special functions are optional (you don't _need_ to define them), except `transform`, which is required.
* Special functions are always called in the same order.
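The rules above can be modeled in plain Go. This is a hypothetical, stdlib-only sketch, not qri's implementation: `specialFunc`, `runSpecialFuncs` and `callOrder` are invented names, a Go map stands in for the `ctx` struct, and the order `download` then `transform` is an assumption (the rules only say the order is fixed). In this sketch `transform` simply returns the body it would set instead of mutating a dataset.

```go
package main

import (
	"errors"
	"fmt"
)

// specialFunc is a hypothetical stand-in for a Starlark special function:
// it reads earlier results from ctx and returns its own result.
type specialFunc func(ctx map[string]interface{}) (interface{}, error)

// callOrder is an ASSUMED fixed order; qri's real order may differ.
var callOrder = []string{"download", "transform"}

// runSpecialFuncs calls each defined special function in callOrder.
// Missing optional functions are skipped; a missing transform is an error.
func runSpecialFuncs(defined map[string]specialFunc) (map[string]interface{}, error) {
	if _, ok := defined["transform"]; !ok {
		return nil, errors.New("transform function is required")
	}
	ctx := map[string]interface{}{}
	for _, name := range callOrder {
		fn, ok := defined[name]
		if !ok {
			continue // optional special function was not defined
		}
		res, err := fn(ctx)
		if err != nil {
			return nil, fmt.Errorf("calling %s: %v", name, err)
		}
		// store the result under the function's name so later
		// functions can read it (compare ctx.download in the README)
		ctx[name] = res
	}
	return ctx, nil
}

func main() {
	ctx, err := runSpecialFuncs(map[string]specialFunc{
		"download": func(ctx map[string]interface{}) (interface{}, error) {
			return []int{4, 5, 6}, nil // pretend this came from an HTTP call
		},
		"transform": func(ctx map[string]interface{}) (interface{}, error) {
			return ctx["download"], nil // use the download result as the body
		},
	})
	fmt.Println(ctx["transform"], err) // [4 5 6] <nil>
}
```

Storing each result under the function's own name is what lets a later function read an earlier one, in the same spirit as the `ctx.download` field described for the `download` function.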

Another important special function is `download`, which allows access to the `http` package:

<!--
docrun:
test:
webproxy:
url: http://example.com/data.json
response: {"data":[4,5,6]}
call: download(ctx)
actual: ctx.download
expect: {"data":[4.0,5.0,6.0]}
save:
filename: transform.star
-->
```python
load("http.star", "http")

def download(ctx):
    data = http.get("http://example.com/data.json")
    return data
```

The result of this special function can be accessed using `ctx.download`:

<!--
docrun:
test:
setup: ctx.download = ["test"]
call: transform(ds, ctx)
actual: ds.get_body()
expect: ["test"]
save:
filename: transform.star
append: true
-->
```python
def transform(ds, ctx):
    ds.set_body(ctx.download)
```

More docs on the provided API are coming soon.

## Running a transform

Let's say the functions above are saved as `transform.star`. You can run them to create a new dataset by using:

<!--
docrun:
pass: true
# TODO: Run this command in a sandbox, using the transform.star created above.
-->
```shell
qri save --file=transform.star me/dataset_name
```

Or, you can add more details by creating a dataset file (saved as `dataset.yaml`, for example) with additional structure:

<!--
docrun:
pass: true
# TODO: Save this file to use in the command below.
-->
```yaml
name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset
```

Then invoke qri:

<!--
docrun:
pass: true
# TODO: Run this command in a sandbox, using the dataset.yaml created above.
-->
```shell
qri save --file=dataset.yaml
```

Fun! More info is over on our [docs site](https://qri.io/docs).

103 changes: 103 additions & 0 deletions startf/context/context.go
@@ -0,0 +1,103 @@
package context

import (
	"fmt"

	"github.com/qri-io/starlib/util"
	"go.starlark.net/starlark"
	"go.starlark.net/starlarkstruct"
)

// Context carries values across function calls in a transformation
type Context struct {
	// results carries the return values of special function calls
	results starlark.StringDict
	values  starlark.StringDict
	config  map[string]interface{}
	secrets map[string]interface{}
}

// NewContext creates a new context
func NewContext(config, secrets map[string]interface{}) *Context {
	return &Context{
		results: starlark.StringDict{},
		values:  starlark.StringDict{},
		config:  config,
		secrets: secrets,
	}
}

// Struct delivers this context as a starlark struct
func (c *Context) Struct() *starlarkstruct.Struct {
	dict := starlark.StringDict{
		"set":        starlark.NewBuiltin("set", c.setValue),
		"get":        starlark.NewBuiltin("get", c.getValue),
		"get_config": starlark.NewBuiltin("get_config", c.GetConfig),
		"get_secret": starlark.NewBuiltin("get_secret", c.GetSecret),
	}

	for k, v := range c.results {
		dict[k] = v
	}

	return starlarkstruct.FromStringDict(starlark.String("context"), dict)
}

// SetResult places the result of a function call in the results StringDict.
// Any results set here will be exposed as a context struct field of the same name.
func (c *Context) SetResult(name string, value starlark.Value) {
	c.results[name] = value
}

func (c *Context) setValue(thread *starlark.Thread, _ *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	var (
		key   starlark.String
		value starlark.Value
	)
	if err := starlark.UnpackArgs("set", args, kwargs, "key", &key, "value", &value); err != nil {
		return starlark.None, err
	}

	c.values[string(key)] = value
	return starlark.None, nil
}

func (c *Context) getValue(thread *starlark.Thread, _ *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	var key starlark.String
	if err := starlark.UnpackArgs("get", args, kwargs, "key", &key); err != nil {
		return starlark.None, err
	}
	if v, ok := c.values[string(key)]; ok {
		return v, nil
	}
	return starlark.None, fmt.Errorf("value %s not set in context", string(key))
}

// GetSecret fetches a secret for a given string
func (c *Context) GetSecret(thread *starlark.Thread, _ *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	if c.secrets == nil {
		return starlark.None, fmt.Errorf("no secrets provided")
	}

	var key starlark.String
	if err := starlark.UnpackPositionalArgs("get_secret", args, kwargs, 1, &key); err != nil {
		return nil, err
	}

	return util.Marshal(c.secrets[string(key)])
}

// GetConfig returns transformation configuration details
// TODO - supplying a string argument to qri.get_config('foo') should return the single config value instead of the whole map
func (c *Context) GetConfig(thread *starlark.Thread, _ *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	if c.config == nil {
		return starlark.None, fmt.Errorf("no config provided")
	}

	var key starlark.String
	if err := starlark.UnpackPositionalArgs("get_config", args, kwargs, 1, &key); err != nil {
		return nil, err
	}

	return util.Marshal(c.config[string(key)])
}
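For readers who want the semantics of `Context` without a Starlark interpreter, here is a simplified, stdlib-only Go model of it. It is a sketch, not the package's API: `miniContext` and its methods are hypothetical names, builtins are represented by placeholder strings, config lookup is omitted because it follows the same pattern as secrets, and where the real `GetSecret` hands a missing key's nil value to `util.Marshal`, this sketch just returns nil.

```go
package main

import "fmt"

// miniContext is a simplified, starlark-free model of Context: values
// written by set(), results of special functions, and optional secrets.
type miniContext struct {
	results map[string]interface{}
	values  map[string]interface{}
	secrets map[string]interface{}
}

func newMiniContext(secrets map[string]interface{}) *miniContext {
	return &miniContext{
		results: map[string]interface{}{},
		values:  map[string]interface{}{},
		secrets: secrets,
	}
}

// setResult mirrors SetResult: store a special function's return value.
func (c *miniContext) setResult(name string, v interface{}) { c.results[name] = v }

// set mirrors the ctx.set(key, value) builtin.
func (c *miniContext) set(key string, v interface{}) { c.values[key] = v }

// get mirrors ctx.get(key): reading an unset key is an error.
func (c *miniContext) get(key string) (interface{}, error) {
	if v, ok := c.values[key]; ok {
		return v, nil
	}
	return nil, fmt.Errorf("value %s not set in context", key)
}

// getSecret mirrors GetSecret: it fails only when no secrets were supplied;
// an unknown key just yields nil in this sketch.
func (c *miniContext) getSecret(key string) (interface{}, error) {
	if c.secrets == nil {
		return nil, fmt.Errorf("no secrets provided")
	}
	return c.secrets[key], nil
}

// structFields mirrors Struct: builtins go in first and results are copied
// over them, so a result with a builtin's name replaces that builtin.
func (c *miniContext) structFields() map[string]interface{} {
	fields := map[string]interface{}{
		"set": "builtin", "get": "builtin", "get_config": "builtin", "get_secret": "builtin",
	}
	for k, v := range c.results {
		fields[k] = v
	}
	return fields
}

func main() {
	c := newMiniContext(nil)
	c.setResult("download", []int{4, 5, 6})
	c.set("page", 2)

	v, err := c.get("page")
	fmt.Println(v, err) // 2 <nil>
	_, err = c.get("missing")
	fmt.Println(err) // value missing not set in context
	_, err = c.getSecret("api_key")
	fmt.Println(err) // no secrets provided
	fmt.Println(c.structFields()["download"]) // [4 5 6]
}
```

One consequence worth noticing in `Struct`: because results are copied into the dict after the builtins, a special function whose name collides with `set`, `get`, `get_config` or `get_secret` would shadow that builtin.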