
Aurelius Rcode Transformation

Jim Pivarski edited this page Sep 9, 2015 · 16 revisions

Overview

PFA is produced by Aurelius in four stages: (1) a model is converted into a list-of-lists data structure, (2) code is written in R syntax and wrapped in R's expression function (or quote, for a single expression), (3) pfa.config gathers all pieces of the scoring engine and produces a list-of-lists corresponding to the final PFA document, and (4) json converts that list-of-lists into a serialized JSON string or writes it to an output file.

At no point does Aurelius check the validity of the resulting PFA file; use Hadrian, Titus, or Titus-in-Aurelius (pfa.engine) for this crucial test.

The "code written in R syntax" cannot use arbitrary R functions. Only unary and binary operators (e.g. +, -, ==, <, which are functions in both R and PFA) are automatically converted to their PFA counterparts. Functions called in prefix notation (such as log(x) or print(x)) are not translated and must be written with PFA function names; for example, the natural logarithm is m.ln(x) in PFA, not log(x). Moreover, indexes, if used, start at zero (PFA convention), not one (R convention).
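The base-R sketch below illustrates why this rule exists; it is not Aurelius's implementation. The function to_pfa_sketch is hypothetical: it walks a quoted expression and turns every call into a one-key list named after the operator or function. Because the name is copied verbatim, a prefix call only becomes valid PFA if it already uses a PFA function name.

```r
# Hypothetical sketch (not Aurelius's code): convert a quoted R expression
# into PFA-style list-of-lists. Calls become list(<name> = list(<args>)).
to_pfa_sketch <- function(e) {
  if (is.name(e)) return(as.character(e))        # symbol -> variable reference
  if (is.character(e)) return(list(string = e))  # literal string -> {"string": ...}
  if (is.call(e)) {
    fn <- as.character(e[[1]])
    if (fn == "(") return(to_pfa_sketch(e[[2]]))  # R's grouping parentheses vanish
    out <- list(lapply(as.list(e)[-1], to_pfa_sketch))
    names(out) <- fn  # operator or function name is copied verbatim
    return(out)
  }
  e  # numbers, TRUE/FALSE, and NULL pass through unchanged
}

str(to_pfa_sketch(quote(x + 1)))
str(to_pfa_sketch(quote(m.ln(x) > 0)))
```

Note that log(x) would come out as a call named "log", which PFA does not define; that is the failure mode the rule above warns about.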

Some valid R syntax is "overloaded" with new meanings to supply information needed by PFA, such as type annotations. For instance, a function must be notated like this:

function(x = avro.double, y = avro.string -> avro.boolean) {
  (x > 5  &&  y == "hello")
}

This tells Aurelius that x is a double, y is a string, and the function returns a boolean. Apart from these differences, writing R code for PFA conversion is like writing executable R code.
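Because avro.string -> avro.boolean is ordinary R syntax, the R parser accepts the annotated function without complaint. The snippet below needs no Aurelius installation; it only shows how R itself stores these annotations (R rewrites a -> b as b <- a), so that a converter can read the parameter types and the return type back out. Which positions Aurelius actually inspects is an assumption here.

```r
# Quote the annotated function from the text; R never evaluates it.
f <- quote(function(x = avro.double, y = avro.string -> avro.boolean) {
  (x > 5  &&  y == "hello")
})
fmls <- f[[2]]        # the formal-argument pairlist
names(fmls)           # the parameter names
fmls[["x"]]           # the symbol avro.double: x's type
fmls[["y"]]           # R stored "avro.string -> avro.boolean" as a `<-` call
fmls[["y"]][[2]]      # avro.boolean: the function's return type
fmls[["y"]][[3]]      # avro.string: y's own type
```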

Aurelius automatically identifies whether symbols (variable references) are local variables, cells, pools, function names, or unknown. If unknown, Aurelius searches for a variable in the current R scope with that name and uses its value as a substitution. This allows you to write:

threshold <- 5
pfaExpression <- pfa.expr(quote(if (x < threshold) 0 else 1))
print(json(pfaExpression))
[1] {"if": {"<": ["x", 5]}, "then": 0, "else": 1}

to hard-code a threshold in the PFA expression. The replacement is assumed to already be in list-of-lists format, so if you're substituting code, convert it with pfa.expr first. (The number 5 in this example is so simple that its list-of-lists format is also 5. Note that strings in PFA's list-of-lists are list("hello") rather than "hello" to avoid confusion with a variable named hello.)
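A minimal base-R sketch of this lookup follows. It assumes the converter already knows which names are local variables (passed here through a hypothetical locals argument); Aurelius also recognizes cells, pools, and function names, which this sketch ignores. resolve_symbols is an illustrative name, not an Aurelius function.

```r
# Hypothetical sketch of the substitution rule: symbols listed in `locals`
# stay as variable references; any other symbol found in `env` is replaced
# by its current value. The function position of each call is left alone.
# Simplified: does not handle NULL values or empty arguments such as x[ , 1].
resolve_symbols <- function(expr, locals, env = parent.frame()) {
  if (is.name(expr)) {
    nm <- as.character(expr)
    if (nm %in% locals || !exists(nm, envir = env)) return(expr)
    return(get(nm, envir = env))
  }
  if (is.call(expr)) {
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- resolve_symbols(expr[[i]], locals, env)
    }
  }
  expr
}

threshold <- 5
resolved <- resolve_symbols(quote(if (x < threshold) 0 else 1), locals = "x")
print(resolved)
```

The substituted value is inserted as-is, which is why it must already be in list-of-lists form.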

Any type annotations encountered in the R code are treated as R expressions, evaluated immediately. For instance,

pfaExpression <- pfa.expr(quote(new(type(), "[]")))

calls the type() function to generate the type annotation in list-of-lists format. Think of the R-to-PFA process as compilation: a compiler evaluates types during compilation and generates code that evaluates values at runtime. The same is true in the R-to-PFA conversion.
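The compile-time step can be illustrated in base R. array_type_stub is a hypothetical stand-in for a type generator such as avro.array; the sketch evaluates only the type argument of the quoted call and leaves the rest of the call as unevaluated code. It mirrors the compile-time/runtime split described above without claiming to be how Aurelius walks the expression tree.

```r
# Hypothetical stand-in for a type generator (not Aurelius's avro.array):
# it returns the list-of-lists form of an Avro array schema.
array_type_stub <- function(items) list(type = "array", items = items)

e <- quote(new(array_type_stub("int"), "[]"))
# "Compile time": evaluate only the type argument, right now.
e[[2]] <- eval(e[[2]])
# The call itself stays quoted; it is code for the scoring engine to run later.
str(e[[2]])
```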

JSON as lists-of-lists

Aurelius uses the following subset of R's data structures to represent JSON.

| R data structure | JSON equivalent |
|---|---|
| NULL | null |
| TRUE | true |
| FALSE | false |
| scalar number | number (integer or floating-point) |
| scalar string (length-1 character vector) | string |
| unnamed list | JSON array (ordered, surrounded by square brackets) |
| named list | JSON object (unordered key-value pairs in curly brackets) |

Note that data frames, environments, and non-scalars (vectors of length > 1) have no equivalent in Aurelius's JSON representation. If the json function encounters an unrecognized object, it will raise an error.
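To make the mapping concrete, here is a simplified base-R serializer for exactly this subset. It is a hypothetical stand-in for Aurelius's json function, not its source: it escapes only double quotes and backslashes, and it does not handle NA, Inf, or partially named lists.

```r
# Hypothetical helper: quote one string, escaping backslashes and quotes.
json_string <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  chars[chars == "\\"] <- "\\\\"
  chars[chars == "\""] <- "\\\""
  paste0("\"", paste(chars, collapse = ""), "\"")
}

# Hypothetical, simplified serializer for the list-of-lists subset.
to_json_sketch <- function(x) {
  if (is.null(x)) return("null")
  if (is.data.frame(x) || is.environment(x)) stop("no JSON equivalent for ", class(x)[1])
  if (is.list(x)) {
    items <- vapply(x, to_json_sketch, character(1), USE.NAMES = FALSE)
    if (is.null(names(x))) return(paste0("[", paste(items, collapse = ", "), "]"))
    keys <- vapply(names(x), json_string, character(1), USE.NAMES = FALSE)
    return(paste0("{", paste(keys, items, sep = ": ", collapse = ", "), "}"))
  }
  if (length(x) != 1) stop("non-scalar vectors have no JSON equivalent")
  if (is.logical(x)) return(if (x) "true" else "false")
  if (is.numeric(x)) return(format(x, digits = 15))
  if (is.character(x)) return(json_string(x))
  stop("unrecognized object of class ", class(x)[1])
}

cat(to_json_sketch(list("if" = list("<" = list("x", 5)), then = 0, "else" = 1)), "\n")
```

Here the bare "x" is serialized as a plain JSON string, which PFA reads as a variable reference; a PFA string literal would instead be written list("x").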

Avro schema specification

All types in PFA are expressed as Avro schemas, which are themselves JSON. You can create type specifications either by writing the corresponding JSON in list-of-lists form or by using the following convenience functions.

| Avro type | JSON | Aurelius generator |
|---|---|---|
| null | "null" | avro.null |
| boolean | "boolean" | avro.boolean |
| int | "int" | avro.int |
| long | "long" | avro.long |
| float | "float" | avro.float |
| double | "double" | avro.double |
| string | "string" | avro.string |
| bytes | "bytes" | avro.bytes |
| array | {"type": "array", "items": "int"} | avro.array(avro.int) |
| map | {"type": "map", "values": "int"} | avro.map(avro.int) |
| fixed | {"type": "fixed", "name": "SomeName", "namespace": "com.wowzers", "size": 16} | avro.fixed(16, "SomeName", "com.wowzers") |
| enum | {"type": "enum", "name": "SomeName", "namespace": "com.wowzers", "symbols": ["one", "two", "three"]} | avro.enum(list("one", "two", "three"), "SomeName", "com.wowzers") |
| record | {"type": "record", "name": "SomeName", "namespace": "com.wowzers", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "double"}, {"name": "three", "type": "string"}]} | avro.record(list(one = avro.int, two = avro.double, three = avro.string), "SomeName", "com.wowzers") |
| union | ["null", "int", "string"] | avro.union(avro.null, avro.int, avro.string) |

The name and namespace parameters are optional. If a name is not provided, Aurelius will generate a unique name.
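The optional name and its automatic fallback can be sketched in base R as follows. record_sketch and next_record_name are hypothetical; plain strings such as "int" stand in for avro.int, matching the JSON column of the table, and the counter-based naming scheme is only one way to guarantee uniqueness within a session, not necessarily what Aurelius does.

```r
# Hypothetical name generator: a counter kept in a closure, so each
# anonymous record gets a distinct name within this R session.
next_record_name <- local({
  counter <- 0
  function() {
    counter <<- counter + 1
    paste0("Record_", counter)
  }
})

# Hypothetical record-schema builder; `fields` is a named list mapping
# field names to types in list-of-lists form.
record_sketch <- function(fields, name = NULL, namespace = NULL) {
  if (is.null(name)) name <- next_record_name()
  schema <- list(type = "record", name = name)
  if (!is.null(namespace)) schema$namespace <- namespace  # omitted when not given
  schema$fields <- lapply(names(fields), function(nm) list(name = nm, type = fields[[nm]]))
  schema
}

str(record_sketch(list(one = "int", two = "double", three = "string"),
                  "SomeName", "com.wowzers"))
```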

Code transformations

R code passed to pfa.expr must be wrapped in R's quote function, and R code passed to arguments of pfa.config must be wrapped in R's expression function; both are needed to keep R from evaluating the code right away. The quote function takes only a single expression, but expression takes any number of expressions as comma-separated arguments. (Remember to separate top-level expressions with commas!)
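The difference between the two wrappers is plain R behavior and can be checked without Aurelius. The snippet shows that quote holds one unevaluated expression while expression holds a vector of them, and how R counts top-level expressions with and without commas; it says nothing about how pfa.config processes each element afterwards.

```r
# quote() captures exactly one unevaluated expression.
one <- quote(x + 1)          # x need not exist: nothing is evaluated
is.call(one)

# expression() captures any number of comma-separated expressions.
separate <- expression(y <- x + 1, y * 2)
length(separate)             # two top-level expressions

# Without top-level commas, braces bundle everything into a single expression.
bundled <- expression({ y <- x + 1; y * 2 })
length(bundled)              # one expression
```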

The table below shows how each expression is transformed.

| R code | PFA equivalent | Interpretation |
|---|---|---|
| 12, 3.14 | 12, 3.14 | literal numbers |
| "hello" | {"string": "hello"} | literal string |
| NULL, TRUE, FALSE | null, true, false | literal null and booleans |
| function(. = avro.double) { 3.14 } | {"params": [], "ret": "double", "do": 3.14} | zero-parameter function |
| function(x = avro.int -> avro.double) { x + 0.1 } | {"params": [{"x": "int"}], "ret": "double", "do": {"+": ["x", 0.1]}} | general function |
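A base-R sketch of the two function rows, under stated assumptions: fn_sketch, body_sketch, and the avro_names lookup are hypothetical, type symbols are mapped to Avro names through a fixed table rather than evaluated, and only single-expression bodies are handled. It shows one way the params, ret, and do fields could be assembled from R's parse of the annotated function.

```r
# Hypothetical lookup from type symbols to Avro type names; a stand-in for
# evaluating Aurelius's avro.* objects.
avro_names <- c(avro.int = "int", avro.double = "double",
                avro.string = "string", avro.boolean = "boolean")

# Minimal body converter: symbols -> names, calls -> list(<op> = list(<args>)).
body_sketch <- function(e) {
  if (is.name(e)) return(as.character(e))
  if (is.call(e)) {
    out <- list(lapply(as.list(e)[-1], body_sketch))
    names(out) <- as.character(e[[1]])
    return(out)
  }
  e
}

# Hypothetical converter for a quoted, annotated function definition.
fn_sketch <- function(f) {
  fmls <- as.list(f[[2]])             # formals as a named list of type symbols
  last <- fmls[[length(fmls)]]
  if (is.call(last) && identical(last[[1]], as.name("<-"))) {
    ret <- last[[2]]                  # `type -> ret` is parsed as `ret <- type`
    fmls[[length(fmls)]] <- last[[3]]
  } else {
    ret <- last                       # zero-parameter form: function(. = ret)
    fmls <- list()
  }
  params <- lapply(seq_along(fmls), function(i) {
    p <- list(avro_names[[as.character(fmls[[i]])]])
    names(p) <- names(fmls)[i]
    p
  })
  body <- f[[3]]
  # keep only the first expression inside { }
  if (is.call(body) && identical(body[[1]], as.name("{"))) body <- body[[2]]
  list(params = params, ret = avro_names[[as.character(ret)]], do = body_sketch(body))
}

str(fn_sketch(quote(function(x = avro.int -> avro.double) { x + 0.1 })))
```

Serialized, the general-function result corresponds to {"params": [{"x": "int"}], "ret": "double", "do": {"+": ["x", 0.1]}}.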

Non-code elements

pfa.cell, pfa.pool
