Aurelius Rcode Transformation
PFA is produced by Aurelius in four stages: (1) a model is converted into a list-of-lists data structure, (2) code is written in R syntax and wrapped in `expression`, (3) `pfa.config` gathers all pieces of the scoring engine and produces a list-of-lists corresponding to the final PFA file, and (4) `json` converts the list-of-lists to a serialized JSON string or output file.
At no point does Aurelius check the validity of the resulting PFA file; use Hadrian, Titus, or Titus-in-Aurelius (`pfa.engine`) for this crucial test.
The "code written in R syntax" cannot use arbitrary R functions. Only unary and binary operators (e.g. `+`, `-`, `==`, `<`, which are functions in R and PFA) are automatically converted to their counterparts. Functions called in prefix notation (e.g. `log(x)`, `print(x)`) must use PFA function names. Moreover, indexes, if used, start at zero (PFA convention), not one (R convention).
Some valid R syntax is "overloaded" with new meanings to supply information needed by PFA, such as type annotations. For instance, a function must be notated like this:
function(x = avro.double, y = avro.string -> avro.boolean) {
(x > 5 && y == "hello")
}
This allows PFA to know that `x` is a number, `y` is a string, and that the function returns a boolean. Apart from these differences, writing R code for PFA conversion is like writing executable R code.
Aurelius automatically identifies whether symbols (variable references) are local variables, cells, pools, function names, or unknown. If unknown, Aurelius searches for a variable in the current R scope with that name and uses its value as a substitution. This allows you to write:
threshold <- 5
pfaExpression <- pfa.expr(quote(if (x < threshold) 0 else 1))
print(json(pfaExpression))
[1] {"if": {"<": ["x", 5]}, "then": 0, "else": 1}
to hard-code a `threshold` in the PFA expression. The replacement is assumed to already be in list-of-lists format, so if you're substituting code, convert it with `pfa.expr` first. (The number `5` in this example is so simple that its list-of-lists format is also `5`. Note that strings in PFA's list-of-lists are `list("hello")` rather than `"hello"` to avoid confusion with a variable named `hello`.)
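As a rough analogy in plain R, and not a description of how Aurelius implements the lookup, `bquote` also splices a value from the calling scope into otherwise unevaluated code. The sketch below uses only base R. The splice is requested explicitly with `.()`, whereas Aurelius is described above as substituting unknown symbols automatically.

```r
# Base-R analogy only: bquote() leaves the code unevaluated but replaces
# anything wrapped in .() with its current value.
threshold <- 5
spliced <- bquote(if (x < .(threshold)) 0 else 1)

# The symbol `threshold` is gone; the constant is baked into the code.
print(spliced)

# `x` is still a free variable, resolved only when the code is evaluated.
x <- 3
result <- eval(spliced)
```

After the splice, the only remaining variable in `spliced` is `x`, which mirrors how the PFA expression above refers to `"x"` but contains the constant `5`.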
Any type annotations encountered in the R code are treated as R expressions, evaluated immediately. For instance,
pfaExpression <- pfa.expr(quote(new(type(), "[]")))
calls the `type()` function to generate the type annotation in list-of-lists format. Think of the R-to-PFA process as compilation: a compiler evaluates types during compilation and generates code that evaluates values at runtime. The same is true in the R-to-PFA conversion.
Aurelius uses the following subset of R's data structures to represent JSON.
R data structure | JSON equivalent |
---|---|
NULL | null |
TRUE | true |
FALSE | false |
scalar number | number (integer or floating-point) |
character vector | string |
unnamed list | JSON array (ordered, surrounded in square brackets) |
named list | JSON object (unordered pairs in curly brackets) |
Note that data frames, environments, and non-scalars (vectors of length > 1) have no equivalent in Aurelius's JSON representation. If the `json` function encounters an unrecognized object, it will raise an error.
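The mapping in the table can be sketched as a small recursive serializer. This is a base-R illustration under stated assumptions, not the source of Aurelius's `json` function: the name `to_json_sketch` is hypothetical, strings are not escaped, and lists are assumed to be either fully named or fully unnamed.

```r
# Hypothetical helper, NOT Aurelius's json(): a minimal serializer that follows
# the table above. It does not escape special characters in strings and
# assumes lists are either fully named or fully unnamed.
to_json_sketch <- function(x) {
  if (is.null(x)) return("null")
  if (is.data.frame(x) || is.environment(x)) {
    stop("data frames and environments have no JSON equivalent here")
  }
  if (is.list(x)) {
    items <- vapply(x, to_json_sketch, character(1))
    if (is.null(names(x))) {
      # unnamed list -> JSON array
      return(paste0("[", paste(items, collapse = ", "), "]"))
    }
    # named list -> JSON object
    keys <- paste0("\"", names(x), "\"")
    return(paste0("{", paste(keys, items, sep = ": ", collapse = ", "), "}"))
  }
  if (length(x) != 1) stop("vectors of length > 1 have no JSON equivalent here")
  if (is.logical(x)) return(if (x) "true" else "false")
  if (is.numeric(x)) return(format(x))
  if (is.character(x)) return(paste0("\"", x, "\""))
  stop("unrecognized object")
}

# A named list containing a named list containing an unnamed list.
engine_part <- list(`if` = list(`<` = list("x", 5)), then = 0, `else` = 1)
serialized <- to_json_sketch(engine_part)
```

Passing a vector such as `c(1, 2)` to the sketch raises an error, matching the rule that non-scalars have no JSON equivalent.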
All types in PFA are encoded as Avro schemas, which are themselves written in JSON. You can create type specifications by building the corresponding JSON in list-of-lists form, or you can use the following convenience functions.
Avro type | JSON | Aurelius generator |
---|---|---|
null | `"null"` | `avro.null` |
boolean | `"boolean"` | `avro.boolean` |
int | `"int"` | `avro.int` |
long | `"long"` | `avro.long` |
float | `"float"` | `avro.float` |
double | `"double"` | `avro.double` |
string | `"string"` | `avro.string` |
bytes | `"bytes"` | `avro.bytes` |
array | `{"type": "array", "items": "int"}` | `avro.array(avro.int)` |
map | `{"type": "map", "values": "int"}` | `avro.map(avro.int)` |
fixed | `{"type": "fixed", "name": "SomeName", "namespace": "com.wowzers", "size": 16}` | `avro.fixed(16, "SomeName", "com.wowzers")` |
enum | `{"type": "enum", "name": "SomeName", "namespace": "com.wowzers", "symbols": ["one", "two", "three"]}` | `avro.enum(list("one", "two", "three"), "SomeName", "com.wowzers")` |
record | `{"type": "record", "name": "SomeName", "namespace": "com.wowzers", "fields": [{"name": "one", "type": "int"}, {"name": "two", "type": "double"}, {"name": "three", "type": "string"}]}` | `avro.record(list(one = avro.int, two = avro.double, three = avro.string), "SomeName", "com.wowzers")` |
union | `["null", "int", "string"]` | `avro.union(avro.null, avro.int, avro.string)` |
The `name` and `namespace` parameters are optional. If a `name` is not provided, Aurelius will generate a unique name.
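For comparison with the generators, the record and union rows of the table can also be written by hand in list-of-lists form, using the named-list and unnamed-list conventions described earlier. This is a base-R sketch. It assumes that plain character scalars stand for JSON strings in type position, and it does not claim to reproduce exactly what `avro.record` or `avro.union` return.

```r
# Hand-built list-of-lists for the record row of the table above.
# Assumption: plain character scalars stand for JSON strings in type position.
record_type <- list(
  type = "record",
  name = "SomeName",
  namespace = "com.wowzers",
  fields = list(
    list(name = "one", type = "int"),
    list(name = "two", type = "double"),
    list(name = "three", type = "string")
  )
)

# A union is a JSON array, hence an unnamed list.
union_type <- list("null", "int", "string")

# The field order is preserved because `fields` is an unnamed list.
field_names <- vapply(record_type$fields, function(f) f$name, character(1))
```

Writing the structure by hand makes the JSON shape explicit, at the cost of the automatic unique naming that the generators provide when `name` is omitted.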
R code passed to `pfa.expr` must be wrapped in R's `quote` function and R code passed to arguments of `pfa.config` must be wrapped in R's `expression` function. The `quote` function only takes a single expression, but `expression` takes any number of expressions as comma-separated arguments. (Remember to put commas in the top level of your code!) The `quote` and `expression` functions are needed to keep R from evaluating the code right away.
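The difference between the two wrappers can be seen in plain R, without Aurelius. The sketch below only demonstrates base-R behavior: both functions capture code without running it, `quote` holds one expression, and `expression` holds several separated by top-level commas. The variable names are illustrative.

```r
# quote() captures exactly one expression without evaluating it.
single <- quote(if (x < threshold) 0 else 1)

# expression() captures several, separated by top-level commas.
several <- expression(
  y <- x + 1,
  y * 2
)

# Neither call evaluated anything, so x and threshold can be defined afterwards.
x <- 3
threshold <- 5
first_value <- eval(single)         # evaluates the if/else with x = 3
eval(several[[1]])                  # assigns y
second_value <- eval(several[[2]])  # uses the y assigned on the previous line
```

Omitting the comma between `y <- x + 1` and `y * 2` would be a syntax error inside `expression(...)`, which is why the reminder about top-level commas matters.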
The table below shows how each expression is transformed.
R code | PFA equivalent | Interpretation |
---|---|---|
`12`, `3.14` | `12`, `3.14` | literal numbers |
`"hello"` | `{"string": "hello"}` | literal string |
`NULL`, `TRUE`, `FALSE` | `null`, `true`, `false` | literal null and boolean |
`function(. = avro.int) { 3.14 }` | `{"params": [], "ret": "int", "do": 3.14}` | zero-parameter function |
`function(x = avro.int -> avro.double) { x + 0.1 }` | `{"params": [{"x": "int"}], "ret": "double", "do": {"+": ["x", 0.1]}}` | general function |
pfa.cell
, pfa.pool
Licensed under the Hadrian Personal Use and Evaluation License (PUEL).