This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Substrate Runtime Interface #2334

Closed
pepyakin opened this issue Apr 20, 2019 · 9 comments
Labels
J0-enhancement An additional feature request.
Milestone

Comments

@pepyakin
Contributor

The current substrate runtime API is not the easiest to work with:

  • We have a couple of different runtime environments (with_std/without_std, or rather statically linked native code and the wasm runtime), but the interface should be the same.
  • Error handling might differ between environments.
  • Implementation and interface are located in different places, yet they are really tightly coupled, and I mean it: say, a resulting pointer from the runtime API must have a certain alignment, and we depend on that in unsafe code. TBH, I won't be surprised if there is broken unsafe code out there, since it is so easy to break, so hard to check, and we don't really treat this code as unsafe (e.g. there are no proofs of safety).
  • These unsafe invariants are not usually stated anywhere; they are just in the code. There is also a lack of documentation, perhaps because it is not clear where such documentation belongs.
  • There is this weird behavior of impl_function_executor, which allows you to declare parameters such as usize that will nevertheless have the u32 type, which might be baffling for people new to this code. There was a PR that introduced an honest Rust interface, but unfortunately it has a downside: there is a desire for signatures of host functions to be trivially copyable without any changes, to reduce error-proneness.
  • There are some differences and pitfalls to keep in mind when designing Substrate Runtime APIs. For example, multiple return values are not supported, so if there is a need to return multiple values it has to be worked around somehow. And yeah, there are some inconsistencies in the current implementation.
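
To make the last pitfall concrete, here is a minimal sketch of one possible workaround for the lack of multiple return values: packing a pointer and a length into a single 64-bit value. The function names and the bit layout (pointer in the low half, length in the high half) are assumptions chosen for illustration, not a description of what the current implementation does.

```rust
/// Pack a 32-bit pointer and a 32-bit length into one 64-bit return value.
/// Layout is an assumption of this sketch: pointer in the low 32 bits,
/// length in the high 32 bits.
pub fn pack_ptr_len(ptr: u32, len: u32) -> u64 {
    (u64::from(len) << 32) | u64::from(ptr)
}

/// Split a packed value back into `(ptr, len)`.
pub fn unpack_ptr_len(packed: u64) -> (u32, u32) {
    let ptr = packed as u32; // keeps only the low 32 bits
    let len = (packed >> 32) as u32; // keeps only the high 32 bits
    (ptr, len)
}
```

Every function that returns a byte vector would need the same packing and unpacking on both sides, which is exactly the kind of repetition that is easy to get subtly wrong by hand.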

Basically, this is very error-prone and boilerplate-heavy code. This sounds like a good use case for code generation. Here is my strawman proposal for such a code generator:

We could introduce a special AST that describes an interface between the substrate runtime and the substrate host. It would support rather high-level types, e.g.: bytes_vec/bytes, bool, (T, J) (a tuple), *T (a raw pointer), [T; N] for parameters and return values. We could also declare if a function traps, maybe with a type Result or a special annotation.

Here is an example of how it could look:

# Not a rust file

# usize is deliberately not supported, use u32.
fn malloc(size: u32) -> *const u8;
fn free(ptr: *const u8);

# `bytes` is converted to `Vec<u8>` on the substrate host side, and `&[u8]` on the rust side.
fn storage_exists(key: bytes) -> bool;

# `bytes?` - notation for optional value (for denoting absence of a value under the given key)
# note that we don't deal with problems such as how to return a bytes vector (which is two components, `ptr` and `len`) via wasm: all such ABI details will be handled by code generation.
# note that we also don't need to care about details such as creating slices with `from_raw_parts` and upholding all of its invariants.
fn child_storage(storage_key: bytes, key: bytes) -> bytes?;

and so on.
(Note that a new language is not necessary for this; we only need a model definition, which could be a Rust expression creating the model struct, or it could be YAML.)
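
As a rough illustration of the "Rust expression creating the model struct" option, the sketch below models the four example functions above as plain data and renders them back into the notation used in the example. All type and function names here (`Ty`, `Function`, `example_interface`, `render`) are hypothetical and exist only for this sketch.

```rust
/// Types the interface description could support (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    U8,
    U32,
    Bool,
    Bytes,
    Optional(Box<Ty>),
    Tuple(Vec<Ty>),
    ConstPtr(Box<Ty>),
    Array(Box<Ty>, usize),
}

/// One function of the interface: a name, named parameters, optional return type.
#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    pub name: &'static str,
    pub params: Vec<(&'static str, Ty)>,
    pub ret: Option<Ty>,
}

impl Ty {
    /// Render the type in the notation used by the example above.
    pub fn render(&self) -> String {
        match self {
            Ty::U8 => "u8".to_string(),
            Ty::U32 => "u32".to_string(),
            Ty::Bool => "bool".to_string(),
            Ty::Bytes => "bytes".to_string(),
            Ty::Optional(inner) => format!("{}?", inner.render()),
            Ty::Tuple(items) => {
                let parts: Vec<String> = items.iter().map(Ty::render).collect();
                format!("({})", parts.join(", "))
            }
            Ty::ConstPtr(inner) => format!("*const {}", inner.render()),
            Ty::Array(inner, len) => format!("[{}; {}]", inner.render(), len),
        }
    }
}

impl Function {
    /// Render the whole signature, e.g. `fn free(ptr: *const u8);`.
    pub fn render(&self) -> String {
        let params: Vec<String> = self
            .params
            .iter()
            .map(|(name, ty)| format!("{}: {}", name, ty.render()))
            .collect();
        match &self.ret {
            Some(ty) => format!("fn {}({}) -> {};", self.name, params.join(", "), ty.render()),
            None => format!("fn {}({});", self.name, params.join(", ")),
        }
    }
}

/// The example interface from above, expressed as a model value.
pub fn example_interface() -> Vec<Function> {
    let byte_ptr = Ty::ConstPtr(Box::new(Ty::U8));
    vec![
        Function { name: "malloc", params: vec![("size", Ty::U32)], ret: Some(byte_ptr.clone()) },
        Function { name: "free", params: vec![("ptr", byte_ptr)], ret: None },
        Function { name: "storage_exists", params: vec![("key", Ty::Bytes)], ret: Some(Ty::Bool) },
        Function {
            name: "child_storage",
            params: vec![("storage_key", Ty::Bytes), ("key", Ty::Bytes)],
            ret: Some(Ty::Optional(Box::new(Ty::Bytes))),
        },
    ]
}
```

A real generator would walk the same model to emit guest-side declarations and host-side glue instead of rendering text, but the model itself would not need to be more complicated than this.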

Having this model, we can use some build.rs code generation for building definitions for several crates, such as:

  • sri-guest - declarations of every API function for usage from the runtime side (hence guest). Probably has without_std and with_std versions generated. Supersedes most of sr-io, the allocator part of sr-std, and the externs part of srml-sandbox.
  • sri-host-wasmi - glue code that dispatches a call to Externals to the appropriate function in some trait; supersedes the current impl_function_executor. This trait would be implemented in today's wasm_executor.rs.

Code generation gives a lot of benefits, here are some:

  • As was said, it removes duplication and decreases error-proneness.
  • The API is in one place, so it could be more consistent.
  • One single place for documentation of the Substrate Runtime Interface.
  • All quirks of the implementation are implemented only once per type, not for every case.
  • Definitions could potentially be used by other implementations of the Substrate runtime interface.
  • We would be able to experiment more easily. For example, we could benchmark different ways of passing values, what works best, etc.
  • It would let us decouple from wasmi and integrate other engines much more easily. The thing is, we most likely want to have a couple of wasm engines at the same time (i.e. we want to have compilers for performance reasons, but we also want to have wasmi because it is super robust). So we could just generate different boilerplate for calling some trait methods and dispatch them differently depending on the engine (Externals for wasmi, potentially extern "C" fn for others).
  • We would be able to easily change our wasm ABI: there is a proposal for multi-value returns in wasm. Code generation would make it easier to migrate to this.
  • It might give us the ability to introduce versioning (akin to the impl_runtime_apis macro, but the other way around? : ) )
  • I might be wrong, but maybe it will provide us an easier way to implement cumulus and make this code a bit cleaner.
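
To sketch the engine-decoupling point: the host logic could live in one trait, with generated glue per engine translating raw arguments into calls on it. The code below shows what an interpreter-style dispatcher (function index plus raw u32 arguments) might look like. Everything in it is hypothetical: `HostFunctions`, `GuestMemory`, `InMemoryHost` and `dispatch_by_index` are made-up names, and this is not wasmi's actual `Externals` API.

```rust
/// Hypothetical trait holding the host-side logic, written once.
pub trait HostFunctions {
    fn storage_exists(&mut self, key: &[u8]) -> bool;
}

/// Stand-in for the guest's linear memory (an assumption of this sketch).
pub struct GuestMemory(pub Vec<u8>);

impl GuestMemory {
    /// Bounds-checked read; returns `None` instead of touching invalid memory.
    pub fn read(&self, ptr: u32, len: u32) -> Option<&[u8]> {
        let start = ptr as usize;
        let end = start.checked_add(len as usize)?;
        self.0.get(start..end)
    }
}

/// Generated glue for an index-based engine: decode raw arguments,
/// call the trait method, encode the result as a u32.
pub fn dispatch_by_index<H: HostFunctions>(
    host: &mut H,
    memory: &GuestMemory,
    index: usize,
    args: &[u32],
) -> Result<u32, &'static str> {
    match (index, args) {
        // Index 0 is `storage_exists(key: bytes) -> bool`; `bytes` arrives as (ptr, len).
        (0, [ptr, len]) => {
            let key = memory.read(*ptr, *len).ok_or("key out of bounds")?;
            Ok(host.storage_exists(key) as u32)
        }
        (0, _) => Err("wrong number of arguments"),
        _ => Err("unknown host function"),
    }
}

/// Toy host used to exercise the dispatcher.
pub struct InMemoryHost {
    pub keys: Vec<Vec<u8>>,
}

impl HostFunctions for InMemoryHost {
    fn storage_exists(&mut self, key: &[u8]) -> bool {
        self.keys.iter().any(|k| k.as_slice() == key)
    }
}
```

Glue for a compiling engine could call the same `HostFunctions` methods from `extern "C"` functions instead, without touching the trait implementation.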
@pepyakin pepyakin added the J0-enhancement An additional feature request. label Apr 20, 2019
@pepyakin pepyakin added this to the As-and-when milestone Apr 20, 2019
@pepyakin
Contributor Author

pepyakin commented May 20, 2019

Related PR: #2381. It can be seen as a first step towards that direction.

@Demi-Marie
Contributor

I think we should go farther and avoid raw pointers in our APIs.

@pepyakin
Contributor Author

Then how would you implement low-level APIs that depend on pointers, e.g. fn malloc(size: u32) -> *const u8;? TBH, I think I am OK with having pointers in the API as long as the bytes type is used whenever possible.

@Demi-Marie
Contributor

@pepyakin I do not believe that the host needs to provide malloc. When compiled for wasm, the runtime will use its own allocator, and when compiled natively, the runtime will use the system malloc routines.

@Demi-Marie
Contributor

More specifically, I want to avoid needing unsafe code on the host side, which is a potential security risk.

@bkchr
Member

bkchr commented May 27, 2019

@demimarie-parity this does not work... We provide our own allocator, but even the allocator interface uses raw pointers....

Pointers are not bad.

@pepyakin
Contributor Author

Well, technically the code on the host side won't be unsafe, since it deals only with wasm memory (for reference, take a look at wasm_executor.rs). That said, it is still practically unsafe, since the host can corrupt the wasm memory if due care is not taken. And this is actually what this task proposes to solve (instead of using raw pointers we would have the bytes type).
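
For illustration, a minimal sketch of what "due care" could mean on the host side: a write into guest memory that is bounds-checked and uses no unsafe Rust. The guest memory is modeled here as a plain byte slice, and `write_guest_bytes` and `OutOfBounds` are hypothetical names, not anything taken from wasm_executor.rs.

```rust
/// Error returned when a write would fall outside the guest memory.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Copy `data` into guest memory at `ptr`, refusing writes that would
/// run past the end of the memory. No unsafe code is needed.
pub fn write_guest_bytes(memory: &mut [u8], ptr: u32, data: &[u8]) -> Result<(), OutOfBounds> {
    let start = ptr as usize;
    let end = start.checked_add(data.len()).ok_or(OutOfBounds)?;
    let target = memory.get_mut(start..end).ok_or(OutOfBounds)?;
    target.copy_from_slice(data);
    Ok(())
}
```

The check only guarantees that the write stays inside the guest memory. Whether the region is one the guest actually allocated is a separate invariant, which is the part a generated bytes type could take care of.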

For wasm we decided to provide the allocator. See the discussion in #300

@Demi-Marie
Contributor

This is a prerequisite for my work on sandboxing.

@pepyakin
Contributor Author

I believe this was implemented in some form.
