Support arbitrary R objects #5

Closed
randy3k opened this issue Feb 28, 2018 · 7 comments

randy3k commented Feb 28, 2018

I am wondering whether there is any plan to support arbitrary R objects, e.g., lists or environments, in the future. The current implementation somewhat limits the practical usability of the package. Thanks.

Owner

dirmeier commented Mar 1, 2018

Hey, I totally agree, this should be supported. I was already planning to add this, but to be honest I don't know R's C API well enough yet to implement it. Unfortunately, there are not many resources on it apart from the R documentation.

Let me quickly explain my problem. As far as I know, I need to predefine the C++ data types, because R dispatches dynamically. For instance, for a heap of int -> double I need to predefine the following:

typedef binomial_heap<int, double> binomial_heap_id;

Now suppose I want a heap of int -> list. A list might look like this in R:

> list(a=1, b="a")
$a
[1] 1

$b
[1] "a"

If I look at the SEXP object on the backend, it has the following structure:

pryr::inspect(list(a=1, b="a"))

<VECSXP 0x7f90fc23b950>
  <REALSXP 0x7f90fc04baf8>
  <STRSXP 0x7f90fc04ba98>
    <CHARSXP 0x7f90fa814798>

attributes:
  <LISTSXP 0x7f90fbcbf440>
  tag:
    <SYMSXP 0x7f90fb80a940>
  car:
    <STRSXP 0x7f90fc23b8e0>
      [CHARSXP 0x7f90fa814798]
      <CHARSXP 0x7f90fb30d3f8>
  cdr:
    NULL

I haven't checked how I could implement this for different data types. If I could just write

typedef binomial_heap<int, SEXP> binomial_heap_id;

this would be easy.

As soon as I have more time, I'll have a look into this. Thanks for the feedback!

Author

randy3k commented Mar 1, 2018

SEXP is just a pointer; you only need to preserve the object so that R's GC doesn't reclaim it.
See R_PreserveObject and R_ReleaseObject.
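
A minimal sketch of this suggestion, not the package's actual implementation: a hypothetical RAII wrapper `preserved_sexp` calls R_PreserveObject on construction and R_ReleaseObject on destruction, so one container type can hold any R object. The names `preserved_sexp`, `int_to_sexp_map`, `put`, and the `USE_R_HEADERS` switch are assumptions made for illustration. Without that switch, the block uses local stand-ins for `SEXP` and the two R functions so that it compiles without R installed; the stand-ins only count live preserves.

```cpp
#include <cassert>
#include <tuple>
#include <unordered_map>
#include <utility>

#ifdef USE_R_HEADERS
#include <Rinternals.h>  // real SEXP, R_PreserveObject, R_ReleaseObject
#else
// Stand-ins (NOT the real API) so the sketch compiles without R:
// they only track how many objects are currently preserved.
struct SEXPREC;
typedef SEXPREC* SEXP;
static int n_preserved = 0;
static void R_PreserveObject(SEXP) { ++n_preserved; }
static void R_ReleaseObject(SEXP) { --n_preserved; }
#endif

// Hypothetical wrapper: keeps the object preserved while the wrapper lives.
class preserved_sexp {
public:
    explicit preserved_sexp(SEXP x) : x_(x) {
        assert(x_ != nullptr);
        R_PreserveObject(x_);
    }
    ~preserved_sexp() { R_ReleaseObject(x_); }

    // Non-copyable, so every preserve is matched by exactly one release.
    preserved_sexp(const preserved_sexp&) = delete;
    preserved_sexp& operator=(const preserved_sexp&) = delete;

    SEXP get() const { return x_; }

private:
    SEXP x_;
};

// One instantiation covers every R value type: the mapped value is a pointer.
typedef std::unordered_map<int, preserved_sexp> int_to_sexp_map;

// Insert or overwrite; the old value (if any) is released by its destructor.
inline void put(int_to_sexp_map& m, int key, SEXP value) {
    m.erase(key);
    m.emplace(std::piecewise_construct,
              std::forward_as_tuple(key),
              std::forward_as_tuple(value));
}
```

Erasing an entry or destroying the map releases the stored objects automatically. In a real package, defining `USE_R_HEADERS` (with R's include path set) would swap in the declarations from `Rinternals.h`.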

Update: however, there is a caveat:

These functions need to be used with care: because R_ReleaseObject() will perform a recursive search for the object to protect, if many SEXPs are inserted and later removed in the same order, this can cause performance regressions.
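
To make the caveat concrete, here is a small self-contained model. It assumes, as the quoted text suggests, that preserved objects sit on a singly linked list with the most recently preserved object at the head, and that releasing walks the list from the head until it finds the object. The class `precious_list_model` and the function `total_release_cost` are hypothetical names, and plain integers stand in for SEXPs; this is an illustration of the cost, not R's actual code.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Simplified model: preserve pushes at the head, release searches from the head.
class precious_list_model {
public:
    void preserve(int id) { items_.push_front(id); }

    // Unlinks `id` and returns how many nodes were visited to find it.
    std::size_t release(int id) {
        std::size_t visited = 0;
        auto prev = items_.before_begin();
        for (auto it = items_.begin(); it != items_.end(); ++it, ++prev) {
            ++visited;
            if (*it == id) {
                items_.erase_after(prev);
                return visited;
            }
        }
        return visited;  // not found: the whole list was scanned
    }

private:
    std::forward_list<int> items_;
};

// Preserve ids 0..n-1 in order, release them in `release_order`,
// and return the total number of nodes visited by all releases.
inline std::size_t total_release_cost(const std::vector<int>& release_order) {
    precious_list_model list;
    for (std::size_t i = 0; i < release_order.size(); ++i) {
        list.preserve(static_cast<int>(i));
    }
    std::size_t total = 0;
    for (int id : release_order) {
        total += list.release(id);
    }
    return total;
}
```

Under this model, releasing n objects in the order they were inserted visits n(n+1)/2 nodes in total (10 for n = 4), because each release has to walk to the tail. Releasing in reverse order visits only n nodes (4 for n = 4). A queue-like container falls into the expensive case.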

Author

randy3k commented Mar 1, 2018

Another possibility is to serialize the R objects to raw vectors via serialize(x, NULL).

Owner

dirmeier commented Mar 1, 2018

Oh wow, that's great! Thanks!

Author

randy3k commented Mar 1, 2018

One more alternative: if you think R_PreserveObject/R_ReleaseObject is too heavy, you could store the R objects in a queue backed by a pairlist.

wlandau commented Mar 2, 2018

I generally prefer to avoid serialization in R because it is usually a performance bottleneck.

dirmeier mentioned this issue Mar 11, 2018

Owner

dirmeier commented

Dear @randy3k,

the latest version of the package implements your suggestions. Thanks for the tips and feedback.
I will validate and test this for a couple more days and then release it to CRAN.

  hm <- hashmap("integer")
  keys <- 1:2
  values <- list(
    environment(),
    data.frame(A=rbeta(3, .5, .5), B=rgamma(3, 1)))
  hm[keys] <- values

  hm[1L]
  [[1]]
  <environment: R_GlobalEnv>

Please let me know if I can close this issue.
