How to write a tibble for non-R software to use later? #202
This entirely depends on how this hypothetical "other language" processes nested data frames (or data frames with list columns in general). Which particular language do you have in mind? For example, for a database that can usually store only atomic values in a column, you would use two tables to store this dataset, linked with a key column. This question is a bit broad, and the tibble issue tracker isn't the best forum. Would you mind asking at StackOverflow, with a bit more detail about your target language?
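A minimal sketch of the two-table layout mentioned above, in R. The example tibble and the column names `id` and `value` are assumptions made for illustration; they do not come from the thread.

```r
library(tibble)

# Small example tibble with one list column (hypothetical data)
df <- tibble(x = 1:3, y = c(0.1, 0.2, 0.3))
df$l <- lapply(1:3, seq_len)

# Parent table: the atomic columns plus a key column
parent <- tibble(id = seq_len(nrow(df)), x = df$x, y = df$y)

# Child table: one row per list element, linked to the parent by `id`
child <- tibble(
  id    = rep(parent$id, lengths(df$l)),
  value = unlist(df$l)
)

# Rebuild the list column from the child table
rebuilt <- unname(split(child$value, factor(child$id, levels = parent$id)))
```

Both `parent` and `child` contain only atomic columns, so each could be written with `write.table()` or loaded into a database table.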
Well, we don't have a target language. That's why we have been using tab separated value text tables. One way of re-framing my issue would be like this: have you thought of a way to write tibbles to tsv files with some extra markup for list columns that can be used for later reading in that tsv file into a data frame with list columns (or however that's implemented outside R)? For example, I know R sometimes writes list columns with
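One possible form of such "extra markup" is to store each list-column cell as a JSON array string inside the tsv. This is only a sketch of that idea, not an established convention; the cell encoding, the example data, and the temporary file path are assumptions.

```r
library(tibble)
library(jsonlite)

# Small example tibble with one list column (hypothetical data)
df <- tibble(x = 1:3)
df$l <- lapply(1:3, seq_len)

# Encode each list-column cell as a JSON array string
flat <- data.frame(
  x = df$x,
  l = vapply(df$l, function(v) as.character(toJSON(v)), character(1)),
  stringsAsFactors = FALSE
)

# Write a plain tsv file; the list column is now ordinary text
path <- tempfile(fileext = ".tsv")
write.table(flat, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and decode the JSON cells into a list column
back <- read.delim(path, quote = "", stringsAsFactors = FALSE)
restored <- as_tibble(back)
restored$l <- lapply(back$l, fromJSON)
```

Any reader in another language would need to know that the `l` column holds JSON, which is the main weakness of this approach.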
I think tsv is a particularly poor format for this kind of data. If you're interested in portability and durability, use XML or JSON; I'm not aware of ready-made readers/writers, but I haven't really looked.
I think JSON might be the best way to write this out. It is certainly language agnostic. This is why I raised this issue a while back, but it was a nonstarter:
Hi again,
Ok, JSON it is then. You might want to add a quick section at https://github.com/tidyverse/tibble/blob/master/vignettes/tibble.Rmd on writing tibbles to disk. Here's my quick test with the first small example from earlier.
Best,
Clean code
library('tibble')
## From ?as_tibble
l <- list(x = 1:500, y = runif(500), z = 500:1)
df <- as_tibble(l)
## Add a list column
df$l <- lapply(1:500, seq_len)
## Convert to a json keeping as many digits as possible
library('jsonlite')
json <- toJSON(df, digits = NA)
## Check that it's ok
validate(json)
## Write to file, then read again
write(json, file = 'tibble_to_json.json')
disk <- fromJSON('tibble_to_json.json')
## Convert back to tibble
df2 <- as_tibble(disk)
## Are the tibbles the same?
identical(df, df2)
library('testthat')
expect_equivalent(df, df2)
expect_equal(df, df2)
## Session info
options(width = 120)
devtools::session_info()
Evaluated code
> library('tibble')
> ## From ?as_tibble
> l <- list(x = 1:500, y = runif(500), z = 500:1)
> df <- as_tibble(l)
>
> ## Add a list column
> df$l <- lapply(1:500, seq_len)
>
> ## Convert to a json keeping as many digits as possible
> library('jsonlite')
> json <- toJSON(df, digits = NA)
>
> ## Check that it's ok
> validate(json)
[1] TRUE
>
> ## Write to file, then read again
> write(json, file = 'tibble_to_json.json')
> disk <- fromJSON('tibble_to_json.json')
>
> ## Convert back to tibble
> df2 <- as_tibble(disk)
>
> ## Are the tibbles the same?
> identical(df, df2)
[1] FALSE
> library('testthat')
> expect_equivalent(df, df2)
> expect_equal(df, df2)
>
> ## Session info
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-10-26 r71594)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-12-06
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.4.0)
crayon 1.3.2 2016-06-28 CRAN (R 3.4.0)
devtools 1.12.0 2016-06-24 CRAN (R 3.4.0)
digest 0.6.10 2016-08-02 CRAN (R 3.4.0)
jsonlite * 1.1 2016-09-14 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.4.0)
R6 2.2.0 2016-10-05 CRAN (R 3.4.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.4.0)
testthat * 1.0.2 2016-04-23 CRAN (R 3.4.0)
tibble * 1.2 2016-08-26 CRAN (R 3.4.0)
withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
>
Interesting. What are the differences after serialization (df and df2 aren't identical according to your code)?
I believe that it's just numerical differences after many digits. That would be why testthat::expect_equal() says it's ok, since by default it has:
tolerance: numeric ≥ 0. Differences smaller than tolerance are not reported. The default value is close to 1.5e-8.
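One way to check where the differences come from is to compare the two tibbles column by column. The following is a sketch on a smaller version of the example; the seed, the row count, and the name `same` are assumptions, and the output is not shown because it was not run as part of this thread.

```r
library(tibble)
library(jsonlite)

set.seed(1)  # hypothetical seed, only to make the sketch repeatable
df <- as_tibble(list(x = 1:5, y = runif(5), z = 5:1))
df$l <- lapply(1:5, seq_len)

# Round trip through JSON text, keeping as many digits as possible
df2 <- as_tibble(fromJSON(toJSON(df, digits = NA)))

# Which columns survive the round trip exactly?
same <- vapply(names(df), function(nm) identical(df[[nm]], df2[[nm]]), logical(1))
print(same)

# Size of the largest difference in the double column
print(max(abs(df$y - df2$y)))

# all.equal() with tolerance = 0 reports any numeric difference at all
print(all.equal(df$y, df2$y, tolerance = 0))
```

If only `y` differs, and only by a tiny amount, that supports the explanation that the mismatch comes from the decimal representation of doubles in the JSON text.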
On Tue, Dec 6, 2016 at 11:09 AM, Kirill Müller wrote:
Interesting. What are the differences after serialization (df and df2 aren't identical according to your code)?
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.
Hi,
After @jennybc's talk today at JHU Biostats about tibble I'm curious if you have figured out a way to export a tibble (or tbl_df #82) to a disk file that can be used by other languages. What do you typically do in these situations? Do you write it out to a JSON file? Ideally that same file could then be read into a tibble if someone wanted to.
My particular use case is recount::all_metadata('tcga'), which loads the data.frame at https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_tcga.Rdata. That data.frame has 5 list columns and it's currently problematic for us to write it to disk in a tsv format using write.table(). That's why I deleted these 5 columns from the tsv file we have at http://duffel.rail.bio/recount/TCGA/TCGA.tsv.
Best,
Leo