Best practices for collaborative work on a single RDS storr? #92

Open
wlandau opened this issue Nov 25, 2018 · 3 comments

wlandau commented Nov 25, 2018

@ldecicco-USGS has raised this issue as it applies to drake. It can be challenging to commit/upload all the tiny files of an RDS storr to GitHub/Dropbox/Google Drive. I think a vignette might help if there are counterintuitive workarounds and/or good existing best practices. Related: #1, #4, #16, #37, #77.

richfitz (Owner) commented

There's import/export support between stores; that idea could be extended to archiving a store to a zip file. I thought I'd done that already, but it seems not.

Can you sketch out what you need, as I don't currently have this use case? Once I can see the requirements, I can look at what it would take to support them.
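
As a rough sketch of that import/export idea (the method names and signatures below are my assumptions from the package documentation, not anything confirmed in this thread; the key and paths are made up):

```r
library(storr)

# Source store with some content (hypothetical key and temporary paths).
src <- storr_rds(tempfile())
src$set("mtcars_summary", summary(mtcars))

# Destination store; import() copies keys over from another storr.
dst <- storr_rds(tempfile())
dst$import(src)

dst$get("mtcars_summary")
```

Archiving to zip could then be a thin wrapper: export into a fresh RDS store in a temporary directory and zip that directory up.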


wlandau commented Dec 13, 2018

I am not sure I can sketch out any one specific best solution right now. I am just looking for guidance on the most efficient ways to transport RDS stores so I can provide better recommendations on how multiple users can collaborate on drake projects. I have some ideas, but not all of them work, and I do not think they are exhaustive.

Even outside storr's API, it is already straightforward to zip up and transport stores. drake targets even remain up to date that way.

library(drake)
load_mtcars_example() # Load packages and functions (& write report.Rmd)
make(my_plan, verbose = FALSE) # Run the project
zip(zipfile = "cache.zip", files = ".drake", flags = "-qr9X") # Zip up the cache.
wd <- getwd() # Directory where the project was built.
dir <- tempfile() # Go to a new directory.
dir.create(dir)
setwd(dir)
load_mtcars_example() # Load packages and functions (& write report.Rmd)
unzip(file.path(wd, "cache.zip")) # Unpack the cache.
tmp <- file.copy(from = file.path(wd, "report.md"), to = ".") # Get the compiled report too.
make(my_plan) # Everything is up to date in the new location.
#> All targets are already up to date.

Created on 2018-12-13 by the reprex package (v0.2.1)

However, this process duplicates information, and it would not work well for projects with large datasets. That is why I think #93 could go a long way toward a good recommendation.

For projects with the potential for collaboration, I already recommend starting out with the DBI format, but this does not help existing drake workflows.
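
For illustration, a minimal sketch of the kind of DBI-backed cache I mean, assuming `storr_dbi()`'s table and connection arguments and drake's `cache` argument work as I remember (the file name here is made up):

```r
library(DBI)
library(RSQLite)
library(storr)
library(drake)

# A single shared SQLite file (could live on a network drive).
con <- dbConnect(SQLite(), "shared-cache.sqlite")
cache <- storr_dbi(tbl_data = "data", tbl_keys = "keys", con = con)

load_mtcars_example() # Load packages and functions (& write report.Rmd)
make(my_plan, cache = cache) # Build against the shared DBI cache.
```

One database file is much easier to commit or sync than the many tiny files of an RDS storr.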

With storrs (ref: #61), collaboration might not require any copying at all, but this would almost certainly slow down the execution of drake::make().

I also wonder if containerization can help somehow. It's hard to beat the reproducibility and portability offered by Docker and Singularity.


r2evans commented Feb 13, 2020

@wlandau , updated link from your last comment about DBI, I believe: https://books.ropensci.org/drake/storage.html#interfaces-to-the-cache
