Map NULL to empty file to consume zero memory instead of block size #37

Closed
lorenzwalthert opened this issue Sep 2, 2019 · 7 comments

Comments

@lorenzwalthert

lorenzwalthert commented Sep 2, 2019

I came across your package and thought it would be a good match for implementing caching in styler (r-lib/styler#538). A feature of the problem at hand is that there is no need to cache a specific value: the only thing we want to remember is whether a computation has already been executed (in my case, whether the text to style has already been styled and hence does not need restyling). I could call R.cache::saveCache(object = NULL, key = text, ...) or similar, but unfortunately this creates a file with size > 0 bytes, and on my Mac any file that is not totally empty takes 4 KB of space because that's the block size. In my use case, it is likely that people will accumulate hundreds of cached values, which makes R.cache::saveCache(object = NULL, ...) a solution with too much overhead.
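A minimal sketch of the overhead described above. It uses base R's saveRDS() as a stand-in for R.cache::saveCache() (an assumption made so the example runs without R.cache; the on-disk format of R.cache differs), and the variable names are hypothetical:

```r
# Stand-in for R.cache::saveCache(): like any serializer, saveRDS() writes
# a header, so even NULL yields a non-empty file.
path_null  <- tempfile(fileext = ".rds")
path_empty <- tempfile(fileext = ".flag")

saveRDS(NULL, path_null)             # serialized NULL
created <- file.create(path_empty)   # truly empty file

size_null  <- file.size(path_null)   # a small positive number of bytes
size_empty <- file.size(path_empty)  # zero bytes

cat("serialized NULL:", size_null, "bytes; empty file:", size_empty, "bytes\n")
```

On a file system with 4 KB blocks, the first file would still occupy a full block, whereas the second needs no data block at all.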

For this reason, I propose to map NULL to an empty file. My initial idea was to introduce an argument shallow (defaulting to FALSE) in R.cache::saveCache() (or to create another function for shallow hashing, if you prefer). When set, a completely empty file would be created with base::file.create(), to accommodate the case in which keys should be cached but values are not needed. The advantage is that this takes 0 bytes on disk (ignoring metadata such as the file name and permissions, which are stored elsewhere on the file system). We could use the current implementation of R.cache::findCache() without an error. Other functions that expect metadata would probably need to be adapted, but I believe we could easily identify whether a cached value is NULL by checking if the file size is 0. I believe it is generic enough to have applications other than r-lib/styler#538.
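The idea of mapping NULL to an empty file could be sketched roughly as follows. This is an illustration in base R only, not the implementation in R.cache or in the fork; write_cached(), is_cached() and read_cached() are hypothetical helper names, and saveRDS()/readRDS() stand in for the package's own serialization:

```r
# Hypothetical helper: NULL becomes an empty file, anything else is serialized.
write_cached <- function(object, path) {
  if (is.null(object)) {
    file.create(path)        # 0 bytes on disk
  } else {
    saveRDS(object, path)
  }
  invisible(path)
}

# A cache hit only requires the file to exist, whatever its size.
is_cached <- function(path) file.exists(path)

# Hypothetical helper: an empty file is read back as NULL.
read_cached <- function(path) {
  stopifnot(file.exists(path))
  if (file.size(path) == 0) NULL else readRDS(path)
}

path_key   <- tempfile()  # key-only entry (value is NULL)
path_value <- tempfile()  # ordinary entry with a value
write_cached(NULL, path_key)
write_cached(1:3, path_value)
```

With this layout, a cache miss (no file) stays distinguishable from a cached NULL (empty file), which is what the key-only use case needs.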

You can find a basic implementation without tests and documentation in my fork.

@krlmlr

krlmlr commented Sep 22, 2019

Would it be easier to map NULL to empty files?

@lorenzwalthert
Author

Good idea, that's much more elegant I think.

@lorenzwalthert lorenzwalthert changed the title Support for shallow caching (keys only, no values) Map NULL to empty file to consume zero memory instad of block size Oct 2, 2019
@lorenzwalthert lorenzwalthert changed the title Map NULL to empty file to consume zero memory instad of block size Map NULL to empty file to consume zero memory instead of block size Oct 2, 2019
@krlmlr

krlmlr commented Dec 1, 2019

@HenrikBengtsson: Do you think this is feasible? Would you consider a PR?

@HenrikBengtsson
Owner

Thanks for the bump - this one slipped. Yes, I'm open to PRs. I'd just like to think through what your needs are and what the options going forward are.

@HenrikBengtsson
Owner

HenrikBengtsson commented Dec 1, 2019

I've just exported generateCache() in the develop branch:

remotes::install_github("HenrikBengtsson/R.cache@develop")

This will allow you to generate cache pathnames without creating files, so you can do things like:

> pathname <- generateCache(key = list("foo"), dirs = c("styler", "this", "that"), suffix = ".styled")
> pathname
[1] "/home/hb/.cache/R/R.cache/styler/this/that/cfb1611e527451f9a3334e6f60768f5b.styled"
> already_styled <- utils::file_test("-f", pathname)
> already_styled
[1] FALSE
> res <- file.create(pathname)
> res
[1] TRUE
> utils::file_test("-f", pathname)
[1] TRUE
> file.size(pathname)
[1] 0

Would this be sufficient?

@lorenzwalthert
Author

It seems so: r-lib/styler@36606a2

Thanks.

@HenrikBengtsson
Owner

Good to hear. FYI, I'll soon start preparing the next CRAN submission of R.cache (https://github.com/HenrikBengtsson/R.cache/milestone/4?closed=1)
