Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring test: Simple loop for prepare_spectra() assignments #56

Closed
wants to merge 6 commits into from

Conversation

philipp-baumann
Copy link
Member

@philipp-baumann philipp-baumann commented Dec 10, 2022

instead of Reduce. @ThomasKnecht I did a bit of refactoring. The Reduce approach you introduced leads to very elegant, functional code in parse_opus(). The nested loop on the other hand makes it possible to simplify the code in prepare_spectra(), and makes the assignment logic very clear in the loop. There is sure pros and cons with each of the approaches. Let me do a benchmark on more files to read, and then let's discuss. Curious about your opinion.

@philipp-baumann
Copy link
Member Author

philipp-baumann commented Dec 10, 2022

Current main branch:

r$> # Set up -----------------------------------------------------------------------
 
    library("opusreader2")

r$> path <- file.path("data", "spectra", "2018-NABO")
    opus_files <- list.files(
      path = path, pattern = "\\.\\d+$", full.names = TRUE, recursive = TRUE
    )
    # number of OPUS files
    length(opus_files)
[1] 1612

r$> bnch_seq <- bench::mark(
      data <- opusreader2::read_opus(dsn = path)
    )
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

r$> bnch_seq
# A tibble: 1 × 13
  expression                                      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr>                                 <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t>
1 data <- opusreader2::read_opus(dsn = path)    20.9s   20.9s  0.0478  8.32GB    5.02     1   105   20.9s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>, and abbreviated
#   variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`, ⁴​total_time
# ℹ Use `colnames()` to see all variable names

Current branch

r$> remotes::install_github("spectral-cockpit/opusreader2", ref = "opt-52-prepare-spectra-loop")
...
r$> library("opusreader2")
    library("bench")
 
    path <- file.path("data", "spectra", "2018-NABO")
    opus_files <- list.files(
      path = path, pattern = "\\.\\d+$", full.names = TRUE, recursive = TRUE
    )

r$> bnch_seq <- bench::mark(
      data <- opusreader2::read_opus(dsn = path)
    )
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

r$> bnch_seq
# A tibble: 1 × 13
  expression                                      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr>                                 <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t>
1 data <- opusreader2::read_opus(dsn = path)    20.8s   20.8s  0.0480  8.31GB    5.86     1   122   20.8s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>, and abbreviated
#   variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`, ⁴​total_time
# ℹ Use `colnames()` to see all variable names

@philipp-baumann philipp-baumann changed the title Refactoring test: Nested loop for prepare_spectra() Refactoring test: Simple loop for prepare_spectra() assignments Dec 10, 2022
@philipp-baumann
Copy link
Member Author

Take home:

  • Speed and memory: on par
  • Proposed criterion: code readability and simplicity/style

@ThomasKnecht
Copy link
Collaborator

ThomasKnecht commented Dec 10, 2022

i just don't like for-loops 😜.
but fine with me. as i understand i right, it does not make a difference in speed and memory?

@philipp-baumann
Copy link
Member Author

i just don't like for-loops stuck_out_tongue_winking_eye. but fine with me. as i understand i right, it does not make a difference in speed and memory?

There is indeed no difference in time and memory, I am sure Reduce has received quite some care in that direction. Let's just drop this PR and keep in mind the take home. Though a small simplification, we don't need

index <- which(grepl(data_pattern, names(ds_list)))

and we can directly assign by element name here

ds_list[[index]] <- ds_data[[1]]

@philipp-baumann
Copy link
Member Author

i just don't like for-loops stuck_out_tongue_winking_eye. but fine with me. as i understand i right, it does not make a difference in speed and memory?

There is indeed no difference in time and memory, I am sure Reduce has received quite some care in that direction. Let's just drop this PR and keep what we have learned from it.

@philipp-baumann philipp-baumann deleted the opt-52-prepare-spectra-loop branch December 18, 2022 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants