Refactoring test: Simple loop for `prepare_spectra()` assignments #56

philipp-baumann · 2022-12-10T12:36:17Z

This PR tries a simple loop instead of `Reduce()`. @ThomasKnecht I did a bit of refactoring. The `Reduce()` approach you introduced leads to very elegant, functional code in `parse_opus()`. The nested loop, on the other hand, makes it possible to simplify the code in `prepare_spectra()` and makes the assignment logic very clear in the loop. There are surely pros and cons to each approach. Let me run a benchmark on more files, and then let's discuss. Curious about your opinion.
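
To make the two styles concrete, here is a minimal sketch in base R. It is not the package code: `ds_list`, `prepare_one()`, and the element names are hypothetical stand-ins, chosen only to show that a fold with `Reduce()` and a plain `for` loop can perform the same sequence of list assignments.

```r
# Hypothetical stand-ins; not opusreader2's actual data or functions.
ds_list <- list(ab = list(raw = 1:3), sc_sample = list(raw = 4:6))

prepare_one <- function(ds) {
  ds$data <- ds$raw * 2 # placeholder for the real preparation step
  ds
}

# Functional style: fold over the element names and thread the list through.
via_reduce <- Reduce(
  function(acc, nm) {
    acc[[nm]] <- prepare_one(acc[[nm]])
    acc
  },
  names(ds_list),
  ds_list
)

# Loop style: the same assignments, written out explicitly.
via_loop <- ds_list
for (nm in names(via_loop)) {
  via_loop[[nm]] <- prepare_one(via_loop[[nm]])
}

stopifnot(identical(via_reduce, via_loop))
```

Both variants produce the same list here, so in a case like this the choice comes down to which form reads more clearly.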

philipp-baumann · 2022-12-10T16:04:05Z

Current main branch:

r$> # Set up -----------------------------------------------------------------------
 
    library("opusreader2")

r$> path <- file.path("data", "spectra", "2018-NABO")
    opus_files <- list.files(
      path = path, pattern = "\\.\\d+$", full.names = TRUE, recursive = TRUE
    )
    # number of OPUS files
    length(opus_files)
[1] 1612

r$> bnch_seq <- bench::mark(
      data <- opusreader2::read_opus(dsn = path)
    )
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

r$> bnch_seq
# A tibble: 1 × 13
  expression                                      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr>                                 <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t>
1 data <- opusreader2::read_opus(dsn = path)    20.9s   20.9s  0.0478  8.32GB    5.02     1   105   20.9s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>, and abbreviated
#   variable names ¹`itr/sec`, ²mem_alloc, ³`gc/sec`, ⁴total_time
# ℹ Use `colnames()` to see all variable names

Current branch

r$> remotes::install_github("spectral-cockpit/opusreader2", ref = "opt-52-prepare-spectra-loop")
...
r$> library("opusreader2")
    library("bench")
 
    path <- file.path("data", "spectra", "2018-NABO")
    opus_files <- list.files(
      path = path, pattern = "\\.\\d+$", full.names = TRUE, recursive = TRUE
    )

r$> bnch_seq <- bench::mark(
      data <- opusreader2::read_opus(dsn = path)
    )
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled. 

r$> bnch_seq
# A tibble: 1 × 13
  expression                                      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr>                                 <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t>
1 data <- opusreader2::read_opus(dsn = path)    20.8s   20.8s  0.0480  8.31GB    5.86     1   122   20.8s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>, and abbreviated
#   variable names ¹`itr/sec`, ²mem_alloc, ³`gc/sec`, ⁴total_time
# ℹ Use `colnames()` to see all variable names

philipp-baumann · 2022-12-10T16:39:21Z

Take home:

Speed and memory: on par
Proposed criterion: code readability and simplicity/style

ThomasKnecht · 2022-12-10T17:24:50Z

i just don't like for-loops 😜.
but fine with me. if i understand it right, it does not make a difference in speed and memory?

philipp-baumann · 2022-12-14T08:48:59Z

> i just don't like for-loops 😜. but fine with me. if i understand it right, it does not make a difference in speed and memory?

There is indeed no difference in time and memory; I am sure `Reduce()` has received quite some care in that direction. Let's just drop this PR and keep the take-home in mind. One small simplification, though: we don't need

opusreader2/R/prepare_spectra.R

Line 7 in c3e4c2d

index <- which(grepl(data_pattern, names(ds_list)))

and we can directly assign by element name here

opusreader2/R/prepare_spectra.R

Line 32 in c3e4c2d

ds_list[[index]] <- ds_data[[1]]
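
A minimal sketch of that simplification, under assumptions: the list contents, the element name `"ab"`, and the pattern `"^ab$"` are made up for illustration and are not taken from `prepare_spectra.R`. It only shows that when the target element name is known exactly, assigning by name gives the same result as looking up a position first.

```r
# Hypothetical list and pattern; the real ds_list and data_pattern may differ.
ds_list <- list(ab_data_param = "parameters", ab = "raw block")
data_pattern <- "^ab$"
new_value <- "parsed spectrum"

# Index-based: find the position via a regex match, then assign.
index <- which(grepl(data_pattern, names(ds_list)))
by_index <- ds_list
by_index[[index]] <- new_value

# Name-based: assign directly; `[[<-` matches the name exactly.
by_name <- ds_list
by_name[["ab"]] <- new_value

stopifnot(identical(by_index, by_name))
```

The name-based form assumes exactly one element carries that name; a pattern that can match several elements would still need the index lookup.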

philipp-baumann · 2022-12-14T08:52:31Z

> i just don't like for-loops 😜. but fine with me. if i understand it right, it does not make a difference in speed and memory?

There is indeed no difference in time and memory; I am sure `Reduce()` has received quite some care in that direction. Let's just drop this PR and keep what we have learned from it.

Philipp Baumann added 6 commits December 8, 2022 16:57

improve speed

04ea7ea

fix wrong object

6b837ad

fix 2, wrong object

c17f4cf

fix 3, wrong object

0b87d5a

restructure funs in nested for loop

caf49fd

no nested loop (wrong) needed

36ab28f

philipp-baumann changed the title ~~Refactoring test: Nested loop for prepare_spectra()~~ Refactoring test: Simple loop for prepare_spectra() assignments Dec 10, 2022

philipp-baumann requested a review from ThomasKnecht December 10, 2022 16:32

philipp-baumann mentioned this pull request Dec 10, 2022

💾 We can probably reduce the memory footprint (alloc read & parsed size) #52

Closed

philipp-baumann closed this Dec 14, 2022

philipp-baumann deleted the opt-52-prepare-spectra-loop branch December 18, 2022 18:39
