
Transition from dynamic dispatch to generics #20

Merged
merged 11 commits into rxing-core:main from generics on Mar 4, 2023

Conversation

SteveCookTU
Contributor

This is a draft PR to transition core functionality to use generics.

I have not yet done any benchmarking to see if this actually improves performance.

Before benchmarking, I would like to make sure all tests are running properly. The internals of the multi readers have changed to properly implement this, which seems to have caused issues for both RSS barcode formats. I am drafting this PR for now and will continue to investigate the cause.

If anyone else would like to take a look to find the issue, feel free! No core math has changed, so my only guess is that it's something within the MultiFormatReader or MultiFormatOneDReader.

@SteveCookTU
Contributor Author

What also leads me to believe the issue is within MultiFormatReader or MultiFormatOneDReader is that the only tests that fail are the black-box ones. The unit tests all pass.

@hschimke
Collaborator

hschimke commented Mar 2, 2023

I'll take a look at it in the morning. My gut says that it's the state between runs for multi-part reading of RSS codes.

@SteveCookTU
Contributor Author

My gut says that it's the state between runs for multi-part reading of RSS codes.

If that's the case, I'll have to figure out a way to re-incorporate keeping the state between reads from within the readers. Thanks for the extra eyes on it!

@SteveCookTU
Contributor Author

I do have an alternative implementation for the readers which should allow holding onto the states. I'll test it out in the morning as well.

@SteveCookTU SteveCookTU marked this pull request as ready for review March 2, 2023 14:11
Collaborator

@hschimke hschimke left a comment


I find your method for converting the Binarizer trait especially interesting. I think this is where I got hung up when I attempted this: it didn't occur to me to put a type into the trait, so I had to make the trait generic, which spread a lot more generic parameters all over the code.
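
The associated-type trick being described can be sketched in plain Rust. This is a hypothetical illustration with made-up type names, not the actual rxing Binarizer API:

```rust
// Hypothetical sketch; the real rxing types differ.

// Approach taken in this PR: an associated type. The trait itself
// stays non-generic, so consumers don't need an extra type parameter.
trait Binarizer {
    type Source;
    fn source(&self) -> &Self::Source;
}

struct LuminanceSource(Vec<u8>);

struct HybridBinarizer {
    src: LuminanceSource,
}

impl Binarizer for HybridBinarizer {
    type Source = LuminanceSource;
    fn source(&self) -> &LuminanceSource {
        &self.src
    }
}

// The alternative described above: a generic trait. Every consumer
// now has to carry the `S` parameter, which is what spreads generics
// through the whole codebase.
trait GenericBinarizer<S> {
    fn source(&self) -> &S;
}
```

With the associated type, code only needs to be generic over the binarizer itself; with the generic trait, the source parameter leaks into every signature that touches a binarizer.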

@hschimke
Collaborator

hschimke commented Mar 2, 2023

Ok, I've fully reviewed it, and I'm impressed with how cleanly this works. My attempts all resulted in the types getting rather bloated.

I hadn't thought it all the way through before, but it's interesting how differently the multi-line read works for RSS vs. every other type that works similarly (multi-image read for pdf417, for instance). The RSS reader code was pretty clearly contributed by someone other than the original devs (not surprising, given that the pdf417 code was as well).

As for benchmarking, I don't have a great way to approach it. I haven't built any benchmarks for the library, though there were some benchmark images included with the original which they used to test their client application. I'll look into what it takes to get those working and figure something out.

While I get that worked out, I think this looks good to be part of our 0.4.0 release; I'd also like to get the decoupled character encoding/decoding work into that.

Excellent work, thank you!

@SteveCookTU
Contributor Author

I can look into benchmarking as well. I think we can copy from the integration tests and make benchmark versions of them. That may be a big thing to implement, so it may be better in its own PR after this one is merged. Until then, I'll play around with the idea.

Collaborator

@hschimke hschimke left a comment


Tests pass, pending benchmarks.

@hschimke
Collaborator

hschimke commented Mar 2, 2023

I think it makes the interface so much cleaner that it might make sense even if the benchmarks aren't conclusive.

@SteveCookTU SteveCookTU mentioned this pull request Mar 2, 2023
@SteveCookTU
Contributor Author

Added some small benchmarks. I ran them locally against the current main branch, and there is definitely some improvement for the more advanced formats (maxicode, qrcode). Almost nothing, or just noise from my PC, for simpler ones such as codabar.

@hschimke
Collaborator

hschimke commented Mar 3, 2023

Nice!

Thinking about where the biggest performance issues are, I think the GenericMultipleBarcodeReader will be an interesting one to benchmark. It can get very slow for larger images.

@hschimke
Collaborator

hschimke commented Mar 3, 2023

I added this benchmark:

fn multi_barcode_benchmark(c: &mut Criterion) {
    let mut image = get_image("test_resources/blackbox/multi-1/1.png");
    c.bench_function("multi_barcode", |b| {
        b.iter(|| {
            // A fresh reader is constructed on every iteration.
            let mut reader = GenericMultipleBarcodeReader::new(MultiFormatReader::default());
            let _res = reader.decode_multiple(&mut image);
        });
    });
}

I think Criterion runs in release mode, though, so I just realized that your optimizations weren't included when I benched the main branch, which probably throws things off a little. Anyway, this is the result I got:

     Running benches/benchmarks.rs (target/release/deps/benchmarks-68e8dc38572985c0)
aztec                   time:   [73.593 µs 73.617 µs 73.645 µs]
                        change: [-1.0091% -0.6700% -0.3327%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

codabar                 time:   [1.2934 µs 1.2937 µs 1.2941 µs]
                        change: [-3.1258% -2.8201% -2.5260%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

code39                  time:   [1.8068 µs 1.8087 µs 1.8109 µs]
                        change: [-6.2732% -6.1258% -5.9615%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  4 (4.00%) high severe

code93                  time:   [846.67 ns 846.84 ns 847.04 ns]
                        change: [+7.8847% +7.9943% +8.1020%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  7 (7.00%) high severe

datamatrix              time:   [14.079 µs 14.080 µs 14.083 µs]
                        change: [-0.3057% -0.1799% -0.0556%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

ean8                    time:   [705.75 ns 705.93 ns 706.12 ns]
                        change: [+0.9299% +1.0660% +1.1956%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  5 (5.00%) high severe

ean13                   time:   [2.8088 µs 2.8097 µs 2.8107 µs]
                        change: [+0.3196% +0.4103% +0.4986%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

itf                     time:   [1.1229 µs 1.1234 µs 1.1240 µs]
                        change: [+2.9470% +3.0494% +3.1485%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

maxicode                time:   [57.680 µs 57.751 µs 57.857 µs]
                        change: [-2.8732% -2.7421% -2.5729%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

pdf417                  time:   [79.708 µs 79.760 µs 79.842 µs]
                        change: [-3.8282% -3.7293% -3.6144%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

qrcode                  time:   [449.92 µs 450.38 µs 450.71 µs]
                        change: [-4.8010% -4.6919% -4.5926%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

rss14                   time:   [5.1668 µs 5.1679 µs 5.1691 µs]
                        change: [-4.9782% -4.8728% -4.7823%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe

rss_expanded            time:   [4.0921 µs 4.0938 µs 4.0957 µs]
                        change: [-2.5715% -2.4221% -2.2858%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

upca                    time:   [21.361 µs 21.365 µs 21.370 µs]
                        change: [-2.1822% -1.9784% -1.7806%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

upce                    time:   [641.73 ns 642.13 ns 642.65 ns]
                        change: [-5.0038% -4.8382% -4.6679%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  10 (10.00%) high severe

multi_barcode           time:   [3.5963 ms 3.6042 ms 3.6121 ms]
                        change: [+7.3536% +7.6257% +7.8952%] (p = 0.00 < 0.05)
                        Performance has regressed.

I cannot fully explain the differences here, and I think I need to run them again in a more controlled way.

@hschimke
Collaborator

hschimke commented Mar 3, 2023

Ok, I fully admit that I don't understand benchmarking very well. Without any modifications, I ran all the benchmarks again and got the following:

     Running benches/benchmarks.rs (target/release/deps/benchmarks-68e8dc38572985c0)
aztec                   time:   [71.552 µs 71.795 µs 72.073 µs]
                        change: [-2.7853% -2.5493% -2.3014%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  11 (11.00%) high severe

codabar                 time:   [1.2645 µs 1.2657 µs 1.2673 µs]
                        change: [-2.3197% -2.2249% -2.1186%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

code39                  time:   [1.7666 µs 1.7678 µs 1.7691 µs]
                        change: [-2.6568% -2.4716% -2.3116%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

code93                  time:   [820.46 ns 820.65 ns 820.85 ns]
                        change: [-3.1357% -3.0517% -2.9652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

datamatrix              time:   [13.681 µs 13.683 µs 13.685 µs]
                        change: [-2.9173% -2.8291% -2.7344%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  7 (7.00%) high severe

ean8                    time:   [690.00 ns 690.49 ns 691.08 ns]
                        change: [-2.1332% -1.9839% -1.8160%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

ean13                   time:   [2.7341 µs 2.7356 µs 2.7375 µs]
                        change: [-2.1441% -1.8200% -1.4603%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

itf                     time:   [1.0960 µs 1.0965 µs 1.0969 µs]
                        change: [-2.5821% -2.4990% -2.4139%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

maxicode                time:   [57.512 µs 57.523 µs 57.535 µs]
                        change: [-0.4825% -0.3314% -0.1911%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe

pdf417                  time:   [79.445 µs 79.467 µs 79.493 µs]
                        change: [-0.4912% -0.3818% -0.2753%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

qrcode                  time:   [440.98 µs 441.09 µs 441.21 µs]
                        change: [-2.1329% -2.0165% -1.8954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

rss14                   time:   [5.1676 µs 5.1697 µs 5.1723 µs]
                        change: [-0.0093% +0.1046% +0.2161%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe

rss_expanded            time:   [3.9739 µs 3.9765 µs 3.9796 µs]
                        change: [-3.0311% -2.9088% -2.7851%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe

upca                    time:   [21.310 µs 21.318 µs 21.327 µs]
                        change: [-0.4571% -0.3591% -0.2688%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

upce                    time:   [623.36 ns 623.61 ns 623.92 ns]
                        change: [-3.1519% -2.9895% -2.8580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

multi_barcode           time:   [3.6061 ms 3.6088 ms 3.6115 ms]
                        change: [-0.1019% +0.1280% +0.3666%] (p = 0.28 > 0.05)
                        No change in performance detected.

So, without changing the codebase in any way, just by rerunning cargo bench, I got a result that suggests every benchmark improved by about 2%.

I also don't know enough about the Rust arm64 target vs. the amd64 target to know how different I should expect runs between them to be.

The only place where a difference in performance makes sense to me is in the GenericMultipleBarcodeReader, because it re-runs the readers several times. In the generic version we end up creating the readers multiple times, as opposed to the Box version, where the readers are created once and then held in memory for subsequent runs. There isn't much that goes into creating a new reader, but there is a little bit.
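
The trade-off can be sketched like this (hypothetical types, not the actual rxing readers):

```rust
// Hypothetical sketch of the trade-off; not the actual rxing API.

trait Reader {
    fn decode(&mut self, input: &str) -> Option<String>;
}

struct OneDReader;

impl Reader for OneDReader {
    fn decode(&mut self, input: &str) -> Option<String> {
        // Toy "decoder": succeeds only on inputs tagged "1d:".
        input.strip_prefix("1d:").map(String::from)
    }
}

// Box version: readers are built once at construction time and then
// reused across every run.
struct BoxedMulti {
    readers: Vec<Box<dyn Reader>>,
}

impl BoxedMulti {
    fn new() -> Self {
        Self { readers: vec![Box::new(OneDReader)] }
    }
    fn decode_multiple(&mut self, input: &str) -> Vec<String> {
        self.readers.iter_mut().filter_map(|r| r.decode(input)).collect()
    }
}

// Generic version: a fresh reader is constructed inside each call,
// adding a small per-run cost when decoding many sub-regions.
struct GenericMulti;

impl GenericMulti {
    fn decode_multiple(&mut self, input: &str) -> Vec<String> {
        let mut reader = OneDReader; // rebuilt every run
        reader.decode(input).into_iter().collect()
    }
}
```

Both produce the same results; the difference is only where the construction cost lands, which is why it shows up most in bulk multi-barcode runs.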

@SteveCookTU
Contributor Author

I think the issue is how much noise there is from other processes running on a PC, and how hit-or-miss that can be. It's suggested to both increase the noise threshold (the default is pretty small) and increase the sample size, because 100 samples don't mean much if there are a handful of outliers.

Since generics favor strict typing, one option would be to create a field inside the struct for each reader, similar to what was done for the RSS ones. Then the creation won't happen on every iteration, but it would slightly increase the size on the stack. We could still box them into another internal struct so it only uses a pointer, if that were necessary.

I think the main thing is that almost anything that can be changed at this point would change performance only in minimal ways. Any major differences would be more easily visible with benchmarks. It's hard to tell because the change might be minimal, or noise is throwing off even the average of the samples. Noise comes from literally any process running on the system, even background tasks, which is hard to avoid without a dedicated benchmark machine.
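
Criterion's defaults can be raised along the lines suggested above. A sketch (the values are illustrative, and multi_barcode_benchmark stands in for the benchmark posted earlier in the thread):

```rust
// Sketch of tuning Criterion's defaults; the values are illustrative.
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn config() -> Criterion {
    Criterion::default()
        .sample_size(500)      // default is 100
        .noise_threshold(0.05) // default is 0.01 (1%)
        .measurement_time(Duration::from_secs(10))
}

criterion_group! {
    name = benches;
    config = config();
    targets = multi_barcode_benchmark
}
criterion_main!(benches);
```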

@SteveCookTU
Contributor Author

SteveCookTU commented Mar 3, 2023

Overall, it's hard to tell if dynamic dispatch really hurt performance. Everything is only abstracted over one trait, Reader and/or Writer, which is a single layer of abstraction. The same goes for the binary bitmaps and binarizer: just one layer of abstraction, which probably isn't that large in the vtable. That probably makes lookups faster than, say, five layers of abstraction would be.

I prefer the strict typing from generics in Rust, but in the end, if it's deemed unnecessary or unwarranted, this PR can be scrapped or archived. I'll always have it on my fork if the idea were ever to see interest again.
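
The single layer of abstraction being discussed can be sketched in plain Rust (hypothetical types, not the rxing ones):

```rust
// Hypothetical sketch; not the rxing API.

trait Reader {
    fn decode(&self) -> u32;
}

struct QrReader;

impl Reader for QrReader {
    fn decode(&self) -> u32 {
        42
    }
}

// Dynamic dispatch: each call goes through one vtable lookup, which
// is the single layer of indirection described above.
fn decode_dyn(reader: &dyn Reader) -> u32 {
    reader.decode()
}

// Static dispatch: monomorphized per concrete type, so the call can
// be inlined away entirely -- the main win generics offer here.
fn decode_static<R: Reader>(reader: &R) -> u32 {
    reader.decode()
}
```

With only one layer of indirection, the gap between the two can easily sit inside benchmark noise, which is consistent with the mixed results above.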

@hschimke
Collaborator

hschimke commented Mar 4, 2023

I ran more controlled benchmarks. macOS makes that a little hard, but I think I succeeded. For most cases it's pretty close. I think the generics solution makes for a better API, so my plan is to merge this PR.

My one concern is efficiency for bulk runs that reuse a single MultiFormatReader many times, but I'm going to open an issue to create a new implementation of it that is optimized for many runs. The startup cost and overall memory cost of the one you built seem better for most situations. I think that's a reasonable solution: keep your implementation as the default and add another "multi-run optimized" reader for the edge cases where it makes sense.

@SteveCookTU
Contributor Author

SteveCookTU commented Mar 4, 2023

That's a good idea. There is definitely decent overhead for bulk reads that could be reduced. I do have to fix one thing relating to resets that I forgot regarding the RSS readers, so I'll commit that and then this PR should be all set.

@hschimke
Collaborator

hschimke commented Mar 4, 2023

All checks passed, I'm going to merge.

@hschimke hschimke merged commit 406a2fa into rxing-core:main Mar 4, 2023
@hschimke hschimke mentioned this pull request Mar 4, 2023
@SteveCookTU SteveCookTU deleted the generics branch March 4, 2023 18:25