Performance patch and a few enhancements #568

Closed
wants to merge 1 commit into from


bpasteur

I am using the OpenJPEG libraries to convert between large TIF and JP2 files. A while back I found Aaron Boxer's OpenMP patch (https://code.google.com/p/openjpeg/issues/detail?id=372). It had a memory leak that hurt performance with large files. After some small changes to fix the leak, I am seeing a substantial performance boost. I would very much like to get this patch accepted into the main branch so the changes can be maintained going forward.

The performance gains I am seeing make this patch well worth incorporating into the code base. It lets the library take advantage of all the CPUs in the system: with the current trunk I see 20 to 30 percent CPU utilization, while with this patch (using as many threads as CPU cores) I see 80 to 90 percent. On a system with a fast CPU, many cores, and plenty of memory, you could raise the number of threads further to make full use of the system resources. JP2 is a good fit for storing large files and creating large mosaics, and the gains appear to be larger for these larger files. A table comparing the main branch with the patch is at the bottom of this post.

I included some additional enhancements along with the performance patch:

 * added error checking to the thread loops
 * allow setting the number of threads to use (the default is the number of processors) in the library code
 * added parameters to opj_compress and opj_decompress to allow passing in the number of threads to use
 * added parameters to opj_compress to suppress warning pop-ups for unknown tag types (to enable performance testing scripts)
 * added additional timing prints to opj_compress and opj_decompress
 * added a check for tiled tif files and bailing with an error since they are not yet supported
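Two of the items above, a default thread count equal to the number of processors and error checking in the thread loops, follow a common pattern. The patch itself is C with OpenMP and is not reproduced here, so what follows is only a minimal Python sketch of that general pattern under stated assumptions: `resolve_thread_count`, `encode_blocks`, and the per-block callable `encode_one` are hypothetical names, not OpenJPEG APIs. Because an OpenMP `parallel for` body cannot `return` or `break` out of the loop, a typical approach is for each iteration to record a success flag that the caller checks once the loop has finished.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def resolve_thread_count(requested: Optional[int] = None) -> int:
    """Use the requested thread count, or default to the number of processors."""
    if requested is None or requested <= 0:
        # os.cpu_count() may return None, so fall back to a single thread.
        return os.cpu_count() or 1
    return requested


def encode_blocks(blocks: Iterable, encode_one: Callable[[object], bool],
                  num_threads: Optional[int] = None) -> bool:
    """Run the hypothetical encode_one over every block on a pool of worker threads.

    encode_one returns True on success and False on failure. As with an OpenMP
    loop that cannot exit early, every block is processed and the per-block
    results are checked only after the loop has finished. Exceptions raised by
    encode_one propagate to the caller when the results are collected.
    """
    workers = resolve_thread_count(num_threads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(encode_one, blocks))
    # Error check after the parallel loop: fail if any block failed.
    return all(results)
```

For example, `encode_blocks(range(16), lambda block: True, num_threads=4)` returns `True`, while a callable that fails on any one block makes the whole call return `False`. In CPython these threads only illustrate the control flow; CPU-bound work would not scale across cores the way the OpenMP patch does.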

As for the performance numbers, I am running on a virtual Windows 7 64-bit machine with 4 processors and 6 GB of memory. The code that loads BMP, PNG, and TIF files and converts them into an in-memory image before encoding to .jp2 has no OpenMP enhancements, so those times are excluded from the timing analysis. Likewise, for decompression, writing the decompressed file is excluded. As far as I can tell, there is no standard formula for reporting performance gains, so I am reporting them as % faster = (original time - new time) / new time × 100. This gives a good indication of the speedup I am actually seeing. The original times are also included in the table so you can apply alternative formulas.
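The "% faster" metric can be computed as in the short sketch below; the function names are illustrative only. It also shows one alternative formula, the percentage reduction in run time, which is why keeping the original times is useful: for the first table row (file1.bmp, no tiles, compression) the same pair of times gives about 36.11% faster but about 26.53% less time. A negative "% faster" value means the patched build was slower.

```python
def percent_faster(original_time: float, new_time: float) -> float:
    """% faster = (original - new) / new * 100, i.e. (speedup - 1) * 100."""
    if new_time <= 0:
        raise ValueError("new_time must be positive")
    return (original_time - new_time) / new_time * 100.0


def percent_time_reduction(original_time: float, new_time: float) -> float:
    """Alternative metric: share of the original run time that was saved."""
    if original_time <= 0:
        raise ValueError("original_time must be positive")
    return (original_time - new_time) / original_time * 100.0


# First row of the table: file1.bmp, no tiles, compression times.
print(f"{percent_faster(1.179828, 0.866847):.2f}")          # 36.11
print(f"{percent_time_reduction(1.179828, 0.866847):.2f}")  # 26.53
```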

From what I've seen, smaller files do not benefit as much from threading. My guess is that their run times are so short that the overhead of managing the threads eats into the gains. The small-file times also vary a lot, likely due to normal system-usage noise. BMP files show the smallest gains compared to PNG and TIF files, and in some cases (file2.bmp and file4.bmp with 512x512 tiles) the patched build is actually slower. Generally speaking, when creating tiled JP2 files the gains are smaller for smaller tile sizes. The best gains are with large TIF files, with compression giving the best results.

| Image | Size | Tiles | Original compress | Patch compress | % faster compress | Original decompress | Patch decompress | % faster decompress |
|---|---|---|---|---|---|---|---|---|
| file1.bmp | 4.95MB | No Tiles | 1.179828 | 0.866847 | 36.11 | 0.977499 | 0.750414 | 30.26 |
| file1.bmp | 4.95MB | 512x512 | 0.785534 | 0.592355 | 32.61 | 0.555878 | 0.512308 | 8.50 |
| file1.bmp | 4.95MB | 1024x1024 | 1.067184 | 0.721891 | 47.83 | 0.801062 | 0.59383 | 34.90 |
| file1.bmp | 4.95MB | 2048x2048 | 1.289697 | 1.002421 | 28.66 | 1.063665 | 0.828963 | 28.31 |
| file1.bmp | 4.95MB | 4096x4096 | 1.160955 | 0.874976 | 32.68 | 0.979292 | 0.743863 | 31.65 |
| file2.bmp | 11.42MB | No Tiles | 2.282978 | 2.038762 | 11.98 | 2.058321 | 1.854108 | 11.01 |
| file2.bmp | 11.42MB | 512x512 | 1.032834 | 0.961137 | 7.46 | 0.776514 | 0.859627 | -9.67 |
| file2.bmp | 11.42MB | 1024x1024 | 1.580861 | 1.396536 | 13.20 | 1.370536 | 1.247963 | 9.82 |
| file2.bmp | 11.42MB | 2048x2048 | 2.125625 | 1.891052 | 12.40 | 1.904789 | 1.702871 | 11.86 |
| file2.bmp | 11.42MB | 4096x4096 | 2.298492 | 2.038751 | 12.74 | 2.072052 | 1.981888 | 4.55 |
| file3.bmp | 21.97MB | No Tiles | 6.451538 | 4.656454 | 38.55 | 5.489996 | 4.340535 | 26.48 |
| file3.bmp | 21.97MB | 512x512 | 3.52326 | 2.599379 | 35.54 | 2.661895 | 2.195166 | 21.26 |
| file3.bmp | 21.97MB | 1024x1024 | 4.648617 | 3.013 | 54.29 | 3.784424 | 2.679082 | 41.26 |
| file3.bmp | 21.97MB | 2048x2048 | 5.768925 | 4.06578 | 41.89 | 4.911037 | 3.769211 | 30.29 |
| file3.bmp | 21.97MB | 4096x4096 | 6.400979 | 4.652175 | 37.59 | 5.607541 | 4.30835 | 30.16 |
| file4.bmp | 49.32MB | No Tiles | 13.004941 | 11.651772 | 11.61 | 12.508767 | 11.643189 | 7.43 |
| file4.bmp | 49.32MB | 512x512 | 4.237911 | 4.423962 | -4.21 | 3.328917 | 3.80946 | -12.61 |
| file4.bmp | 49.32MB | 1024x1024 | 7.223601 | 5.959255 | 21.22 | 6.295957 | 5.73972 | 9.69 |
| file4.bmp | 49.32MB | 2048x2048 | 9.588559 | 8.339286 | 14.98 | 8.775902 | 7.980153 | 9.97 |
| file4.bmp | 49.32MB | 4096x4096 | 11.00858 | 9.530159 | 15.51 | 10.107587 | 9.132797 | 10.67 |
| file1.png | 1.45MB | No Tiles | 1.088024 | 0.605926 | 79.56 | 0.805388 | 0.418722 | 92.34 |
| file1.png | 1.45MB | 512x512 | 0.988469 | 0.548657 | 80.16 | 0.652538 | 0.372991 | 74.95 |
| file1.png | 1.45MB | 1024x1024 | 1.165803 | 0.628769 | 85.41 | 0.834257 | 0.422873 | 97.28 |
| file1.png | 1.45MB | 2048x2048 | 1.093828 | 0.587083 | 86.32 | 0.804756 | 0.407083 | 97.69 |
| file1.png | 1.45MB | 4096x4096 | 1.088535 | 0.567778 | 91.72 | 0.792234 | 0.384203 | 106.20 |
| file2.png | 4.18MB | No Tiles | 3.683501 | 1.717639 | 114.45 | 2.92906 | 1.381902 | 111.96 |
| file2.png | 4.18MB | 512x512 | 2.691555 | 1.442813 | 86.55 | 1.908611 | 1.051161 | 81.57 |
| file2.png | 4.18MB | 1024x1024 | 3.197382 | 1.669987 | 91.46 | 2.392201 | 1.245044 | 92.14 |
| file2.png | 4.18MB | 2048x2048 | 3.658982 | 1.725868 | 112.01 | 2.944026 | 1.379621 | 113.39 |
| file2.png | 4.18MB | 4096x4096 | 3.618815 | 1.716116 | 110.87 | 2.968533 | 1.393181 | 113.08 |
| file3.png | 7.22MB | No Tiles | 8.657535 | 4.116039 | 110.34 | 7.112438 | 3.576212 | 98.88 |
| file3.png | 7.22MB | 512x512 | 5.734775 | 3.025372 | 89.56 | 4.067705 | 2.323575 | 75.06 |
| file3.png | 7.22MB | 1024x1024 | 7.266594 | 3.484759 | 108.53 | 5.622319 | 2.768363 | 103.09 |
| file3.png | 7.22MB | 2048x2048 | 8.634415 | 4.063327 | 112.50 | 6.950068 | 3.517615 | 97.58 |
| file3.png | 7.22MB | 4096x4096 | 8.686937 | 4.027346 | 115.70 | 7.113517 | 3.902804 | 82.27 |
| file4.png | 7.90MB | No Tiles | 8.238915 | 4.029098 | 104.49 | 6.808466 | 3.693199 | 84.35 |
| file4.png | 7.90MB | 512x512 | 5.548535 | 2.929953 | 89.37 | 4.009227 | 2.26104 | 77.32 |
| file4.png | 7.90MB | 1024x1024 | 6.890034 | 3.263396 | 111.13 | 5.280665 | 2.634434 | 100.45 |
| file4.png | 7.90MB | 2048x2048 | 8.173525 | 3.870993 | 111.15 | 6.566475 | 3.451295 | 90.26 |
| file4.png | 7.90MB | 4096x4096 | 8.241761 | 4.049314 | 103.53 | 6.433433 | 3.263885 | 97.11 |
| file1.tif | 6.97MB | No Tiles | 2.16728 | 0.953582 | 127.28 | 1.698278 | 0.707933 | 139.89 |
| file1.tif | 6.97MB | 512x512 | 1.739916 | 0.969467 | 79.47 | 1.24823 | 0.752469 | 65.88 |
| file1.tif | 6.97MB | 1024x1024 | 1.985924 | 1.07201 | 85.25 | 1.503963 | 0.748056 | 101.05 |
| file1.tif | 6.97MB | 2048x2048 | 2.179823 | 0.945449 | 130.56 | 1.699182 | 0.704992 | 141.02 |
| file1.tif | 6.97MB | 4096x4096 | 2.180064 | 0.930357 | 134.33 | 1.691063 | 0.77508 | 118.18 |
| file2.tif | 148.67MB | No Tiles | 60.247377 | 21.038146 | 186.37 | 51.070741 | 20.601996 | 147.89 |
| file2.tif | 148.67MB | 512x512 | 36.455834 | 16.876595 | 116.01 | 25.815366 | 13.657244 | 89.02 |
| file2.tif | 148.67MB | 1024x1024 | 43.563181 | 18.055617 | 141.27 | 33.694943 | 14.713993 | 129.00 |
| file2.tif | 148.67MB | 2048x2048 | 50.489958 | 19.284282 | 161.82 | 41.063985 | 16.534817 | 148.35 |
| file2.tif | 148.67MB | 4096x4096 | 54.223868 | 19.900259 | 172.48 | 44.68764 | 17.377504 | 157.16 |
| file3.tif | 211.03MB | No Tiles | 93.818508 | 32.670882 | 187.16 | 78.758161 | 29.85254 | 163.82 |
| file3.tif | 211.03MB | 512x512 | 57.801283 | 26.347154 | 119.38 | 42.140125 | 21.014166 | 100.53 |
| file3.tif | 211.03MB | 1024x1024 | 68.991805 | 27.682716 | 149.22 | 53.03097 | 23.047439 | 130.09 |
| file3.tif | 211.03MB | 2048x2048 | 80.8344 | 31.481119 | 156.77 | 64.946273 | 25.969722 | 150.08 |
| file3.tif | 211.03MB | 4096x4096 | 85.720982 | 31.623811 | 171.06 | 69.468997 | 28.058295 | 147.59 |
| file4.tif | 385.09MB | No Tiles | 822.715578 | 411.348062 | 100.00 | 354.907433 | 213.908053 | 65.92 |
| file4.tif | 385.09MB | 512x512 | 165.297758 | 83.523793 | 97.90 | 125.226607 | 72.45033 | 72.84 |
| file4.tif | 385.09MB | 1024x1024 | 208.883609 | 92.667119 | 125.41 | 171.958026 | 79.620845 | 115.97 |
| file4.tif | 385.09MB | 2048x2048 | 262.382347 | 104.657886 | 150.70 | 224.11709 | 95.285472 | 135.21 |
| file4.tif | 385.09MB | 4096x4096 | 298.210729 | 115.286866 | 158.67 | 261.233092 | 107.231279 | 143.62 |

@cth103

cth103 commented Sep 17, 2015

I didn't write the OpenMP patch. Some time ago I posted an optimisation patch which used some of the ideas from Taubman/Marcellin to speed up the encoder (specifically the T1). My patch did not use OpenMP though.

The link to Google Groups above is me discussing my patch, not the OpenMP one.

@bpasteur
Author

Sorry Carl (and Aaron), I misread Aaron's comment in his T1 optimize pull request. I have corrected my comment above.

detonin added this to the OPJv2.2 milestone on Sep 18, 2015
@rouault
Collaborator

rouault commented May 24, 2016

@boxerab Can you confirm that the work in a497d80 derives from your initial work and can be used under the OpenJPEG BSD license? (I cannot find a trace of the "T1 optimize pull request" mentioned in the comments above.) I'm considering applying it on top of my latest improvements in T1.

@rouault
Collaborator

rouault commented May 25, 2016

I finally decided to implement multi-threaded decoding my own way in PR #786.

mayeut removed this from the OPJ v2.2.0 milestone on Sep 26, 2016
@mayeut
Collaborator

mayeut commented Sep 26, 2016

@detonin, I think this PR has been superseded by #786.
Although this one contains encoding optimizations, I think those should now use the same mechanisms as those used for decoding in #786.

@rouault
Collaborator

rouault commented Jul 30, 2017

Closing as superseded by #786. @bpasteur If you want to add multi-threaded optimizations on the encoding side, please use the thread pool mechanism now in master.

rouault closed this on Jul 30, 2017