Assorted updates for GEOS-Chem 1-year benchmarking -- closes #163 #164

lizziel · 2022-09-22T20:46:49Z

This update includes the implementation of diff-of-diff plots for 1-year GCHP benchmarks, among other things. Here is a list of the updates:

- Use lat/lon 1x1.25 for GCHP vs GCHP comparisons in the 1-year benchmark. This avoids an issue with cubed-sphere to cubed-sphere regridding that is encountered in the 14.0 1-yr GCHP benchmark because its grid resolution differs from that of the last 1-yr GCHP benchmark (c24 vs c48).
- Regrid GCHP ref and GCHP dev to C48 using the sparselt package prior to taking the difference for 1-yr benchmark diff-of-diff plots. This is a simpler way to regrid than using the ESMF Python tools implemented in GCPy. Previously no regridding was done prior to computing differences for diff-of-diffs since GCHP ref and dev had the same grid resolution. This is not the case for the GCHP 1-yr benchmark of 14.0. It will be the case moving forward, however, at which point this code will no longer be needed.
- Fix bugs in the 1-yr GCHP benchmark code related to restart file paths.
- Set a single benchmark results directory for 1-yr benchmarks and put all benchmark artifacts in that folder, including both GCHP comparisons and GCC vs GCC comparisons. Previously the GCC vs GCC comparisons were written to the GCC dev folder and all GCHP comparisons were written to the GCHP dev folder. This was problematic when running the benchmark on other people's data (permissions issue writing to their rundir) or when using data not stored in a run directory.
- Append load() to calls that open data, to avoid dask arrays when reading via open_mfdataset. Dask arrays cause a problem in sparselt.
- Comment out using a fake dimension for GCHP vs GCC diff-of-diffs plots. This code was added somewhat recently for the case of ref and dev not having the same time. However, it causes problems when there is more than one time in the file. The quick fix is to disable this feature.
- Add the implementation of diff-of-diffs to the 1-year benchmark code. Previously it was only implemented for 1-month benchmarks. The 1-yr benchmarks are different in that they have 12 months of data (time series length 12), rather than a single time in the files.
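As a rough illustration of what a diff-of-diffs over a 12-month series involves, here is a minimal sketch using plain Python lists in place of gridded benchmark data. It assumes diff-of-diffs means comparing the dev minus ref change in GCHP against the same change in GC-Classic. The function names and the sample values are hypothetical and are not taken from GCPy, which works on xarray datasets and (for 14.0) regrids before differencing.

```python
def diff(dev, ref):
    """Element-wise dev - ref for two equally long monthly series."""
    if len(dev) != len(ref):
        raise ValueError("ref and dev must cover the same number of times")
    return [d - r for d, r in zip(dev, ref)]


def diff_of_diffs(gcc_ref, gcc_dev, gchp_ref, gchp_dev):
    """Return (GCHP dev - GCHP ref) - (GCC dev - GCC ref) for each month."""
    gcc_change = diff(gcc_dev, gcc_ref)
    gchp_change = diff(gchp_dev, gchp_ref)
    return diff(gchp_change, gcc_change)


# Made-up values for a single grid cell, 12 monthly means per dataset
gcc_ref = [10.0] * 12
gcc_dev = [11.0] * 12
gchp_ref = [10.0] * 12
gchp_dev = [13.0] * 12

result = diff_of_diffs(gcc_ref, gcc_dev, gchp_ref, gchp_dev)
```

The length check mirrors the point made above: with 12 times per file instead of one, ref and dev have to line up month by month before any difference is taken.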

Signed-off-by: Lizzie Lundgren <[email protected]>

…ries The new code makes it easier to edit such that all benchmark results (GCC vs GCC, GCHP vs GCC, GCHP vs GCHP, and GCHP vs GCC diff-of-diffs) are in a single results directory rather than spread out between the GCC and GCHP dev directories. Signed-off-by: Lizzie Lundgren <[email protected]>

This code needs more work since it causes the plotting to crash in some instances, such as when a file contains multiple times. Signed-off-by: Lizzie Lundgren <[email protected]>

This is a requirement for using the resultant xarray dataset as an argument in sparselt for regridding if the file is opened with open_mfdataset. Using open_mfdataset results in dask arrays, which are not compatible with sparselt. This update also includes minor updates to doc code in benchmark.py as well as a commented-out block to not make benchmark concentration plots in parallel. Signed-off-by: Lizzie Lundgren <[email protected]>

…diffs This update uses the sparselt package to perform regridding. Regridding weights must be computed in advance. Currently the regrid file is assumed to be in weightsdir and its filename is hard-coded. Using this code should only be necessary if comparing two ref datasets or two dev datasets that have different resolutions, which is not typical in benchmarking. Signed-off-by: Lizzie Lundgren <[email protected]>

…hmark Signed-off-by: Lizzie Lundgren <[email protected]>

Having separate directories for GCC vs GCC and the GCHP comparisons is still possible by editing run_1yr_fullchem_benchmark.py variables base_gcc_resultsdir and base_gchp_resultsdir. Signed-off-by: Lizzie Lundgren <[email protected]>
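A minimal sketch of how a single results directory versus separate ones could be derived from those two variables. Only base_gcc_resultsdir and base_gchp_resultsdir come from the commit message; the function, the subfolder names, and the paths are hypothetical placeholders.

```python
import os

# Hypothetical subfolder names, one per comparison type
COMPARISONS = (
    "gcc_vs_gcc",
    "gchp_vs_gcc",
    "gchp_vs_gchp",
    "gchp_vs_gcc_diff_of_diffs",
)


def results_dirs(base_gcc_resultsdir, base_gchp_resultsdir=None):
    """Map each comparison to an output folder.

    If base_gchp_resultsdir is omitted, every comparison shares one base
    directory; otherwise GCC vs GCC output is kept apart from the rest.
    """
    if base_gchp_resultsdir is None:
        base_gchp_resultsdir = base_gcc_resultsdir
    dirs = {}
    for name in COMPARISONS:
        if name == "gcc_vs_gcc":
            base = base_gcc_resultsdir
        else:
            base = base_gchp_resultsdir
        dirs[name] = os.path.join(base, name)
    return dirs


# One shared results directory
single = results_dirs("/scratch/BenchmarkResults")

# Separate directories for GCC vs GCC and the GCHP comparisons
split = results_dirs(
    "/scratch/gcc_dev/BenchmarkResults",
    "/scratch/gchp_dev/BenchmarkResults",
)
```

Keeping the base directory independent of the run directories is what avoids writing into someone else's rundir.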

# Conflicts: # benchmark/modules/run_1yr_fullchem_benchmark.py Signed-off-by: Lizzie Lundgren <[email protected]>

This update fixes problems generating the mass tables in the 1-year benchmark simulations. However, it causes a problem with the Ox budget table generation, which needs to be looked at later. The Ox budget code assumes the restart path is the full path rather than retrieving it from get_filepath(s). Signed-off-by: Lizzie Lundgren <[email protected]>

…code Signed-off-by: Lizzie Lundgren <[email protected]>

Benchmarks include computing OH metrics for both GCHP and GC-Classic. Signed-off-by: Lizzie Lundgren <[email protected]>

…e 1x1.25 This is a temporary work-around for cubed-sphere to cubed-sphere regridding not currently working in the benchmark plotting code. It is only applicable to GCHP vs GCHP 1-yr full chemistry benchmarks because the upcoming benchmark will compare C24 with C48. It is also only applicable to level plots (surface and 500 hPa). Signed-off-by: Lizzie Lundgren <[email protected]>

…functions
gcpy/util.py
- Remove hardwired restart folder file paths in get_filepath() and get_filepaths()
benchmark/1mo_benchmark.yml benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml
- Rename "subdir" tag to "outputs_subdir"
- Added "restarts_subdir" tags
benchmark/modules/run_1yr_fullchem_benchmark.py benchmark/modules/run_1yr_tt_benchmark.py benchmark/run_benchmark.py
- Now use "outputs_subdir" to construct paths to the GEOS-Chem OutputDir folders
- Now use "restarts_subdir" to construct paths to the GEOS-Chem restart file folders
Signed-off-by: Bob Yantosca <[email protected]>
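For illustration, a sketch of building the two folders from the new tags instead of hardwiring them. Only the tag names outputs_subdir and restarts_subdir come from this commit; the layout of the config dictionary, the "dir" key, the function name, and the paths are assumptions made for the example.

```python
import os

# Hypothetical excerpt of a parsed benchmark YAML file
config = {
    "paths": {"main_dir": "/data/benchmarks"},
    "data": {
        "dev": {
            "gcc": {
                "dir": "GCC_dev",
                "outputs_subdir": "OutputDir",
                "restarts_subdir": "Restarts",
            }
        }
    },
}


def data_dirs(config, which, model):
    """Return (outputs_dir, restarts_dir) for e.g. which="dev", model="gcc"."""
    entry = config["data"][which][model]
    rundir = os.path.join(config["paths"]["main_dir"], entry["dir"])
    outputs_dir = os.path.join(rundir, entry["outputs_subdir"])
    restarts_dir = os.path.join(rundir, entry["restarts_subdir"])
    return outputs_dir, restarts_dir


outputs_dir, restarts_dir = data_dirs(config, "dev", "gcc")
```

Because both subfolder names are read from the YAML file, a run directory that uses a different restart folder name only needs a config edit, not a code change.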

gcpy/benchmark.py - In routine make_column_aod_plots, we set a variable quiet = not verbose, but the verbose argument is not used. This causes verbose output to be printed to the screen. To fix this, we now set verbose=False in the argument list. Signed-off-by: Bob Yantosca <[email protected]>

gcpy/benchmark.py
- Remove duplicate definition of verbose=False
- Make sure that compare_varnames uses quiet=(not verbose) in all routines where verbose is passed
- Added verbose=False keyword in the make_benchmark_operations_budget
Signed-off-by: Bob Yantosca <[email protected]>
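The pattern behind these fixes can be sketched as follows. Both functions are hypothetical stand-ins written for this example and do not reproduce the signatures of compare_varnames or the benchmark routines in GCPy.

```python
def compare_names(ref_names, dev_names, quiet=True):
    """Hypothetical helper: return the common names, printing unless quiet."""
    common = sorted(set(ref_names) & set(dev_names))
    if not quiet:
        print(f"{len(common)} common variables: {common}")
    return common


def make_plots(ref_names, dev_names, verbose=False):
    """Hypothetical benchmark routine.

    verbose is an explicit keyword that defaults to False, and it is
    forwarded to the helper as quiet=(not verbose).
    """
    return compare_names(ref_names, dev_names, quiet=(not verbose))


# Silent by default; prints the summary only when verbose=True
common_default = make_plots(["O3", "CO"], ["CO", "NO"])
common_verbose = make_plots(["O3", "CO"], ["CO", "NO"], verbose=True)
```

Declaring verbose in the argument list is what makes the quiet flag well defined; without it, the routine has nothing to derive quiet from.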

These updates were rendered obsolete, as we can now specify the outputs_subdir and restarts_subdir for gcc/gchp in the YAML files.
benchmark/1mo_benchmark.yml
- Remove "is_pre_14.0" tag from ref:gcc and dev:gcc entries
benchmark/run_benchmark.py
- Remove gcc_is_pre_14.0 from calls to get_filepath
gcpy/util.py
- Remove gcc_is_pre_14_0 argument
Signed-off-by: Bob Yantosca <[email protected]>

lizziel · 2022-09-27T14:29:58Z

I am marking this PR for review. @yantosca is concurrently working on benchmark code in GCPy for the upcoming GEOS-Chem 14.0 benchmark, and may still push to this branch. I am marking him as reviewer so that this does not get merged until he gives the ok.

gcpy/benchmark.py
- Add new function get_species_database_dir, which takes in a config object and returns the path to the directory where the species database file (species_database.yml) is located.
benchmark/modules/run_1yr_fullchem_benchmark.py benchmark/modules/run_1yr_tt_benchmark.py benchmark/run_benchmark.py
- Now call get_species_database_dir to get the spcdb_dir variable
- Replace "gchp_metname" with "StateMet"
Signed-off-by: Bob Yantosca <[email protected]>

gcpy/benchmark.py - If we are successful at locating the species database file, then print a message. (We had composed the message as an f-string but had never printed it). Signed-off-by: Bob Yantosca <[email protected]>

benchmark/1mo_benchmark.yml benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml
- Change "spcdb_dir: None" to "spcdb_dir: default". None is an allowable Python keyword but not an allowable YAML keyword.
- Also added &--- YAML headers
Signed-off-by: Bob Yantosca <[email protected]>
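A sketch of why a sentinel string such as "default" is easier to handle than None: a YAML parser reads a plain None as the string "None" (YAML's own null is written null or ~), so a test like `value is None` would never match. The helper below is hypothetical and is not the get_species_database_dir function added in this PR; the paths are placeholders.

```python
def resolve_spcdb_dir(spcdb_dir, dev_rundir):
    """Return the folder expected to contain species_database.yml.

    Hypothetical helper: the sentinel "default" (any letter case) means
    "use the dev run directory"; anything else is taken as an explicit path.
    """
    if str(spcdb_dir).strip().lower() == "default":
        return dev_rundir
    return spcdb_dir


# "default" falls back to the dev run directory
from_default = resolve_spcdb_dir("default", "/runs/GCC_dev")

# An explicit path is returned unchanged
from_explicit = resolve_spcdb_dir("/custom/spcdb", "/runs/GCC_dev")
```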

yantosca · 2022-10-03T19:56:48Z

Also added:
15. Allow specification of species database directory

gcpy/benchmark.py - databaase -> database - Removed exclamation point at end of successful print message Signed-off-by: Bob Yantosca <[email protected]>

benchmark/1mo_benchmark.yml benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml - Update comments for the paths & data sections for consistency Signed-off-by: Bob Yantosca <[email protected]>

gcc_vs_gcc:run should be True not False Signed-off-by: Bob Yantosca <[email protected]>

yantosca

Some more updates are needed.

yantosca · 2022-10-03T19:57:34Z

AUTHORS.txt

@@ -1,12 +1,13 @@
-List of GCPy developers (30 Oct 2020)
+List of GCPy developers (29 Sep 2022)


This just updates the list of GCPy authors in advance of version 1.3.0.

yantosca · 2022-10-03T19:57:52Z

LICENSE.txt

@@ -2,7 +2,7 @@ License Agreement for GCPy and related developments
 (The MIT "Expat" License, http://opensource.org/licenses/MIT)
 ==============================================================================

-Copyright (c) 2017-2020 GCPy Developers
+Copyright (c) 2017-2022 GCPy Developers


Updated the end date to 2022

yantosca · 2022-10-03T19:58:20Z

LICENSE.txt

-   promote products or services of Licensee, or any third party.
-
+   License agreement for matplotlib versions 1.3.0 and later
+   =========================================================


Added a couple more license agreements for 3rd-party packages.

yantosca · 2022-10-03T19:59:10Z

benchmark/1mo_benchmark.yml

@@ -10,49 +10,73 @@
 # to gcc_dev (not gcc_ref!). This ensures consistency in version names
 # when doing GCHP vs GCC diff-of-diffs (mps, 6/27/19)
 # =====================================================================
-# configuration for 1 month benchmark
+#


Grouped comments together for easier readability

yantosca · 2022-10-03T20:01:16Z

benchmark/1mo_benchmark.yml

+#   main_dir:  High-level directory containing subdirectories with dat
+#   results_dir: Directory where plots/tables will be created
+#   weights_dir: Path to regridding weights
+#   spcdb_dir: Path to species_database.yml.  If equal to None, will


You can now specify the path to the species_database.yml file, or use "default" to tell the benchmark scripts to look in one of the Dev folders.

yantosca · 2022-10-03T20:57:12Z

gcpy/util.py

@@ -1646,10 +1627,9 @@ def get_filepath(
        col,
        date,
        is_gchp=False,
-        gchp_res="00",
+        gchp_res="c00",


Now include the "c" in GCHP resolution string default values.
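If callers may still pass a bare number, a small normalizing step along these lines could keep both forms working. This helper is hypothetical and is not part of gcpy/util.py.

```python
def normalize_gchp_res(res):
    """Return a cubed-sphere resolution string such as "c48".

    Hypothetical helper: accepts 48, "48", "c48" or "C48".
    """
    text = str(res).strip().lower()
    if not text.startswith("c"):
        text = "c" + text
    if not text[1:].isdigit():
        raise ValueError(f"Not a cubed-sphere resolution: {res!r}")
    return text


examples = [normalize_gchp_res(r) for r in (48, "24", "C90", "c00")]
```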

yantosca · 2022-10-03T20:57:31Z

gcpy/util.py

@@ -1669,8 +1649,10 @@ def get_filepath(
            Set this switch to True to obtain file pathnames to
            GCHP diagnostic data files. If False, assumes GEOS-Chem "Classic"

-        gchp_res: int
-            Cubed-sphere resolution of GCHP data grid. Only needed for restart files.
+        gchp_res: str


Updated comment to denote gchp_res is now of type str.

yantosca · 2022-10-03T20:58:14Z

gcpy/util.py

-                data_list.add(trimmed_path)
-
-        # Read next line
+    # Open file


Pylint identified that we should use "with" when opening a file for reading.
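A minimal sketch of the "with" pattern, assuming the routine collects trimmed, non-empty lines into a set as the removed lines above suggest. The function name and the sample file contents are made up for the example.

```python
import os
import tempfile


def read_unique_paths(filename):
    """Collect the stripped, non-empty lines of a text file into a set."""
    data_list = set()
    # The context manager closes the file even if an exception is raised
    with open(filename, "r", encoding="utf-8") as ifile:
        for line in ifile:
            trimmed_path = line.strip()
            if trimmed_path:
                data_list.add(trimmed_path)
    return data_list


# Small demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmpdir:
    listing = os.path.join(tmpdir, "paths.txt")
    with open(listing, "w", encoding="utf-8") as ofile:
        ofile.write("a/b.nc\n\na/b.nc\nc/d.nc\n")
    paths = read_unique_paths(listing)
```

Iterating over the file object also removes the need for an explicit "read next line" step.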

yantosca · 2022-10-03T20:58:50Z

setup.py

@@ -92,8 +92,9 @@ def _write_version_file():
    packages = find_packages(),
    include_package_data=True,
    install_requires=["xesmf>=0.2.1", "scipy>=1.3.1", "Cartopy>=0.17.0", "pandas>=0.25.1",
-                      "matplotlib>=3.1.1", "tabulate>=0.8.3", "joblib>=0.17.0", "xbpch>=0.3.5", 
-                      "numpy>=1.19.1", "PyPDF2>=1.26.0", "sphinx", "sphinx_rtd_theme", "sphinx-autoapi"],
+                      "matplotlib>=3.1.1", "tabulate>=0.8.3", "joblib>=0.17.0", "xbpch>=0.3.5",


Note, needs to be updated w/ new package versions

yantosca · 2022-10-03T20:59:28Z

gcpy/budget_tt.py

-                self.devrstdir,
-                "GEOSChem.Restart.{}*.nc4".format(self.y1_str)
-            )
+        # Initial restart file


Now use get_filepath to get restart file paths

Signed-off-by: Bob Yantosca <[email protected]>

benchmark/1mo_benchmark.yml
- Change GCC restarts_subdir tags to "Restarts", as this is the new name of the restarts subdirectory going forward (not "restarts")
benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml
- Add settings for current gcpy_test_data (with comments that these can be edited if needed)
Signed-off-by: Bob Yantosca <[email protected]>

The ReadTheDocs build should pick up the proper packages from the docs/source/requirements.yml file. Signed-off-by: Bob Yantosca <[email protected]>

setup.py - Now use the same package versions as in environment.yml. These now are specific package versions which should prevent incompatibility errors. Signed-off-by: Bob Yantosca <[email protected]>

benchmark/1mo_benchmark.yml benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml - Remove gcc:is_pre_14.0. This is made obsolete by the fix to the get_filepath and get_filepaths routines. Signed-off-by: Bob Yantosca <[email protected]>

yantosca · 2022-10-04T19:16:50Z

Also note the changelog updates:

Unreleased

Added

New features in benchmarking scripts (@lizziel, @yantosca)
- Extra print statements (@lizziel)
- Diff-of-diffs plots for 1-year benchmarks (@lizziel)
- sparselt is now a GCPy requirement (@lizziel)
- Add switch for
Removed obsolete environment.yml files (@yantosca)
Added requirements.yml to docs folder for Sphinx/RTD documentation (@yantosca)
New regridding script regrid_restart_file.py (@LiamBindle)

Changed

Fixed several issues in benchmarking scripts (@laestrada, @lizziel, @yantosca)
- Add OMP_NUM_THREADS and OMP_STACKSIZE in plot_driver.sh (@yantosca)
- Increase requested memory to 50MB in plot_driver.sh (@yantosca)
- Benchmark scripts print a message upon completion (@yantosca)
- Linted several benchmarking routines with Pylint (@yantosca)
- Rewrote algorithm of add_lumped_species_to_dataset for speed (@yantosca)
- Can now specify the path to species_database.yml for 1yr benchmarks (@yantosca)
- 1-yr benchmarks now save output in subdirs of the same path (@lizziel)
- Avoid hardwiring restart file paths in benchmark scripts (@yantosca)
- Now use outputs_subdir tag from YAML file for paths to diagnostic files (@yantosca)
- Now use restarts_subdir tag from YAML file for paths to restart files (@yantosca)
- GCPy now uses proper year for dev in 1-yr benchmarks (@laestrada)
- Fixed date string issue in benchmarking scripts (@lizziel)
- Updates for new GCHP restart file format (@lizziel)
Updated environment.yml with package versions that work together (@yantosca)
Updated the AUTHORS.txt and LICENSE.txt files (@yantosca)

gcpy/benchmark.py gcpy/budget_ox.py gcpy/budget_tt.py gcpy/ste_flux.py - At the end of each method, manually delete the larger objects (such as xarray Datasets) and call gc.collect() to force garbage collection. This should hopefully prevent problems with the benchmark scripts halting due to exceeding requested memory (which seems to happen if all 1-year benchmark artifacts are requested). Signed-off-by: Bob Yantosca <[email protected]>
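The cleanup step described here can be sketched as below. The list stands in for a large xarray Dataset and the function is hypothetical; whether memory is actually returned to the operating system depends on the interpreter and allocator.

```python
import gc


def summarize(values):
    """Compute a total, then release the large intermediate object."""
    # Stand-in for a large object such as an xarray Dataset
    big_copy = [float(v) for v in values]
    total = sum(big_copy)

    # Drop the reference and ask the collector to run before returning
    del big_copy
    gc.collect()
    return total


total = summarize(range(5))
```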

CHANGELOG.md
- Add info about garbage collection
benchmark/plot_driver.sh
- Request 4 hours of run time via SLURM
Signed-off-by: Bob Yantosca <[email protected]>

yantosca · 2022-10-05T13:48:06Z

CHANGELOG.md

@@ -6,7 +6,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 ## Unreleased
 ### Added


Updated the changelog for 1.3.0

yantosca

I approve of these changes, and we can merge into Dev.

Note, there is still a memory issue (see #174) that I believe is related to joblib. The workaround is to submit gcc_vs_gcc, gchp_vs_gcc, gchp_vs_gchp, and gchp_vs_gcc_diff_of_diffs as separate jobs (at least for the 1-year benchmarks).

lizziel added 18 commits September 20, 2022 13:15

Expand prints to give information about reading benchmark data

ba92283

Signed-off-by: Lizzie Lundgren <[email protected]>

Include diff-of-diffs in 1-year fullchem benchmark yaml file

fe2233a

Signed-off-by: Lizzie Lundgren <[email protected]>

Fix bug in extracting GCHP restart filename if multiple times in a file

fe248eb

Signed-off-by: Lizzie Lundgren <[email protected]>

Fix bugs in 1-yr fullchem benchmark run script

50237fc

Signed-off-by: Lizzie Lundgren <[email protected]>

Add grid res option for GCHP vs GCHP 1-yr fullchem benchmark conc plots

8a44076

Signed-off-by: Lizzie Lundgren <[email protected]>

Add GCHP vs GCC diff-of-diffs to 1yr fullchem benchmark

497b530

Signed-off-by: Lizzie Lundgren <[email protected]>

Update plot.py docs for diff-of-diffs

ba4a330

Signed-off-by: Lizzie Lundgren <[email protected]>

Comment out using fake time dimension for GCHP vs GCC diff-of-diffs

7d9de34

This code needs more work since it causes the plotting to crash in some instances, such as when a file contains multiple times. Signed-off-by: Lizzie Lundgren <[email protected]>

Use run dirs not version strings for data paths in 1-yr fullchem benc…

a76fea8

…hmark Signed-off-by: Lizzie Lundgren <[email protected]>

Add sparselt package as a gcpy requirement

e565bea

Signed-off-by: Lizzie Lundgren <[email protected]>

Merge branch 'dev' into feature/gchp_1yr_benchmark_diff_of_diffs

b762971

# Conflicts: # benchmark/modules/run_1yr_fullchem_benchmark.py Signed-off-by: Lizzie Lundgren <[email protected]>

Minor benchmark code updates: error warning prints and remove unused …

f4d3d0f

…code Signed-off-by: Lizzie Lundgren <[email protected]>

Fix bugs in new code to regrid GCHP files for diff-of-diff comparison

d9d328f

Signed-off-by: Lizzie Lundgren <[email protected]>

lizziel added the "topic: Benchmark Plots and Tables" label (Issues pertaining to generating plots/tables from benchmark output) Sep 22, 2022

lizziel self-assigned this Sep 22, 2022

lizziel and others added 5 commits September 23, 2022 11:05

Remove benchmark yml file comment that OH metrics is GC-Classic only

c0ee97a

Benchmarks include computing OH metrics for both GCHP and GC-Classic. Signed-off-by: Lizzie Lundgren <[email protected]>

yantosca mentioned this pull request Sep 26, 2022

[BUG/ISSUE] Inconsistent restart file path usage in 1-year benchmarks #163

Closed

yantosca changed the title ~~Assorted updates for GEOS-Chem 1-year benchmarking~~ Assorted updates for GEOS-Chem 1-year benchmarking -- closes #163 Sep 26, 2022

lizziel marked this pull request as ready for review September 27, 2022 14:30

yantosca added 3 commits October 3, 2022 15:29

Add missing print statement in get_species_database_dir function

7be90b8

gcpy/benchmark.py - If we are successful at locating the species database file, then print a message. (We had composed the message as an f-string but had never printed it). Signed-off-by: Bob Yantosca <[email protected]>

yantosca added 3 commits October 3, 2022 16:07

Fixed a couple of typos in gcpy/benchmark.py

7f31a44

gcpy/benchmark.py - databaase -> database - Removed exclamation point at end of successful print message Signed-off-by: Bob Yantosca <[email protected]>

Update comments in 1yr_*_benchmark.yml files

ea8985e

benchmark/1mo_benchmark.yml benchmark/1yr_fullchem_benchmark.yml benchmark/1yr_tt_benchmark.yml - Update comments for the paths & data sections for consistency Signed-off-by: Bob Yantosca <[email protected]>

Fixed typo in benchmark/1yr_tt_benchmark.yml

e17eef8

gcc_vs_gcc:run should be True not False Signed-off-by: Bob Yantosca <[email protected]>

yantosca reviewed Oct 3, 2022

yantosca added 5 commits October 4, 2022 10:48

Bug fix in run_benchmark.py: change exists to os.path.exists

7209fcb

Signed-off-by: Bob Yantosca <[email protected]>

Removed readthedocs.yml

0b4026a

The ReadTheDocs build should pick up the proper packages from the docs/source/requirements.yml file. Signed-off-by: Bob Yantosca <[email protected]>

Updated setup.py with the same package versions as in environnment.yml

d1616e2

setup.py - Now use the same package versions as in environment.yml. These now are specific package versions which should prevent incompatibility errors. Signed-off-by: Bob Yantosca <[email protected]>

yantosca requested a review from laestrada October 4, 2022 18:44

yantosca mentioned this pull request Oct 4, 2022

[FEATURE REQUEST] Specify species database path in benchmark yml files #162

Closed

Updated CHANGELOG.md with recent updates to benchmarking scripts etc.

f0003b9

Signed-off-by: Bob Yantosca <[email protected]>

yantosca added 2 commits October 4, 2022 16:42

Update CHANGELOG.md; plot_driver.sh now requests 4 hours

3c7eb5f

CHANGELOG.md - Add info about garbage collection benchmark/plot_driver.sh - Request 4 hours of run time via SLURM Signed-off-by: Bob Yantosca <[email protected]>

yantosca mentioned this pull request Oct 4, 2022

[BUG/ISSUE] 1-year benchmarking scripts eventually run out of memory? #174

Closed

yantosca linked an issue Oct 4, 2022 that may be closed by this pull request

[BUG/ISSUE] 1-year benchmarking scripts eventually run out of memory? #174

Closed

yantosca reviewed Oct 5, 2022

yantosca approved these changes Oct 5, 2022

yantosca merged commit 3c7eb5f into dev Oct 5, 2022

yantosca deleted the feature/gchp_1yr_benchmark_diff_of_diffs branch October 5, 2022 14:03

yantosca added this to the 1.3.2 milestone May 23, 2023
