Assorted updates for GEOS-Chem 1-year benchmarking -- closes #163 #164
Signed-off-by: Lizzie Lundgren <[email protected]>
…ries The new code makes it easier to configure the scripts so that all benchmark results (GCC vs GCC, GCHP vs GCC, GCHP vs GCHP, and GCHP vs GCC diff-of-diffs) go into a single results directory rather than being spread across the GCC and GCHP dev directories. Signed-off-by: Lizzie Lundgren <[email protected]>
This code needs more work since it causes the plotting to crash in some instances, such as multiple times. Signed-off-by: Lizzie Lundgren <[email protected]>
This is a requirement for using the resultant xarray dataset as an argument in sparselt for regridding, if the reader uses open_mfdataset to open the file. Using open_mfdataset results in dask arrays, which are not compatible with sparselt. This update also includes minor updates to doc code in benchmark.py, as well as a commented-out block for not making benchmark concentration plots in parallel. Signed-off-by: Lizzie Lundgren <[email protected]>
…diffs This update uses the sparselt package to perform regridding. Regridding weights must be computed in advance. Currently the regrid file is assumed to be in weightsdir and has a hard-coded filename. Using this code should only be necessary when comparing two ref datasets or two dev datasets at different resolutions, which is not typical in benchmarking. Signed-off-by: Lizzie Lundgren <[email protected]>
…hmark Signed-off-by: Lizzie Lundgren <[email protected]>
Having separate directories for GCC vs GCC and the GCHP comparisons is still possible by editing run_1yr_fullchem_benchmark.py variables base_gcc_resultsdir and base_gchp_resultsdir. Signed-off-by: Lizzie Lundgren <[email protected]>
# Conflicts: # benchmark/modules/run_1yr_fullchem_benchmark.py Signed-off-by: Lizzie Lundgren <[email protected]>
This update fixes problems generating the mass tables in the 1-year benchmark simulations. However, it causes a problem with the Ox budget table generation, which needs to be looked at later. The Ox budget code assumes the restart path is the full path rather than retrieving it from get_filepath(s). Signed-off-by: Lizzie Lundgren <[email protected]>
…code Signed-off-by: Lizzie Lundgren <[email protected]>
Benchmarks include computing OH metrics for both GCHP and GC-Classic. Signed-off-by: Lizzie Lundgren <[email protected]>
…e 1x1.25 This is a temporary work-around for cubed-sphere to cubed-sphere regridding not currently working in the benchmark plotting code. It is only applicable to GCHP vs GCHP 1-yr full chemistry benchmarks because the upcoming benchmark will compare C24 with C48. It is also only applicable to level plots (surface and 500 hPa). Signed-off-by: Lizzie Lundgren <[email protected]>
…functions
gcpy/util.py
- Remove hardwired restart folder file paths in get_filepath() and get_filepaths()
benchmark/1mo_benchmark.yml
benchmark/1yr_fullchem_benchmark.yml
benchmark/1yr_tt_benchmark.yml
- Rename "subdir" tag to "outputs_subdir"
- Added "restarts_subdir" tags
benchmark/modules/run_1yr_fullchem_benchmark.py
benchmark/modules/run_1yr_tt_benchmark.py
benchmark/run_benchmark.py
- Now use "outputs_subdir" to construct paths to the GEOS-Chem OutputDir folders
- Now use "restarts_subdir" to construct paths to the GEOS-Chem restart file folders
Signed-off-by: Bob Yantosca <[email protected]>
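For readers following along, here is a rough sketch of the kind of path construction this commit describes. Only the tag names "outputs_subdir" and "restarts_subdir" (and the folder names OutputDir and Restarts) come from this PR; the config layout, the other keys, and the helper `build_data_dirs` are assumptions made up for illustration, not the actual GCPy code.

```python
import os

# Hypothetical config fragment, shaped like what a YAML loader might return.
# Only the "outputs_subdir" and "restarts_subdir" tag names come from the
# commit message; every other key and value here is an assumption.
config = {
    "main_dir": "/path/to/benchmarks",
    "dev": {
        "gcc": {
            "dir": "GCC_dev",
            "outputs_subdir": "OutputDir",
            "restarts_subdir": "Restarts",
        },
    },
}


def build_data_dirs(main_dir, model_cfg):
    """Return (outputs_dir, restarts_dir) for one model entry (illustrative)."""
    run_dir = os.path.join(main_dir, model_cfg["dir"])
    outputs_dir = os.path.join(run_dir, model_cfg["outputs_subdir"])
    restarts_dir = os.path.join(run_dir, model_cfg["restarts_subdir"])
    return outputs_dir, restarts_dir


outputs_dir, restarts_dir = build_data_dirs(
    config["main_dir"], config["dev"]["gcc"]
)
print(outputs_dir)
print(restarts_dir)
```

Keeping both subdirectory names in the config means a run directory with a different layout only needs a YAML edit rather than a code change.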
gcpy/benchmark.py
- In routine make_column_aod_plots, we set a variable quiet = not verbose, but the verbose argument is not used. This causes verbose output to be printed to the screen. To fix this, we now set verbose=False in the argument list.
Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark.py
- Remove duplicate definition of verbose=False
- Make sure that compare_varnames uses quiet=(not verbose) in all routines where verbose is passed
- Added verbose=False keyword in make_benchmark_operations_budget
Signed-off-by: Bob Yantosca <[email protected]>
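The verbose/quiet pattern from the two commits above can be sketched roughly as follows. The function `compare_varnames_sketch` is a simplified stand-in written for this example and does not reproduce the real compare_varnames signature or return value; `make_plots_sketch` is likewise hypothetical.

```python
def compare_varnames_sketch(refdata, devdata, quiet=False):
    """Simplified stand-in: report names found in only one of the inputs."""
    only_ref = sorted(set(refdata) - set(devdata))
    only_dev = sorted(set(devdata) - set(refdata))
    if not quiet:
        print(f"Only in ref: {only_ref}")
        print(f"Only in dev: {only_dev}")
    return only_ref, only_dev


def make_plots_sketch(refdata, devdata, verbose=False):
    """Hypothetical caller that forwards its verbosity to the helper."""
    # verbose is an explicit keyword with a default, so "not verbose" is
    # always defined and the helper stays silent unless asked otherwise.
    return compare_varnames_sketch(refdata, devdata, quiet=(not verbose))


result = make_plots_sketch(["O3", "CO"], ["O3", "NO"])
```

With verbose left at its default of False, the call above prints nothing and only returns the two lists.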
These updates were rendered obsolete, as we can now specify the outputs_subdir and restarts_subdir for gcc/gchp in the YAML files.
benchmark/1mo_benchmark.yml
- Remove "is_pre_14.0" tag from ref:gcc and dev:gcc entries
benchmark/run_benchmark.py:
- Remove gcc_is_pre_14.0 from calls to get_filepath
gcpy/util.py
- Remove gcc_is_pre_14_0 argument
Signed-off-by: Bob Yantosca <[email protected]>
I am marking this PR for review. @yantosca is concurrently working on benchmark code in GCPy for the upcoming GEOS-Chem 14.0 benchmark, and may still push to this branch. I am marking him as reviewer so that this does not get merged until he gives the ok.
gcpy/benchmark.py
- Add new function get_species_database_dir, which takes in a config object and returns the path to the directory where the species database file (species_database.yml) is located.
benchmark/modules/run_1yr_fullchem_benchmark.py
benchmark/modules/run_1yr_tt_benchmark.py
benchmark/run_benchmark.py
- Now call get_species_database_dir to get the spcdb_dir variable
- Replace "gchp_metname" with "StateMet"
Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark.py
- If we are successful at locating the species database file, then print a message. (We had composed the message as an f-string but had never printed it).
Signed-off-by: Bob Yantosca <[email protected]>
benchmark/1mo_benchmark.yml
benchmark/1yr_fullchem_benchmark.yml
benchmark/1yr_tt_benchmark.yml
- Change "spcdb_dir: None" to "spcdb_dir: default". None is an allowable Python keyword but not an allowable YAML keyword.
- Also added &--- YAML headers
Signed-off-by: Bob Yantosca <[email protected]>
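To illustrate how a "default" setting for spcdb_dir might be resolved, here is a minimal sketch. It is not the actual get_species_database_dir implementation: the helper `locate_spcdb_dir`, its arguments, and the search order are assumptions. Only the file name species_database.yml and the "default" keyword come from this PR.

```python
import os
import tempfile


def locate_spcdb_dir(spcdb_dir, dev_dirs, filename="species_database.yml"):
    """Hypothetical helper: resolve the folder holding the species database."""
    # An explicit path is used as given.
    if spcdb_dir != "default":
        return spcdb_dir
    # "default": take the first candidate Dev folder that contains the file.
    for folder in dev_dirs:
        if os.path.isfile(os.path.join(folder, filename)):
            print(f"Found {filename} in {folder}")
            return folder
    raise FileNotFoundError(f"Could not find {filename} in: {dev_dirs}")


# Usage with a throwaway directory standing in for a Dev run directory.
with tempfile.TemporaryDirectory() as dev_dir:
    db_path = os.path.join(dev_dir, "species_database.yml")
    with open(db_path, "w", encoding="utf-8"):
        pass  # create an empty placeholder file
    found = locate_spcdb_dir("default", ["/no/such/dir", dev_dir])
    explicit = locate_spcdb_dir("/my/spcdb", [dev_dir])
```

Using a plain string such as "default" as the sentinel avoids relying on how a YAML loader interprets the text None.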
Also added:
gcpy/benchmark.py
- databaase -> database
- Removed exclamation point at end of successful print message
Signed-off-by: Bob Yantosca <[email protected]>
benchmark/1mo_benchmark.yml
benchmark/1yr_fullchem_benchmark.yml
benchmark/1yr_tt_benchmark.yml
- Update comments for the paths & data sections for consistency
Signed-off-by: Bob Yantosca <[email protected]>
gcc_vs_gcc:run should be True not False Signed-off-by: Bob Yantosca <[email protected]>
Some more updates are needed.
@@ -1,12 +1,13 @@
-List of GCPy developers (30 Oct 2020)
+List of GCPy developers (29 Sep 2022)
This just updates the list of GCPy authors in advance of version 1.3.0.
@@ -2,7 +2,7 @@ License Agreement for GCPy and related developments
 (The MIT "Expat" License, http://opensource.org/licenses/MIT)
 ==============================================================================

-Copyright (c) 2017-2020 GCPy Developers
+Copyright (c) 2017-2022 GCPy Developers
Updated the end date to 2022
promote products or services of Licensee, or any third party.

License agreement for matplotlib versions 1.3.0 and later
=========================================================
Added a couple more license agreements for 3rd-party packages.
@@ -10,49 +10,73 @@
# to gcc_dev (not gcc_ref!). This ensures consistency in version names
# when doing GCHP vs GCC diff-of-diffs (mps, 6/27/19)
# =====================================================================
# configuration for 1 month benchmark
#
Grouped comments together for easier readability
benchmark/1mo_benchmark.yml
# main_dir: High-level directory containing subdirectories with dat
# results_dir: Directory where plots/tables will be created
# weights_dir: Path to regridding weights
# spcdb_dir: Path to species_database.yml. If equal to None, will
You can now specify the path to the species_database.yml file, or use "default" to tell the benchmark scripts to look in one of the Dev folders.
@@ -1646,10 +1627,9 @@ def get_filepath(
 col,
 date,
 is_gchp=False,
-gchp_res="00",
+gchp_res="c00",
Now include the "c" in GCHP resolution string default values.
@@ -1669,8 +1649,10 @@ def get_filepath(
 Set this switch to True to obtain file pathnames to
 GCHP diagnostic data files. If False, assumes GEOS-Chem "Classic"

-gchp_res: int
-Cubed-sphere resolution of GCHP data grid. Only needed for restart files.
+gchp_res: str
Updated comment to denote gchp_res is now of type str.
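Since gchp_res is now a string that carries the "c" prefix, callers that still hold an integer or a bare number string may want to normalize it first. The helper below is purely hypothetical (it is not part of this PR or of GCPy) and only sketches one way to do that.

```python
def normalize_gchp_res(res):
    """Hypothetical helper: return a cubed-sphere resolution string like 'c48'."""
    text = str(res).strip().lower()
    # Prepend the "c" only when the caller passed a bare number.
    if not text.startswith("c"):
        text = "c" + text
    return text


print(normalize_gchp_res(48))
print(normalize_gchp_res("00"))
print(normalize_gchp_res("C24"))
```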
data_list.add(trimmed_path)

# Read next line
# Open file
Pylint identified that we should use "with" when opening a file for reading.
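As a sketch of the pattern Pylint is asking for, the snippet below reads a file inside a "with" block so that it is closed even if an error occurs. The names data_list and trimmed_path are taken from the diff hunk above; the function `read_unique_paths`, the file contents, and the line handling are assumptions for illustration and not the actual GCPy routine.

```python
import os
import tempfile


def read_unique_paths(filename):
    """Illustrative only: collect the unique, non-empty lines of a text file."""
    data_list = set()
    # The context manager closes the file automatically on exit.
    with open(filename, "r", encoding="utf-8") as ifile:
        for line in ifile:
            trimmed_path = line.strip()
            if trimmed_path:
                data_list.add(trimmed_path)
    return data_list


# Usage with a throwaway file containing a duplicate and a blank line.
with tempfile.TemporaryDirectory() as tmpdir:
    log_file = os.path.join(tmpdir, "paths.txt")
    with open(log_file, "w", encoding="utf-8") as ofile:
        ofile.write("a/b.nc\n\na/b.nc\nc/d.nc\n")
    unique_paths = read_unique_paths(log_file)
```

Iterating over the file object also removes the need for a manual "read next line" step.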
setup.py
@@ -92,8 +92,9 @@ def _write_version_file():
 packages = find_packages(),
 include_package_data=True,
 install_requires=["xesmf>=0.2.1", "scipy>=1.3.1", "Cartopy>=0.17.0", "pandas>=0.25.1",
-"matplotlib>=3.1.1", "tabulate>=0.8.3", "joblib>=0.17.0", "xbpch>=0.3.5",
-"numpy>=1.19.1", "PyPDF2>=1.26.0", "sphinx", "sphinx_rtd_theme", "sphinx-autoapi"],
+"matplotlib>=3.1.1", "tabulate>=0.8.3", "joblib>=0.17.0", "xbpch>=0.3.5",
Note, needs to be updated w/ new package versions
self.devrstdir,
"GEOSChem.Restart.{}*.nc4".format(self.y1_str)
)
# Initial restart file
Now use get_filepath to get restart file paths
Signed-off-by: Bob Yantosca <[email protected]>
benchmark/1mo_benchmark.yml
- Change GCC restarts_subdir tags to "Restarts", as this is the new name of the restarts subdirectory going forward (not "restarts")
benchmark/1yr_fullchem_benchmark.yml
benchmark/1yr_tt_benchmark.yml
- Add settings for current gcpy_test_data (with comments that these can be edited if needed)
Signed-off-by: Bob Yantosca <[email protected]>
The ReadTheDocs build should pick up the proper packages from the docs/source/requirements.yml file. Signed-off-by: Bob Yantosca <[email protected]>
setup.py
- Now use the same package versions as in environment.yml. These now are specific package versions which should prevent incompatibility errors.
Signed-off-by: Bob Yantosca <[email protected]>
benchmark/1mo_benchmark.yml
benchmark/1yr_fullchem_benchmark.yml
benchmark/1yr_tt_benchmark.yml
- Remove gcc:is_pre_14.0. This is made obsolete by the fix to the get_filepath and get_filepaths routines.
Signed-off-by: Bob Yantosca <[email protected]>
Also note the changelog updates in the Unreleased section (Added, Changed).
gcpy/benchmark.py
gcpy/budget_ox.py
gcpy/budget_tt.py
gcpy/ste_flux.py
- At the end of each method, manually delete the larger objects (such as xarray Datasets) and call gc.collect() to force garbage collection. This should hopefully prevent problems with the benchmark scripts halting due to exceeding requested memory (which seems to happen if all 1-year benchmark artifacts are requested).
Signed-off-by: Bob Yantosca <[email protected]>
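The cleanup described in this commit follows a simple pattern, sketched below with a plain Python list standing in for a large xarray Dataset. The function `summarize_sketch` and its contents are hypothetical; only the "delete, then call gc.collect()" step reflects the commit message.

```python
import gc


def summarize_sketch(values):
    """Illustrative routine that frees a large temporary before returning."""
    # Stand-in for a large object such as an xarray Dataset.
    big = [v * 2.0 for v in values]
    total = sum(big)

    # Drop the reference, then ask the collector to run right away
    # instead of waiting for its next automatic pass.
    del big
    gc.collect()
    return total


total = summarize_sketch(range(5))
print(total)
```

Note that del only removes the local name; memory is released once no other references to the object remain, so this helps most when the large object is not also held by a caller.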
CHANGELOG.md
- Add info about garbage collection
benchmark/plot_driver.sh
- Request 4 hours of run time via SLURM
Signed-off-by: Bob Yantosca <[email protected]>
@@ -6,7 +6,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## Unreleased
### Added
Updated the changelog for 1.3.0
I approve of these changes, and we can merge into Dev.
Note, there is still a memory issue (see #174) that I believe is related to joblib. The workaround is to submit gcc_vs_gcc, gchp_vs_gcc, gchp_vs_gchp, and gchp_vs_gcc_diff_of_diffs as separate jobs (at least for the 1-year benchmarks).
This update includes the implementation of diff-of-diff plots for 1-year GCHP benchmarks, among other things. Here is a list of the updates: