Update documentation #365

Merged
merged 10 commits into from
Oct 14, 2022
84 changes: 61 additions & 23 deletions README.rst
@@ -45,42 +45,72 @@
.. |Checked with mypy| image:: http://www.mypy-lang.org/static/mypy_badge.svg
:target: http://mypy-lang.org/

xCDAT is an extension of `xarray`_ for climate data analysis on structured grids. It serves as a spiritual successor to the Community Data Analysis Tools (`CDAT`_) library.
xCDAT is an extension of `xarray`_ for climate data analysis on structured grids. It
serves as the spiritual successor to the Community Data Analysis Tools (`CDAT`_)
library.

The goal of xCDAT is to provide generalizable climate domain features and general utilities in xarray, which includes porting some core CDAT functionalities. xCDAT leverages several powerful libraries in the xarray ecosystem (e.g., `xESMF`_ and `cf_xarray`_) to deliver robust APIs. The xCDAT core team is aiming to provide a maintainable and extensible package that serves the needs of the climate community in the long-term.
The goal of xCDAT is to provide generalizable climate domain features and utilities
that streamline the developer experience for data analysis code. xCDAT's design
philosophy is to reduce the complexity and overhead required by the user to accomplish
specific tasks in xarray. Some xCDAT features are inspired by or ported from core CDAT
functionalities, while others leverage powerful libraries in the xarray ecosystem
(e.g., `xESMF`_ and `cf_xarray`_) to deliver robust APIs.

A major design philosophy of xCDAT is streamlining the user experience while developing code to analyze climate data. This means reducing the complexity and number of lines required to achieve certain features with xarray.
The xCDAT core team's mission is to provide a maintainable and extensible package
that serves the needs of the climate community in the long-term. We are excited
to be working on this project and hope to have you onboard!

.. _xarray: https://github.com/pydata/xarray
.. _CDAT: https://github.com/CDAT/cdat


Getting Started
---------------

The best resource for getting started with xCDAT is via our `documentation page <https://xcdat.readthedocs.io/en/latest/>`__. There, we provide guidance for setting up your environment on your `computer <https://xcdat.readthedocs.io/en/latest/getting-started.html>`_ generally or on an `HPC/Jupyter <https://xcdat.readthedocs.io/en/latest/getting-started-hpc-jupyter.html>`_ environment. We also include an `API Overview <https://xcdat.readthedocs.io/en/latest/api.html>`_ and `Gallery <https://xcdat.readthedocs.io/en/latest/gallery.html>`_ to highlight xCDAT functionality.
The best resource for getting started with xCDAT is our `documentation page`_.
There, we provide general guidance for setting up your Anaconda environment on your
local `computer`_ or on an `HPC/Jupyter`_ environment. We also include an
`API Overview`_ and `Gallery`_ to highlight xCDAT functionality.

xCDAT invites discussion on version releases, architecture, new feature suggestions, and
other topics on the `GitHub Discussions`_ page. Users and contributors can also view and
open issues on our `GitHub Issue Tracker`_.

xCDAT invites discussion on version releases, architecture, new feature suggestions, and other topics on the `GitHub discussion <https://github.com/xCDAT/xcdat/discussions>`_ page. Users and contributors can also view and open issues on our `GitHub Issue Tracker <https://github.com/xCDAT/xcdat/issues>`_.
We welcome and appreciate contributions to xCDAT. If you'd like to help improve xCDAT,
please check out our `Contributing Guide`_.

We welcome and appreciate contributions to xCDAT. If you'd like to help improve xCDAT, please checkout our `Contributing Guide <https://xcdat.readthedocs.io/en/latest/contributing.html>`_.
.. _documentation page: https://xcdat.readthedocs.io/en/latest/
.. _computer: https://xcdat.readthedocs.io/en/latest/getting-started.html
.. _HPC/Jupyter: https://xcdat.readthedocs.io/en/latest/getting-started-hpc-jupyter.html
.. _API Overview: https://xcdat.readthedocs.io/en/latest/api.html
.. _Gallery: https://xcdat.readthedocs.io/en/latest/gallery.html
.. _GitHub Discussions: https://github.com/xCDAT/xcdat/discussions
.. _GitHub Issue Tracker: https://github.com/xCDAT/xcdat/issues
.. _Contributing Guide: https://xcdat.readthedocs.io/en/latest/contributing.html

Stay up to Date with Releases
-----------------------------
xCDAT (released as ``xcdat``) follows a rapid release cycle with continuous
integration/continuous deployment. This means releases are made relatively frequently
based on the importance of commits. The xCDAT core team reviews commits every two weeks
to determine if they warrant a release.
integration/continuous deployment. As a result, releases are performed relatively
frequently based on the priority of commits. The xCDAT core team reviews commits every
two weeks to determine if they warrant a release.

After releases are performed on `GitHub Releases`_, the corresponding ``xcdat`` package
version will be available to download through `conda-forge`_ within 30 minutes to 1
hour.

To be notified of releases through GitHub:

1. Go to the ``xcdat`` repository homepage on GitHub (https://github.com/xCDAT/xcdat)
2. Click the "Watch" button in the upper right-hand corner of the page.

.. image:: /docs/_static/github-watch-releases-1.png
.. image:: docs/_static/github-watch-releases-1.png

3. Click "Custom", check "Releases", then click "Apply".

.. image:: /docs/_static/github-watch-releases-2.png
.. image:: docs/_static/github-watch-releases-2.png

.. _conda-forge: https://anaconda.org/conda-forge/xcdat
.. _GitHub Releases: https://github.com/xCDAT/xcdat/releases

Available Features
------------------
@@ -92,8 +122,8 @@ Available Features
kept if they exist)
* Optional decoding of time coordinates

* In addition to CF time units, also decodes common non-CF time units ("months since ...",
"years since ...")
* In addition to CF time units, also decodes common non-CF time units
("months since ...", "years since ...")

* Optional centering of time coordinates using time bounds
* Optional conversion of longitudinal axis orientation between [0, 360) and [-180, 180)
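Decoding the non-CF units above needs calendar arithmetic rather than a fixed-length offset, because months and years vary in length. The sketch below is a minimal, stdlib-only illustration of that idea, not xCDAT's implementation. It assumes integer offsets, a proleptic Gregorian calendar, and a reference date formatted as ``YYYY-MM-DD``; the helper ``decode_non_cf_time`` is hypothetical.

```python
import calendar
from datetime import date


def decode_non_cf_time(units, offsets):
    """Decode integer offsets in "months since ..." or "years since ..." units.

    Hypothetical helper for illustration only. Assumes a proleptic Gregorian
    calendar and a reference date formatted as YYYY-MM-DD.
    """
    unit, _, ref = units.partition(" since ")
    ref_date = date.fromisoformat(ref.strip()[:10])
    months_per_step = {"months": 1, "years": 12}[unit.strip()]

    decoded = []
    for offset in offsets:
        # Count whole months so that month/year rollover is a single divmod.
        total = ref_date.year * 12 + (ref_date.month - 1) + int(offset) * months_per_step
        year, month_index = divmod(total, 12)
        month = month_index + 1
        # Clamp the day so that, e.g., Jan 31 + 1 month becomes Feb 28/29.
        day = min(ref_date.day, calendar.monthrange(year, month)[1])
        decoded.append(date(year, month, day))
    return decoded
```

Clamping the day keeps results valid when the reference date falls late in a month. Real datasets may also use non-standard calendars such as ``noleap``, which this sketch does not handle.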
@@ -106,12 +136,12 @@ Available Features

* Geospatial weighted averaging

* Support rectilinear grid
* Supports rectilinear grid
* Optional specification of regional domain

* Horizontal structured regridding

* Support rectilinear and cuvilinear grids
* Supports rectilinear and curvilinear grids
* Python implementation of `regrid2`_ for handling cartesian latitude longitude grids
* API that wraps `xESMF`_
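Geospatial weighted averaging on a rectilinear grid is commonly done by weighting each cell by its area on the sphere, which can be computed from the latitude and longitude bounds. The following stdlib-only sketch assumes bounds in degrees and data given as nested ``(lat, lon)`` lists; the function names are hypothetical and this is not a description of xCDAT's internal code.

```python
import math


def cell_area_weights(lat_bounds, lon_bounds):
    """Relative cell-area weights for a rectilinear grid, from bounds in degrees.

    On a sphere, a cell's area is proportional to
    (sin(lat_upper) - sin(lat_lower)) * (lon_upper - lon_lower).
    """
    lat_w = [math.sin(math.radians(hi)) - math.sin(math.radians(lo)) for lo, hi in lat_bounds]
    lon_w = [hi - lo for lo, hi in lon_bounds]
    # Outer product: one weight per (lat, lon) cell.
    return [[wy * wx for wx in lon_w] for wy in lat_w]


def spatial_average(data, lat_bounds, lon_bounds):
    """Area-weighted mean of a 2-D (lat, lon) field given as nested lists."""
    weights = cell_area_weights(lat_bounds, lon_bounds)
    numerator = sum(
        w * v for w_row, v_row in zip(weights, data) for w, v in zip(w_row, v_row)
    )
    denominator = sum(w for w_row in weights for w in w_row)
    return numerator / denominator
```

The sine difference is the exact relative area of a latitude band on a sphere, whereas ``cos(lat)`` at cell centers is an approximation; either way, high-latitude cells contribute less than in an unweighted mean.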

@@ -120,28 +150,29 @@ Planned Features

* Vertical structured regridding

* Support rectilinear and cuvilinear grids
* Support rectilinear and curvilinear grids

Things we are striving for:
---------------------------

* Support for CF compliant, E3SM non-CF compliant, and common metadata
* xCDAT supports CF compliant datasets, but will also strive to support datasets with
common non-CF compliant metadata (e.g., time units in "months since ..." or "years
since ...")

* xCDAT primarily focuses on datasets that follow the `CF convention`_.
* xCDAT leverages `cf_xarray`_ to interpret CF attributes on ``xarray`` objects
* Accomodations for specific non-CF compliant situations will be considered on a
case-by-case basis.
* Refer to `CF Convention`_ for more information on CF attributes

* Robust handling of dimensions and their coordinates and coordinate bounds

* Coordinate variables are retrieved with ``cf_xarray`` using CF axis names or
coordinate names found in xarray object attributes. Refer to :ref:`Metadata Interpretation`.
coordinate names found in xarray object attributes. Refer to `Metadata Interpretation`_
for more information.
* Bounds are retrieved with ``cf_xarray`` using the ``"bounds"`` attr
* Ability to operate on both longitudinal axis orientations, [0, 360) and [-180, 180)

* Support for parallelism using `dask`_ where it is both possible and makes sense
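Operating on both longitude orientations comes down to a modular mapping of the coordinate values, followed by sorting so the axis stays monotonic and the data are reordered to match. A minimal sketch in plain Python, with hypothetical helper names (a real implementation would also need to adjust the longitude bounds):

```python
def to_180(lon):
    """Map a longitude in degrees onto [-180, 180)."""
    return (lon + 180) % 360 - 180


def to_360(lon):
    """Map a longitude in degrees onto [0, 360)."""
    return lon % 360


def reorient(lons, values, orientation=180):
    """Convert longitudes, then sort them so the axis stays monotonic.

    `values` holds one data value per longitude and is reordered to match.
    """
    convert = to_180 if orientation == 180 else to_360
    pairs = sorted(zip((convert(lon) for lon in lons), values), key=lambda p: p[0])
    return [lon for lon, _ in pairs], [v for _, v in pairs]
```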

.. _Metadata Interpretation: docs/faqs.rst#metadata-interpretation
.. _regrid2: https://cdms.readthedocs.io/en/latest/regrid2.html
.. _xESMF: https://pangeo-xesmf.readthedocs.io/en/latest/
.. _dask: https://dask.org/
@@ -151,7 +182,14 @@ Things we are striving for:
Acknowledgement
---------------

This software is jointly developed by scientists and developers from the Energy Exascale Earth System Model (`E3SM`_) Project and Program for Climate Model Diagnosis and Intercomparison (`PCMDI`_). The work is performed for the E3SM project, which is sponsored by Earth System Model Development (`ESMD`_) program, and the Simplifying ESM Analysis Through Standards (SEATS) project, which is sponsored by the Regional and Global Model Analysis (`RGMA`_) program. ESMD and RGMA are programs for the Earth and Environmental Systems Sciences Division (`EESSD`_) in the Office of Biological and Environmental Research (`BER`_) within the `Department of Energy`_'s `Office of Science`_.
This software is jointly developed by scientists and developers from the Energy Exascale
Earth System Model (`E3SM`_) Project and Program for Climate Model Diagnosis and
Intercomparison (`PCMDI`_). The work is performed for the E3SM project, which is
sponsored by the Earth System Model Development (`ESMD`_) program, and the Simplifying ESM
Analysis Through Standards (SEATS) project, which is sponsored by the Regional and
Global Model Analysis (`RGMA`_) program. ESMD and RGMA are programs for the Earth and
Environmental Systems Sciences Division (`EESSD`_) in the Office of Biological and
Environmental Research (`BER`_) within the `Department of Energy`_'s `Office of Science`_.

.. _E3SM: https://e3sm.org/
.. _PCMDI: https://pcmdi.llnl.gov/
5 changes: 3 additions & 2 deletions docs/examples/climatology-and-departures.ipynb
@@ -1,11 +1,12 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Calculating Climatology and Departures from Monthly Time Series\n",
"===============================================================\n",
"Calculating Climatology and Departures from Time Series Data\n",
"============================================================\n",
"\n",
"Author: [Tom Vo](https://github.com/tomvothecoder/)\n",
"\n",
45 changes: 28 additions & 17 deletions docs/faqs.rst
@@ -2,22 +2,23 @@
Frequently Asked Questions
==========================

.. _Metadata Interpretation:

Metadata Interpretation
-----------------------

What types of datasets does ``xcdat`` primarily focus on?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``xcdat`` primarily focuses on datasets that follow the `CF convention`_.
``xcdat`` supports datasets that follow the `CF convention`_, but will also strive
to support datasets with common non-CF compliant metadata (e.g., time units in
"months since ..." or "years since ...").

.. _CF convention: http://cfconventions.org/

How does ``xcdat`` interpret dataset metadata?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``xcdat`` leverages `cf_xarray`_ to interpret CF attributes on ``xarray`` objects.
``xcdat`` methods and functions usually accept an ``axis`` argument (e.g.,
``ds.temporal.average(data_var="ts", axis="T")``). This argument is internally mapped to ``cf_xarray`` mapping tables that interpret the CF attributes.
``ds.temporal.average(data_var="ts", axis="T")``). This argument is internally mapped to
``cf_xarray`` mapping tables that interpret the CF attributes.
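As a rough illustration of how an ``axis`` key such as ``"T"`` can be resolved from CF attributes, the stdlib-only sketch below checks the ``axis``, ``standard_name``, and (for time) ``units`` attributes of each coordinate. The helper ``find_axis_coords`` and its lookup table are hypothetical and deliberately simplified; ``cf_xarray``'s actual mapping tables apply more criteria.

```python
# Simplified lookup table for illustration only (hypothetical).
AXIS_STANDARD_NAMES = {
    "X": {"longitude", "projection_x_coordinate"},
    "Y": {"latitude", "projection_y_coordinate"},
    "T": {"time"},
}


def find_axis_coords(coord_attrs, axis):
    """Return the names of coordinates whose attributes identify them as `axis`.

    `coord_attrs` maps each coordinate name to its attribute dict.
    """
    matches = []
    for name, attrs in coord_attrs.items():
        if attrs.get("axis") == axis:
            matches.append(name)
        elif attrs.get("standard_name") in AXIS_STANDARD_NAMES.get(axis, set()):
            matches.append(name)
        elif axis == "T" and " since " in attrs.get("units", ""):
            # CF time coordinates can be recognized by "<unit> since <date>" units.
            matches.append(name)
    return matches
```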

.. _cf_xarray: https://cf-xarray.readthedocs.io/en/latest/index.html

@@ -51,29 +52,39 @@ What CF attributes are interpreted using ``cf_xarray`` mapping tables?
Data Wrangling
--------------

``xcdat`` aims to implement generalized functionality. This means that functionality intended to handle data quality issues is out of scope, especially for limited cases.
``xcdat`` aims to implement generalized functionality. This means that functionality
intended to handle data quality issues is out of scope, especially for limited cases.

If data quality issues are present, ``xarray`` and ``xcdat`` might not be able to open the datasets.
Examples of data quality issues include conflicting floating point values between files or non-CF compliant attributes.
If data quality issues are present, ``xarray`` and ``xcdat`` might not be able to open
the datasets. Examples of data quality issues include conflicting floating point values
between files or non-CF compliant attributes that are not common.

A few workarounds include:

1. Configuring ``open_dataset()`` or ``open_mfdataset()`` keyword arguments based on your needs.
2. Writing a custom ``preprocess()`` function to feed into ``open_mfdataset()``. This function preprocesses each dataset file individually before joining them into a single Dataset object.

How do I open a multifile dataset with values that conflict?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ``xarray``, the default setting for checking compatibility across a multifile dataset is ``compat='no_conflicts'``.
If conflicting values exists between files, xarray raises ``MergeError: conflicting values for variable <VARIABLE NAME> on objects to be combined. You can skip this check by specifying compat="override".``

If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs), follow the instructions below.
1. Configuring ``open_dataset()`` or ``open_mfdataset()`` keyword arguments based on
your needs.
2. Writing a custom ``preprocess()`` function to feed into ``open_mfdataset()``. This
function preprocesses each dataset file individually before joining them into a single
Dataset object.
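The second workaround can be pictured with a toy model in which plain dicts stand in for ``xarray.Dataset`` objects. The sketch below, with hypothetical function names, shows a preprocess step that rounds latitudes in each file so that tiny floating point differences no longer fail a strict equality check when the files are combined.

```python
# Toy model: each "file" is a dict standing in for an xarray Dataset.
# It mimics the idea behind open_mfdataset(..., preprocess=...); it is not xarray.


def round_lat(ds, ndigits=6):
    """Hypothetical preprocess step: round latitudes to absorb tiny float diffs."""
    out = dict(ds)
    out["lat"] = [round(v, ndigits) for v in ds["lat"]]
    return out


def combine_by_time(datasets, preprocess=None):
    """Concatenate "ts" along time, requiring identical "lat" values in every file."""
    if preprocess is not None:
        # The preprocess function runs on each file individually, before combining.
        datasets = [preprocess(ds) for ds in datasets]
    lat = datasets[0]["lat"]
    ts = []
    for ds in datasets:
        if ds["lat"] != lat:
            raise ValueError("conflicting values for variable 'lat'")
        ts.extend(ds["ts"])
    return {"lat": lat, "ts": ts}
```

With real files, the same idea applies by passing a function that takes and returns an ``xarray.Dataset`` to the ``preprocess`` keyword argument of ``open_mfdataset()``. Rounding is only appropriate when the differences are known to be numerical noise.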

How do I open a multi-file dataset with values that conflict?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ``xarray``, the default setting for checking compatibility across a multi-file dataset
is ``compat='no_conflicts'``. If conflicting values exist between files, xarray raises
``MergeError: conflicting values for variable <VARIABLE NAME> on objects to be combined.
You can skip this check by specifying compat="override".``

If you still intend on working with these datasets and recognize the source of the issue
(e.g., minor floating point diffs), follow the instructions below.
**Please understand the potential implications before proceeding!**

.. code-block:: python

    >>> import xcdat
    >>> ds = xcdat.open_mfdataset("path/to/files/*.nc", compat="override", join="override")

1. ``compat="override"``: skip comparing and pick variable from first dataset
2. ``join="override"``: if indexes are of same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
2. ``join="override"``: if indexes are of same size, rewrite indexes to be those of the
first object with that dimension. Indexes for the same dimension must have the same
size in all objects.
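The implications of the two ``"override"`` options can be seen in a toy model where plain dicts stand in for ``xarray`` objects. It assumes only variables containing the concatenation dimension are concatenated (as with ``data_vars="minimal"``), and the function and variable names are hypothetical. Note how the second file's slightly different latitude and static ``area`` values are silently discarded in favor of the first file's.

```python
def combine_override(datasets):
    """Toy model of open_mfdataset(..., compat="override", join="override").

    Each dataset is a dict with a "lat" index, a time-varying "ts" list of
    rows, and a static "area" variable that is not concatenated.
    """
    first = datasets[0]
    for ds in datasets[1:]:
        # join="override": indexes must have the same size; values are not compared.
        if len(ds["lat"]) != len(first["lat"]):
            raise ValueError("indexes along 'lat' must have the same size")
    return {
        # join="override": every object adopts the first object's "lat" index.
        "lat": list(first["lat"]),
        # Time-varying data is concatenated as usual.
        "ts": [row for ds in datasets for row in ds["ts"]],
        # compat="override": the static variable is taken from the first dataset.
        "area": first["area"],
    }
```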

For more information, visit this page: https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html#xarray-open-mfdataset
4 changes: 3 additions & 1 deletion docs/getting-started.rst
@@ -9,7 +9,8 @@ Prerequisites

1. Familiarity with ``xarray``, since this package is an extension of it

- Please visit the `xarray documentation`_ to get started.
- We highly recommend visiting the `xarray tutorial`_ and `xarray documentation`_
pages if you aren't familiar with ``xarray``.

2. xCDAT is distributed through conda, which is available through Anaconda and Miniconda.

@@ -30,6 +31,7 @@ Prerequisites
restart your shell (if you use bash). Alternatively, you can log out and
log back in.

.. _xarray tutorial: https://tutorial.xarray.dev/intro.html
.. _xarray documentation: https://docs.xarray.dev/en/stable/getting-started-guide/index.html

Installation