From d26894bdc3d80acd117a16a528663bcdf26bab32 Mon Sep 17 00:00:00 2001 From: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> Date: Tue, 22 Feb 2022 19:49:52 -0800 Subject: [PATCH] Move Zarr up in io.rst (#6289) * Move Zarr up in io.rst The existing version had it right down the page, below Iris / Pickle / et al. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- doc/contributing.rst | 4 +- doc/user-guide/io.rst | 858 ++++++++++++++++++------------------- doc/whats-new.rst | 4 +- xarray/core/computation.py | 2 +- 4 files changed, 434 insertions(+), 434 deletions(-) diff --git a/doc/contributing.rst b/doc/contributing.rst index df279caa54f..0913702fd83 100644 --- a/doc/contributing.rst +++ b/doc/contributing.rst @@ -274,13 +274,13 @@ Some other important things to know about the docs: .. ipython:: python x = 2 - x ** 3 + x**3 will be rendered as:: In [1]: x = 2 - In [2]: x ** 3 + In [2]: x**3 Out[2]: 8 Almost all code examples in the docs are run (and the output saved) during the diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst index 28eeeeda99b..834e9ad2464 100644 --- a/doc/user-guide/io.rst +++ b/doc/user-guide/io.rst @@ -498,596 +498,596 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set: Note that this produces a file that is likely to be not readable by other netCDF libraries! -.. _io.iris: +.. _io.zarr: -Iris +Zarr ---- -The Iris_ tool allows easy reading of common meteorological and climate model formats -(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very -similar to ``DataArray`` objects, while enforcing a CF-compliant data model. If iris is -installed, xarray can convert a ``DataArray`` into a ``Cube`` using -:py:meth:`DataArray.to_iris`: - -.. ipython:: python +`Zarr`_ is a Python package that provides an implementation of chunked, compressed, +N-dimensional arrays. +Zarr has the ability to store arrays in a range of ways, including in memory, +in files, and in cloud-based object storage such as `Amazon S3`_ and +`Google Cloud Storage`_. +Xarray's Zarr backend allows xarray to leverage these capabilities, including +the ability to store and analyze datasets far too large fit onto disk +(particularly :ref:`in combination with dask `). - da = xr.DataArray( - np.random.rand(4, 5), - dims=["x", "y"], - coords=dict(x=[10, 20, 30, 40], y=pd.date_range("2000-01-01", periods=5)), - ) +Xarray can't open just any zarr dataset, because xarray requires special +metadata (attributes) describing the dataset dimensions and coordinates. +At this time, xarray can only open zarr datasets that have been written by +xarray. For implementation details, see :ref:`zarr_encoding`. - cube = da.to_iris() - cube +To write a dataset with zarr, we use the :py:meth:`Dataset.to_zarr` method. -Conversely, we can create a new ``DataArray`` object from a ``Cube`` using -:py:meth:`DataArray.from_iris`: +To write to a local directory, we pass a path to a directory: .. ipython:: python + :suppress: - da_cube = xr.DataArray.from_iris(cube) - da_cube + ! rm -rf path/to/directory.zarr +.. ipython:: python -.. 
_Iris: https://scitools.org.uk/iris + ds = xr.Dataset( + {"foo": (("x", "y"), np.random.rand(4, 5))}, + coords={ + "x": [10, 20, 30, 40], + "y": pd.date_range("2000-01-01", periods=5), + "z": ("x", list("abcd")), + }, + ) + ds.to_zarr("path/to/directory.zarr") +(The suffix ``.zarr`` is optional--just a reminder that a zarr store lives +there.) If the directory does not exist, it will be created. If a zarr +store is already present at that path, an error will be raised, preventing it +from being overwritten. To override this behavior and overwrite an existing +store, add ``mode='w'`` when invoking :py:meth:`~Dataset.to_zarr`. -OPeNDAP -------- +To store variable length strings, convert them to object arrays first with +``dtype=object``. -Xarray includes support for `OPeNDAP`__ (via the netCDF4 library or Pydap), which -lets us access large datasets over HTTP. +To read back a zarr dataset that has been created this way, we use the +:py:func:`open_zarr` method: -__ https://www.opendap.org/ +.. ipython:: python -For example, we can open a connection to GBs of weather data produced by the -`PRISM`__ project, and hosted by `IRI`__ at Columbia: + ds_zarr = xr.open_zarr("path/to/directory.zarr") + ds_zarr -__ https://www.prism.oregonstate.edu/ -__ https://iri.columbia.edu/ +Cloud Storage Buckets +~~~~~~~~~~~~~~~~~~~~~ -.. ipython source code for this section - we don't use this to avoid hitting the DAP server on every doc build. +It is possible to read and write xarray datasets directly from / to cloud +storage buckets using zarr. This example uses the `gcsfs`_ package to provide +an interface to `Google Cloud Storage`_. - remote_data = xr.open_dataset( - 'http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods', - decode_times=False) - tmax = remote_data.tmax[:500, ::3, ::3] - tmax +From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you +automatically when reading, such that you can open a dataset in a single +call. You should include any arguments to the storage backend as the +key ``storage_options``, part of ``backend_kwargs``. - @savefig opendap-prism-tmax.png - tmax[0].plot() +.. code:: python -.. ipython:: - :verbatim: + ds_gcs = xr.open_dataset( + "gcs:///path.zarr", + backend_kwargs={ + "storage_options": {"project": "", "token": None} + }, + engine="zarr", + ) - In [3]: remote_data = xr.open_dataset( - ...: "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods", - ...: decode_times=False, - ...: ) - In [4]: remote_data - Out[4]: - - Dimensions: (T: 1422, X: 1405, Y: 621) - Coordinates: - * X (X) float32 -125.0 -124.958 -124.917 -124.875 -124.833 -124.792 -124.75 ... - * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 -772.5 -771.5 ... - * Y (Y) float32 49.9167 49.875 49.8333 49.7917 49.75 49.7083 49.6667 49.625 ... - Data variables: - ppt (T, Y, X) float64 ... - tdmean (T, Y, X) float64 ... - tmax (T, Y, X) float64 ... - tmin (T, Y, X) float64 ... - Attributes: - Conventions: IRIDL - expires: 1375315200 +This also works with ``open_mfdataset``, allowing you to pass a list of paths or +a URL to be interpreted as a glob string. -.. TODO: update this example to show off decode_cf? +For older versions, and for writing, you must explicitly set up a ``MutableMapping`` +instance and pass this, as follows: -.. note:: +.. code:: python - Like many real-world datasets, this dataset does not entirely follow - `CF conventions`_. Unexpected formats will usually cause xarray's automatic - decoding to fail. 
The way to work around this is to either set - ``decode_cf=False`` in ``open_dataset`` to turn off all use of CF - conventions, or by only disabling the troublesome parser. - In this case, we set ``decode_times=False`` because the time axis here - provides the calendar attribute in a format that xarray does not expect - (the integer ``360`` instead of a string like ``'360_day'``). + import gcsfs -We can select and slice this data any number of times, and nothing is loaded -over the network until we look at particular values: + fs = gcsfs.GCSFileSystem(project="", token=None) + gcsmap = gcsfs.mapping.GCSMap("", gcs=fs, check=True, create=False) + # write to the bucket + ds.to_zarr(store=gcsmap) + # read it back + ds_gcs = xr.open_zarr(gcsmap) -.. ipython:: - :verbatim: +(or use the utility function ``fsspec.get_mapper()``). - In [4]: tmax = remote_data["tmax"][:500, ::3, ::3] +.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/ +.. _Zarr: https://zarr.readthedocs.io/ +.. _Amazon S3: https://aws.amazon.com/s3/ +.. _Google Cloud Storage: https://cloud.google.com/storage/ +.. _gcsfs: https://github.com/fsspec/gcsfs - In [5]: tmax - Out[5]: - - [48541500 values with dtype=float64] - Coordinates: - * Y (Y) float32 49.9167 49.7917 49.6667 49.5417 49.4167 49.2917 ... - * X (X) float32 -125.0 -124.875 -124.75 -124.625 -124.5 -124.375 ... - * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 ... - Attributes: - pointwidth: 120 - standard_name: air_temperature - units: Celsius_scale - expires: 1443657600 +Zarr Compressors and Filters +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - # the data is downloaded automatically when we make the plot - In [6]: tmax[0].plot() +There are many different options for compression and filtering possible with +zarr. These are described in the +`zarr documentation `_. +These options can be passed to the ``to_zarr`` method as variable encoding. +For example: -.. image:: ../_static/opendap-prism-tmax.png +.. ipython:: python + :suppress: -Some servers require authentication before we can access the data. For this -purpose we can explicitly create a :py:class:`backends.PydapDataStore` -and pass in a `Requests`__ session object. For example for -HTTP Basic authentication:: + ! rm -rf foo.zarr - import xarray as xr - import requests +.. ipython:: python - session = requests.Session() - session.auth = ('username', 'password') + import zarr - store = xr.backends.PydapDataStore.open('http://example.com/data', - session=session) - ds = xr.open_dataset(store) + compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2) + ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}}) -`Pydap's cas module`__ has functions that generate custom sessions for -servers that use CAS single sign-on. For example, to connect to servers -that require NASA's URS authentication:: +.. note:: - import xarray as xr - from pydata.cas.urs import setup_session + Not all native zarr compression and filtering options have been tested with + xarray. - ds_url = 'https://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/example.nc' +.. _io.zarr.consolidated_metadata: - session = setup_session('username', 'password', check_url=ds_url) - store = xr.backends.PydapDataStore.open(ds_url, session=session) +Consolidated Metadata +~~~~~~~~~~~~~~~~~~~~~ - ds = xr.open_dataset(store) +Xarray needs to read all of the zarr metadata when it opens a dataset. +In some storage mediums, such as with cloud object storage (e.g. 
amazon S3), +this can introduce significant overhead, because two separate HTTP calls to the +object store must be made for each variable in the dataset. +As of xarray version 0.18, xarray by default uses a feature called +*consolidated metadata*, storing all metadata for the entire dataset with a +single key (by default called ``.zmetadata``). This typically drastically speeds +up opening the store. (For more information on this feature, consult the +`zarr docs `_.) -__ https://docs.python-requests.org -__ https://www.pydap.org/en/latest/client.html#authentication +By default, xarray writes consolidated metadata and attempts to read stores +with consolidated metadata, falling back to use non-consolidated metadata for +reads. Because this fall-back option is so much slower, xarray issues a +``RuntimeWarning`` with guidance when reading with consolidated metadata fails: -.. _io.pickle: + Failed to open Zarr store with consolidated metadata, falling back to try + reading non-consolidated metadata. This is typically much slower for + opening a dataset. To silence this warning, consider: -Pickle ------- + 1. Consolidating metadata in this existing store with + :py:func:`zarr.consolidate_metadata`. + 2. Explicitly setting ``consolidated=False``, to avoid trying to read + consolidate metadata. + 3. Explicitly setting ``consolidated=True``, to raise an error in this case + instead of falling back to try reading non-consolidated metadata. -The simplest way to serialize an xarray object is to use Python's built-in pickle -module: +.. _io.zarr.appending: -.. ipython:: python +Appending to existing Zarr stores +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - import pickle +Xarray supports several ways of incrementally writing variables to a Zarr +store. These options are useful for scenarios when it is infeasible or +undesirable to write your entire dataset at once. - # use the highest protocol (-1) because it is way faster than the default - # text based pickle format - pkl = pickle.dumps(ds, protocol=-1) +.. tip:: - pickle.loads(pkl) + If you can load all of your data into a single ``Dataset`` using dask, a + single call to ``to_zarr()`` will write all of your data in parallel. -Pickling is important because it doesn't require any external libraries -and lets you use xarray objects with Python modules like -:py:mod:`multiprocessing` or :ref:`Dask `. However, pickling is -**not recommended for long-term storage**. +.. warning:: -Restoring a pickle requires that the internal structure of the types for the -pickled data remain unchanged. Because the internal design of xarray is still -being refined, we make no guarantees (at this point) that objects pickled with -this version of xarray will work in future versions. - -.. note:: + Alignment of coordinates is currently not checked when modifying an + existing Zarr store. It is up to the user to ensure that coordinates are + consistent. - When pickling an object opened from a NetCDF file, the pickle file will - contain a reference to the file on disk. If you want to store the actual - array values, load it into memory first with :py:meth:`Dataset.load` - or :py:meth:`Dataset.compute`. +To add or overwrite entire variables, simply call :py:meth:`~Dataset.to_zarr` +with ``mode='a'`` on a Dataset containing the new variables, passing in an +existing Zarr store or path to a Zarr store. -.. _dictionary io: +To resize and then append values along an existing dimension in a store, set +``append_dim``. 
This is a good option if data always arives in a particular +order, e.g., for time-stepping a simulation: -Dictionary ----------- +.. ipython:: python + :suppress: -We can convert a ``Dataset`` (or a ``DataArray``) to a dict using -:py:meth:`Dataset.to_dict`: + ! rm -rf path/to/directory.zarr .. ipython:: python - d = ds.to_dict() - d + ds1 = xr.Dataset( + {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))}, + coords={ + "x": [10, 20, 30, 40], + "y": [1, 2, 3, 4, 5], + "t": pd.date_range("2001-01-01", periods=2), + }, + ) + ds1.to_zarr("path/to/directory.zarr") + ds2 = xr.Dataset( + {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))}, + coords={ + "x": [10, 20, 30, 40], + "y": [1, 2, 3, 4, 5], + "t": pd.date_range("2001-01-03", periods=2), + }, + ) + ds2.to_zarr("path/to/directory.zarr", append_dim="t") -We can create a new xarray object from a dict using -:py:meth:`Dataset.from_dict`: +Finally, you can use ``region`` to write to limited regions of existing arrays +in an existing Zarr store. This is a good option for writing data in parallel +from independent processes. + +To scale this up to writing large datasets, the first step is creating an +initial Zarr store without writing all of its array data. This can be done by +first creating a ``Dataset`` with dummy values stored in :ref:`dask `, +and then calling ``to_zarr`` with ``compute=False`` to write only metadata +(including ``attrs``) to Zarr: .. ipython:: python + :suppress: - ds_dict = xr.Dataset.from_dict(d) - ds_dict + ! rm -rf path/to/directory.zarr -Dictionary support allows for flexible use of xarray objects. It doesn't -require external libraries and dicts can easily be pickled, or converted to -json, or geojson. All the values are converted to lists, so dicts might -be quite large. +.. ipython:: python -To export just the dataset schema without the data itself, use the -``data=False`` option: + import dask.array + + # The values of this dask array are entirely irrelevant; only the dtype, + # shape and chunks are used + dummies = dask.array.zeros(30, chunks=10) + ds = xr.Dataset({"foo": ("x", dummies)}) + path = "path/to/directory.zarr" + # Now we write the metadata without computing any array values + ds.to_zarr(path, compute=False) + +Now, a Zarr store with the correct variable shapes and attributes exists that +can be filled out by subsequent calls to ``to_zarr``. The ``region`` provides a +mapping from dimension names to Python ``slice`` objects indicating where the +data should be written (in index space, not coordinate space), e.g., .. ipython:: python - ds.to_dict(data=False) + # For convenience, we'll slice a single dataset, but in the real use-case + # we would create them separately possibly even from separate processes. + ds = xr.Dataset({"foo": ("x", np.arange(30))}) + ds.isel(x=slice(0, 10)).to_zarr(path, region={"x": slice(0, 10)}) + ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": slice(10, 20)}) + ds.isel(x=slice(20, 30)).to_zarr(path, region={"x": slice(20, 30)}) -This can be useful for generating indices of dataset contents to expose to -search indices or other automated data discovery tools. +Concurrent writes with ``region`` are safe as long as they modify distinct +chunks in the underlying Zarr arrays (or use an appropriate ``lock``). -.. ipython:: python - :suppress: +As a safety check to make it harder to inadvertently override existing values, +if you set ``region`` then *all* variables included in a Dataset must have +dimensions included in ``region``. 
Other variables (typically coordinates) +need to be explicitly dropped and/or written in a separate calls to ``to_zarr`` +with ``mode='a'``. - import os +.. _io.iris: - os.remove("saved_on_disk.nc") +Iris +---- -.. _io.rasterio: +The Iris_ tool allows easy reading of common meteorological and climate model formats +(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very +similar to ``DataArray`` objects, while enforcing a CF-compliant data model. If iris is +installed, xarray can convert a ``DataArray`` into a ``Cube`` using +:py:meth:`DataArray.to_iris`: -Rasterio --------- +.. ipython:: python -GeoTIFFs and other gridded raster datasets can be opened using `rasterio`_, if -rasterio is installed. Here is an example of how to use -:py:func:`open_rasterio` to read one of rasterio's `test files`_: + da = xr.DataArray( + np.random.rand(4, 5), + dims=["x", "y"], + coords=dict(x=[10, 20, 30, 40], y=pd.date_range("2000-01-01", periods=5)), + ) -.. deprecated:: 0.20.0 + cube = da.to_iris() + cube - Deprecated in favor of rioxarray. - For information about transitioning, see: - https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html +Conversely, we can create a new ``DataArray`` object from a ``Cube`` using +:py:meth:`DataArray.from_iris`: -.. ipython:: - :verbatim: +.. ipython:: python - In [7]: rio = xr.open_rasterio("RGB.byte.tif") + da_cube = xr.DataArray.from_iris(cube) + da_cube - In [8]: rio - Out[8]: - - [1703814 values with dtype=uint8] - Coordinates: - * band (band) int64 1 2 3 - * y (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ... - * x (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ... - Attributes: - res: (300.0379266750948, 300.041782729805) - transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.041782729805, 28... - is_tiled: 0 - crs: +init=epsg:32618 +.. _Iris: https://scitools.org.uk/iris -The ``x`` and ``y`` coordinates are generated out of the file's metadata -(``bounds``, ``width``, ``height``), and they can be understood as cartesian -coordinates defined in the file's projection provided by the ``crs`` attribute. -``crs`` is a PROJ4 string which can be parsed by e.g. `pyproj`_ or rasterio. -See :ref:`/examples/visualization_gallery.ipynb#Parsing-rasterio-geocoordinates` -for an example of how to convert these to longitudes and latitudes. +OPeNDAP +------- -Additionally, you can use `rioxarray`_ for reading in GeoTiff, netCDF or other -GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIFF. -`rioxarray`_ can also handle geospatial related tasks such as re-projecting and clipping. +Xarray includes support for `OPeNDAP`__ (via the netCDF4 library or Pydap), which +lets us access large datasets over HTTP. -.. ipython:: - :verbatim: +__ https://www.opendap.org/ - In [1]: import rioxarray +For example, we can open a connection to GBs of weather data produced by the +`PRISM`__ project, and hosted by `IRI`__ at Columbia: - In [2]: rds = rioxarray.open_rasterio("RGB.byte.tif") +__ https://www.prism.oregonstate.edu/ +__ https://iri.columbia.edu/ - In [3]: rds - Out[3]: - - [1703814 values with dtype=uint8] - Coordinates: - * band (band) int64 1 2 3 - * y (y) float64 2.827e+06 2.826e+06 ... 2.612e+06 2.612e+06 - * x (x) float64 1.021e+05 1.024e+05 ... 
3.389e+05 3.392e+05 - spatial_ref int64 0 - Attributes: - STATISTICS_MAXIMUM: 255 - STATISTICS_MEAN: 29.947726688477 - STATISTICS_MINIMUM: 0 - STATISTICS_STDDEV: 52.340921626611 - transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.0417827... - _FillValue: 0.0 - scale_factor: 1.0 - add_offset: 0.0 - grid_mapping: spatial_ref +.. ipython source code for this section + we don't use this to avoid hitting the DAP server on every doc build. - In [4]: rds.rio.crs - Out[4]: CRS.from_epsg(32618) + remote_data = xr.open_dataset( + 'http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods', + decode_times=False) + tmax = remote_data.tmax[:500, ::3, ::3] + tmax - In [5]: rds4326 = rds.rio.reproject("epsg:4326") + @savefig opendap-prism-tmax.png + tmax[0].plot() - In [6]: rds4326.rio.crs - Out[6]: CRS.from_epsg(4326) +.. ipython:: + :verbatim: - In [7]: rds4326.rio.to_raster("RGB.byte.4326.tif") + In [3]: remote_data = xr.open_dataset( + ...: "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods", + ...: decode_times=False, + ...: ) + In [4]: remote_data + Out[4]: + + Dimensions: (T: 1422, X: 1405, Y: 621) + Coordinates: + * X (X) float32 -125.0 -124.958 -124.917 -124.875 -124.833 -124.792 -124.75 ... + * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 -772.5 -771.5 ... + * Y (Y) float32 49.9167 49.875 49.8333 49.7917 49.75 49.7083 49.6667 49.625 ... + Data variables: + ppt (T, Y, X) float64 ... + tdmean (T, Y, X) float64 ... + tmax (T, Y, X) float64 ... + tmin (T, Y, X) float64 ... + Attributes: + Conventions: IRIDL + expires: 1375315200 -.. _rasterio: https://rasterio.readthedocs.io/en/latest/ -.. _rioxarray: https://corteva.github.io/rioxarray/stable/ -.. _test files: https://github.com/rasterio/rasterio/blob/master/tests/data/RGB.byte.tif -.. _pyproj: https://github.com/pyproj4/pyproj +.. TODO: update this example to show off decode_cf? -.. _io.zarr: +.. note:: -Zarr ----- + Like many real-world datasets, this dataset does not entirely follow + `CF conventions`_. Unexpected formats will usually cause xarray's automatic + decoding to fail. The way to work around this is to either set + ``decode_cf=False`` in ``open_dataset`` to turn off all use of CF + conventions, or by only disabling the troublesome parser. + In this case, we set ``decode_times=False`` because the time axis here + provides the calendar attribute in a format that xarray does not expect + (the integer ``360`` instead of a string like ``'360_day'``). -`Zarr`_ is a Python package that provides an implementation of chunked, compressed, -N-dimensional arrays. -Zarr has the ability to store arrays in a range of ways, including in memory, -in files, and in cloud-based object storage such as `Amazon S3`_ and -`Google Cloud Storage`_. -Xarray's Zarr backend allows xarray to leverage these capabilities, including -the ability to store and analyze datasets far too large fit onto disk -(particularly :ref:`in combination with dask `). +We can select and slice this data any number of times, and nothing is loaded +over the network until we look at particular values: -Xarray can't open just any zarr dataset, because xarray requires special -metadata (attributes) describing the dataset dimensions and coordinates. -At this time, xarray can only open zarr datasets that have been written by -xarray. For implementation details, see :ref:`zarr_encoding`. +.. ipython:: + :verbatim: -To write a dataset with zarr, we use the :py:meth:`Dataset.to_zarr` method. 
+ In [4]: tmax = remote_data["tmax"][:500, ::3, ::3] -To write to a local directory, we pass a path to a directory: + In [5]: tmax + Out[5]: + + [48541500 values with dtype=float64] + Coordinates: + * Y (Y) float32 49.9167 49.7917 49.6667 49.5417 49.4167 49.2917 ... + * X (X) float32 -125.0 -124.875 -124.75 -124.625 -124.5 -124.375 ... + * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 ... + Attributes: + pointwidth: 120 + standard_name: air_temperature + units: Celsius_scale + expires: 1443657600 -.. ipython:: python - :suppress: + # the data is downloaded automatically when we make the plot + In [6]: tmax[0].plot() - ! rm -rf path/to/directory.zarr +.. image:: ../_static/opendap-prism-tmax.png -.. ipython:: python +Some servers require authentication before we can access the data. For this +purpose we can explicitly create a :py:class:`backends.PydapDataStore` +and pass in a `Requests`__ session object. For example for +HTTP Basic authentication:: - ds = xr.Dataset( - {"foo": (("x", "y"), np.random.rand(4, 5))}, - coords={ - "x": [10, 20, 30, 40], - "y": pd.date_range("2000-01-01", periods=5), - "z": ("x", list("abcd")), - }, - ) - ds.to_zarr("path/to/directory.zarr") + import xarray as xr + import requests -(The suffix ``.zarr`` is optional--just a reminder that a zarr store lives -there.) If the directory does not exist, it will be created. If a zarr -store is already present at that path, an error will be raised, preventing it -from being overwritten. To override this behavior and overwrite an existing -store, add ``mode='w'`` when invoking :py:meth:`~Dataset.to_zarr`. + session = requests.Session() + session.auth = ('username', 'password') -To store variable length strings, convert them to object arrays first with -``dtype=object``. + store = xr.backends.PydapDataStore.open('http://example.com/data', + session=session) + ds = xr.open_dataset(store) -To read back a zarr dataset that has been created this way, we use the -:py:func:`open_zarr` method: +`Pydap's cas module`__ has functions that generate custom sessions for +servers that use CAS single sign-on. For example, to connect to servers +that require NASA's URS authentication:: -.. ipython:: python + import xarray as xr + from pydata.cas.urs import setup_session - ds_zarr = xr.open_zarr("path/to/directory.zarr") - ds_zarr + ds_url = 'https://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/example.nc' -Cloud Storage Buckets -~~~~~~~~~~~~~~~~~~~~~ + session = setup_session('username', 'password', check_url=ds_url) + store = xr.backends.PydapDataStore.open(ds_url, session=session) -It is possible to read and write xarray datasets directly from / to cloud -storage buckets using zarr. This example uses the `gcsfs`_ package to provide -an interface to `Google Cloud Storage`_. + ds = xr.open_dataset(store) -From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you -automatically when reading, such that you can open a dataset in a single -call. You should include any arguments to the storage backend as the -key ``storage_options``, part of ``backend_kwargs``. +__ https://docs.python-requests.org +__ https://www.pydap.org/en/latest/client.html#authentication -.. code:: python +.. 
_io.pickle: - ds_gcs = xr.open_dataset( - "gcs:///path.zarr", - backend_kwargs={ - "storage_options": {"project": "", "token": None} - }, - engine="zarr", - ) +Pickle +------ +The simplest way to serialize an xarray object is to use Python's built-in pickle +module: -This also works with ``open_mfdataset``, allowing you to pass a list of paths or -a URL to be interpreted as a glob string. +.. ipython:: python -For older versions, and for writing, you must explicitly set up a ``MutableMapping`` -instance and pass this, as follows: + import pickle -.. code:: python + # use the highest protocol (-1) because it is way faster than the default + # text based pickle format + pkl = pickle.dumps(ds, protocol=-1) - import gcsfs + pickle.loads(pkl) - fs = gcsfs.GCSFileSystem(project="", token=None) - gcsmap = gcsfs.mapping.GCSMap("", gcs=fs, check=True, create=False) - # write to the bucket - ds.to_zarr(store=gcsmap) - # read it back - ds_gcs = xr.open_zarr(gcsmap) +Pickling is important because it doesn't require any external libraries +and lets you use xarray objects with Python modules like +:py:mod:`multiprocessing` or :ref:`Dask `. However, pickling is +**not recommended for long-term storage**. -(or use the utility function ``fsspec.get_mapper()``). +Restoring a pickle requires that the internal structure of the types for the +pickled data remain unchanged. Because the internal design of xarray is still +being refined, we make no guarantees (at this point) that objects pickled with +this version of xarray will work in future versions. -.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/ -.. _Zarr: https://zarr.readthedocs.io/ -.. _Amazon S3: https://aws.amazon.com/s3/ -.. _Google Cloud Storage: https://cloud.google.com/storage/ -.. _gcsfs: https://github.com/fsspec/gcsfs +.. note:: -Zarr Compressors and Filters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + When pickling an object opened from a NetCDF file, the pickle file will + contain a reference to the file on disk. If you want to store the actual + array values, load it into memory first with :py:meth:`Dataset.load` + or :py:meth:`Dataset.compute`. -There are many different options for compression and filtering possible with -zarr. These are described in the -`zarr documentation `_. -These options can be passed to the ``to_zarr`` method as variable encoding. -For example: +.. _dictionary io: -.. ipython:: python - :suppress: +Dictionary +---------- - ! rm -rf foo.zarr +We can convert a ``Dataset`` (or a ``DataArray``) to a dict using +:py:meth:`Dataset.to_dict`: .. ipython:: python - import zarr - - compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2) - ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}}) + d = ds.to_dict() + d -.. note:: +We can create a new xarray object from a dict using +:py:meth:`Dataset.from_dict`: - Not all native zarr compression and filtering options have been tested with - xarray. +.. ipython:: python -.. _io.zarr.consolidated_metadata: + ds_dict = xr.Dataset.from_dict(d) + ds_dict -Consolidated Metadata -~~~~~~~~~~~~~~~~~~~~~ +Dictionary support allows for flexible use of xarray objects. It doesn't +require external libraries and dicts can easily be pickled, or converted to +json, or geojson. All the values are converted to lists, so dicts might +be quite large. -Xarray needs to read all of the zarr metadata when it opens a dataset. -In some storage mediums, such as with cloud object storage (e.g. 
amazon S3), -this can introduce significant overhead, because two separate HTTP calls to the -object store must be made for each variable in the dataset. -As of xarray version 0.18, xarray by default uses a feature called -*consolidated metadata*, storing all metadata for the entire dataset with a -single key (by default called ``.zmetadata``). This typically drastically speeds -up opening the store. (For more information on this feature, consult the -`zarr docs `_.) +To export just the dataset schema without the data itself, use the +``data=False`` option: -By default, xarray writes consolidated metadata and attempts to read stores -with consolidated metadata, falling back to use non-consolidated metadata for -reads. Because this fall-back option is so much slower, xarray issues a -``RuntimeWarning`` with guidance when reading with consolidated metadata fails: +.. ipython:: python - Failed to open Zarr store with consolidated metadata, falling back to try - reading non-consolidated metadata. This is typically much slower for - opening a dataset. To silence this warning, consider: + ds.to_dict(data=False) - 1. Consolidating metadata in this existing store with - :py:func:`zarr.consolidate_metadata`. - 2. Explicitly setting ``consolidated=False``, to avoid trying to read - consolidate metadata. - 3. Explicitly setting ``consolidated=True``, to raise an error in this case - instead of falling back to try reading non-consolidated metadata. +This can be useful for generating indices of dataset contents to expose to +search indices or other automated data discovery tools. -.. _io.zarr.appending: +.. ipython:: python + :suppress: -Appending to existing Zarr stores -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + import os -Xarray supports several ways of incrementally writing variables to a Zarr -store. These options are useful for scenarios when it is infeasible or -undesirable to write your entire dataset at once. + os.remove("saved_on_disk.nc") -.. tip:: +.. _io.rasterio: - If you can load all of your data into a single ``Dataset`` using dask, a - single call to ``to_zarr()`` will write all of your data in parallel. +Rasterio +-------- -.. warning:: +GeoTIFFs and other gridded raster datasets can be opened using `rasterio`_, if +rasterio is installed. Here is an example of how to use +:py:func:`open_rasterio` to read one of rasterio's `test files`_: - Alignment of coordinates is currently not checked when modifying an - existing Zarr store. It is up to the user to ensure that coordinates are - consistent. +.. deprecated:: 0.20.0 -To add or overwrite entire variables, simply call :py:meth:`~Dataset.to_zarr` -with ``mode='a'`` on a Dataset containing the new variables, passing in an -existing Zarr store or path to a Zarr store. + Deprecated in favor of rioxarray. + For information about transitioning, see: + https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html -To resize and then append values along an existing dimension in a store, set -``append_dim``. This is a good option if data always arives in a particular -order, e.g., for time-stepping a simulation: +.. ipython:: + :verbatim: -.. ipython:: python - :suppress: + In [7]: rio = xr.open_rasterio("RGB.byte.tif") - ! rm -rf path/to/directory.zarr + In [8]: rio + Out[8]: + + [1703814 values with dtype=uint8] + Coordinates: + * band (band) int64 1 2 3 + * y (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ... + * x (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ... 
+ Attributes: + res: (300.0379266750948, 300.041782729805) + transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.041782729805, 28... + is_tiled: 0 + crs: +init=epsg:32618 -.. ipython:: python - ds1 = xr.Dataset( - {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))}, - coords={ - "x": [10, 20, 30, 40], - "y": [1, 2, 3, 4, 5], - "t": pd.date_range("2001-01-01", periods=2), - }, - ) - ds1.to_zarr("path/to/directory.zarr") - ds2 = xr.Dataset( - {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))}, - coords={ - "x": [10, 20, 30, 40], - "y": [1, 2, 3, 4, 5], - "t": pd.date_range("2001-01-03", periods=2), - }, - ) - ds2.to_zarr("path/to/directory.zarr", append_dim="t") +The ``x`` and ``y`` coordinates are generated out of the file's metadata +(``bounds``, ``width``, ``height``), and they can be understood as cartesian +coordinates defined in the file's projection provided by the ``crs`` attribute. +``crs`` is a PROJ4 string which can be parsed by e.g. `pyproj`_ or rasterio. +See :ref:`/examples/visualization_gallery.ipynb#Parsing-rasterio-geocoordinates` +for an example of how to convert these to longitudes and latitudes. -Finally, you can use ``region`` to write to limited regions of existing arrays -in an existing Zarr store. This is a good option for writing data in parallel -from independent processes. -To scale this up to writing large datasets, the first step is creating an -initial Zarr store without writing all of its array data. This can be done by -first creating a ``Dataset`` with dummy values stored in :ref:`dask `, -and then calling ``to_zarr`` with ``compute=False`` to write only metadata -(including ``attrs``) to Zarr: +Additionally, you can use `rioxarray`_ for reading in GeoTiff, netCDF or other +GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIFF. +`rioxarray`_ can also handle geospatial related tasks such as re-projecting and clipping. -.. ipython:: python - :suppress: +.. ipython:: + :verbatim: - ! rm -rf path/to/directory.zarr + In [1]: import rioxarray -.. ipython:: python + In [2]: rds = rioxarray.open_rasterio("RGB.byte.tif") - import dask.array + In [3]: rds + Out[3]: + + [1703814 values with dtype=uint8] + Coordinates: + * band (band) int64 1 2 3 + * y (y) float64 2.827e+06 2.826e+06 ... 2.612e+06 2.612e+06 + * x (x) float64 1.021e+05 1.024e+05 ... 3.389e+05 3.392e+05 + spatial_ref int64 0 + Attributes: + STATISTICS_MAXIMUM: 255 + STATISTICS_MEAN: 29.947726688477 + STATISTICS_MINIMUM: 0 + STATISTICS_STDDEV: 52.340921626611 + transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.0417827... + _FillValue: 0.0 + scale_factor: 1.0 + add_offset: 0.0 + grid_mapping: spatial_ref - # The values of this dask array are entirely irrelevant; only the dtype, - # shape and chunks are used - dummies = dask.array.zeros(30, chunks=10) - ds = xr.Dataset({"foo": ("x", dummies)}) - path = "path/to/directory.zarr" - # Now we write the metadata without computing any array values - ds.to_zarr(path, compute=False) + In [4]: rds.rio.crs + Out[4]: CRS.from_epsg(32618) -Now, a Zarr store with the correct variable shapes and attributes exists that -can be filled out by subsequent calls to ``to_zarr``. The ``region`` provides a -mapping from dimension names to Python ``slice`` objects indicating where the -data should be written (in index space, not coordinate space), e.g., + In [5]: rds4326 = rds.rio.reproject("epsg:4326") -.. 
ipython:: python + In [6]: rds4326.rio.crs + Out[6]: CRS.from_epsg(4326) - # For convenience, we'll slice a single dataset, but in the real use-case - # we would create them separately possibly even from separate processes. - ds = xr.Dataset({"foo": ("x", np.arange(30))}) - ds.isel(x=slice(0, 10)).to_zarr(path, region={"x": slice(0, 10)}) - ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": slice(10, 20)}) - ds.isel(x=slice(20, 30)).to_zarr(path, region={"x": slice(20, 30)}) + In [7]: rds4326.rio.to_raster("RGB.byte.4326.tif") -Concurrent writes with ``region`` are safe as long as they modify distinct -chunks in the underlying Zarr arrays (or use an appropriate ``lock``). -As a safety check to make it harder to inadvertently override existing values, -if you set ``region`` then *all* variables included in a Dataset must have -dimensions included in ``region``. Other variables (typically coordinates) -need to be explicitly dropped and/or written in a separate calls to ``to_zarr`` -with ``mode='a'``. +.. _rasterio: https://rasterio.readthedocs.io/en/latest/ +.. _rioxarray: https://corteva.github.io/rioxarray/stable/ +.. _test files: https://github.com/rasterio/rasterio/blob/master/tests/data/RGB.byte.tif +.. _pyproj: https://github.com/pyproj4/pyproj .. _io.cfgrib: diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 37abf931eb4..aa48bd619e8 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -5133,7 +5133,7 @@ Enhancements .. ipython:: python ds = xray.Dataset(coords={"x": range(100), "y": range(100)}) - ds["distance"] = np.sqrt(ds.x ** 2 + ds.y ** 2) + ds["distance"] = np.sqrt(ds.x**2 + ds.y**2) @savefig where_example.png width=4in height=4in ds.distance.where(ds.distance < 100).plot() @@ -5341,7 +5341,7 @@ Enhancements .. ipython:: python ds = xray.Dataset({"y": ("x", [1, 2, 3])}) - ds.assign(z=lambda ds: ds.y ** 2) + ds.assign(z=lambda ds: ds.y**2) ds.assign_coords(z=("x", ["a", "b", "c"])) These methods return a new Dataset (or DataArray) with updated data or diff --git a/xarray/core/computation.py b/xarray/core/computation.py index 88eefbdc441..c11bd1a78a4 100644 --- a/xarray/core/computation.py +++ b/xarray/core/computation.py @@ -944,7 +944,7 @@ def apply_ufunc( Calculate the vector magnitude of two arguments: >>> def magnitude(a, b): - ... func = lambda x, y: np.sqrt(x ** 2 + y ** 2) + ... func = lambda x, y: np.sqrt(x**2 + y**2) ... return xr.apply_ufunc(func, a, b) ...
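
Editorial note on the relocated Zarr section: it tells readers to convert variable-length strings to object arrays with ``dtype=object`` and to add or overwrite whole variables with ``mode='a'``, but neither point is accompanied by code. The following sketch (not part of the patch itself) illustrates both; the store path ``string-demo.zarr`` is a hypothetical local directory, and it assumes xarray with the zarr backend installed.

.. code:: python

    import numpy as np
    import xarray as xr

    # Variable-length strings: store them as object arrays so the zarr
    # backend can encode them with a variable-length string codec.
    ds = xr.Dataset(
        {"labels": ("x", np.array(["short", "a longer string"], dtype=object))},
        coords={"x": [0, 1]},
    )
    ds.to_zarr("string-demo.zarr", mode="w")

    # Add (or overwrite) an entire variable in the existing store with mode="a".
    ds_new = xr.Dataset(
        {"values": ("x", np.array([1.5, 2.5]))},
        coords={"x": [0, 1]},
    )
    ds_new.to_zarr("string-demo.zarr", mode="a")

    # Read the combined store back lazily.
    xr.open_zarr("string-demo.zarr")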
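The consolidated-metadata discussion in the moved section describes the ``consolidated=True`` / ``consolidated=False`` options only in prose. Below is a minimal sketch of how those keywords are passed; the path ``consolidated-demo.zarr`` is hypothetical, and the defaults assumed are those described for xarray 0.18 and later.

.. code:: python

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"foo": ("x", np.arange(10))})

    # Write consolidated metadata explicitly (the default in recent versions).
    ds.to_zarr("consolidated-demo.zarr", mode="w", consolidated=True)

    # Require consolidated metadata on read: raise instead of silently
    # falling back to the slower non-consolidated path.
    ds_fast = xr.open_zarr("consolidated-demo.zarr", consolidated=True)

    # Or skip the consolidated-metadata lookup entirely.
    ds_plain = xr.open_zarr("consolidated-demo.zarr", consolidated=False)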
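The cloud-storage subsection mentions ``fsspec.get_mapper()`` as a shortcut for building the ``MutableMapping`` passed to ``to_zarr`` / ``open_zarr``, without showing it in use. The sketch below demonstrates the same pattern against fsspec's in-memory filesystem so it runs without cloud credentials; ``memory://demo.zarr`` is an illustrative URL, and a real deployment would substitute a ``gcs://`` or ``s3://`` URL plus the appropriate ``storage_options``. It assumes fsspec and the zarr backend are installed.

.. code:: python

    import fsspec
    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"foo": ("x", np.arange(5))})

    # Build a MutableMapping from an fsspec URL; swap in "gcs://bucket/path.zarr"
    # (plus credentials) for real object storage.
    store = fsspec.get_mapper("memory://demo.zarr")

    ds.to_zarr(store=store, mode="w")
    ds_back = xr.open_zarr(store)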