Metadata for JRA55 #286

Open: wants to merge 54 commits into base: main

Commits (54)
ae5f0be
start modifying the data structure
simone-silvestri Jul 25, 2024
42be4e9
going on...
simone-silvestri Jul 25, 2024
1dab6e2
comment
simone-silvestri Jul 26, 2024
9d11295
add some changes
simone-silvestri Aug 1, 2024
27ebee4
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Nov 8, 2024
7f4af88
in their folder
simone-silvestri Nov 8, 2024
8d9401e
modernize
simone-silvestri Nov 8, 2024
a7a3268
start changing stuff
simone-silvestri Nov 8, 2024
f8e7fe7
starting a bit
simone-silvestri Nov 8, 2024
6deb3b8
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Dec 4, 2024
fdef030
let's go
simone-silvestri Dec 5, 2024
f4d63a1
correct alias
simone-silvestri Dec 5, 2024
a130e73
correct dates
simone-silvestri Dec 5, 2024
c90b7f9
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Dec 23, 2024
08b377f
syntax
simone-silvestri Dec 23, 2024
2eb5556
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Feb 12, 2025
ba4bce0
more updates
simone-silvestri Feb 12, 2025
68dfdb0
all this should work?
simone-silvestri Feb 12, 2025
f212e88
clear comments
simone-silvestri Feb 12, 2025
1a68d4e
vestigial code
simone-silvestri Feb 12, 2025
c318e54
works
simone-silvestri Feb 12, 2025
79b5935
veeery much simplified
simone-silvestri Feb 12, 2025
91e1169
this part is already done
simone-silvestri Feb 12, 2025
b6fb198
back to working
simone-silvestri Feb 12, 2025
e81bdfb
works
simone-silvestri Feb 12, 2025
afbd1cc
improve show method
simone-silvestri Feb 12, 2025
8599314
change names
simone-silvestri Feb 12, 2025
0501f79
remove vestigial code
simone-silvestri Feb 12, 2025
9a85ffe
tests should pass
simone-silvestri Feb 12, 2025
7514f40
no bang for download_dataset
simone-silvestri Feb 12, 2025
90db285
fix tests
simone-silvestri Feb 12, 2025
056f025
fixed all tests?
simone-silvestri Feb 12, 2025
89588ef
more usefull comments
simone-silvestri Feb 12, 2025
b701eec
fix a couple of stuff
simone-silvestri Feb 12, 2025
15172b4
probably correct now
simone-silvestri Feb 12, 2025
be7ae17
last fix
simone-silvestri Feb 12, 2025
dff8b77
remove extra line
simone-silvestri Feb 13, 2025
dc8f393
add a default architecture
simone-silvestri Feb 13, 2025
4190fd2
correct default
simone-silvestri Feb 13, 2025
8fadb77
fixed all tests
simone-silvestri Feb 13, 2025
0dcff3f
another bugfix
simone-silvestri Feb 13, 2025
031932d
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Feb 13, 2025
787a3d5
churning on
simone-silvestri Feb 13, 2025
973cd01
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Feb 14, 2025
00ef5b0
this should be ready to merge
simone-silvestri Feb 14, 2025
6dc49b1
add correct tests
simone-silvestri Feb 14, 2025
c64d5a2
fix tests
simone-silvestri Feb 14, 2025
e4e720d
fix tests
simone-silvestri Feb 15, 2025
16d2b17
Merge branch 'main' into ss/metadata-for-everything
simone-silvestri Feb 15, 2025
6f0f211
Merge remote-tracking branch 'origin/main' into ss/metadata-for-every…
simone-silvestri Feb 16, 2025
1ece381
Merge branch 'main' into ss/metadata-for-everything
simone-silvestri Feb 21, 2025
0e4c0e5
updTE CLIMA COMMON
simone-silvestri Feb 21, 2025
807a63f
Merge branch 'ss/metadata-for-everything' of github.com:CliMA/ClimaOc…
simone-silvestri Feb 21, 2025
c76900d
Merge branch 'main' into ss/metadata-for-everything
simone-silvestri Feb 24, 2025
2 changes: 1 addition & 1 deletion .buildkite/examples_build.yml
@@ -1,7 +1,7 @@
agents:
queue: new-central
slurm_mem: 8G
modules: climacommon/2024_05_27
modules: climacommon/2024_10_10
slurm_time: 48:00:00

env:
8 changes: 4 additions & 4 deletions .buildkite/pipeline.yml
@@ -34,9 +34,9 @@ steps:
- wait

- label: "CPU JRA55 tests"
key: "cpu_jra55_tests"
key: "cpu_JRA55_tests"
env:
TEST_GROUP: "jra55"
TEST_GROUP: "JRA55"
commands:
- "julia --project -e 'using Pkg; Pkg.test()'"
agents:
@@ -78,9 +78,9 @@ steps:
slurm_ntasks: 1

- label: "GPU JRA55 tests"
key: "gpu_jra55_tests"
key: "gpu_JRA55_tests"
env:
TEST_GROUP: "jra55"
TEST_GROUP: "JRA55"
GPU_TEST: "true"
commands:
- "julia --project -e 'using Pkg; Pkg.test()'"
6 changes: 3 additions & 3 deletions examples/generate_atmos_dataset.jl
@@ -4,9 +4,9 @@ using JLD2

time_indices = 1:1

qt = ClimaOcean.JRA55.JRA55_field_time_series(:specific_humidity; time_indices)
Tt = ClimaOcean.JRA55.JRA55_field_time_series(:temperature; time_indices)
pt = ClimaOcean.JRA55.JRA55_field_time_series(:sea_level_pressure; time_indices)
qt = ClimaOcean.JRA55.JRA55FieldTimeSeries(:specific_humidity; time_indices)
Tt = ClimaOcean.JRA55.JRA55FieldTimeSeries(:temperature; time_indices)
pt = ClimaOcean.JRA55.JRA55FieldTimeSeries(:sea_level_pressure; time_indices)

Nx, Ny, Nz = size(qt[1])

5 changes: 2 additions & 3 deletions examples/inspect_JRA55_data.jl
@@ -4,9 +4,8 @@ using Oceananigans
using Oceananigans.Units
using Printf

time_indices = Colon()
Qswt = ClimaOcean.JRA55.JRA55_field_time_series(:downwelling_shortwave_radiation; time_indices)
rht = ClimaOcean.JRA55.JRA55_field_time_series(:relative_humidity; time_indices)
Qswt = ClimaOcean.JRA55.JRA55FieldTimeSeries(:downwelling_shortwave_radiation)
rht = ClimaOcean.JRA55.JRA55FieldTimeSeries(:relative_humidity)

function lonlat2xyz(lons::AbstractVector, lats::AbstractVector)
x = [cosd(lat) * cosd(lon) for lon in lons, lat in lats]
2 changes: 1 addition & 1 deletion src/ClimaOcean.jl
@@ -24,7 +24,7 @@ export
exponential_z_faces,
PowerLawStretching, LinearStretching,
exponential_z_faces,
JRA55_field_time_series,
JRA55FieldTimeSeries,
ECCO_field,
ECCOMetadata,
ECCORestoring,
9 changes: 7 additions & 2 deletions src/DataWrangling/DataWrangling.jl
@@ -1,5 +1,7 @@
module DataWrangling

export Metadata

using Oceananigans
using Downloads
using Printf
@@ -13,6 +15,8 @@ using Oceananigans: pretty_filesize, location
using Oceananigans.Utils: launch!
using KernelAbstractions: @kernel, @index

using ClimaOcean.DistributedUtils

#####
##### Downloading utilities
#####
@@ -119,11 +123,12 @@ function save_field_time_series!(fts; path, name, overwrite_existing=false)
return nothing
end

include("metadata.jl")
include("inpaint_mask.jl")
include("JRA55.jl")
include("JRA55/JRA55.jl")
include("ECCO/ECCO.jl")

using .JRA55
using .ECCO
using .JRA55

end # module
2 changes: 1 addition & 1 deletion src/DataWrangling/ECCO/ECCO.jl
@@ -5,7 +5,7 @@ export ECCO2Monthly, ECCO4Monthly, ECCO2Daily
export ECCOFieldTimeSeries, ECCORestoring, LinearlyTaperedPolarMask

using ClimaOcean
using ClimaOcean.DistributedUtils: @root
using ClimaOcean.DistributedUtils
using ClimaOcean.DataWrangling
using ClimaOcean.DataWrangling: inpaint_mask!, NearestNeighborInpainting, download_progress
using ClimaOcean.InitialConditions: three_dimensional_regrid!, interpolate!
4 changes: 2 additions & 2 deletions src/DataWrangling/ECCO/ECCO_mask.jl
@@ -31,7 +31,7 @@ function ECCO_mask(metadata, architecture = CPU();
end

# Default
ECCO_mask(arch::AbstractArchitecture=CPU()) = ECCO_mask(ECCOMetadata(:temperature), arch)
ECCO_mask(arch::AbstractArchitecture=CPU()) = ECCO_mask(Metadata(:temperature, version=ECCO4Monthly()), arch)

@kernel function _set_ECCO2_mask!(mask, Tᵢ, minimum_value, maximum_value)
i, j, k = @index(Global, NTuple)
@@ -62,7 +62,7 @@ function ECCO_immersed_grid(metadata, architecture = CPU())
end

# Default
ECCO_immersed_grid(arch::AbstractArchitecture=CPU()) = ECCO_immersed_grid(ECCOMetadata(:temperature), arch)
ECCO_immersed_grid(arch::AbstractArchitecture=CPU()) = ECCO_immersed_grid(Metadata(:temperature, version=ECCO4Monthly()), arch)

@kernel function _set_height_from_mask!(bottom, grid, mask)
i, j = @index(Global, NTuple)
95 changes: 7 additions & 88 deletions src/DataWrangling/ECCO/ECCO_metadata.jl
@@ -1,42 +1,22 @@
using CFTime
using Dates
using ClimaOcean.DataWrangling
using ClimaOcean.DataWrangling: netrc_downloader
using ClimaOcean.DataWrangling: netrc_downloader, metadata_path

import Dates: year, month, day

using Base: @propagate_inbounds
using Downloads

import Oceananigans.Fields: set!, location
import Base
import ClimaOcean.DataWrangling: all_dates, metadata_filename, default_download_folder

struct ECCO2Monthly end
struct ECCO2Daily end
struct ECCO4Monthly end

"""
struct ECCOMetadata{D, V}
const ECCOMetadata{D, V} = Metadata{D, V} where {D, V<:Union{<:ECCO2Monthly, <:ECCO2Daily, <:ECCO4Monthly}}

Metadata information for an ECCO dataset:
- `name`: The name of the dataset.
- `dates`: The dates of the dataset, in an `AbstractCFDateTime` format.
- `version`: The version of the dataset, could be `ECCO2Monthly`, `ECCO2Daily`, or `ECCO4Monthly`.
- `dir`: The directory where the dataset is stored.
"""
struct ECCOMetadata{D, V}
name :: Symbol
dates :: D
version :: V
dir :: String
end

Base.show(io::IO, metadata::ECCOMetadata) =
print(io, "ECCOMetadata:", '\n',
"├── name: $(metadata.name)", '\n',
"├── dates: $(metadata.dates)", '\n',
"├── version: $(metadata.version)", '\n',
"└── dir: $(metadata.dir)")
default_download_folder(::Union{<:ECCO2Monthly, <:ECCO2Daily, <:ECCO4Monthly}) = download_ECCO_cache

Base.summary(md::ECCOMetadata{<:Any, <:ECCO2Daily}) = "ECCO2Daily $(md.name) metadata ($(first(md.dates))--$(last(md.dates)))"
Base.summary(md::ECCOMetadata{<:Any, <:ECCO2Monthly}) = "ECCO2Monthly $(md.name) metadata ($(first(md.dates))--$(last(md.dates)))"
@@ -46,78 +26,18 @@ Base.summary(md::ECCOMetadata{<:AbstractCFDateTime, <:ECCO2Daily}) = "ECCO2Dai
Base.summary(md::ECCOMetadata{<:AbstractCFDateTime, <:ECCO2Monthly}) = "ECCO2Monthly $(md.name) metadata at $(md.dates)"
Base.summary(md::ECCOMetadata{<:AbstractCFDateTime, <:ECCO4Monthly}) = "ECCO4Monthly $(md.name) metadata at $(md.dates)"

"""
ECCOMetadata(name::Symbol;
dates = DateTimeProlepticGregorian(1993, 1, 1),
version = ECCO4Monthly(),
dir = download_ECCO_cache)

Construct an `ECCOMetadata` object with the specified parameters.

Arguments
=========
- `name::Symbol`: The name of the metadata.

Keyword Arguments
=================
- `dates`: The date(s) of the metadata. Note this can either be a single date,
representing a snapshot, or a range of dates, representing a time-series.
Default: `DateTimeProlepticGregorian(1993, 1, 1)`.

- `version`: The data version. Supported versions are `ECCO2Monthly()`, `ECCO2Daily()`,
or `ECCO4Monthly()`.

- `dir`: The directory of the data file. Default: `download_ECCO_cache`.
"""
function ECCOMetadata(name::Symbol;
dates = DateTimeProlepticGregorian(1993, 1, 1),
version = ECCO4Monthly(),
dir = download_ECCO_cache)

return ECCOMetadata(name, dates, version, dir)
end

ECCOMetadata(name::Symbol, date, version=ECCO4Monthly(); dir=download_ECCO_cache) =
ECCOMetadata(name, date, version, dir)

# Treat ECCOMetadata as an array to allow iteration over the dates.
Base.eltype(metadata::ECCOMetadata) = Base.eltype(metadata.dates)

@propagate_inbounds Base.getindex(m::ECCOMetadata, i::Int) = ECCOMetadata(m.name, m.dates[i], m.version, m.dir)
@propagate_inbounds Base.first(m::ECCOMetadata) = ECCOMetadata(m.name, m.dates[1], m.version, m.dir)
@propagate_inbounds Base.last(m::ECCOMetadata) = ECCOMetadata(m.name, m.dates[end], m.version, m.dir)

@inline function Base.iterate(m::ECCOMetadata, i=1)
if (i % UInt) - 1 < length(m)
return ECCOMetadata(m.name, m.dates[i], m.version, m.dir), i + 1
else
return nothing
end
end

Base.axes(metadata::ECCOMetadata{<:AbstractCFDateTime}) = 1
Base.first(metadata::ECCOMetadata{<:AbstractCFDateTime}) = metadata
Base.last(metadata::ECCOMetadata{<:AbstractCFDateTime}) = metadata
Base.iterate(metadata::ECCOMetadata{<:AbstractCFDateTime}) = (metadata, nothing)
Base.iterate(::ECCOMetadata{<:AbstractCFDateTime}, ::Any) = nothing

Base.length(metadata::ECCOMetadata) = length(metadata.dates)
Base.size(data::ECCOMetadata{<:Any, <:ECCO2Daily}) = (1440, 720, 50, length(data.dates))
Base.size(data::ECCOMetadata{<:Any, <:ECCO2Monthly}) = (1440, 720, 50, length(data.dates))
Base.size(data::ECCOMetadata{<:Any, <:ECCO4Monthly}) = (720, 360, 50, length(data.dates))

Base.length(metadata::ECCOMetadata{<:AbstractCFDateTime}) = 1
Base.size(::ECCOMetadata{<:AbstractCFDateTime, <:ECCO2Daily}) = (1440, 720, 50, 1)
Base.size(::ECCOMetadata{<:AbstractCFDateTime, <:ECCO2Monthly}) = (1440, 720, 50, 1)
Base.size(::ECCOMetadata{<:AbstractCFDateTime, <:ECCO4Monthly}) = (720, 360, 50, 1)

# The whole range of dates in the different dataset versions
all_ECCO_dates(::ECCO4Monthly) = DateTimeProlepticGregorian(1992, 1, 1) : Month(1) : DateTimeProlepticGregorian(2023, 12, 1)
all_ECCO_dates(::ECCO2Monthly) = DateTimeProlepticGregorian(1992, 1, 1) : Month(1) : DateTimeProlepticGregorian(2023, 12, 1)
all_ECCO_dates(::ECCO2Daily) = DateTimeProlepticGregorian(1992, 1, 4) : Day(1) : DateTimeProlepticGregorian(2023, 12, 31)

# File names of metadata containing multiple dates
metadata_filename(metadata) = [metadata_filename(metadatum) for metadatum in metadata]
all_dates(::ECCO4Monthly) = DateTimeProlepticGregorian(1992, 1, 1) : Month(1) : DateTimeProlepticGregorian(2023, 12, 1)
all_dates(::ECCO2Monthly) = DateTimeProlepticGregorian(1992, 1, 1) : Month(1) : DateTimeProlepticGregorian(2023, 12, 1)
all_dates(::ECCO2Daily) = DateTimeProlepticGregorian(1992, 1, 4) : Day(3) : DateTimeProlepticGregorian(2023, 12, 31)

# File name generation specific to each Dataset version
function metadata_filename(metadata::ECCOMetadata{<:AbstractCFDateTime, <:ECCO4Monthly})
@@ -142,7 +62,6 @@ function metadata_filename(metadata::ECCOMetadata{<:AbstractCFDateTime})
end

# Convenience functions
metadata_path(metadata) = joinpath(metadata.dir, metadata_filename(metadata))
short_name(data::ECCOMetadata{<:Any, <:ECCO2Daily}) = ECCO2_short_names[data.name]
short_name(data::ECCOMetadata{<:Any, <:ECCO2Monthly}) = ECCO2_short_names[data.name]
short_name(data::ECCOMetadata{<:Any, <:ECCO4Monthly}) = ECCO4_short_names[data.name]
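
The diff above replaces the standalone `ECCOMetadata` struct with an alias of a shared `Metadata` type, so that methods such as `short_name` dispatch on the `version` type parameter. The definition of `Metadata` lives in `metadata.jl`, which is not shown here, so the following is only a minimal sketch under assumptions: the field layout is copied from the removed `ECCOMetadata` struct, `OtherVersion` and `dataset_label` are hypothetical names introduced for illustration, and `Dates.DateTime` stands in for the CFTime date type.

```julia
using Dates

# Version tags named as in the diff (empty marker types).
struct ECCO2Daily end
struct ECCO4Monthly end

# Hypothetical tag for some non-ECCO dataset; not part of the PR.
struct OtherVersion end

# Assumed layout of the shared container, copied from the removed ECCOMetadata struct.
struct Metadata{D, V}
    name :: Symbol
    dates :: D
    version :: V
    dir :: String
end

# Dataset-specific alias: any Metadata whose version is one of the ECCO tags.
const ECCOMetadata = Metadata{D, V} where {D, V<:Union{ECCO2Daily, ECCO4Monthly}}

# Hypothetical function showing dispatch: the alias selects the more specific method.
dataset_label(::Metadata) = "unknown dataset"
dataset_label(::ECCOMetadata) = "ECCO"

# Indexing yields single-date metadata, mirroring the removed getindex method.
Base.getindex(m::Metadata, i::Int) = Metadata(m.name, m.dates[i], m.version, m.dir)

dates = DateTime(1993, 1, 1) : Month(1) : DateTime(1993, 3, 1)
ecco  = Metadata(:temperature, dates, ECCO4Monthly(), "ecco_data")
other = Metadata(:temperature, dates, OtherVersion(), "other_data")

println(dataset_label(ecco))   # ECCO metadata hits the alias method
println(dataset_label(other))  # everything else falls back to the generic method
println(ecco[2].dates)         # second date in the range
```

Because the alias is just a constrained view of `Metadata`, ECCO-specific methods keep their existing signatures while generic behavior (indexing, iteration, paths) can be written once for every dataset.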
43 changes: 8 additions & 35 deletions src/DataWrangling/ECCO/ECCO_restoring.jl
@@ -10,7 +10,7 @@ using JLD2

using Dates: Second
using ClimaOcean: stateindex
using ClimaOcean.DataWrangling: NearestNeighborInpainting
using ClimaOcean.DataWrangling: NearestNeighborInpainting, native_times

import Oceananigans.Fields: set!
import Oceananigans.Forcings: regularize_forcing
@@ -78,33 +78,6 @@ function set!(fts::ECCOFieldTimeSeries)
return nothing
end

"""
ECCO_times(metadata; start_time = first(metadata).dates)

Extract the time values from the given metadata and calculates the time difference
from the start time.

Arguments
=========
- `metadata`: The metadata containing the date information.
- `start_time`: The start time for calculating the time difference. Defaults to the first date in the metadata.

Returns
=======
An array of time differences in seconds.
"""
function ECCO_times(metadata; start_time = first(metadata).dates)
times = zeros(length(metadata))
for (t, data) in enumerate(metadata)
date = data.dates
time = date - start_time
time = Second(time).value
times[t] = time
end

return times
end

"""
ECCOFieldTimeSeries(metadata::ECCOMetadata [, arch_or_grid=CPU() ];
time_indices_in_memory = 2,
@@ -156,7 +129,7 @@ function ECCOFieldTimeSeries(metadata::ECCOMetadata, grid::AbstractGrid;
inpainting isa Int && (inpainting = NearestNeighborInpainting(inpainting))
backend = ECCONetCDFBackend(time_indices_in_memory, metadata; on_native_grid, inpainting, cache_inpainted_data)

times = ECCO_times(metadata)
times = native_times(metadata)
Review comment (Member):

This used to be called dates; is it times now?
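
On the naming question: the removed `ECCO_times` took the dates in the metadata and returned the seconds elapsed since the first date, and that array is what gets passed to `FieldTimeSeries` as `times`. The definition of `native_times` is not shown in this diff, so the following is a minimal sketch that assumes it keeps the same behavior; `seconds_since` is a hypothetical name and `Dates.DateTime` stands in for `DateTimeProlepticGregorian`.

```julia
using Dates

# Hypothetical helper, not the PR's native_times: seconds elapsed since `start_time`.
# Subtracting two DateTime values yields a Millisecond period.
function seconds_since(dates; start_time = first(dates))
    return [Dates.value(date - start_time) / 1000 for date in dates]
end

dates = DateTime(1993, 1, 1) : Month(1) : DateTime(1993, 3, 1)
times = seconds_since(dates)

println(times)  # elapsed seconds for 1 January, 1 February and 1 March 1993
```

In this reading, `dates` stay calendar objects on the metadata, while `times` are plain floating-point offsets used by the time series.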

loc = LX, LY, LZ = location(metadata)
boundary_conditions = FieldBoundaryConditions(grid, loc)
fts = FieldTimeSeries{LX, LY, LZ}(grid, times; backend, time_indexing, boundary_conditions)
@@ -167,11 +140,11 @@ end

function ECCOFieldTimeSeries(variable_name::Symbol, version=ECCO4Monthly();
architecture = CPU(),
dates = all_ECCO_dates(version),
dates = all_dates(version),
dir = download_ECCO_cache,
kw...)

metadata = ECCOMetadata(variable_name, dates, version, dir)
metadata = Metadata(variable_name, dates, version, dir)
return ECCOFieldTimeSeries(metadata, architecture; kw...)
end

@@ -246,7 +219,7 @@ end
"""
ECCORestoring(variable_name::Symbol, [ arch_or_grid = CPU(), ];
version = ECCO4Monthly(),
dates = all_ECCO_dates(version),
dates = all_dates(version),
time_indices_in_memory = 2,
time_indexing = Cyclical(),
mask = 1,
@@ -292,7 +265,7 @@ Keyword Arguments

- `version`: The version of the ECCO dataset. Default: `ECCO4Monthly()`.

- `dates`: The dates to use for the ECCO dataset. Default: `all_ECCO_dates(version)`.
- `dates`: The dates to use for the ECCO dataset. Default: `all_dates(version)`.

- `time_indices_in_memory`: The number of time indices to keep in memory. The number is chosen based on
a trade-off between increased performance (more indices in memory) and reduced
@@ -315,11 +288,11 @@ Keyword Arguments
function ECCORestoring(variable_name::Symbol,
arch_or_grid = CPU();
version = ECCO4Monthly(),
dates = all_ECCO_dates(version),
dates = all_dates(version),
dir = download_ECCO_cache,
kw...)

metadata = ECCOMetadata(variable_name, dates, version, dir)
metadata = Metadata(variable_name, dates, version, dir)
return ECCORestoring(metadata, arch_or_grid; kw...)
end
