[ADD] EnbPI #135

Merged: 37 commits, Jun 13, 2022
Commits (37)
a6b1a2b
[ADD] EnbPI
Feb 8, 2022
14552cb
[CHANGE] partial_fit -> partial_update
Feb 11, 2022
e23a4d1
GMA & VTA's first remarks taken into account
Feb 28, 2022
db21938
I made a confusion deleting the doc oc regression.py...
Feb 28, 2022
666b8df
ADD examples/regression/plot_timeseries_enbpi.py
Feb 28, 2022
b23484a
Implement 3 predict method
Mar 4, 2022
ac1a94b
[commit before update with master]
Mar 21, 2022
2a058e9
Merge branch 'master' into enbpi
Mar 21, 2022
8749fcb
unit test OK, example not OK
Mar 21, 2022
ff12460
after tyoe checking
Mar 22, 2022
85c936f
correct typing
Mar 22, 2022
948384c
Merge branch 'fix_typing' into enbpi
Mar 28, 2022
07578cf
ALL TESTS OK
Mar 28, 2022
b5b46d6
make lint PASS
Mar 28, 2022
8a807e9
maked_quantile not perfect, yet
Mar 30, 2022
98faf3d
PR EnbPI completed
Apr 5, 2022
ed39cca
EnbPI ready!!
Apr 6, 2022
59b7505
3 methods for predict
Apr 13, 2022
1242263
Merge branch 'master' into enbpi
Apr 13, 2022
81c4679
[REFACTO]
Apr 14, 2022
1c1e8b0
[CORRECT]
Apr 14, 2022
cf23034
[CORRECT] Documentation
Apr 14, 2022
76eb321
prune enbpi commit
May 6, 2022
0017fd3
all test pass after enbpi pruning
May 6, 2022
57a07c2
Add notebook with TS changepoint
May 30, 2022
c06b0e6
Take VTA remarks into account
Jun 1, 2022
79a5cc0
Merge branch 'enbpi' of https://github.com/scikit-learn-contrib/MAPIE…
Jun 1, 2022
98a87f6
all test pass after remarks of VTA integration
Jun 1, 2022
8590a9c
the maximun test coverage is reached after remarks integration
Jun 1, 2022
9cff9bf
a bug to be fixed
Jun 1, 2022
a7c5c57
an import error add to be corrected in the documentation
Jun 7, 2022
247e26d
VTA remarks taken into account
Jun 8, 2022
e1edc5f
a bug fixed in enbpi example
Jun 9, 2022
2af9f36
Last remarks of VTA taken into account
Jun 13, 2022
f747fee
Merge branch 'master' into enbpi
Jun 13, 2022
c3234fa
merge with master before push for release
Jun 13, 2022
3f779ec
plot_enbpi so nice
Jun 13, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ __pycache__/
*.py[cod]
*$py.class
.DS_Store
.mypy*

# C extensions
*.so
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -5,6 +5,7 @@ History
0.3.3 (2022-XX-XX)
------------------
* Relax and fix typing
* Add EnbPI method for Time Series Regression

0.3.2 (2022-03-11)
------------------
5 changes: 5 additions & 0 deletions README.rst
@@ -244,6 +244,11 @@ MAPIE methods belong to the field of conformal inference.
"Uncertainty Sets for Image Classifiers using Conformal Prediction."
International Conference on Learning Representations 2021.

[6] Chen Xu, Yao Xie.
"Conformal prediction for dynamic time-series."
https://arxiv.org/abs/2010.09107


📝 License
==========

Binary file modified doc/images/quickstart_1.png
@@ -25,7 +25,7 @@
intervals, through the `sklearn.model_selection.KFold()` object.
Residuals are therefore estimated using models trained on data with higher
indices than the validation data, which is inappropriate for time-series data.
Howerver, using a `sklearn.model_selection.TimeSeriesSplit` cross validation
However, using a `sklearn.model_selection.TimeSeriesSplit` cross validation
object for estimating the residuals breaks the theoretical guarantees of the
Jackknife+ and CV+ methods.
"""
212 changes: 212 additions & 0 deletions examples/regression/2-advanced-analysis/plot_timeseries_enbpi.py
@@ -0,0 +1,212 @@
"""
==================================================================
Estimating prediction intervals of time series forecast with EnbPI
==================================================================

This example uses
:class:`mapie.time_series_regression.MapieTimeSeriesRegressor` to estimate
prediction intervals associated with time-series forecasts, following [6].

We use the Victoria electricity demand dataset from the book
"Forecasting: Principles and Practice" by R. J. Hyndman and G. Athanasopoulos.
The electricity demand features daily and weekly seasonalities and is impacted
by the temperature, considered here as an exogenous variable.

A Random Forest model is fitted on the data. Its hyper-parameters are
optimized with a :class:`sklearn.model_selection.RandomizedSearchCV` using a
sequential :class:`sklearn.model_selection.TimeSeriesSplit` cross-validation,
in which the training set always precedes the validation set.
The best model is then fed into
:class:`mapie.time_series_regression.MapieTimeSeriesRegressor` to estimate the
associated prediction intervals. We compare two approaches, with and without
``partial_fit`` called at every step, following [6]. It appears that
``partial_fit`` yields higher coverage, but at the cost of wider prediction
intervals and a much longer runtime.
"""

import numpy as np
import pandas as pd
from matplotlib import pylab as plt
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

from mapie.metrics import (
    regression_coverage_score,
    regression_mean_width_score,
)
from mapie.subsample import BlockBootstrap
from mapie.time_series_regression import MapieTimeSeriesRegressor

# Load input data and feature engineering
url_file = (
    "https://raw.githubusercontent.com/scikit-learn-contrib/MAPIE/"
    + "master/examples/data/demand_temperature.csv"
)
demand_df = pd.read_csv(url_file, parse_dates=True, index_col=0)

demand_df["Date"] = pd.to_datetime(demand_df.index)
demand_df["Weekofyear"] = demand_df.Date.dt.isocalendar().week.astype("int64")
demand_df["Weekday"] = demand_df.Date.dt.isocalendar().day.astype("int64")
demand_df["Hour"] = demand_df.index.hour
n_lags = 5
for hour in range(1, n_lags):
    demand_df[f"Lag_{hour}"] = demand_df["Demand"].shift(hour)

# Train/validation/test split
num_test_steps = 24 * 7
demand_train = demand_df.iloc[:-num_test_steps, :].copy()
demand_test = demand_df.iloc[-num_test_steps:, :].copy()
features = ["Weekofyear", "Weekday", "Hour", "Temperature"] + [
    f"Lag_{hour}" for hour in range(1, n_lags)
]

X_train = demand_train.loc[
    ~np.any(demand_train[features].isnull(), axis=1), features
]
y_train = demand_train.loc[X_train.index, "Demand"]
X_test = demand_test.loc[:, features]
y_test = demand_test["Demand"]

perform_hyperparameters_search = False
if perform_hyperparameters_search:
    # CV parameter search
    n_iter = 100
    n_splits = 5
    tscv = TimeSeriesSplit(n_splits=n_splits)
    random_state = 59
    rf_model = RandomForestRegressor(random_state=random_state)
    rf_params = {"max_depth": randint(2, 30), "n_estimators": randint(10, 100)}
    cv_obj = RandomizedSearchCV(
        rf_model,
        param_distributions=rf_params,
        n_iter=n_iter,
        cv=tscv,
        scoring="neg_root_mean_squared_error",
        random_state=random_state,
        verbose=0,
        n_jobs=-1,
    )
    cv_obj.fit(X_train, y_train)
    model = cv_obj.best_estimator_
else:
    # Model: Random Forest previously optimized with a cross-validation
    model = RandomForestRegressor(
        max_depth=10, n_estimators=50, random_state=59
    )

# Estimate prediction intervals on test set with best estimator
alpha = 0.05
cv_mapietimeseries = BlockBootstrap(
    n_resamplings=100, length=48, overlapping=True, random_state=59
)

mapie_enpbi = MapieTimeSeriesRegressor(
    model,
    method="enbpi",
    cv=cv_mapietimeseries,
    agg_function="mean",
    n_jobs=-1,
)

print("EnbPI, with no partial_fit, width optimization")
mapie_enpbi = mapie_enpbi.fit(X_train, y_train)
y_pred_npfit_enbpi, y_pis_npfit_enbpi = mapie_enpbi.predict(
    X_test, alpha=alpha, ensemble=True, optimize_beta=True
)
coverage_npfit_enbpi = regression_coverage_score(
    y_test, y_pis_npfit_enbpi[:, 0, 0], y_pis_npfit_enbpi[:, 1, 0]
)

width_npfit_enbpi = regression_mean_width_score(
    y_pis_npfit_enbpi[:, 1, 0], y_pis_npfit_enbpi[:, 0, 0]
)

print("EnbPI with partial_fit, width optimization")
mapie_enpbi = mapie_enpbi.fit(X_train, y_train)
y_pred_pfit_enbpi = np.zeros(y_pred_npfit_enbpi.shape)
y_pis_pfit_enbpi = np.zeros(y_pis_npfit_enbpi.shape)

step_size = 1
(
    y_pred_pfit_enbpi[:step_size],
    y_pis_pfit_enbpi[:step_size, :, :],
) = mapie_enpbi.predict(
    X_test.iloc[:step_size, :], alpha=alpha, ensemble=True, optimize_beta=True
)

for step in range(step_size, len(X_test), step_size):
    mapie_enpbi.partial_fit(
        X_test.iloc[(step - step_size):step, :],
        y_test.iloc[(step - step_size):step],
    )
    (
        y_pred_pfit_enbpi[step:step + step_size],
        y_pis_pfit_enbpi[step:step + step_size, :, :],
    ) = mapie_enpbi.predict(
        X_test.iloc[step:(step + step_size), :],
        alpha=alpha,
        ensemble=True,
        optimize_beta=True,
    )
coverage_pfit_enbpi = regression_coverage_score(
    y_test, y_pis_pfit_enbpi[:, 0, 0], y_pis_pfit_enbpi[:, 1, 0]
)
width_pfit_enbpi = regression_mean_width_score(
    y_pis_pfit_enbpi[:, 1, 0], y_pis_pfit_enbpi[:, 0, 0]
)

# Print results
print(
    "Coverage / prediction interval width mean for MapieTimeSeriesRegressor: "
    "\nEnbPI without any partial_fit:"
    f"{coverage_npfit_enbpi:.3f}, {width_npfit_enbpi:.3f}"
)
print(
    "Coverage / prediction interval width mean for MapieTimeSeriesRegressor: "
    "\nEnbPI with partial_fit:"
    f"{coverage_pfit_enbpi:.3f}, {width_pfit_enbpi:.3f}"
)

# Plot estimated prediction intervals on test set
fig, (ax1, ax2) = plt.subplots(
    nrows=2, ncols=1, figsize=(30, 25), sharey="row", sharex="col"
)

for ax in [ax1, ax2]:
    ax.set_ylabel("Hourly demand (GW)")
    ax.plot(demand_test.Demand, lw=2, label="Test data", c="C1")

ax1.plot(
    demand_test.index, y_pred_npfit_enbpi, lw=2, c="C2", label="Predictions"
)
ax1.fill_between(
    demand_test.index,
    y_pis_npfit_enbpi[:, 0, 0],
    y_pis_npfit_enbpi[:, 1, 0],
    color="C2",
    alpha=0.2,
    label="MapieTimeSeriesRegressor PIs",
)
ax1.set_title(
    "EnbPI, without partial_fit.\n"
    f"Coverage:{coverage_npfit_enbpi:.3f} Width:{width_npfit_enbpi:.3f}"
)

ax2.plot(
    demand_test.index, y_pred_pfit_enbpi, lw=2, c="C2", label="Predictions"
)
ax2.fill_between(
    demand_test.index,
    y_pis_pfit_enbpi[:, 0, 0],
    y_pis_pfit_enbpi[:, 1, 0],
    color="C2",
    alpha=0.2,
    label="MapieTimeSeriesRegressor PIs",
)
ax2.set_title(
    "EnbPI with partial_fit.\n"
    f"Coverage:{coverage_pfit_enbpi:.3f} Width:{width_pfit_enbpi:.3f}"
)

ax1.legend()
plt.show()
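The example above relies on ``BlockBootstrap`` to resample contiguous blocks of the series rather than individual points, preserving short-range temporal dependence inside each block. The sketch below is a minimal, standalone illustration of overlapping block resampling in plain numpy; it is not MAPIE's implementation, and the helper name ``block_bootstrap_indices`` is hypothetical.

```python
import numpy as np


def block_bootstrap_indices(n_samples, length, n_resamplings, rng):
    """Draw index sets for an overlapping block bootstrap (illustrative)."""
    resamplings = []
    # Enough blocks to cover the series; the draw is truncated to n_samples.
    n_blocks = int(np.ceil(n_samples / length))
    for _ in range(n_resamplings):
        # Blocks may start anywhere, so they can overlap across draws.
        starts = rng.integers(0, n_samples - length + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + length) for s in starts])
        resamplings.append(idx[:n_samples])
    return resamplings


rng = np.random.default_rng(59)
boots = block_bootstrap_indices(n_samples=200, length=48, n_resamplings=3, rng=rng)
print(len(boots), boots[0].shape)  # 3 resamplings of 200 indices each
```

Each resampling would then be used to fit one ensemble member; EnbPI aggregates the out-of-bag predictions of these members to build its intervals.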
23 changes: 23 additions & 0 deletions mapie/_compatibility.py
@@ -26,8 +26,31 @@ def np_quantile_version_above_122(
    return np.quantile(a, q, method=method, **kwargs)  # type: ignore


def np_nanquantile_version_below_122(
    a: ArrayLike,
    q: ArrayLike,
    method: str = "linear",
    **kwargs: Any
) -> NDArray:
    """Wrapper of np.nanquantile function for numpy version < 1.22."""
    return np.nanquantile(a, q, interpolation=method, **kwargs)  # type: ignore


def np_nanquantile_version_above_122(
    a: ArrayLike,
    q: ArrayLike,
    method: str = "linear",
    **kwargs: Any
) -> NDArray:
    """Wrapper of np.nanquantile function for numpy version >= 1.22."""
    return np.nanquantile(a, q, method=method, **kwargs)  # type: ignore


numpy_version = parse_version(np.__version__)
if numpy_version < parse_version("1.22"):
    np_quantile = np_quantile_version_below_122
    np_nanquantile = np_nanquantile_version_below_122
else:
    np_quantile = np_quantile_version_above_122
    np_nanquantile = np_nanquantile_version_above_122
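The compatibility shim selects the right keyword name once at import time: numpy renamed ``np.nanquantile``'s ``interpolation`` argument to ``method`` in version 1.22. The sketch below reproduces the same dispatch pattern standalone, with a simplified version check; the helper names ``_nanquantile_old`` and ``_nanquantile_new`` are illustrative, not MAPIE's.

```python
import numpy as np


def _nanquantile_old(a, q, method="linear", **kwargs):
    # numpy < 1.22 spelled the argument `interpolation`
    return np.nanquantile(a, q, interpolation=method, **kwargs)


def _nanquantile_new(a, q, method="linear", **kwargs):
    # numpy >= 1.22 renamed it to `method`
    return np.nanquantile(a, q, method=method, **kwargs)


# Pick the implementation once, at import time.
_major, _minor = (int(p) for p in np.__version__.split(".")[:2])
np_nanquantile = _nanquantile_old if (_major, _minor) < (1, 22) else _nanquantile_new

data = np.array([1.0, 2.0, np.nan, 4.0])
print(np_nanquantile(data, 0.5))  # NaNs are ignored: median of [1, 2, 4]
```

Callers can then use ``np_nanquantile`` uniformly without caring which numpy version is installed.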