-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add get_srml iotools function; deprecate read_srml_month_from_solardat #1779
Conversation
I haven't read this PR in detail, but FYI... there was some reason that we went through fetch gymnastics with SRML when we added it to pvlib and solar forecast arbiter. I'd make sure that the equivalent of this test passes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make sure that the equivalent of this test passes.
This PR delegates the year/month business to pd.date_range
instead of DIYing like the Arbiter does, so I think we can probably get away without that level of testing, although considering what happens when end < start
may be worthwhile... :)
except urllib.error.HTTPError: | ||
warnings.warn(f"The following file was not found: {f}") | ||
|
||
data = pd.concat(dfs, axis='rows') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about following this with a data = data.loc[start:end]
? Right now I get back a full month even if I request just a single day, which isn't really ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not opposed to this suggestion, although we may run into some issues related to timezone. I think this is why the start
/end
parameters in Solar Forecast Arbiter are required to be timezone localized - that seems like a hassle though.
There already exist several functions in pvlib that return a full month when requesting a single day btw, e.g. get_bsrn
, and probably others too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's a good point. Up to you on what is best here. If we keep the current behavior of returning complete months, might be worth a note in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not opposed to this suggestion, although we may run into some issues related to timezone. I think this is why the start/end parameters in Solar Forecast Arbiter are required to be timezone localized - that seems like a hassle though.
This sounds right. And a complicating factor for SRML was nighttime 0s in the future if we requested a day from the current month.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a raw data file for the current month that shows -999 and similar for future dates.
Here's a raw data file that includes -999 and similar as well as 0s
This caused a problem in SFA SolarArbiter/solarforecastarbiter-core#572 and it's reasonable to expect that it would cause a problem with other user code. I don't know if that's pvlib's problem to solve, but I think it's somewhat more likely to come up with this new function that accepts datetimes instead of entire months.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how the new function makes this more of an issue, currently, I would still request the same months with the old function, but it would just be more manual work.
I do think that it deserves a Warning entry and perhaps we can also implement a line that cuts off future data? For example:
data = data.loc[:pd.Timestamp.today(), :]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many users would have different expectations of the new function that accepts start, end datetimes than the old function that accepts a year and a month.
Thanks for adding the warning. I'd rather see the previously rejected start:end slicing than .today() slicing. I'm also fine with just adding the warning and seeing if users complain.
Co-authored-by: Kevin Anderson <[email protected]>
except urllib.error.HTTPError: | ||
warnings.warn(f"The following file was not found: {f}") | ||
|
||
data = pd.concat(dfs, axis='rows') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's a good point. Up to you on what is best here. If we keep the current behavior of returning complete months, might be worth a note in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple very minor things below, but otherwise LGTM!
|
||
|
||
def get_srml(station, start, end, filetype='PO', map_variables=True, | ||
url="http://solardat.uoregon.edu/download/Archive/"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about minor consistency details, I guess we could standardize the order of map_variables
and url
. Looking through other functions, here's what I see:
map_variables
first:get_cams
url
first:get_psm3
,get_pvgis_hourly
,get_pvgis_tmy
#1767 currently does map_variables
first but of course can be changed easily.
I don't see one as much better than the other. I guess I'd favor map_variables
first since I think it is very uncommon that people will want to mess with url
, but maybe map_variables=False
has some somewhat common uses. @AdamRJensen what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought about discussing this for your PR, but I guess I don't have a strong opinion except for consistency. But the logic of having URL last makes sense to me - it's just more of a hassle changing this as there are more existing functions with URL first.
I opened #1791 based on this discussion fyi.
Co-authored-by: Kevin Anderson <[email protected]>
Seems like the stated concerns are adequately addressed at this point, so I will go ahead and merge. Thanks @AdamRJensen and @wholmgren :) |
Thanks @AdamRJensen and @kandersolar! |
docs/sphinx/source/reference
for API changes.docs/sphinx/source/whatsnew
for all changes. Includes link to the GitHub Issue with:issue:`num`
or this Pull Request with:pull:`num`
. Includes contributor name and/or GitHub username (link with:ghuser:`user`
).remote-data
) and Milestone are assigned to the Pull Request and linked Issue.This PR adds a function
pvlib.iotools.get_srml
which eventually will replace the currentpvlib.iotools.read_srml_month_from_solardat
which does not follow the iotools convention.The new function features a
start
/end
parameter which allows for retrieving multiple months of data. The function also returns a tuple of (data, metadata) and exposes theurl
parameter.Ideally, the
pvlib.iotools.read_srml_month_from_solardat
will begin its deprecation cycle in this PR.