-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
variables_style_rules.csv could use an update #1253
Comments
Add |
I really dig this idea. Ensuring consistent naming is one place we're not doing great in and this would be a good help I think. |
I understand this PR is talking about updating the user guide Variables and Symbols page entries with what is used in pvlib, but after running the script I also found there are multiple duplicate parameter names in slightly different forms, e.g. Do you guys think it's worth a PR to unify these names? I think it would be worth it at least to make the function parameters consistent with what we already have in the user guide. I could review the list in more detail and make a start on this in a PR. Unifying parameter names that aren't already in the user guide could be a separate piece of work as that may require more discussion to reach a consensus on a name. |
It would be great to see a list of parameter name variants you found. In the past, a deprecation process has been followed when a parameter name has changed. I don't think we want to rename in one large PR. We've done that in small bites. #894 and probably other issues are relevant. |
It's a painstaking process cross checking everything (parameter map, user guide, function docstring) so the table below is incomplete, but it gives a few examples 😅 I understand there may be some reasons for some of the differences (I haven't tracked every related issue / definition at this stage) but I did check most definitions at least, e.g. "current" is definitely the same as "photocurrent" in this case (not short-circuit current or anything else). The table below is a summary of what I have found. I hope it helps. Note this does not include terms that aren't in the user guide that potentially should be, just terms for which there is an overlap either between a function and the user guide, or between functions (with or without a user guide entry)
*not sure whether it's ac or dc, not obvious unless you dive into the references/source code, either way there's a still a duplicate somewhere The starred points are a few examples of where terms require further clarification. Some definitions are even misleading (in my opinion) e.g. There are also instances where two parameters have a similar meaning but have no overlap, e.g.
Ah of course, I had not thought about the required deprecation process but that does make sense. Good point, thanks for explaining. |
|
These are the most common parameter names unlisted in the glossary: Details
A total of 469 unique unknown parameters were found. They make for a total of 983 ocurrences.
The most common unknown parameters are:
- map_variables: 26
- model: 20
- filename: 15
- start: 14
- end: 14
- gcr: 13
- url: 13
- times: 12
- x: 11
- delta_t: 11
- time: 10
- altitude: 9
- zenith: 9
- method: 9
- nNsVth: 9
- min_cos_zenith: 8
- max_zenith: 8
- temperature: 8
- numthreads: 8
- datetime_or_doy: 7
- axis_azimuth: 7
- julian_ephemeris_century: 7
- timeout: 7
- api_key: 7
- d2mutau: 6
- NsVbi: 6
- unixtime: 6
- coefficients: 6
- b: 5
- a: 5
- data: 5
- current: 5
- breakdown_factor: 5
- breakdown_voltage: 5
- breakdown_exp: 5
- atmos_refract: 5
- dayofyear: 5
- observer_latitude: 5
- irrad_ref: 5
- module_efficiency: 5
- return_components: 4
- xtol: 4
- v_dc: 4
- p_dc: 4
- voltage: 4
- axis_tilt: 4
- true_ecliptic_obliquity: 4
- lat: 4
- lon: 4
- height: 4
- pitch: 4
- station: 4
- coerce_year: 4
- alpha_sc: 4
- cells_in_series: 4
- temp_ref: 4
- wavelength: 4
- module_type: 4
- aod500: 3
- surface_type: 3
- normalize: 3
- inverter: 3
- model_perez: 3
- use_delta_kt_prime: 3
- location: 3
- name: 3
- weather: 3
- clearsky_index: 3
- slant_height: 3
- method_kwargs: 3
- i: 3
- how: 3
- horizon: 3
- declination: 3
- julian_day: 3
- jme: 3
- x0: 3
- x1: 3
- longitude_nutation: 3
- local_hour_angle: 3
- parallax_sun_right_ascension: 3
- elev: 3
- temp: 3
- sst: 3
- esd: 3
- u0: 3
- u1: 3
- noct: 3
- angle: 3
- number: 3
- npoints: 3
- vectorize: 3
- logical_records: 3
- fbuf: 3
- encoding: 3
- integrated: 3
- label: 3
- EgRef: 3
- coeff: 2
- aod380: 2
- alpha: 2
- lambda1: 2
- max_iterations: 2
- ozone: 2
- a_r: 2
- module: 2
- weight: 2
- ghi_clear: 2
- max_clearness_index: 2
- solar_position: 2
- system: 2
- clearsky_model: 2
- airmass_model: 2
- v: 2
- rsh: 2
- snowfall: 2
- threshold_snowfall: 2
- rainfall: 2
- cleaning_threshold: 2
- rain_accum_period: 2
- hourangle: 2
- equation_of_time: 2
- year: 2
- month: 2
- out: 2
- julian_ephemeris_millennium: 2
- earth_radius_vector: 2
- apparent_sun_longitude: 2
- geocentric_latitude: 2
- u: 2
- observer_elevation: 2
- xterm: 2
- equatorial_horizontal_parallax: 2
- geocentric_sun_declination: 2
- geocentric_sun_right_ascension: 2
- topocentric_sun_declination: 2
- topocentric_local_hour_angle: 2
- deltaT: 2
- u_c: 2
- u_v: 2
- alpha_absorption: 2
- emissivity: 2
- transmittance_absorptance: 2
- wind_fit_low: 2
- wind_fit_high: 2
- slope_azimuth: 2
- slope_tilt: 2
- max_rows: 2
- variable_map: 2
- email: 2
- outputformat: 2
- usehorizon: 2
- userhorizon: 2
- pvgis_format: 2
- time_resolution: 2
- variables: 2
- beta_voc: 2
- ivcurves: 2
- specs: 2
- const: 2
- maxiter: 2
- eps1: 2
- gamma_ref: 2
- mu_gamma: 2
- I_L_ref: 2
- I_o_ref: 2
- R_sh_ref: 2
- R_sh_0: 2
- R_s: 2
- R_sh_exp: 2
- e: 2
- sr: 2
- solar_elevation: 1
- surface_condition: 1
- color_coeff: 1
- wave_roughness_coeff: 1
- aod_bb: 1
- aod0: 1
- lambda0: 1
- aod1: 1
- aod2: 1
- lambda2: 1
- wind_speed_reference: 1
- height_reference: 1
- height_desired: 1
- exponent: 1
- linke_turbidity: 1
- perez_enhancement: 1
- filepath: 1
- interp_turbidity: 1
- apparent_elevation: 1
- aod700: 1
- measured: 1
- clearsky: 1
- infer_limits: 1
- window_length: 1
- mean_diff: 1
- max_diff: 1
- lower_line_length: 1
- upper_line_length: 1
- var_diff: 1
- slope_dev: 1
- asymmetry: 1
- n: 1
- K: 1
- L: 1
- c1: 1
- c2: 1
- theta_ref: 1
- iam_ref: 1
- upper: 1
- function: 1
- region: 1
- num: 1
- source_name: 1
- source_params: 1
- target_name: 1
- fix_n: 1
- measured_aoi: 1
- measured_iam: 1
- model_name: 1
- target_param: 1
- vtol: 1
- ac_power: 1
- dc_power: 1
- dc_voltage: 1
- dc_voltage_level: 1
- p_ac_0: 1
- p_nt: 1
- c: 1
- x_d: 1
- add: 1
- solar_constant: 1
- epoch_year: 1
- projection_ratio: 1
- full_output: 1
- max_clearsky_index: 1
- extra_radiation: 1
- clearness_index: 1
- max_airmass: 1
- calculate_gt_90: 1
- a_coeff: 1
- b_coeff: 1
- transmittance: 1
- clearsky_tolerance: 1
- zenith_threshold_for_zero_dni: 1
- zenith_threshold_for_clearsky_limit: 1
- daily_solar_zenith: 1
- global_diffuse_fraction: 1
- tmy_metadata: 1
- tmy_data: 1
- metadata: 1
- strategy: 1
- transposition_model: 1
- solar_position_method: 1
- k_a: 1
- k_d: 1
- tc_d: 1
- k_rs: 1
- k_rsh: 1
- eta: 1
- dict_output: 1
- temp_mod: 1
- k: 1
- cell_type: 1
- xdata: 1
- path: 1
- reference_irradiance: 1
- resistance: 1
- index: 1
- iam_model: 1
- pw: 1
- positions: 1
- cloud_speed: 1
- dt: 1
- coordinates: 1
- masking_angle: 1
- shaded_row_rotation: 1
- shaded_fraction: 1
- shaded_blocks: 1
- total_blocks: 1
- diode_voltage: 1
- gradients: 1
- voc: 1
- iph: 1
- isat: 1
- rs: 1
- gamma: 1
- poa_irradiance: 1
- initial_coverage: 1
- can_slide_coefficient: 1
- slide_amount_coefficient: 1
- snow_coverage: 1
- num_strings: 1
- snow_total: 1
- snow_events: 1
- lower_edge_height: 1
- string_factor: 1
- angle_of_repose: 1
- pm2_5: 1
- pm10: 1
- depo_veloc: 1
- soiling_loss_rate: 1
- grace_period: 1
- max_soiling: 1
- manual_wash_dates: 1
- initial_soiling: 1
- raw_spa_output: 1
- next_or_previous: 1
- lower_bound: 1
- upper_bound: 1
- attribute: 1
- value: 1
- thetime: 1
- target: 1
- attr: 1
- day: 1
- hour: 1
- minute: 1
- second: 1
- microsecond: 1
- julian_ephemeris_day: 1
- arr: 1
- heliocentric_longitude: 1
- heliocentric_latitude: 1
- x2: 1
- x3: 1
- x4: 1
- mean_ecliptic_obliquity: 1
- obliquity_nutation: 1
- geocentric_longitude: 1
- aberration_correction: 1
- julian_century: 1
- mean_sidereal_time: 1
- apparent_sidereal_time: 1
- observer_longitude: 1
- sun_right_ascension: 1
- yterm: 1
- local_pressure: 1
- local_temp: 1
- topocentric_elevation_angle_wo_atmosphere: 1
- topocentric_elevation_angle_without_atmosphere: 1
- atmospheric_refraction_correction: 1
- topocentric_elevation_angle: 1
- topocentric_astronomers_azimuth: 1
- sun_mean_longitude: 1
- loc_args: 1
- dates: 1
- module_temperature: 1
- ir_down: 1
- sky_view: 1
- noct_installed: 1
- module_height: 1
- wind_height: 1
- absorption: 1
- module_width: 1
- module_length: 1
- array_height: 1
- mount_standoff: 1
- unit_mass: 1
- u_const: 1
- du_wind: 1
- absorptance: 1
- djd: 1
- apparent_azimuth: 1
- max_angle: 1
- backtrack: 1
- cross_axis_tilt: 1
- tracker_theta: 1
- input_power: 1
- no_load_loss: 1
- load_loss: 1
- transformer_rating: 1
- iam: 1
- iam_front: 1
- iam_back: 1
- bifaciality: 1
- shade_factor: 1
- transmission_factor: 1
- rmad: 1
- fill_factor: 1
- fill_factor_reference: 1
- timestamps: 1
- pvrow_height: 1
- pvrow_width: 1
- n_pvrows: 1
- index_observed_pvrow: 1
- rho_front_pvrow: 1
- rho_back_pvrow: 1
- horizon_band_angle: 1
- pvarray: 1
- rotation: 1
- grid: 1
- trace_val: 1
- latitude_range: 1
- longitude_range: 1
- username: 1
- password: 1
- save_path: 1
- csvdata: 1
- raw_data: 1
- site: 1
- names: 1
- interval: 1
- attributes: 1
- leap_day: 1
- full_name: 1
- affiliation: 1
- raddatabase: 1
- components: 1
- pvcalculation: 1
- peakpower: 1
- pvtechchoice: 1
- mountingplace: 1
- loss: 1
- trackingtype: 1
- optimal_surface_tilt: 1
- optimalangles: 1
- startyear: 1
- endyear: 1
- roll_utc_offset: 1
- identifier: 1
- time_step: 1
- time_ref: 1
- verbose: 1
- server: 1
- source: 1
- spatial_resolution: 1
- true_dynamics: 1
- probability_of_exceedance: 1
- missing_data: 1
- timestamp_type: 1
- terrain_shading: 1
- duration: 1
- filetype: 1
- recolumn: 1
- v_mp_i_mp: 1
- vlim: 1
- ilim: 1
- celltype: 1
- gamma_pmp: 1
- dEgdT: 1
- init_guess: 1
- root_kwargs: 1
- rshexp: 1
- ee: 1
- e0: 1
- decimals: 1
- imax_limits: 1
- vmax_limits: 1
- voc_points: 1
- isc_points: 1
- mp_fit_order: 1
- wavelengths: 1
- standard: 1
- e_sun: 1
- e_ref: 1
- min_precipitable_water: 1
- max_precipitable_water: 1
- min_airmass_absolute: 1
- max_airmass_absolute: 1
- qe: 1
- ground_albedo: 1
- surface_pressure: 1
- relative_airmass: 1
- aerosol_turbidity_500nm: 1
- scattering_albedo_400nm: 1
- wavelength_variation_factor: 1
- aerosol_asymmetry_factor: 1 Code (free of charge)
#!/usr/bin/env python3
"""
Utility script to ensure that the parameters in the public API of pvlib-python
are normalized. By normalized, we mean that the parameters are either in the
documentation glossary or in the exclusion list.
This script is intended to be run in the CI pipeline to ensure that the
parameters are normalized before merging a PR.
Author: Echedey :D
"""
# %%
from pathlib import Path
import re
import ast
# Paths setup
REPO_ROOT = Path(__file__).parent.parent
# Path to pvlib directory
PVLIB_DIR = REPO_ROOT / "pvlib"
# Path to the glossary
GLOSSARY_PATH = REPO_ROOT / "docs/sphinx/source/user_guide/nomenclature.rst"
# %%
def get_glossary_terms(glossary_rst_file) -> list[str]:
"""
Returns the list of defined terms in the glossary.
Parameters
----------
glossary_rst_file : Path
Path to the glossary file.
Returns
-------
list[str]
List of defined terms in the glossary.
Notes
-----
The terms of the glossary are defined in the nomenclature section of the
documentation.
"""
with open(glossary_rst_file, mode="r", encoding="utf-8") as f:
# first, trim everything before the '.. glossary::' directive
while ".. glossary::" not in f.readline():
pass
# then, read the glossary terms
# each term is indented by 4 spaces
# more than one can appear in a line, separated by commas
# the glossary ends with two newlines
glossary_terms = []
term_regex = re.compile(r"^ {4}\w+") # match a line with a term
for line in f:
if line == "\n": # a newline is found
if (line := f.readline()) == "\n": # is it a double newline?
break # then the glossary ends
# otherwise, it is a text line
if term_regex.match(line): # is it a glossary term?
# remove the leading spaces and the trailing newline
glossary_terms.extend(line.strip().split(", "))
return glossary_terms
# %%
def get_parameters_to_exclude() -> list[str]:
"""
Returns the list of parameters to exclude from the normalization check.
These include python-specific parameters, like 'self', 'cls', etc.
And also parameters that are not in the glossary but are used in the code.
"""
return [
"self",
"cls",
"kwargs",
] # list of parameters to exclude from the normalization check
# %%
def get_files_to_check() -> list[Path]:
"""
Returns the list of files to check for normalized parameters.
These files are relative to the pvlib package root directory.
"""
files_to_check = list(PVLIB_DIR.glob("**/*.py"))
files_to_exclude = [
# non-public API files, __init__.py, _deprecation.py
*PVLIB_DIR.glob("**/_*.py"),
*PVLIB_DIR.glob("tests/**/*.py"), # test files
*PVLIB_DIR.glob("spa_c_files/*.py"), # spa c submodule
]
files_to_check = [
file.relative_to(PVLIB_DIR)
for file in files_to_check
if file not in files_to_exclude
]
return files_to_check
# %%
if __name__ == "__main__":
# Get the glossary terms
glossary_terms = get_glossary_terms(GLOSSARY_PATH)
# Get the parameters to exclude and append them to the glossary terms
allowed_parameter_names = get_parameters_to_exclude()
allowed_parameter_names = set(glossary_terms + allowed_parameter_names)
# Get the list of files to check
files_to_check = get_files_to_check()
# tree of files, with the functions and their line number and params
# {file_path: {function_name: (line_number, [parameter_names])}}
tree: dict[Path, dict[tuple[str, int], list[str]]] = {}
for file_path in files_to_check:
with open(PVLIB_DIR / file_path, mode="r", encoding="utf-8") as f:
tree[file_path] = {} # initialize the file tree
parsed_file = ast.parse(f.read())
for node in ast.walk(parsed_file):
if isinstance(node, ast.FunctionDef):
if node.name.startswith("_"):
continue # skip private functions
tree[file_path][node.name] = (
node.lineno,
[arg.arg for arg in node.args.args],
)
# Check if the parameters are normalized
parameter_names_violations = 0
parameter_names_violated = []
for file_path, functions in tree.items():
for function_name, (lineno, parameters) in functions.items():
for parameter in parameters:
if parameter not in allowed_parameter_names:
print(
f"Unknown {parameter} in {function_name}: "
+ f"in {file_path}:{lineno}"
)
parameter_names_violations += 1
parameter_names_violated.append(parameter)
if parameter_names_violations > 0:
# count each unique unknown parameter into a dictionary
parameter_names_violated: dict[str, int] = {
parameter: parameter_names_violated.count(parameter)
for parameter in parameter_names_violated
}
print("The unique unknown parameters are:")
for parameter in parameter_names_violated:
print(f" - {parameter}")
print(f"A total of {len(parameter_names_violated)} unique unknown "
+ "parameters were found. They make for a total of "
+ f"{parameter_names_violations} ocurrences.")
# order by most common unknown parameter
print("The most common unknown parameters are:")
for parameter in sorted(
parameter_names_violated,
key=parameter_names_violated.get,
reverse=True,
):
print(f" - {parameter}: {parameter_names_violated[parameter]}")
exit(1)
print("All parameters are consistent.")
exit(0)
# %% Edit to add: we can do an online spreadsheet, discuss whether they are appropriate for the glossary and/or fill the definitions there. |
@jsstein see @echedey-ls list of parameter names. |
The table of parameter names in the docs is a bit dusty and doesn't perfectly reflect today's pvlib. To make it easier to figure out where the gaps are, here's a script that scans the package, finds the parameter names currently in use, and compares with that table:
unlisted_names
has over 300 elements. I think it makes sense to leave most of them out of the table, for example function-specific names likeupper_line_length
andu0
, but a few of the unlisted ones are common enough to warrant inclusion, e.g.albedo
. It also shows a few inconsistencies likedelta_t
vsdeltaT
andpoa_irradiance
vspoa_global
.An idea, maybe a bad one: once we've figured out which of the unlisted parameter names we don't want to include in the table, put them in an "ignore list" and create a CI check that performs the above scan to check for new parameter names that aren't in the table but maybe should be.
The text was updated successfully, but these errors were encountered: