-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving the alias configuration API for validation and serialization #1640
Conversation
CodSpeed Performance ReportMerging #1640 will not alter performanceComparing Summary
|
* replace populate_by_name with validate_by_name * enforce validate_by_alias in conjunction with validate_by_name (via get_lookup_key) * deprecate populate_by_name spec on model, dc, and td schemas in favor of access through the config setting
adb1f49
to
a780908
Compare
…validate functions This approach has now been reverted in favor of a schema-build-time approach due to perf reasons See the next commit for a more robust explanation :).
e05e78a
to
7c2e851
Compare
…ions * This approach (2nd attempt) emphasizes building lookup keys at schema build time for performance reasons * We avoid any LookupKey builds at validation time to avoid perf bottlenecks + redundant builds * We store potentially up to 3 LookupKey instances via LookupKeyCollection, representing name, alias, and alias_then_name lookups based on the combination of config and runtime alias settings. * Adding parametrized tests to check various alias config / runtime setting combinations
7c2e851
to
2c9ac6d
Compare
by_alias: bool | None = None, | ||
by_name: bool | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important that these have the | None
specification because we want to be able to detect that a value is unset + enforce a default.
@@ -2888,7 +2891,6 @@ class TypedDictSchema(TypedDict, total=False): | |||
# all these values can be set via config, equivalent fields have `typed_dict_` prefix | |||
extra_behavior: ExtraBehavior | |||
total: bool # default: True | |||
populate_by_name: bool # replaces `allow_population_by_field_name` in pydantic v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We remove this specification off of core schemas (other than arguments) because it can be specified through configuration, and that's how it's practically done in pydantic
during schema builds.
src/lookup_key.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the fun part of this PR - designing the new lookup key pattern to accommodate the web of validation by alias/name settings.
Note, it's important that we build these keys during schema build time. I tried just building one key as needed at validation time, and that resulted in some really unfortunate perf regressions.
…validation alias/name settings
#[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = true, | ||
#[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the PR description - we had by_alias
defaulting to true
here which was inconsistent with that of pydantic
. In V3, I definitely think we should change this default, but for now, we can't change it in pydantic
, so we should be consistent here.
src/serializers/extra.rs
Outdated
pub fn serialize_by_alias_or(&self, serialize_by_alias: Option<bool>) -> bool { | ||
self.by_alias.unwrap_or(serialize_by_alias.unwrap_or(false)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this pattern a lot in this PR - a great way to ensure that runtime settings take priority, if set, over config settings.
Haven't reviewed everything yet, but it might be that we need to go through a deprecation process for the old This can be done without too much overhead, by raising a deprecation warning in the relevant core schema creation functions, and ideally with a deprecated overload. |
We discussed this on the open source call today and decided this wasn't necessary, though we did decide to not officially deprecate |
Co-authored-by: Victorien <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust implementation seems fine to me 👍
@@ -54,8 +56,6 @@ impl BuildValidator for DataclassArgsValidator { | |||
) -> PyResult<CombinedValidator> { | |||
let py = schema.py(); | |||
|
|||
let populate_by_name = schema_or_config_same(schema, config, intern!(py, "populate_by_name"))?.unwrap_or(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like rather than deprecate populate_by_name
it got straight-up removed, should we keep it around in the Rust code and emit a warning if it's present, just to help users migrate for a release or two. (Just in case there's anyone using core directly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I'd agree, we should be more careful when deprecating a core schema field like this one. However, in this case, I can't find any public usage of populate_by_name
with core schema construction.
See for example, this search.
Co-authored-by: David Hewitt <[email protected]>
…core into new-alias-api
Co-authored-by: Victorien <[email protected]>
The
pydantic-core
(functional) part of pydantic/pydantic#8379Prep for pydantic/pydantic#11468 🚀
This is a bit of a monster PR. Hopefully, it's reviewable commit by commit. I considered splitting these changes up into different PRs, but they're all really closely tied together and conceptually make sense in one place.
You can skip this one. It was my first attempt at supporting
by_alias
andby_name
inmodel_validate_X
functions, and it caused serious performance issues. I've left it in the history for reference, but it can be practically ignored in terms of review.TL;DR: This should make it easier to configure alias validation and serialization behavior, both with model configuration and with runtime flags in model validation / serialization functions 🤞.
TL;DR part 2: this is the new API we're moving to support:
If unset, the default is used.
Runtime settings (args to model validate / serialize functions) take priority over configuration settings, and they surpass model boundaries (apply to nested models).
Now, this is the TL part of TL;DR... I've added comments to relevant sections of the diff to make this easier for reviewers (thank you in advance @Viicos, @davidhewitt for your efforts here). In this PR:
populate_by_name
setting directly on core schemas in favor of using it through config, as is the case when schemas are built withpydantic
. The one exception here isarguments_schema
, which doesn't have a config.validate_by_alias
alongsidevalidate_by_name
, which means you can enforce validation only by name, if desired (see accompanying tests)serialize_by_alias
specification on configuration for model like classes, which takes a backseat to theby_alias
runtime specification in serialization functions. This required two changes:by_alias
defaulted totrue
inpydantic-core
for some reason, which was inconsistent with thepydantic
False
default, so the default (and corresponding tests) were changed tofalse
here.by_alias
setting, like there is forstrict: bool | None
for example. This is necessary in order to understand if a config setting should take action over an unset runtime flag, thus, a newby_alias
signature has been introduced.by_alias
andby_name
specifications onmodel_validate_X
functions in order to be more consistent with runtime alias configuration at serialization time. These settings, like in the serialization case, take priority over configuration settings and apply to all nested models. Note - the config settings are subject to the model config boundary.SchemaError
for the case where we get tovalidate_by_alias = False
andvalidate_by_name = False
, but this is practically enforced inpydantic
, so hopefully isn't a big issue here.