Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the alias configuration API for validation and serialization #1640

Merged
merged 17 commits into from
Feb 25, 2025

Conversation

sydney-runkle
Copy link
Member

@sydney-runkle sydney-runkle commented Feb 19, 2025

The pydantic-core (functional) part of pydantic/pydantic#8379
Prep for pydantic/pydantic#11468 🚀

This is a bit of a monster PR. Hopefully, it's reviewable commit by commit. I considered splitting these changes up into different PRs, but they're all really closely tied together and conceptually make sense in one place.
You can skip this one. It was my first attempt at supporting by_alias and by_name in model_validate_X functions, and it caused serious performance issues. I've left it in the history for reference, but it can be practically ignored in terms of review.

TL;DR: This should make it easier to configure alias validation and serialization behavior, both with model configuration and with runtime flags in model validation / serialization functions 🤞.

TL;DR part 2: this is the new API we're moving to support:

If unset, the default is used.
Runtime settings (args to model validate / serialize functions) take priority over configuration settings, and they surpass model boundaries (apply to nested models).

class ConfigDict:
    validate_by_alias: bool = True
    serialize_by_alias: bool = False
    #   in v3, serialize_by_alias default should change to True

    populate_by_name: bool = False ---> validate_by_name: bool = False

def model_dump_X(by_alias: bool = False, ...):
    in v3, by_alias default should change to True

def model_validate_X(by_alias: bool = True, by_name: bool = False, ...):

Now, this is the TL part of TL;DR... I've added comments to relevant sections of the diff to make this easier for reviewers (thank you in advance @Viicos, @davidhewitt for your efforts here). In this PR:

  • I've deprecated the populate_by_name setting directly on core schemas in favor of using it through config, as is the case when schemas are built with pydantic. The one exception here is arguments_schema, which doesn't have a config.
  • We now enforce validate_by_alias alongside validate_by_name, which means you can enforce validation only by name, if desired (see accompanying tests)
  • We support a serialize_by_alias specification on configuration for model like classes, which takes a backseat to the by_alias runtime specification in serialization functions. This required two changes:
    • by_alias defaulted to true in pydantic-core for some reason, which was inconsistent with the pydantic False default, so the default (and corresponding tests) were changed to false here.
    • There was no "unset" concept for the by_alias setting, like there is for strict: bool | None for example. This is necessary in order to understand if a config setting should take action over an unset runtime flag, thus, a new by_alias signature has been introduced.
  • We support by_alias and by_name specifications on model_validate_X functions in order to be more consistent with runtime alias configuration at serialization time. These settings, like in the serialization case, take priority over configuration settings and apply to all nested models. Note - the config settings are subject to the model config boundary.
    • I've added a SchemaError for the case where we get to validate_by_alias = False and validate_by_name = False, but this is practically enforced in pydantic, so hopefully isn't a big issue here.

Copy link

codspeed-hq bot commented Feb 19, 2025

CodSpeed Performance Report

Merging #1640 will not alter performance

Comparing new-alias-api (77c8c03) with main (7dc19c3)

Summary

✅ 157 untouched benchmarks

* replace populate_by_name with validate_by_name
* enforce validate_by_alias in conjunction with validate_by_name (via get_lookup_key)
* deprecate populate_by_name spec on model, dc, and td schemas in favor of access through the config setting
…validate functions

This approach has now been reverted in favor of a schema-build-time approach due to perf reasons
See the next commit for a more robust explanation :).
…ions

* This approach (2nd attempt) emphasizes building lookup keys at schema build time for performance reasons
* We avoid any LookupKey builds at validation time to avoid perf bottlenecks + redundant builds
* We store potentially up to 3 LookupKey instances via LookupKeyCollection, representing
name, alias, and alias_then_name lookups based on the combination of config and runtime alias settings.
* Adding parametrized tests to check various alias config / runtime setting combinations
@sydney-runkle sydney-runkle marked this pull request as ready for review February 22, 2025 14:17
@sydney-runkle sydney-runkle changed the title starting on alias API unification Improving the Alias configuration API for validation and serialization Feb 22, 2025
Comment on lines +100 to +101
by_alias: bool | None = None,
by_name: bool | None = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important that these have the | None specification because we want to be able to detect that a value is unset + enforce a default.

@@ -2888,7 +2891,6 @@ class TypedDictSchema(TypedDict, total=False):
# all these values can be set via config, equivalent fields have `typed_dict_` prefix
extra_behavior: ExtraBehavior
total: bool # default: True
populate_by_name: bool # replaces `allow_population_by_field_name` in pydantic v1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We remove this specification off of core schemas (other than arguments) because it can be specified through configuration, and that's how it's practically done in pydantic during schema builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the fun part of this PR - designing the new lookup key pattern to accommodate the web of validation by alias/name settings.

Note, it's important that we build these keys during schema build time. I tried just building one key as needed at validation time, and that resulted in some really unfortunate perf regressions.

#[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = true,
#[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the PR description - we had by_alias defaulting to true here which was inconsistent with that of pydantic. In V3, I definitely think we should change this default, but for now, we can't change it in pydantic, so we should be consistent here.

Comment on lines 208 to 210
pub fn serialize_by_alias_or(&self, serialize_by_alias: Option<bool>) -> bool {
self.by_alias.unwrap_or(serialize_by_alias.unwrap_or(false))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use this pattern a lot in this PR - a great way to ensure that runtime settings take priority, if set, over config settings.

@sydney-runkle sydney-runkle changed the title Improving the Alias configuration API for validation and serialization Improving the alias configuration API for validation and serialization Feb 22, 2025
@Viicos
Copy link
Member

Viicos commented Feb 23, 2025

Haven't reviewed everything yet, but it might be that we need to go through a deprecation process for the old populate_by_name core schema field. The probability that some user is using this at the core schema level is quite now, but technically this is public API.

This can be done without too much overhead, by raising a deprecation warning in the relevant core schema creation functions, and ideally with a deprecated overload.

@sydney-runkle
Copy link
Member Author

Haven't reviewed everything yet, but it might be that we need to go through a deprecation process for the old populate_by_name core schema field.

We discussed this on the open source call today and decided this wasn't necessary, though we did decide to not officially deprecate populate_by_name in pydantic in a minor version and we instead just patch validate_by_name and validate_by_alias accordingly. I'll be sure to cover this heavily in the release blog post.

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust implementation seems fine to me 👍

@@ -54,8 +56,6 @@ impl BuildValidator for DataclassArgsValidator {
) -> PyResult<CombinedValidator> {
let py = schema.py();

let populate_by_name = schema_or_config_same(schema, config, intern!(py, "populate_by_name"))?.unwrap_or(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like rather than deprecate populate_by_name it got straight-up removed, should we keep it around in the Rust code and emit a warning if it's present, just to help users migrate for a release or two. (Just in case there's anyone using core directly)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I'd agree, we should be more careful when deprecating a core schema field like this one. However, in this case, I can't find any public usage of populate_by_name with core schema construction.

See for example, this search.

@sydney-runkle sydney-runkle enabled auto-merge (squash) February 25, 2025 16:51
@sydney-runkle sydney-runkle merged commit 8c59fa2 into main Feb 25, 2025
27 of 28 checks passed
@sydney-runkle sydney-runkle deleted the new-alias-api branch February 25, 2025 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants