Improving the alias configuration API for validation and serialization #1640

sydney-runkle · 2025-02-19T19:56:10Z

The pydantic-core (functional) part of pydantic/pydantic#8379
Prep for pydantic/pydantic#11468 🚀

This is a bit of a monster PR. Hopefully, it's reviewable commit by commit. I considered splitting these changes up into different PRs, but they're all really closely tied together and conceptually make sense in one place.
A note on commit 490ab36 (the first pass at supporting by_alias and by_name in model_validate_X functions): you can skip it. It was my first attempt, and it caused serious performance issues. I've left it in the history for reference, but it can be practically ignored for review.

TL;DR: This should make it easier to configure alias validation and serialization behavior, both with model configuration and with runtime flags in model validation / serialization functions 🤞.

TL;DR part 2: this is the new API we're moving to support:

- If a setting is unset, its default is used.
- Runtime settings (arguments to model validation / serialization functions) take priority over configuration settings, and they cross model boundaries (that is, they also apply to nested models).

```python
class ConfigDict:
    validate_by_alias: bool = True
    serialize_by_alias: bool = False  # in v3, the serialize_by_alias default should change to True

    validate_by_name: bool = False  # replaces populate_by_name: bool = False


def model_dump_X(by_alias: bool = False, **kwargs): ...  # in v3, the by_alias default should change to True


def model_validate_X(by_alias: bool = True, by_name: bool = False, **kwargs): ...
```
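To make the priority rules above concrete, here is a minimal Python sketch (not pydantic's implementation) of how a serialization by_alias setting could be resolved for a model and its nested models. `ModelSketch` and `effective_by_alias` are hypothetical names; the sketch assumes a runtime flag of None means "unset" and that the fallback default is False, as in the TL;DR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelSketch:
    """Hypothetical stand-in for a model class: its name, its own config, and its nested models."""

    name: str
    serialize_by_alias: bool | None = None  # config setting; None means "not set in config"
    children: list[ModelSketch] = field(default_factory=list)


def effective_by_alias(model: ModelSketch, runtime_by_alias: bool | None = None) -> dict[str, bool]:
    """Resolve by_alias for `model` and every nested model, keyed by model name."""
    if runtime_by_alias is not None:
        resolved = runtime_by_alias  # the runtime flag wins whenever it is set
    elif model.serialize_by_alias is not None:
        resolved = model.serialize_by_alias  # otherwise this model's own config applies
    else:
        resolved = False  # otherwise the default
    result = {model.name: resolved}
    for child in model.children:
        # the runtime flag is passed down unchanged; the parent's config is not
        result.update(effective_by_alias(child, runtime_by_alias))
    return result
```

With `outer = ModelSketch("Outer", serialize_by_alias=True, children=[ModelSketch("Inner")])`, calling `effective_by_alias(outer)` gives `{"Outer": True, "Inner": False}` because each config stays within its own model, while `effective_by_alias(outer, True)` gives True for both because the runtime flag is passed down.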

Now, this is the TL part of TL;DR... I've added comments to relevant sections of the diff to make this easier for reviewers (thank you in advance @Viicos, @davidhewitt for your efforts here). In this PR:

- I've deprecated the populate_by_name setting directly on core schemas in favor of using it through config, as is the case when schemas are built with pydantic. The one exception is arguments_schema, which doesn't have a config.
- We now enforce validate_by_alias alongside validate_by_name, which means you can enforce validation by name only, if desired (see the accompanying tests).
- We support a serialize_by_alias setting in the configuration for model-like classes, which takes a backseat to the by_alias runtime flag in serialization functions. This required two changes:
  - by_alias defaulted to True in pydantic-core for some reason, which was inconsistent with pydantic's False default, so the default (and the corresponding tests) were changed to False here.
  - There was no "unset" concept for the by_alias setting, as there is for strict: bool | None, for example. An unset state is necessary to know whether a config setting should apply when the runtime flag isn't provided, so by_alias now has a new signature that allows None.
- We support by_alias and by_name flags on model_validate_X functions, for consistency with runtime alias configuration at serialization time. As in the serialization case, these flags take priority over configuration settings and apply to all nested models. Note that config settings, by contrast, are scoped to the model whose config defines them.
  - I've added a SchemaError for the case where validate_by_alias and validate_by_name both resolve to False. pydantic already prevents this in practice, so hopefully it isn't a big issue here.
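A rough sketch of how the validation-side flags might be resolved for one model, including the "both False" error case. The function and exception names are hypothetical, and where the real check happens (schema build vs. call time) is not shown; this only illustrates the precedence and the error condition described above.

```python
from __future__ import annotations


class AliasConfigError(ValueError):
    """Hypothetical stand-in for the SchemaError mentioned above."""


def resolve_validation_flags(
    config: dict[str, bool],
    by_alias: bool | None = None,
    by_name: bool | None = None,
) -> tuple[bool, bool]:
    """Return (use_alias, use_name) for one model.

    Runtime flags win when set; otherwise the model's config applies;
    otherwise the TL;DR defaults (validate_by_alias=True, validate_by_name=False).
    """
    use_alias = by_alias if by_alias is not None else config.get("validate_by_alias", True)
    use_name = by_name if by_name is not None else config.get("validate_by_name", False)
    if not use_alias and not use_name:
        # no field could ever be looked up, so reject the combination outright
        raise AliasConfigError("validate_by_alias and validate_by_name cannot both be False")
    return use_alias, use_name
```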

codspeed-hq · 2025-02-19T20:02:55Z

CodSpeed Performance Report

Merging #1640 will not alter performance

Comparing new-alias-api (77c8c03) with main (7dc19c3)

Summary

✅ 157 untouched benchmarks

- replace populate_by_name with validate_by_name
- enforce validate_by_alias in conjunction with validate_by_name (via get_lookup_key)
- deprecate the populate_by_name spec on model, dc, and td schemas in favor of access through the config setting

…validate functions

This approach has now been reverted in favor of a schema-build-time approach for performance reasons. See the next commit for a more thorough explanation :).


…ions

- This approach (2nd attempt) emphasizes building lookup keys at schema build time for performance reasons.
- We avoid building any LookupKey at validation time, to prevent perf bottlenecks and redundant builds.
- We store up to 3 LookupKey instances via LookupKeyCollection, representing name, alias, and alias_then_name lookups, based on the combination of config and runtime alias settings.
- Adding parametrized tests to check various alias config / runtime setting combinations.
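The idea behind LookupKeyCollection can be mirrored in a small Python sketch: every key order a field could need is built once up front, and validation only selects among them. `LookupKeyCollectionSketch` and `lookup` are hypothetical names, tuples of strings stand in for real LookupKey instances, and treating an alias-less field as name-only is an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional, Tuple

Keys = Tuple[str, ...]


@dataclass(frozen=True)
class LookupKeyCollectionSketch:
    by_name: Keys
    by_alias: Optional[Keys] = None
    by_alias_then_name: Optional[Keys] = None

    @classmethod
    def build(cls, field_name: str, alias: str | None) -> LookupKeyCollectionSketch:
        """Build every key order once, at "schema build" time."""
        if alias is None:
            return cls(by_name=(field_name,))
        return cls(by_name=(field_name,), by_alias=(alias,), by_alias_then_name=(alias, field_name))

    def select(self, use_alias: bool, use_name: bool) -> Keys:
        """Pick a prebuilt key order at "validation" time; nothing new is built here."""
        if self.by_alias is None or self.by_alias_then_name is None:
            return self.by_name  # no alias: the field name is the only key (sketch assumption)
        if use_alias and use_name:
            return self.by_alias_then_name
        if use_alias:
            return self.by_alias
        if use_name:
            return self.by_name
        raise ValueError("use_alias and use_name cannot both be False")


def lookup(data: dict[str, Any], keys: Keys) -> Any:
    """Return the value for the first key present in the input."""
    for key in keys:
        if key in data:
            return data[key]
    raise KeyError(keys)
```

In this shape, `select` only branches on two booleans, which matches the motivation in the commit message: no key construction happens per validation call.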

sydney-runkle · 2025-02-22T14:19:53Z

python/pydantic_core/_pydantic_core.pyi

+        by_alias: bool | None = None,
+        by_name: bool | None = None,


It's important that these are typed as `| None` so that we can detect when a value is unset and fall back to the config setting or the default.
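A minimal illustration of why the `| None` matters, using two hypothetical functions: with a plain `bool` default, an omitted argument and an explicit `False` are indistinguishable, so a config value can never take effect.

```python
from __future__ import annotations


def resolve_with_bool_default(config_by_alias: bool, by_alias: bool = False) -> bool:
    # "not passed" and "explicitly False" both arrive as False,
    # so config_by_alias can never take effect
    return by_alias


def resolve_with_optional_default(config_by_alias: bool, by_alias: bool | None = None) -> bool:
    # None means "unset", so we can fall back to the config setting
    return by_alias if by_alias is not None else config_by_alias
```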

sydney-runkle · 2025-02-22T14:20:38Z

python/pydantic_core/core_schema.py

@@ -2888,7 +2891,6 @@ class TypedDictSchema(TypedDict, total=False):
    # all these values can be set via config, equivalent fields have `typed_dict_` prefix
    extra_behavior: ExtraBehavior
    total: bool  # default: True
-    populate_by_name: bool  # replaces `allow_population_by_field_name` in pydantic v1


We remove this field from core schemas (other than arguments_schema) because it can be specified through configuration, which is how pydantic actually sets it during schema builds.

sydney-runkle · 2025-02-22T14:22:19Z

src/lookup_key.rs

This was the fun part of this PR - designing the new lookup key pattern to accommodate the web of validation by alias/name settings.

Note that it's important to build these keys at schema build time. I tried building just one key, as needed, at validation time, and that resulted in some really unfortunate perf regressions.


sydney-runkle · 2025-02-22T14:28:23Z

src/serializers/mod.rs

-    #[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = true,
+    #[pyo3(signature = (value, *, mode = None, include = None, exclude = None, by_alias = None,


As mentioned in the PR description, by_alias defaulted to true here, which was inconsistent with pydantic's default. In V3, I definitely think we should change this default, but for now we can't change it in pydantic, so we should be consistent here.

sydney-runkle · 2025-02-22T14:29:43Z

src/serializers/extra.rs

+    pub fn serialize_by_alias_or(&self, serialize_by_alias: Option<bool>) -> bool {
+        self.by_alias.unwrap_or(serialize_by_alias.unwrap_or(false))
+    }


I use this pattern a lot in this PR: it's a great way to ensure that runtime settings, if set, take priority over config settings.
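A later commit in this PR (b5e3b4c, "Using .or(x).unwrap_or(y) syntax") switched the Rust snippet above to the `.or(...).unwrap_or(...)` form, presumably along the lines of `self.by_alias.or(serialize_by_alias).unwrap_or(false)`. In Python, the same "first set value wins" rule needs an explicit `is not None` check; chaining with `or` is a subtle bug because an explicit `False` is falsy. `first_set` and `naive_or` are hypothetical helpers.

```python
from __future__ import annotations


def first_set(*values: bool | None, default: bool = False) -> bool:
    """Python analogue of Rust's `a.or(b).unwrap_or(default)`: the first non-None value wins."""
    for value in values:
        if value is not None:
            return value
    return default


def naive_or(runtime: bool | None, config: bool | None) -> bool:
    # Buggy on purpose: an explicit runtime False is falsy, so it falls through to config.
    return bool(runtime or config or False)
```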

Viicos · 2025-02-23T21:47:21Z

Haven't reviewed everything yet, but it might be that we need to go through a deprecation process for the old populate_by_name core schema field. The probability that some user is using this at the core schema level is quite low, but technically this is public API.

This can be done without too much overhead, by raising a deprecation warning in the relevant core schema creation functions, and ideally with a deprecated overload.
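For illustration, the deprecation path described here might look roughly like this in a schema-construction helper. `typed_dict_schema_sketch` is hypothetical (it is not pydantic_core's own `typed_dict_schema`), and the deprecated overload is omitted. Per the reply that follows in this thread, this path was ultimately judged unnecessary.

```python
from __future__ import annotations

import warnings
from typing import Any


def typed_dict_schema_sketch(
    fields: dict[str, Any],
    *,
    populate_by_name: bool | None = None,
) -> dict[str, Any]:
    """Hypothetical schema builder that still accepts the old field, but warns."""
    schema: dict[str, Any] = {"type": "typed-dict", "fields": fields}
    if populate_by_name is not None:
        warnings.warn(
            "populate_by_name on core schemas is deprecated; use the validate_by_name config setting instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # keep honoring the old field during the deprecation window
        schema["populate_by_name"] = populate_by_name
    return schema
```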

sydney-runkle · 2025-02-24T16:00:44Z

> Haven't reviewed everything yet, but it might be that we need to go through a deprecation process for the old populate_by_name core schema field.

We discussed this on the open source call today and decided this wasn't necessary. We did, however, decide not to officially deprecate populate_by_name in pydantic in a minor version; instead, we set validate_by_name and validate_by_alias accordingly when it's used. I'll be sure to cover this heavily in the release blog post.
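One plausible way to map the legacy flag onto the new settings, sketched as a hypothetical `normalize_alias_config` helper over a plain dict: populate_by_name=True meant accepting either the alias or the field name, so it turns on both validate_by_name and validate_by_alias unless they were set explicitly. Letting explicit new settings win is a choice of this sketch, not confirmed pydantic behavior.

```python
from __future__ import annotations


def normalize_alias_config(config: dict[str, bool]) -> dict[str, bool]:
    """Hypothetical: translate the legacy populate_by_name flag onto the new settings."""
    normalized = dict(config)  # don't mutate the caller's config
    populate_by_name = normalized.pop("populate_by_name", None)
    if populate_by_name:
        # populate_by_name=True meant "accept the alias or the field name";
        # setdefault keeps any explicitly provided new-style setting
        normalized.setdefault("validate_by_name", True)
        normalized.setdefault("validate_by_alias", True)
    return normalized
```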


davidhewitt

Rust implementation seems fine to me 👍


davidhewitt · 2025-02-25T13:46:32Z

src/validators/dataclass.rs

@@ -54,8 +56,6 @@ impl BuildValidator for DataclassArgsValidator {
    ) -> PyResult<CombinedValidator> {
        let py = schema.py();

-        let populate_by_name = schema_or_config_same(schema, config, intern!(py, "populate_by_name"))?.unwrap_or(false);


It looks like, rather than being deprecated, populate_by_name got removed outright. Should we keep it around in the Rust code and emit a warning if it's present, just to help users migrate for a release or two? (Just in case anyone is using core directly.)

Generally, I'd agree that we should be more careful when deprecating a core schema field like this one. However, in this case, I can't find any public usage of populate_by_name in core schema construction.

See, for example, this search.


starting on alias API unification

a780908


sydney-runkle force-pushed the new-alias-api branch from adb1f49 to a780908 February 20, 2025 12:47

sydney-runkle added 3 commits February 20, 2025 11:24

add support for serialize_by_alias

2c5c0f8

serialize_by_alias tests

772f706

First pass at implementing support for by_alias and by_name in model …

490ab36


sydney-runkle force-pushed the new-alias-api branch from e05e78a to 7c2e851 February 21, 2025 16:01

sydney-runkle commented Feb 21, 2025

src/url.rs (outdated)

sydney-runkle force-pushed the new-alias-api branch from 7c2e851 to 2c9ac6d February 22, 2025 14:14

sydney-runkle marked this pull request as ready for review February 22, 2025 14:17

sydney-runkle changed the title ~~starting on alias API unification~~ Improving the Alias configuration API for validation and serialization Feb 22, 2025

sydney-runkle commented Feb 22, 2025


use Option<bool> for serialize_by_alias config to be consistent with …

4fac099

…validation alias/name settings

sydney-runkle commented Feb 22, 2025


sydney-runkle mentioned this pull request Feb 22, 2025

Improve alias configutation APIs pydantic/pydantic#11468

Merged

sydney-runkle changed the title ~~Improving the Alias configuration API for validation and serialization~~ Improving the alias configuration API for validation and serialization Feb 22, 2025

Viicos reviewed Feb 25, 2025


sydney-runkle and others added 3 commits February 25, 2025 08:37

Merge branch 'main' into new-alias-api

5e2a102

docs suggestion by @Viicos

f6cde33

Merge branch 'new-alias-api' of https://github.com/pydantic/pydantic-…

afed325

…core into new-alias-api

sydney-runkle commented Feb 25, 2025

src/validators/arguments.rs (outdated)

Apply suggestions from code review

280be5a

Co-authored-by: Victorien <[email protected]>

davidhewitt approved these changes Feb 25, 2025


a and A -> my_field and my_alias

93f1aef

sydney-runkle commented Feb 25, 2025

tests/serializers/test_functions.py (outdated)

sydney-runkle and others added 4 commits February 25, 2025 09:05

Using .or(x).unwrap_or(y) syntax

b5e3b4c

Co-authored-by: David Hewitt <[email protected]>

linting

06e8564

Merge branch 'new-alias-api' of https://github.com/pydantic/pydantic-…

57c1409

…core into new-alias-api

formatting

f64a0fa

Viicos approved these changes Feb 25, 2025

python/pydantic_core/core_schema.py (outdated)

sydney-runkle and others added 2 commits February 25, 2025 11:45

Update python/pydantic_core/core_schema.py

c2a1b67

Co-authored-by: Victorien <[email protected]>

revert test skip

77c8c03

sydney-runkle enabled auto-merge (squash) February 25, 2025 16:51

sydney-runkle merged commit 8c59fa2 into main Feb 25, 2025
27 of 28 checks passed

sydney-runkle deleted the new-alias-api branch February 25, 2025 16:55
