Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible HasData seeding behaviors #31072

Closed
wdhenrik opened this issue Jun 13, 2023 · 11 comments
Closed

Flexible HasData seeding behaviors #31072

wdhenrik opened this issue Jun 13, 2023 · 11 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@wdhenrik
Copy link

In v1 of our application, we seeded all data using HasData. For v2 we added some features that require certain data keys. The data for this was manually created in v2 by an ETL process, which is also part of the solution. For v3, we're creating some new features that will depend on certain data keys existing. We know these will be managed by the ETL process after deployment, but we need the keys to exist after migration.

For new installs, HasData works fine, but if migrating from an earlier version, these keys already exist and the data seeding in V3 fails.

Please extend the HasData method to accept an additional optional parameter, SeedBehavior.

SeedBehavior should consist of these options:

  1. FailIfKeyExists
    This would be the default to match the current behavior.
  2. ReplaceIfKeyExists
    This would probably best be implemented as an UPDATE if exists. I think a DELETE/INSERT would increase complexity and risk and reduce migration performance.
  3. SkipIfKeyExists
    If the key exists it does not need to be seeded. This is the behavior I would use for my situation.

I know there are workarounds for seeding additional data without using HasData, but that has downsides as well.

  1. Finding data seeded in custom migrations is time-consuming and hard to determine the complete set of seeds since the data may be in different migrations, and combined with data seeded for other entities. Leveraging HasData for seeding standardizes where to find the seeded values for a given entity regardless of when they were added.
  2. Custom migrations require more time and effort, and have a higher risk floor. Custom migration errors general only surface when the migration is applied, and is more subject to differences in environment issues. HasData is easy and reliable to use. Making a mistake with HasData generally raises an error in add-migration, at design time.

Please consider extending HasData so it can be used with established applications as well.

@ajcvickers
Copy link
Contributor

See also #27959.

@roji
Copy link
Member

roji commented Jun 13, 2023

ReplaceIfKeyExists sounds like UPSERT (#4526), SkipIfKeyExists sounds like add-or-ignore (#16949), and FailIfKeyExists is the current simple behavior (simple INSERT).

Once the above are implemented, I highly doubt we'd add support for selecting them via HasData. As @ajcvickers wrote above, it really sounds like you want to implement seeding yourself via the regular mechanisms that EF provides for inserting data. This means doing seeding completely out of the migrations process.

@wdhenrik
Copy link
Author

wdhenrik commented Jun 13, 2023

I've added a detailed comment to #27959.

Nope, I'd prefer not to implement a custom seeding approach. We've made great use of HasData so far and want to continue with it. I think the reason it hasn't seen much adoption is it cannot reliably be used once an application has existing data. We've been using it for a year now, and we've reached a point in which I can't use HasData for new features, because the generated migration only works for bare iron new deployments, OR existing installs, but not both.

I think adding these behavior options, specifically SkipIfKeyExists, makes HasData useful for established applications. Whether SkipIfKeyExists is implemented as an Upsert, Merge or Disabling/Enabling the FK constraints is a different question. I'm just hoping to get the functionality baked into HasData so it is available for future enhancements to my applications.

For example:

    builder.HasData(new List<Position> {
                  {Position.None},
                  {Position.Unknown } });
    builder.HasData(Position.GetEleveatedPositions()});
  }

I have a few new features that are tied to elevated positions (not people). I'd like to seed these for the feature deployment, but this data is also populated from our HR systems ELT and already exists.

Rather than write up a custom migration for this, I'd much rather be able to use
builder.HasData(Position.GetEleveatedPositions()}, SkipIfKeyExists);

Thanks for your consideration on this.
Wes

@ajcvickers
Copy link
Contributor

The intention for HasData is that it only be used for data which the application will never change. Migrations can then be used to keep this data in sync across multiple application versions, updating, adding, and removing entities as needed as the data in HasData changes. None of this needs calls to SkipIfKeyExists or similar--the entity will already be skipped automatically if the migration that includes it has been applied.

However, all bets are off once the application modifies the data. At that point, HasData isn't an appropriate mechanism anymore.

@wdhenrik
Copy link
Author

wdhenrik commented Jun 19, 2023

The intention for HasData is that it only be used for data which the application will never change.

Agreed, and that is how I'm using it.

My issue is how to handle seeding when new features require data and the data MIGHT already exist from an external seeding action. The data won't be changed once seeded. The only thing that changes is when it is needed.

When we implemented a v1 feature, we didn't think we'd need to seed particular data and instead loaded the lookup tables from another source using ETL. Now we have a new feature that requires one common record always exists.

Since that record already exists in our lookup tables in Production, we can't use HasData to seed the data for any other publish, such as a fresh Dev install.

  • When my local database gets out of sync from switching branches in VS, the quickest fix is drop-database; update-database.
  • Standing up a new environment using CICD deploys the entire database with seeded data in a single step. Once that is completed a subset of ETLs run to load large volume data from integrated systems.

Both of these processes would have to change, significantly increasing complexity and fragility. We'd have to publish to an intermediate v1 state, run ETL processing, and then publish the rest of the application.

In its current implementation, that data can never be brought into the application seeding for future deployments and the seeded data can never be depended on for future development.

All because we can't retrofit seeding a couple of records using HasData.

I am a big fan of HasData. I think it is an excellent tool. It has greatly simplified our data creation and integrity, and enables design-time validation of data by leveraging static values with HasData.

However, HasData is deficient in one area. It should ensure that data needed by the application exists, but it shouldn't care how or when the data arrived. Only that it exists. SkipIfKeyExists functionality would solve this.

Thanks again for everything you've done, It has greatly simplified our development process, and keeps getting better.

Wes

@roji
Copy link
Member

roji commented Jun 23, 2023

The intention for HasData is that it only be used for data which the application will never change.

My issue is how to handle seeding when new features require data and the data MIGHT already exist from an external seeding action. The data won't be changed once seeded.

HasData and data seeding was very specifically not designed for this kind of scenario; the feature assumes that the data is always "owned" by EF and only ever inserted/manipulated via migrations. The data can change via the seeding mechanism in later migrations, but the moment anyone touches it externally (or inserts it in the first place), you're operating outside the way the feature was meant to be used.

I am a big fan of HasData. I think it is an excellent tool. It has greatly simplified our data creation and integrity, and enables design-time validation of data by leveraging static values with HasData.

I'd be interested in understanding this better. Especially since you indicated that the data will never change, what exact advantages do you see with HasData compared to just seeding data via normal EF means? In other words, your program can instantiate a regular EF context, add your seed data and then call SaveChangesAsync in the normal way. This seems very simple and provides most of the advantages of seeding; I'm not sure what kind of design-time validation of static values you're referring to.

HasData does have an advantage when you want to evolve your seeded data over time via migrations, similarly to how you evolve your database schema with migrations; it can be difficult to write that kind of seeding yourself, and migrations already provide a database "state in time" which is useful here. But you've indicated above that the data won't change, so seeding seems even less valuable for you.

As indicated in #27959, we consider data seeding a problematic feature (at least in it's current form), and I'd recommend thinking carefully about exactly what makes you think it's better than just plain old insertion. In any case, it's very unlikely we'd add "insert or do nothing" functionailty to HasData as you're requesting in this issue.

@wdhenrik
Copy link
Author

wdhenrik commented Jun 28, 2023

and enables design-time validation of data by leveraging static values with HasData.

We declare our lookup values as static values of the class. These are added to a static collection which is passed to HasData to seed the table.

This enforces "seeding" the data at design time. We can't use a value from a Enumeration class until it has been added (obviously), and simply by adding it to the class, we are ensuring it gets seeded to the lookup table. This is seen in action in my Example A on 27959.

Without using HasData, this process becomes much more complicated and inconsistent. In addition to eliminating any issues with SQL scripting, this has helped us catch some issues with the seeded data (such as duplicate key use) when the migration is generated. Normally, that would not fail until the script is executed.

I agree with the feature assumes that the data is always "owned" by EF , but the current implementation does not provide a way to transfer ownership to EF. This effectively limits HasData to new development only. Anyone migrating to EF Current cannot leverage HasData.

With my suggestion to skip if the key exists, I can transfer ownership of existing data so EF can manage it and any changes moving forward. Since it would be an opt-in behavior, SkipIfKeyExists could just ignore the Primary Key violation exception on Insert.
Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PK_SomeTable'. Cannot insert duplicate key in object 'dbo.SomeTable'. The duplicate key value is (1)

But you've indicated above that the data won't change

The data won't be changed by the application. We've had to add new values, and have updated a couple of the existing values to change labeling. These types of changes to seeded data don't happen often, but the update process is incredibly simple with HasData. We add/change the static value in the class and generate a new migration. Our CICD process makes sure the data and relevant application changes are all pushed up together.

@roji
Copy link
Member

roji commented Jun 28, 2023

@wdhenrik I really don't see anything here that wouldn't be just as trivial without seeding... At the end of the day you're just doing: builder.HasData(Activity.AllItems), with AllItems being some List<Activity> that's prepopulated with whatever values you want (a static collection or not). You can just as well do the following:

context.Activities.AddRange(Activity.AllItems);
await context.SaveChangesAsync();

... and EF will save those instances to the table; no need for SQL scripting or anything else.

[...] but the current implementation does not provide a way to transfer ownership to EF.

That's very much by design. For your very specific case, just skipping existing rows may be good - but other users may need something different. For example, another user may want to merge the new data into the existing rows in some way that's very specific to their scenario. Since there are potentially endless ways to "merge" the new data with the existing, the only viable way to handle this is for users to write their own code inserting the data with SaveChanges - just as they do with any other data insertion/update scenario.

Once again, I encourage you to try to think exactly what HasData adds over EF's general-purpose SaveChanges mechanism.

@wdhenrik
Copy link
Author

@roji I'm open to considering other options. The article on Data Seeding is not very useful. Do you have other references you can provide that describe that option in more detail, specifically how and when it is triggered?

@roji
Copy link
Member

roji commented Jun 28, 2023

@wdhenrik it's up to you to trigger your seeding logic whenever is appropriate, for example when your program starts up; that's covered (very briefly!) under Custom initialization logic. When exactly the seeding should occur is also something that varies from user to user - some want it as part of deployment, others when some button is clicked in an administrative UI, etc.

@ajcvickers
Copy link
Contributor

Note from triage: this is not something we plan to support.

@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Jul 7, 2023
@ajcvickers ajcvickers added closed-no-further-action The issue is closed and no further action is planned. and removed type-enhancement labels Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

3 participants