Flexible HasData seeding behaviors #31072
See also #27959.
ReplaceIfKeyExists sounds like UPSERT (#4526), SkipIfKeyExists sounds like add-or-ignore (#16949), and FailIfKeyExists is the current simple behavior (simple INSERT). Once the above are implemented, I highly doubt we'd add support for selecting them via HasData. As @ajcvickers wrote above, it really sounds like you want to implement seeding yourself via the regular mechanisms that EF provides for inserting data. This means doing seeding completely out of the migrations process.
I've added a detailed comment to #27959. Nope, I'd prefer not to implement a custom seeding approach. We've made great use of HasData so far and want to continue with it. I think the reason it hasn't seen much adoption is that it cannot reliably be used once an application has existing data. We've been using it for a year now, and we've reached a point at which I can't use HasData for new features, because the generated migration only works for bare-metal new deployments or for existing installs, but not both. I think adding these behavior options, specifically SkipIfKeyExists, makes HasData useful for established applications. Whether SkipIfKeyExists is implemented as an Upsert, Merge or Disabling/Enabling the FK constraints is a different question. I'm just hoping to get the functionality baked into HasData so it is available for future enhancements to my applications. For example:
I have a few new features that are tied to elevated positions (not people). I'd like to seed these for the feature deployment, but this data is also populated from our HR systems ELT and already exists. Rather than write up a custom migration for this, I'd much rather be able to use Thanks for your consideration on this.
The intention for However, all bets are off once the application modifies the data. At that point,
Agreed, and that is how I'm using it. My issue is how to handle seeding when new features require data and the data MIGHT already exist from an external seeding action. The data won't be changed once seeded. The only thing that changes is when it is needed. When we implemented a v1 feature, we didn't think we'd need to seed particular data and instead loaded the lookup tables from another source using ETL. Now we have a new feature that requires one common record always exists. Since that record already exists in our lookup tables in Production, we can't use HasData to seed the data for any other publish, such as a fresh Dev install.
Both of these processes would have to change, significantly increasing complexity and fragility. We'd have to publish to an intermediate v1 state, run ETL processing, and then publish the rest of the application. In its current implementation, that data can never be brought into the application seeding for future deployments and the seeded data can never be depended on for future development. All because we can't retrofit seeding a couple of records using HasData. I am a big fan of HasData. I think it is an excellent tool. It has greatly simplified our data creation and integrity, and enables design-time validation of data by leveraging static values with HasData. However, HasData is deficient in one area. It should ensure that data needed by the application exists, but it shouldn't care how or when the data arrived. Only that it exists. SkipIfKeyExists functionality would solve this. Thanks again for everything you've done. It has greatly simplified our development process, and keeps getting better. Wes
HasData and data seeding was very specifically not designed for this kind of scenario; the feature assumes that the data is always "owned" by EF and only ever inserted/manipulated via migrations. The data can change via the seeding mechanism in later migrations, but the moment anyone touches it externally (or inserts it in the first place), you're operating outside the way the feature was meant to be used.
I'd be interested in understanding this better. Especially since you indicated that the data will never change, what exact advantages do you see with HasData compared to just seeding data via normal EF means? In other words, your program can instantiate a regular EF context, add your seed data and then call SaveChangesAsync in the normal way. This seems very simple and provides most of the advantages of seeding; I'm not sure what kind of design-time validation of static values you're referring to. HasData does have an advantage when you want to evolve your seeded data over time via migrations, similarly to how you evolve your database schema with migrations; it can be difficult to write that kind of seeding yourself, and migrations already provide a database "state in time" which is useful here. But you've indicated above that the data won't change, so seeding seems even less valuable for you. As indicated in #27959, we consider data seeding a problematic feature (at least in its current form), and I'd recommend thinking carefully about exactly what makes you think it's better than just plain old insertion. In any case, it's very unlikely we'd add "insert or do nothing" functionality to HasData as you're requesting in this issue.
We declare our lookup values as static values of the class. These are added to a static collection which is passed to HasData to seed the table. This enforces "seeding" the data at design time. We can't use a value from an Enumeration class until it has been added (obviously), and simply by adding it to the class, we are ensuring it gets seeded to the lookup table. This is seen in action in my Example A on 27959. Without using HasData, this process becomes much more complicated and inconsistent. In addition to eliminating any issues with SQL scripting, this has helped us catch some issues with the seeded data (such as duplicate key use) when the migration is generated. Normally, that would not fail until the script is executed. I agree with With my suggestion to skip if the key exists, I can transfer ownership of existing data so EF can manage it and any changes moving forward. Since it would be an opt-in behavior,
The data won't be changed by the application. We've had to add new values, and have updated a couple of the existing values to change labeling. These types of changes to seeded data don't happen often, but the update process is incredibly simple with HasData. We add/change the static value in the class and generate a new migration. Our CI/CD process makes sure the data and relevant application changes are all pushed up together.
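For readers unfamiliar with the pattern described above, here is a minimal sketch of an Enumeration-style class whose static values feed HasData. The names Activity, AllItems and AppDbContext, and the sample values, are assumptions for illustration (Activity.AllItems is borrowed from the reply below); the real code is in Example A on #27959 and may differ.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical lookup entity: every value is declared once as a static member.
public class Activity
{
    // Must be declared before the static values so it is initialized first.
    private static readonly List<Activity> _all = new();

    public static readonly Activity Walking = Register(1, "Walking");
    public static readonly Activity Running = Register(2, "Running");

    // Everything declared above, in declaration order.
    public static IReadOnlyList<Activity> AllItems => _all;

    public int Id { get; private set; }
    public string Name { get; private set; } = "";

    private Activity() { } // parameterless constructor used by EF Core

    private static Activity Register(int id, string name)
    {
        var item = new Activity { Id = id, Name = name };
        _all.Add(item);
        return item;
    }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Activity> Activities => Set<Activity>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Adding a static value to Activity changes the model's seed data,
        // so the next generated migration contains the matching INSERT.
        modelBuilder.Entity<Activity>().HasData(Activity.AllItems);
    }
}
```

With this shape, application code can only reference Activity.Walking after it has been declared, and declaring it is what causes it to be seeded.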
@wdhenrik I really don't see anything here that wouldn't be just as trivial without seeding... At the end of the day you're just doing:
context.Activities.AddRange(Activity.AllItems);
await context.SaveChangesAsync();
... and EF will save those instances to the table; no need for SQL scripting or anything else.
That's very much by design. For your very specific case, just skipping existing rows may be good - but other users may need something different. For example, another user may want to merge the new data into the existing rows in some way that's very specific to their scenario. Since there are potentially endless ways to "merge" the new data with the existing, the only viable way to handle this is for users to write their own code inserting the data with SaveChanges - just as they do with any other data insertion/update scenario. Once again, I encourage you to try to think exactly what HasData adds over EF's general-purpose SaveChanges mechanism.
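As a rough illustration of what "write your own code with SaveChanges" could look like for the skip-if-key-exists case, the sketch below inserts only the seed rows whose keys are not already in the table. Activity, AppDbContext, Seeder and SeedMissingAsync are hypothetical names, and the sketch assumes the key values are supplied by the application rather than generated by the database.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Activity
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    // Hypothetical static seed list; the values are placeholders.
    public static readonly IReadOnlyList<Activity> AllItems = new[]
    {
        new Activity { Id = 1, Name = "Walking" },
        new Activity { Id = 2, Name = "Running" },
    };
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Activity> Activities => Set<Activity>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Assumption: keys come from the seed list, not from the database.
        modelBuilder.Entity<Activity>().Property(a => a.Id).ValueGeneratedNever();
    }
}

public static class Seeder
{
    // Inserts seed rows whose key is absent; existing rows are left untouched.
    // Returns the number of rows written.
    public static async Task<int> SeedMissingAsync(
        AppDbContext context, CancellationToken cancellationToken = default)
    {
        var seedIds = Activity.AllItems.Select(a => a.Id).ToList();

        // One query to find which seed keys are already present.
        var existingIds = await context.Activities
            .Where(a => seedIds.Contains(a.Id))
            .Select(a => a.Id)
            .ToListAsync(cancellationToken);
        var existing = existingIds.ToHashSet();

        var missing = Activity.AllItems.Where(a => !existing.Contains(a.Id)).ToList();
        context.Activities.AddRange(missing);
        return await context.SaveChangesAsync(cancellationToken);
    }
}
```

This check-then-insert approach is not safe if two processes seed at the same time; a deployment that can run seeding concurrently would need a lock or a database-level upsert instead.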
@roji I'm open to considering other options. The article on Data Seeding is not very useful. Do you have other references you can provide that describe that option in more detail, specifically how and when it is triggered?
@wdhenrik it's up to you to trigger your seeding logic whenever is appropriate, for example when your program starts up; that's covered (very briefly!) under Custom initialization logic. When exactly the seeding should occur is also something that varies from user to user - some want it as part of deployment, others when some button is clicked in an administrative UI, etc.
Note from triage: this is not something we plan to support.
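One possible way to trigger seeding at program startup is sketched below: apply pending migrations first, then run whatever seeding routine the application defines. DatabaseInitializer and the seed delegate are hypothetical names; MigrateAsync is the EF Core extension method for relational providers. Whether to run this at startup, in a deployment step, or from an admin action is a choice left to the application.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class DatabaseInitializer
{
    // Applies pending migrations, then runs the caller-supplied seeding logic.
    public static async Task InitializeAsync(DbContext context, Func<DbContext, Task> seed)
    {
        // Bring the schema up to date before touching any data.
        await context.Database.MigrateAsync();

        // Seeding runs outside the migrations process, as suggested above.
        await seed(context);
    }
}

// Hypothetical usage at startup, where SeedLookupTablesAsync is the
// application's own seeding method:
//
//   await DatabaseInitializer.InitializeAsync(context, SeedLookupTablesAsync);
```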
In v1 of our application, we seeded all data using HasData. For v2 we added some features that require certain data keys. The data for this was manually created in v2 by an ETL process, which is also part of the solution. For v3, we're creating some new features that will depend on certain data keys existing. We know these will be managed by the ETL process after deployment, but we need the keys to exist after migration.
For new installs, HasData works fine, but when migrating from an earlier version, these keys already exist and the data seeding in v3 fails.
Please extend the HasData method to accept an additional optional parameter, SeedBehavior.
SeedBehavior should consist of these options:
FailIfKeyExists: This would be the default to match the current behavior.
ReplaceIfKeyExists: This would probably best be implemented as an UPDATE if exists. I think a DELETE/INSERT would increase complexity and risk and reduce migration performance.
SkipIfKeyExists: If the key exists it does not need to be seeded. This is the behavior I would use for my situation.
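To make the request concrete, the proposed options could be expressed as an enum like the one below. This is only a sketch of the proposal: SeedBehavior does not exist in EF Core, and the HasData overload shown in the comment is hypothetical.

```csharp
// Proposed in this issue only; not part of EF Core.
public enum SeedBehavior
{
    // Plain INSERT; the migration fails if the key already exists (current behavior).
    FailIfKeyExists = 0,

    // Overwrite the existing row, ideally via UPDATE rather than DELETE/INSERT.
    ReplaceIfKeyExists = 1,

    // Leave an existing row untouched and insert only when the key is missing.
    SkipIfKeyExists = 2,
}

// Hypothetical call shape for the requested optional parameter:
//
//   modelBuilder.Entity<Activity>()
//       .HasData(Activity.AllItems, SeedBehavior.SkipIfKeyExists);
```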
I know there are workarounds for seeding additional data without using HasData, but that has downsides as well.
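One such workaround is a hand-written migration that inserts the row with raw SQL only when the key is missing. The sketch below assumes a hypothetical Activities table with Id and Name columns, a key that is not database-generated, and a database that accepts INSERT ... SELECT ... WHERE NOT EXISTS without a FROM clause. The migration name is made up, and the attributes normally placed in the scaffolded designer file are omitted.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical hand-written migration; table, columns and values are placeholders.
public partial class SeedWalkingActivity : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Insert the row only when no row with this key exists yet, so the
        // migration succeeds on fresh installs and on upgraded databases.
        migrationBuilder.Sql(@"
            INSERT INTO Activities (Id, Name)
            SELECT 1, 'Walking'
            WHERE NOT EXISTS (SELECT 1 FROM Activities WHERE Id = 1);");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Intentionally empty: the row may have existed before this migration
        // ran, so it is not safe to delete it on rollback.
    }
}
```

The downside is the one noted above: the row lives outside the model's seed data, so EF does not track it and later changes have to be scripted by hand as well.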
Please consider extending HasData so it can be used with established applications as well.