Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider stronger semantics around generated keys #13575

Open
Tracked by #22954
ajcvickers opened this issue Oct 10, 2018 · 29 comments
Open
Tracked by #22954

Consider stronger semantics around generated keys #13575

ajcvickers opened this issue Oct 10, 2018 · 29 comments

Comments

@ajcvickers
Copy link
Contributor

Currently we avoid reasoning about store-generated keys in some places because it is possible to add a new entity with an explicit key value set even when the entity type is configured to use generated keys. The two most prominent examples of this are:

  • Add behaves differently from Attach. That is, Attach can assume that an entity without a key value set should be Added, while Add cannot assume that an entity with a key value set should be Attached.
  • DetectChanges always sets new entities as Added, even if they have a key value set.

Both of these cause confusion.

However, setting key values like this is typically used in special situations--most notably test data and seeding. So adding to the complexity of the mental model and as a result generating confusion in the cases above may not be a good trade-off to make. Instead, if we assume a set store-generated key means that the entity already exists in the database, I believe this would make common things easier to both do and understand.

We could then allow a pattern something like this to temporarily enable setting explicit key values:

using (context.ChangeTracker.EnableExplitKeys())
{
    context.Add(new Blog { Id = 77 });
}
@ajcvickers
Copy link
Contributor Author

@sjb-sjb Please file a separate issue for that--it's not the same thing that this issue is tracking.

@optiks
Copy link

optiks commented Oct 10, 2018

“However, setting key values like this is typically used in special situations--most notably test data and seeding.“

Not in our case - it’s production code!

We’ve worked around this by forcing the state to Unchanged through the ChangeTracker.Tracked event.

@ajcvickers
Copy link
Contributor Author

@optiks If you're forcing the state to Unchanged, then I think your workaround is doing what I am suggesting we should do by default. That is, if we make this change, then if you set the key value, then we will always assume that this means the state of the entity should be Modified/Unchanged. Conversely, if the key value is not set, then we will always assume that the entity should be Added.

The behavior today follows this except that:

  • When Add is called, then all entities in the graph will end up in the Added state because it is possible that key values have been set for some entities even though they should be inserted. I'm saying we don't do this anymore, and if the key values for entities in the graph are set, then we will set the state for those entities to Unchanged.
  • When a new entity is detected in an already tracked graph. In this case we currently assume that the entity should be Added, but I am proposing that we instead mark it Unchanged (or possibly Modified) if the key value is set.

@sjb-sjb
Copy link

sjb-sjb commented Oct 10, 2018

@ajcvickers I have restated the issue in #13564, and added a question and a suggestion.

@ajcvickers
Copy link
Contributor Author

I re-opened #13564 and we will discuss in triage.

@optiks
Copy link

optiks commented Oct 10, 2018

@sjb-sjb Our workaround may help you (you may need to tweak/make it generic if you don't use int ids):

// Subscribe to the Tracked event somewhere, e.g. in the context constructor. You should unsubscribe somewhere too.
ChangeTracker.Tracked += ChangeTracker_Tracked; 
        
private void ChangeTracker_Tracked(object sender, Microsoft.EntityFrameworkCore.ChangeTracking.EntityTrackedEventArgs e)
{
   var entry = e.Entry;
   var entity = entry.Entity;
   if (entry.State == EntityState.Added && IsIdSet(entry))
   {
      entry.State = EntityState.Unchanged;
   }
}

public static bool IsIdSet(EntityEntry entry)
{
   var entity = entry.Entity;
   var metadata = entry.Metadata;
   var primaryKey = metadata.FindPrimaryKey();
   return primaryKey.Properties.All(x => Convert.ToInt32(x.PropertyInfo.GetValue(entity)) > 0);
}

@sjb-sjb
Copy link

sjb-sjb commented Oct 10, 2018

@optiks Will that cause re-entry into the Tracked (or StateChanged) event handler? Or perhaps it will trigger a StateChanged event for an Added --> Unchanged transition?

@sjb-sjb
Copy link

sjb-sjb commented Oct 10, 2018

@ajcvickers I agree with your proposal, which I read after suggesting something similar in one of the comments in #13564.

The only people that could find it problematic would be people who set store-generated keys themselves and, well, they just shouldn't be doing that in the first place!

@ajcvickers
Copy link
Contributor Author

ajcvickers commented Oct 10, 2018

@ajcvickers @sjb-sjb I disagree your last statement. It's very much a valid thing to do in testing/seeding scenarios. We had many requests to be able to do this in classic EF. However, my point is that we can still support this as an explicit gesture.

The main problem with making this change now is that it's going to be a break for a lot of seeding/tests, so while I am proposing this, I'm not convinced we will be able to do it.

@optiks
Copy link

optiks commented Oct 10, 2018

@sjb-sjb Unchanged is still tracked, so it shouldn't be re-entrant. If you're subscribing to StateChanged as well, then yes, I'd expect that to fire.

@sjb-sjb
Copy link

sjb-sjb commented Oct 10, 2018

@ajcvickers Good point about seeding. I am currently not seeding in the recommended way, instead I am making internal API calls and then doing a SaveChanges.

It would make sense to me to go ahead with the changes you propose and then alter HasData so that seeding still works. Within HasData you can move the entities from Unchanged to Added.

Outside of seeding, it seems like broken behavior to me that EF would Add an entity that already has a valid store-generated key.

@optiks Thanks for the clarification.

@optiks
Copy link

optiks commented Oct 11, 2018

@ajcvickers FWIW, I agree that you should be limiting the number of breaking changes. In general, opt-in for behavioural changes is reasonable.

@sjb-sjb
Copy link

sjb-sjb commented Oct 11, 2018

It's hard to disagree with @optiks there. But, @ajcvickers, if an option is created to have the entity land in the Unchanged state instead of the Added state, then I would vote for an additional possibility which is that it stays Detached. In fact this is what I want it to do in my actual use case. This also is consistent with, as I commented in #13564, the generally desirable goal of giving the user fine-grained control over when / whether entities are implicitly attached to a context or not. My software design is using State assignments to attach entities and consistent with that I generally don't want other (reachable) entities to be attached implicitly.

@optiks
Copy link

optiks commented Oct 11, 2018

@sjb-sjb, I get what you're saying.

Two other workarounds I can think of are:

  1. Set the foreign key id property, instead of the navigation property object reference. This may or may not work for you.
  2. Pre-emptively attach the foreign key object reference as Unchanged. EF Core won't override this state during the Add.

Another thing to consider is whether disabling AutoDetectChanges might help in your case?

@ajcvickers
Copy link
Contributor Author

Triage: we agreed on the general idea, but the breaking potential is great. Therefore, we will instead introduce two new methods:

  • AddOrAttach
  • AddOrUpdate

These methods would work in the way described above when generated keys are being used. That is, if the instance has no key value set, then it will always be Added. If it has a key value set then it will be Unchanged for AddOrAttach or Modified for AddOrUpdate. For non-store generated keys, both will Add everything.

We will make the breaking change in the DetectChanges behavior, but also add a hook to allow the behavior to be defined--see #13564

@ajcvickers ajcvickers added this to the 3.0.0 milestone Oct 12, 2018
@ajcvickers ajcvickers self-assigned this Oct 12, 2018
@sjb-sjb
Copy link

sjb-sjb commented Nov 3, 2018

@optiks Thanks for your comments. I'm glad that you "got it" because in retrospect I sure didn't (and probably still don't). I tried to workaround by setting state back to Detached if it I didn't want the Detached --> Added transition. However for some reason EF then re-attached it again, resulting in an infinite cycle Detached --> Added --> Detached --> Added etc. This problem went away when I set the state to Unchanged instead of Detached. I do end up with an Unchanged entity attached to my context that I would rather not have attached, but this is not a big efficiency hit in my situation.

@ajcvickers I can imagine this cropping up when you implement the Add-suppressing modification to the event, as suggested in your triage comment for #13564. I would guess it is solvable though.

@ajcvickers
Copy link
Contributor Author

See also #14031

@sjb-sjb
Copy link

sjb-sjb commented Dec 2, 2018

Just to provide a comment on something that I have found out through a bit more research. In the workaround suggested by @optiks (and I believe this is relevant to #14031 by @Rubbiroid as well), setting the state to Unchanged will result in a sequence Detached --> Added --> Unchanged. Some extra work is required to distinguish this from the Detached --> Added --> Unchanged transitions that occur as a result of SaveChanges on an Added entity. The only difference is that in the 'workaround' version we have Id > 0 during Detached --> Added, while in the normal case we have Id < 0.

What we want to see, in effect, is Detached --> Unchanged in the 'workaround' case but we want to keep Detached --> Added --> Unchanged in the 'normal' case. To do this we need to carry information from the Tracked event forward to the StateChanged event. Some locking is needed since the state transition could be triggered from any thread.

Putting it together we get something like this (replace ".Id" with your primary key):

    private void ContextChangeTracker_Tracked(object sender, EntityTrackedEventArgs e)
    {
        EntityEntry entry = e.Entry;
        EntityState previousState = EntityState.Detached;
        EntityState currentState = entry.State;

        if (previousState == EntityState.Detached && currentState == EntityState.Added && entry.Entity.Id > 0) {
            // Workaround a bug whereby EF can Add an entity that already exists (i.e. that has Id > 0)
            entry.State = EntityState.Unchanged; // triggers StateChanged callback
        } else {
            this.TransitionOccurred(entity, previousState, currentState);
        }
    }

    private void ContextChangeTracker_StateChanged(object sender, EntityStateChangedEventArgs e)
    {
        EntityEntry entry = e.Entry;
        EntityState previousState = e.OldState;
        EntityState currentState = e.NewState;
        Debug.Assert(currentState == entry.State);

        if (previousState == EntityState.Added && currentState == EntityState.Unchanged && entry.Entity.Id > 0 && !this._TrackedEntities.ContainsKey(entry.Entity)) {
            // Second half of the bug fix.
            previousState = EntityState.Detached;
            this.TransitionOccurred(entity, previousState, currentState);
        } else {
            this.TransitionOccurred(entity, previousState, currentState);
        }
    }

    private ConcurrentDictionary<object,bool> _TrackedEntities { get; } = new ConcurrentDictionary<object,bool>(); // not identical to EF tracking.

    private void TransitionOccurred(IEntity entity, EntityState previousState, EntityState subsequentState)
    {
        if (subsequentState == previousState) { return; }
        bool startedTracking = (previousState == EntityState.Detached && subsequentState != EntityState.Detached);
        bool stoppedTracking = (previousState != EntityState.Detached && subsequentState == EntityState.Detached);
      
        if (startedTracking) {
             this._TrackedEntities.TryAdd(entity,true);
        } else if (stoppedTracking) {
             this._TrackedEntities.TryRemove(entity, out bool ignore);
        }
        this.OnTransitionOccurred( entity, previousState, subsequentState);
    }

    virtual protected void OnTransitionOccurred(IEntity entity, EntityState previousState, EntityState subsequentState)
     {
     }

@ajcvickers
Copy link
Contributor Author

DetectChanges bit for 3.0; Backlog for other parts.

@ajcvickers ajcvickers added this to the Backlog milestone Feb 5, 2019
@ajcvickers
Copy link
Contributor Author

Split DetectChanges part into #14616

@sjb-sjb
Copy link

sjb-sjb commented Sep 11, 2023

@ajcvickers , @AndriySvyryd : this issue has become more pressing for Sqlite users following the implementation of TPC.

Since Sqlite does not have named id sequences, the use of TPC with Sqlite typically means using Guid's for the identity field. But for Guid key fields with Sqlite, the key value appears to be generated by EF Core during an Attach operation, and before the Tracked event is called. This differs from autogenerated ints or longs used as identity fields where the value is set into the entity following the save changes.

Net/net, the workaround above no longer works. Testing the identity field during the Tracked event (arising due to an Attach call) no longer tells us whether or not this is an entity that is already in the database, because the Guid value will have been filled in by EF Core if it was not already set before Attach was called. Consequently we can no longer use this to determine whether the Added state genuinely reflects an entity that needs to be added to the database, vs an entity that already had an Id and was only put into the Added state due to EF's internal logic. Indeed as far as I can tell there is no value associated with the Tracked event args that can be used to test this condition.

Edit: not sure how to work around this currently. Possibilities to explore are (a) server-side generation as suggested by @roji below or (b) perhaps stashing a copy of the guid at the time of materialization (assuming there is a hook for materialization).

@optiks take note.

@roji
Copy link
Member

roji commented Sep 11, 2023

@sjb-sjb you should be able to switch to server-side generation of UUIDs, which would make things very similar to how they work with sequence-drive ints. It seems that there's no nice native way to generate a random UUID in SQLite, but there's at least this extension which look pretty good.

Could be related to #30753, which would in general change the default value generation strategy for GUIDs to be server-side.

@sjb-sjb
Copy link

sjb-sjb commented Sep 11, 2023

One way to resolve this would be to store EF-generated guid keys internally, the same way temporary integer ids are, and not store them in the entity field until after the entity has been saved. Treating EF-generated guids like temporary keys would make sense because… they ARE temporary keys, until they are stored in the database whereupon they become permanent keys. The fact that the temporary value is equal to the permanent value is beside the point.

@roji
Copy link
Member

roji commented Sep 11, 2023

Treating EF-generated guids like temporary keys would make sense because… they ARE temporary keys, until they are stored in the database whereupon they become permanent keys.

This is not the meaning of temporary keys in the EF context: temporary keys are about the key being a temporary placeholder until a different key gets fetched back from the database. Since client-generated keys really are the final key (they'll be sent to the database and persisted), they're not temporary (all other properties are "temporary" in the sense that you're using, simply since they haven't yet been saved).

@sjb-sjb
Copy link

sjb-sjb commented Sep 11, 2023

Sure, that’s one way to define it.

Edit: sorry, my wording was a bit too short here. I should have said, "That's a fine definition for 'temporary' in the EF context. The similarity I was trying to bring forward was that in both cases a key is assigned in memory and does not become permanent until it is saved to the database. In that sense both are 'temporary' rather than permanently saved in the database."

@ajcvickers
Copy link
Contributor Author

Sure, that’s one way to define it.

Importantly, this is the way it is defined for the EF Code. This isn't going to change.

@sjb-sjb
Copy link

sjb-sjb commented Sep 11, 2023

I'm fine with that. I'm just saying that having the Guid's not in the entity field due to the Attach would be one way to make a workaround possible here. The breaking impact would be very limited because the users who need to access the Guid could get it from the shadow property instead. The type of breaking is similar to that experienced in the transition to keeping temporary sequential id's in shadow instead of in the entity. A bit less of an impact because people who do want to see the Guid's in the entity could simply assign them themselves in the entity ctor.

Otherwise I don't really know what the workaround could be. I didn't really see how to switch to server-side GUID generation without a large disruption, also if we did that then it would probably have a negative impact on the ability to migrate to other databases in the future.

@roji
Copy link
Member

roji commented Sep 11, 2023

I didn't really see how to switch to server-side GUID generation without a large disruption, also if we did that then it would probably have a negative impact on the ability to migrate to other databases in the future.

What kind of disruption are you thinking of, and why do you think this would impact your ability to migrate to other databases? I'm not aware of other database which don't support server-side generation of GUIDs (SQLite is actually the tricky one).

@sjb-sjb
Copy link

sjb-sjb commented Sep 11, 2023

It's just not an area that I have knowledge in and there seem to be a number of hurdles. The library you referenced says it only works with Linux and OS X, I'm not familiar with installing user functions in Sqlite, and I don't know what I would have to do to enable those functions to generate Id's in EF Core. I also don't know how to configure server-side generation of Guid's in SQL Server or in EF Core with SqlServer. In addition I have also read that there is a possibility of performance issues due to index fragmentation when using Guid's.

Bear in mind, the problem that I'm really trying to solve is how to use TPC with Sqlite. There is a note on the TPC page saying that autoincremented integer identities don't work with TPC under Sqlite because there is an absence of named sequences in Sqlite. That's why I started investigating using Guid identifiers. But given the challenges with Guid's, I'm wondering whether to look again at the autoincrement question. I understand there is a table in the Sqlite db containing the current autoincrement values, and it seems to me that if I seeded the high-order bits with a different number for each concrete type then the autoincrement approach would work fine.

EDIT: seeding the sqlite_sequence table did not work for me in this context. It seems that the sqlite_sequence table is either not generated or not used.

EDIT: one possible workaround is to wrap the Guids in a struct. Then EF will not generate the Guid when the entity is passed to Attach, and correspondingly the test for Id == default in the Tracked event will correctly reflect whether the entity has been stored in the database. However in this case you must remember to set the Guid yourself in the Tracked event. To maintain consistently that the Guid is unset if the entity has not been saved, you also need to unset the Guid if the SaveChanges fails.

I do sort of wish that TPC would work 'out of the box' with Sqlite. Maybe it does and I'm just missing something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants