-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip the zeroed parent token (deleted) Custom Attribute entries in Reflection (and elsewhere) #53635
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
@jkotas, @davidwrighton I would appreciate any thoughts you may have about this plan |
Can the delete be achieved by setting the parent to 0 using the existing code? (I expect that it is how we would actually implement it in CoreCLR.) |
Tagging subscribers to this area: @tommcdon Issue DetailsExtend the EnCLog format to include a new allowed value for the Function Code column: "Delete" with value 6. Motivation: Support deletion of custom attributes using EnC. The semantics of "Delete" is to do a "soft delete" - the row is treated as unavailable and is skipped by traversals of the table. Row indices of subsequent rows in the CustomAttributes table are not affected. Future generations may have an EnCLog row with function code 0 (Default) to insert an updated valid custom attribute in the table at a previously deleted row. It is an error to delete a deleted row. Related work:
|
Yes, except apparently this is going to be a problem for System.Reflection.Metadata and Roslyn because SRM doesn't consider a 0 Parent to be valid. |
Where is the problem in System.Reflection.Metadata ? |
MetadataBuilder validates inputs: https://source.dot.net/#System.Reflection.Metadata/System/Reflection/Metadata/Ecma335/MetadataBuilder.Tables.cs,749 |
Looking at the check again the validation requires token with valid kind. But it does not require the row id to be != 0. I guess it would go thru if we only set the parent row id to 0. Feels hacky though. |
Should I go ahead and add the new ENC log delete command? Implement it by changing the parent to 0. The challenge for me will be finding the right places to skip the rows with a 0 parent. Are we sure we want to call this command just "Delete". The rest of the commands have the member type (func, method, field, etc.) in the name. What about "eDeltaAttributeDelete"? |
Yes, should be just "delete" - the token in the table fully specifies what entity should be deleted. |
I think so. It is clean and should be easy to do afaict.
Agree that's the harder part. |
Yep, I expect that the rows with parent 0 are going to leak out. For example, it will be difficult for them to not leak out in APIs like save to stream. Does the debugger side maintain its snapshot of metadata - does it need to understand the new command as well to maintain its snapshot? |
@gregg-miskelly @isadorasophia I don't think the debugger applies metadata/IL delta. It only applies PDB delta to its symbol reader. |
Perhaps it would be better then if the CLR internally represented deleted CAs using a valid CA record that points to a well-known attribute, such as "System.Reflection.Metadata.DeletedAttribute" with no value and parent of the containing assembly. Then if it leaks then it wouldn't break the reader. (It would however require MemberRef to such attribute constructor to be added during the serialization). |
I think that would be a lot more difficult to pull off than the parent 0 solution internally in CLR. We would have to manufacture tokens for this attribute, Roslyn would have no idea about these tokens, and they would clash on next update. If you like the fake attribute better, would it be possible to emit it in Roslyn so that Roslyn is aware of its token?
Any particular situation that you are worried about? The typical readers should not be reading these rows. These rows should be only reachable by readers that enumerate all rows. |
I don't think that would be an issue. What I meant is that we still use the Delete operation. The DeletedAttribute would only be found in the metadata image that is produced by the serialization. EnC does not use that image. The only metadata EnC uses are the baseline metadata that are produced by the compiler and stored on disk. The image might be used by Expression Evaluators (assuming the debugger uses the CLR metadata serialization API to get the metadata) and anyone else who is using that serialization API - I do not know what other users of this API are there. That's the only concern I have, that something that calls this API gets broken. If you are not concerned about these then I am not concerned :)
Good point. I think that reduces the risk. |
Just occurred to me that serialization needs to sort the CustomAttribute table if the result is supposed to be ECMA compliant, because there is no CustomAttributePtr table that would allow indirection (unlike, e.g. FieldPtr). In that process 0-parent entries can be filtered out. If the resulting serialized format is not ECMA compliant then I guess it's not much worse of with 0-parent entries. |
I do not think they have to be sorted. It is what the Sorted bit field in the |
To test the runtime changes, I'll need a Roslyn build that uses the new ENC log delete function. |
The question is what is expected from the API that serializes the runtime metadata. The caller might not know whether EnC has been applied. If the expectation is that the metadata may have the extra stuff then we are fine. |
I have been thinking about a hypothetical case where we would extend delete to support deleting fields or methods. We would want the soft-deleted entries to be leaking out from metadata APIs for those cases. The runtime would crash in spectacular ways (with no way to continue) in number of places if it would fail to find the metadata records for the soft-delete entry. It makes me think that we should do nothing to prevent the deleted entries from leaking out. Considering that, is it really worth it to add the special delete command? It will be indistinguishable from the update with null command. |
@tmat In some cases the debugger currently relies on the IL delta for getting metadata information based on relative virtual addresses, e.g. reading fields on EE. Other than that, we rely on the CorDebug APIs for further EE queries -- we reload the module metadata ( |
@isadorasophia How does the debugger acquire the metadata image that's passed to Roslyn EEs? |
It looks like the core code can be moved to the CMiniMdRW class allowing the metadata enc code to use it. This is assuming we are still talking about a ENC Log delete function. Should I continue with adding the delete function? The RegMeta::DeleteToken code would make it easy to support deleting any kind of token; not just custom attributes. |
I do not have a strong opionion on this given this discussion. The encoding of deleted rows is part of the extended metadata spec, and it is going to leak out. There is not a big difference between encoding the deleted row on the Roslyn side and sending it over as an update vs. sending the delete EnC log entry and letting runtime side to take care of encoding it. If you choose to add the delete function, make sure to add it everywhere, e.g. including https://github.com/dotnet/runtime/blob/main/src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/EditAndContinueOperation.cs |
From Roslyn perspective we don't care much either. Seems like emitting the update with zeroed RowId would be simpler overall. |
BTW, let's start a document that details the format of EnC specific metadata so that we know what's expected where. |
Then I'm going to leave this issue in your hands Tomas. |
@mikem8361 Sounds good. Is skipping the zeroed CA entries in Reflection (and elsewhere) tracked by another issue? If so we can close this one. |
No it isn't. I'll rename this issue to track zero'ed CA entries. |
I don't think it makes sense to close the issue: right now it's the only documentation of the delete semantics. If we record them somewhere else (in docs/design/specs/ for example) we can close the issue. |
I do not think there is any work needed in Reflection to skip to zeroed out entries for custom attributes. If we push deletes further to e.g. methods, there may be some work required. |
Ah, yes, because they won't be found for the parent. Makes sense. |
Who is going to do the Roslyn zero parent for CA entries "delete" work? Do we need an issue? |
@davidwengier Is already working on it in his CustomAttribute PR. |
Yes, updating the metadata for attributes is tracked by dotnet/roslyn#52816 |
I've run into a slight issue implementing the delete (zero parent token) in Roslyn which could be solved by simplifying what is emitted for deleted rows. The plan agreed upon here is to zero the Parent column, and leave Constructor and Value columns as they were. The simplification would be to just set Constructor to a MemberRef to row 1, and Value to nil, for the deleted rows. Would that cause an issue for the runtime? If the deleted rows are never enumerated then presumably not, but is there any validation performed that might fail? Any checks that Constructor actually points to an attribute type for example? |
I think 0 would make more sense. Any reason you have picked 1? I do not see a problem with this. EnC for reflection has a general problem with race conditions between user code reflecting on something and the item being mutated by EnC. This is just one of those cases. |
An abundance of caution, that's all. Happy with 0. |
The Constructor is a coded index. It can't be 0. It can be MethodDef with 0 row id or MemberRef with 0 row id. |
MethodDef with row 0 is what I meant. |
Yep, that's what I meant too :) |
Motivation: Support deletion of custom attributes using EnC.
Proposal:
Document how deleted entries look for each table (CustomAttributes table for now, other tables possibly in future) to allow Roslyn to emit deltas that logically remove metadata entities.
Specifically for the CustomAttributes table the "delete" update would zero out the RowId part of the Parent field and keep the rest of the entry unchanged.
Alternatives:
Extend the EnCLog format to include a new allowed value for the Function Code column: "Delete" with value 6.
In .NET 6 the only valid combination is a Delete function code with a Token column value that specifies a row in the CustomAttributes table.
Semantics:
The semantics of "Delete" is to do a "soft delete" - the row is treated as unavailable and is skipped by traversals of the table. Row indices of subsequent rows in the CustomAttributes table are not affected.
Future generations may have an EnCLog row entry that resurrects a previously deleted row with some updated valid entry using a normal update operation (function code 0 (Default)).
An EnCLog should not delete an already deleted row, but the runtime is not required to detect and signal an error.
Operations that serialize or expose entire updated tables should emit the zeroed out rows with the understanding that after an update has been applied, the metadata is no longer necessarily strictly ECMA-335 conforming.
Related work:
The text was updated successfully, but these errors were encountered: