Semantic Tokens - Reduce set of non-standard token types #13066
There are actually a number of custom token types which I think could be replaced with standard token types without losing meaning. I believe this is a holdover from the deprecated semantic highlighting LSP extension, which had no set of standard types and did not support modifiers. To preserve meaning when replacing custom types with standard ones, non-standard modifiers could be used, since in that case the client can configure sensible fallbacks for the highlighting. Distributions of rust-analyzer, such as the VSCode plugin, generally provide configurations for these custom types, so this is not a massive issue; however, maintenance of these distributions could be simplified, and creation of new distributions for different clients made easier, by slimming down the set of custom token types. Here are a few suggestions; as one example: at present, a built-in type is conveyed to the client using a token of type `builtinType`. This is not in the standard set of semantic token types and therefore requires configuration in the client. The same semantic idea can be conveyed using a token of type `type` with the `defaultLibrary` modifier, both of which are part of the standard sets of token types and modifiers.
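To make the difference concrete, here is a rough sketch of the two encodings; the legend, names and indices below are made up for illustration and are not rust-analyzer's actual legend:

```rust
// A sketch of two ways a server could tag the `u32` in `let x: u32 = 0;`.
// Token types/modifiers are sent as indices into a legend the server declares
// up front; the legend below is illustrative only.

const TYPES: &[&str] = &["type", "builtinType"]; // "builtinType" is non-standard
const MODIFIERS: &[&str] = &["defaultLibrary"]; // standard modifier

struct Token {
    token_type: u32,      // index into TYPES
    token_modifiers: u32, // bitset over MODIFIERS
}

fn main() {
    // Current behaviour: a dedicated, non-standard token type.
    let custom = Token { token_type: 1, token_modifiers: 0 };
    // Suggested behaviour: the standard `type` plus the `defaultLibrary` modifier.
    let standard = Token { token_type: 0, token_modifiers: 1 << 0 };

    println!("modifier legend: {:?}", MODIFIERS);
    for (label, token) in [("custom", custom), ("standard", standard)] {
        println!(
            "{label}: type = {}, modifiers = {:#b}",
            TYPES[token.token_type as usize],
            token.token_modifiers
        );
    }
}
```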
I am happy to implement this, but think it is worth getting some consensus on which changes should be made first. If you have any comments on the suggestions made above or your own suggestions, please add them. The full set of current token types can be found here.
Built-in types are not the same thing as standard library types, which is what `defaultLibrary` implies (e.g. `u32` is built into the language, while `String` is defined in the standard library).
A non-standard modifier (say, `builtin`) on the standard `type` token could express that instead; clients that don't recognise the modifier can still fall back to highlighting it as a plain `type`.
I fail to see why adding a ton of modifiers would be a better solution than what we currently have. That is, why exactly are non-standard modifiers better than non-standard types in your eyes? On another note: yes, we should be using the standard types where it makes sense; we already use some of them.
Most clients implement fallbacks from unrecognised modifiers to the un-modified type. For example, a `type` token carrying a modifier the client doesn't know would still be highlighted as a plain `type`.
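Concretely, the kind of fallback I mean looks roughly like this on the client side (a sketch, not any particular client's implementation):

```rust
use std::collections::HashMap;

/// Pick a highlight group for a decoded semantic token: try the most specific
/// "type.modifier" combinations first, then fall back to the bare token type.
fn resolve_highlight<'a>(
    theme: &'a HashMap<String, String>, // e.g. "type.defaultLibrary" -> "Special"
    token_type: &str,
    modifiers: &[&str],
) -> Option<&'a str> {
    for modifier in modifiers {
        if let Some(group) = theme.get(&format!("{token_type}.{modifier}")) {
            return Some(group.as_str()); // the client knows this modifier
        }
    }
    // Unknown (e.g. non-standard) modifiers: fall back to the plain type.
    theme.get(token_type).map(String::as_str)
}

fn main() {
    let mut theme = HashMap::new();
    theme.insert("type".to_string(), "Type".to_string());

    // The theme has no rule for a non-standard `builtin` modifier,
    // yet the token still ends up with the `type` highlighting.
    assert_eq!(resolve_highlight(&theme, "type", &["builtin"]), Some("Type"));
}
```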
Hmm, I see. The thing is, we are limited in how many token modifiers we can have, which is currently 32 due to the bitset requirement. It also feels like the wrong usage of token modifiers to me, and instead signals a shortcoming of the Language Server Protocol: what we really need is the ability to specify a hierarchy of token types, which I'd much rather see. VSCode itself already supports this, but the LSP unfortunately doesn't.
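For reference, the modifiers of a token travel as a single integer used as a bitset, which is where the 32 comes from; roughly like this (a sketch, not the actual wire handling in rust-analyzer):

```rust
// Each modifier registered in the legend gets a bit position; the modifier set
// of a token is the OR of those bits, sent as a single integer.
const DOCUMENTATION: u32 = 0;
const DECLARATION: u32 = 1;
const STATIC: u32 = 2;
// ...there are only 32 bit positions to hand out.

fn encode(modifiers: &[u32]) -> u32 {
    modifiers.iter().fold(0, |acc, &bit| acc | (1 << bit))
}

fn main() {
    let packed = encode(&[DOCUMENTATION, STATIC]);
    assert_eq!(packed, 0b101);
    println!("{packed:#b} (DECLARATION bit {} unset)", DECLARATION);
}
```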
A point to be made is that the feature is intended to convey highlighting information, rather than a full semantic description of the document. The server therefore has to make some decisions as to what constitutes sensible highlighting. Personally I can't see many use cases in which a user wants to highlight, for example, a pair of parentheses differently to a pair of square brackets. Maybe the approach to take is an overall slimming down of the information sent to the client? (NB: #12783 reached a similar conclusion)
I don't really see the point made here; a full semantic description is highlighting information, so what the server does seems like a correct choice to me.
That I can't really tell you either. I am not using the different punctuation tags myself; others are, which is why they were requested in the first place, see #6152 and #8279.
I am all in favor of making more parts of this infra configurable, since there are a lot of tags that are not useful to the general userbase, and a lot of clients seem to struggle with too many tokens being sent. But overall I would dislike removing features from our syntax highlighting entirely, as there are clearly people who want these. And to reiterate regarding the main part of this issue, I do not think restricting ourselves to the standard token types is a good idea: for one, as outlined, we can only have up to 32 modifiers, and secondly there is no standard token set that describes all languages (which is why extensibility by the protocol is even a thing).
Sorry, maybe I didn't articulate myself too well; all I meant to say is that the specification is quite restrictive, so the server must make compromises. I don't think the set of token types needs to be limited to only the standard set, just reduced. As you say, the set of standard types is not a one-size-fits-all solution, and there are certain Rust features, such as lifetimes, which I think warrant their own types. At the moment, however, rust-analyzer provides 35 non-standard types; for comparison, clangd provides 2 and pylance provides 4. A lot of these non-standard types come from the distinct operator and punctuation types, which I didn't realise had come from feature requests, so yes, I agree they should not be removed. I think server configuration options are a good compromise between flexibility and ease of use. To illustrate why I think having so many non-standard token types is somewhat of an issue, here is a snippet of my coc configuration:

```lua
vim.api.nvim_command('hi link CocSemAngle CocSemPunctuation')
vim.api.nvim_command('hi link CocSemArithmetic CocSemOperator')
vim.api.nvim_command('hi link CocSemAttribute CocSemDecorator')
vim.api.nvim_command('hi link CocSemAttributeBracket CocSemDecorator')
vim.api.nvim_command('hi link CocSemBitwise CocSemOperator')
vim.api.nvim_command('hi link CocSemBoolean Boolean')
vim.api.nvim_command('hi link CocSemBrace CocSemPunctuation')
vim.api.nvim_command('hi link CocSemBracket CocSemPunctuation')
vim.api.nvim_command('hi link CocSemBuiltinAttribute CocSemAttribute')
vim.api.nvim_command('hi link CocSemBuiltinType CocSemType')
vim.api.nvim_command('hi link CocSemCharacter CocSemString')
vim.api.nvim_command('hi link CocSemColon CocSemPunctuation')
vim.api.nvim_command('hi link CocSemComma CocSemPunctuation')
vim.api.nvim_command('hi link CocSemComparison CocSemOperator')
vim.api.nvim_command('hi link CocSemConstParameter CocSemParameter')
vim.api.nvim_command('hi link CocSemDerive CocSemAttribute')
vim.api.nvim_command('hi link CocSemDeriveHelper CocSemAttribute')
vim.api.nvim_command('hi link CocSemDot CocSemPunctuation')
vim.api.nvim_command('hi link CocSemEscapeSequence CocSemString')
vim.api.nvim_command('hi link CocSemFormatSpecifier CocSemString')
vim.api.nvim_command('hi link CocSemGeneric CocSemUnknown')
vim.api.nvim_command('hi link CocSemLabel CocSemMacro')
vim.api.nvim_command('hi link CocSemLifetime CocSemTypeParameter')
vim.api.nvim_command('hi link CocSemLogical CocSemOperator')
vim.api.nvim_command('hi link CocSemMacroBang CocSemMacro')
vim.api.nvim_command('hi link CocSemParenthesis CocSemPunctuation')
vim.api.nvim_command('hi clear CocSemPunctuation')
vim.api.nvim_command('hi link CocSemSelfKeyword CocSemParameter')
vim.api.nvim_command('hi link CocSemSelfTypeKeyword CocSemType')
vim.api.nvim_command('hi link CocSemSemicolon CocSemPunctuation')
vim.api.nvim_command('hi link CocSemTypeAlias CocSemType')
vim.api.nvim_command('hi link CocSemToolModule CocSemNamespace')
vim.api.nvim_command('hi link CocSemUnion CocSemStruct')
vim.api.nvim_command('hi link CocSemUnresolvedReference Warning')
```

Without all this, large areas of the document do not get highlighted correctly, as the token types are unrecognised. Defaults like these could be provided by coc-rust-analyzer, but having to provide distributions like this for every LSP client is a lot of work.
Ye, I didn't quite express myself properly either, I think. I'll look into this a bit more and fix up a PR for settings regarding "unspecializing" punctuation and operators, and making use of the standard `decorator` token type.
@Veykril EDIT: also, `decorator` doesn't seem to have made it into lsp-types yet.
Ye, that's not an issue. We can just duplicate `decorator` for now, like it's done with the other types that aren't in lsp-types yet.
#13084 now adds some config for the mentioned ones (plus the switch from `attribute` to `decorator`).
Okay, so with those settings in mind, it boils down to the remaining custom types we have, ATTRIBUTE_BRACKET among them. The builtin ones I think we can turn around with a modifier, as you proposed. Using modifiers to specialize one concrete entity (like the derive case with attributes) seems like the wrong usage of modifiers to me, which is why I am so stubborn here :D But we also don't really have a lot of alternatives right now in this regard, unfortunately. I'll open an issue on https://github.com/microsoft/language-server-protocol in a bit regarding the problems with semantic highlighting we have here.
I like a limited set of custom types (with extra modifiers, in the CSS sense, since the LSP is somewhat ambiguous here) more. Do we really need both in some of those cases? I'm more surprised by some of the distinctions that remain.
Well, I color those differently in my own theme.
I think of them as different things. The LSP set has some really surprising omissions; even fairly basic constructs are missing a standard type.
Add some more highlighting configurations

The following can be enabled/disabled now in terms of highlighting:
- doc comment injection (enabled by default)
- punctuation highlighting (disabled by default)
- operator highlighting (enabled by default)
- punctuation specialized highlighting (disabled by default)
- operator specialized highlighting (disabled by default)
- macro call bang highlighting (disabled by default)

This PR also changes our `attribute` semantic token type to the `decorator` type, which landed upstream (but is not yet in lsp-types).

Specialized highlighting is disabled by default, as all clients would have to ship something to map these back to the standard punctuation/operator token (we do this in VSCode via the inheritance mapping, for example). This is a lot of maintenance work, and not something every client wants to do, pushing that burden onto the user. As this is a rather niche use in the first place, it will just be disabled by default.

Punctuation highlighting is disabled by default. Punctuation is usually something that can be handled by the client's native syntactic highlighting, so there is no loss in quality. The main reason, though, is that punctuation adds a lot of extra token data to send over, and a lot of clients struggle to apply it, so disabling this improves the UX for a lot of people. Note that we still highlight punctuation with special meaning as that special entity (the never type `!` will still be tagged as a builtin type if it occurs as such).

Separate highlighting of the macro call bang `!` is disabled by default, as I think people actually didn't like that change that much, though at the same time I feel like not many people even noticed it (I prefer it being separate, but that's not enough reason for it to be enabled by default I believe :^) )

cc #12783 #13066
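Conceptually, each toggle just filters the tokens before they are sent; a simplified sketch with made-up names (not the actual rust-analyzer internals):

```rust
// Hypothetical names for illustration; not rust-analyzer's actual internals.
#[derive(Clone, Copy)]
enum TokenKind {
    Keyword,
    Punctuation,
    Operator,
    MacroBang,
}

struct HighlightConfig {
    punctuation: bool, // emit punctuation tokens at all
    operator: bool,    // emit operator tokens at all
    macro_bang: bool,  // tag the `!` of macro calls separately
}

fn keep(config: &HighlightConfig, kind: TokenKind) -> bool {
    match kind {
        TokenKind::Punctuation => config.punctuation,
        TokenKind::Operator => config.operator,
        TokenKind::MacroBang => config.macro_bang,
        _ => true,
    }
}

fn main() {
    // Defaults roughly as described above: punctuation and the macro bang off,
    // operators on.
    let config = HighlightConfig { punctuation: false, operator: true, macro_bang: false };
    let tokens = [
        TokenKind::Keyword,
        TokenKind::Punctuation,
        TokenKind::Operator,
        TokenKind::MacroBang,
    ];
    // Only the keyword and the operator token are sent to the client.
    let sent = tokens.iter().filter(|&&kind| keep(&config, kind)).count();
    assert_eq!(sent, 2);
}
```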
The standard types in the LSP are less than bare bones, tbh, yes... I guess going with the modifiers is the best we can do, as I don't think a viable alternative will appear within the coming years. If we run into the limit at some point then so be it, but I assume for now we should be able to fit everything with some room to spare (given we ignore the variations of punctuation and operators). I love semantic highlighting, but I feel like the protocol vastly limited its own functionality for no apparent reason.
Hey guys, I don't know if I should open up a separate issue for this, but currently there's no way to distinguish between some tokens that I'd like to color differently.
That sounds like it deserves a separate issue, so please open one for that (also please elaborate a bit on what exactly you want there, I am not quite sure I completely understand).
Is the limit of 32 modifiers per token or total? Separately, is the limit on r-a's end or LSP's? In my opinion switching to have lots of modifiers would be the best solution, pending a proper hierarchy on the LSP side of things. If it's possible to implement it for just VS Code, that could work too (though I am biased as I use VS Code).
The LSP spec uses a bitset for modifiers but doesn't explicitly state a maximum number of modifiers. However, the LSP's integer types are defined to be 32-bit, which is where the cap of 32 modifiers comes from.
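So everything has to fit into the bit positions of one 32-bit value; decoding on the client side looks roughly like this (a schematic sketch, not any particular client's code):

```rust
/// Unpack the wire integer back into modifier names from the legend.
/// With a 32-bit integer there are only bit positions 0..32 to hand out,
/// hence the practical cap of 32 distinct modifiers.
fn decode<'a>(packed: u32, legend: &[&'a str]) -> Vec<&'a str> {
    (0u32..32)
        .filter(|bit| packed & (1 << bit) != 0)
        .filter_map(|bit| legend.get(bit as usize).copied())
        .collect()
}

fn main() {
    let legend = ["documentation", "declaration", "static"];
    assert_eq!(decode(0b101, &legend), ["documentation", "static"]);
}
```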
Note that VSCode shouldn't have an issue with token types the theme doesn't set a color for, since VSCode natively supports a token type hierarchy as a fallback (which we are populating).
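Schematically, that fallback is just a walk up a superType chain until the theme knows the type; a toy model (illustrative only, not what VSCode actually runs internally):

```rust
use std::collections::HashMap;

/// Walk the superType chain (e.g. "builtinType" -> "type") until the theme
/// knows the token type. A toy model of the fallback; the real lookup happens
/// inside the editor based on the extension's declared token type hierarchy.
fn resolve<'a>(
    super_types: &HashMap<&'a str, &'a str>,
    theme: &HashMap<&'a str, &'a str>,
    mut token_type: &'a str,
) -> Option<&'a str> {
    loop {
        if let Some(&color) = theme.get(token_type) {
            return Some(color);
        }
        // No color for this type: try its declared super type, give up at the root.
        token_type = super_types.get(token_type).copied()?;
    }
}

fn main() {
    let super_types = HashMap::from([("builtinType", "type")]);
    let theme = HashMap::from([("type", "#4ec9b0")]);
    // The theme has never heard of `builtinType`, yet the token gets a color.
    assert_eq!(resolve(&super_types, &theme, "builtinType"), Some("#4ec9b0"));
}
```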
Hmm... LSP uses TypeScript for its implementation? JS has BigInt support natively, so theoretically the bitset could be unbounded if that were used. Not sure if that's a change they'd be interested in making, however. When looking to see what tokens had what semantics attached yesterday, I did notice that TextMate scopes were quite a bit more detailed in some cases. Perhaps if there were some way to tell the editor to override semantic highlighting for certain TextMate scopes, that could work as well? And for the token type hierarchy, is there any way to take advantage of that today?
Can you elaborate on that? In general, semantic tokens can be more precise than TextMate scopes (as the latter are just regexes); if there are some provided by r-a where that is not the case, I would consider that a bug.
If you are using VSCode you are by default taking advantage of it, though we might be missing some hierarchy definitions (the hierarchy can also specifically delegate back to textmate scopes fwiw).
No. We should follow the protocol definition in the text, not any JS or TS implementation. Going beyond the protocol will break support in other clients that strictly follow it. IMO, we don't really need 32 modifiers. Semantic tokens are not about distinguishing each individual token, and themes don't usually provide this many different colors. There are currently too many punctuation kinds.
We should just have a setting that makes r-a map the custom types to the standard ones, I think, and then probably make that enabled by default. It is a shortcoming of the LSP that non-standard tokens are this painful to use in clients that don't have this hierarchy concept. It's a shame given how useful semantic tokens can be.
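Roughly what I have in mind, as a sketch only (the individual mappings below are illustrative guesses taken from the fallbacks earlier in this thread, not a decided-on table):

```rust
/// A sketch of an opt-in "standard types only" mode: every non-standard token
/// type is downgraded to the closest standard LSP type before being sent.
/// The pairs below are illustrative guesses, not an agreed-upon mapping; note
/// that the punctuation kinds have no standard counterpart at all.
fn to_standard(token_type: &str) -> &str {
    match token_type {
        "builtinType" | "typeAlias" => "type",
        "union" => "struct",
        "lifetime" => "typeParameter",
        "builtinAttribute" | "derive" => "decorator",
        "arithmetic" | "bitwise" | "comparison" | "logical" => "operator",
        other => other, // already standard, or left as-is
    }
}

fn main() {
    assert_eq!(to_standard("builtinType"), "type");
    assert_eq!(to_standard("keyword"), "keyword");
}
```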
Even targeting specific keywords is an area where TextMate has more detail.
Yeah, I was suggesting that LSP could make a change, as the limit doesn't appear to be specified anywhere.
At least in my case, that's not the concern. I'm more interested in changing the color to match something else.
Then we are missing those details; we don't necessarily add new token types unless someone needs them, as adding a type for each keyword would certainly be too much eventually.
Also, regarding this: standard themes usually don't provide colors for these specific tokens, but that doesn't mean they aren't useful. The punctuation types were requested by someone because they want to make use of them. Likewise, I personally use a lot of these more specific token types in my own theme because I want to leverage the extra precision. (Nevertheless, I see the problem with non-standard token types, which is why I think we should just add what I said earlier now: a setting that maps our custom token types back to the standard ones.)