Semantic Tokens - Reduce set of non-standard token types #13066

frankplow · 2022-08-19T12:39:11Z

At present, a built-in type is conveyed to the client using a token with type builtinType. This is not in the standard set of semantic token types and therefore requires configuration in the client. The same semantic idea can be conveyed using a token with type type and the defaultLibrary modifier, both of which are part of the standard set of token types and modifiers.

The text was updated successfully, but these errors were encountered:

frankplow · 2022-08-19T13:11:34Z

There are actually a number of custom types which I think could be replaced with standard token types without losing meaning. I think this is a holdover from the deprecated semantic highlighting LSP extension, which had no set of standard types and did not support modifiers. To preserve meaning when replacing types with standard types, non-standard modifiers could be used as in this case the client can configure sensible fallbacks for the highlighting. Distributions of rust-analyzer, such as the VSCode plugin, generally provide configurations for these custom types, hence this is not a massive issue, however maintenance of these distributions could be simplified and creation of new distributions for different clients made easier by slimming down the set of custom token types.

Here are a few suggestions (type + modifiers):

attribute -> decorator; these may not have quite identical meanings, but given that Rust only has attributes this should serve to provide some sensible default highlighting for attributes.
builtinAttribute -> decorator + defaultLibrary
derive -> decorator + derive
deriveHelper -> decorator + derive
constParameter -> parameter + readonly
trait -> interface; not too sure about this one
typeAlias -> type
arithmetic -> operator + arithmetic
bitwise -> operator + bitwise
comparison -> operator + comparison
logical -> operator + logical
attributeBracket -> punctuation + attributeBracket
angle -> punctuation + angleBracket
brace -> punctuation + curlyBracket
parenthesis -> punctuation + parenthesis
colon -> punctuation + colon
comma -> punctuation + comma
dot -> punctuation + period
semi -> punctuation + semicolon
macroBang -> punctuation + exclamationMark; or made part of its corresponding macro, I don't think many want to highlight this differently to the preceding word.
selfKeyword -> parameter + self
selfTypeKeyword -> type + self

I am happy to implement this, but think it is worth getting some consensus on which changes should be made first. If you have any comments on the suggestions made above or your own suggestions, please add them. The full set of current token types can be found here.

flodiebold · 2022-08-19T13:32:24Z

Built-in types are not the same thing as standard library types, which is what defaultLibrary indicates.

frankplow · 2022-08-19T13:38:22Z

A non-standard builtin modifier could be introduced then?

builtinType -> Type + builtin
builtinAttribute -> decorator + builtin

Veykril · 2022-08-19T14:50:38Z

I fail to see why adding a ton of modifiers would be a better solution than what we currently have. That is, why exactly are non-standard modifiers over non-standard types better in your eyes?

On another note, yes we should be using the standard types where it makes sense, we already use interface for traits as that is basically the equivalent in rust. decorator didn't exist when we introduced the attribute type, so that one I think certainly makes sense to use instead now.

frankplow · 2022-08-19T15:06:18Z

@Veykril

Why exactly are non-standard modifiers over non-standard types better in your eyes?

Most clients implement fallbacks from unrecognised modifers to the un-modified type. For example, type + builtin would fall back to type, which is what I suspect most users would desire anyway, whereas builtinType has nothing to fall back to and would not be semantically highlighted.

Veykril · 2022-08-19T15:42:10Z

Hmm I see, the thing is we are limited in how many token modifiers we can have, which is currently 32 due to the bitset requirement. It also feels like the wrong usage of token modifiers to me and instead signals a lack of the language server protocol in that we really need to be able to specify a hierarchy of token types which I'd much rather see. VSCode itself already supports this, but the LSP unfortunately doesn't.

frankplow · 2022-08-19T16:58:37Z

A point to be made is that the feature is intended to convey highlighting information, rather than a full semantic description of the document. The server therefore has to make some decisions as to what constitutes sensible highlighting. Personally I can't see many use cases in which a user wants to highlight, for example, a pair of parentheses differently to a pair of square brackets. Maybe the approach to take is an overall slimming down of the information sent to the client?

(NB: #12783 reached a similar conclusion)

Veykril · 2022-08-22T07:29:17Z

A point to be made is that the feature is intended to convey highlighting information, rather than a full semantic description of the document

I don't really see the point made here, full semanticdescription is highlighting information so what the server does seems like a correct choice to me.

Personally I can't see many use cases in which a user wants to highlight, for example, a pair of parentheses differently to a pair of square brackets.

That I can't really tell you either, I am not using the different punctuation tags, others are as they were requested, see #6152 and #8279.

(NB: #12783 reached a similar conclusion)

I am all in favor of making more parts of this infra configurable, since there are a lot of tags that are not useful for the general userbase and a lot of clients seem to struggle with too many tokens being sent. But overall I would dislike us removing features from our syntax highlighting fully as there are clearly people that want these.

And to reiterate regarding the main part of this issue, I do not think restricting ourselves to the standard token types is a good idea, for one as outlined we can only have up to 32 modifiers and secondly there is no standard token set that describes all the languages (which is why extensibility by the protocol is even a thing).

frankplow · 2022-08-22T10:48:08Z

Sorry maybe I didn't articulate myself too well, all I meant to say is that the specification is quite restrictive and so the server must make compromises.

I don't think that the set of token types needs to be limited to only the standard set, just reduced. As you say, the set of standard types is not a one-size-fits-all solution and there are certain Rust features such as lifetimes which I think warrant their own types. At the moment however, rust-analyzer provides 35 non-standard types. For comparison, clangd provides 2 and pylance provides 4.

A lot of these non-standard types come from the distinct operator and punctuation types which I didn't realise have come from feature requests so yes I agree they should not be removed. I think server configuration options are a good compromise of flexibility and ease-of-use.

To illustrate why I think having so many non-standard token types is somewhat of an issue, here is a snippet of my coc configuration:

vim.api.nvim_command('hi link CocSemAngle CocSemPunctuation')
vim.api.nvim_command('hi link CocSemArithmetic CocSemOperator')
vim.api.nvim_command('hi link CocSemAttribute CocSemDecorator')
vim.api.nvim_command('hi link CocSemAttributeBracket CocSemDecorator')
vim.api.nvim_command('hi link CocSemBitwise CocSemOperator')
vim.api.nvim_command('hi link CocSemBoolean Boolean')
vim.api.nvim_command('hi link CocSemBrace CocSemPunctuation')
vim.api.nvim_command('hi link CocSemBracket CocSemPunctuation')
vim.api.nvim_command('hi link CocSemBuiltinAttribute CocSemAttribute')
vim.api.nvim_command('hi link CocSemBuiltinType CocSemType')
vim.api.nvim_command('hi link CocSemCharacter CocSemString')
vim.api.nvim_command('hi link CocSemColon CocSemPunctuation')
vim.api.nvim_command('hi link CocSemComma CocSemPunctuation')
vim.api.nvim_command('hi link CocSemComparison CocSemOperator')
vim.api.nvim_command('hi link CocSemConstParameter CocSemParameter')
vim.api.nvim_command('hi link CocSemDerive CocSemAttribute')
vim.api.nvim_command('hi link CocSemDeriveHelper CocSemAttribute')
vim.api.nvim_command('hi link CocSemDot CocSemPunctuation')
vim.api.nvim_command('hi link CocSemEscapeSequence CocSemString')
vim.api.nvim_command('hi link CocSemFormatSpecifier CocSemString')
vim.api.nvim_command('hi link CocSemGeneric CocSemUnknown')
vim.api.nvim_command('hi link CocSemLabel CocSemMacro')
vim.api.nvim_command('hi link CocSemLifetime CocSemTypeParameter')
vim.api.nvim_command('hi link CocSemLogical CocSemOperator')
vim.api.nvim_command('hi link CocSemMacroBang CocSemMacro')
vim.api.nvim_command('hi link CocSemParenthesis CocSemPunctuation')
vim.api.nvim_command('hi clear CocSemPunctuation')
vim.api.nvim_command('hi link CocSemSelfKeyword CocSemParameter')
vim.api.nvim_command('hi link CocSemSelfTypeKeyword CocSemType')
vim.api.nvim_command('hi link CocSemSemicolon CocSemPunctuation')
vim.api.nvim_command('hi link CocSemTypeAlias CocSemType')
vim.api.nvim_command('hi link CocSemToolModule CocSemNamespace')
vim.api.nvim_command('hi link CocSemUnion CocSemStruct')
vim.api.nvim_command('hi link CocSemUnresolvedReference Warning')

Without all this, large areas of the document do not get highlighted correctly, as the token types are unrecognised. Defaults similar to this could be provided by coc-rust-analyzer, but having to provide distributions like this for every LSP client is a lot of work.

Veykril · 2022-08-22T11:11:51Z

Ye, I didn't quite express myself either properly I think. I'll look into this a bit more and fix up a PR for settings regarding "unspecializing" punctuations and operators, and making use of decorator for attributes for now. Given from your list, the operators and punctuation ones are the majority of things it seems, and the easiest to fix. Will come back to this afterwards.

frankplow · 2022-08-22T11:20:34Z

@Veykril decorator will need this to be merged first I think: gluon-lang/lsp-types#246, also that repo still hasn't been updated to make 3.17 stable so it's behind a configuration flag.

EDIT: also operator is duplicated between the non-standard types and standard types at the minute. Not sure if there's really an issue with this (actually doing this may be an alternative if lsp-types does not update soon) but it could potentially be removed from the non-standard set here.

Veykril · 2022-08-22T11:32:38Z

Ye, that's not an issue. We can just duplicate decorator for now like its done with operator. While we are at it we can also remove the operator dupe then.

Veykril · 2022-08-22T11:57:43Z

#13084 now adds some config for the mentioned ones (+ the decorator change) with some disabled by default.

Veykril · 2022-08-22T12:39:19Z

Okay, so with those settings in mind, it boils down to the following custom types we have:

ATTRIBUTE_BRACKET - the #[ ] parts of attributes
BUILTIN_ATTRIBUTE - builtin attributes #[cold]
BUILTIN_TYPE - builtin types usize
CHAR - 'c'
CONST_PARAMETER - const param defs, and references
DERIVE - derive attribute references
DERIVE_HELPER - derive helper references
ESCAPE_SEQUENCE - string escapes
FORMAT_SPECIFIER - format! specifiers
GENERIC - things that have no proper tag (this is a curious case, do we even emit these in the end? why would we, this seems like an implementation detail where we shouldn't emit a tag at all maybe?)
LABEL - loop labels
LIFETIME - lifetimes
SELF_KEYWORD - self when used as the module or local (this should actually be 2 things even imo)
SELF_TYPE_KEYWORD - Self
TYPE_ALIAS - type alias
TOOL_MODULE - tool modules, unstable feature. #[rustfmt::foo] is such a tool module
UNION - union references (how is this not a standard token type???)
UNRESOLVED_REFERENCE - unresolved references (this one is simple, we should make this configurable)

the builtin ones I think we can turn around with a modifier as you proposed, having builtin be a modifier (since defaultLibrary has a different meaning).
UNRESOLVED_REFERENCE should just have a config to disable it.
GENERIC I'll have to check what the use of it is, I'm unsure right now but we might use it to explicitly overwrite highlights in macro calls.
ATTRIBUTE_BRACKET should probably be moved into punctuations, in which case it can be disabled (though I think it should have a separate config as well, like I made for the macro call bang).
derive and derive helper could probably be merged (the split doesn't seem to be too useful after all), though I am unsure whether turning that into a modifier is nice, attributes would then be the only thing tagged with this.
Rest I would need to think about a bit more.

Using modifiers to specialize one concrete entity (like the derive case with attributes) seems like the wrong usage of modifiers to me which is why I am so stubborn here :D But we also don't really have a lot of alternatives right now in this regard unfortunately. I'll open an issue on https://github.com/microsoft/language-server-protocol in a bit in regards to the problems with semantic highlighting we have here.

lnicola · 2022-08-22T14:45:32Z

I think like a limited set of custom types (with extra modifiers, in the CSS sense, since LSP is somewhat ambiguous) more. Do we really need both CHAR and STRING? Can't Self be just a TYPE? TOOL_MODULE could be DECORATOR, UNION could be ENUM, and so on.

I'm more surprised that LABEL isn't a standard type.

Veykril · 2022-08-22T14:57:33Z

Can't Self be just a TYPE?

Well, I color Self as a keyword, while I imagine others might wanna color it like a type. The reason why I think having a lot of different tags is beneficial is because some people want to specifically color certain things in various ways (the punctutation and operator variations are a good indicator I think). The modifier approach could work, but going with that route we will easily hit the 32 modifier limit in the near future I imagine.

frankplow · 2022-08-22T15:25:07Z

I think of Self as a type and self as a parameter/module depending on context. Maybe a self or special modifier could be added - I think these are important enough to get their own modifier, and it's not quite narrowing down to only a single language construct. It would allow either the user to use either the keyword highlighting or the type/parameter/module highlighting, as well as providing a distinction between self as a parameter and self as a module which is not available at the moment.

The LSP set has some really suprising omissions, even boolean isn't standard.

Add some more highlighting configurations The following can be enabled/disabled now in terms of highlighting: - doc comment injection (enabled by default) - punctuation highlighting (disabled by default) - operator highlighting (enabled by default) - punctuation specialized highlighting (disabled by default) - operator specialized highlighting (disabled by default) - macro call bang highlighting (disabled by default) This PR also changes our `attribute` semantic token type to the `decorator` type which landed upstream (but not yet in lsp-types). Specialized highlighting is disabled by default, as all clients will have to ship something to map these back to the standard punctuation/operator token (we do this in VSCode via the inheritance mapping for example). This is a lot of maintenance work, and not something every client wants to do, pushing that need to use the user. As this is a rather niche use in the first place this will just be disabled by default. Punctuation highlighting is disabled by default, punctuation is usually something that can be done by the native syntactic highlighting of the client, so there is no loss in quality. The main reason for this though is that punctuation adds a lot of extra token data that we sent over, a lot of clients struggle with applying this, so disabling this improves the UX for a lot of people. Note that we still highlight punctuations with special meaning as that special entity, (the never type `!` will still be tagged as a builtin type if it occurs as such) Separate highlighting of the macro call bang `!` is disabled by default, as I think people actually didn't like that change that much, though at the same time I feel like not many people even noticed that change (I prefer it be separate, but that's not enough reason for it to be enabled by default I believe :^) ) cc #12783 #13066

Veykril · 2022-08-23T12:41:34Z

The standard types in the LSP are less than bare bones tbh yes ...

I guess going with the modifiers is the best we can do as I don't think a viable alternative will appear within the coming years, if we run into the limit at some point then so be it but I assume for now we should be able to fit all the things with some room to spare(given we ignore the variations of punctuation and operators).

I love semantic highlighting but I feel like the protocol vastly limited its own functionality for no apparent reason.

ghost · 2022-09-30T20:15:15Z

Hey guys, don't know if I should open up a separate issue for this, but currently, there's no way to distinguish macro!() calls and the macro_rules! bang ("!") at the end of the macro_rules, which itself is of type keyword. This makes for weird highlighting for macro_rules! definitions.

Veykril · 2022-09-30T21:04:01Z

That sounds like it deserves a separate issue, so please open one for that (also please elaborate a bit on what exactly you want there, I am not quite sure I completely understand)

jhpratt · 2022-11-23T11:00:46Z

Is the limit of 32 modifiers per token or total? Separately, is the limit on r-a's end or LSP's?

In my opinion switching to have lots of modifiers would be the best solution, pending a proper hierarchy on the LSP side of things. If it's possible to implement it for just VS Code, that could work too (though I am biased as I use VS Code).

oxalica · 2022-11-23T11:21:25Z

@jhpratt

Is the limit of 32 modifiers per token or total? Separately, is the limit on r-a's end or LSP's?

LSP Spec uses bitset for modifiers but doesn't explicitly mention the max number of modifiers. But LSP's integer is defined to be i32, so 32 (or 31?) is the technical limit. See Semantic Tokens and Base Types - integer.

Veykril · 2022-11-23T13:03:49Z

Note that VSCode shouldn't have an issue with unset token types not being colored, since VSCode natively supports token type hierarchy as a fallback (which we are populating).

jhpratt · 2022-11-23T16:33:44Z

Hmm...LSP uses TypeScript for its implementation? JS has bigint support natively, so theoretically it could be unbounded if that were used. Not sure if that's a change they'd be interested in making, however.

When looking to see what tokens had what semantics attached yesterday, I did notice that TextMate scopes were quite a bit more detailed in some cases. Perhaps if there was some way to tell the editor to override semantic highlighting for certain TextMate scopes, that could work as well?

And for the token type hierarchy, is there any way to take advantage of that today?

Veykril · 2022-11-23T18:59:01Z

When looking to see what tokens had what semantics attached yesterday, I did notice that TextMate scopes were quite a bit more detailed in some cases.

Can you elaborate on that? In general semantic tokens can be more precise than textmate scopes (as the latter is just regex), if there are some provided by r-a where that is not the case I would consider that a bug.

And for the token type hierarchy, is there any way to take advantage of that today?

If you are using VSCode you are by default taking advantage of it, though we might be missing some hierarchy definitions (the hierarchy can also specifically delegate back to textmate scope fwiw).

oxalica · 2022-11-24T12:43:54Z

Hmm...LSP uses TypeScript for its implementation? JS has bigint support natively, so theoretically it could be unbounded if that were used. Not sure if that's a change they'd be interested in making, however.

No. We should follows the protocol definition in text, not any JS or TS implementation. Going beyond the protocol will break support of other clients that strictly following the protocol (like, already using i32 in Rust clients), which is bad for the ecosystem.

IMO, we don't really need 32 modifiers. Semantic tokens are not about distinguish each individual tokens, and themes don't usually provide these many different colors. There are currently too many punctuation kinds.

Veykril · 2022-11-24T15:53:43Z

We should just have a setting that makes r-a map the custom types to the standard ones I think, and then probably make that enabled by default. It is a shortcoming of the LSP that non-standard tokens are this painful to use in clients that don't have this hierarchy concept I think. It's a shame given how useful semantic tokens can be.

jhpratt · 2022-11-27T20:59:00Z

Can you elaborate on that? In general semantic tokens can be more precise than textmate scopes (as the latter is just regex), if there are some provided by r-a where that is not the case I would consider that a bug.

Even targeting specific keywords is an area where TextMate has more detail. struct is keyword.other.struct.rust for TextMate, while r-a just classifies it as keyword.other.rust. While I didn't intend on targeting this specific keyword, I did want to at least experiment with it for a couple others.

No. We should follows the protocol definition in text

Yeah, I was suggesting that LSP could make a change, as it doesn't appear specified anywhere.

themes don't usually provide these many different colors.

At least in my case, that's not the concern. I'm more interested in changing the color to match something else.

Veykril · 2022-11-28T07:59:26Z

Even targeting specific keywords is an area where TextMate has more detail

Then we are missing those details, we don't necessarily add new token types unless someone needs it, as adding a type for each keyword would certainly be too much eventually.

themes don't usually provide these many different colors

Also regarding this, standard themes don't provide colors for these specific tokens usually. That doesn't mean they aren't useful. The punctuations were requested by someone, because they want to make use of them. Likewise a lot of these more specific token types I personally use in my own theme because I want to leverage the more precise semantic token types. (Nevertheless I see the problem with non-standard token types which is why I think we should just add what I said earlier now, We should just have a setting that makes r-a map the custom types to the standard ones I think, and then probably make that enabled by default.)

frankplow changed the title ~~Semantic Tokens - builtInType should be composed of type and defaultLibrary rather than being its own type~~ Semantic Tokens - builtinType should be composed of type and defaultLibrary rather than being its own type Aug 19, 2022

frankplow changed the title ~~Semantic Tokens - builtinType should be composed of type and defaultLibrary rather than being its own type~~ Semantic Tokens - Reduce set of non-standard token types Aug 19, 2022

Veykril added the A-highlighting (semantic) token highlighting label Aug 19, 2022

Veykril added the A-lsp LSP conformance issues and missing features label Aug 19, 2022

Veykril mentioned this issue Aug 22, 2022

Add some more highlighting configurations #13084

Merged

Veykril added the C-enhancement Category: enhancement label Feb 9, 2023

Veykril self-assigned this May 10, 2023

Veykril mentioned this issue May 11, 2023

Add basic support for augmentsSyntaxTokens and non-standard semantic token config #14777

Merged

bors closed this as completed in b069eb7 May 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Semantic Tokens - Reduce set of non-standard token types #13066

Semantic Tokens - Reduce set of non-standard token types #13066

frankplow commented Aug 19, 2022 •

edited

Loading

frankplow commented Aug 19, 2022

flodiebold commented Aug 19, 2022

frankplow commented Aug 19, 2022

Veykril commented Aug 19, 2022

frankplow commented Aug 19, 2022 •

edited

Loading

Veykril commented Aug 19, 2022

frankplow commented Aug 19, 2022

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022 •

edited

Loading

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022 •

edited

Loading

Veykril commented Aug 22, 2022

Veykril commented Aug 22, 2022

Veykril commented Aug 22, 2022

lnicola commented Aug 22, 2022 •

edited

Loading

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022

Veykril commented Aug 23, 2022

ghost commented Sep 30, 2022

Veykril commented Sep 30, 2022

jhpratt commented Nov 23, 2022

oxalica commented Nov 23, 2022

Veykril commented Nov 23, 2022

jhpratt commented Nov 23, 2022 •

edited

Loading

Veykril commented Nov 23, 2022

oxalica commented Nov 24, 2022

Veykril commented Nov 24, 2022

jhpratt commented Nov 27, 2022

Veykril commented Nov 28, 2022 •

edited

Loading

Semantic Tokens - Reduce set of non-standard token types #13066

Semantic Tokens - Reduce set of non-standard token types #13066

Comments

frankplow commented Aug 19, 2022 • edited Loading

frankplow commented Aug 19, 2022

flodiebold commented Aug 19, 2022

frankplow commented Aug 19, 2022

Veykril commented Aug 19, 2022

frankplow commented Aug 19, 2022 • edited Loading

Veykril commented Aug 19, 2022

frankplow commented Aug 19, 2022

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022 • edited Loading

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022 • edited Loading

Veykril commented Aug 22, 2022

Veykril commented Aug 22, 2022

Veykril commented Aug 22, 2022

lnicola commented Aug 22, 2022 • edited Loading

Veykril commented Aug 22, 2022

frankplow commented Aug 22, 2022

Veykril commented Aug 23, 2022

ghost commented Sep 30, 2022

Veykril commented Sep 30, 2022

jhpratt commented Nov 23, 2022

oxalica commented Nov 23, 2022

Veykril commented Nov 23, 2022

jhpratt commented Nov 23, 2022 • edited Loading

Veykril commented Nov 23, 2022

oxalica commented Nov 24, 2022

Veykril commented Nov 24, 2022

jhpratt commented Nov 27, 2022

Veykril commented Nov 28, 2022 • edited Loading

frankplow commented Aug 19, 2022 •

edited

Loading

frankplow commented Aug 19, 2022 •

edited

Loading

frankplow commented Aug 22, 2022 •

edited

Loading

frankplow commented Aug 22, 2022 •

edited

Loading

lnicola commented Aug 22, 2022 •

edited

Loading

jhpratt commented Nov 23, 2022 •

edited

Loading

Veykril commented Nov 28, 2022 •

edited

Loading