
[Theia AI] Support prompt fragments #14899

Open · Tracked by #14923
JonasHelming opened this issue Feb 11, 2025 · 21 comments
@JonasHelming
Contributor

JonasHelming commented Feb 11, 2025

Prompt fragments allow defining parts of prompts in a reusable way. You can then embed these fragments via a variable in the chat or in our prompt templates. This also makes it possible to have a set of prompts available in the chat without always defining a custom agent.

Step 1: Allow referencing prompt fragments

The PromptCustomizationService should already load any file in the prompt service. When we add a file there which does not have the id of an agent, we assume it is a reusable prompt fragment.
We now need a variable "#prompt:${promptId}" that resolves to the prompt with the specified id. We need to check the resolution logic in the prompt service so that variables and functions occurring in injected prompt fragments are resolved, too. The same applies to chat messages.
With this, the user can add a prompt fragment by simply placing a file, e.g. myprompt.prompttemplate, in the prompt templates directory, and can then use it via "#prompt:myprompt". "myprompt" gets auto-completed.
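
A minimal sketch of how such a "#prompt:<id>" resolution could look, including recursive resolution of nested fragment references (the store interface and function names are hypothetical, not the actual Theia API):

```ts
// Hypothetical fragment store: maps fragment ids (e.g. "myprompt") to their template text.
interface PromptFragmentStore {
    getFragment(id: string): string | undefined;
}

// Replaces every "#prompt:<id>" occurrence with the fragment text and keeps resolving
// until no further references remain, with a depth guard against cyclic references.
function resolvePromptReferences(text: string, store: PromptFragmentStore, depth = 0): string {
    if (depth > 10) {
        return text; // guard against cycles between fragments
    }
    const resolved = text.replace(/#prompt:([\w-]+)/g, (match, id: string) => store.getFragment(id) ?? match);
    return resolved === text ? resolved : resolvePromptReferences(resolved, store, depth + 1);
}

// Usage: a file "myprompt.prompttemplate" would be registered under the id "myprompt".
const store: PromptFragmentStore = {
    getFragment: id => (id === 'myprompt' ? 'Always answer in bullet points.' : undefined)
};
console.log(resolvePromptReferences('Review this code. #prompt:myprompt', store));
```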

Step 2: Add UI

Add a new tab to the AI configuration view showing a list of all loaded prompt templates that allows the user to:

  • Edit
  • Add Variant (of an existing one)
  • Add prompt fragment
  • Reset/delete (reset if they are resettable, e.g. because they were created by an agent; otherwise delete)
  • Filter to agent prompts or prompt fragments
  • Open the directory
  • Sort
@sdirix
Member

sdirix commented Feb 12, 2025

VS Code also introduced prompt management and reusable prompts. We should evaluate whether we want to align Theia's prompt management with it.

@JonasHelming
Contributor Author

I had a look. It is pretty similar, but I like our UI integration a lot better for now.

@JonasHelming
Contributor Author

We could allow placing prompt variables in the context, though, so they become part of the system prompt. @planger WDYT?

@planger
Contributor

planger commented Feb 13, 2025

The VS Code concept for prompt templates is pretty similar indeed. I only see two real differences:

  • They only allow attaching prompt files to the context, while we foresee referencing them directly in the prompt via a variable, which would then be replaced with the actual prompt text.
  • They allow making prompt templates part of the workspace (.github/prompts), which imho is a nice addition to simplify sharing prompt templates.

TL;DR:

  • I'm in favor of keeping our prompt variables as they are for now (so users can put them in their user prompt to be replaced with the prompt template text). This gives them more control over where exactly to put the prompt template content, if they want to.
  • I second the idea of marking prompt template variables as context variables, so they could also just be referenced in the context (this is almost for free -- see below -- and may enable some extra use cases)
  • I think it would be nice to read in the .github/prompts folder, in order to add all files there to the prompt template registry and be compatible with VS Code projects in this regard.

Turning Prompt Template Variables into Context Variables

Making prompt variables context variables in addition could be a nice extra feature and would lead to consistency with VS Code.
This comes almost for free. Prompt variables would just need to set the context flag to true and provide a contextValue in addition to the prompt value. Once they are marked as context variables, users can also just attach them to prompts, so their context value would end up in the context. Users can choose whether they also want to mention them in the prompt text directly (which would then be redundant, as their prompt value and their context value would be equivalent, but I guess that's not a problem).
If users only add the prompt variable to the context, we'd rely on the agent to incorporate it (via a tool function or a variable for the context in the system message). But that's also good enough for now, as long as our default agents do that.
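
A rough sketch of what this could look like (all names are hypothetical, not the actual Theia AI variable API):

```ts
// Hypothetical sketch of a variable that provides both an inline prompt value
// and a separate context value.

interface ResolvedPromptVariable {
    /** Inline replacement used when the variable is mentioned in the prompt text. */
    value: string;
    /** Value added to the request context when the variable is attached there. */
    contextValue?: string;
}

interface PromptVariableDescription {
    name: string;
    /** Marking the variable as a context variable lets users attach it to the context. */
    isContextVariable: boolean;
    resolve(arg: string): ResolvedPromptVariable | undefined;
}

// For a "#prompt" variable both values are simply the fragment text, so attaching it
// to the context or mentioning it inline yields equivalent content.
const promptVariable = (fragments: Map<string, string>): PromptVariableDescription => ({
    name: 'prompt',
    isContextVariable: true,
    resolve: id => {
        const fragment = fragments.get(id);
        return fragment ? { value: fragment, contextValue: fragment } : undefined;
    }
});
```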

@lucas-koehler
Contributor

@JonasHelming For the UI part, should we also add auto-completion in the chat for the prompt template ids? With just the variable defined, users need to know and type the full prompt template id.
If yes, please add it above :)

@JonasHelming
Contributor Author

JonasHelming commented Feb 13, 2025

Yes, and I believe this is conceptually introduced in #14787, so it might make sense to branch off that PR. Added it to the description.

@sdirix
Member

sdirix commented Feb 13, 2025

> The VS Code concept for prompt templates is pretty similar indeed. I only see two real differences:
>
>   • They only allow attaching prompt files to the context, while we foresee to reference them directly in the prompt via a variable, which would then be replaced with the actual prompt text.
>   • They allow making prompt templates part of the workspace (.github/prompts), which imho is a nice addition to simplify sharing prompt templates.
>
> TL;DR:
>
>   • I'm in favor of keeping our prompt variables as they are for now (so users can put them in their user prompt to be replaced with the prompt template text). This gives them more control where exactly to put the prompt template content, if they want to.
>   • I second the idea of marking prompt template variables as context variables, so they could also just be referenced in the context (is almost for free -- see below, and may enable some extra use cases)
>   • I think it would be nice to read in the .github/prompts folder, in order to add all files there into the prompt template registry to be compatible to VS Code projects in this regard.

In VS Code

  • they use #file: to refer to other prompts.
  • they support markdown style links as syntactic sugar for #file
  • They allow relative paths in these links

Therefore

  • I'm not sure it's worth it to support a new variable #prompt. I guess the only difference is that our variable is then path independent?
  • We will need to support VS Code's spec anyway at some point, so why not right away

Edit: The #prompt variable also allows referring to prompts that are not even managed in files. So that's definitely a good use case which we likely need. It would still be good to also support the VS Code style.

@planger
Contributor

planger commented Feb 13, 2025

@sdirix You are right, the file-relative links (and the MD syntactic sugar) are indeed what we would be missing at the moment. I missed that part.

I think, however, that we aim at additional purposes with the #prompt variable (beyond what you could achieve by putting prompts into #file):

  1. Reuse prompt fragments in other prompts, like canned prompts that people would add if they add a set of tool functions, etc.
  2. Have short-cuts for users to reproduce similar prompts for similar tasks

For both of these use cases, we don't want to use #file, because #file just pulls the file into the context but doesn't replace the actual variable occurrence with the actual file content. (Edit: so that's to me the main difference between #prompt and #file.)

Unrelated thought:
While I was working on the context feature, I was also wondering whether we should mark variables as dynamic or not and, if they are static, directly replace them in the user input text field when they are selected, to be even more transparent and let users also modify the resolved content.
That would actually be super handy for prompt templates. But this is unrelated.

@JonasHelming
Contributor Author

> > The VS Code concept for prompt templates is pretty similar indeed. I only see two real differences:
> >
> >   • They only allow attaching prompt files to the context, while we foresee to reference them directly in the prompt via a variable, which would then be replaced with the actual prompt text.
> >   • They allow making prompt templates part of the workspace (.github/prompts), which imho is a nice addition to simplify sharing prompt templates.
> >
> > TL;DR:
> >
> >   • I'm in favor of keeping our prompt variables as they are for now (so users can put them in their user prompt to be replaced with the prompt template text). This gives them more control where exactly to put the prompt template content, if they want to.
> >   • I second the idea of marking prompt template variables as context variables, so they could also just be referenced in the context (is almost for free -- see below, and may enable some extra use cases)
> >   • I think it would be nice to read in the .github/prompts folder, in order to add all files there into the prompt template registry to be compatible to VS Code projects in this regard.
>
> In VS Code
>
>   • they use #file: to refer to other prompts.
>   • they support markdown style links as syntactic sugar for #file
>   • They allow relative paths in these links
>
> Therefore
>
>   • I'm not sure it's worth it to support a new variable #prompt. I guess the only difference is that our variable is then path independent?
>   • We will need to support VS Code's spec anyway at some point, so why not right away
>
> Edit: The #prompt also allows to refer to prompts which are not even managed in files. So that's definitely a good use case which we likely need. Still would be good to also support the VS Code style

I think eventually we should support both. Reasoning: prompts in Theia can come from various sources:

  • Files
  • Built-in prompts
  • Prompts contributed via MCP

@sdirix
Member

sdirix commented Feb 13, 2025

> For both of these use cases, we don't want to use #file, because file is just pulling the file into the context, but doesn't replace the actual variable occurrence with the actual file content. (edit: so that's to me the main difference between #prompt and #file)

I don't think we can keep it this way. In VS Code it seems the referenced prompt will be evaluated, as they specifically state:

> You can also reference other .prompt.md files to create a hierarchy of prompts, with reusable prompts that can be shared across multiple prompt files

Once people use that they will complain that in Theia we just include the file without evaluating it.

@JonasHelming
Contributor Author

I will add this to our discussion agenda and post the results here

@planger
Contributor

planger commented Feb 13, 2025

> I think we can't keep it this way, in VS Code it seems it will evaluate the prompt as they specifically state

Well, I think this is up to the agent, and in Theia AI we can keep it this way, imho. For Theia IDE default agents (which are the ones that will be compared to VS Code Copilot), we just have to make sure to incorporate the referenced files in one way or another.

@JonasHelming
Contributor Author

JonasHelming commented Feb 13, 2025

> > I think we can't keep it this way, in VS Code it seems it will evaluate the prompt as they specifically state
>
> Well I think this is up to the agent and in Theia AI we can keep it this way, imho. For Theia IDE default agents (which are the ones that will be in comparison to VS Code Copilot), we just have to make sure to incorporate the referenced files in one way or another.

Yes, but we have to be aware that specific agents, in particular Coder, can actually resolve these files themselves, which I claim is beneficial compared to just adding them somewhere in the user prompt. I would claim the input is much more structured in the prompt then.

For Universal, we should likely resolve the files indeed.

This will be possible with a variable, btw (e.g. #resolvedContext).

@lucas-koehler
Contributor

lucas-koehler commented Feb 14, 2025

@JonasHelming For the initial version of being able to reference prompt templates in the chat with auto-completion:

  1. Do we want to show all prompt templates known to the prompt customization service and the prompt service? Or do we only want to show the ones from the prompt customization service?
  2. If only the ones from the prompt customization service, should we also refuse to resolve the others, or still resolve them in case someone knows the id and really wants to use them?

EDIT: The difference in implementation is very small.

prompt templates from both services:

Image

prompt templates from prompt customization service only (my test template loaded from configured folder):

Image

@JonasHelming
Contributor Author

JonasHelming commented Feb 14, 2025

Following the law of least surprise, we should add all prompts I guess, but put the custom ones on top and maybe with a different symbol, WDYT? If you find the ordering strange, then at least use a different symbol.

@lucas-koehler
Contributor

@JonasHelming Using a different icon, sorting custom templates first, and adding a detail text, it would look like this:

prompt-template-autocomplete.webm

@JonasHelming
Contributor Author

Looks very good to me!

@lucas-koehler
Contributor

During implementation in #14985, it was discovered that functions referenced in prompt templates should be resolved at the end, when the final prompt or chat message is resolved, because functions require handling of function objects in addition to inline replacements. See #14985 (comment)

For prompts, this is quite easy to do because the prompt service can simply be adapted to first resolve all variables and then resolve all functions based on the prompt with the already resolved variables.
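
A minimal sketch of this two-phase idea, assuming plain-text prompts with {{variable}} and ~{function} style references (names and patterns are illustrative, not the actual prompt service API):

```ts
// Phase 1 resolves all variables (including those injected via prompt fragments),
// phase 2 collects functions from the already variable-resolved text.

interface TwoPhaseResult {
    text: string;
    functionIds: string[];
}

function resolvePrompt(
    template: string,
    resolveVariable: (name: string) => string
): TwoPhaseResult {
    // Phase 1: replace every {{variable}} occurrence with its resolved value.
    const withVariables = template.replace(/\{\{([\w:-]+)\}\}/g, (_m, name: string) => resolveVariable(name));
    // Phase 2: collect function ids from the resolved text, so functions referenced
    // inside injected fragments are picked up as well.
    const functionIds: string[] = [];
    const text = withVariables.replace(/~\{([\w.-]+)\}/g, (_m, id: string) => {
        functionIds.push(id);
        return id;
    });
    return { text, functionIds };
}
```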

For chat messages, there is no clear solution with the current approach of the chat request parser. Furthermore, it seems beneficial to unify the parsing and resolving of chat messages and prompts to avoid duplicate code adaptations like in this case.

Unification of Prompt and Chat Message Parsing in Theia AI

Objective

Explore the possibility of unifying the parsing logic for prompts and chat messages while allowing a distinction in how variables and functions are specified, i.e. prompts use {{variable:arg}} and chat messages use #variable:arg. This distinction could be controlled via an option of the unified parser.

Chat Parser Issues and Limitations

  • The parser does not allow parsing functions within resolved variables: variables are not resolved by the parser (but only later by the chat service)
  • Functions are resolved in the chat parser. This is inconsistent with the handling of variables
  • Using parts for elements such as variables, agents, and functions does not really fit with further resolution inside the parts
    • Resolving functions inside a variable part leads to an unclear result regarding how to arrange the parts

Difference to Prompt Service

The prompt service does not follow the same parsing approach as chat messages. In the prompt service, resolution occurs inline, and additional data (such as functions) is returned separately. This raises the question whether the part approach is really necessary for chat parsing.

Idea

Analyse whether we can unify parsing of prompts and chat messages. The parsed result could contain the full unresolved text and the fully resolved text (prompt text). At the core, this is what the parts provide: access to the unresolved and resolved text.
In addition, all "special" references - such as agents, variables, and functions - are returned as separate properties next to the text and prompt text. Consumers can then access them directly if needed.
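
A rough sketch of what such a parsed result could look like (hypothetical names, not an existing Theia AI interface):

```ts
// Illustrative sketch: the unresolved and resolved text at the root of the result,
// plus separately tracked references for consumers that need them.

interface TrackedReference {
    kind: 'agent' | 'variable' | 'function';
    name: string;
    /** Character range of the match in the original, unresolved text. */
    range: { start: number; end: number };
    /** Whether the reference was actually resolved (useful e.g. for UI highlighting). */
    resolved: boolean;
}

interface ParsedMessage {
    /** The unresolved input as typed by the user or stored in the template. */
    text: string;
    /** The fully resolved text that is sent to the language model. */
    promptText: string;
    /** All "special" references found during parsing. */
    references: TrackedReference[];
}
```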

@planger
Contributor

planger commented Feb 25, 2025

Thanks for bringing this up! Consolidation indeed seems to make sense here. However, aside from the slight variations in the patterns ({{variable}} vs #variable), there are a few other differences in how we parse and what we parse, as well as certain requirements, which I'd like to raise here:

The chat service has those track-and-replace patterns (like @<agent-name> or ~<function-name>), where we not only replace the occurrence but also track which occurrences were found. This information later influences further processing, e.g. which agent we direct this request to or which functions to add to the request. There is no recursion involved for such patterns (that is, the replacement string does not contain any data we need to parse again).

This is in contrast to variable parsing, which doesn't need to be tracked but does need to be resolved recursively, handling potentially cyclic dependencies, and ideally involves caching to avoid multiple resolutions for one overall message (see also #15000).

The other dimension is whether the parsing takes place in the context of a chat session or outside of one. In a chat session there is currently the additional agent parsing (like @<agent-name>), which is invalid in the non-chat case. In the future there might be further patterns that are only relevant for one or the other case.

Another side requirement for the parsing, especially in the chat case, is that it'd be great to know afterwards which patterns were successfully matched at which character indexes. This is useful e.g. to highlight the respective parts of the chat request in the chat UI. Currently, I think we plainly highlight all occurrences that match the pattern, regardless of whether they were actually successful:
e.g. I believe we highlight any #zzz string even though the variable zzz might not exist and thus wasn't replaced. Similarly, if the user mentions different agents multiple times (@<agent-name>), we highlight all of them even though only the first occurrence might be active. So the parsing should maintain this information to be used later in the UI.
Thinking this further, it may even be useful while typing, e.g. adding a red underline if the user references a non-existing variable, etc.

I certainly don't want to overload this issue, but I think this may be helpful to capture the current differences, requirements, and clients of the parser.

@planger
Contributor

planger commented Feb 25, 2025

In summary: it looks like we should consider actually splitting the parsing of variables from all other patterns, because they differ in nature. Variables should be resolved before the track-and-replace patterns are matched, they should be resolved recursively, we may need to guard against cycles, and we may need caching.

This feels different from parsing the track-and-replace patterns once the message has been completely unfolded.

@lucas-koehler
Contributor

Summary

I think we can cover the requirements by having a unified, configurable parser that tracks all resolved patterns, including their matching ranges in the original text. Consumers (i.e. the chat UI or prompt consumers) can then just use the tracked information they need.
In essence, this has the advantage of a unified parsing logic but the disadvantage of parsing more than required for some use cases. We would need to find out whether the current ParsedChatRequestPart concept - which splits the parsed chat message into parts - has other features we would miss. On first look, the parts are mainly used to collect original text, prompt text, range, and resolved values, all of which we could also track at the root of the returned parsed result (i.e. like resolved functions).

Details

Thanks for the extensive elaboration! Here are my thoughts on this.

> The chat service has those track-and-replace patterns (like @ or ~), where we not only replace the occurrence but also track which occurrences were found. This information later on influences further processing, e.g. which agent we direct this request to or which functions to add to the request. There is no recursion involved for such patterns (that is the replacement string does not contain any data we need to parse again).

At least for functions, the same is done in the prompt service. I could see unifying this in two ways:

  • Parse and track everything. Consumers then only use the tracked patterns they need and disregard the rest.
  • Hand over as an option which kinds of patterns should be parsed. This option could also include the regex pattern, allowing different patterns for chat and prompts.

Generally, I think parsing could even be an extensible concept where dev users of Theia AI can contribute new patterns and parser logic to be added to the parsed result. Custom agents could then access this information.
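
A hypothetical sketch of such a pattern/option concept (purely illustrative, none of these names exist in Theia AI today):

```ts
// Sketch of how telling the parser which patterns to parse could look,
// including custom patterns contributed by adopters. All names are hypothetical.

interface PatternContribution {
    kind: string;      // e.g. 'agent', 'variable', 'function', or a custom kind
    pattern: RegExp;   // e.g. /@([\w-]+)/g for agents in the chat case
    priority?: number; // optional: controls parsing/replacement order
}

interface ParserOptions {
    /** Only the listed kinds are parsed; chat and prompt callers pass different sets. */
    patterns: PatternContribution[];
}

// Example configuration for the chat case (prompts would omit the agent pattern).
const chatParserOptions: ParserOptions = {
    patterns: [
        { kind: 'agent', pattern: /@([\w-]+)/g, priority: 1 },
        { kind: 'variable', pattern: /#([\w:-]+)/g, priority: 2 },
        { kind: 'function', pattern: /~([\w.-]+)/g, priority: 3 }
    ]
};
```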

> This is in contrast to variable parsing, which don't need to be tracked, but resolved recursively, handling potentially cyclic dependencies, and ideally involving caching to avoid multiple resolutions for one overall message (see also #15000).

I agree with this except for the no-tracking part, for chat UI reasons (as you elaborated; see more later). As I see it, the resolving, caching, etc. can be implemented independently of the other changes because resolving variables already goes through the AIVariableService. The service implementation could and should hide the details.

> The other dimension is whether the parsing takes place in the context of a chat session or outside. In a chat session currently there is the additional agent parsing (like @), which is invalid in the non-chat case. In the future there might be further patterns that are only relevant for one or the other case.

This certainly is an important point. I think we could still solve this with the approach above where we tell the parser which patterns to parse. With this, we could have one unified parser that only parses the relevant patterns. Or, even if everything is parsed, the tracking results could just be ignored depending on the use case.
If we want to get really generic, we could even use a priority system to determine the parsing and replacement order.

> Another side requirement to the parsing might be, especially in the chat case, that it'd be great to know afterwards which patterns where successfully matched at which character indexes. This is useful to e.g. highlight the respective parts of the chat request in the chat UI. Currently I think we plainly highlight all occurrences that match the pattern, but not if those were actually successful:
> e.g. I believe we highlight any #zzz string even though the variable zzz might not exist and thus wasn't replaced. Similarly if the user mentions different agents multiple time (@) we highlight all of them even though only the first occurrence might be active. So the parsing should maintain this information to be used later on the UI.
> Thinking this further, this may even be useful while typing, e.g. adding red underline if the user references a non-existing variable, etc.

I think this is also the origin of the parts returned by the chat request parser. However, I'm not sure we need the part approach to achieve this.
We could IMO also parse the chat message and track every resolved pattern, including resolved variables, in separate properties (like the resolved functions). The "tracking" results would then contain the matched range (as the parts also do) and, in addition, whether resolution was successful. A detailed description/error could also be added here (e.g. "Agent xzy was not resolved because agent ghj was mentioned previously in this message").

If we just generally track this information, it could IMO also be used in a unified way. The downside for the prompt case would be that more information is tracked than needed. However, storing the tracking results should not add much overhead because we have to do the resolving anyway.

> In summary: it looks like we may consider actually split the parsing of variables and all other patterns, because they differ in their nature. They should be resolved before the track-and-replace patterns are matched, they should be resolved recursively, we may need to guard cycles, and may need caching.
>
> This feels different from parsing the track-and-replace patterns once the message was completely unfolded.

I agree that we might need to split them. For the chat UI, we still might want to track the variables too, as elaborated earlier. Essentially, I think we apply a similar approach as with the track-and-replace patterns, with the one exception that we need to resolve and replace variables first. Recursive resolving of variables should IMO be handled by the variable service to encapsulate this behavior. Also, the chat UI (probably?!) does not need to know about the recursive variable resolutions, as only the initial one can be highlighted in the chat window.
