# 🤖 Supported configs & options


Symbols: ✅ - Supported, ❌ - Not supported, 📌 - Plan to support

## OpenAI ✅

### API configurations

| Field | Description |
| --- | --- |
| API Key | The API key for your OpenAI API. |
| Model | ID of the model to use. |

### Conversation options

| Option | Description | Supported |
| --- | --- | --- |
| frequency_penalty | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | |
| max_tokens | The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. | |
| presence_penalty | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. | |
| temperature | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. | |
| top_p | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | |
| stream | If set, partial message deltas will be sent, like in ChatGPT. | |
| user | A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. | |
| response_format | An object specifying the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106. | 📌 |
| seed | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. | 📌 |
| stop | Up to 4 sequences where the API will stop generating further tokens. | 📌 |
| tools | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. | |
| tool_choice | Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function. none is the default when no functions are present; auto is the default if functions are present. | |
| logit_bias | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | |
| logprobs | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the gpt-4-vision-preview model. | |
| top_logprobs | An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. | |
| n | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs. | |
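
As a quick orientation only (this is not code from this project), the sketch below shows how the configuration fields and a few of the options above map onto a chat completion request with the official `openai` Python SDK; the API key, model ID, and option values are placeholders.

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # "API Key" field (placeholder)

response = client.chat.completions.create(
    model="gpt-4o-mini",           # "Model": ID of the model to use (placeholder)
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,               # between 0 and 2; alter this or top_p, not both
    top_p=1.0,                     # nucleus sampling
    max_tokens=512,                # cap on generated tokens
    frequency_penalty=0.0,         # between -2.0 and 2.0
    presence_penalty=0.0,          # between -2.0 and 2.0
    stream=False,                  # True sends partial message deltas
    user="end-user-1234",          # opaque end-user identifier
)
print(response.choices[0].message.content)
```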

### References

## Microsoft Azure ✅

### API configurations

| Field | Description |
| --- | --- |
| API Key | The API key for your Azure OpenAI API. |
| Endpoint | The endpoint for your Azure OpenAI API. |
| API version | The API version to use for this operation. This follows the YYYY-MM-DD or YYYY-MM-DD-preview format. |
| Deployment ID | The name of your model deployment. |

### Conversation options

| Option | Description | Supported |
| --- | --- | --- |
| max_tokens | The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens can't exceed the model's context length. | |
| temperature | What sampling temperature to use, between 0 and 2. Higher values mean the model takes more risks. Try 0.9 for more creative applications, and 0 (argmax sampling) for ones with a well-defined answer. We generally recommend altering this or top_p but not both. | |
| top_p | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. | |
| presence_penalty | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. | |
| frequency_penalty | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | |
| stream | If set, partial message deltas will be sent, like in ChatGPT. | |
| user | A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. | |
| suffix | The suffix that comes after a completion of inserted text. | 📌 |
| echo | Echo back the prompt in addition to the completion. This parameter cannot be used with gpt-35-turbo. | 📌 |
| stop | Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence. For GPT-4 Turbo with Vision, up to two sequences are supported. | 📌 |
| logit_bias | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass `{"50256": -100}` to prevent the `<\|endoftext\|>` token from being generated. | |
| n | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs. | |
| logprobs | Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there might be up to logprobs+1 elements in the response. This parameter cannot be used with gpt-35-turbo. | |
| best_of | Generates best_of completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return; best_of must be greater than n. Note: because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with gpt-35-turbo. | |
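
For reference only (and not necessarily how this project issues requests), a minimal sketch of the same kind of call against Azure OpenAI using the `openai` Python SDK; the key, endpoint, API version, and deployment name are placeholders, and the "Deployment ID" is passed where a model ID would normally go.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="...",                                          # "API Key" (placeholder)
    azure_endpoint="https://my-resource.openai.azure.com",  # "Endpoint" (placeholder)
    api_version="2024-02-01",                               # "API version", YYYY-MM-DD[-preview]
)

response = client.chat.completions.create(
    model="my-gpt-4-deployment",   # "Deployment ID": the name of your deployment (placeholder)
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=512,
    stream=False,
)
print(response.choices[0].message.content)
```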

### References

## Anthropic Claude ✅

### API configurations

| Field | Description |
| --- | --- |
| api-key | The API key for your Anthropic API. |
| anthropic-version | The version of the Anthropic API to use. |
| model | The Anthropic model to use. |

### Conversation options

| Option | Description | Supported |
| --- | --- | --- |
| max_tokens | The maximum number of tokens to generate before stopping. | |
| temperature | Amount of randomness injected into the response. Defaults to 1.0. Ranges from 0.0 to 1.0. Use a temperature closer to 0.0 for analytical / multiple choice tasks, and closer to 1.0 for creative and generative tasks. We generally recommend altering this or top_p but not both. | |
| top_p | Use nucleus sampling. Recommended for advanced use cases only. You usually only need to use temperature. | |
| stream | Whether to incrementally stream the response using server-sent events. | |
| user | An object describing metadata about the request. metadata.user_id: an external identifier for the user who is associated with the request. | |
| stop_sequences | Custom text sequences that will cause the model to stop generating. | 📌 |
| top_k | Only sample from the top K options for each subsequent token. Recommended for advanced use cases only. You usually only need to use temperature. | 📌 |
| tools | Definitions of tools that the model may use. | |
| tool_choice | How the model should use the provided tools. | |
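
As an illustration only (not this project's implementation), a minimal sketch using the official `anthropic` Python SDK; the API key, model name, and option values are placeholders, and the SDK sets the `anthropic-version` header on your behalf.

```python
import anthropic

client = anthropic.Anthropic(api_key="...")  # "api-key" field (placeholder)

message = client.messages.create(
    model="claude-3-haiku-20240307",        # "model" field (placeholder)
    max_tokens=1024,                        # tokens to generate before stopping
    temperature=0.7,                        # 0.0 to 1.0; alter this or top_p, not both
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"user_id": "end-user-1234"},  # external identifier for your end user
)
print(message.content[0].text)
```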

## Ollama ✅

### API configurations

| Field | Description |
| --- | --- |
| Endpoint | The endpoint for your Ollama API. |
| Model | The model to use. |

### Conversation options

| Option | Description | Supported |
| --- | --- | --- |
| num_ctx | Number of input tokens. Sets the size of the context window used to generate the next token. (Default: 2048) | |
| num_predict | Number of output tokens. Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | |
| top_p | Works together with top_k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | |
| mirostat | Enable Mirostat sampling for controlling perplexity. (Default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | 📌 |
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | 📌 |
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | 📌 |
| repeat_last_n | Sets how far back the model looks to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | 📌 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | 📌 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | 📌 |
| stop | Sets the stop sequences to use. When this pattern is encountered, the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | 📌 |
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens on the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (Default: 1) | 📌 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g., 100) will give more diverse answers, while a lower value (e.g., 10) will be more conservative. (Default: 40) | 📌 |
| min_p | Alternative to top_p; aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0) | 📌 |
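
A minimal sketch (assuming a default local Ollama install listening on http://localhost:11434, and not code from this project) showing how the options above are sent in the `options` object of a chat request; the model name and values are placeholders.

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",  # "Endpoint" field (default local install)
    json={
        "model": "llama3",              # "Model" field (placeholder)
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
        "options": {
            "num_ctx": 2048,     # context window size
            "num_predict": 128,  # max tokens to generate (-1 = infinite, -2 = fill context)
            "temperature": 0.8,
            "top_p": 0.9,
        },
    },
)
print(resp.json()["message"]["content"])
```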

### References

## Google Gemini

📌 Plan to support