* feat: add MiniMax as a chat completion provider
Add MiniMax (https://www.minimax.io) as a first-class chat completion
provider. MiniMax already has TTS integration in SillyTavern; this
extends support to LLM chat completions via their OpenAI-compatible API.
Supported models:
- MiniMax-M2.5 (default) — 204K context
- MiniMax-M2.5-highspeed — same capability, faster inference
Key implementation details:
- Reuses existing SECRET_KEYS.MINIMAX (shared with TTS)
- API endpoint: https://api.minimax.io/v1
- Temperature clamped to (0.0, 1.0] as required by MiniMax API
- Returns hardcoded model list since MiniMax doesn't expose /v1/models
- Full UI integration: model selector, sampler parameters, streaming
Co-Authored-By: octo-patch <octo-patch@users.noreply.github.com>
* feat: upgrade MiniMax default model to M2.7
- Add MiniMax-M2.7 and MiniMax-M2.7-highspeed to model list
- Set MiniMax-M2.7 as default model
- Keep all previous models as alternatives
* feat: independent request function, vision support, temp clamping for MiniMax
- Extract sendMinimaxRequest() following Chutes pattern (PR #4844)
with function calling and JSON Schema structured output support
- Clamp temperature to (0.01, 1.0] on backend; limit frontend UI max to 1.0
- Enable image inlining for MiniMax M2.7 model
- Add MiniMax to slash-commands model selector and tokenizer mapping
- Add minimax_model to default preset
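A minimal sketch of the temperature clamp described above, not the code from this commit. The helper name `clampMinimaxTemperature` and both constants are hypothetical. Treating 0.01 as an inclusive floor and falling back to the maximum for non-numeric input are assumptions.

```javascript
// Hypothetical bounds taken from the commit text: (0.01, 1.0].
// Assumption: 0.01 is used as an inclusive floor.
const MINIMAX_TEMP_MIN = 0.01;
const MINIMAX_TEMP_MAX = 1.0;

// Hypothetical helper: force a temperature into the accepted range.
function clampMinimaxTemperature(temperature) {
    const value = Number(temperature);
    // Assumption: non-numeric input falls back to the maximum.
    if (!Number.isFinite(value)) {
        return MINIMAX_TEMP_MAX;
    }
    return Math.min(MINIMAX_TEMP_MAX, Math.max(MINIMAX_TEMP_MIN, value));
}
```

For example, `clampMinimaxTemperature(1.7)` yields the upper bound and `clampMinimaxTemperature(0)` yields the floor.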
* feat: add VLM-based vision support for MiniMax M2.7
M2.7 does not natively accept image input. When images are detected
in messages, pre-process them via the MiniMax VLM endpoint
(/v1/coding_plan/vlm) to convert images to text descriptions before
sending to the chat completions API. Uses the same API key.
* feat: add M2-her model to MiniMax provider
M2-her is MiniMax's dialogue/roleplay-optimized model with 64K context
and 2048 max completion tokens. Text-only (no vision).
* feat: add MiniMax China endpoint (minimaxi.com) support
Add endpoint selector (Global/China) for MiniMax, mirroring the
SiliconFlow pattern. Users can now choose between api.minimax.io
(international) and api.minimaxi.com (China domestic).
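A rough sketch of how an endpoint selector could map to a base URL. The names `MINIMAX_ENDPOINTS` and `getMinimaxBaseUrl` are hypothetical, the `global`/`china` keys are illustrative, and the `/v1` suffix on the China host is assumed by analogy with the global endpoint rather than confirmed by this commit.

```javascript
// Hypothetical lookup table; only the host names come from the commit text.
const MINIMAX_ENDPOINTS = Object.freeze({
    global: 'https://api.minimax.io/v1',
    // Assumption: the China host uses the same /v1 path.
    china: 'https://api.minimaxi.com/v1',
});

// Hypothetical helper: resolve the base URL, defaulting to the global endpoint.
function getMinimaxBaseUrl(endpoint) {
    const known = Object.prototype.hasOwnProperty.call(MINIMAX_ENDPOINTS, endpoint);
    return known ? MINIMAX_ENDPOINTS[endpoint] : MINIMAX_ENDPOINTS.global;
}
```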
* fix: merge consecutive same-role messages for MiniMax
MiniMax API rejects consecutive messages with the same role with
error 'invalid chat setting (2013)'. Merge them before sending.
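A minimal sketch of the merge step, assuming every message carries a plain string `content` and that a newline is an acceptable separator. The function name `mergeConsecutiveRoles` is hypothetical; this is an illustration of the idea, not the code from this commit.

```javascript
// Hypothetical helper: collapse adjacent messages that share a role.
// Assumption: `content` is always a string (no multi-part content arrays).
function mergeConsecutiveRoles(messages) {
    const merged = [];
    for (const message of messages) {
        const last = merged[merged.length - 1];
        if (last && last.role === message.role) {
            // Same role as the previous entry: append to it.
            last.content = `${last.content}\n${message.content}`;
        } else {
            // Copy so the caller's message objects are left untouched.
            merged.push({ ...message });
        }
    }
    return merged;
}
```

Two consecutive `user` messages followed by an `assistant` message come out as one `user` message and one `assistant` message.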
* review: address PR feedback on MiniMax provider
Backend (src/endpoints/backends/chat-completions.js):
- Drop the entire MiniMax VLM image-preprocessing path; vision is no
longer advertised for this provider, so M2.7 messages now go straight
to /chat/completions without a separate VLM round-trip.
- Drop the json_schema -> response_format mapping (MiniMax does not
document structured-output support; relying on it was speculative).
- Drop the backend temperature clamp; the same clamp now lives in the
frontend so the wire payload matches what the user sees.
- Drop the MINIMAX branch in /status that returned a hard-coded model
list; the frontend hardcodes the same list and bypasses /status via
noValidateSources, so the round-trip was wasted.
- Add a streaming Transform + non-streaming helper that move
<think>...</think> blocks from delta.content / message.content to
reasoning_content. MiniMax M2.x models emit chain-of-thought inline in
content; without this transform the raw <think> tags leak into the
rendered chat. Includes a state machine that holds back partial
marker bytes so a marker split across SSE chunks is still detected.
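The hold-back idea in the last bullet can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the transform from this commit: the class name `ThinkTagSplitter` and its `push`/`flush` methods are hypothetical, and it works on plain strings rather than SSE frames.

```javascript
// Hypothetical splitter: routes text inside <think>...</think> to `reasoning`
// and everything else to `content`, across arbitrary chunk boundaries.
class ThinkTagSplitter {
    constructor() {
        this.inThink = false; // true while inside a <think> block
        this.pending = '';    // held-back text that may be a partial marker
    }

    push(chunk) {
        let text = this.pending + chunk;
        this.pending = '';
        const out = { content: '', reasoning: '' };
        while (text.length > 0) {
            const marker = this.inThink ? '</think>' : '<think>';
            const key = this.inThink ? 'reasoning' : 'content';
            const index = text.indexOf(marker);
            if (index !== -1) {
                // Full marker found: emit what precedes it and switch state.
                out[key] += text.slice(0, index);
                text = text.slice(index + marker.length);
                this.inThink = !this.inThink;
                continue;
            }
            // No full marker: hold back the longest tail that could still
            // turn out to be the start of the marker in the next chunk.
            let hold = Math.min(marker.length - 1, text.length);
            while (hold > 0 && !marker.startsWith(text.slice(text.length - hold))) {
                hold--;
            }
            out[key] += text.slice(0, text.length - hold);
            this.pending = text.slice(text.length - hold);
            break;
        }
        return out;
    }

    // Emit whatever is still held back once the stream ends.
    flush() {
        const rest = this.pending;
        this.pending = '';
        return this.inThink
            ? { content: '', reasoning: rest }
            : { content: rest, reasoning: '' };
    }
}
```

Feeding the chunks `'Hi <thi'`, `'nk>plan</th'`, `'ink> done'` in order produces content `'Hi '`, then reasoning `'plan'`, then content `' done'`, with neither tag leaking into either field.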
Frontend:
- public/scripts/openai.js: add MINIMAX to noValidateSources so the key
is accepted without a /models call; remove the dead saveModelList
branch; clamp temperature to (0.0, 1.0] in createGenerationParameters.
- public/scripts/reasoning.js: add MINIMAX to the non-streaming
reasoning_content extraction case (the backend transform now produces
this field for MiniMax responses).
- public/scripts/slash-commands.js: add MINIMAX to the /api enum and
add a MiniMax case to /api-url so users can switch endpoint by
command.
- public/scripts/custom-request.js: pass minimax_endpoint through the
override-payload merge alongside the other per-source endpoint fields.
- public/scripts/tokenizers.js: stop returning openai_model (which was
always a MiniMax model id and thus an unknown tokenizer); fall back
to gpt-3.5-turbo for a coarse but functional estimate.
- public/scripts/tool-calling.js: add MINIMAX to supportedSources so
function-calling settings are exposed.
- public/index.html: drop the "-- Connect to the API --" placeholder
option from the model select (the model list is hardcoded and always
populated); remove minimax from the vision data-source attributes
on the inline-media controls.
- public/img/minimax.svg: replace the multicolor brand SVG with a
single-color currentColor version that matches the other provider
icons in the connect panel.
* review: drop backend <think> parsing, defer to frontend
Per reviewer feedback: SillyTavern's reasoningHandler / reasoning_auto_parse
setting already extracts <think>...</think> blocks on the client side, so the
backend doesn't need to rewrite MiniMax responses. Removes the SSE Transform,
the non-streaming helper, and the corresponding case in reasoning.js.
* fix: remove isImageInliningSupported declaration for MINIMAX
* fix: remove MINIMAX from stream reasoning parsing
* fix: add to autoconnect logic
* fix: add missing MINIMAX models from docs
* fix: frequency and presence penalties aren't supported for MINIMAX
* fix: use clamp function for adjusting temperature
* fix: pass minimax_endpoint from connection profile to ChatCompletionService
* fix: update supported APIs in slash command documentation
* fix: replace bespoke merge with standard MERGE_TOOLS processing
* fix: add data-i18n attributes for headers
---------
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
In group chats, only include reasoning from the currently generating character instead of all group members. This prevents reasoning from other characters being injected into the prompt context when generating responses.
- Filter reasoning in coreChat loop based on message author matching name2
- Filter reasoning in setOpenAIMessages based on message author matching name2
- Add isOtherGroupMember check before adding reasoning to messages
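A minimal sketch of the filtering rule, assuming each message exposes `name`, `is_user`, and `extra.reasoning`. The function name `filterGroupReasoning` is hypothetical, and the real change is applied inside the existing prompt-building loops rather than as a standalone helper.

```javascript
// Hypothetical helper: keep reasoning only on messages authored by the
// character that is currently generating (name2 in the commit text).
// Assumption: messages look like { name, is_user, extra: { reasoning } }.
function filterGroupReasoning(messages, generatingName) {
    return messages.map((message) => {
        const isOtherGroupMember = !message.is_user && message.name !== generatingName;
        if (!isOtherGroupMember || !message.extra?.reasoning) {
            return message;
        }
        // Drop reasoning from a copy; the original message is left untouched.
        const { reasoning, ...extra } = message.extra;
        return { ...message, extra };
    });
}
```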
* enable interleaved tool reasoning for custom OpenAI-compat endpoints
Add chat_completion_sources.CUSTOM to interleaved_reasoning_providers so
that local OpenAI-compatible endpoints (e.g. KoboldCPP in Chat Completions
mode) can forward reasoning context in tool-call chains when the user has
configured Interleaved Thinking.
Also expose the Interleaved Thinking UI control for the Custom source so
users can actually opt in — previously the dropdown was hidden behind a
data-source="openrouter" guard.
The custom streaming path already correctly accumulates delta.reasoning_content
from streaming chunks; this change only removes the provider gate that was
silently discarding that data before it reached the API payload.
* don't override invocation reasoning with prior-turn assistant reasoning
When an invocation already has its own reasoning captured at execution
time, preserve it instead of replacing it with previousAssistantReasoning
from the backward scan. The override was correct when invocations never
carried their own reasoning, but now that the custom/openrouter paths
capture per-invocation reasoning, the unconditional replacement caused
all tool calls in a chain to receive the same stale reasoning from an
earlier unrelated assistant turn.
Fall back to previousAssistantReasoning only when clone.reasoning is empty.
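The fallback rule can be sketched as follows. `clone.reasoning` and `previousAssistantReasoning` are named in the commit text; the wrapper function `resolveInvocationReasoning` and the shallow-copy approach are assumptions made for illustration.

```javascript
// Hypothetical helper: prefer the reasoning captured with the invocation,
// and use the prior-turn assistant reasoning only when it has none.
function resolveInvocationReasoning(invocation, previousAssistantReasoning) {
    const clone = { ...invocation };
    if (!clone.reasoning) {
        clone.reasoning = previousAssistantReasoning ?? '';
    }
    return clone;
}
```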
* feat: add Cloudflare Workers AI provider
Adds support for Cloudflare Workers AI using its OpenAI-compatible API.
Workers AI-specific changes include:
- Model list fetching and capabilities detection
- Tokenizer auto-detection for typical hosted model families
- Streaming is not supported when using structured output
Closes #5305
* Make the entire header clickable
* Add missing samplers
* Fix non-streaming reasoning parsing
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* feat: add SiliconFlow.cn endpoint support and embedding vectors
Chat completion:
- Add endpoint selection dropdown (Global/.com vs China/.cn) to existing
SiliconFlow provider, following the Z.AI endpoint pattern
- Backend switches API URL based on selected endpoint
- Add /api-url slash command support for endpoint switching
Embeddings:
- Add SiliconFlow as a vector/embedding source (OpenAI-compatible)
- Support both .com and .cn endpoints via siliconflow_endpoint setting
borrowed from the main connection panel (Vertex AI pattern)
- Superset model list with platform attribution (.cn) markers
- Models: Qwen3-Embedding (0.6B/4B/8B) + BGE/BCE models (.cn only)
* Add filter by model type
* Load embedding models from endpoint
* Improve api-url command declaration
* Support endpoint override in custom-request service
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* Add gpt-5.3-chat-latest model support
- Add to OpenAI model dropdown (index.html)
- Add to captioning multimodal model list (caption/settings.html)
- Add to OPENAI_REASONING_EFFORT_MODELS (constants.js)
- Add OPENAI_FIXED_REASONING_EFFORT map to clamp effort to 'medium' (the only value this model accepts)
- Apply fixed effort override in both Azure and general OpenAI request paths (chat-completions.js)
- Update frontend gpt-5.x regex for parameter handling (openai.js)
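A small sketch of the fixed-effort override. `OPENAI_FIXED_REASONING_EFFORT` is named in the commit text, but its shape as a plain model-to-effort object and the helper `resolveReasoningEffort` are assumptions.

```javascript
// Assumed shape: model id -> the only reasoning effort that model accepts.
const OPENAI_FIXED_REASONING_EFFORT = Object.freeze({
    'gpt-5.3-chat-latest': 'medium',
});

// Hypothetical helper: override the requested effort for pinned models.
function resolveReasoningEffort(model, requestedEffort) {
    const pinned = Object.prototype.hasOwnProperty.call(OPENAI_FIXED_REASONING_EFFORT, model);
    return pinned ? OPENAI_FIXED_REASONING_EFFORT[model] : requestedEffort;
}
```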
* Update public/scripts/openai.js
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Added filter for OpenRouter models provider selection
When a model is selected, only the providers available for that model are shown. The same filter was intended for quants, but the API does not currently appear to return the available quants for each model. Uses an existing API that was previously not consumed.
* Added filter for OpenRouter providers
When a model is selected, only the providers available for it are shown. The same filter was intended for quants, but the OpenRouter API does not currently appear to return the list of available quants for each model.
* gua
* Now it also works on chat completion and only disables options
* detail
* Warning added
* eslint
* Move inline styles to CSS
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* feat(openrouter): disable reasoning if "Request model reasoning" is disabled
* feat(openrouter): map minimum reasoning to none if request reasoning is off
* Add hint how to disable reasoning
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* fix(openrouter): forward reasoning across active tool-call chains
* feat(reasoning): add tool-chain forwarding toggle and honor edited reasoning
* feat(reasoning): add OpenRouter interleaved forwarding modes
* moved the reasoning forwarding dropdown into a separate line
* feat(reasoning): default tool reasoning forwarding to disabled
* refactor(openrouter): move tool reasoning mode to CC settings
Move OpenRouter tool reasoning forwarding control to response configuration and scope it to OpenRouter.
Store mode in chat completion settings (presettable), remove legacy power_user boolean/fallback, and use constants for mode values.
Preserve OpenRouter Gemini signature forwarding independently from plaintext tool reasoning mode.
* fix(openrouter): tighten active-chain reasoning forwarding
Use trailing contiguous tool-chain boundary for active-chain eligibility.
Also rename the UI control to Interleaved Thinking Forwarding and place selector on its own line.
* fix(openrouter): use adjacent assistant reasoning for tool calls
For interleaved thinking forwarding, source reasoning only from the immediately preceding assistant non-tool message.
Keep mode gating behavior unchanged and avoid history-window reasoning carryover.
* fix(openrouter): skip tool messages for reasoning source
When forwarding interleaved reasoning, ignore intervening tool result messages when resolving the preceding assistant reasoning source.
This keeps only the first tool call in a chain tied to a prior assistant reasoning block unless a later invocation carries its own reasoning.
* fix(openrouter): keep plaintext reasoning with signatures
Do not suppress forwarded tool-call reasoning when thought signatures are present.
* fix(openrouter): split interleaved thinking mode behavior
Restore distinct mode semantics: active_chain uses nearest assistant-text boundary after skipping tool/tool-call messages, while since_last_user scans for latest assistant reasoning since user.
Update UI label to Interleaved Thinking with right-aligned dropdown and explanatory tooltip.
* style(openrouter): align interleaved thinking dropdown row
Match OpenRouter interleaved thinking control layout with existing oneline-dropdown patterns.
Also update reasoning-forwarding inline comment wording for current mode behavior.
* docs(ui): clarify interleaved thinking tooltip
Use explicit API-request wording for OpenRouter interleaved thinking tooltip text.
* i18n(openrouter): localize interleaved thinking UI
Add locale keys for OpenRouter interleaved thinking label, mode options, and inline helper description.
Wire dropdown option text to data-i18n in index.html.
* fixed helper text wrapping
* fix(ui): make interleaved thinking helper text wrap
* i18n(openrouter): translate interleaved thinking labels
Replace placeholder English values for interleaved thinking keys in non-English locale files.
* fix(ui): restore interleaved thinking dropdown alignment
* Remove changes from en.json
* Type fixes
* Reworked the interleaved reasoning provider logic
* Renamed the variables in preparation for potential implementation for other providers
* Gate interleaved tool reasoning on reasoning request setting
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* Implement Gemini thought signatures
* Implement streaming support for Gemini thought signatures
* Implement OpenRouter support for Gemini thought signatures
* Remove unnecessary extraction of thought sigs from response parts
* Update thought sig comments to remove explicit Gemini mention
* Fix thought_signature naming convention in message.extra
* Add thought_signatures to ReasoningMessageExtra typedef
* Prevent thought sigs being sent to incompatible endpoints
* Move signatures to populateChatHistory, update for consistent casing
* Code clean-up
* Only send thought signatures if target model and API match original
* Implement content-hash thought signature mapping
* Change the data model + split for text/functions
* Don't include signature to invocations if the model doesn't match
* Fix function description
* Remove misleading comment
* Handle OpenRouter signatures
* Improve message extra types
* Prevent modifying original invocations when removing signatures
* Fix return of openrouter non-streaming signatures
* Remove redundant array check
---------
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
* Separate prompt-building functionality from request-sending functionality
* removing logs and clarifying comments
* separating parameter construction functionality to allow ConnectionManagerRequestService to use all other preset parameters
* fixing chat completion issues, adding documentation to new functions.
* Improving ConnectionManagerRequestService errors. Adding parseReasoningFromString option to override reasoning template.
* Adjusting TextCompletionService prompt formatting
* linting
* Use settingsToUpdate to convert from OAI preset to OAI settings.
* lint
* throw errors when profile ID not found
* Fix missed instances of global completion settings being used (CC and TC), replaced with optional argument. Specified typing for ChatCompletionSettings and TextCompletionSettings.
* Adjusting parameters of parseReasoningFromString and adding getReasoningTemplateByName
* using messages.role as a fallback for custom requests, fixing newline removal.
* parameters => settings
I like how it sounds better
* ditto
* You know I had to do it to 'em
* Update getCustomTokenBans
* Fix calculateLogitBias
* Fix param attributes
* Fix type checks
* Less strict role type on ChatCompletionMessage
* Add missing space
* fixing getChatCompletionModel to use an arbitrary chat completion settings object
* Fixing issues with preset overriding custom data passed.
* Pass model to createGenerationParameters externally
* Unify seed param handling for CHUTES
* Fix non-existing CC source
* Use strict comparison
* Use global settings as a base for generation parameters creation
* removing unnecessary handling of preset fields
* don't pass preset prompts, use the passed payload override messages
* refactoring text generation prompt building of last line
* Pass model to getReasoningEffort
* Pass model name to canPerformToolCalls
* Pass model to createTextGenGenerationData
---------
Co-authored-by: qvink <qvink@users.noreply.github.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>