360 Commits

Author SHA1 Message Date
Cohee c249e5384c feat: pass koboldcpp reasoning effort (#5491)
Fixes #5489
2026-04-26 00:02:07 +03:00
Cohee 09d72828cb feat: add gemma 4 for AI studio (#5493)
* feat: add gemma 4 for AI studio

* fix: update max context return value for gemma-3n-e4b-it model

* refactor: iterate array of [regex, number]

* gemma4: enable tool calling and sysprompt

---------

Co-authored-by: Copilot <copilot@github.com>
2026-04-25 22:22:55 +03:00
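
The "iterate array of [regex, number]" refactor in the commit above suggests a lookup table of pattern and context-size pairs. A minimal sketch of that idea follows; the names `MAX_CONTEXT_RULES`, `DEFAULT_MAX_CONTEXT` and `getMaxContext`, and every pattern and number in it, are illustrative assumptions and not the values used in the commit.

```javascript
// Hypothetical table: each entry pairs a model-name pattern with a context size.
// Patterns and sizes are placeholders chosen for illustration only.
const MAX_CONTEXT_RULES = [
    [/^gemma-3n-/, 8192],
    [/^gemma-4/, 131072],
    [/^gemini-/, 1048576],
];

// Assumed fallback when no rule matches.
const DEFAULT_MAX_CONTEXT = 32768;

function getMaxContext(model) {
    // The first matching rule wins, so more specific patterns go first.
    for (const [pattern, size] of MAX_CONTEXT_RULES) {
        if (pattern.test(model)) {
            return size;
        }
    }
    return DEFAULT_MAX_CONTEXT;
}
```

Keeping the rules in one array means adding a model family is a one-line change instead of another `else if` branch.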
Dclef 77cbcd8774 feat: add DeepSeek V4 model support with thinking mode and reasoning effort (#5522)
* fix: align DeepSeek provider with V4 API

* Fix DeepSeek beta routing for standard chat completions

* feat: add DeepSeek V4 model support with thinking mode and reasoning effort

* Address DeepSeek review feedback

* Set DeepSeek default model to v4 flash

* fix: clean-up deprecated models, add migration

* fix: move reasoning effort mapping to resolveReasoningEffort

* fix: lint empty line

* fix: remove duplicate code

* fix: add coder model to migration logic

---------

Co-authored-by: dclef <drclef233@gmail.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-04-24 21:47:30 +03:00
Octopus aecbb9a2ee feat: add MiniMax as a chat completion provider (#5452)
* feat: add MiniMax as a chat completion provider

Add MiniMax (https://www.minimax.io) as a first-class chat completion
provider. MiniMax already has TTS integration in SillyTavern; this
extends support to LLM chat completions via their OpenAI-compatible API.

Supported models:
- MiniMax-M2.5 (default) — 204K context
- MiniMax-M2.5-highspeed — same capability, faster inference

Key implementation details:
- Reuses existing SECRET_KEYS.MINIMAX (shared with TTS)
- API endpoint: https://api.minimax.io/v1
- Temperature clamped to (0.0, 1.0] as required by MiniMax API
- Returns hardcoded model list since MiniMax doesn't expose /v1/models
- Full UI integration: model selector, sampler parameters, streaming

Co-Authored-By: octo-patch <octo-patch@users.noreply.github.com>

* feat: upgrade MiniMax default model to M2.7

- Add MiniMax-M2.7 and MiniMax-M2.7-highspeed to model list
- Set MiniMax-M2.7 as default model
- Keep all previous models as alternatives

* feat: independent request function, vision support, temp clamping for MiniMax

- Extract sendMinimaxRequest() following Chutes pattern (PR #4844)
  with function calling and JSON Schema structured output support
- Clamp temperature to (0.01, 1.0] on backend; limit frontend UI max to 1.0
- Enable image inlining for MiniMax M2.7 model
- Add MiniMax to slash-commands model selector and tokenizer mapping
- Add minimax_model to default preset

* feat: add VLM-based vision support for MiniMax M2.7

M2.7 does not natively accept image input. When images are detected
in messages, pre-process them via the MiniMax VLM endpoint
(/v1/coding_plan/vlm) to convert images to text descriptions before
sending to the chat completions API. Uses the same API key.

* feat: add M2-her model to MiniMax provider

M2-her is MiniMax's dialogue/roleplay-optimized model with 64K context
and 2048 max completion tokens. Text-only (no vision).

* feat: add MiniMax China endpoint (minimaxi.com) support

Add endpoint selector (Global/China) for MiniMax, mirroring the
SiliconFlow pattern. Users can now choose between api.minimax.io
(international) and api.minimaxi.com (China domestic).

* fix: merge consecutive same-role messages for MiniMax

MiniMax API rejects consecutive messages with the same role with
error 'invalid chat setting (2013)'. Merge them before sending.

* review: address PR feedback on MiniMax provider

Backend (src/endpoints/backends/chat-completions.js):
- Drop the entire MiniMax VLM image-preprocessing path; vision is no
  longer advertised for this provider, so M2.7 messages now go straight
  to /chat/completions without a separate VLM round-trip.
- Drop the json_schema -> response_format mapping (MiniMax does not
  document structured-output support; relying on it was speculative).
- Drop the backend temperature clamp; the same clamp now lives in the
  frontend so the wire payload matches what the user sees.
- Drop the MINIMAX branch in /status that returned a hard-coded model
  list; the frontend hardcodes the same list and bypasses /status via
  noValidateSources, so the round-trip was wasted.
- Add a streaming Transform + non-streaming helper that move
  <think>...</think> blocks from delta.content / message.content to
  reasoning_content. MiniMax M2.x emit chain-of-thought inline in
  content; without this transform the raw <think> tags leak into the
  rendered chat. Includes a state machine that holds back partial
  marker bytes so a marker split across SSE chunks is still detected.

Frontend:
- public/scripts/openai.js: add MINIMAX to noValidateSources so the key
  is accepted without a /models call; remove the dead saveModelList
  branch; clamp temperature to (0.0, 1.0] in createGenerationParameters.
- public/scripts/reasoning.js: add MINIMAX to the non-streaming
  reasoning_content extraction case (the backend transform now produces
  this field for MiniMax responses).
- public/scripts/slash-commands.js: add MINIMAX to the /api enum and
  add a MiniMax case to /api-url so users can switch endpoint by
  command.
- public/scripts/custom-request.js: pass minimax_endpoint through the
  override-payload merge alongside the other per-source endpoint fields.
- public/scripts/tokenizers.js: stop returning openai_model (which was
  always a MiniMax model id and thus an unknown tokenizer); fall back
  to gpt-3.5-turbo for a coarse but functional estimate.
- public/scripts/tool-calling.js: add MINIMAX to supportedSources so
  function-calling settings are exposed.
- public/index.html: drop the "-- Connect to the API --" placeholder
  option from the model select (the model list is hardcoded and always
  populated); remove minimax from the vision data-source attributes
  on the inline-media controls.
- public/img/minimax.svg: replace the multicolor brand SVG with a
  single-color currentColor version that matches the other provider
  icons in the connect panel.

* review: drop backend <think> parsing, defer to frontend

Per reviewer feedback: SillyTavern's reasoningHandler / reasoning_auto_parse
setting already extracts <think>...</think> blocks on the client side, so the
backend doesn't need to rewrite MiniMax responses. Removes the SSE Transform,
the non-streaming helper, and the corresponding case in reasoning.js.

* fix: remove isImageInliningSupported declaration for MINIMAX

* fix: remove MINIMAX from stream reasoning parsing

* fix: add to autoconnect logic

* fix: add missing MINIMAX models from docs

* fix: frequency and presence penalties aren't supported for MINIMAX

* fix: use clamp function for adjusting temperature

* fix: pass minimax_endpoint from connection profile to ChatCompletionService

* fix: update supported APIs in slash command documentation

* fix: replace bespoke merge with standard MERGE_TOOLS processing

* fix: add data-i18n attributes for headers

---------

Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-04-24 00:43:05 +03:00
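
Two behaviours described in the MiniMax commit above, merging consecutive same-role messages and clamping temperature into (0.0, 1.0], can be sketched as follows. This is an illustration only: the final commit replaced the bespoke merge with the standard MERGE_TOOLS processing, and the helper names, the `'\n\n'` separator, the string-only `content`, and the 0.01 lower bound (taken from the intermediate backend clamp) are assumptions.

```javascript
// Clamp a number into the closed range [min, max].
function clamp(value, min, max) {
    return Math.min(Math.max(value, min), max);
}

// Assumed bounds: 0.01 stands in for the open lower end of (0.0, 1.0].
const MIN_TEMPERATURE = 0.01;
const MAX_TEMPERATURE = 1.0;

function clampTemperature(temperature) {
    return clamp(temperature, MIN_TEMPERATURE, MAX_TEMPERATURE);
}

// Merge neighbouring messages that share a role, so that no two
// consecutive messages in the result have the same role.
// Assumes every message has a plain string `content`.
function mergeConsecutiveRoles(messages) {
    const merged = [];
    for (const message of messages) {
        const last = merged[merged.length - 1];
        if (last && last.role === message.role) {
            last.content += '\n\n' + message.content;
        } else {
            // Shallow copy so the caller's objects are not mutated.
            merged.push({ ...message });
        }
    }
    return merged;
}
```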
ashishch432 d1e719eb48 add claude-opus-4-7 (#5465) 2026-04-19 15:47:40 +03:00
Tony Gies a9c377c3c8 feat: add Workers AI text embeddings and multimodal captioning (#5414)
* feat: add Workers AI text embeddings and multimodal captioning

Extends the Cloudflare Workers AI integration to the vectors and
caption extensions.

Embeddings: adds workers_ai source to the vectors extension using the
OpenAI-compatible /v1/embeddings endpoint, with dynamic model listing
from the Cloudflare model search API.

Captioning: adds workers_ai as a multimodal caption API with dynamic
vision model discovery via the multimodal-models endpoint.

* Add logo svg

* Refactor caption dropdown population

* Fix order of sources

* feat: add error handling for missing Workers AI account ID

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-04-08 23:43:21 +03:00
Tony Gies 700fc05411 feat: add Cloudflare Workers AI provider (#5385)
* feat: add Cloudflare Workers AI provider

Adds support for Cloudflare Workers AI using its OpenAI-compatible API.

Workers AI-specific stuff includes:
- Model list fetching and capabilities detection
- Tokenizer auto-detection for typical hosted model families
- Streaming not supported when using structured output

Closes #5305

* Make the entire header clickable

* Add missing samplers

* Fix non-streaming reasoning parsing

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-04-06 00:24:47 +03:00
KKTsN c9c652eece fix: improve streaming error propagation and forwarded response logging (#5317)
* Fix: Improve streaming error handling and forwarded response logging

* Fix: fix ESLint error "Strings must use singlequote" (quotes)

* fix: preserve and log forwarded stream errors

* chore: narrow forwarded stream error fix scope

* fix: make forwardFetchResponse awaitable and forward upstream error text

* Restore original happy path handling

* Remove redundant checks in forwardFetchResponse function

* Don't send anything on parsing error end

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-04-05 23:01:47 +03:00
lunar sheep ff1ca1412a feat(secrets): update readSecret function to accept optional secret ID (#5356)
* feat(secrets): update readSecret function to accept optional secret ID

* add secret_id to ConnectionManagerRequestService payload

* fix: pass secret_id for Text Completion types

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-03-30 22:30:45 +03:00
Xiangzhe 2cb1861db6 feat: add SiliconFlow.cn chat completion and embedding support (#5316)
* feat: add SiliconFlow.cn endpoint support and embedding vectors

Chat completion:
- Add endpoint selection dropdown (Global/.com vs China/.cn) to existing
  SiliconFlow provider, following the Z.AI endpoint pattern
- Backend switches API URL based on selected endpoint
- Add /api-url slash command support for endpoint switching

Embeddings:
- Add SiliconFlow as a vector/embedding source (OpenAI-compatible)
- Support both .com and .cn endpoints via siliconflow_endpoint setting
  borrowed from the main connection panel (Vertex AI pattern)
- Superset model list with platform attribution (.cn) markers
- Models: Qwen3-Embedding (0.6B/4B/8B) + BGE/BCE models (.cn only)

* Add filter by models type

* Load embedding models from endpoint

* Improve api-url command declaration

* Support endpoint override in custom-request service

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-03-22 00:52:03 +02:00
equal-l2 e834d3724b Remove xAI web search capability (#5255)
With web search on, the API now returns 410 Gone.
2026-03-07 16:58:56 +02:00
Spicy Marinara f20aed95d0 Add gpt-5.3-chat-latest model support (#5241)
* Add gpt-5.3-chat-latest model support

- Add to OpenAI model dropdown (index.html)
- Add to captioning multimodal model list (caption/settings.html)
- Add to OPENAI_REASONING_EFFORT_MODELS (constants.js)
- Add OPENAI_FIXED_REASONING_EFFORT map to clamp effort to 'medium' (the only value this model accepts)
- Apply fixed effort override in both Azure and general OpenAI request paths (chat-completions.js)
- Update frontend gpt-5.x regex for parameter handling (openai.js)

* Update public/scripts/openai.js

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-04 20:04:04 +02:00
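
The fixed-effort override described in the commit above could be sketched roughly like this. The constant name `OPENAI_FIXED_REASONING_EFFORT` comes from the commit message; its shape as a `Map` and the helper `applyFixedEffort` are assumptions made for illustration.

```javascript
// Models that accept exactly one reasoning effort value.
// The name is from the commit message; the Map shape is an assumption.
const OPENAI_FIXED_REASONING_EFFORT = new Map([
    ['gpt-5.3-chat-latest', 'medium'],
]);

// Hypothetical helper: return the fixed effort for a model if it has one,
// otherwise pass the requested effort through unchanged.
function applyFixedEffort(model, requestedEffort) {
    return OPENAI_FIXED_REASONING_EFFORT.get(model) ?? requestedEffort;
}
```

Applying the same helper in both the Azure and the general OpenAI request paths would keep the override in one place.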
Cohee 3070cf26cd Add config for adaptive thinking
Fixes #5236
2026-03-03 20:10:39 +02:00
Cohee 63fa9c1d07 Claude: map Reasoning Effort to adaptive thinking config (#5219)
Supersedes #5105
2026-03-01 17:11:22 +02:00
Cohee 744ce7705d gemini-3.1-flash-image-preview 2026-02-27 20:26:22 +02:00
Brioch 0cef10f63f feat(openrouter): disable reasoning if Request model reasoning is off and effort is minimum (#5079)
* feat(openrouter): disable reasoning if "Request model reasoning" is disabled

* feat(openrouter): map minimum reasoning to none if request reasoning is off

* Add hint how to disable reasoning

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-02-23 21:19:04 +02:00
Spicy Marinara a923b0eefe Add gemini-3.1-pro-preview to Google AI Studio and Vertex model lists with thinking support (#5188) 2026-02-19 14:28:48 +02:00
Cohee 3bd1034639 claude-sonnet-4-6 2026-02-17 21:33:19 +02:00
Cohee 46ea79bab5 Merge branch 'release' into staging 2026-02-15 15:57:51 +02:00
SenatusSPQR1 4672647293 Fix NanoGPT Claude cache detection for prefixed model IDs (#5164) 2026-02-15 15:57:14 +02:00
Cohee 4d1619ba47 Chore: enable brace-style eslint check (#5159)
* eslint: enable brace-style check

* Fix jsdoc and color

* fix: correct CSS color syntax in CreateZenSliders function
2026-02-15 01:46:32 +02:00
Lumi 39c8eb343c add option for claude-opus-4-6 (#5103)
* add option for claude-opus-4-6

* fix: add claude-opus-4-6 to limited sampling and verbosity model lists

* fix: disable assistant prefill for claude-opus-4-6

* refactor: merge fixthinkingPrefill and noPrefillModel

* 1m context

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-02-05 21:42:27 +02:00
Brioch 6c864e8bb2 feat(openrouter): add model quantizations setting (#5080)
* feat(openrouter): add model quantizations setting

* Remove bogus setting

* Simplify nullish coalescing assignment

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-01-30 23:51:22 +02:00
Cohee 10e8e01a55 Moonshot: Map "Request reasoning" to thinking type
Fixes #5072
2026-01-28 00:55:11 +02:00
Cohee 0e5b4de10c Moonshot: Pull vision flag from model data
Fixes #5068
2026-01-28 00:26:50 +02:00
Cohee 5a7875ba28 Update Pollinations API (#5060)
* Upgrade Pollinations API
Done: text, caption
To do: TTS, image
Fixes #5020

* Update Pollinations TTS to new API

* Update Pollinations API for images
2026-01-26 20:31:13 +02:00
DeclineThyself a09c1a7a84 Added 'dot-notation': ['error'] to .eslint.cjs (#5042)
* Added 'dot-notation': ['error'], to `.eslint.cjs`

* Ran `eslint --fix` to correct `dot-notation` errors.

* Added `eslint-disable dot-notation` anywhere errors were caused.

* Allowed dot-notation for uppercase properties: 'allowPattern': '[A-Z]\\w*$'

* Check if `rule instanceof CSSStyleRule`
https://github.com/SillyTavern/SillyTavern/pull/5042#discussion_r2711827148

* Fixed `await result.json();` types.

* refactor: update dot-notation usage in CoquiTtsProvider and PresetManager

---------

Co-authored-by: user <user@exmaple.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-01-23 00:11:03 +02:00
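
The rule options named in the commit above can be written out as a config fragment. The sketch below also probes the `allowPattern` value directly; it assumes that ESLint compiles the pattern to a regular expression and tests it against the property name. `eslintRules` and `mayUseBrackets` are hypothetical names.

```javascript
// Rule and options as quoted in the commit message; the wrapper object is a sketch.
const eslintRules = {
    'dot-notation': ['error', { allowPattern: '[A-Z]\\w*$' }],
};

// Assumption: the pattern is tested against the bare property name.
const allowPattern = new RegExp(eslintRules['dot-notation'][1].allowPattern);

// Hypothetical helper: true if obj['name'] would be tolerated by the rule.
function mayUseBrackets(propertyName) {
    return allowPattern.test(propertyName);
}
```

Because the pattern has no leading `^`, it matches any name that contains an uppercase letter followed only by word characters up to the end. `mayUseBrackets('MESSAGE_SANITIZE')` and `mayUseBrackets('fooBar')` are both true, while `mayUseBrackets('chat_metadata')` is false.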
Cohee 372db63cd5 NanoGPT: Add reasoning effort control
Closes #4999
2026-01-12 21:05:02 +02:00
DeclineThyself 8372e7bf9d "gradually replacing property access with a dot operator" (#4965)
* "gradually replacing property access with a dot operator"
https://github.com/SillyTavern/SillyTavern/pull/4963#discussion_r2663003561

(?<=\w|\])\['([a-zA-Z]\w+)'\]
My regex found 593 matches across 47 files.
Also, two typos.

* Fixed chat[0].chat_metadata type error.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664275854

* Fixed `swipedElementsDiv[0]?.getAnimations().filter((a) => a.animationName` type error.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664274593

* Fixed config.MESSAGE_SANITIZE and config.MESSAGE_ALLOW_SYSTEM_UI type errors.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664266271

* Fixed group.date_last_chat type error.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664295652

* Reverted SlashCommandParser dot property access.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664310931

* LLM fixed canUseNegativeLookbehind.result; type error.
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664314288

* Reverted chat-completions.js bodyParams and headers dot property access.

https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664317848
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664320088
https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664324438

* Reverted openai.js data dot property access.

https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664326244

* Reverted tests/frontend/MacroEnvBuilder.e2e.js env.dynamicMacros dot property access.

https://github.com/SillyTavern/SillyTavern/pull/4965#discussion_r2664330990

* Partially reverted `window` dot property access.

* Reverted result.json() and settings dot property access.

* Reverted google.js headers dot property access.

* Fixed regex: `(?<=\w|\])\['([a-zA-Z]\w*)'\]`

* Swapped window to globalThis with dot property access.

* LLM fixed canUseNegativeLookbehind type.

* Refactor property access

* Consistency

---------

Co-authored-by: user <user@exmaple.com>
Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2026-01-08 23:58:21 +02:00
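
The two regular expressions quoted in the commit above can be compared directly. The sketch below applies each one as a search-and-replace that turns `['name']` into `.name`; the helper `toDotNotation` is a hypothetical name, and the single-character case is one observable difference between the patterns, not necessarily the reason the commit gives for the change.

```javascript
// Both patterns are copied from the commit message.
const originalPattern = /(?<=\w|\])\['([a-zA-Z]\w+)'\]/g; // needs 2+ characters
const fixedPattern = /(?<=\w|\])\['([a-zA-Z]\w*)'\]/g;    // accepts 1+ characters

// Hypothetical helper: rewrite bracket access with a quoted identifier
// into dot access. The lookbehind requires a preceding identifier
// character or `]`, so a standalone array literal is left alone.
function toDotNotation(source, pattern) {
    return source.replace(pattern, '.$1');
}
```

With the fixed pattern, `toDotNotation("list[0]['x']", fixedPattern)` gives `list[0].x`, whereas the original pattern leaves that input unchanged. Keys that are not plain identifiers, such as `obj['foo-bar']`, are untouched by both.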
Cohee 7320aa948d Audio inlining for OpenAI and Custom-compatible (#4964)
* Audio inlining for OpenAI and Custom-compatible

* Add context sizes

* chatgpt-image-latest

* Add quality control for gpt-image
2026-01-06 13:27:13 +02:00
Subwolf a8eb154517 Zai moonshot reverse proxy (#4923)
* adding reverse proxy support

* update

* added handling for the image caption extension
2025-12-28 23:52:04 +02:00
Ngo Dinh Gia Bao 829db7f2d0 [Electron Hub] Prompt Caching Support for Claude models (#4918)
* Prompt Caching support Claude models

* Prompt Caching support Claude models

* Diff clean-up

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2025-12-28 17:05:54 +02:00
Ben 3668e95d95 filter out models that don't have a valid id (#4920) 2025-12-27 02:33:25 +02:00
Chanho Chung ca43796795 Add caching system prompt feature for OpenRouter Gemini (#4903)
* feat: add caching system prompt for OpenRouter Gemini

* fix: resolve reviews
2025-12-20 19:01:42 +02:00
Cohee 83ea6e5cbf Better thonk effort for Gem 3 2025-12-18 21:40:52 +02:00
mightytribble 2cd2bd4a4d Implement Gemini thought signatures (#4886)
* Implement Gemini thought signatures

* Implement streaming support for Gemini thought signatures

* Implement OR support for Gemini thought signatures

* Remove unnecessary extraction of thought sigs from response parts

* Update thought sig comments to remove explicit Gemini mention

* Fix thought_signature naming convention in message.extra

* Add thought_signatures to ReasoningMessageExtra typedef

* Prevent thought sigs being sent to incompatible endpoints

* Move signatures to populateChatHistory, update for consistent casing

* Code clean-up

* Only send thought signatures if target model and API match original

* Implement content-hash thought signature mapping

* Change the data model + split for text/functions

* Don't include signature to invocations if the model doesn't match

* Fix function description

* Remove misleading comment

* Handle OpenRouter signatures

* Improve message extra types

* Prevent modifying original invocations when removing signatures

* Fix return of openrouter non-streaming signatures

* Remove redundant array check

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2025-12-17 22:23:47 +02:00
Cohee 081c3e7b1c jerma 3 flash 2025-12-17 20:35:00 +02:00
Cohee 9046fe8d2d Refactor CC API async route handlers (#4885)
* Improve error handling in CC /status and /generate endpoints

* Cancel pending status check on switching CC source
2025-12-11 23:31:46 +02:00
Chanho Chung 6fdeaa2cd9 fix: caching system prompt functionality for OpenRouter Claude (#4872) 2025-12-10 20:22:54 +02:00
Cohee 9aff57c9c4 Add dummy reasoning_content for deepseek-reasoner tool calls
#4857
2025-12-07 23:40:52 +02:00
Ben 55a07d445d Chutes integration (#4844)
* Chutes integration

* Fix eslint

* Fix key saving

* Fix logo coloration

* Fix tool checks

* Unhide image inlining controls

* Fix order of options

* Fix type use in TTS extension script

* Add Chutes as a vector storage source

* Change log levels to debug

* Fix streamed reasoning parsing

* Skip remote models update

* TTS: Fix API key highlight

* Sort image models A-Z

* TTS: Fixes

* Remove unused SD endpoint

* Skip setting context size if models list is not yet loaded

* remove chutes quota / balance

* Fix: streamed tool calling

* Hide reasoning effort control

* Add image request debug log

* Fix: scroll down on media load in extensions

* Unhide some samplers

* Bring back reasoning effort

* This code will never execute

* Reformat else if cases

* Add stop strings to request

* Remove conditional from reasoning_effort body param

* Preserve original pricing fields

* Unhide logit bias setting

* Pass repetition penalty and logit bias to backend

* Swap llama tokenizer for llama3

* Pass min_p, remove supported_sampling_parameters checks

* Enable logprobs

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2025-12-01 00:17:49 +02:00
mightytribble 1f78094322 Convert OAI tool_choice to Gemini functionCallingConfig for Gemini requests (#4840)
* Send toolConfig block to Gemini, if defined and tools block also present.

* Convert OAI tool_choice to Gemini functionCallingConfig for Gemini requests

* Remove blank line

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2025-11-30 19:18:41 +02:00
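
A conversion like the one named in the commit above might map the OpenAI-style `tool_choice` values onto Gemini's `functionCallingConfig` modes (`AUTO`, `ANY`, `NONE`). The sketch below is one plausible mapping written from the public API shapes, not the code from the commit; `toFunctionCallingConfig` is a hypothetical name.

```javascript
// Convert an OpenAI-style tool_choice into a Gemini functionCallingConfig.
// Unknown or missing values fall back to AUTO (an assumption).
function toFunctionCallingConfig(toolChoice) {
    if (toolChoice === 'none') {
        return { mode: 'NONE' };
    }
    if (toolChoice === 'required') {
        return { mode: 'ANY' };
    }
    // { type: 'function', function: { name } } forces one specific function.
    if (toolChoice && typeof toolChoice === 'object' && toolChoice.function?.name) {
        return { mode: 'ANY', allowedFunctionNames: [toolChoice.function.name] };
    }
    return { mode: 'AUTO' };
}
```

The result would sit under `toolConfig.functionCallingConfig` in the request body, which per the commit is sent only when a tools block is also present.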
Cohee 2eba10fa7f Gemini: Add image request settings (#4838)
* Gemini: Add image request settings

* Allow aspect ratio for 2.5 flash
2025-11-29 00:59:09 +02:00
Cohee 3b59eae7c0 Gemini: Only register custom tools when there are no function tools 2025-11-28 20:22:05 +02:00
Cohee 068c6bdccd Gemini: Fix search tool is not supported when function tools are used 2025-11-28 20:18:02 +02:00
mightytribble 32bbf4ec10 Support non-function native tools for Gemini
* Enable retrieval tool type for VertexAI Gemini endpoints

* Apply code suggestion

---------

Co-authored-by: Cohee <18619528+Cohee1207@users.noreply.github.com>
2025-11-28 20:02:32 +02:00
Cohee 965b86da62 Add verbosity control (#4837)
* Add verbosity control

* Remove for Azure OpenAI
2025-11-28 19:49:59 +02:00
Cohee 0a22856faf Chat Completion: Reduce number of toggles in AI Response Configuration (#4821)
* Chat Completion: Reduce number of toggles in AI Response Configuration

* Consolidate migration logic

* Don't enable media inlining if image inlining was disabled

* Fix icons showing on media toggle off

* Update i18n
2025-11-28 00:16:23 +02:00
Cohee 3efcfbd1a2 Add new Claude model options and update regex checks for model validation 2025-11-24 21:55:29 +02:00
Cohee 248f5aa892 NanoGPT: Expose additional samplers 2025-11-24 20:36:51 +02:00