Embeddings & Semantic Cache

Muxa supports multiple embedding backends so clients like Cursor/Codex can run @Codebase searches through the proxy.

Supported Backends

| Backend | Variables |
| --- | --- |
| Ollama | MUXA_SEMANTIC_CACHE_ENABLED=true, OLLAMA_BASE_URL=http://localhost:11434, OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text |
| llama.cpp | Run the llama.cpp server with an OpenAI-compatible embeddings endpoint, then set OPENAI_BASE_URL and OPENAI_API_KEY to match it. |
| OpenRouter | MUXA_SEMANTIC_CACHE_ENABLED=true, OPENROUTER_API_KEY=..., OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 |
| OpenAI | MUXA_SEMANTIC_CACHE_ENABLED=true, OPENAI_API_KEY=...; optionally override OPENAI_BASE_URL to target an OpenAI-compatible server or proxy such as MLX. |

Quick Start (Ollama)

ollama pull nomic-embed-text
export MUXA_SEMANTIC_CACHE_ENABLED=true
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
npm start

Tuning

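  • MUXA_SEMANTIC_CACHE_THRESHOLD sets the cosine similarity (0 to 1) that a cached entry must reach to count as a hit. Default: 0.85. Higher values demand closer matches.
  • MUXA_PROMPT_CACHE_TTL_MS and MUXA_PROMPT_CACHE_MAX control the exact-match prompt cache: how long entries live (in milliseconds) and how large the cache may grow.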
  • MUXA_MEMORY_* toggles affect how many memories are injected alongside embeddings.
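
As a rough illustration of what the similarity threshold means, the sketch below compares a query embedding against cached embeddings and treats the best match as a hit only if its cosine similarity reaches the threshold. It is not Muxa's actual implementation: CacheEntry, cosineSimilarity, and lookup are hypothetical names, and the linear scan over all entries is an assumption made for brevity.

```typescript
// Hypothetical sketch of a threshold-based semantic cache lookup.
// None of these names come from Muxa; they exist only for illustration.
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity of two equal-length vectors.
// Returns 0 if either vector has zero length (norm).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Find the most similar cached entry and return its response only if the
// similarity meets the threshold; otherwise return null (a cache miss).
// The default of 0.85 mirrors the documented default threshold.
function lookup(
  query: number[],
  entries: CacheEntry[],
  threshold = 0.85
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return best !== null && bestScore >= threshold ? best.response : null;
}
```

In this sketch, a query whose nearest cached entry scores 0.8 is a miss at the default threshold of 0.85 and a hit at 0.75. Lowering the threshold raises the hit rate, but it also makes it more likely that a response is reused for a prompt that only loosely matches.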

When the semantic cache is enabled, /metrics/semantic-cache and /metrics/compression report hits/misses and savings.
