Merge pull request #1314 from danielaskdd/main

Improve context only mode for Ollama api
This commit is contained in:
Daniel.y
2025-04-08 18:48:12 +08:00
committed by GitHub
3 changed files with 50 additions and 23 deletions

View File

@@ -164,7 +164,7 @@ sudo systemctl enable lightrag.service
### Connect Open WebUI to LightRAG
After starting lightrag-server, you can add an Ollama-type connection in the Open WebUI admin panel. A model named `lightrag:latest` will then appear in Open WebUI's model management interface, and users can send queries to LightRAG through the chat interface. For this use case, it is best to install LightRAG as a service.
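A quick way to verify the connection outside Open WebUI is to query the Ollama-compatible model listing endpoint directly. The snippet below is only a sketch: it assumes lightrag-server listens on `localhost:9621` (adjust the host and port to your deployment) and uses the standard Ollama `/api/tags` route that Open WebUI relies on for model discovery.
```python
import requests

BASE_URL = "http://localhost:9621"  # assumed lightrag-server address; adjust as needed

# The Ollama-compatible /api/tags endpoint lists the models the server offers;
# this is how Open WebUI discovers lightrag:latest.
resp = requests.get(f"{BASE_URL}/api/tags", timeout=10)
resp.raise_for_status()
print([model["name"] for model in resp.json().get("models", [])])
```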
Open WebUI uses the LLM for session title and session keyword generation tasks, so the Ollama chat completion API detects Open WebUI session-related requests and forwards them directly to the underlying LLM. Screenshot from Open WebUI:
@@ -172,6 +172,8 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
### Choose query mode in chat
If you send a message (query) through LightRAG's Ollama interface, the default query mode is `hybrid`. You can select a query mode by sending a message with a query prefix.
A query prefix in the query string determines which LightRAG query mode is used to generate the response. The supported prefixes include:
```
@@ -180,13 +182,22 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
/hybrid
/naive
/mix
/bypass
/context
/localcontext
/globalcontext
/hybridcontext
/naivecontext
/mixcontext
```
For example, the chat message "/mix 唐僧有几个徒弟" will trigger a mix mode query in LightRAG. A chat message without a query prefix will trigger a hybrid mode query by default.
"/bypass" is not a LightRAG query mode; it tells the API server to pass the query, together with the chat history, directly to the underlying LLM, so you can use the LLM to answer questions based on the chat history. If you use Open WebUI as the front end, you can simply switch to a normal LLM model instead of using the /bypass prefix.
"/context" is also not a LightRAG query mode; it tells LightRAG to return only the context information prepared for the LLM. You can check whether the context meets your needs, or process the context yourself.
## API Key and Authentication
By default, the LightRAG server can be accessed without any authentication. We can configure the server with an API key or account credentials to secure it.

View File

@@ -168,7 +168,7 @@ We provide an Ollama-compatible interface for LightRAG, aiming to emulate Light
### Connect Open WebUI to LightRAG
After starting the lightrag-server, you can add an Ollama-type connection in the Open WebUI admin panel. A model named `lightrag:latest` will then appear in Open WebUI's model management interface, and users can send queries to LightRAG through the chat interface. It is best to install LightRAG as a service for this use case.
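Besides the Open WebUI chat interface, the same Ollama-compatible endpoint can be queried programmatically, including with streaming (which is what Open WebUI does under the hood). A rough sketch, assuming the server listens on the default `localhost:9621`:
```python
import json

import requests

BASE_URL = "http://localhost:9621"  # assumed lightrag-server address; adjust as needed

payload = {
    "model": "lightrag:latest",
    "stream": True,  # Ollama-style streaming: one JSON object per line
    "messages": [{"role": "user", "content": "What is LightRAG?"}],
}

with requests.post(f"{BASE_URL}/api/chat", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a partial assistant message until "done" is true.
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
```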
Open WebUI uses the LLM for session title and session keyword generation tasks, so the Ollama chat completion API detects Open WebUI session-related requests and forwards them directly to the underlying LLM. Screenshot from Open WebUI:
@@ -176,6 +176,8 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
### Choose Query mode in chat
The default query mode is `hybrid` if you send a message (query) through LightRAG's Ollama interface. You can select a query mode by sending a message with a query prefix.
A query prefix in the query string determines which LightRAG query mode is used to generate the response for the query. The supported prefixes include:
```
@@ -184,12 +186,21 @@ A query prefix in the query string determines which LightRAG query mode is used
/hybrid
/naive
/mix
/bypass
/context
/localcontext
/globalcontext
/hybridcontext
/naivecontext
/mixcontext
```
For example, the chat message "/mix What's LightRAG?" will trigger a mix mode query for LightRAG. A chat message without a query prefix will trigger a hybrid mode query by default.
"/bypass" is not a LightRAG query mode, it will tell API Server to pass the query directly to the underlying LLM with chat history. So user can use LLM to answer question base on the chat history. If you are using Open WebUI as front end, you can just switch the model to a normal LLM instead of using /bypass prefix.
"/bypass" not a LightRAG query mode, it will tell API Server to pass the query directly to the underlying LLM with chat history. So user can use LLM to answer question base on the chat history. If you are using Open WebUI as front end, you can just switch the model to a normal LLM instead of using /bypass prefix.
"/context" is not a LightRAG query mode neither, it will tell LightRAG to return only the context information prepared for LLM. You can check the context if it's want you want, or process the conext by your self.

View File

@@ -101,27 +101,38 @@ def estimate_tokens(text: str) -> int:
    return len(tokens)


def parse_query_mode(query: str) -> tuple[str, SearchMode, bool]:
    """Parse query prefix to determine search mode
    Returns tuple of (cleaned_query, search_mode, only_need_context)
    """
    mode_map = {
        "/local ": (SearchMode.local, False),
        # global_ is used because 'global' is a Python keyword
        "/global ": (SearchMode.global_, False),
        "/naive ": (SearchMode.naive, False),
        "/hybrid ": (SearchMode.hybrid, False),
        "/mix ": (SearchMode.mix, False),
        "/bypass ": (SearchMode.bypass, False),
        # Context-only prefixes: return retrieved context without LLM answer generation
        "/context": (SearchMode.hybrid, True),
        "/localcontext": (SearchMode.local, True),
        "/globalcontext": (SearchMode.global_, True),
        "/hybridcontext": (SearchMode.hybrid, True),
        "/naivecontext": (SearchMode.naive, True),
        "/mixcontext": (SearchMode.mix, True),
    }

    for prefix, (mode, only_need_context) in mode_map.items():
        if query.startswith(prefix):
            # Strip the prefix and any leading spaces that follow it
            cleaned_query = query[len(prefix) :].lstrip()
            return cleaned_query, mode, only_need_context

    # No recognized prefix: default to hybrid mode with a full LLM answer
    return query, SearchMode.hybrid, False
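
# Illustrative usage (a sketch, not part of the file): what the parser returns
# for a few inputs.
#   parse_query_mode("/localcontext What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.local, True)
#   parse_query_mode("/mix What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.mix, False)
#   parse_query_mode("What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.hybrid, False)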
class OllamaAPI:
@@ -351,17 +362,11 @@ class OllamaAPI:
]
# Check for query prefix
cleaned_query, mode, only_need_context = parse_query_mode(query)
start_time = time.time_ns()
prompt_tokens = estimate_tokens(cleaned_query)
param_dict = {
"mode": mode,
"stream": request.stream,