Merge pull request #1314 from danielaskdd/main

Improve context only mode for Ollama api
This commit is contained in:
Daniel.y
2025-04-08 18:48:12 +08:00
committed by GitHub
3 changed files with 50 additions and 23 deletions

View File

@@ -164,7 +164,7 @@ sudo systemctl enable lightrag.service
### Connect Open WebUI to LightRAG
After starting lightrag-server, you can add an Ollama-type connection in the Open WebUI admin panel. A model named `lightrag:latest` will then appear in Open WebUI's model management interface, and users can send queries to LightRAG through the chat interface. For this use case, it is best to install LightRAG as a service.
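A quick way to verify the connection outside Open WebUI is to query the Ollama-compatible model listing endpoint directly. The snippet below is only a sketch: it assumes lightrag-server listens on `localhost:9621` (adjust the host and port to your deployment) and uses the standard Ollama `/api/tags` route that Open WebUI relies on for model discovery.
```python
import requests

BASE_URL = "http://localhost:9621"  # assumed lightrag-server address; adjust as needed

# The Ollama-compatible /api/tags endpoint lists the models the server offers;
# this is how Open WebUI discovers lightrag:latest.
resp = requests.get(f"{BASE_URL}/api/tags", timeout=10)
resp.raise_for_status()
print([model["name"] for model in resp.json().get("models", [])])
```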
Open WebUI uses the LLM for session title and session keyword generation tasks, so the Ollama chat completion API detects Open WebUI session-related requests and forwards them directly to the underlying LLM. Screenshot from Open WebUI:
@@ -172,6 +172,8 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
### Choose query mode in chat
If you send a message (query) through LightRAG's Ollama interface, the default query mode is `hybrid`. You can select a query mode by sending a message with a query prefix.
A query prefix in the query string determines which LightRAG query mode is used to generate the response. The supported prefixes include:
```
@@ -180,13 +182,22 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
/hybrid
/naive
/mix
/bypass
/context
/localcontext
/globalcontext
/hybridcontext
/naivecontext
/mixcontext
```
For example, the chat message "/mix 唐僧有几个徒弟" will trigger a mix mode query in LightRAG. A chat message without a query prefix will trigger a hybrid mode query by default.
"/bypass" is not a LightRAG query mode; it tells the API server to pass the query, together with the chat history, directly to the underlying LLM, so you can use the LLM to answer questions based on the chat history. If you use Open WebUI as the front end, you can simply switch to a normal LLM model instead of using the /bypass prefix.
"/context" is also not a LightRAG query mode; it tells LightRAG to return only the context information prepared for the LLM. You can check whether the context meets your needs, or process the context yourself.
## API Key and Authentication
By default, the LightRAG server can be accessed without any authentication. We can configure the server with an API key or account credentials to secure it.

View File

@@ -168,7 +168,7 @@ We provide an Ollama-compatible interface for LightRAG, aiming to emulate Light
### Connect Open WebUI to LightRAG
After starting the lightrag-server, you can add an Ollama-type connection in the Open WebUI admin panel. A model named `lightrag:latest` will then appear in Open WebUI's model management interface, and users can send queries to LightRAG through the chat interface. It is best to install LightRAG as a service for this use case.
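Besides the Open WebUI chat interface, the same Ollama-compatible endpoint can be queried programmatically, including with streaming (which is what Open WebUI does under the hood). A rough sketch, assuming the server listens on the default `localhost:9621`:
```python
import json

import requests

BASE_URL = "http://localhost:9621"  # assumed lightrag-server address; adjust as needed

payload = {
    "model": "lightrag:latest",
    "stream": True,  # Ollama-style streaming: one JSON object per line
    "messages": [{"role": "user", "content": "What is LightRAG?"}],
}

with requests.post(f"{BASE_URL}/api/chat", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a partial assistant message until "done" is true.
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
```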
Open WebUI uses the LLM for session title and session keyword generation tasks, so the Ollama chat completion API detects Open WebUI session-related requests and forwards them directly to the underlying LLM. Screenshot from Open WebUI:
@@ -176,6 +176,8 @@ Open WebUI uses the LLM for session title and session keyword generation tasks
### Choose Query mode in chat
The default query mode is `hybrid` if you send a message (query) through LightRAG's Ollama interface. You can select a query mode by sending a message with a query prefix.
A query prefix in the query string determines which LightRAG query mode is used to generate the response for the query. The supported prefixes include:
```
@@ -184,12 +186,21 @@ A query prefix in the query string determines which LightRAG query mode is used
/hybrid
/naive
/mix
/bypass
/context
/localcontext
/globalcontext
/hybridcontext
/naivecontext
/mixcontext
```
For example, the chat message "/mix What's LightRAG?" will trigger a mix mode query for LightRAG. A chat message without a query prefix will trigger a hybrid mode query by default.
"/bypass" is not a LightRAG query mode, it will tell API Server to pass the query directly to the underlying LLM with chat history. So user can use LLM to answer question base on the chat history. If you are using Open WebUI as front end, you can just switch the model to a normal LLM instead of using /bypass prefix.
"/bypass" not a LightRAG query mode, it will tell API Server to pass the query directly to the underlying LLM with chat history. So user can use LLM to answer question base on the chat history. If you are using Open WebUI as front end, you can just switch the model to a normal LLM instead of using /bypass prefix.
"/context" is not a LightRAG query mode neither, it will tell LightRAG to return only the context information prepared for LLM. You can check the context if it's want you want, or process the conext by your self.

View File

@@ -101,27 +101,38 @@ def estimate_tokens(text: str) -> int:
    return len(tokens)


def parse_query_mode(query: str) -> tuple[str, SearchMode, bool]:
    """Parse query prefix to determine search mode
    Returns tuple of (cleaned_query, search_mode, only_need_context)
    """
    mode_map = {
        "/local ": (SearchMode.local, False),
        # global_ is used because 'global' is a Python keyword
        "/global ": (SearchMode.global_, False),
        "/naive ": (SearchMode.naive, False),
        "/hybrid ": (SearchMode.hybrid, False),
        "/mix ": (SearchMode.mix, False),
        "/bypass ": (SearchMode.bypass, False),
        # Context-only prefixes: return retrieved context without LLM answer generation
        "/context": (SearchMode.hybrid, True),
        "/localcontext": (SearchMode.local, True),
        "/globalcontext": (SearchMode.global_, True),
        "/hybridcontext": (SearchMode.hybrid, True),
        "/naivecontext": (SearchMode.naive, True),
        "/mixcontext": (SearchMode.mix, True),
    }

    for prefix, (mode, only_need_context) in mode_map.items():
        if query.startswith(prefix):
            # Strip the prefix and any leading spaces that follow it
            cleaned_query = query[len(prefix) :].lstrip()
            return cleaned_query, mode, only_need_context

    # No recognized prefix: default to hybrid mode with a full LLM answer
    return query, SearchMode.hybrid, False
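
# Illustrative usage (a sketch, not part of the file): what the parser returns
# for a few inputs.
#   parse_query_mode("/localcontext What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.local, True)
#   parse_query_mode("/mix What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.mix, False)
#   parse_query_mode("What is LightRAG?")
#       -> ("What is LightRAG?", SearchMode.hybrid, False)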
class OllamaAPI:
@@ -351,17 +362,11 @@ class OllamaAPI:
]
# Check for query prefix
cleaned_query, mode, only_need_context = parse_query_mode(query)
start_time = time.time_ns()
prompt_tokens = estimate_tokens(cleaned_query)
param_dict = {
"mode": mode,
"stream": request.stream,