Update README
README-zh.md (236 lines changed)
@@ -93,9 +93,11 @@ python examples/lightrag_openai_demo.py

**Note**: When running the demo programs, be aware that different test scripts may use different embedding models. If you switch to a different embedding model, you must clear the data directory (`./dickens`); otherwise the program will fail. If you want to keep the LLM cache, preserve the `kv_store_llm_response_cache.json` file while clearing the data directory.
## Programming with LightRAG Core

### A Simple Program

The following Python snippet demonstrates how to initialize LightRAG, insert text, and run a query:

```python
import os
@@ -107,6 +109,7 @@ from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

WORKING_DIR = "./rag_storage"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)
@@ -120,23 +123,24 @@ async def initialize_rag():
    await initialize_pipeline_status()
    return rag

async def main():
    try:
        # Initialize RAG instance
        rag = await initialize_rag()
        # Insert text
        await rag.insert("Your text")

        # Perform hybrid search
        mode = "hybrid"
        print(
            await rag.query(
                "What are the top themes in this story?",
                param=QueryParam(mode=mode)
            )
        )

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if rag:
            await rag.finalize_storages()

@@ -145,8 +149,54 @@ if __name__ == "__main__":
    asyncio.run(main())
```
Important notes:

- Export your OPENAI_API_KEY environment variable before running the script.
- This program uses LightRAG's default storage settings, so all data will be persisted under WORKING_DIR/rag_storage.
- This example shows only the simplest way to initialize a LightRAG object: inject the embedding and LLM functions, then initialize the storages and pipeline status after creating the LightRAG object.

### LightRAG Initialization Parameters

A full list of LightRAG object initialization parameters:
<details>
<summary> Parameters </summary>

| **Parameter** | **Type** | **Explanation** | **Default** |
|--------------|----------|-----------------|-------------|
| **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage` | `JsonKVStorage` |
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage` | `NetworkXStorage` |
| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`,`PGDocStatusStorage`,`MongoDocStatusStorage` | `JsonDocStatusStorage` |
| **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
| **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
| **tokenizer** | `Tokenizer` | The function used to convert text into tokens (numbers) and back using .encode() and .decode() functions following the `TokenizerInterface` protocol. If you don't specify one, the default Tiktoken tokenizer is used. | `TiktokenTokenizer` |
| **tiktoken_model_name** | `str` | If you're using the default Tiktoken tokenizer, this is the name of the specific Tiktoken model to use. This setting is ignored if you provide your own tokenizer. | `gpt-4o-mini` |
| **entity_extract_max_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
| **entity_summary_to_max_tokens** | `int` | Maximum token size for each entity summary | `500` |
| **node_embedding_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
| **node2vec_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
| **embedding_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
| **embedding_batch_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
| **embedding_func_max_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
| **llm_model_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
| **llm_model_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
| **llm_model_max_token_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default value can be changed via env var MAX_TOKENS) |
| **llm_model_max_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `4` (default value can be changed via env var MAX_ASYNC) |
| **llm_model_kwargs** | `dict` | Additional parameters for LLM generation | |
| **vector_db_storage_cls_kwargs** | `dict` | Additional parameters for the vector database, such as the threshold for node and relation retrieval | cosine_better_than_threshold: 0.2 (default value can be changed via env var COSINE_THRESHOLD) |
| **enable_llm_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
| **enable_llm_cache_for_entity_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; good for beginners to debug your application | `TRUE` |
| **addon_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"]}`: sets the example limit, output language, and batch size for document processing | `example_number: all examples, language: English` |
| **convert_response_to_json_func** | `callable` | Not used | `convert_response_to_json` |
| **embedding_cache_config** | `dict` | Configuration for question-answer caching. Contains three parameters: `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers. `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer is returned directly without calling the LLM. `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, the LLM is used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |

</details>
### Query Parameters

Use QueryParam to control the behavior of your query:

```python
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix"] = "global"
```

@@ -421,54 +471,6 @@ if __name__ == "__main__":

</details>
### Conversation History

LightRAG now supports multi-turn dialogue through the conversation history feature. Here is how to use it:
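The usage details are elided by the diff. Below is a minimal sketch of what a multi-turn query can look like, assuming `QueryParam` accepts a `conversation_history` list of role/content messages and a `history_turns` limit (both field names are assumptions to verify against the current `QueryParam` definition), and reusing the `rag` object from the earlier example:

```python
from lightrag import QueryParam

# Prior turns of the dialogue, oldest first (hypothetical content).
conversation_history = [
    {"role": "user", "content": "Who is the main character?"},
    {"role": "assistant", "content": "The main character is Scrooge."},
]

# Pass the history so the follow-up question is answered in context.
response = rag.query(
    "How does he change over the course of the story?",
    param=QueryParam(
        mode="hybrid",
        conversation_history=conversation_history,  # assumed field name
        history_turns=3,  # assumed: number of past turns to consider
    ),
)
```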
@@ -619,7 +621,7 @@ custom_kg = {

rag.insert_custom_kg(custom_kg)
```

### Insert

<details>
<summary> <b> Basic Insert </b></summary>
@@ -718,7 +720,9 @@ rag.insert(documents, file_paths=file_paths)

</details>

### Storage

LightRAG uses four types of storage, and each type has multiple implementation options. When initializing LightRAG, the implementation for each of these four storage types can be selected through parameters. For details, see the LightRAG initialization parameters above.
<details>
<summary> <b>Using Neo4J for Storage</b> </summary>

@@ -846,16 +850,6 @@ rag = LightRAG(

</details>
## Delete

```python
# Delete an entity by its name
rag.delete_by_entity("Project Gutenberg")

# Delete a document: removes the entities and relationships associated with the document by doc id
rag.delete_by_doc_id("doc_id")
```
## Edit Entities and Relations

LightRAG now supports comprehensive knowledge graph management, allowing you to create, edit, and delete entities and relationships within your knowledge graph.

@@ -926,6 +920,54 @@ updated_relation = rag.edit_relation("Google", "Google Mail", {

These operations maintain data consistency across both the graph database and vector database components, keeping your knowledge graph coherent.
## Token Usage Tracking

<details>
<summary> <b>Overview and Usage</b> </summary>

LightRAG provides a TokenTracker tool to track and manage token consumption by large language models. This feature is particularly useful for controlling API costs and optimizing performance.

### Usage

```python
from lightrag.utils import TokenTracker

# Create a TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Using the context manager (recommended)
# Suitable for scenarios requiring automatic token usage tracking
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually adding token usage records
# Suitable for scenarios requiring more fine-grained control over token statistics
token_tracker.reset()

rag.insert()

rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())
```

### Usage Tips

- Use the context manager for long sessions or batch operations to automatically track all token consumption.
- For scenarios that need segmented statistics, use the manual mode and call reset() at the appropriate points.
- Checking token usage regularly helps detect abnormal consumption early.
- Use this feature actively during development and testing to optimize production costs.

### Practical Examples

You can refer to the following examples for implementing token tracking:

- `examples/lightrag_gemini_track_token_demo.py`: token tracking example using a Google Gemini model
- `examples/lightrag_siliconcloud_track_token_demo.py`: token tracking example using a SiliconCloud model

These examples demonstrate how to use the TokenTracker feature effectively with different models and scenarios.

</details>
## Data Export Functions

### Overview

@@ -1082,56 +1124,6 @@ rag.clear_cache(modes=["local"])

</details>
## Error Handling

<details>
<summary>Click to view error handling details</summary>

The API includes comprehensive error handling:

- File not found errors (404)
- Processing errors (500)
- Support for multiple file encodings (UTF-8 and GBK)

</details>

## LightRAG API

The LightRAG Server is designed to provide Web UI and API support. **For more information about the LightRAG Server, please refer to [LightRAG Server](./lightrag/api/README.md).**
README.md (224 lines changed)
@@ -129,9 +129,13 @@ For a streaming response implementation example, please see `examples/lightrag_o

**Note**: When running the demo program, please be aware that different test scripts may use different embedding models. If you switch to a different embedding model, you must clear the data directory (`./dickens`); otherwise, the program may encounter errors. If you wish to retain the LLM cache, you can preserve the `kv_store_llm_response_cache.json` file while clearing the data directory.
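A small helper sketch (not part of LightRAG) for doing exactly that with the standard library, assuming the demo's default `./dickens` working directory:

```python
import os
import shutil

def clear_data_dir(working_dir: str = "./dickens",
                   keep: str = "kv_store_llm_response_cache.json") -> None:
    """Remove everything in the data directory except the LLM response cache."""
    if not os.path.isdir(working_dir):
        return
    for name in os.listdir(working_dir):
        if name == keep:
            continue  # preserve the cached LLM responses
        path = os.path.join(working_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

clear_data_dir()
```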
## Programming with LightRAG Core

### A Simple Program

Use the Python snippet below to initialize LightRAG, insert text into it, and perform queries:

```python
import os
@@ -143,6 +147,7 @@ from lightrag.utils import setup_logger

setup_logger("lightrag", level="INFO")

WORKING_DIR = "./rag_storage"
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)
@@ -156,11 +161,11 @@ async def initialize_rag():
    await initialize_pipeline_status()
    return rag

async def main():
    try:
        # Initialize RAG instance
        rag = await initialize_rag()
        rag.insert("Your text")

        # Perform hybrid search
        mode = "hybrid"

@@ -181,8 +186,55 @@ if __name__ == "__main__":
    asyncio.run(main())
```
Important notes for the above snippet:

- Export your OPENAI_API_KEY environment variable before running the script.
- This program uses the default storage settings for LightRAG, so all data will be persisted to WORKING_DIR/rag_storage.
- This program demonstrates only the simplest way to initialize a LightRAG object: injecting the embedding and LLM functions, and initializing the storages and pipeline status after creating the LightRAG object (see the sketch below).
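One plausible shape of the `initialize_rag()` used above, following these notes. The import paths and the `initialize_storages()` call mirror recent LightRAG examples and may differ between versions:

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status

WORKING_DIR = "./rag_storage"

async def initialize_rag():
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,          # inject the embedding function
        llm_model_func=gpt_4o_mini_complete,  # inject the LLM function
    )
    await rag.initialize_storages()           # set up the configured storages
    await initialize_pipeline_status()        # set up the document pipeline status
    return rag
```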
### LightRAG init parameters

A full list of LightRAG init parameters:

<details>
<summary> Parameters </summary>
| **Parameter** | **Type** | **Explanation** | **Default** |
|--------------|----------|-----------------|-------------|
| **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage` | `JsonKVStorage` |
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage` | `NetworkXStorage` |
| **doc_status_storage** | `str` | Storage type for document processing status. Supported types: `JsonDocStatusStorage`,`PGDocStatusStorage`,`MongoDocStatusStorage` | `JsonDocStatusStorage` |
| **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
| **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
| **tokenizer** | `Tokenizer` | The function used to convert text into tokens (numbers) and back using .encode() and .decode() functions following the `TokenizerInterface` protocol. If you don't specify one, it will use the default Tiktoken tokenizer. | `TiktokenTokenizer` |
| **tiktoken_model_name** | `str` | If you're using the default Tiktoken tokenizer, this is the name of the specific Tiktoken model to use. This setting is ignored if you provide your own tokenizer. | `gpt-4o-mini` |
| **entity_extract_max_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
| **entity_summary_to_max_tokens** | `int` | Maximum token size for each entity summary | `500` |
| **node_embedding_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
| **node2vec_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` |
| **embedding_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
| **embedding_batch_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
| **embedding_func_max_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
| **llm_model_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
| **llm_model_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
| **llm_model_max_token_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default value can be changed via env var MAX_TOKENS) |
| **llm_model_max_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `4` (default value can be changed via env var MAX_ASYNC) |
| **llm_model_kwargs** | `dict` | Additional parameters for LLM generation | |
| **vector_db_storage_cls_kwargs** | `dict` | Additional parameters for the vector database, like setting the threshold for node and relation retrieval | cosine_better_than_threshold: 0.2 (default value can be changed via env var COSINE_THRESHOLD) |
| **enable_llm_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
| **enable_llm_cache_for_entity_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; good for beginners to debug your application | `TRUE` |
| **addon_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"]}`: sets the example limit and the output language for entity/relation extraction | `example_number: all examples, language: English` |
| **convert_response_to_json_func** | `callable` | Not used | `convert_response_to_json` |
| **embedding_cache_config** | `dict` | Configuration for question-answer caching. Contains three parameters: `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers. `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM. `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, the LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |

</details>
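An illustrative initialization sketch that combines a few of the parameters above; the values are arbitrary examples rather than recommendations, and the import paths assume the OpenAI helpers listed as defaults in the table:

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
    chunk_token_size=1200,            # max tokens per chunk
    chunk_overlap_token_size=100,     # overlap between adjacent chunks
    enable_llm_cache=True,            # reuse cached LLM responses
    embedding_cache_config={
        "enabled": True,              # check the cache before calling the LLM
        "similarity_threshold": 0.95, # reuse an answer above this similarity
        "use_llm_check": False,       # skip the secondary LLM verification
    },
    addon_params={
        "language": "English",
        "entity_types": ["organization", "person", "geo", "event"],
    },
)
```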
### Query Param

Use QueryParam to control the behavior of your query:

```python
class QueryParam:
    mode: Literal["local", "global", "hybrid", "naive", "mix"] = "global"
```

@@ -460,55 +512,6 @@ if __name__ == "__main__":

</details>
### Conversation History Support

@@ -612,7 +615,7 @@ rag.query_with_separate_keyword_extraction(

</details>
### Insert

<details>
<summary> <b> Basic Insert </b></summary>
@@ -775,7 +778,9 @@ rag.insert(documents, file_paths=file_paths)

</details>

### Storage

LightRAG uses four types of storage, and each type has multiple implementation options. When initializing LightRAG, the implementation for each of these four storage types can be set through parameters. For details, please refer to the LightRAG initialization parameters above.
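A minimal selection sketch, assuming a PostgreSQL plus Neo4j deployment whose connection settings are configured separately (for example through environment variables); the storage class names come from the parameter table above:

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=openai_embed,
    llm_model_func=gpt_4o_mini_complete,
    kv_storage="PGKVStorage",                 # documents and text chunks
    vector_storage="PGVectorStorage",         # embedding vectors
    graph_storage="Neo4JStorage",             # graph nodes and edges
    doc_status_storage="PGDocStatusStorage",  # document processing status
)
```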
<details>
<summary> <b>Using Neo4J for Storage</b> </summary>

@@ -904,16 +909,6 @@ rag = LightRAG(

</details>
## Delete

```python
# Delete Entity: delete an entity by its name
rag.delete_by_entity("Project Gutenberg")

# Delete Document: delete the entities and relationships associated with the document by doc id
rag.delete_by_doc_id("doc_id")
```
## Edit Entities and Relations

LightRAG now supports comprehensive knowledge graph management capabilities, allowing you to create, edit, and delete entities and relationships within your knowledge graph.
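A short sketch of the entity-level operations, mirroring the `rag.edit_relation` call shown in the hunk below; the exact method signatures and field names ("description", "entity_type") are assumptions to verify against the current API:

```python
# Create a new entity in the knowledge graph (hypothetical field names).
entity = rag.create_entity("Google Mail", {
    "description": "Email service developed by Google",
    "entity_type": "product",
})

# Update attributes of an existing entity.
updated_entity = rag.edit_entity("Google Mail", {
    "description": "Gmail is an email service developed by Google",
})
```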
@@ -984,6 +979,55 @@ These operations maintain data consistency across both the graph database and ve

</details>
## Token Usage Tracking

<details>
<summary> <b>Overview and Usage</b> </summary>

LightRAG provides a TokenTracker tool to monitor and manage token consumption by large language models. This feature is particularly useful for controlling API costs and optimizing performance.

### Usage

```python
from lightrag.utils import TokenTracker

# Create TokenTracker instance
token_tracker = TokenTracker()

# Method 1: Using context manager (Recommended)
# Suitable for scenarios requiring automatic token usage tracking
with token_tracker:
    result1 = await llm_model_func("your question 1")
    result2 = await llm_model_func("your question 2")

# Method 2: Manually adding token usage records
# Suitable for scenarios requiring more granular control over token statistics
token_tracker.reset()

rag.insert()

rag.query("your question 1", param=QueryParam(mode="naive"))
rag.query("your question 2", param=QueryParam(mode="mix"))

# Display total token usage (including insert and query operations)
print("Token usage:", token_tracker.get_usage())
```

### Usage Tips

- Use context managers for long sessions or batch operations to automatically track all token consumption
- For scenarios requiring segmented statistics, use manual mode and call reset() when appropriate
- Regular checking of token usage helps detect abnormal consumption early
- Actively use this feature during development and testing to optimize production costs

### Practical Examples

You can refer to these examples for implementing token tracking:

- `examples/lightrag_gemini_track_token_demo.py`: Token tracking example using Google Gemini model
- `examples/lightrag_siliconcloud_track_token_demo.py`: Token tracking example using SiliconCloud model

These examples demonstrate how to effectively use the TokenTracker feature with different models and scenarios.

</details>
## Data Export Functions

### Overview
@@ -1148,56 +1192,6 @@ Valid modes are:

</details>
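The cache-clearing API referenced here is largely elided by the diff. A minimal sketch, grounded only in the `rag.clear_cache(modes=["local"])` call shown in the corresponding README-zh.md hunk above; the behaviour of other mode values, or of calling it without arguments, is an assumption to verify against the current API:

```python
# Clear cached LLM responses for "local"-mode queries only.
rag.clear_cache(modes=["local"])

# Assumption: omitting the modes argument clears the cache for every query mode.
rag.clear_cache()
```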
## Error Handling

<details>
<summary>Click to view error handling details</summary>

The API includes comprehensive error handling:

- File not found errors (404)
- Processing errors (500)
- Supports multiple file encodings (UTF-8 and GBK)

</details>

## LightRAG API

The LightRAG Server is designed to provide Web UI and API support. **For more information about LightRAG Server, please refer to [LightRAG Server](./lightrag/api/README.md).**