From e6b2f68e7c79405468f08e4abde34825a981af4f Mon Sep 17 00:00:00 2001
From: Magic_yuan <317617749@qq.com>
Date: Sat, 28 Dec 2024 00:16:53 +0800
Subject: [PATCH] docs(readme): Add batch size configuration documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add documentation for the insert_batch_size parameter in addon_params
- Explain the default batch size value and its usage
- Add an example configuration for batch processing
---
 README.md | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 410049fe..0d7016d2 100644
--- a/README.md
+++ b/README.md
@@ -278,10 +278,24 @@ class QueryParam:
 ### Batch Insert
 
 ```python
-# Batch Insert: Insert multiple texts at once
+# Basic Batch Insert: Insert multiple texts at once
 rag.insert(["TEXT1", "TEXT2",...])
+
+# Batch Insert with a custom batch size configuration
+rag = LightRAG(
+    working_dir=WORKING_DIR,
+    addon_params={
+        "insert_batch_size": 20  # Process 20 documents per batch
+    }
+)
+rag.insert(["TEXT1", "TEXT2", "TEXT3", ...])  # Documents will be processed in batches of 20
 ```
+
+The `insert_batch_size` parameter in `addon_params` controls how many documents are processed per batch during insertion (default: 10 if not specified). This is useful for:
+- Managing memory usage with large document collections
+- Optimizing processing speed
+- Providing better progress tracking
 
 ### Incremental Insert
 
 ```python
@@ -594,7 +608,7 @@ if __name__ == "__main__":
 | **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
 | **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database (currently not used) | |
 | **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
-| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"]}`: sets example limit and output language | `example_number: all examples, language: English` |
+| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
 | **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
 | **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
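The batching behavior this patch documents can be sketched in a few lines. This is a minimal illustration of how a document list is split into `insert_batch_size` groups, not LightRAG's actual implementation; `chunk_documents` and `DEFAULT_INSERT_BATCH_SIZE` are hypothetical names introduced here for clarity.

```python
from typing import Iterator, List

# Hypothetical constant mirroring the documented default of 10.
DEFAULT_INSERT_BATCH_SIZE = 10


def chunk_documents(docs: List[str], addon_params: dict) -> Iterator[List[str]]:
    """Yield successive batches of documents, sized by addon_params["insert_batch_size"].

    Hypothetical helper for illustration only; not part of the LightRAG API.
    """
    batch_size = addon_params.get("insert_batch_size", DEFAULT_INSERT_BATCH_SIZE)
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


# 45 documents with insert_batch_size=20 -> batches of 20, 20, and 5
batches = list(chunk_documents([f"TEXT{i}" for i in range(45)], {"insert_batch_size": 20}))
```

Batching this way bounds peak memory per insertion round and gives a natural unit for progress reporting, which is the rationale the README text gives for exposing the parameter.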