From e8c1aa8d5d018a406e7f8a50f23334b4daa6c5ae Mon Sep 17 00:00:00 2001 From: Yannick Stephan Date: Wed, 19 Feb 2025 09:58:25 +0100 Subject: [PATCH] updated doc --- README.md | 121 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 62 insertions(+), 59 deletions(-) diff --git a/README.md b/README.md index ff18bfcb..bc0b6bde 100644 --- a/README.md +++ b/README.md @@ -566,7 +566,7 @@ rag.insert(text_content.decode('utf-8')) ``` -### Storage +## Storage
Using Neo4J for Storage @@ -682,8 +682,8 @@ async def embedding_func(texts: list[str]) -> np.ndarray:
+## Delete -### Delete ```python rag = LightRAG( @@ -703,11 +703,63 @@ rag.delete_by_entity("Project Gutenberg") rag.delete_by_doc_id("doc_id") ``` - -### Graph Visualization +## LightRAG init parameters
- Graph visualization with html + Parameters + +| **Parameter** | **Type** | **Explanation** | **Default** | +|----------------------------------------------| --- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` | +| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` | +| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` | +| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` | +| **log\_level** | | Log level for application runtime | `logging.DEBUG` | +| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` | +| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` | +| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` | +| **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` | +| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` | +| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` | +| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` | +| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` | +| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` | +| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` | +| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` | +| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` | +| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768`(default value changed by env var MAX_TOKENS) | +| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16`(default value changed by env var MAX_ASYNC) | +| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | | +| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database, like setting the threshold for nodes and relations retrieval. | cosine_better_than_threshold: 0.2(default value changed by env var COSINE_THRESHOLD) | +| **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` | +| **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; Good for beginners to debug your application | `TRUE` | +| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` | +| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` | +| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:
- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.
- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.
- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` | +|**log\_dir** | `str` | Directory to store logs. | `./` | + +
+ +## Error Handling + +
+Click to view error handling details + +The API includes comprehensive error handling: +- File not found errors (404) +- Processing errors (500) +- Supports multiple file encodings (UTF-8 and GBK) +
+ +## API +LightRag can be installed with API support to serve a Fast api interface to perform data upload and indexing/Rag operations/Rescan of the input folder etc.. + +[LightRag API](lightrag/api/README.md) + +## Graph Visualization + +
+ Graph visualization with html * The following code can be found in `examples/graph_visual_with_html.py` @@ -731,7 +783,8 @@ net.show('knowledge_graph.html')
- Graph visualization with Neo4j + Graph visualization with Neo4 + * The following code can be found in `examples/graph_visual_with_neo4j.py` @@ -858,52 +911,13 @@ if __name__ == "__main__":
-### LightRAG init parameters -
- Parameters + Graphml 3d visualizer -| **Parameter** | **Type** | **Explanation** | **Default** | -|----------------------------------------------| --- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| -| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` | -| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` | -| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` | -| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` | -| **log\_level** | | Log level for application runtime | `logging.DEBUG` | -| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` | -| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` | -| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` | -| **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` | -| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` | -| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` | -| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 3,}` | -| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` | -| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` | -| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` | -| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` | -| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` | -| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768`(default value changed by env var MAX_TOKENS) | -| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16`(default value changed by env var MAX_ASYNC) | -| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | | -| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for vector database, like setting the threshold for nodes and relations retrieval. | cosine_better_than_threshold: 0.2(default value changed by env var COSINE_THRESHOLD) | -| **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` | -| **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `TRUE`, stores LLM results in cache for entity extraction; Good for beginners to debug your application | `TRUE` | -| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` | -| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` | -| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:
- `enabled`: Boolean value to enable/disable cache lookup functionality. When enabled, the system will check cached responses before generating new answers.
- `similarity_threshold`: Float value (0-1), similarity threshold. When a new question's similarity with a cached question exceeds this threshold, the cached answer will be returned directly without calling the LLM.
- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, LLM will be used as a secondary check to verify the similarity between questions before returning cached answers. | Default: `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` | -|**log\_dir** | `str` | Directory to store logs. | `./` | +LightRag can be installed with Tools support to add extra tools like the graphml 3d visualizer. -
+[LightRag Visualizer](lightrag/tools/lightrag_visualizer/README.md) -### Error Handling - -
-Click to view error handling details - -The API includes comprehensive error handling: -- File not found errors (404) -- Processing errors (500) -- Supports multiple file encodings (UTF-8 and GBK)
## Evaluation @@ -1147,17 +1161,6 @@ def extract_queries(file_path): ``` -## API -LightRag can be installed with API support to serve a Fast api interface to perform data upload and indexing/Rag operations/Rescan of the input folder etc.. - -The documentation can be found [here](lightrag/api/README.md) - -## Graph viewer -LightRag can be installed with Tools support to add extra tools like the graphml 3d visualizer. - -The documentation can be found [here](lightrag/tools/lightrag_visualizer/README.md) - - ## Star History