updated doc

README.md
@@ -566,7 +566,7 @@ rag.insert(text_content.decode('utf-8'))
```

</details>

## Storage

<details>
<summary> <b>Using Neo4J for Storage</b> </summary>
@@ -682,8 +682,8 @@ async def embedding_func(texts: list[str]) -> np.ndarray:
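
The elided block above configures Neo4j as the graph backend. As a condensed sketch of the idea (the connection environment variables and working directory are assumptions; exact setup may differ across LightRAG versions):

```python
import os
from lightrag import LightRAG

# Assumed connection settings for a local Neo4j instance
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

rag = LightRAG(
    working_dir="./local_neo4jWorkDir",  # hypothetical directory
    graph_storage="Neo4JStorage",        # overrides the default NetworkXStorage
)
```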

</details>

## Delete

```python
rag = LightRAG(
@@ -703,11 +703,63 @@ rag.delete_by_entity("Project Gutenberg")
rag.delete_by_doc_id("doc_id")
```
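
Putting the two deletion calls together, a minimal sketch might look like the following (the import path is an assumption and varies between LightRAG versions):

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete  # assumed import path

rag = LightRAG(
    working_dir="./dickens",  # directory that already holds the indexed data
    llm_model_func=gpt_4o_mini_complete,
)

# Delete an entity (and its relationships) by name
rag.delete_by_entity("Project Gutenberg")

# Delete everything extracted from a single document by its ID
rag.delete_by_doc_id("doc_id")
```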

## LightRAG init parameters

<details>
<summary> Parameters </summary>

| **Parameter** | **Type** | **Explanation** | **Default** |
|---|---|---|---|
| **working\_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
| **kv\_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
| **vector\_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
| **graph\_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
| **log\_level** | | Log level for application runtime | `logging.DEBUG` |
| **chunk\_token\_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
| **chunk\_overlap\_token\_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
| **tiktoken\_model\_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
| **entity\_extract\_max\_gleaning** | `int` | Number of loops in the entity extraction process, appending history messages | `1` |
| **entity\_summary\_to\_max\_tokens** | `int` | Maximum token size for each entity summary | `500` |
| **node\_embedding\_algorithm** | `str` | Algorithm for node embedding (currently not used) | `node2vec` |
| **node2vec\_params** | `dict` | Parameters for node embedding | `{"dimensions": 1536, "num_walks": 10, "walk_length": 40, "window_size": 2, "iterations": 3, "random_seed": 3}` |
| **embedding\_func** | `EmbeddingFunc` | Function to generate embedding vectors from text | `openai_embed` |
| **embedding\_batch\_num** | `int` | Maximum batch size for embedding processes (multiple texts sent per batch) | `32` |
| **embedding\_func\_max\_async** | `int` | Maximum number of concurrent asynchronous embedding processes | `16` |
| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (overridden by env var `MAX_TOKENS`) |
| **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` (overridden by env var `MAX_ASYNC`) |
| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database, such as the similarity threshold for node and relation retrieval | `cosine_better_than_threshold: 0.2` (overridden by env var `COSINE_THRESHOLD`) |
| **enable\_llm\_cache** | `bool` | If `True`, stores LLM results in cache; repeated prompts return cached responses | `True` |
| **enable\_llm\_cache\_for\_entity\_extract** | `bool` | If `True`, stores LLM results in cache for entity extraction; useful when debugging your application | `True` |
| **addon\_params** | `dict` | Additional parameters, e.g., `{"example_number": 1, "language": "Simplified Chinese", "entity_types": ["organization", "person", "geo", "event"], "insert_batch_size": 10}`: sets example limit, output language, and batch size for document processing | `example_number: all examples, language: English, insert_batch_size: 10` |
| **convert\_response\_to\_json\_func** | `callable` | Not used | `convert_response_to_json` |
| **embedding\_cache\_config** | `dict` | Configuration for question-answer caching. Contains three parameters:<br>- `enabled`: Boolean value to enable/disable cache lookup. When enabled, the system checks cached responses before generating new answers.<br>- `similarity_threshold`: Float value (0-1). When a new question's similarity to a cached question exceeds this threshold, the cached answer is returned directly without calling the LLM.<br>- `use_llm_check`: Boolean value to enable/disable LLM similarity verification. When enabled, the LLM is used as a secondary check to verify the similarity between questions before returning cached answers. | `{"enabled": False, "similarity_threshold": 0.95, "use_llm_check": False}` |
| **log\_dir** | `str` | Directory to store logs | `./` |

</details>
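
For orientation, here is a hedged initialization sketch exercising several of the parameters above (import paths and the directory name are assumptions and vary between LightRAG versions):

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed  # assumed import paths

rag = LightRAG(
    working_dir="./my_rag_cache",        # hypothetical directory
    chunk_token_size=1200,
    chunk_overlap_token_size=100,
    llm_model_func=gpt_4o_mini_complete,
    embedding_func=openai_embed,
    enable_llm_cache=True,
    embedding_cache_config={
        "enabled": True,                 # look up cached answers first
        "similarity_threshold": 0.95,    # reuse answers for near-duplicate questions
        "use_llm_check": False,          # skip the secondary LLM similarity check
    },
    addon_params={"language": "English", "insert_batch_size": 10},
)
```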

## Error Handling

<details>
<summary>Click to view error handling details</summary>

The API includes comprehensive error handling:

- File not found errors (404)
- Processing errors (500)
- Support for multiple file encodings (UTF-8 and GBK)

</details>
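
The UTF-8/GBK support mentioned above amounts to an encoding fallback when reading uploaded files. A minimal sketch of the same idea (illustrative only, not the server's actual implementation):

```python
def read_text(path: str) -> str:
    """Decode a file as UTF-8 first, then fall back to GBK."""
    raw = open(path, "rb").read()
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"{path}: not valid UTF-8 or GBK")
```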

## API

LightRAG can be installed with API support to serve a FastAPI interface for data upload, indexing, RAG operations, rescanning of the input folder, and more.

[LightRAG API](lightrag/api/README.md)
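
As a quick illustration, querying a locally running LightRAG server might look like this; the port, route, and payload shape are assumptions here, so treat the linked API README as the authoritative reference:

```python
import requests

# Hypothetical query against a local LightRAG API server
response = requests.post(
    "http://localhost:9621/query",  # port and route are assumptions
    json={"query": "What are the top themes in this story?", "mode": "hybrid"},
)
print(response.json())
```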

## Graph Visualization

<details>
<summary> <b>Graph visualization with html</b> </summary>

* The following code can be found in `examples/graph_visual_with_html.py`
@@ -731,7 +783,8 @@ net.show('knowledge_graph.html')
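
The elided example uses `networkx` and `pyvis` to render the stored graph; a condensed sketch of the same idea (the GraphML file name is assumed from LightRAG's default working-directory layout):

```python
import networkx as nx
from pyvis.network import Network

# Load the knowledge graph that LightRAG persists as GraphML
graph = nx.read_graphml("./dickens/graph_chunk_entity_relation.graphml")

# Render it as an interactive HTML page
net = Network(notebook=True)
net.from_nx(graph)
net.show('knowledge_graph.html')
```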

</details>

<details>
<summary> <b>Graph visualization with Neo4j</b> </summary>

* The following code can be found in `examples/graph_visual_with_neo4j.py`
@@ -858,52 +911,13 @@ if __name__ == "__main__":
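
The elided script converts the GraphML output into Neo4j nodes and relationships; a heavily condensed sketch of that idea using the official `neo4j` driver (connection details and labels are placeholders, not the example's exact code):

```python
import networkx as nx
from neo4j import GraphDatabase

graph = nx.read_graphml("./dickens/graph_chunk_entity_relation.graphml")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create one node per GraphML node, copying its attributes
    for node_id, attrs in graph.nodes(data=True):
        session.run("MERGE (n:Entity {id: $id}) SET n += $props",
                    id=node_id, props=dict(attrs))
    # Create one relationship per GraphML edge
    for src, dst, attrs in graph.edges(data=True):
        session.run("MATCH (a:Entity {id: $src}), (b:Entity {id: $dst}) "
                    "MERGE (a)-[r:RELATED]->(b) SET r += $props",
                    src=src, dst=dst, props=dict(attrs))
driver.close()
```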

</details>

<details>
<summary> <b>GraphML 3D visualizer</b> </summary>

LightRAG can be installed with Tools support to add extra tools like the GraphML 3D visualizer.

[LightRAG Visualizer](lightrag/tools/lightrag_visualizer/README.md)

</details>

## Evaluation
@@ -1147,17 +1161,6 @@ def extract_queries(file_path):
```

</details>

## Star History

<a href="https://star-history.com/#HKUDS/LightRAG&Date">