Adjust concurrency limits to more LLM-friendly settings for newcomers

- Lowered max async LLM processes to 4
- Enabled LLM cache for entity extraction
- Reduced max parallel insert to 2
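
For context, here is a minimal sketch of what these more conservative defaults look like when set explicitly on a `LightRAG` instance. The import paths, `working_dir`, and the parameter names for the entity-extraction cache and the parallel-insert limit are assumptions based on this commit summary and may differ between versions:

```python
# Hypothetical sketch of the new conservative defaults, set explicitly.
# llm_model_max_async corresponds to the MAX_ASYNC env var; the other two
# parameter names are assumptions inferred from the commit summary.
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete  # import path may vary by version

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,
    llm_model_max_async=4,                     # max concurrent LLM calls (was 16)
    enable_llm_cache_for_entity_extract=True,  # cache LLM results during entity extraction
    max_parallel_insert=2,                     # limit concurrent document inserts
)
```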
yangdx
2025-03-16 23:56:34 +08:00
parent 9d971e5889
commit c2ba7f33ff
5 changed files with 7 additions and 6 deletions


@@ -1061,7 +1061,7 @@ Valid modes are:
| **llm\_model\_func** | `callable` | Function for LLM generation | `gpt_4o_mini_complete` |
| **llm\_model\_name** | `str` | LLM model name for generation | `meta-llama/Llama-3.2-1B-Instruct` |
| **llm\_model\_max\_token\_size** | `int` | Maximum token size for LLM generation (affects entity relation summaries) | `32768` (default value changed by env var MAX_TOKENS) |
- | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `16` (default value changed by env var MAX_ASYNC) |
+ | **llm\_model\_max\_async** | `int` | Maximum number of concurrent asynchronous LLM processes | `4` (default value changed by env var MAX_ASYNC) |
| **llm\_model\_kwargs** | `dict` | Additional parameters for LLM generation | |
| **vector\_db\_storage\_cls\_kwargs** | `dict` | Additional parameters for the vector database, such as setting the threshold for node and relation retrieval | `cosine_better_than_threshold: 0.2` (default value changed by env var COSINE_THRESHOLD) |
| **enable\_llm\_cache** | `bool` | If `TRUE`, stores LLM results in cache; repeated prompts return cached responses | `TRUE` |
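
Several of these defaults can also be tuned through environment variables instead of constructor arguments. A hedged sketch, assuming the variables listed in the table above are read when LightRAG loads its configuration:

```python
import os

# Variable names come from the table above (MAX_ASYNC, MAX_TOKENS, COSINE_THRESHOLD).
# Set them before LightRAG reads its configuration, i.e. before constructing the instance.
os.environ.setdefault("MAX_ASYNC", "8")           # raise the concurrent-LLM limit back up
os.environ.setdefault("MAX_TOKENS", "32768")      # max token size for LLM generation
os.environ.setdefault("COSINE_THRESHOLD", "0.2")  # vector retrieval similarity threshold

from lightrag import LightRAG  # noqa: E402  (import after configuring the environment)
```

The lowered `llm_model_max_async` default of `4` is meant to keep newcomers on rate-limited API keys from hitting provider throttling; raising MAX_ASYNC again makes sense once the pipeline runs cleanly.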