diff --git a/README.md b/README.md index 8a0da666..e302d708 100644 --- a/README.md +++ b/README.md @@ -921,342 +921,7 @@ def extract_queries(file_path): ``` -## Install with API Support -LightRAG provides optional API support through FastAPI servers that add RAG capabilities to existing LLM services. You can install LightRAG with API support in two ways: - -### 1. Installation from PyPI - -```bash -pip install "lightrag-hku[api]" -``` - -### 2. Installation from Source (Development) - -```bash -# Clone the repository -git clone https://github.com/HKUDS/lightrag.git - -# Change to the repository directory -cd lightrag - -# Install in editable mode with API support -pip install -e ".[api]" -``` - -### Prerequisites - -Before running any of the servers, ensure you have the corresponding backend service running for both llm and embedding. -The new api allows you to mix different bindings for llm/embeddings. -For example, you have the possibility to use ollama for the embedding and openai for the llm. - -#### For LoLLMs Server -- LoLLMs must be running and accessible -- Default connection: http://localhost:9600 -- Configure using --llm-binding-host and/or --embedding-binding-host if running on a different host/port - -#### For Ollama Server -- Ollama must be running and accessible -- Default connection: http://localhost:11434 -- Configure using --ollama-host if running on a different host/port - -#### For OpenAI Server -- Requires valid OpenAI API credentials set in environment variables -- OPENAI_API_KEY must be set - -#### For Azure OpenAI Server -Azure OpenAI API can be created using the following commands in Azure CLI (you need to install Azure CLI first from [https://docs.microsoft.com/en-us/cli/azure/install-azure-cli](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)): -```bash -# Change the resource group name, location and OpenAI resource name as needed -RESOURCE_GROUP_NAME=LightRAG -LOCATION=swedencentral -RESOURCE_NAME=LightRAG-OpenAI - -az login -az group create --name $RESOURCE_GROUP_NAME --location $LOCATION -az cognitiveservices account create --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --kind OpenAI --sku S0 --location swedencentral -az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name gpt-4o --model-name gpt-4o --model-version "2024-08-06" --sku-capacity 100 --sku-name "Standard" -az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name text-embedding-3-large --model-name text-embedding-3-large --model-version "1" --sku-capacity 80 --sku-name "Standard" -az cognitiveservices account show --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --query "properties.endpoint" -az cognitiveservices account keys list --name $RESOURCE_NAME -g $RESOURCE_GROUP_NAME - -``` -The output of the last command will give you the endpoint and the key for the OpenAI API. You can use these values to set the environment variables in the `.env` file. - - - -### Configuration Options - -Each server has its own specific configuration options: - -#### LightRag Server Options - -| Parameter | Default | Description | -|-----------|---------|-------------| -| --host | 0.0.0.0 | Server host | -| --port | 9621 | Server port | -| --llm-binding | ollama | LLM binding to be used. Supported: lollms, ollama, openai | -| --llm-binding-host | (dynamic) | LLM server host URL. 
Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | -| --llm-model | mistral-nemo:latest | LLM model name | -| --embedding-binding | ollama | Embedding binding to be used. Supported: lollms, ollama, openai | -| --embedding-binding-host | (dynamic) | Embedding server host URL. Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | -| --embedding-model | bge-m3:latest | Embedding model name | -| --working-dir | ./rag_storage | Working directory for RAG storage | -| --input-dir | ./inputs | Directory containing input documents | -| --max-async | 4 | Maximum async operations | -| --max-tokens | 32768 | Maximum token size | -| --embedding-dim | 1024 | Embedding dimensions | -| --max-embed-tokens | 8192 | Maximum embedding token size | -| --timeout | None | Timeout in seconds (useful when using slow AI). Use None for infinite timeout | -| --log-level | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) | -| --key | None | API key for authentication. Protects lightrag server against unauthorized access | -| --ssl | False | Enable HTTPS | -| --ssl-certfile | None | Path to SSL certificate file (required if --ssl is enabled) | -| --ssl-keyfile | None | Path to SSL private key file (required if --ssl is enabled) | - - - -For protecting the server using an authentication key, you can also use an environment variable named `LIGHTRAG_API_KEY`. -### Example Usage - -#### Running a Lightrag server with ollama default local server as llm and embedding backends - -Ollama is the default backend for both llm and embedding, so by default you can run lightrag-server with no parameters and the default ones will be used. Make sure ollama is installed and is running and default models are already installed on ollama. 
- -```bash -# Run lightrag with ollama, mistral-nemo:latest for llm, and bge-m3:latest for embedding -lightrag-server - -# Using specific models (ensure they are installed in your ollama instance) -lightrag-server --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-model nomic-embed-text --embedding-dim 1024 - -# Using an authentication key -lightrag-server --key my-key - -# Using lollms for llm and ollama for embedding -lightrag-server --llm-binding lollms -``` - -#### Running a Lightrag server with lollms default local server as llm and embedding backends - -```bash -# Run lightrag with lollms, mistral-nemo:latest for llm, and bge-m3:latest for embedding, use lollms for both llm and embedding -lightrag-server --llm-binding lollms --embedding-binding lollms - -# Using specific models (ensure they are installed in your ollama instance) -lightrag-server --llm-binding lollms --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-binding lollms --embedding-model nomic-embed-text --embedding-dim 1024 - -# Using an authentication key -lightrag-server --key my-key - -# Using lollms for llm and openai for embedding -lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small -``` - - -#### Running a Lightrag server with openai server as llm and embedding backends - -```bash -# Run lightrag with lollms, GPT-4o-mini for llm, and text-embedding-3-small for embedding, use openai for both llm and embedding -lightrag-server --llm-binding openai --llm-model GPT-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small - -# Using an authentication key -lightrag-server --llm-binding openai --llm-model GPT-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small --key my-key - -# Using lollms for llm and openai for embedding -lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small -``` - -#### Running a Lightrag server with azure openai server as llm and embedding backends - -```bash -# Run lightrag with lollms, GPT-4o-mini for llm, and text-embedding-3-small for embedding, use openai for both llm and embedding -lightrag-server --llm-binding azure_openai --llm-model GPT-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small - -# Using an authentication key -lightrag-server --llm-binding azure_openai --llm-model GPT-4o-mini --embedding-binding azure_openai --embedding-model text-embedding-3-small --key my-key - -# Using lollms for llm and azure_openai for embedding -lightrag-server --llm-binding lollms --embedding-binding azure_openai --embedding-model text-embedding-3-small -``` - -**Important Notes:** -- For LoLLMs: Make sure the specified models are installed in your LoLLMs instance -- For Ollama: Make sure the specified models are installed in your Ollama instance -- For OpenAI: Ensure you have set up your OPENAI_API_KEY environment variable -- For Azure OpenAI: Build and configure your server as stated in the Prequisites section - -For help on any server, use the --help flag: -```bash -lightrag-server --help -``` - -Note: If you don't need the API functionality, you can install the base package without API support using: -```bash -pip install lightrag-hku -``` - -## API Endpoints - -All servers (LoLLMs, Ollama, OpenAI and Azure OpenAI) provide the same REST API endpoints for RAG functionality. - -### Query Endpoints - -#### POST /query -Query the RAG system with options for different search modes. 
- -```bash -curl -X POST "http://localhost:9621/query" \ - -H "Content-Type: application/json" \ - -d '{"query": "Your question here", "mode": "hybrid", ""}' -``` - -#### POST /query/stream -Stream responses from the RAG system. - -```bash -curl -X POST "http://localhost:9621/query/stream" \ - -H "Content-Type: application/json" \ - -d '{"query": "Your question here", "mode": "hybrid"}' -``` - -### Document Management Endpoints - -#### POST /documents/text -Insert text directly into the RAG system. - -```bash -curl -X POST "http://localhost:9621/documents/text" \ - -H "Content-Type: application/json" \ - -d '{"text": "Your text content here", "description": "Optional description"}' -``` - -#### POST /documents/file -Upload a single file to the RAG system. - -```bash -curl -X POST "http://localhost:9621/documents/file" \ - -F "file=@/path/to/your/document.txt" \ - -F "description=Optional description" -``` - -#### POST /documents/batch -Upload multiple files at once. - -```bash -curl -X POST "http://localhost:9621/documents/batch" \ - -F "files=@/path/to/doc1.txt" \ - -F "files=@/path/to/doc2.txt" -``` - -#### DELETE /documents -Clear all documents from the RAG system. - -```bash -curl -X DELETE "http://localhost:9621/documents" -``` - -### Utility Endpoints - -#### GET /health -Check server health and configuration. - -```bash -curl "http://localhost:9621/health" -``` - -## Development -Contribute to the project: [Guide](contributor-readme.MD) - -### Running in Development Mode - -For LoLLMs: -```bash -uvicorn lollms_lightrag_server:app --reload --port 9621 -``` - -For Ollama: -```bash -uvicorn ollama_lightrag_server:app --reload --port 9621 -``` - -For OpenAI: -```bash -uvicorn openai_lightrag_server:app --reload --port 9621 -``` -For Azure OpenAI: -```bash -uvicorn azure_openai_lightrag_server:app --reload --port 9621 -``` -### API Documentation - -When any server is running, visit: -- Swagger UI: http://localhost:9621/docs -- ReDoc: http://localhost:9621/redoc - -### Testing API Endpoints - -You can test the API endpoints using the provided curl commands or through the Swagger UI interface. Make sure to: -1. Start the appropriate backend service (LoLLMs, Ollama, or OpenAI) -2. Start the RAG server -3. Upload some documents using the document management endpoints -4. Query the system using the query endpoints - -### Important Features - -#### Automatic Document Vectorization -When starting any of the servers with the `--input-dir` parameter, the system will automatically: -1. Scan the specified directory for documents -2. Check for existing vectorized content in the database -3. Only vectorize new documents that aren't already in the database -4. 
Make all content immediately available for RAG queries - -This intelligent caching mechanism: -- Prevents unnecessary re-vectorization of existing documents -- Reduces startup time for subsequent runs -- Preserves system resources -- Maintains consistency across restarts - -### Example Usage - -#### LoLLMs RAG Server - -```bash -# Start server with automatic document vectorization -# Only new documents will be vectorized, existing ones will be loaded from cache -lollms-lightrag-server --input-dir ./my_documents --port 8080 -``` - -#### Ollama RAG Server - -```bash -# Start server with automatic document vectorization -# Previously vectorized documents will be loaded from the database -ollama-lightrag-server --input-dir ./my_documents --port 8080 -``` - -#### OpenAI RAG Server - -```bash -# Start server with automatic document vectorization -# Existing documents are retrieved from cache, only new ones are processed -openai-lightrag-server --input-dir ./my_documents --port 9624 -``` - -#### Azure OpenAI RAG Server - -```bash -# Start server with automatic document vectorization -# Existing documents are retrieved from cache, only new ones are processed -azure-openai-lightrag-server --input-dir ./my_documents --port 9624 -``` - -**Important Notes:** -- The `--input-dir` parameter enables automatic document processing at startup -- Documents already in the database are not re-vectorized -- Only new documents in the input directory will be processed -- This optimization significantly reduces startup time for subsequent runs -- The working directory (`--working-dir`) stores the vectorized documents database ## Star History diff --git a/docs/LightRagAPI.md b/docs/LightRagAPI.md new file mode 100644 index 00000000..588d6164 --- /dev/null +++ b/docs/LightRagAPI.md @@ -0,0 +1,302 @@ +## Install with API Support + +LightRAG provides optional API support through FastAPI servers that add RAG capabilities to existing LLM services. You can install LightRAG with API support in two ways: + +### 1. Installation from PyPI + +```bash +pip install "lightrag-hku[api]" +``` + +### 2. Installation from Source (Development) + +```bash +# Clone the repository +git clone https://github.com/HKUDS/lightrag.git + +# Change to the repository directory +cd lightrag + +# Install in editable mode with API support +pip install -e ".[api]" +``` + +### Prerequisites + +Before running any of the servers, ensure you have the corresponding backend service running for both llm and embedding. +The new api allows you to mix different bindings for llm/embeddings. +For example, you have the possibility to use ollama for the embedding and openai for the llm. 
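+
+For instance, the following command is a minimal sketch of such a mixed setup (it assumes `OPENAI_API_KEY` is set and that `bge-m3:latest` has already been pulled into Ollama; the model names are purely illustrative):
+
+```bash
+# Mixed bindings: openai for the llm, ollama for the embedding
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding ollama --embedding-model bge-m3:latest
+```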
+ +#### For LoLLMs Server +- LoLLMs must be running and accessible +- Default connection: http://localhost:9600 +- Configure using --llm-binding-host and/or --embedding-binding-host if running on a different host/port + +#### For Ollama Server +- Ollama must be running and accessible +- Default connection: http://localhost:11434 +- Configure using --ollama-host if running on a different host/port + +#### For OpenAI Server +- Requires valid OpenAI API credentials set in environment variables +- OPENAI_API_KEY must be set + +#### For Azure OpenAI Server +Azure OpenAI API can be created using the following commands in Azure CLI (you need to install Azure CLI first from [https://docs.microsoft.com/en-us/cli/azure/install-azure-cli](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)): +```bash +# Change the resource group name, location and OpenAI resource name as needed +RESOURCE_GROUP_NAME=LightRAG +LOCATION=swedencentral +RESOURCE_NAME=LightRAG-OpenAI + +az login +az group create --name $RESOURCE_GROUP_NAME --location $LOCATION +az cognitiveservices account create --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --kind OpenAI --sku S0 --location swedencentral +az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name gpt-4o --model-name gpt-4o --model-version "2024-08-06" --sku-capacity 100 --sku-name "Standard" +az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name text-embedding-3-large --model-name text-embedding-3-large --model-version "1" --sku-capacity 80 --sku-name "Standard" +az cognitiveservices account show --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --query "properties.endpoint" +az cognitiveservices account keys list --name $RESOURCE_NAME -g $RESOURCE_GROUP_NAME + +``` +The output of the last command will give you the endpoint and the key for the OpenAI API. You can use these values to set the environment variables in the `.env` file. + + + +### Configuration Options + +Each server has its own specific configuration options: + +#### LightRag Server Options + +| Parameter | Default | Description | +|-----------|---------|-------------| +| --host | 0.0.0.0 | Server host | +| --port | 9621 | Server port | +| --llm-binding | ollama | LLM binding to be used. Supported: lollms, ollama, openai | +| --llm-binding-host | (dynamic) | LLM server host URL. Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | +| --llm-model | mistral-nemo:latest | LLM model name | +| --embedding-binding | ollama | Embedding binding to be used. Supported: lollms, ollama, openai | +| --embedding-binding-host | (dynamic) | Embedding server host URL. Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | +| --embedding-model | bge-m3:latest | Embedding model name | +| --working-dir | ./rag_storage | Working directory for RAG storage | +| --input-dir | ./inputs | Directory containing input documents | +| --max-async | 4 | Maximum async operations | +| --max-tokens | 32768 | Maximum token size | +| --embedding-dim | 1024 | Embedding dimensions | +| --max-embed-tokens | 8192 | Maximum embedding token size | +| --timeout | None | Timeout in seconds (useful when using slow AI). 
Use None for infinite timeout |
+| --log-level | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
+| --key | None | API key for authentication. Protects the lightrag server against unauthorized access |
+| --ssl | False | Enable HTTPS |
+| --ssl-certfile | None | Path to SSL certificate file (required if --ssl is enabled) |
+| --ssl-keyfile | None | Path to SSL private key file (required if --ssl is enabled) |
+
+To protect the server with an authentication key, you can also set the `LIGHTRAG_API_KEY` environment variable (e.g. `export LIGHTRAG_API_KEY=my-key` before starting the server) instead of passing `--key`.
+
+### Example Usage
+
+#### Running a LightRAG server with the default Ollama local server as llm and embedding backends
+
+Ollama is the default backend for both the llm and the embedding, so you can run lightrag-server with no parameters and the defaults will be used. Make sure Ollama is installed and running, and that the default models have already been pulled into your Ollama instance.
+
+```bash
+# Run lightrag with ollama, mistral-nemo:latest for llm, and bge-m3:latest for embedding
+lightrag-server
+
+# Using specific models (ensure they are installed in your ollama instance)
+lightrag-server --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-model nomic-embed-text --embedding-dim 1024
+
+# Using an authentication key
+lightrag-server --key my-key
+
+# Using lollms for llm and ollama for embedding
+lightrag-server --llm-binding lollms
+```
+
+#### Running a LightRAG server with the default LoLLMs local server as llm and embedding backends
+
+```bash
+# Run lightrag with lollms, mistral-nemo:latest for llm, and bge-m3:latest for embedding, use lollms for both llm and embedding
+lightrag-server --llm-binding lollms --embedding-binding lollms
+
+# Using specific models (ensure they are installed in your lollms instance)
+lightrag-server --llm-binding lollms --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-binding lollms --embedding-model nomic-embed-text --embedding-dim 1024
+
+# Using an authentication key
+lightrag-server --llm-binding lollms --embedding-binding lollms --key my-key
+
+# Using lollms for llm and openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small
+```
+
+#### Running a LightRAG server with OpenAI as llm and embedding backends
+
+```bash
+# Run lightrag with openai, gpt-4o-mini for llm, and text-embedding-3-small for embedding
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small
+
+# Using an authentication key
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small --key my-key
+
+# Using lollms for llm and openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small
+```
+
+#### Running a LightRAG server with Azure OpenAI as llm and embedding backends
+
+```bash
+# Run lightrag with azure_openai, gpt-4o-mini for llm, and text-embedding-3-small for embedding
+lightrag-server --llm-binding azure_openai --llm-model gpt-4o-mini --embedding-binding azure_openai --embedding-model text-embedding-3-small
+
+# Using an authentication key
+lightrag-server --llm-binding azure_openai --llm-model gpt-4o-mini --embedding-binding azure_openai --embedding-model text-embedding-3-small --key my-key
+
+# Using lollms for llm and azure_openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding azure_openai --embedding-model text-embedding-3-small
+```
+
+**Important Notes:**
+- For LoLLMs: Make sure the specified models are installed in your LoLLMs instance
+- For Ollama: Make sure the specified models are installed in your Ollama instance
+- For OpenAI: Ensure you have set up your OPENAI_API_KEY environment variable
+- For Azure OpenAI: Build and configure your server as stated in the Prerequisites section
+
+For help on any server, use the --help flag:
+```bash
+lightrag-server --help
+```
+
+Note: If you don't need the API functionality, you can install the base package without API support using:
+```bash
+pip install lightrag-hku
+```
+
+## API Endpoints
+
+All servers (LoLLMs, Ollama, OpenAI, and Azure OpenAI) provide the same REST API endpoints for RAG functionality.
+
+### Query Endpoints
+
+#### POST /query
+Query the RAG system with options for different search modes (e.g. `naive`, `local`, `global`, or `hybrid`).
+
+```bash
+curl -X POST "http://localhost:9621/query" \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Your question here", "mode": "hybrid"}'
+```
+
+#### POST /query/stream
+Stream responses from the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/query/stream" \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Your question here", "mode": "hybrid"}'
+```
+
+### Document Management Endpoints
+
+#### POST /documents/text
+Insert text directly into the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/documents/text" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Your text content here", "description": "Optional description"}'
+```
+
+#### POST /documents/file
+Upload a single file to the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/documents/file" \
+  -F "file=@/path/to/your/document.txt" \
+  -F "description=Optional description"
+```
+
+#### POST /documents/batch
+Upload multiple files at once.
+
+```bash
+curl -X POST "http://localhost:9621/documents/batch" \
+  -F "files=@/path/to/doc1.txt" \
+  -F "files=@/path/to/doc2.txt"
+```
+
+#### DELETE /documents
+Clear all documents from the RAG system.
+
+```bash
+curl -X DELETE "http://localhost:9621/documents"
+```
+
+### Utility Endpoints
+
+#### GET /health
+Check server health and configuration.
+
+```bash
+curl "http://localhost:9621/health"
+```
+
+## Development
+
+Contribute to the project: [Guide](../contributor-readme.MD)
+
+### Running in Development Mode
+
+For LoLLMs:
+```bash
+uvicorn lollms_lightrag_server:app --reload --port 9621
+```
+
+For Ollama:
+```bash
+uvicorn ollama_lightrag_server:app --reload --port 9621
+```
+
+For OpenAI:
+```bash
+uvicorn openai_lightrag_server:app --reload --port 9621
+```
+
+For Azure OpenAI:
+```bash
+uvicorn azure_openai_lightrag_server:app --reload --port 9621
+```
+
+### API Documentation
+
+When any server is running, visit:
+- Swagger UI: http://localhost:9621/docs
+- ReDoc: http://localhost:9621/redoc
+
+### Testing API Endpoints
+
+You can test the API endpoints using the provided curl commands or through the Swagger UI interface. Make sure to:
+1. Start the appropriate backend service (LoLLMs, Ollama, or OpenAI)
+2. Start the RAG server
+3. Upload some documents using the document management endpoints
+4. Query the system using the query endpoints
+
+### Important Features
+
+#### Automatic Document Vectorization
+When starting any of the servers with the `--input-dir` parameter, the system will automatically:
+1. Scan the specified directory for documents
+2. Check for existing vectorized content in the database
+3. Only vectorize new documents that aren't already in the database
+4. Make all content immediately available for RAG queries
+
+This intelligent caching mechanism:
+- Prevents unnecessary re-vectorization of existing documents
+- Reduces startup time for subsequent runs
+- Preserves system resources
+- Maintains consistency across restarts
+
+**Important Notes:**
+- The `--input-dir` parameter enables automatic document processing at startup
+- Documents already in the database are not re-vectorized
+- Only new documents in the input directory will be processed
+- This optimization significantly reduces startup time for subsequent runs
+- The working directory (`--working-dir`) stores the vectorized documents database
\ No newline at end of file
diff --git a/lightrag/api/README.md b/lightrag/api/README.md
new file mode 100644
index 00000000..588d6164
--- /dev/null
+++ b/lightrag/api/README.md
@@ -0,0 +1,302 @@
+## Install with API Support
+
+LightRAG provides optional API support through FastAPI servers that add RAG capabilities to existing LLM services. You can install LightRAG with API support in two ways:
+
+### 1. Installation from PyPI
+
+```bash
+pip install "lightrag-hku[api]"
+```
+
+### 2. Installation from Source (Development)
+
+```bash
+# Clone the repository
+git clone https://github.com/HKUDS/lightrag.git
+
+# Change to the repository directory
+cd lightrag
+
+# Install in editable mode with API support
+pip install -e ".[api]"
+```
+
+### Prerequisites
+
+Before running any of the servers, ensure you have the corresponding backend service running for both llm and embedding.
+The new api allows you to mix different bindings for llm/embeddings.
+For example, you have the possibility to use ollama for the embedding and openai for the llm.
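+
+For instance, the following command is a minimal sketch of such a mixed setup (it assumes `OPENAI_API_KEY` is set and that `bge-m3:latest` has already been pulled into Ollama; the model names are purely illustrative):
+
+```bash
+# Mixed bindings: openai for the llm, ollama for the embedding
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding ollama --embedding-model bge-m3:latest
+```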
+ +#### For LoLLMs Server +- LoLLMs must be running and accessible +- Default connection: http://localhost:9600 +- Configure using --llm-binding-host and/or --embedding-binding-host if running on a different host/port + +#### For Ollama Server +- Ollama must be running and accessible +- Default connection: http://localhost:11434 +- Configure using --ollama-host if running on a different host/port + +#### For OpenAI Server +- Requires valid OpenAI API credentials set in environment variables +- OPENAI_API_KEY must be set + +#### For Azure OpenAI Server +Azure OpenAI API can be created using the following commands in Azure CLI (you need to install Azure CLI first from [https://docs.microsoft.com/en-us/cli/azure/install-azure-cli](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)): +```bash +# Change the resource group name, location and OpenAI resource name as needed +RESOURCE_GROUP_NAME=LightRAG +LOCATION=swedencentral +RESOURCE_NAME=LightRAG-OpenAI + +az login +az group create --name $RESOURCE_GROUP_NAME --location $LOCATION +az cognitiveservices account create --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --kind OpenAI --sku S0 --location swedencentral +az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name gpt-4o --model-name gpt-4o --model-version "2024-08-06" --sku-capacity 100 --sku-name "Standard" +az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name text-embedding-3-large --model-name text-embedding-3-large --model-version "1" --sku-capacity 80 --sku-name "Standard" +az cognitiveservices account show --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --query "properties.endpoint" +az cognitiveservices account keys list --name $RESOURCE_NAME -g $RESOURCE_GROUP_NAME + +``` +The output of the last command will give you the endpoint and the key for the OpenAI API. You can use these values to set the environment variables in the `.env` file. + + + +### Configuration Options + +Each server has its own specific configuration options: + +#### LightRag Server Options + +| Parameter | Default | Description | +|-----------|---------|-------------| +| --host | 0.0.0.0 | Server host | +| --port | 9621 | Server port | +| --llm-binding | ollama | LLM binding to be used. Supported: lollms, ollama, openai | +| --llm-binding-host | (dynamic) | LLM server host URL. Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | +| --llm-model | mistral-nemo:latest | LLM model name | +| --embedding-binding | ollama | Embedding binding to be used. Supported: lollms, ollama, openai | +| --embedding-binding-host | (dynamic) | Embedding server host URL. Defaults based on binding: http://localhost:11434 (ollama), http://localhost:9600 (lollms), https://api.openai.com/v1 (openai) | +| --embedding-model | bge-m3:latest | Embedding model name | +| --working-dir | ./rag_storage | Working directory for RAG storage | +| --input-dir | ./inputs | Directory containing input documents | +| --max-async | 4 | Maximum async operations | +| --max-tokens | 32768 | Maximum token size | +| --embedding-dim | 1024 | Embedding dimensions | +| --max-embed-tokens | 8192 | Maximum embedding token size | +| --timeout | None | Timeout in seconds (useful when using slow AI). 
Use None for infinite timeout |
+| --log-level | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
+| --key | None | API key for authentication. Protects the lightrag server against unauthorized access |
+| --ssl | False | Enable HTTPS |
+| --ssl-certfile | None | Path to SSL certificate file (required if --ssl is enabled) |
+| --ssl-keyfile | None | Path to SSL private key file (required if --ssl is enabled) |
+
+To protect the server with an authentication key, you can also set the `LIGHTRAG_API_KEY` environment variable (e.g. `export LIGHTRAG_API_KEY=my-key` before starting the server) instead of passing `--key`.
+
+### Example Usage
+
+#### Running a LightRAG server with the default Ollama local server as llm and embedding backends
+
+Ollama is the default backend for both the llm and the embedding, so you can run lightrag-server with no parameters and the defaults will be used. Make sure Ollama is installed and running, and that the default models have already been pulled into your Ollama instance.
+
+```bash
+# Run lightrag with ollama, mistral-nemo:latest for llm, and bge-m3:latest for embedding
+lightrag-server
+
+# Using specific models (ensure they are installed in your ollama instance)
+lightrag-server --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-model nomic-embed-text --embedding-dim 1024
+
+# Using an authentication key
+lightrag-server --key my-key
+
+# Using lollms for llm and ollama for embedding
+lightrag-server --llm-binding lollms
+```
+
+#### Running a LightRAG server with the default LoLLMs local server as llm and embedding backends
+
+```bash
+# Run lightrag with lollms, mistral-nemo:latest for llm, and bge-m3:latest for embedding, use lollms for both llm and embedding
+lightrag-server --llm-binding lollms --embedding-binding lollms
+
+# Using specific models (ensure they are installed in your lollms instance)
+lightrag-server --llm-binding lollms --llm-model adrienbrault/nous-hermes2theta-llama3-8b:f16 --embedding-binding lollms --embedding-model nomic-embed-text --embedding-dim 1024
+
+# Using an authentication key
+lightrag-server --llm-binding lollms --embedding-binding lollms --key my-key
+
+# Using lollms for llm and openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small
+```
+
+#### Running a LightRAG server with OpenAI as llm and embedding backends
+
+```bash
+# Run lightrag with openai, gpt-4o-mini for llm, and text-embedding-3-small for embedding
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small
+
+# Using an authentication key
+lightrag-server --llm-binding openai --llm-model gpt-4o-mini --embedding-binding openai --embedding-model text-embedding-3-small --key my-key
+
+# Using lollms for llm and openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding openai --embedding-model text-embedding-3-small
+```
+
+#### Running a LightRAG server with Azure OpenAI as llm and embedding backends
+
+```bash
+# Run lightrag with azure_openai, gpt-4o-mini for llm, and text-embedding-3-small for embedding
+lightrag-server --llm-binding azure_openai --llm-model gpt-4o-mini --embedding-binding azure_openai --embedding-model text-embedding-3-small
+
+# Using an authentication key
+lightrag-server --llm-binding azure_openai --llm-model gpt-4o-mini --embedding-binding azure_openai --embedding-model text-embedding-3-small --key my-key
+
+# Using lollms for llm and azure_openai for embedding
+lightrag-server --llm-binding lollms --embedding-binding azure_openai --embedding-model text-embedding-3-small
+```
+
+**Important Notes:**
+- For LoLLMs: Make sure the specified models are installed in your LoLLMs instance
+- For Ollama: Make sure the specified models are installed in your Ollama instance
+- For OpenAI: Ensure you have set up your OPENAI_API_KEY environment variable
+- For Azure OpenAI: Build and configure your server as stated in the Prerequisites section
+
+For help on any server, use the --help flag:
+```bash
+lightrag-server --help
+```
+
+Note: If you don't need the API functionality, you can install the base package without API support using:
+```bash
+pip install lightrag-hku
+```
+
+## API Endpoints
+
+All servers (LoLLMs, Ollama, OpenAI, and Azure OpenAI) provide the same REST API endpoints for RAG functionality.
+
+### Query Endpoints
+
+#### POST /query
+Query the RAG system with options for different search modes (e.g. `naive`, `local`, `global`, or `hybrid`).
+
+```bash
+curl -X POST "http://localhost:9621/query" \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Your question here", "mode": "hybrid"}'
+```
+
+#### POST /query/stream
+Stream responses from the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/query/stream" \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Your question here", "mode": "hybrid"}'
+```
+
+### Document Management Endpoints
+
+#### POST /documents/text
+Insert text directly into the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/documents/text" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Your text content here", "description": "Optional description"}'
+```
+
+#### POST /documents/file
+Upload a single file to the RAG system.
+
+```bash
+curl -X POST "http://localhost:9621/documents/file" \
+  -F "file=@/path/to/your/document.txt" \
+  -F "description=Optional description"
+```
+
+#### POST /documents/batch
+Upload multiple files at once.
+
+```bash
+curl -X POST "http://localhost:9621/documents/batch" \
+  -F "files=@/path/to/doc1.txt" \
+  -F "files=@/path/to/doc2.txt"
+```
+
+#### DELETE /documents
+Clear all documents from the RAG system.
+
+```bash
+curl -X DELETE "http://localhost:9621/documents"
+```
+
+### Utility Endpoints
+
+#### GET /health
+Check server health and configuration.
+
+```bash
+curl "http://localhost:9621/health"
+```
+
+## Development
+
+Contribute to the project: [Guide](../../contributor-readme.MD)
+
+### Running in Development Mode
+
+For LoLLMs:
+```bash
+uvicorn lollms_lightrag_server:app --reload --port 9621
+```
+
+For Ollama:
+```bash
+uvicorn ollama_lightrag_server:app --reload --port 9621
+```
+
+For OpenAI:
+```bash
+uvicorn openai_lightrag_server:app --reload --port 9621
+```
+
+For Azure OpenAI:
+```bash
+uvicorn azure_openai_lightrag_server:app --reload --port 9621
+```
+
+### API Documentation
+
+When any server is running, visit:
+- Swagger UI: http://localhost:9621/docs
+- ReDoc: http://localhost:9621/redoc
+
+### Testing API Endpoints
+
+You can test the API endpoints using the provided curl commands or through the Swagger UI interface. Make sure to:
+1. Start the appropriate backend service (LoLLMs, Ollama, or OpenAI)
+2. Start the RAG server
+3. Upload some documents using the document management endpoints
+4. Query the system using the query endpoints
+
+### Important Features
+
+#### Automatic Document Vectorization
+When starting any of the servers with the `--input-dir` parameter, the system will automatically:
+1. Scan the specified directory for documents
+2. Check for existing vectorized content in the database
+3. Only vectorize new documents that aren't already in the database
+4. Make all content immediately available for RAG queries
+
+This intelligent caching mechanism:
+- Prevents unnecessary re-vectorization of existing documents
+- Reduces startup time for subsequent runs
+- Preserves system resources
+- Maintains consistency across restarts
+
+**Important Notes:**
+- The `--input-dir` parameter enables automatic document processing at startup
+- Documents already in the database are not re-vectorized
+- Only new documents in the input directory will be processed
+- This optimization significantly reduces startup time for subsequent runs
+- The working directory (`--working-dir`) stores the vectorized documents database
\ No newline at end of file