Remove Oracle storage implementation
@@ -11,7 +11,6 @@
|
|||||||
- [X] [2024.12.31]🎯📢LightRAG现在支持[通过文档ID删除](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
|
- [X] [2024.12.31]🎯📢LightRAG现在支持[通过文档ID删除](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
|
||||||
- [X] [2024.11.25]🎯📢LightRAG现在支持无缝集成[自定义知识图谱](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg),使用户能够用自己的领域专业知识增强系统。
|
- [X] [2024.11.25]🎯📢LightRAG现在支持无缝集成[自定义知识图谱](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg),使用户能够用自己的领域专业知识增强系统。
|
||||||
- [X] [2024.11.19]🎯📢LightRAG的综合指南现已在[LearnOpenCV](https://learnopencv.com/lightrag)上发布。非常感谢博客作者。
|
- [X] [2024.11.19]🎯📢LightRAG的综合指南现已在[LearnOpenCV](https://learnopencv.com/lightrag)上发布。非常感谢博客作者。
|
||||||
- [X] [2024.11.12]🎯📢LightRAG现在支持[Oracle Database 23ai的所有存储类型(KV、向量和图)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py)。
|
|
||||||
- [X] [2024.11.11]🎯📢LightRAG现在支持[通过实体名称删除实体](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
|
- [X] [2024.11.11]🎯📢LightRAG现在支持[通过实体名称删除实体](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete)。
|
||||||
- [X] [2024.11.09]🎯📢推出[LightRAG Gui](https://lightrag-gui.streamlit.app),允许您插入、查询、可视化和下载LightRAG知识。
|
- [X] [2024.11.09]🎯📢推出[LightRAG Gui](https://lightrag-gui.streamlit.app),允许您插入、查询、可视化和下载LightRAG知识。
|
||||||
- [X] [2024.11.04]🎯📢现在您可以[使用Neo4J进行存储](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage)。
|
- [X] [2024.11.04]🎯📢现在您可以[使用Neo4J进行存储](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage)。
|
||||||
@@ -1037,9 +1036,10 @@ rag.clear_cache(modes=["local"])
|
|||||||
| **参数** | **类型** | **说明** | **默认值** |
|
| **参数** | **类型** | **说明** | **默认值** |
|
||||||
|--------------|----------|-----------------|-------------|
|
|--------------|----------|-----------------|-------------|
|
||||||
| **working_dir** | `str` | 存储缓存的目录 | `lightrag_cache+timestamp` |
|
| **working_dir** | `str` | 存储缓存的目录 | `lightrag_cache+timestamp` |
|
||||||
| **kv_storage** | `str` | 文档和文本块的存储类型。支持的类型:`JsonKVStorage`、`OracleKVStorage` | `JsonKVStorage` |
|
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage`,`TiDBKVStorage` | `JsonKVStorage` |
|
||||||
| **vector_storage** | `str` | 嵌入向量的存储类型。支持的类型:`NanoVectorDBStorage`、`OracleVectorDBStorage` | `NanoVectorDBStorage` |
|
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`TiDBVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
|
||||||
| **graph_storage** | `str` | 图边和节点的存储类型。支持的类型:`NetworkXStorage`、`Neo4JStorage`、`OracleGraphStorage` | `NetworkXStorage` |
|
| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage`,`GremlinStorage` | `NetworkXStorage` |
|
||||||
|
| **doc_status_storage** | `str` | Storage type for documents process status. Supported types: `JsonDocStatusStorage`,`PGDocStatusStorage`,`MongoDocStatusStorage` | `JsonDocStatusStorage` |
|
||||||
| **chunk_token_size** | `int` | 拆分文档时每个块的最大令牌大小 | `1200` |
|
| **chunk_token_size** | `int` | 拆分文档时每个块的最大令牌大小 | `1200` |
|
||||||
| **chunk_overlap_token_size** | `int` | 拆分文档时两个块之间的重叠令牌大小 | `100` |
|
| **chunk_overlap_token_size** | `int` | 拆分文档时两个块之间的重叠令牌大小 | `100` |
|
||||||
| **tiktoken_model_name** | `str` | 用于计算令牌数的Tiktoken编码器的模型名称 | `gpt-4o-mini` |
|
| **tiktoken_model_name** | `str` | 用于计算令牌数的Tiktoken编码器的模型名称 | `gpt-4o-mini` |
|
||||||
|
@@ -41,7 +41,6 @@
|
|||||||
- [X] [2024.12.31]🎯📢LightRAG now supports [deletion by document ID](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
|
- [X] [2024.12.31]🎯📢LightRAG now supports [deletion by document ID](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
|
||||||
- [X] [2024.11.25]🎯📢LightRAG now supports seamless integration of [custom knowledge graphs](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg), empowering users to enhance the system with their own domain expertise.
|
- [X] [2024.11.25]🎯📢LightRAG now supports seamless integration of [custom knowledge graphs](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#insert-custom-kg), empowering users to enhance the system with their own domain expertise.
|
||||||
- [X] [2024.11.19]🎯📢A comprehensive guide to LightRAG is now available on [LearnOpenCV](https://learnopencv.com/lightrag). Many thanks to the blog author.
|
- [X] [2024.11.19]🎯📢A comprehensive guide to LightRAG is now available on [LearnOpenCV](https://learnopencv.com/lightrag). Many thanks to the blog author.
|
||||||
- [X] [2024.11.12]🎯📢LightRAG now supports [Oracle Database 23ai for all storage types (KV, vector, and graph)](https://github.com/HKUDS/LightRAG/blob/main/examples/lightrag_oracle_demo.py).
|
|
||||||
- [X] [2024.11.11]🎯📢LightRAG now supports [deleting entities by their names](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
|
- [X] [2024.11.11]🎯📢LightRAG now supports [deleting entities by their names](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#delete).
|
||||||
- [X] [2024.11.09]🎯📢Introducing the [LightRAG Gui](https://lightrag-gui.streamlit.app), which allows you to insert, query, visualize, and download LightRAG knowledge.
|
- [X] [2024.11.09]🎯📢Introducing the [LightRAG Gui](https://lightrag-gui.streamlit.app), which allows you to insert, query, visualize, and download LightRAG knowledge.
|
||||||
- [X] [2024.11.04]🎯📢You can now [use Neo4J for Storage](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage).
|
- [X] [2024.11.04]🎯📢You can now [use Neo4J for Storage](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#using-neo4j-for-storage).
|
||||||
@@ -1065,9 +1064,10 @@ Valid modes are:
|
|||||||
| **Parameter** | **Type** | **Explanation** | **Default** |
|
| **Parameter** | **Type** | **Explanation** | **Default** |
|
||||||
|--------------|----------|-----------------|-------------|
|
|--------------|----------|-----------------|-------------|
|
||||||
| **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
|
| **working_dir** | `str` | Directory where the cache will be stored | `lightrag_cache+timestamp` |
|
||||||
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`, `OracleKVStorage` | `JsonKVStorage` |
|
| **kv_storage** | `str` | Storage type for documents and text chunks. Supported types: `JsonKVStorage`,`PGKVStorage`,`RedisKVStorage`,`MongoKVStorage`,`TiDBKVStorage` | `JsonKVStorage` |
|
||||||
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`, `OracleVectorDBStorage` | `NanoVectorDBStorage` |
|
| **vector_storage** | `str` | Storage type for embedding vectors. Supported types: `NanoVectorDBStorage`,`PGVectorStorage`,`MilvusVectorDBStorage`,`ChromaVectorDBStorage`,`FaissVectorDBStorage`,`TiDBVectorDBStorage`,`MongoVectorDBStorage`,`QdrantVectorDBStorage` | `NanoVectorDBStorage` |
|
||||||
| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`, `Neo4JStorage`, `OracleGraphStorage` | `NetworkXStorage` |
|
| **graph_storage** | `str` | Storage type for graph edges and nodes. Supported types: `NetworkXStorage`,`Neo4JStorage`,`PGGraphStorage`,`AGEStorage`,`GremlinStorage` | `NetworkXStorage` |
|
||||||
|
| **doc_status_storage** | `str` | Storage type for documents process status. Supported types: `JsonDocStatusStorage`,`PGDocStatusStorage`,`MongoDocStatusStorage` | `JsonDocStatusStorage` |
|
||||||
| **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
|
| **chunk_token_size** | `int` | Maximum token size per chunk when splitting documents | `1200` |
|
||||||
| **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
|
| **chunk_overlap_token_size** | `int` | Overlap token size between two chunks when splitting documents | `100` |
|
||||||
| **tiktoken_model_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
|
| **tiktoken_model_name** | `str` | Model name for the Tiktoken encoder used to calculate token numbers | `gpt-4o-mini` |
|
||||||
|
@@ -13,15 +13,6 @@ uri=redis://localhost:6379/1
|
|||||||
[qdrant]
|
[qdrant]
|
||||||
uri = http://localhost:16333
|
uri = http://localhost:16333
|
||||||
|
|
||||||
[oracle]
|
|
||||||
dsn = localhost:1521/XEPDB1
|
|
||||||
user = your_username
|
|
||||||
password = your_password
|
|
||||||
config_dir = /path/to/oracle/config
|
|
||||||
wallet_location = /path/to/wallet # 可选
|
|
||||||
wallet_password = your_wallet_password # 可选
|
|
||||||
workspace = default # 可选,默认为default
|
|
||||||
|
|
||||||
[tidb]
|
[tidb]
|
||||||
host = localhost
|
host = localhost
|
||||||
port = 4000
|
port = 4000
|
||||||
|
10
env.example
@@ -109,16 +109,6 @@ LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage
|
|||||||
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
|
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage
|
||||||
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
|
LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage
|
||||||
|
|
||||||
### Oracle Database Configuration
|
|
||||||
ORACLE_DSN=localhost:1521/XEPDB1
|
|
||||||
ORACLE_USER=your_username
|
|
||||||
ORACLE_PASSWORD='your_password'
|
|
||||||
ORACLE_CONFIG_DIR=/path/to/oracle/config
|
|
||||||
#ORACLE_WALLET_LOCATION=/path/to/wallet
|
|
||||||
#ORACLE_WALLET_PASSWORD='your_password'
|
|
||||||
### separating all data from difference Lightrag instances(deprecating)
|
|
||||||
#ORACLE_WORKSPACE=default
|
|
||||||
|
|
||||||
### TiDB Configuration
|
### TiDB Configuration
|
||||||
TIDB_HOST=localhost
|
TIDB_HOST=localhost
|
||||||
TIDB_PORT=4000
|
TIDB_PORT=4000
|
||||||
|
@@ -1,267 +0,0 @@
|
|||||||
from fastapi import FastAPI, HTTPException, File, UploadFile
|
|
||||||
from fastapi import Query
|
|
||||||
from contextlib import asynccontextmanager
|
|
||||||
from pydantic import BaseModel
|
|
||||||
from typing import Optional, Any
|
|
||||||
|
|
||||||
import sys
|
|
||||||
import os
|
|
||||||
|
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
import nest_asyncio
|
|
||||||
from lightrag import LightRAG, QueryParam
|
|
||||||
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
|
|
||||||
from lightrag.utils import EmbeddingFunc
|
|
||||||
import numpy as np
|
|
||||||
from lightrag.kg.shared_storage import initialize_pipeline_status
|
|
||||||
|
|
||||||
|
|
||||||
print(os.getcwd())
|
|
||||||
script_directory = Path(__file__).resolve().parent.parent
|
|
||||||
sys.path.append(os.path.abspath(script_directory))
|
|
||||||
|
|
||||||
|
|
||||||
# Apply nest_asyncio to solve event loop issues
|
|
||||||
nest_asyncio.apply()
|
|
||||||
|
|
||||||
DEFAULT_RAG_DIR = "index_default"
|
|
||||||
|
|
||||||
|
|
||||||
# We use OpenAI compatible API to call LLM on Oracle Cloud
|
|
||||||
# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
|
|
||||||
BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
|
|
||||||
APIKEY = "ocigenerativeai"
|
|
||||||
|
|
||||||
# Configure working directory
|
|
||||||
WORKING_DIR = os.environ.get("RAG_DIR", f"{DEFAULT_RAG_DIR}")
|
|
||||||
print(f"WORKING_DIR: {WORKING_DIR}")
|
|
||||||
LLM_MODEL = os.environ.get("LLM_MODEL", "cohere.command-r-plus-08-2024")
|
|
||||||
print(f"LLM_MODEL: {LLM_MODEL}")
|
|
||||||
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "cohere.embed-multilingual-v3.0")
|
|
||||||
print(f"EMBEDDING_MODEL: {EMBEDDING_MODEL}")
|
|
||||||
EMBEDDING_MAX_TOKEN_SIZE = int(os.environ.get("EMBEDDING_MAX_TOKEN_SIZE", 512))
|
|
||||||
print(f"EMBEDDING_MAX_TOKEN_SIZE: {EMBEDDING_MAX_TOKEN_SIZE}")
|
|
||||||
|
|
||||||
if not os.path.exists(WORKING_DIR):
|
|
||||||
os.mkdir(WORKING_DIR)
|
|
||||||
|
|
||||||
os.environ["ORACLE_USER"] = ""
|
|
||||||
os.environ["ORACLE_PASSWORD"] = ""
|
|
||||||
os.environ["ORACLE_DSN"] = ""
|
|
||||||
os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
|
|
||||||
os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
|
|
||||||
os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
|
|
||||||
os.environ["ORACLE_WORKSPACE"] = "company"
|
|
||||||
|
|
||||||
|
|
||||||
async def llm_model_func(
|
|
||||||
prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
|
|
||||||
) -> str:
|
|
||||||
return await openai_complete_if_cache(
|
|
||||||
LLM_MODEL,
|
|
||||||
prompt,
|
|
||||||
system_prompt=system_prompt,
|
|
||||||
history_messages=history_messages,
|
|
||||||
api_key=APIKEY,
|
|
||||||
base_url=BASE_URL,
|
|
||||||
**kwargs,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
async def embedding_func(texts: list[str]) -> np.ndarray:
|
|
||||||
return await openai_embed(
|
|
||||||
texts,
|
|
||||||
model=EMBEDDING_MODEL,
|
|
||||||
api_key=APIKEY,
|
|
||||||
base_url=BASE_URL,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
async def get_embedding_dim():
|
|
||||||
test_text = ["This is a test sentence."]
|
|
||||||
embedding = await embedding_func(test_text)
|
|
||||||
embedding_dim = embedding.shape[1]
|
|
||||||
return embedding_dim
|
|
||||||
|
|
||||||
|
|
||||||
async def init():
|
|
||||||
# Detect embedding dimension
|
|
||||||
embedding_dimension = await get_embedding_dim()
|
|
||||||
print(f"Detected embedding dimension: {embedding_dimension}")
|
|
||||||
# Create Oracle DB connection
|
|
||||||
# The `config` parameter is the connection configuration of Oracle DB
|
|
||||||
# More docs here https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html
|
|
||||||
# We storage data in unified tables, so we need to set a `workspace` parameter to specify which docs we want to store and query
|
|
||||||
# Below is an example of how to connect to Oracle Autonomous Database on Oracle Cloud
|
|
||||||
|
|
||||||
# Initialize LightRAG
|
|
||||||
# We use Oracle DB as the KV/vector/graph storage
|
|
||||||
rag = LightRAG(
|
|
||||||
enable_llm_cache=False,
|
|
||||||
working_dir=WORKING_DIR,
|
|
||||||
chunk_token_size=512,
|
|
||||||
llm_model_func=llm_model_func,
|
|
||||||
embedding_func=EmbeddingFunc(
|
|
||||||
embedding_dim=embedding_dimension,
|
|
||||||
max_token_size=512,
|
|
||||||
func=embedding_func,
|
|
||||||
),
|
|
||||||
graph_storage="OracleGraphStorage",
|
|
||||||
kv_storage="OracleKVStorage",
|
|
||||||
vector_storage="OracleVectorDBStorage",
|
|
||||||
)
|
|
||||||
|
|
||||||
await rag.initialize_storages()
|
|
||||||
await initialize_pipeline_status()
|
|
||||||
|
|
||||||
return rag
|
|
||||||
|
|
||||||
|
|
||||||
# Extract and Insert into LightRAG storage
|
|
||||||
# with open("./dickens/book.txt", "r", encoding="utf-8") as f:
|
|
||||||
# await rag.ainsert(f.read())
|
|
||||||
|
|
||||||
# # Perform search in different modes
|
|
||||||
# modes = ["naive", "local", "global", "hybrid"]
|
|
||||||
# for mode in modes:
|
|
||||||
# print("="*20, mode, "="*20)
|
|
||||||
# print(await rag.aquery("这篇文档是关于什么内容的?", param=QueryParam(mode=mode)))
|
|
||||||
# print("-"*100, "\n")
|
|
||||||
|
|
||||||
# Data models
|
|
||||||
|
|
||||||
|
|
||||||
class QueryRequest(BaseModel):
|
|
||||||
query: str
|
|
||||||
mode: str = "hybrid"
|
|
||||||
only_need_context: bool = False
|
|
||||||
only_need_prompt: bool = False
|
|
||||||
|
|
||||||
|
|
||||||
class DataRequest(BaseModel):
|
|
||||||
limit: int = 100
|
|
||||||
|
|
||||||
|
|
||||||
class InsertRequest(BaseModel):
|
|
||||||
text: str
|
|
||||||
|
|
||||||
|
|
||||||
class Response(BaseModel):
|
|
||||||
status: str
|
|
||||||
data: Optional[Any] = None
|
|
||||||
message: Optional[str] = None
|
|
||||||
|
|
||||||
|
|
||||||
# API routes
|
|
||||||
|
|
||||||
rag = None
|
|
||||||
|
|
||||||
|
|
||||||
@asynccontextmanager
|
|
||||||
async def lifespan(app: FastAPI):
|
|
||||||
global rag
|
|
||||||
rag = await init()
|
|
||||||
print("done!")
|
|
||||||
yield
|
|
||||||
|
|
||||||
|
|
||||||
app = FastAPI(
|
|
||||||
title="LightRAG API", description="API for RAG operations", lifespan=lifespan
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
@app.post("/query", response_model=Response)
|
|
||||||
async def query_endpoint(request: QueryRequest):
|
|
||||||
# try:
|
|
||||||
# loop = asyncio.get_event_loop()
|
|
||||||
if request.mode == "naive":
|
|
||||||
top_k = 3
|
|
||||||
else:
|
|
||||||
top_k = 60
|
|
||||||
result = await rag.aquery(
|
|
||||||
request.query,
|
|
||||||
param=QueryParam(
|
|
||||||
mode=request.mode,
|
|
||||||
only_need_context=request.only_need_context,
|
|
||||||
only_need_prompt=request.only_need_prompt,
|
|
||||||
top_k=top_k,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
return Response(status="success", data=result)
|
|
||||||
# except Exception as e:
|
|
||||||
# raise HTTPException(status_code=500, detail=str(e))
|
|
||||||
|
|
||||||
|
|
||||||
@app.get("/data", response_model=Response)
|
|
||||||
async def query_all_nodes(type: str = Query("nodes"), limit: int = Query(100)):
|
|
||||||
if type == "nodes":
|
|
||||||
result = await rag.chunk_entity_relation_graph.get_all_nodes(limit=limit)
|
|
||||||
elif type == "edges":
|
|
||||||
result = await rag.chunk_entity_relation_graph.get_all_edges(limit=limit)
|
|
||||||
elif type == "statistics":
|
|
||||||
result = await rag.chunk_entity_relation_graph.get_statistics()
|
|
||||||
return Response(status="success", data=result)
|
|
||||||
|
|
||||||
|
|
||||||
@app.post("/insert", response_model=Response)
|
|
||||||
async def insert_endpoint(request: InsertRequest):
|
|
||||||
try:
|
|
||||||
loop = asyncio.get_event_loop()
|
|
||||||
await loop.run_in_executor(None, lambda: rag.insert(request.text))
|
|
||||||
return Response(status="success", message="Text inserted successfully")
|
|
||||||
except Exception as e:
|
|
||||||
raise HTTPException(status_code=500, detail=str(e))
|
|
||||||
|
|
||||||
|
|
||||||
@app.post("/insert_file", response_model=Response)
|
|
||||||
async def insert_file(file: UploadFile = File(...)):
|
|
||||||
try:
|
|
||||||
file_content = await file.read()
|
|
||||||
# Read file content
|
|
||||||
try:
|
|
||||||
content = file_content.decode("utf-8")
|
|
||||||
except UnicodeDecodeError:
|
|
||||||
# If UTF-8 decoding fails, try other encodings
|
|
||||||
content = file_content.decode("gbk")
|
|
||||||
# Insert file content
|
|
||||||
loop = asyncio.get_event_loop()
|
|
||||||
await loop.run_in_executor(None, lambda: rag.insert(content))
|
|
||||||
|
|
||||||
return Response(
|
|
||||||
status="success",
|
|
||||||
message=f"File content from {file.filename} inserted successfully",
|
|
||||||
)
|
|
||||||
except Exception as e:
|
|
||||||
raise HTTPException(status_code=500, detail=str(e))
|
|
||||||
|
|
||||||
|
|
||||||
@app.get("/health")
|
|
||||||
async def health_check():
|
|
||||||
return {"status": "healthy"}
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
import uvicorn
|
|
||||||
|
|
||||||
uvicorn.run(app, host="127.0.0.1", port=8020)
|
|
||||||
|
|
||||||
# Usage example
|
|
||||||
# To run the server, use the following command in your terminal:
|
|
||||||
# python lightrag_api_openai_compatible_demo.py
|
|
||||||
|
|
||||||
# Example requests:
|
|
||||||
# 1. Query:
|
|
||||||
# curl -X POST "http://127.0.0.1:8020/query" -H "Content-Type: application/json" -d '{"query": "your query here", "mode": "hybrid"}'
|
|
||||||
|
|
||||||
# 2. Insert text:
|
|
||||||
# curl -X POST "http://127.0.0.1:8020/insert" -H "Content-Type: application/json" -d '{"text": "your text here"}'
|
|
||||||
|
|
||||||
# 3. Insert file:
|
|
||||||
# curl -X POST "http://127.0.0.1:8020/insert_file" -H "Content-Type: multipart/form-data" -F "file=@path/to/your/file.txt"
|
|
||||||
|
|
||||||
|
|
||||||
# 4. Health check:
|
|
||||||
# curl -X GET "http://127.0.0.1:8020/health"
|
|
@@ -1,141 +0,0 @@
|
|||||||
import sys
|
|
||||||
import os
|
|
||||||
from pathlib import Path
|
|
||||||
import asyncio
|
|
||||||
from lightrag import LightRAG, QueryParam
|
|
||||||
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
|
|
||||||
from lightrag.utils import EmbeddingFunc
|
|
||||||
import numpy as np
|
|
||||||
from lightrag.kg.shared_storage import initialize_pipeline_status
|
|
||||||
|
|
||||||
print(os.getcwd())
|
|
||||||
script_directory = Path(__file__).resolve().parent.parent
|
|
||||||
sys.path.append(os.path.abspath(script_directory))
|
|
||||||
|
|
||||||
WORKING_DIR = "./dickens"
|
|
||||||
|
|
||||||
# We use OpenAI compatible API to call LLM on Oracle Cloud
|
|
||||||
# More docs here https://github.com/jin38324/OCI_GenAI_access_gateway
|
|
||||||
BASE_URL = "http://xxx.xxx.xxx.xxx:8088/v1/"
|
|
||||||
APIKEY = "ocigenerativeai"
|
|
||||||
CHATMODEL = "cohere.command-r-plus"
|
|
||||||
EMBEDMODEL = "cohere.embed-multilingual-v3.0"
|
|
||||||
CHUNK_TOKEN_SIZE = 1024
|
|
||||||
MAX_TOKENS = 4000
|
|
||||||
|
|
||||||
if not os.path.exists(WORKING_DIR):
|
|
||||||
os.mkdir(WORKING_DIR)
|
|
||||||
|
|
||||||
os.environ["ORACLE_USER"] = "username"
|
|
||||||
os.environ["ORACLE_PASSWORD"] = "xxxxxxxxx"
|
|
||||||
os.environ["ORACLE_DSN"] = "xxxxxxx_medium"
|
|
||||||
os.environ["ORACLE_CONFIG_DIR"] = "path_to_config_dir"
|
|
||||||
os.environ["ORACLE_WALLET_LOCATION"] = "path_to_wallet_location"
|
|
||||||
os.environ["ORACLE_WALLET_PASSWORD"] = "wallet_password"
|
|
||||||
os.environ["ORACLE_WORKSPACE"] = "company"
|
|
||||||
|
|
||||||
|
|
||||||
async def llm_model_func(
|
|
||||||
prompt, system_prompt=None, history_messages=[], keyword_extraction=False, **kwargs
|
|
||||||
) -> str:
|
|
||||||
return await openai_complete_if_cache(
|
|
||||||
CHATMODEL,
|
|
||||||
prompt,
|
|
||||||
system_prompt=system_prompt,
|
|
||||||
history_messages=history_messages,
|
|
||||||
api_key=APIKEY,
|
|
||||||
base_url=BASE_URL,
|
|
||||||
**kwargs,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
async def embedding_func(texts: list[str]) -> np.ndarray:
|
|
||||||
return await openai_embed(
|
|
||||||
texts,
|
|
||||||
model=EMBEDMODEL,
|
|
||||||
api_key=APIKEY,
|
|
||||||
base_url=BASE_URL,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
async def get_embedding_dim():
|
|
||||||
test_text = ["This is a test sentence."]
|
|
||||||
embedding = await embedding_func(test_text)
|
|
||||||
embedding_dim = embedding.shape[1]
|
|
||||||
return embedding_dim
|
|
||||||
|
|
||||||
|
|
||||||
async def initialize_rag():
|
|
||||||
# Detect embedding dimension
|
|
||||||
embedding_dimension = await get_embedding_dim()
|
|
||||||
print(f"Detected embedding dimension: {embedding_dimension}")
|
|
||||||
|
|
||||||
# Initialize LightRAG
|
|
||||||
# We use Oracle DB as the KV/vector/graph storage
|
|
||||||
# You can add `addon_params={"example_number": 1, "language": "Simplfied Chinese"}` to control the prompt
|
|
||||||
rag = LightRAG(
|
|
||||||
# log_level="DEBUG",
|
|
||||||
working_dir=WORKING_DIR,
|
|
||||||
entity_extract_max_gleaning=1,
|
|
||||||
enable_llm_cache=True,
|
|
||||||
enable_llm_cache_for_entity_extract=True,
|
|
||||||
embedding_cache_config=None, # {"enabled": True,"similarity_threshold": 0.90},
|
|
||||||
chunk_token_size=CHUNK_TOKEN_SIZE,
|
|
||||||
llm_model_max_token_size=MAX_TOKENS,
|
|
||||||
llm_model_func=llm_model_func,
|
|
||||||
embedding_func=EmbeddingFunc(
|
|
||||||
embedding_dim=embedding_dimension,
|
|
||||||
max_token_size=500,
|
|
||||||
func=embedding_func,
|
|
||||||
),
|
|
||||||
graph_storage="OracleGraphStorage",
|
|
||||||
kv_storage="OracleKVStorage",
|
|
||||||
vector_storage="OracleVectorDBStorage",
|
|
||||||
addon_params={
|
|
||||||
"example_number": 1,
|
|
||||||
"language": "Simplfied Chinese",
|
|
||||||
"entity_types": ["organization", "person", "geo", "event"],
|
|
||||||
"insert_batch_size": 2,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
await rag.initialize_storages()
|
|
||||||
await initialize_pipeline_status()
|
|
||||||
|
|
||||||
return rag
|
|
||||||
|
|
||||||
|
|
||||||
async def main():
|
|
||||||
try:
|
|
||||||
# Initialize RAG instance
|
|
||||||
rag = await initialize_rag()
|
|
||||||
|
|
||||||
# Extract and Insert into LightRAG storage
|
|
||||||
with open(WORKING_DIR + "/docs.txt", "r", encoding="utf-8") as f:
|
|
||||||
all_text = f.read()
|
|
||||||
texts = [x for x in all_text.split("\n") if x]
|
|
||||||
|
|
||||||
# New mode use pipeline
|
|
||||||
await rag.apipeline_enqueue_documents(texts)
|
|
||||||
await rag.apipeline_process_enqueue_documents()
|
|
||||||
|
|
||||||
# Old method use ainsert
|
|
||||||
# await rag.ainsert(texts)
|
|
||||||
|
|
||||||
# Perform search in different modes
|
|
||||||
modes = ["naive", "local", "global", "hybrid"]
|
|
||||||
for mode in modes:
|
|
||||||
print("=" * 20, mode, "=" * 20)
|
|
||||||
print(
|
|
||||||
await rag.aquery(
|
|
||||||
"What are the top themes in this story?",
|
|
||||||
param=QueryParam(mode=mode),
|
|
||||||
)
|
|
||||||
)
|
|
||||||
print("-" * 100, "\n")
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
print(f"An error occurred: {e}")
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
asyncio.run(main())
|
|
@@ -291,11 +291,10 @@ LightRAG 使用 4 种类型的存储用于不同目的:
|
|||||||
|
|
||||||
```
|
```
|
||||||
JsonKVStorage JsonFile(默认)
|
JsonKVStorage JsonFile(默认)
|
||||||
MongoKVStorage MogonDB
|
|
||||||
RedisKVStorage Redis
|
|
||||||
TiDBKVStorage TiDB
|
|
||||||
PGKVStorage Postgres
|
PGKVStorage Postgres
|
||||||
OracleKVStorage Oracle
|
RedisKVStorage Redis
|
||||||
|
MongoKVStorage MogonDB
|
||||||
|
TiDBKVStorage TiDB
|
||||||
```
|
```
|
||||||
|
|
||||||
* GRAPH_STORAGE 支持的实现名称
|
* GRAPH_STORAGE 支持的实现名称
|
||||||
@@ -303,25 +302,21 @@ OracleKVStorage Oracle
|
|||||||
```
|
```
|
||||||
NetworkXStorage NetworkX(默认)
|
NetworkXStorage NetworkX(默认)
|
||||||
Neo4JStorage Neo4J
|
Neo4JStorage Neo4J
|
||||||
MongoGraphStorage MongoDB
|
PGGraphStorage Postgres
|
||||||
TiDBGraphStorage TiDB
|
|
||||||
AGEStorage AGE
|
AGEStorage AGE
|
||||||
GremlinStorage Gremlin
|
GremlinStorage Gremlin
|
||||||
PGGraphStorage Postgres
|
|
||||||
OracleGraphStorage Postgres
|
|
||||||
```
|
```
|
||||||
|
|
||||||
* VECTOR_STORAGE 支持的实现名称
|
* VECTOR_STORAGE 支持的实现名称
|
||||||
|
|
||||||
```
|
```
|
||||||
NanoVectorDBStorage NanoVector(默认)
|
NanoVectorDBStorage NanoVector(默认)
|
||||||
|
PGVectorStorage Postgres
|
||||||
MilvusVectorDBStorge Milvus
|
MilvusVectorDBStorge Milvus
|
||||||
ChromaVectorDBStorage Chroma
|
ChromaVectorDBStorage Chroma
|
||||||
TiDBVectorDBStorage TiDB
|
|
||||||
PGVectorStorage Postgres
|
|
||||||
FaissVectorDBStorage Faiss
|
FaissVectorDBStorage Faiss
|
||||||
QdrantVectorDBStorage Qdrant
|
QdrantVectorDBStorage Qdrant
|
||||||
OracleVectorDBStorage Oracle
|
TiDBVectorDBStorage TiDB
|
||||||
MongoVectorDBStorage MongoDB
|
MongoVectorDBStorage MongoDB
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@@ -302,11 +302,10 @@ Each storage type have servals implementations:
|
|||||||
|
|
||||||
```
|
```
|
||||||
JsonKVStorage JsonFile(default)
|
JsonKVStorage JsonFile(default)
|
||||||
MongoKVStorage MogonDB
|
|
||||||
RedisKVStorage Redis
|
|
||||||
TiDBKVStorage TiDB
|
|
||||||
PGKVStorage Postgres
|
PGKVStorage Postgres
|
||||||
OracleKVStorage Oracle
|
RedisKVStorage Redis
|
||||||
|
MongoKVStorage MogonDB
|
||||||
|
TiDBKVStorage TiDB
|
||||||
```
|
```
|
||||||
|
|
||||||
* GRAPH_STORAGE supported implement-name
|
* GRAPH_STORAGE supported implement-name
|
||||||
@@ -314,25 +313,21 @@ OracleKVStorage Oracle
|
|||||||
```
|
```
|
||||||
NetworkXStorage NetworkX(defualt)
|
NetworkXStorage NetworkX(defualt)
|
||||||
Neo4JStorage Neo4J
|
Neo4JStorage Neo4J
|
||||||
MongoGraphStorage MongoDB
|
PGGraphStorage Postgres
|
||||||
TiDBGraphStorage TiDB
|
|
||||||
AGEStorage AGE
|
AGEStorage AGE
|
||||||
GremlinStorage Gremlin
|
GremlinStorage Gremlin
|
||||||
PGGraphStorage Postgres
|
|
||||||
OracleGraphStorage Postgres
|
|
||||||
```
|
```
|
||||||
|
|
||||||
* VECTOR_STORAGE supported implement-name
|
* VECTOR_STORAGE supported implement-name
|
||||||
|
|
||||||
```
|
```
|
||||||
NanoVectorDBStorage NanoVector(default)
|
NanoVectorDBStorage NanoVector(default)
|
||||||
MilvusVectorDBStorage Milvus
|
|
||||||
ChromaVectorDBStorage Chroma
|
|
||||||
TiDBVectorDBStorage TiDB
|
|
||||||
PGVectorStorage Postgres
|
PGVectorStorage Postgres
|
||||||
|
MilvusVectorDBStorge Milvus
|
||||||
|
ChromaVectorDBStorage Chroma
|
||||||
FaissVectorDBStorage Faiss
|
FaissVectorDBStorage Faiss
|
||||||
QdrantVectorDBStorage Qdrant
|
QdrantVectorDBStorage Qdrant
|
||||||
OracleVectorDBStorage Oracle
|
TiDBVectorDBStorage TiDB
|
||||||
MongoVectorDBStorage MongoDB
|
MongoVectorDBStorage MongoDB
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@@ -6,7 +6,6 @@ STORAGE_IMPLEMENTATIONS = {
|
|||||||
"RedisKVStorage",
|
"RedisKVStorage",
|
||||||
"TiDBKVStorage",
|
"TiDBKVStorage",
|
||||||
"PGKVStorage",
|
"PGKVStorage",
|
||||||
"OracleKVStorage",
|
|
||||||
],
|
],
|
||||||
"required_methods": ["get_by_id", "upsert"],
|
"required_methods": ["get_by_id", "upsert"],
|
||||||
},
|
},
|
||||||
@@ -19,7 +18,6 @@ STORAGE_IMPLEMENTATIONS = {
|
|||||||
"AGEStorage",
|
"AGEStorage",
|
||||||
"GremlinStorage",
|
"GremlinStorage",
|
||||||
"PGGraphStorage",
|
"PGGraphStorage",
|
||||||
# "OracleGraphStorage",
|
|
||||||
],
|
],
|
||||||
"required_methods": ["upsert_node", "upsert_edge"],
|
"required_methods": ["upsert_node", "upsert_edge"],
|
||||||
},
|
},
|
||||||
@@ -32,7 +30,6 @@ STORAGE_IMPLEMENTATIONS = {
|
|||||||
"PGVectorStorage",
|
"PGVectorStorage",
|
||||||
"FaissVectorDBStorage",
|
"FaissVectorDBStorage",
|
||||||
"QdrantVectorDBStorage",
|
"QdrantVectorDBStorage",
|
||||||
"OracleVectorDBStorage",
|
|
||||||
"MongoVectorDBStorage",
|
"MongoVectorDBStorage",
|
||||||
],
|
],
|
||||||
"required_methods": ["query", "upsert"],
|
"required_methods": ["query", "upsert"],
|
||||||
@@ -41,7 +38,6 @@ STORAGE_IMPLEMENTATIONS = {
|
|||||||
"implementations": [
|
"implementations": [
|
||||||
"JsonDocStatusStorage",
|
"JsonDocStatusStorage",
|
||||||
"PGDocStatusStorage",
|
"PGDocStatusStorage",
|
||||||
"PGDocStatusStorage",
|
|
||||||
"MongoDocStatusStorage",
|
"MongoDocStatusStorage",
|
||||||
],
|
],
|
||||||
"required_methods": ["get_docs_by_status"],
|
"required_methods": ["get_docs_by_status"],
|
||||||
@@ -56,12 +52,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
|
|||||||
"RedisKVStorage": ["REDIS_URI"],
|
"RedisKVStorage": ["REDIS_URI"],
|
||||||
"TiDBKVStorage": ["TIDB_USER", "TIDB_PASSWORD", "TIDB_DATABASE"],
|
"TiDBKVStorage": ["TIDB_USER", "TIDB_PASSWORD", "TIDB_DATABASE"],
|
||||||
"PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
|
"PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
|
||||||
"OracleKVStorage": [
|
|
||||||
"ORACLE_DSN",
|
|
||||||
"ORACLE_USER",
|
|
||||||
"ORACLE_PASSWORD",
|
|
||||||
"ORACLE_CONFIG_DIR",
|
|
||||||
],
|
|
||||||
# Graph Storage Implementations
|
# Graph Storage Implementations
|
||||||
"NetworkXStorage": [],
|
"NetworkXStorage": [],
|
||||||
"Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
|
"Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
|
||||||
@@ -78,12 +68,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
|
|||||||
"POSTGRES_PASSWORD",
|
"POSTGRES_PASSWORD",
|
||||||
"POSTGRES_DATABASE",
|
"POSTGRES_DATABASE",
|
||||||
],
|
],
|
||||||
"OracleGraphStorage": [
|
|
||||||
"ORACLE_DSN",
|
|
||||||
"ORACLE_USER",
|
|
||||||
"ORACLE_PASSWORD",
|
|
||||||
"ORACLE_CONFIG_DIR",
|
|
||||||
],
|
|
||||||
# Vector Storage Implementations
|
# Vector Storage Implementations
|
||||||
"NanoVectorDBStorage": [],
|
"NanoVectorDBStorage": [],
|
||||||
"MilvusVectorDBStorage": [],
|
"MilvusVectorDBStorage": [],
|
||||||
@@ -92,12 +76,6 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
|
|||||||
"PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
|
"PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
|
||||||
"FaissVectorDBStorage": [],
|
"FaissVectorDBStorage": [],
|
||||||
"QdrantVectorDBStorage": ["QDRANT_URL"], # QDRANT_API_KEY has default value None
|
"QdrantVectorDBStorage": ["QDRANT_URL"], # QDRANT_API_KEY has default value None
|
||||||
"OracleVectorDBStorage": [
|
|
||||||
"ORACLE_DSN",
|
|
||||||
"ORACLE_USER",
|
|
||||||
"ORACLE_PASSWORD",
|
|
||||||
"ORACLE_CONFIG_DIR",
|
|
||||||
],
|
|
||||||
"MongoVectorDBStorage": [],
|
"MongoVectorDBStorage": [],
|
||||||
# Document Status Storage Implementations
|
# Document Status Storage Implementations
|
||||||
"JsonDocStatusStorage": [],
|
"JsonDocStatusStorage": [],
|
||||||
@@ -112,9 +90,6 @@ STORAGES = {
|
|||||||
"NanoVectorDBStorage": ".kg.nano_vector_db_impl",
|
"NanoVectorDBStorage": ".kg.nano_vector_db_impl",
|
||||||
"JsonDocStatusStorage": ".kg.json_doc_status_impl",
|
"JsonDocStatusStorage": ".kg.json_doc_status_impl",
|
||||||
"Neo4JStorage": ".kg.neo4j_impl",
|
"Neo4JStorage": ".kg.neo4j_impl",
|
||||||
"OracleKVStorage": ".kg.oracle_impl",
|
|
||||||
"OracleGraphStorage": ".kg.oracle_impl",
|
|
||||||
"OracleVectorDBStorage": ".kg.oracle_impl",
|
|
||||||
"MilvusVectorDBStorage": ".kg.milvus_impl",
|
"MilvusVectorDBStorage": ".kg.milvus_impl",
|
||||||
"MongoKVStorage": ".kg.mongo_impl",
|
"MongoKVStorage": ".kg.mongo_impl",
|
||||||
"MongoDocStatusStorage": ".kg.mongo_impl",
|
"MongoDocStatusStorage": ".kg.mongo_impl",
|
||||||
|
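The hunks above drop every Oracle entry from `STORAGE_IMPLEMENTATIONS`, `STORAGE_ENV_REQUIREMENTS`, and `STORAGES`. As a rough illustration of what that means for callers, the sketch below shows how a registry shaped like `STORAGE_ENV_REQUIREMENTS` could be used to check a configured storage name. The dictionary is only a subset copied from the diff, and `missing_env_vars` is a hypothetical helper, not a function from LightRAG.

```python
import os
from typing import Mapping, Optional

# Subset of the post-removal registry, copied from the diff above.
# The real dictionary in lightrag/kg/__init__.py has more entries.
STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
    "RedisKVStorage": ["REDIS_URI"],
    "PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
    "NetworkXStorage": [],
    "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
    "QdrantVectorDBStorage": ["QDRANT_URL"],
}


def missing_env_vars(
    storage_name: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Hypothetical helper: list required variables that are unset or empty.

    Raises ValueError for names that are not in the registry, which is what
    a removed name such as "OracleKVStorage" would now be.
    """
    env = os.environ if env is None else env
    if storage_name not in STORAGE_ENV_REQUIREMENTS:
        raise ValueError(f"Unknown storage implementation: {storage_name}")
    return [
        var for var in STORAGE_ENV_REQUIREMENTS[storage_name] if not env.get(var)
    ]


if __name__ == "__main__":
    # Check the Neo4J requirements against the current process environment.
    print(missing_env_vars("Neo4JStorage"))
```

Under this sketch, a deployment that still sets `kv_storage="OracleKVStorage"` would fail the lookup instead of reporting missing `ORACLE_*` variables. How LightRAG itself reports an unregistered name is not shown in this diff.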
File diff suppressed because it is too large