update

README.md
# LightRAG: Simple and Fast Retrieval-Augmented Generation

<a href='https://github.com/HKUDS/GraphEdit'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/arXiv--b31b1b'></a>

This repository hosts the code of LightRAG. The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).

## Install

* Install from source

```bash
cd LightRAG
pip install -e .
```

* Install from PyPI

```bash
pip install lightrag-hku
```

## Quick Start

* Set your OpenAI API key in the environment: `export OPENAI_API_KEY="sk-..."`
* Download the demo text, "A Christmas Carol" by Charles Dickens:

```bash
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
```

Use the Python snippet below:

```python
from lightrag import LightRAG, QueryParam

rag = LightRAG(working_dir="./dickens")

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))

# Perform local search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="local")))

# Perform global search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="global")))

# Perform hybrid search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
```

Batch Insert

```python
rag.insert(["TEXT1", "TEXT2", ...])
```
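
As a sketch of how batch insert pairs with a folder of documents (the `./corpus` path and glob pattern are illustrative assumptions, not part of the repository):

```python
import glob

from lightrag import LightRAG

rag = LightRAG(working_dir="./dickens")

# Read every text file from a hypothetical corpus folder...
texts = []
for path in glob.glob("./corpus/*.txt"):
    with open(path) as f:
        texts.append(f.read())

# ...and insert them all in a single batch call.
rag.insert(texts)
```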

Incremental Insert

```python
rag = LightRAG(working_dir="./dickens")

# Reusing the same working_dir adds the new text to the existing index.
with open("./newText.txt") as f:
    rag.insert(f.read())
```

## Evaluation

### Dataset

The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).

### Generate Query

LightRAG uses the following prompt to generate high-level queries, with the corresponding code in `examples/generate_query.py`.

```
Given the following description of a dataset:

{description}

Please identify 5 potential users who would engage with this dataset. For each user, list 5 tasks they would perform with this dataset. Then, for each (user, task) combination, generate 5 questions that require a high-level understanding of the entire dataset.

Output the results in the following structure:
- User 1: [user description]
    - Task 1: [task description]
        - Question 1:
        - Question 2:
        - Question 3:
        - Question 4:
        - Question 5:
    - Task 2: [task description]
        ...
    - Task 5: [task description]
- User 2: [user description]
    ...
- User 5: [user description]
    ...
```
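
As a rough sketch of how a script like `examples/generate_query.py` might drive this prompt (the OpenAI client usage and model choice are assumptions here, and the dataset description is a placeholder):

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Placeholder description; in practice this summarizes one UltraDomain subset.
description = "A collection of ... documents about ..."

prompt = f"""Given the following description of a dataset:

{description}

Please identify 5 potential users who would engage with this dataset. For each
user, list 5 tasks they would perform with this dataset. Then, for each
(user, task) combination, generate 5 questions that require a high-level
understanding of the entire dataset."""

# One chat completion returns the full user/task/question hierarchy as text.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```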

### Batch Eval

To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `examples/batch_eval.py`.

```
---Role---
You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

---Goal---
You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

- **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
- **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
- **Empowerment**: How well does the answer help the reader understand and make informed judgments about the topic?

For each criterion, choose the better answer (either Answer 1 or Answer 2) and explain why. Then, select an overall winner based on these three categories.

Here is the question:
{query}

Here are the two answers:

**Answer 1:**
{answer1}

**Answer 2:**
{answer2}

Evaluate both answers using the three criteria listed above and provide detailed explanations for each criterion.

Output your evaluation in the following JSON format:

{{
    "Comprehensiveness": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Diversity": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Empowerment": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Overall Winner": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Summarize why this answer is the overall winner based on the three criteria]"
    }}
}}
```
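
A minimal sketch of how this evaluation can be driven from Python (the prompt is abbreviated relative to the full one above, and the query/answer values are stand-ins for records loaded from the two systems' result files):

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment, as in examples/batch_eval.py.
client = OpenAI()

# Abbreviated stand-in for the full evaluation prompt shown above.
EVAL_PROMPT = """You are an expert evaluating two answers to the same question on
**Comprehensiveness**, **Diversity**, and **Empowerment**.

Here is the question:
{query}

**Answer 1:**
{answer1}

**Answer 2:**
{answer2}

For each criterion pick a winner and explain why, then pick an overall winner,
in the JSON format described above."""

# Hypothetical inputs; in practice these come from the query and result files.
pairs = [("What are the top themes in this story?", "answer one ...", "answer two ...")]

for query, answer1, answer2 in pairs:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": EVAL_PROMPT.format(query=query, answer1=answer1, answer2=answer2),
        }],
    )
    print(response.choices[0].message.content)
```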

## Code Structure

```
.
├── examples
│   ├── batch_eval.py
│   ├── generate_query.py
│   ├── insert.py
│   └── query.py
├── lightrag
│   ├── __init__.py
│   ├── base.py
│   ├── lightrag.py
│   ├── llm.py
│   ├── operate.py
│   ├── prompt.py
│   ├── storage.py
│   └── utils.py
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
```

## Citation

```
@article{guo2024lightrag,
  title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
  author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
  year={2024},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
```

## Acknowledgement

The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).

examples/batch_eval.py

```diff
@@ -6,8 +6,8 @@ import jsonlines
 from openai import OpenAI


-def batch_eval(query_file, result1_file, result2_file, output_file_path, api_key):
-    client = OpenAI(api_key=api_key)
+def batch_eval(query_file, result1_file, result2_file, output_file_path):
+    client = OpenAI()

     with open(query_file, 'r') as f:
         data = f.read()
```

lightrag/llm.py

```diff
@@ -2,7 +2,7 @@ import os

 from openai import OpenAI

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""


 def openai_complete_if_cache(
     model="gpt-4o-mini", prompt=None, system_prompt=None, history_messages=[], **kwargs
```

examples/insert.py

```diff
@@ -3,7 +3,7 @@ import sys

 from lightrag import LightRAG

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""

 WORKING_DIR = ""
```

examples/query.py

```diff
@@ -3,7 +3,7 @@ import sys

 from lightrag import LightRAG, QueryParam

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""

 WORKING_DIR = ""
```