# LightRAG: Simple and Fast Retrieval-Augmented Generation
<a href='https://github.com/HKUDS/LightRAG'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/arXiv--b31b1b'></a>
This repository hosts the code of LightRAG. The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).

## Install
* Install from source

```bash
cd LightRAG
pip install -e .
```

* Install from PyPI

```bash
pip install lightrag-hku
```
## Quick Start

* Set your OpenAI API key in the environment: `export OPENAI_API_KEY="sk-..."`.
* Download the demo text "A Christmas Carol" by Charles Dickens:

```bash
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
```

Use the Python snippet below:
```python
from lightrag import LightRAG, QueryParam

rag = LightRAG(working_dir="./dickens")

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))

# Perform local search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="local")))

# Perform global search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="global")))

# Perform hybrid search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
```
Batch Insert

```python
rag.insert(["TEXT1", "TEXT2", ...])
```
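Batch insert pairs naturally with reading a directory of documents into the list that `insert` expects. A minimal sketch (the file names and temporary directory here are made-up examples, not part of LightRAG):

```python
import tempfile
from pathlib import Path

# Write two sample documents, then gather them the way a batch insert expects.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.txt").write_text("TEXT1")
    Path(d, "b.txt").write_text("TEXT2")
    texts = sorted(p.read_text() for p in Path(d).glob("*.txt"))

print(texts)  # ['TEXT1', 'TEXT2']
# A LightRAG instance can then ingest the whole list in one call:
# rag.insert(texts)
```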
Incremental Insert

```python
rag = LightRAG(working_dir="./dickens")

with open("./newText.txt") as f:
    rag.insert(f.read())
```
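One way to drive incremental inserts is to remember which files have already been ingested and only pass unseen text on to `insert`. A rough sketch; the `seen` bookkeeping and file names are assumptions for illustration, not LightRAG features:

```python
import tempfile
from pathlib import Path

# Hypothetical bookkeeping: keep a set of file names already inserted, and
# collect only the text of files not seen before.
with tempfile.TemporaryDirectory() as d:
    for name in ("old.txt", "new.txt"):
        Path(d, name).write_text(f"contents of {name}")
    seen = {"old.txt"}
    new_texts = [p.read_text() for p in sorted(Path(d).glob("*.txt"))
                 if p.name not in seen]

print(new_texts)  # only the unseen file's text
# rag.insert(new_texts)  # hand just the new documents to LightRAG
```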
## Evaluation

### Dataset

The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).
### Generate Query

LightRAG uses the following prompt to generate high-level queries, with the corresponding code located in `examples/generate_query.py`.
```
Given the following description of a dataset:

{description}

Please identify 5 potential users who would engage with this dataset. For each user, list 5 tasks they would perform with this dataset. Then, for each (user, task) combination, generate 5 questions that require a high-level understanding of the entire dataset.

Output the results in the following structure:
- User 1: [user description]
    - Task 1: [task description]
        - Question 1:
        - Question 2:
        - Question 3:
        - Question 4:
        - Question 5:
    - Task 2: [task description]
    ...
    - Task 5: [task description]
- User 2: [user description]
    ...
- User 5: [user description]
    ...
```
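As a rough sketch of what `examples/generate_query.py` does with this template: substitute a dataset description into the `{description}` slot before sending the prompt to an LLM. The template is trimmed here, and the description string is a made-up placeholder:

```python
# Hypothetical, trimmed version of the query-generation template.
prompt_template = """Given the following description of a dataset:

{description}

Please identify 5 potential users who would engage with this dataset."""

description = "A collection of 19th-century English novels."  # made-up example
prompt = prompt_template.format(description=description)

print(prompt)
```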
### Batch Eval

To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `examples/batch_eval.py`.
```
---Role---
You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

---Goal---
You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

- **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
- **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
- **Empowerment**: How well does the answer help the reader understand and make informed judgments about the topic?

For each criterion, choose the better answer (either Answer 1 or Answer 2) and explain why. Then, select an overall winner based on these three categories.

Here is the question:
{query}

Here are the two answers:

**Answer 1:**
{answer1}

**Answer 2:**
{answer2}

Evaluate both answers using the three criteria listed above and provide detailed explanations for each criterion.

Output your evaluation in the following JSON format:

{{
    "Comprehensiveness": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Diversity": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Empowerment": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Overall Winner": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Summarize why this answer is the overall winner based on the three criteria]"
    }}
}}
```
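The doubled braces in the prompt (`{{ ... }}`) are there so that Python's `str.format` leaves literal JSON braces intact while substituting `{query}`, `{answer1}`, and `{answer2}`. A minimal sketch; the template string and model reply below are trimmed, made-up stand-ins for the full prompt:

```python
import json

# {{ and }} become literal { and } after .format, so the JSON skeleton survives.
template = ('Here is the question:\n{query}\n\n'
            'Output your evaluation in the following JSON format:\n'
            '{{"Overall Winner": {{"Winner": "[Answer 1 or Answer 2]"}}}}')
filled = template.format(query="What are the top themes?",
                         answer1="...", answer2="...")

# A model reply in that format can be decoded straight into a dict:
reply = '{"Overall Winner": {"Winner": "Answer 1", "Explanation": "Broader coverage."}}'
verdict = json.loads(reply)

print(verdict["Overall Winner"]["Winner"])  # Answer 1
```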
## Code Structure

```
.
├── examples
│   ├── batch_eval.py
│   ├── generate_query.py
│   ├── insert.py
│   └── query.py
├── lightrag
│   ├── __init__.py
│   ├── base.py
│   ├── lightrag.py
│   ├── llm.py
│   ├── operate.py
│   ├── prompt.py
│   ├── storage.py
│   └── utils.py
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
```
## Citation

```
@article{guo2024lightrag,
  title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
  author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
  year={2024},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
```

## Acknowledgement

The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).