Merge pull request #56 from sank8-2/dev
chore: added pre-commit-hooks and ruff formatting for commit-hooks
README.md
@@ -16,16 +16,16 @@
<a href="https://pypi.org/project/lightrag-hku/"><img src="https://img.shields.io/pypi/v/lightrag-hku.svg"></a>
<a href="https://pepy.tech/project/lightrag-hku"><img src="https://static.pepy.tech/badge/lightrag-hku/month"></a>
</p>

This repository hosts the code of LightRAG. The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).

</div>

## 🎉 News
- [x] [2024.10.18]🎯🎯📢📢We’ve added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
- [x] [2024.10.17]🎯🎯📢📢We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Welcome to join for sharing and discussions! 🎉🎉
- [x] [2024.10.16]🎯🎯📢📢LightRAG now supports [Ollama models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!
- [x] [2024.10.15]🎯🎯📢📢LightRAG now supports [Hugging Face models](https://github.com/HKUDS/LightRAG?tab=readme-ov-file#quick-start)!

## Install
@@ -92,7 +92,7 @@ print(rag.query("What are the top themes in this story?", param=QueryParam(mode=
<details>
<summary> Using Open AI-like APIs </summary>

LightRAG also supports Open AI-like chat/embeddings APIs:
```python
async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
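) -> str:
    # The hunk truncates this example here. The body below is a hedged sketch of how
    # such a function is typically completed for an OpenAI-compatible endpoint, using
    # the openai_complete_if_cache wrapper from lightrag.llm; the model name, API key,
    # and base URL are placeholders, and passing api_key/base_url this way is an
    # assumption about this version of the wrapper, not text taken from the diff.
    return await openai_complete_if_cache(
        "your-model-name",                            # placeholder model id
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key="YOUR_API_KEY",                       # placeholder key
        base_url="https://api.your-provider.com/v1",  # placeholder OpenAI-like endpoint
        **kwargs,
    )
```

An analogous embedding function can be built around `openai_embedding` from `lightrag.llm`, mirroring the Hugging Face and Ollama examples below.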
@@ -129,7 +129,7 @@ rag = LightRAG(
<details>
<summary> Using Hugging Face Models </summary>

If you want to use Hugging Face models, you only need to set LightRAG as follows:
```python
from lightrag.llm import hf_model_complete, hf_embedding
@@ -145,7 +145,7 @@ rag = LightRAG(
        embedding_dim=384,
        max_token_size=5000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"),
            embed_model=AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
        )
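    ),
    # Hedged sketch of the lines the hunk omits: the EmbeddingFunc(...) closed above is
    # the embedding_func= argument of the rag = LightRAG(...) call named in the hunk
    # header, which would typically also receive llm_model_func=hf_model_complete and a
    # Hugging Face llm_model_name of your choice (assumptions, not text from this diff).
)
```

`AutoTokenizer` and `AutoModel` come from the `transformers` package, so the snippet also assumes `from transformers import AutoModel, AutoTokenizer` alongside the `lightrag.llm` import shown above.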
@@ -157,7 +157,7 @@ rag = LightRAG(
<details>
<summary> Using Ollama Models </summary>

If you want to use Ollama models, you only need to set LightRAG as follows:

```python
from lightrag.llm import ollama_model_complete, ollama_embedding
@@ -171,7 +171,7 @@ rag = LightRAG(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embedding(
            texts,
            embed_model="nomic-embed-text"
        )
    ),
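    # Hedged sketch of the omitted tail: the EmbeddingFunc(...) closed above sits inside
    # the rag = LightRAG(...) call named in the hunk header, which would typically also
    # receive llm_model_func=ollama_model_complete and an llm_model_name for the chat
    # model you run in Ollama (assumptions, not text from this diff).
)
```

Note that the embedding model has to be available to the local Ollama server first, e.g. via `ollama pull nomic-embed-text`, before the `ollama_embedding` lambda above can be called.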
@@ -196,14 +196,14 @@ with open("./newText.txt") as f:
```

## Evaluation
### Dataset
The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).

### Generate Query
LightRAG uses the following prompt to generate high-level queries, with the corresponding code in `example/generate_query.py`.

<details>
<summary> Prompt </summary>

```python
Given the following description of a dataset:
@@ -228,18 +228,18 @@ Output the results in the following structure:
...
```
</details>
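The diff does not include `example/generate_query.py` itself; the snippet below is only a hedged sketch of how a script might feed the prompt above to a model, using the official `openai` client. The model id, dataset description, and output path are placeholders, and the script assumes `OPENAI_API_KEY` is set in the environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

description = "..."  # placeholder: a short description of the target dataset
prompt = f"Given the following description of a dataset:\n\n{description}\n\n..."  # abbreviated

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model id
    messages=[{"role": "user", "content": prompt}],
)

# Save the generated users/tasks/questions for the query step of the Reproduce section.
with open("queries.txt", "w") as f:
    f.write(response.choices[0].message.content)
```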
### Batch Eval
To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt, with the specific code available in `example/batch_eval.py`.

<details>
<summary> Prompt </summary>

```python
---Role---
You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
---Goal---
You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.

- **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
- **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
@@ -303,7 +303,7 @@ Output your evaluation in the following JSON format:
| **Empowerment** | 36.69% | **63.31%** | 45.09% | **54.91%** | 42.81% | **57.19%** | **52.94%** | 47.06% |
| **Overall** | 43.62% | **56.38%** | 45.98% | **54.02%** | 45.70% | **54.30%** | **51.86%** | 48.14% |

## Reproduce
All the code can be found in the `./reproduce` directory.

### Step-0 Extract Unique Contexts
@@ -311,7 +311,7 @@ First, we need to extract unique contexts in the datasets.
<details>
<summary> Code </summary>

```python
def extract_unique_contexts(input_directory, output_directory):
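    # The hunk stops at the signature; the body below is only a hedged sketch of what
    # this step does according to the surrounding text: walk the UltraDomain .jsonl
    # files, keep each distinct "context" field once, and write the unique contexts to
    # JSON. The file layout and the "context" key are assumptions, not text from the diff.
    import glob, json, os

    os.makedirs(output_directory, exist_ok=True)
    for path in glob.glob(os.path.join(input_directory, "*.jsonl")):
        unique_contexts, seen = [], set()
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                context = json.loads(line).get("context")
                if context and context not in seen:
                    seen.add(context)
                    unique_contexts.append(context)
        out_name = os.path.splitext(os.path.basename(path))[0] + "_unique_contexts.json"
        with open(os.path.join(output_directory, out_name), "w", encoding="utf-8") as f:
            json.dump(unique_contexts, f, ensure_ascii=False, indent=2)
```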
@@ -370,12 +370,12 @@ For the extracted contexts, we insert them into the LightRAG system.
<details>
<summary> Code </summary>

```python
def insert_text(rag, file_path):
    with open(file_path, mode='r') as f:
        unique_contexts = json.load(f)

    retries = 0
    max_retries = 3
    while retries < max_retries:
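        # The hunk ends inside the retry loop; the rest is a hedged sketch built from the
        # counters above: try the insert, wait briefly on failure, and give up after
        # max_retries attempts. The 10-second back-off is an illustrative choice, and
        # `time`/`json` are assumed to be imported at the top of the full script.
        try:
            rag.insert(unique_contexts)
            break
        except Exception as e:
            retries += 1
            print(f"Insertion failed, retrying ({retries}/{max_retries}), error: {e}")
            time.sleep(10)
    if retries == max_retries:
        print("Insertion failed after exceeding the maximum number of retries")
```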
@@ -393,11 +393,11 @@ def insert_text(rag, file_path):
### Step-2 Generate Queries

We extract tokens from the first and the second half of each context in the dataset, then combine them as dataset descriptions to generate queries.

<details>
<summary> Code </summary>

```python
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
@@ -410,7 +410,7 @@ def get_summary(context, tot_tokens=2000):
    summary_tokens = start_tokens + end_tokens
    summary = tokenizer.convert_tokens_to_string(summary_tokens)

    return summary
```
</details>
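The hunks above omit the middle of `get_summary`; going by the Step-2 description (tokens from the first and the second half of each context), a hedged reconstruction of the whole helper looks roughly like this — the even split of `tot_tokens` between the two ends is an assumption:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

def get_summary(context, tot_tokens=2000):
    # Keep tot_tokens/2 tokens from each end so the "summary" used as the dataset
    # description reflects both the beginning and the end of the context.
    tokens = tokenizer.tokenize(context)
    half = tot_tokens // 2
    start_tokens = tokens[:half]
    end_tokens = tokens[-half:]

    summary_tokens = start_tokens + end_tokens
    summary = tokenizer.convert_tokens_to_string(summary_tokens)

    return summary
```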
@@ -420,12 +420,12 @@ For the queries generated in Step-2, we will extract them and query LightRAG.
<details>
<summary> Code </summary>

```python
def extract_queries(file_path):
    with open(file_path, 'r') as f:
        data = f.read()

    data = data.replace('**', '')

    queries = re.findall(r'- Question \d+: (.+)', data)
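    # The hunk cuts off here; returning the matched questions is the natural next step
    # (`re` is assumed to be imported at the top of the full script).
    return queries
```

Each extracted query can then be issued against LightRAG in the usual way, e.g. `rag.query(query, param=QueryParam(mode="hybrid"))`, matching the query call shown in the hunk header near the top of this diff.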
@@ -479,7 +479,7 @@ def extract_queries(file_path):
```python
@article{guo2024lightrag,
title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
year={2024},
eprint={2410.05779},