Larfii
2024-10-08 14:39:05 +08:00
parent 44463503fd
commit c18697305e
15 changed files with 174 additions and 9 deletions

README.md

@@ -1,6 +1,171 @@
# LightRAG: Simple and Fast Retrieval-Augmented Generation
<a href='https://github.com/HKUDS/GraphEdit'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/arXiv--b31b1b'></a>
This repository hosts the code of LightRAG. The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).
## Install
* Install from source
```
cd LightRAG
pip install -e .
```
* Install from PyPI
```
pip install lightrag-hku
```
## Quick Start
* Set your OpenAI API key as an environment variable: `export OPENAI_API_KEY="sk-..."`
* Download the demo text "A Christmas Carol" by Charles Dickens:
```
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
```
Use the following Python snippet:
```
from lightrag import LightRAG, QueryParam
rag = LightRAG(working_dir="./dickens")
with open("./book.txt") as f:
rag.insert(f.read())
# Perform naive search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="naive")))
# Perform local search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="local")))
# Perform global search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="global")))
# Perform hybrid search
print(rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid")))
```
Batch Insert
```
rag.insert(["TEXT1", "TEXT2",...])
```
Incremental Insert
```
rag = LightRAG(working_dir="./dickens")
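# Reusing the same working_dir loads the index built earlier,
# so this insert adds the new text on top of the existing data.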
with open("./newText.txt") as f:
rag.insert(f.read())
```
## Evaluation
### Dataset
The dataset used in LightRAG can be downloaded from [TommyChien/UltraDomain](https://huggingface.co/datasets/TommyChien/UltraDomain).
### Generate Query
LightRAG uses the following prompt to generate high-level queries; the corresponding code is located in `examples/generate_query.py`.
```
Given the following description of a dataset:
{description}
Please identify 5 potential users who would engage with this dataset. For each user, list 5 tasks they would perform with this dataset. Then, for each (user, task) combination, generate 5 questions that require a high-level understanding of the entire dataset.
Output the results in the following structure:
- User 1: [user description]
- Task 1: [task description]
- Question 1:
- Question 2:
- Question 3:
- Question 4:
- Question 5:
- Task 2: [task description]
...
- Task 5: [task description]
- User 2: [user description]
...
- User 5: [user description]
...
```
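For illustration, here is a minimal sketch of how this template might be filled in and sent to an OpenAI-style model. The dataset description and model name below are placeholder assumptions, not values taken from `examples/generate_query.py`:
```
from openai import OpenAI

# The prompt template shown above, abbreviated and stored as a format string.
PROMPT_TEMPLATE = """Given the following description of a dataset:

{description}

Please identify 5 potential users who would engage with this dataset. For each user,
list 5 tasks they would perform with this dataset. Then, for each (user, task)
combination, generate 5 questions that require a high-level understanding of the
entire dataset.
"""

# Placeholder description; substitute one from the UltraDomain dataset.
description = "An agriculture dataset covering crop science and farming practices."

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(description=description)}],
)
print(response.choices[0].message.content)  # the generated users, tasks, and questions
```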
### Batch Eval
To evaluate the performance of two RAG systems on high-level queries, LightRAG uses the following prompt; the specific code is available in `examples/batch_eval.py`.
```
---Role---
You are an expert tasked with evaluating two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
---Goal---
You will evaluate two answers to the same question based on three criteria: **Comprehensiveness**, **Diversity**, and **Empowerment**.
- **Comprehensiveness**: How much detail does the answer provide to cover all aspects and details of the question?
- **Diversity**: How varied and rich is the answer in providing different perspectives and insights on the question?
- **Empowerment**: How well does the answer help the reader understand and make informed judgments about the topic?
For each criterion, choose the better answer (either Answer 1 or Answer 2) and explain why. Then, select an overall winner based on these three categories.
Here is the question:
{query}
Here are the two answers:
**Answer 1:**
{answer1}
**Answer 2:**
{answer2}
Evaluate both answers using the three criteria listed above and provide detailed explanations for each criterion.
Output your evaluation in the following JSON format:
{{
    "Comprehensiveness": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Diversity": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Empowerment": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Provide explanation here]"
    }},
    "Overall Winner": {{
        "Winner": "[Answer 1 or Answer 2]",
        "Explanation": "[Summarize why this answer is the overall winner based on the three criteria]"
    }}
}}
```
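A rough sketch of the surrounding evaluation loop follows. The helper names, output file layout, and model choice here are assumptions; see `examples/batch_eval.py` for the actual script:
```
import jsonlines
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def evaluate_pair(eval_template, query, answer1, answer2):
    # Fill the prompt shown above; its doubled braces ({{ }}) survive .format(),
    # so the JSON skeleton reaches the model intact.
    prompt = eval_template.format(query=query, answer1=answer1, answer2=answer2)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def run_batch(eval_template, triples, output_file="eval_results.jsonl"):
    # triples: an iterable of (query, answer1, answer2) from the two RAG systems.
    with jsonlines.open(output_file, mode="w") as writer:
        for query, ans1, ans2 in triples:
            verdict = evaluate_pair(eval_template, query, ans1, ans2)
            writer.write({"query": query, "verdict": verdict})
```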
## Code Structure
```
.
├── examples
│ ├── batch_eval.py
│ ├── generate_query.py
│ ├── insert.py
│ └── query.py
├── lightrag
│ ├── __init__.py
│ ├── base.py
│ ├── lightrag.py
│ ├── llm.py
│ ├── operate.py
│ ├── prompt.py
│ ├── storage.py
│   ├── utils.py
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
```
## Citation
```
@article{guo2024lightrag,
      title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
      author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
      year={2024},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
```
## Acknowledgement
The structure of this code is based on [nano-graphrag](https://github.com/gusye1234/nano-graphrag).

examples/batch_eval.py

@@ -6,8 +6,8 @@ import jsonlines
 from openai import OpenAI

-def batch_eval(query_file, result1_file, result2_file, output_file_path, api_key):
-    client = OpenAI(api_key=api_key)
+def batch_eval(query_file, result1_file, result2_file, output_file_path):
+    client = OpenAI()

     with open(query_file, 'r') as f:
         data = f.read()

lightrag/llm.py

@@ -2,7 +2,7 @@ import os
 from openai import OpenAI

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""

 def openai_complete_if_cache(
     model="gpt-4o-mini", prompt=None, system_prompt=None, history_messages=[], **kwargs

examples/insert.py

@@ -3,7 +3,7 @@ import sys
 from lightrag import LightRAG

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""

 WORKING_DIR = ""

examples/query.py

@@ -3,7 +3,7 @@ import sys
 from lightrag import LightRAG, QueryParam

-os.environ["OPENAI_API_KEY"] = ""
+# os.environ["OPENAI_API_KEY"] = ""

 WORKING_DIR = ""

setup.py

@@ -21,7 +21,7 @@ with open("./requirements.txt") as f:
         deps.append(line.strip())

 setuptools.setup(
-    name="lightrag",
+    name="light-rag",
     url=vars2readme["__url__"],
     version=vars2readme["__version__"],
     author=vars2readme["__author__"],