Merge pull request #164 from MrGidea/main

Multiple file types support input
zrguo
2024-10-29 17:03:34 +08:00
committed by GitHub


@@ -22,6 +22,7 @@ This repository hosts the code of LightRAG. The structure of this code is based
</div>
## 🎉 News
- [x] [2024.10.29]🎯🎯📢📢Multiple file types are now supported via `textract`.
- [x] [2024.10.20]🎯🎯📢📢We've added a new feature to LightRAG: Graph Visualization.
- [x] [2024.10.18]🎯🎯📢📢We've added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
- [x] [2024.10.17]🎯🎯📢📢We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Feel free to join for sharing and discussion! 🎉🎉
@@ -285,6 +286,19 @@ with open("./newText.txt") as f:
    rag.insert(f.read())
```
### Multi-file Type Support
The `textract` library can extract text from many file types, including TXT, DOCX, PPTX, CSV, and PDF, so their contents can be passed to `rag.insert`.
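```python
import textract

# Extract raw text from any file type textract supports (TXT, DOCX, PPTX, CSV, PDF, ...)
file_path = 'TEXT.pdf'
text_content = textract.process(file_path)  # returns bytes

# Decode to str and insert into the existing LightRAG instance `rag`
rag.insert(text_content.decode('utf-8'))
```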
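To ingest a whole folder rather than a single file, the snippet above can be wrapped in a small loop. This is only a sketch under stated assumptions: `insert_directory`, `extract_with_textract`, and `SUPPORTED_EXTENSIONS` are hypothetical names (not part of LightRAG or `textract`), the extension set simply mirrors the file types listed above, and it relies only on what the snippet above already uses (`textract.process` returning bytes, and `rag.insert` accepting a string).

```python
from pathlib import Path
from typing import Callable, Collection, List

# Hypothetical default: mirrors the file types listed in this README section.
SUPPORTED_EXTENSIONS = frozenset({".txt", ".docx", ".pptx", ".csv", ".pdf"})


def extract_with_textract(path: Path) -> str:
    """Extract text from one file with textract and decode it to str."""
    import textract  # imported lazily so the loop can run without textract installed

    return textract.process(str(path)).decode("utf-8")


def insert_directory(
    directory: str,
    insert: Callable[[str], object],
    extract: Callable[[Path], str] = extract_with_textract,
    extensions: Collection[str] = SUPPORTED_EXTENSIONS,
) -> List[str]:
    """Hypothetical helper: extract and insert every supported file under `directory`.

    Returns the POSIX-style relative paths of the files that were inserted.
    """
    root = Path(directory)
    inserted: List[str] = []
    # Walk the tree recursively in a deterministic (sorted) order.
    for path in sorted(root.rglob("*")):
        # Skip sub-directories and files whose extension is not allowed;
        # the check is case-insensitive, so "REPORT.PDF" is included.
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        insert(extract(path))
        inserted.append(path.relative_to(root).as_posix())
    return inserted
```

With an existing `rag` instance, a call such as `insert_directory("./docs", rag.insert)` would insert every matching file under `./docs`. Passing the extractor in as a parameter keeps `textract` optional, which also makes the loop easy to test with a plain-text reader.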
### Graph Visualization
<details>
@@ -863,3 +877,6 @@ archivePrefix={arXiv},
primaryClass={cs.IR}
}
```