Merge pull request #164 from MrGidea/main

Multiple file types support input
2024-10-29 17:03:34 +08:00
parent 818074d258 5f3537c5ed
commit 8ebeeb465f
1 changed files with 17 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ This repository hosts the code of LightRAG. The structure of this code is based
 </div>

 ## 🎉 News
+- [x] [2024.10.29]🎯🎯📢📢Multiple file types are now supported via `textract`.
 - [x] [2024.10.20]🎯🎯📢📢We’ve added a new feature to LightRAG: Graph Visualization.
 - [x] [2024.10.18]🎯🎯📢📢We’ve added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
 - [x] [2024.10.17]🎯🎯📢📢We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Welcome to join for sharing and discussions! 🎉🎉
@@ -285,6 +286,19 @@ with open("./newText.txt") as f:
    rag.insert(f.read())
 ```

+### Multi-file Type Support
+
+The `textract` library can extract text from file types such as TXT, DOCX, PPTX, CSV, and PDF, so their contents can be passed to `rag.insert`.
+
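+```python
+import textract
+
+# textract.process extracts the text of the file and returns it as bytes
+file_path = 'TEXT.pdf'
+text_content = textract.process(file_path)
+
+# Decode the bytes to str before inserting into the LightRAG instance `rag` created earlier
+rag.insert(text_content.decode('utf-8'))
+```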
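
Building on the snippet above, here is a minimal sketch for inserting every supported file in a folder. It assumes `rag` is an already initialized LightRAG instance and that textract returns UTF-8 bytes (its default output encoding). `find_supported_files`, `default_extract`, and `insert_directory` are hypothetical helpers, not part of LightRAG or textract. The extractor is passed in as a parameter so the directory walk can be exercised without textract installed.

```python
from pathlib import Path
from typing import Callable, List

# Extensions named in this README section; textract may handle more,
# but this sketch deliberately restricts itself to these.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".pptx", ".csv", ".pdf"}


def find_supported_files(directory: str) -> List[Path]:
    """Recursively collect files whose suffix (case-insensitive) is supported,
    sorted so insertion order is stable across runs."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def default_extract(path: Path) -> bytes:
    """Extract raw text bytes with textract (imported lazily, so the other
    helpers work even when textract is not installed)."""
    import textract
    return textract.process(str(path))


def insert_directory(
    rag,
    directory: str,
    extract: Callable[[Path], bytes] = default_extract,
) -> int:
    """Extract, decode, and insert each supported file into `rag`.
    Files whose extracted text is empty or whitespace-only are skipped.
    Returns the number of documents actually inserted."""
    inserted = 0
    for path in find_supported_files(directory):
        text = extract(path).decode("utf-8")
        if text.strip():
            rag.insert(text)
            inserted += 1
    return inserted
```

For example, `insert_directory(rag, "./docs")` would insert the text of each TXT, DOCX, PPTX, CSV, or PDF file under `./docs` and return how many non-empty documents were inserted.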
+
 ### Graph Visualization

 <details>
@@ -863,3 +877,6 @@ archivePrefix={arXiv},
 primaryClass={cs.IR}
 }
 ```
+
+
+