diff --git a/README.md b/README.md
index 870c2d6f..acfd7c96 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ This repository hosts the code of LightRAG. The structure of this code is based
 ## 🎉 News
+- [x] [2024.10.29]🎯🎯📢📢Multiple file types are now supported via `textract`.
 - [x] [2024.10.20]🎯🎯📢📢We’ve added a new feature to LightRAG: Graph Visualization.
 - [x] [2024.10.18]🎯🎯📢📢We’ve added a link to a [LightRAG Introduction Video](https://youtu.be/oageL-1I0GE). Thanks to the author!
 - [x] [2024.10.17]🎯🎯📢📢We have created a [Discord channel](https://discord.gg/mvsfu2Tg)! Welcome to join for sharing and discussions! 🎉🎉
@@ -285,6 +286,19 @@ with open("./newText.txt") as f:
     rag.insert(f.read())
 ```
+### Multi-file Type Support
+
+The `textract` library can read file types such as TXT, DOCX, PPTX, CSV, and PDF.
+
+```python
+import textract
+
+file_path = 'TEXT.pdf'
+text_content = textract.process(file_path)
+
+rag.insert(text_content.decode('utf-8'))
+```
+
 ### Graph Visualization
@@ -863,3 +877,6 @@ archivePrefix={arXiv},
 primaryClass={cs.IR}
 }
 ```
+
+
+
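
Review note: the README snippet added in this diff handles a single PDF. A minimal sketch of extending it to several files is given here, assuming `rag` exposes the `insert(str)` method used in the diff and that `textract.process` returns the extracted text as bytes. The names `insert_files`, `extract`, and `encoding` are hypothetical and introduced only for illustration; they are not part of LightRAG or `textract`.

```python
from typing import Callable, Iterable, List, Optional


def insert_files(
    rag,
    paths: Iterable[str],
    extract: Optional[Callable[[str], bytes]] = None,
    encoding: str = "utf-8",
) -> List[str]:
    """Extract text from each path and pass it to rag.insert().

    Hypothetical helper, not part of LightRAG. When `extract` is omitted,
    textract.process is used. Returns the paths that yielded no text.
    """
    if extract is None:
        import textract  # imported lazily so the helper loads without it

        extract = textract.process

    skipped = []
    for path in paths:
        raw = extract(path)
        # Replace undecodable bytes rather than raising on a bad file
        text = raw.decode(encoding, errors="replace").strip()
        if text:
            rag.insert(text)
        else:
            skipped.append(path)  # nothing to index for this file
    return skipped
```

Passing the extractor as a parameter keeps the loop independent of `textract`, so a different extractor can be swapped in for file types it handles poorly.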