Quick start
Currently tested file types: pptx, pdf, csv, docx, and txt.
- Install textract:
pip install textract
- Example:
import textract
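# Path of the file to extract text from
file_path = 'path/to/your/file.pdf'
# Extract the text from the file (textract returns bytes)
text_content = textract.process(file_path)
# Decode the bytes and print the extracted text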
print(text_content.decode('utf-8'))
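The example above processes a single file. To run the same call over a whole folder while skipping file types that are not in the tested list, a minimal sketch could look like the following. The helper names `is_tested_type` and `extract_folder` are hypothetical and not part of textract; only `textract.process` comes from the example above. The `extract` parameter is an assumption added so the helper can be exercised without textract installed.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical helpers (not part of textract).
# File types this README lists as tested.
TESTED_EXTENSIONS = {".pptx", ".pdf", ".csv", ".docx", ".txt"}


def is_tested_type(file_path: str) -> bool:
    """Return True if the file extension (case-insensitive) is a tested type."""
    return Path(file_path).suffix.lower() in TESTED_EXTENSIONS


def _textract_extract(file_path: str) -> str:
    # Imported lazily so the rest of this sketch works without textract installed.
    import textract
    # textract.process returns bytes; decode them as UTF-8, as in the example above.
    return textract.process(file_path).decode("utf-8")


def extract_folder(
    folder: str,
    extract: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Extract text from every tested-type file directly inside `folder`.

    Files with other extensions are skipped. `extract` defaults to textract.
    Returns a dict mapping file name to extracted text, in sorted path order.
    """
    extract = extract or _textract_extract
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_tested_type(path.name):
            results[path.name] = extract(str(path))
    return results
```

Usage, assuming a folder of documents: `texts = extract_folder('path/to/your/folder')`. Filtering by extension before calling textract keeps untested formats from being passed to it silently.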