Quick start

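  • Install textract:
pip install textract
  • Example:
import textract

# Path of the file to extract text from
file_path = 'path/to/your/file.pdf'
# Extract the text; textract.process returns bytes
text_content = textract.process(file_path)
# Decode the bytes as UTF-8 and print the extracted text
print(text_content.decode('utf-8'))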
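The same call extends to a batch of files. Below is a minimal sketch, not part of textract itself. The helper name `extract_texts` and its `extractor` parameter are hypothetical, introduced only for illustration. Only `textract.process` comes from the example above. The extractor is injectable so the helper can be exercised without textract installed, and errors are caught generically rather than by specific textract exception classes.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def extract_texts(
    paths: Iterable[str],
    extractor: Optional[Callable[[str], bytes]] = None,
    encoding: str = "utf-8",
) -> Dict[str, str]:
    """Return {path: decoded text} for every file that extracted cleanly.

    Hypothetical helper: missing files and files whose extraction raises are
    skipped, so one bad file does not abort the whole batch.
    """
    if extractor is None:
        # Imported lazily so callers can pass their own extractor (e.g. in tests)
        import textract
        extractor = textract.process

    results: Dict[str, str] = {}
    for path in paths:
        if not Path(path).is_file():
            continue  # skip missing paths instead of letting the extractor fail
        try:
            raw = extractor(path)
        except Exception as exc:  # kept generic on purpose; see lead-in
            print(f"skipping {path}: {exc}")
            continue
        # textract.process returns bytes, hence the decode step from the example
        results[path] = raw.decode(encoding, errors="replace")
    return results
```

For example, `extract_texts(['path/to/your/file.pdf'])` would return a dict mapping that path to its extracted text, or an empty dict if the file does not exist.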
Languages
Python 73.3%
TypeScript 25.7%
JavaScript 0.6%
CSS 0.3%
Dockerfile 0.1%