Coverage for src/local_deep_research/document_loaders/__init__.py: 100%
5 statements
coverage.py v7.13.4, created at 2026-02-25 01:07 +0000
"""
Document loaders module.

Provides centralized document loading functionality for both:
- Collection uploads (bytes from HTTP requests)
- Local search engine (file paths on disk)

Supported formats (35+ formats):

Documents:
- PDF (.pdf)
- Text (.txt)
- Markdown (.md, .markdown)
- Word (.doc, .docx)
- ODT (.odt) - OpenDocument text
- RTF (.rtf) - Rich Text Format
- RST (.rst) - reStructuredText documentation

Presentations:
- PowerPoint (.ppt, .pptx)

Spreadsheets:
- Excel (.xls, .xlsx)
- CSV (.csv), TSV (.tsv)

Data formats:
- JSON (.json)
- YAML (.yaml, .yml)
- XML (.xml) - important for USPTO patent data
- TOML (.toml) - config files

Web content:
- HTML (.html, .htm)
- MHTML (.mhtml, .mht) - saved web pages

Images (OCR):
- PNG, JPG, JPEG, TIFF, BMP, HEIC

Research/Notes:
- Jupyter Notebooks (.ipynb)
- Evernote exports (.enex)
- EPUB (.epub) - ebooks (requires pandoc)
- Org (.org) - Emacs org-mode files
- Email (.eml) - email messages
"""

from .bytes_loader import extract_text_from_bytes, load_from_bytes
from .json_loader import SimpleJSONLoader
from .loader_registry import (
    get_loader_class_for_extension,
    get_loader_for_path,
    get_supported_extensions,
    is_extension_supported,
)
from .yaml_loader import YAMLLoader

__all__ = [
    # Bytes loading (for uploads)
    "load_from_bytes",
    "extract_text_from_bytes",
    # Path loading (for local files)
    "get_loader_for_path",
    # Registry functions
    "get_supported_extensions",
    "is_extension_supported",
    "get_loader_class_for_extension",
    # Custom loaders
    "YAMLLoader",
    "SimpleJSONLoader",
]
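
The module docstring describes two entry points (bytes from uploads, paths on disk) backed by a registry keyed on file extension. The listing above does not show the signatures of the exported functions, so the following is only a minimal, self-contained sketch of how such an extension registry might dispatch. Every name in it (`_REGISTRY`, `normalize_extension`, `extension_supported`, `supported_extensions`, `text_from_bytes`, `text_from_path`) is hypothetical and is not the package's actual API.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict, List, Optional


# Hypothetical decoders: each one turns raw bytes into plain text.
def _decode_text(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def _decode_csv(data: bytes) -> str:
    rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join(" | ".join(row) for row in rows)


# Assumed registry shape: lowercase extension (with leading dot) -> decoder.
_REGISTRY: Dict[str, Callable[[bytes], str]] = {
    ".txt": _decode_text,
    ".md": _decode_text,
    ".markdown": _decode_text,
    ".csv": _decode_csv,
}


def normalize_extension(ext: str) -> str:
    """Lowercase the extension and make sure it starts with a dot."""
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else f".{ext}"


def extension_supported(ext: str) -> bool:
    return normalize_extension(ext) in _REGISTRY


def supported_extensions() -> List[str]:
    return sorted(_REGISTRY)


def text_from_bytes(data: bytes, filename: str) -> Optional[str]:
    """Upload path: the filename is used only to pick a decoder."""
    decoder = _REGISTRY.get(Path(filename).suffix.lower())
    return decoder(data) if decoder else None


def text_from_path(path: str) -> Optional[str]:
    """Local-file path: read the bytes, then reuse the bytes entry point."""
    p = Path(path)
    if not extension_supported(p.suffix):
        return None
    return text_from_bytes(p.read_bytes(), p.name)
```

Routing the path-based helper through the bytes-based one keeps a single decoding code path in this sketch. Whether the real package shares code between the two entry points this way cannot be determined from the listing.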