zrguo
fd9f71e0ee
fix delete_by_doc_id
2025-03-04 13:22:33 +08:00
yangdx
11fdb60fe5
Remove Chinese comments
2025-03-03 01:30:41 +08:00
yangdx
465737efed
Fix linting
2025-03-02 17:32:25 +08:00
yangdx
68bf02abb6
refactor: improve graph querying with label substring matching and security fixes
2025-03-02 16:20:37 +08:00
yangdx
0f1eb42c8d
Add node limit and prioritization for knowledge graph retrieval
...
• Add MAX_GRAPH_NODES limit from env var
• Prioritize nodes by label match & connection
2025-03-02 15:39:14 +08:00
yangdx
1ca6837219
Add max nodes limit for NetworkX graph retrieval
...
• Set MAX_GRAPH_NODES env var (default 1000)
• Change edge type to "RELATED"
2025-03-02 12:52:25 +08:00
yangdx
c0b22a8ae2
Merge branch 'main' into add-multi-worker-support
2025-03-02 02:54:57 +08:00
yangdx
7cd25fe5ab
Improve shared storage cleanup and clarify initialization in multi-worker setup
2025-03-02 01:00:27 +08:00
yangdx
e8d0d065f3
fix: Improve async handling and FAISS storage reliability
...
- Add async context manager support
- Fix embedding data type conversion
- Improve error handling in FAISS ops
- Add multiprocess storage sync
2025-03-01 23:35:09 +08:00
yangdx
9aef112d51
Fix incorrect comment about update flag behavior in FAISS implementation
2025-03-01 22:27:12 +08:00
zrguo
4219454fab
fix format
2025-03-01 17:45:06 +08:00
yangdx
e3a40c2fdb
Fix linting
2025-03-01 16:23:34 +08:00
yangdx
d18eb52ccc
Add type ignore comments for asyncpg imports to suppress mypy errors
2025-03-01 15:38:39 +08:00
yangdx
40e9e26edb
feat: add update flags status to API health endpoint
2025-03-01 14:58:26 +08:00
yangdx
41eff2ca2f
Fix data persistence issue in NanoVectorDBStorage
2025-03-01 13:35:00 +08:00
yangdx
35bcfca28f
feat: add multi-process support for FAISS vector storage
...
• Add storage update flag and locks
• Support cross-process index reload
• Add async initialize method
2025-03-01 12:42:30 +08:00
yangdx
d4f6dcfd54
Improve multi-process data synchronization and persistence in storage implementations
...
• Remove _get_client() or _get_graph() from index_done_callback
• Add return value for index_done_callback
2025-03-01 12:41:30 +08:00
yangdx
c07a5039b7
Refactor shared storage locks into separate pipeline, storage and internal locks to prevent deadlocks
2025-03-01 10:48:55 +08:00
yangdx
d3de57c1e4
Add multi-process support for vector database and graph storage with lock flags
...
• Implement storage lock mechanism
• Add update flag handling
• Add cross-process reload detection
2025-03-01 10:37:05 +08:00
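
The update-flag idea in the entry above could look roughly like the sketch below. It is a sketch under stated assumptions, not the code from this commit: the function names get_update_flag and set_all_update_flags appear in a later commit message, but their signatures here, the register_worker and clear_update_flag helpers, and the plain dict standing in for shared memory are all hypothetical.

```python
from collections import defaultdict

# namespace -> {worker_id: needs_reload}
# Assumption: a plain dict stands in for whatever shared structure the real code uses.
_update_flags = defaultdict(dict)


def register_worker(namespace, worker_id):
    """Hypothetical helper: give a worker its own flag, initially clear."""
    _update_flags[namespace][worker_id] = False


def set_all_update_flags(namespace, writer_id):
    """After a write, mark every other worker's copy of the namespace as stale."""
    for worker_id in _update_flags[namespace]:
        if worker_id != writer_id:
            _update_flags[namespace][worker_id] = True


def get_update_flag(namespace, worker_id):
    """Return True if this worker should reload the namespace from disk."""
    return _update_flags[namespace][worker_id]


def clear_update_flag(namespace, worker_id):
    """Hypothetical helper: call after the worker has reloaded its data."""
    _update_flags[namespace][worker_id] = False
```

A worker would check its own flag before reading, reload if the flag is set, then clear it. The writer's flag stays clear because its in-memory copy is already current.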
yangdx
d704512139
Refactor shared storage module to improve async handling and naming consistency
...
• Add async support for get_namespace_data
• Rename get_update_flags to get_update_flag
• Rename set_update_flag to set_all_update_flags
• Update docstrings for clarity
• Fix typos in log messages
2025-03-01 05:01:26 +08:00
yangdx
fd76e00c6a
Refactor storage initialization to separate object creation from data loading
...
• Split __post_init__ and initialize()
• Move data loading to initialize()
• Add FastAPI lifespan integration
2025-03-01 03:48:19 +08:00
yangdx
b3328542c7
refactor: migrate synchronous locks to async locks for improved concurrency
...
• Add UnifiedLock wrapper class
• Convert with blocks to async with
2025-03-01 02:22:35 +08:00
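
A wrapper of the kind named in the entry above might be sketched as follows. Only the class name UnifiedLock comes from the commit message; the constructor arguments, the is_async switch, and the use of asyncio.to_thread for the blocking case are assumptions made for illustration (Python 3.9+).

```python
import asyncio
import multiprocessing


class UnifiedLock:
    """Hypothetical sketch: one `async with` interface over two kinds of lock."""

    def __init__(self, lock, is_async):
        self._lock = lock
        self._is_async = is_async

    async def __aenter__(self):
        if self._is_async:
            # asyncio.Lock: acquire() is a coroutine
            await self._lock.acquire()
        else:
            # multiprocessing.Lock: acquire() blocks, so run it in a worker
            # thread to keep the event loop responsive
            await asyncio.to_thread(self._lock.acquire)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # release() is synchronous for both lock kinds
        self._lock.release()


async def demo():
    """Enter and leave both lock kinds through the same `async with` form."""
    entered = []
    for lock, is_async in ((asyncio.Lock(), True), (multiprocessing.Lock(), False)):
        async with UnifiedLock(lock, is_async):
            entered.append(is_async)
    return entered
```

With a wrapper like this, call sites can use `async with` regardless of whether the server runs in single-process or multi-process mode.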
yangdx
a721421bd8
Add async support and update flag mechanism for shared storage
...
• Use asyncio.Lock instead of thread lock for single process mode
• Add storage update notification system
2025-03-01 01:49:26 +08:00
yangdx
731d820bcc
Remove redundant set_logger function and related calls
2025-02-28 21:46:45 +08:00
yangdx
8cd45161f2
feat: add history_messages to track pipeline processing progress
...
• Add shared history_messages list
• Track pipeline progress with messages
2025-02-28 13:53:40 +08:00
yangdx
b090a22be7
Add concurrency check for auto scan task to prevent duplicate scans
...
• Add pipeline status check before scan
• Add storage lock protection
• Add latest_message to status tracking
• Add helpful log message at startup
2025-02-28 12:22:20 +08:00
yangdx
b2da69b7f1
Add pipeline status control for concurrent document indexing processes
...
• Add shared pipeline status namespace
• Implement concurrent process control
• Add request queuing for pending jobs
2025-02-28 11:52:42 +08:00
yangdx
cd7648791a
Fix linting
2025-02-28 01:25:59 +08:00
yangdx
3dcfa561d7
Remove debug logging
2025-02-28 01:15:12 +08:00
yangdx
291e0c1b14
Revert vector and graph storage to use local data (single process)
2025-02-28 01:14:25 +08:00
yangdx
05d03638ec
Clean up logging output and remove redundant log messages
2025-02-27 20:17:28 +08:00
yangdx
05cf029bcc
fix: convert multiprocessing managed dict to normal dict before JSON dump
2025-02-27 20:16:53 +08:00
yangdx
64f22966a3
Fix linting
2025-02-27 19:05:51 +08:00
yangdx
946095ef80
Fix multiprocess dict creation logic, add process safety locks for namespace creation.
2025-02-27 19:03:53 +08:00
yangdx
e881bc0709
simplify process state management by removing redundant multiprocess flag
2025-02-27 15:36:12 +08:00
yangdx
1699b10a25
Refactor direct client/graph access to reduce redundant get calls in vector/graph ops
2025-02-27 15:14:54 +08:00
yangdx
438e4780a8
Refactor Faiss index access with helper method to improve code organization
2025-02-27 15:09:19 +08:00
yangdx
f007ebf006
Refactor initialization logic for vector, KV and graph storage implementations
...
• Add try_initialize_namespace check
• Move init code out of storage locks
• Reduce redundant init conditions
• Simplify initialization flow
• Make init thread-safer
2025-02-27 14:55:07 +08:00
yangdx
03d05b094d
Improve Gunicorn support and cleanup shared storage initialization
...
• Move Gunicorn check before other startup
• Improve startup flow organization
2025-02-27 14:13:42 +08:00
yangdx
7aec78833c
Implement Gunicorn+Uvicorn integration for shared data preloading
...
- Create run_with_gunicorn.py script to properly initialize shared data in the
main process before forking worker processes
- Revert uvicorn to single process mode only, and let gunicorn do all the multi-process jobs
2025-02-27 13:25:22 +08:00
yangdx
7c237920b1
Refactor shared storage to support both single and multi-process modes
...
• Initialize storage based on worker count
• Remove redundant global variable checks
• Add explicit mutex initialization
• Centralize shared storage initialization
• Fix process/thread lock selection logic
2025-02-27 08:48:33 +08:00
yangdx
7436c06f6c
Fix linting
2025-02-26 18:11:16 +08:00
yangdx
7d12715f09
Refactor shared storage to safely handle multi-process initialization and data sharing
...
• Add namespace initialization check
• Use atomic operations for shared data
2025-02-26 18:11:02 +08:00
yangdx
145bacc773
Add empty graph creation logging in NetworkXStorage
2025-02-26 17:42:30 +08:00
yangdx
2c019dbc7b
Refactor storage initialization to avoid redundant initial data loads across processes, show init logs on first load only
2025-02-26 12:28:49 +08:00
yangdx
2752a764ae
Refactor storage implementations to support both single and multi-process modes
...
• Add shared storage management module
• Support process/thread lock based on mode
2025-02-26 05:38:38 +08:00
yangdx
a642bb3190
refactor: use shared manager from main process for storage implementations.
2025-02-25 12:08:49 +08:00
yangdx
e22e014f22
feat(storage): Add shared memory support for FAISS
2025-02-25 11:25:06 +08:00
yangdx
362321204f
Merge branch 'main' into add-multi-worker-support
2025-02-25 11:15:12 +08:00
yangdx
087d5770b0
feat(storage): Add shared memory support for file-based storage implementations
...
This commit adds multiprocessing shared memory support to file-based storage implementations:
- JsonDocStatusStorage
- JsonKVStorage
- NanoVectorDBStorage
- NetworkXStorage
Each storage module now uses module-level global variables with multiprocessing.Manager() to ensure data consistency across multiple uvicorn workers. All processes see
updates immediately when data is modified through the ainsert function.
2025-02-25 11:10:13 +08:00
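
The shared-data approach described in the entry above can be illustrated with a small stand-alone sketch. It is not the project's storage code: worker_insert and run_demo are hypothetical names, and the "fork" start method is an assumption chosen to mirror pre-forked workers (Unix only).

```python
import json
import multiprocessing as mp


def worker_insert(shared, key, value):
    """Runs in a child process; the write goes through the manager proxy."""
    shared[key] = value


def run_demo():
    # Assumption: "fork" start method, which is unavailable on Windows.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        shared = manager.dict()
        procs = [
            ctx.Process(target=worker_insert, args=(shared, f"doc-{i}", i))
            for i in range(3)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy the proxy into a normal dict before the manager shuts down;
        # a DictProxy itself is not JSON serializable.
        snapshot = dict(shared)
    return snapshot, json.dumps(snapshot, sort_keys=True)
```

Every process holding the proxy sees the same data because reads and writes are forwarded to the manager process. The copy into a plain dict before serializing is the same step a later commit in this log applies before its JSON dump.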