feat(auto-annotation): add batch progress updates to reduce DB write pressure
Some checks failed
Throttle progress updates to reduce database write operations during large dataset processing.

Key features:
- Add PROGRESS_UPDATE_INTERVAL config (default 2.0s, configurable via AUTO_ANNOTATION_PROGRESS_INTERVAL env)
- Conditional progress updates: only write to the DB when (now - last_update) >= interval
- Use time.monotonic() for timing (immune to system clock adjustments)
- Final status updates (completed/stopped/failed) always execute (not throttled)

Implementation:
- Initialize last_progress_update timestamp before the as_completed() loop
- Replace the unconditional _update_task_status() call with a conditional call based on the time interval
- Update the docstring to reflect the throttling capability

Performance impact (T=2s):
- 1,000 files / 100s processing: DB writes reduced from 1,000 to ~50 (95% reduction)
- 10,000 files / 500s processing: DB writes reduced from 10,000 to ~250 (97.5% reduction)
- Small datasets (10 files): minimal difference

Backward compatibility:
- PROGRESS_UPDATE_INTERVAL=0: updates every file (identical to previous behavior)
- Heartbeat mechanism unaffected (2s interval << 300s timeout)
- Stop-check mechanism independent of progress updates
- Final status updates always execute

Testing:
- 14 unit tests all passed (11 existing + 3 new):
  * Fast processing with throttling
  * PROGRESS_UPDATE_INTERVAL=0 updates every file
  * Slow processing (per-file > T) updates every file
- py_compile syntax check passed

Edge cases handled:
- Single-file task: works normally
- Very slow processing: degrades to per-file updates
- Concurrent FILE_WORKERS > 1: counters stay accurate (lock-protected); the DB reflects them with at most T seconds of delay
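The throttled write decision described above can be sketched in isolation as follows. This is a minimal standalone sketch of the gating logic, not the DataMate source; the helper name `should_write_progress` is illustrative, while the env variable name mirrors the one in the commit:

```python
import os
import time

# Throttle interval in seconds; 0 disables throttling (write on every file),
# mirroring the AUTO_ANNOTATION_PROGRESS_INTERVAL env variable.
PROGRESS_UPDATE_INTERVAL = float(os.getenv("AUTO_ANNOTATION_PROGRESS_INTERVAL", "2.0"))


def should_write_progress(last_update: float,
                          interval: float = PROGRESS_UPDATE_INTERVAL) -> bool:
    """Return True when a progress row should be written to the DB.

    Uses time.monotonic() so the decision is immune to wall-clock
    adjustments (NTP jumps, DST, manual clock changes).
    """
    if interval <= 0:
        return True  # backward-compatible: update on every processed file
    return (time.monotonic() - last_update) >= interval
```

The caller records `last_update = time.monotonic()` whenever a write goes through; final status updates bypass this check entirely, as the commit message notes.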
@@ -13,6 +13,8 @@ the datamate-python backend and frontend can display real-time status.
 - Intra-task file concurrency: the thread pool size is configurable via AUTO_ANNOTATION_FILE_WORKERS,
   so a single task processes multiple files in parallel (especially effective for LLM I/O-bound workloads).
   Operator chains are isolated through an object pool; each thread uses its own chain instance.
+- Progress update throttling: AUTO_ANNOTATION_PROGRESS_INTERVAL controls how often progress is written,
+  avoiding the DB write pressure of per-file writes on large datasets (2-second interval by default).
 - On startup, automatically recover running tasks whose heartbeat timed out: unprocessed files are reset
   to pending, and tasks with partial progress are marked failed, leaving manual retry to the user.
 """
@@ -133,6 +135,8 @@ WORKER_COUNT = int(os.getenv("AUTO_ANNOTATION_WORKER_COUNT", "1"))
 
 FILE_WORKERS = int(os.getenv("AUTO_ANNOTATION_FILE_WORKERS", "1"))
 
+PROGRESS_UPDATE_INTERVAL = float(os.getenv("AUTO_ANNOTATION_PROGRESS_INTERVAL", "2.0"))
+
 
 def _recover_stale_running_tasks() -> int:
     """On startup, recover running tasks whose heartbeat timed out.
@@ -1274,6 +1278,7 @@ def _process_single_task(task: Dict[str, Any]) -> None:
     # --- Concurrent file processing ---
     stop_check_interval = max(1, effective_file_workers * 2)
     completed_since_check = 0
+    last_progress_update = time.monotonic()
 
     with ThreadPoolExecutor(max_workers=effective_file_workers) as executor:
         future_to_file = {
@@ -1299,18 +1304,21 @@ def _process_single_task(task: Dict[str, Any]) -> None:
                 current_processed = processed
                 current_detected = detected_total
 
-                progress = int(current_processed * 100 / total_images) if total_images > 0 else 100
-                _update_task_status(
-                    task_id,
-                    run_token=run_token,
-                    status="running",
-                    progress=progress,
-                    processed_images=current_processed,
-                    detected_objects=current_detected,
-                    total_images=total_images,
-                    output_path=output_dir,
-                    output_dataset_id=output_dataset_id,
-                )
+                now = time.monotonic()
+                if PROGRESS_UPDATE_INTERVAL <= 0 or (now - last_progress_update) >= PROGRESS_UPDATE_INTERVAL:
+                    progress = int(current_processed * 100 / total_images) if total_images > 0 else 100
+                    _update_task_status(
+                        task_id,
+                        run_token=run_token,
+                        status="running",
+                        progress=progress,
+                        processed_images=current_processed,
+                        detected_objects=current_detected,
+                        total_images=total_images,
+                        output_path=output_dir,
+                        output_dataset_id=output_dataset_id,
+                    )
+                    last_progress_update = now
             except Exception as e:
                 logger.error(
                     "Failed to process file for task {}: file_path={}, error={}",
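The three new unit tests listed in the commit message can be approximated with a deterministic simulation of the `as_completed()` loop. The helper below is a hypothetical sketch, not the repository's test code: a fake clock replaces `time.monotonic()` so the write counts are exact rather than timing-dependent:

```python
def count_db_writes(n_files: int, per_file_seconds: float, interval: float) -> int:
    """Simulate the throttled progress loop and count DB writes.

    A simulated clock stands in for time.monotonic(); `count_db_writes`
    is an illustrative helper, not DataMate code.
    """
    now = 0.0
    last_progress_update = now
    writes = 0
    for _ in range(n_files):
        now += per_file_seconds  # a file finishes at this simulated instant
        if interval <= 0 or (now - last_progress_update) >= interval:
            writes += 1  # stands in for one _update_task_status(...) call
            last_progress_update = now
    return writes


# Fast processing with throttling: with T=2s and 0.5 s/file,
# only every fourth completion writes to the DB.
assert count_db_writes(1000, 0.5, 2.0) == 250

# interval=0 restores the previous per-file behavior.
assert count_db_writes(1000, 0.5, 0.0) == 1000

# Slow processing (per-file > T) degrades to per-file updates.
assert count_db_writes(10, 5.0, 2.0) == 10
```

The same gate condition as the diff (`interval <= 0 or elapsed >= interval`) drives all three cases, which is why the edge cases in the commit message fall out of one code path.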