feat(annotation): support general operator orchestration for data annotation

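## Overview

Reworks the data annotation module from a fixed YOLO operator into general operator orchestration, giving it flexible operator composition similar to the data cleaning module.

## Changes

### Step 1: Database changes (DDL)

- Add SQL migration script: scripts/db/data-annotation-operator-pipeline-migration.sql
- Alter table t_dm_auto_annotation_tasks:
  - New columns: task_mode, executor_type, pipeline, output_dataset_id, created_by, stop_requested, started_at, heartbeat_at, run_token
  - New indexes: idx_status_created, idx_created_by
- Create table t_dm_annotation_task_operator_instance to store operator instance details

### Step 2: API layer

- Extend request models (schema/auto.py):
  - Add the OperatorPipelineStep model
  - Support the pipeline field while keeping the legacy YOLO fields for backward compatibility
  - Normalize alternative key spellings (operatorId/operator_id/id, overrides/settingsOverride/settings_override)
- Update the task creation service (service/auto.py):
  - Add the validate_file_ids() validation method
  - Add the _to_pipeline() compatibility mapping method
  - Write the new fields and integrate the operator instance table
  - Fix inaccurate deduplication of fileIds
- Add API routes (interface/auto.py):
  - Add the /operator-tasks family of endpoints
  - Add stop endpoints (/auto/{id}/stop and /operator-tasks/{id}/stop)
  - Keep the legacy /auto endpoints for backward compatibility
- Align ORM models (annotation_management.py):
  - Add all new DDL fields to AutoAnnotationTask
  - Add the AnnotationTaskOperatorInstance model
  - Add stopped to the status definitions

### Step 3: Runtime layer

- Update worker execution logic (auto_annotation_worker.py):
  - Implement atomic task claiming (run_token)
  - Replace hard-coded YOLO execution with general pipeline execution
  - Add operator resolution and instantiation
  - Check stop_requested during execution
  - Keep the legacy_yolo mode for backward compatibility
  - Support multiple operator invocation styles (execute and __call__)

### Step 4: Gradual rollout

- Complete the YOLO operator metadata (metadata.yml):
  - Fill in the raw_id, language, modal, inputs, outputs, and settings fields
- Register the annotation operator (__init__.py):
  - Register the YOLO operator in the OPERATORS registry
  - Ensure the annotation package is loaded correctly
- Add whitelist control:
  - Support the AUTO_ANNOTATION_OPERATOR_WHITELIST environment variable
  - Allow the set of available operators to be restricted during gradual rollout

## Key features

### Backward compatibility

- The legacy /auto endpoints are fully preserved
- Legacy request parameters are mapped to a pipeline automatically
- The legacy_yolo mode keeps the old logic running as before

### New capabilities

- General pipeline orchestration
- Multi-operator composition
- Task stop control
- Whitelist-based gradual rollout

### Reliability

- Atomic task claiming (prevents duplicate execution)
- Complete error handling and state management
- Detailed audit trail (operator instance table)

## Deployment

1. Run the DDL: mysql < scripts/db/data-annotation-operator-pipeline-migration.sql
2. Set the environment variable: AUTO_ANNOTATION_OPERATOR_WHITELIST=ImageObjectDetectionBoundingBox
3. Restart the services: datamate-runtime and datamate-backend-python

## Verification

1. Compatibility mode: create a task through the legacy /auto endpoint
2. General orchestration: create a pipeline task through the new /operator-tasks endpoint
3. Atomic claim: check the run_token mechanism
4. Stop: test the stop API
5. Whitelist: test that non-whitelisted operators are rejected

## Related files

- DDL: scripts/db/data-annotation-operator-pipeline-migration.sql
- API: runtime/datamate-python/app/module/annotation/
- Worker: runtime/python-executor/datamate/auto_annotation_worker.py
- Operator: runtime/ops/annotation/image_object_detection_bounding_box/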
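
The atomic claim based on run_token can be pictured as a conditional UPDATE whose affected-row count tells a worker whether it won the task. The following is a minimal sketch of that general pattern, not the worker's actual code: an in-memory SQLite table stands in for the MySQL table, the status values `pending` and `running` and the helper name `try_claim` are assumptions made for illustration, and only the table name and the run_token column come from this commit message.

```python
import sqlite3
import uuid
from typing import Optional


def try_claim(conn: sqlite3.Connection, task_id: str) -> Optional[str]:
    """Try to claim a pending task; return the run token on success, else None."""
    token = uuid.uuid4().hex
    # The WHERE clause makes the claim atomic: only one UPDATE can match
    # a row that is still 'pending' (status values are assumed here).
    cur = conn.execute(
        "UPDATE t_dm_auto_annotation_tasks "
        "SET status = 'running', run_token = ? "
        "WHERE id = ? AND status = 'pending'",
        (token, task_id),
    )
    conn.commit()
    return token if cur.rowcount == 1 else None


# Stand-in schema with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_dm_auto_annotation_tasks ("
    "id TEXT PRIMARY KEY, status TEXT NOT NULL, run_token TEXT)"
)
conn.execute(
    "INSERT INTO t_dm_auto_annotation_tasks (id, status) VALUES ('task-1', 'pending')"
)
conn.commit()

first = try_claim(conn, "task-1")   # wins the claim and receives a token
second = try_claim(conn, "task-1")  # loses: the row is no longer pending
```

A worker that receives `None` would simply skip the task, which is what prevents duplicate execution when several workers poll the same table.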
2026-02-07 22:35:33 +08:00
parent 9efc07935f
commit 2f49fc4199
9 changed files with 1606 additions and 480 deletions
@@ -1,6 +1,8 @@
-# -*- coding: utf-8 -*-
-"""Annotation-related operators (e.g. YOLO detection)."""
-
-__all__ = [
-    "image_object_detection_bounding_box",
-]
+# -*- coding: utf-8 -*-
+"""Annotation-related operators (e.g. YOLO detection)."""
+
+from . import image_object_detection_bounding_box
+
+__all__ = [
+    "image_object_detection_bounding_box",
+]
@@ -1,9 +1,16 @@
-"""Image object detection (YOLOv8) operator package.
-
-This package exposes the ImageObjectDetectionBoundingBox annotator so that
-the auto-annotation worker can import it via different module paths.
-"""
-
-from .process import ImageObjectDetectionBoundingBox
-
-__all__ = ["ImageObjectDetectionBoundingBox"]
+"""Image object detection (YOLOv8) operator package.
+
+This package exposes the ImageObjectDetectionBoundingBox annotator so that
+the auto-annotation worker can import it via different module paths.
+"""
+
+from datamate.core.base_op import OPERATORS
+
+from .process import ImageObjectDetectionBoundingBox
+
+OPERATORS.register_module(
+    module_name="ImageObjectDetectionBoundingBox",
+    module_path="ops.annotation.image_object_detection_bounding_box.process",
+)
+
+__all__ = ["ImageObjectDetectionBoundingBox"]
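
With the operator registered under the name `ImageObjectDetectionBoundingBox`, the rollout whitelist from the commit message can gate it by that name. Below is a minimal sketch, assuming the environment variable holds a comma-separated list and that an unset or empty value means no restriction. Both points are assumptions, as are the helper names `load_whitelist` and `is_allowed`; the worker's real parsing is not shown in this part of the diff.

```python
import os
from typing import Mapping, Optional, Set

ENV_NAME = "AUTO_ANNOTATION_OPERATOR_WHITELIST"


def load_whitelist(env: Mapping[str, str] = os.environ) -> Optional[Set[str]]:
    """Parse the whitelist; None means no restriction (assumed semantics)."""
    raw = env.get(ENV_NAME, "")
    # Assumed format: comma-separated operator names, whitespace ignored.
    names = {part.strip() for part in raw.split(",") if part.strip()}
    return names or None


def is_allowed(operator_id: str, whitelist: Optional[Set[str]]) -> bool:
    """Return True if the operator may run under the given whitelist."""
    return whitelist is None or operator_id in whitelist


whitelist = load_whitelist({ENV_NAME: "ImageObjectDetectionBoundingBox"})
allowed = is_allowed("ImageObjectDetectionBoundingBox", whitelist)
blocked = is_allowed("SomeOtherOperator", whitelist)
```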
@@ -1,3 +1,48 @@
-name: image_object_detection_bounding_box
-version: 0.1.0
-description: "YOLOv8-based object detection operator for auto annotation"
+name: '图像目标检测（YOLOv8）'
+name_en: 'Image Object Detection (YOLOv8)'
+description: '基于 YOLOv8 的目标检测算子，输出带框图像与标注 JSON。'
+description_en: 'YOLOv8-based object detection operator that outputs boxed images and annotation JSON files.'
+language: 'python'
+vendor: 'huawei'
+raw_id: 'ImageObjectDetectionBoundingBox'
+version: '1.0.0'
+types:
+  - 'annotation'
+modal: 'image'
+inputs: 'image'
+outputs: 'image'
+settings:
+  modelSize:
+    name: '模型规模'
+    description: 'YOLOv8 模型规模：n/s/m/l/x。'
+    type: 'select'
+    defaultVal: 'l'
+    options:
+      - label: 'n'
+        value: 'n'
+      - label: 's'
+        value: 's'
+      - label: 'm'
+        value: 'm'
+      - label: 'l'
+        value: 'l'
+      - label: 'x'
+        value: 'x'
+  confThreshold:
+    name: '置信度阈值'
+    description: '检测结果最小置信度，范围 0~1。'
+    type: 'slider'
+    defaultVal: 0.7
+    min: 0
+    max: 1
+    step: 0.01
+  targetClasses:
+    name: '目标类别'
+    description: 'COCO 类别 ID 列表；为空表示全部类别。'
+    type: 'input'
+    defaultVal: '[]'
+  outputDir:
+    name: '输出目录'
+    description: '算子输出目录（由运行时注入）。'
+    type: 'input'
+    defaultVal: ''
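
The `defaultVal` entries above suggest how a pipeline step's effective settings might be resolved: start from the metadata defaults, then apply the per-step overrides, which the commit message says may arrive as `overrides`, `settingsOverride`, or `settings_override`. This is a minimal sketch under stated assumptions: the defaults are copied by hand from the YAML rather than loaded from it, the precedence among the three aliases is a guess, decoding `targetClasses` from a JSON string is an assumption, and `resolve_settings` is a hypothetical helper rather than a function from this commit.

```python
import json
from typing import Any, Dict

# Defaults copied by hand from the settings section of metadata.yml.
DEFAULTS: Dict[str, Any] = {
    "modelSize": "l",
    "confThreshold": 0.7,
    "targetClasses": "[]",
    "outputDir": "",
}

# Alias spellings named in the commit message; the lookup order is assumed.
OVERRIDE_KEYS = ("overrides", "settingsOverride", "settings_override")


def resolve_settings(step: Dict[str, Any]) -> Dict[str, Any]:
    """Merge metadata defaults with the first override mapping found in the step."""
    settings = dict(DEFAULTS)
    for key in OVERRIDE_KEYS:
        value = step.get(key)
        if isinstance(value, dict):
            settings.update(value)
            break
    # targetClasses defaults to the string '[]'; decode it into a list (assumed).
    classes = settings["targetClasses"]
    if isinstance(classes, str):
        settings["targetClasses"] = json.loads(classes) if classes.strip() else []
    return settings


resolved = resolve_settings(
    {
        "operatorId": "ImageObjectDetectionBoundingBox",
        "settingsOverride": {"confThreshold": 0.5},
    }
)
```

Here `resolved` keeps `modelSize` at its default while `confThreshold` takes the overridden value, and an empty `targetClasses` list stands for all classes as the metadata description states.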