feat: add labeling template; refactor: switch to Poetry for building and deploying the Python backend (#79)

* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support
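One common way to accept both camelCase and snake_case request payloads is to normalize keys before validation. The sketch below shows the conversion; the key names are illustrative, not taken from the PR:

```python
import re


def camel_to_snake(key: str) -> str:
    # Insert "_" before each interior uppercase letter, then lowercase everything.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', key).lower()


def normalize_payload(payload: dict) -> dict:
    # snake_case keys pass through unchanged; camelCase keys are converted.
    return {camel_to_snake(k): v for k, v in payload.items()}


normalize_payload({"datasetId": "d1", "template_name": "cls"})
# → {"dataset_id": "d1", "template_name": "cls"}
```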

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external

* feat: Add tag configuration management and related components

- Introduced new components for tag selection and browsing in the frontend.
- Added API endpoint to fetch tag configuration from the backend.
- Implemented tag configuration management in the backend, including loading from YAML.
- Enhanced template service to support dynamic tag rendering based on configuration.
- Updated validation utilities to incorporate tag configuration checks.
- Refactored existing code to utilize the new tag configuration structure.
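Loading a tag configuration from YAML can be sketched as below. This assumes pyyaml (which this PR adds as a dependency); the YAML layout shown is hypothetical, not the PR's actual schema:

```python
import yaml

# Hypothetical tag configuration: each entry names a Label Studio tag
# and the attributes it accepts.
TAG_CONFIG_YAML = """
tags:
  - name: Image
    attributes: [name, value]
  - name: Choices
    attributes: [name, toName, choice]
"""


def load_tag_config(text: str) -> dict[str, list[str]]:
    """Map each tag name to its list of allowed attributes."""
    data = yaml.safe_load(text)
    return {t["name"]: t["attributes"] for t in data["tags"]}


config = load_tag_config(TAG_CONFIG_YAML)
```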

* feat: Refactor LabelStudioTagConfig for improved configuration loading and validation

* feat: Update Makefile to include backend-python-docker-build in the build process

* feat: Migrate to poetry for better deps management

* Add pyyaml dependency and update Dockerfile to use Poetry for dependency management

- Added pyyaml (>=6.0.3,<7.0.0) to pyproject.toml dependencies.
- Updated Dockerfile to install Poetry and manage dependencies using it.
- Improved layer caching by copying only dependency files before the application code.
- Removed unnecessary installation of build dependencies to keep the final image size small.
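The layer-caching pattern described in the bullets above typically looks like this in a Dockerfile; the paths and Poetry version here are illustrative, not copied from the PR:

```dockerfile
FROM python:3.12-slim

# Install Poetry itself, pinned for reproducible builds (version illustrative)
RUN pip install --no-cache-dir poetry==1.8.3

WORKDIR /app

# Copy only the dependency manifests first, so this layer stays cached
# until pyproject.toml / poetry.lock actually change
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi --no-root --only main

# Application code changes most often, so it is copied last
COPY . .
```

The `--no-root` flag skips installing the project itself, which matches the later commit that removes the project installation step.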

* feat: Remove duplicated backend-python-docker-build target from Makefile

* fix: Airflow is not ready to be added yet

* feat: update Python version to 3.12 and remove project installation step in Dockerfile
Author: Jason Wang
Date: 2025-11-13 15:32:30 +08:00
Committed by: GitHub
Parent: 2660845b74
Commit: 45743f39f5
40 changed files with 3223 additions and 262 deletions


@@ -1,7 +1,8 @@
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.future import select
 from sqlalchemy import func
-from typing import Optional
+from typing import Optional, List, Dict, Any
+from datetime import datetime
 from app.core.config import settings
 from app.core.logging import get_logger
@@ -22,12 +23,12 @@ class Service:
             db: database session
         """
         self.db = db
-        logger.info("Initialize DM service client (Database mode)")
+        logger.debug("Initialize DM service client (Database mode)")

     async def get_dataset(self, dataset_id: str) -> Optional[DatasetResponse]:
         """Get dataset details"""
         try:
-            logger.info(f"Getting dataset detail: {dataset_id} ...")
+            logger.debug(f"Getting dataset detail: {dataset_id} ...")

             result = await self.db.execute(
                 select(Dataset).where(Dataset.id == dataset_id)
@@ -66,7 +67,7 @@ class Service:
     ) -> Optional[PagedDatasetFileResponse]:
         """Get the dataset's file list"""
         try:
-            logger.info(f"Get dataset files: dataset={dataset_id}, page={page}, size={size}")
+            logger.debug(f"Get dataset files: dataset={dataset_id}, page={page}, size={size}")

             # Build the query
             query = select(DatasetFiles).where(DatasetFiles.dataset_id == dataset_id)
@@ -159,4 +160,67 @@ class Service:
     async def close(self):
         """Close the client connection (no-op in database mode)"""
-        logger.info("DM service client closed (Database mode)")
+        logger.info("DM service client closed (Database mode)")
+
+    async def update_file_tags_partial(
+        self,
+        file_id: str,
+        new_tags: List[Dict[str, Any]]
+    ) -> tuple[bool, Optional[str], Optional[datetime]]:
+        """
+        Partially update a file's tags.
+
+        Args:
+            file_id: file ID
+            new_tags: new tag list (partial update)
+
+        Returns:
+            (success flag, error message, update time)
+        """
+        try:
+            logger.info(f"Partial updating tags for file: {file_id}")
+
+            # Fetch the file record
+            result = await self.db.execute(
+                select(DatasetFiles).where(DatasetFiles.id == file_id)
+            )
+            file_record = result.scalar_one_or_none()
+
+            if not file_record:
+                logger.error(f"File not found: {file_id}")
+                return False, f"File not found: {file_id}", None
+
+            # Get the existing tags
+            existing_tags: List[Dict[str, Any]] = file_record.tags or []  # type: ignore
+
+            # Map tag ID to its index in the list
+            tag_id_map = {tag.get('id'): idx for idx, tag in enumerate(existing_tags) if tag.get('id')}
+
+            # Update matching tags in place, append the rest
+            for new_tag in new_tags:
+                tag_id = new_tag.get('id')
+                if tag_id and tag_id in tag_id_map:
+                    # Update an existing tag
+                    idx = tag_id_map[tag_id]
+                    existing_tags[idx] = new_tag
+                    logger.debug(f"Updated existing tag with id: {tag_id}")
+                else:
+                    # Append a new tag
+                    existing_tags.append(new_tag)
+                    logger.debug(f"Added new tag with id: {tag_id}")
+
+            # Persist to the database
+            update_time = datetime.utcnow()
+            file_record.tags = existing_tags  # type: ignore
+            file_record.tags_updated_at = update_time  # type: ignore
+
+            await self.db.commit()
+            await self.db.refresh(file_record)
+
+            logger.info(f"Successfully updated tags for file: {file_id}")
+            return True, None, update_time
+
+        except Exception as e:
+            logger.error(f"Failed to update tags for file {file_id}: {e}")
+            await self.db.rollback()
+            return False, str(e), None