feat: add labeling template. refactor: switch to Poetry, build and deploy of backend Python (#79)

* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.
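
The configuration validation described above can be pictured with a small standard-library sketch. It is not the actual `LabelStudioConfigValidator`: the function name `check_label_config` and the abbreviated tag sets are assumptions made for illustration. It only checks that the XML parses and that every control's `toName` points at a named object tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated tag sets (assumption: the real validator
# reads the supported tags from elsewhere).
OBJECT_TAGS = {"Image", "Text", "Audio"}
CONTROL_TAGS = {"Choices", "RectangleLabels", "TextArea"}


def check_label_config(xml_text: str) -> list[str]:
    """Return a list of problems found in a Label Studio XML config."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as exc:
        return [f"Invalid XML: {exc}"]

    errors: list[str] = []
    # Names of all object tags that controls are allowed to reference.
    object_names = {
        el.get("name")
        for el in root.iter()
        if el.tag in OBJECT_TAGS and el.get("name")
    }
    if not object_names:
        errors.append("Config needs at least one named object tag")

    for el in root.iter():
        if el.tag not in CONTROL_TAGS:
            continue
        if not el.get("name"):
            errors.append(f"<{el.tag}> is missing 'name'")
        target = el.get("toName")
        if not target:
            errors.append(f"<{el.tag}> is missing 'toName'")
        elif target not in object_names:
            errors.append(f"<{el.tag}> points at unknown object '{target}'")
    return errors


GOOD = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="label" toName="image">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
"""
BAD = '<View><Image name="image" value="$image"/><Choices name="label" toName="img"/></View>'

print(check_label_config(GOOD))  # []
print(check_label_config(BAD))   # ["<Choices> points at unknown object 'img'"]
```

Returning a list of messages rather than raising on the first problem lets a form show all issues at once; whether the real service does this is not stated in the commit.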

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support
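
One way to accept both camelCase and snake_case keys is to normalize incoming keys before validation. The sketch below uses only the standard library; `camel_to_snake` and `normalize_keys` are hypothetical helpers, and the actual schemas in this commit may handle aliases differently.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert e.g. 'toName' to 'to_name'; snake_case input is unchanged."""
    # Insert an underscore before every capital letter that is not the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize_keys(payload: dict) -> dict:
    """Return a copy of payload whose top-level keys are snake_case."""
    return {camel_to_snake(key): value for key, value in payload.items()}


print(normalize_keys({"toName": "image", "from_name": "label"}))
# {'to_name': 'image', 'from_name': 'label'}
```

Note that this simple rule splits acronyms letter by letter (`datasetID` becomes `dataset_i_d`), so a real implementation would need explicit aliases for such fields.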

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external

* feat: Add tag configuration management and related components

- Introduced new components for tag selection and browsing in the frontend.
- Added API endpoint to fetch tag configuration from the backend.
- Implemented tag configuration management in the backend, including loading from YAML.
- Enhanced template service to support dynamic tag rendering based on configuration.
- Updated validation utilities to incorporate tag configuration checks.
- Refactored existing code to utilize the new tag configuration structure.
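
As a rough illustration of configuration-driven tag rendering, the sketch below builds a control element from a small dictionary shaped like the tag configuration. The `CONTROLS` excerpt and the `render_control` function are assumptions for illustration, not the template service's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the shape of the tag configuration.
CONTROLS = {
    "Choices": {"requires_children": True, "child_tag": "Choice"},
    "TextArea": {"requires_children": False},
}


def render_control(tag: str, name: str, to_name: str, values=(), **attrs) -> str:
    """Render one control tag as XML, adding children only if the config asks for them."""
    spec = CONTROLS.get(tag)
    if spec is None:
        raise ValueError(f"Unsupported control tag: {tag}")

    # XML attribute values must be strings.
    extra = {key: str(value) for key, value in attrs.items()}
    element = ET.Element(tag, {"name": name, "toName": to_name, **extra})

    if spec.get("requires_children"):
        if not values:
            raise ValueError(f"<{tag}> needs at least one <{spec['child_tag']}>")
        for value in values:
            ET.SubElement(element, spec["child_tag"], {"value": value})
    return ET.tostring(element, encoding="unicode")


print(render_control("Choices", "label", "image", ["Cat", "Dog"], choice="single"))
print(render_control("TextArea", "note", "image", rows=3))
```

Because the child tag name comes from the configuration, supporting a new control with children would only require a new configuration entry in this sketch.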

* feat: Refactor LabelStudioTagConfig for improved configuration loading and validation

* feat: Update Makefile to include backend-python-docker-build in the build process

* feat: Migrate to Poetry for better dependency management

* Add pyyaml dependency and update Dockerfile to use Poetry for dependency management

- Added pyyaml (>=6.0.3,<7.0.0) to pyproject.toml dependencies.
- Updated Dockerfile to install Poetry and manage dependencies using it.
- Improved layer caching by copying only dependency files before the application code.
- Removed unnecessary installation of build dependencies to keep the final image size small.

* feat: Remove duplicated backend-python-docker-build target from Makefile

* fix: Airflow is not ready to be added yet

* feat: update Python version to 3.12 and remove project installation step in Dockerfile

Author: Jason Wang
Date: 2025-11-13 15:32:30 +08:00
Committed by GitHub
Parent: 2660845b74
Commit: 45743f39f5
40 changed files with 3223 additions and 262 deletions

@@ -0,0 +1,4 @@
"""Tag configuration package"""
from .tag_config import LabelStudioTagConfig
__all__ = ['LabelStudioTagConfig']

@@ -0,0 +1,467 @@
# Label Studio Tag Configuration
# Defines supported tags, their properties, and child element requirements
# Object tags - represent data to be annotated
objects:
  Audio:
    description: "Display audio files"
    required_attrs: [name, value]
    optional_attrs: []
    category: media
  Bitmask:
    description: "Display bitmask images for segmentation"
    required_attrs: [name, value]
    optional_attrs: []
    category: image
  PDF:
    description: "Display PDF documents"
    required_attrs: [name, value]
    optional_attrs: []
    category: document
  Markdown:
    description: "Display Markdown content"
    required_attrs: [name, value]
    optional_attrs: []
    category: document
  ParagraphLabels:
    description: "Display paragraphs with label support"
    required_attrs: [name, value]
    optional_attrs: []
    category: text
  Timeseries:
    description: "Display timeseries data"
    required_attrs: [name, value]
    optional_attrs: []
    category: data
  Vector:
    description: "Display vector data for annotation"
    required_attrs: [name, value]
    optional_attrs: []
    category: data
  Chat:
    description: "Display chat data for annotation"
    required_attrs: [name, value]
    optional_attrs: []
    category: text
  HyperText:
    description: "Display HTML content"
    required_attrs: [name, value]
    optional_attrs: []
    category: document
  Image:
    description: "Display images for annotation"
    required_attrs: [name, value]
    optional_attrs: []
    category: image
  Text:
    description: "Display text for annotation"
    required_attrs: [name, value]
    optional_attrs: []
    category: text
  Video:
    description: "Display video files"
    required_attrs: [name, value]
    optional_attrs: []
    category: media
  AudioPlus:
    description: "Advanced audio player"
    required_attrs: [name, value]
    optional_attrs: []
    category: media
  Paragraphs:
    description: "Display paragraphs of text"
    required_attrs: [name, value]
    optional_attrs: []
    category: text
  Table:
    description: "Display tabular data"
    required_attrs: [name, value]
    optional_attrs: []
    category: data
# Control tags - tools for annotation
# Categories:
# - labeling: Controls used for annotating/labeling objects (shown in template form)
# - layout: UI/layout elements not used for labeling (hidden from template form by default)
controls:
  # Choice-based controls (use <Choice> children)
  Choices:
    description: "Multiple choice classification"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
        description: "Whether the choice is required"
      choice:
        type: string
        values: [single, multiple]
        default: single
        description: "Selection mode: single or multiple"
      showInline:
        type: boolean
        default: true
        description: "Show choices inline or as dropdown"
    requires_children: true
    child_tag: Choice
    child_required_attrs: [value]
    category: labeling
  Taxonomy:
    description: "Hierarchical multi-label classification"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      maxDepth:
        type: number
        default: 3
        description: "Maximum depth of taxonomy tree"
    requires_children: true
    child_tag: Path
    child_required_attrs: [value]
    category: labeling
  Ranker:
    description: "Rank items in order"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      maxChoices:
        type: number
        default: 5
        description: "Maximum number of choices to rank"
    requires_children: true
    child_tag: Choice
    child_required_attrs: [value]
    category: layout
  List:
    description: "List selection control"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      mode:
        type: string
        values: [single, multiple]
        default: single
    requires_children: true
    child_tag: Item
    child_required_attrs: [value]
    category: layout
  Filter:
    description: "Filter control for annotation"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
    requires_children: false
    category: layout
  Collapse:
    description: "Collapsible UI section"
    required_attrs: [name]
    optional_attrs:
      collapsed:
        type: boolean
        default: false
    requires_children: false
    category: layout
  Header:
    description: "Section header for UI grouping"
    required_attrs: [name]
    optional_attrs:
      level:
        type: number
        default: 1
        description: "Header level (1-6)"
    requires_children: false
    category: layout
  Shortcut:
    description: "Keyboard shortcut definition"
    required_attrs: [name, toName]
    optional_attrs:
      key:
        type: string
        description: "Shortcut key"
    requires_children: false
    category: layout
  Style:
    description: "Custom style for annotation UI"
    required_attrs: [name]
    optional_attrs:
      value:
        type: string
        description: "CSS style value"
    requires_children: false
    category: layout
  MagicWand:
    description: "Magic wand segmentation tool"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
    requires_children: false
    category: labeling
  BitmaskLabels:
    description: "Bitmask segmentation with labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  TimeseriesLabels:
    description: "Labels for timeseries data"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  VectorLabels:
    description: "Labels for vector data"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  ParagraphLabels:
    description: "Labels for paragraphs"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  Relation:
    description: "Draw relation between objects"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
    requires_children: false
    category: layout
  Relations:
    description: "Draw multiple relations between objects"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
    requires_children: false
    category: layout
  Pairwise:
    description: "Pairwise comparison control"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
    requires_children: false
    category: layout
  DateTime:
    description: "Date and time input"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      format:
        type: string
        default: "YYYY-MM-DD HH:mm:ss"
    requires_children: false
    category: labeling
  Number:
    description: "Numeric input field"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      min:
        type: number
      max:
        type: number
      step:
        type: number
        default: 1
    requires_children: false
    category: labeling
  # Label-based controls (use <Label> children)
  RectangleLabels:
    description: "Rectangle bounding boxes with labels"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
        description: "Whether annotation is required"
      strokeWidth:
        type: number
        default: 3
        description: "Width of the bounding box border"
      canRotate:
        type: boolean
        default: true
        description: "Allow rotation of rectangles"
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  PolygonLabels:
    description: "Polygon annotations with labels"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      strokeWidth:
        type: number
        default: 3
      pointSize:
        type: string
        values: [small, medium, large]
        default: medium
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  Labels:
    description: "Generic labels for classification"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  KeyPointLabels:
    description: "Keypoint annotations with labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  BrushLabels:
    description: "Brush/semantic segmentation with labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  EllipseLabels:
    description: "Ellipse annotations with labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: true
    child_tag: Label
    child_required_attrs: [value]
    category: labeling
  # Simple controls (no children required)
  Rectangle:
    description: "Rectangle bounding box without labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
  Polygon:
    description: "Polygon annotation without labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
  Ellipse:
    description: "Ellipse annotation without labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
  KeyPoint:
    description: "Keypoint annotation without labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
  Brush:
    description: "Brush annotation without labels"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
  TextArea:
    description: "Text input field"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      placeholder:
        type: string
        description: "Placeholder text"
      maxSubmissions:
        type: number
        description: "Maximum number of submissions"
      rows:
        type: number
        default: 3
        description: "Number of rows in textarea"
      editable:
        type: boolean
        default: true
    requires_children: false
    category: labeling
  Rating:
    description: "Star rating or numeric rating"
    required_attrs: [name, toName]
    optional_attrs:
      required:
        type: boolean
      maxRating:
        type: number
        default: 5
        description: "Maximum rating value"
      defaultValue:
        type: number
        description: "Default rating value"
      size:
        type: string
        values: [small, medium, large]
        default: medium
      icon:
        type: string
        values: [star, heart, fire, thumbs]
        default: star
    requires_children: false
    category: labeling
  VideoRectangle:
    description: "Rectangle annotations for video"
    required_attrs: [name, toName]
    optional_attrs: [required]
    requires_children: false
    category: labeling
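
To show how an entry from the `controls` section might be consumed, here is a minimal sketch that checks one XML element against the `required_attrs`, `requires_children`, `child_tag` and `child_required_attrs` keys. The `check_element` helper is hypothetical, and the `RECTANGLE_LABELS` dictionary is an abbreviated copy of the `RectangleLabels` entry written as a Python literal so the example needs no YAML parser.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the RectangleLabels entry (optional_attrs omitted).
RECTANGLE_LABELS = {
    "required_attrs": ["name", "toName"],
    "requires_children": True,
    "child_tag": "Label",
    "child_required_attrs": ["value"],
}


def check_element(element: ET.Element, spec: dict) -> list[str]:
    """Check required attributes and required children of one control element."""
    errors = [
        f"<{element.tag}> missing '{attr}'"
        for attr in spec.get("required_attrs", [])
        if not element.get(attr)
    ]
    if spec.get("requires_children"):
        child_tag = spec["child_tag"]
        children = element.findall(child_tag)
        if not children:
            errors.append(f"<{element.tag}> needs at least one <{child_tag}>")
        for child in children:
            for attr in spec.get("child_required_attrs", []):
                if not child.get(attr):
                    errors.append(f"<{child_tag}> missing '{attr}'")
    return errors


ok = ET.fromstring(
    '<RectangleLabels name="bbox" toName="image"><Label value="Car"/></RectangleLabels>'
)
bad = ET.fromstring('<RectangleLabels name="bbox"><Label/></RectangleLabels>')

print(check_element(ok, RECTANGLE_LABELS))   # []
print(check_element(bad, RECTANGLE_LABELS))
# ["<RectangleLabels> missing 'toName'", "<Label> missing 'value'"]
```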

@@ -0,0 +1,150 @@
"""
Label Studio Tag Configuration Loader
"""
import yaml
from typing import Dict, Any, Optional, Set, Tuple
from pathlib import Path


class LabelStudioTagConfig:
    """Label Studio tag configuration manager.

    Singleton. The YAML file is loaded on first instantiation, so create an
    instance before calling the classmethods below.
    """

    _instance: Optional['LabelStudioTagConfig'] = None
    _config: Dict[str, Any] = {}

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        """Load the configuration on first initialization."""
        if not self._config:
            self._load_config()

    @classmethod
    def _load_config(cls):
        """Load the YAML configuration file."""
        config_path = Path(__file__).parent / "label_studio_tags.yaml"
        with open(config_path, 'r', encoding='utf-8') as f:
            cls._config = yaml.safe_load(f) or {}

    @classmethod
    def get_object_types(cls) -> Set[str]:
        """Return all supported object types."""
        return set(cls._config.get('objects', {}).keys())

    @classmethod
    def get_control_types(cls) -> Set[str]:
        """Return all supported control types."""
        return set(cls._config.get('controls', {}).keys())

    @classmethod
    def get_control_config(cls, control_type: str) -> Optional[Dict[str, Any]]:
        """Return the configuration for a control type."""
        return cls._config.get('controls', {}).get(control_type)

    @classmethod
    def get_object_config(cls, object_type: str) -> Optional[Dict[str, Any]]:
        """Return the configuration for an object type."""
        return cls._config.get('objects', {}).get(object_type)

    @classmethod
    def requires_children(cls, control_type: str) -> bool:
        """Check whether a control requires child elements."""
        config = cls.get_control_config(control_type)
        return config.get('requires_children', False) if config else False

    @classmethod
    def get_child_tag(cls, control_type: str) -> Optional[str]:
        """Return the child tag name for a control."""
        config = cls.get_control_config(control_type)
        return config.get('child_tag') if config else None

    @classmethod
    def get_controls_with_child_tag(cls, child_tag: str) -> Set[str]:
        """Return all control types that use the given child tag."""
        controls = set()
        for control_type, config in cls._config.get('controls', {}).items():
            if config.get('child_tag') == child_tag:
                controls.add(control_type)
        return controls

    @classmethod
    def get_optional_attrs(cls, tag_type: str, is_control: bool = True) -> Dict[str, Any]:
        """
        Return the optional attribute configuration for a tag.

        Args:
            tag_type: Tag type
            is_control: Whether the tag is a control type (otherwise an object type)

        Returns:
            Dictionary of optional attribute configurations
        """
        config = cls.get_control_config(tag_type) if is_control else cls.get_object_config(tag_type)
        if not config:
            return {}
        optional_attrs = config.get('optional_attrs', {})
        # Convert the simple list format (legacy format) to a dict
        if isinstance(optional_attrs, list):
            return {attr: {} for attr in optional_attrs}
        # Ensure a dict is returned
        return optional_attrs if isinstance(optional_attrs, dict) else {}

    @classmethod
    def validate_attr_value(cls, tag_type: str, attr_name: str, attr_value: Any, is_control: bool = True) -> Tuple[bool, Optional[str]]:
        """
        Validate an attribute value against the configuration.

        Args:
            tag_type: Tag type
            attr_name: Attribute name
            attr_value: Attribute value
            is_control: Whether the tag is a control type

        Returns:
            (is_valid, error_message)
        """
        optional_attrs = cls.get_optional_attrs(tag_type, is_control)
        if attr_name not in optional_attrs:
            return True, None  # Attributes not in the configuration are not validated
        attr_config = optional_attrs.get(attr_name, {})
        # Skip validation if the attribute configuration is not a dict
        if not isinstance(attr_config, dict):
            return True, None
        # Check the type
        expected_type = attr_config.get('type')
        if expected_type == 'boolean':
            if not isinstance(attr_value, (bool, str)) or (isinstance(attr_value, str) and attr_value.lower() not in ['true', 'false']):
                return False, f"Attribute '{attr_name}' must be boolean"
        elif expected_type == 'number':
            try:
                float(attr_value)
            except (ValueError, TypeError):
                return False, f"Attribute '{attr_name}' must be a number"
        # Check enumerated values
        allowed_values = attr_config.get('values')
        if allowed_values and attr_value not in allowed_values:
            return False, f"Attribute '{attr_name}' must be one of {allowed_values}, got '{attr_value}'"
        return True, None

    @classmethod
    def get_attr_default(cls, tag_type: str, attr_name: str, is_control: bool = True) -> Optional[Any]:
        """Return the default value of an attribute."""
        optional_attrs = cls.get_optional_attrs(tag_type, is_control)
        attr_config = optional_attrs.get(attr_name, {})
        # Only access attr_config after confirming it is a dict
        if isinstance(attr_config, dict):
            return attr_config.get('default')
        return None
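
The attribute checks in `validate_attr_value` can be tried without the package or the YAML file using the standalone sketch below. It mirrors the same three rules (boolean, number, enumerated values) on a plain dictionary; `check_attr` and the abbreviated `RATING_ATTRS` mapping are illustrative names, not part of this commit.

```python
from typing import Any, Dict, Optional, Tuple

# Abbreviated optional_attrs in the shape of the Rating entry.
RATING_ATTRS: Dict[str, Any] = {
    "required": {"type": "boolean"},
    "maxRating": {"type": "number", "default": 5},
    "icon": {"type": "string", "values": ["star", "heart", "fire", "thumbs"], "default": "star"},
}


def check_attr(specs: Dict[str, Any], name: str, value: Any) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error_message) for one attribute value."""
    spec = specs.get(name)
    if not isinstance(spec, dict):
        return True, None  # unknown or untyped attributes are not validated

    kind = spec.get("type")
    if kind == "boolean":
        is_bool_text = isinstance(value, str) and value.lower() in ("true", "false")
        if not (isinstance(value, bool) or is_bool_text):
            return False, f"Attribute '{name}' must be boolean"
    elif kind == "number":
        try:
            float(value)
        except (TypeError, ValueError):
            return False, f"Attribute '{name}' must be a number"

    allowed = spec.get("values")
    if allowed and value not in allowed:
        return False, f"Attribute '{name}' must be one of {allowed}, got '{value}'"
    return True, None


print(check_attr(RATING_ATTRS, "maxRating", "10"))   # (True, None)
print(check_attr(RATING_ATTRS, "maxRating", "ten"))  # (False, "Attribute 'maxRating' must be a number")
print(check_attr(RATING_ATTRS, "icon", "moon")[0])   # False
```

As in the class above, attributes that are not listed pass unchecked, so this style of validation is permissive by design.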