Automatically convert auto-annotation outputs to Label Studio format and write them to the t_dm_annotation_results table, so the results can be edited directly in the annotation editor.
New file:
- runtime/python-executor/datamate/annotation_result_converter.py
  * 4 converters for different annotation types:
    - convert_text_classification → choices type
    - convert_ner → labels (span) type
    - convert_relation_extraction → labels + relation type
    - convert_object_detection → rectanglelabels type
  * convert_annotation() dispatcher (auto-detects task_type)
  * generate_label_config_xml() for dynamic XML generation
  * Pipeline introspection utilities
  * Label Studio ID generation logic
Modified file:
- runtime/python-executor/datamate/auto_annotation_worker.py
  * Preserve file_id through processing loop (line 918)
  * Collect file_results as (file_id, annotations) pairs
  * New _create_labeling_project_with_annotations() function:
    - Creates labeling project linked to source dataset
    - Snapshots all files
    - Converts results to Label Studio format
    - Writes to t_dm_annotation_results in single transaction
  * label_config XML stored in t_dm_labeling_projects.configuration
Key features:
- Supports 4 annotation types: text classification, NER, relation extraction, object detection
- Deterministic region IDs for entity references in relation extraction
- Pixel to percentage conversion for object detection
- XML escaping handled by xml.etree.ElementTree
- Partial results preserved on task stop
Users can now view and edit auto-annotation results seamlessly in the annotation editor.
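The pixel-to-percentage step listed above can be sketched roughly as follows. Label Studio expresses rectangle coordinates as percentages (0-100) of the image size. The function names, the (x, y, w, h) box layout, and the from_name/to_name values are assumptions for illustration, not the actual code in annotation_result_converter.py.

```python
def bbox_pixels_to_percent(x, y, w, h, img_w, img_h):
    """Convert a pixel-space box (top-left x, y, width, height) to percent units."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    return {
        "x": x / img_w * 100.0,
        "y": y / img_h * 100.0,
        "width": w / img_w * 100.0,
        "height": h / img_h * 100.0,
    }


def to_rectanglelabels(box, label, img_w, img_h, region_id):
    """Build one Label Studio result item of type 'rectanglelabels'."""
    value = bbox_pixels_to_percent(*box, img_w, img_h)
    value["rotation"] = 0
    value["rectanglelabels"] = [label]
    return {
        "id": region_id,
        "type": "rectanglelabels",
        # Assumed control/object names; they must match the label_config XML.
        "from_name": "label",
        "to_name": "image",
        "original_width": img_w,
        "original_height": img_h,
        "value": value,
    }


region = to_rectanglelabels((100, 50, 200, 100), "cat", 800, 400, "r1")
```

The image dimensions are carried along as original_width/original_height so the editor can map the percentages back to pixels.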
Add three new LLM-powered auto-annotation operators:
- LLMTextClassification: Text classification using LLM
- LLMNamedEntityRecognition: Named entity recognition with type validation
- LLMRelationExtraction: Relation extraction with entity and relation type validation
Key features:
- Load LLM config from t_model_config table via modelId parameter
- Lazy loading of LLM configuration on first execute()
- Result validation with whitelist checking for entity/relation types
- Fault-tolerant: returns empty results on LLM failure instead of throwing
- Fully compatible with existing Worker pipeline
Files added:
- runtime/ops/annotation/_llm_utils.py: Shared LLM utilities
- runtime/ops/annotation/llm_text_classification/: Text classification operator
- runtime/ops/annotation/llm_named_entity_recognition/: NER operator
- runtime/ops/annotation/llm_relation_extraction/: Relation extraction operator
Files modified:
- runtime/ops/annotation/__init__.py: Register 3 new operators
- runtime/python-executor/datamate/auto_annotation_worker.py: Add to Worker whitelist
- frontend/src/pages/DataAnnotation/OperatorCreate/hooks/useOperatorOperations.ts: Add to frontend whitelist
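A minimal sketch of the whitelist validation and fault-tolerant behaviour described above, for the NER case. It assumes the LLM replies with a JSON list of {"type", "start", "end"} objects; that reply shape and the parse_entities name are hypothetical and may differ from what _llm_utils.py does.

```python
import json


def parse_entities(raw_response, text, allowed_types):
    """Parse an LLM reply and keep only whitelisted, in-range entities.

    Returns an empty list instead of raising when the reply is unusable.
    """
    try:
        items = json.loads(raw_response)
    except (TypeError, ValueError):
        return []  # malformed or missing reply: fail soft
    if not isinstance(items, list):
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        etype = item.get("type")
        if not isinstance(etype, str) or etype not in allowed_types:
            continue  # entity type is not on the whitelist
        start, end = item.get("start"), item.get("end")
        if not (isinstance(start, int) and isinstance(end, int)):
            continue
        if not (0 <= start < end <= len(text)):
            continue  # offsets fall outside the source text
        valid.append(
            {"type": etype, "start": start, "end": end, "text": text[start:end]}
        )
    return valid


text = "Alice works at Acme"
reply = (
    '[{"type": "PERSON", "start": 0, "end": 5},'
    ' {"type": "ALIEN", "start": 15, "end": 19}]'
)
entities = parse_entities(reply, text, {"PERSON", "ORG"})
```

Dropping invalid items one by one, rather than rejecting the whole reply, keeps whatever part of the LLM output is usable.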
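Add the AnnotationMigrator migration algorithm. When a file in a TEXT dataset
is updated to a new version, annotations from the old version can optionally be
migrated to the new version using difflib position-offset mapping followed by a
secondary text-matching pass.
The frontend version-switch dialog gains a "Keep annotations" checkbox (shown
only for TEXT datasets), and the backend API gains a preserveAnnotations
parameter. The change is fully backward compatible.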
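A rough sketch of the two-stage idea: difflib offset mapping first, then a text search as fallback. The migrate_span function and its behaviour are assumptions for illustration. The real AnnotationMigrator may handle partial overlaps and repeated text differently.

```python
import difflib


def migrate_span(old_text, new_text, start, end):
    """Map the span [start, end) from old_text onto new_text.

    Returns (new_start, new_end), or None if the span cannot be recovered.
    """
    span_text = old_text[start:end]
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)

    # Stage 1: if the whole span lies inside an unchanged block,
    # shift it by that block's offset.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" and i1 <= start and end <= i2:
            offset = j1 - i1
            return start + offset, end + offset

    # Stage 2: fall back to searching for the annotated text itself.
    pos = new_text.find(span_text)
    if pos != -1:
        return pos, pos + len(span_text)
    return None


old = "Alice met Bob in Paris."
new = "Yesterday, Alice met Bob in Paris."
moved = migrate_span(old, new, 10, 13)  # the span covering "Bob"
```

The fallback takes the first occurrence of the text, so a span whose text appears several times in the new version may land on the wrong occurrence. A production implementation would likely prefer the occurrence nearest the original position.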
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem:
use_new_version returned 404 ("annotation not found") for files without
annotations, preventing users from switching to the new version.
Solution:
1. Query latest file by logical_path
2. Update LabelingProjectFile to point to latest version
3. If annotation exists: clear it and update file_id
4. If no annotation: just update project file snapshot
5. Return new file_id in response
Change has_new_version logic to compare current file version with
latest version, regardless of whether annotation exists.
Before: Only show warning if annotation exists and version is outdated
After: Show warning if current file is not the latest version
This ensures users are informed when viewing an old file version,
even if they haven't started annotating yet.
Problem:
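check_file_version compared the annotation version with the version of the
passed file_id. When a file is updated, however, a new file record is created
with a higher version and the old record is marked ARCHIVED, so the passed
file_id can point to a stale record and the newer version goes undetected.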
Solution:
1. Query the latest ACTIVE file by logical_path
2. Compare annotation version with latest file version
3. Return latestFileId so frontend can switch to new version
Changes:
- check_file_version now queries latest version by logical_path
- Added latest_file_id to FileVersionCheckResponse schema
- Updated schema descriptions to clarify that currentFileVersion is the latest version
Database scenario:
- old file: id=6dae9f2f, version=1, status=ARCHIVED
- new file: id=3365b4e7, version=3, status=ACTIVE
- Both have same logical_path='rufus.ini'
- Now correctly detects version 3 > annotation version
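The lookup described above, replayed against the database scenario with in-memory records. FileRecord, the function signatures, and the response keys other than latest_file_id are simplified stand-ins rather than the real ORM models or schema.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    id: str
    logical_path: str
    version: int
    status: str


def latest_active_file(files, logical_path):
    """Return the highest-version ACTIVE record for a logical path, or None."""
    candidates = [
        f for f in files
        if f.logical_path == logical_path and f.status == "ACTIVE"
    ]
    return max(candidates, key=lambda f: f.version, default=None)


def check_file_version(files, logical_path, annotation_version):
    """Compare the annotation's version with the latest ACTIVE file version."""
    latest = latest_active_file(files, logical_path)
    if latest is None:
        return {
            "has_new_version": False,
            "latest_file_id": None,
            "current_file_version": None,
        }
    return {
        "has_new_version": latest.version > annotation_version,
        "latest_file_id": latest.id,
        "current_file_version": latest.version,
    }


files = [
    FileRecord("6dae9f2f", "rufus.ini", 1, "ARCHIVED"),
    FileRecord("3365b4e7", "rufus.ini", 3, "ACTIVE"),
]
result = check_file_version(files, "rufus.ini", annotation_version=1)
```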
Replace datetime.datetime.now(datetime.UTC) with datetime.datetime.now()
to fix compatibility issues with Python 3.10 and earlier versions.
datetime.UTC is only available in Python 3.11+, and using it caused 500 errors
in the production environment.
Files fixed:
- app/module/dataset/service/pdf_extract.py
- app/module/generation/service/export_service.py
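For reference, datetime.datetime.now() with no argument returns a naive local-time value. If timezone-aware UTC timestamps are wanted on older interpreters, datetime.timezone.utc is a portable option. This is not what the commit does; utc_now is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware UTC datetime.

    timezone.utc exists on all supported Python 3 versions;
    datetime.UTC is an alias for it that was added in 3.11.
    """
    return datetime.now(timezone.utc)


stamp = utc_now()
```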
Update data-annotation-init.sql and Alembic migration to support both new and old deployments:
SQL Initialization Script (data-annotation-init.sql):
- Add file_version column to t_dm_annotation_results table
- Add Alembic version table creation and version insertion
- New deployments using this script get the latest schema with the Alembic version already marked as applied
Alembic Migration (20250205_0001_add_file_version.py):
- Add column_exists() helper function to detect if column already exists
- Add compatibility check in upgrade(): skip if column exists (new SQL init)
- Add informative print messages for deployment clarity
- Enhanced docstrings explaining compatibility strategy
Deployment Scenarios:
1. New deployment with latest SQL script: Schema created with file_version, Alembic marked as applied
2. Old deployment upgrade: Alembic detects missing column and adds it
This ensures backward compatibility while supporting fresh installs with complete schema.
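The skip-if-present check can be illustrated with the standard-library sqlite3 module as a stand-in. The real migration runs under Alembic against the project database, so the inspection call, the connection handling, and the INTEGER column type used here are assumptions.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` already has a column named `column` (SQLite only)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)


def upgrade(conn):
    """Add file_version only when it is missing; return True if a change was made."""
    if column_exists(conn, "t_dm_annotation_results", "file_version"):
        print("file_version already present, skipping")
        return False
    conn.execute(
        "ALTER TABLE t_dm_annotation_results ADD COLUMN file_version INTEGER"
    )
    print("file_version column added")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_dm_annotation_results (id TEXT PRIMARY KEY)")
first_run = upgrade(conn)   # old deployment: column gets added
second_run = upgrade(conn)  # already migrated or fresh SQL init: no-op
```

Making upgrade() idempotent is what lets the same migration run safely on both deployment paths.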
Add support for detecting new file versions and switching to them:
Backend Changes:
- Add file_version column to AnnotationResult model
- Create Alembic migration for database schema update
- Implement check_file_version() method to compare annotation and file versions
- Implement use_new_version() method to clear annotations and update version
- Update upsert_annotation() to record file version when saving
- Add new API endpoints: GET /version and POST /use-new-version
- Add FileVersionCheckResponse and UseNewVersionResponse schemas
Frontend Changes:
- Add checkFileVersionUsingGet and useNewVersionUsingPost API calls
- Add version warning banner showing current vs latest file version
- Add 'Use New Version' button with confirmation dialog
- Clear version info state when switching files to avoid stale warnings
Bug Fixes:
- Fix previousFileVersion returning the updated value (capture it before the update)
- Handle null file_version for historical data compatibility
- Fix segmented annotation clearing (preserve structure, clear results)
- Fix files without annotations incorrectly showing new version warnings
- Preserve total_segments when clearing segmented annotations
Files Modified:
- frontend/src/pages/DataAnnotation/Annotate/LabelStudioTextEditor.tsx
- frontend/src/pages/DataAnnotation/annotation.api.ts
- runtime/datamate-python/app/db/models/annotation_management.py
- runtime/datamate-python/app/module/annotation/interface/editor.py
- runtime/datamate-python/app/module/annotation/schema/editor.py
- runtime/datamate-python/app/module/annotation/service/editor.py
New Files:
- runtime/datamate-python/alembic.ini
- runtime/datamate-python/alembic/env.py
- runtime/datamate-python/alembic/script.py.mako
- runtime/datamate-python/alembic/versions/20250205_0001_add_file_version.py