feat: File and Annotation 2-way sync implementation (#63)

* feat: Refactor configuration and sync logic for improved dataset handling and logging

* feat: Enhance annotation synchronization and dataset file management

- Added new fields `tags_updated_at` to `DatasetFiles` model for tracking the last update time of tags.
- Implemented new asynchronous methods in the Label Studio client for fetching, creating, updating, and deleting task annotations.
- Introduced bidirectional synchronization for annotations between DataMate and Label Studio, allowing for flexible data management.
- Updated sync service to handle annotation conflicts based on timestamps, ensuring data integrity during synchronization.
- Enhanced dataset file response model to include tags and their update timestamps.
- Modified database initialization script to create a new column for `tags_updated_at` in the dataset files table.
- Updated requirements to ensure compatibility with the latest dependencies.
This commit is contained in:
Jason Wang
2025-11-07 15:03:07 +08:00
committed by GitHub
parent d136bad38c
commit 78f50ea520
16 changed files with 1336 additions and 290 deletions

View File

@@ -15,6 +15,8 @@ class DatasetFileResponse(BaseModel):
description: Optional[str] = Field(None, description="文件描述")
uploadedBy: Optional[str] = Field(None, description="上传者")
lastAccessTime: Optional[datetime] = Field(None, description="最后访问时间")
tags: Optional[List[Dict[str, Any]]] = Field(None, description="文件标签/标注信息")
tags_updated_at: Optional[datetime] = Field(None, description="标签最后更新时间", alias="tagsUpdatedAt")
class PagedDatasetFileResponse(BaseModel):
"""DM服务分页文件响应模型"""

View File

@@ -108,7 +108,9 @@ class Service:
uploadedAt=f.upload_time, # type: ignore
description=None,
uploadedBy=None,
lastAccessTime=f.last_access_time # type: ignore
lastAccessTime=f.last_access_time, # type: ignore
tags=f.tags, # type: ignore
tags_updated_at=f.tags_updated_at # type: ignore
)
for f in files
]