Develop labeling module (#25)
* refactor: remove db table management from LS adapter (mv to scripts later); change adapter to use the same MySQL DB as other modules.
* refactor: Rename LS Adapter module to datamate-python
runtime/datamate-python/app/models/README.md
@@ -0,0 +1,138 @@
# DataMate Data Model Structure

This document lists all Python data models created from the SQL files in `scripts/db`.

## Model Organization

```
app/models/
├── __init__.py                       # Main module exports
├── dm/                               # Data Management module
│   ├── __init__.py
│   ├── annotation_template.py        # Annotation templates
│   ├── labeling_project.py           # Labeling projects
│   ├── dataset.py                    # Datasets
│   ├── dataset_files.py              # Dataset files
│   ├── dataset_statistics.py         # Dataset statistics
│   ├── dataset_tag.py                # Dataset-tag associations
│   ├── tag.py                        # Tags
│   └── user.py                       # Users
├── cleaning/                         # Data Cleaning module
│   ├── __init__.py
│   ├── clean_template.py             # Cleaning templates
│   ├── clean_task.py                 # Cleaning tasks
│   ├── operator_instance.py          # Operator instances
│   └── clean_result.py               # Cleaning results
├── collection/                       # Data Collection module
│   ├── __init__.py
│   ├── task_execution.py             # Task execution details
│   ├── collection_task.py            # Data collection tasks
│   ├── task_log.py                   # Task execution logs
│   └── datax_template.py             # DataX template configuration
├── common/                           # Common module
│   ├── __init__.py
│   └── chunk_upload_request.py       # Chunked file upload requests
└── operator/                         # Operator module
    ├── __init__.py
    ├── operator.py                   # Operators
    ├── operator_category.py          # Operator categories
    └── operator_category_relation.py # Operator-category associations
```

## Module Details

### 1. Data Management (DM) Module

Corresponding SQL: `data-management-init.sql` and `data-annotation-init.sql`

#### Models

- **AnnotationTemplate** (`t_dm_annotation_templates`) - annotation template
- **LabelingProject** (`t_dm_labeling_projects`) - labeling project
- **Dataset** (`t_dm_datasets`) - dataset (supports multiple types, such as medical imaging, text, and QA)
- **DatasetFiles** (`t_dm_dataset_files`) - dataset file
- **DatasetStatistics** (`t_dm_dataset_statistics`) - dataset statistics
- **Tag** (`t_dm_tags`) - tag
- **DatasetTag** (`t_dm_dataset_tags`) - dataset-tag association
- **User** (`users`) - user

### 2. Data Cleaning Module

Corresponding SQL: `data-cleaning-init.sql`

#### Models

- **CleanTemplate** (`t_clean_template`) - cleaning template
- **CleanTask** (`t_clean_task`) - cleaning task
- **OperatorInstance** (`t_operator_instance`) - operator instance
- **CleanResult** (`t_clean_result`) - cleaning result

### 3. Data Collection (DC) Module

Corresponding SQL: `data-collection-init.sql`

#### Models

- **TaskExecution** (`t_dc_task_executions`) - task execution details
- **CollectionTask** (`t_dc_collection_tasks`) - data collection task
- **TaskLog** (`t_dc_task_log`) - task execution log
- **DataxTemplate** (`t_dc_datax_templates`) - DataX template configuration

### 4. Common Module

Corresponding SQL: `data-common-init.sql`

#### Models

- **ChunkUploadRequest** (`t_chunk_upload_request`) - chunked file upload request

### 5. Operator Module

Corresponding SQL: `data-operator-init.sql`

#### Models

- **Operator** (`t_operator`) - operator
- **OperatorCategory** (`t_operator_category`) - operator category
- **OperatorCategoryRelation** (`t_operator_category_relation`) - operator-category association

## Usage

```python
# Import all models
from app.models import (
    # DM module
    AnnotationTemplate,
    LabelingProject,
    Dataset,
    DatasetFiles,
    DatasetStatistics,
    DatasetTag,
    Tag,
    User,
    # Cleaning module
    CleanTemplate,
    CleanTask,
    OperatorInstance,
    CleanResult,
    # Collection module
    TaskExecution,
    CollectionTask,
    TaskLog,
    DataxTemplate,
    # Common module
    ChunkUploadRequest,
    # Operator module
    Operator,
    OperatorCategory,
    OperatorCategoryRelation
)

# Or import from individual modules
from app.models.dm import Dataset, DatasetFiles
from app.models.collection import CollectionTask
from app.models.operator import Operator
```
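
## Notes

1. **UUID primary keys**: Most tables use a UUID stored as `String(36)` as the primary key. Exceptions: `User` and `OperatorCategory` use auto-increment integer keys; `CleanTask`, `CleanTemplate`, and `Operator` use `String(64)` IDs; and `CleanResult`, `OperatorInstance`, `DatasetTag`, and `OperatorCategoryRelation` use composite primary keys.
2. **Timestamps**: Time columns use the `TIMESTAMP` type. `created_at` defaults to the database's current time (`server_default`). Where an `updated_at` column exists, `onupdate` refreshes it whenever the row is updated through SQLAlchemy.
3. **Soft deletion**: `AnnotationTemplate` and `LabelingProject` support soft deletion. Each has a `deleted_at` column and an `is_deleted` property that returns `True` once `deleted_at` is set.
4. **JSON fields**: Configuration, metadata, and similar structured data are mostly stored in `JSON` columns (for example `Dataset.schema_info` and `LabelingProject.configuration`). Some configuration is stored as `Text` instead, such as `CollectionTask.config` and `Operator.settings`.
5. **Field consistency**: All model fields are defined strictly according to the SQL definitions, so they match the database table structure exactly.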
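The soft-deletion convention in item 3 can be illustrated with plain SQL. The following is a minimal sketch under these assumptions:

- It uses a simplified, hypothetical copy of `t_dm_annotation_templates` with only `id`, `name`, and `deleted_at`.
- The table lives in an in-memory SQLite database, not the project's MySQL schema.
- The template names are made up.

```python
import sqlite3
import uuid


def demo_soft_delete() -> list[str]:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for t_dm_annotation_templates (columns trimmed for the demo).
    conn.execute(
        "CREATE TABLE t_dm_annotation_templates ("
        " id TEXT PRIMARY KEY, name TEXT NOT NULL, deleted_at TIMESTAMP NULL)"
    )
    for name in ("ner", "classification"):
        conn.execute(
            "INSERT INTO t_dm_annotation_templates (id, name) VALUES (?, ?)",
            (str(uuid.uuid4()), name),
        )
    # Soft delete: stamp deleted_at instead of issuing a DELETE.
    conn.execute(
        "UPDATE t_dm_annotation_templates SET deleted_at = CURRENT_TIMESTAMP WHERE name = ?",
        ("ner",),
    )
    # "Active" rows are those whose deleted_at is still NULL (is_deleted would be False).
    rows = conn.execute(
        "SELECT name FROM t_dm_annotation_templates WHERE deleted_at IS NULL ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(demo_soft_delete())  # ['classification']
```

With the SQLAlchemy models, the equivalent filter is `AnnotationTemplate.deleted_at.is_(None)` inside a `select(...).where(...)`. The `is_deleted` property is a plain Python property, so it only works on loaded objects and cannot be used in queries.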

## Changelog

- 2025-10-25: Created all data models from the SQL files in `scripts/db`
- Updated the existing `annotation_template.py`, `labeling_project.py`, and `dataset_files.py` to match the SQL definitions
runtime/datamate-python/app/models/__init__.py
@@ -0,0 +1,69 @@
# app/models/__init__.py

# Data Management (DM) module
from .dm import (
    AnnotationTemplate,
    LabelingProject,
    Dataset,
    DatasetFiles,
    DatasetStatistics,
    DatasetTag,
    Tag,
    User
)

# Data Cleaning module
from .cleaning import (
    CleanTemplate,
    CleanTask,
    OperatorInstance,
    CleanResult
)

# Data Collection (DC) module
from .collection import (
    TaskExecution,
    CollectionTask,
    TaskLog,
    DataxTemplate
)

# Common module
from .common import (
    ChunkUploadRequest
)

# Operator module
from .operator import (
    Operator,
    OperatorCategory,
    OperatorCategoryRelation
)

__all__ = [
    # DM module
    "AnnotationTemplate",
    "LabelingProject",
    "Dataset",
    "DatasetFiles",
    "DatasetStatistics",
    "DatasetTag",
    "Tag",
    "User",
    # Cleaning module
    "CleanTemplate",
    "CleanTask",
    "OperatorInstance",
    "CleanResult",
    # Collection module
    "TaskExecution",
    "CollectionTask",
    "TaskLog",
    "DataxTemplate",
    # Common module
    "ChunkUploadRequest",
    # Operator module
    "Operator",
    "OperatorCategory",
    "OperatorCategoryRelation"
]
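The `__all__` lists here and in each subpackage are maintained by hand, so a name can drift out of sync with the actual imports. The check below is a small stdlib-only sketch; it is a hypothetical helper, not part of this commit. It lists any `__all__` entry that the module does not actually define:

```python
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Self-contained demo with a synthetic module instead of the real app.models package.
demo = types.ModuleType("demo_models")
demo.Dataset = type("Dataset", (), {})
demo.__all__ = ["Dataset", "DatasetFiles"]  # "DatasetFiles" was never defined
print(missing_exports(demo))  # ['DatasetFiles']
```

Against the real package, you would call it as `missing_exports(importlib.import_module("app.models"))`, which should return an empty list.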
runtime/datamate-python/app/models/cleaning/__init__.py
@@ -0,0 +1,13 @@
# app/models/cleaning/__init__.py

from .clean_template import CleanTemplate
from .clean_task import CleanTask
from .operator_instance import OperatorInstance
from .clean_result import CleanResult

__all__ = [
    "CleanTemplate",
    "CleanTask",
    "OperatorInstance",
    "CleanResult"
]
runtime/datamate-python/app/models/cleaning/clean_result.py
@@ -0,0 +1,22 @@
from sqlalchemy import Column, String, BigInteger, Text
from app.db.database import Base

class CleanResult(Base):
    """Cleaning result model."""

    __tablename__ = "t_clean_result"

    instance_id = Column(String(64), primary_key=True, comment="Instance ID")
    src_file_id = Column(String(64), nullable=True, comment="Source file ID")
    dest_file_id = Column(String(64), primary_key=True, comment="Destination file ID")
    src_name = Column(String(256), nullable=True, comment="Source file name")
    dest_name = Column(String(256), nullable=True, comment="Destination file name")
    src_type = Column(String(256), nullable=True, comment="Source file type")
    dest_type = Column(String(256), nullable=True, comment="Destination file type")
    src_size = Column(BigInteger, nullable=True, comment="Source file size")
    dest_size = Column(BigInteger, nullable=True, comment="Destination file size")
    status = Column(String(256), nullable=True, comment="Processing status")
    result = Column(Text, nullable=True, comment="Processing result")

    def __repr__(self):
        return f"<CleanResult(instance_id={self.instance_id}, dest_file_id={self.dest_file_id}, status={self.status})>"
runtime/datamate-python/app/models/cleaning/clean_task.py
@@ -0,0 +1,27 @@
from sqlalchemy import Column, String, BigInteger, Integer, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class CleanTask(Base):
    """Cleaning task model."""

    __tablename__ = "t_clean_task"

    id = Column(String(64), primary_key=True, comment="Task ID")
    name = Column(String(64), nullable=True, comment="Task name")
    description = Column(String(256), nullable=True, comment="Task description")
    status = Column(String(256), nullable=True, comment="Task status")
    src_dataset_id = Column(String(64), nullable=True, comment="Source dataset ID")
    src_dataset_name = Column(String(64), nullable=True, comment="Source dataset name")
    dest_dataset_id = Column(String(64), nullable=True, comment="Destination dataset ID")
    dest_dataset_name = Column(String(64), nullable=True, comment="Destination dataset name")
    before_size = Column(BigInteger, nullable=True, comment="Size before cleaning")
    after_size = Column(BigInteger, nullable=True, comment="Size after cleaning")
    file_count = Column(Integer, nullable=True, comment="File count")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    started_at = Column(TIMESTAMP, nullable=True, comment="Start time")
    finished_at = Column(TIMESTAMP, nullable=True, comment="Finish time")
    created_by = Column(String(256), nullable=True, comment="Creator")

    def __repr__(self):
        return f"<CleanTask(id={self.id}, name={self.name}, status={self.status})>"
@@ -0,0 +1,18 @@
from sqlalchemy import Column, String, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class CleanTemplate(Base):
    """Cleaning template model."""

    __tablename__ = "t_clean_template"

    id = Column(String(64), primary_key=True, unique=True, comment="Template ID")
    name = Column(String(64), nullable=True, comment="Template name")
    description = Column(String(256), nullable=True, comment="Template description")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")
    created_by = Column(String(256), nullable=True, comment="Creator")

    def __repr__(self):
        return f"<CleanTemplate(id={self.id}, name={self.name})>"
@@ -0,0 +1,15 @@
from sqlalchemy import Column, String, Integer, Text
from app.db.database import Base

class OperatorInstance(Base):
    """Operator instance model."""

    __tablename__ = "t_operator_instance"

    instance_id = Column(String(256), primary_key=True, comment="Instance ID")
    operator_id = Column(String(256), primary_key=True, comment="Operator ID")
    op_index = Column(Integer, primary_key=True, comment="Operator index")
    settings_override = Column(Text, nullable=True, comment="Settings override")

    def __repr__(self):
        return f"<OperatorInstance(instance_id={self.instance_id}, operator_id={self.operator_id}, index={self.op_index})>"
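`OperatorInstance` uses a three-column composite primary key (`instance_id`, `operator_id`, `op_index`). The same operator can therefore appear more than once in one instance, as long as the index differs.

The sketch below makes these assumptions:

- `op_index` is the operator's position within a cleaning pipeline. The commit does not document this.
- The table is an in-memory SQLite stand-in for `t_operator_instance`.
- All IDs are made up.

```python
import sqlite3


def build_pipeline() -> tuple[list[str], bool]:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for t_operator_instance with the same composite primary key.
    conn.execute(
        "CREATE TABLE t_operator_instance ("
        " instance_id TEXT, operator_id TEXT, op_index INTEGER, settings_override TEXT,"
        " PRIMARY KEY (instance_id, operator_id, op_index))"
    )
    insert_sql = "INSERT INTO t_operator_instance (instance_id, operator_id, op_index) VALUES (?, ?, ?)"
    # The same operator ("dedup") is allowed twice because op_index differs.
    conn.executemany(insert_sql, [("task-1", "dedup", 0), ("task-1", "filter_lang", 1), ("task-1", "dedup", 2)])
    # Re-inserting an identical (instance_id, operator_id, op_index) triple violates the key.
    try:
        conn.execute(insert_sql, ("task-1", "dedup", 0))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    ordered = [row[0] for row in conn.execute(
        "SELECT operator_id FROM t_operator_instance WHERE instance_id = ? ORDER BY op_index",
        ("task-1",),
    )]
    conn.close()
    return ordered, duplicate_rejected


print(build_pipeline())  # (['dedup', 'filter_lang', 'dedup'], True)
```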
runtime/datamate-python/app/models/collection/__init__.py
@@ -0,0 +1,13 @@
# app/models/collection/__init__.py

from .task_execution import TaskExecution
from .collection_task import CollectionTask
from .task_log import TaskLog
from .datax_template import DataxTemplate

__all__ = [
    "TaskExecution",
    "CollectionTask",
    "TaskLog",
    "DataxTemplate"
]
@@ -0,0 +1,28 @@
from sqlalchemy import Column, String, Text, Integer, BigInteger, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class CollectionTask(Base):
    """Data collection task model."""

    __tablename__ = "t_dc_collection_tasks"

    id = Column(String(36), primary_key=True, comment="Task ID (UUID)")
    name = Column(String(255), nullable=False, comment="Task name")
    description = Column(Text, nullable=True, comment="Task description")
    sync_mode = Column(String(20), default='ONCE', comment="Sync mode: ONCE/SCHEDULED")
    config = Column(Text, nullable=False, comment="Collection config (DataX config), including source and target settings")
    schedule_expression = Column(String(255), nullable=True, comment="Cron schedule expression")
    status = Column(String(20), default='DRAFT', comment="Task status: DRAFT/READY/RUNNING/SUCCESS/FAILED/STOPPED")
    retry_count = Column(Integer, default=3, comment="Retry count")
    timeout_seconds = Column(Integer, default=3600, comment="Timeout (seconds)")
    max_records = Column(BigInteger, nullable=True, comment="Maximum number of records to process")
    sort_field = Column(String(100), nullable=True, comment="Incremental field")
    last_execution_id = Column(String(36), nullable=True, comment="Last execution ID (UUID)")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")
    created_by = Column(String(255), nullable=True, comment="Creator")
    updated_by = Column(String(255), nullable=True, comment="Updater")

    def __repr__(self):
        return f"<CollectionTask(id={self.id}, name={self.name}, status={self.status})>"
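The model stores `sync_mode` and `status` as plain `String(20)` columns, and the allowed values appear only in the column comments. One way to keep application code aligned with those values is to mirror them as `str` enums and validate input before writing it. This is sketched below as a hypothetical addition, not part of the commit:

```python
from enum import Enum


class SyncMode(str, Enum):
    # Values copied from the sync_mode column comment.
    ONCE = "ONCE"
    SCHEDULED = "SCHEDULED"


class CollectionTaskStatus(str, Enum):
    # Values copied from the status column comment.
    DRAFT = "DRAFT"
    READY = "READY"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    STOPPED = "STOPPED"


def parse_status(raw: str) -> CollectionTaskStatus:
    """Convert a raw column value into an enum member; raises ValueError if unknown."""
    return CollectionTaskStatus(raw)


print(parse_status("RUNNING") is CollectionTaskStatus.RUNNING)  # True
print(SyncMode.ONCE == "ONCE")  # True, because the enum subclasses str
```

`TaskLog.sync_mode` documents a different value set (FULL/INCREMENTAL), so it would need its own enum.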
@@ -0,0 +1,23 @@
from sqlalchemy import Column, String, Text, Boolean, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class DataxTemplate(Base):
    """DataX template configuration model."""

    __tablename__ = "t_dc_datax_templates"

    id = Column(String(36), primary_key=True, comment="Template ID (UUID)")
    name = Column(String(255), nullable=False, unique=True, comment="Template name")
    source_type = Column(String(50), nullable=False, comment="Source data source type")
    target_type = Column(String(50), nullable=False, comment="Target data source type")
    template_content = Column(Text, nullable=False, comment="Template content")
    description = Column(Text, nullable=True, comment="Template description")
    version = Column(String(20), default='1.0.0', comment="Version")
    is_system = Column(Boolean, default=False, comment="Whether this is a system template")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")
    created_by = Column(String(255), nullable=True, comment="Creator")

    def __repr__(self):
        return f"<DataxTemplate(id={self.id}, name={self.name}, source={self.source_type}, target={self.target_type})>"
@@ -0,0 +1,34 @@
from sqlalchemy import Column, String, Text, Integer, BigInteger, DECIMAL, JSON, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class TaskExecution(Base):
    """Task execution detail model."""

    __tablename__ = "t_dc_task_executions"

    id = Column(String(36), primary_key=True, comment="Execution record ID (UUID)")
    task_id = Column(String(36), nullable=False, comment="Task ID")
    task_name = Column(String(255), nullable=False, comment="Task name")
    status = Column(String(20), default='RUNNING', comment="Execution status: RUNNING/SUCCESS/FAILED/STOPPED")
    progress = Column(DECIMAL(5, 2), default=0.00, comment="Progress percentage")
    records_total = Column(BigInteger, default=0, comment="Total records")
    records_processed = Column(BigInteger, default=0, comment="Processed records")
    records_success = Column(BigInteger, default=0, comment="Successful records")
    records_failed = Column(BigInteger, default=0, comment="Failed records")
    throughput = Column(DECIMAL(10, 2), default=0.00, comment="Throughput (records/second)")
    data_size_bytes = Column(BigInteger, default=0, comment="Data size (bytes)")
    started_at = Column(TIMESTAMP, nullable=True, comment="Start time")
    completed_at = Column(TIMESTAMP, nullable=True, comment="Completion time")
    duration_seconds = Column(Integer, default=0, comment="Duration (seconds)")
    config = Column(JSON, nullable=True, comment="Execution config")
    error_message = Column(Text, nullable=True, comment="Error message")
    datax_job_id = Column(Text, nullable=True, comment="DataX job ID")
    result = Column(Text, nullable=True, comment="Execution result")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")
    created_by = Column(String(255), nullable=True, comment="Creator")
    updated_by = Column(String(255), nullable=True, comment="Updater")

    def __repr__(self):
        return f"<TaskExecution(id={self.id}, task_id={self.task_id}, status={self.status})>"
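`progress` is `DECIMAL(5, 2)` and `throughput` is `DECIMAL(10, 2)`, so values computed in Python should be rounded to two decimal places before they are stored. The helpers below are a hypothetical sketch; the commit contains no update logic. They derive both values from the record counters and `duration_seconds`, using `decimal` rather than `float`:

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def progress_percent(processed: int, total: int) -> Decimal:
    """Percentage in [0, 100], rounded to 2 decimals to fit DECIMAL(5, 2)."""
    if total <= 0:
        return Decimal("0.00")
    pct = Decimal(processed) * 100 / Decimal(total)
    pct = min(pct, Decimal(100))  # clamp if processed briefly exceeds total
    return pct.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


def throughput(processed: int, duration_seconds: int) -> Decimal:
    """Records per second, rounded to 2 decimals to fit DECIMAL(10, 2)."""
    if duration_seconds <= 0:
        return Decimal("0.00")
    rate = Decimal(processed) / Decimal(duration_seconds)
    return rate.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


print(progress_percent(1, 3))  # 33.33
print(throughput(1000, 3))     # 333.33
```

Clamping at 100 keeps `progress` well inside the column's range (at most 999.99 for `DECIMAL(5, 2)`).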
runtime/datamate-python/app/models/collection/task_log.py
@@ -0,0 +1,26 @@
from sqlalchemy import Column, String, Text, Integer, BigInteger, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class TaskLog(Base):
    """Task execution log model."""

    __tablename__ = "t_dc_task_log"

    id = Column(String(36), primary_key=True, comment="Execution record ID (UUID)")
    task_id = Column(String(36), nullable=False, comment="Task ID")
    task_name = Column(String(255), nullable=False, comment="Task name")
    sync_mode = Column(String(20), default='FULL', comment="Sync mode: FULL/INCREMENTAL")
    status = Column(String(20), default='RUNNING', comment="Execution status: RUNNING/SUCCESS/FAILED/STOPPED")
    start_time = Column(TIMESTAMP, nullable=True, comment="Start time")
    end_time = Column(TIMESTAMP, nullable=True, comment="End time")
    duration = Column(BigInteger, nullable=True, comment="Duration (milliseconds)")
    process_id = Column(String(50), nullable=True, comment="Process ID")
    log_path = Column(String(500), nullable=True, comment="Log file path")
    error_msg = Column(Text, nullable=True, comment="Error message")
    result = Column(Text, nullable=True, comment="Execution result")
    retry_times = Column(Integer, default=0, comment="Retry count")
    create_time = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")

    def __repr__(self):
        return f"<TaskLog(id={self.id}, task_id={self.task_id}, status={self.status})>"
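`TaskLog.duration` is documented in milliseconds, while `TaskExecution.duration_seconds` uses seconds. Below is a minimal sketch of deriving the millisecond value from `start_time` and `end_time`; the helper name is hypothetical:

```python
from datetime import datetime, timedelta


def duration_ms(start_time: datetime, end_time: datetime) -> int:
    """Elapsed time in whole milliseconds, matching TaskLog.duration's documented unit."""
    # timedelta / timedelta yields a float ratio; int() drops any sub-millisecond remainder.
    return int((end_time - start_time) / timedelta(milliseconds=1))


start = datetime(2025, 10, 25, 12, 0, 0)
end = start + timedelta(seconds=2, milliseconds=500)
print(duration_ms(start, end))  # 2500
```

For `TaskExecution.duration_seconds`, `int((end - start).total_seconds())` gives the corresponding value in seconds.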
runtime/datamate-python/app/models/common/__init__.py
@@ -0,0 +1,7 @@
# app/models/common/__init__.py

from .chunk_upload_request import ChunkUploadRequest

__all__ = [
    "ChunkUploadRequest"
]
@@ -0,0 +1,19 @@
from sqlalchemy import Column, String, Integer, Text, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class ChunkUploadRequest(Base):
    """Chunked file upload request model."""

    __tablename__ = "t_chunk_upload_request"

    id = Column(String(36), primary_key=True, comment="UUID")
    total_file_num = Column(Integer, nullable=True, comment="Total number of files")
    uploaded_file_num = Column(Integer, nullable=True, comment="Number of uploaded files")
    upload_path = Column(String(256), nullable=True, comment="File path")
    timeout = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Upload request timeout")
    service_id = Column(String(64), nullable=True, comment="Service that owns the upload request: DATA-MANAGEMENT (data management)")
    check_info = Column(Text, nullable=True, comment="Business information")

    def __repr__(self):
        return f"<ChunkUploadRequest(id={self.id}, service_id={self.service_id}, progress={self.uploaded_file_num}/{self.total_file_num})>"
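`ChunkUploadRequest` tracks progress with two nullable counters and stores a `timeout` timestamp. The sketch below makes two assumptions:

- `timeout` is an absolute deadline. The column comment only describes it as the upload request's timeout time.
- A hypothetical plain dataclass stands in for the ORM model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class UploadState:
    """Plain-Python view of ChunkUploadRequest's progress fields (hypothetical)."""
    total_file_num: Optional[int]
    uploaded_file_num: Optional[int]
    timeout: datetime  # assumed: deadline after which an unfinished request is expired

    def is_complete(self) -> bool:
        # Both counters are nullable in the model; treat None as "not done yet".
        if self.total_file_num is None or self.uploaded_file_num is None:
            return False
        return self.uploaded_file_num >= self.total_file_num

    def is_expired(self, now: datetime) -> bool:
        # A finished upload is never reported as expired.
        return now > self.timeout and not self.is_complete()


now = datetime(2025, 10, 25, 12, 0)
stalled = UploadState(total_file_num=10, uploaded_file_num=4, timeout=datetime(2025, 10, 25, 11, 0))
print(stalled.is_complete(), stalled.is_expired(now))  # False True
```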
runtime/datamate-python/app/models/dm/__init__.py
@@ -0,0 +1,21 @@
# app/models/dm/__init__.py

from .annotation_template import AnnotationTemplate
from .labeling_project import LabelingProject
from .dataset import Dataset
from .dataset_files import DatasetFiles
from .dataset_statistics import DatasetStatistics
from .dataset_tag import DatasetTag
from .tag import Tag
from .user import User

__all__ = [
    "AnnotationTemplate",
    "LabelingProject",
    "Dataset",
    "DatasetFiles",
    "DatasetStatistics",
    "DatasetTag",
    "Tag",
    "User"
]
runtime/datamate-python/app/models/dm/annotation_template.py
@@ -0,0 +1,24 @@
from sqlalchemy import Column, String, JSON, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class AnnotationTemplate(Base):
    """Annotation template model."""

    __tablename__ = "t_dm_annotation_templates"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID primary key")
    name = Column(String(32), nullable=False, comment="Template name")
    description = Column(String(255), nullable=True, comment="Template description")
    configuration = Column(JSON, nullable=True, comment="Configuration (JSON)")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    deleted_at = Column(TIMESTAMP, nullable=True, comment="Deletion time (soft delete)")

    def __repr__(self):
        return f"<AnnotationTemplate(id={self.id}, name={self.name})>"

    @property
    def is_deleted(self) -> bool:
        """Return True if the record has been soft-deleted."""
        return self.deleted_at is not None
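`id` uses `default=lambda: str(uuid.uuid4())` rather than `default=str(uuid.uuid4())`. SQLAlchemy calls a callable default once per INSERT. A plain value, by contrast, is computed once when the class is defined and then reused for every row. The stdlib-only sketch below shows the difference without touching a database:

```python
import uuid

# Evaluated once, at definition time: every "row" would receive this same value.
FIXED_DEFAULT = str(uuid.uuid4())


def callable_default() -> str:
    # Evaluated on each call, like default=lambda: str(uuid.uuid4()) on each INSERT.
    return str(uuid.uuid4())


fixed_ids = [FIXED_DEFAULT for _ in range(3)]
fresh_ids = [callable_default() for _ in range(3)]
print(len(set(fixed_ids)), len(set(fresh_ids)))  # 1 3
print(len(fresh_ids[0]))  # 36, which fits String(36)
```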
runtime/datamate-python/app/models/dm/dataset.py
@@ -0,0 +1,35 @@
from sqlalchemy import Column, String, Text, BigInteger, Integer, Boolean, JSON, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class Dataset(Base):
    """Dataset model (supports multiple types such as medical imaging, text, and QA)."""

    __tablename__ = "t_dm_datasets"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID")
    name = Column(String(255), nullable=False, comment="Dataset name")
    description = Column(Text, nullable=True, comment="Dataset description")
    dataset_type = Column(String(50), nullable=False, comment="Dataset type: IMAGE/TEXT/QA/MULTIMODAL/OTHER")
    category = Column(String(100), nullable=True, comment="Dataset category: medical imaging/QA/literature, etc.")
    path = Column(String(500), nullable=True, comment="Data storage path")
    format = Column(String(50), nullable=True, comment="Data format: DCM/JPG/JSON/CSV, etc.")
    schema_info = Column(JSON, nullable=True, comment="Data schema information")
    size_bytes = Column(BigInteger, default=0, comment="Data size (bytes)")
    file_count = Column(BigInteger, default=0, comment="File count")
    record_count = Column(BigInteger, default=0, comment="Record count")
    retention_days = Column(Integer, default=0, comment="Data retention days (0 means kept indefinitely)")
    tags = Column(JSON, nullable=True, comment="Tag list")
    metadata_ = Column("metadata", JSON, nullable=True, comment="Metadata")  # "metadata" is a reserved attribute name in SQLAlchemy Declarative; the DB column keeps its name
    status = Column(String(50), default='DRAFT', comment="Status: DRAFT/ACTIVE/ARCHIVED")
    is_public = Column(Boolean, default=False, comment="Whether public")
    is_featured = Column(Boolean, default=False, comment="Whether featured")
    version = Column(BigInteger, nullable=False, default=0, comment="Version number")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")
    created_by = Column(String(255), nullable=True, comment="Creator")
    updated_by = Column(String(255), nullable=True, comment="Updater")

    def __repr__(self):
        return f"<Dataset(id={self.id}, name={self.name}, type={self.dataset_type})>"
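The `retention_days` column comment says that `0` means the data is kept indefinitely. The function below is a hypothetical helper, not part of this commit, that turns that rule into an expiry time based on `created_at`. Treating negative values as invalid is an assumption:

```python
from datetime import datetime, timedelta
from typing import Optional


def retention_deadline(created_at: datetime, retention_days: int) -> Optional[datetime]:
    """Return when a dataset's retention ends, or None if it is kept indefinitely."""
    if retention_days < 0:
        raise ValueError("retention_days must be >= 0")
    if retention_days == 0:
        # 0 means long-term retention, per the column comment.
        return None
    return created_at + timedelta(days=retention_days)


created = datetime(2025, 10, 25)
print(retention_deadline(created, 30))  # 2025-11-24 00:00:00
print(retention_deadline(created, 0))   # None
```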
runtime/datamate-python/app/models/dm/dataset_files.py
@@ -0,0 +1,27 @@
from sqlalchemy import Column, String, JSON, BigInteger, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class DatasetFiles(Base):
    """DM dataset file model."""

    __tablename__ = "t_dm_dataset_files"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID")
    dataset_id = Column(String(36), nullable=False, comment="Owning dataset ID (UUID)")
    file_name = Column(String(255), nullable=False, comment="File name")
    file_path = Column(String(1000), nullable=False, comment="File path")
    file_type = Column(String(50), nullable=True, comment="File format: JPG/PNG/DCM/TXT, etc.")
    file_size = Column(BigInteger, default=0, comment="File size (bytes)")
    check_sum = Column(String(64), nullable=True, comment="File checksum")
    tags = Column(JSON, nullable=True, comment="File tag information")
    metadata_ = Column("metadata", JSON, nullable=True, comment="File metadata")  # "metadata" is a reserved attribute name in SQLAlchemy Declarative; the DB column keeps its name
    status = Column(String(50), default='ACTIVE', comment="File status: ACTIVE/DELETED/PROCESSING")
    upload_time = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Upload time")
    last_access_time = Column(TIMESTAMP, nullable=True, comment="Last access time")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")

    def __repr__(self):
        return f"<DatasetFiles(id={self.id}, dataset_id={self.dataset_id}, file_name={self.file_name})>"
runtime/datamate-python/app/models/dm/dataset_statistics.py
@@ -0,0 +1,25 @@
from sqlalchemy import Column, String, Date, BigInteger, JSON, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class DatasetStatistics(Base):
    """Dataset statistics model."""

    __tablename__ = "t_dm_dataset_statistics"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID")
    dataset_id = Column(String(36), nullable=False, comment="Dataset ID (UUID)")
    stat_date = Column(Date, nullable=False, comment="Statistics date")
    total_files = Column(BigInteger, default=0, comment="Total files")
    total_size = Column(BigInteger, default=0, comment="Total size (bytes)")
    processed_files = Column(BigInteger, default=0, comment="Processed files")
    error_files = Column(BigInteger, default=0, comment="Error files")
    download_count = Column(BigInteger, default=0, comment="Download count")
    view_count = Column(BigInteger, default=0, comment="View count")
    quality_metrics = Column(JSON, nullable=True, comment="Quality metrics")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")

    def __repr__(self):
        return f"<DatasetStatistics(id={self.id}, dataset_id={self.dataset_id}, date={self.stat_date})>"
runtime/datamate-python/app/models/dm/dataset_tag.py
@@ -0,0 +1,15 @@
from sqlalchemy import Column, String, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class DatasetTag(Base):
    """Dataset-tag association model."""

    __tablename__ = "t_dm_dataset_tags"

    dataset_id = Column(String(36), primary_key=True, comment="Dataset ID (UUID)")
    tag_id = Column(String(36), primary_key=True, comment="Tag ID (UUID)")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")

    def __repr__(self):
        return f"<DatasetTag(dataset_id={self.dataset_id}, tag_id={self.tag_id})>"
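`DatasetTag` is a pure association table. Its composite key (`dataset_id`, `tag_id`) links `t_dm_datasets` to `t_dm_tags` many-to-many, so listing a dataset's tag names means joining through it. The sketch below uses simplified in-memory SQLite stand-ins for the two tag tables, with made-up IDs:

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-ins for t_dm_tags and t_dm_dataset_tags.
    conn.executescript(
        """
        CREATE TABLE t_dm_tags (id TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE t_dm_dataset_tags (
            dataset_id TEXT NOT NULL,
            tag_id TEXT NOT NULL,
            PRIMARY KEY (dataset_id, tag_id)
        );
        INSERT INTO t_dm_tags (id, name) VALUES ('t1', 'medical'), ('t2', 'qa'), ('t3', 'archive');
        INSERT INTO t_dm_dataset_tags (dataset_id, tag_id) VALUES ('d1', 't1'), ('d1', 't2'), ('d2', 't3');
        """
    )
    return conn


def tags_for_dataset(conn: sqlite3.Connection, dataset_id: str) -> list[str]:
    # Join through the association table to resolve tag IDs to names.
    rows = conn.execute(
        "SELECT t.name FROM t_dm_dataset_tags dt"
        " JOIN t_dm_tags t ON t.id = dt.tag_id"
        " WHERE dt.dataset_id = ? ORDER BY t.name",
        (dataset_id,),
    ).fetchall()
    return [name for (name,) in rows]


conn = make_demo_db()
print(tags_for_dataset(conn, "d1"))  # ['medical', 'qa']
```

`Dataset` and `DatasetFiles` also have a JSON `tags` column. This commit does not say which representation is authoritative, so code that reads tags should choose one deliberately.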
runtime/datamate-python/app/models/dm/labeling_project.py
@@ -0,0 +1,26 @@
from sqlalchemy import Column, String, Integer, JSON, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class LabelingProject(Base):
    """DM labeling project model (formerly DatasetMapping)."""

    __tablename__ = "t_dm_labeling_projects"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID primary key")
    dataset_id = Column(String(36), nullable=False, comment="Dataset ID")
    name = Column(String(32), nullable=False, comment="Project name")
    labeling_project_id = Column(Integer, nullable=False, comment="Label Studio project ID")
    configuration = Column(JSON, nullable=True, comment="Label configuration")
    progress = Column(JSON, nullable=True, comment="Labeling progress statistics")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    deleted_at = Column(TIMESTAMP, nullable=True, comment="Deletion time (soft delete)")

    def __repr__(self):
        return f"<LabelingProject(id={self.id}, dataset_id={self.dataset_id}, name={self.name})>"

    @property
    def is_deleted(self) -> bool:
        """Return True if the record has been soft-deleted."""
        return self.deleted_at is not None
runtime/datamate-python/app/models/dm/tag.py
@@ -0,0 +1,21 @@
from sqlalchemy import Column, String, Text, BigInteger, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base
import uuid

class Tag(Base):
    """Tag model."""

    __tablename__ = "t_dm_tags"

    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()), comment="UUID")
    name = Column(String(100), nullable=False, unique=True, comment="Tag name")
    description = Column(Text, nullable=True, comment="Tag description")
    category = Column(String(50), nullable=True, comment="Tag category")
    color = Column(String(7), nullable=True, comment="Tag color (hex)")
    usage_count = Column(BigInteger, default=0, comment="Usage count")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")

    def __repr__(self):
        return f"<Tag(id={self.id}, name={self.name}, category={self.category})>"
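`Tag.color` is a `String(7)` described as a hexadecimal color, which fits the `#RRGGBB` form exactly. Assuming that is the intended format (the model itself does not enforce it), a hypothetical validator could look like this:

```python
import re
from typing import Optional

# "#" followed by exactly six hex digits: 7 characters, matching String(7).
_HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid_tag_color(value: Optional[str]) -> bool:
    """True for None (the column is nullable) or a #RRGGBB string."""
    if value is None:
        return True
    return _HEX_COLOR.fullmatch(value) is not None


print(is_valid_tag_color("#1E90FF"), is_valid_tag_color("1E90FF"))  # True False
```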
runtime/datamate-python/app/models/dm/user.py
@@ -0,0 +1,24 @@
from sqlalchemy import Column, String, BigInteger, Boolean, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class User(Base):
    """User model."""

    __tablename__ = "users"

    id = Column(BigInteger, primary_key=True, autoincrement=True, comment="User ID")
    username = Column(String(255), nullable=False, unique=True, comment="Username")
    email = Column(String(255), nullable=False, unique=True, comment="Email")
    password_hash = Column(String(255), nullable=False, comment="Password hash")
    full_name = Column(String(255), nullable=True, comment="Full name")
    avatar_url = Column(String(500), nullable=True, comment="Avatar URL")
    role = Column(String(50), nullable=False, default='USER', comment="Role: ADMIN/USER")
    organization = Column(String(255), nullable=True, comment="Organization")
    enabled = Column(Boolean, nullable=False, default=True, comment="Whether enabled")
    last_login_at = Column(TIMESTAMP, nullable=True, comment="Last login time")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")

    def __repr__(self):
        return f"<User(id={self.id}, username={self.username}, role={self.role})>"
runtime/datamate-python/app/models/operator/__init__.py
@@ -0,0 +1,11 @@
# app/models/operator/__init__.py

from .operator import Operator
from .operator_category import OperatorCategory
from .operator_category_relation import OperatorCategoryRelation

__all__ = [
    "Operator",
    "OperatorCategory",
    "OperatorCategoryRelation"
]
runtime/datamate-python/app/models/operator/operator.py
@@ -0,0 +1,24 @@
from sqlalchemy import Column, String, Text, Boolean, TIMESTAMP
from sqlalchemy.sql import func
from app.db.database import Base

class Operator(Base):
    """Operator model."""

    __tablename__ = "t_operator"

    id = Column(String(64), primary_key=True, comment="Operator ID")
    name = Column(String(64), nullable=True, comment="Operator name")
    description = Column(String(256), nullable=True, comment="Operator description")
    version = Column(String(256), nullable=True, comment="Version")
    inputs = Column(String(256), nullable=True, comment="Input type")
    outputs = Column(String(256), nullable=True, comment="Output type")
    runtime = Column(Text, nullable=True, comment="Runtime information")
    settings = Column(Text, nullable=True, comment="Settings")
    file_name = Column(Text, nullable=True, comment="File name")
    is_star = Column(Boolean, nullable=True, comment="Whether starred")
    created_at = Column(TIMESTAMP, server_default=func.current_timestamp(), comment="Creation time")
    updated_at = Column(TIMESTAMP, server_default=func.current_timestamp(), onupdate=func.current_timestamp(), comment="Update time")

    def __repr__(self):
        return f"<Operator(id={self.id}, name={self.name}, version={self.version})>"
@@ -0,0 +1,15 @@
from sqlalchemy import Column, String, Integer
from app.db.database import Base

class OperatorCategory(Base):
    """Operator category model."""

    __tablename__ = "t_operator_category"

    id = Column(Integer, primary_key=True, autoincrement=True, comment="Category ID")
    name = Column(String(64), nullable=True, comment="Category name")
    type = Column(String(64), nullable=True, comment="Category type")
    parent_id = Column(Integer, nullable=True, comment="Parent category ID")

    def __repr__(self):
        return f"<OperatorCategory(id={self.id}, name={self.name}, type={self.type})>"
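`OperatorCategory.parent_id` makes the categories self-referencing, so the flat rows form a tree. The minimal sketch below rebuilds that hierarchy in memory under these assumptions:

- Top-level categories have `parent_id` set to NULL (`None`).
- The data contains no cycles.
- The category names are made up.

```python
from collections import defaultdict
from typing import Optional


def build_category_tree(rows: list[tuple[int, str, Optional[int]]]) -> dict:
    """rows are (id, name, parent_id) tuples; returns nested {name: {child_name: ...}}."""
    names = {}
    children = defaultdict(list)  # parent_id -> list of child category ids
    for cat_id, name, parent_id in rows:
        names[cat_id] = name
        children[parent_id].append(cat_id)

    def subtree(parent: Optional[int]) -> dict:
        # Recurse into each child; assumes parent_id links never form a cycle.
        return {names[c]: subtree(c) for c in sorted(children[parent])}

    return subtree(None)


rows = [(1, "Text", None), (2, "Image", None), (3, "Cleaning", 1), (4, "Dedup", 3)]
print(build_category_tree(rows))  # {'Text': {'Cleaning': {'Dedup': {}}}, 'Image': {}}
```

Because this sketch keys by name, two categories with the same name under one parent would collide. Keying by `id` avoids that.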
@@ -0,0 +1,13 @@
from sqlalchemy import Column, String, Integer
from app.db.database import Base

class OperatorCategoryRelation(Base):
    """Operator-category association model."""

    __tablename__ = "t_operator_category_relation"

    category_id = Column(Integer, primary_key=True, comment="Category ID")
    operator_id = Column(String(64), primary_key=True, comment="Operator ID")

    def __repr__(self):
        return f"<OperatorCategoryRelation(category_id={self.category_id}, operator_id={self.operator_id})>"