DataMate/runtime/datamate-python/app/module/synthesis/schema/ratio_task.py
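hefanli 08bd4eca5c feature: add data ratio feature (#52)
* refactor: rework the data collection implementation, remove unused code, and clean up the code structure

* feature: scan all datasets daily at 00:00, check whether each dataset has exceeded its configured retention period, and call the delete API for those that have

* fix: change the dataset file deletion logic: files uploaded to a dataset are deleted from both the database and the file system, while collected files only have their database records deleted

* fix: add parameter validation and API definitions; remove unused APIs

* fix: default dataset statistics to 0

* feature: add dataset status transitions: a dataset is created in draft status and switches to active status after files are uploaded or collected

* refactor: rework the paged query for collection tasks

* fix: re-execute after update; add transaction control to collection task execution

* feature: creating a collection task can also create a dataset, and updating a collection task can update the specified dataset

* fix: creating a collection task should not raise an error when no dataset needs to be created

* fix: fix dataset statistics not changing when files are deleted

* feature: return the file tag distribution when querying dataset details

* fix: skip analysis when tags is empty

* fix: change status to ACTIVE

* fix: change the tag parsing method

* feature: implement creating, paged querying, and deleting ratio tasks

* feature: implement the frontend interaction for creating, paged querying, and deleting ratio tasks

* fix: fix page errors caused by incorrect progress calculation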
2025-11-03 10:17:39 +08:00

87 lines
2.4 KiB
Python

from typing import List, Optional

from pydantic import BaseModel, Field, field_validator


class RatioConfigItem(BaseModel):
    dataset_id: str = Field(..., alias="datasetId", description="Dataset ID")
    counts: str = Field(..., description="Count")
    filter_conditions: str = Field(..., description="Filter conditions")

    @field_validator("counts")
    @classmethod
    def validate_counts(cls, v: str) -> str:
        # Ensure the value is a numeric string.
        try:
            int(v)
        except (TypeError, ValueError):
            raise ValueError("counts must be a numeric string")
        return v


class CreateRatioTaskRequest(BaseModel):
    name: str = Field(..., description="Name")
    description: Optional[str] = Field(None, description="Description")
    totals: str = Field(..., description="Target total count")
    ratio_method: str = Field(..., description="Ratio method", alias="ratio_method")
    config: List[RatioConfigItem] = Field(..., description="List of ratio settings")

    @field_validator("ratio_method")
    @classmethod
    def validate_ratio_method(cls, v: str) -> str:
        allowed = {"TAG", "DATASET"}
        if v not in allowed:
            # Sort so the error message is stable across runs.
            raise ValueError(f"ratio_method must be one of {sorted(allowed)}")
        return v

    @field_validator("totals")
    @classmethod
    def validate_totals(cls, v: str) -> str:
        try:
            iv = int(v)
        except (TypeError, ValueError):
            raise ValueError("totals must be a numeric string")
        # The range check sits outside the try block; inside it, the
        # ">= 0" error would be caught and replaced by the numeric-string one.
        if iv < 0:
            raise ValueError("totals must be >= 0")
        return v


class TargetDatasetInfo(BaseModel):
    id: str
    name: str
    datasetType: str
    status: str


class CreateRatioTaskResponse(BaseModel):
    # Task info
    id: str
    name: str
    description: Optional[str] = None
    totals: int
    ratio_method: str
    status: str
    # Echoed config
    config: List[RatioConfigItem]
    # Created dataset
    targetDataset: TargetDatasetInfo


class RatioTaskItem(BaseModel):
    id: str
    name: str
    description: Optional[str] = None
    status: Optional[str] = None
    totals: Optional[int] = None
    ratio_method: Optional[str] = None
    target_dataset_id: Optional[str] = None
    target_dataset_name: Optional[str] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None


class PagedRatioTaskResponse(BaseModel):
    content: List[RatioTaskItem]
    totalElements: int
    totalPages: int
    page: int
    size: int
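
The numeric-string checks used for `counts` and `totals` can be tried outside pydantic. The following is a minimal, standard-library-only sketch. The helper names `parse_non_negative_int` and `error_message` are hypothetical and not part of the module above. It illustrates one design point: a range check raised inside a `try` that is guarded by a broad `except` has its message replaced, so here the range check runs after the parse step.

```python
def parse_non_negative_int(value: str, field_name: str = "totals") -> int:
    """Hypothetical helper: parse a numeric string and reject negatives."""
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        # Only parse failures are reported as "not numeric".
        raise ValueError(f"{field_name} must be a numeric string") from None
    # The range check runs after the try block, so its message is not masked.
    if parsed < 0:
        raise ValueError(f"{field_name} must be >= 0")
    return parsed


def error_message(value: str) -> str:
    """Return the validation error text for value, or '' if it is valid."""
    try:
        parse_non_negative_int(value)
    except ValueError as exc:
        return str(exc)
    return ""
```

With this ordering, `"abc"` and `"-5"` produce two distinct messages, which lets a client tell a malformed number apart from an out-of-range one.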