Commit Graph

57 Commits

Author SHA1 Message Date
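ea6765ea0f fix(annotation): change the dataset file status query logic
- Extend the file status query from counting only ACTIVE files to counting ACTIVE and COMPLETED files
- Use the in_ operator instead of the equality operator to support multi-status queries
- Keep the existing dataset annotation count behavior unchanged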
2026-01-20 00:30:21 +08:00
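d890a5679d refactor(annotation): unify query parameter naming
- Replace the pagination query parameter pageSize with size
- Update all parameter references in the related functions
- Update the parameter name shown in log output
- Keep the existing pagination logic unchanged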
2026-01-19 23:56:40 +08:00
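cc0a977349 feat(annotation): add data volume statistics for annotation tasks
- Add data volume and annotated columns to the frontend table
- Add annotation completion percentage calculation and a tooltip
- Add totalCount and annotatedCount fields to the backend schema
- Implement a service method for querying project statistics
- Wire up the frontend/backend data mapping and update the API response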
2026-01-19 22:43:41 +08:00
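649ab2f6bb refactor(annotation): remove debug logs and exception stack traces
- Remove the traceback printing from the project mapping retrieval endpoint
- Simplify the internal server error response message
- Delete several debug log statements from the mapping service
- Clean up debug output in the response data construction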
2026-01-19 21:58:00 +08:00
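496161b1f1 chore(annotation): add debug logs to the mapping service
- Add debug logs for the configuration and label configuration in the _to_response_from_row method
- Add debug logs for the mapping ID and configuration in the _to_response method
- Add a debug log for the response data keys
- Improve the configuration parsing logic so the dictionary type is checked correctly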
2026-01-19 21:52:01 +08:00
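f4a86b4af1 feat(annotation): add the labelConfig field and improve configuration parsing
- Add a label_config field to the DatasetMappingResponse model
- Change the frontend logic for obtaining labelConfig to prefer the task's own configuration
- Remove the condition branch for template configuration and always parse the configuration from XML
- Update the backend service to extract label_config and description from the configuration JSON field
- Improve the consistency of configuration parsing between frontend and backend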
2026-01-19 21:39:00 +08:00
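70ea998564 feat(annotation): improve how the annotation editor obtains the label configuration
- Prefer label_config from the project configuration (the user-edited version)
- Otherwise fall back to the template's default configuration
- Support reading the label_config field from the project configuration dictionary
- Stay backward compatible by falling back to the template configuration when the project configuration is invalid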
2026-01-19 16:34:20 +08:00
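e192c826eb fix(annotation): fix encoding of Chinese characters in file names
- Add urllib.parse.quote for file name encoding
- Implement RFC 5987 support for UTF-8 encoded file names
- Change the Content-Disposition header to use the filename* parameter
- Ensure Chinese file names display correctly on download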
2026-01-19 14:23:14 +08:00
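The file name encoding described in e192c826eb can be sketched roughly as follows. This is a minimal illustration and not the commit's actual code: the function name content_disposition and the ASCII fallback in the plain filename parameter are assumptions added here for illustration.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment Content-Disposition value with an RFC 5987 filename*.

    Hypothetical helper; the name and the ASCII fallback are assumptions.
    """
    # Percent-encode the UTF-8 bytes of the name; safe="" also encodes "/".
    encoded = quote(filename, safe="")
    # ASCII-only fallback for clients that ignore filename*:
    # non-ASCII characters become "?", and quotes/backslashes become "_".
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace('"', "_").replace("\\", "_")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"


if __name__ == "__main__":
    print(content_disposition("标注.json"))
```

Clients that understand filename* use the UTF-8 value, while older clients see only the ASCII fallback.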
0c94361cde Revert "feat(annotation): add template example data configuration"
This reverts commit a2b0fc3674.
2026-01-18 22:08:20 +08:00
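a2b0fc3674 feat(annotation): add template example data configuration
- Add an example data input area to the template configuration form
- Implement example input boxes for different data types (text, image, audio, video, etc.)
- Add live preview for image-type examples
- Add an example data preview card to the template detail page
- Support displaying examples of multiple media types (image, audio, video, text)
- Update the frontend and backend data models to support the exampleData field
- Add placeholder hint text for example data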
2026-01-18 21:59:41 +08:00
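b992b08b2c feat(annotation): extend annotation templates to support multimodal data types
- Extend data type support to formats including pdf/chat/html/table
- Add annotation types covering specialized areas such as asr/ner/object-detection
- Add a label_config field for storing Label Studio XML configuration
- Update the template categories to audio-speech/chat/computer-vision/nlp, etc.
- Implement configuration loading that prefers a predefined label_config
- Extend the database initialization script with multimodal annotation template data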
2026-01-18 20:35:34 +08:00
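0c97648a9e fix(annotation): fix file status filtering in export statistics
- Add an ACTIVE status filter when fetching the total file count
- Change the annotated file count to use distinct(file_id)
- Add an ACTIVE status filter to all file queries in the export feature
- Add logging to trace the export statistics process
- Fix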
2026-01-18 17:35:40 +08:00
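c48d2fdeb8 feat(annotation): add annotation data export
- Add an export dialog component with a choice of formats
- Implement five export formats: JSON, JSONL, CSV, COCO, and YOLO
- Show export statistics, including the total file count and the annotated count
- Integrate the frontend export button with the backend API endpoint
- Support options to export only annotated data and to include the original data
- Implement file download and naming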
2026-01-18 16:54:02 +08:00
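01dcd16a98 feat(annotation): add custom configuration for annotation tasks
- Add a LabelStudioEmbed component for previewing the embedded annotation interface
- Add an XML configuration editor to the create annotation task dialog
- Support loading the configuration from an existing template and customizing it
- Implement live preview of the annotation interface
- Let the backend accept a label_config passed directly to override the template configuration
- Add a labelConfig field to the CreateAnnotationTaskRequest model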
2026-01-18 14:12:12 +08:00
e1c41a93c3 refactor(annotation): simplify template generation by removing special handling for text types
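- Remove the text_object_types variable definition
- Delete the is_text_template check
- Remove the two-column layout implementation for long-text optimization
- Add a note about the default behavior of Label Studio sidebar controls
- Simplify the XML structure generation logic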
2026-01-09 18:52:52 +08:00
294e7a1021 fix destPath param 2026-01-09 15:21:48 +08:00
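7de49feb66 feat(annotation): improve annotation template generation and configuration validation
- Add a set of text object types for determining the template type
- Split XML generation into two separate parts: objects and controls
- Add a responsive layout for text templates to support long-text annotation
- Fix the object and control lookup logic in the configuration validator
- Improve the placement of label controls in long-text scenarios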
2026-01-09 13:39:55 +08:00
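08336e2a13 feat(annotation): add annotation template configuration
- Add choice and show_inline fields to the schema to support selection mode configuration
- Add logic to the editor service to create an empty annotation, avoiding frontend errors
- Implement label type normalization with case-insensitive handling
- Support single/multiple choice and inline display configuration for the Choices tag
- Improve the styling that controls scrollbar display in the frontend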
2026-01-09 13:05:09 +08:00
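a82f4f1bc3 refactor(annotation): remove the dependency on Label Studio Server and switch to embedded editor mode
- Remove the imports and usage of LabelStudioClient and SyncService
- Delete the code for creating, deleting, and syncing Label Studio projects
- Change dataset mapping creation to create a DataMate annotation project instead
- Update the delete mapping endpoint to soft-delete only, no longer deleting the Label Studio project
- Keep the sync endpoint for compatibility; it is now a no-op
- Remove the Label Studio connection diagnostics
- Update the documentation to reflect the switch to embedded editor mode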
2026-01-09 12:31:03 +08:00
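3aa7f6e3a1 refactor(annotation): remove the dependency on Label Studio Server and switch to embedded editor mode
- Remove the imports and usage of LabelStudioClient and SyncService
- Delete the code for creating, deleting, and syncing Label Studio projects
- Change dataset mapping creation to create a DataMate annotation project instead
- Update the delete mapping endpoint to soft-delete only, no longer deleting the Label Studio project
- Keep the sync endpoint for compatibility; it is now a no-op
- Remove the Label Studio connection diagnostics
- Update the documentation to reflect the switch to embedded editor mode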
2026-01-09 12:01:20 +08:00
d5b75fee0d LSF 2026-01-07 00:00:16 +08:00
hefanli
a15a6134ff fix the ratio task config (#224)
* fix: fix the dataset card icon

* fix: fix the dataset file tag distribution and ratio task

* refactor: change dateRange config from latest to start-end
2026-01-05 17:02:28 +08:00
Kecheng Sha
3f1ad6a872 feat(auto-annotation): integrate YOLO auto-labeling and enhance data management (#223)
* feat(auto-annotation): initial setup

* chore: remove package-lock.json

* chore: clean up local test scripts and Maven settings

* chore: change package-lock.json
2026-01-05 14:22:44 +08:00
hefanli
ccfb84c034 feature: add mysql collection and starrocks collection (#222)
* fix: fix the path for backend-python image building

* feature: add mysql collection and starrocks collection

* feature: add mysql collection and starrocks collection

* fix: change the permission of those files which collected from nfs to 754

* fix: delete collected files, config files and log files while deleting collection task

* fix: add the collection task detail api

* fix: change the log of collecting for dataset

* fix: add collection task selecting while creating and updating dataset

* fix: set the umask value to 0022 for java process
2026-01-04 19:05:08 +08:00
hefanli
63f4e3e447 refactor: modify data collection to python implementation (#214)
* feature: LabelStudio jumps without login

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* fix: remove terrabase dependency

* feature: add the collection task executions page and the collection template page

* fix: fix the collection task creation

* fix: fix the collection task creation
2025-12-30 18:48:43 +08:00
hefanli
29e4a333a9 feature: LabelStudio jumps without login (#201) 2025-12-25 16:49:06 +08:00
hefanli
215d7f0612 Fix the ratio task bug (#194)
* fix: add feign client configurations

* fix: add nacos configurations

* fix: add python to gateway

* fix: Fix the ratio task bug
2025-12-24 11:40:26 +08:00
Dallas98
8fc4455b57 Configuration file changes (#186)
* feat(generation_service): add image URL extraction and random QA generation logic

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing
2025-12-22 09:29:00 +08:00
Dallas98
85eb5a99ba feat(generation_service): add image URL extraction and random QA generation logic (#182) 2025-12-19 17:36:00 +08:00
Dallas98
d70a3eda0d feat(generation_service): add document filtering to remove short documents based on chunk size (#180)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic

* feat(generation_service): add document filtering to remove short documents based on chunk size
2025-12-19 09:34:02 +08:00
Dallas98
27b1cc8e09 feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
2025-12-18 19:19:54 +08:00
Dallas98
e0e9b1d94d feat: optimize the question generation process and COT data generation (#169)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
2025-12-18 16:51:18 +08:00
Dallas98
ec87e4f204 feat(frontend): improve the UX of the Synthesis Data Detail page (#163)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion
2025-12-11 21:02:44 +08:00
hefanli
8f529952f6 Fix ratio (#162)
* fix: fixed the issue where an error would be reported when only setting the proportioning quantity when creating a proportioning task

* fix: prevent adding the same file multiple times

* fix: implement a more flexible matching strategy, allowing only the tag name to be configured for matching
2025-12-11 17:45:16 +08:00
Dallas98
2f3ae21f8a feat: enhance dataset file fetching with improved pagination and document loading support (#156) 2025-12-10 22:39:24 +08:00
hefanli
99fd46cb70 fix: fix the Data Evaluation Detail page (#154)
* fix: the Data Evaluation Detail Page should show the model used

* fix: fix the time format displayed

* fix: fix the Data Evaluation Detail page
2025-12-10 18:35:29 +08:00
hefanli
f87060490c feature: data management supports nested folders (#150)
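* fix: mount the storage required by the backend-python service in k8s deployments

* fix: add the API definition for copy-free dataset files

* fix: initialize evaluation results to an empty value so the API does not error before evaluation completes

* feature: data management supports nested folders (displayed following the file system layout; batch downloads include relative paths)

* fix: remove redundant file renaming logic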

* refactor: remove unused imports
2025-12-10 16:42:45 +08:00
Dallas98
015e738a7f feat(SynthDataDetail): add chunk/synthesis data management with edit/delete & UI enhancements (#139)
* feat(synthesis): add evaluation task creation functionality and UI enhancements

* feat(synthesis): implement synthesis data management features including loading, editing, and deleting

* feat(synthesis): add endpoints for deleting and updating synthesis data and chunks

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog
2025-12-09 09:59:40 +08:00
hefanli
744d15ba24 fix: fix read errors caused by malformed JSON in model output during evaluation (#133)
* feature: add cot data evaluation function

* fix: added verification to evaluation results

* fix: fix the prompt for evaluating

* fix: fix read failures when the evaluation result is empty
2025-12-04 18:49:50 +08:00
Dallas98
31c4966608 feat(synthesis): add functionality to archive synthesis tasks to existing datasets (#132) 2025-12-04 17:11:43 +08:00
Dallas98
7012a9ad98 feat: enhance backend deployment, frontend file selection and synthesis task management (#129)
* feat: Implement data synthesis task management with database models and API endpoints

* feat: Update Python version requirements and refine dependency constraints in configuration

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog

* feat: Refactor synthesis task routes and enhance file task management in the API

* feat: Enhance SynthesisTaskTab with tooltip actions and add chunk data retrieval in API
2025-12-04 09:57:13 +08:00
hefanli
1d19cd3a62 feature: add data-evaluation
* feature: add evaluation task management function

* feature: add evaluation task detail page

* fix: delete duplicate definition for table t_model_config

* refactor: rename package synthesis to ratio

* refactor: add eval file table and refactor related code

* fix: calling large models in parallel during evaluation
2025-12-04 09:23:54 +08:00
Dallas98
8b164cb012 feat: Implement data synthesis task management with database models and API endpoints (#122) 2025-12-02 15:23:58 +08:00
hefanli
c1352ab91f feature: multiple ratio configurations can be set for the data set. (#103)
feature: multiple ratio configurations can be set for the data set.
2025-11-24 15:28:17 +08:00
hefanli
cddfe9b149 feature: data ratio can be configured by update time (#95)
* feature: data ratio can be configured by update time

* fix: fix passing of the ratio time parameter
2025-11-20 18:50:51 +08:00
hefanli
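a07fba23f2 feature: dataset import supports importing from a selected collection task (#92)
* feature: implement OBS collection

* feature: add handling for files with duplicate names in a dataset

* feature: allow selecting a collection task when importing data into a dataset in the frontend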
2025-11-19 11:05:33 +08:00
Jason Wang
df853a5177 feat: Enhance file tag update functionality with automatic format conversion (#84)
- Updated `update_file_tags` to support both simplified and full tag formats.
- Introduced `TagFormatConverter` to handle conversion from simplified external tags to internal storage format.
- Added logic to fetch and utilize the appropriate annotation template for conversion.
- Improved error handling for missing templates and unknown controls during tag updates.
- Created example script demonstrating the usage of the new tag format conversion feature.
- Added unit tests for `TagFormatConverter` to ensure correct functionality and edge case handling.
2025-11-14 12:42:39 +08:00
Jason Wang
45743f39f5 feat: add labeling template. refactor: switch to Poetry, build and deploy of backend Python (#79)
* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external

* feat: Add tag configuration management and related components

- Introduced new components for tag selection and browsing in the frontend.
- Added API endpoint to fetch tag configuration from the backend.
- Implemented tag configuration management in the backend, including loading from YAML.
- Enhanced template service to support dynamic tag rendering based on configuration.
- Updated validation utilities to incorporate tag configuration checks.
- Refactored existing code to utilize the new tag configuration structure.

* feat: Refactor LabelStudioTagConfig for improved configuration loading and validation

* feat: Update Makefile to include backend-python-docker-build in the build process

* feat: Migrate to poetry for better deps management

* Add pyyaml dependency and update Dockerfile to use Poetry for dependency management

- Added pyyaml (>=6.0.3,<7.0.0) to pyproject.toml dependencies.
- Updated Dockerfile to install Poetry and manage dependencies using it.
- Improved layer caching by copying only dependency files before the application code.
- Removed unnecessary installation of build dependencies to keep the final image size small.

* feat: Remove duplicated backend-python-docker-build target from Makefile

* fix: airflow is not ready for adding yet

* feat: update Python version to 3.12 and remove project installation step in Dockerfile
2025-11-13 15:32:30 +08:00
Jason Wang
c5ccc56cca feat: Add labeling template (#72)
* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external
2025-11-11 09:14:14 +08:00
Vincent
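60e2289019 fix: fix ratio task operation issues (#66)
* fix: ratio tasks need to be able to jump to the target dataset

* feature: add the ratio task detail API

* fix: remove the nonexistent ratio detail page

* fix: use the proper logic to display tags

* fix: remove the extra - from parameter default values

* fix: fix ratio task operations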
2025-11-07 19:01:45 +08:00