Commit Graph

160 Commits

Author SHA1 Message Date
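3f36be0f9f feat(runtime): implement automatic import for the runtime ops module
- Add the importlib and os modules for dynamic imports
- Integrate the loguru logger for error tracking
- Implement logic that automatically walks and imports all submodules
- Add exception handling to catch module load failures
- Ensure operators registered by all submodules load correctly
- Fix the module import order so annotation operations work correctly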
2026-01-19 12:37:40 +08:00
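A minimal Python sketch of the auto-import idea this commit describes. It walks a package directory with `os`, imports each submodule with `importlib`, and logs failures instead of propagating them, so that one broken module does not stop the others from registering their operators. The function name `import_all_submodules` is hypothetical. To avoid a third-party dependency, the sketch uses the standard `logging` module in place of the loguru logger named in the commit. It also imports modules in simple alphabetical order, which does not reproduce the ordering fix mentioned above.

```python
import importlib
import logging
import os

logger = logging.getLogger(__name__)


def import_all_submodules(package_name: str) -> tuple[list[str], list[str]]:
    """Import every direct submodule/subpackage of `package_name`.

    Importing a module runs its top-level code, which is where operators
    typically register themselves. Returns (loaded, failed) module names.
    """
    package = importlib.import_module(package_name)
    package_dir = os.path.dirname(package.__file__)
    loaded: list[str] = []
    failed: list[str] = []
    for entry in sorted(os.listdir(package_dir)):
        if entry.startswith("_"):
            continue  # skip __init__.py, __pycache__ and private modules
        path = os.path.join(package_dir, entry)
        if entry.endswith(".py"):
            name = entry[:-3]
        elif os.path.isfile(os.path.join(path, "__init__.py")):
            name = entry  # a regular subpackage
        else:
            continue
        full_name = f"{package_name}.{name}"
        try:
            importlib.import_module(full_name)
            loaded.append(full_name)
        except Exception:
            # Log the traceback and keep going with the remaining modules.
            logger.exception("Failed to load module %s", full_name)
            failed.append(full_name)
    return loaded, failed
```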
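0ed5a27a72 fix(dataset): handle errors when operator lookup fails
- Raise ImportError when an operator cannot be found in any registry
- Raise a more specific ImportError for invalid registry content types
- Provide clearer error messages to help users diagnose problems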
2026-01-19 12:12:47 +08:00
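A minimal sketch of the lookup behaviour this fix describes. It searches several registries in order, raises ImportError when no registry has the name, and raises a more specific ImportError when a registry entry has an unsupported type. Two parts of this are assumptions, not the project's code. The first is the registry shape: a mapping from operator name to either a class or a dotted `"module.ClassName"` string. The second is the name `resolve_operator`, which is hypothetical.

```python
import importlib
from typing import Any, Mapping, Sequence


def resolve_operator(name: str, registries: Sequence[Mapping[str, Any]]) -> type:
    """Return the operator class registered under `name`.

    The first registry containing `name` wins. Its entry must be either a
    class or a dotted "module.ClassName" path string.
    """
    for registry in registries:
        if name not in registry:
            continue
        entry = registry[name]
        if isinstance(entry, type):
            return entry
        if isinstance(entry, str):
            module_path, _, attr = entry.rpartition(".")
            if not module_path:
                raise ImportError(f"Operator '{name}' path {entry!r} is not 'module.ClassName'")
            return getattr(importlib.import_module(module_path), attr)
        # Found, but the registry holds something we cannot turn into a class.
        raise ImportError(
            f"Operator '{name}' has unsupported registry entry of type "
            f"{type(entry).__name__}; expected a class or a dotted path string"
        )
    raise ImportError(f"Operator '{name}' was not found in any of {len(registries)} registries")
```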
0c94361cde Revert "feat(annotation): add template example data configuration"
This reverts commit a2b0fc3674.
2026-01-18 22:08:20 +08:00
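a2b0fc3674 feat(annotation): add template example data configuration
- Add an example data input area to the template configuration form
- Implement example input fields for different data types (text, image, audio, video, etc.)
- Add live preview for image-type examples
- Add an example data preview card to the template detail page
- Support displaying examples for multiple media types (image, audio, video, text)
- Update the frontend and backend data models to support the exampleData field
- Add placeholder hint text for example data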
2026-01-18 21:59:41 +08:00
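e81c0bf199 feat(annotation): extend template ID field length to support custom IDs
- Extend the id column of the annotation config template table from VARCHAR(36) to VARCHAR(64)
- Extend the template_id column in the annotation management model from VARCHAR(36) to VARCHAR(64)
- Update the column length definitions in the database initialization script
- Support longer UUIDs or custom ID formats as template identifiers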
2026-01-18 20:50:00 +08:00
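b992b08b2c feat(annotation): extend annotation templates to support multimodal data types
- Extend supported data types to include pdf/chat/html/table and other formats
- Add annotation types covering specialized domains such as asr/ner/object-detection
- Add a label_config field for storing Label Studio XML configuration
- Update the template category taxonomy to audio-speech/chat/computer-vision/nlp, etc.
- Implement config loading logic that prefers a predefined label_config when present
- Complete the database initialization script with multimodal annotation template data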
2026-01-18 20:35:34 +08:00
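0c97648a9e fix(annotation): fix file status filtering in export statistics
- Add an ACTIVE status filter when counting total files
- Change the annotated file count to use distinct(file_id)
- Add an ACTIVE status filter to all file queries in the export feature
- Add logging to trace the export statistics process
- Fix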
2026-01-18 17:35:40 +08:00
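A minimal sketch of why both changes in this fix matter. Counting annotation rows over-counts files that have several annotations, and including non-ACTIVE (for example, deleted) files inflates both numbers. The sketch uses plain SQL on stdlib `sqlite3` so it can run standalone. The project itself presumably expresses this through its ORM. The table and column names (`dataset_files`, `annotations`, `status`) and the function `export_stats` are hypothetical.

```python
import sqlite3


def export_stats(conn: sqlite3.Connection, dataset_id: str) -> dict[str, int]:
    """Count ACTIVE files and ACTIVE files with at least one annotation."""
    total = conn.execute(
        "SELECT COUNT(*) FROM dataset_files WHERE dataset_id = ? AND status = 'ACTIVE'",
        (dataset_id,),
    ).fetchone()[0]
    # COUNT(DISTINCT file_id) counts each file once, no matter how many
    # annotation rows it has; the JOIN applies the same ACTIVE filter.
    annotated = conn.execute(
        """
        SELECT COUNT(DISTINCT a.file_id)
        FROM annotations AS a
        JOIN dataset_files AS f ON f.id = a.file_id
        WHERE f.dataset_id = ? AND f.status = 'ACTIVE'
        """,
        (dataset_id,),
    ).fetchone()[0]
    return {"total_files": total, "annotated_files": annotated}
```

With two annotations on one ACTIVE file and one annotation on a deleted file, a naive `COUNT(*)` over annotations would report 3 annotated files. This version reports 1.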
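c48d2fdeb8 feat(annotation): add annotation data export
- Add an export dialog component with a choice of formats
- Implement five export formats: JSON, JSONL, CSV, COCO, and YOLO
- Display export statistics, including the total file count and the annotated count
- Wire the frontend export button to a backend API endpoint
- Support "export annotated data only" and "include original data" options
- Implement file download and naming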
2026-01-18 16:54:02 +08:00
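The COCO and YOLO formats listed above encode bounding boxes differently. COCO uses `[x_min, y_min, width, height]` in absolute pixels. YOLO uses one line per box, `class_id x_center y_center width height`, with coordinates normalized to 0..1. A minimal sketch of that conversion follows. It assumes the exporter starts from Label Studio rectangle results, whose `x`, `y`, `width`, and `height` are percentages of the image size measured from the top-left corner. The function name `ls_rect_to_coco_and_yolo` is hypothetical.

```python
def ls_rect_to_coco_and_yolo(value: dict, img_w: int, img_h: int, class_id: int) -> tuple[list[float], str]:
    """Convert one percentage-based rectangle to a COCO bbox and a YOLO line."""
    # COCO: absolute pixels, top-left corner plus size.
    coco_bbox = [
        value["x"] * img_w / 100,
        value["y"] * img_h / 100,
        value["width"] * img_w / 100,
        value["height"] * img_h / 100,
    ]
    # YOLO: box centre and size, normalized to the image dimensions.
    cx = (value["x"] + value["width"] / 2) / 100
    cy = (value["y"] + value["height"] / 2) / 100
    w = value["width"] / 100
    h = value["height"] / 100
    yolo_line = f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
    return coco_bbox, yolo_line
```

For example, a box at `x=10, y=20, width=30, height=40` (percent) on a 200x100 image becomes the COCO bbox `[20.0, 20.0, 60.0, 40.0]` and the YOLO line `2 0.250000 0.400000 0.300000 0.400000` for class 2.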
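01dcd16a98 feat(annotation): add custom configuration for annotation tasks
- Add a LabelStudioEmbed component for previewing the embedded annotation interface
- Add an XML configuration editor to the create-annotation-task dialog
- Support loading a configuration from an existing template and customizing it
- Implement live preview of the annotation interface
- Let the backend accept a label_config directly to override the template configuration
- Add a labelConfig field to the CreateAnnotationTaskRequest model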
2026-01-18 14:12:12 +08:00
e1c41a93c3 refactor(annotation): simplify template generation and remove special handling for text types
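- Remove the text_object_types variable definition
- Delete the is_text_template check
- Remove the two-column layout used for long-text optimization
- Add a note on Label Studio's default sidebar control behavior
- Simplify the XML structure generation logic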
2026-01-09 18:52:52 +08:00
b5aaf52bb6 chore(deps): update the paddlenlp dependency version
- Downgrade paddlenlp from 3.0b4 to 2.8.1
- Leave all other dependency versions unchanged
- Ensure dependency version compatibility
2026-01-09 17:20:05 +08:00
103cb94a6d feat(runtime): add the PaddleNLP dependency
- Add the paddlenlp==3.0.0b4 dependency to pyproject.toml
- Provide natural language processing support for OCR feature extensions
2026-01-09 15:51:42 +08:00
294e7a1021 fix destPath param 2026-01-09 15:21:48 +08:00
a98eeb530f s3-compatible-fs support 2026-01-09 14:35:03 +08:00
ba210d3d4f localfs support 2026-01-09 14:35:03 +08:00
010ffceab5 glusterfs support 2026-01-09 13:49:18 +08:00
fa755faf72 ftp 2026-01-09 13:47:43 +08:00
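7de49feb66 feat(annotation): improve annotation template generation and config validation
- Add a set of text object types used to determine the template type
- Split XML generation into two separate parts: objects and controls
- Add a responsive layout for text templates to support long-text annotation
- Fix the object and control lookup logic in the config validator
- Improve the placement of label controls in long-text scenarios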
2026-01-09 13:39:55 +08:00
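d6aaa1b7a8 feat(runtime): enable the FTP reader and writer modules
- Uncomment the ftpreader module configuration
- Uncomment the ftpwriter module configuration
- Make FTP data transfer available at runtime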
2026-01-09 13:36:51 +08:00
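08336e2a13 feat(annotation): add annotation template configuration
- Add choice and show_inline fields to the schema to support selection-mode configuration
- Add empty-annotation creation logic to the editor service to prevent frontend errors
- Normalize label types and accept them case-insensitively
- Support single/multiple selection and inline display for the Choices tag
- Improve the scrollbar display styles in the frontend UI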
2026-01-09 13:05:09 +08:00
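A minimal sketch of how the schema fields above could map onto a Label Studio labeling config. It normalizes a case-insensitive tag type to its canonical spelling. It then emits an object part (`<Text>`) and a control part (`<Choices>` or `<Labels>`). On `Choices`, it sets Label Studio's `choice` and `showInline` attributes. The function `build_label_config`, the `CANONICAL_TAGS` table, and the fixed control name `"label"` are hypothetical and are not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalization table: any casing maps to Label Studio's tag name.
CANONICAL_TAGS = {"choices": "Choices", "labels": "Labels"}


def build_label_config(
    object_name: str,
    tag_type: str,
    labels: list[str],
    choice: str = "single",
    show_inline: bool = False,
) -> str:
    canonical = CANONICAL_TAGS.get(tag_type.lower())
    if canonical is None:
        raise ValueError(f"Unsupported control type: {tag_type!r}")
    view = ET.Element("View")
    # Object part: the data being annotated, bound to the task field $<name>.
    ET.SubElement(view, "Text", name=object_name, value=f"${object_name}")
    # Control part: the annotation widget, pointed at the object via toName.
    control = ET.SubElement(view, canonical, name="label", toName=object_name)
    if canonical == "Choices":
        control.set("choice", choice)  # "single" or "multiple"
        control.set("showInline", "true" if show_inline else "false")
        child_tag = "Choice"
    else:
        child_tag = "Label"
    for label in labels:
        ET.SubElement(control, child_tag, value=label)
    return ET.tostring(view, encoding="unicode")
```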
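a82f4f1bc3 refactor(annotation): remove the dependency on Label Studio Server and switch to embedded editor mode
- Remove the imports and usage of LabelStudioClient and SyncService
- Delete code for creating, deleting, and syncing Label Studio projects
- Change dataset mapping creation to create a DataMate annotation project instead
- Make the delete-mapping endpoint perform only a soft delete; Label Studio projects are no longer deleted
- Keep the sync endpoint for compatibility; it is now a no-op
- Remove the Label Studio connection diagnostics feature
- Update the documentation to reflect the switch to embedded editor mode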
2026-01-09 12:31:03 +08:00
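3aa7f6e3a1 refactor(annotation): remove the dependency on Label Studio Server and switch to embedded editor mode
- Remove the imports and usage of LabelStudioClient and SyncService
- Delete code for creating, deleting, and syncing Label Studio projects
- Change dataset mapping creation to create a DataMate annotation project instead
- Make the delete-mapping endpoint perform only a soft delete; Label Studio projects are no longer deleted
- Keep the sync endpoint for compatibility; it is now a no-op
- Remove the Label Studio connection diagnostics feature
- Update the documentation to reflect the switch to embedded editor mode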
2026-01-09 12:01:20 +08:00
d5b75fee0d LSF 2026-01-07 00:00:16 +08:00
hhhhsc701
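7d4dcb756b fix: fix possible duplicate ingestion; optimize filter logic (#226)
* Change the data cleaning filter logic: filters are now multi-select

* Change the data cleaning filter logic: filters are now multi-select

* Customize antd component library styles

* fix: fix possible duplicate ingestion

* fix: optimize the operator marketplace filter logic

* fix: optimize the filter logic for cleaning task creation

* fix: optimize the filter logic for cleaning task creation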

---------

Co-authored-by: chase <byzhangxin11@126.com>
2026-01-06 17:57:25 +08:00
hefanli
a15a6134ff fix the ratio task config (#224)
* fix: fix the dataset card icon

* fix: fix the dataset file tag distribution and ratio task

* refactor: change dateRange config from latest to start-end
2026-01-05 17:02:28 +08:00
Kecheng Sha
3f1ad6a872 feat(auto-annotation): integrate YOLO auto-labeling and enhance data management (#223)
* feat(auto-annotation): initial setup

* chore: remove package-lock.json

* chore: clean up local test scripts and Maven settings

* chore: change package-lock.json
2026-01-05 14:22:44 +08:00
hefanli
ccfb84c034 feature: add mysql collection and starrocks collection (#222)
* fix: fix the path for backend-python image building

* feature: add mysql collection and starrocks collection

* feature: add mysql collection and starrocks collection

* fix: change the permission of those files which collected from nfs to 754

* fix: delete collected files, config files and log files while deleting collection task

* fix: add the collection task detail api

* fix: change the log of collecting for dataset

* fix: add collection task selecting while creating and updating dataset

* fix: set the umask value to 0022 for java process
2026-01-04 19:05:08 +08:00
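Two permission settings appear in this PR. Collected files get mode 754 (`rwxr-xr--`). The Java process uses a umask of 0022, which clears the group and other write bits from whatever mode a new file or directory requests. A minimal Python sketch of both ideas follows. It is not the project's implementation, and the names `COLLECTED_FILE_MODE`, `effective_mode`, and `apply_collected_permissions` are hypothetical.

```python
import os

COLLECTED_FILE_MODE = 0o754  # rwxr-xr--, as stated in the PR


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file or directory ends up with."""
    return requested & ~umask


def apply_collected_permissions(root: str, mode: int = COLLECTED_FILE_MODE) -> int:
    """chmod every regular (non-symlink) file under `root`; return the count."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                os.chmod(path, mode)
                count += 1
    return count
```

With umask 0022, a file created with the usual default request of 0666 ends up as 0644, and a directory requested as 0777 ends up as 0755.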
o0Shark0o
cbed6fbcd7 Revert "Merge branch 'main' of https://github.com/ModelEngine-Group/DataMate"
This reverts commit a12f4c90a5, reversing
changes made to 34f08df86b.
2025-12-31 16:19:19 +08:00
hefanli
3a874fe699 fix: fix the collection for nfs (#218)
* fix: remove the datax-builder for the Backend Image

* fix: fix the collection for nfs
2025-12-31 15:56:01 +08:00
hhhhsc701
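6a1eb85e8e feat: support running data-juicer operators (#215)
* feature: add data-juicer operators

* feat: support running data-juicer operators

* feat: support dispatching data-juicer tasks

* feat: support archiving data-juicer result datasets

* feat: support archiving data-juicer result datasets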
2025-12-31 09:20:41 +08:00
hefanli
63f4e3e447 refactor: modify data collection to python implementation (#214)
* feature: LabelStudio jumps without login

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* fix: remove terrabase dependency

* feature: add the collection task executions page and the collection template page

* fix: fix the collection task creation

* fix: fix the collection task creation
2025-12-30 18:48:43 +08:00
hefanli
29e4a333a9 feature: LabelStudio jumps without login (#201) 2025-12-25 16:49:06 +08:00
hhhhsc701
1c507ac98a feat: support NPU autoscaling (#197)
* feat: dynamic NPU scheduling

* feat: optimize dataset pagination

* feat: support NPU autoscaling

* feat: support NPU autoscaling

* feat: support NPU autoscaling

* feat: clean code
2025-12-24 18:03:30 +08:00
hefanli
215d7f0612 Fix the ratio task bug (#194)
* fix: add feign client configurations

* fix: add nacos configurations

* fix: add python to gateway

* fix: Fix the ratio task bug
2025-12-24 11:40:26 +08:00
hhhhsc701
d82bff441a fix: prevent deletion of predefined operators and improve error handling (#192)
* fix: prevent deletion of predefined operators and improve error handling

* fix: prevent deletion of predefined operators and improve error handling
2025-12-22 19:30:41 +08:00
hefanli
e5b28c26b1 add gateway (#187)
* feature: add gateway
2025-12-22 15:41:17 +08:00
hhhhsc701
46f4a8c219 feat: add download functionality for example operator and update Dock… (#188)
* feat: add download functionality for example operator and update Dockerfile

* feat: enhance download response by exposing content disposition header

* feat: update download function to accept filename parameter for example operator
2025-12-22 15:39:32 +08:00
Dallas98
8fc4455b57 Configuration file changes (#186)
* feat(generation_service): add image URL extraction and random QA generation logic

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing
2025-12-22 09:29:00 +08:00
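Raising the batch size from 20 to 100 cuts the number of round trips per document. For example, 250 chunks take 3 batches instead of 13. A minimal sketch of fixed-size batching follows. The `iter_batches` helper and the `BATCH_SIZE` constant are hypothetical illustrations, not names from the generation service.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
BATCH_SIZE = 100  # raised from 20 in this change


def iter_batches(items: Sequence[T], batch_size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```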
Dallas98
85eb5a99ba feat(generation_service): add image URL extraction and random QA generation logic (#182) 2025-12-19 17:36:00 +08:00
hhhhsc701
ab4523b556 add export type settings and enhance metadata structure (#181)
* fix(session): enhance database connection settings with pool pre-ping and recycle options

* feat(metadata): add export type settings and enhance metadata structure

* fix(base_op): improve sample handling by introducing target_type key and consolidating text/data retrieval logic

* feat(metadata): add export type settings and enhance metadata structure

* feat(metadata): add export type settings and enhance metadata structure
2025-12-19 11:54:08 +08:00
Dallas98
d70a3eda0d feat(generation_service): add document filtering to remove short documents based on chunk size (#180)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic

* feat(generation_service): add document filtering to remove short documents based on chunk size
2025-12-19 09:34:02 +08:00
hhhhsc701
be875086db feat: add operator-packages-volume to docker-compose and update Docke… (#179)
* feat: add operator-packages-volume to docker-compose and update Dockerfile for site-packages path

* feat: add retry
2025-12-18 20:32:42 +08:00
Dallas98
27b1cc8e09 feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
2025-12-18 19:19:54 +08:00
Dallas98
e0e9b1d94d feat: optimize the question generation process and COT data generation (#169)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
2025-12-18 16:51:18 +08:00
hhhhsc701
761f7f6a51 fix: optimize PDF parsing by implementing concurrent processing with … (#177)
* fix: optimize PDF parsing by implementing concurrent processing with ThreadPoolExecutor

* Refactor to async processing for file extraction

Refactor the file processing to use asyncio for improved performance and concurrency.
2025-12-18 15:28:30 +08:00
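The PR moves file extraction from a ThreadPoolExecutor to asyncio. A minimal sketch of that pattern follows, assuming the per-file extractor is an ordinary blocking function. Each call runs in a worker thread via `asyncio.to_thread`, and an `asyncio.Semaphore` caps how many files are processed at once. The names `extract_all`, `extract`, and `max_concurrency` are hypothetical, not the project's API.

```python
import asyncio
from typing import Callable, Iterable


async def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], str],
    max_concurrency: int = 4,
) -> dict[str, str]:
    """Run the blocking `extract(path)` for every path, at most N at a time."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(path: str) -> tuple[str, str]:
        async with semaphore:
            # Offload the blocking parser so the event loop stays responsive.
            text = await asyncio.to_thread(extract, path)
        return path, text

    results = await asyncio.gather(*(run_one(p) for p in paths))
    return dict(results)
```

Usage: `asyncio.run(extract_all(pdf_paths, parse_pdf, max_concurrency=4))`, where `parse_pdf` stands in for whatever blocking parser is used.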
hhhhsc701
924d977d6f Support mineru NPU processing (#174)
* feature: unstructured supports simple PDF processing

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: update Dockerfile for improved package source mirrors and add mineru-npu to build targets
2025-12-17 16:31:06 +08:00
hhhhsc701
62b91b6deb bugfix: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits (#172)
* feature: unstructured supports simple PDF processing

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits
2025-12-17 10:41:13 +08:00
hhhhsc701
d8c0b0ed73 Supplement modal scope (#165) 2025-12-12 13:34:03 +08:00
hhhhsc701
fc9fb07e77 bugfix (#164) 2025-12-11 23:17:01 +08:00
Dallas98
ec87e4f204 feat(frontend): improve the UX of the Synthesis Data Detail page (#163)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion
2025-12-11 21:02:44 +08:00