Commit Graph

280 Commits

Author SHA1 Message Date
4d228ba739 mirror 2026-01-09 08:49:18 +08:00
f3f1609455 mirror 2026-01-09 08:46:01 +08:00
d5b75fee0d LSF 2026-01-07 00:00:16 +08:00
hhhhsc701
7d4dcb756b fix: 修复入库可能重复;筛选逻辑优化 (#226)
* 修改数据清洗筛选逻辑-筛选修改为多选

* 修改数据清洗筛选逻辑-筛选修改为多选

* antd 组件库样式定制修改

* fix: 修复入库可能重复

* fix: 算子市场筛选逻辑优化

* fix: 清洗任务创建筛选逻辑优化

* fix: 清洗任务创建筛选逻辑优化

---------

Co-authored-by: chase <byzhangxin11@126.com>
2026-01-06 17:57:25 +08:00
Kecheng Sha
49cc98941f 优化合成任务数据集选择交互体验 (#225) 2026-01-06 15:02:06 +08:00
o0Shark0o
843bfef07b fix: correct dataset file pagination for synthesis task 2026-01-06 11:28:37 +08:00
hefanli
a15a6134ff fix the ratio task config (#224)
* fix: fix the dataset card icon

* fix: fix the dataset file tag distribution and ratio task

* refactor: change dateRange config from latest to start-end
2026-01-05 17:02:28 +08:00
Kecheng Sha
3f1ad6a872 feat(auto-annotation): integrate YOLO auto-labeling and enhance data management (#223)
* feat(auto-annotation): initial setup

* chore: remove package-lock.json

* chore: 清理本地测试脚本与 Maven 设置

* chore: change package-lock.json
2026-01-05 14:22:44 +08:00
hefanli
ccfb84c034 feature: add mysql collection and starrocks collection (#222)
* fix: fix the path for backend-python imaage building

* feature: add mysql collection and starrocks collection

* feature: add mysql collection and starrocks collection

* fix: change the permission of those files which collected from nfs to 754

* fix: delete collected files, config files and log files while deleting collection task

* fix: add the collection task detail api

* fix: change the log of collecting for dataset

* fix: add collection task selecting while creating and updating dataset

* fix: set the umask value to 0022 for java process
2026-01-04 19:05:08 +08:00
hhhhsc701
8d61eb28c3 fix: ray部署修改 (#221) 2026-01-04 09:42:25 +08:00
Dallas98
91f02300d7 feat: update Docker configuration to include backend Nginx settings (#219)
* feat: 增加label-studio的k8s部署卸载

* Revert "feat: 增加label-studio的k8s部署卸载"

This reverts commit 3e59c33e1de7d2c8d45a1f3d5fb53112a20a24a6.

* feat: update Docker configuration to include backend Nginx settings
2025-12-31 16:44:21 +08:00
o0Shark0o
cbed6fbcd7 Revert "Merge branch 'main' of https://github.com/ModelEngine-Group/DataMate"
This reverts commit a12f4c90a5, reversing
changes made to 34f08df86b.
2025-12-31 16:19:19 +08:00
o0Shark0o
a12f4c90a5 Merge branch 'main' of https://github.com/ModelEngine-Group/DataMate 2025-12-31 16:04:05 +08:00
o0Shark0o
34f08df86b Add lucide icons and colors for video, audio and multimodal operators. 2025-12-31 16:03:57 +08:00
hefanli
3a874fe699 fix: fix the collection for nfs (#218)
* fix: remove the datax-builder for the Backend Image

* fix: fix the collection for nfs
2025-12-31 15:56:01 +08:00
Kecheng Sha
01e1c6c7e9 polish(operator cards): improve icon color distinction and subtle UI … (#217)
polish(operator cards): improve icon color distinction and subtle UI details
2025-12-31 10:54:34 +08:00
hhhhsc701
f183b9f2f3 feat: 算子上传适配 (#216) 2025-12-31 10:30:32 +08:00
hhhhsc701
6a1eb85e8e feat: 支持运行data-juicer算子 (#215)
* feature: 增加data-juicer算子

* feat: 支持运行data-juicer算子

* feat: 支持data-juicer任务下发

* feat: 支持data-juicer结果数据集归档

* feat: 支持data-juicer结果数据集归档
2025-12-31 09:20:41 +08:00
hefanli
63f4e3e447 refactor: modify data collection to python implementation (#214)
* feature: LabelStudio jumps without login

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* fix: remove terrabase dependency

* feature: add the collection task executions page and the collection template page

* fix: fix the collection task creation

* fix: fix the collection task creation
2025-12-30 18:48:43 +08:00
hhhhsc701
80d4dfd285 feat: 修复清洗任务创建 (#207) 2025-12-30 14:41:39 +08:00
Dallas98
c7ee10b007 feat: 增加label-studio的k8s部署卸载 (#206) 2025-12-30 10:26:13 +08:00
Kecheng Sha
e22f16166c fix: reset pagination when switching operator market category filters (#205) 2025-12-29 15:16:33 +08:00
Kecheng Sha
081abf7d2f Revert "feat: fix the problem in the Operator Market frontend pages" (#204)
Reverts ModelEngine-Group/DataMate#203
2025-12-29 12:01:19 +08:00
Kecheng Sha
0df7a872e4 Revert "feat: fix the problem in the Operator Market frontend pages" 2025-12-29 12:00:37 +08:00
Kecheng Sha
8f30f71a68 feat: fix the problem in the Operator Market frontend pages (#203) 2025-12-29 11:50:01 +08:00
root
844add27ea feat: fix the problem in the Operator Market frontend pages 2025-12-29 11:38:47 +08:00
hefanli
29e4a333a9 feature: LabelStudio jumps without login (#201) 2025-12-25 16:49:06 +08:00
hhhhsc701
87e73d3bf7 feat: label-studio支持指定sc (#200)
* feat: label-studio构建脚本

* feat: label-studio构建脚本

* feat: label-studio构建脚本

* feat: label-studio安装脚本

* feat: label-studio支持指定sc
2025-12-25 16:13:38 +08:00
hhhhsc701
7e842c7cd5 feat: label-studio构建脚本 (#198)
* feat: label-studio构建脚本

* feat: label-studio构建脚本

* feat: label-studio构建脚本

* feat: label-studio安装脚本
2025-12-25 11:44:05 +08:00
hhhhsc701
1c507ac98a feat: 支持npu自动扩缩容 (#197)
* feat: npu动态调度

* feat: 数据集分页优化

* feat: 支持npu自动扩缩容

* feat: 支持npu自动扩缩容

* feat: 支持npu自动扩缩容

* feat: clean code
2025-12-24 18:03:30 +08:00
Dallas98
de7f853c83 feat: add CodeQL analysis workflow configuration (#196) 2025-12-24 12:14:16 +08:00
hefanli
215d7f0612 Fix the ratio task bug (#194)
* fix: add feign client configurations

* fix: add nacos configurations

* fix: add python to gateway

* fix: Fix the ratio task bug
2025-12-24 11:40:26 +08:00
hhhhsc701
6d61348388 feat: deer-flow通过gateway转发 (#193) 2025-12-23 11:35:45 +08:00
hhhhsc701
d82bff441a fix: prevent deletion of predefined operators and improve error handling (#192)
* fix: prevent deletion of predefined operators and improve error handling

* fix: prevent deletion of predefined operators and improve error handling
2025-12-22 19:30:41 +08:00
hefanli
c1516c87b6 Feat gateway (#191)
* fix: fix the routes definition

* fix: fix the helm installing file

* fix: modify the logging dependencies
2025-12-22 18:58:55 +08:00
hefanli
d419eec3ec fix: fix the routes definition (#189) 2025-12-22 16:04:49 +08:00
hefanli
e5b28c26b1 add gateway (#187)
* feature: add gateway
2025-12-22 15:41:17 +08:00
hhhhsc701
46f4a8c219 feat: add download functionality for example operator and update Dock… (#188)
* feat: add download functionality for example operator and update Dockerfile

* feat: enhance download response by exposing content disposition header

* feat: update download function to accept filename parameter for example operator
2025-12-22 15:39:32 +08:00
Dallas98
8fc4455b57 配置文件更改 (#186)
* feat(generation_service): add image URL extraction and random QA generation logic

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing
2025-12-22 09:29:00 +08:00
Dallas98
85eb5a99ba feat(generation_service): add image URL extraction and random QA generation logic (#182) 2025-12-19 17:36:00 +08:00
hhhhsc701
ab4523b556 add export type settings and enhance metadata structure (#181)
* fix(session): enhance database connection settings with pool pre-ping and recycle options

* feat(metadata): add export type settings and enhance metadata structure

* fix(base_op): improve sample handling by introducing target_type key and consolidating text/data retrieval logic

* feat(metadata): add export type settings and enhance metadata structure

* feat(metadata): add export type settings and enhance metadata structure
2025-12-19 11:54:08 +08:00
Dallas98
d70a3eda0d feat(generation_service): add document filtering to remove short documents based on chunk size (#180)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic

* feat(generation_service): add document filtering to remove short documents based on chunk size
2025-12-19 09:34:02 +08:00
hhhhsc701
be875086db feat: add operator-packages-volume to docker-compose and update Docke… (#179)
* feat: add operator-packages-volume to docker-compose and update Dockerfile for site-packages path

* feat: add retry
2025-12-18 20:32:42 +08:00
Dallas98
27b1cc8e09 feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
2025-12-18 19:19:54 +08:00
Dallas98
e0e9b1d94d feat:问题生成过程优化及COT数据生成优化 (#169)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
2025-12-18 16:51:18 +08:00
hhhhsc701
761f7f6a51 fix: optimize PDF parsing by implementing concurrent processing with … (#177)
* fix: optimize PDF parsing by implementing concurrent processing with ThreadPoolExecutor

* Refactor to async processing for file extraction

Refactor the file processing to use asyncio for improved performance and concurrency.
2025-12-18 15:28:30 +08:00
Dallas98
8113840ac7 fix(docker-compose): update entrypoint and command for mineru-openai-server configuration (#176) 2025-12-17 21:23:03 +08:00
hhhhsc701
12ade8bc7b fix: streamline Dockerfile by removing redundant mirror configuration… (#175)
fix: streamline Dockerfile by removing redundant mirror configurations and simplifying package installation
2025-12-17 16:34:41 +08:00
hhhhsc701
924d977d6f 支持mineru npu处理 (#174)
* feature: unstructured支持简单pdf处理

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: update Dockerfile for improved package source mirrors and add mineru-npu to build targets
2025-12-17 16:31:06 +08:00
hhhhsc701
3b4f8488e8 fix: update Dockerfile to improve pip installation process and remove unnecessary uninstalls (#173)
* feature: unstructured支持简单pdf处理

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* fix: update Dockerfile to improve pip installation process and remove unnecessary uninstalls
2025-12-17 11:49:47 +08:00