Commit Graph

89 Commits

Author SHA1 Message Date
3aa7f6e3a1 refactor(annotation): 移除对 Label Studio Server 的依赖并切换到内嵌编辑器模式
- 移除 LabelStudioClient 和 SyncService 的导入及使用
- 删除与 Label Studio 项目的创建、删除和同步相关代码
- 修改创建数据集映射功能,改为创建 DataMate 标注项目
- 更新删除映射接口,仅进行软删除不再删除 Label Studio 项目
- 修改同步接口为兼容性保留,实际操作为空操作
- 移除 Label Studio 连接诊断功能
- 更新文档说明以反映内嵌编辑器模式的变化
2026-01-09 12:01:20 +08:00
d5b75fee0d LSF 2026-01-07 00:00:16 +08:00
hhhhsc701
7d4dcb756b fix: 修复入库可能重复;筛选逻辑优化 (#226)
* 修改数据清洗筛选逻辑-筛选修改为多选

* 修改数据清洗筛选逻辑-筛选修改为多选

* antd 组件库样式定制修改

* fix: 修复入库可能重复

* fix: 算子市场筛选逻辑优化

* fix: 清洗任务创建筛选逻辑优化

* fix: 清洗任务创建筛选逻辑优化

---------

Co-authored-by: chase <byzhangxin11@126.com>
2026-01-06 17:57:25 +08:00
hefanli
a15a6134ff fix the ratio task config (#224)
* fix: fix the dataset card icon

* fix: fix the dataset file tag distribution and ratio task

* refactor: change dateRange config from latest to start-end
2026-01-05 17:02:28 +08:00
Kecheng Sha
3f1ad6a872 feat(auto-annotation): integrate YOLO auto-labeling and enhance data management (#223)
* feat(auto-annotation): initial setup

* chore: remove package-lock.json

* chore: 清理本地测试脚本与 Maven 设置

* chore: change package-lock.json
2026-01-05 14:22:44 +08:00
hefanli
ccfb84c034 feature: add mysql collection and starrocks collection (#222)
* fix: fix the path for backend-python imaage building

* feature: add mysql collection and starrocks collection

* feature: add mysql collection and starrocks collection

* fix: change the permission of those files which collected from nfs to 754

* fix: delete collected files, config files and log files while deleting collection task

* fix: add the collection task detail api

* fix: change the log of collecting for dataset

* fix: add collection task selecting while creating and updating dataset

* fix: set the umask value to 0022 for java process
2026-01-04 19:05:08 +08:00
o0Shark0o
cbed6fbcd7 Revert "Merge branch 'main' of https://github.com/ModelEngine-Group/DataMate"
This reverts commit a12f4c90a5, reversing
changes made to 34f08df86b.
2025-12-31 16:19:19 +08:00
hefanli
3a874fe699 fix: fix the collection for nfs (#218)
* fix: remove the datax-builder for the Backend Image

* fix: fix the collection for nfs
2025-12-31 15:56:01 +08:00
hhhhsc701
6a1eb85e8e feat: 支持运行data-juicer算子 (#215)
* feature: 增加data-juicer算子

* feat: 支持运行data-juicer算子

* feat: 支持data-juicer任务下发

* feat: 支持data-juicer结果数据集归档

* feat: 支持data-juicer结果数据集归档
2025-12-31 09:20:41 +08:00
hefanli
63f4e3e447 refactor: modify data collection to python implementation (#214)
* feature: LabelStudio jumps without login

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* fix: remove terrabase dependency

* feature: add the collection task executions page and the collection template page

* fix: fix the collection task creation

* fix: fix the collection task creation
2025-12-30 18:48:43 +08:00
hefanli
29e4a333a9 feature: LabelStudio jumps without login (#201) 2025-12-25 16:49:06 +08:00
hhhhsc701
1c507ac98a feat: 支持npu自动扩缩容 (#197)
* feat: npu动态调度

* feat: 数据集分页优化

* feat: 支持npu自动扩缩容

* feat: 支持npu自动扩缩容

* feat: 支持npu自动扩缩容

* feat: clean code
2025-12-24 18:03:30 +08:00
hefanli
215d7f0612 Fix the ratio task bug (#194)
* fix: add feign client configurations

* fix: add nacos configurations

* fix: add python to gateway

* fix: Fix the ratio task bug
2025-12-24 11:40:26 +08:00
hhhhsc701
d82bff441a fix: prevent deletion of predefined operators and improve error handling (#192)
* fix: prevent deletion of predefined operators and improve error handling

* fix: prevent deletion of predefined operators and improve error handling
2025-12-22 19:30:41 +08:00
hefanli
e5b28c26b1 add gateway (#187)
* feature: add gateway
2025-12-22 15:41:17 +08:00
hhhhsc701
46f4a8c219 feat: add download functionality for example operator and update Dock… (#188)
* feat: add download functionality for example operator and update Dockerfile

* feat: enhance download response by exposing content disposition header

* feat: update download function to accept filename parameter for example operator
2025-12-22 15:39:32 +08:00
Dallas98
8fc4455b57 配置文件更改 (#186)
* feat(generation_service): add image URL extraction and random QA generation logic

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing
2025-12-22 09:29:00 +08:00
Dallas98
85eb5a99ba feat(generation_service): add image URL extraction and random QA generation logic (#182) 2025-12-19 17:36:00 +08:00
hhhhsc701
ab4523b556 add export type settings and enhance metadata structure (#181)
* fix(session): enhance database connection settings with pool pre-ping and recycle options

* feat(metadata): add export type settings and enhance metadata structure

* fix(base_op): improve sample handling by introducing target_type key and consolidating text/data retrieval logic

* feat(metadata): add export type settings and enhance metadata structure

* feat(metadata): add export type settings and enhance metadata structure
2025-12-19 11:54:08 +08:00
Dallas98
d70a3eda0d feat(generation_service): add document filtering to remove short documents based on chunk size (#180)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic

* feat(generation_service): add document filtering to remove short documents based on chunk size
2025-12-19 09:34:02 +08:00
hhhhsc701
be875086db feat: add operator-packages-volume to docker-compose and update Docke… (#179)
* feat: add operator-packages-volume to docker-compose and update Dockerfile for site-packages path

* feat: add retry
2025-12-18 20:32:42 +08:00
Dallas98
27b1cc8e09 feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
2025-12-18 19:19:54 +08:00
Dallas98
e0e9b1d94d feat:问题生成过程优化及COT数据生成优化 (#169)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
2025-12-18 16:51:18 +08:00
hhhhsc701
761f7f6a51 fix: optimize PDF parsing by implementing concurrent processing with … (#177)
* fix: optimize PDF parsing by implementing concurrent processing with ThreadPoolExecutor

* Refactor to async processing for file extraction

Refactor the file processing to use asyncio for improved performance and concurrency.
2025-12-18 15:28:30 +08:00
hhhhsc701
924d977d6f 支持mineru npu处理 (#174)
* feature: unstructured支持简单pdf处理

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: update Dockerfile for improved package source mirrors and add mineru-npu to build targets
2025-12-17 16:31:06 +08:00
hhhhsc701
62b91b6deb bugfix: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits (#172)
* feature: unstructured支持简单pdf处理

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits
2025-12-17 10:41:13 +08:00
hhhhsc701
d8c0b0ed73 补充modal范围 (#165) 2025-12-12 13:34:03 +08:00
hhhhsc701
fc9fb07e77 bugfix (#164) 2025-12-11 23:17:01 +08:00
Dallas98
ec87e4f204 feat(frontend): 增强Synthesis Data Detail页面UX体验 (#163)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion
2025-12-11 21:02:44 +08:00
hefanli
8f529952f6 Fix ratio (#162)
* fix: fixed the issue where an error would be reported when only setting the proportioning quantity when creating a proportioning task

* fix: prevent adding the same file multiple times

* fix: implement a more flexible matching strategy, allowing only the tag name to be configured for matching
2025-12-11 17:45:16 +08:00
hhhhsc701
72669d1293 feat: add .env and conf.yaml for deer-flow configuration (#160)
* fix: update MILVUS_URI in .env.example for correct service endpoint

* feat: add .env and conf.yaml for deer-flow configuration
2025-12-11 14:33:06 +08:00
hhhhsc701
a6e82ce68b fix: update MILVUS_URI in .env.example for correct service endpoint (#159) 2025-12-11 14:17:02 +08:00
hhhhsc701
f69ed6b8aa Revert "feature: 增加data-juicer算子" (#158)
Revert "feature: 增加data-juicer算子 (#157)"

This reverts commit 786f13f9c3.
2025-12-11 10:32:53 +08:00
hhhhsc701
786f13f9c3 feature: 增加data-juicer算子 (#157) 2025-12-11 10:32:19 +08:00
Dallas98
2f3ae21f8a feat: enhance dataset file fetching with improved pagination and document loading support (#156) 2025-12-10 22:39:24 +08:00
hefanli
99fd46cb70 fix: fix the Data Evaluation Detail page (#154)
* fix: the Data Evaluation Detail Page should show the model used

* fix: fix the time format displayed

* fix: fix the Data Evaluation Detail page
2025-12-10 18:35:29 +08:00
hhhhsc701
19a04df276 feature: 增加水印去除/高级匿名化算子 (#151)
* feature: 增加水印去除算子

* feature: clean code

* feature: clean code

* feature: 增加高级匿名化算子
2025-12-10 18:12:47 +08:00
Dallas98
cbb146d3d7 feat(chart): add Helm chart for deploying Label Studio with PostgreSQL (#152)
* feat(chart): add Helm chart for deploying Label Studio with PostgreSQL

* feat(milvus): update Milvus configuration to use URI and remove deprecated host/port settings
2025-12-10 17:46:12 +08:00
hefanli
f87060490c feature: data management supports nested folders (#150)
* fix: k8s部署场景下,backend-python服务挂载需要存储

* fix: 增加数据集文件免拷贝的接口定义

* fix: 评估时评估结果赋予初始空值,防止未评估完成时接口报错

* feature: 数据管理支持嵌套文件夹(展示时按照文件系统展示;批量下载时带上相对路径)

* fix: 去除多余的文件重命名逻辑

* refactor: remove unused imports
2025-12-10 16:42:45 +08:00
Dallas98
c18b7af2c4 docs: update README and Makefile for clarity and new development instructions (#147)
* feat(synthesis): add evaluation task creation functionality and UI enhancements

* feat(synthesis): implement synthesis data management features including loading, editing, and deleting

* feat(synthesis): add endpoints for deleting and updating synthesis data and chunks

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog

* docs: update README and Makefile for clarity and new development instructions
2025-12-10 12:25:25 +08:00
Dallas98
6ccbc8a02f feat(dependencies): update greenlet to version 3.3.0 and adjust Python version compatibility (#145) 2025-12-10 10:17:48 +08:00
Dallas98
015e738a7f feat(SynthDataDetail): add chunk/synthesis data management with edit/delete & UI enhancements (#139)
* feat(synthesis): add evaluation task creation functionality and UI enhancements

* feat(synthesis): implement synthesis data management features including loading, editing, and deleting

* feat(synthesis): add endpoints for deleting and updating synthesis data and chunks

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog
2025-12-09 09:59:40 +08:00
hhhhsc701
d59c167da4 算子将抽取与落盘固定到流程中 (#134)
* feature: 将抽取动作移到每一个算子中

* feature: 落盘算子改为默认执行

* feature: 优化前端展示

* feature: 使用pyproject管理依赖
2025-12-05 17:26:29 +08:00
hefanli
744d15ba24 fix: 修复评估时模型输出json格式不对导致读取错误的问题 (#133)
* feature: add cot data evaluation function

* fix: added verification to evaluation results

* fix: fix the prompt for evaluating

* fix: 修复当评估结果为空导致读取失败的问题
2025-12-04 18:49:50 +08:00
Dallas98
31c4966608 feat(synthesis): add functionality to archive synthesis tasks to existing datasets (#132) 2025-12-04 17:11:43 +08:00
Dallas98
7012a9ad98 feat: enhance backend deployment, frontend file selection and synthesis task management (#129)
* feat: Implement data synthesis task management with database models and API endpoints

* feat: Update Python version requirements and refine dependency constraints in configuration

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog

* feat: Refactor synthesis task routes and enhance file task management in the API

* feat: Enhance SynthesisTaskTab with tooltip actions and add chunk data retrieval in API
2025-12-04 09:57:13 +08:00
hefanli
1d19cd3a62 feature: add data-evaluation
* feature: add evaluation task management function

* feature: add evaluation task detail page

* fix: delete duplicate definition for table t_model_config

* refactor: rename package synthesis to ratio

* refactor: add eval file table and  refactor related code

* fix: calling large models in parallel during evaluation
2025-12-04 09:23:54 +08:00
hhhhsc701
265e284fb8 feature: 修改算子开发指南 (#127) 2025-12-03 17:45:08 +08:00
hhhhsc701
c22683d635 优化部分问题 (#126)
* feature: 支持相对路径引用

* feature: 优化本地部署命令

* feature: 优化算子编排展示

* feature: 优化清洗任务失败后重试
2025-12-03 16:41:48 +08:00
Dallas98
bcd1bc1534 feat: Update Python version requirements and refine dependency constraints in configuration (#123)
* feat: Implement data synthesis task management with database models and API endpoints

* feat: Update Python version requirements and refine dependency constraints in configuration
2025-12-02 16:34:22 +08:00