Commit Graph

512 Commits

Author SHA1 Message Date
hefanli
63f4e3e447 refactor: modify data collection to python implementation (#214)
* feature: LabelStudio jumps without login

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* refactor: modify data collection to python implementation

* fix: remove terrabase dependency

* feature: add the collection task executions page and the collection template page

* fix: fix the collection task creation

* fix: fix the collection task creation
2025-12-30 18:48:43 +08:00
hhhhsc701
80d4dfd285 feat: fix cleaning task creation (#207) 2025-12-30 14:41:39 +08:00
Dallas98
c7ee10b007 feat: add k8s deployment and uninstallation for label-studio (#206) 2025-12-30 10:26:13 +08:00
Kecheng Sha
e22f16166c fix: reset pagination when switching operator market category filters (#205) 2025-12-29 15:16:33 +08:00
Kecheng Sha
081abf7d2f Revert "feat: fix the problem in the Operator Market frontend pages" (#204)
Reverts ModelEngine-Group/DataMate#203
2025-12-29 12:01:19 +08:00
Kecheng Sha
0df7a872e4 Revert "feat: fix the problem in the Operator Market frontend pages" 2025-12-29 12:00:37 +08:00
Kecheng Sha
8f30f71a68 feat: fix the problem in the Operator Market frontend pages (#203) 2025-12-29 11:50:01 +08:00
root
844add27ea feat: fix the problem in the Operator Market frontend pages 2025-12-29 11:38:47 +08:00
hefanli
29e4a333a9 feature: LabelStudio jumps without login (#201) 2025-12-25 16:49:06 +08:00
hhhhsc701
87e73d3bf7 feat: label-studio supports specifying sc (#200)
* feat: label-studio build script

* feat: label-studio build script

* feat: label-studio build script

* feat: label-studio installation script

* feat: label-studio supports specifying sc
2025-12-25 16:13:38 +08:00
hhhhsc701
7e842c7cd5 feat: label-studio build script (#198)
* feat: label-studio build script

* feat: label-studio build script

* feat: label-studio build script

* feat: label-studio installation script
2025-12-25 11:44:05 +08:00
hhhhsc701
1c507ac98a feat: support npu autoscaling (#197)
* feat: npu dynamic scheduling

* feat: dataset pagination optimization

* feat: support npu autoscaling

* feat: support npu autoscaling

* feat: support npu autoscaling

* feat: clean code
2025-12-24 18:03:30 +08:00
Dallas98
de7f853c83 feat: add CodeQL analysis workflow configuration (#196) 2025-12-24 12:14:16 +08:00
hefanli
215d7f0612 Fix the ratio task bug (#194)
* fix: add feign client configurations

* fix: add nacos configurations

* fix: add python to gateway

* fix: Fix the ratio task bug
2025-12-24 11:40:26 +08:00
hhhhsc701
6d61348388 feat: forward deer-flow through gateway (#193) 2025-12-23 11:35:45 +08:00
hhhhsc701
d82bff441a fix: prevent deletion of predefined operators and improve error handling (#192)
* fix: prevent deletion of predefined operators and improve error handling

* fix: prevent deletion of predefined operators and improve error handling
2025-12-22 19:30:41 +08:00
hefanli
c1516c87b6 Feat gateway (#191)
* fix: fix the routes definition

* fix: fix the helm installing file

* fix: modify the logging dependencies
2025-12-22 18:58:55 +08:00
hefanli
d419eec3ec fix: fix the routes definition (#189) 2025-12-22 16:04:49 +08:00
hefanli
e5b28c26b1 add gateway (#187)
* feature: add gateway
2025-12-22 15:41:17 +08:00
hhhhsc701
46f4a8c219 feat: add download functionality for example operator and update Dock… (#188)
* feat: add download functionality for example operator and update Dockerfile

* feat: enhance download response by exposing content disposition header

* feat: update download function to accept filename parameter for example operator
2025-12-22 15:39:32 +08:00
Dallas98
8fc4455b57 Configuration file changes (#186)
* feat(generation_service): add image URL extraction and random QA generation logic

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing

* fix(generation_service): increase batch size from 20 to 100 for improved chunk processing
2025-12-22 09:29:00 +08:00
Dallas98
85eb5a99ba feat(generation_service): add image URL extraction and random QA generation logic (#182) 2025-12-19 17:36:00 +08:00
hhhhsc701
ab4523b556 add export type settings and enhance metadata structure (#181)
* fix(session): enhance database connection settings with pool pre-ping and recycle options

* feat(metadata): add export type settings and enhance metadata structure

* fix(base_op): improve sample handling by introducing target_type key and consolidating text/data retrieval logic

* feat(metadata): add export type settings and enhance metadata structure

* feat(metadata): add export type settings and enhance metadata structure
2025-12-19 11:54:08 +08:00
Dallas98
d70a3eda0d feat(generation_service): add document filtering to remove short documents based on chunk size (#180)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic

* feat(generation_service): add document filtering to remove short documents based on chunk size
2025-12-19 09:34:02 +08:00
hhhhsc701
be875086db feat: add operator-packages-volume to docker-compose and update Docke… (#179)
* feat: add operator-packages-volume to docker-compose and update Dockerfile for site-packages path

* feat: add retry
2025-12-18 20:32:42 +08:00
Dallas98
27b1cc8e09 feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
2025-12-18 19:19:54 +08:00
Dallas98
e0e9b1d94d feat: optimize the question generation process and COT data generation (#169)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion

* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic

* feat(DataSynthesis): refactor data synthesis models and update task handling logic

* fix(generation_service): ensure processed chunks are incremented regardless of question generation success

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options

* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
2025-12-18 16:51:18 +08:00
hhhhsc701
761f7f6a51 fix: optimize PDF parsing by implementing concurrent processing with … (#177)
* fix: optimize PDF parsing by implementing concurrent processing with ThreadPoolExecutor

* Refactor to async processing for file extraction

Refactor the file processing to use asyncio for improved performance and concurrency.
2025-12-18 15:28:30 +08:00
Dallas98
8113840ac7 fix(docker-compose): update entrypoint and command for mineru-openai-server configuration (#176) 2025-12-17 21:23:03 +08:00
hhhhsc701
12ade8bc7b fix: streamline Dockerfile by removing redundant mirror configuration… (#175)
fix: streamline Dockerfile by removing redundant mirror configurations and simplifying package installation
2025-12-17 16:34:41 +08:00
hhhhsc701
924d977d6f Support mineru npu processing (#174)
* feature: unstructured supports simple pdf processing

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: update deploy.yaml and process.py for mineru server configuration and PDF processing enhancements

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: improve PDF processing logic and update dependencies in process.py and pyproject.toml

* feature: update Dockerfile for improved package source mirrors and add mineru-npu to build targets
2025-12-17 16:31:06 +08:00
hhhhsc701
3b4f8488e8 fix: update Dockerfile to improve pip installation process and remove unnecessary uninstalls (#173)
* feature: unstructured supports simple pdf processing

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits

* fix: update Dockerfile to improve pip installation process and remove unnecessary uninstalls
2025-12-17 11:49:47 +08:00
hhhhsc701
62b91b6deb bugfix: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits (#172)
* feature: unstructured supports simple pdf processing

* feature: update values.yaml to enhance ray-cluster configuration with security context, environment variables, and resource limits
2025-12-17 10:41:13 +08:00
hefanli
082aca1597 fix: the interface for querying data set files is compatible with ret… (#171)
fix: the interface for querying data set files is compatible with returns in file system format and list returns.
2025-12-16 11:31:52 +08:00
hefanli
b3558d3202 Modify the preset data of system parameters (#170)
* feature: add the pipeline for pushing images to Huawei Cloud

* fix: updates the dataset-pvc name
2025-12-15 16:43:22 +08:00
hefanli
4712f00196 feature: add the pipeline for pushing images to Huawei Cloud (#167) 2025-12-12 16:44:43 +08:00
hhhhsc701
c51058a867 feature: commercial edition build (#166) 2025-12-12 16:19:46 +08:00
hhhhsc701
d8c0b0ed73 Extend modal scope (#165) 2025-12-12 13:34:03 +08:00
hhhhsc701
fc9fb07e77 bugfix (#164) 2025-12-11 23:17:01 +08:00
Dallas98
ec87e4f204 feat(frontend): enhance the UX of the Synthesis Data Detail page (#163)
* fix(chart): update Helm chart helpers and values for improved configuration

* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths

* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthFileTask): enhance file display with progress tracking and delete action

* feat(SynthDataDetail): add delete action for chunks with confirmation prompt

* feat(SynthDataDetail): update edit and delete buttons to icon-only format

* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion
2025-12-11 21:02:44 +08:00
hefanli
8f529952f6 Fix ratio (#162)
* fix: fixed the issue where an error would be reported when only setting the proportioning quantity when creating a proportioning task

* fix: prevent adding the same file multiple times

* fix: implement a more flexible matching strategy, allowing only the tag name to be configured for matching
2025-12-11 17:45:16 +08:00
hhhhsc701
bb8641bea2 docs: update README files to include instructions for accessing the f… (#161)
docs: update README files to include instructions for accessing the front-end interface
2025-12-11 16:15:23 +08:00
o0Shark0o
12529276ee fix(settings): improve ModelAccess table responsiveness during browser zoom 2025-12-11 14:53:01 +08:00
hhhhsc701
72669d1293 feat: add .env and conf.yaml for deer-flow configuration (#160)
* fix: update MILVUS_URI in .env.example for correct service endpoint

* feat: add .env and conf.yaml for deer-flow configuration
2025-12-11 14:33:06 +08:00
hhhhsc701
a6e82ce68b fix: update MILVUS_URI in .env.example for correct service endpoint (#159) 2025-12-11 14:17:02 +08:00
hhhhsc701
f69ed6b8aa Revert "feature: add data-juicer operators" (#158)
Revert "feature: add data-juicer operators (#157)"

This reverts commit 786f13f9c3.
2025-12-11 10:32:53 +08:00
hhhhsc701
786f13f9c3 feature: add data-juicer operators (#157) 2025-12-11 10:32:19 +08:00
o0Shark0o
cfa6301e9e feat(annotation-templates): add new NLP templates for multilabel classification, keyword extraction, and text summarization 2025-12-11 09:37:32 +08:00
Dallas98
2f3ae21f8a feat: enhance dataset file fetching with improved pagination and document loading support (#156) 2025-12-10 22:39:24 +08:00
Dallas98
e9fd6a3ae1 fix: adjust pagination logic in dataset fetching to start from the current page 2025-12-10 19:52:06 +08:00