50 Commits

Author SHA1 Message Date
Dallas98
cbb146d3d7 feat(chart): add Helm chart for deploying Label Studio with PostgreSQL (#152)
* feat(chart): add Helm chart for deploying Label Studio with PostgreSQL

* feat(milvus): update Milvus configuration to use URI and remove deprecated host/port settings
2025-12-10 17:46:12 +08:00
hhhhsc701
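103c21945d Fix several features (#138)
* feature: unify versions

* feature: fix incorrect default-value display for scheduled sync and add a hint

* feature: fix data collection search

* feature: optimize annotation template query

* feature: hide the webhook feature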
2025-12-10 14:31:05 +08:00
Dallas98
c18b7af2c4 docs: update README and Makefile for clarity and new development instructions (#147)
* feat(synthesis): add evaluation task creation functionality and UI enhancements

* feat(synthesis): implement synthesis data management features including loading, editing, and deleting

* feat(synthesis): add endpoints for deleting and updating synthesis data and chunks

* fix: Correctly extract file values from selectedFilesMap in AddDataDialog

* docs: update README and Makefile for clarity and new development instructions
2025-12-10 12:25:25 +08:00
uname
a728bc3100 Update data annotation: switch annotation templates to the mainstream language. 2025-12-09 20:00:43 +08:00
hhhhsc701
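d59c167da4 Make extraction and write-to-disk fixed steps of the operator pipeline (#134)
* feature: move the extraction step into each operator

* feature: run the write-to-disk operator by default

* feature: improve frontend display

* feature: manage dependencies with pyproject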
2025-12-05 17:26:29 +08:00
hefanli
744d15ba24 fix: fix read errors caused by malformed JSON in model output during evaluation (#133)
* feature: add cot data evaluation function

* fix: added verification to evaluation results

* fix: fix the prompt for evaluating

* fix: fix read failure when the evaluation result is empty
2025-12-04 18:49:50 +08:00
hhhhsc701
7a9530c1e3 feature: catch exceptions when Redis is not deployed (#131)
* feature: add download-deer-flow

* feature: catch exceptions when Redis is not deployed

* feature: clean code
2025-12-04 16:09:29 +08:00
hefanli
1d19cd3a62 feature: add data-evaluation
* feature: add evaluation task management function

* feature: add evaluation task detail page

* fix: delete duplicate definition for table t_model_config

* refactor: rename package synthesis to ratio

* refactor: add eval file table and refactor related code

* fix: calling large models in parallel during evaluation
2025-12-04 09:23:54 +08:00
hhhhsc701
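c22683d635 Improve several issues (#126)
* feature: support relative path references

* feature: improve local deployment commands

* feature: improve operator orchestration display

* feature: improve retry after cleaning task failure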
2025-12-03 16:41:48 +08:00
Dallas98
8b164cb012 feat: Implement data synthesis task management with database models and API endpoints (#122) 2025-12-02 15:23:58 +08:00
hhhhsc701
91390cace0 feature: northbound API: support creating cleaning tasks from templates (#111)
feature: northbound API: support creating cleaning tasks from templates
2025-11-26 17:30:54 +08:00
hefanli
c1352ab91f feature: multiple ratio configurations can be set for the data set. (#103)
feature: multiple ratio configurations can be set for the data set.
2025-11-24 15:28:17 +08:00
hhhhsc701
a53f6776b8 feature: build dual-architecture images (#101)
feature: build dual-architecture images
2025-11-24 11:34:53 +08:00
Dallas98
9858388084 feat: Refactor dataset file pagination and enhance retrieval functionality with new request structure #98
* feat: Enhance knowledge base management with collection renaming, imp…

* feat: Update Milvus integration with new API, enhance collection mana…

* Merge branch 'refs/heads/main' into dev

* feat: Refactor dataset file pagination and enhance retrieval function…

* Merge branch 'main' into dev
2025-11-21 17:28:25 +08:00
Dallas98
5638bdcf1c feat: add file copying functionality to dataset directory and update base path configuration 2025-11-14 18:05:40 +08:00
hhhhsc701
d9e163c163 Develop deer flow (#85)
* fix: deer-flow supports obtaining search engines from datamate
2025-11-14 17:36:55 +08:00
hhhhsc701
5cef9cb273 feature: deer-flow supports obtaining externally connected models from datamate (#83)
* feature: deer-flow supports obtaining externally connected models from datamate
2025-11-13 20:13:16 +08:00
Jason Wang
45743f39f5 feat: add labeling template. refactor: switch to Poetry, build and deploy of backend Python (#79)
* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external

* feat: Add tag configuration management and related components

- Introduced new components for tag selection and browsing in the frontend.
- Added API endpoint to fetch tag configuration from the backend.
- Implemented tag configuration management in the backend, including loading from YAML.
- Enhanced template service to support dynamic tag rendering based on configuration.
- Updated validation utilities to incorporate tag configuration checks.
- Refactored existing code to utilize the new tag configuration structure.

* feat: Refactor LabelStudioTagConfig for improved configuration loading and validation

* feat: Update Makefile to include backend-python-docker-build in the build process

* feat: Migrate to poetry for better deps management

* Add pyyaml dependency and update Dockerfile to use Poetry for dependency management

- Added pyyaml (>=6.0.3,<7.0.0) to pyproject.toml dependencies.
- Updated Dockerfile to install Poetry and manage dependencies using it.
- Improved layer caching by copying only dependency files before the application code.
- Removed unnecessary installation of build dependencies to keep the final image size small.

* feat: Remove duplicated backend-python-docker-build target from Makefile

* fix: airflow is not ready for adding yet

* feat: update Python version to 3.12 and remove project installation step in Dockerfile
2025-11-13 15:32:30 +08:00
hhhhsc701
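6bbde0ec56 feature: cleaning task detail page (#73)
* feature: cleaning task details

* fix: stop building images; pull them directly instead

* fix: add cleaning task detail page

* fix: add cleaning task detail page

* fix: make the operator list clickable

* fix: template details and update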
2025-11-12 18:00:19 +08:00
Vincent
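b8d7aca8b7 refactor: refactor part of the data collection code (#75)
* fix: ratio tasks should be able to navigate to the target dataset

* feature: add ratio task detail API

* fix: remove the nonexistent ratio detail page

* fix: use the production logic to display tags

* fix: remove the extra "-" from parameter default values

* fix: fix ratio task operations

* fix: remove unneeded log output and imports

* feature: also expose OBS and MySQL collection when creating a data collection task

* refactor: refactor the data collection code

* refactor: refactor the data collection code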
2025-11-12 09:34:50 +08:00
Dallas98
aa01f52535 Merge pull request #74
* feat: Implement system parameter management with Redis integration
2025-11-11 22:13:14 +08:00
Jason Wang
c5ccc56cca feat: Add labeling template (#72)
* feat: Enhance annotation module with template management and validation

- Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support.
- Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates.
- Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation.
- Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats.
- Updated database schema for annotation templates and labeling projects to include new fields and constraints.
- Seeded initial annotation templates for various use cases including image classification, object detection, and text classification.

* feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support

* feat: Update docker-compose.yml to mark datamate dataset volume and network as external
2025-11-11 09:14:14 +08:00
hhhhsc701
9dd26d622f feature: build database image (#70)
* feature: build database image

* feature: add archive package pipeline
2025-11-10 19:06:53 +08:00
hhhhsc701
2138ba23c7 feature: add operator detail page; improve operator upload and update logic (#64)
* feature: add operator detail page; improve operator upload and update logic
2025-11-07 16:54:00 +08:00
Jason Wang
78f50ea520 feat: File and Annotation 2-way sync implementation (#63)
* feat: Refactor configuration and sync logic for improved dataset handling and logging

* feat: Enhance annotation synchronization and dataset file management

- Added a new field `tags_updated_at` to the `DatasetFiles` model for tracking the last update time of tags.
- Implemented new asynchronous methods in the Label Studio client for fetching, creating, updating, and deleting task annotations.
- Introduced bidirectional synchronization for annotations between DataMate and Label Studio, allowing for flexible data management.
- Updated sync service to handle annotation conflicts based on timestamps, ensuring data integrity during synchronization.
- Enhanced dataset file response model to include tags and their update timestamps.
- Modified database initialization script to create a new column for `tags_updated_at` in the dataset files table.
- Updated requirements to ensure compatibility with the latest dependencies.
2025-11-07 15:03:07 +08:00
hhhhsc701
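05b26a2981 feature: update operator names; add validation for task and template creation (#57)
* feature: update operator names; add validation for task and template creation

* feature: add caching to image builds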
2025-11-05 17:38:03 +08:00
Jason Wang
b5fe787c20 feat: Labeling Frontend adaptations + Backend build and deploy + Logging improvement (#55)
* feat: Front-end data annotation page adaptation to the backend API.

* feat: Implement labeling configuration editor and enhance annotation task creation form

* feat: add python backend build and deployment; add backend configuration for Label Studio integration and improve logging setup

* refactor: remove duplicate log configuration
2025-11-05 01:55:53 +08:00
hhhhsc701
f3958f08d9 feature: integrate with deer-flow (#54)
feature: integrate with deer-flow
2025-11-04 20:30:40 +08:00
hefanli
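08bd4eca5c feature: add data ratio feature (#52)
* refactor: rework the data collection implementation, remove dead code, and improve code structure

* feature: scan all datasets daily at 00:00, check whether each has exceeded its configured retention period, and delete those that have via the delete API

* fix: change the dataset file deletion logic: for files uploaded to a dataset, delete both the database record and the file on the file system; for collected files, delete only the database record

* fix: add parameter validation and API definitions; remove unused APIs

* fix: default dataset statistics to 0

* feature: add dataset status transitions: draft on creation, changed to active after files are uploaded or collected

* refactor: rework the paginated query code for collection tasks

* fix: re-execute after update; add transaction control to collection task execution

* feature: allow creating a dataset together with a collection task, and updating a collection task to target a specified dataset

* fix: creating a collection task without creating a dataset should not raise an error

* fix: fix dataset statistics not changing when files are deleted

* feature: return the file tag distribution when querying dataset details

* fix: skip analysis when tags are empty

* fix: change status to ACTIVE

* fix: change the tag parsing method

* feature: implement creating, paginated querying, and deleting ratio tasks

* feature: implement frontend interactions for creating, paginated querying, and deleting ratio tasks

* fix: fix page errors caused by incorrect progress calculation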
2025-11-03 10:17:39 +08:00
hhhhsc701
d0bac68d3f bugfix: specify the database in SQL (#45) 2025-10-31 10:41:19 +08:00
Startalker
a600c1d793 feature: modify UnstructuredFormatter and ExternalPDFFormatter description (#44)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

* feature: add mineru

* feature: add external pdf extract operator by using mineru

* feature: mineru docker install bugfix

* feature: add unstructured xlsx/xls/csv/pptx/ppt

* feature: modify UnstructuredFormatter and ExternalPDFFormatter description

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-31 10:32:14 +08:00
Startalker
06b05a65a9 feature: add unstructured xlsx/xls/csv/pptx/ppt (#41)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

* feature: add mineru

* feature: add external pdf extract operator by using mineru

* feature: mineru docker install bugfix

* feature: add unstructured xlsx/xls/csv/pptx/ppt

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-30 20:21:12 +08:00
hhhhsc701
b9b97c1ac2 Develop op (#35)
* refactor: enhance CleaningTaskService and related components with validation and repository updates
* feature: support creating operators via upload
2025-10-30 17:17:00 +08:00
Dallas98
8d2b41ed94 feature: Implement the basic knowledge generation function (#40) 2025-10-30 16:50:54 +08:00
Startalker
155603b1ca feature: add external pdf extract operator by using mineru (#36)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

* feature: add mineru

* feature: add external pdf extract operator by using mineru

* feature: mineru docker install bugfix

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-30 15:55:10 +08:00
Jason Wang
2f7341dc1f refactor: Reorganize datamate-python (#34)
refactor: Reorganize datamate-python (previously label-studio-adapter) into a DDD style structure.
2025-10-30 01:32:59 +08:00
Dallas98
3f484e988d feat: increase api_key length and enhance ModelConfig annotations (#32)
* refactor: rename artifactId and application name to 'datamate'; add model configuration and related services

* refactor: simplify package scanning by using wildcard for mapper packages

* feat: add model health check functionality and improve model configuration

* feat: increase api_key length and enhance ModelConfig annotations
2025-10-28 17:30:26 +08:00
hhhhsc701
67eb571d8d feature: integrate with deer-flow (#27)
feature: integrate with deer-flow
2025-10-28 16:28:26 +08:00
hhhhsc
4f5a9a9a83 refactor: simplify Dockerfile by removing redundant mirror configurations and cleaning up package installation commands 2025-10-28 16:24:40 +08:00
Dallas98
1a6e25758e feat: add model health check functionality and improve model configuration (#30)
* refactor: rename artifactId and application name to 'datamate'; add model configuration and related services

* refactor: simplify package scanning by using wildcard for mapper packages

* feat: add model health check functionality and improve model configuration
2025-10-28 16:06:53 +08:00
hhhhsc
41e7e684c3 Merge branch 'main' into develop_deer 2025-10-28 11:03:01 +08:00
hhhhsc
a69b9f4921 feature: integrate with deer-flow 2025-10-28 10:54:29 +08:00
Dallas98
f54afddbeb refactor: rename artifactId and application name to 'datamate'; add model configuration and related services (#26) 2025-10-28 10:39:26 +08:00
hhhhsc701
f9dbefd737 Merge pull request #21 from ModelEngine-Group/develop_db
refactor: rename and reorganize data models and repositories for clarity
2025-10-24 15:46:32 +08:00
hhhhsc
2d2419205a refactor: rename and reorganize data models and repositories for clarity 2025-10-24 15:33:46 +08:00
hefanli
cc072bbf90 refactor: rework the data collection implementation, remove dead code, and improve code structure (#20) 2025-10-23 21:10:57 +08:00
Startalker
f86d4fae25 feature: add unstructured formatter operator for doc/docx (#17)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-23 16:49:03 +08:00
hhhhsc701
31ef8bc265 [Feature] Refactor project to use 'datamate' naming convention for services and configurations (#14)
* Enhance CleaningTaskService to track cleaning process progress and update ExecutorType to DATAMATE

* Refactor project to use 'datamate' naming convention for services and configurations
2025-10-22 17:53:16 +08:00
Jason Wang
c640105333 Add Label Studio adapter module and its build scripts. 2025-10-22 15:14:01 +08:00
Dallas98
1c97afed7d init datamate 2025-10-21 23:00:48 +08:00