Commit Graph

18 Commits

Author SHA1 Message Date
Jason Wang
78f50ea520 feat: File and Annotation 2-way sync implementation (#63)
* feat: Refactor configuration and sync logic for improved dataset handling and logging

* feat: Enhance annotation synchronization and dataset file management

- Added new fields `tags_updated_at` to `DatasetFiles` model for tracking the last update time of tags.
- Implemented new asynchronous methods in the Label Studio client for fetching, creating, updating, and deleting task annotations.
- Introduced bidirectional synchronization for annotations between DataMate and Label Studio, allowing for flexible data management.
- Updated sync service to handle annotation conflicts based on timestamps, ensuring data integrity during synchronization.
- Enhanced dataset file response model to include tags and their update timestamps.
- Modified database initialization script to create a new column for `tags_updated_at` in the dataset files table.
- Updated requirements to ensure compatibility with the latest dependencies.
2025-11-07 15:03:07 +08:00
hhhhsc701
05b26a2981 feature: 更新算子名称;增加创建任务、模板校验 (#57)
* feature: 更新算子名称;增加创建任务、模板校验

* feature: 镜像构建增加缓存
2025-11-05 17:38:03 +08:00
hefanli
08bd4eca5c feature:增加数据配比功能 (#52)
* refactor: 修改调整数据归集实现,删除无用代码,优化代码结构

* feature: 每天凌晨00:00扫描所有数据集,检查数据集是否超过了预设的保留天数,超出保留天数的数据集调用删除接口进行删除

* fix: 修改删除数据集文件的逻辑,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录

* fix: 增加参数校验和接口定义,删除不使用的接口

* fix: 数据集统计数据默认为0

* feature: 数据集状态增加流转,创建时为草稿状态,上传文件或者归集文件后修改为活动状态

* refactor: 修改分页查询归集任务的代码

* fix: 更新后重新执行;归集任务执行增加事务控制

* feature: 创建归集任务时能够同步创建数据集,更新归集任务时能更新到指定数据集

* fix: 创建归集任务不需要创建数据集时不应该报错

* fix: 修复删除文件时数据集的统计数据不变动

* feature: 查询数据集详情时能够获取到文件标签分布

* fix: tags为空时不进行分析

* fix: 状态修改为ACTIVE

* fix: 修改解析tag的方法

* feature: 实现创建、分页查询、删除配比任务

* feature: 实现创建、分页查询、删除配比任务的前端交互

* fix: 修复进度计算异常导致的页面报错
2025-11-03 10:17:39 +08:00
hhhhsc701
d0bac68d3f bugfix: sql指定数据库 (#45) 2025-10-31 10:41:19 +08:00
Startalker
a600c1d793 feature: modify UnstructuredFormatter and ExternalPDFFormatter description (#44)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

* feature: add mineru

* feature: add external pdf extract operator by using mineru

* feature: mineru docker install bugfix

* feature: add unstructured xlsx/xls/csv/pptx/ppt

* feature: modify UnstructuredFormatter and ExternalPDFFormatter description

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-31 10:32:14 +08:00
hhhhsc701
b9b97c1ac2 Develop op (#35)
* refactor: enhance CleaningTaskService and related components with validation and repository updates
* feature: 支持算子上传创建
2025-10-30 17:17:00 +08:00
Dallas98
8d2b41ed94 feature: Implement the basic knowledge generation function (#40) 2025-10-30 16:50:54 +08:00
Startalker
155603b1ca feature: add external pdf extract operator by using mineru (#36)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

* feature: add mineru

* feature: add external pdf extract operator by using mineru

* feature: mineru docker install bugfix

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-30 15:55:10 +08:00
Jason Wang
2f7341dc1f refactor: Reorganize datamate-python (#34)
refactor: Reorganize datamate-python (previously label-studio-adapter) into a DDD style structure.
2025-10-30 01:32:59 +08:00
Dallas98
3f484e988d feat: increase api_key length and enhance ModelConfig annotations (#32)
* refactor: rename artifactId and application name to 'datamate'; add model configuration and related services

* refactor: simplify package scanning by using wildcard for mapper packages

* feat: add model health check functionality and improve model configuration

* feat: increase api_key length and enhance ModelConfig annotations
2025-10-28 17:30:26 +08:00
Dallas98
1a6e25758e feat: add model health check functionality and improve model configuration (#30)
* refactor: rename artifactId and application name to 'datamate'; add model configuration and related services

* refactor: simplify package scanning by using wildcard for mapper packages

* feat: add model health check functionality and improve model configuration
2025-10-28 16:06:53 +08:00
Dallas98
f54afddbeb refactor: rename artifactId and application name to 'datamate'; add model configuration and related services (#26) 2025-10-28 10:39:26 +08:00
hhhhsc701
f9dbefd737 Merge pull request #21 from ModelEngine-Group/develop_db
refactor: rename and reorganize data models and repositories for clarity
2025-10-24 15:46:32 +08:00
hhhhsc
2d2419205a refactor: rename and reorganize data models and repositories for clarity 2025-10-24 15:33:46 +08:00
hefanli
cc072bbf90 refactor: 修改调整数据归集实现,删除无用代码,优化代码结构 (#20) 2025-10-23 21:10:57 +08:00
Startalker
f86d4fae25 feature: add unstructured formatter operator for doc/docx (#17)
* feature: add UnstructuredFormatter

* feature: add UnstructuredFormatter in db

* feature: add unstructured[docx]==0.18.15

* feature: support doc

---------

Co-authored-by: Startalker <438747480@qq.com>
2025-10-23 16:49:03 +08:00
hhhhsc701
31ef8bc265 [Feature] Refactor project to use 'datamate' naming convention for services and configurations (#14)
* Enhance CleaningTaskService to track cleaning process progress and update ExecutorType to DATAMATE

* Refactor project to use 'datamate' naming convention for services and configurations
2025-10-22 17:53:16 +08:00
Dallas98
1c97afed7d init datamate 2025-10-21 23:00:48 +08:00