hhhhsc701
f78475e29f
Develop hsc ( #58 )
...
feature: optimize image build/deployment
2025-11-06 17:14:54 +08:00
Vincent
1686f56641
fix: ratio tasks can navigate to the target dataset ( #59 )
...
* fix: ratio tasks need to be able to navigate to the target dataset
* feature: add ratio task detail API
* fix: remove the nonexistent ratio detail page
2025-11-06 12:16:20 +08:00
hhhhsc701
05b26a2981
feature: update operator names; add validation for task and template creation ( #57 )
...
* feature: update operator names; add validation for task and template creation
* feature: add caching to image builds
2025-11-05 17:38:03 +08:00
Jason Wang
b5fe787c20
feat: Labeling Frontend adaptations + Backend build and deploy + Logging improvement ( #55 )
...
* feat: Front-end data annotation page adaptation to the backend API.
* feat: Implement labeling configuration editor and enhance annotation task creation form
* feat: add python backend build and deployment; add backend configuration for Label Studio integration and improve logging setup
* refactor: remove duplicate log configuration
2025-11-05 01:55:53 +08:00
hhhhsc701
f3958f08d9
feature: integrate with deer-flow ( #54 )
...
feature: integrate with deer-flow
2025-11-04 20:30:40 +08:00
hefanli
08bd4eca5c
feature: add data ratio feature ( #52 )
...
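* refactor: rework the data collection implementation, remove unused code, and improve code structure
* feature: scan all datasets daily at 00:00 and check whether each has exceeded its preset retention period; datasets past retention are deleted via the delete API
* fix: change the dataset file deletion logic: for files uploaded to a dataset, delete both the database record and the file on the file system; for collected files, delete only the database record
* fix: add parameter validation and API definitions; remove unused APIs
* fix: dataset statistics default to 0
* feature: add dataset status transitions: a dataset is in draft status when created and changes to active status after files are uploaded or collected
* refactor: rework the paginated query code for collection tasks
* fix: re-execute after update; add transaction control to collection task execution
* feature: allow creating a dataset together with a collection task, and updating the specified dataset when a collection task is updated
* fix: creating a collection task should not raise an error when no dataset needs to be created
* fix: fix dataset statistics not changing when files are deleted
* feature: return the file tag distribution when querying dataset details
* fix: skip analysis when tags is empty
* fix: change status to ACTIVE
* fix: change the tag parsing method
* feature: implement creating, paginated querying, and deleting of ratio tasks
* feature: implement frontend interaction for creating, paginated querying, and deleting ratio tasks
* fix: fix page error caused by abnormal progress calculation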
2025-11-03 10:17:39 +08:00
Jason Wang
ba0c69086a
feat: data annotation page adaptation to backend API. Improve labeling project creation module.
...
* feat: data annotation page adaptation to the backend API.
* feat: Implement labeling configuration editor and enhance annotation task creation form
2025-10-31 15:56:29 +08:00
Startalker
a600c1d793
feature: modify UnstructuredFormatter and ExternalPDFFormatter description ( #44 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
* feature: add mineru
* feature: add external pdf extract operator by using mineru
* feature: mineru docker install bugfix
* feature: add unstructured xlsx/xls/csv/pptx/ppt
* feature: modify UnstructuredFormatter and ExternalPDFFormatter description
---------
Co-authored-by: Startalker <438747480@qq.com>
2025-10-31 10:32:14 +08:00
Startalker
06b05a65a9
feature: add unstructured xlsx/xls/csv/pptx/ppt ( #41 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
* feature: add mineru
* feature: add external pdf extract operator by using mineru
* feature: mineru docker install bugfix
* feature: add unstructured xlsx/xls/csv/pptx/ppt
---------
Co-authored-by: Startalker <438747480@qq.com>
2025-10-30 20:21:12 +08:00
Jason Wang
e0884ab048
Develop py update schema ( #37 )
...
* feature: implement endpoints with multi-level response models
* refactor: move `/health` and `/config` endpoints to system module, remove example from base schemas
* refactor: remove unused get_standard_response_model()
2025-10-30 16:24:37 +08:00
Startalker
155603b1ca
feature: add external pdf extract operator by using mineru ( #36 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
* feature: add mineru
* feature: add external pdf extract operator by using mineru
* feature: mineru docker install bugfix
---------
Co-authored-by: Startalker <438747480@qq.com>
2025-10-30 15:55:10 +08:00
Jason Wang
2f7341dc1f
refactor: Reorganize datamate-python ( #34 )
...
refactor: Reorganize datamate-python (previously label-studio-adapter) into a DDD style structure.
2025-10-30 01:32:59 +08:00
hhhhsc
41e7e684c3
Merge branch 'main' into develop_deer
2025-10-28 11:03:01 +08:00
hhhhsc
a69b9f4921
feature: integrate with deer-flow
2025-10-28 10:54:29 +08:00
Jinglong Wang
7f819563db
Develop labeling module ( #25 )
...
* refactor: remove db table management from LS adapter (mv to scripts later); change adapter to use the same MySQL DB as other modules.
* refactor: Rename LS Adapter module to datamate-python
2025-10-27 16:16:14 +08:00
Jinglong Wang
ad9f41ffd7
feat: Dataset pagination; camelCase support in schemas ( #22 )
...
implement pagination for dataset mappings.
update response models to support camelCase parameters.
2025-10-24 17:14:42 +08:00
hhhhsc
2d2419205a
refactor: rename and reorganize data models and repositories for clarity
2025-10-24 15:33:46 +08:00
Startalker
f86d4fae25
feature: add unstructured formatter operator for doc/docx ( #17 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
---------
Co-authored-by: Startalker <438747480@qq.com>
2025-10-23 16:49:03 +08:00
hhhhsc701
31ef8bc265
[Feature] Refactor project to use 'datamate' naming convention for services and configurations ( #14 )
...
* Enhance CleaningTaskService to track cleaning process progress and update ExecutorType to DATAMATE
* Refactor project to use 'datamate' naming convention for services and configurations
2025-10-22 17:53:16 +08:00
Jason Wang
c640105333
Add Label Studio adapter module and its build scripts.
2025-10-22 15:14:01 +08:00
Dallas98
1c97afed7d
init datamate
2025-10-21 23:00:48 +08:00