hhhhsc701
786f13f9c3
feature: 增加data-juicer算子 ( #157 )
2025-12-11 10:32:19 +08:00
hhhhsc701
19a04df276
feature: 增加水印去除/高级匿名化算子 ( #151 )
...
* feature: 增加水印去除算子
* feature: clean code
* feature: clean code
* feature: 增加高级匿名化算子
2025-12-10 18:12:47 +08:00
hhhhsc701
d59c167da4
算子将抽取与落盘固定到流程中 ( #134 )
...
* feature: 将抽取动作移到每一个算子中
* feature: 落盘算子改为默认执行
* feature: 优化前端展示
* feature: 使用pyproject管理依赖
2025-12-05 17:26:29 +08:00
hhhhsc701
91390cace0
feature: 北向接口:支持通过模板创建清洗任务 ( #111 )
...
feature: 北向接口:支持通过模板创建清洗任务
2025-11-26 17:30:54 +08:00
hhhhsc701
9dd26d622f
feature: 数据库镜像制作 ( #70 )
...
* feature: 数据库镜像制作
* feature: 增加归档包流水线
2025-11-10 19:06:53 +08:00
hhhhsc701
2138ba23c7
feature: 增加算子详情页;优化算子上传更新逻辑 ( #64 )
...
* feature: 增加算子详情页;优化算子上传更新逻辑
2025-11-07 16:54:00 +08:00
hhhhsc701
05b26a2981
feature: 更新算子名称;增加创建任务、模板校验 ( #57 )
...
* feature: 更新算子名称;增加创建任务、模板校验
* feature: 镜像构建增加缓存
2025-11-05 17:38:03 +08:00
Startalker
a600c1d793
feature: modify UnstructuredFormatter and ExternalPDFFormatter description ( #44 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
* feature: add mineru
* feature: add external pdf extract operator by using mineru
* feature: mineru docker install bugfix
* feature: add unstructured xlsx/xls/csv/pptx/ppt
* feature: modify UnstructuredFormatter and ExternalPDFFormatter description
---------
Co-authored-by: Startalker <438747480@qq.com >
2025-10-31 10:32:14 +08:00
hhhhsc701
b9b97c1ac2
Develop op ( #35 )
...
* refactor: enhance CleaningTaskService and related components with validation and repository updates
* feature: 支持算子上传创建
2025-10-30 17:17:00 +08:00
Startalker
155603b1ca
feature: add external pdf extract operator by using mineru ( #36 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
* feature: add mineru
* feature: add external pdf extract operator by using mineru
* feature: mineru docker install bugfix
---------
Co-authored-by: Startalker <438747480@qq.com >
2025-10-30 15:55:10 +08:00
Startalker
f86d4fae25
feature: add unstructured formatter operator for doc/docx ( #17 )
...
* feature: add UnstructuredFormatter
* feature: add UnstructuredFormatter in db
* feature: add unstructured[docx]==0.18.15
* feature: support doc
---------
Co-authored-by: Startalker <438747480@qq.com >
2025-10-23 16:49:03 +08:00
hhhhsc701
31ef8bc265
[Feature] Refactor project to use 'datamate' naming convention for services and configurations ( #14 )
...
* Enhance CleaningTaskService to track cleaning process progress and update ExecutorType to DATAMATE
* Refactor project to use 'datamate' naming convention for services and configurations
2025-10-22 17:53:16 +08:00
Dallas98
1c97afed7d
init datamate
2025-10-21 23:00:48 +08:00