Compare commits

...

78 Commits

Author SHA1 Message Date
078f303f57 Revert "fix: resolve the conflict between hasArchive and splitByLine"
This reverts commit 50f2da5503.
2026-02-04 15:48:01 +08:00
50f2da5503 fix: resolve the conflict between hasArchive and splitByLine
Problem: hasArchive defaults to true, and splitByLine could be enabled at the same time,
         so archives would be incorrectly split by line, a logical contradiction.

Fix (sketched below):
1. Disable the splitByLine switch while hasArchive=true
2. Add a useEffect that automatically turns splitByLine off when hasArchive becomes true

Modified file: frontend/src/pages/DataManagement/Detail/components/ImportConfiguration.tsx
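A minimal sketch of that guard, assuming hypothetical state names in a React component (the actual ImportConfiguration.tsx wiring may differ):

```typescript
// Hypothetical sketch of the fix described above; state names are illustrative.
import { useEffect, useState } from "react";

function useArchiveSplitGuard() {
  const [hasArchive, setHasArchive] = useState(true); // defaults to true per the commit
  const [splitByLine, setSplitByLine] = useState(false);

  // When hasArchive becomes true, force splitByLine off so an archive
  // is never split by line.
  useEffect(() => {
    if (hasArchive) setSplitByLine(false);
  }, [hasArchive]);

  // The splitByLine switch is also disabled while hasArchive is on, e.g.:
  // <Switch disabled={hasArchive} checked={splitByLine} onChange={setSplitByLine} />
  return { hasArchive, setHasArchive, splitByLine, setSplitByLine };
}
```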
2026-02-04 15:43:53 +08:00
3af1daf8b6 fix: fix the "pre-upload request does not exist" error in streaming split upload
Problem: handleStreamUpload called preUpload only once for all files, setting
         totalFileNum: files.length (the original file count), while the number of files
         actually uploaded is the total line count after splitting, so the backend
         deleted the pre-upload request prematurely.

Fix: move the preUpload call inside the file loop so each original file gets its own
     preUpload call with totalFileNum: 1 and its own reqId (sketched below). This keeps
     line splitting from getting the request deleted early.

Modified file: frontend/src/hooks/useSliceUpload.tsx
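A sketch of the corrected flow; preUpload and uploadLines are placeholders for the project's actual helpers:

```typescript
// Hypothetical outline of the fix: one preUpload per original file,
// so each file owns its own reqId and totalFileNum is always 1.
async function handleStreamUpload(files: File[]): Promise<void> {
  for (const file of files) {
    // Before the fix this call sat outside the loop with
    // totalFileNum: files.length, so the backend counted the split
    // line-files against the original file count and deleted the request.
    const { reqId } = await preUpload({ totalFileNum: 1 });
    await uploadLines(file, reqId); // splits the file and uploads each line
  }
}

// Assumed signatures, standing in for the real API helpers.
declare function preUpload(opts: { totalFileNum: number }): Promise<{ reqId: string }>;
declare function uploadLines(file: File, reqId: string): Promise<void>;
```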
2026-02-04 15:39:05 +08:00
7c7729434b fix: fix three issues in streaming split upload
1. Implement real concurrency control to avoid firing a flood of simultaneous requests
   - Use a task-queue pattern so no more than maxConcurrency tasks run at once (see the sketch after this list)
   - Start the next task only after one finishes, instead of launching everything at once

2. Fix the API error (pre-upload request does not exist)
   - All chunks use the same fileNo=1 (they belong to the same pre-upload request)
   - chunkNo now carries the line number, i.e. which line of data it is
   - Root cause: each line used to be treated as a separate file, but only the first file had a valid pre-upload request

3. Preserve the original file extension
   - Correctly extract and keep the file extension
   - Example: 132.txt → 132_000001.txt (instead of 132_000001)
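A minimal, generic sketch of the task-queue pattern from point 1 (not the project's actual code):

```typescript
// Generic concurrency limiter: at most maxConcurrency tasks run at once;
// each worker pulls the next task only after its current one finishes.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // safe: JS is single-threaded between awaits
      results[index] = await tasks[index]();
    }
  }

  // Start only maxConcurrency workers; together they drain the queue.
  const workers = Array.from(
    { length: Math.min(maxConcurrency, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```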
2026-02-04 15:06:02 +08:00
17a62cd3c2 fix: fix upload cancellation so HTTP requests are actually aborted
- Add a signal.aborted check in the XMLHttpRequest wrapper (sketched below)
- Fix the cancelFn closure problem in useSliceUpload
- Ensure both streaming and chunked uploads cancel correctly
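A sketch of wiring an AbortSignal into XMLHttpRequest as described; the wrapper name and shape are assumptions:

```typescript
// Hypothetical wrapper showing the signal.aborted check and abort wiring.
function sendWithAbort(
  url: string,
  body: XMLHttpRequestBodyInit,
  signal: AbortSignal
): Promise<void> {
  return new Promise((resolve, reject) => {
    // Bail out early if cancellation already happened before the request started.
    if (signal.aborted) {
      reject(new DOMException("Aborted", "AbortError"));
      return;
    }
    const xhr = new XMLHttpRequest();
    xhr.open("POST", url);
    // Propagate later cancellations to the in-flight request.
    signal.addEventListener("abort", () => xhr.abort());
    xhr.onload = () => resolve();
    xhr.onerror = () => reject(new Error("network error"));
    xhr.onabort = () => reject(new DOMException("Aborted", "AbortError"));
    xhr.send(body);
  });
}
```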
2026-02-04 14:51:23 +08:00
f381d641ab fix(upload): fix filename handling in streaming upload
- Pass the correct total file count to the pre-upload API instead of the fixed value -1
- Remove the extension-preservation logic from file splitting in the import configuration
- Delete the fileExtension parameter definition from the streaming upload options
- Remove the extension-handling code from the streaming upload implementation
- Simplify new-filename generation so the extension suffix is no longer appended
2026-02-04 07:47:41 +08:00
c8611d29ff feat(upload): implement streaming split upload for a better large-file experience
Split and upload as a stream, so large files no longer freeze the frontend by being loaded in one piece.

Changes:
1. file.util.ts - core streaming split-upload functionality
   - Add streamSplitAndUpload, which splits and uploads concurrently
   - Add shouldStreamUpload to decide whether to use streaming upload
   - Add the StreamUploadOptions and StreamUploadResult interfaces
   - Tune the chunk size (default 5MB)

2. ImportConfiguration.tsx - smart upload strategy
   - Large files (>5MB) use streaming split upload
   - Small files (≤5MB) use the traditional split approach
   - UI unchanged

3. useSliceUpload.tsx - streaming upload handling
   - Add handleStreamUpload to process streaming upload events
   - Support concurrent uploads and better progress management

4. TaskUpload.tsx - progress display improvements
   - Register streaming-upload event listeners
   - Show streaming upload info (lines uploaded, current file, etc.)

5. dataset.model.ts - type definition extensions
   - Add the StreamUploadInfo interface
   - Add streamUploadInfo and prefix fields to the TaskItem interface

Implementation highlights (a sketch of the streaming split follows the file list):
- Streaming reads: use Blob.slice to read chunk by chunk, avoiding loading the whole file
- Line detection: split on newlines and upload each line as soon as it is complete
- Memory friendly: the buffer holds only the current chunk and the unfinished line, never all split results
- Concurrency control: up to 3 concurrent uploads for better throughput
- Visible progress: show the uploaded line count and overall progress in real time
- Error handling: one file failing to upload does not affect the others
- Backward compatible: small files still use the original split approach

Benefits:
- Large-file uploads no longer freeze; the user experience improves substantially
- Memory usage drops significantly (from loading the entire file to holding only the current chunk)
- Upload efficiency improves (split while uploading, with multiple small files in flight)

Related files:
- frontend/src/utils/file.util.ts
- frontend/src/pages/DataManagement/Detail/components/ImportConfiguration.tsx
- frontend/src/hooks/useSliceUpload.tsx
- frontend/src/pages/Layout/TaskUpload.tsx
- frontend/src/pages/DataManagement/dataset.model.ts
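A simplified sketch of the streaming split (sequential here, without the 3-way concurrency; uploadLine stands in for the project's chunk-upload call):

```typescript
const CHUNK_SIZE = 5 * 1024 * 1024; // default 5MB chunks, per the commit

// Reads the file chunk by chunk via Blob.slice, emitting complete lines as
// they form; only the current chunk remainder is ever held in memory.
// (Real code must also guard multi-byte characters at chunk boundaries.)
async function streamSplitAndUpload(
  file: File,
  uploadLine: (line: string, lineNo: number) => Promise<void>
): Promise<number> {
  let buffer = ""; // holds only the unfinished tail of the current chunk
  let lineNo = 0;
  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    const chunkText = await file.slice(offset, offset + CHUNK_SIZE).text();
    buffer += chunkText;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.length > 0) await uploadLine(line, ++lineNo);
    }
  }
  if (buffer.length > 0) await uploadLine(buffer, ++lineNo); // trailing line
  return lineNo;
}
```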
2026-02-03 13:12:10 +00:00
147beb1ec7 feat(annotation): implement text segmentation pre-generation
Pre-generate text segment structures when an annotation task is created, avoiding recomputation every time the annotation page is opened.

Changes:
1. Add a precompute_segmentation_for_project method to AnnotationEditorService
   - Precompute segment structures for every text file in the project
   - Use AnnotationTextSplitter to perform the segmentation
   - Persist the segment structures to the AnnotationResult table (status IN_PROGRESS)
   - Support retry on failure
   - Return statistics

2. Modify the create_mapping interface
   - After the annotation task is created, automatically trigger segmentation pre-generation if segmentation is enabled and the dataset is text
   - Wrap in try-except so a segmentation failure does not block project creation

Characteristics:
- Uses the existing AnnotationTextSplitter class
- Segment data structure matches the existing segmented-annotation format
- Backward compatible (tasks without pre-generated segments still compute in real time)
- Performance: avoids repeated computation when entering the annotation page

Related files:
- runtime/datamate-python/app/module/annotation/service/editor.py
- runtime/datamate-python/app/module/annotation/interface/project.py
2026-02-03 12:59:29 +00:00
699031dae7 fix: fix the compilation error when clearing a dataset's parent association during edit
Analysis:
A previous attempt used the @TableField(updateStrategy = FieldStrategy.IGNORED/ALWAYS)
annotation to force null values to be written, but FieldStrategy.ALWAYS may not exist
in the current MyBatis-Plus 3.5.14 release, causing a compilation error.

Fix:
1. Remove the @TableField(updateStrategy) annotation from the parentDatasetId field in Dataset.java
2. Remove the unneeded import com.baomidou.mybatisplus.annotation.FieldStrategy
3. In DatasetApplicationService.updateDataset:
   - Add import com.baomidou.mybatisplus.core.conditions.update.LambdaUpdateWrapper
   - Save the original parentDatasetId value for comparison
   - After handleParentChange, check whether parentDatasetId changed
   - If it changed, update the parentDatasetId field explicitly with a LambdaUpdateWrapper
   - This writes the field to the database even when the value is null

Rationale:
MyBatis-Plus's updateById only updates non-null fields by default.
Using LambdaUpdateWrapper's set method assigns the field value explicitly,
including null, so the field is reliably written to the database.
2026-02-03 11:09:15 +00:00
88b1383653 fix: restore sending an empty string from the frontend to support clearing the parent dataset
Notes:
Removed the earlier logic that converted empty strings to undefined;
the form value, including empty strings, is now sent as-is.

Works together with the backend change (commit cc6415c):
1. When the user selects "no parent dataset", an empty string "" is sent
2. The backend handleParentChange converts the empty string to null via normalizeParentId
3. Dataset.parentDatasetId gained the @TableField(updateStrategy = FieldStrategy.IGNORED) annotation
4. This ensures the value is written to the database even when it is null
2026-02-03 10:57:14 +00:00
cc6415c4d9 fix: fix inability to clear a dataset's parent association during edit
Description:
In dataset editing under data management, if a parent dataset was previously set, selecting "no parent dataset" and then saving did not take effect.

Root cause:
MyBatis-Plus's updateById uses the FieldStrategy.NOT_NULL strategy by default,
so a field is only written to the database when its value is non-null.
When parentDatasetId goes from a value to null, it is not written by default.

Fix:
Add the @TableField(updateStrategy = FieldStrategy.IGNORED) annotation to the parentDatasetId field in Dataset.java,
meaning the field is written to the database even when the value is null.

Together with the frontend change (restore sending an empty string), the parent association can now be cleared correctly:
1. The frontend sends an empty string meaning "no parent dataset"
2. The backend handleParentChange converts the empty string to null via normalizeParentId
3. dataset.setParentDatasetId(null) sets the value to null
4. With the IGNORED strategy, the null value is still written to the database
2026-02-03 10:57:08 +00:00
3d036c4cd6 fix: fix inability to clear a dataset's parent association during edit
Description:
In dataset editing under data management, if a parent dataset was previously set, selecting "no parent dataset" and then saving did not take effect.

Cause:
The condition in the backend updateDataset method:
```java
if (updateDatasetRequest.getParentDatasetId() != null) {
    handleParentChange(dataset, updateDatasetRequest.getParentDatasetId());
}
```
When parentDatasetId is null or an empty string, the condition is false, handleParentChange is never executed, and the association cannot be cleared.

Fix:
Drop the condition and always call handleParentChange. Internally it uses normalizeParentId to convert both empty strings and null to null, which supports both setting a new parent dataset and clearing the association.

Together with the frontend change (commit 2445235), which converts the empty string to undefined (deserialized as null by the backend), the clear operation now executes correctly.
2026-02-03 09:35:09 +00:00
2445235fd2 fix: fix clearing the parent dataset not taking effect during edit
Description:
In dataset editing under data management, if a parent dataset was previously set, selecting "no parent dataset" and then saving did not take effect.

Cause:
- In BasicInformation.tsx, the "no parent dataset" option has the value ""
- When the user chooses no parent dataset, parentDatasetId is ""
- The backend API ignored the empty string as an invalid value instead of treating it as "clear the association"

Fix:
- In the handleSubmit function of EditDataset.tsx, convert an empty parentDatasetId string to undefined
- Use formValues.parentDatasetId || undefined so the empty string becomes undefined
- The backend API then correctly recognizes the intent to clear the parent dataset
2026-02-03 09:23:13 +00:00
893e0a1580 fix: show the task center immediately when uploading files
Description:
When uploading files from a dataset's detail page, the dialog closed after confirming, but the task center only popped up after file processing finished (especially with split-by-line enabled), which was a poor experience.

Changes:
1. useSliceUpload.tsx: show the task center immediately in createTask, right after the task is created
2. ImportConfiguration.tsx: in handleImportData, fire the show:task-popover event to open the task center before the time-consuming file processing (such as file splitting) starts (a sketch of the event dispatch follows)

Effect:
- Before: confirm → dialog closes → (wait for file processing) → task center opens
- After: confirm → dialog closes + task center opens immediately → file processing starts
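A sketch of the event-based approach; the event name show:task-popover comes from the commit, the handler names are hypothetical:

```typescript
// Fire the event before any heavy file processing so the task center
// opens immediately.
function showTaskCenterNow(): void {
  window.dispatchEvent(new CustomEvent("show:task-popover"));
}

// Listener side, e.g. registered in the task center component:
window.addEventListener("show:task-popover", () => {
  // setPopoverOpen(true); // hypothetical state setter in TaskUpload.tsx
});
```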
2026-02-03 09:14:40 +00:00
05e6842fc8 refactor(DataManagement): remove unnecessary dataset-type filtering logic
- Remove the dataset-type filtering operation
- Remove the no-longer-used textDatasetTypeOptions variable
- Simplify data passing in the BasicInformation component
- Reduce code redundancy and improve component performance
2026-02-03 13:33:12 +08:00
da5b18e423 feat(scripts): add APT cache pre-installation to fix offline builds
- Add an APT cache directory and the export-cache.sh build script
- Add build-base-images.sh for building base images with APT packages pre-installed
- Add build-offline-final.sh, the final offline build script
- Update Makefile.offline.mk with new offline build targets
- Extend README.md with a detailed explanation of the APT cache solution
- Add offline Dockerfiles that use the pre-installed base images for multiple services
- Include the APT cache in the final archive produced by the packaging script
2026-02-03 13:16:17 +08:00
31629ab50b docs(offline): document the classic build method and add a troubleshooting guide
- Add the classic docker build method as the recommended approach
- Add the make offline-diagnose command for diagnosing offline environments
- Expand the troubleshooting chapter with solutions to several common problems
- Add a file inventory and a recommended workflow description
- Provide multiple workarounds for the BuildKit builder being unable to use local images
- Update the build command usage notes and important tips
2026-02-03 13:10:28 +08:00
fb43052ddf feat(build): add a classic Docker build method and diagnostics
- Add the --pull=false flag and improved error handling to build-offline.sh
- Add the --pull=false flag to each service build task in Makefile.offline.mk
- Add build-offline-classic.sh, a classic build path that does not use BuildKit
- Add build-offline-v2.sh, an enhanced BuildKit offline build
- Add diagnose.sh for diagnosing the offline build environment
- Add offline-build-classic and offline-diagnose targets to the Makefile
2026-02-02 23:53:45 +08:00
c44c75be25 fix(login): fix login page styling issues
- Correct the CSS class name of the description text under the title, removing a stray space
- Update the style class name of the footer copyright
- Simplify the bottom description text to keep the branding consistent
2026-02-02 22:49:46 +08:00
05f3efc148 build(docker): switch Docker image sources to the Nanjing University mirror
- Switch the base image in the frontend Dockerfile from gcr.io to gcr.nju.edu.cn
- Update the nodejs20-debian12 image source in the offline Dockerfile
- Change the base image list in export-cache.sh to the Nanjing University mirror
- Update the image pull addresses in Makefile.offline.mk to the local mirror
- Tidy the formatting and output of export-cache.sh
- Add warning handling during cache export
2026-02-02 22:48:41 +08:00
16eb5cacf9 feat(data-management): add extended metadata support for knowledge items
- Implement metadata field updates in KnowledgeItemApplicationService
- Add the metadata field to CreateKnowledgeItemRequest
- Add the metadata field to UpdateKnowledgeItemRequest
- Support storing extended metadata when knowledge items are created and updated
2026-02-02 22:20:05 +08:00
e71116d117 refactor(components): update tag component types and data handling
- Change the Tag interface so the id and color fields are optional
- Change the onAddTag callback parameter type from an object to a string
- Add useCallback in AddTagPopover to optimize data fetching
- Adjust tag deduplication to match on either id or name
- Update the data types and generic constraints of the DetailHeader component
- Add a parseMetadata utility for parsing metadata
- Implement isAnnotationItem to detect annotation-type items
- Improve tag handling and type conversion on the knowledge set detail page
2026-02-02 22:15:16 +08:00
cac53d7aac fix(knowledge): update the knowledge management page title to "Knowledge Sets"
- Change the page title from "Knowledge Management" to "Knowledge Sets"
2026-02-02 21:49:39 +08:00
43b4a619bc refactor(knowledge): remove the extended metadata field from knowledge set creation
- Delete the extended metadata input area from the form
- Remove the corresponding Form.Item wrapper
- Simplify the create-knowledge-set form structure
2026-02-02 21:48:21 +08:00
9da187d2c6 feat(build): add offline build support
- Add build-offline.sh for building without network access
- Add offline Dockerfiles that use local resources instead of network downloads
- Create export-cache.sh to pre-download dependencies in a networked environment
- Integrate Makefile.offline.mk for convenient offline build commands
- Add detailed offline build documentation and a troubleshooting guide
- Package base images, the BuildKit cache, and external resources in one step
2026-02-02 21:44:44 +08:00
b36fdd2438 feat(annotation): add data-type filtering to the label config tree editor
- Introduce the DataType enum definition
- Filter object label options dynamically by data type
- Watch the data type in the template form
- Improve error handling for better type safety
- Pass the data-type parameter into the config tree editor component
2026-02-02 20:37:38 +08:00
daa63bdd13 feat(knowledge): remove the sensitivity-level feature from knowledge set management
- Comment out the sensitivity-level field in the create knowledge set form
- Remove the sensitivity-level display item from the knowledge set detail page
- Comment out the related sensitivity-level option constants
- Update the form layout to keep a consistent two-column grid
2026-02-02 19:06:03 +08:00
85433ac071 feat(template): remove template type and version fields and add admin-only controls
- Remove the type and version fields from the template detail page
- Remove the type and version columns from the template list page
- Add an admin permission check controlled by a localStorage key
- Restrict the edit and delete action buttons to admins
- Restrict the create template button to admins
2026-02-02 18:59:32 +08:00
fc2e50b415 Revert "refactor(template): remove the type, version, and action columns from the template list"
This reverts commit a5261b33b2.
2026-02-02 18:39:52 +08:00
26e1ae69d7 Revert "refactor(template): remove the create button from the template list page"
This reverts commit b2bdf9e066.
2026-02-02 18:39:48 +08:00
7092c3f955 feat(annotation): adjust the text editor size-limit configuration
- Change the editor_max_text_bytes default from 2MB to 0, meaning no limit
- Update the size check in the text fetching service to apply only when max_bytes is greater than 0
- Update the byte limit shown in the error message
- Streamline the conditional handling of the config parameter
2026-02-02 17:53:09 +08:00
b2bdf9e066 refactor(template): remove the create button from the template list page
- Delete the create template button component in the top-right corner
- Remove the corresponding click handler invocation
- Adjust the page layout to account for the removed button
2026-02-02 16:35:09 +08:00
a5261b33b2 refactor(template): remove the type, version, and action columns from the template list
- Remove the type column (built-in/custom tag display)
- Remove the version column
- Remove the action column (view, edit, delete buttons)
- Keep the creation time column and its render logic
2026-02-02 16:20:50 +08:00
root
52daf30869 a 2026-02-02 16:09:25 +08:00
07a901043a refactor(annotation): remove the text-content fetching functionality
- Delete the fetch_text_content_via_download_api import
- Remove the text-content fetching logic for TEXT datasets
- Delete the _append_annotation_to_content method implementation
- Simplify content handling in the knowledge sync service
2026-02-02 15:39:06 +08:00
32e3fc97c6 feat(annotation): harden the knowledge set sync service with project isolation
- Validate the project ID when looking up knowledge sets to ensure correct ownership
- Include the project ID in log messages for easier debugging
- Refactor the lookup from name-only to a name plus project-ID combination
- Add the _metadata_matches_project method to verify project ownership in metadata
- Add the _parse_metadata method to safely parse the metadata JSON string
- Update the fallback naming logic to guarantee per-project uniqueness
- Consistently validate with the project name and project ID across all knowledge set operations
2026-02-02 15:28:33 +08:00
a73571bd73 feat(annotation): improve attribute filling in the template config tree editor
- Fill default values for object config attributes only when the name does not exist
- Add label-category detection for control configs
- Distinguish attribute-filling strategies between annotation and layout controls
- Annotation controls always get the required attributes; layout controls only when needed
- Fix the attribute-value assignment so the name attribute is referenced correctly
2026-02-02 15:26:25 +08:00
00fa1b86eb refactor(DataAnnotation): remove unused state variables and tidy selector logic
- Delete the unused addChildTag and addSiblingTag state variables
- Set the Select component value to null to reset the selection
- Simplify the handleAddNode invocation
- Remove state management code that is no longer needed, improving performance
2026-02-02 15:23:01 +08:00
626c0fcd9a fix(data-annotation): fix progress calculation for data annotation tasks
- Add a toSafeCount utility to handle numeric values safely (a sketch follows)
- Support both the totalCount and total_count fields for compatibility
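A plausible shape for such a helper; the actual implementation may differ:

```typescript
// Coerce possibly-missing or malformed values into a safe non-negative count.
function toSafeCount(value: unknown): number {
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 ? n : 0;
}

// Supports both naming conventions mentioned above, e.g.:
// const total = toSafeCount(task.totalCount ?? task.total_count);
```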
2026-02-01 23:42:06 +08:00
2f2e0d6a8d feat(KnowledgeManagement): preserve original knowledge set fields
- Keep the knowledge set's name, description, status, and other core attributes when updating tags
- Preserve metadata such as domain, business line, and owner
- Maintain configuration items such as validity period and sensitivity
- Ensure the source type and custom metadata fields are not overwritten
- Prevent tag updates from accidentally dropping other important field values
2026-02-01 23:30:01 +08:00
10fad39e02 feat(KnowledgeManagement): add tagging to the knowledge set detail page
- Import the updateKnowledgeSetByIdUsingPut, createDatasetTagUsingPost, and queryDatasetTagsUsingGet APIs
- Add the Clock icon for showing the update time
- Replace the item-count and update-time icons with the File and Clock components
- Configure the tag component to support adding, fetching, and creating tags
- Implement the tag creation and addition logic
- Integrate async loading and updating of tags
2026-02-01 23:26:54 +08:00
9014dca1ac fix(knowledge): fix the status check on the knowledge set detail page
- Correct the condition for the office preview status
- Remove a redundant check for the PENDING status
- Refine the trigger condition for status polling
2026-02-01 23:15:50 +08:00
0b8fe34586 refactor(DataManagement): simplify file operations and drop the text-dataset type check
- Remove the unused DatasetType import
- Delete the TEXT_DATASET_TYPE_PREFIX constant definition
- Remove the isTextDataset utility function
- Set the excludeDerivedFiles parameter directly to true, simplifying the query logic
2026-02-01 23:13:09 +08:00
27e27a09d4 fix(knowledge): remove a redundant toast from the knowledge item editor
- Delete the duplicate success message after file upload
- Keep the correct handling of the file object
- Avoid unnecessary feedback for a better user experience
2026-02-01 23:07:32 +08:00
d24fea83d8 feat(KnowledgeItemEditor): add a loading state for file upload and replace
- Add a loading state that controls the file upload and replace operations
- Set loading to true before a file upload
- Set loading to true before a file replacement
- Reset loading in a finally block when the operation completes
- Bind the loading state to the confirm button's confirmLoading property
2026-02-01 23:07:10 +08:00
05088fef1a refactor(data-management): improve the text-dataset type check
- Add the TEXT_DATASET_TYPE_PREFIX constant
- Add an isTextDataset utility for detecting text dataset types
- Replace the previous direct comparison with the isTextDataset function
- Improve readability and the accuracy of the type check
2026-02-01 23:02:05 +08:00
a0239518fb feat(dataset): implement dataset file visibility filtering
- Detect derived files via the derived_from_file_id key in file metadata
- Implement applyVisibleFileCounts to set visible file counts for datasets in batch
- Use the filtered visible files when computing dataset statistics
- Add the normalizeFilePath utility to unify path formats
- Support derived-file filtering in the file query logic
- Add the DatasetFileCount DTO for returning file count statistics
2026-02-01 22:55:07 +08:00
9d185bb10c feat(deploy): add a storage volume for uploaded files
- Add an uploads_volume for storing uploaded files
- Name the volume datamate-uploads-volume
- Mount the upload volume at /uploads in the container
- Update the deployment configuration to support file uploads
2026-02-01 22:34:14 +08:00
6c4f05c0b9 fix(data-management): fix the file preview status check
- Remove preview polling for the PENDING status
- Avoid the performance cost of repeated polling while PENDING
- Streamline preview loading state management
2026-02-01 22:32:03 +08:00
438acebb89 feat(data-management): add Office document preview
- Integrate a LibreOffice converter for DOC/DOCX-to-PDF conversion
- Add DatasetFilePreviewService for managing preview files
- Add DatasetFilePreviewAsyncService for asynchronous conversion tasks
- Clean up preview files when the source file is deleted
- Implement frontend polling for the Office preview status
- Add preview API endpoints for status queries and triggering conversion
- Show conversion progress and errors in the file preview UI
2026-02-01 22:26:05 +08:00
f06d6e5a7e fix(utils): fix XMLHttpRequest configuration in the request utility
- Move XMLHttpRequest instantiation to the start of the method to avoid creating it twice
- Delete the commented-out legacy request-complete handler
- Correct the error handling for request error and abort events
- Remove the duplicate xhr.open call so the HTTP method is set correctly
2026-02-01 22:07:43 +08:00
fda283198d refactor(knowledge): remove the unused Tag component import
- Remove the unused Tag import from KnowledgeSetDetail.tsx
- Keep the code clean by eliminating the dead dependency
2026-02-01 22:05:10 +08:00
d535d0ac1b feat(knowledge): add polling for Office document previews
- Use useRef to manage the polling timer and the item currently being processed
- Add a Spin component for the preview loading state
- Add the queryKnowledgeItemPreviewStatusUsingGet API call
- Define the OFFICE_PREVIEW_POLL_INTERVAL and OFFICE_PREVIEW_POLL_MAX_TIMES constants
- Remove the old Office preview metadata parsing code
- Add officePreviewStatus and officePreviewError state
- Implement pollOfficePreviewStatus to poll the preview status (a sketch follows)
- Add clearOfficePreviewPolling to clean up the polling timer
- Integrate status polling into handlePreviewItemFile
- Clean up polling and reset state when the preview closes
- Remove the Office preview tag from the table
- Improve the PDF preview UI to show loading or error states when no preview URL exists
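A condensed sketch of the polling loop using the constants named above; the constant values, status strings, and callback shapes are assumptions:

```typescript
// Assumed values; the commit defines these constants but not their numbers.
const OFFICE_PREVIEW_POLL_INTERVAL = 2000; // ms
const OFFICE_PREVIEW_POLL_MAX_TIMES = 30;

type PreviewStatus = "READY" | "FAILED" | "CONVERTING";

// Polls until the preview is ready or failed, giving up after the max count.
// Returns a cleanup function matching clearOfficePreviewPolling's role.
function pollOfficePreviewStatus(
  fetchStatus: () => Promise<PreviewStatus>,
  onDone: (status: "READY" | "FAILED") => void
): () => void {
  let attempts = 0;
  const timer = setInterval(async () => {
    if (++attempts > OFFICE_PREVIEW_POLL_MAX_TIMES) {
      clearInterval(timer);
      onDone("FAILED"); // give up after the maximum number of polls
      return;
    }
    const status = await fetchStatus();
    if (status !== "CONVERTING") {
      clearInterval(timer);
      onDone(status);
    }
  }, OFFICE_PREVIEW_POLL_INTERVAL);
  return () => clearInterval(timer);
}
```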
2026-02-01 22:02:57 +08:00
4d2c9e546c refactor(menu): adjust the menu structure and update the data management title
- Change the data management menu item title from "Data Management" to "Dataset Management"
- Reorder the menu so data annotation and content generation follow data management
- Update the dataset statistics page title from "Data Management" to "Dataset Statistics"
- Remove the duplicate data annotation and content generation menu entries
2026-02-01 21:40:21 +08:00
02cd16523f refactor(data-management-service): remove the docx4j dependency
- Delete the docx4j-core dependency
- Delete the docx4j-export-fo dependency
- Update the project dependency configuration
- Simplify the build configuration
2026-02-01 21:18:50 +08:00
d4a44f3bf5 refactor(data-management): improve file conversion in the knowledge item preview service
- Remove the docx4j dependency and its conversion methods
- Unify Office-to-PDF conversion on LibreOffice
- Delete the standalone DOCX-to-PDF conversion method
- Rename the conversion method to convertOfficeToPdfByLibreOffice
- Strengthen path resolution with multiple candidate paths
- Add path safety validation and normalization
- Add the extractRelativePathFromSegment and normalizeRelativePathValue helpers
- Improve the file existence checks and path construction
2026-02-01 21:18:14 +08:00
340a0ad364 refactor(data-management): update knowledge item storage path resolution
- Replace resolveKnowledgeItemStoragePath with resolveKnowledgeItemStoragePathWithFallback
- The new method adds fallback path resolution for more reliable file lookup
2026-02-01 21:14:39 +08:00
00c41fbbd3 refactor(knowledge-item): improve preview file path handling for knowledge items
- Move path validation from the start of the method to where the path is actually used
- Fix how the preview filename is obtained by parsing it directly from the relative path
- Run the file existence check only when needed
- Improve readability and execution efficiency
2026-02-01 21:00:07 +08:00
2430db290d fix(knowledge): fix misaligned statistics on the knowledge management page
- Correct the second statistic from "total files" to "knowledge categories"
- Correct the third statistic from "total tags" to "total files"
- Reposition the total tag count in the statistics display
- Ensure each statistic matches its heading
2026-02-01 20:46:54 +08:00
40889baacc feat(knowledge): add knowledge item preview
- Integrate docx4j and LibreOffice for Office-to-PDF preview conversion
- Add KnowledgeItemPreviewService to handle preview conversion
- Add the asynchronous KnowledgeItemPreviewAsyncService for document conversion
- Manage the preview states: pending, converting, ready, and failed
- Show Office preview status tags in the frontend UI
- Support online preview of DOC/DOCX files
- Add preview metadata storage and management
2026-02-01 20:05:25 +08:00
551248ec76 feat(data-annotation): add a serial number column and remove the task ID column
- Add a serial number column computed from the current page (a sketch follows)
- Remove the existing task ID column
- Center the serial number column at a width of 80px
- Compute serial numbers dynamically from the page number and page size
- Keep the table
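The row-number arithmetic is simple; a sketch with an assumed antd-style column config and illustrative pagination state:

```typescript
// Serial number = rows on previous pages + position on the current page (1-based).
const serialColumn = {
  title: "No.",
  width: 80,
  align: "center" as const,
  render: (_: unknown, __: unknown, index: number) =>
    (currentPage - 1) * pageSize + index + 1,
};

// Assumed pagination state; the names are illustrative.
declare const currentPage: number;
declare const pageSize: number;
```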
2026-02-01 19:11:39 +08:00
0bb9abb200 feat(annotation): add annotation type display
- Add an annotation type column in the frontend, rendered with the Tag component
- Add the AnnotationTypeMap constant for mapping annotation types
- Update the interface definitions to pass the labelingType field
- Update the backend project create and update logic to store the annotation type
- Add a config key constant for the annotation type
- Extend the DTOs with the annotation type attribute
- Implement annotation type inheritance from templates
2026-02-01 19:08:11 +08:00
d135a7f336 feat(knowledge): add knowledge set tag statistics
- Inject TagMapper into KnowledgeItemApplicationService and call the counting method
- Add countKnowledgeSetTags to compute the total number of tags in a knowledge set
- Add the totalTags field to KnowledgeManagementStatisticsResponse
- Show the total tag count on the frontend KnowledgeManagementPage
- Change the statistics card layout from 3 to 4 columns to fit the new item
- Add the totalTags type definition to the knowledge management model
2026-02-01 18:46:31 +08:00
7043a26ab3 feat(auth): add login and route protection
- Add a logout button to the sidebar and implement the logout logic
- Add a ProtectedRoute component for route-level access control
- Create the LoginPage component with the login UI and logic
- Integrate local login validation into authSlice state management
- Add the login page and protected routes to the route table
- Implement automatic redirection to the login page
2026-02-01 14:11:44 +08:00
906bb39b83 feat(annotation): add save-and-jump-to-next-segment
- Add the SAVE_AND_NEXT_LABEL constant for the save-and-next button text
- Add saveDisabled state controlling when the save button is disabled
- Change the top toolbar layout to a three-column grid
- Add a save-and-next-segment/next-item button in the middle of the toolbar
- Remove the primary color from the save button style
- Unify the save button's disabled-state logic
- Distinguish plain saving from save-and-jump in the save handler
2026-02-01 13:09:55 +08:00
dbf8ec53dd style(ui): unify preview modal widths to a responsive size
- Change the preview modal width in CreateAnnotationTaskDialog from fixed pixels to 80vw
- Change the preview drawer width in VisualTemplateBuilder from 600px to 80vw
- Change the modal width in PreviewPromptModal from 800px to 80vw
- Unify the text and media preview widths in the Overview component to 80vw
- Unify the text and media preview widths in KnowledgeSetDetail to 80vw
- Replace the fixed pixel values with responsive units for a better experience
2026-02-01 12:49:56 +08:00
5f89968974 refactor(dataset): refactor the dataset basic information component
- Improve the structure and logic of the BasicInformation component
- Update data handling in the CreateDataset component
- Improve form validation and error handling
- Unify event passing between components
- Improve code readability and maintainability
2026-02-01 11:31:09 +08:00
be313cf425 refactor(db): adjust the knowledge item table index structure
- Remove the index on relative_path in the knowledge item table
- Remove the unique constraint on relative_path in the knowledge item directory table
- Remove the index on relative_path in the knowledge item directory table
- Keep the necessary source_file and set_id
2026-02-01 11:26:10 +08:00
db37de8aee perf(db): tune the knowledge item table index configuration
- Add a length limit (768) to the idx_dm_ki_relative_path index
- Add a relative-path length limit (768) to the uk_dm_kd_set_path unique constraint
- Add a length limit (768) to the idx_dm_kd_relative_path index
- Improve query performance and index efficiency
2026-02-01 11:24:35 +08:00
aeec19b99f feat(annotation): add a save keyboard shortcut
- Implement Ctrl+S shortcut detection (sketched below)
- Add the handleSaveShortcut event handler
- Register the keyboard event listener on the window
- Extend requestExport with an autoAdvance parameter
- Pass autoAdvance through the save button click handler
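A sketch of the shortcut wiring; handleSaveShortcut and autoAdvance follow the commit, the rest is assumed:

```typescript
// Intercept Ctrl+S (and Cmd+S on macOS) and trigger the save logic
// instead of the browser's "save page" dialog.
function handleSaveShortcut(event: KeyboardEvent): void {
  if ((event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "s") {
    event.preventDefault();
    requestExport(false); // autoAdvance=false: plain save, no jump to next
  }
}

window.addEventListener("keydown", handleSaveShortcut);

// Assumed signature, based on the commit's autoAdvance parameter.
declare function requestExport(autoAdvance: boolean): void;
```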
2026-01-31 20:47:33 +08:00
a4aefe66cd perf(file): increase the default file upload timeout
- Raise the default timeout from 120 seconds to 1800 seconds
- Improve handling of large file uploads
2026-01-31 19:15:21 +08:00
2f3a8b38d0 fix(dataset): handle empty directories in dataset file queries
- Check that the directory exists to avoid filesystem access exceptions
- Return an empty page instead of throwing when the directory does not exist
- Improve the experience right after a dataset is created
2026-01-31 19:10:22 +08:00
150af1a741 fix(annotation): fix the project mapping query logic
- Replace the old mapping-service query with a direct ORM model query for raw data
- Update the config field reads to use the new ORM object
- Fix the response data returned when an update changes nothing
- Add a soft-delete filter so only non-deleted project records are returned
- Unify data access for better query efficiency and code consistency
2026-01-31 18:57:08 +08:00
e28f680abb feat(annotation): add annotation project update support
- Introduce the DatasetMappingUpdateRequest model supporting name, description, template_id, and label_config updates
- Add a PUT /{project_id} endpoint to the project interface for updating annotation projects
- Implement the update logic: mapping record lookup, config handling, and database update
- Return results in the standard response format
- Add exception handling and logging for traceability
2026-01-31 18:54:05 +08:00
4f99875670 feat(data-management): gate the split-by-line option on dataset type
- Import the DatasetType definition from dataset.model
- Add the isTextDataset variable to detect whether the current dataset is a text dataset
- Wrap the split-by-line config item in conditional rendering, shown only for text datasets
- Keep the existing disable logic for non-text files
2026-01-31 18:50:56 +08:00
c23a9da8cb feat(knowledge): add knowledge set directory management
- Add a relative_path column to the knowledge item table for storing item paths
- Create a knowledge item directory table for managing the directory structure
- Implement create/delete/list directory endpoints and the corresponding application service logic
- Integrate directory display and operations into the frontend knowledge set detail page
- Add the related API endpoints and DTO definitions for directory creation and deletion
- Update the database init script with the new directory table
2026-01-31 18:36:40 +08:00
310bc356b1 feat(knowledge): add file directory structure support for knowledge sets
- Add a relativePath field to the KnowledgeItem model for storing relative paths
- Handle directory prefixes and build relative paths during file upload
- Add a batch-delete endpoint and implementation for knowledge items
- Refactor the frontend KnowledgeSetDetail component for directory browsing and management
- Implement folder creation, deletion, and navigation
- Update data queries to search and filter by relative path
- Add folder icons and directory-level display in the frontend
2026-01-31 17:45:43 +08:00
c1fb02b0f5 refactor(annotation): update the data types for task edit mode
- Remove the AnnotationTask type import
- Add the AnnotationTaskListItem type import
- Change the editTask property type from AnnotationTask to AnnotationTaskListItem
- Align the component types with the data structures actually in use
2026-01-31 17:19:18 +08:00
115 changed files with 8804 additions and 1574 deletions

Makefile.offline.mk Normal file

@@ -0,0 +1,304 @@
# ============================================================================
# Makefile offline-build extension
# Append this file to the end of the main Makefile, or include it separately
# ============================================================================
# Offline build configuration
CACHE_DIR ?= ./build-cache
OFFLINE_VERSION ?= latest
# Create the buildx builder (if it does not exist)
.PHONY: ensure-buildx
ensure-buildx:
@if ! docker buildx inspect offline-builder > /dev/null 2>&1; then \
echo "创建 buildx 构建器..."; \
docker buildx create --name offline-builder --driver docker-container --use 2>/dev/null || docker buildx use offline-builder; \
else \
docker buildx use offline-builder 2>/dev/null || true; \
fi
# ========== Offline cache export (networked environment) ==========
.PHONY: offline-export
offline-export: ensure-buildx
@echo "======================================"
@echo "导出离线构建缓存..."
@echo "======================================"
@mkdir -p $(CACHE_DIR)/buildkit $(CACHE_DIR)/images $(CACHE_DIR)/resources
@$(MAKE) _offline-export-base-images
@$(MAKE) _offline-export-cache
@$(MAKE) _offline-export-resources
@$(MAKE) _offline-package
.PHONY: _offline-export-base-images
_offline-export-base-images:
@echo ""
@echo "1. 导出基础镜像..."
@bash -c 'images=( \
"maven:3-eclipse-temurin-21" \
"maven:3-eclipse-temurin-8" \
"eclipse-temurin:21-jdk" \
"mysql:8" \
"node:20-alpine" \
"nginx:1.29" \
"ghcr.nju.edu.cn/astral-sh/uv:python3.11-bookworm" \
"ghcr.nju.edu.cn/astral-sh/uv:python3.12-bookworm" \
"ghcr.nju.edu.cn/astral-sh/uv:latest" \
"python:3.12-slim" \
"python:3.11-slim" \
"gcr.nju.edu.cn/distroless/nodejs20-debian12" \
); for img in "$${images[@]}"; do echo " Pulling $$img..."; docker pull "$$img" 2>/dev/null || true; done'
@echo " Saving base images..."
@docker save -o $(CACHE_DIR)/images/base-images.tar \
maven:3-eclipse-temurin-21 \
maven:3-eclipse-temurin-8 \
eclipse-temurin:21-jdk \
mysql:8 \
node:20-alpine \
nginx:1.29 \
ghcr.nju.edu.cn/astral-sh/uv:python3.11-bookworm \
ghcr.nju.edu.cn/astral-sh/uv:python3.12-bookworm \
ghcr.nju.edu.cn/astral-sh/uv:latest \
python:3.12-slim \
python:3.11-slim \
gcr.nju.edu.cn/distroless/nodejs20-debian12 2>/dev/null || echo " Warning: Some images may not exist"
.PHONY: _offline-export-cache
_offline-export-cache:
@echo ""
@echo "2. 导出 BuildKit 缓存..."
@echo " backend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/backend-cache,mode=max -f scripts/images/backend/Dockerfile -t datamate-backend:cache . 2>/dev/null || echo " Warning: backend cache export failed"
@echo " backend-python..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/backend-python-cache,mode=max -f scripts/images/backend-python/Dockerfile -t datamate-backend-python:cache . 2>/dev/null || echo " Warning: backend-python cache export failed"
@echo " database..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/database-cache,mode=max -f scripts/images/database/Dockerfile -t datamate-database:cache . 2>/dev/null || echo " Warning: database cache export failed"
@echo " frontend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/frontend-cache,mode=max -f scripts/images/frontend/Dockerfile -t datamate-frontend:cache . 2>/dev/null || echo " Warning: frontend cache export failed"
@echo " gateway..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/gateway-cache,mode=max -f scripts/images/gateway/Dockerfile -t datamate-gateway:cache . 2>/dev/null || echo " Warning: gateway cache export failed"
@echo " runtime..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/runtime-cache,mode=max -f scripts/images/runtime/Dockerfile -t datamate-runtime:cache . 2>/dev/null || echo " Warning: runtime cache export failed"
@echo " deer-flow-backend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/deer-flow-backend-cache,mode=max -f scripts/images/deer-flow-backend/Dockerfile -t deer-flow-backend:cache . 2>/dev/null || echo " Warning: deer-flow-backend cache export failed"
@echo " deer-flow-frontend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/deer-flow-frontend-cache,mode=max -f scripts/images/deer-flow-frontend/Dockerfile -t deer-flow-frontend:cache . 2>/dev/null || echo " Warning: deer-flow-frontend cache export failed"
@echo " mineru..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/mineru-cache,mode=max -f scripts/images/mineru/Dockerfile -t datamate-mineru:cache . 2>/dev/null || echo " Warning: mineru cache export failed"
.PHONY: _offline-export-resources
_offline-export-resources:
@echo ""
@echo "3. 预下载外部资源..."
@mkdir -p $(CACHE_DIR)/resources/models
@echo " PaddleOCR model..."
@wget -q -O $(CACHE_DIR)/resources/models/ch_ppocr_mobile_v2.0_cls_infer.tar \
https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar 2>/dev/null || echo " Warning: PaddleOCR model download failed"
@echo " spaCy model..."
@wget -q -O $(CACHE_DIR)/resources/models/zh_core_web_sm-3.8.0-py3-none-any.whl \
https://ghproxy.net/https://github.com/explosion/spacy-models/releases/download/zh_core_web_sm-3.8.0/zh_core_web_sm-3.8.0-py3-none-any.whl 2>/dev/null || echo " Warning: spaCy model download failed"
@echo " DataX source..."
@if [ ! -d "$(CACHE_DIR)/resources/DataX" ]; then \
git clone --depth 1 https://gitee.com/alibaba/DataX.git $(CACHE_DIR)/resources/DataX 2>/dev/null || echo " Warning: DataX clone failed"; \
fi
@echo " deer-flow source..."
@if [ ! -d "$(CACHE_DIR)/resources/deer-flow" ]; then \
git clone --depth 1 https://ghproxy.net/https://github.com/ModelEngine-Group/deer-flow.git $(CACHE_DIR)/resources/deer-flow 2>/dev/null || echo " Warning: deer-flow clone failed"; \
fi
.PHONY: _offline-package
_offline-package:
@echo ""
@echo "4. 打包缓存..."
@cd $(CACHE_DIR) && tar -czf "build-cache-$$(date +%Y%m%d).tar.gz" buildkit images resources 2>/dev/null && cd - > /dev/null
@echo ""
@echo "======================================"
@echo "✓ 缓存导出完成!"
@echo "======================================"
@echo "传输文件: $(CACHE_DIR)/build-cache-$$(date +%Y%m%d).tar.gz"
# ========== Offline build (air-gapped environment) ==========
.PHONY: offline-setup
offline-setup:
@echo "======================================"
@echo "设置离线构建环境..."
@echo "======================================"
@if [ ! -d "$(CACHE_DIR)" ]; then \
echo "查找并解压缓存包..."; \
cache_file=$$(ls -t build-cache-*.tar.gz 2>/dev/null | head -1); \
if [ -z "$$cache_file" ]; then \
echo "错误: 未找到缓存压缩包 (build-cache-*.tar.gz)"; \
exit 1; \
fi; \
echo "解压 $$cache_file..."; \
tar -xzf "$$cache_file"; \
else \
echo "缓存目录已存在: $(CACHE_DIR)"; \
fi
@echo ""
@echo "加载基础镜像..."
@if [ -f "$(CACHE_DIR)/images/base-images.tar" ]; then \
docker load -i $(CACHE_DIR)/images/base-images.tar; \
else \
echo "警告: 基础镜像文件不存在,假设已手动加载"; \
fi
@$(MAKE) ensure-buildx
@echo ""
@echo "✓ 离线环境准备完成"
.PHONY: offline-build
offline-build: offline-setup
@echo ""
@echo "======================================"
@echo "开始离线构建..."
@echo "======================================"
@$(MAKE) _offline-build-services
.PHONY: _offline-build-services
_offline-build-services: ensure-buildx
@echo ""
@echo "构建 datamate-database..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/database-cache \
--pull=false \
-f scripts/images/database/Dockerfile \
-t datamate-database:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "构建 datamate-gateway..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/gateway-cache \
--pull=false \
-f scripts/images/gateway/Dockerfile \
-t datamate-gateway:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "构建 datamate-backend..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/backend-cache \
--pull=false \
-f scripts/images/backend/Dockerfile \
-t datamate-backend:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "构建 datamate-frontend..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/frontend-cache \
--pull=false \
-f scripts/images/frontend/Dockerfile \
-t datamate-frontend:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "构建 datamate-runtime..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/runtime-cache \
--pull=false \
--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources \
-f scripts/images/runtime/Dockerfile \
-t datamate-runtime:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "构建 datamate-backend-python..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/backend-python-cache \
--pull=false \
--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources \
-f scripts/images/backend-python/Dockerfile \
-t datamate-backend-python:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "======================================"
@echo "✓ 离线构建完成"
@echo "======================================"
# Offline build for a single service (BuildKit)
.PHONY: %-offline-build
%-offline-build: offline-setup ensure-buildx
@echo "离线构建 $*..."
@if [ ! -d "$(CACHE_DIR)/buildkit/$*-cache" ]; then \
echo "错误: $* 的缓存不存在"; \
exit 1; \
fi
@$(eval IMAGE_NAME := $(if $(filter deer-flow%,$*),$*,datamate-$*))
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/$*-cache \
--pull=false \
$(if $(filter runtime backend-python deer-flow%,$*),--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources,) \
-f scripts/images/$*/Dockerfile \
-t $(IMAGE_NAME):$(OFFLINE_VERSION) \
--load .
# Classic Docker build (no BuildKit; more stable)
.PHONY: offline-build-classic
offline-build-classic: offline-setup
@echo "使用传统 docker build 进行离线构建..."
@bash scripts/offline/build-offline-classic.sh $(CACHE_DIR) $(OFFLINE_VERSION)
# Diagnose the offline environment
.PHONY: offline-diagnose
offline-diagnose:
@bash scripts/offline/diagnose.sh $(CACHE_DIR)
# Build base images with APT packages pre-installed (networked environment)
.PHONY: offline-build-base-images
offline-build-base-images:
@echo "构建 APT 预装基础镜像..."
@bash scripts/offline/build-base-images.sh $(CACHE_DIR)
# Offline build using the pre-installed base images (recommended)
.PHONY: offline-build-final
offline-build-final: offline-setup
@echo "使用预装 APT 包的基础镜像进行离线构建..."
@bash scripts/offline/build-offline-final.sh $(CACHE_DIR) $(OFFLINE_VERSION)
# Full offline export (including APT-preinstalled base images)
.PHONY: offline-export-full
offline-export-full:
@echo "======================================"
@echo "完整离线缓存导出(含 APT 预装基础镜像)"
@echo "======================================"
@$(MAKE) offline-build-base-images
@$(MAKE) offline-export
@echo ""
@echo "导出完成!传输时请包含以下文件:"
@echo " - build-cache/images/base-images-with-apt.tar"
@echo " - build-cache-YYYYMMDD.tar.gz"
# ========== Help ==========
.PHONY: help-offline
help-offline:
@echo "离线构建命令:"
@echo ""
@echo "【有网环境】"
@echo " make offline-export [CACHE_DIR=./build-cache] - 导出构建缓存"
@echo " make offline-export-full - 导出完整缓存(含 APT 预装基础镜像)"
@echo " make offline-build-base-images - 构建 APT 预装基础镜像"
@echo ""
@echo "【无网环境】"
@echo " make offline-setup [CACHE_DIR=./build-cache] - 解压并准备离线缓存"
@echo " make offline-build-final - 使用预装基础镜像构建(推荐,解决 APT 问题)"
@echo " make offline-build-classic - 使用传统 docker build"
@echo " make offline-build - 使用 BuildKit 构建"
@echo " make offline-diagnose - 诊断离线构建环境"
@echo " make <service>-offline-build - 离线构建单个服务"
@echo ""
@echo "【完整工作流程(推荐)】"
@echo " # 1. 有网环境导出完整缓存"
@echo " make offline-export-full"
@echo ""
@echo " # 2. 传输到无网环境(需要传输两个文件)"
@echo " scp build-cache/images/base-images-with-apt.tar user@offline-server:/path/"
@echo " scp build-cache-*.tar.gz user@offline-server:/path/"
@echo ""
@echo " # 3. 无网环境构建"
@echo " tar -xzf build-cache-*.tar.gz"
@echo " docker load -i build-cache/images/base-images-with-apt.tar"
@echo " make offline-build-final"


@@ -1,5 +1,6 @@
package com.datamate.datamanagement.application;
import com.baomidou.mybatisplus.core.conditions.update.LambdaUpdateWrapper;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.datamate.common.domain.utils.ChunksSaver;
@@ -19,8 +20,11 @@ import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorC
import com.datamate.datamanagement.infrastructure.persistence.mapper.TagMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections4.CollectionUtils;
@@ -53,6 +57,7 @@ public class DatasetApplicationService {
private static final int SIMILAR_DATASET_MAX_LIMIT = 50;
private static final int SIMILAR_DATASET_CANDIDATE_FACTOR = 5;
private static final int SIMILAR_DATASET_CANDIDATE_MAX = 100;
private static final String DERIVED_METADATA_KEY = "derived_from_file_id";
private final DatasetRepository datasetRepository;
private final TagMapper tagMapper;
private final DatasetFileRepository datasetFileRepository;
@@ -97,6 +102,7 @@ public class DatasetApplicationService {
public Dataset updateDataset(String datasetId, UpdateDatasetRequest updateDatasetRequest) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, DataManagementErrorCode.DATASET_NOT_FOUND);
if (StringUtils.hasText(updateDatasetRequest.getName())) {
dataset.setName(updateDatasetRequest.getName());
}
@@ -109,13 +115,31 @@ public class DatasetApplicationService {
if (Objects.nonNull(updateDatasetRequest.getStatus())) {
dataset.setStatus(updateDatasetRequest.getStatus());
}
if (updateDatasetRequest.getParentDatasetId() != null) {
if (updateDatasetRequest.isParentDatasetIdProvided()) {
// Save the original parentDatasetId value to compare whether it changed
String originalParentDatasetId = dataset.getParentDatasetId();
// Handle a parent dataset change: only when the request explicitly includes parentDatasetId.
// handleParentChange converts both empty strings and null to null via normalizeParentId,
// which supports both setting a new parent dataset and clearing the association.
handleParentChange(dataset, updateDatasetRequest.getParentDatasetId());
// Check whether parentDatasetId changed
if (!Objects.equals(originalParentDatasetId, dataset.getParentDatasetId())) {
// Use a LambdaUpdateWrapper to update the parentDatasetId field explicitly,
// so the value is written to the database even when it is null
datasetRepository.update(null, new LambdaUpdateWrapper<Dataset>()
.eq(Dataset::getId, datasetId)
.set(Dataset::getParentDatasetId, dataset.getParentDatasetId()));
}
}
if (StringUtils.hasText(updateDatasetRequest.getDataSource())) {
// Data source id is present: scan the files and persist them on an async thread
processDataSourceAsync(dataset.getId(), updateDatasetRequest.getDataSource());
}
// Update the remaining fields (parentDatasetId excluded; it was already updated above)
datasetRepository.updateById(dataset);
return dataset;
}
@@ -142,6 +166,7 @@ public class DatasetApplicationService {
BusinessAssert.notNull(dataset, DataManagementErrorCode.DATASET_NOT_FOUND);
List<DatasetFile> datasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(datasetFiles);
applyVisibleFileCounts(Collections.singletonList(dataset));
return dataset;
}
@@ -153,6 +178,7 @@ public class DatasetApplicationService {
IPage<Dataset> page = new Page<>(query.getPage(), query.getSize());
page = datasetRepository.findByCriteria(page, query);
String datasetPvcName = getDatasetPvcName();
applyVisibleFileCounts(page.getRecords());
List<DatasetResponse> datasetResponses = DatasetConverter.INSTANCE.convertToResponse(page.getRecords());
datasetResponses.forEach(dataset -> dataset.setPvcName(datasetPvcName));
return PagedResponse.of(datasetResponses, page.getCurrent(), page.getTotal(), page.getPages());
@@ -200,6 +226,7 @@ public class DatasetApplicationService {
})
.limit(safeLimit)
.toList();
applyVisibleFileCounts(sorted);
List<DatasetResponse> responses = DatasetConverter.INSTANCE.convertToResponse(sorted);
responses.forEach(item -> item.setPvcName(datasetPvcName));
return responses;
@@ -345,6 +372,61 @@ public class DatasetApplicationService {
dataset.setPath(newPath);
}
private void applyVisibleFileCounts(List<Dataset> datasets) {
if (CollectionUtils.isEmpty(datasets)) {
return;
}
List<String> datasetIds = datasets.stream()
.filter(Objects::nonNull)
.map(Dataset::getId)
.filter(StringUtils::hasText)
.toList();
if (datasetIds.isEmpty()) {
return;
}
Map<String, Long> countMap = datasetFileRepository.countNonDerivedByDatasetIds(datasetIds).stream()
.filter(Objects::nonNull)
.collect(Collectors.toMap(
DatasetFileCount::getDatasetId,
count -> Optional.ofNullable(count.getFileCount()).orElse(0L),
(left, right) -> left
));
for (Dataset dataset : datasets) {
if (dataset == null || !StringUtils.hasText(dataset.getId())) {
continue;
}
Long visibleCount = countMap.get(dataset.getId());
dataset.setFileCount(visibleCount != null ? visibleCount : 0L);
}
}
private List<DatasetFile> filterVisibleFiles(List<DatasetFile> files) {
if (CollectionUtils.isEmpty(files)) {
return Collections.emptyList();
}
return files.stream()
.filter(file -> !isDerivedFile(file))
.collect(Collectors.toList());
}
private boolean isDerivedFile(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String metadata = datasetFile.getMetadata();
if (!StringUtils.hasText(metadata)) {
return false;
}
try {
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> metadataMap = mapper.readValue(metadata, new TypeReference<Map<String, Object>>() {});
return metadataMap.get(DERIVED_METADATA_KEY) != null;
} catch (Exception e) {
log.debug("Failed to parse dataset file metadata for derived detection: {}", datasetFile.getId(), e);
return false;
}
}
/**
* Get dataset statistics
*/
@@ -357,27 +439,29 @@ public class DatasetApplicationService {
Map<String, Object> statistics = new HashMap<>();
// Basic statistics
Long totalFiles = datasetFileRepository.countByDatasetId(datasetId);
Long completedFiles = datasetFileRepository.countCompletedByDatasetId(datasetId);
List<DatasetFile> allFiles = datasetFileRepository.findAllByDatasetId(datasetId);
List<DatasetFile> visibleFiles = filterVisibleFiles(allFiles);
long totalFiles = visibleFiles.size();
long completedFiles = visibleFiles.stream()
.filter(file -> "COMPLETED".equalsIgnoreCase(file.getStatus()))
.count();
Long totalSize = datasetFileRepository.sumSizeByDatasetId(datasetId);
statistics.put("totalFiles", totalFiles != null ? totalFiles.intValue() : 0);
statistics.put("completedFiles", completedFiles != null ? completedFiles.intValue() : 0);
statistics.put("totalFiles", (int) totalFiles);
statistics.put("completedFiles", (int) completedFiles);
statistics.put("totalSize", totalSize != null ? totalSize : 0L);
// Completion rate
float completionRate = 0.0f;
if (totalFiles != null && totalFiles > 0) {
completionRate = (completedFiles != null ? completedFiles.floatValue() : 0.0f) / totalFiles.floatValue() * 100.0f;
if (totalFiles > 0) {
completionRate = ((float) completedFiles) / (float) totalFiles * 100.0f;
}
statistics.put("completionRate", completionRate);
// File type distribution
Map<String, Integer> fileTypeDistribution = new HashMap<>();
List<DatasetFile> allFiles = datasetFileRepository.findAllByDatasetId(datasetId);
if (allFiles != null) {
for (DatasetFile file : allFiles) {
if (!visibleFiles.isEmpty()) {
for (DatasetFile file : visibleFiles) {
String fileType = file.getFileType() != null ? file.getFileType() : "unknown";
fileTypeDistribution.put(fileType, fileTypeDistribution.getOrDefault(fileType, 0) + 1);
}
@@ -386,8 +470,8 @@ public class DatasetApplicationService {
// Status distribution
Map<String, Integer> statusDistribution = new HashMap<>();
if (allFiles != null) {
for (DatasetFile file : allFiles) {
if (!visibleFiles.isEmpty()) {
for (DatasetFile file : visibleFiles) {
String status = file.getStatus() != null ? file.getStatus() : "unknown";
statusDistribution.put(status, statusDistribution.getOrDefault(status, 0) + 1);
}


@@ -22,16 +22,16 @@ import com.datamate.datamanagement.domain.model.dataset.DatasetFileUploadCheckIn
import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorCode;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetRepository;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.AddFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.http.HttpServletResponse;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.AddFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
@@ -40,24 +40,24 @@ import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.core.io.UrlResource;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;
@@ -70,24 +70,25 @@ import java.util.stream.Stream;
@Service
@Transactional
public class DatasetFileApplicationService {
private static final String PDF_FILE_TYPE = "pdf";
private static final String DOC_FILE_TYPE = "doc";
private static final String DOCX_FILE_TYPE = "docx";
private static final String XLS_FILE_TYPE = "xls";
private static final String XLSX_FILE_TYPE = "xlsx";
private static final Set<String> DOCUMENT_TEXT_FILE_TYPES = Set.of(
PDF_FILE_TYPE,
DOC_FILE_TYPE,
DOCX_FILE_TYPE,
XLS_FILE_TYPE,
XLSX_FILE_TYPE
);
private static final String DERIVED_METADATA_KEY = "derived_from_file_id";
private static final String PDF_FILE_TYPE = "pdf";
private static final String DOC_FILE_TYPE = "doc";
private static final String DOCX_FILE_TYPE = "docx";
private static final String XLS_FILE_TYPE = "xls";
private static final String XLSX_FILE_TYPE = "xlsx";
private static final Set<String> DOCUMENT_TEXT_FILE_TYPES = Set.of(
PDF_FILE_TYPE,
DOC_FILE_TYPE,
DOCX_FILE_TYPE,
XLS_FILE_TYPE,
XLSX_FILE_TYPE
);
private static final String DERIVED_METADATA_KEY = "derived_from_file_id";
private final DatasetFileRepository datasetFileRepository;
private final DatasetRepository datasetRepository;
private final FileService fileService;
private final PdfTextExtractAsyncService pdfTextExtractAsyncService;
private final DatasetFilePreviewService datasetFilePreviewService;
@Value("${datamate.data-management.base-path:/dataset}")
private String datasetBasePath;
@@ -96,15 +97,17 @@ public class DatasetFileApplicationService {
private DuplicateMethod duplicateMethod;
@Autowired
public DatasetFileApplicationService(DatasetFileRepository datasetFileRepository,
DatasetRepository datasetRepository,
FileService fileService,
PdfTextExtractAsyncService pdfTextExtractAsyncService) {
this.datasetFileRepository = datasetFileRepository;
this.datasetRepository = datasetRepository;
this.fileService = fileService;
this.pdfTextExtractAsyncService = pdfTextExtractAsyncService;
}
public DatasetFileApplicationService(DatasetFileRepository datasetFileRepository,
DatasetRepository datasetRepository,
FileService fileService,
PdfTextExtractAsyncService pdfTextExtractAsyncService,
DatasetFilePreviewService datasetFilePreviewService) {
this.datasetFileRepository = datasetFileRepository;
this.datasetRepository = datasetRepository;
this.fileService = fileService;
this.pdfTextExtractAsyncService = pdfTextExtractAsyncService;
this.datasetFilePreviewService = datasetFilePreviewService;
}
/**
* Get the dataset file list
@@ -123,57 +126,70 @@ public class DatasetFileApplicationService {
* @param status status filter
* @param name fuzzy filename query
* @param hasAnnotation whether the file has annotations
* @param excludeSourceDocuments whether to exclude source documents (PDF/DOC/DOCX/XLS/XLSX)
* @param pagingQuery paging parameters
* @return paged file list
*/
@Transactional(readOnly = true)
public PagedResponse<DatasetFile> getDatasetFiles(String datasetId, String fileType, String status, String name,
Boolean hasAnnotation, boolean excludeSourceDocuments, PagingQuery pagingQuery) {
IPage<DatasetFile> page = new Page<>(pagingQuery.getPage(), pagingQuery.getSize());
IPage<DatasetFile> files = datasetFileRepository.findByCriteria(datasetId, fileType, status, name, hasAnnotation, page);
if (excludeSourceDocuments) {
// Filter out source documents (PDF/DOC/DOCX/XLS/XLSX); annotation scenarios only show derived files
List<DatasetFile> filteredRecords = files.getRecords().stream()
.filter(file -> !isSourceDocument(file))
.collect(Collectors.toList());
// Rebuild the paged result
Page<DatasetFile> filteredPage = new Page<>(files.getCurrent(), files.getSize(), files.getTotal());
filteredPage.setRecords(filteredRecords);
return PagedResponse.of(filteredPage);
}
return PagedResponse.of(files);
}
public PagedResponse<DatasetFile> getDatasetFiles(String datasetId, String fileType, String status, String name,
Boolean hasAnnotation, boolean excludeSourceDocuments, PagingQuery pagingQuery) {
IPage<DatasetFile> page = new Page<>(pagingQuery.getPage(), pagingQuery.getSize());
IPage<DatasetFile> files = datasetFileRepository.findByCriteria(datasetId, fileType, status, name, hasAnnotation, page);
if (excludeSourceDocuments) {
// Filter out source documents (PDF/DOC/DOCX/XLS/XLSX); annotation scenarios only show derived files
List<DatasetFile> filteredRecords = files.getRecords().stream()
.filter(file -> !isSourceDocument(file))
.collect(Collectors.toList());
// Rebuild the paged result
Page<DatasetFile> filteredPage = new Page<>(files.getCurrent(), files.getSize(), files.getTotal());
filteredPage.setRecords(filteredRecords);
return PagedResponse.of(filteredPage);
}
return PagedResponse.of(files);
}
/**
* Get the dataset file list (with directory support)
*/
@Transactional(readOnly = true)
public PagedResponse<DatasetFile> getDatasetFilesWithDirectory(String datasetId, String prefix, boolean excludeDerivedFiles, PagingQuery pagingQuery) {
Dataset dataset = datasetRepository.getById(datasetId);
int page = Math.max(pagingQuery.getPage(), 1);
int size = pagingQuery.getSize() == null || pagingQuery.getSize() < 0 ? 20 : pagingQuery.getSize();
if (dataset == null) {
return PagedResponse.of(new Page<>(page, size));
}
String datasetPath = dataset.getPath();
Path queryPath = Path.of(dataset.getPath() + File.separator + prefix);
public PagedResponse<DatasetFile> getDatasetFilesWithDirectory(String datasetId, String prefix, boolean excludeDerivedFiles, PagingQuery pagingQuery) {
Dataset dataset = datasetRepository.getById(datasetId);
int page = Math.max(pagingQuery.getPage(), 1);
int size = pagingQuery.getSize() == null || pagingQuery.getSize() < 0 ? 20 : pagingQuery.getSize();
if (dataset == null) {
return PagedResponse.of(new Page<>(page, size));
}
String datasetPath = dataset.getPath();
Path queryPath = Path.of(dataset.getPath() + File.separator + prefix);
Map<String, DatasetFile> datasetFilesMap = datasetFileRepository.findAllByDatasetId(datasetId)
.stream().collect(Collectors.toMap(DatasetFile::getFilePath, Function.identity()));
.stream()
.filter(file -> file.getFilePath() != null)
.collect(Collectors.toMap(
file -> normalizeFilePath(file.getFilePath()),
Function.identity(),
(left, right) -> left
));
Set<String> derivedFilePaths = excludeDerivedFiles
? datasetFilesMap.values().stream()
.filter(this::isDerivedFile)
.map(DatasetFile::getFilePath)
.map(this::normalizeFilePath)
.filter(Objects::nonNull)
.collect(Collectors.toSet())
: Collections.emptySet();
try (Stream<Path> pathStream = Files.list(queryPath)) {
// If the directory does not exist, return an empty result (it may not have been created yet for a new dataset)
if (!Files.exists(queryPath)) {
return new PagedResponse<>(page, size, 0, 0, Collections.emptyList());
}
try (Stream<Path> pathStream = Files.list(queryPath)) {
List<Path> allFiles = pathStream
.filter(path -> path.toString().startsWith(datasetPath))
.filter(path -> !excludeDerivedFiles || Files.isDirectory(path) || !derivedFilePaths.contains(path.toString()))
.filter(path -> !excludeDerivedFiles
|| Files.isDirectory(path)
|| !derivedFilePaths.contains(normalizeFilePath(path.toString())))
.sorted(Comparator
.comparing((Path path) -> !Files.isDirectory(path))
.thenComparing(path -> path.getFileName().toString()))
@@ -192,16 +208,21 @@ public class DatasetFileApplicationService {
if (fromIndex < total) {
pageData = allFiles.subList(fromIndex, toIndex);
}
List<DatasetFile> datasetFiles = pageData.stream().map(path -> getDatasetFile(path, datasetFilesMap)).toList();
List<DatasetFile> datasetFiles = pageData.stream()
.map(path -> getDatasetFile(path, datasetFilesMap, excludeDerivedFiles, derivedFilePaths))
.toList();
return new PagedResponse<>(page, size, total, totalPages, datasetFiles);
} catch (IOException e) {
log.error("list dataset path error", e);
return PagedResponse.of(new Page<>(page, size));
}
}
} catch (IOException e) {
log.error("list dataset path error", e);
return PagedResponse.of(new Page<>(page, size));
}
}
private DatasetFile getDatasetFile(Path path, Map<String, DatasetFile> datasetFilesMap) {
private DatasetFile getDatasetFile(Path path,
Map<String, DatasetFile> datasetFilesMap,
boolean excludeDerivedFiles,
Set<String> derivedFilePaths) {
DatasetFile datasetFile = new DatasetFile();
LocalDateTime localDateTime = LocalDateTime.now();
try {
@@ -222,23 +243,32 @@ public class DatasetFileApplicationService {
long fileCount;
long totalSize;
try (Stream<Path> walk = Files.walk(path)) {
fileCount = walk.filter(Files::isRegularFile).count();
}
try (Stream<Path> walk = Files.walk(path)) {
totalSize = walk
.filter(Files::isRegularFile)
.mapToLong(p -> {
try {
return Files.size(p);
} catch (IOException e) {
log.error("get file size error", e);
return 0L;
}
})
.sum();
}
try (Stream<Path> walk = Files.walk(path)) {
Stream<Path> fileStream = walk.filter(Files::isRegularFile);
if (excludeDerivedFiles && !derivedFilePaths.isEmpty()) {
fileStream = fileStream.filter(filePath ->
!derivedFilePaths.contains(normalizeFilePath(filePath.toString())));
}
fileCount = fileStream.count();
}
try (Stream<Path> walk = Files.walk(path)) {
Stream<Path> fileStream = walk.filter(Files::isRegularFile);
if (excludeDerivedFiles && !derivedFilePaths.isEmpty()) {
fileStream = fileStream.filter(filePath ->
!derivedFilePaths.contains(normalizeFilePath(filePath.toString())));
}
totalSize = fileStream
.mapToLong(p -> {
try {
return Files.size(p);
} catch (IOException e) {
log.error("get file size error", e);
return 0L;
}
})
.sum();
}
datasetFile.setFileCount(fileCount);
datasetFile.setFileSize(totalSize);
@@ -246,45 +276,56 @@ public class DatasetFileApplicationService {
log.error("stat directory info error", e);
}
} else {
DatasetFile exist = datasetFilesMap.get(path.toString());
if (exist == null) {
datasetFile.setId("file-" + datasetFile.getFileName());
datasetFile.setFileSize(path.toFile().length());
} else {
DatasetFile exist = datasetFilesMap.get(normalizeFilePath(path.toString()));
if (exist == null) {
datasetFile.setId("file-" + datasetFile.getFileName());
datasetFile.setFileSize(path.toFile().length());
} else {
datasetFile = exist;
}
}
return datasetFile;
}
private String normalizeFilePath(String filePath) {
if (filePath == null || filePath.isBlank()) {
return null;
}
try {
return Paths.get(filePath).toAbsolutePath().normalize().toString();
} catch (Exception e) {
return filePath.replace("\\", "/");
}
}
private boolean isSourceDocument(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String fileType = datasetFile.getFileType();
if (fileType == null || fileType.isBlank()) {
return false;
}
return DOCUMENT_TEXT_FILE_TYPES.contains(fileType.toLowerCase(Locale.ROOT));
}
private boolean isDerivedFile(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String metadata = datasetFile.getMetadata();
if (metadata == null || metadata.isBlank()) {
return false;
}
try {
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> metadataMap = mapper.readValue(metadata, new TypeReference<Map<String, Object>>() {});
return metadataMap.get(DERIVED_METADATA_KEY) != null;
} catch (Exception e) {
log.debug("Failed to parse dataset file metadata for derived detection: {}", datasetFile.getId(), e);
return false;
}
}
}
String fileType = datasetFile.getFileType();
if (fileType == null || fileType.isBlank()) {
return false;
}
return DOCUMENT_TEXT_FILE_TYPES.contains(fileType.toLowerCase(Locale.ROOT));
}
private boolean isDerivedFile(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String metadata = datasetFile.getMetadata();
if (metadata == null || metadata.isBlank()) {
return false;
}
try {
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> metadataMap = mapper.readValue(metadata, new TypeReference<Map<String, Object>>() {});
return metadataMap.get(DERIVED_METADATA_KEY) != null;
} catch (Exception e) {
log.debug("Failed to parse dataset file metadata for derived detection: {}", datasetFile.getId(), e);
return false;
}
}
/**
* Get file details
@@ -305,18 +346,19 @@ public class DatasetFileApplicationService {
* Delete a file
*/
@Transactional
public void deleteDatasetFile(String datasetId, String fileId) {
DatasetFile file = getDatasetFile(datasetId, fileId);
Dataset dataset = datasetRepository.getById(datasetId);
dataset.setFiles(new ArrayList<>(Collections.singleton(file)));
datasetFileRepository.removeById(fileId);
dataset.removeFile(file);
datasetRepository.updateById(dataset);
// On delete: files uploaded into the dataset remove both the database record and the file on disk; collected files remove only the database record
if (file.getFilePath().startsWith(dataset.getPath())) {
try {
Path filePath = Paths.get(file.getFilePath());
Files.deleteIfExists(filePath);
public void deleteDatasetFile(String datasetId, String fileId) {
DatasetFile file = getDatasetFile(datasetId, fileId);
Dataset dataset = datasetRepository.getById(datasetId);
dataset.setFiles(new ArrayList<>(Collections.singleton(file)));
datasetFileRepository.removeById(fileId);
dataset.removeFile(file);
datasetRepository.updateById(dataset);
datasetFilePreviewService.deletePreviewFileQuietly(datasetId, fileId);
// On delete: files uploaded into the dataset remove both the database record and the file on disk; collected files remove only the database record
if (file.getFilePath().startsWith(dataset.getPath())) {
try {
Path filePath = Paths.get(file.getFilePath());
Files.deleteIfExists(filePath);
} catch (IOException ex) {
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
@@ -682,9 +724,10 @@ public class DatasetFileApplicationService {
})
.collect(Collectors.toList());
for (DatasetFile file : filesToDelete) {
datasetFileRepository.removeById(file.getId());
}
for (DatasetFile file : filesToDelete) {
datasetFileRepository.removeById(file.getId());
datasetFilePreviewService.deletePreviewFileQuietly(datasetId, file.getId());
}
// 删除文件系统中的目录
try {
@@ -740,17 +783,17 @@ public class DatasetFileApplicationService {
}
}
/**
* Copy files into the dataset directory
*
* @param datasetId dataset id
* @param req copy-files request
* @return the list of copied files
*/
@Transactional
public List<DatasetFile> copyFilesToDatasetDir(String datasetId, CopyFilesRequest req) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
/**
* Copy files to the dataset directory
*
* @param datasetId dataset id
* @param req copy-files request
* @return list of copied files
*/
@Transactional
public List<DatasetFile> copyFilesToDatasetDir(String datasetId, CopyFilesRequest req) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
List<DatasetFile> copiedFiles = new ArrayList<>();
List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(existDatasetFiles);
@@ -780,80 +823,80 @@ public class DatasetFileApplicationService {
datasetFileRepository.saveOrUpdateBatch(copiedFiles, 100);
dataset.active();
datasetRepository.updateById(dataset);
CompletableFuture.runAsync(() -> copyFilesToDatasetDir(req.sourcePaths(), dataset));
return copiedFiles;
}
/**
* Copy files to the dataset directory (preserving relative paths; used for data-source imports)
*
* @param datasetId dataset id
* @param sourceRoot data-source root directory
* @param sourcePaths source file paths
* @return list of copied files
*/
@Transactional
public List<DatasetFile> copyFilesToDatasetDirWithSourceRoot(String datasetId, Path sourceRoot, List<String> sourcePaths) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
Path normalizedRoot = sourceRoot.toAbsolutePath().normalize();
List<DatasetFile> copiedFiles = new ArrayList<>();
List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(existDatasetFiles);
Map<String, DatasetFile> copyTargets = new LinkedHashMap<>();
for (String sourceFilePath : sourcePaths) {
if (sourceFilePath == null || sourceFilePath.isBlank()) {
continue;
}
Path sourcePath = Paths.get(sourceFilePath).toAbsolutePath().normalize();
if (!sourcePath.startsWith(normalizedRoot)) {
log.warn("Source file path is out of root: {}", sourceFilePath);
continue;
}
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
log.warn("Source file does not exist or is not a regular file: {}", sourceFilePath);
continue;
}
Path relativePath = normalizedRoot.relativize(sourcePath);
String fileName = sourcePath.getFileName().toString();
File sourceFile = sourcePath.toFile();
LocalDateTime currentTime = LocalDateTime.now();
Path targetPath = Paths.get(dataset.getPath(), relativePath.toString());
DatasetFile datasetFile = DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(datasetId)
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(targetPath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.build();
setDatasetFileId(datasetFile, dataset);
dataset.addFile(datasetFile);
copiedFiles.add(datasetFile);
copyTargets.put(sourceFilePath, datasetFile);
}
if (copiedFiles.isEmpty()) {
return copiedFiles;
}
datasetFileRepository.saveOrUpdateBatch(copiedFiles, 100);
dataset.active();
datasetRepository.updateById(dataset);
CompletableFuture.runAsync(() -> copyFilesToDatasetDirWithRelativePath(copyTargets, dataset, normalizedRoot));
return copiedFiles;
}
private void copyFilesToDatasetDir(List<String> sourcePaths, Dataset dataset) {
for (String sourcePath : sourcePaths) {
Path sourceFilePath = Paths.get(sourcePath);
Path targetFilePath = Paths.get(dataset.getPath(), sourceFilePath.getFileName().toString());
try {
Files.createDirectories(Path.of(dataset.getPath()));
Files.copy(sourceFilePath, targetFilePath);
DatasetFile datasetFile = datasetFileRepository.findByDatasetIdAndFileName(
@@ -863,39 +906,39 @@ public class DatasetFileApplicationService {
triggerPdfTextExtraction(dataset, datasetFile);
} catch (IOException e) {
log.error("Failed to copy file from {} to {}", sourcePath, targetFilePath, e);
}
}
}
private void copyFilesToDatasetDirWithRelativePath(
Map<String, DatasetFile> copyTargets,
Dataset dataset,
Path sourceRoot
) {
Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
Path normalizedRoot = sourceRoot.toAbsolutePath().normalize();
for (Map.Entry<String, DatasetFile> entry : copyTargets.entrySet()) {
Path sourcePath = Paths.get(entry.getKey()).toAbsolutePath().normalize();
if (!sourcePath.startsWith(normalizedRoot)) {
log.warn("Source file path is out of root: {}", sourcePath);
continue;
}
Path relativePath = normalizedRoot.relativize(sourcePath);
Path targetFilePath = datasetRoot.resolve(relativePath).normalize();
if (!targetFilePath.startsWith(datasetRoot)) {
log.warn("Target file path is out of dataset path: {}", targetFilePath);
continue;
}
try {
Files.createDirectories(targetFilePath.getParent());
Files.copy(sourcePath, targetFilePath);
triggerPdfTextExtraction(dataset, entry.getValue());
} catch (IOException e) {
log.error("Failed to copy file from {} to {}", sourcePath, targetFilePath, e);
}
}
}
/**
* Add files to a dataset (only creates database records; performs no file-system operations)
*
@@ -952,31 +995,31 @@ public class DatasetFileApplicationService {
return addedFiles;
}
private void triggerPdfTextExtraction(Dataset dataset, DatasetFile datasetFile) {
if (dataset == null || datasetFile == null) {
return;
}
if (dataset.getDatasetType() != DatasetType.TEXT) {
return;
}
String fileType = datasetFile.getFileType();
if (fileType == null || !DOCUMENT_TEXT_FILE_TYPES.contains(fileType.toLowerCase(Locale.ROOT))) {
return;
}
String datasetId = dataset.getId();
String fileId = datasetFile.getId();
if (datasetId == null || fileId == null) {
return;
}
if (TransactionSynchronizationManager.isSynchronizationActive()) {
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
@Override
public void afterCommit() {
pdfTextExtractAsyncService.extractPdfText(datasetId, fileId);
}
});
return;
}
pdfTextExtractAsyncService.extractPdfText(datasetId, fileId);
}
}
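
Note on the transaction hook above: triggerPdfTextExtraction registers the extraction as an afterCommit callback whenever a transaction is active, so the async worker only starts once the newly written rows are visible. A minimal standalone sketch of the same pattern (the helper name is ours, not the project's):

import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Sketch only: run a task after the surrounding transaction commits,
// or immediately when no transaction is active (hypothetical helper).
public final class AfterCommitRunner {
    private AfterCommitRunner() {
    }

    public static void runAfterCommit(Runnable task) {
        if (TransactionSynchronizationManager.isSynchronizationActive()) {
            TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
                @Override
                public void afterCommit() {
                    task.run(); // fires only on successful commit
                }
            });
            return;
        }
        task.run(); // no active transaction: execute right away
    }
}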


@@ -0,0 +1,171 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
/**
* Async task that converts dataset files to preview PDFs
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class DatasetFilePreviewAsyncService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String DATASET_PREVIEW_DIR = "dataset-previews";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final int MAX_ERROR_LENGTH = 500;
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final ObjectMapper objectMapper = new ObjectMapper();
@Async
public void convertPreviewAsync(String fileId) {
if (StringUtils.isBlank(fileId)) {
return;
}
DatasetFile file = datasetFileRepository.getById(fileId);
if (file == null) {
return;
}
String extension = resolveFileExtension(resolveOriginalName(file));
if (!OFFICE_EXTENSIONS.contains(extension)) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "仅支持 DOC/DOCX 转换");
return;
}
if (StringUtils.isBlank(file.getFilePath())) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "源文件路径为空");
return;
}
Path sourcePath = Paths.get(file.getFilePath()).toAbsolutePath().normalize();
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "源文件不存在");
return;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
String previewRelativePath = StringUtils.defaultIfBlank(
previewInfo.pdfPath(),
resolvePreviewRelativePath(file.getDatasetId(), file.getId())
);
Path targetPath = resolvePreviewStoragePath(previewRelativePath);
try {
ensureParentDirectory(targetPath);
LibreOfficeConverter.convertToPdf(sourcePath, targetPath);
updatePreviewStatus(file, KnowledgeItemPreviewStatus.READY, previewRelativePath, null);
} catch (Exception e) {
log.error("dataset preview convert failed, fileId: {}", file.getId(), e);
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, previewRelativePath, trimError(e.getMessage()));
}
}
private void updatePreviewStatus(
DatasetFile file,
KnowledgeItemPreviewStatus status,
String previewRelativePath,
String error
) {
if (file == null) {
return;
}
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
status,
previewRelativePath,
error,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
}
private String resolveOriginalName(DatasetFile file) {
if (file == null) {
return "";
}
if (StringUtils.isNotBlank(file.getFileName())) {
return file.getFileName();
}
if (StringUtils.isNotBlank(file.getFilePath())) {
return Paths.get(file.getFilePath()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewRelativePath(String datasetId, String fileId) {
String relativePath = Paths.get(DATASET_PREVIEW_DIR, datasetId, fileId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid preview path");
}
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private void ensureParentDirectory(Path targetPath) {
try {
Path parent = targetPath.getParent();
if (parent != null) {
Files.createDirectories(parent);
}
} catch (Exception e) {
throw new IllegalStateException("创建预览目录失败", e);
}
}
private String trimError(String error) {
if (StringUtils.isBlank(error)) {
return "";
}
if (error.length() <= MAX_ERROR_LENGTH) {
return error;
}
return error.substring(0, MAX_ERROR_LENGTH);
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
}
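
convertPreviewAsync only runs on a separate thread when Spring's async support is enabled and the call goes through the bean proxy; keeping the @Async method in its own service (called from DatasetFilePreviewService below) presumably avoids the self-invocation pitfall. A minimal enabling config, assuming the project does not already provide one:

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;

// Sketch: without @EnableAsync somewhere in the context, @Async methods
// such as convertPreviewAsync execute synchronously on the caller's thread.
@Configuration
@EnableAsync
public class AsyncConfig {
}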


@@ -0,0 +1,233 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.interfaces.dto.DatasetFilePreviewStatusResponse;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Set;
/**
* Dataset file preview conversion service
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class DatasetFilePreviewService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String DATASET_PREVIEW_DIR = "dataset-previews";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final DatasetFilePreviewAsyncService datasetFilePreviewAsyncService;
private final ObjectMapper objectMapper = new ObjectMapper();
public DatasetFilePreviewStatusResponse getPreviewStatus(String datasetId, String fileId) {
DatasetFile file = requireDatasetFile(datasetId, fileId);
assertOfficeDocument(file);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && !previewPdfExists(file, previewInfo)) {
previewInfo = markPreviewFailed(file, previewInfo, "预览文件不存在");
}
return buildResponse(previewInfo);
}
public DatasetFilePreviewStatusResponse ensurePreview(String datasetId, String fileId) {
DatasetFile file = requireDatasetFile(datasetId, fileId);
assertOfficeDocument(file);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && previewPdfExists(file, previewInfo)) {
return buildResponse(previewInfo);
}
if (previewInfo.status() == KnowledgeItemPreviewStatus.PROCESSING) {
return buildResponse(previewInfo);
}
String previewRelativePath = resolvePreviewRelativePath(file.getDatasetId(), file.getId());
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.PROCESSING,
previewRelativePath,
null,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
datasetFilePreviewAsyncService.convertPreviewAsync(file.getId());
KnowledgeItemPreviewMetadataHelper.PreviewInfo refreshed = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(updatedMetadata, objectMapper);
return buildResponse(refreshed);
}
public boolean isOfficeDocument(String fileName) {
String extension = resolveFileExtension(fileName);
return StringUtils.isNotBlank(extension) && OFFICE_EXTENSIONS.contains(extension.toLowerCase());
}
public PreviewFile resolveReadyPreviewFile(String datasetId, DatasetFile file) {
if (file == null) {
return null;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() != KnowledgeItemPreviewStatus.READY) {
return null;
}
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(datasetId, file.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
markPreviewFailed(file, previewInfo, "预览文件不存在");
return null;
}
String previewName = resolvePreviewPdfName(file);
return new PreviewFile(filePath, previewName);
}
public void deletePreviewFileQuietly(String datasetId, String fileId) {
String relativePath = resolvePreviewRelativePath(datasetId, fileId);
Path filePath = resolvePreviewStoragePath(relativePath);
try {
Files.deleteIfExists(filePath);
} catch (Exception e) {
log.warn("delete dataset preview pdf error, fileId: {}", fileId, e);
}
}
private DatasetFilePreviewStatusResponse buildResponse(KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
DatasetFilePreviewStatusResponse response = new DatasetFilePreviewStatusResponse();
KnowledgeItemPreviewStatus status = previewInfo.status() == null
? KnowledgeItemPreviewStatus.PENDING
: previewInfo.status();
response.setStatus(status);
response.setPreviewError(previewInfo.error());
response.setUpdatedAt(previewInfo.updatedAt());
return response;
}
private DatasetFile requireDatasetFile(String datasetId, String fileId) {
BusinessAssert.isTrue(StringUtils.isNotBlank(datasetId), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(StringUtils.isNotBlank(fileId), CommonErrorCode.PARAM_ERROR);
DatasetFile datasetFile = datasetFileRepository.getById(fileId);
BusinessAssert.notNull(datasetFile, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(Objects.equals(datasetFile.getDatasetId(), datasetId), CommonErrorCode.PARAM_ERROR);
return datasetFile;
}
private void assertOfficeDocument(DatasetFile file) {
BusinessAssert.notNull(file, CommonErrorCode.PARAM_ERROR);
String extension = resolveFileExtension(resolveOriginalName(file));
BusinessAssert.isTrue(OFFICE_EXTENSIONS.contains(extension), CommonErrorCode.PARAM_ERROR);
}
private String resolveOriginalName(DatasetFile file) {
if (file == null) {
return "";
}
if (StringUtils.isNotBlank(file.getFileName())) {
return file.getFileName();
}
if (StringUtils.isNotBlank(file.getFilePath())) {
return Paths.get(file.getFilePath()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewPdfName(DatasetFile file) {
String originalName = resolveOriginalName(file);
if (StringUtils.isBlank(originalName)) {
return "预览.pdf";
}
int dotIndex = originalName.lastIndexOf('.');
if (dotIndex <= 0) {
return originalName + PREVIEW_FILE_SUFFIX;
}
return originalName.substring(0, dotIndex) + PREVIEW_FILE_SUFFIX;
}
private boolean previewPdfExists(DatasetFile file, KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(file.getDatasetId(), file.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
return Files.exists(filePath) && Files.isRegularFile(filePath);
}
private KnowledgeItemPreviewMetadataHelper.PreviewInfo markPreviewFailed(
DatasetFile file,
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo,
String error
) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(file.getDatasetId(), file.getId()));
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.FAILED,
relativePath,
error,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
return KnowledgeItemPreviewMetadataHelper.readPreviewInfo(updatedMetadata, objectMapper);
}
private String resolvePreviewRelativePath(String datasetId, String fileId) {
String relativePath = Paths.get(DATASET_PREVIEW_DIR, datasetId, fileId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
BusinessAssert.isTrue(StringUtils.isNotBlank(uploadDir), CommonErrorCode.PARAM_ERROR);
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
public record PreviewFile(Path filePath, String fileName) {
}
}
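
A hypothetical controller wiring for the status/ensure pair (route paths and class name are assumptions; this diff does not include the interface layer):

import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch: the frontend would poll GET for the conversion state and POST once
// to kick off a conversion when the status is not yet READY or PROCESSING.
@RestController
@RequestMapping("/datasets/{datasetId}/files/{fileId}/preview")
@RequiredArgsConstructor
public class DatasetFilePreviewController {
    private final DatasetFilePreviewService datasetFilePreviewService;

    @GetMapping("/status")
    public DatasetFilePreviewStatusResponse status(@PathVariable String datasetId,
                                                   @PathVariable String fileId) {
        return datasetFilePreviewService.getPreviewStatus(datasetId, fileId);
    }

    @PostMapping
    public DatasetFilePreviewStatusResponse ensure(@PathVariable String datasetId,
                                                   @PathVariable String fileId) {
        return datasetFilePreviewService.ensurePreview(datasetId, fileId);
    }
}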


@@ -0,0 +1,142 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeStatusType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorCode;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemDirectoryRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeSetRepository;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import lombok.RequiredArgsConstructor;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.util.List;
import java.util.UUID;
/**
* Knowledge item directory application service
*/
@Service
@Transactional
@RequiredArgsConstructor
public class KnowledgeDirectoryApplicationService {
private static final String PATH_SEPARATOR = "/";
private static final String INVALID_PATH_SEGMENT = "..";
private final KnowledgeItemDirectoryRepository knowledgeItemDirectoryRepository;
private final KnowledgeItemRepository knowledgeItemRepository;
private final KnowledgeSetRepository knowledgeSetRepository;
@Transactional(readOnly = true)
public List<KnowledgeItemDirectory> getKnowledgeDirectories(String setId, KnowledgeDirectoryQuery query) {
BusinessAssert.notNull(query, CommonErrorCode.PARAM_ERROR);
query.setSetId(setId);
return knowledgeItemDirectoryRepository.findByCriteria(query);
}
public KnowledgeItemDirectory createKnowledgeDirectory(String setId, CreateKnowledgeDirectoryRequest request) {
BusinessAssert.notNull(request, CommonErrorCode.PARAM_ERROR);
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
BusinessAssert.isTrue(!isReadOnlyStatus(knowledgeSet.getStatus()),
DataManagementErrorCode.KNOWLEDGE_SET_STATUS_ERROR);
String directoryName = normalizeDirectoryName(request.getDirectoryName());
validateDirectoryName(directoryName);
String parentPrefix = normalizeRelativePathPrefix(request.getParentPrefix());
String relativePath = normalizeRelativePathValue(parentPrefix + directoryName);
validateRelativePath(relativePath);
BusinessAssert.isTrue(!knowledgeItemRepository.existsBySetIdAndRelativePath(setId, relativePath),
CommonErrorCode.PARAM_ERROR);
KnowledgeItemDirectory existing = knowledgeItemDirectoryRepository.findBySetIdAndPath(setId, relativePath);
if (existing != null) {
return existing;
}
KnowledgeItemDirectory directory = new KnowledgeItemDirectory();
directory.setId(UUID.randomUUID().toString());
directory.setSetId(setId);
directory.setName(directoryName);
directory.setRelativePath(relativePath);
knowledgeItemDirectoryRepository.save(directory);
return directory;
}
public void deleteKnowledgeDirectory(String setId, String relativePath) {
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
BusinessAssert.isTrue(!isReadOnlyStatus(knowledgeSet.getStatus()),
DataManagementErrorCode.KNOWLEDGE_SET_STATUS_ERROR);
String normalized = normalizeRelativePathValue(relativePath);
validateRelativePath(normalized);
knowledgeItemRepository.removeByRelativePathPrefix(setId, normalized);
knowledgeItemDirectoryRepository.removeByRelativePathPrefix(setId, normalized);
}
private KnowledgeSet requireKnowledgeSet(String setId) {
KnowledgeSet knowledgeSet = knowledgeSetRepository.getById(setId);
BusinessAssert.notNull(knowledgeSet, DataManagementErrorCode.KNOWLEDGE_SET_NOT_FOUND);
return knowledgeSet;
}
private boolean isReadOnlyStatus(KnowledgeStatusType status) {
return status == KnowledgeStatusType.ARCHIVED || status == KnowledgeStatusType.DEPRECATED;
}
private String normalizeDirectoryName(String name) {
return StringUtils.trimToEmpty(name);
}
private void validateDirectoryName(String name) {
BusinessAssert.isTrue(StringUtils.isNotBlank(name), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains(PATH_SEPARATOR), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains("\\"), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains(INVALID_PATH_SEGMENT), CommonErrorCode.PARAM_ERROR);
}
private void validateRelativePath(String relativePath) {
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!relativePath.contains(INVALID_PATH_SEGMENT), CommonErrorCode.PARAM_ERROR);
}
private String normalizeRelativePathPrefix(String prefix) {
if (StringUtils.isBlank(prefix)) {
return "";
}
String normalized = prefix.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
validateRelativePath(normalized);
return normalized + PATH_SEPARATOR;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}
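
The two normalizers above are the only path sanitation applied to directory names, so their edge cases matter. A standalone sketch of the prefix rules (a re-implementation for illustration, not the service itself; the real method additionally rejects ".." via validateRelativePath):

// Sketch of normalizeRelativePathPrefix behaviour.
public class RelativePathPrefixDemo {
    static String normalizePrefix(String prefix) {
        if (prefix == null || prefix.isBlank()) {
            return "";
        }
        String n = prefix.replace("\\", "/").trim();
        while (n.startsWith("/")) {
            n = n.substring(1);
        }
        while (n.endsWith("/")) {
            n = n.substring(0, n.length() - 1);
        }
        return n.isBlank() ? "" : n + "/";
    }

    public static void main(String[] args) {
        System.out.println(normalizePrefix("a\\b"));    // a/b/
        System.out.println(normalizePrefix("/a/b///")); // a/b/
        System.out.println(normalizePrefix("   "));     // (empty string)
    }
}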


@@ -16,12 +16,14 @@ import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorCode;
import com.datamate.datamanagement.infrastructure.persistence.mapper.TagMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeSetRepository;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.DeleteKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.ImportKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
@@ -74,16 +76,20 @@ public class KnowledgeItemApplicationService {
private static final String EXPORT_FILE_PREFIX = "knowledge_set_";
private static final String EXPORT_FILE_SUFFIX = ".zip";
private static final String EXPORT_CONTENT_TYPE = "application/zip";
private static final String PREVIEW_PDF_CONTENT_TYPE = "application/pdf";
private static final int MAX_FILE_BASE_LENGTH = 120;
private static final int MAX_TITLE_LENGTH = 200;
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String DEFAULT_FILE_EXTENSION = "bin";
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemRepository knowledgeItemRepository;
private final KnowledgeSetRepository knowledgeSetRepository;
private final DatasetRepository datasetRepository;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final TagMapper tagMapper;
private final KnowledgeItemPreviewService knowledgeItemPreviewService;
public KnowledgeItem createKnowledgeItem(String setId, CreateKnowledgeItemRequest request) {
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
@@ -112,6 +118,7 @@ public class KnowledgeItemApplicationService {
List<MultipartFile> files = request.getFiles();
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(files), CommonErrorCode.PARAM_ERROR);
String parentPrefix = normalizeRelativePathPrefix(request.getParentPrefix());
Path uploadRoot = resolveUploadRootPath();
Path setDir = uploadRoot.resolve(KNOWLEDGE_ITEM_UPLOAD_DIR).resolve(setId).normalize();
@@ -145,6 +152,7 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setContentType(KnowledgeContentType.FILE);
knowledgeItem.setSourceType(KnowledgeSourceType.FILE_UPLOAD);
knowledgeItem.setSourceFileId(trimToLength(safeOriginalName, MAX_TITLE_LENGTH));
knowledgeItem.setRelativePath(buildRelativePath(parentPrefix, safeOriginalName));
items.add(knowledgeItem);
}
@@ -170,6 +178,9 @@ public class KnowledgeItemApplicationService {
if (request.getContentType() != null) {
knowledgeItem.setContentType(request.getContentType());
}
if (request.getMetadata() != null) {
knowledgeItem.setMetadata(request.getMetadata());
}
knowledgeItemRepository.updateById(knowledgeItem);
return knowledgeItem;
@@ -182,6 +193,22 @@ public class KnowledgeItemApplicationService {
knowledgeItemRepository.removeById(itemId);
}
public void deleteKnowledgeItems(String setId, DeleteKnowledgeItemsRequest request) {
BusinessAssert.notNull(request, CommonErrorCode.PARAM_ERROR);
List<String> ids = request.getIds();
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(ids), CommonErrorCode.PARAM_ERROR);
List<KnowledgeItem> items = knowledgeItemRepository.listByIds(ids);
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(items), DataManagementErrorCode.KNOWLEDGE_ITEM_NOT_FOUND);
BusinessAssert.isTrue(items.size() == ids.size(), DataManagementErrorCode.KNOWLEDGE_ITEM_NOT_FOUND);
boolean allMatch = items.stream().allMatch(item -> Objects.equals(item.getSetId(), setId));
BusinessAssert.isTrue(allMatch, CommonErrorCode.PARAM_ERROR);
List<String> deleteIds = items.stream().map(KnowledgeItem::getId).toList();
knowledgeItemRepository.removeByIds(deleteIds);
}
@Transactional(readOnly = true)
public KnowledgeItem getKnowledgeItem(String setId, String itemId) {
KnowledgeItem knowledgeItem = knowledgeItemRepository.getById(itemId);
@@ -213,6 +240,7 @@ public class KnowledgeItemApplicationService {
long datasetFileSize = safeLong(knowledgeItemRepository.sumDatasetFileSize());
long uploadFileSize = calculateUploadFileTotalSize();
response.setTotalSize(datasetFileSize + uploadFileSize);
response.setTotalTags(safeLong(tagMapper.countKnowledgeSetTags()));
return response;
}
@@ -256,6 +284,7 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setSourceType(KnowledgeSourceType.DATASET_FILE);
knowledgeItem.setSourceDatasetId(dataset.getId());
knowledgeItem.setSourceFileId(datasetFile.getId());
knowledgeItem.setRelativePath(resolveDatasetFileRelativePath(dataset, datasetFile));
items.add(knowledgeItem);
}
@@ -307,7 +336,7 @@ public class KnowledgeItemApplicationService {
String relativePath = knowledgeItem.getContent();
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
Path filePath = resolveKnowledgeItemStoragePathWithFallback(relativePath);
BusinessAssert.isTrue(Files.exists(filePath) && Files.isRegularFile(filePath), CommonErrorCode.PARAM_ERROR);
String downloadName = StringUtils.isNotBlank(knowledgeItem.getSourceFileId())
@@ -340,12 +369,32 @@ public class KnowledgeItemApplicationService {
String relativePath = knowledgeItem.getContent();
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
String previewName = StringUtils.isNotBlank(knowledgeItem.getSourceFileId())
? knowledgeItem.getSourceFileId()
: Paths.get(relativePath).getFileName().toString();
if (knowledgeItemPreviewService.isOfficeDocument(previewName)) {
KnowledgeItemPreviewService.PreviewFile previewFile = knowledgeItemPreviewService.resolveReadyPreviewFile(setId, knowledgeItem);
if (previewFile == null) {
response.setStatus(HttpServletResponse.SC_CONFLICT);
return;
}
response.setContentType(PREVIEW_PDF_CONTENT_TYPE);
response.setCharacterEncoding(StandardCharsets.UTF_8.name());
response.setHeader(HttpHeaders.CONTENT_DISPOSITION,
"inline; filename=\"" + URLEncoder.encode(previewFile.fileName(), StandardCharsets.UTF_8) + "\"");
try (InputStream inputStream = Files.newInputStream(previewFile.filePath())) {
inputStream.transferTo(response.getOutputStream());
response.flushBuffer();
} catch (IOException e) {
log.error("preview knowledge item pdf error, itemId: {}", itemId, e);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
return;
}
Path filePath = resolveKnowledgeItemStoragePathWithFallback(relativePath);
BusinessAssert.isTrue(Files.exists(filePath) && Files.isRegularFile(filePath), CommonErrorCode.PARAM_ERROR);
String contentType = null;
try {
@@ -418,7 +467,10 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setContentType(KnowledgeContentType.FILE);
knowledgeItem.setSourceType(KnowledgeSourceType.FILE_UPLOAD);
knowledgeItem.setSourceFileId(sourceFileId);
knowledgeItem.setRelativePath(resolveReplacedRelativePath(knowledgeItem.getRelativePath(), sourceFileId));
knowledgeItem.setMetadata(knowledgeItemPreviewService.clearPreviewMetadata(knowledgeItem.getMetadata()));
knowledgeItemRepository.updateById(knowledgeItem);
knowledgeItemPreviewService.deletePreviewFileQuietly(setId, knowledgeItem.getId());
deleteFile(oldFilePath);
} catch (Exception e) {
deleteFileQuietly(targetPath);
@@ -483,6 +535,86 @@ public class KnowledgeItemApplicationService {
return target;
}
private Path resolveKnowledgeItemStoragePathWithFallback(String relativePath) {
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
String normalizedInput = relativePath.replace("\\", PATH_SEPARATOR).trim();
Path root = resolveUploadRootPath();
java.util.LinkedHashSet<Path> candidates = new java.util.LinkedHashSet<>();
Path inputPath = Paths.get(normalizedInput.replace(PATH_SEPARATOR, File.separator));
if (inputPath.isAbsolute()) {
Path normalizedAbsolute = inputPath.toAbsolutePath().normalize();
if (normalizedAbsolute.startsWith(root)) {
candidates.add(normalizedAbsolute);
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
BusinessAssert.isTrue(!candidates.isEmpty(), CommonErrorCode.PARAM_ERROR);
} else {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)) {
candidates.add(buildKnowledgeItemStoragePath(root, normalizedRelative));
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (StringUtils.isNotBlank(normalizedRelative)
&& !normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)
&& !normalizedRelative.equals(KNOWLEDGE_ITEM_UPLOAD_DIR)) {
candidates.add(buildKnowledgeItemStoragePath(root, KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR + normalizedRelative));
}
}
if (root.getFileName() != null && KNOWLEDGE_ITEM_UPLOAD_DIR.equals(root.getFileName().toString())) {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)
&& normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)) {
String withoutPrefix = normalizedRelative.substring(KNOWLEDGE_ITEM_UPLOAD_DIR.length() + PATH_SEPARATOR.length());
if (StringUtils.isNotBlank(withoutPrefix)) {
candidates.add(buildKnowledgeItemStoragePath(root, withoutPrefix));
}
}
}
Path fallback = null;
for (Path candidate : candidates) {
if (fallback == null) {
fallback = candidate;
}
if (Files.exists(candidate) && Files.isRegularFile(candidate)) {
return candidate;
}
}
BusinessAssert.notNull(fallback, CommonErrorCode.PARAM_ERROR);
return fallback;
}
private Path buildKnowledgeItemStoragePath(Path root, String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace(PATH_SEPARATOR, File.separator);
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private String extractRelativePathFromSegment(String rawPath, String segment) {
if (StringUtils.isBlank(rawPath) || StringUtils.isBlank(segment)) {
return null;
}
String normalized = rawPath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
String segmentPrefix = segment + PATH_SEPARATOR;
int index = normalized.indexOf(segmentPrefix);
if (index < 0) {
return segment.equals(normalized) ? segment : null;
}
return normalizeRelativePathValue(normalized.substring(index));
}
private KnowledgeItemSearchResponse normalizeSearchResponse(KnowledgeItemSearchResponse item) {
BusinessAssert.notNull(item, CommonErrorCode.PARAM_ERROR);
if (item.getSourceType() == KnowledgeSourceType.FILE_UPLOAD) {
@@ -540,6 +672,84 @@ public class KnowledgeItemApplicationService {
return relativePath.replace(File.separatorChar, '/');
}
private String buildRelativePath(String parentPrefix, String fileName) {
String safeName = sanitizeFileName(fileName);
if (StringUtils.isBlank(safeName)) {
safeName = "file";
}
String normalizedPrefix = normalizeRelativePathPrefix(parentPrefix);
return normalizedPrefix + safeName;
}
private String normalizeRelativePathPrefix(String prefix) {
if (StringUtils.isBlank(prefix)) {
return "";
}
String normalized = prefix.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
BusinessAssert.isTrue(!normalized.contains(".."), CommonErrorCode.PARAM_ERROR);
if (StringUtils.isBlank(normalized)) {
return "";
}
return normalized + PATH_SEPARATOR;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
private String resolveDatasetFileRelativePath(Dataset dataset, DatasetFile datasetFile) {
if (datasetFile == null) {
return "";
}
String fileName = StringUtils.defaultIfBlank(datasetFile.getFileName(), datasetFile.getId());
String datasetPath = dataset == null ? null : dataset.getPath();
String filePath = datasetFile.getFilePath();
if (StringUtils.isBlank(datasetPath) || StringUtils.isBlank(filePath)) {
return buildRelativePath("", fileName);
}
try {
Path datasetRoot = Paths.get(datasetPath).toAbsolutePath().normalize();
Path targetPath = Paths.get(filePath).toAbsolutePath().normalize();
if (targetPath.startsWith(datasetRoot)) {
Path relative = datasetRoot.relativize(targetPath);
String relativeValue = relative.toString().replace(File.separatorChar, '/');
String normalized = normalizeRelativePathValue(relativeValue);
if (!normalized.contains("..") && StringUtils.isNotBlank(normalized)) {
return normalized;
}
}
} catch (Exception e) {
log.warn("resolve dataset file relative path failed, fileId: {}", datasetFile.getId(), e);
}
return buildRelativePath("", fileName);
}
private String resolveReplacedRelativePath(String existingRelativePath, String newFileName) {
String normalized = normalizeRelativePathValue(existingRelativePath);
if (StringUtils.isBlank(normalized)) {
return buildRelativePath("", newFileName);
}
int lastIndex = normalized.lastIndexOf(PATH_SEPARATOR);
String parentPrefix = lastIndex >= 0 ? normalized.substring(0, lastIndex + 1) : "";
return buildRelativePath(parentPrefix, newFileName);
}
private void createDirectories(Path path) {
try {
Files.createDirectories(path);
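
The fallback resolver above collects candidate paths so that content written under an older upload root can still be found. The segment extraction at its core can be sketched in isolation (trailing-separator normalization omitted for brevity):

// Sketch of extractRelativePathFromSegment: recover the stored path from the
// "knowledge-items" segment onward, regardless of the root it was written under.
public class SegmentExtractDemo {
    static String extract(String rawPath, String segment) {
        if (rawPath == null || rawPath.isBlank()) {
            return null;
        }
        String n = rawPath.replace("\\", "/").trim();
        while (n.startsWith("/")) {
            n = n.substring(1);
        }
        int i = n.indexOf(segment + "/");
        if (i < 0) {
            return segment.equals(n) ? segment : null;
        }
        return n.substring(i);
    }

    public static void main(String[] args) {
        // An absolute path persisted by an older deployment still resolves:
        System.out.println(extract("/old-root/knowledge-items/set1/a.docx", "knowledge-items"));
        // -> knowledge-items/set1/a.docx
    }
}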


@@ -0,0 +1,275 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
/**
* Async task that converts knowledge items to preview PDFs
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class KnowledgeItemPreviewAsyncService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String PREVIEW_SUB_DIR = "preview";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final int MAX_ERROR_LENGTH = 500;
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final KnowledgeItemRepository knowledgeItemRepository;
private final DataManagementProperties dataManagementProperties;
private final ObjectMapper objectMapper = new ObjectMapper();
@Async
public void convertPreviewAsync(String itemId) {
if (StringUtils.isBlank(itemId)) {
return;
}
KnowledgeItem item = knowledgeItemRepository.getById(itemId);
if (item == null) {
return;
}
String extension = resolveFileExtension(resolveOriginalName(item));
if (!OFFICE_EXTENSIONS.contains(extension)) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "仅支持 DOC/DOCX 转换");
return;
}
if (StringUtils.isBlank(item.getContent())) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "源文件路径为空");
return;
}
Path sourcePath = resolveKnowledgeItemStoragePath(item.getContent());
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "源文件不存在");
return;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
String previewRelativePath = StringUtils.defaultIfBlank(
previewInfo.pdfPath(),
resolvePreviewRelativePath(item.getSetId(), item.getId())
);
Path targetPath = resolvePreviewStoragePath(previewRelativePath);
ensureParentDirectory(targetPath);
try {
LibreOfficeConverter.convertToPdf(sourcePath, targetPath);
updatePreviewStatus(item, KnowledgeItemPreviewStatus.READY, previewRelativePath, null);
} catch (Exception e) {
log.error("preview convert failed, itemId: {}", item.getId(), e);
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, previewRelativePath, trimError(e.getMessage()));
}
}
private void updatePreviewStatus(
KnowledgeItem item,
KnowledgeItemPreviewStatus status,
String previewRelativePath,
String error
) {
if (item == null) {
return;
}
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
status,
previewRelativePath,
error,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
}
private String resolveOriginalName(KnowledgeItem item) {
if (item == null) {
return "";
}
if (StringUtils.isNotBlank(item.getSourceFileId())) {
return item.getSourceFileId();
}
if (StringUtils.isNotBlank(item.getContent())) {
return Paths.get(item.getContent()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewRelativePath(String setId, String itemId) {
String relativePath = Paths.get(KNOWLEDGE_ITEM_UPLOAD_DIR, setId, PREVIEW_SUB_DIR, itemId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid preview path");
}
return target;
}
private Path resolveKnowledgeItemStoragePath(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
throw new IllegalArgumentException("invalid knowledge item path");
}
String normalizedInput = relativePath.replace("\\", PATH_SEPARATOR).trim();
Path root = resolveUploadRootPath();
java.util.LinkedHashSet<Path> candidates = new java.util.LinkedHashSet<>();
Path inputPath = Paths.get(normalizedInput.replace(PATH_SEPARATOR, java.io.File.separator));
if (inputPath.isAbsolute()) {
Path normalizedAbsolute = inputPath.toAbsolutePath().normalize();
if (normalizedAbsolute.startsWith(root)) {
candidates.add(normalizedAbsolute);
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (candidates.isEmpty()) {
throw new IllegalArgumentException("invalid knowledge item path");
}
} else {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)) {
candidates.add(buildKnowledgeItemStoragePath(root, normalizedRelative));
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (StringUtils.isNotBlank(normalizedRelative)
&& !normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)
&& !normalizedRelative.equals(KNOWLEDGE_ITEM_UPLOAD_DIR)) {
candidates.add(buildKnowledgeItemStoragePath(root, KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR + normalizedRelative));
}
}
if (root.getFileName() != null && KNOWLEDGE_ITEM_UPLOAD_DIR.equals(root.getFileName().toString())) {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)
&& normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)) {
String withoutPrefix = normalizedRelative.substring(KNOWLEDGE_ITEM_UPLOAD_DIR.length() + PATH_SEPARATOR.length());
if (StringUtils.isNotBlank(withoutPrefix)) {
candidates.add(buildKnowledgeItemStoragePath(root, withoutPrefix));
}
}
}
Path fallback = null;
for (Path candidate : candidates) {
if (fallback == null) {
fallback = candidate;
}
if (Files.exists(candidate) && Files.isRegularFile(candidate)) {
return candidate;
}
}
if (fallback == null) {
throw new IllegalArgumentException("invalid knowledge item path");
}
return fallback;
}
private Path buildKnowledgeItemStoragePath(Path root, String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace(PATH_SEPARATOR, java.io.File.separator);
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid knowledge item path");
}
return target;
}
private String extractRelativePathFromSegment(String rawPath, String segment) {
if (StringUtils.isBlank(rawPath) || StringUtils.isBlank(segment)) {
return null;
}
String normalized = rawPath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
String segmentPrefix = segment + PATH_SEPARATOR;
int index = normalized.indexOf(segmentPrefix);
if (index < 0) {
return segment.equals(normalized) ? segment : null;
}
return normalizeRelativePathValue(normalized.substring(index));
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private void ensureParentDirectory(Path targetPath) {
try {
Path parent = targetPath.getParent();
if (parent != null) {
Files.createDirectories(parent);
}
} catch (IOException e) {
throw new IllegalStateException("创建预览目录失败", e);
}
}
private String trimError(String error) {
if (StringUtils.isBlank(error)) {
return "";
}
if (error.length() <= MAX_ERROR_LENGTH) {
return error;
}
return error.substring(0, MAX_ERROR_LENGTH);
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
}
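
LibreOfficeConverter.convertToPdf is referenced here and in DatasetFilePreviewAsyncService but is not part of this diff. A plausible minimal implementation (an assumption on our part, not the project's code) shells out to headless LibreOffice:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.TimeUnit;

// Sketch only: convert an Office document to PDF via the soffice CLI.
public final class LibreOfficeConverterSketch {
    public static void convertToPdf(Path source, Path target) throws Exception {
        Process p = new ProcessBuilder(
                "soffice", "--headless", "--convert-to", "pdf",
                "--outdir", target.getParent().toString(),
                source.toString())
                .redirectErrorStream(true)
                .start();
        if (!p.waitFor(120, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IllegalStateException("LibreOffice conversion timed out");
        }
        if (p.exitValue() != 0) {
            throw new IllegalStateException("LibreOffice conversion failed");
        }
        // soffice names the output after the source file; move it to the requested target.
        String baseName = source.getFileName().toString().replaceAll("\\.[^.]+$", "");
        Files.move(target.getParent().resolve(baseName + ".pdf"), target,
                StandardCopyOption.REPLACE_EXISTING);
    }
}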


@@ -0,0 +1,134 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.commons.lang3.StringUtils;
/**
* Helper for reading and writing knowledge item preview metadata
*/
public final class KnowledgeItemPreviewMetadataHelper {
public static final String PREVIEW_STATUS_KEY = "previewStatus";
public static final String PREVIEW_PDF_PATH_KEY = "previewPdfPath";
public static final String PREVIEW_ERROR_KEY = "previewError";
public static final String PREVIEW_UPDATED_AT_KEY = "previewUpdatedAt";
private KnowledgeItemPreviewMetadataHelper() {
}
public static PreviewInfo readPreviewInfo(String metadata, ObjectMapper objectMapper) {
if (StringUtils.isBlank(metadata) || objectMapper == null) {
return PreviewInfo.empty();
}
try {
JsonNode node = objectMapper.readTree(metadata);
if (node == null || !node.isObject()) {
return PreviewInfo.empty();
}
String statusText = textValue(node, PREVIEW_STATUS_KEY);
KnowledgeItemPreviewStatus status = parseStatus(statusText);
return new PreviewInfo(
status,
textValue(node, PREVIEW_PDF_PATH_KEY),
textValue(node, PREVIEW_ERROR_KEY),
textValue(node, PREVIEW_UPDATED_AT_KEY)
);
} catch (Exception ignore) {
return PreviewInfo.empty();
}
}
public static String applyPreviewInfo(
String metadata,
ObjectMapper objectMapper,
KnowledgeItemPreviewStatus status,
String pdfPath,
String error,
String updatedAt
) {
if (objectMapper == null) {
return metadata;
}
ObjectNode root = parseRoot(metadata, objectMapper);
if (status == null) {
root.remove(PREVIEW_STATUS_KEY);
} else {
root.put(PREVIEW_STATUS_KEY, status.name());
}
if (StringUtils.isBlank(pdfPath)) {
root.remove(PREVIEW_PDF_PATH_KEY);
} else {
root.put(PREVIEW_PDF_PATH_KEY, pdfPath);
}
if (StringUtils.isBlank(error)) {
root.remove(PREVIEW_ERROR_KEY);
} else {
root.put(PREVIEW_ERROR_KEY, error);
}
if (StringUtils.isBlank(updatedAt)) {
root.remove(PREVIEW_UPDATED_AT_KEY);
} else {
root.put(PREVIEW_UPDATED_AT_KEY, updatedAt);
}
return root.size() == 0 ? null : root.toString();
}
public static String clearPreviewInfo(String metadata, ObjectMapper objectMapper) {
if (objectMapper == null) {
return metadata;
}
ObjectNode root = parseRoot(metadata, objectMapper);
root.remove(PREVIEW_STATUS_KEY);
root.remove(PREVIEW_PDF_PATH_KEY);
root.remove(PREVIEW_ERROR_KEY);
root.remove(PREVIEW_UPDATED_AT_KEY);
return root.size() == 0 ? null : root.toString();
}
private static ObjectNode parseRoot(String metadata, ObjectMapper objectMapper) {
if (StringUtils.isBlank(metadata)) {
return objectMapper.createObjectNode();
}
try {
JsonNode node = objectMapper.readTree(metadata);
if (node instanceof ObjectNode objectNode) {
return objectNode;
}
} catch (Exception ignore) {
return objectMapper.createObjectNode();
}
return objectMapper.createObjectNode();
}
private static String textValue(JsonNode node, String key) {
if (node == null || StringUtils.isBlank(key)) {
return null;
}
JsonNode value = node.get(key);
return value == null || value.isNull() ? null : value.asText();
}
private static KnowledgeItemPreviewStatus parseStatus(String statusText) {
if (StringUtils.isBlank(statusText)) {
return null;
}
try {
return KnowledgeItemPreviewStatus.valueOf(statusText);
} catch (Exception ignore) {
return null;
}
}
public record PreviewInfo(
KnowledgeItemPreviewStatus status,
String pdfPath,
String error,
String updatedAt
) {
public static PreviewInfo empty() {
return new PreviewInfo(null, null, null, null);
}
}
}
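Preview state is carried inside the item's free-form metadata JSON rather than in dedicated columns, so the contract is easiest to see as a round trip. A minimal sketch using only the helper API defined above (the sample metadata value and the literal timestamp are illustrative):

ObjectMapper mapper = new ObjectMapper();
// Merge preview fields into whatever metadata already exists; unrelated keys survive.
String metadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
        "{\"owner\":\"alice\"}", mapper,
        KnowledgeItemPreviewStatus.PROCESSING,
        "knowledge-items/set-1/preview/item-1.pdf", null, "2026-02-04T15:00:00");
// Reading tolerates malformed JSON (returns PreviewInfo.empty()) and unknown
// status text (status() comes back null).
KnowledgeItemPreviewMetadataHelper.PreviewInfo info =
        KnowledgeItemPreviewMetadataHelper.readPreviewInfo(metadata, mapper);
// info.status() == PROCESSING, and "owner" is still present in the stored metadata string.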

View File

@@ -0,0 +1,244 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeContentType;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.common.enums.KnowledgeSourceType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPreviewStatusResponse;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Set;
/**
* Service that converts knowledge item files into PDF previews
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class KnowledgeItemPreviewService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String PREVIEW_SUB_DIR = "preview";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final KnowledgeItemRepository knowledgeItemRepository;
private final DataManagementProperties dataManagementProperties;
private final KnowledgeItemPreviewAsyncService knowledgeItemPreviewAsyncService;
private final ObjectMapper objectMapper = new ObjectMapper();
public KnowledgeItemPreviewStatusResponse getPreviewStatus(String setId, String itemId) {
KnowledgeItem item = requireKnowledgeItem(setId, itemId);
assertOfficeDocument(item);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && !previewPdfExists(item, previewInfo)) {
previewInfo = markPreviewFailed(item, previewInfo, "预览文件不存在");
}
return buildResponse(previewInfo);
}
public KnowledgeItemPreviewStatusResponse ensurePreview(String setId, String itemId) {
KnowledgeItem item = requireKnowledgeItem(setId, itemId);
assertOfficeDocument(item);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && previewPdfExists(item, previewInfo)) {
return buildResponse(previewInfo);
}
if (previewInfo.status() == KnowledgeItemPreviewStatus.PROCESSING) {
return buildResponse(previewInfo);
}
String previewRelativePath = resolvePreviewRelativePath(item.getSetId(), item.getId());
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.PROCESSING,
previewRelativePath,
null,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
knowledgeItemPreviewAsyncService.convertPreviewAsync(item.getId());
KnowledgeItemPreviewMetadataHelper.PreviewInfo refreshed = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(updatedMetadata, objectMapper);
return buildResponse(refreshed);
}
public boolean isOfficeDocument(String fileName) {
String extension = resolveFileExtension(fileName);
return StringUtils.isNotBlank(extension) && OFFICE_EXTENSIONS.contains(extension.toLowerCase());
}
public PreviewFile resolveReadyPreviewFile(String setId, KnowledgeItem item) {
if (item == null) {
return null;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() != KnowledgeItemPreviewStatus.READY) {
return null;
}
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(setId, item.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
markPreviewFailed(item, previewInfo, "预览文件不存在");
return null;
}
String previewName = resolvePreviewPdfName(item);
return new PreviewFile(filePath, previewName);
}
public String clearPreviewMetadata(String metadata) {
return KnowledgeItemPreviewMetadataHelper.clearPreviewInfo(metadata, objectMapper);
}
public void deletePreviewFileQuietly(String setId, String itemId) {
String relativePath = resolvePreviewRelativePath(setId, itemId);
Path filePath = resolvePreviewStoragePath(relativePath);
try {
Files.deleteIfExists(filePath);
} catch (Exception e) {
log.warn("delete preview pdf error, itemId: {}", itemId, e);
}
}
private KnowledgeItemPreviewStatusResponse buildResponse(KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
KnowledgeItemPreviewStatusResponse response = new KnowledgeItemPreviewStatusResponse();
KnowledgeItemPreviewStatus status = previewInfo.status() == null
? KnowledgeItemPreviewStatus.PENDING
: previewInfo.status();
response.setStatus(status);
response.setPreviewError(previewInfo.error());
response.setUpdatedAt(previewInfo.updatedAt());
return response;
}
private KnowledgeItem requireKnowledgeItem(String setId, String itemId) {
BusinessAssert.isTrue(StringUtils.isNotBlank(setId), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(StringUtils.isNotBlank(itemId), CommonErrorCode.PARAM_ERROR);
KnowledgeItem knowledgeItem = knowledgeItemRepository.getById(itemId);
BusinessAssert.notNull(knowledgeItem, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(Objects.equals(knowledgeItem.getSetId(), setId), CommonErrorCode.PARAM_ERROR);
return knowledgeItem;
}
private void assertOfficeDocument(KnowledgeItem item) {
BusinessAssert.notNull(item, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(
item.getContentType() == KnowledgeContentType.FILE || item.getSourceType() == KnowledgeSourceType.FILE_UPLOAD,
CommonErrorCode.PARAM_ERROR
);
String extension = resolveFileExtension(resolveOriginalName(item));
BusinessAssert.isTrue(OFFICE_EXTENSIONS.contains(extension), CommonErrorCode.PARAM_ERROR);
}
private String resolveOriginalName(KnowledgeItem item) {
if (item == null) {
return "";
}
if (StringUtils.isNotBlank(item.getSourceFileId())) {
return item.getSourceFileId();
}
if (StringUtils.isNotBlank(item.getContent())) {
return Paths.get(item.getContent()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewPdfName(KnowledgeItem item) {
String originalName = resolveOriginalName(item);
if (StringUtils.isBlank(originalName)) {
return "预览.pdf";
}
int dotIndex = originalName.lastIndexOf('.');
if (dotIndex <= 0) {
return originalName + PREVIEW_FILE_SUFFIX;
}
return originalName.substring(0, dotIndex) + PREVIEW_FILE_SUFFIX;
}
private boolean previewPdfExists(KnowledgeItem item, KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(item.getSetId(), item.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
return Files.exists(filePath) && Files.isRegularFile(filePath);
}
private KnowledgeItemPreviewMetadataHelper.PreviewInfo markPreviewFailed(
KnowledgeItem item,
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo,
String error
) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(item.getSetId(), item.getId()));
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.FAILED,
relativePath,
error,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
return KnowledgeItemPreviewMetadataHelper.readPreviewInfo(updatedMetadata, objectMapper);
}
private String resolvePreviewRelativePath(String setId, String itemId) {
String relativePath = Paths.get(KNOWLEDGE_ITEM_UPLOAD_DIR, setId, PREVIEW_SUB_DIR, itemId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
BusinessAssert.isTrue(StringUtils.isNotBlank(uploadDir), CommonErrorCode.PARAM_ERROR);
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
public record PreviewFile(Path filePath, String fileName) {
}
}
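KnowledgeItemPreviewAsyncService is wired in above but its file is not part of this view. For orientation only, here is a rough sketch of what convertPreviewAsync plausibly does: resolve source and target under the upload root, run LibreOfficeConverter, then flip the metadata to READY or FAILED. Every detail below (annotations, path resolution, field names) is an assumption, not the committed implementation:

// Hypothetical sketch; the actual service is not shown in this diff.
@Service
@Slf4j
@RequiredArgsConstructor
public class KnowledgeItemPreviewAsyncService {
    private final KnowledgeItemRepository knowledgeItemRepository;
    private final DataManagementProperties dataManagementProperties;
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Async // assumes @EnableAsync somewhere in the application config
    public void convertPreviewAsync(String itemId) {
        KnowledgeItem item = knowledgeItemRepository.getById(itemId);
        if (item == null) {
            return;
        }
        KnowledgeItemPreviewMetadataHelper.PreviewInfo info = KnowledgeItemPreviewMetadataHelper
                .readPreviewInfo(item.getMetadata(), objectMapper);
        Path root = Paths.get(dataManagementProperties.getFileStorage().getUploadDir())
                .toAbsolutePath().normalize();
        String updatedMetadata;
        try {
            Path source = root.resolve(item.getContent()).normalize(); // assumed source location
            Path target = root.resolve(info.pdfPath()).normalize();    // path written by ensurePreview
            Files.createDirectories(target.getParent());
            LibreOfficeConverter.convertToPdf(source, target);
            updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
                    item.getMetadata(), objectMapper, KnowledgeItemPreviewStatus.READY,
                    info.pdfPath(), null,
                    LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME));
        } catch (Exception e) {
            log.warn("preview convert failed, itemId: {}", itemId, e);
            updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
                    item.getMetadata(), objectMapper, KnowledgeItemPreviewStatus.FAILED,
                    info.pdfPath(), e.getMessage(),
                    LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME));
        }
        item.setMetadata(updatedMetadata);
        knowledgeItemRepository.updateById(item);
    }
}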

View File

@@ -0,0 +1,93 @@
package com.datamate.datamanagement.application;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;
/**
* LibreOffice document conversion utility
*/
public final class LibreOfficeConverter {
private static final String LIBREOFFICE_COMMAND = "soffice";
private static final Duration CONVERT_TIMEOUT = Duration.ofMinutes(5);
private static final int MAX_OUTPUT_LENGTH = 500;
private LibreOfficeConverter() {
}
public static void convertToPdf(Path sourcePath, Path targetPath) throws Exception {
Path outputDir = targetPath.getParent();
List<String> command = List.of(
LIBREOFFICE_COMMAND,
"--headless",
"--nologo",
"--nolockcheck",
"--nodefault",
"--nofirststartwizard",
"--convert-to",
"pdf",
"--outdir",
outputDir.toString(),
sourcePath.toString()
);
ProcessBuilder processBuilder = new ProcessBuilder(command);
processBuilder.redirectErrorStream(true);
Process process = processBuilder.start();
boolean finished = process.waitFor(CONVERT_TIMEOUT.toMillis(), TimeUnit.MILLISECONDS);
if (!finished) {
// Destroy before touching the output stream: reading stdout of a still-running,
// hung process would block forever and the timeout above would never take effect.
process.destroyForcibly();
throw new IllegalStateException("LibreOffice 转换超时");
}
// The process has exited, so reading its merged output is guaranteed to reach EOF.
String output = readProcessOutput(process.getInputStream());
if (process.exitValue() != 0) {
throw new IllegalStateException("LibreOffice 转换失败: " + output);
}
Path generated = outputDir.resolve(stripExtension(sourcePath.getFileName().toString()) + ".pdf");
if (!Files.exists(generated)) {
throw new IllegalStateException("LibreOffice 输出文件不存在");
}
if (!generated.equals(targetPath)) {
Files.move(generated, targetPath, StandardCopyOption.REPLACE_EXISTING);
}
}
private static String readProcessOutput(InputStream inputStream) throws IOException {
if (inputStream == null) {
return "";
}
byte[] buffer = new byte[1024];
StringBuilder builder = new StringBuilder();
int total = 0;
int read;
while ((read = inputStream.read(buffer)) >= 0) {
if (read == 0) {
continue;
}
int remaining = MAX_OUTPUT_LENGTH - total;
if (remaining <= 0) {
break;
}
int toAppend = Math.min(remaining, read);
builder.append(new String(buffer, 0, toAppend, StandardCharsets.UTF_8));
total += toAppend;
if (total >= MAX_OUTPUT_LENGTH) {
break;
}
}
return builder.toString();
}
private static String stripExtension(String fileName) {
if (fileName == null || fileName.isBlank()) {
return "preview";
}
int dotIndex = fileName.lastIndexOf('.');
return dotIndex <= 0 ? fileName : fileName.substring(0, dotIndex);
}
}
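Because the converter just shells out, soffice must be installed and on the service's PATH. A one-off usage sketch (paths are illustrative), roughly equivalent to the command in the comment:

// soffice --headless --convert-to pdf --outdir /tmp/preview /tmp/report.docx
Path source = Paths.get("/tmp/report.docx");
Path target = Paths.get("/tmp/preview/report.pdf");
Files.createDirectories(target.getParent());
LibreOfficeConverter.convertToPdf(source, target); // throws on timeout or non-zero exit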

View File

@@ -0,0 +1,11 @@
package com.datamate.datamanagement.common.enums;
/**
* Conversion status of a knowledge item preview
*/
public enum KnowledgeItemPreviewStatus {
PENDING,
PROCESSING,
READY,
FAILED
}

View File

@@ -38,4 +38,12 @@ public class KnowledgeItem extends BaseEntity<String> {
* Source file ID
*/
private String sourceFileId;
/**
* Relative path (used for directory display)
*/
private String relativePath;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -0,0 +1,29 @@
package com.datamate.datamanagement.domain.model.knowledge;
import com.baomidou.mybatisplus.annotation.TableName;
import com.datamate.common.domain.model.base.BaseEntity;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item directory entity (aligned with database table t_dm_knowledge_item_directories)
*/
@Getter
@Setter
@TableName(value = "t_dm_knowledge_item_directories", autoResultMap = true)
public class KnowledgeItemDirectory extends BaseEntity<String> {
/**
* Owning knowledge set ID
*/
private String setId;
/**
* Directory name
*/
private String name;
/**
* Directory relative path
*/
private String relativePath;
}

View File

@@ -2,6 +2,7 @@ package com.datamate.datamanagement.infrastructure.persistence.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.session.RowBounds;
@@ -17,6 +18,7 @@ public interface DatasetFileMapper extends BaseMapper<DatasetFile> {
Long countByDatasetId(@Param("datasetId") String datasetId);
Long countCompletedByDatasetId(@Param("datasetId") String datasetId);
Long sumSizeByDatasetId(@Param("datasetId") String datasetId);
Long countNonDerivedByDatasetId(@Param("datasetId") String datasetId);
DatasetFile findByDatasetIdAndFileName(@Param("datasetId") String datasetId, @Param("fileName") String fileName);
List<DatasetFile> findAllByDatasetId(@Param("datasetId") String datasetId);
List<DatasetFile> findByCriteria(@Param("datasetId") String datasetId,
@@ -38,4 +40,12 @@ public interface DatasetFileMapper extends BaseMapper<DatasetFile> {
* @return list of source file IDs
*/
List<String> findSourceFileIdsWithDerivedFiles(@Param("datasetId") String datasetId);
/**
* Batch-count files per dataset, excluding derived files
*
* @param datasetIds list of dataset IDs
* @return list of file count statistics
*/
List<DatasetFileCount> countNonDerivedByDatasetIds(@Param("datasetIds") List<String> datasetIds);
}

View File

@@ -0,0 +1,9 @@
package com.datamate.datamanagement.infrastructure.persistence.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import org.apache.ibatis.annotations.Mapper;
@Mapper
public interface KnowledgeItemDirectoryMapper extends BaseMapper<KnowledgeItemDirectory> {
}

View File

@@ -28,13 +28,16 @@ public interface KnowledgeItemMapper extends BaseMapper<KnowledgeItem> {
WHEN ki.source_type = 'FILE_UPLOAD' THEN ki.content
ELSE NULL
END AS content,
ki.relative_path AS relativePath,
ki.created_at AS createdAt,
ki.updated_at AS updatedAt
FROM t_dm_knowledge_items ki
LEFT JOIN t_dm_knowledge_sets ks ON ki.set_id = ks.id
LEFT JOIN t_dm_dataset_files df ON ki.source_file_id = df.id AND ki.source_type = 'DATASET_FILE'
WHERE (ki.source_type = 'FILE_UPLOAD' AND ki.source_file_id LIKE CONCAT('%', #{keyword}, '%'))
OR (ki.source_type = 'DATASET_FILE' AND df.file_name LIKE CONCAT('%', #{keyword}, '%'))
WHERE (ki.source_type = 'FILE_UPLOAD' AND (ki.source_file_id LIKE CONCAT('%', #{keyword}, '%')
OR ki.relative_path LIKE CONCAT('%', #{keyword}, '%')))
OR (ki.source_type = 'DATASET_FILE' AND (df.file_name LIKE CONCAT('%', #{keyword}, '%')
OR ki.relative_path LIKE CONCAT('%', #{keyword}, '%')))
ORDER BY ki.created_at DESC
""")
IPage<KnowledgeItemSearchResponse> searchFileItems(IPage<?> page, @Param("keyword") String keyword);

View File

@@ -14,6 +14,7 @@ public interface TagMapper {
List<Tag> findByIdIn(@Param("ids") List<String> ids);
List<Tag> findByKeyword(@Param("keyword") String keyword);
List<Tag> findAllByOrderByUsageCountDesc();
Long countKnowledgeSetTags();
int insert(Tag tag);
int update(Tag tag);

View File

@@ -3,6 +3,7 @@ package com.datamate.datamanagement.infrastructure.persistence.repository;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import java.util.List;
@@ -15,6 +16,8 @@ import java.util.List;
public interface DatasetFileRepository extends IRepository<DatasetFile> {
Long countByDatasetId(String datasetId);
Long countNonDerivedByDatasetId(String datasetId);
Long countCompletedByDatasetId(String datasetId);
Long sumSizeByDatasetId(String datasetId);
@@ -36,4 +39,6 @@ public interface DatasetFileRepository extends IRepository<DatasetFile> {
* @return list of source file IDs
*/
List<String> findSourceFileIdsWithDerivedFiles(String datasetId);
List<DatasetFileCount> countNonDerivedByDatasetIds(List<String> datasetIds);
}

View File

@@ -0,0 +1,18 @@
package com.datamate.datamanagement.infrastructure.persistence.repository;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import java.util.List;
/**
* Repository interface for knowledge item directories
*/
public interface KnowledgeItemDirectoryRepository extends IRepository<KnowledgeItemDirectory> {
List<KnowledgeItemDirectory> findByCriteria(KnowledgeDirectoryQuery query);
KnowledgeItemDirectory findBySetIdAndPath(String setId, String relativePath);
int removeByRelativePathPrefix(String setId, String relativePath);
}

View File

@@ -26,4 +26,8 @@ public interface KnowledgeItemRepository extends IRepository<KnowledgeItem> {
IPage<KnowledgeItemSearchResponse> searchFileItems(IPage<?> page, String keyword);
Long sumDatasetFileSize();
boolean existsBySetIdAndRelativePath(String setId, String relativePath);
int removeByRelativePathPrefix(String setId, String relativePath);
}

View File

@@ -0,0 +1,18 @@
package com.datamate.datamanagement.infrastructure.persistence.repository.dto;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
/**
* Dataset file count statistics result
*/
@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
public class DatasetFileCount {
private String datasetId;
private Long fileCount;
}

View File

@@ -6,6 +6,7 @@ import com.baomidou.mybatisplus.extension.repository.CrudRepository;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.mapper.DatasetFileMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Repository;
import org.springframework.util.StringUtils;
@@ -30,6 +31,11 @@ public class DatasetFileRepositoryImpl extends CrudRepository<DatasetFileMapper,
return datasetFileMapper.selectCount(new LambdaQueryWrapper<DatasetFile>().eq(DatasetFile::getDatasetId, datasetId));
}
@Override
public Long countNonDerivedByDatasetId(String datasetId) {
return datasetFileMapper.countNonDerivedByDatasetId(datasetId);
}
@Override
public Long countCompletedByDatasetId(String datasetId) {
return datasetFileMapper.countCompletedByDatasetId(datasetId);
@@ -71,4 +77,9 @@ public class DatasetFileRepositoryImpl extends CrudRepository<DatasetFileMapper,
// Uses a MyBatis @Select annotation, i.e. calls the mapper method directly
return datasetFileMapper.findSourceFileIdsWithDerivedFiles(datasetId);
}
@Override
public List<DatasetFileCount> countNonDerivedByDatasetIds(List<String> datasetIds) {
return datasetFileMapper.countNonDerivedByDatasetIds(datasetIds);
}
}

View File

@@ -0,0 +1,96 @@
package com.datamate.datamanagement.infrastructure.persistence.repository.impl;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.repository.CrudRepository;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.infrastructure.persistence.mapper.KnowledgeItemDirectoryMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemDirectoryRepository;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import lombok.RequiredArgsConstructor;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Repository;
import java.util.List;
/**
* Repository implementation for knowledge item directories
*/
@Repository
@RequiredArgsConstructor
public class KnowledgeItemDirectoryRepositoryImpl
extends CrudRepository<KnowledgeItemDirectoryMapper, KnowledgeItemDirectory>
implements KnowledgeItemDirectoryRepository {
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemDirectoryMapper knowledgeItemDirectoryMapper;
@Override
public List<KnowledgeItemDirectory> findByCriteria(KnowledgeDirectoryQuery query) {
String relativePath = normalizeRelativePathPrefix(query.getRelativePath());
LambdaQueryWrapper<KnowledgeItemDirectory> wrapper = new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(StringUtils.isNotBlank(query.getSetId()), KnowledgeItemDirectory::getSetId, query.getSetId())
.likeRight(StringUtils.isNotBlank(relativePath), KnowledgeItemDirectory::getRelativePath, relativePath);
if (StringUtils.isNotBlank(query.getKeyword())) {
wrapper.and(w -> w.like(KnowledgeItemDirectory::getName, query.getKeyword())
.or()
.like(KnowledgeItemDirectory::getRelativePath, query.getKeyword()));
}
wrapper.orderByAsc(KnowledgeItemDirectory::getRelativePath);
return knowledgeItemDirectoryMapper.selectList(wrapper);
}
@Override
public KnowledgeItemDirectory findBySetIdAndPath(String setId, String relativePath) {
return knowledgeItemDirectoryMapper.selectOne(new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(KnowledgeItemDirectory::getSetId, setId)
.eq(KnowledgeItemDirectory::getRelativePath, relativePath));
}
@Override
public int removeByRelativePathPrefix(String setId, String relativePath) {
String normalized = normalizeRelativePathValue(relativePath);
if (StringUtils.isBlank(normalized)) {
return 0;
}
String prefix = normalizeRelativePathPrefix(normalized);
LambdaQueryWrapper<KnowledgeItemDirectory> wrapper = new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(KnowledgeItemDirectory::getSetId, setId)
.and(w -> w.eq(KnowledgeItemDirectory::getRelativePath, normalized)
.or()
.likeRight(KnowledgeItemDirectory::getRelativePath, prefix));
return knowledgeItemDirectoryMapper.delete(wrapper);
}
private String normalizeRelativePathPrefix(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
if (!normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized + PATH_SEPARATOR;
}
return normalized;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}
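removeByRelativePathPrefix pairs an exact match with a trailing-slash prefix match; the two normalizers are what keep a sibling such as docs/guide2 from being swept up when docs/guide is removed. Illustrative values, derived from the code above:

// normalizeRelativePathValue("\\docs\\guide\\")  -> "docs/guide"
// normalizeRelativePathPrefix("docs/guide")      -> "docs/guide/"
// Resulting condition: relative_path = 'docs/guide' OR relative_path LIKE 'docs/guide/%'
// so 'docs/guide/a.txt' is deleted while 'docs/guide2/a.txt' is untouched.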

View File

@@ -21,21 +21,26 @@ import java.util.List;
@Repository
@RequiredArgsConstructor
public class KnowledgeItemRepositoryImpl extends CrudRepository<KnowledgeItemMapper, KnowledgeItem> implements KnowledgeItemRepository {
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemMapper knowledgeItemMapper;
@Override
public IPage<KnowledgeItem> findByCriteria(IPage<KnowledgeItem> page, KnowledgeItemPagingQuery query) {
String relativePath = normalizeRelativePathPrefix(query.getRelativePath());
LambdaQueryWrapper<KnowledgeItem> wrapper = new LambdaQueryWrapper<KnowledgeItem>()
.eq(StringUtils.isNotBlank(query.getSetId()), KnowledgeItem::getSetId, query.getSetId())
.eq(query.getContentType() != null, KnowledgeItem::getContentType, query.getContentType())
.eq(query.getSourceType() != null, KnowledgeItem::getSourceType, query.getSourceType())
.eq(StringUtils.isNotBlank(query.getSourceDatasetId()), KnowledgeItem::getSourceDatasetId, query.getSourceDatasetId())
.eq(StringUtils.isNotBlank(query.getSourceFileId()), KnowledgeItem::getSourceFileId, query.getSourceFileId());
.eq(StringUtils.isNotBlank(query.getSourceFileId()), KnowledgeItem::getSourceFileId, query.getSourceFileId())
.likeRight(StringUtils.isNotBlank(relativePath), KnowledgeItem::getRelativePath, relativePath);
if (StringUtils.isNotBlank(query.getKeyword())) {
wrapper.and(w -> w.like(KnowledgeItem::getSourceFileId, query.getKeyword())
.or()
.like(KnowledgeItem::getContent, query.getKeyword()));
.like(KnowledgeItem::getContent, query.getKeyword())
.or()
.like(KnowledgeItem::getRelativePath, query.getKeyword()));
}
wrapper.orderByDesc(KnowledgeItem::getCreatedAt);
@@ -77,4 +82,60 @@ public class KnowledgeItemRepositoryImpl extends CrudRepository<KnowledgeItemMap
public Long sumDatasetFileSize() {
return knowledgeItemMapper.sumDatasetFileSize();
}
@Override
public boolean existsBySetIdAndRelativePath(String setId, String relativePath) {
if (StringUtils.isBlank(setId) || StringUtils.isBlank(relativePath)) {
return false;
}
return knowledgeItemMapper.selectCount(new LambdaQueryWrapper<KnowledgeItem>()
.eq(KnowledgeItem::getSetId, setId)
.eq(KnowledgeItem::getRelativePath, relativePath)) > 0;
}
@Override
public int removeByRelativePathPrefix(String setId, String relativePath) {
String normalized = normalizeRelativePathValue(relativePath);
if (StringUtils.isBlank(setId) || StringUtils.isBlank(normalized)) {
return 0;
}
String prefix = normalizeRelativePathPrefix(normalized);
LambdaQueryWrapper<KnowledgeItem> wrapper = new LambdaQueryWrapper<KnowledgeItem>()
.eq(KnowledgeItem::getSetId, setId)
.and(w -> w.eq(KnowledgeItem::getRelativePath, normalized)
.or()
.likeRight(KnowledgeItem::getRelativePath, prefix));
return knowledgeItemMapper.delete(wrapper);
}
private String normalizeRelativePathPrefix(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
if (!normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized + PATH_SEPARATOR;
}
return normalized;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}

View File

@@ -1,9 +1,11 @@
package com.datamate.datamanagement.interfaces.converter;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeSetRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeSetResponse;
import org.mapstruct.Mapper;
@@ -31,4 +33,8 @@ public interface KnowledgeConverter {
KnowledgeItemResponse convertToResponse(KnowledgeItem knowledgeItem);
List<KnowledgeItemResponse> convertItemResponses(List<KnowledgeItem> items);
KnowledgeDirectoryResponse convertToResponse(KnowledgeItemDirectory directory);
List<KnowledgeDirectoryResponse> convertDirectoryResponses(List<KnowledgeItemDirectory> directories);
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import jakarta.validation.constraints.NotBlank;
import lombok.Getter;
import lombok.Setter;
/**
* Request to create a knowledge item directory
*/
@Getter
@Setter
public class CreateKnowledgeDirectoryRequest {
/** Parent prefix path, e.g. "docs/"; empty means the knowledge set root */
private String parentPrefix;
/** Name of the directory to create */
@NotBlank
private String directoryName;
}

View File

@@ -34,4 +34,8 @@ public class CreateKnowledgeItemRequest {
* Source file ID (used for scenarios such as annotation sync)
*/
private String sourceFileId;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -0,0 +1,16 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import lombok.Getter;
import lombok.Setter;
/**
* Dataset file preview status response
*/
@Getter
@Setter
public class DatasetFilePreviewStatusResponse {
private KnowledgeItemPreviewStatus status;
private String previewError;
private String updatedAt;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import jakarta.validation.constraints.NotEmpty;
import lombok.Getter;
import lombok.Setter;
import java.util.List;
/**
* Request to batch-delete knowledge items
*/
@Getter
@Setter
public class DeleteKnowledgeItemsRequest {
/**
* Knowledge item ID list
*/
@NotEmpty(message = "知识条目ID不能为空")
private List<String> ids;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import lombok.Getter;
import lombok.Setter;
/**
* Query parameters for knowledge item directories
*/
@Getter
@Setter
public class KnowledgeDirectoryQuery {
/** Owning knowledge set ID */
private String setId;
/** Directory relative path prefix */
private String relativePath;
/** Search keyword */
private String keyword;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import lombok.Getter;
import lombok.Setter;
import java.time.LocalDateTime;
/**
* Knowledge item directory response
*/
@Getter
@Setter
public class KnowledgeDirectoryResponse {
private String id;
private String setId;
private String name;
private String relativePath;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
}

View File

@@ -41,4 +41,8 @@ public class KnowledgeItemPagingQuery extends PagingQuery {
* Source file ID
*/
private String sourceFileId;
/**
* Relative path prefix
*/
private String relativePath;
}

View File

@@ -0,0 +1,16 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item preview status response
*/
@Getter
@Setter
public class KnowledgeItemPreviewStatusResponse {
private KnowledgeItemPreviewStatus status;
private String previewError;
private String updatedAt;
}

View File

@@ -20,6 +20,14 @@ public class KnowledgeItemResponse {
private KnowledgeSourceType sourceType;
private String sourceDatasetId;
private String sourceFileId;
/**
* Relative path (used for directory display)
*/
private String relativePath;
/**
* Extended metadata
*/
private String metadata;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
private String createdBy;

View File

@@ -23,6 +23,10 @@ public class KnowledgeItemSearchResponse {
private String sourceFileId;
private String fileName;
private Long fileSize;
/**
* Relative path (used for directory display)
*/
private String relativePath;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;

View File

@@ -12,4 +12,5 @@ public class KnowledgeManagementStatisticsResponse {
private Long totalKnowledgeSets = 0L;
private Long totalFiles = 0L;
private Long totalSize = 0L;
private Long totalTags = 0L;
}

View File

@@ -1,8 +1,10 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.DatasetStatusType;
import com.fasterxml.jackson.annotation.JsonIgnore;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import lombok.AccessLevel;
import lombok.Getter;
import lombok.Setter;
@@ -24,9 +26,18 @@ public class UpdateDatasetRequest {
/** Data collection task ID */
private String dataSource;
/** Parent dataset ID */
@Setter(AccessLevel.NONE)
private String parentDatasetId;
@JsonIgnore
@Setter(AccessLevel.NONE)
private boolean parentDatasetIdProvided;
/** Tag list */
private List<String> tags;
/** Dataset status */
private DatasetStatusType status;
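// The custom setter below records whether the JSON payload actually carried
// parentDatasetId, letting an explicit null ("detach the parent") be told apart
// from the field simply being absent in a partial update.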
public void setParentDatasetId(String parentDatasetId) {
this.parentDatasetIdProvided = true;
this.parentDatasetId = parentDatasetId;
}
}

View File

@@ -18,4 +18,8 @@ public class UpdateKnowledgeItemRequest {
* Content type
*/
private KnowledgeContentType contentType;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -17,4 +17,8 @@ public class UploadKnowledgeItemsRequest {
*/
@NotEmpty(message = "文件列表不能为空")
private List<MultipartFile> files;
/**
* Directory prefix (used for directory uploads)
*/
private String parentPrefix;
}

View File

@@ -5,20 +5,23 @@ import com.datamate.common.infrastructure.common.Response;
import com.datamate.common.infrastructure.exception.SystemErrorCode;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.common.interfaces.PagingQuery;
import com.datamate.datamanagement.application.DatasetFileApplicationService;
import com.datamate.datamanagement.application.DatasetFilePreviewService;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.AddFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.DatasetFilePreviewStatusResponse;
import com.datamate.datamanagement.interfaces.dto.DatasetFileResponse;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.validation.Valid;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.Resource;
import org.springframework.core.io.UrlResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
@@ -36,12 +39,15 @@ import java.util.List;
@RequestMapping("/data-management/datasets/{datasetId}/files")
public class DatasetFileController {
private final DatasetFileApplicationService datasetFileApplicationService;
private final DatasetFilePreviewService datasetFilePreviewService;
@Autowired
public DatasetFileController(DatasetFileApplicationService datasetFileApplicationService,
DatasetFilePreviewService datasetFilePreviewService) {
this.datasetFileApplicationService = datasetFileApplicationService;
this.datasetFilePreviewService = datasetFilePreviewService;
}
@GetMapping
public Response<PagedResponse<DatasetFile>> getDatasetFiles(
@@ -114,15 +120,28 @@ public class DatasetFileController {
}
}
@IgnoreResponseWrap
@GetMapping(value = "/{fileId}/preview", produces = MediaType.ALL_VALUE)
public ResponseEntity<Resource> previewDatasetFileById(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
try {
DatasetFile datasetFile = datasetFileApplicationService.getDatasetFile(datasetId, fileId);
if (datasetFilePreviewService.isOfficeDocument(datasetFile.getFileName())) {
DatasetFilePreviewService.PreviewFile previewFile = datasetFilePreviewService
.resolveReadyPreviewFile(datasetId, datasetFile);
if (previewFile == null) {
return ResponseEntity.status(HttpStatus.CONFLICT).build();
}
Resource previewResource = new UrlResource(previewFile.filePath().toUri());
return ResponseEntity.ok()
.contentType(MediaType.APPLICATION_PDF)
.header(HttpHeaders.CONTENT_DISPOSITION,
"inline; filename=\"" + previewFile.fileName() + "\"")
.body(previewResource);
}
Resource resource = datasetFileApplicationService.downloadFile(datasetId, fileId);
MediaType mediaType = MediaTypeFactory.getMediaType(resource)
.orElse(MediaType.APPLICATION_OCTET_STREAM);
return ResponseEntity.ok()
.contentType(mediaType)
@@ -133,8 +152,20 @@ public class DatasetFileController {
return ResponseEntity.status(HttpStatus.NOT_FOUND).build();
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build();
}
}
@GetMapping("/{fileId}/preview/status")
public DatasetFilePreviewStatusResponse getDatasetFilePreviewStatus(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
return datasetFilePreviewService.getPreviewStatus(datasetId, fileId);
}
@PostMapping("/{fileId}/preview/convert")
public DatasetFilePreviewStatusResponse convertDatasetFilePreview(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
return datasetFilePreviewService.ensurePreview(datasetId, fileId);
}
@IgnoreResponseWrap
@GetMapping(value = "/download", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)

View File

@@ -0,0 +1,43 @@
package com.datamate.datamanagement.interfaces.rest;
import com.datamate.datamanagement.application.KnowledgeDirectoryApplicationService;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryResponse;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import java.util.List;
/**
* REST controller for knowledge item directories
*/
@RestController
@RequiredArgsConstructor
@RequestMapping("/data-management/knowledge-sets/{setId}/directories")
public class KnowledgeDirectoryController {
private final KnowledgeDirectoryApplicationService knowledgeDirectoryApplicationService;
@GetMapping
public List<KnowledgeDirectoryResponse> getKnowledgeDirectories(@PathVariable("setId") String setId,
KnowledgeDirectoryQuery query) {
List<KnowledgeItemDirectory> directories = knowledgeDirectoryApplicationService.getKnowledgeDirectories(setId, query);
return KnowledgeConverter.INSTANCE.convertDirectoryResponses(directories);
}
@PostMapping
public KnowledgeDirectoryResponse createKnowledgeDirectory(@PathVariable("setId") String setId,
@RequestBody @Valid CreateKnowledgeDirectoryRequest request) {
KnowledgeItemDirectory directory = knowledgeDirectoryApplicationService.createKnowledgeDirectory(setId, request);
return KnowledgeConverter.INSTANCE.convertToResponse(directory);
}
@DeleteMapping
public void deleteKnowledgeDirectory(@PathVariable("setId") String setId,
@RequestParam("relativePath") String relativePath) {
knowledgeDirectoryApplicationService.deleteKnowledgeDirectory(setId, relativePath);
}
}

View File

@@ -3,11 +3,14 @@ package com.datamate.datamanagement.interfaces.rest;
import com.datamate.common.infrastructure.common.IgnoreResponseWrap;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.datamanagement.application.KnowledgeItemApplicationService;
import com.datamate.datamanagement.application.KnowledgeItemPreviewService;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.DeleteKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.ImportKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPreviewStatusResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
import com.datamate.datamanagement.interfaces.dto.ReplaceKnowledgeItemFileRequest;
import com.datamate.datamanagement.interfaces.dto.UpdateKnowledgeItemRequest;
@@ -30,6 +33,7 @@ import java.util.List;
@RequestMapping("/data-management/knowledge-sets/{setId}/items")
public class KnowledgeItemController {
private final KnowledgeItemApplicationService knowledgeItemApplicationService;
private final KnowledgeItemPreviewService knowledgeItemPreviewService;
@GetMapping
public PagedResponse<KnowledgeItemResponse> getKnowledgeItems(@PathVariable("setId") String setId,
@@ -80,6 +84,18 @@ public class KnowledgeItemController {
knowledgeItemApplicationService.previewKnowledgeItemFile(setId, itemId, response);
}
@GetMapping("/{itemId}/preview/status")
public KnowledgeItemPreviewStatusResponse getKnowledgeItemPreviewStatus(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
return knowledgeItemPreviewService.getPreviewStatus(setId, itemId);
}
@PostMapping("/{itemId}/preview/convert")
public KnowledgeItemPreviewStatusResponse convertKnowledgeItemPreview(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
return knowledgeItemPreviewService.ensurePreview(setId, itemId);
}
@GetMapping("/{itemId}")
public KnowledgeItemResponse getKnowledgeItemById(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
@@ -108,4 +124,10 @@ public class KnowledgeItemController {
@PathVariable("itemId") String itemId) {
knowledgeItemApplicationService.deleteKnowledgeItem(setId, itemId);
}
@PostMapping("/batch-delete")
public void deleteKnowledgeItems(@PathVariable("setId") String setId,
@RequestBody @Valid DeleteKnowledgeItemsRequest request) {
knowledgeItemApplicationService.deleteKnowledgeItems(setId, request);
}
}
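A caller of the two new preview endpoints would typically trigger the conversion once and poll the status until it leaves PENDING/PROCESSING before fetching the inline PDF. A rough java.net.http sketch, to be run inside a method that declares throws Exception (the base URL, the setId/itemId variables, and the naive string matching on the body are placeholders; the actual response shape depends on the project's global response wrapping):

HttpClient client = HttpClient.newHttpClient();
String base = "http://localhost:8080/data-management/knowledge-sets/" + setId + "/items/" + itemId;

// Kick off the conversion; ensurePreview above is a no-op while a conversion is PROCESSING.
client.send(HttpRequest.newBuilder(URI.create(base + "/preview/convert"))
        .POST(HttpRequest.BodyPublishers.noBody()).build(),
        HttpResponse.BodyHandlers.ofString());

// Poll until the reported status is READY or FAILED.
HttpResponse<String> status;
do {
    Thread.sleep(1_000);
    status = client.send(HttpRequest.newBuilder(URI.create(base + "/preview/status")).build(),
            HttpResponse.BodyHandlers.ofString());
} while (status.body().contains("PENDING") || status.body().contains("PROCESSING"));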

View File

@@ -42,6 +42,13 @@
SELECT COUNT(*) FROM t_dm_dataset_files WHERE dataset_id = #{datasetId}
</select>
<select id="countNonDerivedByDatasetId" parameterType="string" resultType="long">
SELECT COUNT(*)
FROM t_dm_dataset_files
WHERE dataset_id = #{datasetId}
AND (metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL)
</select>
<select id="countCompletedByDatasetId" parameterType="string" resultType="long">
SELECT COUNT(*) FROM t_dm_dataset_files WHERE dataset_id = #{datasetId} AND status = 'COMPLETED'
</select>
@@ -110,4 +117,16 @@
AND metadata IS NOT NULL
AND JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NOT NULL
</select>
<select id="countNonDerivedByDatasetIds" resultType="com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount">
SELECT dataset_id AS datasetId,
COUNT(*) AS fileCount
FROM t_dm_dataset_files
WHERE dataset_id IN
<foreach collection="datasetIds" item="datasetId" open="(" separator="," close=")">
#{datasetId}
</foreach>
AND (metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL)
GROUP BY dataset_id
</select>
</mapper>
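Note that the GROUP BY means datasets with zero matching files simply produce no row, so callers should default missing entries to 0 when folding the result into a map. A caller-side sketch (the datasetFileRepository handle and the java.util.stream.Collectors import are assumed):

Map<String, Long> countByDataset = datasetFileRepository
        .countNonDerivedByDatasetIds(datasetIds).stream()
        .collect(Collectors.toMap(DatasetFileCount::getDatasetId, DatasetFileCount::getFileCount));
long files = countByDataset.getOrDefault(someDatasetId, 0L); // absent row means no non-derived files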

View File

@@ -145,9 +145,10 @@
<select id="getAllDatasetStatistics" resultType="com.datamate.datamanagement.interfaces.dto.AllDatasetStatisticsResponse">
SELECT
COUNT(*) AS total_datasets,
SUM(size_bytes) AS total_size,
SUM(file_count) AS total_files
FROM t_dm_datasets;
(SELECT COUNT(*) FROM t_dm_datasets) AS total_datasets,
(SELECT COALESCE(SUM(size_bytes), 0) FROM t_dm_datasets) AS total_size,
(SELECT COUNT(*)
FROM t_dm_dataset_files
WHERE metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL) AS total_files
</select>
</mapper>

View File

@@ -53,6 +53,19 @@
ORDER BY usage_count DESC, name ASC
</select>
<select id="countKnowledgeSetTags" resultType="long">
SELECT COUNT(DISTINCT t.id)
FROM t_dm_tags t
WHERE EXISTS (
SELECT 1
FROM t_dm_knowledge_sets ks
WHERE ks.tags IS NOT NULL
AND JSON_VALID(ks.tags) = 1
AND JSON_LENGTH(ks.tags) > 0
AND JSON_SEARCH(ks.tags, 'one', t.name, NULL, '$[*].name') IS NOT NULL
)
</select>
<insert id="insert" parameterType="com.datamate.datamanagement.domain.model.dataset.Tag">
INSERT INTO t_dm_tags (id, name, description, category, color, usage_count)
VALUES (#{id}, #{name}, #{description}, #{category}, #{color}, #{usageCount})

View File

@@ -21,7 +21,7 @@ import java.util.UUID;
*/
@Component
public class FileService {
private static final int DEFAULT_TIMEOUT = 120;
private static final int DEFAULT_TIMEOUT = 1800;
private final ChunkUploadRequestMapper chunkUploadRequestMapper;

View File

@@ -5,7 +5,7 @@ server {
access_log /var/log/datamate/frontend/access.log main;
error_log /var/log/datamate/frontend/error.log notice;
client_max_body_size 1024M;
client_max_body_size 0;
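# 0 turns off client body size checking entirely, so very large dataset uploads are not rejected by nginx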
add_header Set-Cookie "NEXT_LOCALE=zh";

View File

@@ -11,6 +11,7 @@ services:
- log_volume:/var/log/datamate
- operator-upload-volume:/operators/upload
- operator-runtime-volume:/operators/extract
- uploads_volume:/uploads
networks: [ datamate ]
depends_on:
- datamate-database
@@ -154,6 +155,8 @@ services:
profiles: [ data-juicer ]
volumes:
uploads_volume:
name: datamate-uploads-volume
dataset_volume:
name: datamate-dataset-volume
flow_volume:

View File

@@ -268,6 +268,17 @@
return true;
}
function isSaveShortcut(event) {
if (!event || event.defaultPrevented || event.isComposing) return false;
const key = event.key;
const code = event.code;
const isS = key === "s" || key === "S" || code === "KeyS";
if (!isS) return false;
if (!(event.ctrlKey || event.metaKey)) return false;
if (event.shiftKey || event.altKey) return false;
return true;
}
function handleSaveAndNextShortcut(event) {
if (!isSaveAndNextShortcut(event) || event.repeat) return;
event.preventDefault();
@@ -280,6 +291,18 @@
}
}
function handleSaveShortcut(event) {
if (!isSaveShortcut(event) || event.repeat) return;
event.preventDefault();
event.stopPropagation();
try {
const raw = exportSelectedAnnotation();
postToParent("LS_EXPORT_RESULT", raw);
} catch (e) {
postToParent("LS_ERROR", { message: e?.message || String(e) });
}
}
function initLabelStudio(payload) {
if (!window.LabelStudio) {
throw new Error("LabelStudio 未加载(请检查静态资源/网络)");
@@ -351,6 +374,7 @@
}
window.addEventListener("keydown", handleSaveAndNextShortcut);
window.addEventListener("keydown", handleSaveShortcut);
window.addEventListener("message", (event) => {
if (event.origin !== ORIGIN) return;

View File

@@ -1,17 +1,17 @@
import { Button, Input, Popover, theme, Tag, Empty } from "antd";
import { PlusOutlined } from "@ant-design/icons";
import { useEffect, useMemo, useState } from "react";
import { useCallback, useEffect, useMemo, useState } from "react";
interface Tag {
id: number;
id?: string | number;
name: string;
color: string;
color?: string;
}
interface AddTagPopoverProps {
tags: Tag[];
onFetchTags?: () => Promise<Tag[]>;
onAddTag?: (tag: Tag) => void;
onAddTag?: (tagName: string) => void;
onCreateAndTag?: (tagName: string) => void;
}
@@ -27,20 +27,23 @@ export default function AddTagPopover({
const [newTag, setNewTag] = useState("");
const [allTags, setAllTags] = useState<Tag[]>([]);
const tagsSet = useMemo(() => new Set(tags.map((tag) => tag.id)), [tags]);
const tagsSet = useMemo(
() => new Set(tags.map((tag) => (tag.id ?? tag.name))),
[tags]
);
const fetchTags = async () => {
const fetchTags = useCallback(async () => {
if (onFetchTags && showPopover) {
const data = await onFetchTags?.();
setAllTags(data || []);
}
};
}, [onFetchTags, showPopover]);
useEffect(() => {
fetchTags();
}, [showPopover]);
}, [fetchTags]);
const availableTags = useMemo(() => {
return allTags.filter((tag) => !tagsSet.has(tag.id));
return allTags.filter((tag) => !tagsSet.has(tag.id ?? tag.name));
}, [allTags, tagsSet]);
const handleCreateAndAddTag = () => {

View File

@@ -22,44 +22,51 @@ interface OperationItem {
danger?: boolean;
}
interface TagConfig {
showAdd: boolean;
tags: { id: number; name: string; color: string }[];
onFetchTags?: () => Promise<{
data: { id: number; name: string; color: string }[];
}>;
onAddTag?: (tag: { id: number; name: string; color: string }) => void;
onCreateAndTag?: (tagName: string) => void;
}
interface DetailHeaderProps<T> {
data: T;
statistics: StatisticItem[];
operations: OperationItem[];
tagConfig?: TagConfig;
}
function DetailHeader<T>({
data = {} as T,
statistics,
operations,
tagConfig,
}: DetailHeaderProps<T>): React.ReactNode {
interface TagConfig {
showAdd: boolean;
tags: { id?: string | number; name: string; color?: string }[];
onFetchTags?: () => Promise<{ id?: string | number; name: string; color?: string }[]>;
onAddTag?: (tagName: string) => void;
onCreateAndTag?: (tagName: string) => void;
}
interface DetailHeaderData {
name?: string;
description?: string;
status?: { color?: string; icon?: React.ReactNode; label?: string };
tags?: { id?: string | number; name?: string }[];
icon?: React.ReactNode;
iconColor?: string;
}
interface DetailHeaderProps<T extends DetailHeaderData> {
data: T;
statistics: StatisticItem[];
operations: OperationItem[];
tagConfig?: TagConfig;
}
function DetailHeader<T extends DetailHeaderData>({
data = {} as T,
statistics,
operations,
tagConfig,
}: DetailHeaderProps<T>): React.ReactNode {
return (
<Card>
<div className="flex items-start justify-between">
<div className="flex items-start gap-4 flex-1">
<div
className={`w-16 h-16 text-white rounded-lg flex-center shadow-lg ${
(data as any)?.iconColor
? ""
: "bg-gradient-to-br from-sky-300 to-blue-500 text-white"
}`}
style={(data as any)?.iconColor ? { backgroundColor: (data as any).iconColor } : undefined}
>
{<div className="w-[2.8rem] h-[2.8rem] text-gray-50">{(data as any)?.icon}</div> || (
<Database className="w-8 h-8 text-white" />
)}
</div>
data?.iconColor
? ""
: "bg-gradient-to-br from-sky-300 to-blue-500 text-white"
}`}
style={data?.iconColor ? { backgroundColor: data.iconColor } : undefined}
>
{data?.icon ? (
<div className="w-[2.8rem] h-[2.8rem] text-gray-50">{data.icon}</div>
) : (
<Database className="w-8 h-8 text-white" />
)}
</div>
<div className="flex-1">
<div className="flex items-center gap-3 mb-2">
<h1 className="text-lg font-bold text-gray-900">{data?.name}</h1>

View File

@@ -0,0 +1,21 @@
import React from 'react';
import { Navigate, useLocation, Outlet } from 'react-router';
import { useAppSelector } from '@/store/hooks';
interface ProtectedRouteProps {
children?: React.ReactNode;
}
const ProtectedRoute: React.FC<ProtectedRouteProps> = ({ children }) => {
const { isAuthenticated } = useAppSelector((state) => state.auth);
const location = useLocation();
if (!isAuthenticated) {
// Redirect to the login page, but save the current location they were trying to go to
return <Navigate to="/login" state={{ from: location }} replace />;
}
return children ? <>{children}</> : <Outlet />;
};
export default ProtectedRoute;

View File

@@ -1,198 +1,384 @@
import { TaskItem } from "@/pages/DataManagement/dataset.model";
import { calculateSHA256, checkIsFilesExist } from "@/utils/file.util";
import { App } from "antd";
import { useRef, useState } from "react";
export function useFileSliceUpload(
{
preUpload,
uploadChunk,
cancelUpload,
}: {
preUpload: (id: string, params: any) => Promise<{ data: number }>;
uploadChunk: (id: string, formData: FormData, config: any) => Promise<any>;
cancelUpload: ((reqId: number) => Promise<any>) | null;
},
showTaskCenter = true // whether to show the task center while uploading
) {
const { message } = App.useApp();
const [taskList, setTaskList] = useState<TaskItem[]>([]);
const taskListRef = useRef<TaskItem[]>([]); // keeps the task order stable
const createTask = (detail: any = {}) => {
const { dataset } = detail;
const title = `上传数据集: ${dataset.name} `;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent: detail.updateEvent,
hasArchive: detail.hasArchive,
prefix: detail.prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
return task;
};
const updateTaskList = (task: TaskItem) => {
taskListRef.current = taskListRef.current.map((item) =>
item.key === task.key ? task : item
);
setTaskList(taskListRef.current);
};
const removeTask = (task: TaskItem) => {
const { key } = task;
taskListRef.current = taskListRef.current.filter(
(item) => item.key !== key
);
setTaskList(taskListRef.current);
if (task.isCancel && task.cancelFn) {
task.cancelFn();
}
if (task.updateEvent) {
// carry the prefix so the refreshed view stays in the current directory
window.dispatchEvent(
new CustomEvent(task.updateEvent, {
detail: { prefix: (task as any).prefix },
})
);
}
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: false } })
);
}
};
async function buildFormData({ file, reqId, i, j }) {
const formData = new FormData();
const { slices, name, size } = file;
const checkSum = await calculateSHA256(slices[j]);
formData.append("file", slices[j]);
formData.append("reqId", reqId.toString());
formData.append("fileNo", (i + 1).toString());
formData.append("chunkNo", (j + 1).toString());
formData.append("fileName", name);
formData.append("fileSize", size.toString());
formData.append("totalChunkNum", slices.length.toString());
formData.append("checkSumHex", checkSum);
return formData;
}
async function uploadSlice(task: TaskItem, fileInfo) {
if (!task) {
return;
}
const { reqId, key } = task;
const { loaded, i, j, files, totalSize } = fileInfo;
const formData = await buildFormData({
file: files[i],
i,
j,
reqId,
});
let newTask = { ...task };
await uploadChunk(key, formData, {
onUploadProgress: (e) => {
const loadedSize = loaded + e.loaded;
const curPercent = Number((loadedSize / totalSize) * 100).toFixed(2);
newTask = {
...newTask,
...taskListRef.current.find((item) => item.key === key),
size: loadedSize,
percent: curPercent >= 100 ? 99.99 : curPercent,
};
updateTaskList(newTask);
},
});
}
async function uploadFile({ task, files, totalSize }) {
console.log('[useSliceUpload] Calling preUpload with prefix:', task.prefix);
const { data: reqId } = await preUpload(task.key, {
totalFileNum: files.length,
totalSize,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log('[useSliceUpload] PreUpload response reqId:', reqId);
const newTask: TaskItem = {
...task,
reqId,
isCancel: false,
cancelFn: () => {
task.controller.abort();
cancelUpload?.(reqId);
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
},
};
updateTaskList(newTask);
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
// notify listeners to refresh the data state
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
let loaded = 0;
for (let i = 0; i < files.length; i++) {
const { slices } = files[i];
for (let j = 0; j < slices.length; j++) {
await uploadSlice(newTask, {
loaded,
i,
j,
files,
totalSize,
});
loaded += slices[j].size;
}
}
removeTask(newTask);
}
const handleUpload = async ({ task, files }) => {
const isErrorFile = await checkIsFilesExist(files);
if (isErrorFile) {
message.error("文件被修改或删除,请重新选择文件上传");
removeTask({
...task,
isCancel: false,
...taskListRef.current.find((item) => item.key === task.key),
});
return;
}
try {
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
await uploadFile({ task, files, totalSize });
} catch (err) {
console.error(err);
message.error("文件上传失败,请稍后重试");
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
return {
taskList,
createTask,
removeTask,
handleUpload,
};
}
import { TaskItem } from "@/pages/DataManagement/dataset.model";
import { calculateSHA256, checkIsFilesExist, streamSplitAndUpload, StreamUploadResult } from "@/utils/file.util";
import { App } from "antd";
import { useRef, useState } from "react";
export function useFileSliceUpload(
{
preUpload,
uploadChunk,
cancelUpload,
}: {
preUpload: (id: string, params: Record<string, unknown>) => Promise<{ data: number }>;
uploadChunk: (id: string, formData: FormData, config: Record<string, unknown>) => Promise<unknown>;
cancelUpload: ((reqId: number) => Promise<unknown>) | null;
},
showTaskCenter = true, // whether to show the task center while uploading
enableStreamUpload = true // whether streaming split-upload is enabled
) {
const { message } = App.useApp();
const [taskList, setTaskList] = useState<TaskItem[]>([]);
const taskListRef = useRef<TaskItem[]>([]); // ref keeps the task order stable across renders
const createTask = (detail: Record<string, unknown> = {}) => {
const { dataset } = detail;
const title = `上传数据集: ${dataset.name} `;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent: detail.updateEvent,
hasArchive: detail.hasArchive,
prefix: detail.prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
// show the task center immediately so the user can see the upload has started
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
return task;
};
const updateTaskList = (task: TaskItem) => {
taskListRef.current = taskListRef.current.map((item) =>
item.key === task.key ? task : item
);
setTaskList(taskListRef.current);
};
const removeTask = (task: TaskItem) => {
const { key } = task;
taskListRef.current = taskListRef.current.filter(
(item) => item.key !== key
);
setTaskList(taskListRef.current);
if (task.isCancel && task.cancelFn) {
task.cancelFn();
}
if (task.updateEvent) {
// carry the prefix so the refreshed view stays in the current directory
window.dispatchEvent(
new CustomEvent(task.updateEvent, {
detail: { prefix: task.prefix },
})
);
}
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: false } })
);
}
};
async function buildFormData({ file, reqId, i, j }: { file: { slices: Blob[]; name: string; size: number }; reqId: number; i: number; j: number }) {
const formData = new FormData();
const { slices, name, size } = file;
const checkSum = await calculateSHA256(slices[j]);
formData.append("file", slices[j]);
formData.append("reqId", reqId.toString());
formData.append("fileNo", (i + 1).toString());
formData.append("chunkNo", (j + 1).toString());
formData.append("fileName", name);
formData.append("fileSize", size.toString());
formData.append("totalChunkNum", slices.length.toString());
formData.append("checkSumHex", checkSum);
return formData;
}
async function uploadSlice(task: TaskItem, fileInfo: { loaded: number; i: number; j: number; files: { slices: Blob[]; name: string; size: number }[]; totalSize: number }) {
if (!task) {
return;
}
const { reqId, key, controller } = task;
const { loaded, i, j, files, totalSize } = fileInfo;
// bail out if the upload was cancelled
if (controller.signal.aborted) {
throw new Error("Upload cancelled");
}
const formData = await buildFormData({
file: files[i],
i,
j,
reqId,
});
let newTask = { ...task };
await uploadChunk(key, formData, {
signal: controller.signal,
onUploadProgress: (e) => {
const loadedSize = loaded + e.loaded;
const curPercent = Number((loadedSize / totalSize) * 100).toFixed(2);
newTask = {
...newTask,
...taskListRef.current.find((item) => item.key === key),
size: loadedSize,
percent: curPercent >= 100 ? 99.99 : curPercent,
};
updateTaskList(newTask);
},
});
}
async function uploadFile({ task, files, totalSize }: { task: TaskItem; files: { slices: Blob[]; name: string; size: number; originFile: Blob }[]; totalSize: number }) {
console.log('[useSliceUpload] Calling preUpload with prefix:', task.prefix);
const { data: reqId } = await preUpload(task.key, {
totalFileNum: files.length,
totalSize,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log('[useSliceUpload] PreUpload response reqId:', reqId);
const newTask: TaskItem = {
...task,
reqId,
isCancel: false,
cancelFn: () => {
// use newTask's controller to keep cancellation consistent
newTask.controller.abort();
cancelUpload?.(reqId);
if (newTask.updateEvent) window.dispatchEvent(new Event(newTask.updateEvent));
},
};
updateTaskList(newTask);
// note: the show:task-popover event is already fired in createTask, so it is not fired again here
// notify listeners to refresh the data state
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
let loaded = 0;
for (let i = 0; i < files.length; i++) {
// bail out if the upload was cancelled
if (newTask.controller.signal.aborted) {
throw new Error("Upload cancelled");
}
const { slices } = files[i];
for (let j = 0; j < slices.length; j++) {
// bail out if the upload was cancelled
if (newTask.controller.signal.aborted) {
throw new Error("Upload cancelled");
}
await uploadSlice(newTask, {
loaded,
i,
j,
files,
totalSize,
});
loaded += slices[j].size;
}
}
removeTask(newTask);
}
const handleUpload = async ({ task, files }: { task: TaskItem; files: { slices: Blob[]; name: string; size: number; originFile: Blob }[] }) => {
const isErrorFile = await checkIsFilesExist(files);
if (isErrorFile) {
message.error("文件被修改或删除,请重新选择文件上传");
removeTask({
...task,
isCancel: false,
...taskListRef.current.find((item) => item.key === task.key),
});
return;
}
try {
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
await uploadFile({ task, files, totalSize });
} catch (err) {
console.error(err);
message.error("文件上传失败,请稍后重试");
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
/**
 * Streaming split-upload handler.
 * Used when large files are split by line and each line is uploaded immediately.
 */
const handleStreamUpload = async ({ task, files }: { task: TaskItem; files: File[] }) => {
try {
console.log('[useSliceUpload] Starting stream upload for', files.length, 'files');
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
// collect every file's reqId so cancellation can abort all of them
const reqIds: number[] = [];
const newTask: TaskItem = {
...task,
reqId: -1,
isCancel: false,
cancelFn: () => {
// use newTask's controller to keep cancellation consistent
newTask.controller.abort();
// cancel every file's pre-upload request
reqIds.forEach(id => cancelUpload?.(id));
if (newTask.updateEvent) window.dispatchEvent(new Event(newTask.updateEvent));
},
};
updateTaskList(newTask);
let totalUploadedLines = 0;
let totalProcessedBytes = 0;
const results: StreamUploadResult[] = [];
// process files one by one; each file gets its own preUpload call
for (let i = 0; i < files.length; i++) {
// bail out if the upload was cancelled
if (newTask.controller.signal.aborted) {
throw new Error("Upload cancelled");
}
const file = files[i];
console.log(`[useSliceUpload] Processing file ${i + 1}/${files.length}: ${file.name}`);
// call preUpload per file to obtain an independent reqId
const { data: reqId } = await preUpload(task.key, {
totalFileNum: 1,
totalSize: file.size,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log(`[useSliceUpload] File ${file.name} preUpload response reqId:`, reqId);
reqIds.push(reqId);
const result = await streamSplitAndUpload(
file,
(formData, config) => uploadChunk(task.key, formData, {
...config,
signal: newTask.controller.signal,
}),
(currentBytes, totalBytes, uploadedLines) => {
// stop reporting progress if the upload was cancelled
if (newTask.controller.signal.aborted) {
return;
}
// update progress
const overallBytes = totalProcessedBytes + currentBytes;
const curPercent = Number((overallBytes / totalSize) * 100).toFixed(2);
const updatedTask: TaskItem = {
...newTask,
...taskListRef.current.find((item) => item.key === task.key),
size: overallBytes,
percent: curPercent >= 100 ? 99.99 : curPercent,
streamUploadInfo: {
currentFile: file.name,
fileIndex: i + 1,
totalFiles: files.length,
uploadedLines: totalUploadedLines + uploadedLines,
},
};
updateTaskList(updatedTask);
},
1024 * 1024, // 1MB chunk size
{
reqId,
hasArchive: newTask.hasArchive,
prefix: newTask.prefix,
signal: newTask.controller.signal,
maxConcurrency: 3,
}
);
results.push(result);
totalUploadedLines += result.uploadedCount;
totalProcessedBytes += file.size;
console.log(`[useSliceUpload] File ${file.name} processed, uploaded ${result.uploadedCount} lines`);
}
console.log('[useSliceUpload] Stream upload completed, total lines:', totalUploadedLines);
removeTask(newTask);
message.success(`成功上传 ${totalUploadedLines} 个文件(按行分割)`);
} catch (err) {
console.error('[useSliceUpload] Stream upload error:', err);
if (err instanceof Error && err.message === "Upload cancelled") {
message.info("上传已取消");
} else {
message.error("文件上传失败,请稍后重试");
}
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
/**
 * Register the listener for stream-upload events.
 * Returns an unregister function.
 */
const registerStreamUploadListener = () => {
if (!enableStreamUpload) return () => {};
const streamUploadHandler = async (e: Event) => {
const customEvent = e as CustomEvent;
const { dataset, files, updateEvent, hasArchive, prefix } = customEvent.detail;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title: `上传数据集: ${dataset.name} (按行分割)`,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent,
hasArchive,
prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
// show the task center
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
await handleStreamUpload({ task, files });
};
window.addEventListener("upload:dataset-stream", streamUploadHandler);
return () => {
window.removeEventListener("upload:dataset-stream", streamUploadHandler);
};
};
return {
taskList,
createTask,
removeTask,
handleUpload,
handleStreamUpload,
registerStreamUploadListener,
};
}
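
One detail the hook delegates to file.util.ts is the `maxConcurrency: 3` limit passed to streamSplitAndUpload. That function's internals are not part of this diff; the sketch below shows one common way such a bound is enforced — names and structure are assumptions, not the repo's implementation:

// Hypothetical bounded-concurrency runner: at most `limit` tasks in flight;
// a worker that finishes one task immediately pulls the next from the queue.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  // Spawn up to `limit` workers; each drains the shared queue.
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}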

View File

@@ -3,7 +3,9 @@
* Loads an external page via iframe
*/
export default function ContentGenerationPage() {
const iframeUrl = "http://192.168.0.8:3000";
const iframeUrl = "/api#/meeting";
window.localStorage.setItem("geeker-user", '{"token":"123","userInfo":{"name":"xteam"},"loginFrom":null,"loginData":null}');
return (
<div className="h-full w-full flex flex-col">
@@ -16,6 +18,11 @@ export default function ContentGenerationPage() {
className="w-full h-full border-0"
title="内容生成"
sandbox="allow-same-origin allow-scripts allow-popups allow-forms allow-downloads"
style={{marginLeft: "-220px",
marginTop: "-66px",
width: "calc(100% + 233px)",
height: "calc(100% + 108px)"
}}
/>
</div>
</div>

View File

@@ -102,6 +102,7 @@ const NO_ANNOTATION_CONFIRM_TITLE = "没有标注任何内容";
const NO_ANNOTATION_CONFIRM_OK_TEXT = "设为无标注并保存";
const NOT_APPLICABLE_CONFIRM_TEXT = "设为不适用并保存";
const NO_ANNOTATION_CONFIRM_CANCEL_TEXT = "继续标注";
const SAVE_AND_NEXT_LABEL = "保存并跳转到下一段/下一条";
type NormalizedTaskList = {
items: EditorTaskListItem[];
@@ -117,6 +118,17 @@ const resolveSegmentIndex = (value: unknown) => {
return Number.isFinite(parsed) ? parsed : undefined;
};
const isSaveShortcut = (event: KeyboardEvent) => {
if (event.defaultPrevented || event.isComposing) return false;
const key = event.key;
const code = event.code;
const isS = key === "s" || key === "S" || code === "KeyS";
if (!isS) return false;
if (!(event.ctrlKey || event.metaKey)) return false;
if (event.shiftKey || event.altKey) return false;
return true;
};
const normalizePayload = (payload: unknown): ExportPayload | undefined => {
if (!payload || typeof payload !== "object") return undefined;
return payload as ExportPayload;
@@ -851,14 +863,27 @@ export default function LabelStudioTextEditor() {
});
}, [modal]);
const requestExport = () => {
const requestExport = useCallback((autoAdvance: boolean) => {
if (!selectedFileId) {
message.warning("请先选择文件");
return;
}
pendingAutoAdvanceRef.current = true;
pendingAutoAdvanceRef.current = autoAdvance;
postToIframe("LS_EXPORT", {});
};
}, [message, postToIframe, selectedFileId]);
useEffect(() => {
const handleSaveShortcut = (event: KeyboardEvent) => {
if (!isSaveShortcut(event) || event.repeat) return;
if (saving || loadingTaskDetail || segmentSwitching) return;
if (!iframeReady || !lsReady) return;
event.preventDefault();
event.stopPropagation();
requestExport(false);
};
window.addEventListener("keydown", handleSaveShortcut);
return () => window.removeEventListener("keydown", handleSaveShortcut);
}, [iframeReady, loadingTaskDetail, lsReady, requestExport, saving, segmentSwitching]);
// segment switching handler
const handleSegmentChange = useCallback(async (newIndex: number) => {
@@ -1122,6 +1147,8 @@ export default function LabelStudioTextEditor() {
}, [message, origin, saveFromExport]);
const canLoadMore = taskTotalPages > 0 && taskPage + 1 < taskTotalPages;
const saveDisabled =
!iframeReady || !selectedFileId || saving || segmentSwitching || loadingTaskDetail;
const loadMoreNode = canLoadMore ? (
<div className="p-2 text-center">
<Button
@@ -1185,7 +1212,7 @@ export default function LabelStudioTextEditor() {
return (
<div className="h-full flex flex-col">
{/* top toolbar */}
<div className="flex items-center justify-between px-3 py-2 border-b border-gray-200 bg-white">
<div className="grid grid-cols-[1fr_auto_1fr] items-center px-3 py-2 border-b border-gray-200 bg-white">
<div className="flex items-center gap-2">
<Button icon={<LeftOutlined />} onClick={() => navigate("/data/annotation")}>
@@ -1199,7 +1226,18 @@ export default function LabelStudioTextEditor() {
</Typography.Title>
</div>
<div className="flex items-center gap-2">
<div className="flex items-center justify-center">
<Button
type="primary"
icon={<SaveOutlined />}
loading={saving}
disabled={saveDisabled}
onClick={() => requestExport(true)}
>
{SAVE_AND_NEXT_LABEL}
</Button>
</div>
<div className="flex items-center gap-2 justify-end">
<Button
icon={<ReloadOutlined />}
loading={loadingTasks}
@@ -1208,11 +1246,10 @@ export default function LabelStudioTextEditor() {
</Button>
<Button
type="primary"
icon={<SaveOutlined />}
loading={saving}
disabled={!iframeReady || !selectedFileId}
onClick={requestExport}
disabled={saveDisabled}
onClick={() => requestExport(false)}
>
</Button>
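
The isSaveShortcut predicate added above accepts plain Ctrl+S / Cmd+S and rejects anything carrying Shift or Alt. Its branches can be exercised directly with synthetic events — a quick sketch, assuming a DOM (or jsdom) environment:

// Synthetic KeyboardEvents exercising the predicate's branches.
const ctrlS = new KeyboardEvent("keydown", { key: "s", ctrlKey: true });
const metaS = new KeyboardEvent("keydown", { key: "S", metaKey: true });
const ctrlShiftS = new KeyboardEvent("keydown", {
  key: "s",
  ctrlKey: true,
  shiftKey: true,
});
const plainS = new KeyboardEvent("keydown", { key: "s" });

console.assert(isSaveShortcut(ctrlS) === true);       // Ctrl+S saves
console.assert(isSaveShortcut(metaS) === true);       // Cmd+S saves (macOS)
console.assert(isSaveShortcut(ctrlShiftS) === false); // modified combos ignored
console.assert(isSaveShortcut(plainS) === false);     // bare "s" ignored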

View File

@@ -19,7 +19,8 @@ import {
queryAnnotationTemplatesUsingGet,
} from "../../annotation.api";
import { DatasetType, type Dataset } from "@/pages/DataManagement/dataset.model";
import { DataType, type AnnotationTemplate, type AnnotationTask } from "../../annotation.model";
import { DataType, type AnnotationTemplate } from "../../annotation.model";
import type { AnnotationTaskListItem } from "../../annotation.const";
import LabelStudioEmbed from "@/components/business/LabelStudioEmbed";
import TemplateConfigurationTreeEditor from "../../components/TemplateConfigurationTreeEditor";
import { useTagConfig } from "@/hooks/useTagConfig";
@@ -29,7 +30,7 @@ interface AnnotationTaskDialogProps {
onClose: () => void;
onRefresh: () => void;
/** 编辑模式:传入要编辑的任务数据 */
editTask?: AnnotationTask | null;
editTask?: AnnotationTaskListItem | null;
}
type DatasetOption = Dataset & { icon?: ReactNode };
@@ -60,6 +61,7 @@ const isRecord = (value: unknown): value is Record<string, unknown> =>
const DEFAULT_SEGMENTATION_ENABLED = true;
const FILE_PREVIEW_MAX_HEIGHT = 500;
const PREVIEW_MODAL_WIDTH = "80vw";
const SEGMENTATION_OPTIONS = [
{ label: "需要切片段", value: true },
{ label: "不需要切片段", value: false },
@@ -828,7 +830,7 @@ export default function CreateAnnotationTask({
open={showPreview}
onCancel={() => setShowPreview(false)}
title="标注界面预览"
width={1000}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => setShowPreview(false)}>
@@ -853,7 +855,7 @@ export default function CreateAnnotationTask({
open={datasetPreviewVisible}
onCancel={() => setDatasetPreviewVisible(false)}
title="数据集预览(前10条文件)"
width={700}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => setDatasetPreviewVisible(false)}>
@@ -910,7 +912,7 @@ export default function CreateAnnotationTask({
setFileContent("");
}}
title={`文件预览:${previewFileName}`}
width={previewFileType === "text" ? 800 : 700}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => {
setFileContentVisible(false);

View File

@@ -1,5 +1,5 @@
import { useState } from "react";
import { Card, Button, Table, message, Modal, Tabs } from "antd";
import { Card, Button, Table, Tag, message, Modal, Tabs } from "antd";
import {
PlusOutlined,
EditOutlined,
@@ -15,7 +15,11 @@ import {
deleteAnnotationTaskByIdUsingDelete,
queryAnnotationTasksUsingGet,
} from "../annotation.api";
import { mapAnnotationTask, type AnnotationTaskListItem } from "../annotation.const";
import {
AnnotationTypeMap,
mapAnnotationTask,
type AnnotationTaskListItem,
} from "../annotation.const";
import CreateAnnotationTask from "../Create/components/CreateAnnotationTaskDialog";
import ExportAnnotationDialog from "./ExportAnnotationDialog";
import { ColumnType } from "antd/es/table";
@@ -53,6 +57,9 @@ export default function DataAnnotation() {
const [selectedRowKeys, setSelectedRowKeys] = useState<AnnotationTaskRowKey[]>([]);
const [selectedRows, setSelectedRows] = useState<AnnotationTaskListItem[]>([]);
const toSafeCount = (value: unknown) =>
typeof value === "number" && Number.isFinite(value) ? value : 0;
const handleAnnotate = (task: AnnotationTaskListItem) => {
const projectId = task.id;
if (!projectId) {
@@ -151,23 +158,44 @@ export default function DataAnnotation() {
];
const columns: ColumnType<AnnotationTaskListItem>[] = [
{
title: "序号",
key: "index",
width: 80,
align: "center" as const,
render: (_value: unknown, _record: AnnotationTaskListItem, index: number) => {
const current = pagination.current ?? 1;
const pageSize = pagination.pageSize ?? tableData.length ?? 0;
return (current - 1) * pageSize + index + 1;
},
},
{
title: "任务名称",
dataIndex: "name",
key: "name",
fixed: "left" as const,
},
{
title: "任务ID",
dataIndex: "id",
key: "id",
},
{
title: "数据集",
dataIndex: "datasetName",
key: "datasetName",
width: 180,
},
{
title: "标注类型",
dataIndex: "labelingType",
key: "labelingType",
width: 160,
render: (value?: string) => {
if (!value) {
return "-";
}
const label =
AnnotationTypeMap[value as keyof typeof AnnotationTypeMap]?.label ||
value;
return <Tag color="geekblue">{label}</Tag>;
},
},
{
title: "数据量",
dataIndex: "totalCount",
@@ -182,8 +210,20 @@ export default function DataAnnotation() {
width: 100,
align: "center" as const,
render: (value: number, record: AnnotationTaskListItem) => {
const total = record.totalCount || 0;
const annotated = value || 0;
const total = toSafeCount(record.totalCount ?? record.total_count);
const annotatedRaw = toSafeCount(
value ?? record.annotatedCount ?? record.annotated_count
);
const segmentationEnabled =
record.segmentationEnabled ?? record.segmentation_enabled;
const inProgressRaw = segmentationEnabled
? toSafeCount(record.inProgressCount ?? record.in_progress_count)
: 0;
const shouldExcludeInProgress =
total > 0 && annotatedRaw + inProgressRaw > total;
const annotated = shouldExcludeInProgress
? Math.max(annotatedRaw - inProgressRaw, 0)
: annotatedRaw;
const percent = total > 0 ? Math.round((annotated / total) * 100) : 0;
return (
<span title={`${annotated}/${total} (${percent}%)`}>
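
The corrected math above avoids double counting when segmentation is enabled: in-progress segments are subtracted from the annotated count only when the two together would exceed the total. The same logic as a pure function — a sketch, not repo code:

// Mirrors the column's math: exclude in-progress items from the annotated
// count only when including them would overshoot the total.
function resolveAnnotatedCount(
  total: number,
  annotated: number,
  inProgress: number
): number {
  const shouldExcludeInProgress = total > 0 && annotated + inProgress > total;
  return shouldExcludeInProgress ? Math.max(annotated - inProgress, 0) : annotated;
}

resolveAnnotatedCount(10, 8, 1); // => 8  (8 + 1 <= 10, nothing to exclude)
resolveAnnotatedCount(10, 9, 3); // => 6  (9 + 3 > 10, subtract in-progress)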

View File

@@ -43,14 +43,6 @@ const TemplateDetail: React.FC<TemplateDetailProps> = ({
<Descriptions.Item label="样式">
{template.style}
</Descriptions.Item>
<Descriptions.Item label="类型">
<Tag color={template.builtIn ? "gold" : "default"}>
{template.builtIn ? "系统内置" : "自定义"}
</Tag>
</Descriptions.Item>
<Descriptions.Item label="版本">
{template.version}
</Descriptions.Item>
<Descriptions.Item label="创建时间" span={2}>
{new Date(template.createdAt).toLocaleString()}
</Descriptions.Item>

View File

@@ -36,6 +36,7 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
const [form] = Form.useForm();
const [loading, setLoading] = useState(false);
const [labelConfig, setLabelConfig] = useState("");
const selectedDataType = Form.useWatch("dataType", form);
useEffect(() => {
if (visible && template && mode === "edit") {
@@ -96,8 +97,12 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
} else {
message.error(response.message || `模板${mode === "create" ? "创建" : "更新"}失败`);
}
} catch (error: any) {
if (error.errorFields) {
} catch (error: unknown) {
const hasErrorFields =
typeof error === "object" &&
error !== null &&
"errorFields" in error;
if (hasErrorFields) {
message.error("请填写所有必填字段");
} else {
message.error(`模板${mode === "create" ? "创建" : "更新"}失败`);
@@ -195,6 +200,7 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
value={labelConfig}
onChange={setLabelConfig}
height={420}
dataType={selectedDataType}
/>
</div>
</Form>
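
The inline narrowing of `error: unknown` above could be factored into a reusable type guard shared by other forms — a small sketch; the guard name is ours, not the repo's:

// Type guard: antd form validation failures carry an `errorFields` array.
function isFormValidationError(
  error: unknown
): error is { errorFields: unknown[] } {
  return typeof error === "object" && error !== null && "errorFields" in error;
}

// Usage inside a catch block:
// if (isFormValidationError(error)) { message.error("请填写所有必填字段"); }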

View File

@@ -1,4 +1,4 @@
import React, { useState } from "react";
import React, { useState, useEffect } from "react";
import {
Button,
Table,
@@ -32,7 +32,16 @@ import {
TemplateTypeMap
} from "@/pages/DataAnnotation/annotation.const.tsx";
const TEMPLATE_ADMIN_KEY = "datamate_template_admin";
const TemplateList: React.FC = () => {
const [isAdmin, setIsAdmin] = useState(false);
useEffect(() => {
// check whether the special admin key exists in localStorage
const hasAdminKey = localStorage.getItem(TEMPLATE_ADMIN_KEY) !== null;
setIsAdmin(hasAdminKey);
}, []);
const filterOptions = [
{
key: "category",
@@ -225,23 +234,7 @@ const TemplateList: React.FC = () => {
<Tag color={getCategoryColor(category)}>{ClassificationMap[category as keyof typeof ClassificationMap]?.label || category}</Tag>
),
},
{
title: "类型",
dataIndex: "builtIn",
key: "builtIn",
width: 100,
render: (builtIn: boolean) => (
<Tag color={builtIn ? "gold" : "default"}>
{builtIn ? "系统内置" : "自定义"}
</Tag>
),
},
{
title: "版本",
dataIndex: "version",
key: "version",
width: 80,
},
{
title: "创建时间",
dataIndex: "createdAt",
@@ -263,29 +256,31 @@ const TemplateList: React.FC = () => {
onClick={() => handleView(record)}
/>
</Tooltip>
<>
<Tooltip title="编辑">
<Button
type="link"
icon={<EditOutlined />}
onClick={() => handleEdit(record)}
/>
</Tooltip>
<Popconfirm
title="确定要删除这个模板吗?"
onConfirm={() => handleDelete(record.id)}
okText="确定"
cancelText="取消"
>
<Tooltip title="删除">
{isAdmin && (
<>
<Tooltip title="编辑">
<Button
type="link"
danger
icon={<DeleteOutlined />}
icon={<EditOutlined />}
onClick={() => handleEdit(record)}
/>
</Tooltip>
</Popconfirm>
</>
<Popconfirm
title="确定要删除这个模板吗?"
onConfirm={() => handleDelete(record.id)}
okText="确定"
cancelText="取消"
>
<Tooltip title="删除">
<Button
type="link"
danger
icon={<DeleteOutlined />}
/>
</Tooltip>
</Popconfirm>
</>
)}
</Space>
),
},
@@ -310,11 +305,13 @@ const TemplateList: React.FC = () => {
</div>
{/* Right side: Create button */}
<div className="flex items-center gap-2">
<Button type="primary" icon={<PlusOutlined />} onClick={handleCreate}>
</Button>
</div>
{isAdmin && (
<div className="flex items-center gap-2">
<Button type="primary" icon={<PlusOutlined />} onClick={handleCreate}>
</Button>
</div>
)}
</div>
<Card>
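
Since the gate is only `localStorage.getItem(TEMPLATE_ADMIN_KEY) !== null`, any stored value unlocks the admin actions. To toggle it from the browser console (the value "1" below is arbitrary):

localStorage.setItem("datamate_template_admin", "1"); // show edit/delete/create
localStorage.removeItem("datamate_template_admin");   // hide them again
// The flag is read once on mount, so reload the page after changing it.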

View File

@@ -18,6 +18,7 @@ import {
import { TagBrowser } from "./components";
const { Paragraph } = Typography;
const PREVIEW_DRAWER_WIDTH = "80vw";
interface VisualTemplateBuilderProps {
onSave?: (templateCode: string) => void;
@@ -129,7 +130,7 @@ const VisualTemplateBuilder: React.FC<VisualTemplateBuilderProps> = ({
<Drawer
title="模板代码预览"
placement="right"
width={600}
width={PREVIEW_DRAWER_WIDTH}
open={previewVisible}
onClose={() => setPreviewVisible(false)}
>

View File

@@ -23,6 +23,12 @@ type AnnotationTaskPayload = {
datasetId?: string;
datasetName?: string;
dataset_name?: string;
labelingType?: string;
labeling_type?: string;
template?: {
labelingType?: string;
labeling_type?: string;
};
totalCount?: number;
total_count?: number;
annotatedCount?: number;
@@ -48,6 +54,7 @@ export type AnnotationTaskListItem = {
description?: string;
datasetId?: string;
datasetName?: string;
labelingType?: string;
totalCount?: number;
annotatedCount?: number;
inProgressCount?: number;
@@ -90,6 +97,11 @@ export function mapAnnotationTask(task: AnnotationTaskPayload): AnnotationTaskLi
const labelingProjId = task?.labelingProjId || task?.labelingProjectId || task?.projId || task?.labeling_project_id || "";
const segmentationEnabled = task?.segmentationEnabled ?? task?.segmentation_enabled ?? false;
const inProgressCount = task?.inProgressCount ?? task?.in_progress_count ?? 0;
const labelingType =
task?.labelingType ||
task?.labeling_type ||
task?.template?.labelingType ||
task?.template?.labeling_type;
const statsArray = task?.statistics
? [
@@ -107,6 +119,7 @@ export function mapAnnotationTask(task: AnnotationTaskPayload): AnnotationTaskLi
projId: labelingProjId,
segmentationEnabled,
inProgressCount,
labelingType,
name: task.name,
description: task.description || "",
datasetName: task.datasetName || task.dataset_name || "-",
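
The labelingType resolution above follows this mapper's recurring fallback chain: camelCase first, then snake_case, then the nested template fields. A generic helper capturing the pattern — a sketch under our own naming, operating on the same task payload:

// First non-empty string wins, mirroring the chained || fallbacks
// (which likewise skip empty strings).
function firstNonEmpty(
  ...candidates: Array<string | undefined>
): string | undefined {
  return candidates.find((c) => typeof c === "string" && c !== "");
}

// Equivalent to the labelingType assignment above:
const labelingType = firstNonEmpty(
  task?.labelingType,
  task?.labeling_type,
  task?.template?.labelingType,
  task?.template?.labeling_type
);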

View File

@@ -22,6 +22,7 @@ import {
getObjectDisplayName,
type LabelStudioTagConfig,
} from "../annotation.tagconfig";
import { DataType } from "../annotation.model";
const { Text, Title } = Typography;
@@ -44,10 +45,22 @@ interface TemplateConfigurationTreeEditorProps {
readOnly?: boolean;
readOnlyStructure?: boolean;
height?: number | string;
dataType?: DataType;
}
const DEFAULT_ROOT_TAG = "View";
const CHILD_TAGS = ["Label", "Choice", "Relation", "Item", "Path", "Channel"];
const OBJECT_TAGS_BY_DATA_TYPE: Record<DataType, string[]> = {
[DataType.TEXT]: ["Text", "Paragraphs", "Markdown"],
[DataType.IMAGE]: ["Image", "Bitmask"],
[DataType.AUDIO]: ["Audio", "AudioPlus"],
[DataType.VIDEO]: ["Video"],
[DataType.PDF]: ["PDF"],
[DataType.TIMESERIES]: ["Timeseries", "TimeSeries", "Vector"],
[DataType.CHAT]: ["Chat"],
[DataType.HTML]: ["HyperText", "Markdown"],
[DataType.TABLE]: ["Table", "Vector"],
};
const createId = () =>
`node_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
@@ -247,18 +260,34 @@ const createNode = (
attrs[attr] = "";
});
if (objectConfig && attrs.name !== undefined) {
if (objectConfig) {
const name = getDefaultName(tag);
attrs.name = name;
if (attrs.value !== undefined) {
attrs.value = `$${name}`;
if (!attrs.name) {
attrs.name = name;
}
if (!attrs.value) {
attrs.value = `$${attrs.name}`;
}
}
if (controlConfig && attrs.name !== undefined) {
attrs.name = getDefaultName(tag);
if (attrs.toName !== undefined) {
attrs.toName = objectNames[0] || "";
if (controlConfig) {
const isLabeling = controlConfig.category === "labeling";
if (isLabeling) {
if (!attrs.name) {
attrs.name = getDefaultName(tag);
}
if (!attrs.toName) {
attrs.toName = objectNames[0] || "";
}
} else {
// For layout controls, only fill if required
if (attrs.name !== undefined && !attrs.name) {
attrs.name = getDefaultName(tag);
}
if (attrs.toName !== undefined && !attrs.toName) {
attrs.toName = objectNames[0] || "";
}
}
}
@@ -420,14 +449,13 @@ const TemplateConfigurationTreeEditor = ({
readOnly = false,
readOnlyStructure = false,
height = 420,
dataType,
}: TemplateConfigurationTreeEditorProps) => {
const { config } = useTagConfig(false);
const [tree, setTree] = useState<XmlNode>(() => createEmptyTree());
const [selectedId, setSelectedId] = useState<string>(tree.id);
const [parseError, setParseError] = useState<string | null>(null);
const lastSerialized = useRef<string>("");
const [addChildTag, setAddChildTag] = useState<string | undefined>();
const [addSiblingTag, setAddSiblingTag] = useState<string | undefined>();
useEffect(() => {
if (!value) {
@@ -498,11 +526,17 @@ const TemplateConfigurationTreeEditor = ({
const objectOptions = useMemo(() => {
if (!config?.objects) return [];
return Object.keys(config.objects).map((tag) => ({
const options = Object.keys(config.objects).map((tag) => ({
value: tag,
label: getObjectDisplayName(tag),
}));
}, [config]);
if (!dataType) return options;
const allowedTags = OBJECT_TAGS_BY_DATA_TYPE[dataType];
if (!allowedTags) return options;
const allowedSet = new Set(allowedTags);
const filtered = options.filter((option) => allowedSet.has(option.value));
return filtered.length > 0 ? filtered : options;
}, [config, dataType]);
const tagOptions = useMemo(() => {
const options = [] as {
@@ -763,9 +797,8 @@ const TemplateConfigurationTreeEditor = ({
<Select
placeholder="添加子节点"
options={tagOptions}
value={addChildTag}
value={null}
onChange={(value) => {
setAddChildTag(undefined);
handleAddNode(value, "child");
}}
disabled={isStructureLocked}
@@ -773,9 +806,8 @@ const TemplateConfigurationTreeEditor = ({
<Select
placeholder="添加同级节点"
options={tagOptions}
value={addSiblingTag}
value={null}
onChange={(value) => {
setAddSiblingTag(undefined);
handleAddNode(value, "sibling");
}}
disabled={isStructureLocked || selectedNode.id === tree.id}
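
The objectOptions memo above narrows the selectable object tags by data type, and deliberately falls back to the full list whenever the filter would leave nothing. The same semantics as a standalone function — a sketch, not repo code:

type Option = { value: string; label: string };

// Filter options to the tags allowed for a data type; if no allow-list is
// given, the data type is unmapped, or filtering empties the list, keep all.
function filterObjectOptions(
  options: Option[],
  allowedTags?: string[]
): Option[] {
  if (!allowedTags) return options;
  const allowed = new Set(allowedTags);
  const filtered = options.filter((o) => allowed.has(o.value));
  return filtered.length > 0 ? filtered : options;
}

// e.g. for a TEXT dataset:
filterObjectOptions(
  [{ value: "Text", label: "Text" }, { value: "Image", label: "Image" }],
  ["Text", "Paragraphs", "Markdown"]
); // => only the Text option remains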

View File

@@ -7,6 +7,8 @@ interface PreviewPromptModalProps {
evaluationPrompt: string;
}
const PREVIEW_MODAL_WIDTH = "80vw";
const PreviewPromptModal: React.FC<PreviewPromptModalProps> = ({ previewVisible, onCancel, evaluationPrompt }) => {
return (
<Modal
@@ -24,7 +26,7 @@ const PreviewPromptModal: React.FC<PreviewPromptModalProps> = ({ previewVisible,
</Button>
]}
width={800}
width={PREVIEW_MODAL_WIDTH}
>
<div style={{
background: '#f5f5f5',

View File

@@ -11,10 +11,12 @@ export default function BasicInformation({
data,
setData,
hidden = [],
datasetTypeOptions = datasetTypes,
}: {
data: DatasetFormData;
setData: Dispatch<SetStateAction<DatasetFormData>>;
hidden?: string[];
datasetTypeOptions?: DatasetTypeOption[];
}) {
const [tagOptions, setTagOptions] = useState<DatasetTagOption[]>([]);
const [collectionOptions, setCollectionOptions] = useState<SelectOption[]>([]);
@@ -119,7 +121,7 @@ export default function BasicInformation({
rules={[{ required: true, message: "请选择数据集类型" }]}
>
<RadioCard
options={datasetTypes}
options={datasetTypeOptions}
value={data.type}
onChange={(datasetType) => setData({ ...data, datasetType })}
/>
@@ -149,6 +151,8 @@ type DatasetFormData = Partial<Dataset> & {
parentDatasetId?: string;
};
type DatasetTypeOption = (typeof datasetTypes)[number];
type DatasetTagOption = {
label: string;
value: string;

View File

@@ -1,13 +1,13 @@
import { Select, Input, Form, Radio, Modal, Button, UploadFile, Switch, Tooltip } from "antd";
import { InboxOutlined, QuestionCircleOutlined } from "@ant-design/icons";
import { dataSourceOptions } from "../../dataset.const";
import { Dataset, DataSource } from "../../dataset.model";
import { Select, Input, Form, Radio, Modal, Button, UploadFile, Switch, Tooltip } from "antd";
import { InboxOutlined, QuestionCircleOutlined } from "@ant-design/icons";
import { dataSourceOptions } from "../../dataset.const";
import { Dataset, DatasetType, DataSource } from "../../dataset.model";
import { useCallback, useEffect, useMemo, useState } from "react";
import { queryTasksUsingGet } from "@/pages/DataCollection/collection.apis";
import { updateDatasetByIdUsingPut } from "../../dataset.api";
import { sliceFile } from "@/utils/file.util";
import Dragger from "antd/es/upload/Dragger";
import { queryTasksUsingGet } from "@/pages/DataCollection/collection.apis";
import { updateDatasetByIdUsingPut } from "../../dataset.api";
import { sliceFile, shouldStreamUpload } from "@/utils/file.util";
import Dragger from "antd/es/upload/Dragger";
const TEXT_FILE_MIME_PREFIX = "text/";
const TEXT_FILE_MIME_TYPES = new Set([
"application/json",
@@ -90,14 +90,16 @@ async function splitFileByLines(file: UploadFile): Promise<UploadFile[]> {
const lines = text.split(/\r?\n/).filter((line: string) => line.trim() !== "");
if (lines.length === 0) return [];
// build the file name: <base>_<index>.<ext>
// build the file name: <base>_<index> (extension not kept)
const nameParts = file.name.split(".");
const ext = nameParts.length > 1 ? "." + nameParts.pop() : "";
if (nameParts.length > 1) {
nameParts.pop();
}
const baseName = nameParts.join(".");
const padLength = String(lines.length).length;
return lines.map((line: string, index: number) => {
const newFileName = `${baseName}_${String(index + 1).padStart(padLength, "0")}${ext}`;
const newFileName = `${baseName}_${String(index + 1).padStart(padLength, "0")}`;
const blob = new Blob([line], { type: "text/plain" });
const newFile = new File([blob], newFileName, { type: "text/plain" });
return {
@@ -131,18 +133,18 @@ type ImportConfig = {
};
export default function ImportConfiguration({
data,
open,
onClose,
updateEvent = "update:dataset",
prefix,
}: {
data: Dataset | null;
open: boolean;
onClose: () => void;
updateEvent?: string;
prefix?: string;
}) {
data,
open,
onClose,
updateEvent = "update:dataset",
prefix,
}: {
data: Dataset | null;
open: boolean;
onClose: () => void;
updateEvent?: string;
prefix?: string;
}) {
const [form] = Form.useForm();
const [collectionOptions, setCollectionOptions] = useState<SelectOption[]>([]);
const availableSourceOptions = dataSourceOptions.filter(
@@ -159,23 +161,82 @@ export default function ImportConfiguration({
if (files.length === 0) return false;
return files.some((file) => !isTextUploadFile(file));
}, [importConfig.files]);
// local file upload handling
const handleUpload = async (dataset: Dataset) => {
let filesToUpload =
const isTextDataset = data?.datasetType === DatasetType.TEXT;
// local file upload handling
const handleUpload = async (dataset: Dataset) => {
const filesToUpload =
(form.getFieldValue("files") as UploadFile[] | undefined) || [];
// when split-by-line is enabled, process the files
// when split-by-line is enabled, stream-process large files
if (importConfig.splitByLine && !hasNonTextFile) {
const splitResults = await Promise.all(
filesToUpload.map((file) => splitFileByLines(file))
);
filesToUpload = splitResults.flat();
// check whether any large files need streaming split-upload
const filesForStreamUpload: File[] = [];
const filesForNormalUpload: UploadFile[] = [];
for (const file of filesToUpload) {
const originFile = file.originFileObj ?? file;
if (originFile instanceof File && shouldStreamUpload(originFile)) {
filesForStreamUpload.push(originFile);
} else {
filesForNormalUpload.push(file);
}
}
// large files go through streaming split-upload
if (filesForStreamUpload.length > 0) {
window.dispatchEvent(
new CustomEvent("upload:dataset-stream", {
detail: {
dataset,
files: filesForStreamUpload,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
}
// small files use the traditional split path
if (filesForNormalUpload.length > 0) {
const splitResults = await Promise.all(
filesForNormalUpload.map((file) => splitFileByLines(file))
);
const smallFilesToUpload = splitResults.flat();
// build the slice list
const sliceList = smallFilesToUpload.map((file) => {
const originFile = (file.originFileObj ?? file) as Blob;
const slices = sliceFile(originFile);
return {
originFile: originFile,
slices,
name: file.name,
size: originFile.size || 0,
};
});
console.log("[ImportConfiguration] Uploading small files with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
}
return;
}
// build the slice list
const sliceList = filesToUpload.map((file) => {
// split-by-line disabled, use the plain upload path
// build the slice list
const sliceList = filesToUpload.map((file) => {
const originFile = (file.originFileObj ?? file) as Blob;
const slices = sliceFile(originFile);
return {
@@ -184,22 +245,22 @@ export default function ImportConfiguration({
name: file.name,
size: originFile.size || 0,
};
});
console.log("[ImportConfiguration] Uploading with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
};
});
console.log("[ImportConfiguration] Uploading with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
};
const fetchCollectionTasks = useCallback(async () => {
if (importConfig.source !== DataSource.COLLECTION) return;
try {
@@ -211,7 +272,7 @@ export default function ImportConfiguration({
label: task.name,
value: task.id,
}));
setCollectionOptions(options);
setCollectionOptions(options);
} catch (error) {
console.error("Error fetching collection tasks:", error);
}
@@ -228,27 +289,31 @@ export default function ImportConfiguration({
});
console.log('[ImportConfiguration] resetState done, currentPrefix still:', currentPrefix);
}, [currentPrefix, form]);
const handleImportData = async () => {
if (!data) return;
console.log('[ImportConfiguration] handleImportData called, currentPrefix:', currentPrefix);
if (importConfig.source === DataSource.UPLOAD) {
await handleUpload(data);
} else if (importConfig.source === DataSource.COLLECTION) {
await updateDatasetByIdUsingPut(data.id, {
...importConfig,
});
}
onClose();
};
const handleImportData = async () => {
if (!data) return;
console.log('[ImportConfiguration] handleImportData called, currentPrefix:', currentPrefix);
if (importConfig.source === DataSource.UPLOAD) {
// show the task center immediately so the user sees the upload has started (before slow work such as file splitting)
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
await handleUpload(data);
} else if (importConfig.source === DataSource.COLLECTION) {
await updateDatasetByIdUsingPut(data.id, {
...importConfig,
});
}
onClose();
};
useEffect(() => {
if (open) {
setCurrentPrefix(prefix || "");
console.log('[ImportConfiguration] Modal opened with prefix:', prefix);
resetState();
fetchCollectionTasks();
}
console.log('[ImportConfiguration] Modal opened with prefix:', prefix);
resetState();
fetchCollectionTasks();
}
}, [fetchCollectionTasks, open, prefix, resetState]);
useEffect(() => {
@@ -258,135 +323,137 @@ export default function ImportConfiguration({
form.setFieldsValue({ splitByLine: false });
setImportConfig((prev) => ({ ...prev, splitByLine: false }));
}, [form, hasNonTextFile, importConfig.files, importConfig.splitByLine]);
// Separate effect for fetching collection tasks when source changes
useEffect(() => {
if (open && importConfig.source === DataSource.COLLECTION) {
fetchCollectionTasks();
}
// Separate effect for fetching collection tasks when source changes
useEffect(() => {
if (open && importConfig.source === DataSource.COLLECTION) {
fetchCollectionTasks();
}
}, [fetchCollectionTasks, importConfig.source, open]);
return (
<Modal
title="导入数据"
open={open}
width={600}
onCancel={() => {
onClose();
resetState();
}}
maskClosable={false}
footer={
<>
<Button onClick={onClose}></Button>
<Button
type="primary"
disabled={!importConfig?.files?.length && !importConfig.dataSource}
onClick={handleImportData}
>
</Button>
</>
}
>
<Form
form={form}
layout="vertical"
initialValues={importConfig || {}}
onValuesChange={(_, allValues) => setImportConfig(allValues)}
>
<Form.Item
label="数据源"
name="source"
rules={[{ required: true, message: "请选择数据源" }]}
>
return (
<Modal
title="导入数据"
open={open}
width={600}
onCancel={() => {
onClose();
resetState();
}}
maskClosable={false}
footer={
<>
<Button onClick={onClose}></Button>
<Button
type="primary"
disabled={!importConfig?.files?.length && !importConfig.dataSource}
onClick={handleImportData}
>
</Button>
</>
}
>
<Form
form={form}
layout="vertical"
initialValues={importConfig || {}}
onValuesChange={(_, allValues) => setImportConfig(allValues)}
>
<Form.Item
label="数据源"
name="source"
rules={[{ required: true, message: "请选择数据源" }]}
>
<Radio.Group
buttonStyle="solid"
options={availableSourceOptions}
optionType="button"
/>
</Form.Item>
{importConfig?.source === DataSource.COLLECTION && (
<Form.Item name="dataSource" label="归集任务" required>
<Select placeholder="请选择归集任务" options={collectionOptions} />
</Form.Item>
)}
{/* obs import */}
{importConfig?.source === DataSource.OBS && (
<div className="grid grid-cols-2 gap-3 p-4 bg-blue-50 rounded-lg">
<Form.Item
name="endpoint"
rules={[{ required: true }]}
label="Endpoint"
>
<Input
className="h-8 text-xs"
placeholder="obs.cn-north-4.myhuaweicloud.com"
/>
</Form.Item>
<Form.Item
name="bucket"
rules={[{ required: true }]}
label="Bucket"
>
<Input className="h-8 text-xs" placeholder="my-bucket" />
</Form.Item>
<Form.Item
name="accessKey"
rules={[{ required: true }]}
label="Access Key"
>
<Input className="h-8 text-xs" placeholder="Access Key" />
</Form.Item>
<Form.Item
name="secretKey"
rules={[{ required: true }]}
label="Secret Key"
>
<Input
type="password"
className="h-8 text-xs"
placeholder="Secret Key"
/>
</Form.Item>
</div>
)}
{/* Local Upload Component */}
{importConfig?.source === DataSource.UPLOAD && (
<>
<Form.Item
label="自动解压上传的压缩包"
name="hasArchive"
valuePropName="checked"
>
<Switch />
</Form.Item>
{importConfig?.source === DataSource.COLLECTION && (
<Form.Item name="dataSource" label="归集任务" required>
<Select placeholder="请选择归集任务" options={collectionOptions} />
</Form.Item>
)}
{/* obs import */}
{importConfig?.source === DataSource.OBS && (
<div className="grid grid-cols-2 gap-3 p-4 bg-blue-50 rounded-lg">
<Form.Item
label={
<span>
{" "}
<Tooltip
title={
hasNonTextFile
? "已选择非文本文件,无法按行分割"
: "选中后,文本文件的每一行将被分割成独立文件"
}
>
<QuestionCircleOutlined style={{ color: "#999" }} />
</Tooltip>
</span>
}
name="splitByLine"
name="endpoint"
rules={[{ required: true }]}
label="Endpoint"
>
<Input
className="h-8 text-xs"
placeholder="obs.cn-north-4.myhuaweicloud.com"
/>
</Form.Item>
<Form.Item
name="bucket"
rules={[{ required: true }]}
label="Bucket"
>
<Input className="h-8 text-xs" placeholder="my-bucket" />
</Form.Item>
<Form.Item
name="accessKey"
rules={[{ required: true }]}
label="Access Key"
>
<Input className="h-8 text-xs" placeholder="Access Key" />
</Form.Item>
<Form.Item
name="secretKey"
rules={[{ required: true }]}
label="Secret Key"
>
<Input
type="password"
className="h-8 text-xs"
placeholder="Secret Key"
/>
</Form.Item>
</div>
)}
{/* Local Upload Component */}
{importConfig?.source === DataSource.UPLOAD && (
<>
<Form.Item
label="自动解压上传的压缩包"
name="hasArchive"
valuePropName="checked"
>
<Switch disabled={hasNonTextFile} />
<Switch />
</Form.Item>
<Form.Item
label="上传文件"
name="files"
valuePropName="fileList"
{isTextDataset && (
<Form.Item
label={
<span>
{" "}
<Tooltip
title={
hasNonTextFile
? "已选择非文本文件,无法按行分割"
: "选中后,文本文件的每一行将被分割成独立文件"
}
>
<QuestionCircleOutlined style={{ color: "#999" }} />
</Tooltip>
</span>
}
name="splitByLine"
valuePropName="checked"
>
<Switch disabled={hasNonTextFile} />
</Form.Item>
)}
<Form.Item
label="上传文件"
name="files"
valuePropName="fileList"
getValueFromEvent={(
event: { fileList?: UploadFile[] } | UploadFile[]
) => {
@@ -395,69 +462,69 @@ export default function ImportConfiguration({
}
return event?.fileList;
}}
rules={[
{
required: true,
message: "请上传文件",
},
]}
>
<Dragger
className="w-full"
beforeUpload={() => false}
multiple
>
<p className="ant-upload-drag-icon">
<InboxOutlined />
</p>
<p className="ant-upload-text"></p>
<p className="ant-upload-hint"></p>
</Dragger>
</Form.Item>
</>
)}
{/* Target Configuration */}
{importConfig?.target && importConfig?.target !== DataSource.UPLOAD && (
<div className="space-y-3 p-4 bg-blue-50 rounded-lg">
{importConfig?.target === DataSource.DATABASE && (
<div className="grid grid-cols-2 gap-3">
<Form.Item
name="databaseType"
rules={[{ required: true }]}
label="数据库类型"
>
<Select
className="w-full"
options={[
{ label: "MySQL", value: "mysql" },
{ label: "PostgreSQL", value: "postgresql" },
{ label: "MongoDB", value: "mongodb" },
]}
></Select>
</Form.Item>
<Form.Item
name="tableName"
rules={[{ required: true }]}
label="表名"
>
<Input className="h-8 text-xs" placeholder="dataset_table" />
</Form.Item>
<Form.Item
name="connectionString"
rules={[{ required: true }]}
label="连接字符串"
>
<Input
className="h-8 text-xs col-span-2"
placeholder="数据库连接字符串"
/>
</Form.Item>
</div>
)}
</div>
)}
</Form>
</Modal>
);
}
rules={[
{
required: true,
message: "请上传文件",
},
]}
>
<Dragger
className="w-full"
beforeUpload={() => false}
multiple
>
<p className="ant-upload-drag-icon">
<InboxOutlined />
</p>
<p className="ant-upload-text"></p>
<p className="ant-upload-hint"></p>
</Dragger>
</Form.Item>
</>
)}
{/* Target Configuration */}
{importConfig?.target && importConfig?.target !== DataSource.UPLOAD && (
<div className="space-y-3 p-4 bg-blue-50 rounded-lg">
{importConfig?.target === DataSource.DATABASE && (
<div className="grid grid-cols-2 gap-3">
<Form.Item
name="databaseType"
rules={[{ required: true }]}
label="数据库类型"
>
<Select
className="w-full"
options={[
{ label: "MySQL", value: "mysql" },
{ label: "PostgreSQL", value: "postgresql" },
{ label: "MongoDB", value: "mongodb" },
]}
></Select>
</Form.Item>
<Form.Item
name="tableName"
rules={[{ required: true }]}
label="表名"
>
<Input className="h-8 text-xs" placeholder="dataset_table" />
</Form.Item>
<Form.Item
name="connectionString"
rules={[{ required: true }]}
label="连接字符串"
>
<Input
className="h-8 text-xs col-span-2"
placeholder="数据库连接字符串"
/>
</Form.Item>
</div>
)}
</div>
)}
</Form>
</Modal>
);
}
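
The large/small branch above hinges on shouldStreamUpload from file.util.ts, whose body is not shown in this diff. A plausible sketch, assuming a plain size threshold — the constant below is an assumption, not the repo's actual cutoff:

// Hypothetical threshold; the real value lives in file.util.ts.
const STREAM_UPLOAD_THRESHOLD_BYTES = 5 * 1024 * 1024;

export function shouldStreamUpload(file: File): boolean {
  // Files over the threshold are routed to "upload:dataset-stream" and split
  // while uploading; smaller ones are split in memory and sent via "upload:dataset".
  return file.size > STREAM_UPLOAD_THRESHOLD_BYTES;
}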

View File

@@ -1,12 +1,13 @@
import {
App,
Button,
Descriptions,
DescriptionsProps,
Modal,
Table,
Input,
} from "antd";
import {
App,
Button,
Descriptions,
DescriptionsProps,
Modal,
Spin,
Table,
Input,
} from "antd";
import { formatBytes, formatDateTime } from "@/utils/unit";
import { Download, Trash2, Folder, File } from "lucide-react";
import { datasetTypeMap } from "../../dataset.const";
@@ -20,10 +21,10 @@ type DatasetFileRow = DatasetFile & {
};
const PREVIEW_MAX_HEIGHT = 500;
const PREVIEW_MODAL_WIDTH = {
text: 800,
media: 700,
};
const PREVIEW_MODAL_WIDTH = {
text: "80vw",
media: "80vw",
};
const PREVIEW_TEXT_FONT_SIZE = 12;
const PREVIEW_TEXT_PADDING = 12;
const PREVIEW_AUDIO_PADDING = 40;
@@ -49,10 +50,12 @@ export default function Overview({
previewVisible,
previewFileName,
previewContent,
previewFileType,
previewMediaUrl,
previewLoading,
closePreview,
previewFileType,
previewMediaUrl,
previewLoading,
officePreviewStatus,
officePreviewError,
closePreview,
handleDeleteFile,
handleDownloadFile,
handleBatchDeleteFiles,
@@ -446,13 +449,41 @@ export default function Overview({
/>
</div>
)}
{previewFileType === "pdf" && (
<iframe
src={previewMediaUrl}
title={previewFileName || "PDF 预览"}
style={{ width: "100%", height: `${PREVIEW_MAX_HEIGHT}px`, border: "none" }}
/>
)}
{previewFileType === "pdf" && (
<>
{previewMediaUrl ? (
<iframe
src={previewMediaUrl}
title={previewFileName || "PDF 预览"}
style={{ width: "100%", height: `${PREVIEW_MAX_HEIGHT}px`, border: "none" }}
/>
) : (
<div
style={{
height: `${PREVIEW_MAX_HEIGHT}px`,
display: "flex",
flexDirection: "column",
alignItems: "center",
justifyContent: "center",
gap: 12,
color: "#666",
}}
>
{officePreviewStatus === "FAILED" ? (
<>
<div></div>
<div>{officePreviewError || "请稍后重试"}</div>
</>
) : (
<>
<Spin />
<div>...</div>
</>
)}
</div>
)}
</>
)}
{previewFileType === "video" && (
<div style={{ textAlign: "center" }}>
<video

View File

@@ -2,27 +2,50 @@ import type {
Dataset,
DatasetFile,
} from "@/pages/DataManagement/dataset.model";
import { DatasetType } from "@/pages/DataManagement/dataset.model";
import { App } from "antd";
import { useState } from "react";
import { useCallback, useEffect, useRef, useState } from "react";
import {
PREVIEW_TEXT_MAX_LENGTH,
resolvePreviewFileType,
truncatePreviewText,
type PreviewFileType,
} from "@/utils/filePreview";
import {
deleteDatasetFileUsingDelete,
downloadFileByIdUsingGet,
exportDatasetUsingPost,
queryDatasetFilesUsingGet,
createDatasetDirectoryUsingPost,
downloadDirectoryUsingGet,
deleteDirectoryUsingDelete,
} from "../dataset.api";
import {
deleteDatasetFileUsingDelete,
downloadFileByIdUsingGet,
exportDatasetUsingPost,
queryDatasetFilesUsingGet,
createDatasetDirectoryUsingPost,
downloadDirectoryUsingGet,
deleteDirectoryUsingDelete,
queryDatasetFilePreviewStatusUsingGet,
convertDatasetFilePreviewUsingPost,
} from "../dataset.api";
import { useParams } from "react-router";
const OFFICE_FILE_EXTENSIONS = [".doc", ".docx"];
const OFFICE_PREVIEW_POLL_INTERVAL = 2000;
const OFFICE_PREVIEW_POLL_MAX_TIMES = 60;
type OfficePreviewStatus = "UNSET" | "PENDING" | "PROCESSING" | "READY" | "FAILED";
const isOfficeFileName = (fileName?: string) => {
const lowerName = (fileName || "").toLowerCase();
return OFFICE_FILE_EXTENSIONS.some((ext) => lowerName.endsWith(ext));
};
const normalizeOfficePreviewStatus = (status?: string): OfficePreviewStatus => {
if (!status) {
return "UNSET";
}
const upper = status.toUpperCase();
if (upper === "PENDING" || upper === "PROCESSING" || upper === "READY" || upper === "FAILED") {
return upper as OfficePreviewStatus;
}
return "UNSET";
};
export function useFilesOperation(dataset: Dataset) {
const { message } = App.useApp();
const { id } = useParams(); // dynamic route param
@@ -44,6 +67,23 @@ export function useFilesOperation(dataset: Dataset) {
const [previewFileType, setPreviewFileType] = useState<PreviewFileType>("text");
const [previewMediaUrl, setPreviewMediaUrl] = useState("");
const [previewLoading, setPreviewLoading] = useState(false);
const [officePreviewStatus, setOfficePreviewStatus] = useState<OfficePreviewStatus | null>(null);
const [officePreviewError, setOfficePreviewError] = useState("");
const officePreviewPollingRef = useRef<number | null>(null);
const officePreviewFileRef = useRef<string | null>(null);
const clearOfficePreviewPolling = useCallback(() => {
if (officePreviewPollingRef.current) {
window.clearTimeout(officePreviewPollingRef.current);
officePreviewPollingRef.current = null;
}
}, []);
useEffect(() => {
return () => {
clearOfficePreviewPolling();
};
}, [clearOfficePreviewPolling]);
const fetchFiles = async (
prefix?: string,
@@ -52,14 +92,13 @@ export function useFilesOperation(dataset: Dataset) {
) => {
// if a prefix was explicitly passed (including the empty string), use it; otherwise fall back to pagination.prefix
const targetPrefix = prefix !== undefined ? prefix : (pagination.prefix || '');
const shouldExcludeDerivedFiles = dataset?.datasetType === DatasetType.TEXT;
const params: DatasetFilesQueryParams = {
page: current !== undefined ? current : pagination.current,
size: pageSize !== undefined ? pageSize : pagination.pageSize,
isWithDirectory: true,
prefix: targetPrefix,
...(shouldExcludeDerivedFiles ? { excludeDerivedFiles: true } : {}),
excludeDerivedFiles: true,
};
const { data } = await queryDatasetFilesUsingGet(id!, params);
@@ -113,17 +152,61 @@ export function useFilesOperation(dataset: Dataset) {
return;
}
const previewUrl = `/api/data-management/datasets/${datasetId}/files/${file.id}/preview`;
setPreviewFileName(file.fileName);
setPreviewContent("");
setPreviewMediaUrl("");
if (isOfficeFileName(file?.fileName)) {
setPreviewFileType("pdf");
setPreviewVisible(true);
setPreviewLoading(true);
setOfficePreviewStatus("PROCESSING");
setOfficePreviewError("");
officePreviewFileRef.current = file.id;
try {
const { data: statusData } = await queryDatasetFilePreviewStatusUsingGet(datasetId, file.id);
const currentStatus = normalizeOfficePreviewStatus(statusData?.status);
if (currentStatus === "READY") {
setPreviewMediaUrl(previewUrl);
setOfficePreviewStatus("READY");
setPreviewLoading(false);
return;
}
if (currentStatus === "PROCESSING") {
pollOfficePreviewStatus(datasetId, file.id, 0);
return;
}
const { data } = await convertDatasetFilePreviewUsingPost(datasetId, file.id);
const status = normalizeOfficePreviewStatus(data?.status);
if (status === "READY") {
setPreviewMediaUrl(previewUrl);
setOfficePreviewStatus("READY");
} else if (status === "FAILED") {
setOfficePreviewStatus("FAILED");
setOfficePreviewError(data?.previewError || "转换失败,请稍后重试");
} else {
setOfficePreviewStatus("PROCESSING");
pollOfficePreviewStatus(datasetId, file.id, 0);
return;
}
} catch (error) {
console.error("触发预览转换失败", error);
message.error({ content: "触发预览转换失败" });
setOfficePreviewStatus("FAILED");
setOfficePreviewError("触发预览转换失败");
} finally {
setPreviewLoading(false);
}
return;
}
const fileType = resolvePreviewFileType(file?.fileName);
if (!fileType) {
message.warning({ content: "不支持预览该文件类型" });
return;
}
const previewUrl = `/api/data-management/datasets/${datasetId}/files/${file.id}/preview`;
setPreviewFileName(file.fileName);
setPreviewFileType(fileType);
setPreviewContent("");
setPreviewMediaUrl("");
if (fileType === "text") {
setPreviewLoading(true);
@@ -149,13 +232,62 @@ export function useFilesOperation(dataset: Dataset) {
};
const closePreview = () => {
clearOfficePreviewPolling();
officePreviewFileRef.current = null;
setPreviewVisible(false);
setPreviewContent("");
setPreviewMediaUrl("");
setPreviewFileName("");
setPreviewFileType("text");
setOfficePreviewStatus(null);
setOfficePreviewError("");
};
const pollOfficePreviewStatus = useCallback(
async (datasetId: string, fileId: string, attempt: number) => {
clearOfficePreviewPolling();
officePreviewPollingRef.current = window.setTimeout(async () => {
if (officePreviewFileRef.current !== fileId) {
return;
}
try {
const { data } = await queryDatasetFilePreviewStatusUsingGet(datasetId, fileId);
const status = normalizeOfficePreviewStatus(data?.status);
if (status === "READY") {
setPreviewMediaUrl(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview`);
setOfficePreviewStatus("READY");
setOfficePreviewError("");
setPreviewLoading(false);
return;
}
if (status === "FAILED") {
setOfficePreviewStatus("FAILED");
setOfficePreviewError(data?.previewError || "转换失败,请稍后重试");
setPreviewLoading(false);
return;
}
if (attempt >= OFFICE_PREVIEW_POLL_MAX_TIMES - 1) {
setOfficePreviewStatus("FAILED");
setOfficePreviewError("转换超时,请稍后重试");
setPreviewLoading(false);
return;
}
pollOfficePreviewStatus(datasetId, fileId, attempt + 1);
} catch (error) {
console.error("轮询预览状态失败", error);
if (attempt >= OFFICE_PREVIEW_POLL_MAX_TIMES - 1) {
setOfficePreviewStatus("FAILED");
setOfficePreviewError("转换超时,请稍后重试");
setPreviewLoading(false);
return;
}
pollOfficePreviewStatus(datasetId, fileId, attempt + 1);
}
}, OFFICE_PREVIEW_POLL_INTERVAL);
},
[clearOfficePreviewPolling]
);
const handleDeleteFile = async (file: DatasetFile) => {
try {
await deleteDatasetFileUsingDelete(dataset.id, file.id);
@@ -198,6 +330,8 @@ export function useFilesOperation(dataset: Dataset) {
previewFileType,
previewMediaUrl,
previewLoading,
officePreviewStatus,
officePreviewError,
closePreview,
fetchFiles,
setFileList,
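
The office-preview polling above is a poll-until-terminal-state loop with a 2-second interval and a 60-attempt cap. The same shape as a self-contained helper — a sketch rather than the repo's code; the hook's version additionally routes through refs so an in-flight poll can be dropped when the modal closes:

// Poll `check` until it returns a terminal result or attempts run out.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (result: T) => boolean,
  intervalMs = 2000,
  maxAttempts = 60
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (isDone(result)) return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Polling timed out");
}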

View File

@@ -8,8 +8,8 @@ import {
} from "@ant-design/icons";
import TagManager from "@/components/business/TagManagement";
import { Link, useNavigate } from "react-router";
import { useEffect, useMemo, useState } from "react";
import type { ReactNode } from "react";
import { useEffect, useMemo, useState } from "react";
import type { ReactNode } from "react";
import { SearchControls } from "@/components/SearchControls";
import CardView from "@/components/CardView";
import type { Dataset } from "@/pages/DataManagement/dataset.model";
@@ -36,19 +36,19 @@ export default function DatasetManagementPage() {
const [editDatasetOpen, setEditDatasetOpen] = useState(false);
const [currentDataset, setCurrentDataset] = useState<Dataset | null>(null);
const [showUploadDialog, setShowUploadDialog] = useState(false);
const [statisticsData, setStatisticsData] = useState<StatisticsData>({
count: [],
size: [],
});
const [statisticsData, setStatisticsData] = useState<StatisticsData>({
count: [],
size: [],
});
async function fetchStatistics() {
const { data } = await getDatasetStatisticsUsingGet();
const statistics: StatisticsData = {
size: [
{
title: "数据集总数",
value: data?.totalDatasets || 0,
const statistics: StatisticsData = {
size: [
{
title: "数据集总数",
value: data?.totalDatasets || 0,
},
{
title: "文件总数",
@@ -76,10 +76,10 @@ export default function DatasetManagementPage() {
title: "视频",
value: data?.count?.video || 0,
},
],
};
setStatisticsData(statistics);
}
],
};
setStatisticsData(statistics);
}
const [tags, setTags] = useState<string[]>([]);
@@ -136,9 +136,9 @@ export default function DatasetManagementPage() {
message.success("数据集下载成功");
};
const handleDeleteDataset = async (id: string) => {
if (!id) return;
await deleteDatasetByIdUsingDelete(id);
const handleDeleteDataset = async (id: string) => {
if (!id) return;
await deleteDatasetByIdUsingDelete(id);
fetchData({ pageOffset: 0 });
message.success("数据删除成功");
};
@@ -223,12 +223,12 @@ export default function DatasetManagementPage() {
title: "状态",
dataIndex: "status",
key: "status",
render: (status: DatasetStatusMeta) => {
return (
<Tag icon={status?.icon} color={status?.color}>
{status?.label}
</Tag>
);
},
width: 120,
},
@@ -274,10 +274,10 @@ export default function DatasetManagementPage() {
key: "actions",
width: 200,
fixed: "right",
render: (_: unknown, record: Dataset) => (
<div className="flex items-center gap-2">
{operations.map((op) => (
<Tooltip key={op.key} title={op.label}>
<Button
type="text"
icon={op.icon}
@@ -329,7 +329,7 @@ export default function DatasetManagementPage() {
<div className="gap-4 h-full flex flex-col">
{/* Header */}
<div className="flex items-center justify-between">
<h1 className="text-xl font-bold"></h1>
<div className="flex gap-2 items-center">
{/* tasks */}
<TagManager
@@ -353,13 +353,13 @@ export default function DatasetManagementPage() {
<div className="grid grid-cols-1 gap-4">
<Card>
<div className="grid grid-cols-3">
{statisticsData.size.map((item) => (
<Statistic
title={item.title}
key={item.title}
value={`${item.value}`}
/>
))}
</div>
</Card>
</div>
@@ -396,22 +396,22 @@ export default function DatasetManagementPage() {
updateEvent="update:datasets"
/>
</div>
);
}
type StatisticsItem = {
title: string;
value: number | string;
};
type StatisticsData = {
count: StatisticsItem[];
size: StatisticsItem[];
};
type DatasetStatusMeta = {
label: string;
value: string;
color: string;
icon: ReactNode;
};

View File

@@ -107,17 +107,33 @@ export function deleteDirectoryUsingDelete(
return del(`/api/data-management/datasets/${id}/files/directories?prefix=${encodeURIComponent(directoryPath)}`);
}
export function downloadFileByIdUsingGet(
id: string | number,
fileId: string | number,
fileName: string
) {
return download(
`/api/data-management/datasets/${id}/files/${fileId}/download`,
null,
fileName
);
}
// 数据集文件预览状态
export function queryDatasetFilePreviewStatusUsingGet(
datasetId: string | number,
fileId: string | number
) {
return get(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview/status`);
}
// 触发数据集文件预览转换
export function convertDatasetFilePreviewUsingPost(
datasetId: string | number,
fileId: string | number
) {
return post(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview/convert`, {});
}
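A minimal sketch of how these two endpoints combine into a convert-then-poll flow; the interval, retry cap, and any response fields beyond status/previewError are assumptions for illustration:

// Hypothetical helper: trigger conversion, then poll until READY or FAILED.
const POLL_INTERVAL_MS = 2000; // assumed value
const MAX_ATTEMPTS = 30;       // assumed value

export async function waitForPreviewReady(
  datasetId: string | number,
  fileId: string | number
): Promise<void> {
  await convertDatasetFilePreviewUsingPost(datasetId, fileId);
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const { data } = await queryDatasetFilePreviewStatusUsingGet(datasetId, fileId);
    if (data?.status === "READY") return;
    if (data?.status === "FAILED") throw new Error(data?.previewError || "转换失败");
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
  }
  throw new Error("转换超时");
}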
// 删除数据集文件
export function deleteDatasetFileUsingDelete(

View File

@@ -102,6 +102,13 @@ export interface DatasetTask {
executionHistory?: { time: string; status: string }[];
}
export interface StreamUploadInfo {
currentFile: string;
fileIndex: number;
totalFiles: number;
uploadedLines: number;
}
export interface TaskItem {
key: string;
title: string;
@@ -113,4 +120,6 @@ export interface TaskItem {
updateEvent?: string;
size?: number;
hasArchive?: boolean;
prefix?: string;
streamUploadInfo?: StreamUploadInfo;
}

View File

@@ -36,6 +36,10 @@ const DEFAULT_STATISTICS: StatisticsItem[] = [
title: "知识集总数",
value: 0,
},
{
title: "知识类别",
value: 0,
},
{
title: "文件总数",
value: 0,
@@ -109,6 +113,10 @@ export default function KnowledgeManagementPage() {
title: "知识集总数",
value: stats?.totalKnowledgeSets ?? 0,
},
{
title: "知识类别",
value: stats?.totalTags ?? 0,
},
{
title: "文件总数",
value: stats?.totalFiles ?? 0,
@@ -249,7 +257,7 @@ export default function KnowledgeManagementPage() {
return (
<div className="h-full flex flex-col gap-4">
<div className="flex items-center justify-between">
<h1 className="text-xl font-bold"></h1>
<div className="flex gap-2 items-center">
<Button onClick={() => navigate("/data/knowledge-management/search")}>
@@ -276,7 +284,7 @@ export default function KnowledgeManagementPage() {
<div className="grid grid-cols-1 gap-4">
<Card>
<div className="grid grid-cols-3">
<div className="grid grid-cols-4">
{statisticsData.map((item) => (
<Statistic
title={item.title}

View File

@@ -9,6 +9,7 @@ import {
import {
knowledgeSourceTypeOptions,
knowledgeStatusOptions,
// sensitivityOptions,
} from "../knowledge-management.const";
import {
KnowledgeSet,
@@ -169,9 +170,9 @@ export default function CreateKnowledgeSet({
<Form.Item label="负责人" name="owner">
<Input placeholder="请输入负责人" />
</Form.Item>
<Form.Item label="敏感级别" name="sensitivity">
<Input placeholder="请输入敏感级别" />
</Form.Item>
{/* <Form.Item label="敏感级别" name="sensitivity">
<Select options={sensitivityOptions} placeholder="请选择敏感级别" />
</Form.Item> */}
</div>
<div className="grid grid-cols-2 gap-4">
<Form.Item label="有效期开始" name="validFrom">
@@ -191,9 +192,6 @@ export default function CreateKnowledgeSet({
placeholder="请选择或输入标签"
/>
</Form.Item>
<Form.Item label="扩展元数据" name="metadata">
<Input.TextArea placeholder="请输入元数据(JSON)" rows={3} />
</Form.Item>
</Form>
</Modal>
</>

View File

@@ -16,6 +16,7 @@ export default function KnowledgeItemEditor({
open,
setId,
data,
parentPrefix,
onCancel,
onSuccess,
readOnly,
@@ -23,12 +24,14 @@ export default function KnowledgeItemEditor({
open: boolean;
setId: string;
data?: Partial<KnowledgeItem> | null;
parentPrefix?: string;
readOnly?: boolean;
onCancel: () => void;
onSuccess: () => void;
}) {
const [fileList, setFileList] = useState<UploadFile[]>([]);
const [replaceFileList, setReplaceFileList] = useState<UploadFile[]>([]);
const [loading, setLoading] = useState(false);
const isFileItem =
data?.contentType === KnowledgeContentType.FILE ||
data?.sourceType === KnowledgeSourceType.FILE_UPLOAD;
@@ -49,7 +52,6 @@ export default function KnowledgeItemEditor({
originFileObj: file,
},
]);
message.success("文件已就绪,可提交创建条目");
return false;
};
@@ -95,6 +97,7 @@ export default function KnowledgeItemEditor({
message.warning("请先选择文件");
return;
}
setLoading(true);
const formData = new FormData();
fileList.forEach((file) => {
const origin = file.originFileObj as File | undefined;
@@ -102,6 +105,9 @@ export default function KnowledgeItemEditor({
formData.append("files", origin);
}
});
if (parentPrefix) {
formData.append("parentPrefix", parentPrefix);
}
await uploadKnowledgeItemsUsingPost(setId, formData);
message.success(`已创建 ${fileList.length} 个知识条目`);
} else {
@@ -121,6 +127,7 @@ export default function KnowledgeItemEditor({
message.warning("请先选择要替换的文件");
return;
}
setLoading(true);
const formData = new FormData();
formData.append("file", replaceFile);
await replaceKnowledgeItemFileUsingPut(setId, data.id, formData);
@@ -132,6 +139,8 @@ export default function KnowledgeItemEditor({
onSuccess();
} catch {
message.error("操作失败,请重试");
} finally {
setLoading(false);
}
};
@@ -148,6 +157,7 @@ export default function KnowledgeItemEditor({
width={860}
maskClosable={false}
okButtonProps={{ disabled: readOnly }}
confirmLoading={loading}
>
<Form layout="vertical" disabled={readOnly}>
{isCreateMode && (

View File

@@ -35,6 +35,22 @@ export function queryKnowledgeItemsUsingGet(setId: string, params?: Record<strin
return get(`/api/data-management/knowledge-sets/${setId}/items`, params);
}
// 知识条目目录列表
export function queryKnowledgeDirectoriesUsingGet(setId: string, params?: Record<string, unknown>) {
return get(`/api/data-management/knowledge-sets/${setId}/directories`, params);
}
// 创建知识条目目录
export function createKnowledgeDirectoryUsingPost(setId: string, data: Record<string, unknown>) {
return post(`/api/data-management/knowledge-sets/${setId}/directories`, data);
}
// 删除知识条目目录
export function deleteKnowledgeDirectoryUsingDelete(setId: string, relativePath: string) {
const query = new URLSearchParams({ relativePath }).toString();
return del(`/api/data-management/knowledge-sets/${setId}/directories?${query}`);
}
// 知识条目文件搜索
export function searchKnowledgeItemsUsingGet(params?: Record<string, unknown>) {
return get("/api/data-management/knowledge-items/search", params);
@@ -70,6 +86,11 @@ export function deleteKnowledgeItemByIdUsingDelete(setId: string, itemId: string
return del(`/api/data-management/knowledge-sets/${setId}/items/${itemId}`);
}
// 批量删除知识条目
export function deleteKnowledgeItemsByIdsUsingPost(setId: string, data: { ids: string[] }) {
return post(`/api/data-management/knowledge-sets/${setId}/items/batch-delete`, data);
}
// 上传知识条目文件
export function uploadKnowledgeItemsUsingPost(setId: string, data: FormData) {
return post(`/api/data-management/knowledge-sets/${setId}/items/upload`, data);
@@ -80,6 +101,16 @@ export function downloadKnowledgeItemFileUsingGet(setId: string, itemId: string,
return download(`/api/data-management/knowledge-sets/${setId}/items/${itemId}/file`, null, fileName || "");
}
// 知识条目预览状态
export function queryKnowledgeItemPreviewStatusUsingGet(setId: string, itemId: string) {
return get(`/api/data-management/knowledge-sets/${setId}/items/${itemId}/preview/status`);
}
// 触发知识条目预览转换
export function convertKnowledgeItemPreviewUsingPost(setId: string, itemId: string) {
return post(`/api/data-management/knowledge-sets/${setId}/items/${itemId}/preview/convert`, {});
}
// 导出知识条目
export function exportKnowledgeItemsUsingGet(setId: string) {
return download(`/api/data-management/knowledge-sets/${setId}/items/export`);

View File

@@ -66,6 +66,11 @@ export const knowledgeSourceTypeOptions = [
{ label: "文件上传", value: KnowledgeSourceType.FILE_UPLOAD },
];
// export const sensitivityOptions = [
// { label: "敏感", value: "敏感" },
// { label: "不敏感", value: "不敏感" },
// ];
export type KnowledgeSetView = {
id: string;
name: string;
@@ -106,6 +111,7 @@ export type KnowledgeItemView = {
sensitivity?: string;
sourceDatasetId?: string;
sourceFileId?: string;
relativePath?: string;
metadata?: string;
createdAt?: string;
updatedAt?: string;
@@ -153,6 +159,7 @@ export function mapKnowledgeItem(data: KnowledgeItem): KnowledgeItemView {
sensitivity: data.sensitivity,
sourceDatasetId: data.sourceDatasetId,
sourceFileId: data.sourceFileId,
relativePath: data.relativePath,
metadata: data.metadata,
createdAt: data.createdAt ? formatDateTime(data.createdAt) : "",
updatedAt: data.updatedAt ? formatDateTime(data.updatedAt) : "",

View File

@@ -61,6 +61,7 @@ export interface KnowledgeItem {
sensitivity?: string;
sourceDatasetId?: string;
sourceFileId?: string;
relativePath?: string;
metadata?: string;
createdAt?: string;
updatedAt?: string;
@@ -68,10 +69,20 @@ export interface KnowledgeItem {
updatedBy?: string;
}
export interface KnowledgeDirectory {
id: string;
setId: string;
name: string;
relativePath: string;
createdAt?: string;
updatedAt?: string;
}
export interface KnowledgeManagementStatistics {
totalKnowledgeSets: number;
totalFiles: number;
totalSize: number;
totalTags: number;
}
export interface KnowledgeItemSearchResult {
@@ -84,6 +95,7 @@ export interface KnowledgeItemSearchResult {
sourceFileId?: string;
fileName?: string;
fileSize?: number;
relativePath?: string;
createdAt?: string;
updatedAt?: string;
}

View File

@@ -4,6 +4,7 @@ import {
CloseOutlined,
MenuOutlined,
SettingOutlined,
LogoutOutlined,
} from "@ant-design/icons";
import { ClipboardList, X } from "lucide-react";
import { menuItems } from "@/pages/Layout/menu";
@@ -12,6 +13,7 @@ import TaskUpload from "./TaskUpload";
import SettingsPage from "../SettingsPage/SettingsPage";
import { useAppSelector, useAppDispatch } from "@/store/hooks";
import { showSettings, hideSettings } from "@/store/slices/settingsSlice";
import { logout } from "@/store/slices/authSlice";
const isPathMatch = (currentPath: string, targetPath: string) =>
currentPath === targetPath || currentPath.startsWith(`${targetPath}/`);
@@ -67,6 +69,11 @@ const AsiderAndHeaderLayout = () => {
};
}, []);
const handleLogout = () => {
dispatch(logout());
navigate("/login");
};
return (
<div
className={`${
@@ -148,6 +155,9 @@ const AsiderAndHeaderLayout = () => {
>
</Button>
<Button block danger onClick={handleLogout}>
退出登录
</Button>
</div>
) : (
<div className="space-y-2">
@@ -175,6 +185,7 @@ const AsiderAndHeaderLayout = () => {
>
<SettingOutlined />
</Button>
<Button block danger onClick={handleLogout} icon={<LogoutOutlined />} />
</div>
)}
</div>

View File

@@ -1,69 +1,93 @@
import {
cancelUploadUsingPut,
preUploadUsingPost,
uploadFileChunkUsingPost,
} from "@/pages/DataManagement/dataset.api";
import { Button, Empty, Progress } from "antd";
import { DeleteOutlined } from "@ant-design/icons";
import { useEffect } from "react";
import { useFileSliceUpload } from "@/hooks/useSliceUpload";
export default function TaskUpload() {
const { createTask, taskList, removeTask, handleUpload } = useFileSliceUpload(
{
preUpload: preUploadUsingPost,
uploadChunk: uploadFileChunkUsingPost,
cancelUpload: cancelUploadUsingPut,
}
);
useEffect(() => {
const uploadHandler = (e: any) => {
console.log('[TaskUpload] Received upload event detail:', e.detail);
const { files } = e.detail;
const task = createTask(e.detail);
console.log('[TaskUpload] Created task with prefix:', task.prefix);
handleUpload({ task, files });
};
window.addEventListener("upload:dataset", uploadHandler);
return () => {
window.removeEventListener("upload:dataset", uploadHandler);
};
}, []);
return (
<div
className="w-90 max-w-90 max-h-96 overflow-y-auto p-2"
id="header-task-popover"
>
{taskList.length > 0 &&
taskList.map((task) => (
<div key={task.key} className="border-b border-gray-200 pb-2">
<div className="flex items-center justify-between">
<div>{task.title}</div>
<Button
type="text"
danger
disabled={!task?.cancelFn}
onClick={() =>
removeTask({
...task,
isCancel: true,
})
}
icon={<DeleteOutlined />}
></Button>
</div>
<Progress size="small" percent={task.percent} />
</div>
))}
{taskList.length === 0 && (
<Empty
image={Empty.PRESENTED_IMAGE_SIMPLE}
description="暂无上传任务"
/>
)}
</div>
);
}
import {
cancelUploadUsingPut,
preUploadUsingPost,
uploadFileChunkUsingPost,
} from "@/pages/DataManagement/dataset.api";
import { Button, Empty, Progress, Tag } from "antd";
import { DeleteOutlined, FileTextOutlined } from "@ant-design/icons";
import { useEffect } from "react";
import { useFileSliceUpload } from "@/hooks/useSliceUpload";
export default function TaskUpload() {
const { createTask, taskList, removeTask, handleUpload, registerStreamUploadListener } = useFileSliceUpload(
{
preUpload: preUploadUsingPost,
uploadChunk: uploadFileChunkUsingPost,
cancelUpload: cancelUploadUsingPut,
},
true, // showTaskCenter
true // enableStreamUpload
);
useEffect(() => {
const uploadHandler = (e: Event) => {
const customEvent = e as CustomEvent;
console.log('[TaskUpload] Received upload event detail:', customEvent.detail);
const { files } = customEvent.detail;
const task = createTask(customEvent.detail);
console.log('[TaskUpload] Created task with prefix:', task.prefix);
handleUpload({ task, files });
};
window.addEventListener("upload:dataset", uploadHandler);
return () => {
window.removeEventListener("upload:dataset", uploadHandler);
};
}, [createTask, handleUpload]);
// 注册流式上传监听器
useEffect(() => {
const unregister = registerStreamUploadListener();
return unregister;
}, [registerStreamUploadListener]);
return (
<div
className="w-90 max-w-90 max-h-96 overflow-y-auto p-2"
id="header-task-popover"
>
{taskList.length > 0 &&
taskList.map((task) => (
<div key={task.key} className="border-b border-gray-200 pb-2">
<div className="flex items-center justify-between">
<div>{task.title}</div>
<Button
type="text"
danger
disabled={!task?.cancelFn}
onClick={() =>
removeTask({
...task,
isCancel: true,
})
}
icon={<DeleteOutlined />}
></Button>
</div>
<Progress size="small" percent={Number(task.percent)} />
{task.streamUploadInfo && (
<div className="flex items-center gap-2 text-xs text-gray-500 mt-1">
<Tag icon={<FileTextOutlined />}>流式上传</Tag>
<span>
已上传行数: {task.streamUploadInfo.uploadedLines}
</span>
{task.streamUploadInfo.totalFiles > 1 && (
<span>
({task.streamUploadInfo.fileIndex}/{task.streamUploadInfo.totalFiles} 个文件)
</span>
)}
</div>
)}
</div>
))}
{taskList.length === 0 && (
<Empty
image={Empty.PRESENTED_IMAGE_SIMPLE}
description="暂无上传任务"
/>
)}
</div>
);
}
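For reference, a hedged sketch of the producer side of this event contract; the detail payload shape beyond files is assumed from the handler above:

// Any page can enqueue an upload by dispatching the CustomEvent TaskUpload listens for.
window.dispatchEvent(
  new CustomEvent("upload:dataset", {
    detail: {
      files: [new File(["line1\nline2"], "sample.txt", { type: "text/plain" })],
      title: "sample.txt", // assumed field consumed by createTask
    },
  })
);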

View File

@@ -24,11 +24,25 @@ export const menuItems = [
// },
{
id: "management",
title: "数管理",
title: "数管理",
icon: FolderOpen,
description: "创建、导入和管理数据集",
color: "bg-blue-500",
},
{
id: "annotation",
title: "数据标注",
icon: Tag,
description: "对数据进行标注和标记",
color: "bg-green-500",
},
{
id: "content-generation",
title: "内容生成",
icon: Sparkles,
description: "智能内容生成与创作",
color: "bg-purple-500",
},
{
id: "knowledge-management",
title: "知识管理",
@@ -43,20 +57,6 @@ export const menuItems = [
// description: "数据清洗和预处理",
// color: "bg-purple-500",
// },
{
id: "annotation",
title: "数据标注",
icon: Tag,
description: "对数据进行标注和标记",
color: "bg-green-500",
},
{
id: "content-generation",
title: "内容生成",
icon: Sparkles,
description: "智能内容生成与创作",
color: "bg-purple-500",
},
// {
// id: "synthesis",
// title: "数据合成",

View File

@@ -0,0 +1,114 @@
import React, { useState } from 'react';
import { useNavigate, useLocation } from 'react-router';
import { Form, Input, Button, Typography, message, Card } from 'antd';
import { UserOutlined, LockOutlined } from '@ant-design/icons';
import { useAppDispatch, useAppSelector } from '@/store/hooks';
import { loginLocal } from '@/store/slices/authSlice';
const { Title, Text } = Typography;
const LoginPage: React.FC = () => {
const navigate = useNavigate();
const location = useLocation();
const dispatch = useAppDispatch();
const { loading, error } = useAppSelector((state) => state.auth);
const [messageApi, contextHolder] = message.useMessage();
const from = location.state?.from?.pathname || '/data';
const onFinish = (values: { username: string; password: string }) => {
dispatch(loginLocal(values));
// The reducer updates state synchronously.
if (values.username === 'admin' && values.password === '123456') {
messageApi.success('登录成功');
navigate(from, { replace: true });
} else {
messageApi.error('账号或密码错误');
}
};
return (
<div className="min-h-screen flex items-center justify-center bg-[#050b14] relative overflow-hidden">
{contextHolder}
{/* Background Effects */}
<div className="absolute inset-0 z-0">
<div className="absolute top-0 left-0 w-full h-full bg-[radial-gradient(ellipse_at_center,_var(--tw-gradient-stops))] from-blue-900/20 via-[#050b14] to-[#050b14]"></div>
{/* Simple grid pattern if possible, or just gradient */}
</div>
<div className="absolute top-1/4 left-1/4 w-72 h-72 bg-blue-500/10 rounded-full blur-3xl animate-pulse"></div>
<div className="absolute bottom-1/4 right-1/4 w-96 h-96 bg-cyan-500/10 rounded-full blur-3xl animate-pulse delay-700"></div>
<div className="z-10 w-full max-w-md p-8 animate-[fadeIn_0.5s_ease-out_forwards]">
<div className="backdrop-blur-xl bg-white/5 border border-white/10 rounded-2xl shadow-2xl p-8 relative overflow-hidden">
{/* Decorative line */}
<div className="absolute top-0 left-0 w-full h-1 bg-gradient-to-r from-transparent via-blue-500 to-transparent"></div>
<div className="text-center mb-8">
<div className="inline-flex items-center justify-center w-16 h-16 rounded-full bg-blue-500/20 mb-4 border border-blue-500/30">
<svg className="w-8 h-8 text-blue-400" fill="none" stroke="currentColor" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11H5m14 0a2 2 0 012 2v6a2 2 0 01-2 2H5a2 2 0 01-2-2v-6a2 2 0 012-2m14 0V9a2 2 0 00-2-2M5 11V9a2 2 0 012-2m0 0V5a2 2 0 012-2h6a2 2 0 012 2v2M7 7h10" />
</svg>
</div>
<Title level={2} className="!text-white !mb-2 tracking-wide font-bold">
DataBuilder
</Title>
<Text className="text-gray-400! text-sm tracking-wider">
</Text>
</div>
<Form
name="login"
initialValues={{ remember: true, username: 'admin', password: '123456' }}
onFinish={onFinish}
layout="vertical"
size="large"
>
<Form.Item
name="username"
rules={[{ required: true, message: '请输入账号!' }]}
>
<Input
prefix={<UserOutlined className="text-blue-400" />}
placeholder="账号"
className="!bg-white/5 !border-white/10 !text-white placeholder:!text-gray-600 hover:!border-blue-500/50 focus:!border-blue-500 !rounded-lg"
/>
</Form.Item>
<Form.Item
name="password"
rules={[{ required: true, message: '请输入密码!' }]}
>
<Input.Password
prefix={<LockOutlined className="text-blue-400" />}
type="password"
placeholder="密码"
className="!bg-white/5 !border-white/10 !text-white placeholder:!text-gray-600 hover:!border-blue-500/50 focus:!border-blue-500 !rounded-lg"
/>
</Form.Item>
<Form.Item className="mb-2">
<Button
type="primary"
htmlType="submit"
className="w-full bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 border-none h-12 rounded-lg font-semibold tracking-wide shadow-lg shadow-blue-900/20"
loading={loading}
>
登录
</Button>
</Form.Item>
<div className="text-center mt-4">
<Text className="text-gray-600! text-xs">
·
</Text>
</div>
</Form>
</div>
</div>
</div>
);
};
export default LoginPage;

View File

@@ -49,243 +49,254 @@ import EvaluationDetailPage from "@/pages/DataEvaluation/Detail/TaskDetail.tsx";
import SynthDataDetail from "@/pages/SynthesisTask/SynthDataDetail.tsx";
import Home from "@/pages/Home/Home";
import ContentGenerationPage from "@/pages/ContentGeneration/ContentGenerationPage";
import LoginPage from "@/pages/Login/LoginPage";
import ProtectedRoute from "@/components/ProtectedRoute";
const router = createBrowserRouter([
{
path: "/login",
Component: LoginPage,
},
{
path: "/",
Component: Home,
},
{
path: "/chat",
Component: withErrorBoundary(AgentPage),
},
{
path: "/orchestration",
Component: ProtectedRoute,
children: [
{
path: "",
index: true,
Component: withErrorBoundary(OrchestrationPage),
path: "/chat",
Component: withErrorBoundary(AgentPage),
},
{
path: "create-workflow",
Component: withErrorBoundary(WorkflowEditor),
},
],
},
{
path: "/data",
Component: withErrorBoundary(MainLayout),
children: [
{
path: "collection",
path: "/orchestration",
children: [
{
path: "",
index: true,
Component: DataCollection,
Component: withErrorBoundary(OrchestrationPage),
},
{
path: "create-task",
Component: CollectionTaskCreate,
path: "create-workflow",
Component: withErrorBoundary(WorkflowEditor),
},
],
},
{
path: "management",
path: "/data",
Component: withErrorBoundary(MainLayout),
children: [
{
path: "",
index: true,
Component: DatasetManagement,
path: "collection",
children: [
{
path: "",
index: true,
Component: DataCollection,
},
{
path: "create-task",
Component: CollectionTaskCreate,
},
],
},
{
path: "create/:id?",
Component: DatasetCreate,
path: "management",
children: [
{
path: "",
index: true,
Component: DatasetManagement,
},
{
path: "create/:id?",
Component: DatasetCreate,
},
{
path: "detail/:id",
Component: DatasetDetail,
},
],
},
{
path: "detail/:id",
Component: DatasetDetail,
path: "knowledge-management",
children: [
{
path: "",
index: true,
Component: KnowledgeManagementPage,
},
{
path: "search",
Component: KnowledgeManagementSearch,
},
{
path: "detail/:id",
Component: KnowledgeSetDetail,
},
],
},
{
path: "cleansing",
children: [
{
path: "",
index: true,
Component: DataCleansing,
},
{
path: "create-task",
Component: CleansingTaskCreate,
},
{
path: "task-detail/:id",
Component: CleansingTaskDetail,
},
{
path: "create-template",
Component: CleansingTemplateCreate,
},
{
path: "template-detail/:id",
Component: CleansingTemplateDetail,
},
{
path: "update-template/:id",
Component: CleansingTemplateCreate,
},
],
},
{
path: "annotation",
children: [
{
path: "",
index: true,
Component: DataAnnotation,
},
{
path: "create-task",
Component: AnnotationTaskCreate,
},
{
path: "annotate/:projectId",
Component: LabelStudioTextEditor,
},
],
},
{
path: "content-generation",
Component: ContentGenerationPage,
},
{
path: "synthesis/task",
children: [
{
path: "",
Component: DataSynthesisPage,
},
{
path: "create-template",
Component: InstructionTemplateCreate,
},
{
path: "create",
Component: SynthesisTaskCreate,
},
{
path: ":id",
Component: SynthFileTask
},
{
path: "file/:id/detail",
Component: SynthDataDetail,
}
],
},
{
path: "synthesis/ratio-task",
children: [
{
path: "",
index: true,
Component: RatioTasksPage,
},
{
path: "create",
Component: CreateRatioTask,
},
{
path: "detail/:id",
Component: RatioTaskDetail,
}
],
},
{
path: "evaluation",
children: [
{
path: "",
index: true,
Component: DataEvaluationPage,
},
{
path: "detail/:id",
Component: EvaluationDetailPage,
},
{
path: "task-report/:id",
Component: EvaluationTaskReport,
},
{
path: "manual-evaluate/:id",
Component: ManualEvaluatePage,
},
],
},
{
path: "knowledge-base",
children: [
{
path: "",
index: true,
Component: KnowledgeBasePage,
},
{
path: "search",
Component: KnowledgeBaseSearch,
},
{
path: "detail/:id",
Component: KnowledgeBaseDetailPage,
},
{
path: "file-detail/:id",
Component: KnowledgeBaseFileDetailPage,
},
],
},
{
path: "operator-market",
children: [
{
path: "",
index: true,
Component: OperatorMarketPage,
},
{
path: "create/:id?",
Component: OperatorPluginCreate,
},
{
path: "plugin-detail/:id",
Component: OperatorPluginDetail,
},
],
},
],
},
{
path: "knowledge-management",
children: [
{
path: "",
index: true,
Component: KnowledgeManagementPage,
},
{
path: "search",
Component: KnowledgeManagementSearch,
},
{
path: "detail/:id",
Component: KnowledgeSetDetail,
},
],
},
{
path: "cleansing",
children: [
{
path: "",
index: true,
Component: DataCleansing,
},
{
path: "create-task",
Component: CleansingTaskCreate,
},
{
path: "task-detail/:id",
Component: CleansingTaskDetail,
},
{
path: "create-template",
Component: CleansingTemplateCreate,
},
{
path: "template-detail/:id",
Component: CleansingTemplateDetail,
},
{
path: "update-template/:id",
Component: CleansingTemplateCreate,
},
],
},
{
path: "annotation",
children: [
{
path: "",
index: true,
Component: DataAnnotation,
},
{
path: "create-task",
Component: AnnotationTaskCreate,
},
{
path: "annotate/:projectId",
Component: LabelStudioTextEditor,
},
],
},
{
path: "content-generation",
Component: ContentGenerationPage,
},
{
path: "synthesis/task",
children: [
{
path: "",
Component: DataSynthesisPage,
},
{
path: "create-template",
Component: InstructionTemplateCreate,
},
{
path: "create",
Component: SynthesisTaskCreate,
},
{
path: ":id",
Component: SynthFileTask
},
{
path: "file/:id/detail",
Component: SynthDataDetail,
}
],
},
{
path: "synthesis/ratio-task",
children: [
{
path: "",
index: true,
Component: RatioTasksPage,
},
{
path: "create",
Component: CreateRatioTask,
},
{
path: "detail/:id",
Component: RatioTaskDetail,
}
],
},
{
path: "evaluation",
children: [
{
path: "",
index: true,
Component: DataEvaluationPage,
},
{
path: "detail/:id",
Component: EvaluationDetailPage,
},
{
path: "task-report/:id",
Component: EvaluationTaskReport,
},
{
path: "manual-evaluate/:id",
Component: ManualEvaluatePage,
},
],
},
{
path: "knowledge-base",
children: [
{
path: "",
index: true,
Component: KnowledgeBasePage,
},
{
path: "search",
Component: KnowledgeBaseSearch,
},
{
path: "detail/:id",
Component: KnowledgeBaseDetailPage,
},
{
path: "file-detail/:id",
Component: KnowledgeBaseFileDetailPage,
},
],
},
{
path: "operator-market",
children: [
{
path: "",
index: true,
Component: OperatorMarketPage,
},
{
path: "create/:id?",
Component: OperatorPluginCreate,
},
{
path: "plugin-detail/:id",
Component: OperatorPluginDetail,
},
],
},
],
},
]
}
]);
export default router;

View File

@@ -31,7 +31,7 @@ const authSlice = createSlice({
initialState: {
user: null,
token: localStorage.getItem('token'),
isAuthenticated: false,
isAuthenticated: !!localStorage.getItem('token'),
loading: false,
error: null,
},
@@ -49,6 +49,19 @@ const authSlice = createSlice({
state.token = action.payload;
localStorage.setItem('token', action.payload);
},
loginLocal: (state, action) => {
const { username, password } = action.payload;
if (username === 'admin' && password === '123456') {
state.user = { username: 'admin', role: 'admin' };
state.token = 'mock-token-' + Date.now();
state.isAuthenticated = true;
localStorage.setItem('token', state.token);
state.error = null;
} else {
state.error = 'Invalid credentials';
state.isAuthenticated = false;
}
},
},
extraReducers: (builder) => {
builder
@@ -71,5 +84,5 @@ const authSlice = createSlice({
},
});
export const { logout, clearError, setToken } = authSlice.actions;
export const { logout, clearError, setToken, loginLocal } = authSlice.actions;
export default authSlice.reducer;
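Because loginLocal is a plain reducer, the whole flow is synchronous. A minimal sketch, assuming the configured store is exported from "@/store":

import { store } from "@/store"; // assumed export path
import { loginLocal } from "@/store/slices/authSlice";

store.dispatch(loginLocal({ username: "admin", password: "123456" }));
console.log(store.getState().auth.isAuthenticated); // true: the reducer ran synchronously

store.dispatch(loginLocal({ username: "admin", password: "wrong" }));
console.log(store.getState().auth.error); // "Invalid credentials"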

View File

@@ -1,79 +1,600 @@
import { UploadFile } from "antd";
import jsSHA from "jssha";
const CHUNK_SIZE = 1024 * 1024 * 60;
// 默认分片大小:5MB(适合大多数网络环境)
export const DEFAULT_CHUNK_SIZE = 1024 * 1024 * 5;
// 大文件阈值:10MB
export const LARGE_FILE_THRESHOLD = 1024 * 1024 * 10;
// 最大并发上传数
export const MAX_CONCURRENT_UPLOADS = 3;
// 文本文件读取块大小:20MB(用于计算 SHA256)
const BUFFER_CHUNK_SIZE = 1024 * 1024 * 20;
export function sliceFile(file, chunkSize = CHUNK_SIZE): Blob[] {
/**
* 将文件分割为多个分片
* @param file 文件对象
* @param chunkSize 分片大小(字节),默认 5MB
* @returns 分片数组(Blob 列表)
*/
export function sliceFile(file: Blob, chunkSize = DEFAULT_CHUNK_SIZE): Blob[] {
const totalSize = file.size;
const chunks: Blob[] = [];
// 小文件不需要分片
if (totalSize <= chunkSize) {
return [file];
}
let start = 0;
let end = start + chunkSize;
const chunks = [];
while (start < totalSize) {
const end = Math.min(start + chunkSize, totalSize);
const blob = file.slice(start, end);
chunks.push(blob);
start = end;
end = start + chunkSize;
}
return chunks;
}
export function calculateSHA256(file: Blob): Promise<string> {
let count = 0;
const hash = new jsSHA("SHA-256", "ARRAYBUFFER", { encoding: "UTF8" });
/**
* 计算文件的 SHA256 哈希值
* @param file 文件 Blob
* @param onProgress 进度回调(可选)
* @returns SHA256 哈希字符串
*/
export function calculateSHA256(
file: Blob,
onProgress?: (percent: number) => void
): Promise<string> {
return new Promise((resolve, reject) => {
const hash = new jsSHA("SHA-256", "ARRAYBUFFER", { encoding: "UTF8" });
const reader = new FileReader();
let processedSize = 0;
function readChunk(start: number, end: number) {
const slice = file.slice(start, end);
reader.readAsArrayBuffer(slice);
}
const bufferChunkSize = 1024 * 1024 * 20;
function processChunk(offset: number) {
const start = offset;
const end = Math.min(start + bufferChunkSize, file.size);
count = end;
const end = Math.min(start + BUFFER_CHUNK_SIZE, file.size);
readChunk(start, end);
}
reader.onloadend = function () {
const arraybuffer = reader.result;
reader.onloadend = function (e) {
const arraybuffer = reader.result as ArrayBuffer;
if (!arraybuffer) {
reject(new Error("Failed to read file"));
return;
}
hash.update(arraybuffer);
if (count < file.size) {
processChunk(count);
processedSize += arraybuffer.byteLength;
if (onProgress) {
const percent = Math.min(100, Math.round((processedSize / file.size) * 100));
onProgress(percent);
}
if (processedSize < file.size) {
processChunk(processedSize);
} else {
resolve(hash.getHash("HEX", { outputLen: 256 }));
}
};
reader.onerror = () => reject(new Error("File reading failed"));
processChunk(0);
});
}
/**
* 批量计算多个文件的 SHA256
* @param files 文件列表
* @param onFileProgress 单个文件进度回调(可选)
* @returns 哈希值数组
*/
export async function calculateSHA256Batch(
files: Blob[],
onFileProgress?: (index: number, percent: number) => void
): Promise<string[]> {
const results: string[] = [];
for (let i = 0; i < files.length; i++) {
const hash = await calculateSHA256(files[i], (percent) => {
onFileProgress?.(i, percent);
});
results.push(hash);
}
return results;
}
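A short usage sketch for the hashing helpers above; the inputs are illustrative:

// Hash two Blobs sequentially, logging per-file progress.
const blobs = [new Blob(["hello"]), new Blob(["world"])];
calculateSHA256Batch(blobs, (index, percent) => {
  console.log(`file ${index}: ${percent}%`);
}).then((hashes) => {
  console.log(hashes); // one 64-character hex digest per input
});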
/**
* 检查文件是否存在(未被修改或删除)
* @param fileList 文件列表
* @returns 返回第一个不存在的文件,或 null(如果都存在)
*/
export function checkIsFilesExist(
fileList: UploadFile[]
): Promise<UploadFile | null> {
fileList: Array<{ originFile?: Blob }>
): Promise<{ originFile?: Blob } | null> {
return new Promise((resolve) => {
const loadEndFn = (file: UploadFile, reachEnd: boolean, e) => {
const fileNotExist = !e.target.result;
if (!fileList.length) {
resolve(null);
return;
}
let checkedCount = 0;
const totalCount = fileList.length;
const loadEndFn = (file: { originFile?: Blob }, e: ProgressEvent<FileReader>) => {
checkedCount++;
const fileNotExist = !e.target?.result;
if (fileNotExist) {
resolve(file);
return;
}
if (reachEnd) {
if (checkedCount >= totalCount) {
resolve(null);
}
};
for (let i = 0; i < fileList.length; i++) {
const { originFile: file } = fileList[i];
for (const file of fileList) {
const fileReader = new FileReader();
fileReader.readAsArrayBuffer(file);
fileReader.onloadend = (e) =>
loadEndFn(fileList[i], i === fileList.length - 1, e);
const actualFile = file.originFile;
if (!actualFile) {
checkedCount++;
if (checkedCount >= totalCount) {
resolve(null);
}
continue;
}
fileReader.readAsArrayBuffer(actualFile.slice(0, 1));
fileReader.onloadend = (e) => loadEndFn(file, e);
fileReader.onerror = () => {
checkedCount++;
resolve(file);
};
}
});
}
/**
* 判断文件是否为大文件
* @param size 文件大小(字节)
* @param threshold 阈值(字节),默认 10MB
*/
export function isLargeFile(size: number, threshold = LARGE_FILE_THRESHOLD): boolean {
return size > threshold;
}
/**
* 格式化文件大小为人类可读格式
* @param bytes 字节数
* @param decimals 小数位数
*/
export function formatFileSize(bytes: number, decimals = 2): string {
if (bytes === 0) return "0 B";
const k = 1024;
const sizes = ["B", "KB", "MB", "GB", "TB", "PB"];
const i = Math.floor(Math.log(bytes) / Math.log(k));
return `${parseFloat((bytes / Math.pow(k, i)).toFixed(decimals))} ${sizes[i]}`;
}
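Expected behavior of the two size helpers, as a quick sanity check:

formatFileSize(0);                 // "0 B"
formatFileSize(5 * 1024 * 1024);   // "5 MB"
isLargeFile(5 * 1024 * 1024);      // false: below the 10MB threshold
isLargeFile(20 * 1024 * 1024);     // true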
/**
* 并发执行异步任务
* @param tasks 任务函数数组
* @param maxConcurrency 最大并发数
* @param onTaskComplete 单个任务完成回调(可选)
*/
export async function runConcurrentTasks<T>(
tasks: (() => Promise<T>)[],
maxConcurrency: number,
onTaskComplete?: (index: number, result: T) => void
): Promise<T[]> {
const results: T[] = new Array(tasks.length);
let index = 0;
async function runNext(): Promise<void> {
const currentIndex = index++;
if (currentIndex >= tasks.length) return;
const result = await tasks[currentIndex]();
results[currentIndex] = result;
onTaskComplete?.(currentIndex, result);
await runNext();
}
const workers = Array(Math.min(maxConcurrency, tasks.length))
.fill(null)
.map(() => runNext());
await Promise.all(workers);
return results;
}
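A minimal usage sketch for the worker-pool helper above; the simulated tasks are placeholders:

// Ten fake tasks, at most MAX_CONCURRENT_UPLOADS (3) in flight at once.
const tasks = Array.from({ length: 10 }, (_, i) => () =>
  new Promise<number>((resolve) => setTimeout(() => resolve(i), 100))
);
runConcurrentTasks(tasks, MAX_CONCURRENT_UPLOADS, (index, result) => {
  console.log(`task ${index} finished with ${result}`);
}).then((results) => console.log(results)); // results keep input order: [0..9]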
/**
* 按行分割文本文件内容
* @param text 文本内容
* @param skipEmptyLines 是否跳过空行,默认 true
* @returns 行数组
*/
export function splitTextByLines(text: string, skipEmptyLines = true): string[] {
const lines = text.split(/\r?\n/);
if (skipEmptyLines) {
return lines.filter((line) => line.trim() !== "");
}
return lines;
}
/**
* 创建分片信息对象
* @param file 原始文件
* @param chunkSize 分片大小
*/
export function createFileSliceInfo(
file: File | Blob,
chunkSize = DEFAULT_CHUNK_SIZE
): {
originFile: Blob;
slices: Blob[];
name: string;
size: number;
totalChunks: number;
} {
const slices = sliceFile(file, chunkSize);
return {
originFile: file,
slices,
name: (file as File).name || "unnamed",
size: file.size,
totalChunks: slices.length,
};
}
/**
* 支持的文本文件 MIME 类型前缀
*/
export const TEXT_FILE_MIME_PREFIX = "text/";
/**
* 支持的文本文件 MIME 类型集合
*/
export const TEXT_FILE_MIME_TYPES = new Set([
"application/json",
"application/xml",
"application/csv",
"application/ndjson",
"application/x-ndjson",
"application/x-yaml",
"application/yaml",
"application/javascript",
"application/x-javascript",
"application/sql",
"application/rtf",
"application/xhtml+xml",
"application/svg+xml",
]);
/**
* 支持的文本文件扩展名集合
*/
export const TEXT_FILE_EXTENSIONS = new Set([
".txt",
".md",
".markdown",
".csv",
".tsv",
".json",
".jsonl",
".ndjson",
".log",
".xml",
".yaml",
".yml",
".sql",
".js",
".ts",
".jsx",
".tsx",
".html",
".htm",
".css",
".scss",
".less",
".py",
".java",
".c",
".cpp",
".h",
".hpp",
".go",
".rs",
".rb",
".php",
".sh",
".bash",
".zsh",
".ps1",
".bat",
".cmd",
".svg",
".rtf",
]);
/**
* 判断文件是否为文本文件(支持 UploadFile 类型)
* @param file UploadFile 对象
*/
export function isTextUploadFile(file: UploadFile): boolean {
const mimeType = (file.type || "").toLowerCase();
if (mimeType) {
if (mimeType.startsWith(TEXT_FILE_MIME_PREFIX)) return true;
if (TEXT_FILE_MIME_TYPES.has(mimeType)) return true;
}
const fileName = file.name || "";
const dotIndex = fileName.lastIndexOf(".");
if (dotIndex < 0) return false;
const ext = fileName.slice(dotIndex).toLowerCase();
return TEXT_FILE_EXTENSIONS.has(ext);
}
/**
* 判断文件名是否为文本文件
* @param fileName 文件名
*/
export function isTextFileByName(fileName: string): boolean {
const lowerName = fileName.toLowerCase();
// 先检查 MIME 类型(如果有)
// 这里简化处理,主要通过扩展名判断
const dotIndex = lowerName.lastIndexOf(".");
if (dotIndex < 0) return false;
const ext = lowerName.slice(dotIndex);
return TEXT_FILE_EXTENSIONS.has(ext);
}
/**
* 获取文件扩展名
* @param fileName 文件名
*/
export function getFileExtension(fileName: string): string {
const dotIndex = fileName.lastIndexOf(".");
if (dotIndex < 0) return "";
return fileName.slice(dotIndex).toLowerCase();
}
/**
* 安全地读取文件为文本
* @param file 文件对象
* @param encoding 编码,默认 UTF-8
*/
export function readFileAsText(
file: File | Blob,
encoding = "UTF-8"
): Promise<string> {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onload = (e) => resolve(e.target?.result as string);
reader.onerror = () => reject(new Error("Failed to read file"));
reader.readAsText(file, encoding);
});
}
/**
* 流式分割文件并逐行上传
* 使用 Blob.slice 逐块读取,避免一次性加载大文件到内存
* @param file 文件对象
* @param datasetId 数据集ID
* @param uploadFn 上传函数,接收 FormData 和配置,返回 Promise
* @param onProgress 进度回调 (currentBytes, totalBytes, uploadedLines)
* @param chunkSize 每次读取的块大小,默认 1MB
* @param options 其他选项
* @returns 上传结果统计
*/
export interface StreamUploadOptions {
reqId: number;
fileNamePrefix?: string;
hasArchive?: boolean;
prefix?: string;
signal?: AbortSignal;
maxConcurrency?: number;
}
export interface StreamUploadResult {
uploadedCount: number;
totalBytes: number;
skippedEmptyCount: number;
}
export async function streamSplitAndUpload(
file: File,
uploadFn: (formData: FormData, config?: { onUploadProgress?: (e: { loaded: number; total: number }) => void }) => Promise<unknown>,
onProgress?: (currentBytes: number, totalBytes: number, uploadedLines: number) => void,
chunkSize: number = 1024 * 1024, // 1MB
options: StreamUploadOptions
): Promise<StreamUploadResult> {
const { reqId, fileNamePrefix, prefix, signal, maxConcurrency = 3 } = options;
const fileSize = file.size;
let offset = 0;
let buffer = "";
let uploadedCount = 0;
let skippedEmptyCount = 0;
let currentBytes = 0;
// 获取文件名基础部分和扩展名
const originalFileName = fileNamePrefix || file.name;
const lastDotIndex = originalFileName.lastIndexOf(".");
const baseName = lastDotIndex > 0 ? originalFileName.slice(0, lastDotIndex) : originalFileName;
const fileExtension = lastDotIndex > 0 ? originalFileName.slice(lastDotIndex) : "";
// 收集所有需要上传的行
const pendingLines: { line: string; index: number }[] = [];
let lineIndex = 0;
// 逐块读取文件并收集行
while (offset < fileSize) {
// 检查是否已取消
if (signal?.aborted) {
throw new Error("Upload cancelled");
}
const end = Math.min(offset + chunkSize, fileSize);
const chunk = file.slice(offset, end);
const text = await readFileAsText(chunk);
// 将新读取的内容追加到 buffer
const combined = buffer + text;
// 按换行符分割(支持 \n 和 \r\n)
const lines = combined.split(/\r?\n/);
// 保留最后一行(可能不完整)
buffer = lines.pop() || "";
// 收集完整行
for (const line of lines) {
if (signal?.aborted) {
throw new Error("Upload cancelled");
}
pendingLines.push({ line, index: lineIndex++ });
}
currentBytes = end;
offset = end;
// 每处理完一个 chunk,更新进度
onProgress?.(currentBytes, fileSize, uploadedCount);
}
// 处理最后剩余的 buffer(如果文件不以换行符结尾)
if (buffer.trim()) {
pendingLines.push({ line: buffer, index: lineIndex++ });
}
/**
* 上传单行内容
* fileNo 固定为 1(因为所有行都属于同一个原始文件,只是不同的分片/行)
* chunkNo 用于标识是第几行
*/
async function uploadLine(line: string, index: number): Promise<void> {
// 检查是否已取消
if (signal?.aborted) {
throw new Error("Upload cancelled");
}
if (!line.trim()) {
skippedEmptyCount++;
return;
}
// 保留原始文件扩展名
const newFileName = `${baseName}_${String(index + 1).padStart(6, "0")}${fileExtension}`;
const blob = new Blob([line], { type: "text/plain" });
const lineFile = new File([blob], newFileName, { type: "text/plain" });
// 计算分片(小文件通常只需要一个分片)
const slices = sliceFile(lineFile, DEFAULT_CHUNK_SIZE);
const checkSum = await calculateSHA256(slices[0]);
// 检查是否已取消(计算哈希后)
if (signal?.aborted) {
throw new Error("Upload cancelled");
}
const formData = new FormData();
formData.append("file", slices[0]);
formData.append("reqId", reqId.toString());
// 所有行使用相同的 fileNo=1,因为它们属于同一个预上传请求
// chunkNo 表示这是第几行数据
formData.append("fileNo", "1");
formData.append("chunkNo", (index + 1).toString());
formData.append("fileName", newFileName);
formData.append("fileSize", lineFile.size.toString());
formData.append("totalChunkNum", "1");
formData.append("checkSumHex", checkSum);
if (prefix !== undefined) {
formData.append("prefix", prefix);
}
await uploadFn(formData, {
onUploadProgress: () => {
// 单行文件很小,进度主要用于追踪上传状态
},
});
}
/**
* 带并发控制的上传队列执行器
* 使用任务队列模式,确保不会同时启动所有上传任务
*/
async function executeUploadsWithConcurrency(): Promise<void> {
const lines = [...pendingLines];
let currentIndex = 0;
let activeCount = 0;
let resolvedCount = 0;
return new Promise((resolve, reject) => {
function tryStartNext() {
// 检查是否已完成
if (resolvedCount >= lines.length) {
if (activeCount === 0) {
resolve();
}
return;
}
// 启动新的上传任务,直到达到最大并发数
while (activeCount < maxConcurrency && currentIndex < lines.length) {
const { line, index } = lines[currentIndex++];
activeCount++;
uploadLine(line, index)
.then(() => {
uploadedCount++;
onProgress?.(fileSize, fileSize, uploadedCount);
})
.catch((err) => {
reject(err);
})
.finally(() => {
activeCount--;
resolvedCount++;
// 尝试启动下一个任务
tryStartNext();
});
}
}
// 开始执行
tryStartNext();
});
}
// 使用并发控制执行所有上传
await executeUploadsWithConcurrency();
return {
uploadedCount,
totalBytes: fileSize,
skippedEmptyCount,
};
}
/**
* 判断文件是否需要流式分割上传
* @param file 文件对象
* @param threshold 阈值,默认 5MB
*/
export function shouldStreamUpload(file: File, threshold: number = 5 * 1024 * 1024): boolean {
return file.size > threshold;
}
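A hedged end-to-end sketch of the streaming path; uploadFileChunkUsingPost is assumed to be compatible with the uploadFn signature, and reqId is assumed to come from a prior preUpload call:

async function uploadLargeTextFile(file: File, reqId: number) {
  if (!shouldStreamUpload(file)) return; // small files keep the traditional split path
  const controller = new AbortController();
  const result = await streamSplitAndUpload(
    file,
    (formData, config) => uploadFileChunkUsingPost(formData, config),
    (currentBytes, totalBytes, uploadedLines) => {
      console.log(`bytes ${currentBytes}/${totalBytes}, lines: ${uploadedLines}`);
    },
    1024 * 1024,
    { reqId, signal: controller.signal, maxConcurrency: 3 }
  );
  console.log(`uploaded ${result.uploadedCount} lines, skipped ${result.skippedEmptyCount} empty`);
}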

View File

@@ -82,6 +82,9 @@ class Request {
*/
createXHRWithProgress(url, config, onProgress, onDownloadProgress) {
return new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest();
xhr.open(config.method || "POST", url);
// 设置请求头
if (config.headers) {
Object.keys(config.headers).forEach((key) => {
@@ -89,7 +92,13 @@ class Request {
});
}
const xhr = new XMLHttpRequest();
// 监听 AbortSignal 来中止请求
if (config.signal) {
config.signal.addEventListener("abort", () => {
xhr.abort();
reject(new Error("上传已取消"));
});
}
// 监听上传进度
xhr.upload.addEventListener("progress", function (event) {
@@ -103,14 +112,6 @@ class Request {
}
});
// 请求完成
// xhr.addEventListener("load", function () {
// if (xhr.status >= 200 && xhr.status < 300) {
// const response = JSON.parse(xhr.responseText);
// resolve(xhr);
// }
// });
// 请求完成处理
xhr.addEventListener("load", () => {
if (xhr.status >= 200 && xhr.status < 300) {
@@ -142,16 +143,15 @@ class Request {
// 请求错误
xhr.addEventListener("error", function () {
console.error("网络错误");
if (onError) onError(new Error("网络错误"));
reject(new Error("网络错误"));
});
// 请求中止
xhr.addEventListener("abort", function () {
console.log("上传已取消");
if (onError) onError(new Error("上传已取消"));
reject(new Error("上传已取消"));
});
xhr.open("POST", url);
xhr.send(config.body);
return xhr; // 返回 xhr 对象以便后续控制
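A minimal sketch of cancelling an in-flight request through the AbortSignal wiring above; the wrapper instance name, URL, and progress-callback signature are assumptions:

const controller = new AbortController();
request
  .createXHRWithProgress(
    "/api/data-management/datasets/upload/chunk", // illustrative URL
    { method: "POST", body: formData, signal: controller.signal },
    (event) => console.log(`uploaded ${event.loaded}/${event.total}`)
  )
  .catch((err) => console.log(err.message)); // "上传已取消" after abort
setTimeout(() => controller.abort(), 5000);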

View File

@@ -66,7 +66,7 @@ class Settings(BaseSettings):
datamate_backend_base_url: str = "http://datamate-backend:8080/api"
# 标注编辑器(Label Studio Editor)相关
editor_max_text_bytes: int = 2 * 1024 * 1024 # 2MB,避免一次加载超大文本卡死前端
editor_max_text_bytes: int = 0 # <=0 表示不限制,正数为最大字节数
# 全局设置实例
settings = Settings()

View File

@@ -3,7 +3,7 @@ import math
import uuid
from fastapi import APIRouter, Body, Depends, HTTPException, Query, Path
from sqlalchemy import select
from sqlalchemy import select, update
from sqlalchemy.ext.asyncio import AsyncSession
from app.db.session import get_db
@@ -17,6 +17,7 @@ from ..service.template import AnnotationTemplateService
from ..schema import (
DatasetMappingCreateRequest,
DatasetMappingCreateResponse,
DatasetMappingUpdateRequest,
DeleteDatasetResponse,
DatasetMappingResponse,
)
@@ -28,6 +29,7 @@ router = APIRouter(
logger = get_logger(__name__)
TEXT_DATASET_TYPE = "TEXT"
SOURCE_DOCUMENT_FILE_TYPES = {"pdf", "doc", "docx", "xls", "xlsx"}
LABELING_TYPE_CONFIG_KEY = "labeling_type"
@router.get("/{mapping_id}/login")
async def login_label_studio(
@@ -81,6 +83,7 @@ async def create_mapping(
# 如果提供了模板ID,获取模板配置
label_config = None
template_labeling_type = None
if request.template_id:
logger.info(f"Using template: {request.template_id}")
template = await template_service.get_template(db, request.template_id)
@@ -90,6 +93,7 @@ async def create_mapping(
detail=f"Template not found: {request.template_id}"
)
label_config = template.label_config
template_labeling_type = getattr(template, "labeling_type", None)
logger.debug(f"Template label config loaded for template: {template.name}")
# 如果直接提供了 label_config (自定义或修改后的),则覆盖模板配置
@@ -108,6 +112,8 @@ async def create_mapping(
project_configuration["description"] = project_description
if dataset_type == TEXT_DATASET_TYPE and request.segmentation_enabled is not None:
project_configuration["segmentation_enabled"] = bool(request.segmentation_enabled)
if template_labeling_type:
project_configuration[LABELING_TYPE_CONFIG_KEY] = template_labeling_type
labeling_project = LabelingProject(
id=str(uuid.uuid4()), # Generate UUID here
@@ -144,6 +150,18 @@ async def create_mapping(
labeling_project, snapshot_file_ids
)
# 如果启用了分段且为文本数据集,预生成切片结构
if dataset_type == TEXT_DATASET_TYPE and request.segmentation_enabled:
try:
from ..service.editor import AnnotationEditorService
editor_service = AnnotationEditorService(db)
# 预计算切片(当前实现会同步等待切片完成)
segmentation_result = await editor_service.precompute_segmentation_for_project(labeling_project.id)
logger.info(f"Precomputed segmentation for project {labeling_project.id}: {segmentation_result}")
except Exception as e:
logger.warning(f"Failed to precompute segmentation for project {labeling_project.id}: {e}")
# 不影响项目创建,只记录警告
response_data = DatasetMappingCreateResponse(
id=mapping.id,
labeling_project_id=str(mapping.labeling_project_id),
@@ -382,3 +400,116 @@ async def delete_mapping(
except Exception as e:
logger.error(f"Error deleting mapping: {e}")
raise HTTPException(status_code=500, detail="Internal server error")
@router.put("/{project_id}", response_model=StandardResponse[DatasetMappingResponse])
async def update_mapping(
project_id: str = Path(..., description="映射UUID(path param)"),
request: DatasetMappingUpdateRequest = Body(...),
db: AsyncSession = Depends(get_db)
):
"""
更新标注项目信息
通过 path 参数 `project_id` 指定要更新的映射(映射的 UUID)。
支持更新的字段:
- name: 标注项目名称
- description: 标注项目描述
- template_id: 标注模板ID
- label_config: Label Studio XML配置
"""
try:
logger.info(f"Update mapping request received: project_id={project_id!r}")
service = DatasetMappingService(db)
# 直接查询 ORM 模型获取原始数据
result = await db.execute(
select(LabelingProject).where(
LabelingProject.id == project_id,
LabelingProject.deleted_at.is_(None)
)
)
mapping_orm = result.scalar_one_or_none()
if not mapping_orm:
raise HTTPException(
status_code=404,
detail=f"Mapping not found: {project_id}"
)
# 构建更新数据
update_values = {}
if request.name is not None:
update_values["name"] = request.name
# 从 configuration 字段中读取和更新 description 和 label_config
configuration = {}
if mapping_orm.configuration:
configuration = mapping_orm.configuration.copy() if isinstance(mapping_orm.configuration, dict) else {}
if request.description is not None:
configuration["description"] = request.description
if request.label_config is not None:
configuration["label_config"] = request.label_config
if configuration:
update_values["configuration"] = configuration
if request.template_id is not None:
update_values["template_id"] = request.template_id
template_labeling_type = None
if request.template_id:
template_service = AnnotationTemplateService()
template = await template_service.get_template(db, request.template_id)
if not template:
raise HTTPException(
status_code=404,
detail=f"Template not found: {request.template_id}"
)
template_labeling_type = getattr(template, "labeling_type", None)
if template_labeling_type:
configuration[LABELING_TYPE_CONFIG_KEY] = template_labeling_type
update_values["configuration"] = configuration
if not update_values:
# 没有要更新的字段,直接返回当前数据
response_data = await service.get_mapping_by_uuid(project_id)
return StandardResponse(
code=200,
message="success",
data=response_data
)
# 执行更新
from datetime import datetime
update_values["updated_at"] = datetime.now()
result = await db.execute(
update(LabelingProject)
.where(LabelingProject.id == project_id)
.values(**update_values)
)
await db.commit()
if result.rowcount == 0:
raise HTTPException(
status_code=500,
detail="Failed to update mapping"
)
# 重新获取更新后的数据
updated_mapping = await service.get_mapping_by_uuid(project_id)
logger.info(f"Successfully updated mapping: {project_id}")
return StandardResponse(
code=200,
message="success",
data=updated_mapping
)
except HTTPException:
raise
except Exception as e:
logger.error(f"Error updating mapping: {e}")
raise HTTPException(status_code=500, detail="Internal server error")

View File

@@ -39,9 +39,22 @@ class DatasetMappingCreateResponse(BaseResponseModel):
labeling_project_id: str = Field(..., description="Label Studio项目ID")
labeling_project_name: str = Field(..., description="Label Studio项目名称")
class DatasetMappingUpdateRequest(BaseResponseModel):
"""数据集映射 更新 请求模型"""
dataset_id: Optional[str] = Field(None, description="源数据集ID")
class DatasetMappingUpdateRequest(BaseModel):
"""数据集映射 更新 请求模型
支持更新的字段:
- name: 标注项目名称
- description: 标注项目描述
- template_id: 标注模板ID
- label_config: Label Studio XML配置
"""
name: Optional[str] = Field(None, alias="name", description="标注项目名称")
description: Optional[str] = Field(None, alias="description", description="标注项目描述")
template_id: Optional[str] = Field(None, alias="templateId", description="标注模板ID")
label_config: Optional[str] = Field(None, alias="labelConfig", description="Label Studio XML配置")
class Config:
validate_by_name = True
class DatasetMappingResponse(BaseModel):
"""数据集映射 查询 响应模型"""
@@ -52,6 +65,7 @@ class DatasetMappingResponse(BaseModel):
name: Optional[str] = Field(None, description="标注项目名称")
description: Optional[str] = Field(None, description="标注项目描述")
template_id: Optional[str] = Field(None, alias="templateId", description="关联的模板ID")
labeling_type: Optional[str] = Field(None, alias="labelingType", description="标注类型")
template: Optional['AnnotationTemplateResponse'] = Field(None, description="关联的标注模板详情")
label_config: Optional[str] = Field(None, alias="labelConfig", description="实际使用的 Label Studio XML 配置")
segmentation_enabled: Optional[bool] = Field(

View File

@@ -1185,3 +1185,195 @@ class AnnotationEditorService:
except Exception as exc:
logger.warning("标注同步知识管理失败:%s", exc)
async def precompute_segmentation_for_project(
self,
project_id: str,
max_retries: int = 3
) -> Dict[str, Any]:
"""
为指定项目的所有文本文件预计算切片结构并持久化到数据库
Args:
project_id: 标注项目ID
max_retries: 失败重试次数
Returns:
统计信息:{total_files, succeeded, failed}
"""
project = await self._get_project_or_404(project_id)
dataset_type = self._normalize_dataset_type(await self._get_dataset_type(project.dataset_id))
# 只处理文本数据集
if dataset_type != DATASET_TYPE_TEXT:
logger.info(f"项目 {project_id} 不是文本数据集,跳过切片预生成")
return {"total_files": 0, "succeeded": 0, "failed": 0}
# 检查是否启用分段
if not self._resolve_segmentation_enabled(project):
logger.info(f"项目 {project_id} 未启用分段,跳过切片预生成")
return {"total_files": 0, "succeeded": 0, "failed": 0}
# 获取项目的所有文本文件(排除源文档)
files_result = await self.db.execute(
select(DatasetFiles)
.join(LabelingProjectFile, LabelingProjectFile.file_id == DatasetFiles.id)
.where(
LabelingProjectFile.project_id == project_id,
DatasetFiles.dataset_id == project.dataset_id,
)
)
file_records = files_result.scalars().all()
if not file_records:
logger.info(f"项目 {project_id} 没有文件,跳过切片预生成")
return {"total_files": 0, "succeeded": 0, "failed": 0}
# 过滤源文档文件
valid_files = []
for file_record in file_records:
file_type = str(getattr(file_record, "file_type", "") or "").lower()
file_name = str(getattr(file_record, "file_name", "")).lower()
is_source_document = (
file_type in SOURCE_DOCUMENT_TYPES or
any(file_name.endswith(ext) for ext in SOURCE_DOCUMENT_EXTENSIONS)
)
if not is_source_document:
valid_files.append(file_record)
total_files = len(valid_files)
succeeded = 0
failed = 0
label_config = await self._resolve_project_label_config(project)
primary_text_key = self._resolve_primary_text_key(label_config)
for file_record in valid_files:
file_id = str(file_record.id) # type: ignore
file_name = str(getattr(file_record, "file_name", ""))
for retry in range(max_retries):
try:
# 读取文本内容
text_content = await self._fetch_text_content_via_download_api(project.dataset_id, file_id)
if not isinstance(text_content, str):
logger.warning(f"文件 {file_id} 内容不是字符串,跳过切片")
failed += 1
break
# 解析文本记录
records: List[Tuple[Optional[Dict[str, Any]], str]] = []
if file_name.lower().endswith(JSONL_EXTENSION):
records = self._parse_jsonl_records(text_content)
else:
parsed_payload = self._try_parse_json_payload(text_content)
if parsed_payload:
records = [(parsed_payload, text_content)]
if not records:
records = [(None, text_content)]
record_texts = [
self._resolve_primary_text_value(payload, raw_text, primary_text_key)
for payload, raw_text in records
]
if not record_texts:
record_texts = [text_content]
# 判断是否需要分段
needs_segmentation = len(records) > 1 or any(
len(text or "") > self.SEGMENT_THRESHOLD for text in record_texts
)
if not needs_segmentation:
# 不需要分段的文件,跳过
succeeded += 1
break
# 执行切片
splitter = AnnotationTextSplitter(max_chars=self.SEGMENT_THRESHOLD)
segment_cursor = 0
segments = {}
for record_index, ((payload, raw_text), record_text) in enumerate(zip(records, record_texts)):
normalized_text = record_text or ""
if len(normalized_text) > self.SEGMENT_THRESHOLD:
raw_segments = splitter.split(normalized_text)
for chunk_index, seg in enumerate(raw_segments):
segments[str(segment_cursor)] = {
SEGMENT_RESULT_KEY: [],
SEGMENT_CREATED_AT_KEY: datetime.utcnow().isoformat() + "Z",
SEGMENT_UPDATED_AT_KEY: datetime.utcnow().isoformat() + "Z",
}
segment_cursor += 1
else:
segments[str(segment_cursor)] = {
SEGMENT_RESULT_KEY: [],
SEGMENT_CREATED_AT_KEY: datetime.utcnow().isoformat() + "Z",
SEGMENT_UPDATED_AT_KEY: datetime.utcnow().isoformat() + "Z",
}
segment_cursor += 1
if not segments:
succeeded += 1
break
# 构造分段标注结构
final_payload = {
SEGMENTED_KEY: True,
"version": 1,
SEGMENTS_KEY: segments,
SEGMENT_TOTAL_KEY: segment_cursor,
}
# 检查是否已存在标注
existing_result = await self.db.execute(
select(AnnotationResult).where(
AnnotationResult.project_id == project_id,
AnnotationResult.file_id == file_id,
)
)
existing = existing_result.scalar_one_or_none()
now = datetime.utcnow()
if existing:
# 更新现有标注
existing.annotation = final_payload # type: ignore[assignment]
existing.annotation_status = ANNOTATION_STATUS_IN_PROGRESS # type: ignore[assignment]
existing.updated_at = now # type: ignore[assignment]
else:
# 创建新标注记录
record = AnnotationResult(
id=str(uuid.uuid4()),
project_id=project_id,
file_id=file_id,
annotation=final_payload,
annotation_status=ANNOTATION_STATUS_IN_PROGRESS,
created_at=now,
updated_at=now,
)
self.db.add(record)
await self.db.commit()
succeeded += 1
logger.info(f"成功为文件 {file_id} 预生成 {segment_cursor} 个切片")
break
except Exception as e:
logger.warning(
f"为文件 {file_id} 预生成切片失败 (重试 {retry + 1}/{max_retries}): {e}"
)
if retry == max_retries - 1:
failed += 1
await self.db.rollback()
logger.info(
f"项目 {project_id} 切片预生成完成: 总计 {total_files}, 成功 {succeeded}, 失败 {failed}"
)
return {
"total_files": total_files,
"succeeded": succeeded,
"failed": failed,
}

View File

@@ -11,7 +11,6 @@ from sqlalchemy.ext.asyncio import AsyncSession
from app.core.config import settings
from app.core.logging import get_logger
from app.db.models import Dataset, DatasetFiles, LabelingProject
from app.module.annotation.service.text_fetcher import fetch_text_content_via_download_api
logger = get_logger(__name__)
@@ -77,15 +76,18 @@ class KnowledgeSyncService:
         if set_id:
             exists = await self._get_knowledge_set(set_id)
-            if exists:
+            if exists and self._metadata_matches_project(exists.get("metadata"), project.id):
                 return set_id
-            logger.warning("知识集不存在,准备重建:set_id=%s", set_id)
+            logger.warning(
+                "知识集不存在或归属不匹配,准备重建:set_id=%s project_id=%s",
+                set_id,
+                project.id,
+            )
 
-        dataset_name = project.name or "annotation-project"
-        base_name = dataset_name.strip() or "annotation-project"
+        project_name = (project.name or "annotation-project").strip() or "annotation-project"
         metadata = self._build_set_metadata(project)
-        existing = await self._find_knowledge_set_by_name(base_name)
+        existing = await self._find_knowledge_set_by_name_and_project(project_name, project.id)
         if existing:
             await self._update_project_config(
                 project,
@@ -96,19 +98,19 @@ class KnowledgeSyncService:
             )
             return existing.get("id")
 
-        created = await self._create_knowledge_set(base_name, metadata)
+        created = await self._create_knowledge_set(project_name, metadata)
         if not created:
-            created = await self._find_knowledge_set_by_name(base_name)
+            created = await self._find_knowledge_set_by_name_and_project(project_name, project.id)
         if not created:
-            fallback_name = self._build_fallback_set_name(base_name, project.id)
-            existing = await self._find_knowledge_set_by_name(fallback_name)
+            fallback_name = self._build_fallback_set_name(project_name, project.id)
+            existing = await self._find_knowledge_set_by_name_and_project(fallback_name, project.id)
             if existing:
                 created = existing
             else:
                 created = await self._create_knowledge_set(fallback_name, metadata)
                 if not created:
-                    created = await self._find_knowledge_set_by_name(fallback_name)
+                    created = await self._find_knowledge_set_by_name_and_project(fallback_name, project.id)
         if not created:
             return None
@@ -153,16 +155,18 @@ class KnowledgeSyncService:
             return []
         return [item for item in content if isinstance(item, dict)]
 
-    async def _find_knowledge_set_by_name(self, name: str) -> Optional[Dict[str, Any]]:
+    async def _find_knowledge_set_by_name_and_project(self, name: str, project_id: str) -> Optional[Dict[str, Any]]:
         if not name:
             return None
         items = await self._list_knowledge_sets(name)
         if not items:
             return None
-        exact_matches = [item for item in items if item.get("name") == name]
-        if not exact_matches:
-            return None
-        return exact_matches[0]
+        for item in items:
+            if item.get("name") != name:
+                continue
+            if self._metadata_matches_project(item.get("metadata"), project_id):
+                return item
+        return None
 
     async def _create_knowledge_set(self, name: str, metadata: str) -> Optional[Dict[str, Any]]:
         payload = {
@@ -249,16 +253,6 @@ class KnowledgeSyncService:
             content_type = "MARKDOWN"
 
         content = annotation_json
-        if dataset_type == "TEXT":
-            try:
-                content = await fetch_text_content_via_download_api(
-                    project.dataset_id,
-                    str(file_record.id),
-                )
-                content = self._append_annotation_to_content(content, annotation_json, content_type)
-            except Exception as exc:
-                logger.warning("读取文本失败,改为仅存标注JSON:%s", exc)
-                content = annotation_json
 
         payload: Dict[str, Any] = {
             "title": title,
@@ -289,13 +283,6 @@ class KnowledgeSyncService:
             extension = file_type
         return extension.lower() in {"md", "markdown"}
 
-    def _append_annotation_to_content(self, content: str, annotation_json: str, content_type: str) -> str:
-        if content_type == "MARKDOWN":
-            return (
-                f"{content}\n\n---\n\n## 标注结果\n\n```json\n"
-                f"{annotation_json}\n```")
-        return f"{content}\n\n---\n\n标注结果(JSON):\n{annotation_json}"
 
     def _strip_extension(self, file_name: str) -> str:
         if not file_name:
             return ""
@@ -359,6 +346,27 @@ class KnowledgeSyncService:
         except Exception:
             return json.dumps({"error": "failed to serialize"}, ensure_ascii=False)
 
+    def _metadata_matches_project(self, metadata: Any, project_id: str) -> bool:
+        if not project_id:
+            return False
+        parsed = self._parse_metadata(metadata)
+        if not parsed:
+            return False
+        return str(parsed.get("project_id") or "").strip() == project_id
+
+    def _parse_metadata(self, metadata: Any) -> Optional[Dict[str, Any]]:
+        if metadata is None:
+            return None
+        if isinstance(metadata, dict):
+            return metadata
+        if isinstance(metadata, str):
+            try:
+                payload = json.loads(metadata)
+            except Exception:
+                return None
+            return payload if isinstance(payload, dict) else None
+        return None
+
     def _safe_response_text(self, response: httpx.Response) -> str:
         try:
             return response.text
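
The metadata check is what scopes knowledge-set reuse to a single project. A standalone restatement of the rule from the diff above, so the accepted metadata shapes are easy to see:

import json
from typing import Any

def metadata_matches_project(metadata: Any, project_id: str) -> bool:
    # Restatement of _metadata_matches_project/_parse_metadata from the diff:
    # accept dict metadata or a JSON-object string, then match on project_id.
    if not project_id:
        return False
    if isinstance(metadata, str):
        try:
            metadata = json.loads(metadata)
        except Exception:
            return False
    if not isinstance(metadata, dict):
        return False
    return str(metadata.get("project_id") or "").strip() == project_id

assert metadata_matches_project({"project_id": "p1"}, "p1")
assert metadata_matches_project('{"project_id": "p1"}', "p1")
assert not metadata_matches_project({"name": "annotation-project"}, "p1")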

View File

@@ -7,7 +7,7 @@ from datetime import datetime
 import uuid
 
 from app.core.logging import get_logger
-from app.db.models import LabelingProject, AnnotationTemplate, AnnotationResult, LabelingProjectFile
+from app.db.models import LabelingProject, AnnotationResult, LabelingProjectFile
 from app.db.models.annotation_management import ANNOTATION_STATUS_IN_PROGRESS
 from app.db.models.dataset_management import Dataset, DatasetFiles
 from app.module.annotation.schema import (
@@ -18,6 +18,7 @@ from app.module.annotation.schema import (
 )
 
 logger = get_logger(__name__)
 
+LABELING_TYPE_CONFIG_KEY = "labeling_type"
 
 class DatasetMappingService:
     """Dataset mapping service."""
@@ -106,10 +107,12 @@ class DatasetMappingService:
         label_config = None
         description = None
         segmentation_enabled = None
+        labeling_type = None
         if isinstance(configuration, dict):
             label_config = configuration.get('label_config')
             description = configuration.get('description')
             segmentation_enabled = configuration.get('segmentation_enabled')
+            labeling_type = configuration.get(LABELING_TYPE_CONFIG_KEY)
 
         # Optionally fetch full template details
         template_response = None
@@ -119,6 +122,9 @@ class DatasetMappingService:
             template_response = await template_service.get_template(self.db, template_id)
             logger.debug(f"Included template details for template_id: {template_id}")
 
+        if not labeling_type and template_response:
+            labeling_type = getattr(template_response, "labeling_type", None)
+
         # Fetch project statistics
         total_count, annotated_count, in_progress_count = await self._get_project_stats(
             mapping.id, mapping.dataset_id
@@ -132,6 +138,7 @@ class DatasetMappingService:
             "name": mapping.name,
             "description": description,
             "template_id": template_id,
+            "labeling_type": labeling_type,
             "template": template_response,
             "label_config": label_config,
             "segmentation_enabled": segmentation_enabled,
@@ -174,10 +181,12 @@ class DatasetMappingService:
         label_config = None
         description = None
         segmentation_enabled = None
+        labeling_type = None
         if isinstance(configuration, dict):
             label_config = configuration.get('label_config')
             description = configuration.get('description')
             segmentation_enabled = configuration.get('segmentation_enabled')
+            labeling_type = configuration.get(LABELING_TYPE_CONFIG_KEY)
 
         # Optionally fetch full template details
         template_response = None
@@ -187,6 +196,9 @@ class DatasetMappingService:
             template_response = await template_service.get_template(self.db, template_id)
             logger.debug(f"Included template details for template_id: {template_id}")
 
+        if not labeling_type and template_response:
+            labeling_type = getattr(template_response, "labeling_type", None)
+
         # Fetch project statistics
         total_count, annotated_count, in_progress_count = 0, 0, 0
         if dataset_id:
@@ -203,6 +215,7 @@ class DatasetMappingService:
             "name": mapping.name,
             "description": description,
             "template_id": template_id,
+            "labeling_type": labeling_type,
             "template": template_response,
             "label_config": label_config,
             "segmentation_enabled": segmentation_enabled,

View File

@@ -19,23 +19,24 @@ async def fetch_text_content_via_download_api(dataset_id: str, file_id: str) ->
         resp = await client.get(url)
         resp.raise_for_status()
 
+    max_bytes = settings.editor_max_text_bytes
     content_length = resp.headers.get("content-length")
-    if content_length:
+    if max_bytes > 0 and content_length:
         try:
-            if int(content_length) > settings.editor_max_text_bytes:
+            if int(content_length) > max_bytes:
                 raise HTTPException(
                     status_code=413,
-                    detail=f"文本文件过大,限制 {settings.editor_max_text_bytes} 字节",
+                    detail=f"文本文件过大,限制 {max_bytes} 字节",
                 )
         except ValueError:
             # Ignore an invalid content-length and fall back to checking the actual length
             pass
 
     data = resp.content
-    if len(data) > settings.editor_max_text_bytes:
+    if max_bytes > 0 and len(data) > max_bytes:
         raise HTTPException(
             status_code=413,
-            detail=f"文本文件过大,限制 {settings.editor_max_text_bytes} 字节",
+            detail=f"文本文件过大,限制 {max_bytes} 字节",
         )
 
     # TEXT POC: decode as UTF-8 by default; undecodable characters are replaced
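
The net effect of this change: a non-positive editor_max_text_bytes now disables the size limit entirely, where before it would have rejected every non-empty file. A standalone sketch of the guard:

# max_bytes <= 0 means "no limit"; otherwise reject anything over the limit.
def exceeds_limit(size: int, max_bytes: int) -> bool:
    return max_bytes > 0 and size > max_bytes

assert exceeds_limit(10_000_000, 1_048_576)  # over a 1 MiB limit: rejected
assert not exceeds_limit(10_000_000, 0)      # limit disabled: accepted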

View File

@@ -158,6 +158,7 @@ CREATE TABLE IF NOT EXISTS t_dm_knowledge_items (
     sensitivity VARCHAR(50) COMMENT '敏感级别',
     source_dataset_id VARCHAR(36) COMMENT '来源数据集ID',
     source_file_id VARCHAR(36) COMMENT '来源文件ID',
+    relative_path VARCHAR(1000) COMMENT '条目相对路径',
     tags JSON COMMENT '标签列表',
     metadata JSON COMMENT '扩展元数据',
     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
@@ -177,6 +178,20 @@ CREATE TABLE IF NOT EXISTS t_dm_knowledge_items (
     INDEX idx_dm_ki_source_file (source_file_id)
 ) COMMENT='知识条目表(UUID 主键)';
 
+-- Knowledge item directory table
+CREATE TABLE IF NOT EXISTS t_dm_knowledge_item_directories (
+    id VARCHAR(36) PRIMARY KEY COMMENT 'UUID',
+    set_id VARCHAR(36) NOT NULL COMMENT '所属知识集ID(UUID)',
+    name VARCHAR(255) NOT NULL COMMENT '目录名称',
+    relative_path VARCHAR(1000) NOT NULL COMMENT '目录相对路径',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
+    created_by VARCHAR(255) COMMENT '创建者',
+    updated_by VARCHAR(255) COMMENT '更新者',
+    FOREIGN KEY (set_id) REFERENCES t_dm_knowledge_sets(id) ON DELETE CASCADE,
+    INDEX idx_dm_kd_set_id (set_id)
+) COMMENT='知识条目目录表(UUID 主键)';
+
 -- ===========================================
 -- Non-data-management tables (e.g. users, t_data_sources) remain unchanged
 -- ===========================================
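
One plausible use of the new relative_path column, sketched in Python: grouping knowledge items under their parent directory. The path values are invented examples, and the column semantics are inferred from the DDL comments above:

# Group items by the directory part of relative_path (inferred semantics).
from collections import defaultdict
from pathlib import PurePosixPath

items = [
    {"id": "i1", "relative_path": "guides/setup.md"},
    {"id": "i2", "relative_path": "guides/usage.md"},
    {"id": "i3", "relative_path": "faq.md"},
]

by_directory: dict[str, list[str]] = defaultdict(list)
for item in items:
    parent = str(PurePosixPath(item["relative_path"]).parent)
    by_directory[parent].append(item["id"])

print(dict(by_directory))  # {'guides': ['i1', 'i2'], '.': ['i3']}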

View File

@@ -67,7 +67,7 @@ RUN if [ -f /etc/apt/sources.list.d/debian.sources ]; then \
         sed -i 's/deb.debian.org/mirrors.aliyun.com/g; s/archive.ubuntu.com/mirrors.aliyun.com/g; s/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list; \
     fi && \
     apt-get update && \
-    apt-get install -y vim wget curl rsync python3 python3-pip python-is-python3 dos2unix && \
+    apt-get install -y vim wget curl rsync python3 python3-pip python-is-python3 dos2unix libreoffice fonts-noto-cjk && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*

View File

@@ -45,7 +45,7 @@ RUN npm config set registry https://registry.npmmirror.com && \
 
 ##### RUNNER
-FROM gcr.io/distroless/nodejs20-debian12 AS runner
+FROM gcr.nju.edu.cn/distroless/nodejs20-debian12 AS runner
 
 WORKDIR /app
 ENV NODE_ENV=production

View File

@@ -0,0 +1,93 @@
# backend-python Dockerfile, offline variant
# Change: use local DataX sources instead of git clone
FROM maven:3-eclipse-temurin-8 AS datax-builder
# Configure the Aliyun Maven mirror
RUN mkdir -p /root/.m2 && \
echo '<?xml version="1.0" encoding="UTF-8"?>\n\
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"\n\
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n\
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">\n\
<mirrors>\n\
<mirror>\n\
<id>aliyunmaven</id>\n\
<mirrorOf>*</mirrorOf>\n\
<name>阿里云公共仓库</name>\n\
<url>https://maven.aliyun.com/repository/public</url>\n\
</mirror>\n\
</mirrors>\n\
</settings>' > /root/.m2/settings.xml
# Offline mode: take the local DataX path from a build argument
ARG DATAX_LOCAL_PATH=./build-cache/resources/DataX
# Copy local DataX sources (pre-downloaded for offline environments)
COPY ${DATAX_LOCAL_PATH} /DataX
COPY runtime/datax/ DataX/
RUN cd DataX && \
sed -i "s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g" \
plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java && \
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
FROM python:3.12-slim
# Configure the Aliyun apt mirror
RUN if [ -f /etc/apt/sources.list.d/debian.sources ]; then \
sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list.d/debian.sources; \
elif [ -f /etc/apt/sources.list ]; then \
sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list; \
fi && \
apt-get update && \
apt-get install -y --no-install-recommends vim openjdk-21-jre nfs-common glusterfs-client rsync && \
rm -rf /var/lib/apt/lists/*
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
POETRY_VERSION=2.2.1 \
POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_CREATE=false \
POETRY_CACHE_DIR=/tmp/poetry_cache
ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk
ENV PATH="/root/.local/bin:$JAVA_HOME/bin:$PATH"
WORKDIR /app
# Configure the Aliyun pip mirror and install Poetry
RUN --mount=type=cache,target=/root/.cache/pip \
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
pip config set global.trusted-host mirrors.aliyun.com && \
pip install --upgrade --root-user-action=ignore pip \
&& pip install --root-user-action=ignore pipx \
&& pipx install "poetry==$POETRY_VERSION"
COPY --from=datax-builder /DataX/target/datax/datax /opt/datax
RUN cp /opt/datax/plugin/reader/mysqlreader/libs/mysql* /opt/datax/plugin/reader/starrocksreader/libs/
# Copy only dependency files first
COPY runtime/datamate-python/pyproject.toml runtime/datamate-python/poetry.lock* /app/
# Install dependencies
RUN --mount=type=cache,target=$POETRY_CACHE_DIR \
poetry install --no-root --only main
# Offline mode: use local NLTK data
ARG NLTK_DATA_LOCAL_PATH=./build-cache/resources/nltk_data
COPY ${NLTK_DATA_LOCAL_PATH} /usr/local/nltk_data
ENV NLTK_DATA=/usr/local/nltk_data
# Copy the rest of the application
COPY runtime/datamate-python /app
COPY runtime/datamate-python/deploy/docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh || true
# Expose the application port
EXPOSE 18000
ENTRYPOINT ["/docker-entrypoint.sh"]

View File

@@ -0,0 +1,82 @@
# backend-python Dockerfile, offline variant v2
FROM maven:3-eclipse-temurin-8 AS datax-builder
# Configure the Aliyun Maven mirror
RUN mkdir -p /root/.m2 && \
echo '<?xml version="1.0" encoding="UTF-8"?>\n\
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"\n\
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n\
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">\n\
<mirrors>\n\
<mirror>\n\
<id>aliyunmaven</id>\n\
<mirrorOf>*</mirrorOf>\n\
<name>阿里云公共仓库</name>\n\
<url>https://maven.aliyun.com/repository/public</url>\n\
</mirror>\n\
</mirrors>\n\
</settings>' > /root/.m2/settings.xml
# Offline mode: take the local DataX path from a build argument
ARG RESOURCES_DIR=./build-cache/resources
ARG DATAX_LOCAL_PATH=${RESOURCES_DIR}/DataX
# Copy local DataX sources
COPY ${DATAX_LOCAL_PATH} /DataX
COPY runtime/datax/ DataX/
RUN cd DataX && \
sed -i "s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g" \
plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java && \
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
# Use a base image with the required APT packages preinstalled
FROM datamate-python-base:latest
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
POETRY_VERSION=2.2.1 \
POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_CREATE=false \
POETRY_CACHE_DIR=/tmp/poetry_cache
ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk
ENV PATH="/root/.local/bin:$JAVA_HOME/bin:$PATH"
WORKDIR /app
# Configure the Aliyun pip mirror and install Poetry
RUN --mount=type=cache,target=/root/.cache/pip \
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
pip config set global.trusted-host mirrors.aliyun.com && \
pip install --upgrade --root-user-action=ignore pip \
&& pip install --root-user-action=ignore pipx \
&& pipx install "poetry==$POETRY_VERSION"
COPY --from=datax-builder /DataX/target/datax/datax /opt/datax
RUN cp /opt/datax/plugin/reader/mysqlreader/libs/mysql* /opt/datax/plugin/reader/starrocksreader/libs/
# Copy only dependency files first
COPY runtime/datamate-python/pyproject.toml runtime/datamate-python/poetry.lock* /app/
# Install dependencies
RUN --mount=type=cache,target=$POETRY_CACHE_DIR \
poetry install --no-root --only main
# Offline mode: use local NLTK data
ARG RESOURCES_DIR=./build-cache/resources
ARG NLTK_DATA_LOCAL_PATH=${RESOURCES_DIR}/nltk_data
COPY ${NLTK_DATA_LOCAL_PATH} /usr/local/nltk_data
ENV NLTK_DATA=/usr/local/nltk_data
# Copy the rest of the application
COPY runtime/datamate-python /app
COPY runtime/datamate-python/deploy/docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh || true
EXPOSE 18000
ENTRYPOINT ["/docker-entrypoint.sh"]

View File

@@ -0,0 +1,71 @@
# backend Dockerfile, offline variant
# Use a base image with the required APT packages preinstalled
FROM maven:3-eclipse-temurin-21 AS builder
# Configure the Aliyun Maven mirror
RUN mkdir -p /root/.m2 && \
echo '<?xml version="1.0" encoding="UTF-8"?>\n\
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"\n\
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n\
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">\n\
<mirrors>\n\
<mirror>\n\
<id>aliyunmaven</id>\n\
<mirrorOf>*</mirrorOf>\n\
<name>阿里云公共仓库</name>\n\
<url>https://maven.aliyun.com/repository/public</url>\n\
</mirror>\n\
</mirrors>\n\
</settings>' > /root/.m2/settings.xml
WORKDIR /opt/backend
# Copy all pom.xml files first
COPY backend/pom.xml ./
COPY backend/services/pom.xml ./services/
COPY backend/shared/domain-common/pom.xml ./shared/domain-common/
COPY backend/shared/security-common/pom.xml ./shared/security-common/
COPY backend/services/data-annotation-service/pom.xml ./services/data-annotation-service/
COPY backend/services/data-cleaning-service/pom.xml ./services/data-cleaning-service/
COPY backend/services/data-evaluation-service/pom.xml ./services/data-evaluation-service/
COPY backend/services/data-management-service/pom.xml ./services/data-management-service/
COPY backend/services/data-synthesis-service/pom.xml ./services/data-synthesis-service/
COPY backend/services/execution-engine-service/pom.xml ./services/execution-engine-service/
COPY backend/services/main-application/pom.xml ./services/main-application/
COPY backend/services/operator-market-service/pom.xml ./services/operator-market-service/
COPY backend/services/pipeline-orchestration-service/pom.xml ./services/pipeline-orchestration-service/
COPY backend/services/rag-indexer-service/pom.xml ./services/rag-indexer-service/
COPY backend/services/rag-query-service/pom.xml ./services/rag-query-service/
# Download dependencies using a cache mount
RUN --mount=type=cache,target=/root/.m2/repository \
cd /opt/backend/services && \
mvn dependency:go-offline -Dmaven.test.skip=true || true
# Copy all source code
COPY backend/ /opt/backend
# Compile and package
RUN --mount=type=cache,target=/root/.m2/repository \
cd /opt/backend/services && \
mvn clean package -Dmaven.test.skip=true
# Use a base image with the required APT packages preinstalled
FROM datamate-java-base:latest
# No apt-get update here: the base image already has all required packages preinstalled
# Extra packages could be added here, but installation would fail in an offline environment
COPY --from=builder /opt/backend/services/main-application/target/datamate.jar /opt/backend/datamate.jar
COPY scripts/images/backend/start.sh /opt/backend/start.sh
COPY runtime/ops/examples/test_operator/test_operator.tar /opt/backend/test_operator.tar
RUN dos2unix /opt/backend/start.sh \
&& chmod +x /opt/backend/start.sh \
&& ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
EXPOSE 8080
ENTRYPOINT ["/opt/backend/start.sh"]
CMD ["java", "-Duser.timezone=Asia/Shanghai", "-jar", "/opt/backend/datamate.jar"]

Some files were not shown because too many files have changed in this diff.