Compare commits

...

110 Commits

Author SHA1 Message Date
f381d641ab fix(upload): fix filename handling in streaming uploads
- Pass the actual total file count to the pre-upload API instead of the hard-coded -1
- Remove the file-extension preservation logic from file splitting in the import configuration
- Delete the fileExtension parameter from the streaming upload options
- Remove the extension-handling code from the streaming upload implementation
- Simplify new-filename generation so it no longer appends an extension suffix
2026-02-04 07:47:41 +08:00
c8611d29ff feat(upload): implement streaming split uploads to improve large-file upload UX
Split and upload files as a stream, so large files are no longer loaded in one go, which used to freeze the frontend.

Changes:
1. file.util.ts - core streaming split-upload functionality
   - Add streamSplitAndUpload, which uploads while splitting
   - Add shouldStreamUpload to decide whether to use streaming upload
   - Add the StreamUploadOptions and StreamUploadResult interfaces
   - Tune the chunk size (default 5 MB)

2. ImportConfiguration.tsx - adaptive upload strategy
   - Large files (>5 MB) use streaming split upload
   - Small files (≤5 MB) keep the traditional split path
   - UI unchanged

3. useSliceUpload.tsx - streaming upload handling
   - Add handleStreamUpload to process streaming upload events
   - Support concurrent uploads and better progress management

4. TaskUpload.tsx - progress display improvements
   - Register the streaming upload event listeners
   - Show streaming upload info (lines uploaded, current file, etc.)

5. dataset.model.ts - type definition extensions
   - Add the StreamUploadInfo interface
   - Add the streamUploadInfo and prefix fields to the TaskItem interface

Implementation notes:
- Streaming reads: use Blob.slice to read chunk by chunk, avoiding a single full load
- Line-by-line detection: split on newlines and upload each line as soon as it is complete
- Memory friendly: the buffer holds only the current chunk and the unfinished line, never the accumulated split results
- Concurrency control: up to 3 concurrent uploads for better throughput
- Visible progress: show the uploaded line count and overall progress in real time
- Error handling: one file failing to upload does not affect the others
- Backward compatible: small files still use the original split path
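The buffering scheme this commit describes (read fixed-size chunks, emit complete lines, carry the unfinished tail forward) can be sketched language-agnostically. This is an illustrative Python generator, not the actual TypeScript implementation; `read_chunk` is a hypothetical callback standing in for `Blob.slice`:

```python
def iter_lines_chunked(read_chunk, chunk_size=5 * 1024 * 1024):
    """Yield complete lines from a source read in fixed-size chunks.

    Only the current chunk and the unfinished tail line are kept in
    memory, mirroring the buffer strategy described in the commit.
    """
    buffer = ""
    offset = 0
    while True:
        chunk = read_chunk(offset, chunk_size)
        if not chunk:
            break
        offset += len(chunk)
        buffer += chunk
        lines = buffer.split("\n")
        buffer = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line
    if buffer:
        yield buffer  # flush the final unterminated line


data = "a\nbb\nccc"
reader = lambda off, size: data[off:off + size]
print(list(iter_lines_chunked(reader, chunk_size=2)))  # ['a', 'bb', 'ccc']
```

Note that a line is only emitted once a newline (or end of input) proves it is complete, which is what lets the real implementation upload lines while the file is still being read.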

Benefits:
- Large-file uploads no longer freeze the page, a major UX improvement
- Memory usage drops significantly (from the whole file to just the current chunk)
- Upload throughput improves (split while uploading, several small files in flight concurrently)

Related files:
- frontend/src/utils/file.util.ts
- frontend/src/pages/DataManagement/Detail/components/ImportConfiguration.tsx
- frontend/src/hooks/useSliceUpload.tsx
- frontend/src/pages/Layout/TaskUpload.tsx
- frontend/src/pages/DataManagement/dataset.model.ts
2026-02-03 13:12:10 +00:00
147beb1ec7 feat(annotation): pre-generate text segments
Pre-generate the text segmentation structure when an annotation task is created, instead of recomputing it every time the annotation page is opened.

Changes:
1. Add a precompute_segmentation_for_project method to AnnotationEditorService
   - Precompute the segmentation structure for every text file in the project
   - Perform the split with AnnotationTextSplitter
   - Persist the segmentation structure to the AnnotationResult table (status IN_PROGRESS)
   - Support retries on failure
   - Return summary statistics

2. Modify the create_mapping endpoint
   - After the annotation task is created, automatically trigger segment pre-generation if segmentation is enabled and the dataset is textual
   - Wrap the call in try-except so a segmentation failure never blocks project creation

Notes:
- Reuses the existing AnnotationTextSplitter class
- The segment data structure matches the existing segmented-annotation format
- Backward compatible (tasks without precomputed segments still fall back to on-the-fly computation)
- Performance: avoids recomputation when entering the annotation page

Related files:
- runtime/datamate-python/app/module/annotation/service/editor.py
- runtime/datamate-python/app/module/annotation/interface/project.py
2026-02-03 12:59:29 +00:00
699031dae7 fix: fix compilation error when clearing a linked dataset during dataset editing
Analysis:
A previous attempt used the @TableField(updateStrategy = FieldStrategy.IGNORED/ALWAYS) annotation
to force null values to be written, but FieldStrategy.ALWAYS may not exist in the
MyBatis-Plus 3.5.14 version in use, which caused a compilation error.

Fix:
1. Remove the @TableField(updateStrategy) annotation from the parentDatasetId field in Dataset.java
2. Remove the now-unneeded import com.baomidou.mybatisplus.annotation.FieldStrategy
3. In DatasetApplicationService.updateDataset:
   - Add import com.baomidou.mybatisplus.core.conditions.update.LambdaUpdateWrapper
   - Keep the original parentDatasetId value for comparison
   - After handleParentChange, check whether parentDatasetId has changed
   - If it has, update the parentDatasetId field explicitly with a LambdaUpdateWrapper
   - This way the field is written to the database even when the new value is null

Why this works:
MyBatis-Plus's updateById only writes non-null fields by default.
Setting the field through LambdaUpdateWrapper's set method writes the value explicitly,
including null, so the column is reliably updated in the database.
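The update semantics the commit relies on can be sketched outside Java. The following Python toy mimics the behavior only: skipping null fields plays the role of `updateById`'s default strategy, and the `explicit` set plays the role of `LambdaUpdateWrapper.set()` (the field names and values are made up for illustration):

```python
def update_by_id(row, changes, *, explicit=None):
    """Mimic the MyBatis-Plus update semantics described above.

    Fields whose new value is None are skipped by default; fields in
    `explicit` are written even when the value is None.
    """
    explicit = explicit or set()
    for field, value in changes.items():
        if value is not None or field in explicit:
            row[field] = value
    return row


row = {"id": 1, "parentDatasetId": "p-42"}
# Plain updateById: None is silently ignored, the link survives.
update_by_id(row, {"parentDatasetId": None})
print(row["parentDatasetId"])  # p-42
# Explicit set via the wrapper: None reaches the database.
update_by_id(row, {"parentDatasetId": None}, explicit={"parentDatasetId"})
print(row["parentDatasetId"])  # None
```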
2026-02-03 11:09:15 +00:00
88b1383653 fix: restore sending the empty string from the frontend so the linked dataset can be cleared
Description:
Remove the earlier logic that converted the empty string to undefined;
the form value is now sent as-is, including the empty string.

Works together with the backend change (commit cc6415c):
1. When the user picks "no linked dataset", the frontend sends the empty string ""
2. The backend handleParentChange method converts the empty string to null via normalizeParentId
3. The Dataset.parentDatasetId field carries @TableField(updateStrategy = FieldStrategy.IGNORED)
4. This ensures the value is written to the database even when it is null
2026-02-03 10:57:14 +00:00
cc6415c4d9 fix: fix inability to clear a linked dataset when editing a dataset
Description:
When editing a dataset in data management, choosing "no linked dataset" and saving had no effect if a linked dataset had previously been set.

Root cause:
MyBatis-Plus's updateById uses the FieldStrategy.NOT_NULL strategy by default,
so a field is only written to the database when its value is non-null.
When parentDatasetId goes from a value to null, the change is silently dropped.

Fix:
Add @TableField(updateStrategy = FieldStrategy.IGNORED) to the parentDatasetId field in Dataset.java,
meaning the field is written to the database even when the value is null.

Together with the frontend change (sending the empty string again), the link can now be cleared correctly:
1. The frontend sends the empty string to mean "no linked dataset"
2. The backend handleParentChange converts the empty string to null via normalizeParentId
3. dataset.setParentDatasetId(null) sets the field to null
4. With the IGNORED strategy, the null value is still written to the database
2026-02-03 10:57:08 +00:00
3d036c4cd6 fix: fix inability to clear a linked dataset when editing a dataset
Description:
When editing a dataset in data management, choosing "no linked dataset" and saving had no effect if a linked dataset had previously been set.

Cause:
The conditional in the backend updateDataset method:
```java
if (updateDatasetRequest.getParentDatasetId() != null) {
    handleParentChange(dataset, updateDatasetRequest.getParentDatasetId());
}
```
When parentDatasetId is null or the empty string, the condition is false, handleParentChange is never executed, and the link cannot be cleared.

Fix:
Drop the conditional and always call handleParentChange. Internally, normalizeParentId maps both the empty string and null to null, so the method supports setting a new parent dataset as well as clearing the link.

Together with the frontend change (commit 2445235), which converts the empty string to undefined (deserialized as null by the backend), the clear operation now takes effect.
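The normalization step this commit leans on can be sketched as follows. This is an assumed reading of `normalizeParentId`'s semantics (the real method is in the Java service and may differ, e.g. in whitespace handling):

```python
def normalize_parent_id(raw):
    """Treat a missing, empty or blank parent ID the same as null, so
    'clear the link' and 'set a new parent' can share one code path."""
    if raw is None:
        return None
    raw = raw.strip()
    return raw or None


print(normalize_parent_id(""))      # None -> link is cleared
print(normalize_parent_id("ds-7"))  # 'ds-7' -> new parent is set
```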
2026-02-03 09:35:09 +00:00
2445235fd2 fix: fix clearing a linked dataset not taking effect when editing a dataset
Description:
When editing a dataset in data management, choosing "no linked dataset" and saving had no effect if a linked dataset had previously been set.

Cause:
- In BasicInformation.tsx, the "no linked dataset" option has the empty string "" as its value
- When the user chooses not to link a dataset, parentDatasetId is ""
- The backend API treated the empty string as an invalid value and ignored it, instead of recognizing a "clear the link" operation

Fix:
- In the handleSubmit function in EditDataset.tsx, convert an empty parentDatasetId string to undefined
- Use formValues.parentDatasetId || undefined so the empty string becomes undefined
- The backend API can then correctly recognize the request as clearing the linked dataset
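The `formValues.parentDatasetId || undefined` idiom collapses any falsy value to `undefined`. A Python analogue using `or` shows the same coercion (illustrative only; note that, like the JS idiom, it would also swallow other falsy values such as 0, which is harmless here because IDs are non-empty strings):

```python
def coerce_empty(value):
    # Analogue of `formValues.parentDatasetId || undefined`:
    # the empty string (and None) collapses to None, which the backend
    # deserializes as "no parent requested".
    return value or None


print(coerce_empty(""))      # None
print(coerce_empty("ds-7"))  # ds-7
```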
2026-02-03 09:23:13 +00:00
893e0a1580 fix: show the task center immediately when uploading files
Description:
When uploading files on the dataset detail page, the dialog closed on confirm, but the task center only appeared after file processing finished (especially with split-by-line enabled), which made for a poor experience.

Changes:
1. useSliceUpload.tsx: show the task center immediately in createTask, so it appears as soon as the task is created
2. ImportConfiguration.tsx: in handleImportData, fire the show:task-popover event to display the task center before the expensive file processing (e.g. file splitting) starts

Effect:
- Before: confirm → dialog closes → (wait for file processing) → task center appears
- After: confirm → dialog closes + task center appears immediately → file processing starts
2026-02-03 09:14:40 +00:00
05e6842fc8 refactor(DataManagement): remove unnecessary dataset type filtering
- Remove the dataset type filtering step
- Remove the unused textDatasetTypeOptions variable
- Simplify how data is passed to the BasicInformation component
- Reduce redundant code and improve component performance
2026-02-03 13:33:12 +08:00
da5b18e423 feat(scripts): add APT cache pre-installation to fix offline builds
- Add the APT cache directory and the related export-cache.sh build script
- Add build-base-images.sh for building base images with APT packages pre-installed
- Add build-offline-final.sh, the final offline build script
- Update Makefile.offline.mk with the new offline build targets
- Extend README.md with a detailed explanation of the APT cache solution
- Add offline Dockerfiles that use the pre-installed base images for several services
- Update the packaging script to include the APT cache in the final archive
2026-02-03 13:16:17 +08:00
31629ab50b docs(offline): update offline build docs with the classic build path and a troubleshooting guide
- Add the classic docker build flow as the recommended approach
- Add the offline environment diagnostic command make offline-diagnose
- Expand the troubleshooting chapter with solutions for several common issues
- Add a file inventory and a recommended workflow description
- Provide several workarounds for BuildKit builders failing to use local images
- Update the build command usage notes and important caveats
2026-02-03 13:10:28 +08:00
fb43052ddf feat(build): add a classic Docker build path and diagnostics
- Add --pull=false to build-offline.sh and improve its error handling
- Add --pull=false to each service build target in Makefile.offline.mk
- Add build-offline-classic.sh for a classic build without BuildKit
- Add build-offline-v2.sh with enhanced BuildKit offline build support
- Add diagnose.sh for inspecting the offline build environment
- Add offline-build-classic and offline-diagnose targets to the Makefile
2026-02-02 23:53:45 +08:00
c44c75be25 fix(login): fix login page styling issues
- Correct the CSS class name on the description text below the title, removing a stray space
- Update the style class on the footer copyright line
- Simplify the bottom description text for consistent branding
2026-02-02 22:49:46 +08:00
05f3efc148 build(docker): switch Docker image sources to the Nanjing University mirror
- Switch the base image in the frontend Dockerfile from gcr.io to gcr.nju.edu.cn
- Update the nodejs20-debian12 image source in the offline Dockerfile
- Change the base image list in export-cache.sh to the Nanjing University mirrors
- Update the image pull addresses in Makefile.offline.mk to the local mirror
- Tidy up the formatting and output of export-cache.sh
- Add warning handling to the cache export process
2026-02-02 22:48:41 +08:00
16eb5cacf9 feat(data-management): add extended metadata support for knowledge items
- Implement metadata field updates in KnowledgeItemApplicationService
- Add a metadata field to CreateKnowledgeItemRequest
- Add a metadata field to UpdateKnowledgeItemRequest
- Support storing extended metadata when knowledge items are created or updated
2026-02-02 22:20:05 +08:00
e71116d117 refactor(components): update tag component type definitions and data handling
- Make the id and color fields of the Tag interface optional
- Change the onAddTag callback parameter type from an object to a string
- Add useCallback in AddTagPopover to optimize data fetching
- Adjust tag deduplication to match on either id or name
- Update the DetailHeader component's data types and generic constraints
- Add a parseMetadata utility for parsing metadata
- Implement isAnnotationItem for detecting annotation-type items
- Improve tag handling and data type conversion on the knowledge set detail page
2026-02-02 22:15:16 +08:00
cac53d7aac fix(knowledge): rename the knowledge management page title to "Knowledge Sets"
- Change the page title from "Knowledge Management" to "Knowledge Sets"
2026-02-02 21:49:39 +08:00
43b4a619bc refactor(knowledge): remove the extended metadata field from knowledge set creation
- Delete the extended metadata input area from the form
- Remove the corresponding Form.Item wrapper
- Simplify the knowledge set creation form structure
2026-02-02 21:48:21 +08:00
9da187d2c6 feat(build): add offline build support
- Add build-offline.sh for building without network access
- Add offline Dockerfiles that use local resources instead of network downloads
- Add export-cache.sh to pre-download dependencies in a connected environment
- Integrate Makefile.offline.mk for convenient offline build commands
- Add detailed offline build documentation and a troubleshooting guide
- Package base images, BuildKit caches and external resources in one step
2026-02-02 21:44:44 +08:00
b36fdd2438 feat(annotation): add data-type filtering to the label config tree editor
- Introduce the DataType enum definition
- Dynamically filter object label options by data type
- Watch the data type in the template form
- Improve error handling for better type safety
- Pass the data type parameter into the config tree editor component
2026-02-02 20:37:38 +08:00
daa63bdd13 feat(knowledge): remove the sensitivity level feature from knowledge set management
- Comment out the sensitivity level selector in the knowledge set creation form
- Remove the sensitivity level item from the knowledge set detail page
- Comment out the related sensitivity level option constants
- Update the form layout to keep a consistent two-column grid
2026-02-02 19:06:03 +08:00
85433ac071 feat(template): remove template type and version fields and add admin-only controls
- Remove the type and version fields from the template detail page
- Remove the type and version columns from the template list page
- Add an admin permission check gated by a localStorage key
- Restrict the edit and delete action buttons to admins
- Restrict the create-template button to admins
2026-02-02 18:59:32 +08:00
fc2e50b415 Revert "refactor(template): remove the type, version and action columns from the template list"
This reverts commit a5261b33b2.
2026-02-02 18:39:52 +08:00
26e1ae69d7 Revert "refactor(template): remove the create button from the template list page"
This reverts commit b2bdf9e066.
2026-02-02 18:39:48 +08:00
7092c3f955 feat(annotation): adjust the text editor size limit configuration
- Change the editor_max_text_bytes default from 2 MB to 0, meaning unlimited
- Apply the size check in the text retrieval service only when max_bytes is greater than 0
- Adjust the byte limit shown in the error message
- Streamline the conditional handling of the configuration parameter
2026-02-02 17:53:09 +08:00
b2bdf9e066 refactor(template): remove the create button from the template list page
- Delete the create-template button component in the top-right corner
- Remove the related click handler invocation
- Adjust the page layout to account for the removed button
2026-02-02 16:35:09 +08:00
a5261b33b2 refactor(template): remove the type, version and action columns from the template list
- Remove the type column (built-in/custom tag display)
- Remove the version column
- Remove the action column (view, edit, delete buttons)
- Keep the creation time column and its rendering logic
2026-02-02 16:20:50 +08:00
root
52daf30869 a 2026-02-02 16:09:25 +08:00
07a901043a refactor(annotation): remove text content retrieval
- Delete the fetch_text_content_via_download_api import
- Remove the text content retrieval logic for TEXT datasets
- Delete the _append_annotation_to_content method implementation
- Simplify content handling in the knowledge sync service
2026-02-02 15:39:06 +08:00
32e3fc97c6 feat(annotation): harden the knowledge base sync service with project isolation
- Validate the project ID when looking up a knowledge base, ensuring correct ownership
- Include the project ID in log messages for easier debugging
- Refactor knowledge base lookup from name-only to name plus project ID
- Add _metadata_matches_project to verify project ownership in metadata
- Add _parse_metadata to safely parse the metadata JSON string
- Update the fallback naming logic to guarantee per-project uniqueness
- Verify with the project name and project ID consistently in all knowledge base operations
2026-02-02 15:28:33 +08:00
a73571bd73 feat(annotation): refine property population in the template config tree editor
- Populate the object config name with a default only when it is missing
- Add tag-category checks for control configs
- Distinguish the population strategy for annotation controls and layout controls
- Always populate required properties on annotation controls; populate layout controls only when needed
- Fix the property value assignment so the name property is referenced correctly
2026-02-02 15:26:25 +08:00
00fa1b86eb refactor(DataAnnotation): remove unused state and streamline the selector logic
- Delete the unused addChildTag and addSiblingTag state variables
- Set the Select component value to null to reset the selection
- Simplify the handleAddNode invocation
- Remove state management code that is no longer needed, improving performance
2026-02-02 15:23:01 +08:00
626c0fcd9a fix(data-annotation): fix the progress calculation for annotation tasks
- Add a toSafeCount utility to handle numeric values safely
- Support both the totalCount and total_count fields for compatibility
-
2026-02-01 23:42:06 +08:00
2f2e0d6a8d feat(KnowledgeManagement): preserve the original knowledge set fields
- Keep the knowledge set's name, description, status and other core attributes when updating tags
- Preserve domain, business line, owner and other metadata
- Maintain configuration such as validity period and sensitivity
- Ensure the source type and custom metadata fields are not overwritten
- Prevent tag updates from accidentally dropping other important field values
2026-02-01 23:30:01 +08:00
10fad39e02 feat(KnowledgeManagement): add tagging to the knowledge set detail page
- Introduce the updateKnowledgeSetByIdUsingPut, createDatasetTagUsingPost and queryDatasetTagsUsingGet APIs
- Add the Clock icon for showing the update time
- Replace the item count and update time icons with the File and Clock components
- Configure the tag component to support adding, fetching and creating tags
- Implement the tag creation and attachment logic
- Integrate asynchronous loading and updating of tags
2026-02-01 23:26:54 +08:00
9014dca1ac fix(knowledge): fix the status check on the knowledge set detail page
- Correct the condition for the office preview status
- Remove the redundant check for the PENDING status
- Refine the trigger condition for status polling
2026-02-01 23:15:50 +08:00
0b8fe34586 refactor(DataManagement): simplify file operations and drop the text dataset type check
- Remove the unused DatasetType import
- Delete the TEXT_DATASET_TYPE_PREFIX constant
- Remove the isTextDataset utility function
- Set the excludeDerivedFiles parameter to true directly, simplifying the query
2026-02-01 23:13:09 +08:00
27e27a09d4 fix(knowledge): remove a redundant toast from the knowledge item editor
- Delete the duplicate message shown after a successful file upload
- Keep the file object handling unchanged
- Improve the experience by avoiding unnecessary feedback
2026-02-01 23:07:32 +08:00
d24fea83d8 feat(KnowledgeItemEditor): add a loading state to upload and replace operations
- Add a loading state controlling file upload and replacement
- Set loading to true before a file upload starts
- Set loading to true before a file replacement starts
- Reset loading in a finally block once the operation completes
- Bind the loading state to the confirm button's confirmLoading property
2026-02-01 23:07:10 +08:00
05088fef1a refactor(data-management): improve the text dataset type check
- Add the TEXT_DATASET_TYPE_PREFIX constant
- Add an isTextDataset utility for detecting text dataset types
- Replace the direct comparisons with the isTextDataset function
- Improve readability and the accuracy of the type check
2026-02-01 23:02:05 +08:00
a0239518fb feat(dataset): implement visibility filtering for dataset files
- Identify derived files via the derived_from_file_id field in their metadata
- Implement applyVisibleFileCounts to set visible file counts for datasets in bulk
- Base the dataset statistics endpoint on the filtered, visible files
- Add the normalizeFilePath utility to unify path formats
- Update the file query logic to support filtering out derived files
- Add the DatasetFileCount DTO for returning file count statistics
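The visibility rule stated above (a file whose metadata carries `derived_from_file_id` is a derived file and is hidden from default listings) can be sketched like this. An illustrative Python sketch only; the field name comes from the commit, the metadata-as-JSON-string representation is an assumption:

```python
import json


def is_derived_file(metadata_json):
    """Return True when the file's metadata marks it as derived."""
    try:
        meta = json.loads(metadata_json or "{}")
    except (TypeError, ValueError):
        return False
    return bool(meta.get("derived_from_file_id"))


files = [
    {"name": "report.pdf", "metadata": "{}"},
    {"name": "report.txt", "metadata": '{"derived_from_file_id": "f-1"}'},
]
visible = [f["name"] for f in files if not is_derived_file(f["metadata"])]
print(visible)  # ['report.pdf']
```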
2026-02-01 22:55:07 +08:00
9d185bb10c feat(deploy): add an upload file storage volume
- Add the uploads_volume volume for storing uploaded files
- Name the volume datamate-uploads-volume
- Mount the upload volume at /uploads inside the container
- Update the deployment configuration to support file uploads
2026-02-01 22:34:14 +08:00
6c4f05c0b9 fix(data-management): fix the file preview status check
- Stop polling previews for files in the PENDING status
- Avoid the performance cost of repeated polling while PENDING
- Streamline the preview loading state management
2026-02-01 22:32:03 +08:00
438acebb89 feat(data-management): add Office document preview
- Integrate a LibreOffice converter for DOC/DOCX to PDF conversion
- Add DatasetFilePreviewService to manage preview files
- Add DatasetFilePreviewAsyncService for asynchronous conversion tasks
- Clean up preview files when the source file is deleted
- Implement frontend polling for the Office preview status
- Add preview API endpoints for status queries and triggering conversion
- Improve the preview UI to show conversion progress and errors
2026-02-01 22:26:05 +08:00
f06d6e5a7e fix(utils): fix the XMLHttpRequest setup in the request utility
- Move the XMLHttpRequest instantiation to the start of the method to avoid duplicate creation
- Delete the commented-out legacy request-completion handler
- Correct the error handling for request error and abort events
- Remove the duplicate xhr.open call to ensure the HTTP method is set correctly
2026-02-01 22:07:43 +08:00
fda283198d refactor(knowledge): remove an unused Tag component import
- Remove the unused Tag import from KnowledgeSetDetail.tsx
- Keep the code clean by eliminating dead dependencies
2026-02-01 22:05:10 +08:00
d535d0ac1b feat(knowledge): add polling for Office document previews
- Introduce useRef hooks to track the polling timer and the item being processed
- Add a Spin component for the preview loading state
- Add the queryKnowledgeItemPreviewStatusUsingGet API call
- Define the OFFICE_PREVIEW_POLL_INTERVAL and OFFICE_PREVIEW_POLL_MAX_TIMES constants
- Remove the old Office preview metadata parsing code
- Add officePreviewStatus and officePreviewError state
- Implement pollOfficePreviewStatus to poll the preview status
- Add clearOfficePreviewPolling to clean up the polling timer
- Integrate status polling into handlePreviewItemFile
- Clear polling and reset state when the preview is closed
- Remove the Office preview tag from the table
- Improve the PDF preview UI to show loading or error states when no preview URL is available
2026-02-01 22:02:57 +08:00
4d2c9e546c refactor(menu): restructure the menu and update the data management titles
- Rename the data management menu item from "Data Management" to "Dataset Management"
- Reorder the menu, moving data annotation and content generation after data management
- Rename the dataset statistics page title from "Data Management" to "Dataset Statistics"
- Remove the duplicated data annotation and content generation menu entries
2026-02-01 21:40:21 +08:00
02cd16523f refactor(data-management-service): remove the docx4j dependency
- Delete the docx4j-core dependency
- Delete the docx4j-export-fo dependency
- Update the project dependency configuration
- Simplify the build configuration
2026-02-01 21:18:50 +08:00
d4a44f3bf5 refactor(data-management): streamline file conversion in the knowledge item preview service
- Remove the docx4j dependency and conversion methods
- Convert all office files to PDF exclusively via LibreOffice
- Delete the separate docx-to-pdf conversion method
- Rename the conversion method to convertOfficeToPdfByLibreOffice
- Strengthen path resolution with multiple candidate paths
- Add path safety validation and normalization
- Add the extractRelativePathFromSegment and normalizeRelativePathValue helpers
- Improve file existence checks and path construction
2026-02-01 21:18:14 +08:00
340a0ad364 refactor(data-management): update knowledge item storage path resolution
- Replace resolveKnowledgeItemStoragePath with resolveKnowledgeItemStoragePathWithFallback
- The new method adds fallback path resolution for more reliable file lookup
2026-02-01 21:14:39 +08:00
00c41fbbd3 refactor(knowledge-item): improve preview file path handling for knowledge items
- Move the file path validation from the start of the method to where it is actually used
- Fix how the preview filename is obtained by parsing it directly from the relative path
- Run the file existence check only when it is needed
- Improve readability and execution efficiency
2026-02-01 21:00:07 +08:00
2430db290d fix(knowledge): fix the statistics display on the knowledge management page
- Correct the second statistic from "total files" to "knowledge categories"
- Correct the third statistic from "total tags" to "total files"
- Move the total tag count within the statistics area
- Ensure each statistic matches its heading
2026-02-01 20:46:54 +08:00
40889baacc feat(knowledge): add knowledge item previews
- Integrate docx4j and LibreOffice to convert Office documents to PDF previews
- Add KnowledgeItemPreviewService to handle preview conversion
- Add the asynchronous KnowledgeItemPreviewAsyncService for document conversion
- Implement preview state management: pending, converting, ready and failed
- Show an Office preview status tag in the frontend
- Support online preview of DOC/DOCX files
- Add preview metadata storage and management
2026-02-01 20:05:25 +08:00
551248ec76 feat(data-annotation): add a row-number column and remove the task ID column
- Add a row-number column computed from the current page
- Remove the old task ID column
- Center the row-number column at 80px wide
- Compute the row number dynamically from the page number and page size
- Keep the table
2026-02-01 19:11:39 +08:00
0bb9abb200 feat(annotation): display the annotation type
- Add an annotation type column rendered with the Tag component
- Add the AnnotationTypeMap constant mapping annotation types
- Update the interface definitions to pass the labelingType field
- Store the annotation type in the backend project create and update logic
- Add a configuration key constant for the annotation type
- Extend the DTOs with the annotation type attribute
- Inherit the annotation type from the template
2026-02-01 19:08:11 +08:00
d135a7f336 feat(knowledge): add knowledge base tag statistics
- Inject TagMapper into KnowledgeItemApplicationService and call the counting method
- Add countKnowledgeSetTags to compute the total tag count for a knowledge base
- Add a totalTags field to KnowledgeManagementStatisticsResponse
- Display the total tag count on the frontend KnowledgeManagementPage
- Change the statistics card layout from 3 to 4 columns to fit the new item
- Add the totalTags type definition to the knowledge management model
2026-02-01 18:46:31 +08:00
7043a26ab3 feat(auth): add login and route protection
- Add a logout button to the sidebar and implement the logout flow
- Add a ProtectedRoute component for route-level access control
- Create the LoginPage component with the login UI and logic
- Wire local login validation into the authSlice state management
- Add the login page and protected routes to the route table
- Implement the automatic redirect to the login page
2026-02-01 14:11:44 +08:00
906bb39b83 feat(annotation): add "save and go to next segment"
- Add the SAVE_AND_NEXT_LABEL constant for the button text
- Add the saveDisabled state controlling when saving is disabled
- Change the top toolbar to a three-column grid layout
- Add a "save and next segment/item" button in the middle of the toolbar
- Remove the primary color styling from the save button
- Unify the disabled-state logic for the save buttons
- Distinguish plain saves from save-and-advance in the save flow
2026-02-01 13:09:55 +08:00
dbf8ec53dd style(ui): make preview modal widths responsive
- Change the preview modal width in CreateAnnotationTaskDialog from fixed pixels to 80vw
- Change the preview drawer width in VisualTemplateBuilder from 600px to 80vw
- Change the modal width in PreviewPromptModal from 800px to 80vw
- Unify the text and media preview widths in the Overview component to 80vw
- Unify the text and media preview widths in KnowledgeSetDetail to 80vw
- Replace the fixed pixel values with responsive units for a better experience
2026-02-01 12:49:56 +08:00
5f89968974 refactor(dataset): refactor the basic dataset information component
- Restructure the BasicInformation component
- Update the data handling flow in the CreateDataset component
- Improve form validation and error handling
- Unify event passing between components
- Improve readability and maintainability
2026-02-01 11:31:09 +08:00
be313cf425 refactor(db): streamline knowledge item table indexes
- Remove the index on the relative_path field of the knowledge item table
- Remove the unique constraint on relative_path in the knowledge item directory table
- Remove the index on relative_path in the knowledge item directory table
- Keep the necessary source_file and set_id
2026-02-01 11:26:10 +08:00
db37de8aee perf(db): tune knowledge item table indexes
- Add a length limit (768) to the idx_dm_ki_relative_path index
- Add a relative-path length limit (768) to the uk_dm_kd_set_path unique constraint
- Add a length limit (768) to the idx_dm_kd_relative_path index
- Improve query performance and index efficiency
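The 768 in this commit is not arbitrary: under MySQL/InnoDB with utf8mb4, each character can take up to 4 bytes, and an index key is commonly capped at 3072 bytes (DYNAMIC/COMPRESSED row formats), so the longest safe prefix is 3072 / 4 = 768 characters. A one-line check of the arithmetic, assuming those MySQL defaults:

```python
INNODB_MAX_KEY_BYTES = 3072       # typical InnoDB index key limit
UTF8MB4_MAX_BYTES_PER_CHAR = 4    # worst case for utf8mb4

max_prefix_chars = INNODB_MAX_KEY_BYTES // UTF8MB4_MAX_BYTES_PER_CHAR
print(max_prefix_chars)  # 768
```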
2026-02-01 11:24:35 +08:00
aeec19b99f feat(annotation): add a save keyboard shortcut
- Implement Ctrl+S shortcut detection
- Add the handleSaveShortcut event handler
- Register the keyboard listener on the window
- Extend requestExport to accept an autoAdvance parameter
- Pass autoAdvance through the save button click handler
2026-01-31 20:47:33 +08:00
a4aefe66cd perf(file): increase the default file upload timeout
- Raise the default timeout from 120 seconds to 1800 seconds
- Improve handling of large file uploads
2026-01-31 19:15:21 +08:00
2f3a8b38d0 fix(dataset): avoid exceptions when querying files of an empty dataset directory
- Check that the directory exists to avoid filesystem access errors
- Return an empty page instead of throwing when the directory is missing
- Improve the experience right after a dataset is created
2026-01-31 19:10:22 +08:00
150af1a741 fix(annotation): fix the project mapping query logic
- Replace the old mapping-service query with a direct ORM model query for raw data
- Read the configuration fields from the new ORM object
- Fix the response payload when an update results in no changes
- Add a soft-delete filter so only non-deleted projects are returned
- Unify data access for better query efficiency and code consistency
2026-01-31 18:57:08 +08:00
e28f680abb feat(annotation): add annotation project info updates
- Introduce DatasetMappingUpdateRequest supporting updates to name, description, template_id and label_config
- Add a PUT /{project_id} endpoint for updating annotation project info
- Implement the update flow: mapping lookup, config handling and database update
- Return results in the standard response format
- Add exception handling and logging for traceability
2026-01-31 18:54:05 +08:00
4f99875670 feat(data-management): show split-by-line only for text datasets
- Import the DatasetType definition from dataset.model
- Add an isTextDataset variable to check whether the current dataset is textual
- Wrap the split-by-line option in a conditional render shown only for text datasets
- Keep the existing logic that disables it for non-text files
2026-01-31 18:50:56 +08:00
c23a9da8cb feat(knowledge): add knowledge base directory management
- Add a relative_path field to the knowledge item table for per-item relative paths
- Create a knowledge item directory table for managing directory structure
- Implement create/delete/query endpoints for directories and the matching application services
- Integrate directory display and operations on the knowledge base detail page
- Add the API endpoints and DTOs for creating and deleting directories
- Update the database init scripts with the new directory table
2026-01-31 18:36:40 +08:00
310bc356b1 feat(knowledge): support file directory structures in knowledge bases
- Add a relativePath field to the KnowledgeItem model for relative paths
- Handle directory prefixes and build relative paths during file upload
- Add a bulk delete endpoint and implementation for knowledge items
- Refactor the frontend KnowledgeSetDetail component for directory browsing and management
- Implement folder creation, deletion and navigation
- Update data queries to search and filter by relative path
- Add folder icons and directory hierarchy display in the frontend
2026-01-31 17:45:43 +08:00
c1fb02b0f5 refactor(annotation): update the data types for task edit mode
- Remove the AnnotationTask type import
- Add the AnnotationTaskListItem type import
- Change the editTask property type from AnnotationTask to AnnotationTaskListItem
- Align the component types with the data structures actually used
2026-01-31 17:19:18 +08:00
4a3e466210 feat(annotation): show in-progress annotation counts for tasks
- Add AnnotationTaskListItem and related type definitions
- Add an "annotating" column showing the amount of in-progress annotation data
- Update the data fetching to include in-progress annotation counts
- Map the in_progress_count field in the backend service layer
- Improve type safety and code structure
2026-01-31 17:14:23 +08:00
5d8d25ca8c fix(annotation): handle the status of empty annotation results
- Check for empty annotations when building the annotation snapshot, so empty objects are not processed
- Keep the current status when the annotation is empty and the status is NO_ANNOTATION or NOT_APPLICABLE
- Remove the redundant hasExistingAnnotation check
- Ensure correct status transitions for empty annotations, preventing them from being marked as annotated
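The transition rule this commit describes boils down to: an empty result must never promote a task to "annotated", and in particular an explicit NO_ANNOTATION / NOT_APPLICABLE verdict survives a save of an empty result. A deliberately simplified Python sketch (the real service handles more states and statuses are named after the commit message):

```python
def next_status(annotation, current):
    """Empty results never promote a task to ANNOTATED; an explicit
    NO_ANNOTATION / NOT_APPLICABLE verdict is preserved."""
    if not annotation:          # empty dict/list/None
        return current
    return "ANNOTATED"


print(next_status({}, "NO_ANNOTATION"))            # NO_ANNOTATION
print(next_status({"result": [1]}, "NO_ANNOTATION"))  # ANNOTATED
```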
2026-01-31 16:57:38 +08:00
f6788756d3 fix(annotation): fix segmented annotation data structure compatibility
- Log and warn when merging segmented annotations fails
- Add detailed status logging when segmented annotations are saved
- Fix the segment structure type check to convert both dict and list formats uniformly
- Avoid in-place modifications that defeat SQLAlchemy change detection
- Add migration compatibility from the legacy list structure to the new dict structure
2026-01-31 16:45:48 +08:00
5a5279869e feat(annotation): cache the segment total as a hint to improve performance
- Add a segment_total_hint variable in the editor service to cache the computed segment total
- Lock the query with with_for_update() to avoid concurrency issues
- Replace the repeated segment total computations with the cached hint
- Reduce database queries and speed up annotation task handling
- Streamline total retrieval when a segment index exists
2026-01-31 16:28:39 +08:00
e1c963928a feat(annotation): add annotation object parsing and export
- Implement the isAnnotationObject function to validate annotation objects
- Add resolveSelectedAnnotation to resolve the selected annotation
- Improve the annotation selection logic in exportSelectedAnnotation
- Handle the error case where no annotation object is found
- Automatically convert the results field to the result field
- Improve the stability and accuracy of annotation data export
2026-01-31 16:14:12 +08:00
33cf65c9f8 feat(annotation): add segmented annotation statistics and progress tracking
- Add the SegmentStats type for segment statistics
- Implement segmented-annotation progress calculation and caching
- Add task status logic that understands segmented mode
- Surface segment statistics in the task list UI
- Compute and validate the segment total automatically
- Extend the annotation status enum with an in-progress state
- Base the task selection logic on segment completion state
- Preload and synchronize segment statistics
2026-01-31 15:42:04 +08:00
3e0a15ac8e fix(annotation): fix the format option display in the export annotations dialog
- Add the py-1 style class to the format options for better layout
- Add a simpleLabel property for the option label display
- Change optionLabelProp from label to simpleLabel
- Improve the dropdown selector's label
2026-01-31 15:35:54 +08:00
5318ee9641 fix(annotation): fix duplicate data handling in the export service
- Remove the duplicated else branch
- Fix the data flow when the segment index key is missing
- Simplify the handling of list-typed segments
- Eliminate the duplicate data-append operations
2026-01-31 14:39:21 +08:00
c5c8e6c69e feat(annotation): add segmented annotation support
- Define segmented-annotation constants (keys such as segmented, segments and result)
- Implement _extract_segment_annotations to handle both dict and list formats
- Add _is_segmented_annotation to detect segmented annotation state
- Update _has_annotation_result to use the new segmented-annotation handling
- Integrate segmented annotation data processing into task creation
- Flatten segmented annotation results in the export service
- Implement annotation normalization that converts segmented formats
- Adapt the JSON and CSV export formats to the segmented structure
2026-01-31 14:36:16 +08:00
8fdc7d99b8 feat(docker): optimize Dockerfiles for caching on weak networks
- Cache-mount the DataX sources to avoid repeated clones and speed up builds
- Add an NLTK data cache mount with a failure check
- Cache PaddleOCR model downloads for offline reuse
- Add a spaCy model cache for more reliable installs
- Adapt the build flow to dependency downloads over weak networks
2026-01-31 14:31:47 +08:00
2bc48fd465 refactor(annotation): remove the editor label config decoration logic
- Delete the _decorate_label_config_for_editor call
- Simplify the label config retrieval flow
- Remove unnecessary conditional checks
2026-01-31 14:14:32 +08:00
a21a632a4b refactor(DataManagement): improve file fetching on the dataset detail page
- Move the file fetching logic out of fetchDataset into its own useEffect hook
- Depend on dataset.id so files are fetched after the dataset has loaded
- Fix a file fetch timing issue that could occur on initial load
- Improve render performance with more precise dependency tracking
- Keep behavior unchanged while improving maintainability
2026-01-31 14:14:16 +08:00
595a758d05 refactor(data-management): improve transaction handling in the PDF text extraction service
- Inject the TransactionSynchronization dependencies
- Run the asynchronous PDF text extraction after the transaction commits
- Add null checks for the dataset ID and file ID
- Register a synchronization callback within the active transaction to guarantee execution
- Avoid running the async task before the transaction has committed
2026-01-31 13:59:03 +08:00
4fa0ac1df4 config(security): disable frameOptions to allow iframe embedding
- Add a headers configuration to the SecurityFilterChain
- Disable frameOptions to lift the iframe embedding restriction
- Keep csrf disabled and all other security settings unchanged
2026-01-31 13:57:38 +08:00
f2403f00ce feat(annotation): support a "not applicable" annotation status
- Add the NOT_APPLICABLE status to the AnnotationResultStatus enum
- Split "no annotation / not applicable" into two distinct status options
- Update the frontend label display logic for the new status
- Allow choosing "not applicable" in the confirmation dialog
- Add the NOT_APPLICABLE value to the backend database model
- Update the API schema descriptions to reflect the new option
- Adjust the annotation status checks and save logic to handle all three states
- Update the database table comments to include the new status
2026-01-31 13:28:08 +08:00
f4fc574687 feat(annotation): add annotation status management
- Introduce the AnnotationResultStatus enum distinguishing annotated from unannotated states
- Implement empty-annotation detection and a confirmation dialog in the frontend
- Add the annotation_status database column for storing the status
- Extend the backend services to validate and process the annotation status
- Pass the annotation status parameter through the API
- Reflect the different annotation statuses in the task list display
- Implement the annotation result check for segmented mode
2026-01-31 13:23:38 +08:00
52a2a73a8e feat(annotation): add a save-and-next keyboard shortcut
- Implement the Ctrl+Enter shortcut for saving and jumping to the next annotation
- Add a keyboard listener to capture the shortcut combination
- Export the selected annotation and post it to the parent window
- Debounce the shortcut and stop event propagation
- Handle the LS_SAVE_AND_NEXT message type in the message handler
- Implement automatic advancing to the next annotation item
2026-01-31 11:47:33 +08:00
b5d7c66240 feat(data-management): extend source document exclusion to Excel file types
- Extend the backend source document type check with XLS and XLSX support
- Update the filtering in DatasetFileApplicationService to handle all source document types uniformly
- Add the isSourceDocument and isDerivedFile helpers for file type checks
- Update the comments in the frontend DatasetFileTransfer component
- Add the openpyxl and xlrd libraries to the Python runtime dependencies for Excel support
- Update the source document type set in the annotation project interface
- Update the derived-file exclusion logic in the file operation hooks
2026-01-31 11:30:55 +08:00
6c7ea0c25e chore(deps): update Docker image source addresses
- Replace the etcd image source quay.io with quay.nju.edu.cn
- Replace the vLLM-Ascend image source quay.io with quay.nju.edu.cn
- Use the Nanjing University mirror consistently for faster downloads
2026-01-31 11:21:47 +08:00
153066a95f fix(frontend): hide action dropdown in CardView when operations list is empty 2026-01-31 11:14:26 +08:00
498f23a0c4 feat(data-management): extend text datasets to support Excel file types
- Add the XLS and XLSX file types to the document text file type set in DatasetFileApplicationService
- Add the xls and xlsx extensions to the TEXT dataset type in DatasetTypeController
- Add XLS/XLSX constants and parser configuration in pdf_extract.py
- Implement Excel-to-CSV conversion for single- and multi-sheet workbooks
- Add dependency checks and error handling for Excel files
- Build target file paths that support per-type derived extensions
- Create text file records with the derived file type instead of a fixed text type
2026-01-31 11:11:24 +08:00
85d7141a91 refactor(DataManagement): replace the similar-datasets table with a card view
- Remove the similar-datasets table code from the Overview component
- Remove the Tag component and related imports
- Add a CardView component in DatasetDetail to display similar datasets
- Switch the similar-datasets display from a table to a card layout
- Remove the similar-datasets props passed through the Overview component
- Updated the page layout to
2026-01-31 09:40:06 +08:00
790385bd80 feat(knowledge-management): add knowledge management search and statistics endpoints
- Add search query and response DTOs for knowledge items
- Implement knowledge management statistics: total items, file count and total size
- Add database query methods for file search and statistics
- Create a knowledge item search controller exposing a REST API
- Add a knowledge management search page and components in the frontend
- Add the search page entry to the frontend routes
- Remove the duplicated statistics feature from the RAG index service
- Improve the statistics display and refresh logic on the frontend
2026-01-31 09:30:37 +08:00
97170a90fe feat(data-import): add text file type detection and split-by-line
- Add the TEXT_FILE_MIME_PREFIX, TEXT_FILE_MIME_TYPES and TEXT_FILE_EXTENSIONS constants for text file detection
- Add the getUploadFileName, getUploadFileType and isTextUploadFile utilities
- Integrate the text file type check into splitFileByLines
- Add a hasNonTextFile useMemo hook that detects non-text files
- Disable split-by-line and reset the toggle when non-text files are present
- Update the Tooltip content to explain the file type restriction
- Optimize fetchCollectionTasks and resetState with useCallback
- Adjust the useEffect dependency arrays for correct re-rendering
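The detection logic can be sketched as: a file is a text file if its MIME type starts with the text prefix, is an explicitly whitelisted MIME type, or has a whitelisted extension. The constant names come from the commit; their values here are illustrative assumptions, not the real lists:

```python
# Hypothetical values mirroring the constants named in the commit.
TEXT_FILE_MIME_PREFIX = "text/"
TEXT_FILE_MIME_TYPES = {"application/json", "application/x-ndjson"}
TEXT_FILE_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".log"}


def is_text_upload_file(name, mime_type):
    """Sketch of isTextUploadFile: MIME prefix, MIME whitelist, then
    extension whitelist as a fallback (e.g. when the browser reports
    an empty MIME type)."""
    if mime_type.startswith(TEXT_FILE_MIME_PREFIX):
        return True
    if mime_type in TEXT_FILE_MIME_TYPES:
        return True
    ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return ext in TEXT_FILE_EXTENSIONS


print(is_text_upload_file("notes.txt", "text/plain"))   # True
print(is_text_upload_file("photo.png", "image/png"))    # False
```

Split-by-line would then be offered only when every selected file passes this check, matching the "disable when non-text files are present" bullet above.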
2026-01-30 23:31:02 +08:00
fd209c3083 feat(knowledge-base): add knowledge base statistics
- Add the KnowledgeBaseStatisticsResp and RagFileStatistics DTOs to the backend
- Implement getStatistics in KnowledgeBaseService for statistics queries
- Add a getStatistics interface and implementation to RagFileRepository
- Implement the database-level statistics query via a MyBatis Mapper
- Expose a /statistics endpoint in KnowledgeBaseController for the frontend
- Add statistics cards showing knowledge base count, file count and total size on the frontend
- Keep frontend and backend statistics in sync
2026-01-30 23:17:40 +08:00
76f70a6847 feat(knowledge-base): add cross-knowledge-base file search
- Add a relative path field replacing the old metadata-based storage
- Implement the cross-knowledge-base searchFiles endpoint
- Add a global search page and the related API calls in the frontend
- Improve file path handling and database index configuration
- Unify request parameter types as RequestPayload and RequestParams
- Simplify the metadata structure in the RagFile model
2026-01-30 22:24:12 +08:00
cbad129ce4 feat(rag): add relative path search and improve file display
- Add a relativePath field and path pattern builder to RagFileRepositoryImpl
- Implement buildRelativePathPattern for constructing relative path search patterns
- Add relative path fuzzy matching to the page method
- Add a relativePath parameter to the RagFileReq DTO
- Improve the filename display logic on the KnowledgeBaseDetail page
- Add a normalizePath function for normalizing displayed file paths
2026-01-30 21:55:29 +08:00
ca7ff56610 feat(rag): support file relative paths
- Add a relativePath field to the FileInfo DTO
- Normalize file relative paths
- Store the file's relative path in the metadata
- Parse and display file paths in the frontend
- Unify path separator handling
- Update the file list display to support a path hierarchy
2026-01-30 21:46:03 +08:00
a00a6ed3c3 feat(knowledge-base): implement folders and improve file management in knowledge bases
- Add datasetId and filePath fields to the DatasetFile interface
- Implement resolveRelativeFileName for resolving relative filenames
- Use resolveRelativeFileName for filenames in AddDataDialog
- Add folder browsing with directory navigation and hierarchy display
- Implement folder deletion, removing all files under a directory in bulk
- Integrate the Folder and File icon components to distinguish directories and files
- Improve file list loading with pagination and keyword search
- Add folder state display and the corresponding action buttons
- Implement path prefix management and subdirectory filtering
- Refactor the file list rendering to support mixed directories and files
2026-01-30 21:30:54 +08:00
9a205919d7 refactor(data-import): improve data source file scanning and copying
- Fetch task details and paths directly in the main scanning flow
- Remove the standalone getFilePaths method, folding the path scan into scanFilePaths
- Add copyFilesToDatasetDirWithSourceRoot to copy files while preserving relative paths
- Update the dataset file application service's copy logic to handle relative paths
- Clean up the Python backend project interface's file query, removing the commented-out editor service reference
- Filter files based on the derived-source ID in their metadata
- Remove the deprecated source-document filter from the editor service
2026-01-30 18:58:34 +08:00
8b2a19f09a feat(annotation): add file snapshots for annotation projects
- Add the LabelingProjectFile model for storing project file snapshots
- Record the associated file snapshot when an annotation project is created
- Filter file lists based on the project snapshot
- Base the export statistics on the snapshot data
- Add the database table supporting the project-file snapshot relation
2026-01-30 18:10:13 +08:00
3c3ca130b3 feat(annotation): add text content reading and multi-type label export
- Add the asynchronous _read_file_content function for safely reading text file contents
- Include the actual text file content in exports
- Extend the CSV export format with label extraction for multiple annotation types
- Support rectangle, polygon, brush and other annotation label types
- Document the bbox coordinate conversion caveats for the COCO export format
2026-01-30 17:35:22 +08:00
a4cdaecf8a refactor(annotation): simplify the annotation export download flow
- Drop the frontend's hand-rolled a-tag download approach
- Pass the filename to the backend API function
- Rely on the download function's built-in download handling
- Simplify the export flow in the ExportAnnotationDialog component
- Update the downloadAnnotationsUsingGet signature in annotation.api.ts
- Complete the download and file naming directly through the API call
2026-01-30 17:33:14 +08:00
6dfed934a5 feat(file-preview): add PDF previews and improve the preview logic
- Introduce unified file preview utilities and type definitions
- Add detection and preview support for the PDF file type
- Preview PDF files inline via an iframe
- Refactor the preview logic to handle different file types uniformly
- Improve the length truncation for text content previews
- Update the preview button loading state
- Unify the preview window's maximum height configuration
- Switch the API call path to the dedicated preview endpoint
2026-01-30 17:32:36 +08:00
bd37858ccc refactor(dataset): improve dataset path management and relation handling
- Remove the parentPath parameter from Dataset.initCreateParam
- Simplify the path construction in handleParentChange
- Change "child dataset" to "linked dataset" in error messages
- Unify the frontend terminology from "parent dataset" to "linked dataset"
- Add type definitions and better file handling in the import configuration component
- Exclude the COLLECTION type from the data source options to prevent wrong selections
2026-01-30 16:48:39 +08:00
accaa47a83 fix(components): fix timer memory leaks in components
- Add a timeoutRef to TopLoadingBar and clean up its timer properly
- Add a timeoutRef on the Agent page for the simulated AI response timer
- Fix the missing useCallback dependency array in BasicInformation
- Pass the hidden property in CreateDataset to control data source visibility
- Add an intervalRef on the Orchestration page for workflow execution progress
- Add a testTimeoutRef in SynthesisTask for the template test timer
- Ensure every component clears its timers on unmount to avoid memory leaks
2026-01-30 14:35:45 +08:00
98d2ef1aa5 feat(KnowledgeBase): improve knowledge base file uploads
- Add a submitting state to prevent duplicate submissions
- Rename the chunking option "by chapter" to "by sentence"
- Update the fixed-length chunking value from FIXED_LENGTH_CHUNK to LENGTH_CHUNK
- Simplify the file counting logic by counting selected files directly
- Add an upload progress message
- Restructure the file data so IDs are always strings
- Add button disabled states for a better experience
- Improve message display, supporting in-place message updates
2026-01-30 14:29:45 +08:00
164 changed files with 12267 additions and 2418 deletions

Makefile.offline.mk Normal file

@@ -0,0 +1,304 @@
# ============================================================================
# Makefile offline build extension
# Append this file to the end of the main Makefile, or include it separately
# ============================================================================
# Offline build configuration
CACHE_DIR ?= ./build-cache
OFFLINE_VERSION ?= latest
# Create the buildx builder (if it does not exist)
.PHONY: ensure-buildx
ensure-buildx:
@if ! docker buildx inspect offline-builder > /dev/null 2>&1; then \
echo "Creating buildx builder..."; \
docker buildx create --name offline-builder --driver docker-container --use 2>/dev/null || docker buildx use offline-builder; \
else \
docker buildx use offline-builder 2>/dev/null || true; \
fi
# ========== Offline cache export (connected environment) ==========
.PHONY: offline-export
offline-export: ensure-buildx
@echo "======================================"
@echo "Exporting offline build cache..."
@echo "======================================"
@mkdir -p $(CACHE_DIR)/buildkit $(CACHE_DIR)/images $(CACHE_DIR)/resources
@$(MAKE) _offline-export-base-images
@$(MAKE) _offline-export-cache
@$(MAKE) _offline-export-resources
@$(MAKE) _offline-package
.PHONY: _offline-export-base-images
_offline-export-base-images:
@echo ""
@echo "1. Exporting base images..."
@bash -c 'images=( \
"maven:3-eclipse-temurin-21" \
"maven:3-eclipse-temurin-8" \
"eclipse-temurin:21-jdk" \
"mysql:8" \
"node:20-alpine" \
"nginx:1.29" \
"ghcr.nju.edu.cn/astral-sh/uv:python3.11-bookworm" \
"ghcr.nju.edu.cn/astral-sh/uv:python3.12-bookworm" \
"ghcr.nju.edu.cn/astral-sh/uv:latest" \
"python:3.12-slim" \
"python:3.11-slim" \
"gcr.nju.edu.cn/distroless/nodejs20-debian12" \
); for img in "$${images[@]}"; do echo " Pulling $$img..."; docker pull "$$img" 2>/dev/null || true; done'
@echo " Saving base images..."
@docker save -o $(CACHE_DIR)/images/base-images.tar \
maven:3-eclipse-temurin-21 \
maven:3-eclipse-temurin-8 \
eclipse-temurin:21-jdk \
mysql:8 \
node:20-alpine \
nginx:1.29 \
ghcr.nju.edu.cn/astral-sh/uv:python3.11-bookworm \
ghcr.nju.edu.cn/astral-sh/uv:python3.12-bookworm \
ghcr.nju.edu.cn/astral-sh/uv:latest \
python:3.12-slim \
python:3.11-slim \
gcr.nju.edu.cn/distroless/nodejs20-debian12 2>/dev/null || echo " Warning: Some images may not exist"
.PHONY: _offline-export-cache
_offline-export-cache:
@echo ""
@echo "2. Exporting BuildKit caches..."
@echo " backend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/backend-cache,mode=max -f scripts/images/backend/Dockerfile -t datamate-backend:cache . 2>/dev/null || echo " Warning: backend cache export failed"
@echo " backend-python..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/backend-python-cache,mode=max -f scripts/images/backend-python/Dockerfile -t datamate-backend-python:cache . 2>/dev/null || echo " Warning: backend-python cache export failed"
@echo " database..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/database-cache,mode=max -f scripts/images/database/Dockerfile -t datamate-database:cache . 2>/dev/null || echo " Warning: database cache export failed"
@echo " frontend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/frontend-cache,mode=max -f scripts/images/frontend/Dockerfile -t datamate-frontend:cache . 2>/dev/null || echo " Warning: frontend cache export failed"
@echo " gateway..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/gateway-cache,mode=max -f scripts/images/gateway/Dockerfile -t datamate-gateway:cache . 2>/dev/null || echo " Warning: gateway cache export failed"
@echo " runtime..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/runtime-cache,mode=max -f scripts/images/runtime/Dockerfile -t datamate-runtime:cache . 2>/dev/null || echo " Warning: runtime cache export failed"
@echo " deer-flow-backend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/deer-flow-backend-cache,mode=max -f scripts/images/deer-flow-backend/Dockerfile -t deer-flow-backend:cache . 2>/dev/null || echo " Warning: deer-flow-backend cache export failed"
@echo " deer-flow-frontend..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/deer-flow-frontend-cache,mode=max -f scripts/images/deer-flow-frontend/Dockerfile -t deer-flow-frontend:cache . 2>/dev/null || echo " Warning: deer-flow-frontend cache export failed"
@echo " mineru..."
@docker buildx build --cache-to type=local,dest=$(CACHE_DIR)/buildkit/mineru-cache,mode=max -f scripts/images/mineru/Dockerfile -t datamate-mineru:cache . 2>/dev/null || echo " Warning: mineru cache export failed"
.PHONY: _offline-export-resources
_offline-export-resources:
@echo ""
@echo "3. Pre-downloading external resources..."
@mkdir -p $(CACHE_DIR)/resources/models
@echo " PaddleOCR model..."
@wget -q -O $(CACHE_DIR)/resources/models/ch_ppocr_mobile_v2.0_cls_infer.tar \
https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar 2>/dev/null || echo " Warning: PaddleOCR model download failed"
@echo " spaCy model..."
@wget -q -O $(CACHE_DIR)/resources/models/zh_core_web_sm-3.8.0-py3-none-any.whl \
https://ghproxy.net/https://github.com/explosion/spacy-models/releases/download/zh_core_web_sm-3.8.0/zh_core_web_sm-3.8.0-py3-none-any.whl 2>/dev/null || echo " Warning: spaCy model download failed"
@echo " DataX source..."
@if [ ! -d "$(CACHE_DIR)/resources/DataX" ]; then \
git clone --depth 1 https://gitee.com/alibaba/DataX.git $(CACHE_DIR)/resources/DataX 2>/dev/null || echo " Warning: DataX clone failed"; \
fi
@echo " deer-flow source..."
@if [ ! -d "$(CACHE_DIR)/resources/deer-flow" ]; then \
git clone --depth 1 https://ghproxy.net/https://github.com/ModelEngine-Group/deer-flow.git $(CACHE_DIR)/resources/deer-flow 2>/dev/null || echo " Warning: deer-flow clone failed"; \
fi
.PHONY: _offline-package
_offline-package:
@echo ""
@echo "4. Packaging cache..."
@cd $(CACHE_DIR) && tar -czf "build-cache-$$(date +%Y%m%d).tar.gz" buildkit images resources 2>/dev/null && cd - > /dev/null
@echo ""
@echo "======================================"
@echo "✓ Cache export complete!"
@echo "======================================"
@echo "Transfer file: $(CACHE_DIR)/build-cache-$$(date +%Y%m%d).tar.gz"
# ========== Offline build (air-gapped environment) ==========
.PHONY: offline-setup
offline-setup:
@echo "======================================"
@echo "Setting up offline build environment..."
@echo "======================================"
@if [ ! -d "$(CACHE_DIR)" ]; then \
echo "Locating and extracting cache archive..."; \
cache_file=$$(ls -t build-cache-*.tar.gz 2>/dev/null | head -1); \
if [ -z "$$cache_file" ]; then \
echo "Error: cache archive not found (build-cache-*.tar.gz)"; \
exit 1; \
fi; \
echo "Extracting $$cache_file..."; \
tar -xzf "$$cache_file"; \
else \
echo "Cache directory already exists: $(CACHE_DIR)"; \
fi
@echo ""
@echo "Loading base images..."
@if [ -f "$(CACHE_DIR)/images/base-images.tar" ]; then \
docker load -i $(CACHE_DIR)/images/base-images.tar; \
else \
echo "Warning: base image file not found; assuming it was loaded manually"; \
fi
@$(MAKE) ensure-buildx
@echo ""
@echo "✓ Offline environment ready"
.PHONY: offline-build
offline-build: offline-setup
@echo ""
@echo "======================================"
@echo "Starting offline build..."
@echo "======================================"
@$(MAKE) _offline-build-services
.PHONY: _offline-build-services
_offline-build-services: ensure-buildx
@echo ""
@echo "Building datamate-database..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/database-cache \
--pull=false \
-f scripts/images/database/Dockerfile \
-t datamate-database:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "Building datamate-gateway..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/gateway-cache \
--pull=false \
-f scripts/images/gateway/Dockerfile \
-t datamate-gateway:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "Building datamate-backend..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/backend-cache \
--pull=false \
-f scripts/images/backend/Dockerfile \
-t datamate-backend:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "Building datamate-frontend..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/frontend-cache \
--pull=false \
-f scripts/images/frontend/Dockerfile \
-t datamate-frontend:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "Building datamate-runtime..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/runtime-cache \
--pull=false \
--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources \
-f scripts/images/runtime/Dockerfile \
-t datamate-runtime:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "Building datamate-backend-python..."
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/backend-python-cache \
--pull=false \
--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources \
-f scripts/images/backend-python/Dockerfile \
-t datamate-backend-python:$(OFFLINE_VERSION) \
--load . || echo " Failed"
@echo ""
@echo "======================================"
@echo "✓ Offline build complete"
@echo "======================================"
# Offline build of a single service (BuildKit)
.PHONY: %-offline-build
%-offline-build: offline-setup ensure-buildx
@echo "Building $* offline..."
@if [ ! -d "$(CACHE_DIR)/buildkit/$*-cache" ]; then \
echo "Error: no cache found for $*"; \
exit 1; \
fi
@$(eval IMAGE_NAME := $(if $(filter deer-flow%,$*),$*,datamate-$*))
@docker buildx build \
--cache-from type=local,src=$(CACHE_DIR)/buildkit/$*-cache \
--pull=false \
$(if $(filter runtime backend-python deer-flow%,$*),--build-arg RESOURCES_DIR=$(CACHE_DIR)/resources,) \
-f scripts/images/$*/Dockerfile \
-t $(IMAGE_NAME):$(OFFLINE_VERSION) \
--load .
# Classic Docker build (without BuildKit; more stable)
.PHONY: offline-build-classic
offline-build-classic: offline-setup
@echo "Running offline build with classic docker build..."
@bash scripts/offline/build-offline-classic.sh $(CACHE_DIR) $(OFFLINE_VERSION)
# Diagnose the offline environment
.PHONY: offline-diagnose
offline-diagnose:
@bash scripts/offline/diagnose.sh $(CACHE_DIR)
# Build base images with pre-installed APT packages (online environment)
.PHONY: offline-build-base-images
offline-build-base-images:
@echo "Building APT-preinstalled base images..."
@bash scripts/offline/build-base-images.sh $(CACHE_DIR)
# Offline build using preinstalled base images (recommended)
.PHONY: offline-build-final
offline-build-final: offline-setup
@echo "Running offline build with APT-preinstalled base images..."
@bash scripts/offline/build-offline-final.sh $(CACHE_DIR) $(OFFLINE_VERSION)
# Full offline export (includes APT-preinstalled base images)
.PHONY: offline-export-full
offline-export-full:
@echo "======================================"
@echo "Full offline cache export (incl. APT-preinstalled base images)"
@echo "======================================"
@$(MAKE) offline-build-base-images
@$(MAKE) offline-export
@echo ""
@echo "Export complete! Include the following files when transferring:"
@echo " - build-cache/images/base-images-with-apt.tar"
@echo " - build-cache-YYYYMMDD.tar.gz"
# ========== Help ==========
.PHONY: help-offline
help-offline:
@echo "Offline build commands:"
@echo ""
@echo "[Online environment]"
@echo " make offline-export [CACHE_DIR=./build-cache] - Export the build cache"
@echo " make offline-export-full - Export the full cache (incl. APT-preinstalled base images)"
@echo " make offline-build-base-images - Build APT-preinstalled base images"
@echo ""
@echo "[Air-gapped environment]"
@echo " make offline-setup [CACHE_DIR=./build-cache] - Extract and prepare the offline cache"
@echo " make offline-build-final - Build with preinstalled base images (recommended; avoids APT issues)"
@echo " make offline-build-classic - Build with classic docker build"
@echo " make offline-build - Build with BuildKit"
@echo " make offline-diagnose - Diagnose the offline build environment"
@echo " make <service>-offline-build - Offline-build a single service"
@echo ""
@echo "[Full workflow (recommended)]"
@echo " # 1. Export the full cache in an online environment"
@echo " make offline-export-full"
@echo ""
@echo " # 2. Transfer to the air-gapped environment (two files required)"
@echo " scp build-cache/images/base-images-with-apt.tar user@offline-server:/path/"
@echo " scp build-cache-*.tar.gz user@offline-server:/path/"
@echo ""
@echo " # 3. Build in the air-gapped environment"
@echo " tar -xzf build-cache-*.tar.gz"
@echo " docker load -i build-cache/images/base-images-with-apt.tar"
@echo " make offline-build-final"
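The `%-offline-build` pattern rule above derives the target image name with `$(if $(filter deer-flow%,$*),$*,datamate-$*)`: services whose name starts with `deer-flow` keep their own name, everything else gets the `datamate-` prefix. A minimal shell sketch of that mapping (the `image_name` helper is illustrative, not part of the Makefile):

```shell
# Mirrors the Makefile's image-name logic: services starting with
# "deer-flow" keep their name; everything else is prefixed "datamate-".
image_name() {
  case "$1" in
    deer-flow*) echo "$1" ;;
    *)          echo "datamate-$1" ;;
  esac
}

image_name backend             # -> datamate-backend
image_name deer-flow-frontend  # -> deer-flow-frontend
```

So `make backend-offline-build` tags `datamate-backend:$(OFFLINE_VERSION)`, while `make deer-flow-frontend-offline-build` tags `deer-flow-frontend:$(OFFLINE_VERSION)`.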


@@ -1,5 +1,6 @@
package com.datamate.datamanagement.application;
import com.baomidou.mybatisplus.core.conditions.update.LambdaUpdateWrapper;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.datamate.common.domain.utils.ChunksSaver;
@@ -19,8 +20,11 @@ import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorC
import com.datamate.datamanagement.infrastructure.persistence.mapper.TagMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections4.CollectionUtils;
@@ -53,6 +57,7 @@ public class DatasetApplicationService {
private static final int SIMILAR_DATASET_MAX_LIMIT = 50;
private static final int SIMILAR_DATASET_CANDIDATE_FACTOR = 5;
private static final int SIMILAR_DATASET_CANDIDATE_MAX = 100;
private static final String DERIVED_METADATA_KEY = "derived_from_file_id";
private final DatasetRepository datasetRepository;
private final TagMapper tagMapper;
private final DatasetFileRepository datasetFileRepository;
@@ -73,7 +78,7 @@ public class DatasetApplicationService {
Dataset dataset = DatasetConverter.INSTANCE.convertToDataset(createDatasetRequest);
Dataset parentDataset = resolveParentDataset(createDatasetRequest.getParentDatasetId(), dataset.getId());
dataset.setParentDatasetId(parentDataset == null ? null : parentDataset.getId());
dataset.initCreateParam(datasetBasePath, parentDataset == null ? null : parentDataset.getPath());
dataset.initCreateParam(datasetBasePath);
// Process tags
Set<Tag> processedTags = Optional.ofNullable(createDatasetRequest.getTags())
.filter(CollectionUtils::isNotEmpty)
@@ -97,6 +102,7 @@ public class DatasetApplicationService {
public Dataset updateDataset(String datasetId, UpdateDatasetRequest updateDatasetRequest) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, DataManagementErrorCode.DATASET_NOT_FOUND);
if (StringUtils.hasText(updateDatasetRequest.getName())) {
dataset.setName(updateDatasetRequest.getName());
}
@@ -109,13 +115,31 @@ public class DatasetApplicationService {
if (Objects.nonNull(updateDatasetRequest.getStatus())) {
dataset.setStatus(updateDatasetRequest.getStatus());
}
if (updateDatasetRequest.getParentDatasetId() != null) {
if (updateDatasetRequest.isParentDatasetIdProvided()) {
// Save the original parentDatasetId to detect whether it changes
String originalParentDatasetId = dataset.getParentDatasetId();
// Handle parent dataset change: only when the request explicitly includes parentDatasetId
// handleParentChange normalizes both empty string and null to null via normalizeParentId,
// so it supports both setting a new parent dataset and clearing the association
handleParentChange(dataset, updateDatasetRequest.getParentDatasetId());
// Check whether parentDatasetId actually changed
if (!Objects.equals(originalParentDatasetId, dataset.getParentDatasetId())) {
// Update parentDatasetId explicitly via LambdaUpdateWrapper,
// so the field is written to the database even when the value is null
datasetRepository.update(null, new LambdaUpdateWrapper<Dataset>()
.eq(Dataset::getId, datasetId)
.set(Dataset::getParentDatasetId, dataset.getParentDatasetId()));
}
}
if (StringUtils.hasText(updateDatasetRequest.getDataSource())) {
// Data source id present: scan the disk and persist files asynchronously
processDataSourceAsync(dataset.getId(), updateDatasetRequest.getDataSource());
}
// Update the remaining fields (parentDatasetId excluded; it was already updated above)
datasetRepository.updateById(dataset);
return dataset;
}
@@ -142,6 +166,7 @@ public class DatasetApplicationService {
BusinessAssert.notNull(dataset, DataManagementErrorCode.DATASET_NOT_FOUND);
List<DatasetFile> datasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(datasetFiles);
applyVisibleFileCounts(Collections.singletonList(dataset));
return dataset;
}
@@ -153,6 +178,7 @@ public class DatasetApplicationService {
IPage<Dataset> page = new Page<>(query.getPage(), query.getSize());
page = datasetRepository.findByCriteria(page, query);
String datasetPvcName = getDatasetPvcName();
applyVisibleFileCounts(page.getRecords());
List<DatasetResponse> datasetResponses = DatasetConverter.INSTANCE.convertToResponse(page.getRecords());
datasetResponses.forEach(dataset -> dataset.setPvcName(datasetPvcName));
return PagedResponse.of(datasetResponses, page.getCurrent(), page.getTotal(), page.getPages());
@@ -200,6 +226,7 @@ public class DatasetApplicationService {
})
.limit(safeLimit)
.toList();
applyVisibleFileCounts(sorted);
List<DatasetResponse> responses = DatasetConverter.INSTANCE.convertToResponse(sorted);
responses.forEach(item -> item.setPvcName(datasetPvcName));
return responses;
@@ -291,7 +318,9 @@ public class DatasetApplicationService {
private void handleParentChange(Dataset dataset, String parentDatasetId) {
String normalized = normalizeParentId(parentDatasetId);
if (Objects.equals(dataset.getParentDatasetId(), normalized)) {
String expectedPath = buildDatasetPath(datasetBasePath, dataset.getId());
if (Objects.equals(dataset.getParentDatasetId(), normalized)
&& Objects.equals(dataset.getPath(), expectedPath)) {
return;
}
long childCount = datasetRepository.countByParentId(dataset.getId());
@@ -299,8 +328,7 @@ public class DatasetApplicationService {
throw BusinessException.of(DataManagementErrorCode.DATASET_HAS_CHILDREN);
}
Dataset parent = normalized == null ? null : resolveParentDataset(normalized, dataset.getId());
String newPath = buildDatasetPath(parent == null ? datasetBasePath : parent.getPath(), dataset.getId());
moveDatasetPath(dataset, newPath);
moveDatasetPath(dataset, expectedPath);
dataset.setParentDatasetId(parent == null ? null : parent.getId());
}
@@ -344,6 +372,61 @@ public class DatasetApplicationService {
dataset.setPath(newPath);
}
private void applyVisibleFileCounts(List<Dataset> datasets) {
if (CollectionUtils.isEmpty(datasets)) {
return;
}
List<String> datasetIds = datasets.stream()
.filter(Objects::nonNull)
.map(Dataset::getId)
.filter(StringUtils::hasText)
.toList();
if (datasetIds.isEmpty()) {
return;
}
Map<String, Long> countMap = datasetFileRepository.countNonDerivedByDatasetIds(datasetIds).stream()
.filter(Objects::nonNull)
.collect(Collectors.toMap(
DatasetFileCount::getDatasetId,
count -> Optional.ofNullable(count.getFileCount()).orElse(0L),
(left, right) -> left
));
for (Dataset dataset : datasets) {
if (dataset == null || !StringUtils.hasText(dataset.getId())) {
continue;
}
Long visibleCount = countMap.get(dataset.getId());
dataset.setFileCount(visibleCount != null ? visibleCount : 0L);
}
}
private List<DatasetFile> filterVisibleFiles(List<DatasetFile> files) {
if (CollectionUtils.isEmpty(files)) {
return Collections.emptyList();
}
return files.stream()
.filter(file -> !isDerivedFile(file))
.collect(Collectors.toList());
}
private boolean isDerivedFile(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String metadata = datasetFile.getMetadata();
if (!StringUtils.hasText(metadata)) {
return false;
}
try {
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> metadataMap = mapper.readValue(metadata, new TypeReference<Map<String, Object>>() {});
return metadataMap.get(DERIVED_METADATA_KEY) != null;
} catch (Exception e) {
log.debug("Failed to parse dataset file metadata for derived detection: {}", datasetFile.getId(), e);
return false;
}
}
/**
* Get dataset statistics
*/
@@ -356,27 +439,29 @@ public class DatasetApplicationService {
Map<String, Object> statistics = new HashMap<>();
// Basic statistics
Long totalFiles = datasetFileRepository.countByDatasetId(datasetId);
Long completedFiles = datasetFileRepository.countCompletedByDatasetId(datasetId);
List<DatasetFile> allFiles = datasetFileRepository.findAllByDatasetId(datasetId);
List<DatasetFile> visibleFiles = filterVisibleFiles(allFiles);
long totalFiles = visibleFiles.size();
long completedFiles = visibleFiles.stream()
.filter(file -> "COMPLETED".equalsIgnoreCase(file.getStatus()))
.count();
Long totalSize = datasetFileRepository.sumSizeByDatasetId(datasetId);
statistics.put("totalFiles", totalFiles != null ? totalFiles.intValue() : 0);
statistics.put("completedFiles", completedFiles != null ? completedFiles.intValue() : 0);
statistics.put("totalFiles", (int) totalFiles);
statistics.put("completedFiles", (int) completedFiles);
statistics.put("totalSize", totalSize != null ? totalSize : 0L);
// Completion rate
float completionRate = 0.0f;
if (totalFiles != null && totalFiles > 0) {
completionRate = (completedFiles != null ? completedFiles.floatValue() : 0.0f) / totalFiles.floatValue() * 100.0f;
if (totalFiles > 0) {
completionRate = ((float) completedFiles) / (float) totalFiles * 100.0f;
}
statistics.put("completionRate", completionRate);
// File type distribution
Map<String, Integer> fileTypeDistribution = new HashMap<>();
List<DatasetFile> allFiles = datasetFileRepository.findAllByDatasetId(datasetId);
if (allFiles != null) {
for (DatasetFile file : allFiles) {
if (!visibleFiles.isEmpty()) {
for (DatasetFile file : visibleFiles) {
String fileType = file.getFileType() != null ? file.getFileType() : "unknown";
fileTypeDistribution.put(fileType, fileTypeDistribution.getOrDefault(fileType, 0) + 1);
}
@@ -385,8 +470,8 @@ public class DatasetApplicationService {
// Status distribution
Map<String, Integer> statusDistribution = new HashMap<>();
if (allFiles != null) {
for (DatasetFile file : allFiles) {
if (!visibleFiles.isEmpty()) {
for (DatasetFile file : visibleFiles) {
String status = file.getStatus() != null ? file.getStatus() : "unknown";
statusDistribution.put(status, statusDistribution.getOrDefault(status, 0) + 1);
}
@@ -413,33 +498,32 @@ public class DatasetApplicationService {
public void processDataSourceAsync(String datasetId, String dataSourceId) {
try {
log.info("Initiating data source file scanning, dataset ID: {}, collection task ID: {}", datasetId, dataSourceId);
List<String> filePaths = getFilePaths(dataSourceId);
CollectionTaskDetailResponse taskDetail = collectionTaskClient.getTaskDetail(dataSourceId).getData();
if (taskDetail == null) {
log.warn("Failed to get collection task detail, task ID: {}", dataSourceId);
return;
}
Path targetPath = Paths.get(taskDetail.getTargetPath());
if (!Files.exists(targetPath) || !Files.isDirectory(targetPath)) {
log.warn("Target path does not exist or is not a directory: {}", taskDetail.getTargetPath());
return;
}
List<String> filePaths = scanFilePaths(targetPath);
if (CollectionUtils.isEmpty(filePaths)) {
return;
}
datasetFileApplicationService.copyFilesToDatasetDir(datasetId, new CopyFilesRequest(filePaths));
datasetFileApplicationService.copyFilesToDatasetDirWithSourceRoot(datasetId, targetPath, filePaths);
log.info("File scan succeeded, total files: {}", filePaths.size());
} catch (Exception e) {
log.error("Failed to process data source file scan, dataset ID: {}, data source ID: {}", datasetId, dataSourceId, e);
}
}
private List<String> getFilePaths(String dataSourceId) {
CollectionTaskDetailResponse taskDetail = collectionTaskClient.getTaskDetail(dataSourceId).getData();
if (taskDetail == null) {
log.warn("Failed to get collection task detail, task ID: {}", dataSourceId);
return Collections.emptyList();
}
Path targetPath = Paths.get(taskDetail.getTargetPath());
if (!Files.exists(targetPath) || !Files.isDirectory(targetPath)) {
log.warn("Target path does not exist or is not a directory: {}", taskDetail.getTargetPath());
return Collections.emptyList();
}
try (Stream<Path> paths = Files.walk(targetPath, 1)) {
private List<String> scanFilePaths(Path targetPath) {
try (Stream<Path> paths = Files.walk(targetPath)) {
return paths
.filter(Files::isRegularFile) // keep only regular files, skip directories
.map(Path::toString) // convert to string paths
.filter(Files::isRegularFile)
.map(Path::toString)
.collect(Collectors.toList());
} catch (IOException e) {
log.error("Failed to scan directory: {}", targetPath, e);


@@ -28,6 +28,7 @@ import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.http.HttpServletResponse;
@@ -42,6 +43,8 @@ import org.springframework.core.io.UrlResource;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;
import java.io.File;
import java.io.IOException;
@@ -70,12 +73,22 @@ public class DatasetFileApplicationService {
private static final String PDF_FILE_TYPE = "pdf";
private static final String DOC_FILE_TYPE = "doc";
private static final String DOCX_FILE_TYPE = "docx";
private static final Set<String> DOCUMENT_TEXT_FILE_TYPES = Set.of(PDF_FILE_TYPE, DOC_FILE_TYPE, DOCX_FILE_TYPE);
private static final String XLS_FILE_TYPE = "xls";
private static final String XLSX_FILE_TYPE = "xlsx";
private static final Set<String> DOCUMENT_TEXT_FILE_TYPES = Set.of(
PDF_FILE_TYPE,
DOC_FILE_TYPE,
DOCX_FILE_TYPE,
XLS_FILE_TYPE,
XLSX_FILE_TYPE
);
private static final String DERIVED_METADATA_KEY = "derived_from_file_id";
private final DatasetFileRepository datasetFileRepository;
private final DatasetRepository datasetRepository;
private final FileService fileService;
private final PdfTextExtractAsyncService pdfTextExtractAsyncService;
private final DatasetFileRepository datasetFileRepository;
private final DatasetRepository datasetRepository;
private final FileService fileService;
private final PdfTextExtractAsyncService pdfTextExtractAsyncService;
private final DatasetFilePreviewService datasetFilePreviewService;
@Value("${datamate.data-management.base-path:/dataset}")
private String datasetBasePath;
@@ -84,15 +97,17 @@ public class DatasetFileApplicationService {
private DuplicateMethod duplicateMethod;
@Autowired
public DatasetFileApplicationService(DatasetFileRepository datasetFileRepository,
DatasetRepository datasetRepository,
FileService fileService,
PdfTextExtractAsyncService pdfTextExtractAsyncService) {
this.datasetFileRepository = datasetFileRepository;
this.datasetRepository = datasetRepository;
this.fileService = fileService;
this.pdfTextExtractAsyncService = pdfTextExtractAsyncService;
}
public DatasetFileApplicationService(DatasetFileRepository datasetFileRepository,
DatasetRepository datasetRepository,
FileService fileService,
PdfTextExtractAsyncService pdfTextExtractAsyncService,
DatasetFilePreviewService datasetFilePreviewService) {
this.datasetFileRepository = datasetFileRepository;
this.datasetRepository = datasetRepository;
this.fileService = fileService;
this.pdfTextExtractAsyncService = pdfTextExtractAsyncService;
this.datasetFilePreviewService = datasetFilePreviewService;
}
/**
* Get the dataset file list
@@ -111,7 +126,7 @@ public class DatasetFileApplicationService {
* @param status status filter
* @param name fuzzy match on file name
* @param hasAnnotation whether the file has annotations
* @param excludeSourceDocuments whether to exclude source documents already converted to TXT (PDF/DOC/DOCX)
* @param excludeSourceDocuments whether to exclude source documents (PDF/DOC/DOCX/XLS/XLSX)
* @param pagingQuery paging parameters
* @return paged file list
*/
@@ -122,19 +137,15 @@ public class DatasetFileApplicationService {
IPage<DatasetFile> files = datasetFileRepository.findByCriteria(datasetId, fileType, status, name, hasAnnotation, page);
if (excludeSourceDocuments) {
// Find the IDs of all document files that are sources of derived TXT files
List<String> sourceFileIds = datasetFileRepository.findSourceFileIdsWithDerivedFiles(datasetId);
if (!sourceFileIds.isEmpty()) {
// Filter out the source files
List<DatasetFile> filteredRecords = files.getRecords().stream()
.filter(file -> !sourceFileIds.contains(file.getId()))
.collect(Collectors.toList());
// Rebuild the paged result
Page<DatasetFile> filteredPage = new Page<>(files.getCurrent(), files.getSize(), files.getTotal());
filteredPage.setRecords(filteredRecords);
return PagedResponse.of(filteredPage);
}
// Filter out source document files (PDF/DOC/DOCX/XLS/XLSX); annotation scenarios show only derived files
List<DatasetFile> filteredRecords = files.getRecords().stream()
.filter(file -> !isSourceDocument(file))
.collect(Collectors.toList());
// Rebuild the paged result
Page<DatasetFile> filteredPage = new Page<>(files.getCurrent(), files.getSize(), files.getTotal());
filteredPage.setRecords(filteredRecords);
return PagedResponse.of(filteredPage);
}
return PagedResponse.of(files);
@@ -144,7 +155,7 @@ public class DatasetFileApplicationService {
* Get the dataset file list
*/
@Transactional(readOnly = true)
public PagedResponse<DatasetFile> getDatasetFilesWithDirectory(String datasetId, String prefix, PagingQuery pagingQuery) {
public PagedResponse<DatasetFile> getDatasetFilesWithDirectory(String datasetId, String prefix, boolean excludeDerivedFiles, PagingQuery pagingQuery) {
Dataset dataset = datasetRepository.getById(datasetId);
int page = Math.max(pagingQuery.getPage(), 1);
int size = pagingQuery.getSize() == null || pagingQuery.getSize() < 0 ? 20 : pagingQuery.getSize();
@@ -153,15 +164,36 @@ public class DatasetFileApplicationService {
}
String datasetPath = dataset.getPath();
Path queryPath = Path.of(dataset.getPath() + File.separator + prefix);
Map<String, DatasetFile> datasetFilesMap = datasetFileRepository.findAllByDatasetId(datasetId)
.stream().collect(Collectors.toMap(DatasetFile::getFilePath, Function.identity()));
Map<String, DatasetFile> datasetFilesMap = datasetFileRepository.findAllByDatasetId(datasetId)
.stream()
.filter(file -> file.getFilePath() != null)
.collect(Collectors.toMap(
file -> normalizeFilePath(file.getFilePath()),
Function.identity(),
(left, right) -> left
));
Set<String> derivedFilePaths = excludeDerivedFiles
? datasetFilesMap.values().stream()
.filter(this::isDerivedFile)
.map(DatasetFile::getFilePath)
.map(this::normalizeFilePath)
.filter(Objects::nonNull)
.collect(Collectors.toSet())
: Collections.emptySet();
// If the directory does not exist, return an empty result (it may not have been created yet for a brand-new dataset)
if (!Files.exists(queryPath)) {
return new PagedResponse<>(page, size, 0, 0, Collections.emptyList());
}
try (Stream<Path> pathStream = Files.list(queryPath)) {
List<Path> allFiles = pathStream
.filter(path -> path.toString().startsWith(datasetPath))
.sorted(Comparator
.comparing((Path path) -> !Files.isDirectory(path))
.thenComparing(path -> path.getFileName().toString()))
.collect(Collectors.toList());
List<Path> allFiles = pathStream
.filter(path -> path.toString().startsWith(datasetPath))
.filter(path -> !excludeDerivedFiles
|| Files.isDirectory(path)
|| !derivedFilePaths.contains(normalizeFilePath(path.toString())))
.sorted(Comparator
.comparing((Path path) -> !Files.isDirectory(path))
.thenComparing(path -> path.getFileName().toString()))
.collect(Collectors.toList());
// Compute paging
int total = allFiles.size();
@@ -176,7 +208,9 @@ public class DatasetFileApplicationService {
if (fromIndex < total) {
pageData = allFiles.subList(fromIndex, toIndex);
}
List<DatasetFile> datasetFiles = pageData.stream().map(path -> getDatasetFile(path, datasetFilesMap)).toList();
List<DatasetFile> datasetFiles = pageData.stream()
.map(path -> getDatasetFile(path, datasetFilesMap, excludeDerivedFiles, derivedFilePaths))
.toList();
return new PagedResponse<>(page, size, total, totalPages, datasetFiles);
} catch (IOException e) {
@@ -185,9 +219,12 @@ public class DatasetFileApplicationService {
}
}
private DatasetFile getDatasetFile(Path path, Map<String, DatasetFile> datasetFilesMap) {
DatasetFile datasetFile = new DatasetFile();
LocalDateTime localDateTime = LocalDateTime.now();
private DatasetFile getDatasetFile(Path path,
Map<String, DatasetFile> datasetFilesMap,
boolean excludeDerivedFiles,
Set<String> derivedFilePaths) {
DatasetFile datasetFile = new DatasetFile();
LocalDateTime localDateTime = LocalDateTime.now();
try {
localDateTime = Files.getLastModifiedTime(path).toInstant().atZone(ZoneId.systemDefault()).toLocalDateTime();
} catch (IOException e) {
@@ -206,23 +243,32 @@ public class DatasetFileApplicationService {
long fileCount;
long totalSize;
try (Stream<Path> walk = Files.walk(path)) {
fileCount = walk.filter(Files::isRegularFile).count();
}
try (Stream<Path> walk = Files.walk(path)) {
totalSize = walk
.filter(Files::isRegularFile)
.mapToLong(p -> {
try {
return Files.size(p);
} catch (IOException e) {
log.error("get file size error", e);
return 0L;
}
})
.sum();
}
try (Stream<Path> walk = Files.walk(path)) {
Stream<Path> fileStream = walk.filter(Files::isRegularFile);
if (excludeDerivedFiles && !derivedFilePaths.isEmpty()) {
fileStream = fileStream.filter(filePath ->
!derivedFilePaths.contains(normalizeFilePath(filePath.toString())));
}
fileCount = fileStream.count();
}
try (Stream<Path> walk = Files.walk(path)) {
Stream<Path> fileStream = walk.filter(Files::isRegularFile);
if (excludeDerivedFiles && !derivedFilePaths.isEmpty()) {
fileStream = fileStream.filter(filePath ->
!derivedFilePaths.contains(normalizeFilePath(filePath.toString())));
}
totalSize = fileStream
.mapToLong(p -> {
try {
return Files.size(p);
} catch (IOException e) {
log.error("get file size error", e);
return 0L;
}
})
.sum();
}
datasetFile.setFileCount(fileCount);
datasetFile.setFileSize(totalSize);
@@ -230,15 +276,55 @@ public class DatasetFileApplicationService {
log.error("stat directory info error", e);
}
} else {
DatasetFile exist = datasetFilesMap.get(path.toString());
if (exist == null) {
datasetFile.setId("file-" + datasetFile.getFileName());
datasetFile.setFileSize(path.toFile().length());
} else {
DatasetFile exist = datasetFilesMap.get(normalizeFilePath(path.toString()));
if (exist == null) {
datasetFile.setId("file-" + datasetFile.getFileName());
datasetFile.setFileSize(path.toFile().length());
} else {
datasetFile = exist;
}
}
return datasetFile;
return datasetFile;
}
private String normalizeFilePath(String filePath) {
if (filePath == null || filePath.isBlank()) {
return null;
}
try {
return Paths.get(filePath).toAbsolutePath().normalize().toString();
} catch (Exception e) {
return filePath.replace("\\", "/");
}
}
private boolean isSourceDocument(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String fileType = datasetFile.getFileType();
if (fileType == null || fileType.isBlank()) {
return false;
}
return DOCUMENT_TEXT_FILE_TYPES.contains(fileType.toLowerCase(Locale.ROOT));
}
private boolean isDerivedFile(DatasetFile datasetFile) {
if (datasetFile == null) {
return false;
}
String metadata = datasetFile.getMetadata();
if (metadata == null || metadata.isBlank()) {
return false;
}
try {
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> metadataMap = mapper.readValue(metadata, new TypeReference<Map<String, Object>>() {});
return metadataMap.get(DERIVED_METADATA_KEY) != null;
} catch (Exception e) {
log.debug("Failed to parse dataset file metadata for derived detection: {}", datasetFile.getId(), e);
return false;
}
}
/**
@@ -260,18 +346,19 @@ public class DatasetFileApplicationService {
* Delete a file
*/
@Transactional
public void deleteDatasetFile(String datasetId, String fileId) {
DatasetFile file = getDatasetFile(datasetId, fileId);
Dataset dataset = datasetRepository.getById(datasetId);
dataset.setFiles(new ArrayList<>(Collections.singleton(file)));
datasetFileRepository.removeById(fileId);
dataset.removeFile(file);
datasetRepository.updateById(dataset);
// On delete: files uploaded into the dataset remove both the database record and the file on disk; collected files remove only the database record
if (file.getFilePath().startsWith(dataset.getPath())) {
try {
Path filePath = Paths.get(file.getFilePath());
Files.deleteIfExists(filePath);
public void deleteDatasetFile(String datasetId, String fileId) {
DatasetFile file = getDatasetFile(datasetId, fileId);
Dataset dataset = datasetRepository.getById(datasetId);
dataset.setFiles(new ArrayList<>(Collections.singleton(file)));
datasetFileRepository.removeById(fileId);
dataset.removeFile(file);
datasetRepository.updateById(dataset);
datasetFilePreviewService.deletePreviewFileQuietly(datasetId, fileId);
// On delete: files uploaded into the dataset remove both the database record and the file on disk; collected files remove only the database record
if (file.getFilePath().startsWith(dataset.getPath())) {
try {
Path filePath = Paths.get(file.getFilePath());
Files.deleteIfExists(filePath);
} catch (IOException ex) {
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
@@ -637,9 +724,10 @@ public class DatasetFileApplicationService {
})
.collect(Collectors.toList());
for (DatasetFile file : filesToDelete) {
datasetFileRepository.removeById(file.getId());
}
for (DatasetFile file : filesToDelete) {
datasetFileRepository.removeById(file.getId());
datasetFilePreviewService.deletePreviewFileQuietly(datasetId, file.getId());
}
// Delete the directory from the file system
try {
@@ -739,6 +827,71 @@ public class DatasetFileApplicationService {
return copiedFiles;
}
/**
 * Copy files into the dataset directory (preserving relative paths; intended for data source imports)
 *
 * @param datasetId dataset id
 * @param sourceRoot data source root directory
 * @param sourcePaths list of source file paths
 * @return the list of copied files
*/
@Transactional
public List<DatasetFile> copyFilesToDatasetDirWithSourceRoot(String datasetId, Path sourceRoot, List<String> sourcePaths) {
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
Path normalizedRoot = sourceRoot.toAbsolutePath().normalize();
List<DatasetFile> copiedFiles = new ArrayList<>();
List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(existDatasetFiles);
Map<String, DatasetFile> copyTargets = new LinkedHashMap<>();
for (String sourceFilePath : sourcePaths) {
if (sourceFilePath == null || sourceFilePath.isBlank()) {
continue;
}
Path sourcePath = Paths.get(sourceFilePath).toAbsolutePath().normalize();
if (!sourcePath.startsWith(normalizedRoot)) {
log.warn("Source file path is out of root: {}", sourceFilePath);
continue;
}
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
log.warn("Source file does not exist or is not a regular file: {}", sourceFilePath);
continue;
}
Path relativePath = normalizedRoot.relativize(sourcePath);
String fileName = sourcePath.getFileName().toString();
File sourceFile = sourcePath.toFile();
LocalDateTime currentTime = LocalDateTime.now();
Path targetPath = Paths.get(dataset.getPath(), relativePath.toString());
DatasetFile datasetFile = DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(datasetId)
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(targetPath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.build();
setDatasetFileId(datasetFile, dataset);
dataset.addFile(datasetFile);
copiedFiles.add(datasetFile);
copyTargets.put(sourceFilePath, datasetFile);
}
if (copiedFiles.isEmpty()) {
return copiedFiles;
}
datasetFileRepository.saveOrUpdateBatch(copiedFiles, 100);
dataset.active();
datasetRepository.updateById(dataset);
CompletableFuture.runAsync(() -> copyFilesToDatasetDirWithRelativePath(copyTargets, dataset, normalizedRoot));
return copiedFiles;
}
private void copyFilesToDatasetDir(List<String> sourcePaths, Dataset dataset) {
for (String sourcePath : sourcePaths) {
Path sourceFilePath = Paths.get(sourcePath);
@@ -757,6 +910,35 @@ public class DatasetFileApplicationService {
}
}
private void copyFilesToDatasetDirWithRelativePath(
Map<String, DatasetFile> copyTargets,
Dataset dataset,
Path sourceRoot
) {
Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
Path normalizedRoot = sourceRoot.toAbsolutePath().normalize();
for (Map.Entry<String, DatasetFile> entry : copyTargets.entrySet()) {
Path sourcePath = Paths.get(entry.getKey()).toAbsolutePath().normalize();
if (!sourcePath.startsWith(normalizedRoot)) {
log.warn("Source file path is out of root: {}", sourcePath);
continue;
}
Path relativePath = normalizedRoot.relativize(sourcePath);
Path targetFilePath = datasetRoot.resolve(relativePath).normalize();
if (!targetFilePath.startsWith(datasetRoot)) {
log.warn("Target file path is out of dataset path: {}", targetFilePath);
continue;
}
try {
Files.createDirectories(targetFilePath.getParent());
Files.copy(sourcePath, targetFilePath);
triggerPdfTextExtraction(dataset, entry.getValue());
} catch (IOException e) {
log.error("Failed to copy file from {} to {}", sourcePath, targetFilePath, e);
}
}
}
/**
 * Add files to a dataset (creates database records only; performs no file system operations)
*
@@ -824,6 +1006,20 @@ public class DatasetFileApplicationService {
if (fileType == null || !DOCUMENT_TEXT_FILE_TYPES.contains(fileType.toLowerCase(Locale.ROOT))) {
return;
}
String datasetId = dataset.getId();
String fileId = datasetFile.getId();
if (datasetId == null || fileId == null) {
return;
}
if (TransactionSynchronizationManager.isSynchronizationActive()) {
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
@Override
public void afterCommit() {
pdfTextExtractAsyncService.extractPdfText(datasetId, fileId);
}
});
return;
}
pdfTextExtractAsyncService.extractPdfText(datasetId, fileId);
}
}
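The `afterCommit` registration above defers PDF extraction until the inserting transaction commits, so the async worker cannot see (or miss) a row that is not yet visible. A minimal Spring-free sketch of that ordering guarantee; all names here are illustrative stand-ins, not the actual Spring API:

```java
import java.util.ArrayList;
import java.util.List;

public class AfterCommitDemo {
    // Minimal stand-in for TransactionSynchronizationManager: callbacks registered
    // while a "transaction" is active only run once commit() is reached.
    static final List<Runnable> afterCommit = new ArrayList<>();
    static boolean txActive = false;

    static void runAfterCommitOrNow(Runnable task) {
        if (txActive) {
            afterCommit.add(task);   // defer: the inserted row is not yet visible
        } else {
            task.run();              // no transaction: safe to run immediately
        }
    }

    static void commit() {
        txActive = false;
        afterCommit.forEach(Runnable::run);
        afterCommit.clear();
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        txActive = true;
        runAfterCommitOrNow(() -> log.append("extract;"));
        log.append("insert-row;");
        commit();
        System.out.println(log); // insert-row;extract;
    }
}
```

The fallback branch (no active synchronization) matches the production code: when called outside a transaction, extraction fires immediately.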


@@ -0,0 +1,171 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
/**
 * Asynchronous task that converts dataset files for preview
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class DatasetFilePreviewAsyncService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String DATASET_PREVIEW_DIR = "dataset-previews";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final int MAX_ERROR_LENGTH = 500;
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final ObjectMapper objectMapper = new ObjectMapper();
@Async
public void convertPreviewAsync(String fileId) {
if (StringUtils.isBlank(fileId)) {
return;
}
DatasetFile file = datasetFileRepository.getById(fileId);
if (file == null) {
return;
}
String extension = resolveFileExtension(resolveOriginalName(file));
if (!OFFICE_EXTENSIONS.contains(extension)) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "仅支持 DOC/DOCX 转换");
return;
}
if (StringUtils.isBlank(file.getFilePath())) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "源文件路径为空");
return;
}
Path sourcePath = Paths.get(file.getFilePath()).toAbsolutePath().normalize();
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, null, "源文件不存在");
return;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
String previewRelativePath = StringUtils.defaultIfBlank(
previewInfo.pdfPath(),
resolvePreviewRelativePath(file.getDatasetId(), file.getId())
);
Path targetPath = resolvePreviewStoragePath(previewRelativePath);
try {
ensureParentDirectory(targetPath);
LibreOfficeConverter.convertToPdf(sourcePath, targetPath);
updatePreviewStatus(file, KnowledgeItemPreviewStatus.READY, previewRelativePath, null);
} catch (Exception e) {
log.error("dataset preview convert failed, fileId: {}", file.getId(), e);
updatePreviewStatus(file, KnowledgeItemPreviewStatus.FAILED, previewRelativePath, trimError(e.getMessage()));
}
}
private void updatePreviewStatus(
DatasetFile file,
KnowledgeItemPreviewStatus status,
String previewRelativePath,
String error
) {
if (file == null) {
return;
}
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
status,
previewRelativePath,
error,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
}
private String resolveOriginalName(DatasetFile file) {
if (file == null) {
return "";
}
if (StringUtils.isNotBlank(file.getFileName())) {
return file.getFileName();
}
if (StringUtils.isNotBlank(file.getFilePath())) {
return Paths.get(file.getFilePath()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewRelativePath(String datasetId, String fileId) {
String relativePath = Paths.get(DATASET_PREVIEW_DIR, datasetId, fileId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid preview path");
}
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private void ensureParentDirectory(Path targetPath) {
try {
Path parent = targetPath.getParent();
if (parent != null) {
Files.createDirectories(parent);
}
} catch (Exception e) {
throw new IllegalStateException("创建预览目录失败", e);
}
}
private String trimError(String error) {
if (StringUtils.isBlank(error)) {
return "";
}
if (error.length() <= MAX_ERROR_LENGTH) {
return error;
}
return error.substring(0, MAX_ERROR_LENGTH);
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
}
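`resolvePreviewStoragePath` guards against path traversal by normalizing the resolved target and requiring it to stay under the upload root. The core of that check can be sketched with the standard `java.nio.file` API (paths below are hypothetical):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PreviewPathGuard {
    // Resolve a relative path under a root, rejecting anything that escapes it.
    static Path resolveUnderRoot(Path root, String relativePath) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        // normalize() collapses any ".." segments before the containment check
        Path target = normalizedRoot.resolve(relativePath).toAbsolutePath().normalize();
        if (!target.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException("invalid preview path: " + relativePath);
        }
        return target;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/uploads");
        System.out.println(resolveUnderRoot(root, "dataset-previews/d1/f1.pdf"));
        try {
            resolveUnderRoot(root, "../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```

Note that `startsWith` compares path components, not raw strings, so a sibling directory like `/data/uploads-evil` would not pass the check.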


@@ -0,0 +1,233 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.interfaces.dto.DatasetFilePreviewStatusResponse;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Set;
/**
 * Dataset file preview conversion service
*/
@Service
@RequiredArgsConstructor
@Slf4j
public class DatasetFilePreviewService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String DATASET_PREVIEW_DIR = "dataset-previews";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final DatasetFilePreviewAsyncService datasetFilePreviewAsyncService;
private final ObjectMapper objectMapper = new ObjectMapper();
public DatasetFilePreviewStatusResponse getPreviewStatus(String datasetId, String fileId) {
DatasetFile file = requireDatasetFile(datasetId, fileId);
assertOfficeDocument(file);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && !previewPdfExists(file, previewInfo)) {
previewInfo = markPreviewFailed(file, previewInfo, "预览文件不存在");
}
return buildResponse(previewInfo);
}
public DatasetFilePreviewStatusResponse ensurePreview(String datasetId, String fileId) {
DatasetFile file = requireDatasetFile(datasetId, fileId);
assertOfficeDocument(file);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && previewPdfExists(file, previewInfo)) {
return buildResponse(previewInfo);
}
if (previewInfo.status() == KnowledgeItemPreviewStatus.PROCESSING) {
return buildResponse(previewInfo);
}
String previewRelativePath = resolvePreviewRelativePath(file.getDatasetId(), file.getId());
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.PROCESSING,
previewRelativePath,
null,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
datasetFilePreviewAsyncService.convertPreviewAsync(file.getId());
KnowledgeItemPreviewMetadataHelper.PreviewInfo refreshed = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(updatedMetadata, objectMapper);
return buildResponse(refreshed);
}
public boolean isOfficeDocument(String fileName) {
String extension = resolveFileExtension(fileName);
return StringUtils.isNotBlank(extension) && OFFICE_EXTENSIONS.contains(extension.toLowerCase());
}
public PreviewFile resolveReadyPreviewFile(String datasetId, DatasetFile file) {
if (file == null) {
return null;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(file.getMetadata(), objectMapper);
if (previewInfo.status() != KnowledgeItemPreviewStatus.READY) {
return null;
}
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(datasetId, file.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
markPreviewFailed(file, previewInfo, "预览文件不存在");
return null;
}
String previewName = resolvePreviewPdfName(file);
return new PreviewFile(filePath, previewName);
}
public void deletePreviewFileQuietly(String datasetId, String fileId) {
String relativePath = resolvePreviewRelativePath(datasetId, fileId);
Path filePath = resolvePreviewStoragePath(relativePath);
try {
Files.deleteIfExists(filePath);
} catch (Exception e) {
log.warn("delete dataset preview pdf error, fileId: {}", fileId, e);
}
}
private DatasetFilePreviewStatusResponse buildResponse(KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
DatasetFilePreviewStatusResponse response = new DatasetFilePreviewStatusResponse();
KnowledgeItemPreviewStatus status = previewInfo.status() == null
? KnowledgeItemPreviewStatus.PENDING
: previewInfo.status();
response.setStatus(status);
response.setPreviewError(previewInfo.error());
response.setUpdatedAt(previewInfo.updatedAt());
return response;
}
private DatasetFile requireDatasetFile(String datasetId, String fileId) {
BusinessAssert.isTrue(StringUtils.isNotBlank(datasetId), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(StringUtils.isNotBlank(fileId), CommonErrorCode.PARAM_ERROR);
DatasetFile datasetFile = datasetFileRepository.getById(fileId);
BusinessAssert.notNull(datasetFile, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(Objects.equals(datasetFile.getDatasetId(), datasetId), CommonErrorCode.PARAM_ERROR);
return datasetFile;
}
private void assertOfficeDocument(DatasetFile file) {
BusinessAssert.notNull(file, CommonErrorCode.PARAM_ERROR);
String extension = resolveFileExtension(resolveOriginalName(file));
BusinessAssert.isTrue(OFFICE_EXTENSIONS.contains(extension), CommonErrorCode.PARAM_ERROR);
}
private String resolveOriginalName(DatasetFile file) {
if (file == null) {
return "";
}
if (StringUtils.isNotBlank(file.getFileName())) {
return file.getFileName();
}
if (StringUtils.isNotBlank(file.getFilePath())) {
return Paths.get(file.getFilePath()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewPdfName(DatasetFile file) {
String originalName = resolveOriginalName(file);
if (StringUtils.isBlank(originalName)) {
return "预览.pdf";
}
int dotIndex = originalName.lastIndexOf('.');
if (dotIndex <= 0) {
return originalName + PREVIEW_FILE_SUFFIX;
}
return originalName.substring(0, dotIndex) + PREVIEW_FILE_SUFFIX;
}
private boolean previewPdfExists(DatasetFile file, KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(file.getDatasetId(), file.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
return Files.exists(filePath) && Files.isRegularFile(filePath);
}
private KnowledgeItemPreviewMetadataHelper.PreviewInfo markPreviewFailed(
DatasetFile file,
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo,
String error
) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(file.getDatasetId(), file.getId()));
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
file.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.FAILED,
relativePath,
error,
nowText()
);
file.setMetadata(updatedMetadata);
datasetFileRepository.updateById(file);
return KnowledgeItemPreviewMetadataHelper.readPreviewInfo(updatedMetadata, objectMapper);
}
private String resolvePreviewRelativePath(String datasetId, String fileId) {
String relativePath = Paths.get(DATASET_PREVIEW_DIR, datasetId, fileId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
BusinessAssert.isTrue(StringUtils.isNotBlank(uploadDir), CommonErrorCode.PARAM_ERROR);
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
public record PreviewFile(Path filePath, String fileName) {
}
}
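The `resolveFileExtension`/`resolvePreviewPdfName` pair drives both the DOC/DOCX gate and the download name of the converted PDF; `dotIndex <= 0` deliberately treats dotfiles like `.gitignore` as having no extension. A self-contained sketch of that behavior (class name is illustrative):

```java
public class PreviewNameDemo {
    // Mirrors resolveFileExtension: dotIndex <= 0 means dotfiles have no extension.
    static String extensionOf(String fileName) {
        int dotIndex = fileName.lastIndexOf('.');
        if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dotIndex + 1).toLowerCase();
    }

    // Mirrors resolvePreviewPdfName: swap the extension for ".pdf",
    // or append it when there is no usable extension.
    static String previewPdfName(String originalName) {
        int dotIndex = originalName.lastIndexOf('.');
        if (dotIndex <= 0) {
            return originalName + ".pdf";
        }
        return originalName.substring(0, dotIndex) + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("report.DOCX"));     // docx
        System.out.println(previewPdfName("report.docx"));  // report.pdf
        System.out.println(previewPdfName(".gitignore"));   // .gitignore.pdf
    }
}
```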


@@ -0,0 +1,142 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeStatusType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorCode;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemDirectoryRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeSetRepository;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import lombok.RequiredArgsConstructor;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.util.List;
import java.util.UUID;
/**
 * Knowledge item directory application service
*/
@Service
@Transactional
@RequiredArgsConstructor
public class KnowledgeDirectoryApplicationService {
private static final String PATH_SEPARATOR = "/";
private static final String INVALID_PATH_SEGMENT = "..";
private final KnowledgeItemDirectoryRepository knowledgeItemDirectoryRepository;
private final KnowledgeItemRepository knowledgeItemRepository;
private final KnowledgeSetRepository knowledgeSetRepository;
@Transactional(readOnly = true)
public List<KnowledgeItemDirectory> getKnowledgeDirectories(String setId, KnowledgeDirectoryQuery query) {
BusinessAssert.notNull(query, CommonErrorCode.PARAM_ERROR);
query.setSetId(setId);
return knowledgeItemDirectoryRepository.findByCriteria(query);
}
public KnowledgeItemDirectory createKnowledgeDirectory(String setId, CreateKnowledgeDirectoryRequest request) {
BusinessAssert.notNull(request, CommonErrorCode.PARAM_ERROR);
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
BusinessAssert.isTrue(!isReadOnlyStatus(knowledgeSet.getStatus()),
DataManagementErrorCode.KNOWLEDGE_SET_STATUS_ERROR);
String directoryName = normalizeDirectoryName(request.getDirectoryName());
validateDirectoryName(directoryName);
String parentPrefix = normalizeRelativePathPrefix(request.getParentPrefix());
String relativePath = normalizeRelativePathValue(parentPrefix + directoryName);
validateRelativePath(relativePath);
BusinessAssert.isTrue(!knowledgeItemRepository.existsBySetIdAndRelativePath(setId, relativePath),
CommonErrorCode.PARAM_ERROR);
KnowledgeItemDirectory existing = knowledgeItemDirectoryRepository.findBySetIdAndPath(setId, relativePath);
if (existing != null) {
return existing;
}
KnowledgeItemDirectory directory = new KnowledgeItemDirectory();
directory.setId(UUID.randomUUID().toString());
directory.setSetId(setId);
directory.setName(directoryName);
directory.setRelativePath(relativePath);
knowledgeItemDirectoryRepository.save(directory);
return directory;
}
public void deleteKnowledgeDirectory(String setId, String relativePath) {
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
BusinessAssert.isTrue(!isReadOnlyStatus(knowledgeSet.getStatus()),
DataManagementErrorCode.KNOWLEDGE_SET_STATUS_ERROR);
String normalized = normalizeRelativePathValue(relativePath);
validateRelativePath(normalized);
knowledgeItemRepository.removeByRelativePathPrefix(setId, normalized);
knowledgeItemDirectoryRepository.removeByRelativePathPrefix(setId, normalized);
}
private KnowledgeSet requireKnowledgeSet(String setId) {
KnowledgeSet knowledgeSet = knowledgeSetRepository.getById(setId);
BusinessAssert.notNull(knowledgeSet, DataManagementErrorCode.KNOWLEDGE_SET_NOT_FOUND);
return knowledgeSet;
}
private boolean isReadOnlyStatus(KnowledgeStatusType status) {
return status == KnowledgeStatusType.ARCHIVED || status == KnowledgeStatusType.DEPRECATED;
}
private String normalizeDirectoryName(String name) {
return StringUtils.trimToEmpty(name);
}
private void validateDirectoryName(String name) {
BusinessAssert.isTrue(StringUtils.isNotBlank(name), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains(PATH_SEPARATOR), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains("\\"), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!name.contains(INVALID_PATH_SEGMENT), CommonErrorCode.PARAM_ERROR);
}
private void validateRelativePath(String relativePath) {
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(!relativePath.contains(INVALID_PATH_SEGMENT), CommonErrorCode.PARAM_ERROR);
}
private String normalizeRelativePathPrefix(String prefix) {
if (StringUtils.isBlank(prefix)) {
return "";
}
String normalized = prefix.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
validateRelativePath(normalized);
return normalized + PATH_SEPARATOR;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}
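`normalizeRelativePathPrefix` unifies separators, strips leading/trailing slashes, and guarantees a trailing slash on non-empty results so a directory name can be appended directly. A standalone sketch of the same normalization (the `..` validation step is omitted here for brevity):

```java
public class RelativePathNormalizer {
    private static final String SEP = "/";

    // Backslashes unified, edge separators stripped, non-empty results
    // get a trailing separator so child names can be appended directly.
    static String normalizePrefix(String prefix) {
        if (prefix == null || prefix.isBlank()) {
            return "";
        }
        String normalized = prefix.replace("\\", SEP).trim();
        while (normalized.startsWith(SEP)) {
            normalized = normalized.substring(1);
        }
        while (normalized.endsWith(SEP)) {
            normalized = normalized.substring(0, normalized.length() - 1);
        }
        return normalized.isBlank() ? "" : normalized + SEP;
    }

    public static void main(String[] args) {
        System.out.println(normalizePrefix("\\docs\\guides\\")); // docs/guides/
        System.out.println(normalizePrefix("  /  "));            // (empty)
    }
}
```

With this invariant, `parentPrefix + directoryName` in `createKnowledgeDirectory` always yields a well-formed relative path without double or missing separators.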


@@ -16,15 +16,20 @@ import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.exception.DataManagementErrorCode;
import com.datamate.datamanagement.infrastructure.persistence.mapper.TagMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeSetRepository;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.DeleteKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.ImportKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeManagementStatisticsResponse;
import com.datamate.datamanagement.interfaces.dto.ReplaceKnowledgeItemFileRequest;
import com.datamate.datamanagement.interfaces.dto.UpdateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.UploadKnowledgeItemsRequest;
@@ -71,16 +76,20 @@ public class KnowledgeItemApplicationService {
private static final String EXPORT_FILE_PREFIX = "knowledge_set_";
private static final String EXPORT_FILE_SUFFIX = ".zip";
private static final String EXPORT_CONTENT_TYPE = "application/zip";
private static final String PREVIEW_PDF_CONTENT_TYPE = "application/pdf";
private static final int MAX_FILE_BASE_LENGTH = 120;
private static final int MAX_TITLE_LENGTH = 200;
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String DEFAULT_FILE_EXTENSION = "bin";
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemRepository knowledgeItemRepository;
private final KnowledgeSetRepository knowledgeSetRepository;
private final DatasetRepository datasetRepository;
private final DatasetFileRepository datasetFileRepository;
private final DataManagementProperties dataManagementProperties;
private final TagMapper tagMapper;
private final KnowledgeItemPreviewService knowledgeItemPreviewService;
public KnowledgeItem createKnowledgeItem(String setId, CreateKnowledgeItemRequest request) {
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
@@ -109,6 +118,7 @@ public class KnowledgeItemApplicationService {
List<MultipartFile> files = request.getFiles();
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(files), CommonErrorCode.PARAM_ERROR);
String parentPrefix = normalizeRelativePathPrefix(request.getParentPrefix());
Path uploadRoot = resolveUploadRootPath();
Path setDir = uploadRoot.resolve(KNOWLEDGE_ITEM_UPLOAD_DIR).resolve(setId).normalize();
@@ -142,6 +152,7 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setContentType(KnowledgeContentType.FILE);
knowledgeItem.setSourceType(KnowledgeSourceType.FILE_UPLOAD);
knowledgeItem.setSourceFileId(trimToLength(safeOriginalName, MAX_TITLE_LENGTH));
knowledgeItem.setRelativePath(buildRelativePath(parentPrefix, safeOriginalName));
items.add(knowledgeItem);
}
@@ -167,6 +178,9 @@ public class KnowledgeItemApplicationService {
if (request.getContentType() != null) {
knowledgeItem.setContentType(request.getContentType());
}
if (request.getMetadata() != null) {
knowledgeItem.setMetadata(request.getMetadata());
}
knowledgeItemRepository.updateById(knowledgeItem);
return knowledgeItem;
@@ -179,6 +193,22 @@ public class KnowledgeItemApplicationService {
knowledgeItemRepository.removeById(itemId);
}
public void deleteKnowledgeItems(String setId, DeleteKnowledgeItemsRequest request) {
BusinessAssert.notNull(request, CommonErrorCode.PARAM_ERROR);
List<String> ids = request.getIds();
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(ids), CommonErrorCode.PARAM_ERROR);
List<KnowledgeItem> items = knowledgeItemRepository.listByIds(ids);
BusinessAssert.isTrue(CollectionUtils.isNotEmpty(items), DataManagementErrorCode.KNOWLEDGE_ITEM_NOT_FOUND);
BusinessAssert.isTrue(items.size() == ids.size(), DataManagementErrorCode.KNOWLEDGE_ITEM_NOT_FOUND);
boolean allMatch = items.stream().allMatch(item -> Objects.equals(item.getSetId(), setId));
BusinessAssert.isTrue(allMatch, CommonErrorCode.PARAM_ERROR);
List<String> deleteIds = items.stream().map(KnowledgeItem::getId).toList();
knowledgeItemRepository.removeByIds(deleteIds);
}
@Transactional(readOnly = true)
public KnowledgeItem getKnowledgeItem(String setId, String itemId) {
KnowledgeItem knowledgeItem = knowledgeItemRepository.getById(itemId);
@@ -196,6 +226,40 @@ public class KnowledgeItemApplicationService {
return PagedResponse.of(responses, page.getCurrent(), page.getTotal(), page.getPages());
}
@Transactional(readOnly = true)
public KnowledgeManagementStatisticsResponse getKnowledgeManagementStatistics() {
KnowledgeManagementStatisticsResponse response = new KnowledgeManagementStatisticsResponse();
response.setTotalKnowledgeSets(knowledgeSetRepository.count());
long totalFiles = knowledgeItemRepository.countBySourceTypes(List.of(
KnowledgeSourceType.DATASET_FILE,
KnowledgeSourceType.FILE_UPLOAD
));
response.setTotalFiles(totalFiles);
long datasetFileSize = safeLong(knowledgeItemRepository.sumDatasetFileSize());
long uploadFileSize = calculateUploadFileTotalSize();
response.setTotalSize(datasetFileSize + uploadFileSize);
response.setTotalTags(safeLong(tagMapper.countKnowledgeSetTags()));
return response;
}
@Transactional(readOnly = true)
public PagedResponse<KnowledgeItemSearchResponse> searchKnowledgeItems(KnowledgeItemSearchQuery query) {
BusinessAssert.notNull(query, CommonErrorCode.PARAM_ERROR);
String keyword = StringUtils.trimToEmpty(query.getKeyword());
BusinessAssert.isTrue(StringUtils.isNotBlank(keyword), CommonErrorCode.PARAM_ERROR);
IPage<KnowledgeItemSearchResponse> page = new Page<>(query.getPage(), query.getSize());
IPage<KnowledgeItemSearchResponse> result = knowledgeItemRepository.searchFileItems(page, keyword);
List<KnowledgeItemSearchResponse> responses = result.getRecords()
.stream()
.map(this::normalizeSearchResponse)
.toList();
return PagedResponse.of(responses, result.getCurrent(), result.getTotal(), result.getPages());
}
public List<KnowledgeItem> importKnowledgeItems(String setId, ImportKnowledgeItemsRequest request) {
KnowledgeSet knowledgeSet = requireKnowledgeSet(setId);
BusinessAssert.isTrue(!isReadOnlyStatus(knowledgeSet.getStatus()),
@@ -220,6 +284,7 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setSourceType(KnowledgeSourceType.DATASET_FILE);
knowledgeItem.setSourceDatasetId(dataset.getId());
knowledgeItem.setSourceFileId(datasetFile.getId());
knowledgeItem.setRelativePath(resolveDatasetFileRelativePath(dataset, datasetFile));
items.add(knowledgeItem);
}
@@ -271,7 +336,7 @@ public class KnowledgeItemApplicationService {
String relativePath = knowledgeItem.getContent();
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
Path filePath = resolveKnowledgeItemStoragePath(relativePath);
Path filePath = resolveKnowledgeItemStoragePathWithFallback(relativePath);
BusinessAssert.isTrue(Files.exists(filePath) && Files.isRegularFile(filePath), CommonErrorCode.PARAM_ERROR);
String downloadName = StringUtils.isNotBlank(knowledgeItem.getSourceFileId())
@@ -304,12 +369,32 @@ public class KnowledgeItemApplicationService {
String relativePath = knowledgeItem.getContent();
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
Path filePath = resolveKnowledgeItemStoragePath(relativePath);
BusinessAssert.isTrue(Files.exists(filePath) && Files.isRegularFile(filePath), CommonErrorCode.PARAM_ERROR);
String previewName = StringUtils.isNotBlank(knowledgeItem.getSourceFileId())
? knowledgeItem.getSourceFileId()
: filePath.getFileName().toString();
: Paths.get(relativePath).getFileName().toString();
if (knowledgeItemPreviewService.isOfficeDocument(previewName)) {
KnowledgeItemPreviewService.PreviewFile previewFile = knowledgeItemPreviewService.resolveReadyPreviewFile(setId, knowledgeItem);
if (previewFile == null) {
response.setStatus(HttpServletResponse.SC_CONFLICT);
return;
}
response.setContentType(PREVIEW_PDF_CONTENT_TYPE);
response.setCharacterEncoding(StandardCharsets.UTF_8.name());
response.setHeader(HttpHeaders.CONTENT_DISPOSITION,
"inline; filename=\"" + URLEncoder.encode(previewFile.fileName(), StandardCharsets.UTF_8) + "\"");
try (InputStream inputStream = Files.newInputStream(previewFile.filePath())) {
inputStream.transferTo(response.getOutputStream());
response.flushBuffer();
} catch (IOException e) {
log.error("preview knowledge item pdf error, itemId: {}", itemId, e);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
return;
}
Path filePath = resolveKnowledgeItemStoragePathWithFallback(relativePath);
BusinessAssert.isTrue(Files.exists(filePath) && Files.isRegularFile(filePath), CommonErrorCode.PARAM_ERROR);
String contentType = null;
try {
@@ -382,7 +467,10 @@ public class KnowledgeItemApplicationService {
knowledgeItem.setContentType(KnowledgeContentType.FILE);
knowledgeItem.setSourceType(KnowledgeSourceType.FILE_UPLOAD);
knowledgeItem.setSourceFileId(sourceFileId);
knowledgeItem.setRelativePath(resolveReplacedRelativePath(knowledgeItem.getRelativePath(), sourceFileId));
knowledgeItem.setMetadata(knowledgeItemPreviewService.clearPreviewMetadata(knowledgeItem.getMetadata()));
knowledgeItemRepository.updateById(knowledgeItem);
knowledgeItemPreviewService.deletePreviewFileQuietly(setId, knowledgeItem.getId());
deleteFile(oldFilePath);
} catch (Exception e) {
deleteFileQuietly(targetPath);
@@ -447,11 +535,221 @@ public class KnowledgeItemApplicationService {
return target;
}
private Path resolveKnowledgeItemStoragePathWithFallback(String relativePath) {
BusinessAssert.isTrue(StringUtils.isNotBlank(relativePath), CommonErrorCode.PARAM_ERROR);
String normalizedInput = relativePath.replace("\\", PATH_SEPARATOR).trim();
Path root = resolveUploadRootPath();
java.util.LinkedHashSet<Path> candidates = new java.util.LinkedHashSet<>();
Path inputPath = Paths.get(normalizedInput.replace(PATH_SEPARATOR, File.separator));
if (inputPath.isAbsolute()) {
Path normalizedAbsolute = inputPath.toAbsolutePath().normalize();
if (normalizedAbsolute.startsWith(root)) {
candidates.add(normalizedAbsolute);
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
BusinessAssert.isTrue(!candidates.isEmpty(), CommonErrorCode.PARAM_ERROR);
} else {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)) {
candidates.add(buildKnowledgeItemStoragePath(root, normalizedRelative));
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (StringUtils.isNotBlank(normalizedRelative)
&& !normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)
&& !normalizedRelative.equals(KNOWLEDGE_ITEM_UPLOAD_DIR)) {
candidates.add(buildKnowledgeItemStoragePath(root, KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR + normalizedRelative));
}
}
if (root.getFileName() != null && KNOWLEDGE_ITEM_UPLOAD_DIR.equals(root.getFileName().toString())) {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)
&& normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)) {
String withoutPrefix = normalizedRelative.substring(KNOWLEDGE_ITEM_UPLOAD_DIR.length() + PATH_SEPARATOR.length());
if (StringUtils.isNotBlank(withoutPrefix)) {
candidates.add(buildKnowledgeItemStoragePath(root, withoutPrefix));
}
}
}
Path fallback = null;
for (Path candidate : candidates) {
if (fallback == null) {
fallback = candidate;
}
if (Files.exists(candidate) && Files.isRegularFile(candidate)) {
return candidate;
}
}
BusinessAssert.notNull(fallback, CommonErrorCode.PARAM_ERROR);
return fallback;
}
private Path buildKnowledgeItemStoragePath(Path root, String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace(PATH_SEPARATOR, File.separator);
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private String extractRelativePathFromSegment(String rawPath, String segment) {
if (StringUtils.isBlank(rawPath) || StringUtils.isBlank(segment)) {
return null;
}
String normalized = rawPath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
String segmentPrefix = segment + PATH_SEPARATOR;
int index = normalized.indexOf(segmentPrefix);
if (index < 0) {
return segment.equals(normalized) ? segment : null;
}
return normalizeRelativePathValue(normalized.substring(index));
}
private KnowledgeItemSearchResponse normalizeSearchResponse(KnowledgeItemSearchResponse item) {
BusinessAssert.notNull(item, CommonErrorCode.PARAM_ERROR);
if (item.getSourceType() == KnowledgeSourceType.FILE_UPLOAD) {
item.setFileSize(resolveUploadFileSize(item.getContent()));
if (StringUtils.isBlank(item.getFileName())) {
item.setFileName(item.getSourceFileId());
}
}
if (item.getSourceType() == KnowledgeSourceType.DATASET_FILE) {
if (item.getFileSize() == null) {
item.setFileSize(0L);
}
if (StringUtils.isBlank(item.getFileName())) {
item.setFileName(item.getSourceFileId());
}
}
item.setContent(null);
return item;
}
private long calculateUploadFileTotalSize() {
List<KnowledgeItem> items = knowledgeItemRepository.findFileUploadItems();
if (CollectionUtils.isEmpty(items)) {
return 0L;
}
long total = 0L;
for (KnowledgeItem item : items) {
total += resolveUploadFileSize(item.getContent());
}
return total;
}
private long resolveUploadFileSize(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return 0L;
}
try {
Path filePath = resolveKnowledgeItemStoragePath(relativePath);
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
return 0L;
}
return Files.size(filePath);
} catch (Exception e) {
log.warn("resolve knowledge item file size error, path: {}", relativePath, e);
return 0L;
}
}
private long safeLong(Long value) {
return value == null ? 0L : value;
}
private String buildRelativeFilePath(String setId, String storedName) {
String relativePath = Paths.get(KNOWLEDGE_ITEM_UPLOAD_DIR, setId, storedName).toString();
return relativePath.replace(File.separatorChar, '/');
}
private String buildRelativePath(String parentPrefix, String fileName) {
String safeName = sanitizeFileName(fileName);
if (StringUtils.isBlank(safeName)) {
safeName = "file";
}
String normalizedPrefix = normalizeRelativePathPrefix(parentPrefix);
return normalizedPrefix + safeName;
}
private String normalizeRelativePathPrefix(String prefix) {
if (StringUtils.isBlank(prefix)) {
return "";
}
String normalized = prefix.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
BusinessAssert.isTrue(!normalized.contains(".."), CommonErrorCode.PARAM_ERROR);
if (StringUtils.isBlank(normalized)) {
return "";
}
return normalized + PATH_SEPARATOR;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
private String resolveDatasetFileRelativePath(Dataset dataset, DatasetFile datasetFile) {
if (datasetFile == null) {
return "";
}
String fileName = StringUtils.defaultIfBlank(datasetFile.getFileName(), datasetFile.getId());
String datasetPath = dataset == null ? null : dataset.getPath();
String filePath = datasetFile.getFilePath();
if (StringUtils.isBlank(datasetPath) || StringUtils.isBlank(filePath)) {
return buildRelativePath("", fileName);
}
try {
Path datasetRoot = Paths.get(datasetPath).toAbsolutePath().normalize();
Path targetPath = Paths.get(filePath).toAbsolutePath().normalize();
if (targetPath.startsWith(datasetRoot)) {
Path relative = datasetRoot.relativize(targetPath);
String relativeValue = relative.toString().replace(File.separatorChar, '/');
String normalized = normalizeRelativePathValue(relativeValue);
if (!normalized.contains("..") && StringUtils.isNotBlank(normalized)) {
return normalized;
}
}
} catch (Exception e) {
log.warn("resolve dataset file relative path failed, fileId: {}", datasetFile.getId(), e);
}
return buildRelativePath("", fileName);
}
private String resolveReplacedRelativePath(String existingRelativePath, String newFileName) {
String normalized = normalizeRelativePathValue(existingRelativePath);
if (StringUtils.isBlank(normalized)) {
return buildRelativePath("", newFileName);
}
int lastIndex = normalized.lastIndexOf(PATH_SEPARATOR);
String parentPrefix = lastIndex >= 0 ? normalized.substring(0, lastIndex + 1) : "";
return buildRelativePath(parentPrefix, newFileName);
}
private void createDirectories(Path path) {
try {
Files.createDirectories(path);

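The fallback resolution above collects candidate paths in insertion order (a `LinkedHashSet` deduplicates them), returns the first one that exists as a regular file, and otherwise falls back to the first candidate. A minimal stdlib sketch of that pattern, with simplified candidate rules and hypothetical names (not the production implementation):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;

// Sketch of the candidate-based fallback used by
// resolveKnowledgeItemStoragePathWithFallback. Names are illustrative.
public class PathFallbackSketch {
    // Strip leading/trailing separators and normalize backslashes,
    // mirroring normalizeRelativePathValue.
    static String normalizeRelative(String path) {
        String n = path.replace("\\", "/").trim();
        while (n.startsWith("/")) n = n.substring(1);
        while (n.endsWith("/")) n = n.substring(0, n.length() - 1);
        return n;
    }

    // Recover the "knowledge-items/..." suffix from a legacy absolute path,
    // mirroring extractRelativePathFromSegment.
    static String extractFromSegment(String rawPath, String segment) {
        String n = normalizeRelative(rawPath);
        int idx = n.indexOf(segment + "/");
        if (idx < 0) return segment.equals(n) ? segment : null;
        return normalizeRelative(n.substring(idx));
    }

    // First candidate that is an existing regular file wins;
    // otherwise the first candidate overall is the fallback.
    static Path pickExisting(LinkedHashSet<Path> candidates) {
        Path fallback = null;
        for (Path c : candidates) {
            if (fallback == null) fallback = c;
            if (Files.isRegularFile(c)) return c;
        }
        return fallback;
    }
}
```

Because candidates are probed in insertion order, the most specific interpretation of a stored path should be added first and broader guesses after it.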

@@ -0,0 +1,275 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
/**
 * Async task for converting knowledge item previews
 */
@Service
@RequiredArgsConstructor
@Slf4j
public class KnowledgeItemPreviewAsyncService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String PREVIEW_SUB_DIR = "preview";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final int MAX_ERROR_LENGTH = 500;
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final KnowledgeItemRepository knowledgeItemRepository;
private final DataManagementProperties dataManagementProperties;
private final ObjectMapper objectMapper = new ObjectMapper();
@Async
public void convertPreviewAsync(String itemId) {
if (StringUtils.isBlank(itemId)) {
return;
}
KnowledgeItem item = knowledgeItemRepository.getById(itemId);
if (item == null) {
return;
}
String extension = resolveFileExtension(resolveOriginalName(item));
if (!OFFICE_EXTENSIONS.contains(extension)) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "仅支持 DOC/DOCX 转换");
return;
}
if (StringUtils.isBlank(item.getContent())) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "源文件路径为空");
return;
}
Path sourcePath = resolveKnowledgeItemStoragePath(item.getContent());
if (!Files.exists(sourcePath) || !Files.isRegularFile(sourcePath)) {
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, null, "源文件不存在");
return;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
String previewRelativePath = StringUtils.defaultIfBlank(
previewInfo.pdfPath(),
resolvePreviewRelativePath(item.getSetId(), item.getId())
);
Path targetPath = resolvePreviewStoragePath(previewRelativePath);
ensureParentDirectory(targetPath);
try {
LibreOfficeConverter.convertToPdf(sourcePath, targetPath);
updatePreviewStatus(item, KnowledgeItemPreviewStatus.READY, previewRelativePath, null);
} catch (Exception e) {
log.error("preview convert failed, itemId: {}", item.getId(), e);
updatePreviewStatus(item, KnowledgeItemPreviewStatus.FAILED, previewRelativePath, trimError(e.getMessage()));
}
}
private void updatePreviewStatus(
KnowledgeItem item,
KnowledgeItemPreviewStatus status,
String previewRelativePath,
String error
) {
if (item == null) {
return;
}
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
status,
previewRelativePath,
error,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
}
private String resolveOriginalName(KnowledgeItem item) {
if (item == null) {
return "";
}
if (StringUtils.isNotBlank(item.getSourceFileId())) {
return item.getSourceFileId();
}
if (StringUtils.isNotBlank(item.getContent())) {
return Paths.get(item.getContent()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewRelativePath(String setId, String itemId) {
String relativePath = Paths.get(KNOWLEDGE_ITEM_UPLOAD_DIR, setId, PREVIEW_SUB_DIR, itemId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid preview path");
}
return target;
}
private Path resolveKnowledgeItemStoragePath(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
throw new IllegalArgumentException("invalid knowledge item path");
}
String normalizedInput = relativePath.replace("\\", PATH_SEPARATOR).trim();
Path root = resolveUploadRootPath();
java.util.LinkedHashSet<Path> candidates = new java.util.LinkedHashSet<>();
Path inputPath = Paths.get(normalizedInput.replace(PATH_SEPARATOR, java.io.File.separator));
if (inputPath.isAbsolute()) {
Path normalizedAbsolute = inputPath.toAbsolutePath().normalize();
if (normalizedAbsolute.startsWith(root)) {
candidates.add(normalizedAbsolute);
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (candidates.isEmpty()) {
throw new IllegalArgumentException("invalid knowledge item path");
}
} else {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)) {
candidates.add(buildKnowledgeItemStoragePath(root, normalizedRelative));
}
String segmentRelativePath = extractRelativePathFromSegment(normalizedInput, KNOWLEDGE_ITEM_UPLOAD_DIR);
if (StringUtils.isNotBlank(segmentRelativePath)) {
candidates.add(buildKnowledgeItemStoragePath(root, segmentRelativePath));
}
if (StringUtils.isNotBlank(normalizedRelative)
&& !normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)
&& !normalizedRelative.equals(KNOWLEDGE_ITEM_UPLOAD_DIR)) {
candidates.add(buildKnowledgeItemStoragePath(root, KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR + normalizedRelative));
}
}
if (root.getFileName() != null && KNOWLEDGE_ITEM_UPLOAD_DIR.equals(root.getFileName().toString())) {
String normalizedRelative = normalizeRelativePathValue(normalizedInput);
if (StringUtils.isNotBlank(normalizedRelative)
&& normalizedRelative.startsWith(KNOWLEDGE_ITEM_UPLOAD_DIR + PATH_SEPARATOR)) {
String withoutPrefix = normalizedRelative.substring(KNOWLEDGE_ITEM_UPLOAD_DIR.length() + PATH_SEPARATOR.length());
if (StringUtils.isNotBlank(withoutPrefix)) {
candidates.add(buildKnowledgeItemStoragePath(root, withoutPrefix));
}
}
}
Path fallback = null;
for (Path candidate : candidates) {
if (fallback == null) {
fallback = candidate;
}
if (Files.exists(candidate) && Files.isRegularFile(candidate)) {
return candidate;
}
}
if (fallback == null) {
throw new IllegalArgumentException("invalid knowledge item path");
}
return fallback;
}
private Path buildKnowledgeItemStoragePath(Path root, String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace(PATH_SEPARATOR, java.io.File.separator);
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
if (!target.startsWith(root)) {
throw new IllegalArgumentException("invalid knowledge item path");
}
return target;
}
private String extractRelativePathFromSegment(String rawPath, String segment) {
if (StringUtils.isBlank(rawPath) || StringUtils.isBlank(segment)) {
return null;
}
String normalized = rawPath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
String segmentPrefix = segment + PATH_SEPARATOR;
int index = normalized.indexOf(segmentPrefix);
if (index < 0) {
return segment.equals(normalized) ? segment : null;
}
return normalizeRelativePathValue(normalized.substring(index));
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private void ensureParentDirectory(Path targetPath) {
try {
Path parent = targetPath.getParent();
if (parent != null) {
Files.createDirectories(parent);
}
} catch (IOException e) {
throw new IllegalStateException("创建预览目录失败", e);
}
}
private String trimError(String error) {
if (StringUtils.isBlank(error)) {
return "";
}
if (error.length() <= MAX_ERROR_LENGTH) {
return error;
}
return error.substring(0, MAX_ERROR_LENGTH);
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
}
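Both `buildKnowledgeItemStoragePath` and `resolvePreviewStoragePath` guard against path traversal the same way: resolve against the upload root, normalize away any `..` segments, then require the result to still start with the root. A minimal sketch of that guard (names hypothetical):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the traversal guard: resolve, normalize, then verify the
// target never escapes the upload root.
public class TraversalGuardSketch {
    static Path resolveUnderRoot(Path root, String relativePath) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path target = normalizedRoot.resolve(relativePath).toAbsolutePath().normalize();
        if (!target.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException("path escapes upload root: " + relativePath);
        }
        return target;
    }
}
```

Normalizing *before* the `startsWith` check is what makes the guard effective: `root.resolve("../x")` alone still contains the `..` segment, and only after `normalize()` does the escape become visible.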


@@ -0,0 +1,134 @@
package com.datamate.datamanagement.application;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.commons.lang3.StringUtils;
/**
 * Helper for parsing and writing knowledge item preview metadata
 */
public final class KnowledgeItemPreviewMetadataHelper {
public static final String PREVIEW_STATUS_KEY = "previewStatus";
public static final String PREVIEW_PDF_PATH_KEY = "previewPdfPath";
public static final String PREVIEW_ERROR_KEY = "previewError";
public static final String PREVIEW_UPDATED_AT_KEY = "previewUpdatedAt";
private KnowledgeItemPreviewMetadataHelper() {
}
public static PreviewInfo readPreviewInfo(String metadata, ObjectMapper objectMapper) {
if (StringUtils.isBlank(metadata) || objectMapper == null) {
return PreviewInfo.empty();
}
try {
JsonNode node = objectMapper.readTree(metadata);
if (node == null || !node.isObject()) {
return PreviewInfo.empty();
}
String statusText = textValue(node, PREVIEW_STATUS_KEY);
KnowledgeItemPreviewStatus status = parseStatus(statusText);
return new PreviewInfo(
status,
textValue(node, PREVIEW_PDF_PATH_KEY),
textValue(node, PREVIEW_ERROR_KEY),
textValue(node, PREVIEW_UPDATED_AT_KEY)
);
} catch (Exception ignore) {
return PreviewInfo.empty();
}
}
public static String applyPreviewInfo(
String metadata,
ObjectMapper objectMapper,
KnowledgeItemPreviewStatus status,
String pdfPath,
String error,
String updatedAt
) {
if (objectMapper == null) {
return metadata;
}
ObjectNode root = parseRoot(metadata, objectMapper);
if (status == null) {
root.remove(PREVIEW_STATUS_KEY);
} else {
root.put(PREVIEW_STATUS_KEY, status.name());
}
if (StringUtils.isBlank(pdfPath)) {
root.remove(PREVIEW_PDF_PATH_KEY);
} else {
root.put(PREVIEW_PDF_PATH_KEY, pdfPath);
}
if (StringUtils.isBlank(error)) {
root.remove(PREVIEW_ERROR_KEY);
} else {
root.put(PREVIEW_ERROR_KEY, error);
}
if (StringUtils.isBlank(updatedAt)) {
root.remove(PREVIEW_UPDATED_AT_KEY);
} else {
root.put(PREVIEW_UPDATED_AT_KEY, updatedAt);
}
return root.size() == 0 ? null : root.toString();
}
public static String clearPreviewInfo(String metadata, ObjectMapper objectMapper) {
if (objectMapper == null) {
return metadata;
}
ObjectNode root = parseRoot(metadata, objectMapper);
root.remove(PREVIEW_STATUS_KEY);
root.remove(PREVIEW_PDF_PATH_KEY);
root.remove(PREVIEW_ERROR_KEY);
root.remove(PREVIEW_UPDATED_AT_KEY);
return root.size() == 0 ? null : root.toString();
}
private static ObjectNode parseRoot(String metadata, ObjectMapper objectMapper) {
if (StringUtils.isBlank(metadata)) {
return objectMapper.createObjectNode();
}
try {
JsonNode node = objectMapper.readTree(metadata);
if (node instanceof ObjectNode objectNode) {
return objectNode;
}
} catch (Exception ignore) {
return objectMapper.createObjectNode();
}
return objectMapper.createObjectNode();
}
private static String textValue(JsonNode node, String key) {
if (node == null || StringUtils.isBlank(key)) {
return null;
}
JsonNode value = node.get(key);
return value == null || value.isNull() ? null : value.asText();
}
private static KnowledgeItemPreviewStatus parseStatus(String statusText) {
if (StringUtils.isBlank(statusText)) {
return null;
}
try {
return KnowledgeItemPreviewStatus.valueOf(statusText);
} catch (Exception ignore) {
return null;
}
}
public record PreviewInfo(
KnowledgeItemPreviewStatus status,
String pdfPath,
String error,
String updatedAt
) {
public static PreviewInfo empty() {
return new PreviewInfo(null, null, null, null);
}
}
}


@@ -0,0 +1,244 @@
package com.datamate.datamanagement.application;
import com.datamate.common.infrastructure.exception.BusinessAssert;
import com.datamate.common.infrastructure.exception.CommonErrorCode;
import com.datamate.datamanagement.common.enums.KnowledgeContentType;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import com.datamate.datamanagement.common.enums.KnowledgeSourceType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.infrastructure.config.DataManagementProperties;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPreviewStatusResponse;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Set;
/**
 * Knowledge item preview conversion service
 */
@Service
@RequiredArgsConstructor
@Slf4j
public class KnowledgeItemPreviewService {
private static final Set<String> OFFICE_EXTENSIONS = Set.of("doc", "docx");
private static final String KNOWLEDGE_ITEM_UPLOAD_DIR = "knowledge-items";
private static final String PREVIEW_SUB_DIR = "preview";
private static final String PREVIEW_FILE_SUFFIX = ".pdf";
private static final String PATH_SEPARATOR = "/";
private static final DateTimeFormatter PREVIEW_TIME_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE_TIME;
private final KnowledgeItemRepository knowledgeItemRepository;
private final DataManagementProperties dataManagementProperties;
private final KnowledgeItemPreviewAsyncService knowledgeItemPreviewAsyncService;
private final ObjectMapper objectMapper = new ObjectMapper();
public KnowledgeItemPreviewStatusResponse getPreviewStatus(String setId, String itemId) {
KnowledgeItem item = requireKnowledgeItem(setId, itemId);
assertOfficeDocument(item);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && !previewPdfExists(item, previewInfo)) {
previewInfo = markPreviewFailed(item, previewInfo, "预览文件不存在");
}
return buildResponse(previewInfo);
}
public KnowledgeItemPreviewStatusResponse ensurePreview(String setId, String itemId) {
KnowledgeItem item = requireKnowledgeItem(setId, itemId);
assertOfficeDocument(item);
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() == KnowledgeItemPreviewStatus.READY && previewPdfExists(item, previewInfo)) {
return buildResponse(previewInfo);
}
if (previewInfo.status() == KnowledgeItemPreviewStatus.PROCESSING) {
return buildResponse(previewInfo);
}
String previewRelativePath = resolvePreviewRelativePath(item.getSetId(), item.getId());
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.PROCESSING,
previewRelativePath,
null,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
knowledgeItemPreviewAsyncService.convertPreviewAsync(item.getId());
KnowledgeItemPreviewMetadataHelper.PreviewInfo refreshed = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(updatedMetadata, objectMapper);
return buildResponse(refreshed);
}
public boolean isOfficeDocument(String fileName) {
String extension = resolveFileExtension(fileName);
return StringUtils.isNotBlank(extension) && OFFICE_EXTENSIONS.contains(extension.toLowerCase());
}
public PreviewFile resolveReadyPreviewFile(String setId, KnowledgeItem item) {
if (item == null) {
return null;
}
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo = KnowledgeItemPreviewMetadataHelper
.readPreviewInfo(item.getMetadata(), objectMapper);
if (previewInfo.status() != KnowledgeItemPreviewStatus.READY) {
return null;
}
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(setId, item.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
if (!Files.exists(filePath) || !Files.isRegularFile(filePath)) {
markPreviewFailed(item, previewInfo, "预览文件不存在");
return null;
}
String previewName = resolvePreviewPdfName(item);
return new PreviewFile(filePath, previewName);
}
public String clearPreviewMetadata(String metadata) {
return KnowledgeItemPreviewMetadataHelper.clearPreviewInfo(metadata, objectMapper);
}
public void deletePreviewFileQuietly(String setId, String itemId) {
String relativePath = resolvePreviewRelativePath(setId, itemId);
Path filePath = resolvePreviewStoragePath(relativePath);
try {
Files.deleteIfExists(filePath);
} catch (Exception e) {
log.warn("delete preview pdf error, itemId: {}", itemId, e);
}
}
private KnowledgeItemPreviewStatusResponse buildResponse(KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
KnowledgeItemPreviewStatusResponse response = new KnowledgeItemPreviewStatusResponse();
KnowledgeItemPreviewStatus status = previewInfo.status() == null
? KnowledgeItemPreviewStatus.PENDING
: previewInfo.status();
response.setStatus(status);
response.setPreviewError(previewInfo.error());
response.setUpdatedAt(previewInfo.updatedAt());
return response;
}
private KnowledgeItem requireKnowledgeItem(String setId, String itemId) {
BusinessAssert.isTrue(StringUtils.isNotBlank(setId), CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(StringUtils.isNotBlank(itemId), CommonErrorCode.PARAM_ERROR);
KnowledgeItem knowledgeItem = knowledgeItemRepository.getById(itemId);
BusinessAssert.notNull(knowledgeItem, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(Objects.equals(knowledgeItem.getSetId(), setId), CommonErrorCode.PARAM_ERROR);
return knowledgeItem;
}
private void assertOfficeDocument(KnowledgeItem item) {
BusinessAssert.notNull(item, CommonErrorCode.PARAM_ERROR);
BusinessAssert.isTrue(
item.getContentType() == KnowledgeContentType.FILE || item.getSourceType() == KnowledgeSourceType.FILE_UPLOAD,
CommonErrorCode.PARAM_ERROR
);
String extension = resolveFileExtension(resolveOriginalName(item));
BusinessAssert.isTrue(OFFICE_EXTENSIONS.contains(extension), CommonErrorCode.PARAM_ERROR);
}
private String resolveOriginalName(KnowledgeItem item) {
if (item == null) {
return "";
}
if (StringUtils.isNotBlank(item.getSourceFileId())) {
return item.getSourceFileId();
}
if (StringUtils.isNotBlank(item.getContent())) {
return Paths.get(item.getContent()).getFileName().toString();
}
return "";
}
private String resolveFileExtension(String fileName) {
if (StringUtils.isBlank(fileName)) {
return "";
}
int dotIndex = fileName.lastIndexOf('.');
if (dotIndex <= 0 || dotIndex >= fileName.length() - 1) {
return "";
}
return fileName.substring(dotIndex + 1).toLowerCase();
}
private String resolvePreviewPdfName(KnowledgeItem item) {
String originalName = resolveOriginalName(item);
if (StringUtils.isBlank(originalName)) {
return "预览.pdf";
}
int dotIndex = originalName.lastIndexOf('.');
if (dotIndex <= 0) {
return originalName + PREVIEW_FILE_SUFFIX;
}
return originalName.substring(0, dotIndex) + PREVIEW_FILE_SUFFIX;
}
private boolean previewPdfExists(KnowledgeItem item, KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(item.getSetId(), item.getId()));
Path filePath = resolvePreviewStoragePath(relativePath);
return Files.exists(filePath) && Files.isRegularFile(filePath);
}
private KnowledgeItemPreviewMetadataHelper.PreviewInfo markPreviewFailed(
KnowledgeItem item,
KnowledgeItemPreviewMetadataHelper.PreviewInfo previewInfo,
String error
) {
String relativePath = StringUtils.defaultIfBlank(previewInfo.pdfPath(), resolvePreviewRelativePath(item.getSetId(), item.getId()));
String updatedMetadata = KnowledgeItemPreviewMetadataHelper.applyPreviewInfo(
item.getMetadata(),
objectMapper,
KnowledgeItemPreviewStatus.FAILED,
relativePath,
error,
nowText()
);
item.setMetadata(updatedMetadata);
knowledgeItemRepository.updateById(item);
return KnowledgeItemPreviewMetadataHelper.readPreviewInfo(updatedMetadata, objectMapper);
}
private String resolvePreviewRelativePath(String setId, String itemId) {
String relativePath = Paths.get(KNOWLEDGE_ITEM_UPLOAD_DIR, setId, PREVIEW_SUB_DIR, itemId + PREVIEW_FILE_SUFFIX)
.toString();
return relativePath.replace("\\", PATH_SEPARATOR);
}
private Path resolvePreviewStoragePath(String relativePath) {
String normalizedRelativePath = StringUtils.defaultString(relativePath).replace("/", java.io.File.separator);
Path root = resolveUploadRootPath();
Path target = root.resolve(normalizedRelativePath).toAbsolutePath().normalize();
BusinessAssert.isTrue(target.startsWith(root), CommonErrorCode.PARAM_ERROR);
return target;
}
private Path resolveUploadRootPath() {
String uploadDir = dataManagementProperties.getFileStorage().getUploadDir();
BusinessAssert.isTrue(StringUtils.isNotBlank(uploadDir), CommonErrorCode.PARAM_ERROR);
return Paths.get(uploadDir).toAbsolutePath().normalize();
}
private String nowText() {
return LocalDateTime.now().format(PREVIEW_TIME_FORMATTER);
}
public record PreviewFile(Path filePath, String fileName) {
}
}
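`resolvePreviewStoragePath` above guards against path traversal by normalizing the resolved path and then asserting it still starts with the upload root. A minimal standalone sketch of that check (class and method names here are illustrative, not part of the service):

```java
import java.nio.file.Path;

public class PreviewPathGuard {
    // Resolve a relative path under root, rejecting anything that escapes it.
    // Normalizing first collapses ".." segments so startsWith is reliable.
    static Path resolveUnder(Path root, String relativePath) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path target = normalizedRoot.resolve(relativePath).toAbsolutePath().normalize();
        if (!target.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException("path escapes upload root: " + relativePath);
        }
        return target;
    }
}
```

Note that `Path.startsWith` compares whole path elements, so a sibling directory like `/data/uploads2` does not pass a check against `/data/uploads`.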

View File

@@ -0,0 +1,93 @@
package com.datamate.datamanagement.application;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;
/**
* Converts office documents to PDF via headless LibreOffice (soffice)
*/
public final class LibreOfficeConverter {
private static final String LIBREOFFICE_COMMAND = "soffice";
private static final Duration CONVERT_TIMEOUT = Duration.ofMinutes(5);
private static final int MAX_OUTPUT_LENGTH = 500;
private LibreOfficeConverter() {
}
public static void convertToPdf(Path sourcePath, Path targetPath) throws Exception {
Path outputDir = targetPath.getParent();
List<String> command = List.of(
LIBREOFFICE_COMMAND,
"--headless",
"--nologo",
"--nolockcheck",
"--nodefault",
"--nofirststartwizard",
"--convert-to",
"pdf",
"--outdir",
outputDir.toString(),
sourcePath.toString()
);
ProcessBuilder processBuilder = new ProcessBuilder(command);
processBuilder.redirectErrorStream(true);
Process process = processBuilder.start();
boolean finished = process.waitFor(CONVERT_TIMEOUT.toMillis(), TimeUnit.MILLISECONDS);
String output = readProcessOutput(process.getInputStream());
if (!finished) {
process.destroyForcibly();
throw new IllegalStateException("LibreOffice 转换超时");
}
if (process.exitValue() != 0) {
throw new IllegalStateException("LibreOffice 转换失败: " + output);
}
Path generated = outputDir.resolve(stripExtension(sourcePath.getFileName().toString()) + ".pdf");
if (!Files.exists(generated)) {
throw new IllegalStateException("LibreOffice 输出文件不存在");
}
if (!generated.equals(targetPath)) {
Files.move(generated, targetPath, StandardCopyOption.REPLACE_EXISTING);
}
}
private static String readProcessOutput(InputStream inputStream) throws IOException {
if (inputStream == null) {
return "";
}
byte[] buffer = new byte[1024];
StringBuilder builder = new StringBuilder();
int total = 0;
int read;
while ((read = inputStream.read(buffer)) >= 0) {
if (read == 0) {
continue;
}
int remaining = MAX_OUTPUT_LENGTH - total;
if (remaining <= 0) {
break;
}
int toAppend = Math.min(remaining, read);
builder.append(new String(buffer, 0, toAppend, StandardCharsets.UTF_8));
total += toAppend;
if (total >= MAX_OUTPUT_LENGTH) {
break;
}
}
return builder.toString();
}
private static String stripExtension(String fileName) {
if (fileName == null || fileName.isBlank()) {
return "preview";
}
int dotIndex = fileName.lastIndexOf('.');
return dotIndex <= 0 ? fileName : fileName.substring(0, dotIndex);
}
}
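`convertToPdf` depends on LibreOffice writing `<source basename>.pdf` into the `--outdir` directory, so `stripExtension` must reproduce that naming rule. A small sketch of the expected output-name computation (an illustrative helper, not the converter's API):

```java
public class PdfNaming {
    // Mirrors the converter's naming assumption: "<basename>.pdf" in the output dir.
    // A leading dot (e.g. ".hidden") is not treated as an extension separator.
    static String expectedPdfName(String sourceFileName) {
        if (sourceFileName == null || sourceFileName.isBlank()) {
            return "preview.pdf";
        }
        int dotIndex = sourceFileName.lastIndexOf('.');
        String base = dotIndex <= 0 ? sourceFileName : sourceFileName.substring(0, dotIndex);
        return base + ".pdf";
    }
}
```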

View File

@@ -0,0 +1,11 @@
package com.datamate.datamanagement.common.enums;
/**
* Preview conversion status of a knowledge item
*/
public enum KnowledgeItemPreviewStatus {
PENDING,
PROCESSING,
READY,
FAILED
}

View File

@@ -114,9 +114,9 @@ public class Dataset extends BaseEntity<String> {
this.updatedAt = LocalDateTime.now();
}
public void initCreateParam(String datasetBasePath, String parentPath) {
public void initCreateParam(String datasetBasePath) {
this.id = UUID.randomUUID().toString();
String basePath = normalizeBasePath(parentPath != null && !parentPath.isBlank() ? parentPath : datasetBasePath);
String basePath = normalizeBasePath(datasetBasePath);
this.path = basePath + File.separator + this.id;
if (this.status == null) {
this.status = DatasetStatusType.DRAFT;

View File

@@ -38,4 +38,12 @@ public class KnowledgeItem extends BaseEntity<String> {
* Source file ID
*/
private String sourceFileId;
/**
* Relative path (used for directory display)
*/
private String relativePath;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -0,0 +1,29 @@
package com.datamate.datamanagement.domain.model.knowledge;
import com.baomidou.mybatisplus.annotation.TableName;
import com.datamate.common.domain.model.base.BaseEntity;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item directory entity (aligned with database table t_dm_knowledge_item_directories)
*/
@Getter
@Setter
@TableName(value = "t_dm_knowledge_item_directories", autoResultMap = true)
public class KnowledgeItemDirectory extends BaseEntity<String> {
/**
* Owning knowledge set ID
*/
private String setId;
/**
* Directory name
*/
private String name;
/**
* Directory relative path
*/
private String relativePath;
}

View File

@@ -42,9 +42,9 @@ public enum DataManagementErrorCode implements ErrorCode {
*/
DIRECTORY_NOT_FOUND("data_management.0007", "目录不存在"),
/**
* Dataset exists
* Associated dataset exists
*/
DATASET_HAS_CHILDREN("data_management.0008", "存在数据集,禁止删除或移动"),
DATASET_HAS_CHILDREN("data_management.0008", "存在关联数据集,禁止删除或移动"),
/**
* Dataset file does not exist
*/

View File

@@ -2,6 +2,7 @@ package com.datamate.datamanagement.infrastructure.persistence.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.session.RowBounds;
@@ -17,6 +18,7 @@ public interface DatasetFileMapper extends BaseMapper<DatasetFile> {
Long countByDatasetId(@Param("datasetId") String datasetId);
Long countCompletedByDatasetId(@Param("datasetId") String datasetId);
Long sumSizeByDatasetId(@Param("datasetId") String datasetId);
Long countNonDerivedByDatasetId(@Param("datasetId") String datasetId);
DatasetFile findByDatasetIdAndFileName(@Param("datasetId") String datasetId, @Param("fileName") String fileName);
List<DatasetFile> findAllByDatasetId(@Param("datasetId") String datasetId);
List<DatasetFile> findByCriteria(@Param("datasetId") String datasetId,
@@ -38,4 +40,12 @@ public interface DatasetFileMapper extends BaseMapper<DatasetFile> {
* @return list of source file IDs
*/
List<String> findSourceFileIdsWithDerivedFiles(@Param("datasetId") String datasetId);
/**
* Batch-counts files per dataset, excluding derived files
*
* @param datasetIds list of dataset IDs
* @return list of file-count statistics
*/
List<DatasetFileCount> countNonDerivedByDatasetIds(@Param("datasetIds") List<String> datasetIds);
}

View File

@@ -0,0 +1,9 @@
package com.datamate.datamanagement.infrastructure.persistence.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import org.apache.ibatis.annotations.Mapper;
@Mapper
public interface KnowledgeItemDirectoryMapper extends BaseMapper<KnowledgeItemDirectory> {
}

View File

@@ -1,9 +1,52 @@
package com.datamate.datamanagement.infrastructure.persistence.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchResponse;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;
@Mapper
public interface KnowledgeItemMapper extends BaseMapper<KnowledgeItem> {
@Select("""
SELECT
ki.id AS id,
ki.set_id AS setId,
ks.name AS setName,
ki.content_type AS contentType,
ki.source_type AS sourceType,
ki.source_dataset_id AS sourceDatasetId,
ki.source_file_id AS sourceFileId,
CASE
WHEN ki.source_type = 'DATASET_FILE' THEN df.file_name
ELSE ki.source_file_id
END AS fileName,
df.file_size AS fileSize,
CASE
WHEN ki.source_type = 'FILE_UPLOAD' THEN ki.content
ELSE NULL
END AS content,
ki.relative_path AS relativePath,
ki.created_at AS createdAt,
ki.updated_at AS updatedAt
FROM t_dm_knowledge_items ki
LEFT JOIN t_dm_knowledge_sets ks ON ki.set_id = ks.id
LEFT JOIN t_dm_dataset_files df ON ki.source_file_id = df.id AND ki.source_type = 'DATASET_FILE'
WHERE (ki.source_type = 'FILE_UPLOAD' AND (ki.source_file_id LIKE CONCAT('%', #{keyword}, '%')
OR ki.relative_path LIKE CONCAT('%', #{keyword}, '%')))
OR (ki.source_type = 'DATASET_FILE' AND (df.file_name LIKE CONCAT('%', #{keyword}, '%')
OR ki.relative_path LIKE CONCAT('%', #{keyword}, '%')))
ORDER BY ki.created_at DESC
""")
IPage<KnowledgeItemSearchResponse> searchFileItems(IPage<?> page, @Param("keyword") String keyword);
@Select("""
SELECT COALESCE(SUM(df.file_size), 0)
FROM t_dm_knowledge_items ki
LEFT JOIN t_dm_dataset_files df ON ki.source_file_id = df.id
WHERE ki.source_type = 'DATASET_FILE'
""")
Long sumDatasetFileSize();
}

View File

@@ -14,6 +14,7 @@ public interface TagMapper {
List<Tag> findByIdIn(@Param("ids") List<String> ids);
List<Tag> findByKeyword(@Param("keyword") String keyword);
List<Tag> findAllByOrderByUsageCountDesc();
Long countKnowledgeSetTags();
int insert(Tag tag);
int update(Tag tag);

View File

@@ -3,6 +3,7 @@ package com.datamate.datamanagement.infrastructure.persistence.repository;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import java.util.List;
@@ -15,6 +16,8 @@ import java.util.List;
public interface DatasetFileRepository extends IRepository<DatasetFile> {
Long countByDatasetId(String datasetId);
Long countNonDerivedByDatasetId(String datasetId);
Long countCompletedByDatasetId(String datasetId);
Long sumSizeByDatasetId(String datasetId);
@@ -36,4 +39,6 @@ public interface DatasetFileRepository extends IRepository<DatasetFile> {
* @return list of source file IDs
*/
List<String> findSourceFileIdsWithDerivedFiles(String datasetId);
List<DatasetFileCount> countNonDerivedByDatasetIds(List<String> datasetIds);
}

View File

@@ -0,0 +1,18 @@
package com.datamate.datamanagement.infrastructure.persistence.repository;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import java.util.List;
/**
* Knowledge item directory repository interface
*/
public interface KnowledgeItemDirectoryRepository extends IRepository<KnowledgeItemDirectory> {
List<KnowledgeItemDirectory> findByCriteria(KnowledgeDirectoryQuery query);
KnowledgeItemDirectory findBySetIdAndPath(String setId, String relativePath);
int removeByRelativePathPrefix(String setId, String relativePath);
}

View File

@@ -2,8 +2,11 @@ package com.datamate.datamanagement.infrastructure.persistence.repository;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.datamate.datamanagement.common.enums.KnowledgeSourceType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchResponse;
import java.util.List;
/**
@@ -15,4 +18,16 @@ public interface KnowledgeItemRepository extends IRepository<KnowledgeItem> {
long countBySetId(String setId);
List<KnowledgeItem> findAllBySetId(String setId);
long countBySourceTypes(List<KnowledgeSourceType> sourceTypes);
List<KnowledgeItem> findFileUploadItems();
IPage<KnowledgeItemSearchResponse> searchFileItems(IPage<?> page, String keyword);
Long sumDatasetFileSize();
boolean existsBySetIdAndRelativePath(String setId, String relativePath);
int removeByRelativePathPrefix(String setId, String relativePath);
}

View File

@@ -0,0 +1,18 @@
package com.datamate.datamanagement.infrastructure.persistence.repository.dto;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
/**
* Dataset file-count statistics result
*/
@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
public class DatasetFileCount {
private String datasetId;
private Long fileCount;
}

View File

@@ -6,6 +6,7 @@ import com.baomidou.mybatisplus.extension.repository.CrudRepository;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.infrastructure.persistence.mapper.DatasetFileMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.DatasetFileRepository;
import com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Repository;
import org.springframework.util.StringUtils;
@@ -30,6 +31,11 @@ public class DatasetFileRepositoryImpl extends CrudRepository<DatasetFileMapper,
return datasetFileMapper.selectCount(new LambdaQueryWrapper<DatasetFile>().eq(DatasetFile::getDatasetId, datasetId));
}
@Override
public Long countNonDerivedByDatasetId(String datasetId) {
return datasetFileMapper.countNonDerivedByDatasetId(datasetId);
}
@Override
public Long countCompletedByDatasetId(String datasetId) {
return datasetFileMapper.countCompletedByDatasetId(datasetId);
@@ -71,4 +77,9 @@ public class DatasetFileRepositoryImpl extends CrudRepository<DatasetFileMapper,
// Uses the MyBatis @Select annotation, or calls the mapper method directly
return datasetFileMapper.findSourceFileIdsWithDerivedFiles(datasetId);
}
@Override
public List<DatasetFileCount> countNonDerivedByDatasetIds(List<String> datasetIds) {
return datasetFileMapper.countNonDerivedByDatasetIds(datasetIds);
}
}

View File

@@ -0,0 +1,96 @@
package com.datamate.datamanagement.infrastructure.persistence.repository.impl;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.repository.CrudRepository;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.infrastructure.persistence.mapper.KnowledgeItemDirectoryMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemDirectoryRepository;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import lombok.RequiredArgsConstructor;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Repository;
import java.util.List;
/**
* Knowledge item directory repository implementation
*/
@Repository
@RequiredArgsConstructor
public class KnowledgeItemDirectoryRepositoryImpl
extends CrudRepository<KnowledgeItemDirectoryMapper, KnowledgeItemDirectory>
implements KnowledgeItemDirectoryRepository {
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemDirectoryMapper knowledgeItemDirectoryMapper;
@Override
public List<KnowledgeItemDirectory> findByCriteria(KnowledgeDirectoryQuery query) {
String relativePath = normalizeRelativePathPrefix(query.getRelativePath());
LambdaQueryWrapper<KnowledgeItemDirectory> wrapper = new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(StringUtils.isNotBlank(query.getSetId()), KnowledgeItemDirectory::getSetId, query.getSetId())
.likeRight(StringUtils.isNotBlank(relativePath), KnowledgeItemDirectory::getRelativePath, relativePath);
if (StringUtils.isNotBlank(query.getKeyword())) {
wrapper.and(w -> w.like(KnowledgeItemDirectory::getName, query.getKeyword())
.or()
.like(KnowledgeItemDirectory::getRelativePath, query.getKeyword()));
}
wrapper.orderByAsc(KnowledgeItemDirectory::getRelativePath);
return knowledgeItemDirectoryMapper.selectList(wrapper);
}
@Override
public KnowledgeItemDirectory findBySetIdAndPath(String setId, String relativePath) {
return knowledgeItemDirectoryMapper.selectOne(new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(KnowledgeItemDirectory::getSetId, setId)
.eq(KnowledgeItemDirectory::getRelativePath, relativePath));
}
@Override
public int removeByRelativePathPrefix(String setId, String relativePath) {
String normalized = normalizeRelativePathValue(relativePath);
if (StringUtils.isBlank(normalized)) {
return 0;
}
String prefix = normalizeRelativePathPrefix(normalized);
LambdaQueryWrapper<KnowledgeItemDirectory> wrapper = new LambdaQueryWrapper<KnowledgeItemDirectory>()
.eq(KnowledgeItemDirectory::getSetId, setId)
.and(w -> w.eq(KnowledgeItemDirectory::getRelativePath, normalized)
.or()
.likeRight(KnowledgeItemDirectory::getRelativePath, prefix));
return knowledgeItemDirectoryMapper.delete(wrapper);
}
private String normalizeRelativePathPrefix(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
if (!normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized + PATH_SEPARATOR;
}
return normalized;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}
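The two normalization helpers above convert backslashes to "/", strip leading separators, and append a trailing "/" so that `likeRight` matches only true descendants of a directory. A standalone sketch of the prefix form (assumed equivalent; class and method names are illustrative):

```java
public class RelativePaths {
    private static final String SEP = "/";

    // Normalize to a prefix form ending in "/" for descendant matching
    static String toPrefix(String relativePath) {
        if (relativePath == null || relativePath.isBlank()) {
            return "";
        }
        // Unify Windows separators, trim whitespace, drop leading "/"
        String normalized = relativePath.replace("\\", SEP).trim();
        while (normalized.startsWith(SEP)) {
            normalized = normalized.substring(1);
        }
        if (normalized.isBlank()) {
            return "";
        }
        return normalized.endsWith(SEP) ? normalized : normalized + SEP;
    }
}
```

With this form, `likeRight("docs/")` matches `docs/a.txt` and `docs/sub/b.txt` but not a sibling such as `docs-old/c.txt`.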

View File

@@ -3,10 +3,12 @@ package com.datamate.datamanagement.infrastructure.persistence.repository.impl;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.repository.CrudRepository;
import com.datamate.datamanagement.common.enums.KnowledgeSourceType;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.infrastructure.persistence.mapper.KnowledgeItemMapper;
import com.datamate.datamanagement.infrastructure.persistence.repository.KnowledgeItemRepository;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchResponse;
import lombok.RequiredArgsConstructor;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Repository;
@@ -19,21 +21,26 @@ import java.util.List;
@Repository
@RequiredArgsConstructor
public class KnowledgeItemRepositoryImpl extends CrudRepository<KnowledgeItemMapper, KnowledgeItem> implements KnowledgeItemRepository {
private static final String PATH_SEPARATOR = "/";
private final KnowledgeItemMapper knowledgeItemMapper;
@Override
public IPage<KnowledgeItem> findByCriteria(IPage<KnowledgeItem> page, KnowledgeItemPagingQuery query) {
String relativePath = normalizeRelativePathPrefix(query.getRelativePath());
LambdaQueryWrapper<KnowledgeItem> wrapper = new LambdaQueryWrapper<KnowledgeItem>()
.eq(StringUtils.isNotBlank(query.getSetId()), KnowledgeItem::getSetId, query.getSetId())
.eq(query.getContentType() != null, KnowledgeItem::getContentType, query.getContentType())
.eq(query.getSourceType() != null, KnowledgeItem::getSourceType, query.getSourceType())
.eq(StringUtils.isNotBlank(query.getSourceDatasetId()), KnowledgeItem::getSourceDatasetId, query.getSourceDatasetId())
.eq(StringUtils.isNotBlank(query.getSourceFileId()), KnowledgeItem::getSourceFileId, query.getSourceFileId());
.eq(StringUtils.isNotBlank(query.getSourceFileId()), KnowledgeItem::getSourceFileId, query.getSourceFileId())
.likeRight(StringUtils.isNotBlank(relativePath), KnowledgeItem::getRelativePath, relativePath);
if (StringUtils.isNotBlank(query.getKeyword())) {
wrapper.and(w -> w.like(KnowledgeItem::getSourceFileId, query.getKeyword())
.or()
.like(KnowledgeItem::getContent, query.getKeyword()));
.like(KnowledgeItem::getContent, query.getKeyword())
.or()
.like(KnowledgeItem::getRelativePath, query.getKeyword()));
}
wrapper.orderByDesc(KnowledgeItem::getCreatedAt);
@@ -52,4 +59,83 @@ public class KnowledgeItemRepositoryImpl extends CrudRepository<KnowledgeItemMap
.eq(KnowledgeItem::getSetId, setId)
.orderByDesc(KnowledgeItem::getCreatedAt));
}
@Override
public long countBySourceTypes(List<KnowledgeSourceType> sourceTypes) {
return knowledgeItemMapper.selectCount(new LambdaQueryWrapper<KnowledgeItem>()
.in(KnowledgeItem::getSourceType, sourceTypes));
}
@Override
public List<KnowledgeItem> findFileUploadItems() {
return knowledgeItemMapper.selectList(new LambdaQueryWrapper<KnowledgeItem>()
.eq(KnowledgeItem::getSourceType, KnowledgeSourceType.FILE_UPLOAD)
.select(KnowledgeItem::getId, KnowledgeItem::getContent, KnowledgeItem::getSourceFileId));
}
@Override
public IPage<KnowledgeItemSearchResponse> searchFileItems(IPage<?> page, String keyword) {
return knowledgeItemMapper.searchFileItems(page, keyword);
}
@Override
public Long sumDatasetFileSize() {
return knowledgeItemMapper.sumDatasetFileSize();
}
@Override
public boolean existsBySetIdAndRelativePath(String setId, String relativePath) {
if (StringUtils.isBlank(setId) || StringUtils.isBlank(relativePath)) {
return false;
}
return knowledgeItemMapper.selectCount(new LambdaQueryWrapper<KnowledgeItem>()
.eq(KnowledgeItem::getSetId, setId)
.eq(KnowledgeItem::getRelativePath, relativePath)) > 0;
}
@Override
public int removeByRelativePathPrefix(String setId, String relativePath) {
String normalized = normalizeRelativePathValue(relativePath);
if (StringUtils.isBlank(setId) || StringUtils.isBlank(normalized)) {
return 0;
}
String prefix = normalizeRelativePathPrefix(normalized);
LambdaQueryWrapper<KnowledgeItem> wrapper = new LambdaQueryWrapper<KnowledgeItem>()
.eq(KnowledgeItem::getSetId, setId)
.and(w -> w.eq(KnowledgeItem::getRelativePath, normalized)
.or()
.likeRight(KnowledgeItem::getRelativePath, prefix));
return knowledgeItemMapper.delete(wrapper);
}
private String normalizeRelativePathPrefix(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
if (StringUtils.isBlank(normalized)) {
return "";
}
if (!normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized + PATH_SEPARATOR;
}
return normalized;
}
private String normalizeRelativePathValue(String relativePath) {
if (StringUtils.isBlank(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
while (normalized.endsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(0, normalized.length() - 1);
}
return normalized;
}
}

View File

@@ -1,9 +1,11 @@
package com.datamate.datamanagement.interfaces.converter;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeSetRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeSetResponse;
import org.mapstruct.Mapper;
@@ -31,4 +33,8 @@ public interface KnowledgeConverter {
KnowledgeItemResponse convertToResponse(KnowledgeItem knowledgeItem);
List<KnowledgeItemResponse> convertItemResponses(List<KnowledgeItem> items);
KnowledgeDirectoryResponse convertToResponse(KnowledgeItemDirectory directory);
List<KnowledgeDirectoryResponse> convertDirectoryResponses(List<KnowledgeItemDirectory> directories);
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import jakarta.validation.constraints.NotBlank;
import lombok.Getter;
import lombok.Setter;
/**
* Request to create a knowledge item directory
*/
@Getter
@Setter
public class CreateKnowledgeDirectoryRequest {
/** Parent prefix path, e.g. "docs/"; blank means the knowledge set root */
private String parentPrefix;
/** Name of the new directory */
@NotBlank
private String directoryName;
}

View File

@@ -34,4 +34,8 @@ public class CreateKnowledgeItemRequest {
* Source file ID (used for annotation sync and similar scenarios)
*/
private String sourceFileId;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -0,0 +1,16 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import lombok.Getter;
import lombok.Setter;
/**
* Dataset file preview status response
*/
@Getter
@Setter
public class DatasetFilePreviewStatusResponse {
private KnowledgeItemPreviewStatus status;
private String previewError;
private String updatedAt;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import jakarta.validation.constraints.NotEmpty;
import lombok.Getter;
import lombok.Setter;
import java.util.List;
/**
* Request to batch-delete knowledge items
*/
@Getter
@Setter
public class DeleteKnowledgeItemsRequest {
/**
* List of knowledge item IDs
*/
@NotEmpty(message = "知识条目ID不能为空")
private List<String> ids;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item directory query parameters
*/
@Getter
@Setter
public class KnowledgeDirectoryQuery {
/** Owning knowledge set ID */
private String setId;
/** Directory relative-path prefix */
private String relativePath;
/** Search keyword */
private String keyword;
}

View File

@@ -0,0 +1,20 @@
package com.datamate.datamanagement.interfaces.dto;
import lombok.Getter;
import lombok.Setter;
import java.time.LocalDateTime;
/**
* Knowledge item directory response
*/
@Getter
@Setter
public class KnowledgeDirectoryResponse {
private String id;
private String setId;
private String name;
private String relativePath;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
}

View File

@@ -41,4 +41,8 @@ public class KnowledgeItemPagingQuery extends PagingQuery {
* 来源文件ID
*/
private String sourceFileId;
/**
* Relative path prefix
*/
private String relativePath;
}

View File

@@ -0,0 +1,16 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.KnowledgeItemPreviewStatus;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item preview status response
*/
@Getter
@Setter
public class KnowledgeItemPreviewStatusResponse {
private KnowledgeItemPreviewStatus status;
private String previewError;
private String updatedAt;
}

View File

@@ -20,6 +20,14 @@ public class KnowledgeItemResponse {
private KnowledgeSourceType sourceType;
private String sourceDatasetId;
private String sourceFileId;
/**
* Relative path (used for directory display)
*/
private String relativePath;
/**
* Extended metadata
*/
private String metadata;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
private String createdBy;

View File

@@ -0,0 +1,17 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.common.interfaces.PagingQuery;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge item file search request
*/
@Getter
@Setter
public class KnowledgeItemSearchQuery extends PagingQuery {
/**
* File name keyword
*/
private String keyword;
}

View File

@@ -0,0 +1,35 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.KnowledgeContentType;
import com.datamate.datamanagement.common.enums.KnowledgeSourceType;
import com.fasterxml.jackson.annotation.JsonIgnore;
import lombok.Getter;
import lombok.Setter;
import java.time.LocalDateTime;
/**
* Knowledge item file search response
*/
@Getter
@Setter
public class KnowledgeItemSearchResponse {
private String id;
private String setId;
private String setName;
private KnowledgeContentType contentType;
private KnowledgeSourceType sourceType;
private String sourceDatasetId;
private String sourceFileId;
private String fileName;
private Long fileSize;
/**
* Relative path (used for directory display)
*/
private String relativePath;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
@JsonIgnore
private String content;
}

View File

@@ -0,0 +1,16 @@
package com.datamate.datamanagement.interfaces.dto;
import lombok.Getter;
import lombok.Setter;
/**
* Knowledge management statistics response
*/
@Getter
@Setter
public class KnowledgeManagementStatisticsResponse {
private Long totalKnowledgeSets = 0L;
private Long totalFiles = 0L;
private Long totalSize = 0L;
private Long totalTags = 0L;
}

View File

@@ -1,8 +1,10 @@
package com.datamate.datamanagement.interfaces.dto;
import com.datamate.datamanagement.common.enums.DatasetStatusType;
import com.fasterxml.jackson.annotation.JsonIgnore;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import lombok.AccessLevel;
import lombok.Getter;
import lombok.Setter;
@@ -24,9 +26,18 @@ public class UpdateDatasetRequest {
/** Collection task ID */
private String dataSource;
/** Parent dataset ID */
@Setter(AccessLevel.NONE)
private String parentDatasetId;
@JsonIgnore
@Setter(AccessLevel.NONE)
private boolean parentDatasetIdProvided;
/** Tag list */
private List<String> tags;
/** Dataset status */
private DatasetStatusType status;
public void setParentDatasetId(String parentDatasetId) {
this.parentDatasetIdProvided = true;
this.parentDatasetId = parentDatasetId;
}
}

View File

@@ -18,4 +18,8 @@ public class UpdateKnowledgeItemRequest {
* Content type
*/
private KnowledgeContentType contentType;
/**
* Extended metadata
*/
private String metadata;
}

View File

@@ -17,4 +17,8 @@ public class UploadKnowledgeItemsRequest {
*/
@NotEmpty(message = "文件列表不能为空")
private List<MultipartFile> files;
/**
* Directory prefix (used for directory upload)
*/
private String parentPrefix;
}

View File

@@ -5,20 +5,23 @@ import com.datamate.common.infrastructure.common.Response;
import com.datamate.common.infrastructure.exception.SystemErrorCode;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.common.interfaces.PagingQuery;
import com.datamate.datamanagement.application.DatasetFileApplicationService;
import com.datamate.datamanagement.application.DatasetFileApplicationService;
import com.datamate.datamanagement.application.DatasetFilePreviewService;
import com.datamate.datamanagement.domain.model.dataset.DatasetFile;
import com.datamate.datamanagement.interfaces.converter.DatasetConverter;
import com.datamate.datamanagement.interfaces.dto.AddFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.DatasetFileResponse;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import com.datamate.datamanagement.interfaces.dto.AddFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CopyFilesRequest;
import com.datamate.datamanagement.interfaces.dto.CreateDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.DatasetFilePreviewStatusResponse;
import com.datamate.datamanagement.interfaces.dto.DatasetFileResponse;
import com.datamate.datamanagement.interfaces.dto.UploadFileRequest;
import com.datamate.datamanagement.interfaces.dto.UploadFilesPreRequest;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.validation.Valid;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.Resource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.Resource;
import org.springframework.core.io.UrlResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
@@ -36,32 +39,41 @@ import java.util.List;
@RequestMapping("/data-management/datasets/{datasetId}/files")
public class DatasetFileController {
private final DatasetFileApplicationService datasetFileApplicationService;
private final DatasetFilePreviewService datasetFilePreviewService;
@Autowired
public DatasetFileController(DatasetFileApplicationService datasetFileApplicationService,
DatasetFilePreviewService datasetFilePreviewService) {
this.datasetFileApplicationService = datasetFileApplicationService;
this.datasetFilePreviewService = datasetFilePreviewService;
}
@GetMapping
public Response<PagedResponse<DatasetFile>> getDatasetFiles(
@PathVariable("datasetId") String datasetId,
@RequestParam(value = "isWithDirectory", required = false) boolean isWithDirectory,
@RequestParam(value = "page", required = false, defaultValue = "0") Integer page,
@RequestParam(value = "size", required = false, defaultValue = "20") Integer size,
@RequestParam(value = "prefix", required = false, defaultValue = "") String prefix,
@RequestParam(value = "status", required = false) String status,
@RequestParam(value = "hasAnnotation", required = false) Boolean hasAnnotation,
@RequestParam(value = "excludeSourceDocuments", required = false, defaultValue = "false") Boolean excludeSourceDocuments,
@RequestParam(value = "excludeDerivedFiles", required = false, defaultValue = "false") Boolean excludeDerivedFiles) {
PagingQuery pagingQuery = new PagingQuery(page, size);
PagedResponse<DatasetFile> filesPage;
if (isWithDirectory) {
filesPage = datasetFileApplicationService.getDatasetFilesWithDirectory(
datasetId,
prefix,
Boolean.TRUE.equals(excludeDerivedFiles),
pagingQuery
);
} else {
filesPage = datasetFileApplicationService.getDatasetFiles(datasetId, null, status, null, hasAnnotation,
Boolean.TRUE.equals(excludeSourceDocuments), pagingQuery);
}
return Response.ok(filesPage);
}
@GetMapping("/{fileId}")
@@ -108,15 +120,28 @@ public class DatasetFileController {
}
}
@IgnoreResponseWrap
@GetMapping(value = "/{fileId}/preview", produces = MediaType.ALL_VALUE)
public ResponseEntity<Resource> previewDatasetFileById(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
try {
DatasetFile datasetFile = datasetFileApplicationService.getDatasetFile(datasetId, fileId);
if (datasetFilePreviewService.isOfficeDocument(datasetFile.getFileName())) {
DatasetFilePreviewService.PreviewFile previewFile = datasetFilePreviewService
.resolveReadyPreviewFile(datasetId, datasetFile);
if (previewFile == null) {
return ResponseEntity.status(HttpStatus.CONFLICT).build();
}
Resource previewResource = new UrlResource(previewFile.filePath().toUri());
return ResponseEntity.ok()
.contentType(MediaType.APPLICATION_PDF)
.header(HttpHeaders.CONTENT_DISPOSITION,
"inline; filename=\"" + previewFile.fileName() + "\"")
.body(previewResource);
}
Resource resource = datasetFileApplicationService.downloadFile(datasetId, fileId);
MediaType mediaType = MediaTypeFactory.getMediaType(resource)
.orElse(MediaType.APPLICATION_OCTET_STREAM);
return ResponseEntity.ok()
.contentType(mediaType)
@@ -127,8 +152,20 @@ public class DatasetFileController {
return ResponseEntity.status(HttpStatus.NOT_FOUND).build();
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build();
}
}
@GetMapping("/{fileId}/preview/status")
public DatasetFilePreviewStatusResponse getDatasetFilePreviewStatus(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
return datasetFilePreviewService.getPreviewStatus(datasetId, fileId);
}
@PostMapping("/{fileId}/preview/convert")
public DatasetFilePreviewStatusResponse convertDatasetFilePreview(@PathVariable("datasetId") String datasetId,
@PathVariable("fileId") String fileId) {
return datasetFilePreviewService.ensurePreview(datasetId, fileId);
}
@IgnoreResponseWrap
@GetMapping(value = "/download", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)


@@ -23,7 +23,7 @@ public class DatasetTypeController {
public List<DatasetTypeResponse> getDatasetTypes() {
return Arrays.asList(
createDatasetType("IMAGE", "图像数据集", "用于机器学习的图像数据集", Arrays.asList("jpg", "jpeg", "png", "bmp", "gif")),
createDatasetType("TEXT", "文本数据集", "用于文本分析的文本数据集", Arrays.asList("txt", "csv", "xls", "xlsx", "json", "xml")),
createDatasetType("AUDIO", "音频数据集", "用于音频处理的音频数据集", Arrays.asList("wav", "mp3", "flac", "aac")),
createDatasetType("VIDEO", "视频数据集", "用于视频分析的视频数据集", Arrays.asList("mp4", "avi", "mov", "mkv")),
createDatasetType("MULTIMODAL", "多模态数据集", "包含多种数据类型的数据集", List.of("*"))


@@ -0,0 +1,43 @@
package com.datamate.datamanagement.interfaces.rest;
import com.datamate.datamanagement.application.KnowledgeDirectoryApplicationService;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItemDirectory;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeDirectoryRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeDirectoryResponse;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import java.util.List;
/**
* Knowledge item directory REST controller
*/
@RestController
@RequiredArgsConstructor
@RequestMapping("/data-management/knowledge-sets/{setId}/directories")
public class KnowledgeDirectoryController {
private final KnowledgeDirectoryApplicationService knowledgeDirectoryApplicationService;
@GetMapping
public List<KnowledgeDirectoryResponse> getKnowledgeDirectories(@PathVariable("setId") String setId,
KnowledgeDirectoryQuery query) {
List<KnowledgeItemDirectory> directories = knowledgeDirectoryApplicationService.getKnowledgeDirectories(setId, query);
return KnowledgeConverter.INSTANCE.convertDirectoryResponses(directories);
}
@PostMapping
public KnowledgeDirectoryResponse createKnowledgeDirectory(@PathVariable("setId") String setId,
@RequestBody @Valid CreateKnowledgeDirectoryRequest request) {
KnowledgeItemDirectory directory = knowledgeDirectoryApplicationService.createKnowledgeDirectory(setId, request);
return KnowledgeConverter.INSTANCE.convertToResponse(directory);
}
@DeleteMapping
public void deleteKnowledgeDirectory(@PathVariable("setId") String setId,
@RequestParam("relativePath") String relativePath) {
knowledgeDirectoryApplicationService.deleteKnowledgeDirectory(setId, relativePath);
}
}


@@ -3,11 +3,14 @@ package com.datamate.datamanagement.interfaces.rest;
import com.datamate.common.infrastructure.common.IgnoreResponseWrap;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.datamanagement.application.KnowledgeItemApplicationService;
import com.datamate.datamanagement.application.KnowledgeItemPreviewService;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeItem;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeItemRequest;
import com.datamate.datamanagement.interfaces.dto.DeleteKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.ImportKnowledgeItemsRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemPreviewStatusResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemResponse;
import com.datamate.datamanagement.interfaces.dto.ReplaceKnowledgeItemFileRequest;
import com.datamate.datamanagement.interfaces.dto.UpdateKnowledgeItemRequest;
@@ -30,6 +33,7 @@ import java.util.List;
@RequestMapping("/data-management/knowledge-sets/{setId}/items")
public class KnowledgeItemController {
private final KnowledgeItemApplicationService knowledgeItemApplicationService;
private final KnowledgeItemPreviewService knowledgeItemPreviewService;
@GetMapping
public PagedResponse<KnowledgeItemResponse> getKnowledgeItems(@PathVariable("setId") String setId,
@@ -80,6 +84,18 @@ public class KnowledgeItemController {
knowledgeItemApplicationService.previewKnowledgeItemFile(setId, itemId, response);
}
@GetMapping("/{itemId}/preview/status")
public KnowledgeItemPreviewStatusResponse getKnowledgeItemPreviewStatus(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
return knowledgeItemPreviewService.getPreviewStatus(setId, itemId);
}
@PostMapping("/{itemId}/preview/convert")
public KnowledgeItemPreviewStatusResponse convertKnowledgeItemPreview(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
return knowledgeItemPreviewService.ensurePreview(setId, itemId);
}
@GetMapping("/{itemId}")
public KnowledgeItemResponse getKnowledgeItemById(@PathVariable("setId") String setId,
@PathVariable("itemId") String itemId) {
@@ -108,4 +124,10 @@ public class KnowledgeItemController {
@PathVariable("itemId") String itemId) {
knowledgeItemApplicationService.deleteKnowledgeItem(setId, itemId);
}
@PostMapping("/batch-delete")
public void deleteKnowledgeItems(@PathVariable("setId") String setId,
@RequestBody @Valid DeleteKnowledgeItemsRequest request) {
knowledgeItemApplicationService.deleteKnowledgeItems(setId, request);
}
}


@@ -0,0 +1,27 @@
package com.datamate.datamanagement.interfaces.rest;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.datamanagement.application.KnowledgeItemApplicationService;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeItemSearchResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
/**
* Knowledge item search controller
*/
@Slf4j
@RestController
@RequiredArgsConstructor
@RequestMapping("/data-management/knowledge-items")
public class KnowledgeItemSearchController {
private final KnowledgeItemApplicationService knowledgeItemApplicationService;
@GetMapping("/search")
public PagedResponse<KnowledgeItemSearchResponse> search(KnowledgeItemSearchQuery query) {
return knowledgeItemApplicationService.searchKnowledgeItems(query);
}
}


@@ -1,10 +1,12 @@
package com.datamate.datamanagement.interfaces.rest;
import com.datamate.common.interfaces.PagedResponse;
import com.datamate.datamanagement.application.KnowledgeItemApplicationService;
import com.datamate.datamanagement.application.KnowledgeSetApplicationService;
import com.datamate.datamanagement.domain.model.knowledge.KnowledgeSet;
import com.datamate.datamanagement.interfaces.converter.KnowledgeConverter;
import com.datamate.datamanagement.interfaces.dto.CreateKnowledgeSetRequest;
import com.datamate.datamanagement.interfaces.dto.KnowledgeManagementStatisticsResponse;
import com.datamate.datamanagement.interfaces.dto.KnowledgeSetPagingQuery;
import com.datamate.datamanagement.interfaces.dto.KnowledgeSetResponse;
import com.datamate.datamanagement.interfaces.dto.UpdateKnowledgeSetRequest;
@@ -22,6 +24,7 @@ import org.springframework.web.bind.annotation.*;
@RequestMapping("/data-management/knowledge-sets")
public class KnowledgeSetController {
private final KnowledgeSetApplicationService knowledgeSetApplicationService;
private final KnowledgeItemApplicationService knowledgeItemApplicationService;
@GetMapping
public PagedResponse<KnowledgeSetResponse> getKnowledgeSets(KnowledgeSetPagingQuery query) {
@@ -51,4 +54,9 @@ public class KnowledgeSetController {
public void deleteKnowledgeSet(@PathVariable("setId") String setId) {
knowledgeSetApplicationService.deleteKnowledgeSet(setId);
}
@GetMapping("/statistics")
public KnowledgeManagementStatisticsResponse getKnowledgeManagementStatistics() {
return knowledgeItemApplicationService.getKnowledgeManagementStatistics();
}
}


@@ -42,6 +42,13 @@
SELECT COUNT(*) FROM t_dm_dataset_files WHERE dataset_id = #{datasetId}
</select>
<select id="countNonDerivedByDatasetId" parameterType="string" resultType="long">
SELECT COUNT(*)
FROM t_dm_dataset_files
WHERE dataset_id = #{datasetId}
AND (metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL)
</select>
<select id="countCompletedByDatasetId" parameterType="string" resultType="long">
SELECT COUNT(*) FROM t_dm_dataset_files WHERE dataset_id = #{datasetId} AND status = 'COMPLETED'
</select>
@@ -110,4 +117,16 @@
AND metadata IS NOT NULL
AND JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NOT NULL
</select>
<select id="countNonDerivedByDatasetIds" resultType="com.datamate.datamanagement.infrastructure.persistence.repository.dto.DatasetFileCount">
SELECT dataset_id AS datasetId,
COUNT(*) AS fileCount
FROM t_dm_dataset_files
WHERE dataset_id IN
<foreach collection="datasetIds" item="datasetId" open="(" separator="," close=")">
#{datasetId}
</foreach>
AND (metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL)
GROUP BY dataset_id
</select>
</mapper>
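The `metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL` condition in the counts above treats a row as derived only when its metadata JSON carries a non-null `derived_from_file_id`. A minimal Java sketch of that predicate — the `Map` stand-in for the JSON column and the class name are illustrative, not part of the codebase:

```java
import java.util.Map;

// Illustrative stand-in for the SQL predicate in countNonDerivedByDatasetId:
// a file counts as non-derived when it has no metadata at all, or when the
// derived_from_file_id key is missing or null.
public class DerivedFileFilter {
    public static boolean isNonDerived(Map<String, Object> metadata) {
        return metadata == null || metadata.get("derived_from_file_id") == null;
    }
}
```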


@@ -145,9 +145,10 @@
<select id="getAllDatasetStatistics" resultType="com.datamate.datamanagement.interfaces.dto.AllDatasetStatisticsResponse">
SELECT
(SELECT COUNT(*) FROM t_dm_datasets) AS total_datasets,
(SELECT COALESCE(SUM(size_bytes), 0) FROM t_dm_datasets) AS total_size,
(SELECT COUNT(*)
FROM t_dm_dataset_files
WHERE metadata IS NULL OR JSON_EXTRACT(metadata, '$.derived_from_file_id') IS NULL) AS total_files
</select>
</mapper>


@@ -53,6 +53,19 @@
ORDER BY usage_count DESC, name ASC
</select>
<select id="countKnowledgeSetTags" resultType="long">
SELECT COUNT(DISTINCT t.id)
FROM t_dm_tags t
WHERE EXISTS (
SELECT 1
FROM t_dm_knowledge_sets ks
WHERE ks.tags IS NOT NULL
AND JSON_VALID(ks.tags) = 1
AND JSON_LENGTH(ks.tags) > 0
AND JSON_SEARCH(ks.tags, 'one', t.name, NULL, '$[*].name') IS NOT NULL
)
</select>
<insert id="insert" parameterType="com.datamate.datamanagement.domain.model.dataset.Tag">
INSERT INTO t_dm_tags (id, name, description, category, color, usage_count)
VALUES (#{id}, #{name}, #{description}, #{category}, #{color}, #{usageCount})


@@ -17,6 +17,7 @@ public class SecurityConfig {
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
http.csrf(csrf -> csrf.disable())
.headers(headers -> headers.frameOptions(frameOptions -> frameOptions.disable()))
.authorizeHttpRequests(authz -> authz
.anyRequest().permitAll() // allow all requests without authentication
);


@@ -36,7 +36,9 @@ import org.springframework.util.StringUtils;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
/**
* Knowledge base service
@@ -47,6 +49,7 @@ import java.util.Optional;
@Service
@RequiredArgsConstructor
public class KnowledgeBaseService {
private static final String PATH_SEPARATOR = "/";
private final KnowledgeBaseRepository knowledgeBaseRepository;
private final RagFileRepository ragFileRepository;
private final ApplicationEventPublisher eventPublisher;
@@ -137,6 +140,7 @@ public class KnowledgeBaseService {
return PagedResponse.of(respList, page.getCurrent(), page.getTotal(), page.getPages());
}
@Transactional(rollbackFor = Exception.class)
public void addFiles(AddFilesReq request) {
KnowledgeBase knowledgeBase = Optional.ofNullable(knowledgeBaseRepository.getById(request.getKnowledgeBaseId()))
@@ -146,6 +150,7 @@ public class KnowledgeBaseService {
ragFile.setKnowledgeBaseId(knowledgeBase.getId());
ragFile.setFileId(fileInfo.id());
ragFile.setFileName(fileInfo.fileName());
ragFile.setRelativePath(normalizeRelativePath(fileInfo.relativePath()));
ragFile.setStatus(FileStatus.UNPROCESSED);
return ragFile;
}).toList();
@@ -153,6 +158,17 @@ public class KnowledgeBaseService {
eventPublisher.publishEvent(new DataInsertedEvent(knowledgeBase, request));
}
private String normalizeRelativePath(String relativePath) {
if (!StringUtils.hasText(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
return normalized;
}
public PagedResponse<RagFile> listFiles(String knowledgeBaseId, RagFileReq request) {
IPage<RagFile> page = new Page<>(request.getPage(), request.getSize());
request.setKnowledgeBaseId(knowledgeBaseId);
@@ -160,6 +176,41 @@ public class KnowledgeBaseService {
return PagedResponse.of(page.getRecords(), page.getCurrent(), page.getTotal(), page.getPages());
}
public PagedResponse<KnowledgeBaseFileSearchResp> searchFiles(KnowledgeBaseFileSearchReq request) {
IPage<RagFile> page = new Page<>(request.getPage(), request.getSize());
page = ragFileRepository.searchPage(page, request);
List<RagFile> records = page.getRecords();
if (records.isEmpty()) {
return PagedResponse.of(Collections.emptyList(), page.getCurrent(), page.getTotal(), page.getPages());
}
List<String> knowledgeBaseIds = records.stream()
.map(RagFile::getKnowledgeBaseId)
.filter(StringUtils::hasText)
.distinct()
.toList();
Map<String, String> knowledgeBaseNameMap = knowledgeBaseRepository.listByIds(knowledgeBaseIds).stream()
.collect(Collectors.toMap(KnowledgeBase::getId, KnowledgeBase::getName));
List<KnowledgeBaseFileSearchResp> responses = records.stream()
.map(file -> {
KnowledgeBaseFileSearchResp resp = new KnowledgeBaseFileSearchResp();
resp.setId(file.getId());
resp.setKnowledgeBaseId(file.getKnowledgeBaseId());
resp.setKnowledgeBaseName(knowledgeBaseNameMap.getOrDefault(file.getKnowledgeBaseId(), ""));
resp.setFileName(file.getFileName());
resp.setRelativePath(file.getRelativePath());
resp.setChunkCount(file.getChunkCount());
resp.setStatus(file.getStatus());
resp.setCreatedAt(file.getCreatedAt());
resp.setUpdatedAt(file.getUpdatedAt());
return resp;
})
.toList();
return PagedResponse.of(responses, page.getCurrent(), page.getTotal(), page.getPages());
}
@Transactional(rollbackFor = Exception.class)
public void deleteFiles(String knowledgeBaseId, DeleteFilesReq request) {
KnowledgeBase knowledgeBase = Optional.ofNullable(knowledgeBaseRepository.getById(knowledgeBaseId))
@@ -222,4 +273,4 @@ public class KnowledgeBaseService {
});
return searchResults;
}
}


@@ -28,6 +28,10 @@ public class RagFile extends BaseEntity<String> {
* File name
*/
private String fileName;
/**
* Relative path
*/
private String relativePath;
/**
* File ID
*/


@@ -3,6 +3,7 @@ package com.datamate.rag.indexer.domain.repository;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.repository.IRepository;
import com.datamate.rag.indexer.domain.model.RagFile;
import com.datamate.rag.indexer.interfaces.dto.KnowledgeBaseFileSearchReq;
import com.datamate.rag.indexer.interfaces.dto.RagFileReq;
import java.util.List;
@@ -21,4 +22,6 @@ public interface RagFileRepository extends IRepository<RagFile> {
List<RagFile> findAllByKnowledgeBaseId(String knowledgeBaseId);
IPage<RagFile> page(IPage<RagFile> page, RagFileReq request);
IPage<RagFile> searchPage(IPage<RagFile> page, KnowledgeBaseFileSearchReq request);
}


@@ -6,6 +6,7 @@ import com.datamate.rag.indexer.domain.model.FileStatus;
import com.datamate.rag.indexer.domain.model.RagFile;
import com.datamate.rag.indexer.domain.repository.RagFileRepository;
import com.datamate.rag.indexer.infrastructure.persistence.mapper.RagFileMapper;
import com.datamate.rag.indexer.interfaces.dto.KnowledgeBaseFileSearchReq;
import com.datamate.rag.indexer.interfaces.dto.RagFileReq;
import org.springframework.stereotype.Repository;
import org.springframework.util.StringUtils;
@@ -20,6 +21,7 @@ import java.util.List;
*/
@Repository
public class RagFileRepositoryImpl extends CrudRepository<RagFileMapper, RagFile> implements RagFileRepository {
private static final String PATH_SEPARATOR = "/";
@Override
public void removeByKnowledgeBaseId(String knowledgeBaseId) {
lambdaUpdate().eq(RagFile::getKnowledgeBaseId, knowledgeBaseId).remove();
@@ -45,6 +47,27 @@ public class RagFileRepositoryImpl extends CrudRepository<RagFileMapper, RagFile
return lambdaQuery()
.eq(RagFile::getKnowledgeBaseId, request.getKnowledgeBaseId())
.like(StringUtils.hasText(request.getFileName()), RagFile::getFileName, request.getFileName())
.likeRight(StringUtils.hasText(request.getRelativePath()), RagFile::getRelativePath, normalizeRelativePath(request.getRelativePath()))
.page(page);
}
@Override
public IPage<RagFile> searchPage(IPage<RagFile> page, KnowledgeBaseFileSearchReq request) {
return lambdaQuery()
.eq(StringUtils.hasText(request.getKnowledgeBaseId()), RagFile::getKnowledgeBaseId, request.getKnowledgeBaseId())
.like(StringUtils.hasText(request.getFileName()), RagFile::getFileName, request.getFileName())
.likeRight(StringUtils.hasText(request.getRelativePath()), RagFile::getRelativePath, normalizeRelativePath(request.getRelativePath()))
.page(page);
}
private String normalizeRelativePath(String relativePath) {
if (!StringUtils.hasText(relativePath)) {
return "";
}
String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
while (normalized.startsWith(PATH_SEPARATOR)) {
normalized = normalized.substring(1);
}
return normalized;
}
}
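The `normalizeRelativePath` helper above (also duplicated in `KnowledgeBaseService`) can be exercised in isolation. A minimal self-contained sketch, assuming Spring's `StringUtils.hasText` can be replaced by a plain null/blank check; the class name is illustrative:

```java
// Self-contained sketch of normalizeRelativePath: unify Windows separators,
// trim surrounding whitespace, and strip leading slashes so the result can be
// used as a likeRight (prefix) filter on the relative_path column.
public class RelativePathNormalizer {
    private static final String PATH_SEPARATOR = "/";

    public static String normalize(String relativePath) {
        if (relativePath == null || relativePath.trim().isEmpty()) {
            return "";
        }
        String normalized = relativePath.replace("\\", PATH_SEPARATOR).trim();
        while (normalized.startsWith(PATH_SEPARATOR)) {
            normalized = normalized.substring(1);
        }
        return normalized;
    }
}
```

Keeping the normalization identical on write (`addFiles`) and on query (`searchPage`) is what makes the prefix match reliable across Windows- and Unix-style client paths.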


@@ -80,6 +80,7 @@ public class KnowledgeBaseController {
return knowledgeBaseService.list(request);
}
/**
* Add files to a knowledge base
*
@@ -105,6 +106,17 @@ public class KnowledgeBaseController {
return knowledgeBaseService.listFiles(knowledgeBaseId, request);
}
/**
* Search knowledge base files globally (across knowledge bases)
*
* @param request search request
* @return list of matched files
*/
@GetMapping("/files/search")
public PagedResponse<KnowledgeBaseFileSearchResp> searchFiles(KnowledgeBaseFileSearchReq request) {
return knowledgeBaseService.searchFiles(request);
}
/**
* Delete knowledge base files
*
@@ -141,4 +153,4 @@ public class KnowledgeBaseController {
public List<SearchResp.SearchResult> retrieve(@RequestBody @Valid RetrieveReq request) {
return knowledgeBaseService.retrieve(request);
}
}


@@ -21,6 +21,6 @@ public class AddFilesReq {
private String delimiter;
private List<FileInfo> files;
public record FileInfo(String id, String fileName, String relativePath) {
}
}


@@ -0,0 +1,19 @@
package com.datamate.rag.indexer.interfaces.dto;
import com.datamate.common.interfaces.PagingQuery;
import lombok.Getter;
import lombok.Setter;
/**
* Global search request for knowledge base files
*
* @author dallas
* @since 2026-01-30
*/
@Getter
@Setter
public class KnowledgeBaseFileSearchReq extends PagingQuery {
private String fileName;
private String relativePath;
private String knowledgeBaseId;
}


@@ -0,0 +1,27 @@
package com.datamate.rag.indexer.interfaces.dto;
import com.datamate.rag.indexer.domain.model.FileStatus;
import lombok.Getter;
import lombok.Setter;
import java.time.LocalDateTime;
/**
* Global search response for knowledge base files
*
* @author dallas
* @since 2026-01-30
*/
@Getter
@Setter
public class KnowledgeBaseFileSearchResp {
private String id;
private String knowledgeBaseId;
private String knowledgeBaseName;
private String fileName;
private String relativePath;
private Integer chunkCount;
private FileStatus status;
private LocalDateTime createdAt;
private LocalDateTime updatedAt;
}


@@ -14,5 +14,6 @@ import lombok.Setter;
@Getter
public class RagFileReq extends PagingQuery {
private String fileName;
private String relativePath;
private String knowledgeBaseId;
}


@@ -21,7 +21,7 @@ import java.util.UUID;
*/
@Component
public class FileService {
private static final int DEFAULT_TIMEOUT = 1800;
private final ChunkUploadRequestMapper chunkUploadRequestMapper;


@@ -5,7 +5,7 @@ server {
access_log /var/log/datamate/frontend/access.log main;
error_log /var/log/datamate/frontend/error.log notice;
client_max_body_size 0;
add_header Set-Cookie "NEXT_LOCALE=zh";


@@ -11,6 +11,7 @@ services:
- log_volume:/var/log/datamate
- operator-upload-volume:/operators/upload
- operator-runtime-volume:/operators/extract
- uploads_volume:/uploads
networks: [ datamate ]
depends_on:
- datamate-database
@@ -154,6 +155,8 @@ services:
profiles: [ data-juicer ]
volumes:
uploads_volume:
name: datamate-uploads-volume
dataset_volume:
name: datamate-dataset-volume
flow_volume:


@@ -1,7 +1,7 @@
services:
etcd:
container_name: milvus-etcd
image: quay.nju.edu.cn/coreos/etcd:v3.5.18
environment:
- ETCD_AUTO_COMPACTION_MODE=revision
- ETCD_AUTO_COMPACTION_RETENTION=1000


@@ -169,6 +169,33 @@
}
}
function isAnnotationObject(value) {
if (!value || typeof value !== "object") return false;
return typeof value.serializeAnnotation === "function" || typeof value.serialize === "function";
}
function resolveSelectedAnnotation(store) {
if (!store) return null;
const annotations = Array.isArray(store.annotations) ? store.annotations : [];
if (isAnnotationObject(store.selectedAnnotation)) {
return store.selectedAnnotation;
}
if (isAnnotationObject(store.selected)) {
return store.selected;
}
const selectedId = store.selected;
if (selectedId !== undefined && selectedId !== null && annotations.length) {
const matched = annotations.find((ann) => ann && String(ann.id) === String(selectedId));
if (isAnnotationObject(matched)) {
return matched;
}
}
if (annotations.length && isAnnotationObject(annotations[0])) {
return annotations[0];
}
return null;
}
function exportSelectedAnnotation() {
if (!lsInstance) {
throw new Error("LabelStudio 未初始化");
@@ -179,10 +206,10 @@
throw new Error("无法访问 annotationStore");
}
const selected = resolveSelectedAnnotation(store);
if (!selected) {
throw new Error("未找到可导出的标注对象");
}
let serialized = null;
if (selected && typeof selected.serializeAnnotation === "function") {
@@ -197,6 +224,10 @@
? { id: selected?.id || serialized.id || "draft", ...serialized }
: { id: selected?.id || "draft", result: (selected && selected.result) || [] };
if (!Array.isArray(annotationPayload.result) && Array.isArray(annotationPayload.results)) {
annotationPayload.result = annotationPayload.results;
}
// Minimally mirror Label Studio Server's fields (the DataMate side stores them as-is)
const taskId = typeof currentTask?.id === "number" ? currentTask.id : Number(currentTask?.id) || null;
const fileId = currentTask?.data?.file_id || currentTask?.data?.fileId || null;
@@ -226,6 +257,52 @@
};
}
function isSaveAndNextShortcut(event) {
if (!event || event.defaultPrevented || event.isComposing) return false;
const key = event.key;
const code = event.code;
const isEnter = key === "Enter" || code === "Enter" || code === "NumpadEnter";
if (!isEnter) return false;
if (!(event.ctrlKey || event.metaKey)) return false;
if (event.shiftKey || event.altKey) return false;
return true;
}
function isSaveShortcut(event) {
if (!event || event.defaultPrevented || event.isComposing) return false;
const key = event.key;
const code = event.code;
const isS = key === "s" || key === "S" || code === "KeyS";
if (!isS) return false;
if (!(event.ctrlKey || event.metaKey)) return false;
if (event.shiftKey || event.altKey) return false;
return true;
}
function handleSaveAndNextShortcut(event) {
if (!isSaveAndNextShortcut(event) || event.repeat) return;
event.preventDefault();
event.stopPropagation();
try {
const raw = exportSelectedAnnotation();
postToParent("LS_SAVE_AND_NEXT", raw);
} catch (e) {
postToParent("LS_ERROR", { message: e?.message || String(e) });
}
}
function handleSaveShortcut(event) {
if (!isSaveShortcut(event) || event.repeat) return;
event.preventDefault();
event.stopPropagation();
try {
const raw = exportSelectedAnnotation();
postToParent("LS_EXPORT_RESULT", raw);
} catch (e) {
postToParent("LS_ERROR", { message: e?.message || String(e) });
}
}
function initLabelStudio(payload) {
if (!window.LabelStudio) {
throw new Error("LabelStudio 未加载(请检查静态资源/网络)");
@@ -296,6 +373,9 @@
});
}
window.addEventListener("keydown", handleSaveAndNextShortcut);
window.addEventListener("keydown", handleSaveShortcut);
window.addEventListener("message", (event) => {
if (event.origin !== ORIGIN) return;


@@ -1,17 +1,17 @@
import { Button, Input, Popover, theme, Tag, Empty } from "antd";
import { PlusOutlined } from "@ant-design/icons";
import { useCallback, useEffect, useMemo, useState } from "react";
interface Tag {
id?: string | number;
name: string;
color?: string;
}
interface AddTagPopoverProps {
tags: Tag[];
onFetchTags?: () => Promise<Tag[]>;
onAddTag?: (tagName: string) => void;
onCreateAndTag?: (tagName: string) => void;
}
@@ -27,20 +27,23 @@ export default function AddTagPopover({
const [newTag, setNewTag] = useState("");
const [allTags, setAllTags] = useState<Tag[]>([]);
const tagsSet = useMemo(
() => new Set(tags.map((tag) => (tag.id ?? tag.name))),
[tags]
);
const fetchTags = useCallback(async () => {
if (onFetchTags && showPopover) {
const data = await onFetchTags?.();
setAllTags(data || []);
}
}, [onFetchTags, showPopover]);
useEffect(() => {
fetchTags();
}, [fetchTags]);
const availableTags = useMemo(() => {
return allTags.filter((tag) => !tagsSet.has(tag.id ?? tag.name));
}, [allTags, tagsSet]);
const handleCreateAndAddTag = () => {


@@ -276,7 +276,7 @@ function CardView<T extends BaseCardDataType>(props: CardViewProps<T>) {
{formatDateTime(item?.updatedAt)}
</div>
</div>
{operations && ops(item).length > 0 && (
<ActionDropdown
actions={ops(item)}
onAction={(key) => {


@@ -22,44 +22,51 @@ interface OperationItem {
danger?: boolean;
}
interface TagConfig {
showAdd: boolean;
tags: { id: number; name: string; color: string }[];
onFetchTags?: () => Promise<{
data: { id: number; name: string; color: string }[];
}>;
onAddTag?: (tag: { id: number; name: string; color: string }) => void;
onCreateAndTag?: (tagName: string) => void;
}
interface DetailHeaderProps<T> {
data: T;
statistics: StatisticItem[];
operations: OperationItem[];
tagConfig?: TagConfig;
}
function DetailHeader<T>({
data = {} as T,
statistics,
operations,
tagConfig,
}: DetailHeaderProps<T>): React.ReactNode {
interface TagConfig {
showAdd: boolean;
tags: { id?: string | number; name: string; color?: string }[];
onFetchTags?: () => Promise<{ id?: string | number; name: string; color?: string }[]>;
onAddTag?: (tagName: string) => void;
onCreateAndTag?: (tagName: string) => void;
}
interface DetailHeaderData {
name?: string;
description?: string;
status?: { color?: string; icon?: React.ReactNode; label?: string };
tags?: { id?: string | number; name?: string }[];
icon?: React.ReactNode;
iconColor?: string;
}
interface DetailHeaderProps<T extends DetailHeaderData> {
data: T;
statistics: StatisticItem[];
operations: OperationItem[];
tagConfig?: TagConfig;
}
function DetailHeader<T extends DetailHeaderData>({
data = {} as T,
statistics,
operations,
tagConfig,
}: DetailHeaderProps<T>): React.ReactNode {
return (
<Card>
<div className="flex items-start justify-between">
<div className="flex items-start gap-4 flex-1">
<div
className={`w-16 h-16 text-white rounded-lg flex-center shadow-lg ${
(data as any)?.iconColor
? ""
: "bg-gradient-to-br from-sky-300 to-blue-500 text-white"
}`}
style={(data as any)?.iconColor ? { backgroundColor: (data as any).iconColor } : undefined}
>
{<div className="w-[2.8rem] h-[2.8rem] text-gray-50">{(data as any)?.icon}</div> || (
<Database className="w-8 h-8 text-white" />
)}
</div>
data?.iconColor
? ""
: "bg-gradient-to-br from-sky-300 to-blue-500 text-white"
}`}
style={data?.iconColor ? { backgroundColor: data.iconColor } : undefined}
>
{<div className="w-[2.8rem] h-[2.8rem] text-gray-50">{data?.icon}</div> || (
<Database className="w-8 h-8 text-white" />
)}
</div>
<div className="flex-1">
<div className="flex items-center gap-3 mb-2">
<h1 className="text-lg font-bold text-gray-900">{data?.name}</h1>
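The `T extends DetailHeaderData` constraint is what lets the component body drop the `(data as any)` casts: every field it reads is now declared on the constraint. A tiny sketch of the same idea (the helper name is illustrative, not from the diff):

```typescript
interface DetailHeaderData {
  name?: string;
  iconColor?: string;
}

// With the generic constraint, fields can be read directly and type-safely
// instead of through `(data as any)`.
function headerStyle<T extends DetailHeaderData>(data: T): { backgroundColor?: string } {
  return data.iconColor ? { backgroundColor: data.iconColor } : {};
}
```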

View File

@@ -0,0 +1,21 @@
import React from 'react';
import { Navigate, useLocation, Outlet } from 'react-router';
import { useAppSelector } from '@/store/hooks';
interface ProtectedRouteProps {
children?: React.ReactNode;
}
const ProtectedRoute: React.FC<ProtectedRouteProps> = ({ children }) => {
const { isAuthenticated } = useAppSelector((state) => state.auth);
const location = useLocation();
if (!isAuthenticated) {
// Redirect to the login page, but save the current location they were trying to go to
return <Navigate to="/login" state={{ from: location }} replace />;
}
return children ? <>{children}</> : <Outlet />;
};
export default ProtectedRoute;
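`ProtectedRoute` stashes the blocked location in the redirect's `state` so the login page can send the user back afterwards. A hypothetical helper (not part of the diff) showing how that state would typically be consumed on the login side:

```typescript
// Hypothetical consumer of the `state={{ from: location }}` that
// ProtectedRoute attaches to its <Navigate> redirect.
interface RedirectState {
  from?: { pathname?: string };
}

function resolvePostLoginRedirect(state: RedirectState | null | undefined): string {
  return state?.from?.pathname ?? "/";
}
```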

View File

@@ -4,6 +4,7 @@ const TopLoadingBar = () => {
const [isVisible, setIsVisible] = useState(false);
const [progress, setProgress] = useState(0);
const intervalRef = useRef(null);
const timeoutRef = useRef(null);
useEffect(() => {
// Listen for global events
@@ -33,8 +34,13 @@ const TopLoadingBar = () => {
clearInterval(intervalRef.current);
intervalRef.current = null;
}
// Clear the previous timeout
if (timeoutRef.current) {
clearTimeout(timeoutRef.current);
timeoutRef.current = null;
}
setProgress(100);
setTimeout(() => {
timeoutRef.current = setTimeout(() => {
setIsVisible(false);
setProgress(0);
}, 300);
@@ -49,6 +55,9 @@ const TopLoadingBar = () => {
if (intervalRef.current) {
clearInterval(intervalRef.current);
}
if (timeoutRef.current) {
clearTimeout(timeoutRef.current);
}
window.removeEventListener("loading:show", handleShow);
window.removeEventListener("loading:hide", handleHide);
};
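The fix above boils down to one rule: cancel the pending hide timeout before scheduling a new one, and again on unmount. A sketch of that bookkeeping with injectable timer functions so the logic can be exercised synchronously:

```typescript
type TimerId = number;

// Sketch of the hide-timer handling the diff adds: the previous timeout is
// always cancelled before a new one is scheduled.
function makeHideScheduler(
  schedule: (fn: () => void, ms: number) => TimerId,
  cancel: (id: TimerId) => void
) {
  let pending: TimerId | null = null;
  return {
    hideAfter(ms: number, onHide: () => void) {
      if (pending !== null) cancel(pending); // clear the stale timeout
      pending = schedule(() => {
        pending = null;
        onHide();
      }, ms);
    },
    dispose() {
      // mirrors the cleanup in the effect's return function
      if (pending !== null) cancel(pending);
      pending = null;
    },
  };
}
```

Without the cancel step, a rapid show/hide sequence could leave an older timeout firing after a newer one was scheduled, hiding the bar too early.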

View File

@@ -22,11 +22,10 @@ interface DatasetFileTransferProps
onDatasetSelect?: (dataset: Dataset | null) => void;
datasetTypeFilter?: DatasetType;
hasAnnotationFilter?: boolean;
/**
* Whether to exclude source documents (PDF/DOC/DOCX) that have already been converted to TXT.
* Defaults to true; enabled automatically when datasetTypeFilter is TEXT.
*/
excludeSourceDocuments?: boolean;
/**
* Whether to exclude source documents (PDF/DOC/DOCX/XLS/XLSX); enabled by default for text annotation.
*/
excludeSourceDocuments?: boolean;
}
const fileCols = [

View File

@@ -1,198 +1,348 @@
import { TaskItem } from "@/pages/DataManagement/dataset.model";
import { calculateSHA256, checkIsFilesExist } from "@/utils/file.util";
import { App } from "antd";
import { useRef, useState } from "react";
export function useFileSliceUpload(
{
preUpload,
uploadChunk,
cancelUpload,
}: {
preUpload: (id: string, params: any) => Promise<{ data: number }>;
uploadChunk: (id: string, formData: FormData, config: any) => Promise<any>;
cancelUpload: ((reqId: number) => Promise<any>) | null;
},
showTaskCenter = true // whether to show the task center while uploading
) {
const { message } = App.useApp();
const [taskList, setTaskList] = useState<TaskItem[]>([]);
const taskListRef = useRef<TaskItem[]>([]); // keeps the task order stable
const createTask = (detail: any = {}) => {
const { dataset } = detail;
const title = `上传数据集: ${dataset.name} `;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent: detail.updateEvent,
hasArchive: detail.hasArchive,
prefix: detail.prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
return task;
};
const updateTaskList = (task: TaskItem) => {
taskListRef.current = taskListRef.current.map((item) =>
item.key === task.key ? task : item
);
setTaskList(taskListRef.current);
};
const removeTask = (task: TaskItem) => {
const { key } = task;
taskListRef.current = taskListRef.current.filter(
(item) => item.key !== key
);
setTaskList(taskListRef.current);
if (task.isCancel && task.cancelFn) {
task.cancelFn();
}
if (task.updateEvent) {
// Carry the prefix so the view stays in the current directory after refresh
window.dispatchEvent(
new CustomEvent(task.updateEvent, {
detail: { prefix: (task as any).prefix },
})
);
}
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: false } })
);
}
};
async function buildFormData({ file, reqId, i, j }) {
const formData = new FormData();
const { slices, name, size } = file;
const checkSum = await calculateSHA256(slices[j]);
formData.append("file", slices[j]);
formData.append("reqId", reqId.toString());
formData.append("fileNo", (i + 1).toString());
formData.append("chunkNo", (j + 1).toString());
formData.append("fileName", name);
formData.append("fileSize", size.toString());
formData.append("totalChunkNum", slices.length.toString());
formData.append("checkSumHex", checkSum);
return formData;
}
async function uploadSlice(task: TaskItem, fileInfo) {
if (!task) {
return;
}
const { reqId, key } = task;
const { loaded, i, j, files, totalSize } = fileInfo;
const formData = await buildFormData({
file: files[i],
i,
j,
reqId,
});
let newTask = { ...task };
await uploadChunk(key, formData, {
onUploadProgress: (e) => {
const loadedSize = loaded + e.loaded;
const curPercent = Number((loadedSize / totalSize) * 100).toFixed(2);
newTask = {
...newTask,
...taskListRef.current.find((item) => item.key === key),
size: loadedSize,
percent: curPercent >= 100 ? 99.99 : curPercent,
};
updateTaskList(newTask);
},
});
}
async function uploadFile({ task, files, totalSize }) {
console.log('[useSliceUpload] Calling preUpload with prefix:', task.prefix);
const { data: reqId } = await preUpload(task.key, {
totalFileNum: files.length,
totalSize,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log('[useSliceUpload] PreUpload response reqId:', reqId);
const newTask: TaskItem = {
...task,
reqId,
isCancel: false,
cancelFn: () => {
task.controller.abort();
cancelUpload?.(reqId);
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
},
};
updateTaskList(newTask);
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
// Refresh the data state
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
let loaded = 0;
for (let i = 0; i < files.length; i++) {
const { slices } = files[i];
for (let j = 0; j < slices.length; j++) {
await uploadSlice(newTask, {
loaded,
i,
j,
files,
totalSize,
});
loaded += slices[j].size;
}
}
removeTask(newTask);
}
const handleUpload = async ({ task, files }) => {
const isErrorFile = await checkIsFilesExist(files);
if (isErrorFile) {
message.error("文件被修改或删除,请重新选择文件上传");
removeTask({
...task,
isCancel: false,
...taskListRef.current.find((item) => item.key === task.key),
});
return;
}
try {
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
await uploadFile({ task, files, totalSize });
} catch (err) {
console.error(err);
message.error("文件上传失败,请稍后重试");
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
return {
taskList,
createTask,
removeTask,
handleUpload,
};
}
import { TaskItem } from "@/pages/DataManagement/dataset.model";
import { calculateSHA256, checkIsFilesExist, streamSplitAndUpload, StreamUploadResult } from "@/utils/file.util";
import { App } from "antd";
import { useRef, useState } from "react";
export function useFileSliceUpload(
{
preUpload,
uploadChunk,
cancelUpload,
}: {
preUpload: (id: string, params: Record<string, unknown>) => Promise<{ data: number }>;
uploadChunk: (id: string, formData: FormData, config: Record<string, unknown>) => Promise<unknown>;
cancelUpload: ((reqId: number) => Promise<unknown>) | null;
},
showTaskCenter = true, // whether to show the task center while uploading
enableStreamUpload = true // whether to enable streaming split upload
) {
const { message } = App.useApp();
const [taskList, setTaskList] = useState<TaskItem[]>([]);
const taskListRef = useRef<TaskItem[]>([]); // keeps the task order stable
const createTask = (detail: Record<string, unknown> = {}) => {
const { dataset } = detail;
const title = `上传数据集: ${dataset.name} `;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent: detail.updateEvent,
hasArchive: detail.hasArchive,
prefix: detail.prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
// Show the task center immediately so the user can see the upload has started
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
return task;
};
const updateTaskList = (task: TaskItem) => {
taskListRef.current = taskListRef.current.map((item) =>
item.key === task.key ? task : item
);
setTaskList(taskListRef.current);
};
const removeTask = (task: TaskItem) => {
const { key } = task;
taskListRef.current = taskListRef.current.filter(
(item) => item.key !== key
);
setTaskList(taskListRef.current);
if (task.isCancel && task.cancelFn) {
task.cancelFn();
}
if (task.updateEvent) {
// Carry the prefix so the view stays in the current directory after refresh
window.dispatchEvent(
new CustomEvent(task.updateEvent, {
detail: { prefix: task.prefix },
})
);
}
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: false } })
);
}
};
async function buildFormData({ file, reqId, i, j }: { file: { slices: Blob[]; name: string; size: number }; reqId: number; i: number; j: number }) {
const formData = new FormData();
const { slices, name, size } = file;
const checkSum = await calculateSHA256(slices[j]);
formData.append("file", slices[j]);
formData.append("reqId", reqId.toString());
formData.append("fileNo", (i + 1).toString());
formData.append("chunkNo", (j + 1).toString());
formData.append("fileName", name);
formData.append("fileSize", size.toString());
formData.append("totalChunkNum", slices.length.toString());
formData.append("checkSumHex", checkSum);
return formData;
}
async function uploadSlice(task: TaskItem, fileInfo: { loaded: number; i: number; j: number; files: { slices: Blob[]; name: string; size: number }[]; totalSize: number }) {
if (!task) {
return;
}
const { reqId, key } = task;
const { loaded, i, j, files, totalSize } = fileInfo;
const formData = await buildFormData({
file: files[i],
i,
j,
reqId,
});
let newTask = { ...task };
await uploadChunk(key, formData, {
onUploadProgress: (e) => {
const loadedSize = loaded + e.loaded;
const curPercent = Number((loadedSize / totalSize) * 100).toFixed(2);
newTask = {
...newTask,
...taskListRef.current.find((item) => item.key === key),
size: loadedSize,
percent: curPercent >= 100 ? 99.99 : curPercent,
};
updateTaskList(newTask);
},
});
}
async function uploadFile({ task, files, totalSize }: { task: TaskItem; files: { slices: Blob[]; name: string; size: number; originFile: Blob }[]; totalSize: number }) {
console.log('[useSliceUpload] Calling preUpload with prefix:', task.prefix);
const { data: reqId } = await preUpload(task.key, {
totalFileNum: files.length,
totalSize,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log('[useSliceUpload] PreUpload response reqId:', reqId);
const newTask: TaskItem = {
...task,
reqId,
isCancel: false,
cancelFn: () => {
task.controller.abort();
cancelUpload?.(reqId);
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
},
};
updateTaskList(newTask);
// Note: the show:task-popover event is already dispatched in createTask, so it is not fired again here
// Refresh the data state
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
let loaded = 0;
for (let i = 0; i < files.length; i++) {
const { slices } = files[i];
for (let j = 0; j < slices.length; j++) {
await uploadSlice(newTask, {
loaded,
i,
j,
files,
totalSize,
});
loaded += slices[j].size;
}
}
removeTask(newTask);
}
const handleUpload = async ({ task, files }: { task: TaskItem; files: { slices: Blob[]; name: string; size: number; originFile: Blob }[] }) => {
const isErrorFile = await checkIsFilesExist(files);
if (isErrorFile) {
message.error("文件被修改或删除,请重新选择文件上传");
removeTask({
...task,
isCancel: false,
...taskListRef.current.find((item) => item.key === task.key),
});
return;
}
try {
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
await uploadFile({ task, files, totalSize });
} catch (err) {
console.error(err);
message.error("文件上传失败,请稍后重试");
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
/**
* Streaming split-upload handler.
* Used when a large file is split by line and each piece is uploaded immediately.
*/
const handleStreamUpload = async ({ task, files }: { task: TaskItem; files: File[] }) => {
try {
console.log('[useSliceUpload] Starting stream upload for', files.length, 'files');
// Pre-upload to obtain the reqId
const totalSize = files.reduce((acc, file) => acc + file.size, 0);
const { data: reqId } = await preUpload(task.key, {
totalFileNum: files.length,
totalSize,
datasetId: task.key,
hasArchive: task.hasArchive,
prefix: task.prefix,
});
console.log('[useSliceUpload] Stream upload preUpload response reqId:', reqId);
const newTask: TaskItem = {
...task,
reqId,
isCancel: false,
cancelFn: () => {
task.controller.abort();
cancelUpload?.(reqId);
if (task.updateEvent) window.dispatchEvent(new Event(task.updateEvent));
},
};
updateTaskList(newTask);
let totalUploadedLines = 0;
let totalProcessedBytes = 0;
const results: StreamUploadResult[] = [];
// Process the files one by one
for (let i = 0; i < files.length; i++) {
const file = files[i];
console.log(`[useSliceUpload] Processing file ${i + 1}/${files.length}: ${file.name}`);
const result = await streamSplitAndUpload(
file,
(formData, config) => uploadChunk(task.key, formData, config),
(currentBytes, totalBytes, uploadedLines) => {
// Update progress
const overallBytes = totalProcessedBytes + currentBytes;
const curPercent = Number((overallBytes / totalSize) * 100).toFixed(2);
const updatedTask: TaskItem = {
...newTask,
...taskListRef.current.find((item) => item.key === task.key),
size: overallBytes,
percent: curPercent >= 100 ? 99.99 : curPercent,
streamUploadInfo: {
currentFile: file.name,
fileIndex: i + 1,
totalFiles: files.length,
uploadedLines: totalUploadedLines + uploadedLines,
},
};
updateTaskList(updatedTask);
},
1024 * 1024, // 1MB chunk size
{
reqId,
hasArchive: task.hasArchive,
prefix: task.prefix,
signal: task.controller.signal,
maxConcurrency: 3,
}
);
results.push(result);
totalUploadedLines += result.uploadedCount;
totalProcessedBytes += file.size;
console.log(`[useSliceUpload] File ${file.name} processed, uploaded ${result.uploadedCount} lines`);
}
console.log('[useSliceUpload] Stream upload completed, total lines:', totalUploadedLines);
removeTask(newTask);
message.success(`成功上传 ${totalUploadedLines} 个文件(按行分割)`);
} catch (err) {
console.error('[useSliceUpload] Stream upload error:', err);
if (err.message === "Upload cancelled") {
message.info("上传已取消");
} else {
message.error("文件上传失败,请稍后重试");
}
removeTask({
...task,
isCancel: true,
...taskListRef.current.find((item) => item.key === task.key),
});
}
};
/**
* Registers the streaming-upload event listener.
* Returns an unregister function.
*/
const registerStreamUploadListener = () => {
if (!enableStreamUpload) return () => {};
const streamUploadHandler = async (e: Event) => {
const customEvent = e as CustomEvent;
const { dataset, files, updateEvent, hasArchive, prefix } = customEvent.detail;
const controller = new AbortController();
const task: TaskItem = {
key: dataset.id,
title: `上传数据集: ${dataset.name} (按行分割)`,
percent: 0,
reqId: -1,
controller,
size: 0,
updateEvent,
hasArchive,
prefix,
};
taskListRef.current = [task, ...taskListRef.current];
setTaskList(taskListRef.current);
// Show the task center
if (showTaskCenter) {
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
}
await handleStreamUpload({ task, files });
};
window.addEventListener("upload:dataset-stream", streamUploadHandler);
return () => {
window.removeEventListener("upload:dataset-stream", streamUploadHandler);
};
};
return {
taskList,
createTask,
removeTask,
handleUpload,
handleStreamUpload,
registerStreamUploadListener,
};
}
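The commit message describes `streamSplitAndUpload` as reading chunks via `Blob.slice`, emitting complete lines, and keeping only the unfinished tail in a buffer. A standalone sketch of that line-buffering step (assumed behavior from the commit message, not the actual `file.util.ts` implementation):

```typescript
// Each pushed chunk is appended to a carry-over buffer; complete lines are
// returned, and only the incomplete tail is retained in memory.
// Dropping empty lines is an assumption, not confirmed by the diff.
function makeLineSplitter() {
  let buffer = "";
  return {
    push(chunk: string): string[] {
      buffer += chunk;
      const parts = buffer.split("\n");
      buffer = parts.pop() ?? ""; // keep the incomplete last line
      return parts.filter((line) => line.length > 0);
    },
    flush(): string[] {
      // emit whatever remains once the file ends without a trailing newline
      const rest = buffer;
      buffer = "";
      return rest.length > 0 ? [rest] : [];
    },
  };
}
```

This is why memory stays bounded: at any point only the current chunk plus one partial line is held, never the whole file.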

View File

@@ -151,6 +151,15 @@ export default function AgentPage() {
const [isTyping, setIsTyping] = useState(false);
const messagesEndRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<any>(null);
const timeoutRef = useRef<NodeJS.Timeout | null>(null);
useEffect(() => {
return () => {
if (timeoutRef.current) {
clearTimeout(timeoutRef.current);
}
};
}, []);
const scrollToBottom = () => {
messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
@@ -174,8 +183,13 @@ export default function AgentPage() {
setInputValue("");
setIsTyping(true);
// Clear the previous timeout
if (timeoutRef.current) {
clearTimeout(timeoutRef.current);
}
// Simulate the AI response
setTimeout(() => {
timeoutRef.current = setTimeout(() => {
const response = generateResponse(content);
const assistantMessage: Message = {
id: (Date.now() + 1).toString(),

View File

@@ -3,7 +3,9 @@
* Loads an external page via an iframe.
*/
export default function ContentGenerationPage() {
const iframeUrl = "http://192.168.0.8:3000";
const iframeUrl = "/api#/meeting";
window.localStorage.setItem("geeker-user", '{"token":"123","userInfo":{"name":"xteam"},"loginFrom":null,"loginData":null}');
return (
<div className="h-full w-full flex flex-col">
@@ -16,6 +18,11 @@ export default function ContentGenerationPage() {
className="w-full h-full border-0"
title="内容生成"
sandbox="allow-same-origin allow-scripts allow-popups allow-forms allow-downloads"
style={{marginLeft: "-220px",
marginTop: "-66px",
width: "calc(100% + 233px)",
height: "calc(100% + 108px)"
}}
/>
</div>
</div>

View File

@@ -9,6 +9,7 @@ import {
listEditorTasksUsingGet,
upsertEditorAnnotationUsingPut,
} from "../annotation.api";
import { AnnotationResultStatus } from "../annotation.model";
type EditorProjectInfo = {
projectId: string;
@@ -26,6 +27,8 @@ type EditorTaskListItem = {
fileType?: string | null;
hasAnnotation: boolean;
annotationUpdatedAt?: string | null;
annotationStatus?: AnnotationResultStatus | null;
segmentStats?: SegmentStats;
};
type LsfMessage = {
@@ -43,6 +46,11 @@ type SegmentInfo = {
chunkIndex: number;
};
type SegmentStats = {
done: number;
total: number;
};
type ApiResponse<T> = {
code?: number;
message?: string;
@@ -88,6 +96,13 @@ type SwitchDecision = "save" | "discard" | "cancel";
const LSF_IFRAME_SRC = "/lsf/lsf.html";
const TASK_PAGE_START = 0;
const TASK_PAGE_SIZE = 200;
const NO_ANNOTATION_LABEL = "无标注";
const NOT_APPLICABLE_LABEL = "不适用";
const NO_ANNOTATION_CONFIRM_TITLE = "没有标注任何内容";
const NO_ANNOTATION_CONFIRM_OK_TEXT = "设为无标注并保存";
const NOT_APPLICABLE_CONFIRM_TEXT = "设为不适用并保存";
const NO_ANNOTATION_CONFIRM_CANCEL_TEXT = "继续标注";
const SAVE_AND_NEXT_LABEL = "保存并跳转到下一段/下一条";
type NormalizedTaskList = {
items: EditorTaskListItem[];
@@ -103,6 +118,17 @@ const resolveSegmentIndex = (value: unknown) => {
return Number.isFinite(parsed) ? parsed : undefined;
};
const isSaveShortcut = (event: KeyboardEvent) => {
if (event.defaultPrevented || event.isComposing) return false;
const key = event.key;
const code = event.code;
const isS = key === "s" || key === "S" || code === "KeyS";
if (!isS) return false;
if (!(event.ctrlKey || event.metaKey)) return false;
if (event.shiftKey || event.altKey) return false;
return true;
};
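The predicate above can be checked against plain objects shaped like `KeyboardEvent`; a minimal mirror of it:

```typescript
interface KeyEventLike {
  key: string;
  code: string;
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
  defaultPrevented: boolean;
  isComposing: boolean;
}

// Same decision order as the isSaveShortcut above: ignore handled or
// IME-composing events, require S, require Ctrl or Cmd, reject modifiers
// that would make it a different shortcut.
function isSaveShortcut(event: KeyEventLike): boolean {
  if (event.defaultPrevented || event.isComposing) return false;
  const isS = event.key === "s" || event.key === "S" || event.code === "KeyS";
  if (!isS) return false;
  if (!(event.ctrlKey || event.metaKey)) return false;
  return !(event.shiftKey || event.altKey);
}
```

Matching on `event.code === "KeyS"` in addition to `key` keeps the shortcut working on keyboard layouts where the physical S key produces a different character.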
const normalizePayload = (payload: unknown): ExportPayload | undefined => {
if (!payload || typeof payload !== "object") return undefined;
return payload as ExportPayload;
@@ -119,6 +145,40 @@ const resolvePayloadMessage = (payload: unknown) => {
const isRecord = (value: unknown): value is Record<string, unknown> =>
!!value && typeof value === "object" && !Array.isArray(value);
const isAnnotationResultEmpty = (annotation?: Record<string, unknown>) => {
if (!annotation) return true;
if (!("result" in annotation)) return true;
const result = (annotation as { result?: unknown }).result;
if (!Array.isArray(result)) return false;
return result.length === 0;
};
const resolveTaskStatusMeta = (item: EditorTaskListItem) => {
const segmentSummary = resolveSegmentSummary(item);
if (segmentSummary) {
if (segmentSummary.done >= segmentSummary.total) {
return { text: "已标注", type: "success" as const };
}
if (segmentSummary.done > 0) {
return { text: "标注中", type: "warning" as const };
}
return { text: "未标注", type: "secondary" as const };
}
if (!item.hasAnnotation) {
return { text: "未标注", type: "secondary" as const };
}
if (item.annotationStatus === AnnotationResultStatus.NO_ANNOTATION) {
return { text: NO_ANNOTATION_LABEL, type: "warning" as const };
}
if (item.annotationStatus === AnnotationResultStatus.NOT_APPLICABLE) {
return { text: NOT_APPLICABLE_LABEL, type: "warning" as const };
}
if (item.annotationStatus === AnnotationResultStatus.IN_PROGRESS) {
return { text: "标注中", type: "warning" as const };
}
return { text: "已标注", type: "success" as const };
};
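Note the precedence in `resolveTaskStatusMeta`: segment progress, when available, wins over the file-level `hasAnnotation` flag and the per-file status. A condensed version of that decision order:

```typescript
type StatusText = "未标注" | "标注中" | "已标注";

// Decision order mirrored from resolveTaskStatusMeta: segment progress
// first, then the file-level hasAnnotation flag as fallback.
function statusText(
  summary: { done: number; total: number } | null,
  hasAnnotation: boolean
): StatusText {
  if (summary) {
    if (summary.done >= summary.total) return "已标注";
    return summary.done > 0 ? "标注中" : "未标注";
  }
  return hasAnnotation ? "已标注" : "未标注";
}
```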
const normalizeSnapshotValue = (value: unknown, seen: WeakSet<object>): unknown => {
if (!value || typeof value !== "object") return value;
const obj = value as object;
@@ -144,6 +204,7 @@ const stableStringify = (value: unknown) => {
const buildAnnotationSnapshot = (annotation?: Record<string, unknown>) => {
if (!annotation) return "";
if (isAnnotationResultEmpty(annotation)) return "";
const cleaned: Record<string, unknown> = { ...annotation };
delete cleaned.updated_at;
delete cleaned.updatedAt;
@@ -155,6 +216,25 @@ const buildAnnotationSnapshot = (annotation?: Record<string, unknown>) => {
const buildSnapshotKey = (fileId: string, segmentIndex?: number) =>
`${fileId}::${segmentIndex ?? "full"}`;
const buildSegmentStats = (segmentList?: SegmentInfo[] | null): SegmentStats | null => {
if (!Array.isArray(segmentList) || segmentList.length === 0) return null;
const total = segmentList.length;
const done = segmentList.reduce((count, seg) => count + (seg.hasAnnotation ? 1 : 0), 0);
return { done, total };
};
const normalizeSegmentStats = (stats?: SegmentStats | null): SegmentStats | null => {
if (!stats) return null;
const total = Number(stats.total);
const done = Number(stats.done);
if (!Number.isFinite(total) || total <= 0) return null;
const safeDone = Math.min(Math.max(done, 0), total);
return { done: safeDone, total };
};
const resolveSegmentSummary = (item: EditorTaskListItem) =>
normalizeSegmentStats(item.segmentStats);
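The helpers above defend against stale or malformed stats: `normalizeSegmentStats` clamps `done` into `[0, total]` and rejects non-positive totals, so the UI can never show more than 100% progress. The same function in isolation:

```typescript
interface SegmentStats { done: number; total: number; }

function normalizeSegmentStats(stats?: SegmentStats | null): SegmentStats | null {
  if (!stats) return null;
  const total = Number(stats.total);
  const done = Number(stats.done);
  if (!Number.isFinite(total) || total <= 0) return null;
  // Clamp done into [0, total] so a stale cache can never report > 100%.
  return { done: Math.min(Math.max(done, 0), total), total };
}
```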
const mergeTaskItems = (base: EditorTaskListItem[], next: EditorTaskListItem[]) => {
if (next.length === 0) return base;
const seen = new Set(base.map((item) => item.fileId));
@@ -205,6 +285,9 @@ export default function LabelStudioTextEditor() {
const exportCheckSeqRef = useRef(0);
const savedSnapshotsRef = useRef<Record<string, string>>({});
const pendingAutoAdvanceRef = useRef(false);
const segmentStatsCacheRef = useRef<Record<string, SegmentStats>>({});
const segmentStatsSeqRef = useRef(0);
const segmentStatsLoadingRef = useRef<Set<string>>(new Set());
const [loadingProject, setLoadingProject] = useState(true);
const [loadingTasks, setLoadingTasks] = useState(false);
@@ -247,6 +330,100 @@ export default function LabelStudioTextEditor() {
win.postMessage({ type, payload }, origin);
}, [origin]);
const applySegmentStats = useCallback((fileId: string, stats: SegmentStats | null) => {
if (!fileId) return;
const normalized = normalizeSegmentStats(stats);
setTasks((prev) =>
prev.map((item) =>
item.fileId === fileId
? { ...item, segmentStats: normalized || undefined }
: item
)
);
}, []);
const updateSegmentStatsCache = useCallback((fileId: string, stats: SegmentStats | null) => {
if (!fileId) return;
const normalized = normalizeSegmentStats(stats);
if (normalized) {
segmentStatsCacheRef.current[fileId] = normalized;
} else {
delete segmentStatsCacheRef.current[fileId];
}
applySegmentStats(fileId, normalized);
}, [applySegmentStats]);
const fetchSegmentStatsForFile = useCallback(async (fileId: string, seq: number) => {
if (!projectId || !fileId) return;
if (segmentStatsCacheRef.current[fileId] || segmentStatsLoadingRef.current.has(fileId)) return;
segmentStatsLoadingRef.current.add(fileId);
try {
const resp = (await getEditorTaskUsingGet(projectId, fileId, {
segmentIndex: 0,
})) as ApiResponse<EditorTaskResponse>;
if (segmentStatsSeqRef.current !== seq) return;
const data = resp?.data;
if (!data?.segmented) return;
const stats = buildSegmentStats(data.segments);
if (!stats) return;
segmentStatsCacheRef.current[fileId] = stats;
applySegmentStats(fileId, stats);
} catch (e) {
console.error(e);
} finally {
segmentStatsLoadingRef.current.delete(fileId);
}
}, [applySegmentStats, projectId]);
const prefetchSegmentStats = useCallback((items: EditorTaskListItem[]) => {
if (!projectId) return;
const fileIds = items
.map((item) => item.fileId)
.filter((fileId) => fileId && !segmentStatsCacheRef.current[fileId]);
if (fileIds.length === 0) return;
const seq = segmentStatsSeqRef.current;
let cursor = 0;
const workerCount = Math.min(3, fileIds.length);
const runWorker = async () => {
while (cursor < fileIds.length && segmentStatsSeqRef.current === seq) {
const fileId = fileIds[cursor];
cursor += 1;
await fetchSegmentStatsForFile(fileId, seq);
}
};
void Promise.all(Array.from({ length: workerCount }, () => runWorker()));
}, [fetchSegmentStatsForFile, projectId]);
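The prefetch above fans out at most three workers that pull file ids from a shared cursor; because JavaScript is single-threaded, incrementing the cursor before the `await` is enough to guarantee no id is claimed twice. The same pattern in isolation:

```typescript
// Run `worker` over `items` with at most `limit` concurrent invocations,
// using the shared-cursor pattern from prefetchSegmentStats.
async function runWithLimit<T>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<void>
): Promise<void> {
  let cursor = 0;
  const run = async () => {
    while (cursor < items.length) {
      const item = items[cursor];
      cursor += 1; // claim the item before awaiting, so workers never overlap
      await worker(item);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => run())
  );
}
```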
const confirmEmptyAnnotationStatus = useCallback(() => {
return new Promise<AnnotationResultStatus | null>((resolve) => {
let resolved = false;
let modalInstance: { destroy: () => void } | null = null;
const settle = (value: AnnotationResultStatus | null) => {
if (resolved) return;
resolved = true;
resolve(value);
if (modalInstance) modalInstance.destroy();
};
const handleNotApplicable = () => settle(AnnotationResultStatus.NOT_APPLICABLE);
modalInstance = modal.confirm({
title: NO_ANNOTATION_CONFIRM_TITLE,
content: (
<div className="flex flex-col gap-2">
<Typography.Text></Typography.Text>
<Typography.Text type="secondary"></Typography.Text>
<Button type="link" style={{ padding: 0, height: "auto" }} onClick={handleNotApplicable}>
{NOT_APPLICABLE_CONFIRM_TEXT}
</Button>
</div>
),
okText: NO_ANNOTATION_CONFIRM_OK_TEXT,
cancelText: NO_ANNOTATION_CONFIRM_CANCEL_TEXT,
onOk: () => settle(AnnotationResultStatus.NO_ANNOTATION),
onCancel: () => settle(null),
});
});
}, [modal]);
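`confirmEmptyAnnotationStatus` can be settled from three paths (OK, cancel, or the inline "not applicable" link), so it guards against resolving the promise twice and destroying the modal twice. That settle-once guard on its own:

```typescript
// Wraps a resolve callback so only the first call wins; later calls
// (e.g. onCancel firing after the inline link already settled) are ignored.
function makeSettleOnce<T>(resolve: (value: T) => void) {
  let resolved = false;
  return (value: T) => {
    if (resolved) return;
    resolved = true;
    resolve(value);
  };
}
```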
const loadProject = useCallback(async () => {
setLoadingProject(true);
try {
@@ -268,8 +445,13 @@ export default function LabelStudioTextEditor() {
}, [message, projectId]);
const updateTaskSelection = useCallback((items: EditorTaskListItem[]) => {
const isCompleted = (item: EditorTaskListItem) => {
const summary = resolveSegmentSummary(item);
if (summary) return summary.done >= summary.total;
return item.hasAnnotation;
};
const defaultFileId =
items.find((item) => !item.hasAnnotation)?.fileId || items[0]?.fileId || "";
items.find((item) => !isCompleted(item))?.fileId || items[0]?.fileId || "";
setSelectedFileId((prev) => {
if (prev && items.some((item) => item.fileId === prev)) return prev;
return defaultFileId;
@@ -326,6 +508,9 @@ export default function LabelStudioTextEditor() {
if (mode === "reset") {
prefetchSeqRef.current += 1;
setPrefetching(false);
segmentStatsSeqRef.current += 1;
segmentStatsCacheRef.current = {};
segmentStatsLoadingRef.current = new Set();
}
if (mode === "append") {
setLoadingMore(true);
@@ -410,13 +595,16 @@ export default function LabelStudioTextEditor() {
? resolveSegmentIndex(data.currentSegmentIndex) ?? 0
: undefined;
if (data?.segmented) {
const stats = buildSegmentStats(data.segments);
setSegmented(true);
setSegments(data.segments || []);
setCurrentSegmentIndex(segmentIndex ?? 0);
updateSegmentStatsCache(fileId, stats);
} else {
setSegmented(false);
setSegments([]);
setCurrentSegmentIndex(0);
updateSegmentStatsCache(fileId, null);
}
const taskData = {
@@ -476,7 +664,7 @@ export default function LabelStudioTextEditor() {
} finally {
if (seq === initSeqRef.current) setLoadingTaskDetail(false);
}
}, [iframeReady, message, postToIframe, project, projectId]);
}, [iframeReady, message, postToIframe, project, projectId, updateSegmentStatsCache]);
const advanceAfterSave = useCallback(async (fileId: string, segmentIndex?: number) => {
if (!fileId) return;
@@ -539,11 +727,31 @@ export default function LabelStudioTextEditor() {
? currentSegmentIndex
: undefined;
const annotationRecord = annotation as Record<string, unknown>;
const currentTask = tasks.find((item) => item.fileId === String(fileId));
const currentStatus = currentTask?.annotationStatus;
let resolvedStatus: AnnotationResultStatus;
if (isAnnotationResultEmpty(annotationRecord)) {
if (
currentStatus === AnnotationResultStatus.NO_ANNOTATION ||
currentStatus === AnnotationResultStatus.NOT_APPLICABLE
) {
resolvedStatus = currentStatus;
} else {
const selectedStatus = await confirmEmptyAnnotationStatus();
if (!selectedStatus) return false;
resolvedStatus = selectedStatus;
}
} else {
resolvedStatus = AnnotationResultStatus.ANNOTATED;
}
setSaving(true);
try {
const resp = (await upsertEditorAnnotationUsingPut(projectId, String(fileId), {
annotation,
segmentIndex,
annotationStatus: resolvedStatus,
})) as ApiResponse<UpsertAnnotationResponse>;
const updatedAt = resp?.data?.updatedAt;
message.success("标注已保存");
@@ -553,6 +761,7 @@ export default function LabelStudioTextEditor() {
? {
...item,
hasAnnotation: true,
annotationStatus: resolvedStatus,
annotationUpdatedAt: updatedAt || item.annotationUpdatedAt,
}
: item
@@ -565,13 +774,13 @@ export default function LabelStudioTextEditor() {
// In segmented mode, update the annotation status of the current segment
if (segmented && segmentIndex !== undefined) {
setSegments((prev) =>
prev.map((seg) =>
seg.idx === segmentIndex
? { ...seg, hasAnnotation: true }
: seg
)
const nextSegments = segments.map((seg) =>
seg.idx === segmentIndex
? { ...seg, hasAnnotation: true }
: seg
);
setSegments(nextSegments);
updateSegmentStatsCache(String(fileId), buildSegmentStats(nextSegments));
}
if (options?.autoAdvance) {
await advanceAfterSave(String(fileId), segmentIndex);
@@ -586,11 +795,15 @@ export default function LabelStudioTextEditor() {
}
}, [
advanceAfterSave,
confirmEmptyAnnotationStatus,
currentSegmentIndex,
message,
projectId,
segmented,
segments,
selectedFileId,
tasks,
updateSegmentStatsCache,
]);
const requestExportForCheck = useCallback(() => {
@@ -650,14 +863,27 @@ export default function LabelStudioTextEditor() {
});
}, [modal]);
const requestExport = () => {
const requestExport = useCallback((autoAdvance: boolean) => {
if (!selectedFileId) {
message.warning("请先选择文件");
return;
}
pendingAutoAdvanceRef.current = true;
pendingAutoAdvanceRef.current = autoAdvance;
postToIframe("LS_EXPORT", {});
};
}, [message, postToIframe, selectedFileId]);
useEffect(() => {
const handleSaveShortcut = (event: KeyboardEvent) => {
if (!isSaveShortcut(event) || event.repeat) return;
if (saving || loadingTaskDetail || segmentSwitching) return;
if (!iframeReady || !lsReady) return;
event.preventDefault();
event.stopPropagation();
requestExport(false);
};
window.addEventListener("keydown", handleSaveShortcut);
return () => window.removeEventListener("keydown", handleSaveShortcut);
}, [iframeReady, loadingTaskDetail, lsReady, requestExport, saving, segmentSwitching]);
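The keydown effect above gates on an `isSaveShortcut` helper that is defined outside this hunk. A minimal sketch of what such a predicate could look like — the exact key combination (Ctrl+S / Cmd+S) is an assumption, not confirmed by the diff:

```typescript
// Hypothetical sketch of the isSaveShortcut predicate used above:
// treat Ctrl+S (or Cmd+S on macOS) as the save shortcut. Only the
// properties the handler reads are modeled here.
function isSaveShortcut(event: {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
}): boolean {
  return (event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "s";
}
```

Note the handler additionally checks `event.repeat` and the editor-ready flags before calling `requestExport(false)`, so holding the key down does not fire repeated saves.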
// Handle segment switching
const handleSegmentChange = useCallback(async (newIndex: number) => {
@@ -754,6 +980,9 @@ export default function LabelStudioTextEditor() {
setSegments([]);
setCurrentSegmentIndex(0);
savedSnapshotsRef.current = {};
segmentStatsSeqRef.current += 1;
segmentStatsCacheRef.current = {};
segmentStatsLoadingRef.current = new Set();
if (exportCheckRef.current?.timer) {
window.clearTimeout(exportCheckRef.current.timer);
}
@@ -767,6 +996,12 @@ export default function LabelStudioTextEditor() {
loadTasks({ mode: "reset" });
}, [project?.supported, loadTasks]);
useEffect(() => {
if (!segmented) return;
if (tasks.length === 0) return;
prefetchSegmentStats(tasks);
}, [prefetchSegmentStats, segmented, tasks]);
useEffect(() => {
if (!selectedFileId) return;
initEditorForFile(selectedFileId);
@@ -826,6 +1061,15 @@ export default function LabelStudioTextEditor() {
[segmentTreeData]
);
const inProgressSegmentedCount = useMemo(() => {
if (tasks.length === 0) return 0;
return tasks.reduce((count, item) => {
const summary = resolveSegmentSummary(item);
if (!summary) return count;
return summary.done < summary.total ? count + 1 : count;
}, 0);
}, [tasks]);
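The `inProgressSegmentedCount` memo above reduces over per-file segment summaries. The counting rule in isolation — the `{ done, total }` summary shape is inferred from how `resolveSegmentSummary` results are consumed here:

```typescript
interface SegmentSummary {
  done: number;  // segments already annotated in this file
  total: number; // total segments in this file
}

// Count files whose segments are not all annotated yet.
// Files without a summary (non-segmented or stats not loaded) are skipped.
function countInProgress(summaries: Array<SegmentSummary | undefined>): number {
  return summaries.reduce((count, summary) => {
    if (!summary) return count;
    return summary.done < summary.total ? count + 1 : count;
  }, 0);
}
```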
const handleSegmentSelect = useCallback((keys: Array<string | number>) => {
const [first] = keys;
if (first === undefined || first === null) return;
@@ -865,6 +1109,12 @@ export default function LabelStudioTextEditor() {
return;
}
if (msg.type === "LS_SAVE_AND_NEXT") {
pendingAutoAdvanceRef.current = false;
saveFromExport(payload, { autoAdvance: true });
return;
}
if (msg.type === "LS_EXPORT_CHECK_RESULT") {
const pending = exportCheckRef.current;
if (!pending) return;
@@ -897,6 +1147,8 @@ export default function LabelStudioTextEditor() {
}, [message, origin, saveFromExport]);
const canLoadMore = taskTotalPages > 0 && taskPage + 1 < taskTotalPages;
const saveDisabled =
!iframeReady || !selectedFileId || saving || segmentSwitching || loadingTaskDetail;
const loadMoreNode = canLoadMore ? (
<div className="p-2 text-center">
<Button
@@ -960,7 +1212,7 @@ export default function LabelStudioTextEditor() {
return (
<div className="h-full flex flex-col">
{/* Top toolbar */}
<div className="flex items-center justify-between px-3 py-2 border-b border-gray-200 bg-white">
<div className="grid grid-cols-[1fr_auto_1fr] items-center px-3 py-2 border-b border-gray-200 bg-white">
<div className="flex items-center gap-2">
<Button icon={<LeftOutlined />} onClick={() => navigate("/data/annotation")}>
@@ -974,7 +1226,18 @@ export default function LabelStudioTextEditor() {
</Typography.Title>
</div>
<div className="flex items-center gap-2">
<div className="flex items-center justify-center">
<Button
type="primary"
icon={<SaveOutlined />}
loading={saving}
disabled={saveDisabled}
onClick={() => requestExport(true)}
>
{SAVE_AND_NEXT_LABEL}
</Button>
</div>
<div className="flex items-center gap-2 justify-end">
<Button
icon={<ReloadOutlined />}
loading={loadingTasks}
@@ -983,11 +1246,10 @@ export default function LabelStudioTextEditor() {
</Button>
<Button
type="primary"
icon={<SaveOutlined />}
loading={saving}
disabled={!iframeReady || !selectedFileId}
onClick={requestExport}
disabled={saveDisabled}
onClick={() => requestExport(false)}
>
</Button>
@@ -1001,8 +1263,13 @@ export default function LabelStudioTextEditor() {
className="border-r border-gray-200 bg-gray-50 flex flex-col transition-all duration-200 min-h-0"
style={{ width: sidebarCollapsed ? 0 : 240, overflow: "hidden" }}
>
<div className="px-3 py-2 border-b border-gray-200 bg-white font-medium text-sm">
<div className="px-3 py-2 border-b border-gray-200 bg-white font-medium text-sm flex items-center justify-between gap-2">
<span></span>
{segmented && (
<Tag color="orange" style={{ margin: 0 }}>
{inProgressSegmentedCount}
</Tag>
)}
</div>
<div className="flex-1 min-h-0 overflow-auto">
<List
@@ -1010,37 +1277,45 @@ export default function LabelStudioTextEditor() {
size="small"
dataSource={tasks}
loadMore={loadMoreNode}
renderItem={(item) => (
<List.Item
key={item.fileId}
className="cursor-pointer hover:bg-blue-50"
style={{
background: item.fileId === selectedFileId ? "#e6f4ff" : undefined,
padding: "8px 12px",
borderBottom: "1px solid #f0f0f0",
}}
onClick={() => setSelectedFileId(item.fileId)}
>
<div className="flex flex-col w-full gap-1">
<Typography.Text ellipsis style={{ fontSize: 13 }}>
{item.fileName}
</Typography.Text>
<div className="flex items-center justify-between">
<Typography.Text
type={item.hasAnnotation ? "success" : "secondary"}
style={{ fontSize: 11 }}
>
{item.hasAnnotation ? "已标注" : "未标注"}
</Typography.Text>
{item.annotationUpdatedAt && (
<Typography.Text type="secondary" style={{ fontSize: 10 }}>
{item.annotationUpdatedAt}
renderItem={(item) => {
const segmentSummary = resolveSegmentSummary(item);
const statusMeta = resolveTaskStatusMeta(item);
return (
<List.Item
key={item.fileId}
className="cursor-pointer hover:bg-blue-50"
style={{
background: item.fileId === selectedFileId ? "#e6f4ff" : undefined,
padding: "8px 12px",
borderBottom: "1px solid #f0f0f0",
}}
onClick={() => setSelectedFileId(item.fileId)}
>
<div className="flex flex-col w-full gap-1">
<Typography.Text ellipsis style={{ fontSize: 13 }}>
{item.fileName}
</Typography.Text>
)}
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<Typography.Text type={statusMeta.type} style={{ fontSize: 11 }}>
{statusMeta.text}
</Typography.Text>
{segmentSummary && (
<Typography.Text type="secondary" style={{ fontSize: 10 }}>
{segmentSummary.done}/{segmentSummary.total}
</Typography.Text>
)}
</div>
{item.annotationUpdatedAt && (
<Typography.Text type="secondary" style={{ fontSize: 10 }}>
{item.annotationUpdatedAt}
</Typography.Text>
)}
</div>
</div>
</div>
</List.Item>
)}
</List.Item>
);
}}
/>
</div>
{segmented && (

View File

@@ -6,6 +6,12 @@ import TextArea from "antd/es/input/TextArea";
import { useEffect, useMemo, useState } from "react";
import type { ReactNode } from "react";
import { Eye } from "lucide-react";
import {
PREVIEW_TEXT_MAX_LENGTH,
resolvePreviewFileType,
truncatePreviewText,
type PreviewFileType,
} from "@/utils/filePreview";
import {
createAnnotationTaskUsingPost,
getAnnotationTaskByIdUsingGet,
@@ -13,7 +19,8 @@ import {
queryAnnotationTemplatesUsingGet,
} from "../../annotation.api";
import { DatasetType, type Dataset } from "@/pages/DataManagement/dataset.model";
import { DataType, type AnnotationTemplate, type AnnotationTask } from "../../annotation.model";
import { DataType, type AnnotationTemplate } from "../../annotation.model";
import type { AnnotationTaskListItem } from "../../annotation.const";
import LabelStudioEmbed from "@/components/business/LabelStudioEmbed";
import TemplateConfigurationTreeEditor from "../../components/TemplateConfigurationTreeEditor";
import { useTagConfig } from "@/hooks/useTagConfig";
@@ -23,7 +30,7 @@ interface AnnotationTaskDialogProps {
onClose: () => void;
onRefresh: () => void;
/** Edit mode: the task data to edit */
editTask?: AnnotationTask | null;
editTask?: AnnotationTaskListItem | null;
}
type DatasetOption = Dataset & { icon?: ReactNode };
@@ -53,6 +60,8 @@ const isRecord = (value: unknown): value is Record<string, unknown> =>
!!value && typeof value === "object" && !Array.isArray(value);
const DEFAULT_SEGMENTATION_ENABLED = true;
const FILE_PREVIEW_MAX_HEIGHT = 500;
const PREVIEW_MODAL_WIDTH = "80vw";
const SEGMENTATION_OPTIONS = [
{ label: "需要切片段", value: true },
{ label: "不需要切片段", value: false },
@@ -116,7 +125,7 @@ export default function CreateAnnotationTask({
const [fileContent, setFileContent] = useState("");
const [fileContentLoading, setFileContentLoading] = useState(false);
const [previewFileName, setPreviewFileName] = useState("");
const [previewFileType, setPreviewFileType] = useState<"text" | "image" | "video" | "audio">("text");
const [previewFileType, setPreviewFileType] = useState<PreviewFileType>("text");
const [previewMediaUrl, setPreviewMediaUrl] = useState("");
// Task detail loading state (edit mode)
@@ -275,7 +284,7 @@ export default function CreateAnnotationTask({
}
setDatasetPreviewLoading(true);
try {
// For text datasets, exclude source document files (PDF/DOC/DOCX) already converted to TXT
// For text datasets, exclude source document files (PDF/DOC/DOCX/XLS/XLSX)
const params: { page: number; size: number; excludeSourceDocuments?: boolean } = { page: 0, size: 10 };
if (isTextDataset) {
params.excludeSourceDocuments = true;
@@ -297,57 +306,32 @@ export default function CreateAnnotationTask({
// Preview file content
const handlePreviewFileContent = async (file: DatasetPreviewFile) => {
const fileName = file.fileName?.toLowerCase() || '';
// File-type extension maps
const textExtensions = ['.json', '.jsonl', '.txt', '.csv', '.tsv', '.xml', '.md', '.yaml', '.yml'];
const imageExtensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.svg'];
const videoExtensions = ['.mp4', '.webm', '.ogg', '.mov', '.avi'];
const audioExtensions = ['.mp3', '.wav', '.ogg', '.aac', '.flac', '.m4a'];
const isTextFile = textExtensions.some(ext => fileName.endsWith(ext));
const isImageFile = imageExtensions.some(ext => fileName.endsWith(ext));
const isVideoFile = videoExtensions.some(ext => fileName.endsWith(ext));
const isAudioFile = audioExtensions.some(ext => fileName.endsWith(ext));
if (!isTextFile && !isImageFile && !isVideoFile && !isAudioFile) {
const fileType = resolvePreviewFileType(file.fileName);
if (!fileType) {
message.warning("不支持预览该文件类型");
return;
}
setFileContentLoading(true);
setPreviewFileName(file.fileName);
setPreviewFileType(fileType);
setFileContent("");
setPreviewMediaUrl("");
const fileUrl = `/api/data-management/datasets/${selectedDatasetId}/files/${file.id}/download`;
const previewUrl = `/api/data-management/datasets/${selectedDatasetId}/files/${file.id}/preview`;
try {
if (isTextFile) {
if (fileType === "text") {
// Text file: fetch the content
const response = await fetch(fileUrl);
const response = await fetch(previewUrl);
if (!response.ok) {
throw new Error('下载失败');
}
const text = await response.text();
// Limit the preview content length
const maxLength = 50000;
if (text.length > maxLength) {
setFileContent(text.substring(0, maxLength) + '\n\n... (内容过长,仅显示前 50000 字符)');
} else {
setFileContent(text);
}
setPreviewFileType("text");
} else if (isImageFile) {
// Image file: use the URL directly
setPreviewMediaUrl(fileUrl);
setPreviewFileType("image");
} else if (isVideoFile) {
// Video file: use the URL
setPreviewMediaUrl(fileUrl);
setPreviewFileType("video");
} else if (isAudioFile) {
// Audio file: use the URL
setPreviewMediaUrl(fileUrl);
setPreviewFileType("audio");
setFileContent(truncatePreviewText(text, PREVIEW_TEXT_MAX_LENGTH));
} else {
// Media/PDF files: use the preview URL directly
setPreviewMediaUrl(previewUrl);
}
setFileContentVisible(true);
} catch (error) {
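The refactor above replaces the inline 50 000-character truncation with `truncatePreviewText` / `PREVIEW_TEXT_MAX_LENGTH` from `@/utils/filePreview`, which is not part of this diff. A sketch consistent with the behavior the removed inline code had — the notice wording and the helper's exact signature are assumptions:

```typescript
const PREVIEW_TEXT_MAX_LENGTH = 50000;

// Truncate preview text, appending a notice when content was cut off.
// Mirrors the inline logic this refactor removed.
function truncatePreviewText(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  return (
    text.substring(0, maxLength) +
    `\n\n... (truncated to the first ${maxLength} characters)`
  );
}
```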
@@ -846,7 +830,7 @@ export default function CreateAnnotationTask({
open={showPreview}
onCancel={() => setShowPreview(false)}
title="标注界面预览"
width={1000}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => setShowPreview(false)}>
@@ -871,14 +855,14 @@ export default function CreateAnnotationTask({
open={datasetPreviewVisible}
onCancel={() => setDatasetPreviewVisible(false)}
title="数据集预览(前10条文件)"
width={700}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => setDatasetPreviewVisible(false)}>
</Button>
]}
>
<div className="mb-2 text-xs text-gray-500"></div>
<div className="mb-2 text-xs text-gray-500">PDF</div>
<Table
dataSource={datasetPreviewData}
columns={[
@@ -928,7 +912,7 @@ export default function CreateAnnotationTask({
setFileContent("");
}}
title={`文件预览:${previewFileName}`}
width={previewFileType === "text" ? 800 : 700}
width={PREVIEW_MODAL_WIDTH}
footer={[
<Button key="close" onClick={() => {
setFileContentVisible(false);
@@ -942,7 +926,7 @@ export default function CreateAnnotationTask({
{previewFileType === "text" && (
<pre
style={{
maxHeight: '500px',
maxHeight: `${FILE_PREVIEW_MAX_HEIGHT}px`,
overflow: 'auto',
backgroundColor: '#f5f5f5',
padding: '12px',
@@ -960,16 +944,23 @@ export default function CreateAnnotationTask({
<img
src={previewMediaUrl}
alt={previewFileName}
style={{ maxWidth: '100%', maxHeight: '500px', objectFit: 'contain' }}
style={{ maxWidth: '100%', maxHeight: `${FILE_PREVIEW_MAX_HEIGHT}px`, objectFit: 'contain' }}
/>
</div>
)}
{previewFileType === "pdf" && (
<iframe
src={previewMediaUrl}
title={previewFileName || "PDF 预览"}
style={{ width: '100%', height: `${FILE_PREVIEW_MAX_HEIGHT}px`, border: 'none' }}
/>
)}
{previewFileType === "video" && (
<div style={{ textAlign: 'center' }}>
<video
src={previewMediaUrl}
controls
style={{ maxWidth: '100%', maxHeight: '500px' }}
style={{ maxWidth: '100%', maxHeight: `${FILE_PREVIEW_MAX_HEIGHT}px` }}
>
</video>

View File

@@ -1,5 +1,5 @@
import { useState } from "react";
import { Card, Button, Table, message, Modal, Tabs } from "antd";
import { Card, Button, Table, Tag, message, Modal, Tabs } from "antd";
import {
PlusOutlined,
EditOutlined,
@@ -10,27 +10,39 @@ import {
import { useNavigate } from "react-router";
import { SearchControls } from "@/components/SearchControls";
import CardView from "@/components/CardView";
import type { AnnotationTask } from "../annotation.model";
import useFetchData from "@/hooks/useFetchData";
import {
deleteAnnotationTaskByIdUsingDelete,
queryAnnotationTasksUsingGet,
} from "../annotation.api";
import { mapAnnotationTask } from "../annotation.const";
import {
AnnotationTypeMap,
mapAnnotationTask,
type AnnotationTaskListItem,
} from "../annotation.const";
import CreateAnnotationTask from "../Create/components/CreateAnnotationTaskDialog";
import ExportAnnotationDialog from "./ExportAnnotationDialog";
import { ColumnType } from "antd/es/table";
import { TemplateList } from "../Template";
// Note: DevelopmentInProgress intentionally not used here
type AnnotationTaskRowKey = string | number;
type AnnotationTaskOperation = {
key: string;
label: string;
icon: JSX.Element;
danger?: boolean;
onClick: (task: AnnotationTaskListItem) => void;
};
export default function DataAnnotation() {
// return <DevelopmentInProgress showTime="2025.10.30" />;
const navigate = useNavigate();
const [activeTab, setActiveTab] = useState("tasks");
const [viewMode, setViewMode] = useState<"list" | "card">("list");
const [showCreateDialog, setShowCreateDialog] = useState(false);
const [exportTask, setExportTask] = useState<AnnotationTask | null>(null);
const [editTask, setEditTask] = useState<AnnotationTask | null>(null);
const [exportTask, setExportTask] = useState<AnnotationTaskListItem | null>(null);
const [editTask, setEditTask] = useState<AnnotationTaskListItem | null>(null);
const {
loading,
@@ -40,13 +52,16 @@ export default function DataAnnotation() {
fetchData,
handleFiltersChange,
handleKeywordChange,
} = useFetchData(queryAnnotationTasksUsingGet, mapAnnotationTask, 30000, true, [], 0);
} = useFetchData<AnnotationTaskListItem>(queryAnnotationTasksUsingGet, mapAnnotationTask, 30000, true, [], 0);
const [selectedRowKeys, setSelectedRowKeys] = useState<(string | number)[]>([]);
const [selectedRows, setSelectedRows] = useState<any[]>([]);
const [selectedRowKeys, setSelectedRowKeys] = useState<AnnotationTaskRowKey[]>([]);
const [selectedRows, setSelectedRows] = useState<AnnotationTaskListItem[]>([]);
const handleAnnotate = (task: AnnotationTask) => {
const projectId = (task as any)?.id;
const toSafeCount = (value: unknown) =>
typeof value === "number" && Number.isFinite(value) ? value : 0;
const handleAnnotate = (task: AnnotationTaskListItem) => {
const projectId = task.id;
if (!projectId) {
message.error("无法进入标注:缺少标注项目ID");
return;
@@ -54,15 +69,15 @@ export default function DataAnnotation() {
navigate(`/data/annotation/annotate/${projectId}`);
};
const handleExport = (task: AnnotationTask) => {
const handleExport = (task: AnnotationTaskListItem) => {
setExportTask(task);
};
const handleEdit = (task: AnnotationTask) => {
const handleEdit = (task: AnnotationTaskListItem) => {
setEditTask(task);
};
const handleDelete = (task: AnnotationTask) => {
const handleDelete = (task: AnnotationTaskListItem) => {
Modal.confirm({
title: `确认删除标注任务「${task.name}」吗?`,
content: "删除标注任务不会删除对应数据集,但会删除该任务的所有标注结果。",
@@ -110,7 +125,7 @@ export default function DataAnnotation() {
});
};
const operations = [
const operations: AnnotationTaskOperation[] = [
{
key: "annotate",
label: "标注",
@@ -142,24 +157,45 @@ export default function DataAnnotation() {
},
];
const columns: ColumnType<any>[] = [
const columns: ColumnType<AnnotationTaskListItem>[] = [
{
title: "序号",
key: "index",
width: 80,
align: "center" as const,
render: (_value: unknown, _record: AnnotationTaskListItem, index: number) => {
const current = pagination.current ?? 1;
const pageSize = pagination.pageSize ?? tableData.length ?? 0;
return (current - 1) * pageSize + index + 1;
},
},
{
title: "任务名称",
dataIndex: "name",
key: "name",
fixed: "left" as const,
},
{
title: "任务ID",
dataIndex: "id",
key: "id",
},
{
title: "数据集",
dataIndex: "datasetName",
key: "datasetName",
width: 180,
},
{
title: "标注类型",
dataIndex: "labelingType",
key: "labelingType",
width: 160,
render: (value?: string) => {
if (!value) {
return "-";
}
const label =
AnnotationTypeMap[value as keyof typeof AnnotationTypeMap]?.label ||
value;
return <Tag color="geekblue">{label}</Tag>;
},
},
{
title: "数据量",
dataIndex: "totalCount",
@@ -173,9 +209,21 @@ export default function DataAnnotation() {
key: "annotatedCount",
width: 100,
align: "center" as const,
render: (value: number, record: any) => {
const total = record.totalCount || 0;
const annotated = value || 0;
render: (value: number, record: AnnotationTaskListItem) => {
const total = toSafeCount(record.totalCount ?? record.total_count);
const annotatedRaw = toSafeCount(
value ?? record.annotatedCount ?? record.annotated_count
);
const segmentationEnabled =
record.segmentationEnabled ?? record.segmentation_enabled;
const inProgressRaw = segmentationEnabled
? toSafeCount(record.inProgressCount ?? record.in_progress_count)
: 0;
const shouldExcludeInProgress =
total > 0 && annotatedRaw + inProgressRaw > total;
const annotated = shouldExcludeInProgress
? Math.max(annotatedRaw - inProgressRaw, 0)
: annotatedRaw;
const percent = total > 0 ? Math.round((annotated / total) * 100) : 0;
return (
<span title={`${annotated}/${total} (${percent}%)`}>
@@ -184,6 +232,23 @@ export default function DataAnnotation() {
);
},
},
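The progress cell above subtracts in-progress segments from the annotated count only when the raw sums would overshoot the total. The same arithmetic as standalone functions, for readability of the review:

```typescript
// Compute the annotated count shown in the progress column.
// In-progress items are excluded only when annotated + inProgress
// would exceed the total (i.e. the counts overlap), matching the
// shouldExcludeInProgress guard above.
function resolveAnnotated(
  total: number,
  annotated: number,
  inProgress: number
): number {
  const shouldExclude = total > 0 && annotated + inProgress > total;
  return shouldExclude ? Math.max(annotated - inProgress, 0) : annotated;
}

// Percentage shown in the cell tooltip; 0 when there is no data.
const percent = (total: number, annotated: number): number =>
  total > 0 ? Math.round((annotated / total) * 100) : 0;
```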
{
title: "标注中",
dataIndex: "inProgressCount",
key: "inProgressCount",
width: 100,
align: "center" as const,
render: (value: number, record: AnnotationTaskListItem) => {
const segmentationEnabled =
record.segmentationEnabled ?? record.segmentation_enabled;
if (!segmentationEnabled) return "-";
const resolved =
Number.isFinite(value)
? value
: record.inProgressCount ?? record.in_progress_count ?? 0;
return resolved;
},
},
{
title: "创建时间",
dataIndex: "createdAt",
@@ -202,14 +267,14 @@ export default function DataAnnotation() {
fixed: "right" as const,
width: 150,
dataIndex: "actions",
render: (_: any, task: any) => (
render: (_value: unknown, task: AnnotationTaskListItem) => (
<div className="flex items-center justify-center space-x-1">
{operations.map((operation) => (
<Button
key={operation.key}
type="text"
icon={operation.icon}
onClick={() => (operation?.onClick as any)?.(task)}
onClick={() => operation.onClick(task)}
title={operation.label}
/>
))}
@@ -282,9 +347,9 @@ export default function DataAnnotation() {
pagination={pagination}
rowSelection={{
selectedRowKeys,
onChange: (keys, rows) => {
setSelectedRowKeys(keys as (string | number)[]);
setSelectedRows(rows as any[]);
onChange: (keys: AnnotationTaskRowKey[], rows: AnnotationTaskListItem[]) => {
setSelectedRowKeys(keys);
setSelectedRows(rows);
},
}}
scroll={{ x: "max-content", y: "calc(100vh - 24rem)" }}
@@ -293,7 +358,7 @@ export default function DataAnnotation() {
) : (
<CardView
data={tableData}
operations={operations as any}
operations={operations}
pagination={pagination}
loading={loading}
/>
@@ -327,4 +392,4 @@ export default function DataAnnotation() {
/>
</div>
);
}
}

View File

@@ -106,13 +106,6 @@ export default function ExportAnnotationDialog({
const values = await form.validateFields();
setExporting(true);
const blob = await downloadAnnotationsUsingGet(
projectId,
values.format,
values.onlyAnnotated,
values.includeData
);
// Build the file name
const formatExt: Record<ExportFormat, string> = {
json: "json",
@@ -124,15 +117,14 @@ export default function ExportAnnotationDialog({
const ext = formatExt[values.format as ExportFormat] || "json";
const filename = `${projectName}_annotations.${ext}`;
// Download the file
const url = window.URL.createObjectURL(blob as Blob);
const a = document.createElement("a");
a.href = url;
a.download = filename;
document.body.appendChild(a);
a.click();
window.URL.revokeObjectURL(url);
document.body.removeChild(a);
// Download the file (the download helper handles the download logic internally)
await downloadAnnotationsUsingGet(
projectId,
values.format,
values.onlyAnnotated,
values.includeData,
filename
);
message.success("导出成功");
onClose();
@@ -186,14 +178,15 @@ export default function ExportAnnotationDialog({
<Select
options={FORMAT_OPTIONS.map((opt) => ({
label: (
<div>
<div className="py-1">
<div className="font-medium">{opt.label}</div>
<div className="text-xs text-gray-400">{opt.description}</div>
</div>
),
value: opt.value,
simpleLabel: opt.label,
}))}
optionLabelProp="label"
optionLabelProp="simpleLabel"
/>
</Form.Item>

View File

@@ -43,14 +43,6 @@ const TemplateDetail: React.FC<TemplateDetailProps> = ({
<Descriptions.Item label="样式">
{template.style}
</Descriptions.Item>
<Descriptions.Item label="类型">
<Tag color={template.builtIn ? "gold" : "default"}>
{template.builtIn ? "系统内置" : "自定义"}
</Tag>
</Descriptions.Item>
<Descriptions.Item label="版本">
{template.version}
</Descriptions.Item>
<Descriptions.Item label="创建时间" span={2}>
{new Date(template.createdAt).toLocaleString()}
</Descriptions.Item>

View File

@@ -36,6 +36,7 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
const [form] = Form.useForm();
const [loading, setLoading] = useState(false);
const [labelConfig, setLabelConfig] = useState("");
const selectedDataType = Form.useWatch("dataType", form);
useEffect(() => {
if (visible && template && mode === "edit") {
@@ -96,8 +97,12 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
} else {
message.error(response.message || `模板${mode === "create" ? "创建" : "更新"}失败`);
}
} catch (error: any) {
if (error.errorFields) {
} catch (error: unknown) {
const hasErrorFields =
typeof error === "object" &&
error !== null &&
"errorFields" in error;
if (hasErrorFields) {
message.error("请填写所有必填字段");
} else {
message.error(`模板${mode === "create" ? "创建" : "更新"}失败`);
@@ -195,6 +200,7 @@ const TemplateForm: React.FC<TemplateFormProps> = ({
value={labelConfig}
onChange={setLabelConfig}
height={420}
dataType={selectedDataType}
/>
</div>
</Form>

View File

@@ -1,4 +1,4 @@
import React, { useState } from "react";
import React, { useState, useEffect } from "react";
import {
Button,
Table,
@@ -32,7 +32,16 @@ import {
TemplateTypeMap
} from "@/pages/DataAnnotation/annotation.const.tsx";
const TEMPLATE_ADMIN_KEY = "datamate_template_admin";
const TemplateList: React.FC = () => {
const [isAdmin, setIsAdmin] = useState(false);
useEffect(() => {
// Check whether the special admin key exists in localStorage
const hasAdminKey = localStorage.getItem(TEMPLATE_ADMIN_KEY) !== null;
setIsAdmin(hasAdminKey);
}, []);
const filterOptions = [
{
key: "category",
@@ -225,23 +234,7 @@ const TemplateList: React.FC = () => {
<Tag color={getCategoryColor(category)}>{ClassificationMap[category as keyof typeof ClassificationMap]?.label || category}</Tag>
),
},
{
title: "类型",
dataIndex: "builtIn",
key: "builtIn",
width: 100,
render: (builtIn: boolean) => (
<Tag color={builtIn ? "gold" : "default"}>
{builtIn ? "系统内置" : "自定义"}
</Tag>
),
},
{
title: "版本",
dataIndex: "version",
key: "version",
width: 80,
},
{
title: "创建时间",
dataIndex: "createdAt",
@@ -263,29 +256,31 @@ const TemplateList: React.FC = () => {
onClick={() => handleView(record)}
/>
</Tooltip>
<>
<Tooltip title="编辑">
<Button
type="link"
icon={<EditOutlined />}
onClick={() => handleEdit(record)}
/>
</Tooltip>
<Popconfirm
title="确定要删除这个模板吗?"
onConfirm={() => handleDelete(record.id)}
okText="确定"
cancelText="取消"
>
<Tooltip title="删除">
{isAdmin && (
<>
<Tooltip title="编辑">
<Button
type="link"
danger
icon={<DeleteOutlined />}
icon={<EditOutlined />}
onClick={() => handleEdit(record)}
/>
</Tooltip>
</Popconfirm>
</>
<Popconfirm
title="确定要删除这个模板吗?"
onConfirm={() => handleDelete(record.id)}
okText="确定"
cancelText="取消"
>
<Tooltip title="删除">
<Button
type="link"
danger
icon={<DeleteOutlined />}
/>
</Tooltip>
</Popconfirm>
</>
)}
</Space>
),
},
@@ -310,11 +305,13 @@ const TemplateList: React.FC = () => {
</div>
{/* Right side: Create button */}
<div className="flex items-center gap-2">
<Button type="primary" icon={<PlusOutlined />} onClick={handleCreate}>
</Button>
</div>
{isAdmin && (
<div className="flex items-center gap-2">
<Button type="primary" icon={<PlusOutlined />} onClick={handleCreate}>
</Button>
</div>
)}
</div>
<Card>

View File

@@ -18,6 +18,7 @@ import {
import { TagBrowser } from "./components";
const { Paragraph } = Typography;
const PREVIEW_DRAWER_WIDTH = "80vw";
interface VisualTemplateBuilderProps {
onSave?: (templateCode: string) => void;
@@ -129,7 +130,7 @@ const VisualTemplateBuilder: React.FC<VisualTemplateBuilderProps> = ({
<Drawer
title="模板代码预览"
placement="right"
width={600}
width={PREVIEW_DRAWER_WIDTH}
open={previewVisible}
onClose={() => setPreviewVisible(false)}
>

View File

@@ -109,12 +109,13 @@ export function downloadAnnotationsUsingGet(
projectId: string,
format: ExportFormat = "json",
onlyAnnotated: boolean = true,
includeData: boolean = false
includeData: boolean = false,
filename?: string
) {
const params = new URLSearchParams({
format,
only_annotated: String(onlyAnnotated),
include_data: String(includeData),
});
return download(`/api/annotation/export/projects/${projectId}/download?${params.toString()}`);
return download(`/api/annotation/export/projects/${projectId}/download?${params.toString()}`, null, filename);
}
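The export endpoint above builds its query string with `URLSearchParams`; booleans must be stringified explicitly, as the hunk does with `String(onlyAnnotated)`. The URL construction in isolation (the route is taken verbatim from the diff):

```typescript
// Build the export download URL the same way downloadAnnotationsUsingGet does.
// URLSearchParams serializes keys in insertion order and percent-encodes values.
function buildExportUrl(
  projectId: string,
  format: string,
  onlyAnnotated: boolean,
  includeData: boolean
): string {
  const params = new URLSearchParams({
    format,
    only_annotated: String(onlyAnnotated),
    include_data: String(includeData),
  });
  return `/api/annotation/export/projects/${projectId}/download?${params.toString()}`;
}
```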

View File

@@ -6,6 +6,71 @@ import {
CloseCircleOutlined,
} from "@ant-design/icons";
type AnnotationTaskStatistics = {
accuracy?: number | string;
averageTime?: number | string;
reviewCount?: number | string;
};
type AnnotationTaskPayload = {
id?: string;
labelingProjId?: string;
labelingProjectId?: string;
projId?: string;
labeling_project_id?: string;
name?: string;
description?: string;
datasetId?: string;
datasetName?: string;
dataset_name?: string;
labelingType?: string;
labeling_type?: string;
template?: {
labelingType?: string;
labeling_type?: string;
};
totalCount?: number;
total_count?: number;
annotatedCount?: number;
annotated_count?: number;
inProgressCount?: number;
in_progress_count?: number;
segmentationEnabled?: boolean;
segmentation_enabled?: boolean;
createdAt?: string;
created_at?: string;
updatedAt?: string;
updated_at?: string;
status?: string;
statistics?: AnnotationTaskStatistics;
[key: string]: unknown;
};
export type AnnotationTaskListItem = {
id?: string;
labelingProjId?: string;
projId?: string;
name?: string;
description?: string;
datasetId?: string;
datasetName?: string;
labelingType?: string;
totalCount?: number;
annotatedCount?: number;
inProgressCount?: number;
segmentationEnabled?: boolean;
createdAt?: string;
updatedAt?: string;
icon?: JSX.Element;
iconColor?: string;
status?: {
label: string;
color: string;
};
statistics?: { label: string; value: string | number }[];
[key: string]: unknown;
};
export const AnnotationTaskStatusMap = {
[AnnotationTaskStatus.ACTIVE]: {
label: "活跃",
@@ -27,9 +92,16 @@ export const AnnotationTaskStatusMap = {
},
};
export function mapAnnotationTask(task: any) {
export function mapAnnotationTask(task: AnnotationTaskPayload): AnnotationTaskListItem {
// Normalize labeling project id from possible backend field names
const labelingProjId = task?.labelingProjId || task?.labelingProjectId || task?.projId || task?.labeling_project_id || "";
const segmentationEnabled = task?.segmentationEnabled ?? task?.segmentation_enabled ?? false;
const inProgressCount = task?.inProgressCount ?? task?.in_progress_count ?? 0;
const labelingType =
task?.labelingType ||
task?.labeling_type ||
task?.template?.labelingType ||
task?.template?.labeling_type;
const statsArray = task?.statistics
? [
@@ -45,6 +117,9 @@ export function mapAnnotationTask(task: any) {
// provide consistent field for components
labelingProjId,
projId: labelingProjId,
segmentationEnabled,
inProgressCount,
labelingType,
name: task.name,
description: task.description || "",
datasetName: task.datasetName || task.dataset_name || "-",
@@ -478,4 +553,4 @@ export const TemplateTypeMap = {
label: "自定义",
value: TemplateType.CUSTOM
},
}
}
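The `mapAnnotationTask` changes above normalize count/flag fields that the backend may deliver in either camelCase or snake_case, using `??` so that legitimate `0` and `false` values survive (a `||` chain would discard them). The pattern in isolation — the helper name is illustrative, not part of the diff:

```typescript
// Pick the first defined value among camelCase / snake_case variants.
// `??` only falls through on null/undefined, so 0 and false are kept,
// unlike `||`, which would treat them as missing.
function pickField<T>(
  camel: T | undefined,
  snake: T | undefined,
  fallback: T
): T {
  return camel ?? snake ?? fallback;
}
```

This is why `segmentationEnabled` and `inProgressCount` use `??` in the hunk, while the id normalization keeps its older `||` chain (empty-string ids are intentionally treated as missing there).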

View File

@@ -8,6 +8,13 @@ export enum AnnotationTaskStatus {
SKIPPED = "skipped",
}
export enum AnnotationResultStatus {
ANNOTATED = "ANNOTATED",
IN_PROGRESS = "IN_PROGRESS",
NO_ANNOTATION = "NO_ANNOTATION",
NOT_APPLICABLE = "NOT_APPLICABLE",
}
export interface AnnotationTask {
id: string;
name: string;
@@ -52,7 +59,7 @@ export interface ObjectDefinition {
export interface TemplateConfiguration {
labels: LabelDefinition[];
objects: ObjectDefinition[];
metadata?: Record<string, any>;
metadata?: Record<string, unknown>;
}
export interface AnnotationTemplate {

View File

@@ -22,6 +22,7 @@ import {
getObjectDisplayName,
type LabelStudioTagConfig,
} from "../annotation.tagconfig";
import { DataType } from "../annotation.model";
const { Text, Title } = Typography;
@@ -44,10 +45,22 @@ interface TemplateConfigurationTreeEditorProps {
readOnly?: boolean;
readOnlyStructure?: boolean;
height?: number | string;
dataType?: DataType;
}
const DEFAULT_ROOT_TAG = "View";
const CHILD_TAGS = ["Label", "Choice", "Relation", "Item", "Path", "Channel"];
const OBJECT_TAGS_BY_DATA_TYPE: Record<DataType, string[]> = {
[DataType.TEXT]: ["Text", "Paragraphs", "Markdown"],
[DataType.IMAGE]: ["Image", "Bitmask"],
[DataType.AUDIO]: ["Audio", "AudioPlus"],
[DataType.VIDEO]: ["Video"],
[DataType.PDF]: ["PDF"],
[DataType.TIMESERIES]: ["Timeseries", "TimeSeries", "Vector"],
[DataType.CHAT]: ["Chat"],
[DataType.HTML]: ["HyperText", "Markdown"],
[DataType.TABLE]: ["Table", "Vector"],
};
const createId = () =>
`node_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
@@ -247,18 +260,34 @@ const createNode = (
attrs[attr] = "";
});
if (objectConfig && attrs.name !== undefined) {
if (objectConfig) {
const name = getDefaultName(tag);
attrs.name = name;
if (attrs.value !== undefined) {
attrs.value = `$${name}`;
if (!attrs.name) {
attrs.name = name;
}
if (!attrs.value) {
attrs.value = `$${attrs.name}`;
}
}
if (controlConfig && attrs.name !== undefined) {
attrs.name = getDefaultName(tag);
if (attrs.toName !== undefined) {
attrs.toName = objectNames[0] || "";
if (controlConfig) {
const isLabeling = controlConfig.category === "labeling";
if (isLabeling) {
if (!attrs.name) {
attrs.name = getDefaultName(tag);
}
if (!attrs.toName) {
attrs.toName = objectNames[0] || "";
}
} else {
// For layout controls, only fill if required
if (attrs.name !== undefined && !attrs.name) {
attrs.name = getDefaultName(tag);
}
if (attrs.toName !== undefined && !attrs.toName) {
attrs.toName = objectNames[0] || "";
}
}
}
@@ -420,14 +449,13 @@ const TemplateConfigurationTreeEditor = ({
readOnly = false,
readOnlyStructure = false,
height = 420,
dataType,
}: TemplateConfigurationTreeEditorProps) => {
const { config } = useTagConfig(false);
const [tree, setTree] = useState<XmlNode>(() => createEmptyTree());
const [selectedId, setSelectedId] = useState<string>(tree.id);
const [parseError, setParseError] = useState<string | null>(null);
const lastSerialized = useRef<string>("");
const [addChildTag, setAddChildTag] = useState<string | undefined>();
const [addSiblingTag, setAddSiblingTag] = useState<string | undefined>();
useEffect(() => {
if (!value) {
@@ -498,11 +526,17 @@ const TemplateConfigurationTreeEditor = ({
const objectOptions = useMemo(() => {
if (!config?.objects) return [];
return Object.keys(config.objects).map((tag) => ({
const options = Object.keys(config.objects).map((tag) => ({
value: tag,
label: getObjectDisplayName(tag),
}));
}, [config]);
if (!dataType) return options;
const allowedTags = OBJECT_TAGS_BY_DATA_TYPE[dataType];
if (!allowedTags) return options;
const allowedSet = new Set(allowedTags);
const filtered = options.filter((option) => allowedSet.has(option.value));
return filtered.length > 0 ? filtered : options;
}, [config, dataType]);
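The `objectOptions` memo above filters object tags by the dataset's data type but falls back to the full list when the intersection is empty, so the editor is never left with zero choices. Pulled out of the component, the rule might look like this (`filterByAllowList` is a hypothetical name for illustration, not part of the diff):

```typescript
type Option = { value: string; label: string };

// Keep only options on the allow-list; if the list is missing or the
// intersection is empty, return the full set unchanged.
function filterByAllowList(options: Option[], allowedTags?: string[]): Option[] {
  if (!allowedTags) return options;
  const allowedSet = new Set(allowedTags);
  const filtered = options.filter((option) => allowedSet.has(option.value));
  return filtered.length > 0 ? filtered : options;
}
```

The fallback matters because `OBJECT_TAGS_BY_DATA_TYPE` may not cover every tag the server-side config exposes.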
const tagOptions = useMemo(() => {
const options = [] as {
@@ -763,9 +797,8 @@ const TemplateConfigurationTreeEditor = ({
<Select
placeholder="添加子节点"
options={tagOptions}
value={addChildTag}
value={null}
onChange={(value) => {
setAddChildTag(undefined);
handleAddNode(value, "child");
}}
disabled={isStructureLocked}
@@ -773,9 +806,8 @@ const TemplateConfigurationTreeEditor = ({
<Select
placeholder="添加同级节点"
options={tagOptions}
value={addSiblingTag}
value={null}
onChange={(value) => {
setAddSiblingTag(undefined);
handleAddNode(value, "sibling");
}}
disabled={isStructureLocked || selectedNode.id === tree.id}

View File

@@ -7,6 +7,8 @@ interface PreviewPromptModalProps {
evaluationPrompt: string;
}
const PREVIEW_MODAL_WIDTH = "80vw";
const PreviewPromptModal: React.FC<PreviewPromptModalProps> = ({ previewVisible, onCancel, evaluationPrompt }) => {
return (
<Modal
@@ -24,7 +26,7 @@ const PreviewPromptModal: React.FC<PreviewPromptModalProps> = ({ previewVisible,
</Button>
]}
width={800}
width={PREVIEW_MODAL_WIDTH}
>
<div style={{
background: '#f5f5f5',

View File

@@ -78,7 +78,11 @@ export default function DatasetCreate() {
onValuesChange={handleValuesChange}
layout="vertical"
>
<BasicInformation data={newDataset} setData={setNewDataset} />
<BasicInformation
data={newDataset}
setData={setNewDataset}
hidden={["dataSource"]}
/>
</Form>
</div>
<div className="flex gap-2 justify-end p-6 border-top">

View File

@@ -96,7 +96,7 @@ export default function EditDataset({
<BasicInformation
data={newDataset}
setData={setNewDataset}
hidden={["datasetType"]}
hidden={["datasetType", "dataSource"]}
/>
</Form>
</Modal>

View File

@@ -11,10 +11,12 @@ export default function BasicInformation({
data,
setData,
hidden = [],
datasetTypeOptions = datasetTypes,
}: {
data: DatasetFormData;
setData: Dispatch<SetStateAction<DatasetFormData>>;
hidden?: string[];
datasetTypeOptions?: DatasetTypeOption[];
}) {
const [tagOptions, setTagOptions] = useState<DatasetTagOption[]>([]);
const [collectionOptions, setCollectionOptions] = useState<SelectOption[]>([]);
@@ -39,6 +41,7 @@ export default function BasicInformation({
// Fetch collection tasks
const fetchCollectionTasks = useCallback(async () => {
if (hidden.includes("dataSource")) return;
try {
const res = await queryTasksUsingGet({ page: 0, size: 100 });
const tasks = Array.isArray(res?.data?.content)
@@ -52,7 +55,7 @@ export default function BasicInformation({
} catch (error) {
console.error("Error fetching collection tasks:", error);
}
}, []);
}, [hidden]);
const fetchParentDatasets = useCallback(async () => {
if (hidden.includes("parentDatasetId")) return;
@@ -73,7 +76,7 @@ export default function BasicInformation({
value: dataset.id,
}));
setParentDatasetOptions([
{ label: "数据集", value: "" },
{ label: "无关联数据集", value: "" },
...options,
]);
} catch (error) {
@@ -101,11 +104,11 @@ export default function BasicInformation({
</Form.Item>
)}
{!hidden.includes("parentDatasetId") && (
<Form.Item name="parentDatasetId" label="数据集">
<Form.Item name="parentDatasetId" label="关联数据集">
<Select
className="w-full"
options={parentDatasetOptions}
placeholder="选择数据集(仅支持一层)"
placeholder="选择关联数据集(仅支持一层)"
/>
</Form.Item>
)}
@@ -118,7 +121,7 @@ export default function BasicInformation({
rules={[{ required: true, message: "请选择数据集类型" }]}
>
<RadioCard
options={datasetTypes}
options={datasetTypeOptions}
value={data.type}
onChange={(datasetType) => setData({ ...data, datasetType })}
/>
@@ -148,6 +151,8 @@ type DatasetFormData = Partial<Dataset> & {
parentDatasetId?: string;
};
type DatasetTypeOption = (typeof datasetTypes)[number];
type DatasetTagOption = {
label: string;
value: string;

View File

@@ -1,195 +1,216 @@
import { useEffect, useMemo, useRef, useState } from "react";
import { Breadcrumb, App, Tabs, Table, Tag } from "antd";
import { useEffect, useMemo, useRef, useState } from "react";
import { Breadcrumb, App, Tabs, Table, Tag } from "antd";
import {
ReloadOutlined,
DownloadOutlined,
EditOutlined,
DeleteOutlined,
PlusOutlined,
} from "@ant-design/icons";
ReloadOutlined,
DownloadOutlined,
EditOutlined,
DeleteOutlined,
PlusOutlined,
} from "@ant-design/icons";
import DetailHeader from "@/components/DetailHeader";
import { mapDataset, datasetTypeMap } from "../dataset.const";
import type { Dataset } from "@/pages/DataManagement/dataset.model";
import { Link, useNavigate, useParams } from "react-router";
import { useFilesOperation } from "./useFilesOperation";
import {
createDatasetTagUsingPost,
deleteDatasetByIdUsingDelete,
downloadDatasetUsingGet,
queryDatasetByIdUsingGet,
queryDatasetsUsingGet,
queryDatasetTagsUsingGet,
querySimilarDatasetsUsingGet,
updateDatasetByIdUsingPut,
} from "../dataset.api";
import {
createDatasetTagUsingPost,
deleteDatasetByIdUsingDelete,
downloadDatasetUsingGet,
queryDatasetByIdUsingGet,
queryDatasetsUsingGet,
queryDatasetTagsUsingGet,
querySimilarDatasetsUsingGet,
updateDatasetByIdUsingPut,
} from "../dataset.api";
import DataQuality from "./components/DataQuality";
import DataLineageFlow from "./components/DataLineageFlow";
import Overview from "./components/Overview";
import { Activity, Clock, File, FileType } from "lucide-react";
import EditDataset from "../Create/EditDataset";
import ImportConfiguration from "./components/ImportConfiguration";
const SIMILAR_DATASET_LIMIT = 4;
export default function DatasetDetail() {
const { id } = useParams(); // read the dynamic route param
const navigate = useNavigate();
const [activeTab, setActiveTab] = useState("overview");
const { message } = App.useApp();
const [showEditDialog, setShowEditDialog] = useState(false);
const [dataset, setDataset] = useState<Dataset>({} as Dataset);
const [parentDataset, setParentDataset] = useState<Dataset | null>(null);
const [childDatasets, setChildDatasets] = useState<Dataset[]>([]);
const [childDatasetsLoading, setChildDatasetsLoading] = useState(false);
const [similarDatasets, setSimilarDatasets] = useState<Dataset[]>([]);
const [similarDatasetsLoading, setSimilarDatasetsLoading] = useState(false);
const [similarTagNames, setSimilarTagNames] = useState<string[]>([]);
const similarRequestRef = useRef(0);
const filesOperation = useFilesOperation(dataset);
import ImportConfiguration from "./components/ImportConfiguration";
import CardView from "@/components/CardView";
const [showUploadDialog, setShowUploadDialog] = useState(false);
const normalizeTagNames = (
tags?: Array<string | { name?: string | null } | null>
) => {
if (!tags || tags.length === 0) {
return [];
}
const names = tags
.map((tag) => (typeof tag === "string" ? tag : tag?.name))
.filter((name): name is string => !!name && name.trim().length > 0)
.map((name) => name.trim());
return Array.from(new Set(names));
};
const fetchSimilarDatasets = async (currentDataset: Dataset) => {
const requestId = similarRequestRef.current + 1;
similarRequestRef.current = requestId;
if (!currentDataset?.id) {
setSimilarDatasets([]);
setSimilarTagNames([]);
setSimilarDatasetsLoading(false);
return;
}
const tagNames = normalizeTagNames(
currentDataset.tags as Array<string | { name?: string }>
);
setSimilarTagNames(tagNames);
setSimilarDatasets([]);
if (tagNames.length === 0) {
setSimilarDatasetsLoading(false);
return;
}
setSimilarDatasetsLoading(true);
try {
const { data } = await querySimilarDatasetsUsingGet(currentDataset.id, {
limit: SIMILAR_DATASET_LIMIT,
});
if (similarRequestRef.current !== requestId) {
return;
}
const list = Array.isArray(data) ? data : [];
setSimilarDatasets(list.map((item) => mapDataset(item)));
} catch (error) {
console.error("Failed to fetch similar datasets:", error);
} finally {
if (similarRequestRef.current === requestId) {
setSimilarDatasetsLoading(false);
}
}
};
const navigateItems = useMemo(() => {
const items = [
{
title: <Link to="/data/management"></Link>,
},
];
if (parentDataset) {
items.push({
title: (
<Link to={`/data/management/detail/${parentDataset.id}`}>
{parentDataset.name}
</Link>
),
});
}
items.push({
title: dataset.name || "数据集详情",
});
return items;
}, [dataset, parentDataset]);
const tabList = useMemo(() => {
const items = [
{
key: "overview",
label: "概览",
},
];
if (!dataset?.parentDatasetId) {
items.push({
key: "children",
label: "子数据集",
});
}
return items;
}, [dataset?.parentDatasetId]);
const handleCreateChildDataset = () => {
if (!dataset?.id) {
return;
}
navigate("/data/management/create", {
state: { parentDatasetId: dataset.id },
});
};
const fetchChildDatasets = async (parentId?: string) => {
if (!parentId) {
setChildDatasets([]);
return;
}
setChildDatasetsLoading(true);
try {
const { data: res } = await queryDatasetsUsingGet({
parentDatasetId: parentId,
page: 1,
size: 1000,
});
const list = res?.content || res?.data || [];
setChildDatasets(list.map((item) => mapDataset(item)));
} finally {
setChildDatasetsLoading(false);
}
};
const fetchDataset = async () => {
if (!id) {
return;
}
const { data } = await queryDatasetByIdUsingGet(id);
const mapped = mapDataset(data);
setDataset(mapped);
fetchSimilarDatasets(mapped);
if (data?.parentDatasetId) {
const { data: parentData } = await queryDatasetByIdUsingGet(
data.parentDatasetId
);
setParentDataset(mapDataset(parentData));
setChildDatasets([]);
} else {
setParentDataset(null);
await fetchChildDatasets(data?.id);
}
};
const SIMILAR_DATASET_LIMIT = 4;
const SIMILAR_TAGS_PREVIEW_LIMIT = 3;
useEffect(() => {
if (!id) {
return;
}
fetchDataset();
filesOperation.fetchFiles("", 1, 10); // start from the root directory, first page
}, [id]);
useEffect(() => {
if (dataset?.parentDatasetId && activeTab === "children") {
setActiveTab("overview");
}
}, [activeTab, dataset?.parentDatasetId]);
export default function DatasetDetail() {
const { id } = useParams(); // read the dynamic route param
const navigate = useNavigate();
const [activeTab, setActiveTab] = useState("overview");
const { message } = App.useApp();
const [showEditDialog, setShowEditDialog] = useState(false);
const [dataset, setDataset] = useState<Dataset>({} as Dataset);
const [parentDataset, setParentDataset] = useState<Dataset | null>(null);
const [childDatasets, setChildDatasets] = useState<Dataset[]>([]);
const [childDatasetsLoading, setChildDatasetsLoading] = useState(false);
const [similarDatasets, setSimilarDatasets] = useState<Dataset[]>([]);
const [similarDatasetsLoading, setSimilarDatasetsLoading] = useState(false);
const [similarTagNames, setSimilarTagNames] = useState<string[]>([]);
const similarRequestRef = useRef(0);
const filesOperation = useFilesOperation(dataset);
const [showUploadDialog, setShowUploadDialog] = useState(false);
const normalizeTagNames = (
tags?: Array<string | { name?: string | null } | null>
) => {
if (!tags || tags.length === 0) {
return [];
}
const names = tags
.map((tag) => (typeof tag === "string" ? tag : tag?.name))
.filter((name): name is string => !!name && name.trim().length > 0)
.map((name) => name.trim());
return Array.from(new Set(names));
};
const fetchSimilarDatasets = async (currentDataset: Dataset) => {
const requestId = similarRequestRef.current + 1;
similarRequestRef.current = requestId;
if (!currentDataset?.id) {
setSimilarDatasets([]);
setSimilarTagNames([]);
setSimilarDatasetsLoading(false);
return;
}
const tagNames = normalizeTagNames(
currentDataset.tags as Array<string | { name?: string }>
);
setSimilarTagNames(tagNames);
setSimilarDatasets([]);
if (tagNames.length === 0) {
setSimilarDatasetsLoading(false);
return;
}
setSimilarDatasetsLoading(true);
try {
const { data } = await querySimilarDatasetsUsingGet(currentDataset.id, {
limit: SIMILAR_DATASET_LIMIT,
});
if (similarRequestRef.current !== requestId) {
return;
}
const list = Array.isArray(data) ? data : [];
setSimilarDatasets(list.map((item) => mapDataset(item)));
} catch (error) {
console.error("Failed to fetch similar datasets:", error);
} finally {
if (similarRequestRef.current === requestId) {
setSimilarDatasetsLoading(false);
}
}
};
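`fetchSimilarDatasets` guards against out-of-order responses by bumping a request id stored in a ref and letting only the latest request commit its result. The same pattern in isolation (`createLatestOnlyGuard` is a hypothetical helper, not part of the diff):

```typescript
// Monotonic request-id guard: each call to next() invalidates all
// previously issued ids, so only the most recent async request "wins".
function createLatestOnlyGuard() {
  let current = 0;
  return {
    next: () => ++current,
    isLatest: (id: number) => id === current,
  };
}
```

In the component, `similarRequestRef.current` plays the role of `current`, which is why both the early-return check and the `finally` block compare against the captured `requestId`.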
const similarTagsSummary = useMemo(() => {
if (!similarTagNames || similarTagNames.length === 0) {
return "";
}
const visibleTags = similarTagNames.slice(0, SIMILAR_TAGS_PREVIEW_LIMIT);
const hiddenCount = similarTagNames.length - visibleTags.length;
if (hiddenCount > 0) {
return `${visibleTags.join("、")}等${similarTagNames.length}个`;
}
return visibleTags.join("、");
}, [similarTagNames]);
const navigateItems = useMemo(() => {
const items = [
{
title: <Link to="/data/management"></Link>,
},
];
if (parentDataset) {
items.push({
title: (
<Link to={`/data/management/detail/${parentDataset.id}`}>
{parentDataset.name}
</Link>
),
});
}
items.push({
title: dataset.name || "数据集详情",
});
return items;
}, [dataset, parentDataset]);
const tabList = useMemo(() => {
const items = [
{
key: "overview",
label: "概览",
},
];
if (!dataset?.parentDatasetId) {
items.push({
key: "children",
label: "关联数据集",
});
}
return items;
}, [dataset?.parentDatasetId]);
const handleCreateChildDataset = () => {
if (!dataset?.id) {
return;
}
navigate("/data/management/create", {
state: { parentDatasetId: dataset.id },
});
};
const fetchChildDatasets = async (parentId?: string) => {
if (!parentId) {
setChildDatasets([]);
return;
}
setChildDatasetsLoading(true);
try {
const { data: res } = await queryDatasetsUsingGet({
parentDatasetId: parentId,
page: 1,
size: 1000,
});
const list = res?.content || res?.data || [];
setChildDatasets(list.map((item) => mapDataset(item)));
} finally {
setChildDatasetsLoading(false);
}
};
const fetchDataset = async () => {
if (!id) {
return;
}
const { data } = await queryDatasetByIdUsingGet(id);
const mapped = mapDataset(data);
setDataset(mapped);
fetchSimilarDatasets(mapped);
if (data?.parentDatasetId) {
const { data: parentData } = await queryDatasetByIdUsingGet(
data.parentDatasetId
);
setParentDataset(mapDataset(parentData));
setChildDatasets([]);
} else {
setParentDataset(null);
await fetchChildDatasets(data?.id);
}
};
useEffect(() => {
if (!id) {
return;
}
fetchDataset();
}, [id]);
useEffect(() => {
if (dataset?.id) {
filesOperation.fetchFiles("", 1, 10); // 从根目录开始,第一页
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [dataset?.id]);
useEffect(() => {
if (dataset?.parentDatasetId && activeTab === "children") {
setActiveTab("overview");
}
}, [activeTab, dataset?.parentDatasetId]);
const handleRefresh = async (showMessage = true, prefixOverride?: string) => {
fetchDataset();
@@ -261,22 +282,22 @@ export default function DatasetDetail() {
];
// Dataset operation list
const operations = [
...(dataset?.id && !dataset.parentDatasetId
? [
{
key: "create-child",
label: "创建数据集",
icon: <PlusOutlined />,
onClick: handleCreateChildDataset,
},
]
: []),
{
key: "edit",
label: "编辑",
icon: <EditOutlined />,
onClick: () => {
const operations = [
...(dataset?.id && !dataset.parentDatasetId
? [
{
key: "create-child",
label: "创建关联数据集",
icon: <PlusOutlined />,
onClick: handleCreateChildDataset,
},
]
: []),
{
key: "edit",
label: "编辑",
icon: <EditOutlined />,
onClick: () => {
setShowEditDialog(true);
},
},
@@ -314,55 +335,55 @@ export default function DatasetDetail() {
icon: <DeleteOutlined />,
onClick: handleDeleteDataset,
},
];
const childColumns = [
{
title: "名称",
dataIndex: "name",
key: "name",
render: (_: string, record: Dataset) => (
<Link to={`/data/management/detail/${record.id}`}>{record.name}</Link>
),
},
{
title: "类型",
dataIndex: "datasetType",
key: "datasetType",
width: 120,
render: (value: string) => datasetTypeMap[value]?.label || "未知",
},
{
title: "状态",
dataIndex: "status",
key: "status",
width: 120,
render: (status) =>
status ? <Tag color={status.color}>{status.label}</Tag> : "-",
},
{
title: "文件数",
dataIndex: "fileCount",
key: "fileCount",
width: 120,
render: (value?: number) => value ?? 0,
},
{
title: "大小",
dataIndex: "size",
key: "size",
width: 140,
render: (value?: string) => value || "0 B",
},
{
title: "更新时间",
dataIndex: "updatedAt",
key: "updatedAt",
width: 180,
},
];
];
const childColumns = [
{
title: "名称",
dataIndex: "name",
key: "name",
render: (_: string, record: Dataset) => (
<Link to={`/data/management/detail/${record.id}`}>{record.name}</Link>
),
},
{
title: "类型",
dataIndex: "datasetType",
key: "datasetType",
width: 120,
render: (value: string) => datasetTypeMap[value]?.label || "未知",
},
{
title: "状态",
dataIndex: "status",
key: "status",
width: 120,
render: (status) =>
status ? <Tag color={status.color}>{status.label}</Tag> : "-",
},
{
title: "文件数",
dataIndex: "fileCount",
key: "fileCount",
width: 120,
render: (value?: number) => value ?? 0,
},
{
title: "大小",
dataIndex: "size",
key: "size",
width: 140,
render: (value?: string) => value || "0 B",
},
{
title: "更新时间",
dataIndex: "updatedAt",
key: "updatedAt",
width: 180,
},
];
return (
<div className="h-full flex flex-col gap-4">
<div className="h-full flex flex-col gap-4 overflow-hidden">
<Breadcrumb items={navigateItems} />
{/* Header */}
<DetailHeader
@@ -398,42 +419,67 @@ export default function DatasetDetail() {
},
}}
/>
<div className="flex-overflow-auto p-6 pt-2 bg-white rounded-md shadow">
<Tabs activeKey={activeTab} items={tabList} onChange={setActiveTab} />
<div className="h-full overflow-auto">
{activeTab === "overview" && (
<Overview
dataset={dataset}
filesOperation={filesOperation}
fetchDataset={fetchDataset}
onUpload={() => setShowUploadDialog(true)}
similarDatasets={similarDatasets}
similarDatasetsLoading={similarDatasetsLoading}
similarTags={similarTagNames}
/>
)}
{activeTab === "children" && (
<div className="pt-4">
<div className="flex items-center justify-between mb-3">
<h2 className="text-base font-semibold">子数据集</h2>
<span className="text-xs text-gray-500">
{childDatasets.length}
</span>
</div>
<Table
rowKey="id"
columns={childColumns}
dataSource={childDatasets}
loading={childDatasetsLoading}
pagination={false}
locale={{ emptyText: "暂无子数据集" }}
/>
</div>
)}
{activeTab === "lineage" && <DataLineageFlow dataset={dataset} />}
{activeTab === "quality" && <DataQuality />}
</div>
</div>
<div className="flex-1 overflow-auto">
<div className="p-6 pt-2 bg-white rounded-md shadow mb-4">
<Tabs activeKey={activeTab} items={tabList} onChange={setActiveTab} />
<div className="">
{activeTab === "overview" && (
<Overview
dataset={dataset}
filesOperation={filesOperation}
fetchDataset={fetchDataset}
onUpload={() => setShowUploadDialog(true)}
/>
)}
{activeTab === "children" && (
<div className="pt-4">
<div className="flex items-center justify-between mb-3">
<h2 className="text-base font-semibold">关联数据集</h2>
<span className="text-xs text-gray-500">
{childDatasets.length}
</span>
</div>
<Table
rowKey="id"
columns={childColumns}
dataSource={childDatasets}
loading={childDatasetsLoading}
pagination={false}
locale={{ emptyText: "暂无关联数据集" }}
/>
</div>
)}
{activeTab === "lineage" && <DataLineageFlow dataset={dataset} />}
{activeTab === "quality" && <DataQuality />}
</div>
</div>
{/* Similar datasets */}
<div className="bg-white rounded-md shadow p-6 mb-4">
<div className="flex items-center justify-between mb-3">
<h2 className="text-base font-semibold">相似数据集</h2>
{similarTagsSummary && (
<span className="text-xs text-gray-500">
{similarTagsSummary}
</span>
)}
</div>
<CardView
data={similarDatasets}
loading={similarDatasetsLoading}
operations={[]}
pagination={{
current: 1,
pageSize: similarDatasets.length || 10,
total: similarDatasets.length || 0,
style: { display: "none" },
}}
onView={(item) => {
navigate(`/data/management/detail/${item.id}`);
}}
/>
</div>
</div>
<ImportConfiguration
data={dataset}
open={showUploadDialog}

View File

@@ -1,351 +1,530 @@
import { Select, Input, Form, Radio, Modal, Button, UploadFile, Switch, Tooltip } from "antd";
import { InboxOutlined, QuestionCircleOutlined } from "@ant-design/icons";
import { dataSourceOptions } from "../../dataset.const";
import { Dataset, DataSource } from "../../dataset.model";
import { useEffect, useState } from "react";
import { queryTasksUsingGet } from "@/pages/DataCollection/collection.apis";
import { updateDatasetByIdUsingPut } from "../../dataset.api";
import { sliceFile } from "@/utils/file.util";
import Dragger from "antd/es/upload/Dragger";
/**
* Split a file by lines
* @param file the original file
* @returns the split files, one file per line
*/
async function splitFileByLines(file: UploadFile): Promise<UploadFile[]> {
const originFile = (file as any).originFileObj || file;
if (!originFile || typeof originFile.text !== "function") {
return [file];
}
const text = await originFile.text();
if (!text) return [file];
// Split by line and filter out empty lines
const lines = text.split(/\r?\n/).filter((line: string) => line.trim() !== "");
if (lines.length === 0) return [];
// Build file names: baseName_index.ext
const nameParts = file.name.split(".");
const ext = nameParts.length > 1 ? "." + nameParts.pop() : "";
const baseName = nameParts.join(".");
const padLength = String(lines.length).length;
return lines.map((line: string, index: number) => {
const newFileName = `${baseName}_${String(index + 1).padStart(padLength, "0")}${ext}`;
const blob = new Blob([line], { type: "text/plain" });
const newFile = new File([blob], newFileName, { type: "text/plain" });
return {
uid: `${file.uid}-${index}`,
name: newFileName,
size: newFile.size,
type: "text/plain",
originFileObj: newFile as any,
} as UploadFile;
});
}
export default function ImportConfiguration({
data,
open,
onClose,
updateEvent = "update:dataset",
prefix,
}: {
data: Dataset | null;
open: boolean;
onClose: () => void;
updateEvent?: string;
prefix?: string;
}) {
const [form] = Form.useForm();
const [collectionOptions, setCollectionOptions] = useState([]);
const [importConfig, setImportConfig] = useState<any>({
source: DataSource.UPLOAD,
hasArchive: true,
splitByLine: false,
});
const [currentPrefix, setCurrentPrefix] = useState<string>("");
// Local file upload handling
const handleUpload = async (dataset: Dataset) => {
let filesToUpload = form.getFieldValue("files") || [];
// If split-by-line is enabled, split the files first
if (importConfig.splitByLine) {
const splitResults = await Promise.all(
filesToUpload.map((file) => splitFileByLines(file))
);
filesToUpload = splitResults.flat();
}
// Build the slice list
const sliceList = filesToUpload.map((file) => {
const originFile = (file as any).originFileObj || file;
const slices = sliceFile(originFile);
return {
originFile: originFile, // pass the actual File/Blob object
slices,
name: file.name,
size: originFile.size || 0,
};
});
console.log("[ImportConfiguration] Uploading with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
};
const fetchCollectionTasks = async () => {
if (importConfig.source !== DataSource.COLLECTION) return;
try {
const res = await queryTasksUsingGet({ page: 0, size: 100 });
const options = res.data.content.map((task: any) => ({
label: task.name,
value: task.id,
}));
setCollectionOptions(options);
} catch (error) {
console.error("Error fetching collection tasks:", error);
}
};
const resetState = () => {
console.log('[ImportConfiguration] resetState called, preserving currentPrefix:', currentPrefix);
form.resetFields();
form.setFieldsValue({ files: null });
setImportConfig({
source: importConfig.source ? importConfig.source : DataSource.UPLOAD,
hasArchive: true,
splitByLine: false,
});
console.log('[ImportConfiguration] resetState done, currentPrefix still:', currentPrefix);
};
const handleImportData = async () => {
if (!data) return;
console.log('[ImportConfiguration] handleImportData called, currentPrefix:', currentPrefix);
if (importConfig.source === DataSource.UPLOAD) {
await handleUpload(data);
} else if (importConfig.source === DataSource.COLLECTION) {
await updateDatasetByIdUsingPut(data.id, {
...importConfig,
});
}
onClose();
};
useEffect(() => {
if (open) {
setCurrentPrefix(prefix || "");
console.log('[ImportConfiguration] Modal opened with prefix:', prefix);
resetState();
fetchCollectionTasks();
}
}, [open]);
// Separate effect for fetching collection tasks when source changes
useEffect(() => {
if (open && importConfig.source === DataSource.COLLECTION) {
fetchCollectionTasks();
}
}, [importConfig.source]);
return (
<Modal
title="导入数据"
open={open}
width={600}
onCancel={() => {
onClose();
resetState();
}}
maskClosable={false}
footer={
<>
<Button onClick={onClose}>取消</Button>
<Button
type="primary"
disabled={!importConfig?.files?.length && !importConfig.dataSource}
onClick={handleImportData}
>
导入
</Button>
</>
}
>
<Form
form={form}
layout="vertical"
initialValues={importConfig || {}}
onValuesChange={(_, allValues) => setImportConfig(allValues)}
>
<Form.Item
label="数据源"
name="source"
rules={[{ required: true, message: "请选择数据源" }]}
>
<Radio.Group
buttonStyle="solid"
options={dataSourceOptions}
optionType="button"
/>
</Form.Item>
{importConfig?.source === DataSource.COLLECTION && (
<Form.Item name="dataSource" label="归集任务" required>
<Select placeholder="请选择归集任务" options={collectionOptions} />
</Form.Item>
)}
{/* obs import */}
{importConfig?.source === DataSource.OBS && (
<div className="grid grid-cols-2 gap-3 p-4 bg-blue-50 rounded-lg">
<Form.Item
name="endpoint"
rules={[{ required: true }]}
label="Endpoint"
>
<Input
className="h-8 text-xs"
placeholder="obs.cn-north-4.myhuaweicloud.com"
/>
</Form.Item>
<Form.Item
name="bucket"
rules={[{ required: true }]}
label="Bucket"
>
<Input className="h-8 text-xs" placeholder="my-bucket" />
</Form.Item>
<Form.Item
name="accessKey"
rules={[{ required: true }]}
label="Access Key"
>
<Input className="h-8 text-xs" placeholder="Access Key" />
</Form.Item>
<Form.Item
name="secretKey"
rules={[{ required: true }]}
label="Secret Key"
>
<Input
type="password"
className="h-8 text-xs"
placeholder="Secret Key"
/>
</Form.Item>
</div>
)}
{/* Local Upload Component */}
{importConfig?.source === DataSource.UPLOAD && (
<>
<Form.Item
label="自动解压上传的压缩包"
name="hasArchive"
valuePropName="checked"
>
<Switch />
</Form.Item>
<Form.Item
label={
<span>
按行分割{" "}
<Tooltip title="选中后,文本文件的每一行将被分割成独立文件">
<QuestionCircleOutlined style={{ color: "#999" }} />
</Tooltip>
</span>
}
name="splitByLine"
valuePropName="checked"
>
<Switch />
</Form.Item>
<Form.Item
label="上传文件"
name="files"
valuePropName="fileList"
getValueFromEvent={(e: any) => {
if (Array.isArray(e)) {
return e;
}
return e && e.fileList;
}}
rules={[
{
required: true,
message: "请上传文件",
},
]}
>
<Dragger
className="w-full"
beforeUpload={() => false}
multiple
>
<p className="ant-upload-drag-icon">
<InboxOutlined />
</p>
<p className="ant-upload-text">点击或拖拽文件到此区域上传</p>
<p className="ant-upload-hint"></p>
</Dragger>
</Form.Item>
</>
)}
{/* Target Configuration */}
{importConfig?.target && importConfig?.target !== DataSource.UPLOAD && (
<div className="space-y-3 p-4 bg-blue-50 rounded-lg">
{importConfig?.target === DataSource.DATABASE && (
<div className="grid grid-cols-2 gap-3">
<Form.Item
name="databaseType"
rules={[{ required: true }]}
label="数据库类型"
>
<Select
className="w-full"
options={[
{ label: "MySQL", value: "mysql" },
{ label: "PostgreSQL", value: "postgresql" },
{ label: "MongoDB", value: "mongodb" },
]}
></Select>
</Form.Item>
<Form.Item
name="tableName"
rules={[{ required: true }]}
label="表名"
>
<Input className="h-8 text-xs" placeholder="dataset_table" />
</Form.Item>
<Form.Item
name="connectionString"
rules={[{ required: true }]}
label="连接字符串"
>
<Input
className="h-8 text-xs col-span-2"
placeholder="数据库连接字符串"
/>
</Form.Item>
</div>
)}
</div>
)}
</Form>
</Modal>
);
}
import { Select, Input, Form, Radio, Modal, Button, UploadFile, Switch, Tooltip } from "antd";
import { InboxOutlined, QuestionCircleOutlined } from "@ant-design/icons";
import { dataSourceOptions } from "../../dataset.const";
import { Dataset, DatasetType, DataSource } from "../../dataset.model";
import { useCallback, useEffect, useMemo, useState } from "react";
import { queryTasksUsingGet } from "@/pages/DataCollection/collection.apis";
import { updateDatasetByIdUsingPut } from "../../dataset.api";
import { sliceFile, shouldStreamUpload } from "@/utils/file.util";
import Dragger from "antd/es/upload/Dragger";
const TEXT_FILE_MIME_PREFIX = "text/";
const TEXT_FILE_MIME_TYPES = new Set([
"application/json",
"application/xml",
"application/csv",
"application/ndjson",
"application/x-ndjson",
"application/x-yaml",
"application/yaml",
"application/javascript",
"application/x-javascript",
"application/sql",
]);
const TEXT_FILE_EXTENSIONS = new Set([
".txt",
".md",
".csv",
".tsv",
".json",
".jsonl",
".ndjson",
".log",
".xml",
".yaml",
".yml",
".sql",
]);
function getUploadFileName(file: UploadFile): string {
if (file.name) return file.name;
const originFile = file.originFileObj;
if (originFile instanceof File && originFile.name) {
return originFile.name;
}
return "";
}
function getUploadFileType(file: UploadFile): string {
if (file.type) return file.type;
const originFile = file.originFileObj;
if (originFile instanceof File && typeof originFile.type === "string") {
return originFile.type;
}
return "";
}
function isTextUploadFile(file: UploadFile): boolean {
const mimeType = getUploadFileType(file).toLowerCase();
if (mimeType) {
if (mimeType.startsWith(TEXT_FILE_MIME_PREFIX)) return true;
if (TEXT_FILE_MIME_TYPES.has(mimeType)) return true;
}
const fileName = getUploadFileName(file);
const dotIndex = fileName.lastIndexOf(".");
if (dotIndex < 0) return false;
const ext = fileName.slice(dotIndex).toLowerCase();
return TEXT_FILE_EXTENSIONS.has(ext);
}
/**
* Split a file by lines
* @param file the original file
* @returns the split files, one file per line
*/
async function splitFileByLines(file: UploadFile): Promise<UploadFile[]> {
if (!isTextUploadFile(file)) {
return [file];
}
const originFile = file.originFileObj ?? file;
if (!(originFile instanceof File) || typeof originFile.text !== "function") {
return [file];
}
const text = await originFile.text();
if (!text) return [file];
// Split by line and filter out empty lines
const lines = text.split(/\r?\n/).filter((line: string) => line.trim() !== "");
if (lines.length === 0) return [];
// Build file names: baseName_index (extension dropped)
const nameParts = file.name.split(".");
if (nameParts.length > 1) {
nameParts.pop();
}
const baseName = nameParts.join(".");
const padLength = String(lines.length).length;
return lines.map((line: string, index: number) => {
const newFileName = `${baseName}_${String(index + 1).padStart(padLength, "0")}`;
const blob = new Blob([line], { type: "text/plain" });
const newFile = new File([blob], newFileName, { type: "text/plain" });
return {
uid: `${file.uid}-${index}`,
name: newFileName,
size: newFile.size,
type: "text/plain",
originFileObj: newFile as UploadFile["originFileObj"],
} as UploadFile;
});
}
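The naming scheme used above — extension stripped, 1-based index zero-padded to the width of the total line count — can be sketched in isolation. `buildSplitName` is a hypothetical helper for illustration, not a function in the diff:

```typescript
// Derive the per-line file name: drop the extension, then pad the
// 1-based index to the width of the total line count.
function buildSplitName(originalName: string, index: number, total: number): string {
  const nameParts = originalName.split(".");
  if (nameParts.length > 1) {
    nameParts.pop();
  }
  const baseName = nameParts.join(".");
  const padLength = String(total).length;
  return `${baseName}_${String(index + 1).padStart(padLength, "0")}`;
}
```

For example, splitting a 120-line `data.jsonl` yields `data_001` through `data_120`, which keeps the generated files lexicographically ordered.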
type SelectOption = {
label: string;
value: string;
};
type CollectionTask = {
id: string;
name: string;
};
type ImportConfig = {
source: DataSource;
hasArchive: boolean;
splitByLine: boolean;
files?: UploadFile[];
dataSource?: string;
target?: DataSource;
[key: string]: unknown;
};
export default function ImportConfiguration({
data,
open,
onClose,
updateEvent = "update:dataset",
prefix,
}: {
data: Dataset | null;
open: boolean;
onClose: () => void;
updateEvent?: string;
prefix?: string;
}) {
const [form] = Form.useForm();
const [collectionOptions, setCollectionOptions] = useState<SelectOption[]>([]);
const availableSourceOptions = dataSourceOptions.filter(
(option) => option.value !== DataSource.COLLECTION
);
const [importConfig, setImportConfig] = useState<ImportConfig>({
source: DataSource.UPLOAD,
hasArchive: true,
splitByLine: false,
});
const [currentPrefix, setCurrentPrefix] = useState<string>("");
const hasNonTextFile = useMemo(() => {
const files = importConfig.files ?? [];
if (files.length === 0) return false;
return files.some((file) => !isTextUploadFile(file));
}, [importConfig.files]);
const isTextDataset = data?.datasetType === DatasetType.TEXT;
// 本地上传文件相关逻辑
const handleUpload = async (dataset: Dataset) => {
const filesToUpload =
(form.getFieldValue("files") as UploadFile[] | undefined) || [];
// 如果启用分行分割,对大文件使用流式处理
if (importConfig.splitByLine && !hasNonTextFile) {
// 检查是否有大文件需要流式分割上传
const filesForStreamUpload: File[] = [];
const filesForNormalUpload: UploadFile[] = [];
for (const file of filesToUpload) {
const originFile = file.originFileObj ?? file;
if (originFile instanceof File && shouldStreamUpload(originFile)) {
filesForStreamUpload.push(originFile);
} else {
filesForNormalUpload.push(file);
}
}
// 大文件使用流式分割上传
if (filesForStreamUpload.length > 0) {
window.dispatchEvent(
new CustomEvent("upload:dataset-stream", {
detail: {
dataset,
files: filesForStreamUpload,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
}
// 小文件使用传统分割方式
if (filesForNormalUpload.length > 0) {
const splitResults = await Promise.all(
filesForNormalUpload.map((file) => splitFileByLines(file))
);
const smallFilesToUpload = splitResults.flat();
// 计算分片列表
const sliceList = smallFilesToUpload.map((file) => {
const originFile = (file.originFileObj ?? file) as Blob;
const slices = sliceFile(originFile);
return {
originFile: originFile,
slices,
name: file.name,
size: originFile.size || 0,
};
});
console.log("[ImportConfiguration] Uploading small files with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
}
return;
}
// 未启用分行分割,使用普通上传
// 计算分片列表
const sliceList = filesToUpload.map((file) => {
const originFile = (file.originFileObj ?? file) as Blob;
const slices = sliceFile(originFile);
return {
originFile: originFile, // 传入真正的 File/Blob 对象
slices,
name: file.name,
size: originFile.size || 0,
};
});
console.log("[ImportConfiguration] Uploading with currentPrefix:", currentPrefix);
window.dispatchEvent(
new CustomEvent("upload:dataset", {
detail: {
dataset,
files: sliceList,
updateEvent,
hasArchive: importConfig.hasArchive,
prefix: currentPrefix,
},
})
);
};
const fetchCollectionTasks = useCallback(async () => {
if (importConfig.source !== DataSource.COLLECTION) return;
try {
const res = await queryTasksUsingGet({ page: 0, size: 100 });
const tasks = Array.isArray(res?.data?.content)
? (res.data.content as CollectionTask[])
: [];
const options = tasks.map((task) => ({
label: task.name,
value: task.id,
}));
setCollectionOptions(options);
} catch (error) {
console.error("Error fetching collection tasks:", error);
}
}, [importConfig.source]);
const resetState = useCallback(() => {
console.log('[ImportConfiguration] resetState called, preserving currentPrefix:', currentPrefix);
form.resetFields();
form.setFieldsValue({ files: null });
setImportConfig({
source: DataSource.UPLOAD,
hasArchive: true,
splitByLine: false,
});
console.log('[ImportConfiguration] resetState done, currentPrefix still:', currentPrefix);
}, [currentPrefix, form]);
const handleImportData = async () => {
if (!data) return;
console.log('[ImportConfiguration] handleImportData called, currentPrefix:', currentPrefix);
if (importConfig.source === DataSource.UPLOAD) {
// 立即显示任务中心,让用户感知上传已开始(在文件分割等耗时操作之前)
window.dispatchEvent(
new CustomEvent("show:task-popover", { detail: { show: true } })
);
await handleUpload(data);
} else if (importConfig.source === DataSource.COLLECTION) {
await updateDatasetByIdUsingPut(data.id, {
...importConfig,
});
}
onClose();
};
useEffect(() => {
if (open) {
setCurrentPrefix(prefix || "");
console.log('[ImportConfiguration] Modal opened with prefix:', prefix);
resetState();
fetchCollectionTasks();
}
}, [fetchCollectionTasks, open, prefix, resetState]);
useEffect(() => {
if (!importConfig.files?.length) return;
if (!importConfig.splitByLine) return;
if (!hasNonTextFile) return;
form.setFieldsValue({ splitByLine: false });
setImportConfig((prev) => ({ ...prev, splitByLine: false }));
}, [form, hasNonTextFile, importConfig.files, importConfig.splitByLine]);
// Separate effect for fetching collection tasks when source changes
useEffect(() => {
if (open && importConfig.source === DataSource.COLLECTION) {
fetchCollectionTasks();
}
}, [fetchCollectionTasks, importConfig.source, open]);
return (
<Modal
title="导入数据"
open={open}
width={600}
onCancel={() => {
onClose();
resetState();
}}
maskClosable={false}
footer={
<>
<Button onClick={onClose}></Button>
<Button
type="primary"
disabled={!importConfig?.files?.length && !importConfig.dataSource}
onClick={handleImportData}
>
</Button>
</>
}
>
<Form
form={form}
layout="vertical"
initialValues={importConfig || {}}
onValuesChange={(_, allValues) => setImportConfig(allValues)}
>
<Form.Item
label="数据源"
name="source"
rules={[{ required: true, message: "请选择数据源" }]}
>
<Radio.Group
buttonStyle="solid"
options={availableSourceOptions}
optionType="button"
/>
</Form.Item>
{importConfig?.source === DataSource.COLLECTION && (
<Form.Item name="dataSource" label="归集任务" required>
<Select placeholder="请选择归集任务" options={collectionOptions} />
</Form.Item>
)}
{/* obs import */}
{importConfig?.source === DataSource.OBS && (
<div className="grid grid-cols-2 gap-3 p-4 bg-blue-50 rounded-lg">
<Form.Item
name="endpoint"
rules={[{ required: true }]}
label="Endpoint"
>
<Input
className="h-8 text-xs"
placeholder="obs.cn-north-4.myhuaweicloud.com"
/>
</Form.Item>
<Form.Item
name="bucket"
rules={[{ required: true }]}
label="Bucket"
>
<Input className="h-8 text-xs" placeholder="my-bucket" />
</Form.Item>
<Form.Item
name="accessKey"
rules={[{ required: true }]}
label="Access Key"
>
<Input className="h-8 text-xs" placeholder="Access Key" />
</Form.Item>
<Form.Item
name="secretKey"
rules={[{ required: true }]}
label="Secret Key"
>
<Input
type="password"
className="h-8 text-xs"
placeholder="Secret Key"
/>
</Form.Item>
</div>
)}
{/* Local Upload Component */}
{importConfig?.source === DataSource.UPLOAD && (
<>
<Form.Item
label="自动解压上传的压缩包"
name="hasArchive"
valuePropName="checked"
>
<Switch />
</Form.Item>
{isTextDataset && (
<Form.Item
label={
<span>
{" "}
<Tooltip
title={
hasNonTextFile
? "已选择非文本文件,无法按行分割"
: "选中后,文本文件的每一行将被分割成独立文件"
}
>
<QuestionCircleOutlined style={{ color: "#999" }} />
</Tooltip>
</span>
}
name="splitByLine"
valuePropName="checked"
>
<Switch disabled={hasNonTextFile} />
</Form.Item>
)}
<Form.Item
label="上传文件"
name="files"
valuePropName="fileList"
getValueFromEvent={(
event: { fileList?: UploadFile[] } | UploadFile[]
) => {
if (Array.isArray(event)) {
return event;
}
return event?.fileList;
}}
rules={[
{
required: true,
message: "请上传文件",
},
]}
>
<Dragger
className="w-full"
beforeUpload={() => false}
multiple
>
<p className="ant-upload-drag-icon">
<InboxOutlined />
</p>
<p className="ant-upload-text"></p>
<p className="ant-upload-hint"></p>
</Dragger>
</Form.Item>
</>
)}
{/* Target Configuration */}
{importConfig?.target && importConfig?.target !== DataSource.UPLOAD && (
<div className="space-y-3 p-4 bg-blue-50 rounded-lg">
{importConfig?.target === DataSource.DATABASE && (
<div className="grid grid-cols-2 gap-3">
<Form.Item
name="databaseType"
rules={[{ required: true }]}
label="数据库类型"
>
<Select
className="w-full"
options={[
{ label: "MySQL", value: "mysql" },
{ label: "PostgreSQL", value: "postgresql" },
{ label: "MongoDB", value: "mongodb" },
]}
></Select>
</Form.Item>
<Form.Item
name="tableName"
rules={[{ required: true }]}
label="表名"
>
<Input className="h-8 text-xs" placeholder="dataset_table" />
</Form.Item>
<Form.Item
name="connectionString"
rules={[{ required: true }]}
label="连接字符串"
>
<Input
className="h-8 text-xs col-span-2"
placeholder="数据库连接字符串"
/>
</Form.Item>
</div>
)}
</div>
)}
</Form>
</Modal>
);
}
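The size-based routing in `handleUpload` (large files to the streaming event, small files to split-and-slice) can be sketched without the React plumbing. The 5 MB threshold is taken from the commit notes; `routeFiles` and `STREAM_THRESHOLD_BYTES` are illustrative names standing in for the real `shouldStreamUpload` gate.

```typescript
// Assumed default per the commit message; the actual gate lives in
// shouldStreamUpload inside file.util.ts.
const STREAM_THRESHOLD_BYTES = 5 * 1024 * 1024;

// Partition files into a streaming group and a normal (split-then-slice)
// group, mirroring the loop in handleUpload.
function routeFiles(files: { name: string; size: number }[]) {
  const stream: string[] = [];
  const normal: string[] = [];
  for (const f of files) {
    (f.size > STREAM_THRESHOLD_BYTES ? stream : normal).push(f.name);
  }
  return { stream, normal };
}
```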

View File

@@ -4,152 +4,69 @@ import {
Descriptions,
DescriptionsProps,
Modal,
Spin,
Table,
Input,
Tag,
} from "antd";
import { formatBytes, formatDateTime } from "@/utils/unit";
import { Download, Trash2, Folder, File } from "lucide-react";
import { datasetTypeMap } from "../../dataset.const";
import type { Dataset, DatasetFile } from "@/pages/DataManagement/dataset.model";
import { Link } from "react-router";
import type { useFilesOperation } from "../useFilesOperation";
type DatasetFileRow = DatasetFile & {
fileSize?: number;
fileCount?: number;
uploadTime?: string;
};
const PREVIEW_MAX_HEIGHT = 500;
const PREVIEW_MODAL_WIDTH = {
text: 800,
media: 700,
};
const PREVIEW_TEXT_FONT_SIZE = 12;
const PREVIEW_TEXT_PADDING = 12;
const PREVIEW_AUDIO_PADDING = 40;
const SIMILAR_TAGS_PREVIEW_LIMIT = 3;
const SIMILAR_DATASET_TAG_PREVIEW_LIMIT = 4;
type OverviewProps = {
dataset: Dataset;
filesOperation: ReturnType<typeof useFilesOperation>;
fetchDataset: () => void;
onUpload?: () => void;
similarDatasets: Dataset[];
similarDatasetsLoading: boolean;
similarTags: string[];
};
export default function Overview({
dataset,
filesOperation,
fetchDataset,
onUpload,
similarDatasets,
similarDatasetsLoading,
similarTags,
}: OverviewProps) {
const { modal, message } = App.useApp();
const {
fileList,
pagination,
selectedFiles,
previewVisible,
previewFileName,
previewContent,
previewFileType,
previewMediaUrl,
previewLoading,
officePreviewStatus,
officePreviewError,
closePreview,
handleDeleteFile,
handleDownloadFile,
handleBatchDeleteFiles,
handleBatchExport,
handleCreateDirectory,
handleDownloadDirectory,
handleDeleteDirectory,
handlePreviewFile,
} = filesOperation;
const similarTagsSummary = (() => {
if (!similarTags || similarTags.length === 0) {
return "";
}
const visibleTags = similarTags.slice(0, SIMILAR_TAGS_PREVIEW_LIMIT);
const hiddenCount = similarTags.length - visibleTags.length;
if (hiddenCount > 0) {
return `${visibleTags.join("、")}${similarTags.length}`;
}
return visibleTags.join("、");
})();
const renderDatasetTags = (
tags?: Array<string | { name?: string; color?: string } | null>
) => {
if (!tags || tags.length === 0) {
return "-";
}
const visibleTags = tags.slice(0, SIMILAR_DATASET_TAG_PREVIEW_LIMIT);
const hiddenCount = tags.length - visibleTags.length;
return (
<div className="flex flex-wrap gap-1">
{visibleTags.map((tag, index) => {
const tagName = typeof tag === "string" ? tag : tag?.name;
if (!tagName) {
return null;
}
const tagColor = typeof tag === "string" ? undefined : tag?.color;
return (
<Tag key={`${tagName}-${index}`} color={tagColor}>
{tagName}
</Tag>
);
})}
{hiddenCount > 0 && <Tag>+{hiddenCount}</Tag>}
</div>
);
};
const similarColumns = [
{
title: "名称",
dataIndex: "name",
key: "name",
render: (_: string, record: Dataset) => (
<Link to={`/data/management/detail/${record.id}`}>{record.name}</Link>
),
},
{
title: "标签",
dataIndex: "tags",
key: "tags",
render: (tags: Array<string | { name?: string; color?: string }>) =>
renderDatasetTags(tags),
},
{
title: "类型",
dataIndex: "datasetType",
key: "datasetType",
width: 120,
render: (_: string, record: Dataset) =>
datasetTypeMap[record.datasetType as keyof typeof datasetTypeMap]?.label ||
"未知",
},
{
title: "文件数",
dataIndex: "fileCount",
key: "fileCount",
width: 120,
render: (value?: number) => value ?? 0,
},
{
title: "更新时间",
dataIndex: "updatedAt",
key: "updatedAt",
width: 180,
},
];
// 基本信息
const items: DescriptionsProps["items"] = [
{
key: "id",
@@ -211,7 +128,7 @@ export default function Overview({
dataIndex: "fileName",
key: "fileName",
fixed: "left",
render: (text: string, record: DatasetFileRow) => {
const isDirectory = record.id.startsWith('directory-');
const iconSize = 16;
@@ -230,35 +147,35 @@ export default function Overview({
return (
<Button
type="link"
onClick={() => {
const currentPath = filesOperation.pagination.prefix || '';
// 文件夹路径必须以斜杠结尾
const newPath = `${currentPath}${record.fileName}/`;
filesOperation.fetchFiles(newPath, 1, filesOperation.pagination.pageSize);
}}
>
{content}
</Button>
);
}
return (
<Button
type="link"
loading={previewLoading && previewFileName === record.fileName}
onClick={() => handlePreviewFile(record)}
>
{content}
</Button>
);
},
},
{
title: "大小",
dataIndex: "fileSize",
key: "fileSize",
width: 150,
render: (text: number, record: DatasetFileRow) => {
const isDirectory = record.id.startsWith('directory-');
if (isDirectory) {
return formatBytes(record.fileSize || 0);
@@ -271,7 +188,7 @@ export default function Overview({
dataIndex: "fileCount",
key: "fileCount",
width: 120,
render: (text: number, record: DatasetFileRow) => {
const isDirectory = record.id.startsWith('directory-');
if (!isDirectory) {
return "-";
@@ -291,7 +208,7 @@ export default function Overview({
key: "action",
width: 180,
fixed: "right",
render: (_, record: DatasetFileRow) => {
const isDirectory = record.id.startsWith('directory-');
if (isDirectory) {
@@ -332,6 +249,14 @@ export default function Overview({
return (
<div className="flex">
<Button
size="small"
type="link"
loading={previewLoading && previewFileName === record.fileName}
onClick={() => handlePreviewFile(record)}
>
</Button>
<Button
size="small"
type="link"
@@ -367,70 +292,45 @@ export default function Overview({
column={5}
/>
{/* 相似数据集 */}
<div className="mt-8">
<div className="flex items-center justify-between mb-3">
<h2 className="text-base font-semibold">相似数据集</h2>
{similarTagsSummary && (
<span className="text-xs text-gray-500">
{similarTagsSummary}
</span>
)}
</div>
<Table
size="small"
rowKey="id"
columns={similarColumns}
dataSource={similarDatasets}
loading={similarDatasetsLoading}
pagination={false}
locale={{
emptyText: similarTags?.length
? "暂无相似数据集"
: "当前数据集未设置标签",
}}
/>
</div>
{/* 文件列表 */}
<div className="flex items-center justify-between mt-8 mb-2">
<h2 className="text-base font-semibold">文件列表</h2>
<div className="flex items-center gap-2">
<Button size="small" onClick={() => onUpload?.()}>
</Button>
<Button
type="primary"
size="small"
onClick={() => {
let dirName = "";
modal.confirm({
title: "新建文件夹",
content: (
<Input
autoFocus
placeholder="请输入文件夹名称"
onChange={(e) => {
dirName = e.target.value?.trim();
}}
/>
),
okText: "确定",
cancelText: "取消",
onOk: async () => {
if (!dirName) {
message.warning("请输入文件夹名称");
return Promise.reject();
}
await handleCreateDirectory(dirName);
},
});
}}
>
</Button>
</div>
</div>
{selectedFiles.length > 0 && (
<div className="flex items-center gap-2 p-3 bg-blue-50 rounded-lg border border-blue-200">
<span className="text-sm text-blue-700 font-medium">
@@ -511,63 +411,98 @@ export default function Overview({
/>
</div>
</div>
{/* 文件预览弹窗 */}
<Modal
title={`文件预览:${previewFileName}`}
open={previewVisible}
onCancel={closePreview}
footer={[
<Button key="close" onClick={closePreview}>
关闭
</Button>,
]}
width={previewFileType === "text" ? PREVIEW_MODAL_WIDTH.text : PREVIEW_MODAL_WIDTH.media}
>
{previewFileType === "text" && (
<pre
style={{
maxHeight: `${PREVIEW_MAX_HEIGHT}px`,
overflow: "auto",
whiteSpace: "pre-wrap",
wordBreak: "break-all",
fontSize: PREVIEW_TEXT_FONT_SIZE,
color: "#222",
backgroundColor: "#f5f5f5",
padding: `${PREVIEW_TEXT_PADDING}px`,
borderRadius: "4px",
}}
>
{previewContent}
</pre>
)}
{previewFileType === "image" && (
<div style={{ textAlign: "center" }}>
<img
src={previewMediaUrl}
alt={previewFileName}
style={{ maxWidth: "100%", maxHeight: `${PREVIEW_MAX_HEIGHT}px`, objectFit: "contain" }}
/>
</div>
)}
{previewFileType === "pdf" && (
<>
{previewMediaUrl ? (
<iframe
src={previewMediaUrl}
title={previewFileName || "PDF 预览"}
style={{ width: "100%", height: `${PREVIEW_MAX_HEIGHT}px`, border: "none" }}
/>
) : (
<div
style={{
height: `${PREVIEW_MAX_HEIGHT}px`,
display: "flex",
flexDirection: "column",
alignItems: "center",
justifyContent: "center",
gap: 12,
color: "#666",
}}
>
{officePreviewStatus === "FAILED" ? (
<>
<div>转换失败</div>
<div>{officePreviewError || "请稍后重试"}</div>
</>
) : (
<>
<Spin />
<div>正在转换...</div>
</>
)}
</div>
)}
</>
)}
{previewFileType === "video" && (
<div style={{ textAlign: "center" }}>
<video
src={previewMediaUrl}
controls
style={{ maxWidth: "100%", maxHeight: `${PREVIEW_MAX_HEIGHT}px` }}
>
</video>
</div>
)}
{previewFileType === "audio" && (
<div style={{ textAlign: "center", padding: `${PREVIEW_AUDIO_PADDING}px 0` }}>
<audio src={previewMediaUrl} controls style={{ width: "100%" }}>
</audio>
</div>
)}
</Modal>
</>
);
}

View File

@@ -1,22 +1,51 @@
import type {
Dataset,
DatasetFile,
} from "@/pages/DataManagement/dataset.model";
import { App } from "antd";
import { useCallback, useEffect, useRef, useState } from "react";
import {
PREVIEW_TEXT_MAX_LENGTH,
resolvePreviewFileType,
truncatePreviewText,
type PreviewFileType,
} from "@/utils/filePreview";
import {
deleteDatasetFileUsingDelete,
downloadFileByIdUsingGet,
exportDatasetUsingPost,
queryDatasetFilesUsingGet,
createDatasetDirectoryUsingPost,
downloadDirectoryUsingGet,
deleteDirectoryUsingDelete,
queryDatasetFilePreviewStatusUsingGet,
convertDatasetFilePreviewUsingPost,
} from "../dataset.api";
import { useParams } from "react-router";
const OFFICE_FILE_EXTENSIONS = [".doc", ".docx"];
const OFFICE_PREVIEW_POLL_INTERVAL = 2000;
const OFFICE_PREVIEW_POLL_MAX_TIMES = 60;
type OfficePreviewStatus = "UNSET" | "PENDING" | "PROCESSING" | "READY" | "FAILED";
const isOfficeFileName = (fileName?: string) => {
const lowerName = (fileName || "").toLowerCase();
return OFFICE_FILE_EXTENSIONS.some((ext) => lowerName.endsWith(ext));
};
const normalizeOfficePreviewStatus = (status?: string): OfficePreviewStatus => {
if (!status) {
return "UNSET";
}
const upper = status.toUpperCase();
if (upper === "PENDING" || upper === "PROCESSING" || upper === "READY" || upper === "FAILED") {
return upper as OfficePreviewStatus;
}
return "UNSET";
};
export function useFilesOperation(dataset: Dataset) {
const { message } = App.useApp();
const { id } = useParams(); // 获取动态路由参数
@@ -35,9 +64,26 @@ export function useFilesOperation(dataset: Dataset) {
const [previewVisible, setPreviewVisible] = useState(false);
const [previewContent, setPreviewContent] = useState("");
const [previewFileName, setPreviewFileName] = useState("");
const [previewFileType, setPreviewFileType] = useState<PreviewFileType>("text");
const [previewMediaUrl, setPreviewMediaUrl] = useState("");
const [previewLoading, setPreviewLoading] = useState(false);
const [officePreviewStatus, setOfficePreviewStatus] = useState<OfficePreviewStatus | null>(null);
const [officePreviewError, setOfficePreviewError] = useState("");
const officePreviewPollingRef = useRef<number | null>(null);
const officePreviewFileRef = useRef<string | null>(null);
const clearOfficePreviewPolling = useCallback(() => {
if (officePreviewPollingRef.current) {
window.clearTimeout(officePreviewPollingRef.current);
officePreviewPollingRef.current = null;
}
}, []);
useEffect(() => {
return () => {
clearOfficePreviewPolling();
};
}, [clearOfficePreviewPolling]);
const fetchFiles = async (
prefix?: string,
@@ -52,6 +98,7 @@ export function useFilesOperation(dataset: Dataset) {
size: pageSize !== undefined ? pageSize : pagination.pageSize,
isWithDirectory: true,
prefix: targetPrefix,
excludeDerivedFiles: true,
};
const { data } = await queryDatasetFilesUsingGet(id!, params);
@@ -105,22 +152,66 @@ export function useFilesOperation(dataset: Dataset) {
return;
}
const previewUrl = `/api/data-management/datasets/${datasetId}/files/${file.id}/preview`;
setPreviewFileName(file.fileName);
setPreviewContent("");
setPreviewMediaUrl("");
if (isOfficeFileName(file?.fileName)) {
setPreviewFileType("pdf");
setPreviewVisible(true);
setPreviewLoading(true);
setOfficePreviewStatus("PROCESSING");
setOfficePreviewError("");
officePreviewFileRef.current = file.id;
try {
const { data: statusData } = await queryDatasetFilePreviewStatusUsingGet(datasetId, file.id);
const currentStatus = normalizeOfficePreviewStatus(statusData?.status);
if (currentStatus === "READY") {
setPreviewMediaUrl(previewUrl);
setOfficePreviewStatus("READY");
setPreviewLoading(false);
return;
}
if (currentStatus === "PROCESSING") {
pollOfficePreviewStatus(datasetId, file.id, 0);
return;
}
const { data } = await convertDatasetFilePreviewUsingPost(datasetId, file.id);
const status = normalizeOfficePreviewStatus(data?.status);
if (status === "READY") {
setPreviewMediaUrl(previewUrl);
setOfficePreviewStatus("READY");
} else if (status === "FAILED") {
setOfficePreviewStatus("FAILED");
setOfficePreviewError(data?.previewError || "转换失败,请稍后重试");
} else {
setOfficePreviewStatus("PROCESSING");
pollOfficePreviewStatus(datasetId, file.id, 0);
return;
}
} catch (error) {
console.error("触发预览转换失败", error);
message.error({ content: "触发预览转换失败" });
setOfficePreviewStatus("FAILED");
setOfficePreviewError("触发预览转换失败");
} finally {
setPreviewLoading(false);
}
return;
}
const fileType = resolvePreviewFileType(file?.fileName);
if (!fileType) {
message.warning({ content: "不支持预览该文件类型" });
return;
}
const fileUrl = `/api/data-management/datasets/${datasetId}/files/${file.id}/download`;
setPreviewFileName(file.fileName);
setPreviewFileType(fileType);
setPreviewContent("");
setPreviewMediaUrl("");
if (fileType === "text") {
setPreviewLoading(true);
try {
const response = await fetch(previewUrl);
if (!response.ok) {
throw new Error("下载失败");
}
@@ -136,18 +227,67 @@ export function useFilesOperation(dataset: Dataset) {
return;
}
setPreviewMediaUrl(previewUrl);
setPreviewVisible(true);
};
const closePreview = () => {
clearOfficePreviewPolling();
officePreviewFileRef.current = null;
setPreviewVisible(false);
setPreviewContent("");
setPreviewMediaUrl("");
setPreviewFileName("");
setPreviewFileType("text");
setOfficePreviewStatus(null);
setOfficePreviewError("");
};
const pollOfficePreviewStatus = useCallback(
async (datasetId: string, fileId: string, attempt: number) => {
clearOfficePreviewPolling();
officePreviewPollingRef.current = window.setTimeout(async () => {
if (officePreviewFileRef.current !== fileId) {
return;
}
try {
const { data } = await queryDatasetFilePreviewStatusUsingGet(datasetId, fileId);
const status = normalizeOfficePreviewStatus(data?.status);
if (status === "READY") {
setPreviewMediaUrl(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview`);
setOfficePreviewStatus("READY");
setOfficePreviewError("");
setPreviewLoading(false);
return;
}
if (status === "FAILED") {
setOfficePreviewStatus("FAILED");
setOfficePreviewError(data?.previewError || "转换失败,请稍后重试");
setPreviewLoading(false);
return;
}
if (attempt >= OFFICE_PREVIEW_POLL_MAX_TIMES - 1) {
setOfficePreviewStatus("FAILED");
setOfficePreviewError("转换超时,请稍后重试");
setPreviewLoading(false);
return;
}
pollOfficePreviewStatus(datasetId, fileId, attempt + 1);
} catch (error) {
console.error("轮询预览状态失败", error);
if (attempt >= OFFICE_PREVIEW_POLL_MAX_TIMES - 1) {
setOfficePreviewStatus("FAILED");
setOfficePreviewError("转换超时,请稍后重试");
setPreviewLoading(false);
return;
}
pollOfficePreviewStatus(datasetId, fileId, attempt + 1);
}
}, OFFICE_PREVIEW_POLL_INTERVAL);
},
[clearOfficePreviewPolling]
);
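Stripped of React state and refs, `pollOfficePreviewStatus` is a bounded poll: re-check a status at a fixed interval, stop on READY or FAILED, and give up after a maximum number of attempts. A minimal framework-free sketch (illustrative names, not the hook itself):

```typescript
// Bounded polling sketch: `check` is any async status probe; the loop
// re-checks every intervalMs and gives up after maxAttempts, mirroring
// OFFICE_PREVIEW_POLL_INTERVAL / OFFICE_PREVIEW_POLL_MAX_TIMES above.
type PollOutcome = "READY" | "FAILED" | "TIMEOUT";

async function pollUntilDone(
  check: () => Promise<string>,
  intervalMs: number,
  maxAttempts: number
): Promise<PollOutcome> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = (await check()).toUpperCase();
    if (status === "READY") return "READY";
    if (status === "FAILED") return "FAILED";
    // PENDING / PROCESSING / UNSET: wait, then try again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return "TIMEOUT";
}
```

The hook additionally guards each tick with `officePreviewFileRef` so a poll started for one file is abandoned when the preview target changes.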
const handleDeleteFile = async (file: DatasetFile) => {
try {
await deleteDatasetFileUsingDelete(dataset.id, file.id);
@@ -190,6 +330,8 @@ export function useFilesOperation(dataset: Dataset) {
previewFileType,
previewMediaUrl,
previewLoading,
officePreviewStatus,
officePreviewError,
closePreview,
fetchFiles,
setFileList,
@@ -240,4 +382,5 @@ interface DatasetFilesQueryParams {
size: number;
isWithDirectory: boolean;
prefix: string;
excludeDerivedFiles?: boolean;
}

View File

@@ -8,8 +8,8 @@ import {
} from "@ant-design/icons";
import TagManager from "@/components/business/TagManagement";
import { Link, useNavigate } from "react-router";
import { useEffect, useMemo, useState } from "react";
import type { ReactNode } from "react";
import { SearchControls } from "@/components/SearchControls";
import CardView from "@/components/CardView";
import type { Dataset } from "@/pages/DataManagement/dataset.model";
@@ -36,19 +36,19 @@ export default function DatasetManagementPage() {
const [editDatasetOpen, setEditDatasetOpen] = useState(false);
const [currentDataset, setCurrentDataset] = useState<Dataset | null>(null);
const [showUploadDialog, setShowUploadDialog] = useState(false);
const [statisticsData, setStatisticsData] = useState<StatisticsData>({
count: [],
size: [],
});
async function fetchStatistics() {
const { data } = await getDatasetStatisticsUsingGet();
const statistics: StatisticsData = {
size: [
{
title: "数据集总数",
value: data?.totalDatasets || 0,
},
{
title: "文件总数",
@@ -76,10 +76,10 @@ export default function DatasetManagementPage() {
title: "视频",
value: data?.count?.video || 0,
},
],
};
setStatisticsData(statistics);
}
const [tags, setTags] = useState<string[]>([]);
@@ -136,9 +136,9 @@ export default function DatasetManagementPage() {
message.success("数据集下载成功");
};
const handleDeleteDataset = async (id: string) => {
if (!id) return;
await deleteDatasetByIdUsingDelete(id);
fetchData({ pageOffset: 0 });
message.success("数据删除成功");
};
@@ -223,12 +223,12 @@ export default function DatasetManagementPage() {
title: "状态",
dataIndex: "status",
key: "status",
render: (status: DatasetStatusMeta) => {
return (
<Tag icon={status?.icon} color={status?.color}>
{status?.label}
</Tag>
);
},
width: 120,
},
@@ -274,10 +274,10 @@ export default function DatasetManagementPage() {
key: "actions",
width: 200,
fixed: "right",
render: (_: unknown, record: Dataset) => (
<div className="flex items-center gap-2">
{operations.map((op) => (
<Tooltip key={op.key} title={op.label}>
<Button
type="text"
icon={op.icon}
@@ -329,7 +329,7 @@ export default function DatasetManagementPage() {
<div className="gap-4 h-full flex flex-col">
{/* Header */}
<div className="flex items-center justify-between">
<h1 className="text-xl font-bold"></h1>
<div className="flex gap-2 items-center">
{/* tasks */}
<TagManager
@@ -353,13 +353,13 @@ export default function DatasetManagementPage() {
<div className="grid grid-cols-1 gap-4">
<Card>
<div className="grid grid-cols-3">
{statisticsData.size.map((item) => (
<Statistic
title={item.title}
key={item.title}
value={`${item.value}`}
/>
))}
</div>
</Card>
</div>
@@ -396,22 +396,22 @@ export default function DatasetManagementPage() {
updateEvent="update:datasets"
/>
</div>
);
}
type StatisticsItem = {
title: string;
value: number | string;
};
type StatisticsData = {
count: StatisticsItem[];
size: StatisticsItem[];
};
type DatasetStatusMeta = {
label: string;
value: string;
color: string;
icon: ReactNode;
};

View File

@@ -107,17 +107,33 @@ export function deleteDirectoryUsingDelete(
return del(`/api/data-management/datasets/${id}/files/directories?prefix=${encodeURIComponent(directoryPath)}`);
}
export function downloadFileByIdUsingGet(
id: string | number,
fileId: string | number,
fileName: string
) {
return download(
`/api/data-management/datasets/${id}/files/${fileId}/download`,
null,
fileName
);
}
// Dataset file preview status
export function queryDatasetFilePreviewStatusUsingGet(
datasetId: string | number,
fileId: string | number
) {
return get(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview/status`);
}
// Trigger dataset file preview conversion
export function convertDatasetFilePreviewUsingPost(
datasetId: string | number,
fileId: string | number
) {
return post(`/api/data-management/datasets/${datasetId}/files/${fileId}/preview/convert`, {});
}
// Delete dataset file
export function deleteDatasetFileUsingDelete(


@@ -34,10 +34,12 @@ export enum DataSource {
export interface DatasetFile {
id: string;
datasetId?: string;
fileName: string;
size: string;
uploadDate: string;
path: string;
filePath?: string;
}
export interface Dataset {
@@ -100,6 +102,13 @@ export interface DatasetTask {
executionHistory?: { time: string; status: string }[];
}
export interface StreamUploadInfo {
currentFile: string;
fileIndex: number;
totalFiles: number;
uploadedLines: number;
}
export interface TaskItem {
key: string;
title: string;
@@ -111,4 +120,6 @@ export interface TaskItem {
updateEvent?: string;
size?: number;
hasArchive?: boolean;
prefix?: string;
streamUploadInfo?: StreamUploadInfo;
}
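The `streamUploadInfo` fields above are filled in by the streaming splitter, which (per the commit description) reads `Blob.slice` chunks and uploads complete lines as they form. A minimal sketch of that buffering step, with the upload plumbing omitted and `feedChunk` as a hypothetical helper name, not the actual `file.util.ts` code:

```typescript
// Hypothetical sketch of the line-buffering step behind streamSplitAndUpload:
// each chunk read via Blob.slice is appended to a small carry-over buffer,
// complete lines are emitted for upload, and only the unfinished trailing
// line is retained, so memory stays bounded by one chunk plus one line.
export function feedChunk(
  carry: string,
  chunk: string
): { carry: string; lines: string[] } {
  const parts = (carry + chunk).split("\n");
  // split() leaves whatever follows the last "\n" as the final element;
  // it is an incomplete line ("" when the chunk ended with "\n").
  const rest = parts.pop() ?? "";
  return { carry: rest, lines: parts };
}
```

After the final chunk, any non-empty carry would be flushed as the last line.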


@@ -1,12 +1,23 @@
import type React from "react";
import { useEffect, useState } from "react";
import { Table, Badge, Button, Breadcrumb, Tooltip, App, Card, Input, Empty, Spin } from "antd";
import { useCallback, useEffect, useMemo, useState } from "react";
import {
Table,
Badge,
Button,
Breadcrumb,
Tooltip,
App,
Card,
Input,
Empty,
Spin,
} from "antd";
import {
DeleteOutlined,
EditOutlined,
ReloadOutlined,
} from "@ant-design/icons";
import { useNavigate, useParams } from "react-router";
import { useNavigate, useParams, useSearchParams } from "react-router";
import DetailHeader from "@/components/DetailHeader";
import { SearchControls } from "@/components/SearchControls";
import { KBFile, KnowledgeBaseItem } from "../knowledge-base.model";
@@ -18,9 +29,9 @@ import {
queryKnowledgeBaseFilesUsingGet,
retrieveKnowledgeBaseContent,
} from "../knowledge-base.api";
import useFetchData from "@/hooks/useFetchData";
import AddDataDialog from "../components/AddDataDialog";
import CreateKnowledgeBase from "../components/CreateKnowledgeBase";
import { File, Folder } from "lucide-react";
interface StatisticItem {
icon?: React.ReactNode;
@@ -39,44 +50,127 @@ interface RecallResult {
primaryKey?: string;
}
type KBFileRow = KBFile & {
isDirectory?: boolean;
displayName?: string;
fullPath?: string;
fileCount?: number;
};
const PATH_SEPARATOR = "/";
const normalizePath = (value?: string) =>
(value ?? "").replace(/\\/g, PATH_SEPARATOR);
const normalizePrefix = (value?: string) => {
const trimmed = normalizePath(value).replace(/^\/+/, "").trim();
if (!trimmed) {
return "";
}
return trimmed.endsWith(PATH_SEPARATOR)
? trimmed
: `${trimmed}${PATH_SEPARATOR}`;
};
const splitRelativePath = (fullPath: string, prefix: string) => {
if (prefix && !fullPath.startsWith(prefix)) {
return [];
}
const remainder = fullPath.slice(prefix.length);
return remainder.split(PATH_SEPARATOR).filter(Boolean);
};
const resolveFileRelativePath = (file: KBFile) => {
const rawPath = file.relativePath || file.fileName || file.name || "";
return normalizePath(rawPath).replace(/^\/+/, "");
};
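Taken together, the path helpers above normalize backslashes, guarantee a trailing separator on prefixes, and split the remainder into segments. A standalone restatement for illustration (the values in the comments are worked examples, not output from the component):

```typescript
const PATH_SEPARATOR = "/";
const normalizePath = (value?: string) =>
  (value ?? "").replace(/\\/g, PATH_SEPARATOR);
// Strip leading slashes and ensure exactly one trailing separator, so that
// prefix comparisons with startsWith are unambiguous.
const normalizePrefix = (value?: string) => {
  const trimmed = normalizePath(value).replace(/^\/+/, "").trim();
  if (!trimmed) return "";
  return trimmed.endsWith(PATH_SEPARATOR)
    ? trimmed
    : `${trimmed}${PATH_SEPARATOR}`;
};
// Segments of fullPath below prefix; empty when the path is outside it.
const splitRelativePath = (fullPath: string, prefix: string) => {
  if (prefix && !fullPath.startsWith(prefix)) return [];
  return fullPath.slice(prefix.length).split(PATH_SEPARATOR).filter(Boolean);
};

normalizePrefix("docs\\guides");                 // "docs/guides/"
splitRelativePath("docs/guides/a.md", "docs/");  // ["guides", "a.md"]
```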
const KnowledgeBaseDetailPage: React.FC = () => {
const navigate = useNavigate();
const [searchParams] = useSearchParams();
const { message } = App.useApp();
const { id } = useParams<{ id: string }>();
const [knowledgeBase, setKnowledgeBase] = useState<KnowledgeBaseItem | undefined>(undefined);
const [showEdit, setShowEdit] = useState(false);
const [activeTab, setActiveTab] = useState<'fileList' | 'recallTest'>('fileList');
const [filePrefix, setFilePrefix] = useState("");
const [fileKeyword, setFileKeyword] = useState("");
const [filesLoading, setFilesLoading] = useState(false);
const [allFiles, setAllFiles] = useState<KBFile[]>([]);
const [filePagination, setFilePagination] = useState({
current: 1,
pageSize: 10,
});
const [recallLoading, setRecallLoading] = useState(false);
const [recallResults, setRecallResults] = useState<RecallResult[]>([]);
const [recallQuery, setRecallQuery] = useState("");
const fetchKnowledgeBaseDetails = async (id: string) => {
const { data } = await queryKnowledgeBaseByIdUsingGet(id);
const fetchKnowledgeBaseDetails = useCallback(async (baseId: string) => {
const { data } = await queryKnowledgeBaseByIdUsingGet(baseId);
setKnowledgeBase(mapKnowledgeBase(data));
};
}, []);
const fetchFiles = useCallback(async () => {
if (!id) {
setAllFiles([]);
return;
}
setFilesLoading(true);
try {
const pageSize = 200;
let page = 0;
let combined: KBFile[] = [];
const currentPrefix = normalizePrefix(filePrefix);
const keyword = fileKeyword.trim();
while (true) {
const { data } = await queryKnowledgeBaseFilesUsingGet(id, {
page,
size: pageSize,
...(currentPrefix ? { relativePath: currentPrefix } : {}),
...(keyword ? { fileName: keyword } : {}),
});
const content = Array.isArray(data?.content) ? data.content : [];
combined = combined.concat(content.map(mapFileData));
if (content.length < pageSize) {
break;
}
if (typeof data?.totalElements === "number" && combined.length >= data.totalElements) {
break;
}
page += 1;
}
setAllFiles(combined);
} catch (error) {
console.error("Failed to fetch knowledge base files:", error);
message.error("文件列表加载失败");
} finally {
setFilesLoading(false);
}
}, [id, filePrefix, fileKeyword, message]);
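The `while` loop above drains every page of the file listing before the folder tree is built. In isolation, with the HTTP call replaced by an injected `fetchPage` stub (a hypothetical stand-in for `queryKnowledgeBaseFilesUsingGet`), the pattern is:

```typescript
type Page<T> = { content: T[]; totalElements?: number };

// Keep requesting pages until a short page arrives or the reported total
// is reached, mirroring the two break conditions in fetchFiles.
async function fetchAll<T>(
  fetchPage: (page: number, size: number) => Promise<Page<T>>,
  pageSize = 200
): Promise<T[]> {
  let page = 0;
  let combined: T[] = [];
  while (true) {
    const { content, totalElements } = await fetchPage(page, pageSize);
    combined = combined.concat(content);
    if (content.length < pageSize) break; // short page: nothing further
    if (typeof totalElements === "number" && combined.length >= totalElements) break;
    page += 1;
  }
  return combined;
}
```

The short-page check alone terminates the loop; the `totalElements` check is a belt-and-braces guard against servers that pad the final page.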
useEffect(() => {
if (id) {
fetchKnowledgeBaseDetails(id);
}
}, [id]);
}, [id, fetchKnowledgeBaseDetails]);
const {
loading,
tableData: files,
searchParams,
pagination,
fetchData: fetchFiles,
setSearchParams,
handleFiltersChange,
handleKeywordChange,
} = useFetchData<KBFile>(
(params) => id ? queryKnowledgeBaseFilesUsingGet(id, params) : Promise.resolve({ data: [] }),
mapFileData
);
useEffect(() => {
if (!id) {
return;
}
const prefixParam = searchParams.get("prefix");
const fileNameParam = searchParams.get("fileName");
setFilePrefix(prefixParam ? normalizePrefix(prefixParam) : "");
setFileKeyword(fileNameParam ? fileNameParam : "");
}, [id, searchParams]);
useEffect(() => {
if (id) {
fetchFiles();
}
}, [id, fetchFiles]);
// File table logic
const handleDeleteFile = async (file: KBFile) => {
const handleDeleteFile = async (file: KBFileRow) => {
try {
await deleteKnowledgeBaseFileByIdUsingDelete(knowledgeBase!.id, {
ids: [file.id]
@@ -119,6 +213,152 @@ const KnowledgeBaseDetailPage: React.FC = () => {
setRecallLoading(false);
};
const handleOpenDirectory = (directoryName: string) => {
const currentPrefix = normalizePrefix(filePrefix);
const nextPrefix = normalizePrefix(`${currentPrefix}${directoryName}`);
setFilePrefix(nextPrefix);
};
const handleBackToParent = () => {
const currentPrefix = normalizePrefix(filePrefix);
if (!currentPrefix) {
return;
}
const trimmed = currentPrefix.replace(/\/$/, "");
const parts = trimmed.split(PATH_SEPARATOR).filter(Boolean);
parts.pop();
const parentPrefix = parts.length
? `${parts.join(PATH_SEPARATOR)}${PATH_SEPARATOR}`
: "";
setFilePrefix(parentPrefix);
};
const handleDeleteDirectory = async (directoryName: string) => {
if (!knowledgeBase?.id) {
return;
}
const currentPrefix = normalizePrefix(filePrefix);
const directoryPrefix = normalizePrefix(`${currentPrefix}${directoryName}`);
const targetIds = allFiles
.filter((file) => {
const fullPath = resolveFileRelativePath(file);
return fullPath.startsWith(directoryPrefix);
})
.map((file) => file.id);
if (targetIds.length === 0) {
message.info("该文件夹为空");
return;
}
try {
await deleteKnowledgeBaseFileByIdUsingDelete(knowledgeBase.id, {
ids: targetIds,
});
message.success(`已删除 ${targetIds.length} 个文件`);
fetchFiles();
} catch {
message.error("文件夹删除失败");
}
};
const handleKeywordChange = (keyword: string) => {
setFileKeyword(keyword);
};
useEffect(() => {
setFilePagination((prev) => ({ ...prev, current: 1 }));
}, [filePrefix, fileKeyword]);
const normalizedPrefix = useMemo(() => normalizePrefix(filePrefix), [filePrefix]);
const { rows: fileRows, total: fileTotal } = useMemo(() => {
const folderMap = new Map<string, { name: string; fileCount: number }>();
const fileItems: KBFileRow[] = [];
allFiles.forEach((file) => {
const fullPath = resolveFileRelativePath(file);
if (!fullPath) {
return;
}
const segments = splitRelativePath(fullPath, normalizedPrefix);
if (segments.length === 0) {
return;
}
const leafName = segments[0];
if (segments.length > 1) {
const folderName = leafName;
const entry = folderMap.get(folderName) || {
name: folderName,
fileCount: 0,
};
entry.fileCount += 1;
folderMap.set(folderName, entry);
return;
}
const normalizedFileName = normalizePath(file.fileName);
const displayName = normalizedFileName.includes(PATH_SEPARATOR)
? leafName
: file.fileName || leafName;
fileItems.push({
...file,
name: displayName,
displayName,
fullPath,
});
});
const folderItems: KBFileRow[] = Array.from(folderMap.values()).map(
(entry) =>
({
id: `directory-${normalizedPrefix}${entry.name}`,
fileName: entry.name,
name: entry.name,
status: null,
chunkCount: 0,
createdAt: "",
updatedAt: "",
metadata: {},
knowledgeBaseId: knowledgeBase?.id || "",
fileId: "",
updatedBy: "",
createdBy: "",
isDirectory: true,
displayName: entry.name,
fullPath: `${normalizedPrefix}${entry.name}/`,
fileCount: entry.fileCount,
}) as KBFileRow
);
const sortByName = (a: KBFileRow, b: KBFileRow) =>
(a.displayName || a.name || "").localeCompare(
b.displayName || b.name || "",
"zh-Hans-CN"
);
folderItems.sort(sortByName);
fileItems.sort(sortByName);
const combined = [...folderItems, ...fileItems];
return { rows: combined, total: combined.length };
}, [allFiles, knowledgeBase?.id, normalizedPrefix]);
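The grouping in the `useMemo` above can be reduced to plain paths (names here are illustrative, not the component's): under a given prefix, a path with more than one remaining segment contributes to a first-level folder entry, while exactly one segment means a file at the current level.

```typescript
// Hypothetical reduction of the virtual-folder grouping: map each path under
// the prefix to either a folder (with a file count) or a file at this level.
function groupByPrefix(paths: string[], prefix: string) {
  const folders = new Map<string, number>(); // folder name -> file count
  const files: string[] = [];
  for (const path of paths) {
    if (prefix && !path.startsWith(prefix)) continue;
    const segments = path.slice(prefix.length).split("/").filter(Boolean);
    if (segments.length === 0) continue;
    if (segments.length > 1) {
      folders.set(segments[0], (folders.get(segments[0]) ?? 0) + 1);
    } else {
      files.push(segments[0]);
    }
  }
  return { folders, files };
}
```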
const filePageCurrent = filePagination.current;
const filePageSize = filePagination.pageSize;
const pagedFileRows = useMemo(() => {
const startIndex = (filePageCurrent - 1) * filePageSize;
const endIndex = startIndex + filePageSize;
return fileRows.slice(startIndex, endIndex);
}, [filePageCurrent, filePageSize, fileRows]);
useEffect(() => {
const maxPage = Math.max(1, Math.ceil(fileTotal / filePageSize));
if (filePageCurrent > maxPage) {
setFilePagination((prev) => ({ ...prev, current: maxPage }));
}
}, [filePageCurrent, filePageSize, fileTotal]);
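The paging arithmetic above (slice the combined rows for the current page, and clamp the page index when the row count shrinks) can be sketched as one pure function; `pageSlice` is an illustrative name, not code from the component:

```typescript
// Slice rows for a 1-indexed page, clamping the page number so a shrinking
// row set (e.g. after deletions) never leaves the table on an empty page.
function pageSlice<T>(rows: T[], current: number, pageSize: number) {
  const maxPage = Math.max(1, Math.ceil(rows.length / pageSize));
  const page = Math.min(current, maxPage);
  const start = (page - 1) * pageSize;
  return { page, rows: rows.slice(start, start + pageSize) };
}
```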
const operations = [
{
key: "edit",
@@ -170,14 +410,38 @@ const KnowledgeBaseDetailPage: React.FC = () => {
width: 200,
ellipsis: true,
fixed: "left" as const,
render: (name: string, record: KBFileRow) => {
const displayName = record.displayName || name;
if (record.isDirectory) {
return (
<Button
type="link"
onClick={() => handleOpenDirectory(displayName)}
className="flex items-center gap-2 p-0"
>
<Folder className="w-4 h-4 text-blue-500" />
<span className="truncate">{displayName}</span>
</Button>
);
}
return (
<div className="flex items-center gap-2">
<File className="w-4 h-4 text-gray-800" />
<span className="truncate">{displayName}</span>
</div>
);
},
},
{
title: "状态",
dataIndex: "status",
key: "vectorizationStatus",
width: 120,
render: (status: unknown) => {
if (typeof status === 'object' && status !== null) {
render: (status: unknown, record: KBFileRow) => {
if (record.isDirectory) {
return <Badge color="default" text="文件夹" />;
}
if (typeof status === "object" && status !== null) {
const s = status as { color?: string; label?: string };
return <Badge color={s.color} text={s.label} />;
}
@@ -190,6 +454,8 @@ const KnowledgeBaseDetailPage: React.FC = () => {
key: "chunkCount",
width: 100,
ellipsis: true,
render: (value: number, record: KBFileRow) =>
record.isDirectory ? "-" : value ?? 0,
},
{
title: "创建时间",
@@ -197,6 +463,8 @@ const KnowledgeBaseDetailPage: React.FC = () => {
key: "createdAt",
ellipsis: true,
width: 180,
render: (value: string, record: KBFileRow) =>
record.isDirectory ? "-" : value || "-",
},
{
title: "更新时间",
@@ -204,26 +472,51 @@ const KnowledgeBaseDetailPage: React.FC = () => {
key: "updatedAt",
ellipsis: true,
width: 180,
render: (value: string, record: KBFileRow) =>
record.isDirectory ? "-" : value || "-",
},
{
title: "操作",
key: "actions",
align: "right" as const,
width: 100,
render: (_: unknown, file: KBFile) => (
<div>
{fileOps.map((op) => (
<Tooltip key={op.key} title={op.label}>
render: (_: unknown, file: KBFileRow) => {
if (file.isDirectory) {
return (
<Tooltip title="删除文件夹">
<Button
type="text"
icon={op.icon}
danger={op?.danger}
onClick={() => op.onClick(file)}
icon={<DeleteOutlined className="w-4 h-4" />}
danger
onClick={() => {
modal.confirm({
title: "确认删除该文件夹吗?",
content: `删除后将移除文件夹 “${file.displayName || file.name}” 下的全部文件,且无法恢复。`,
okText: "删除",
okType: "danger",
cancelText: "取消",
onOk: () => handleDeleteDirectory(file.displayName || file.name),
});
}}
/>
</Tooltip>
))}
</div>
),
);
}
return (
<div>
{fileOps.map((op) => (
<Tooltip key={op.key} title={op.label}>
<Button
type="text"
icon={op.icon}
danger={op?.danger}
onClick={() => op.onClick(file)}
/>
</Tooltip>
))}
</div>
);
},
},
];
@@ -265,12 +558,12 @@ const KnowledgeBaseDetailPage: React.FC = () => {
<>
<div className="flex-1">
<SearchControls
searchTerm={searchParams.keyword}
searchTerm={fileKeyword}
onSearchChange={handleKeywordChange}
searchPlaceholder="搜索文件名..."
filters={[]}
onFiltersChange={handleFiltersChange}
onClearFilters={() => setSearchParams({ ...searchParams, filter: { type: [], status: [], tags: [] } })}
onFiltersChange={() => {}}
onClearFilters={() => setFileKeyword("")}
showViewToggle={false}
showReload={false}
/>
@@ -281,14 +574,54 @@ const KnowledgeBaseDetailPage: React.FC = () => {
</div>
{activeTab === 'fileList' ? (
<Table
loading={loading}
columns={fileColumns}
dataSource={files}
rowKey="id"
pagination={pagination}
scroll={{ y: "calc(100vh - 30rem)" }}
/>
<>
<div className="mb-2">
{normalizedPrefix && (
<Button type="link" onClick={handleBackToParent} className="p-0">
<span className="flex items-center text-blue-500">
<svg
className="w-4 h-4 mr-1"
fill="none"
stroke="currentColor"
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
>
<path
strokeLinecap="round"
strokeLinejoin="round"
strokeWidth={2}
d="M10 19l-7-7m0 0l7-7m-7 7h18"
/>
</svg>
</span>
</Button>
)}
{normalizedPrefix && (
<span className="ml-2 text-gray-600">
: {normalizedPrefix}
</span>
)}
</div>
<Table
loading={filesLoading}
columns={fileColumns}
dataSource={pagedFileRows}
rowKey="id"
pagination={{
current: filePagination.current,
pageSize: filePagination.pageSize,
total: fileTotal,
showTotal: (total) => `${total}`,
onChange: (page, pageSize) =>
setFilePagination({
current: page,
pageSize: pageSize || filePagination.pageSize,
}),
}}
scroll={{ y: "calc(100vh - 30rem)" }}
/>
</>
) : (
<div className="p-2">
<div style={{ fontSize: 14, fontWeight: 300, marginBottom: 8 }}></div>

Some files were not shown because too many files have changed in this diff.