Jerry Yan a260134d7c fix(knowledge-graph): 修复 Codex 审查发现的 5 个问题并新增查询功能
本次提交包含两部分内容:
1. 新增知识图谱查询功能(邻居查询、最短路径、子图提取、全文搜索)
2. 修复 Codex 代码审查发现的 5 个问题(3 个 P1 严重问题 + 2 个 P2 次要问题)

## 新增功能

### GraphQueryService 和 GraphQueryController
- 邻居查询:GET /query/neighbors/{entityId}?depth=2&limit=50
- 最短路径:GET /query/shortest-path?sourceId=...&targetId=...&maxDepth=3
- 子图提取:POST /query/subgraph + body {"entityIds": [...]}
- 全文搜索:GET /query/search?q=keyword&page=0&size=20

### 新增 DTO
- EntitySummaryVO, EdgeSummaryVO:实体和边的摘要信息
- SubgraphVO:子图结果(nodes + edges + counts)
- PathVO:路径结果
- SearchHitVO:搜索结果(含相关度分数)
- SubgraphRequest:子图请求 DTO(含校验)

## 问题修复

### P1-1: 邻居查询图边界风险
**文件**: GraphQueryService.java
**问题**: getNeighborGraph 使用 -[*1..N]-,未约束中间路径节点/关系的 graph_id
**修复**:
- 使用路径变量 p:MATCH p = ...
- 添加 ALL(n IN nodes(p) WHERE n.graph_id = $graphId)
- 添加 ALL(r IN relationships(p) WHERE r.graph_id = $graphId)
- 限定关系类型为 :RELATED_TO
- 排除自环:WHERE e <> neighbor

### P1-2: 全图扫描性能风险
**文件**: GraphRelationRepository.java
**问题**: findByEntityId/countByEntityId 先匹配全图关系,再用 s.id = X OR t.id = X 过滤
**修复**:
- findByEntityId:改为 CALL { 出边锚定查询 UNION ALL 入边锚定查询 }
- countByEntityId:
  - "in"/"out" 方向:将 id: $entityId 直接写入 MATCH 模式
  - "all" 方向:改为 CALL { 出边 UNION 入边 } RETURN count(r)
- 利用 (graph_id, id) 索引直接定位,避免全图扫描

### P1-3: 接口破坏性变更
**文件**: GraphEntityController.java
**问题**: GET /knowledge-graph/{graphId}/entities 从 List<GraphEntity> 变为 PagedResponse<GraphEntity>
**修复**: 使用 Spring MVC params 属性实现零破坏性升级
- @GetMapping(params = "!page"):无 page 参数时返回 List(向后兼容)
- @GetMapping(params = "page"):有 page 参数时返回 PagedResponse(新功能)
- 现有调用方无需改动,新调用方可选择分页

### P2-4: direction 参数未严格校验
**文件**: GraphEntityController.java, GraphRelationService.java
**问题**: 非法 direction 值被静默当作 "all" 处理
**修复**: 双层校验
- Controller 层:@Pattern(regexp = "^(all|in|out)$")
- Service 层:VALID_DIRECTIONS.contains() 校验
- 非法值返回 INVALID_PARAMETER 异常

### P2-5: 子图接口请求体缺少元素级校验
**文件**: GraphQueryController.java, SubgraphRequest.java
**问题**: /query/subgraph 直接接收 List<String>,无 UUID 校验
**修复**: 创建 SubgraphRequest DTO
- @NotEmpty:列表不能为空
- @Size(max = 500):元素数量上限
- List<@Pattern(UUID) String>:每个元素必须是合法 UUID
- Controller 使用 @Valid @RequestBody SubgraphRequest
- ⚠️ API 变更:请求体格式从 ["uuid1"] 变为 {"entityIds": ["uuid1"]}

## 技术亮点

1. **图边界安全**: 路径变量 + ALL 约束确保跨图查询安全
2. **查询性能**: 实体锚定查询替代全图扫描,利用索引优化
3. **向后兼容**: params 属性实现同路径双端点,零破坏性升级
4. **多层防御**: Controller + Service 双层校验,框架级 + 业务级
5. **类型安全**: DTO + Bean Validation 确保请求体格式和内容合法

## 测试建议

1. 编译验证:mvn -pl services/knowledge-graph-service -am compile
2. 测试邻居查询的图边界约束
3. 测试实体关系查询的性能(大数据集)
4. 验证实体列表接口的向后兼容性(无 page 参数)
5. 测试 direction 参数的非法值拒绝
6. 测试子图接口的请求体校验(非法 UUID、空列表、超限)

Co-authored-by: Claude (Anthropic)
Reviewed-by: Codex (OpenAI)
2026-02-18 07:49:16 +08:00
2025-11-04 20:30:40 +08:00
2025-12-11 23:17:01 +08:00
2025-12-11 23:17:01 +08:00

DataMate All-in-One Data Work Platform

Backend CI Frontend CI GitHub Stars GitHub Forks GitHub Issues GitHub License

DataMate is an enterprise-level data processing platform for model fine-tuning and RAG retrieval, supporting core functions such as data collection, data management, operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.

简体中文 | English

If you like this project, please give it a Star️!

🌟 Core Features

  • Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
  • Visual Orchestration: Drag-and-drop data processing workflow design.
  • Operator Ecosystem: Rich built-in operators and support for custom operators.

🚀 Quick Start

Prerequisites

  • Git (for pulling source code)
  • Make (for building and installing)
  • Docker (for building images and deploying services)
  • Docker-Compose (for service deployment - Docker method)
  • Kubernetes (for service deployment - k8s method)
  • Helm (for service deployment - k8s method)

This project supports deployment via two methods: docker-compose and helm. After executing the command, please enter the corresponding number for the deployment method. The command echo is as follows:

Choose a deployment method:
1. Docker/Docker-Compose
2. Kubernetes/Helm
Enter choice:

Clone the Code

git clone git@github.com:ModelEngine-Group/DataMate.git
cd DataMate

Deploy the basic services

make install

If the machine you are using does not have make installed, please run the following command to deploy it:

# Windows
set REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d

# Linux/Mac
export REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d

Once the container is running, access http://localhost:30000 in a browser to view the front-end interface.

To list all available Make targets, flags and help text, run:

make help

Build and deploy Mineru Enhanced PDF Processing

make build-mineru
make install-mineru

Deploy the DeerFlow service

make install-deer-flow

Local Development and Deployment

After modifying the local code, please execute the following commands to build the image and deploy using the local image.

make build
make install dev=true

Uninstall

make uninstall

When running make uninstall, the installer will prompt once whether to delete volumes; that single choice is applied to all components. The uninstall order is: milvus -> label-studio -> datamate, which ensures the datamate network is removed cleanly after services that use it have stopped.

🤝 Contribution Guidelines

Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it's submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.

📮 GitHub Issues: Submit bugs or feature suggestions.

🔧 GitHub Pull Requests: Contribute code improvements.

📄 License

DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.

Description
No description provided
Readme 12 MiB
Languages
JavaScript 41.9%
TypeScript 19.9%
Java 16.7%
Python 15.6%
Smarty 4.4%
Other 1.5%