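feat(kg): implement the Phase 3.1 front-end graph explorer

Core features:
- G6 v5 force-directed graph with interactive zoom, pan, and drag
- 5 layout modes: force, circular, grid, radial, concentric
- Double-click a node to expand its neighbors into the graph (incremental exploration)
- Full-text search with type filtering and result highlighting (dimmed/highlighted states)
- Node detail drawer: entity properties, aliases, confidence, navigable relation list
- Relation detail drawer: type, source/target, weight, confidence, properties
- Query builder: shortest-path and all-paths queries with configurable maxDepth/maxPaths
- UUID-based graph loading (typed in, or via the URL parameter ?graphId=...)
- Large-graph performance optimization (animations are disabled above a 200-node threshold)

New files (13):
- knowledge-graph.model.ts - TypeScript interfaces matching the Java DTOs
- knowledge-graph.api.ts - API service covering all KG REST endpoints
- knowledge-graph.const.ts - entity type colors, relation type labels, Chinese display names
- graphTransform.ts - back-end data → G6 node/edge format conversion, plus merge utilities
- graphConfig.ts - G6 v5 graph configuration (node/edge styles, behaviors, layouts)
- hooks/useGraphData.ts - data hook: load subgraph, expand node, search, merge
- hooks/useGraphLayout.ts - layout hook: 5 layout types
- components/GraphCanvas.tsx - G6 v5 canvas with force-directed layout and zoom/pan/drag
- components/SearchPanel.tsx - full-text entity search with type filtering
- components/NodeDetail.tsx - entity detail drawer
- components/RelationDetail.tsx - relation detail drawer
- components/QueryBuilder.tsx - path query builder
- Home/KnowledgeGraphPage.tsx - main page that wires all components together

Modified files (5):
- package.json - add the @antv/g6 v5 dependency
- vite.config.ts - add the /knowledge-graph proxy rule
- auth/permissions.ts - add knowledgeGraphRead/knowledgeGraphWrite
- pages/Layout/menu.tsx - add the knowledge graph menu item (Network icon)
- routes/routes.ts - add the /data/knowledge-graph route

New docs (10):
- docs/knowledge-graph/ - complete knowledge graph design documentation

Bug fixes (applied after the Codex review):
- P1: detail drawer state was out of sync with the selection (stale data shown)
- P1: query builder was not implemented (shortest-path/multi-path queries)
- P2: entity type mapping Organization → Org (to match the back end)
- P2: getSubgraph depth parameter had no effect (switched to the correct endpoint)
- P2: AllPathsVO field name mismatch (totalPaths → pathCount)
- P2: search cancellation had no effect (now passes AbortController.signal)
- P2: large-graph performance optimization (animation degradation)
- P3: removed unused type imports

Build verification:
- tsc --noEmit ✅ clean
- eslint ✅ 0 errors/warnings
- vite build ✅ successful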
docs/knowledge-graph/README.md
@@ -0,0 +1,223 @@
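# DataMate Knowledge Graph Implementation Plan

## 📋 Project Overview

The DataMate knowledge graph builds a knowledge network for the enterprise data processing platform. It uses a graph structure to expose the relationships between data assets, and it supports advanced features such as intelligent recommendation, impact analysis, and lineage tracking.
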
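## 🎯 Core Goals

1. **Data lineage tracking**: trace the complete path data takes from source to target
2. **Impact analysis**: assess how far a data change reaches into downstream tasks
3. **Intelligent recommendation**: recommend related datasets and workflows based on historical usage patterns
4. **Knowledge discovery**: surface hidden relationships and patterns in the data
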
## 🏗️ Technical Architecture

### Tech Stack

```
Storage:    MySQL (metadata) + Neo4j (graph structure) + Milvus (vectors)
Back end:   Spring Boot (kg-service) + FastAPI (kg-ingestion)
Front end:  React + AntV G6
Extraction: LangChain LLMGraphTransformer
```

### Architecture Design

```
┌──────────────────────────────────────────────────┐
│  Front-end layer                                 │
│  React + AntV G6 (graph visualization + editing) │
└──────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────┐
│  Service layer                                   │
│  kg-service (Spring Boot)                        │
│    - Graph query API                             │
│    - Permission filtering                        │
│    - Cache layer (Redis)                         │
│                                                  │
│  rag-query-service (enhanced)                    │
│    - Hybrid retrieval (Milvus + Neo4j)           │
│    - GraphRAG                                    │
└──────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────┐
│  Ingestion layer                                 │
│  kg-ingestion (FastAPI)                          │
│    - LangChain LLMGraphTransformer               │
│    - Entity alignment                            │
│    - Relation generation                         │
└──────────────────────────────────────────────────┘
                          ↓
┌──────────────────────────────────────────────────┐
│  Storage layer                                   │
│  MySQL + Neo4j + Milvus                          │
└──────────────────────────────────────────────────┘
```

## 📊 Data Model

> For detailed definitions, see the [entity reference](./schema/entities.md) and the [relationship reference](./schema/relationships.md)

### Core Entities (8 types)

- **Dataset**: a dataset
- **Field**: a field
- **LabelTask**: a labeling task
- **Workflow**: a workflow
- **Job**: a job
- **User**: a user
- **Org**: an organization
- **KnowledgeSet**: a knowledge set

### Core Relations (10 types)

- **HAS_FIELD**: Dataset → Field, a dataset contains a field
- **DERIVED_FROM**: Dataset → Dataset, lineage derivation between datasets
- **USES_DATASET**: Job/LabelTask/Workflow → Dataset, uses a dataset
- **PRODUCES**: Job → Dataset, a job produces a dataset
- **ASSIGNED_TO**: LabelTask/Job → User, a task is assigned to a user
- **BELONGS_TO**: User/Dataset → Org, organizational ownership
- **TRIGGERS**: Workflow → Job, a workflow triggers a job
- **DEPENDS_ON**: Job → Job, execution dependency between jobs
- **IMPACTS**: Field → Field, field-level impact
- **SOURCED_FROM**: KnowledgeSet → Dataset, knowledge provenance

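
The source and target types listed for each relation can be checked mechanically before an edge is written. The following is a minimal sketch under that assumption; `RELATION_SCHEMA` and `is_valid_relation` are hypothetical names for illustration and are not part of the DataMate code base.

```python
# Allowed (source types, target types) per relation, transcribed from the list above.
RELATION_SCHEMA: dict[str, tuple[set[str], set[str]]] = {
    "HAS_FIELD": ({"Dataset"}, {"Field"}),
    "DERIVED_FROM": ({"Dataset"}, {"Dataset"}),
    "USES_DATASET": ({"Job", "LabelTask", "Workflow"}, {"Dataset"}),
    "PRODUCES": ({"Job"}, {"Dataset"}),
    "ASSIGNED_TO": ({"LabelTask", "Job"}, {"User"}),
    "BELONGS_TO": ({"User", "Dataset"}, {"Org"}),
    "TRIGGERS": ({"Workflow"}, {"Job"}),
    "DEPENDS_ON": ({"Job"}, {"Job"}),
    "IMPACTS": ({"Field"}, {"Field"}),
    "SOURCED_FROM": ({"KnowledgeSet"}, {"Dataset"}),
}


def is_valid_relation(relation_type: str, source_type: str, target_type: str) -> bool:
    """Return True only if the relation is known and both endpoint types are allowed."""
    endpoints = RELATION_SCHEMA.get(relation_type)
    if endpoints is None:
        return False  # unknown relation type
    allowed_sources, allowed_targets = endpoints
    return source_type in allowed_sources and target_type in allowed_targets
```

A check like this could reject a reversed edge such as Field → Dataset for `HAS_FIELD` before it reaches Neo4j.
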
### Common Node Properties

```json
{
  "id": "UUID, globally unique identifier",
  "name": "entity name",
  "type": "entity type (Dataset / Field / LabelTask, etc.)",
  "description": "entity description",
  "graph_id": "ID of the owning graph (multi-tenant isolation)",
  "source_id": "ID of the source record",
  "source_type": "origin type: SYNC / EXTRACTION / MANUAL",
  "confidence": "confidence, 0.0-1.0",
  "created_at": "creation time"
}
```

### Common Edge Properties

```json
{
  "id": "UUID, unique identifier of the relation",
  "relation_type": "semantic relation type",
  "graph_id": "ID of the owning graph",
  "weight": "relation weight, 0.0-1.0",
  "confidence": "confidence, 0.0-1.0",
  "source_id": "ID of the source record",
  "properties_json": "extended properties as JSON",
  "created_at": "creation time"
}
```

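
The value constraints in the node properties (UUID identifiers, the three `source_type` values, confidence within 0.0-1.0) can be enforced at construction time. The sketch below assumes a hypothetical `GraphNode` class that is not taken from the project, and it models only a subset of the properties.

```python
import uuid
from dataclasses import dataclass

# The three origin types named in the node properties above.
ALLOWED_SOURCE_TYPES = {"SYNC", "EXTRACTION", "MANUAL"}


@dataclass(frozen=True)
class GraphNode:
    """Illustrative subset of the common node properties."""

    id: str
    name: str
    type: str
    graph_id: str
    source_type: str
    confidence: float = 1.0
    description: str = ""

    def __post_init__(self) -> None:
        # uuid.UUID raises ValueError for malformed input.
        # It is lenient about formatting (for example, it accepts missing hyphens).
        uuid.UUID(self.id)
        uuid.UUID(self.graph_id)
        if self.source_type not in ALLOWED_SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0-1.0")
```
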
## 🚀 Implementation Roadmap

### Phase 0: Infrastructure (1 week) ✅ Done

- ✅ Set up Neo4j (docker-compose)
- ✅ Update the Makefile
- ✅ Create knowledge-graph-service (Spring Boot)
- ✅ Create the kg_extraction module (Python)
- ✅ Code review and fixes (3 review rounds, 2 fix rounds)

**Deliverables**:
- Neo4j configuration: `deployment/docker/neo4j/docker-compose.yml`
- Java service: `backend/services/knowledge-graph-service/` (11 files)
- Python module: `runtime/datamate-python/app/module/kg_extraction/` (3 files)
- Makefile targets: `neo4j-up`, `neo4j-down`, `neo4j-logs`, `neo4j-shell`

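### Phase 1: MVP (2-3 weeks) ⏳ In progress

**Goal**: deliver basic graph construction and query capabilities

**Tasks**:
1. Expose the Python extractor through FastAPI
   - Create the `/api/kg/extract` endpoint
   - Accept text input and return nodes and edges
   - Register it with the FastAPI router

2. Implement relation support in the Java service
   - Add the Repository/Service/Controller for Relation
   - Implement CRUD operations for relations
   - Support relation queries and traversal

3. Define the core entity and relation model
   - Settle on 5-8 core entity types
   - Define the relations between entities
   - Design schema version management

4. Implement the basic graph construction pipeline
   - Sync metadata from MySQL to Neo4j
   - Implement an incremental update mechanism
   - Support manually triggered builds

**Acceptance criteria**:
- ✅ Entities and relations can be extracted from text
- ✅ Results can be stored in Neo4j
- ✅ The graph can be queried and traversed
- ✅ Basic permission control is supported
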
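### Phase 2: GraphRAG Integration (3-4 weeks)

**Goal**: integrate the knowledge graph deeply with the existing RAG system

**Tasks**:
1. Add a "hybrid retrieval" mode to rag-query-service
2. Query Milvus (vectors) and Neo4j (graph structure) at the same time
3. Textualize the triples of the 2-hop subgraph and feed them to the LLM as context
4. Evaluate and tune GraphRAG

**Acceptance criteria**:
- ✅ Hybrid retrieval outperforms either retrieval method alone
- ✅ Retrieval strategies are configurable
- ✅ A complete set of evaluation metrics is in place
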
### Phase 3: Visualization and Optimization (4-6 weeks)

**Goal**: provide user-friendly graph visualization and editing

**Tasks**:
1. Front-end graph explorer (React + AntV G6)
2. Human-in-the-loop editing
3. Performance optimization (indexes, caching, offline computation)
4. Monitoring and operations (Prometheus + Grafana)

**Acceptance criteria**:
- ✅ Large graphs can be visualized (10,000+ nodes)
- ✅ Real-time editing and feedback are supported
- ✅ Query response time < 1 s

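## 🔑 Core Principles

1. **Start "narrow and deep"**: do not aim for an all-encompassing ontology; focus on 2-3 high-value scenarios first
2. **Eventual consistency**: MySQL is the primary store and Neo4j is a dedicated graph store; a reconciliation mechanism keeps them consistent
3. **Double defense**: format validation in the Controller plus business validation in the Service
4. **Permission isolation**: every operation runs within the correct graph_id scope
5. **Performance first**: limit traversal depth, use caching, and precompute offline
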
## 📚 Related Documents

- [Architecture design](./architecture.md)
- [Data model - entity definitions](./schema/entities.md)
- [Data model - relationship definitions](./schema/relationships.md)
- [Data model - ER diagram](./schema/er-diagram.md)
- [Implementation plan](./implementation.md)
- [Gemini analysis](./analysis/gemini.md)
- [Codex analysis](./analysis/codex.md)
- [Claude analysis](./analysis/claude.md)

## 🔗 Quick Links

- Neo4j Browser: http://localhost:7474
- Bolt URI: bolt://localhost:7687
- Default password: datamate123 (change it in production)

## 📝 Changelog

- 2026-02-17: infrastructure setup completed (Phase 0)
- 2026-02-17: project documentation created

docs/knowledge-graph/analysis/claude.md
@@ -0,0 +1,289 @@
# Claude Knowledge Graph Analysis

## Analysis Date
2026-02-17

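## Core Recommendations

### 1. Technology Selection

**Graph database**: Neo4j (Community or Enterprise Edition)

**Storage architecture**: MySQL + Neo4j dual storage
- **MySQL**: primary metadata store; existing business logic stays unchanged
- **Neo4j**: dedicated graph store that supports complex queries

**Sync strategy**: eventual consistency + reconciliation
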
### 2. Architecture Design (reuse existing infrastructure)

**Core principles**:
- Reuse the existing service architecture
- Minimize the impact on existing systems
- Integrate incrementally

**Integration approach**:
```
Existing services → MySQL (primary store)
                      ↓ sync
                    Neo4j (graph store)
                      ↓ query
                    kg-service (new service)
```

### 3. Data Modeling (schema first + version management)

#### Schema Design Principles
1. **Design up front**: define entities and relations explicitly
2. **Version management**: support schema evolution
3. **Backward compatibility**: new versions stay compatible with old ones
4. **Documentation**: record the changes in every version in detail

#### Entity Property Design
```json
{
  "id": "UUID",
  "name": "name",
  "type": "type",
  "description": "description",
  "tenant_id": "tenant ID",
  "schema_version": "1.0",
  "created_at": "creation time",
  "updated_at": "update time"
}
```

#### Relation Property Design
```json
{
  "source": "source node ID",
  "target": "target node ID",
  "type": "relation type",
  "confidence": "confidence (0-1)",
  "source_type": "origin (manual/auto)",
  "valid_from": "effective from",
  "valid_to": "effective until"
}
```

### 4. Implementation Roadmap (4 phases)

#### Phase 0: Infrastructure (1 week) ✅
- Set up Neo4j
- Create the base services
- Define the schema

#### Phase 1: Core Features (2-3 weeks)
- Implement the sync mechanism
- Implement basic queries
- Integrate with existing systems

#### Phase 2: Advanced Features (3-4 weeks)
- Implement GraphRAG
- Implement visualization
- Optimize performance

#### Phase 3: Continuous Improvement
- Extend functionality
- Optimize performance
- Improve the user experience

### 5. Challenges and Solutions

#### Data Consistency
**Problem**: data in MySQL and Neo4j may diverge

**Solutions**:
- **Eventual consistency**: tolerate short-lived inconsistency
- **Reconciliation**: compare and repair on a schedule
- **Event-driven sync**: propagate changes through events

**Implementation**:
```java
@Scheduled(cron = "0 0 2 * * *") // every day at 02:00
public void reconcile() {
    // 1. Load all entities from MySQL
    List<Dataset> datasets = datasetRepository.findAll();

    // 2. Load all entities from Neo4j
    List<GraphEntity> graphEntities = graphEntityRepository.findAll();

    // 3. Compare and collect the differences
    List<Diff> diffs = compare(datasets, graphEntities);

    // 4. Repair the differences
    for (Diff diff : diffs) {
        if (diff.getType() == DiffType.MISSING_IN_NEO4J) {
            syncToNeo4j(diff.getEntity());
        } else if (diff.getType() == DiffType.OUTDATED_IN_NEO4J) {
            updateNeo4j(diff.getEntity());
        }
    }

    // 5. Log the result
    log.info("Reconciliation completed: {} diffs fixed", diffs.size());
}
```

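
The comparison step is the core of the reconciliation job above. A minimal sketch of it follows, assuming each store can be reduced to a mapping from entity ID to last-update timestamp; `Diff`, `DiffType`, and `compare` here are illustrative stand-ins and not the project's actual classes.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class DiffType(Enum):
    MISSING_IN_NEO4J = "missing_in_neo4j"
    OUTDATED_IN_NEO4J = "outdated_in_neo4j"


@dataclass(frozen=True)
class Diff:
    entity_id: str
    type: DiffType


def compare(mysql_rows: dict[str, datetime], neo4j_rows: dict[str, datetime]) -> list[Diff]:
    """Both arguments map entity ID -> updated_at; MySQL is treated as the source of truth."""
    diffs: list[Diff] = []
    for entity_id, updated_at in sorted(mysql_rows.items()):
        graph_updated_at = neo4j_rows.get(entity_id)
        if graph_updated_at is None:
            diffs.append(Diff(entity_id, DiffType.MISSING_IN_NEO4J))
        elif graph_updated_at < updated_at:
            diffs.append(Diff(entity_id, DiffType.OUTDATED_IN_NEO4J))
    return diffs
```

This sketch only detects entities that are missing or stale in Neo4j. Entities that exist in Neo4j but were deleted from MySQL would need a third diff type, which the original job does not define either.
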
#### Performance Optimization
**Problem**: query performance degrades on large graphs

**Solutions**:
- **Indexing strategy**: create indexes on frequently queried fields
- **Limit traversal depth**: at most 3 hops
- **Redis caching**: cache hot data
- **Offline computation**: precompute commonly used subgraphs

**Index creation**:
```cypher
// Entity ID index
CREATE INDEX entity_id IF NOT EXISTS FOR (n:Entity) ON (n.id);

// Tenant ID index
CREATE INDEX entity_tenant_id IF NOT EXISTS FOR (n:Entity) ON (n.tenant_id);

// Composite index
CREATE INDEX entity_id_graph_id IF NOT EXISTS
FOR (n:Entity) ON (n.id, n.graph_id);
```

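#### Front-End Visualization
**Problem**: large graphs are hard to visualize

**Solutions**:
- **Layered loading**: load the core nodes first, then the periphery
- **Subgraph pruning**: show only the relevant subgraph
- **WebGL rendering**: use WebGL to improve performance
- **Virtual scrolling**: render only the visible area

**Recommended libraries**:
- Cytoscape.js (feature-rich)
- AntV G6 (developed in China, well documented)
- vis.js (simple and easy to use)
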
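### 6. Best Practices

#### Development Practices
1. **Consistent API conventions**: follow RESTful conventions
2. **Reuse existing patterns**: use the existing DTOs and ErrorCode
3. **Event-driven decoupling**: propagate changes through events
4. **Cypher injection protection**: use parameterized queries

#### Operations Practices
1. **Neo4j backups**: full backup every day
2. **Monitoring and alerting**: Prometheus + Grafana
3. **Performance tuning**: analyze slow queries regularly
4. **Capacity planning**: forecast resource needs from data growth

#### Deployment Practices
1. **Docker deployment**: use docker-compose
2. **Kubernetes scale-out**: use a Helm chart
3. **Canary releases**: validate on a small scope first
4. **Rollback mechanism**: support fast rollback
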
### 7. Code Implementation Details

#### Double Defense Example
```java
// Controller layer: format validation
@GetMapping("/{graphId}/entities/{entityId}")
public GraphEntity getEntity(
    @PathVariable @Pattern(regexp = UUID_REGEX, message = "invalid graphId format")
    String graphId,
    @PathVariable @Pattern(regexp = UUID_REGEX, message = "invalid entityId format")
    String entityId
) {
    return entityService.getEntity(graphId, entityId);
}

// Service layer: business validation
public GraphEntity getEntity(String graphId, String entityId) {
    // 1. Validate the graphId format
    validateGraphId(graphId);

    // 2. Look up the entity (matching on both graphId and entityId)
    return entityRepository.findByIdAndGraphId(entityId, graphId)
        .orElseThrow(() -> BusinessException.of(
            KnowledgeGraphErrorCode.ENTITY_NOT_FOUND
        ));
}

// Repository layer: data access
@Query("MATCH (n:Entity {id: $id, graph_id: $graphId}) RETURN n")
Optional<GraphEntity> findByIdAndGraphId(
    @Param("id") String id,
    @Param("graphId") String graphId
);
```

#### Query Throttling Example
```java
public List<GraphEntity> getNeighbors(
    String graphId,
    String entityId,
    int depth,
    int limit
) {
    // Clamp the parameters to the configured maximums
    int actualDepth = Math.min(depth, properties.getMaxDepth());
    int actualLimit = Math.min(limit, properties.getMaxNodesPerQuery());

    // Run the query
    return entityRepository.findNeighbors(
        graphId, entityId, actualDepth, actualLimit
    );
}
```

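
The example above caps the upper bound only, so a zero or negative `depth` or `limit` would pass through unchanged. A hedged sketch of a two-sided clamp follows; the constants mirror the "3 hops / 1000 nodes" limits quoted elsewhere in these documents, and the lower bound of 1 and the function names are assumptions made for illustration.

```python
MAX_DEPTH = 3               # "at most 3 hops"
MAX_NODES_PER_QUERY = 1000  # "at most 1000 nodes"


def clamp(value: int, lower: int, upper: int) -> int:
    """Force value into the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


def effective_limits(depth: int, limit: int) -> tuple[int, int]:
    """Return the (depth, limit) pair that is actually sent to the database."""
    return clamp(depth, 1, MAX_DEPTH), clamp(limit, 1, MAX_NODES_PER_QUERY)
```

For example, `effective_limits(10, 5000)` yields `(3, 1000)`, and `effective_limits(0, -5)` yields `(1, 1)`.
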
### 8. Suggested Next Steps

**Immediate actions**:
1. Implement full Relation support
2. Implement MySQL → Neo4j sync
3. Add unit tests

**Short-term goals** (1-2 weeks):
1. Complete the MVP features
2. Integrate with existing systems
3. Run performance tests

**Mid-term goals** (1-2 months):
1. Implement GraphRAG
2. Implement visualization
3. Launch the first scenario

## Comparison with Other Tools

| Dimension | Claude | Codex | Gemini |
|------|--------|-------|--------|
| **Technology selection** | Neo4j | Neo4j/JanusGraph | Neo4j |
| **Architecture focus** | Reuse existing infrastructure | 3 new modules | GraphRAG integration |
| **Data modeling** | Schema first + version management | 10 entity types + 6 relation types | Flexible schema + embedding |
| **Implementation path** | 4 phases | 4 phases (0-3) | 3 phases (MVP first) |
| **Distinctive strength** | Deep integration with existing systems | Detailed domain model | LangChain + RAG integration |

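## Key Insights

1. **Deep integration**: Claude emphasizes reusing existing infrastructure and minimizing impact
2. **Eventual consistency**: it proposes a practical data sync and reconciliation scheme
3. **Detailed code examples**: it provides code snippets that can be used directly
4. **Operations practices**: it pays attention to monitoring, backup, and deployment in production
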
## Recommended Adoption

**Strongly recommended**:
- ✅ MySQL + Neo4j dual-storage architecture
- ✅ Eventual consistency + reconciliation
- ✅ Double defense (Controller + Service)
- ✅ Query throttling
- ✅ Operations practices (backup, monitoring)

**Optional**:
- ⚠️ Event-driven sync (a scheduled job is enough to start with)

## Related Documents

- [Overall plan](../README.md)
- [Architecture design](../architecture.md)
- [Gemini analysis](./gemini.md)
- [Codex analysis](./codex.md)

docs/knowledge-graph/analysis/codex.md
@@ -0,0 +1,201 @@
# Codex Knowledge Graph Analysis

## Analysis Date
2026-02-17

## Core Recommendations

### 1. Technology Selection

**Graph database**:
- **Preferred**: Neo4j (mature and stable, active community)
- **Alternative**: JanusGraph (for distributed scenarios)

**Rationale**:
- Neo4j's Cypher query language is concise and powerful
- Spring Data Neo4j integrates well
- Rich graph algorithm library
- Suited to small and medium-sized graphs (< 10 million nodes)

### 2. Architecture Design (3 new modules)

#### kg-ingestion (FastAPI)
**Responsibility**: knowledge extraction and preprocessing
- Text → entities + relations
- Entity alignment and disambiguation
- Confidence scoring

#### kg-service (Spring Boot)
**Responsibility**: graph query and management
- Graph query API
- Permission control
- Cache management

#### kg-ui (React)
**Responsibility**: graph visualization
- AntV G6 visualization
- Interactive queries
- Editing

### 3. Data Modeling (10 entity types + 6 relation types)

#### Core Entities (10 types)
1. **Dataset**: a dataset
2. **Field**: a field
3. **LabelTask**: a labeling task
4. **Workflow**: a workflow
5. **Job**: a job
6. **Rule**: a rule
7. **User**: a user
8. **Org**: an organization
9. **Model**: a model
10. **Issue**: an issue

#### Core Relations (6 types)
1. **HAS_FIELD**: a dataset contains a field
2. **TRIGGERS**: triggers
3. **USES_RULE**: uses a rule
4. **ASSIGNED_TO**: is assigned to
5. **PRODUCED_BY**: is produced by
6. **IMPACTS**: impacts

### 4. Implementation Roadmap (4 phases)

#### Phase 0: Scenario Selection (1-2 weeks)
- Pick 2 high-value scenarios
- Define the core entities and relations
- Design the schema

#### Phase 1: PoC (2-4 weeks)
- Set up the infrastructure
- Implement basic extraction
- Validate technical feasibility

#### Phase 2: Productionization (4-8 weeks)
- Complete the feature set
- Optimize performance
- Integrate with existing systems

#### Phase 3: Continuous Improvement
- Extend entities and relations
- Optimize algorithms
- Improve the user experience

### 5. Potential Challenges

#### Data Quality
**Problem**: metadata is incomplete or inaccurate

**Solutions**:
- Data cleansing and normalization
- Manual review process
- Confidence scoring

#### Performance Bottlenecks
**Problem**: query performance degrades on large graphs

**Solutions**:
- Index optimization
- Query throttling
- Caching of hot data
- Offline computation

#### Multi-Tenant Isolation
**Problem**: data belonging to different tenants must be isolated

**Solutions**:
- Every node carries a tenant_id
- Queries are filtered automatically
- Permission control

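### 6. Best Practices

#### Schema Design
- **Design up front**: define entities and relations explicitly
- **Version management**: support schema evolution
- **Documentation**: document every entity and relation in detail

#### Query Optimization
- **Limit depth**: at most 3 hops
- **Limit size**: at most 1000 nodes
- **Use indexes**: create indexes on frequently queried fields
- **Cache results**: cache hot queries

#### Security
- **Parameterized queries**: prevent Cypher injection
- **Permission control**: role-based access control
- **Audit logging**: record every operation
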
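### 7. Issues Found in Code Review

#### P0 - Critical
1. **Main application did not declare the dependency**: fixed
2. **Neo4j credentials were hard-coded**: fixed
3. **graphId parameter was not validated**: fixed

#### P1 - Major
4. **Non-standard exception handling**: fixed
5. **Queries were not throttled**: fixed
6. **Error code scheme was not aligned**: fixed

#### P2 - Medium
7. **Relation modeling is not wired through**: to be implemented
8. **List endpoints lack pagination**: to be implemented
9. **Python module is not registered with the router**: to be implemented
10. **Non-standard secret handling**: to be implemented

#### P3 - Minor
11. **Neo4j image uses a floating tag**: to be fixed
12. **No test coverage**: to be added
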
### 8. Suggested Next Steps

**Immediate actions**:
1. Resolve the P2 issues (relation support, pagination, Python routing)
2. Define the core entity and relation model
3. Implement MySQL → Neo4j sync

**Short-term goals** (1-2 weeks):
1. Complete the MVP features
2. Add unit tests
3. Run performance tests

**Mid-term goals** (1-2 months):
1. Integrate with existing systems
2. Implement GraphRAG
3. Launch the first scenario

## Comparison with Other Tools

| Dimension | Codex | Gemini | Claude |
|------|-------|--------|--------|
| **Technology selection** | Neo4j/JanusGraph | Neo4j | Neo4j |
| **Architecture focus** | 3 new modules | GraphRAG integration | Reuse existing infrastructure |
| **Data modeling** | 10 entity types + 6 relation types | Flexible schema + embedding | Schema first + version management |
| **Implementation path** | 4 phases (0-3) | 3 phases (MVP first) | 4 phases |
| **Distinctive strength** | Detailed domain model | LangChain + RAG integration | Deep integration with existing systems |

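## Key Insights

1. **Detailed domain model**: Codex provides the most detailed entity and relation definitions
2. **Rigorous code review**: it found 12 issues, which helps safeguard code quality
3. **Practical best practices**: it gives concrete optimization advice
4. **Phased delivery**: it stresses building a PoC first to validate feasibility
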
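## Recommended Adoption

**Strongly recommended**:
- ✅ The data model with 10 entity types + 6 relation types
- ✅ Fixes for the issues found in code review
- ✅ Best practices (query optimization, security)
- ✅ The 4-phase implementation roadmap

**Optional**:
- ⚠️ JanusGraph (if a distributed deployment is needed)
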
## Related Documents

- [Overall plan](../README.md)
- [Architecture design](../architecture.md)
- [Gemini analysis](./gemini.md)
- [Claude analysis](./claude.md)

docs/knowledge-graph/analysis/gemini.md
@@ -0,0 +1,154 @@
# Gemini Knowledge Graph Analysis

## Analysis Date
2026-02-17

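## Core Recommendations

### 1. GraphRAG Integration (distinctive contribution)

**Key idea**: integrate the knowledge graph deeply with the existing RAG system

**Approach**:
- Add a "hybrid retrieval" mode to `rag-query-service`
- Query Milvus (vectors) and Neo4j (graph structure) at the same time
- Textualize the triples of the 2-hop subgraph and feed them to the LLM as context

**Benefits**:
- Makes full use of the existing Milvus vector retrieval
- Combines vector similarity with graph-structure relationships
- Provides richer context
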
### 2. LangChain Integration

**Technical path**:
- Use LangChain's `LLMGraphTransformer` for automatic extraction
- Implement it in `runtime/datamate-python`
- API: `POST /graph/extract`, takes text and returns nodes and edges

**Implementation details**:
```python
from langchain_experimental.graph_transformers import LLMGraphTransformer

# `llm` (a LangChain chat model) and `document` (a LangChain Document)
# are expected to be created elsewhere.
transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Dataset", "Field", "Workflow"],
    allowed_relationships=["HAS_FIELD", "USES"]
)

graph_documents = transformer.convert_to_graph_documents([document])
```

### 3. Data Modeling Enhancements

**Core metamodel**:
- **Entity**: add an `embedding` field (the vector representation of the node)
- **Document**: a new node type used for provenance
- **Relation**: `(Entity)-[MENTIONED_IN]->(Document)`

**Benefits**:
- Supports mixing vector retrieval with graph retrieval
- Makes provenance easy by tracing where an entity came from
- Improves retrieval accuracy

### 4. Implementation Roadmap (3 phases)

#### Phase 1: Infrastructure and Basic Extraction (MVP)
1. Environment setup: create a neo4j directory under `deployment/docker/`
2. Python extractor: use LangChain's LLMGraphTransformer
3. Simple storage: write directly to Neo4j

#### Phase 2: Graph Service and RAG Integration
1. Java service: create `knowledge-graph-service`
2. GraphRAG: add a "hybrid retrieval" mode to `rag-query-service`
   - Query Milvus and Neo4j (2-hop subgraph) at the same time
   - Textualize the triples and feed them to the LLM as context

#### Phase 3: Visualization and Advanced Features
1. Front-end visualization: knowledge graph explorer
2. Graph editing: human-in-the-loop correction

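### 5. Potential Challenges and Mitigations

#### Entity Ambiguity
**Problem**: entities with the same name may refer to different objects

**Solutions**:
- An entity alignment step
- Merging with an LLM or vector similarity
- Manual review process

#### Information Overload (Super Nodes)
**Problem**: some nodes have too many connections, which degrades query performance

**Solutions**:
- Limit the number of hops (at most 3)
- Limit the number of edges (at most 1000)
- Return results in pages

#### Hallucination and Incorrect Extraction
**Problem**: the LLM may produce entities or relations that do not exist

**Solutions**:
- Confidence scoring
- Manual review
- Comparing the results of several models
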
### 6. First Actions

**Infrastructure setup**:
1. Create a neo4j directory under `deployment/docker/`
2. Write docker-compose.yml
3. Update the Makefile so it can start Neo4j

**Sample configuration**:
```yaml
version: '3.8'
services:
  neo4j:
    image: neo4j:latest
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_AUTH=neo4j/datamate123
    volumes:
      - neo4j_data:/data
volumes:
  neo4j_data:
```

## Comparison with Other Tools

| Dimension | Gemini | Codex | Claude |
|------|--------|-------|--------|
| **Technology selection** | Neo4j | Neo4j/JanusGraph | Neo4j |
| **Architecture focus** | GraphRAG integration | 3 new modules | Reuse existing infrastructure |
| **Data modeling** | Flexible schema + embedding | 10 entity types + 6 relation types | Schema first + version management |
| **Implementation path** | 3 phases (MVP first) | 4 phases (0-3) | 4 phases |
| **Distinctive strength** | LangChain + RAG integration | Detailed domain model | Deep integration with existing systems |

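## Key Insights

1. **GraphRAG is the core innovation**: the hybrid retrieval scheme proposed by Gemini fits DataMate's existing RAG architecture particularly well
2. **LangChain simplifies development**: the off-the-shelf LLMGraphTransformer makes it quick to implement extraction
3. **Vectors + graph structure**: introducing the embedding field lets vector retrieval and graph retrieval work together seamlessly
4. **MVP first**: it stresses building the infrastructure first and extending features step by step
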
## Recommended Adoption

**Strongly recommended**:
- ✅ GraphRAG integration
- ✅ LangChain integration
- ✅ The embedding field
- ✅ The Document node

**Optional**:
- ⚠️ The 3-phase roadmap (can be combined with the 4-phase roadmaps from the other tools)

## Related Documents

- [Overall plan](../README.md)
- [Architecture design](../architecture.md)
- [Codex analysis](./codex.md)
- [Claude analysis](./claude.md)

docs/knowledge-graph/architecture.md
@@ -0,0 +1,397 @@
# DataMate Knowledge Graph Architecture Design

## 🏗️ Overall Architecture

### Layered Architecture

```
┌────────────────────────────────────────────────────────────┐
│  Front-end layer (Frontend)                                │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  React + AntV G6                                     │  │
│  │  - Visualization (layered loading, subgraph pruning) │  │
│  │  - Graph editing (human-in-the-loop)                 │  │
│  │  - Query UI (Cypher query builder)                   │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
                          ↓ HTTP/REST
┌────────────────────────────────────────────────────────────┐
│  Service layer (Service)                                   │
│  ┌──────────────────────┐  ┌────────────────────────────┐  │
│  │ kg-service           │  │ rag-query-service          │  │
│  │ (Spring Boot)        │  │ (FastAPI)                  │  │
│  │                      │  │                            │  │
│  │ - Graph query API    │  │ - Hybrid retrieval         │  │
│  │ - Permission filter  │  │ - GraphRAG                 │  │
│  │ - Cache (Redis)      │  │ - Vector + graph search    │  │
│  └──────────────────────┘  └────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Ingestion layer (Ingestion)                               │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  kg-ingestion (FastAPI)                              │  │
│  │  - LangChain LLMGraphTransformer                     │  │
│  │  - Entity alignment (vector similarity + LLM)        │  │
│  │  - Relation generation (rules + LLM)                 │  │
│  │  - Confidence scoring                                │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Storage layer (Storage)                                   │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                  │
│  │  MySQL   │  │  Neo4j   │  │  Milvus  │                  │
│  │(metadata)│  │ (graph)  │  │(vectors) │                  │
│  └──────────┘  └──────────┘  └──────────┘                  │
└────────────────────────────────────────────────────────────┘
```

## 🔧 Technology Selection

### Graph Database: Neo4j

**Why Neo4j**:
- ✅ Mature and stable, with an active community
- ✅ The Cypher query language is concise and powerful
- ✅ Spring Data Neo4j integrates well
- ✅ ACID transactions are supported
- ✅ Rich graph algorithm library

**Edition**: Neo4j Community Edition (production can upgrade to Enterprise Edition)

**Configuration**:
- Ports: 7474 (HTTP), 7687 (Bolt)
- Memory: heap 512MB, page cache 512MB (adjust to the data volume)
- Persistence: Docker volume

### Back-End Frameworks

#### knowledge-graph-service (Spring Boot)

**Responsibilities**:
- Graph query API
- Permission control and tenant isolation
- Cache management
- Integration with other services

**Tech stack**:
- Spring Boot 3.x
- Spring Data Neo4j
- Spring Security (permission control)
- Redis (caching)

**DDD layering**:
```
com.datamate.knowledgegraph/
├── application/              # application service layer
│   └── GraphEntityService.java
├── domain/                   # domain layer
│   ├── model/
│   │   ├── GraphEntity.java
│   │   └── GraphRelation.java
│   └── repository/
│       └── GraphEntityRepository.java
├── infrastructure/           # infrastructure layer
│   ├── neo4j/
│   │   └── KnowledgeGraphProperties.java
│   └── exception/
│       └── KnowledgeGraphErrorCode.java
└── interfaces/               # interface layer
    ├── rest/
    │   └── GraphEntityController.java
    └── dto/
        ├── CreateEntityRequest.java
        ├── UpdateEntityRequest.java
        └── CreateRelationRequest.java
```

#### kg-ingestion (FastAPI)

**Responsibilities**:
- Knowledge extraction (text → entities + relations)
- Entity alignment and disambiguation
- Relation generation and validation
- Confidence scoring

**Tech stack**:
- FastAPI
- LangChain
- LangChain LLMGraphTransformer
- Pydantic (data validation)

**Module structure**:
```
kg_extraction/
├── __init__.py
├── models.py       # data models
├── extractor.py    # extractor
└── aligner.py      # entity alignment (to be implemented)
```

### Front-End Framework

**Tech stack**:
- React 18
- AntV G6 (graph visualization)
- TypeScript
- Ant Design (UI components)

**Core features**:
- Graph visualization (supports 10,000+ nodes)
- Interactive query builder
- Real-time editing and feedback
- Export and sharing

## 🔐 Security Design

### Multi-Tenant Isolation

**Strategy**:
- Every entity and relation carries a `graph_id` property
- A `graph_id` filter is added to queries automatically
- Neo4j indexes include `graph_id`

**Implementation**:
```cypher
// Create the index
CREATE INDEX entity_graph_id IF NOT EXISTS FOR (n:Entity) ON (n.graph_id);

// Queries are filtered automatically
MATCH (n:Entity {graph_id: $graphId})
WHERE n.id = $entityId
RETURN n;
```

### Permission Control

**Double defense for graphId**:
1. **Controller layer**: format validation with `@Pattern(regexp = UUID_REGEX)`
2. **Service layer**: business validation with `validateGraphId()`

**Implementation**:
```java
// Controller layer
@GetMapping("/{graphId}/entities/{entityId}")
public GraphEntity getEntity(
    @PathVariable @Pattern(regexp = UUID_REGEX) String graphId,
    @PathVariable @Pattern(regexp = UUID_REGEX) String entityId
) {
    return entityService.getEntity(graphId, entityId);
}

// Service layer
public GraphEntity getEntity(String graphId, String entityId) {
    validateGraphId(graphId);
    return entityRepository.findByIdAndGraphId(entityId, graphId)
        .orElseThrow(() -> BusinessException.of(
            KnowledgeGraphErrorCode.ENTITY_NOT_FOUND
        ));
}
```

### Cypher Injection Protection

**Strategy**:
- Use parameterized queries
- Never build Cypher by string concatenation
- Validate and escape input

**Example**:
```java
// ✅ Correct: parameterized query
@Query("MATCH (n:Entity {id: $id, graph_id: $graphId}) RETURN n")
Optional<GraphEntity> findByIdAndGraphId(
    @Param("id") String id,
    @Param("graphId") String graphId
);

// ❌ Wrong: string concatenation
String cypher = "MATCH (n:Entity {id: '" + id + "'}) RETURN n";
```

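
The same rule applies on the Python side: the query text stays constant and user input travels only in the parameter map. The sketch below illustrates this without a database driver. `build_find_entity` is a hypothetical helper, and the regular expression is an assumed stand-in for the `UUID_REGEX` constant referenced in the Java code, whose actual value is not shown in these documents.

```python
import re

# Assumed UUID pattern (8-4-4-4-12 hex digits); not taken from the project.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# Constant query text: user input is never interpolated into it.
FIND_ENTITY = "MATCH (n:Entity {id: $id, graph_id: $graphId}) RETURN n"


def build_find_entity(graph_id: str, entity_id: str) -> tuple[str, dict[str, str]]:
    """Validate both identifiers, then return (query, parameters) for a driver call."""
    for name, value in (("graphId", graph_id), ("entityId", entity_id)):
        if not UUID_RE.fullmatch(value):
            raise ValueError(f"{name} is not a valid UUID")
    return FIND_ENTITY, {"id": entity_id, "graphId": graph_id}
```

The returned pair can be handed to whichever Neo4j client is in use. An input such as `' OR 1=1 //` is rejected by the format check, and even if validation were skipped it would be bound as a plain string value and would never become part of the query text.
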
## 📊 Data Sync Strategy

### MySQL → Neo4j Sync

**Strategy**: eventual consistency + reconciliation

**Sync modes**:
1. **Real-time sync**: capture MySQL changes through CDC (Change Data Capture)
2. **Batch sync**: a scheduled job (hourly/daily) runs a full sync
3. **Manual sync**: an API triggers the sync

**Reconciliation**:
- Compare the data in MySQL and Neo4j early every morning
- Log and raise an alert when inconsistencies are found
- Provide a repair tool

**Implementation**:
```java
@Scheduled(cron = "0 0 * * * *") // every hour
public void syncFromMySQL() {
    // Capture the watermark before reading, so rows that change while
    // this run is in progress are picked up by the next run
    Instant syncStart = Instant.now();

    // 1. Query the changes in MySQL
    List<Dataset> changedDatasets = datasetRepository
        .findByUpdatedAtAfter(lastSyncTime);

    // 2. Convert them to graph entities
    List<GraphEntity> entities = changedDatasets.stream()
        .map(this::toGraphEntity)
        .collect(Collectors.toList());

    // 3. Batch-write to Neo4j
    graphEntityRepository.saveAll(entities);

    // 4. Advance the sync watermark
    lastSyncTime = syncStart;
}
```

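
One subtlety in an incremental sync like the one above is when the watermark is taken. If it is read after the query, a row updated between the query and the watermark update is never seen again. The sketch below captures the watermark before reading. All names (`sync_once`, `fetch_changed_since`, `write_entities`) are hypothetical, and the two callables stand in for the MySQL query and the Neo4j write.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable


def sync_once(
    fetch_changed_since: Callable[[datetime], Iterable[dict]],
    write_entities: Callable[[list[dict]], None],
    last_sync_time: datetime,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> datetime:
    """Run one incremental sync and return the watermark for the next run."""
    sync_started_at = clock()  # captured BEFORE reading, so no change is skipped
    changed = list(fetch_changed_since(last_sync_time))
    # Convert source rows to graph entities (only Dataset rows in this sketch).
    entities = [{"id": row["id"], "name": row["name"], "type": "Dataset"} for row in changed]
    if entities:
        write_entities(entities)
    return sync_started_at
```

With this ordering a row may be synced twice when it changes during a run, which is harmless as long as the write is an idempotent upsert. Skipping a row would not be harmless.
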
## ⚡ Performance Optimization

### Query Optimization

**Strategy**:
1. **Limit traversal depth**: at most 3 hops
2. **Limit the number of returned nodes**: at most 1000
3. **Use indexes**: create indexes on frequently queried fields
4. **Cache hot data**: use a Redis cache

**Implementation**:
```java
public List<GraphEntity> getNeighbors(
    String graphId,
    String entityId,
    int depth,
    int limit
) {
    // Clamp the parameters
    int actualDepth = Math.min(depth, properties.getMaxDepth());
    int actualLimit = Math.min(limit, properties.getMaxNodesPerQuery());

    // Run the query
    return entityRepository.findNeighbors(
        graphId, entityId, actualDepth, actualLimit
    );
}
```

### Indexing Strategy

**Required indexes**:
```cypher
// Entity ID index
CREATE INDEX entity_id IF NOT EXISTS FOR (n:Entity) ON (n.id);

// Graph ID index
CREATE INDEX entity_graph_id IF NOT EXISTS FOR (n:Entity) ON (n.graph_id);

// Composite index
CREATE INDEX entity_id_graph_id IF NOT EXISTS
FOR (n:Entity) ON (n.id, n.graph_id);
```

### Caching Strategy

**Cache tiers**:
1. **L1 cache**: Spring Cache (local cache)
2. **L2 cache**: Redis (distributed cache)
3. **L3 cache**: Neo4j's built-in cache

**What is cached**:
- Hot entities (access frequency > 100/hour)
- Commonly used subgraphs (2-hop neighbors)
- Query results (TTL 5 minutes)

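
The "TTL 5 minutes" rule for query results can be illustrated with a small in-process cache. This is a sketch of the expiry logic only: `TTLCache` is a hypothetical class, it is not thread-safe, and it does not model the Spring Cache or Redis tiers.

```python
import time
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Tiny time-to-live cache; entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[Any, float]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict lazily on read
            return None
        return value
```

A query-result cache matching the bullet above would be created as `TTLCache(ttl_seconds=300)` and keyed by something like `(graph_id, entity_id, depth, limit)`, so results never leak across graphs.
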
## 🔄 GraphRAG integration

### Hybrid retrieval architecture

```
User query
    ↓
┌─────────────────────────────────────┐
│ Query understanding and rewriting   │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Parallel retrieval                  │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ Milvus        │ │ Neo4j         │ │
│ │ vector search │ │ graph search  │ │
│ │ Top-K         │ │ 2-hop subgraph│ │
│ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Result fusion and ranking           │
│ - Vector similarity × 0.6           │
│ - Graph-structure relevance × 0.4   │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Context construction                │
│ - Document chunks (Milvus)          │
│ - Textualized triples (Neo4j)       │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ LLM generation                      │
└─────────────────────────────────────┘
```
|
||||
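
The fusion step in the diagram is a weighted sum, which can be sketched as follows. The sketch assumes both retrievers return scores already normalized to the 0-1 range and keyed by a shared candidate ID; the function name `fuse_scores` and the treatment of a missing score as 0 are assumptions, not part of the design.

```python
from typing import Dict, List, Tuple

VECTOR_WEIGHT = 0.6  # weight of vector similarity, taken from the diagram
GRAPH_WEIGHT = 0.4   # weight of graph-structure relevance, taken from the diagram


def fuse_scores(vector_scores: Dict[str, float],
                graph_scores: Dict[str, float],
                top_k: int = 5) -> List[Tuple[str, float]]:
    """Combine per-candidate scores; a candidate missing from one retriever scores 0 there."""
    candidates = set(vector_scores) | set(graph_scores)
    fused = {
        cid: VECTOR_WEIGHT * vector_scores.get(cid, 0.0)
        + GRAPH_WEIGHT * graph_scores.get(cid, 0.0)
        for cid in candidates
    }
    # Highest fused score first; ties are broken by candidate ID for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

With these weights a candidate found by both retrievers can outrank one that only the vector search scored highly, which is the point of running the two searches in parallel.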
|
||||
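### Triple textualization

**Strategy**: convert the graph structure into natural language.

**Example**:
```python
# Graph structure (Cypher-style patterns, shown as comments)
# (Dataset:User Behavior Data)-[HAS_FIELD]->(Field:user_id)
# (Dataset:User Behavior Data)-[USED_BY]->(Workflow:User Profile Construction)

# Textualized form
text = """
The dataset "User Behavior Data" contains the field "user_id".
The dataset "User Behavior Data" is used by the workflow "User Profile Construction".
"""
```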
|
||||
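
One simple way to implement textualization is a sentence template per relation type. The sketch below is illustrative only: the `TEMPLATES` table, the fallback sentence, and the tuple layout of a triple are assumptions, and a real implementation might let the LLM phrase the sentences instead.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]           # (entity type, entity name)
Triple = Tuple[Node, str, Node]  # (source node, relation type, target node)

# Hypothetical sentence templates, one per relation type.
TEMPLATES: Dict[str, str] = {
    "HAS_FIELD": 'The {src_type} "{src}" contains the {dst_type} "{dst}".',
    "USED_BY": 'The {src_type} "{src}" is used by the {dst_type} "{dst}".',
}
FALLBACK = 'The {src_type} "{src}" is related to the {dst_type} "{dst}" via {rel}.'


def textualize(triples: List[Triple]) -> str:
    """Render each triple as one sentence, one sentence per line."""
    lines = []
    for (src_type, src), rel, (dst_type, dst) in triples:
        template = TEMPLATES.get(rel, FALLBACK)
        lines.append(template.format(src_type=src_type.lower(), src=src,
                                     dst_type=dst_type.lower(), dst=dst, rel=rel))
    return "\n".join(lines)
```

Relation types without a template fall back to a generic sentence, so unknown relations still reach the LLM context instead of being dropped.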
|
||||
## 📈 Monitoring and operations

### Monitoring metrics

**Neo4j metrics**:
- Node count
- Relationship count
- Query response time
- Memory usage
- Disk usage

**Service metrics**:
- API response time
- Error rate
- Throughput
- Cache hit rate

**Tools**:
- Prometheus (metrics collection)
- Grafana (visualization)
- Neo4j Metrics (Neo4j-specific metrics)

### Backup strategy

**Neo4j backups**:
- Full backup every day in the early morning
- Keep the backups from the last 7 days
- Store backups in object storage (S3/OSS)

**Restore testing**:
- Run a restore drill once a month
- Verify that backups are complete and usable
|
||||
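
The 7-day retention rule reduces to a date comparison. The helper below is a sketch assuming one full backup per calendar day, identified by its date; the function name and signature are hypothetical and no object-storage API is involved.

```python
from datetime import date, timedelta
from typing import List


def expired_backups(backup_dates: List[date], today: date,
                    keep_days: int = 7) -> List[date]:
    """Return the backup dates that fall outside the retention window."""
    # With keep_days=7 the window is today plus the six days before it.
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in backup_dates if d <= cutoff)
```

A cleanup job could call this after each nightly backup and delete only the returned dates.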
|
||||
## 🔗 Related documents

- [Overall plan](./README.md)
- [Implementation plan](./implementation.md)
- [AI analysis results](./analysis/)
|
||||
395
docs/knowledge-graph/implementation.md
Normal file
@@ -0,0 +1,395 @@
|
||||
# DataMate Knowledge Graph Implementation Plan

## 📅 Overall timeline

```
Phase 0: Infrastructure (1 week)                  ✅ Done
Phase 1: MVP (2-3 weeks)                          ⏳ In progress
Phase 2: GraphRAG integration (3-4 weeks)         ⏳ Not started
Phase 3: Visualization and tuning (4-6 weeks)     ⏳ Not started
```

**Total**: 10-14 weeks
|
||||
|
||||
---
|
||||
|
||||
## ✅ Phase 0: Infrastructure (done)

### Goal
Set up the infrastructure for the knowledge graph: Neo4j, the Java service, and the Python module.

### Completed tasks

#### 1. Neo4j Docker Compose configuration
- ✅ Created `deployment/docker/neo4j/docker-compose.yml`
- ✅ Configured Neo4j Community Edition
- ✅ Ports: 7474 (HTTP), 7687 (Bolt)
- ✅ Data persistence: Docker volume
- ✅ Password supplied through an environment variable

#### 2. Makefile updates
- ✅ Added `neo4j-up`: start Neo4j
- ✅ Added `neo4j-down`: stop Neo4j
- ✅ Added `neo4j-logs`: view logs
- ✅ Added `neo4j-shell`: open Cypher Shell

#### 3. knowledge-graph-service (Spring Boot)
- ✅ Created the full DDD layered architecture
- ✅ Implemented CRUD for GraphEntity
- ✅ Implemented two-layer graphId defense
- ✅ Implemented query limiting
- ✅ Unified exception handling

**File list** (11 Java files plus 1 YAML configuration file):
- `KnowledgeGraphServiceConfiguration.java`
- `GraphEntityService.java`
- `GraphEntity.java`, `GraphRelation.java`
- `GraphEntityRepository.java`
- `KnowledgeGraphErrorCode.java`
- `KnowledgeGraphProperties.java`
- `GraphEntityController.java`
- `CreateEntityRequest.java`, `UpdateEntityRequest.java`, `CreateRelationRequest.java`
- `application-knowledgegraph.yml`

#### 4. kg_extraction module (Python)
- ✅ Created the `KnowledgeGraphExtractor` class
- ✅ Integrated LangChain LLMGraphTransformer
- ✅ Supports asynchronous, synchronous, and batch extraction
- ✅ Supports schema-guided mode
- ✅ Compatible with OpenAI and self-hosted models

**File list** (3 Python files):
- `__init__.py`
- `models.py`: Pydantic data models
- `extractor.py`: extractor implementation

#### 5. Code review and fixes
- ✅ 3 rounds of Codex review
- ✅ 2 rounds of Claude fixes
- ✅ All P0 and P1 issues resolved
- ✅ Compiles cleanly, no blocking issues

### Results
- Commit: `5a553dd`
- Files changed: 22 files, 1007 lines added
- Branch: `lsf`
|
||||
|
||||
---
|
||||
|
||||
## ⏳ Phase 1: MVP (2-3 weeks)

### Goal
Implement basic graph construction and querying, covering 2-3 high-value scenarios.

### Task list

#### Task 1.1: FastAPI interface for the Python extractor (3 days)

**Subtasks**:
1. Create `kg_extraction/interface.py`
   - Define the FastAPI routes
   - Implement the `/api/kg/extract` endpoint
   - Accept text input and return nodes and edges

2. Integrate into the main FastAPI router
   - Register the routes in `app/module/__init__.py`
   - Add API documentation

3. Implement configuration management
   - Read the API key from environment variables
   - Use `SecretStr` to protect sensitive values

4. Write unit tests
   - Test extraction
   - Test error handling

**Acceptance criteria**:
- ✅ Extraction can be invoked through the API
- ✅ Structured nodes and edges are returned
- ✅ Complete API documentation exists
- ✅ Unit test coverage > 80%
|
||||
|
||||
#### Task 1.2: Relation support in the Java service (3 days)

**Subtasks**:
1. Extend `GraphRelationRepository`
   - Implement `findByGraphId`
   - Implement `findBySourceAndTarget`
   - Implement `findByType`

2. Implement `GraphRelationService`
   - Create relations
   - Query relations
   - Update relations
   - Delete relations

3. Implement `GraphRelationController`
   - `POST /{graphId}/relations`: create a relation
   - `GET /{graphId}/relations`: list relations
   - `GET /{graphId}/relations/{relationId}`: get a single relation
   - `PUT /{graphId}/relations/{relationId}`: update a relation
   - `DELETE /{graphId}/relations/{relationId}`: delete a relation

4. Write unit and integration tests

**Acceptance criteria**:
- ✅ Relation CRUD is complete
- ✅ Relations can be queried by type, source node, and target node
- ✅ Full permission control is in place
- ✅ Test coverage > 80%
|
||||
|
||||
#### Task 1.3: Define the core entity and relation model (2 days)

**Subtasks**:
1. Settle on 5-8 core entity types
   - Analyze the existing DataMate data model
   - Select high-value entities
   - Define entity properties

2. Define the relations between entities
   - Analyze the business processes
   - Define relation types
   - Define relation properties

3. Design schema version management
   - Define the schema version number
   - Implement a schema migration mechanism
   - Record the schema change history

4. Write documentation
   - Entity and relation inventory
   - Property descriptions
   - Sample data

**Acceptance criteria**:
- ✅ Clear entity and relation definitions exist
- ✅ Complete documentation exists
- ✅ Sample data exists

**Suggested core entities** (5-8 types):
1. **Dataset**: dataset
2. **Field**: field
3. **LabelTask**: labeling task
4. **Workflow**: workflow
5. **Job**: job
6. **User**: user
7. **Model**: model (optional)
8. **Rule**: rule (optional)

**Suggested core relations**:
1. **HAS_FIELD**: a dataset contains a field
2. **TRIGGERS**: trigger relation (Workflow → Job)
3. **USES_DATASET**: uses a dataset (Workflow → Dataset)
4. **ASSIGNED_TO**: assigned to (LabelTask → User)
5. **PRODUCED_BY**: produced by (Dataset → Job)
6. **DEPENDS_ON**: depends on (Job → Job)
|
||||
|
||||
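#### Task 1.4: Basic graph construction pipeline (5 days)

**Subtasks**:
1. Implement MySQL → Neo4j synchronization
   - Create `GraphSyncService`
   - Implement full synchronization
   - Implement incremental synchronization
   - Implement a reconciliation mechanism

2. Implement manually triggered builds
   - Create the `/api/kg/sync` endpoint
   - Support synchronization by entity type
   - Support synchronization by time range

3. Implement synchronization monitoring
   - Record synchronization logs
   - Count the volume of synchronized data
   - Track synchronization duration

4. Write integration tests
   - Test full synchronization
   - Test incremental synchronization
   - Test the reconciliation mechanism

**Acceptance criteria**:
- ✅ Metadata can be synchronized from MySQL to Neo4j
- ✅ Incremental updates are supported
- ✅ Complete monitoring and logging are in place
- ✅ Integration tests pass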
|
||||
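
The reconciliation step named in this task can be reduced to comparing two sets of identifiers. The sketch below assumes the MySQL primary keys and the `source_id` values of synced graph entities have already been loaded into memory; the function name and the result keys are hypothetical.

```python
from typing import Dict, Iterable, Set


def reconcile(mysql_ids: Iterable[str],
              graph_source_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare source-of-truth IDs with the IDs present in the graph."""
    source, graph = set(mysql_ids), set(graph_source_ids)
    return {
        # Rows that exist in MySQL but were never written to the graph.
        "missing_in_graph": source - graph,
        # Graph entities whose source row no longer exists.
        "orphaned_in_graph": graph - source,
    }
```

An incremental sync could then re-run only for `missing_in_graph` and flag `orphaned_in_graph` for deletion or review.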
|
||||
#### Task 1.5: Basic query features (2 days)

**Subtasks**:
1. Implement neighbor queries
   - Support N-hop neighbor queries
   - Support filtering by relation type
   - Support pagination

2. Implement path queries
   - Shortest path
   - All paths (with a cap on the number returned)

3. Implement subgraph queries
   - Filter subgraphs by condition
   - Support export

4. Write unit tests

**Acceptance criteria**:
- ✅ Query features are complete
- ✅ Performance meets the target (< 1s)
- ✅ Test coverage > 80%
|
||||
|
||||
### Milestones

**M1.1** (end of week 1):
- ✅ Python extractor API complete
- ✅ Java relation features complete

**M1.2** (end of week 2):
- ✅ Core entity and relation model defined
- ✅ Graph construction pipeline complete

**M1.3** (end of week 3):
- ✅ Basic query features complete
- ✅ Integration tests pass
- ✅ MVP demo

### Acceptance criteria

1. **Functional completeness**:
   - ✅ Entities and relations can be extracted from text
   - ✅ Results can be stored in Neo4j
   - ✅ The graph can be queried and traversed
   - ✅ Basic permission control is supported

2. **Performance targets**:
   - ✅ Extraction response time < 5s
   - ✅ Query response time < 1s
   - ✅ Synchronization throughput > 1000 entities per minute

3. **Quality targets**:
   - ✅ Unit test coverage > 80%
   - ✅ Integration tests pass
   - ✅ Code review passed
|
||||
|
||||
---
|
||||
|
||||
## ⏳ Phase 2: GraphRAG integration (3-4 weeks)

### Goal
Integrate the knowledge graph deeply with the existing RAG system to improve retrieval and generation quality.

### Task list

#### Task 2.1: Hybrid retrieval (2 weeks)

**Subtasks**:
1. Add a graph retrieval module to `rag-query-service`
2. Implement parallel Milvus + Neo4j retrieval
3. Implement result fusion and ranking
4. Implement triple textualization

#### Task 2.2: GraphRAG (1 week)

**Subtasks**:
1. Design the GraphRAG flow
2. Implement context construction
3. Implement LLM generation
4. Tune the prompts

#### Task 2.3: Evaluation and tuning (1 week)

**Subtasks**:
1. Design evaluation metrics
2. Collect test data
3. Run A/B tests
4. Tune the retrieval strategy

### Acceptance criteria
- ✅ Hybrid retrieval outperforms either retriever alone
- ✅ The retrieval strategy is configurable
- ✅ Complete evaluation metrics exist
|
||||
|
||||
---
|
||||
|
||||
## ⏳ Phase 3: Visualization and tuning (4-6 weeks)

### Goal
Provide user-friendly graph visualization and editing, and improve performance and operations.

### Task list

#### Task 3.1: Frontend graph browser (2 weeks)

**Subtasks**:
1. Set up the React + AntV G6 project
2. Implement graph visualization
3. Implement interactions (zoom, drag, search)
4. Implement the query builder

#### Task 3.2: Human-in-the-loop editing (1 week)

**Subtasks**:
1. Implement entity editing
2. Implement relation editing
3. Implement batch operations
4. Implement the review workflow

#### Task 3.3: Performance tuning (1 week)

**Subtasks**:
1. Tune the index strategy
2. Implement caching
3. Implement offline computation
4. Tune query statements

#### Task 3.4: Monitoring and operations (1 week)

**Subtasks**:
1. Integrate Prometheus + Grafana
2. Implement backup and restore
3. Write operations documentation
4. Run load tests

### Acceptance criteria
- ✅ Large graphs can be visualized (10000+ nodes)
- ✅ Real-time editing and feedback are supported
- ✅ Query response time < 1s
- ✅ Complete monitoring and alerting are in place
|
||||
|
||||
---
|
||||
|
||||
## 📊 Resource requirements

### Staffing
- **Backend developers**: 1-2
- **Frontend developer**: 1
- **Algorithm engineer**: 1 (part-time)
- **Test engineer**: 1 (part-time)

### Infrastructure
- **Neo4j**: 4 cores, 8 GB memory (development environment)
- **MySQL**: existing resources
- **Milvus**: existing resources
- **Redis**: existing resources

### External dependencies
- **LLM API**: OpenAI or a self-hosted model
- **Object storage**: used for backups
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Key milestones

| Milestone | Time | Deliverable |
|-----------|------|-------------|
| M0 | Week 1 | Infrastructure in place ✅ |
| M1 | Week 4 | MVP complete, with basic graph construction and querying |
| M2 | Week 8 | GraphRAG integration complete, retrieval quality improved |
| M3 | Week 12 | Visualization and tuning complete, system goes live |

---

## 🔗 Related documents

- [Overall plan](./README.md)
- [Architecture design](./architecture.md)
- [AI analysis results](./analysis/)
|
||||
329
docs/knowledge-graph/schema/entities.md
Normal file
@@ -0,0 +1,329 @@
|
||||
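# DataMate Knowledge Graph - Core Entity Definitions

> Schema version: 1.0.0
> Last updated: 2026-02-17

## Overview

The DataMate knowledge graph defines **8 core entity types** covering four areas: data asset management, task tracking, organizational ownership, and knowledge management.

All entities share the `Entity` label in Neo4j, and the `type` property distinguishes their semantic type. Every entity carries the following common attributes:

| Common attribute | Type | Required | Description |
|------------------|------|----------|-------------|
| `id` | String (UUID) | Yes | Globally unique identifier |
| `name` | String | Yes | Entity name |
| `type` | String | Yes | Entity type (see the type definitions below) |
| `description` | String | No | Entity description |
| `graph_id` | String (UUID) | Yes | ID of the owning graph, used for multi-tenant isolation |
| `source_id` | String | No | ID of the source record (MySQL primary key or external system ID) |
| `source_type` | String | No | Origin: `SYNC` (synced from MySQL), `EXTRACTION` (LLM extraction), `MANUAL` (created by hand) |
| `confidence` | Double | No | Confidence in the range 0.0-1.0 (synced data defaults to 1.0; extracted data is scored by the model) |
| `properties_json` | String (JSON) | No | JSON serialization of the type-specific properties; each type's properties are defined below |
| `created_at` | LocalDateTime | Yes | Creation time |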
|
||||
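
To make the split between common attributes and `properties_json` concrete, the following sketch assembles the parameter map for one entity before it is handed to a Cypher `CREATE`. It is an illustration only: the function `build_entity_params` is hypothetical, and the validation rules simply mirror the table above.

```python
import json
import uuid
from datetime import datetime
from typing import Any, Dict, Optional

ALLOWED_SOURCE_TYPES = {"SYNC", "EXTRACTION", "MANUAL"}


def build_entity_params(name: str, entity_type: str, graph_id: str,
                        properties: Optional[Dict[str, Any]] = None,
                        source_type: str = "MANUAL",
                        confidence: float = 1.0) -> Dict[str, Any]:
    """Assemble the parameter map for one Entity node."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    return {
        "id": str(uuid.uuid4()),
        "name": name,
        "type": entity_type,
        "graph_id": graph_id,
        "source_type": source_type,
        "confidence": confidence,
        # Type-specific properties travel as one JSON string, not as node properties.
        "properties_json": json.dumps(properties or {}, ensure_ascii=False, sort_keys=True),
        "created_at": datetime.now().isoformat(timespec="seconds"),
    }
```

Passing `ensure_ascii=False` keeps non-ASCII property values readable when the JSON string is inspected in Neo4j Browser.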
|
||||
---
|
||||
|
||||
## 1. Dataset

A dataset is DataMate's core asset: a collection of structured or unstructured data.

**Corresponding code model**: `Dataset.java` in `data-management-service`

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `dataset_type` | String | Yes | `IMAGE` / `TEXT` / `QA` / `MULTIMODAL` / `OTHER` | Dataset type |
| `status` | String | Yes | `DRAFT` / `ACTIVE` / `ARCHIVED` | Dataset status |
| `category` | String | No | At most 50 characters | Business category |
| `format` | String | No | - | Data format (for example CSV, JSON, DICOM) |
| `record_count` | Long | No | >= 0 | Number of records or files |
| `size_bytes` | Long | No | >= 0 | Dataset size in bytes |
| `version` | Integer | No | >= 1 | Version number |
| `tags` | List\<String\> | No | - | List of tags |

### Cypher example

```cypher
// Create a Dataset entity (type-specific properties are serialized into properties_json)
CREATE (d:Entity {
  id: 'a1b2c3d4-...',
  name: 'User behavior log v2',
  type: 'Dataset',
  description: 'User behavior tracking data for Q4 2025',
  graph_id: $graphId,
  source_id: '12345',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"dataset_type":"TEXT","status":"ACTIVE","category":"user-behavior","format":"JSON","record_count":1500000,"size_bytes":2147483648,"version":2,"tags":["behavior","production"]}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Field

A field represents the metadata of a column or attribute in a dataset. It is the basic unit for data lineage analysis and impact assessment.

**Corresponding code model**: extracted from the schema metadata of `DatasetFile`

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `data_type` | String | Yes | - | Data type (for example STRING, INT, FLOAT, DATETIME, JSON) |
| `nullable` | Boolean | No | - | Whether null values are allowed |
| `is_primary_key` | Boolean | No | - | Whether the field is a primary key |
| `default_value` | String | No | - | Default value |
| `sample_values` | List\<String\> | No | At most 5 | Sample values |
| `statistics` | String | No | JSON format | Field statistics (null rate, distinct count, and so on) |

### Cypher example

```cypher
CREATE (f:Entity {
  id: 'f1e2d3c4-...',
  name: 'user_id',
  type: 'Field',
  description: 'Unique user identifier',
  graph_id: $graphId,
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"data_type":"STRING","nullable":false,"is_primary_key":true,"sample_values":["U001","U002","U003"]}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 3. LabelTask (labeling task)

A labeling task represents one data annotation activity, either manual or automatic.

**Corresponding code model**: `LabelingProject` and `AutoAnnotationTask` in `data-annotation-service`; `TaskMeta` in `task-coordination-service`

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `task_mode` | String | Yes | `MANUAL` / `AUTO` / `HYBRID` | Labeling mode |
| `data_type` | String | No | `image` / `text` / `audio` / `video` / `pdf`, etc. | Type of data being labeled |
| `labeling_type` | String | No | - | Labeling type (for example NER, object detection, sentiment analysis) |
| `status` | String | Yes | `PENDING` / `IN_PROGRESS` / `COMPLETED` / `FAILED` / `STOPPED` | Task status |
| `progress` | Double | No | 0.0-100.0 | Completion percentage |
| `template_name` | String | No | - | Name of the labeling template in use |

### Cypher example

```cypher
CREATE (t:Entity {
  id: 'e5f6a7b8-...',
  name: 'Medical image lesion labeling, batch 3',
  type: 'LabelTask',
  description: 'Object detection labeling of lung nodules in CT images',
  graph_id: $graphId,
  source_id: '67890',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"task_mode":"HYBRID","data_type":"image","labeling_type":"object_detection","status":"IN_PROGRESS","progress":45.5,"template_name":"Medical object detection"}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Workflow

A workflow is the orchestration definition for a set of data processing steps. It covers pipelines such as data cleaning, data synthesis, and data evaluation.

**Corresponding code model**: `CleaningTemplate` in `data-cleaning-service`; `CollectionTemplate` in `data-collection-service`; operator orchestration `Operator`

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `workflow_type` | String | Yes | `CLEANING` / `SYNTHESIS` / `EVALUATION` / `COLLECTION` / `CUSTOM` | Workflow type |
| `status` | String | No | `DRAFT` / `ACTIVE` / `DEPRECATED` | Workflow status |
| `version` | String | No | - | Version number |
| `operator_count` | Integer | No | >= 0 | Number of operators in the workflow |
| `schedule` | String | No | Cron expression | Schedule expression (for scheduled workflows) |

### Cypher example

```cypher
CREATE (w:Entity {
  id: 'c9d0e1f2-...',
  name: 'Text deduplication cleaning pipeline',
  type: 'Workflow',
  description: 'SimHash-based text deduplication + format normalization + quality filtering',
  graph_id: $graphId,
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"workflow_type":"CLEANING","status":"ACTIVE","version":"2.1","operator_count":3}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Job

A job is one concrete execution of a task. It is the runtime counterpart of a workflow and records inputs, outputs, and execution status.

**Corresponding code model**: `CleaningTask`, `DataSynthInstance`, `EvaluationTask`, `CollectionTask`, `TaskExecution`

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `job_type` | String | Yes | `CLEANING` / `SYNTHESIS` / `EVALUATION` / `COLLECTION` / `ANNOTATION` | Job type |
| `status` | String | Yes | `PENDING` / `RUNNING` / `COMPLETED` / `FAILED` / `STOPPED` / `CANCELLED` | Execution status |
| `started_at` | String | No | ISO 8601 | Start time |
| `completed_at` | String | No | ISO 8601 | Completion time |
| `duration_seconds` | Long | No | >= 0 | Execution time in seconds |
| `input_count` | Long | No | >= 0 | Number of input records or files |
| `output_count` | Long | No | >= 0 | Number of output records or files |
| `error_message` | String | No | - | Error message (on failure) |

### Cypher example

```cypher
CREATE (j:Entity {
  id: 'd3e4f5a6-...',
  name: 'Cleaning job 20260215-001',
  type: 'Job',
  description: 'Deduplication cleaning of the user behavior log',
  graph_id: $graphId,
  source_id: '54321',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"job_type":"CLEANING","status":"COMPLETED","started_at":"2026-02-15T10:00:00","completed_at":"2026-02-15T10:35:00","duration_seconds":2100,"input_count":1500000,"output_count":1380000}',
  created_at: datetime()
})
```
|
||||
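
`duration_seconds` is derivable from the two ISO 8601 timestamps, so a sync job can compute it instead of trusting a stored value. A minimal sketch, assuming both timestamps use the same (or no) UTC offset; the function name is hypothetical:

```python
from datetime import datetime


def duration_seconds(started_at: str, completed_at: str) -> int:
    """Elapsed whole seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(completed_at)
    if end < start:
        raise ValueError("completed_at precedes started_at")
    return int((end - start).total_seconds())
```

For the sample job above, 10:00:00 to 10:35:00 is 35 minutes, which matches the stored value of 2100 seconds.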
|
||||
---
|
||||
|
||||
## 6. User

A user is an operator of the DataMate platform. Users make it possible to track who owns a data asset and who executes a task.

**Corresponding code model**: `User.java` (`user` table)

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `username` | String | Yes | Unique | Login username |
| `email` | String | No | - | Email address |
| `role` | String | No | `ADMIN` / `USER` | Role |
| `enabled` | Boolean | No | - | Whether the account is enabled |

### Cypher example

```cypher
CREATE (u:Entity {
  id: 'b7c8d9e0-...',
  name: 'Zhang San',
  type: 'User',
  graph_id: $graphId,
  source_id: '1001',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"username":"zhangsan","email":"zhangsan@example.com","role":"USER","enabled":true}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Org (organization)

An organization is a team or department inside the company. It is used for ownership management and permission isolation of data assets.

**Corresponding code model**: derived by aggregating the `User.organization` field

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `org_code` | String | No | Unique | Organization code |
| `parent_org_id` | String | No | UUID | Parent organization ID |
| `level` | Integer | No | >= 1 | Organization level |
| `member_count` | Integer | No | >= 0 | Number of members |

### Cypher example

```cypher
CREATE (o:Entity {
  id: 'a0b1c2d3-...',
  name: 'Data Engineering Department',
  type: 'Org',
  description: 'Responsible for data collection, cleaning, and labeling',
  graph_id: $graphId,
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"org_code":"DE","level":2,"member_count":15}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
## 8. KnowledgeSet (knowledge set)

A knowledge set is a curated and validated collection of knowledge assets. It is the basis for RAG retrieval and knowledge Q&A.

**Corresponding code model**: `KnowledgeSet.java` (`knowledge_set` table)

### Properties (properties_json field)

| property | Type | Required | Constraint | Description |
|----------|------|----------|------------|-------------|
| `status` | String | Yes | `DRAFT` / `PUBLISHED` / `ARCHIVED` / `DEPRECATED` | Knowledge set status |
| `domain` | String | No | - | Knowledge domain |
| `business_line` | String | No | - | Business line |
| `sensitivity` | String | No | `PUBLIC` / `INTERNAL` / `CONFIDENTIAL` / `SECRET` | Sensitivity level |
| `item_count` | Integer | No | >= 0 | Number of knowledge items |
| `valid_from` | String | No | ISO 8601 | Start of validity |
| `valid_to` | String | No | ISO 8601 | End of validity |

### Cypher example

```cypher
CREATE (k:Entity {
  id: 'f4e5d6c7-...',
  name: 'Medical imaging labeling guidelines knowledge base',
  type: 'KnowledgeSet',
  description: 'CT/MRI labeling standards and knowledge of common lesion characteristics',
  graph_id: $graphId,
  source_id: '777',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"status":"PUBLISHED","domain":"medical imaging","business_line":"AI-assisted diagnosis","sensitivity":"INTERNAL","item_count":320,"valid_from":"2026-01-01T00:00:00","valid_to":"2027-01-01T00:00:00"}',
  created_at: datetime()
})
```
|
||||
|
||||
---
|
||||
|
||||
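## Entity type summary

| Entity type | Neo4j `type` value | Main purpose | Source |
|-------------|--------------------|--------------|--------|
| Dataset | `Dataset` | Data asset management, lineage tracking | MySQL sync |
| Field | `Field` | Field-level lineage, impact analysis | MySQL sync / schema parsing |
| LabelTask | `LabelTask` | Labeling task tracking, staff management | MySQL sync |
| Workflow | `Workflow` | Process orchestration, reuse management | MySQL sync |
| Job | `Job` | Execution tracking, input/output lineage | MySQL sync |
| User | `User` | Owner tracking, permission management | MySQL sync |
| Org | `Org` | Organizational ownership, asset isolation | MySQL sync / derived |
| KnowledgeSet | `KnowledgeSet` | Knowledge asset management, RAG retrieval | MySQL sync |

## Extension notes

- **Custom entity types**: beyond the 8 core types above, users can create custom entity types through LLM extraction or by hand. Custom entities use the same `Entity` label and common attribute structure, and their `type` field may be any string.
- **Property storage**: type-specific properties are stored in the `properties_json` field (JSON-serialized) rather than directly as Neo4j node properties. This keeps the schema flexible while the `type` field still distinguishes entity types.
- **Index strategy**: Neo4j indexes are created on the `id`, `graph_id`, `type`, and `name` fields. Properties inside `properties_json` are not indexed. If a property needs to be queried frequently, promote it to a top-level node property and index it.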
|
||||
298
docs/knowledge-graph/schema/er-diagram.md
Normal file
@@ -0,0 +1,298 @@
|
||||
# DataMate Knowledge Graph - Entity Relationship Diagrams

> Schema version: 1.0.0
> Last updated: 2026-02-17

## Core entity relationship overview

```mermaid
graph LR
    %% Entity definitions
    Dataset["<b>Dataset</b><br/>dataset"]
    Field["<b>Field</b><br/>field"]
    LabelTask["<b>LabelTask</b><br/>labeling task"]
    Workflow["<b>Workflow</b><br/>workflow"]
    Job["<b>Job</b><br/>job"]
    User["<b>User</b><br/>user"]
    Org["<b>Org</b><br/>organization"]
    KnowledgeSet["<b>KnowledgeSet</b><br/>knowledge set"]

    %% Relationships
    Dataset -->|HAS_FIELD| Field
    Dataset -->|DERIVED_FROM| Dataset
    Dataset -->|BELONGS_TO| Org

    Job -->|USES_DATASET| Dataset
    Job -->|PRODUCES| Dataset
    Job -->|DEPENDS_ON| Job

    Workflow -->|TRIGGERS| Job
    Workflow -->|USES_DATASET| Dataset

    LabelTask -->|USES_DATASET| Dataset
    LabelTask -->|ASSIGNED_TO| User

    User -->|BELONGS_TO| Org

    Field -->|IMPACTS| Field

    KnowledgeSet -->|SOURCED_FROM| Dataset

    %% Styles
    classDef dataAsset fill:#4A90D9,stroke:#2C5F8A,color:#fff,stroke-width:2px
    classDef task fill:#7B68EE,stroke:#5A4CB5,color:#fff,stroke-width:2px
    classDef actor fill:#50C878,stroke:#3A9B5B,color:#fff,stroke-width:2px
    classDef knowledge fill:#FFB347,stroke:#CC8F39,color:#fff,stroke-width:2px

    class Dataset,Field dataAsset
    class LabelTask,Workflow,Job task
    class User,Org actor
    class KnowledgeSet knowledge
```
|
||||
|
||||
## Views by domain

### Data lineage view

Shows derivation relationships between datasets and field-level lineage.

```mermaid
graph TB
    subgraph SRC["Source data layer"]
        DS_RAW["Dataset<br/>raw dataset"]
        F1["Field: user_id"]
        F2["Field: event_type"]
        F3["Field: timestamp"]
    end

    subgraph PROC["Processing layer"]
        JOB_CLEAN["Job<br/>cleaning job"]
        JOB_SYNTH["Job<br/>synthesis job"]
    end

    subgraph OUT["Output data layer"]
        DS_CLEAN["Dataset<br/>cleaned dataset"]
        DS_SYNTH["Dataset<br/>synthetic dataset"]
        F1_CLEAN["Field: user_id"]
        F4["Field: user_segment"]
    end

    DS_RAW -->|HAS_FIELD| F1
    DS_RAW -->|HAS_FIELD| F2
    DS_RAW -->|HAS_FIELD| F3

    JOB_CLEAN -->|USES_DATASET| DS_RAW
    JOB_CLEAN -->|PRODUCES| DS_CLEAN
    JOB_SYNTH -->|USES_DATASET| DS_CLEAN
    JOB_SYNTH -->|PRODUCES| DS_SYNTH

    DS_CLEAN -->|DERIVED_FROM| DS_RAW
    DS_SYNTH -->|DERIVED_FROM| DS_CLEAN

    DS_CLEAN -->|HAS_FIELD| F1_CLEAN
    DS_SYNTH -->|HAS_FIELD| F4

    F1 -->|IMPACTS| F1_CLEAN
    F1_CLEAN -->|IMPACTS| F4

    classDef source fill:#E8F4FD,stroke:#4A90D9,color:#333
    classDef process fill:#F3E8FF,stroke:#7B68EE,color:#333
    classDef output fill:#E8FFF0,stroke:#50C878,color:#333

    class DS_RAW,F1,F2,F3 source
    class JOB_CLEAN,JOB_SYNTH process
    class DS_CLEAN,DS_SYNTH,F1_CLEAN,F4 output
```
|
||||
|
||||
### Task orchestration view

Shows how workflows, jobs, and tasks are orchestrated.

```mermaid
graph LR
    subgraph WF["Workflow definitions"]
        WF_CLEAN["Workflow<br/>cleaning pipeline"]
        WF_EVAL["Workflow<br/>evaluation pipeline"]
    end

    subgraph JOBS["Job executions"]
        JOB1["Job<br/>cleaning job #1"]
        JOB2["Job<br/>cleaning job #2"]
        JOB3["Job<br/>evaluation job"]
    end

    subgraph LABEL["Labeling tasks"]
        LT1["LabelTask<br/>manual labeling"]
        LT2["LabelTask<br/>automatic labeling"]
    end

    subgraph PEOPLE["People"]
        U1["User<br/>Zhang San"]
        U2["User<br/>Li Si"]
    end

    WF_CLEAN -->|TRIGGERS| JOB1
    WF_CLEAN -->|TRIGGERS| JOB2
    WF_EVAL -->|TRIGGERS| JOB3

    JOB2 -->|DEPENDS_ON| JOB1
    JOB3 -->|DEPENDS_ON| JOB2

    LT1 -->|ASSIGNED_TO| U1
    LT2 -->|ASSIGNED_TO| U2

    classDef wf fill:#7B68EE,stroke:#5A4CB5,color:#fff
    classDef job fill:#9B8FFF,stroke:#7B68EE,color:#fff
    classDef task fill:#B8A9FF,stroke:#9B8FFF,color:#fff
    classDef user fill:#50C878,stroke:#3A9B5B,color:#fff

    class WF_CLEAN,WF_EVAL wf
    class JOB1,JOB2,JOB3 job
    class LT1,LT2 task
    class U1,U2 user
```
|
||||
|
||||
### Organizational ownership view

Shows which organization each user and dataset belongs to.

```mermaid
graph TB
    subgraph ORGS["Organizations"]
        ORG1["Org<br/>Data Engineering Dept."]
        ORG2["Org<br/>AI R&D Dept."]
    end

    subgraph PEOPLE["People"]
        U1["User: Zhang San"]
        U2["User: Li Si"]
        U3["User: Wang Wu"]
    end

    subgraph ASSETS["Data assets"]
        DS1["Dataset: user behavior log"]
        DS2["Dataset: medical image set"]
        DS3["Dataset: training dataset"]
    end

    U1 -->|BELONGS_TO| ORG1
    U2 -->|BELONGS_TO| ORG1
    U3 -->|BELONGS_TO| ORG2

    DS1 -->|BELONGS_TO| ORG1
    DS2 -->|BELONGS_TO| ORG2
    DS3 -->|BELONGS_TO| ORG2

    classDef org fill:#FFB347,stroke:#CC8F39,color:#fff
    classDef user fill:#50C878,stroke:#3A9B5B,color:#fff
    classDef data fill:#4A90D9,stroke:#2C5F8A,color:#fff

    class ORG1,ORG2 org
    class U1,U2,U3 user
    class DS1,DS2,DS3 data
```
|
||||
|
||||
### Knowledge provenance view

Shows how knowledge sets trace back to datasets.

```mermaid
graph LR
    subgraph SOURCES["Data sources"]
        DS1["Dataset<br/>user behavior log"]
        DS2["Dataset<br/>product documentation"]
    end

    subgraph KNOWLEDGE["Knowledge assets"]
        KS1["KnowledgeSet<br/>user behavior knowledge base"]
    end

    subgraph LABEL["Labeling"]
        LT["LabelTask<br/>knowledge labeling"]
    end

    KS1 -->|SOURCED_FROM| DS1
    KS1 -->|SOURCED_FROM| DS2
    LT -->|USES_DATASET| DS1

    classDef data fill:#4A90D9,stroke:#2C5F8A,color:#fff
    classDef knowledge fill:#FFB347,stroke:#CC8F39,color:#fff
    classDef task fill:#7B68EE,stroke:#5A4CB5,color:#fff

    class DS1,DS2 data
    class KS1 knowledge
    class LT task
```
|
||||
|
||||
## End-to-end example: the complete data flow

Shows the full processing chain from raw data to knowledge assets.

```mermaid
graph TB
    %% Organization and people
    ORG["Org: Data Engineering Dept."]
    USER["User: Zhang San"]

    %% Data assets
    DS_RAW["Dataset: raw log"]
    DS_CLEAN["Dataset: cleaned data"]
    F_UID_RAW["Field: user_id (raw)"]
    F_UID_CLEAN["Field: user_id (cleaned)"]

    %% Processing
    WF["Workflow: cleaning pipeline"]
    JOB["Job: cleaning job"]
    LT["LabelTask: sentiment labeling"]

    %% Knowledge
    KS["KnowledgeSet: behavior knowledge base"]

    %% Organizational ownership
    USER -->|BELONGS_TO| ORG
    DS_RAW -->|BELONGS_TO| ORG

    %% Data structure
    DS_RAW -->|HAS_FIELD| F_UID_RAW
    DS_CLEAN -->|HAS_FIELD| F_UID_CLEAN

    %% Processing chain
    WF -->|TRIGGERS| JOB
    JOB -->|USES_DATASET| DS_RAW
    JOB -->|PRODUCES| DS_CLEAN
    DS_CLEAN -->|DERIVED_FROM| DS_RAW

    %% Field lineage
    F_UID_RAW -->|IMPACTS| F_UID_CLEAN

    %% Task assignment
    LT -->|USES_DATASET| DS_CLEAN
    LT -->|ASSIGNED_TO| USER

    %% Knowledge provenance
    KS -->|SOURCED_FROM| DS_CLEAN

    %% Styles
    classDef org fill:#FFB347,stroke:#CC8F39,color:#fff,stroke-width:2px
    classDef user fill:#50C878,stroke:#3A9B5B,color:#fff,stroke-width:2px
    classDef data fill:#4A90D9,stroke:#2C5F8A,color:#fff,stroke-width:2px
    classDef field fill:#87CEEB,stroke:#4A90D9,color:#333,stroke-width:1px
    classDef process fill:#7B68EE,stroke:#5A4CB5,color:#fff,stroke-width:2px
    classDef knowledge fill:#FF6B6B,stroke:#CC5555,color:#fff,stroke-width:2px

    class ORG org
    class USER user
    class DS_RAW,DS_CLEAN data
    class F_UID_RAW,F_UID_CLEAN field
    class WF,JOB,LT process
    class KS knowledge
```
|
||||
|
||||
## Legend

| Color | Category | Entities |
|-------|----------|----------|
| Blue | Data assets | Dataset, Field |
| Purple | Tasks / processes | Workflow, Job, LabelTask |
| Green | People | User, Org |
| Orange / red | Knowledge | KnowledgeSet |
|
||||
585
docs/knowledge-graph/schema/relationships.md
Normal file
@@ -0,0 +1,585 @@
|
||||
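# DataMate Knowledge Graph - Core Relationship Definitions

> Schema version: 1.0.0
> Last updated: 2026-02-17

## Overview

The DataMate knowledge graph defines **10 core relationship types** covering four scenarios: data lineage, task orchestration, organizational ownership, and knowledge provenance.

All relationships share the `RELATED_TO` relationship type in Neo4j, and the `relation_type` property distinguishes their semantic type. Every relationship carries the following common attributes:

| Common attribute | Type | Required | Description |
|------------------|------|----------|-------------|
| `id` | String (UUID) | Yes | Unique relationship identifier |
| `relation_type` | String | Yes | Semantic relationship type (see the type definitions below) |
| `graph_id` | String (UUID) | Yes | ID of the owning graph |
| `weight` | Double | No | Relationship weight in the range 0.0-1.0 (default 1.0) |
| `confidence` | Double | No | Confidence in the range 0.0-1.0 (synced data defaults to 1.0; extracted data is scored by the model) |
| `source_id` | String | No | ID of the source record |
| `properties_json` | String | No | Extended properties as JSON |
| `created_at` | LocalDateTime | Yes | Creation time |

### Direction convention

All relationships are directed. The direction expresses the semantic "active side → passive side":
- `(A)-[:RELATED_TO {relation_type: 'HAS_FIELD'}]->(B)` means A has B
- Pay attention to direction when querying; a reverse query needs the `<-[]-` syntax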
|
||||
|
||||
---
|
||||
|
||||
## 1. HAS_FIELD (contains field)

**Direction**: `Dataset → Field`

Indicates that a dataset contains a field or column. This is the foundational relationship for data lineage analysis and supports field-level impact assessment.

### Relationship properties

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `ordinal` | Integer | No | Position of the field in the dataset (starting from 0) |
| `required` | Boolean | No | Whether the field is required |

### Constraints

- The source entity type must be `Dataset`
- The target entity type must be `Field`
- The same Dataset → Field pair should not be duplicated

### Cypher example

```cypher
// Create a HAS_FIELD relationship
MATCH (d:Entity {id: $datasetId, graph_id: $graphId})
MATCH (f:Entity {id: $fieldId, graph_id: $graphId})
CREATE (d)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'HAS_FIELD',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"ordinal": 0, "required": true}',
  created_at: datetime()
}]->(f)

// List all fields of a dataset.
// `ordinal` lives inside properties_json, so ordering by the raw JSON string
// would be lexical rather than by ordinal; return it and sort in the application.
MATCH (d:Entity {id: $datasetId, graph_id: $graphId})
  -[r:RELATED_TO {relation_type: 'HAS_FIELD', graph_id: $graphId}]->
  (f:Entity {graph_id: $graphId})
RETURN f, r.properties_json AS rel_properties
```
|
||||
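
Because `ordinal` is stored inside the relationship's `properties_json` string, ordering fields by position has to parse that JSON somewhere. One option is to do it in the application after the query returns. The sketch below assumes each result row has been mapped to a dict with a `field_name` and the relationship's `properties_json`; both the row shape and the function name are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def sort_fields_by_ordinal(rows: List[Dict[str, Any]]) -> List[str]:
    """Order field names by the numeric `ordinal` stored in properties_json."""

    def sort_key(row: Dict[str, Any]) -> Tuple[bool, int]:
        props = json.loads(row.get("properties_json") or "{}")
        ordinal = props.get("ordinal")
        # Fields without an ordinal sort last; the rest sort numerically.
        return (ordinal is None, ordinal if ordinal is not None else 0)

    return [row["field_name"] for row in sorted(rows, key=sort_key)]
```

Sorting numerically matters once a dataset has ten or more fields, since the string `"10"` would otherwise sort before `"2"`.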
|
||||
### Business scenarios

- See which fields a dataset contains
- Field search: find every dataset that contains a `user_id` field
- Schema comparison: compare the field differences between two datasets
||||
|
||||
---
|
||||
|
||||
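## 2. DERIVED_FROM (derived from)

**Direction**: `Dataset → Dataset`

Expresses lineage between datasets: the relationship starts at the derived dataset and points to the dataset it was derived from after some processing. It covers data cleaning, data synthesis, version iteration, and similar scenarios.

### Relationship properties

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `derivation_type` | String | Yes | Derivation type: `CLEANING` / `SYNTHESIS` / `SPLIT` / `MERGE` / `VERSION` (version iteration) |
| `job_id` | String | No | ID of the job that produced this derivation |
| `transformation` | String | No | Description of the transformation (for example "deduplication + format normalization") |

### Constraints

- Both the source and target entity types are `Dataset`
- Self-references are not allowed (source ≠ target)
- Checking for and avoiding cyclic dependencies is recommended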
|
||||
|
||||
### Cypher examples

```cypher
// Create a cleaning derivation (derived dataset → input dataset)
MATCH (output:Entity {id: $outputDatasetId, graph_id: $graphId})
MATCH (input:Entity {id: $inputDatasetId, graph_id: $graphId})
CREATE (output)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'DERIVED_FROM',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"derivation_type":"CLEANING","job_id":"d3e4f5a6-...","transformation":"SimHash dedup + null filtering"}',
  created_at: datetime()
}]->(input)

// Trace data lineage (up to 5 hops)
MATCH path = (d:Entity {id: $datasetId, graph_id: $graphId})
      -[:RELATED_TO *1..5 {relation_type: 'DERIVED_FROM'}]->
      (ancestor:Entity {graph_id: $graphId})
RETURN path
```

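### Business scenarios

- **Data lineage tracing**: trace the chain of sources behind a dataset
- **Impact analysis**: when a source dataset changes, find which downstream datasets are affected
- **Version management**: view the version history of a dataset

---
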
## 3. USES_DATASET (uses dataset)

**Direction**: `Job | LabelTask | Workflow → Dataset`

Indicates that a job, labeling task, or workflow uses a dataset as input.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `usage_role` | String | No | Usage role: `INPUT` / `REFERENCE` / `VALIDATION` |

### Constraints

- The source entity type is `Job`, `LabelTask`, or `Workflow`
- The target entity type is `Dataset`

### Cypher examples

```cypher
// Create a usage relationship
MATCH (j:Entity {id: $jobId, graph_id: $graphId})
MATCH (d:Entity {id: $datasetId, graph_id: $graphId})
CREATE (j)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'USES_DATASET',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"usage_role":"INPUT"}',
  created_at: datetime()
}]->(d)

// Find which jobs (or other consumers) use a dataset
MATCH (j:Entity {graph_id: $graphId})
      -[r:RELATED_TO {relation_type: 'USES_DATASET', graph_id: $graphId}]->
      (d:Entity {id: $datasetId, graph_id: $graphId})
RETURN j
```

### Business scenarios

- See a dataset's consumers: who is using it
- Gauge how important a dataset is: how many tasks depend on it
- Trace task inputs: which datasets a task used

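As a rough illustration of the "how many tasks depend on it" scenario, the sketch below counts distinct consumers per dataset from a list of relationships already loaded in memory. The `Relation` shape and the function name are assumptions made for this example, not part of the DataMate API.

```typescript
// Minimal relationship record; field names are assumptions for this sketch.
interface Relation {
  sourceId: string;
  targetId: string;
  relationType: string;
}

// Count distinct consumers (sources of USES_DATASET edges) per dataset id.
function consumerCounts(relations: Relation[]): Map<string, number> {
  const consumers = new Map<string, Set<string>>();
  for (const rel of relations) {
    if (rel.relationType !== "USES_DATASET") continue;
    const seen = consumers.get(rel.targetId) ?? new Set<string>();
    seen.add(rel.sourceId); // a Set ignores duplicate edges from the same consumer
    consumers.set(rel.targetId, seen);
  }
  const counts = new Map<string, number>();
  consumers.forEach((seen, datasetId) => counts.set(datasetId, seen.size));
  return counts;
}
```

Counting distinct sources rather than edges keeps a duplicated relationship from inflating a dataset's importance.
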
---

## 4. PRODUCES (produces)

**Direction**: `Job → Dataset`

Indicates that a job produced a new dataset when it ran. It is the counterpart of `USES_DATASET`; together they form the complete input/output chain.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `output_type` | String | No | Output type: `PRIMARY` (main output) / `SECONDARY` (by-product, such as logs or statistics reports) |

### Constraints

- The source entity type is `Job`
- The target entity type is `Dataset`
- A Job can produce multiple Datasets (e.g. main output + statistics report)

### Cypher examples

```cypher
// Create a PRODUCES relationship
MATCH (j:Entity {id: $jobId, graph_id: $graphId})
MATCH (d:Entity {id: $outputDatasetId, graph_id: $graphId})
CREATE (j)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'PRODUCES',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"output_type":"PRIMARY"}',
  created_at: datetime()
}]->(d)

// View the full inputs and outputs of a job
MATCH (input:Entity {graph_id: $graphId})
      <-[:RELATED_TO {relation_type: 'USES_DATASET'}]-
      (j:Entity {id: $jobId, graph_id: $graphId})
      -[:RELATED_TO {relation_type: 'PRODUCES'}]->
      (output:Entity {graph_id: $graphId})
RETURN input, j, output
```

### Business scenarios

- **End-to-end lineage**: combine with `USES_DATASET` to view the full Input → Job → Output chain
- **Output tracking**: see which datasets a job produced
- **Cost attribution**: attribute the cost of an output dataset to the job that produced it

---

## 5. ASSIGNED_TO (assigned to)

**Direction**: `LabelTask | Job → User`

Indicates that a task is assigned to a user for execution.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `assigned_at` | String | No | Assignment time (ISO 8601) |
| `role` | String | No | Assigned role: `EXECUTOR` / `REVIEWER` / `OWNER` |

### Constraints

- The source entity type is `LabelTask` or `Job`
- The target entity type is `User`
- The same task can be assigned to multiple users (in different roles)

### Cypher examples

```cypher
// Create an assignment relationship
MATCH (t:Entity {id: $taskId, graph_id: $graphId})
MATCH (u:Entity {id: $userId, graph_id: $graphId})
CREATE (t)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'ASSIGNED_TO',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"assigned_at":"2026-02-15T10:00:00","role":"EXECUTOR"}',
  created_at: datetime()
}]->(u)

// List all tasks assigned to a user (no status filter is applied)
MATCH (t:Entity {graph_id: $graphId})
      -[r:RELATED_TO {relation_type: 'ASSIGNED_TO', graph_id: $graphId}]->
      (u:Entity {id: $userId, graph_id: $graphId})
RETURN t
```

### Business scenarios

- **Workload analysis**: see how many tasks a user has been assigned
- **Task tracking**: see a task's executors and reviewers
- **Load balancing**: analyze how tasks are distributed across the team

---

## 6. BELONGS_TO (belongs to)

**Direction**: `User → Org` or `Dataset → Org`

Indicates that a user belongs to an organization, or that a dataset is owned by an organization.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `membership_type` | String | No | Membership type: `PRIMARY` (primary membership) / `SECONDARY` (secondary post / shared) |
| `since` | String | No | Start time of the membership (ISO 8601) |

### Constraints

- The source entity type is `User` or `Dataset`
- The target entity type is `Org`
- User → Org is typically 1:1 (primary membership), though secondary memberships are allowed

### Cypher examples

```cypher
// Attach a user to an organization
MATCH (u:Entity {id: $userId, graph_id: $graphId})
MATCH (o:Entity {id: $orgId, graph_id: $graphId})
CREATE (u)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'BELONGS_TO',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"membership_type":"PRIMARY","since":"2025-03-01T00:00:00"}',
  created_at: datetime()
}]->(o)

// List all data assets (datasets) owned by an organization
MATCH (d:Entity {type: 'Dataset', graph_id: $graphId})
      -[:RELATED_TO {relation_type: 'BELONGS_TO', graph_id: $graphId}]->
      (o:Entity {id: $orgId, graph_id: $graphId})
RETURN d
```

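### Business scenarios

- **Organization asset dashboard**: view all datasets an organization owns
- **Permission inheritance**: derive data access permissions from organizational relationships
- **Cross-organization collaboration**: discover relationships between organizations that share datasets

---
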
## 7. TRIGGERS (triggers)

**Direction**: `Workflow → Job`

Indicates that a workflow triggered one job execution.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `trigger_type` | String | No | Trigger mode: `MANUAL` / `SCHEDULED` / `EVENT` (event-driven) |
| `triggered_at` | String | No | Trigger time (ISO 8601) |

### Constraints

- The source entity type is `Workflow`
- The target entity type is `Job`
- A Workflow can trigger multiple Jobs (one per run)

### Cypher examples

```cypher
// Create a trigger relationship
MATCH (w:Entity {id: $workflowId, graph_id: $graphId})
MATCH (j:Entity {id: $jobId, graph_id: $graphId})
CREATE (w)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'TRIGGERS',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"trigger_type":"SCHEDULED","triggered_at":"2026-02-15T10:00:00"}',
  created_at: datetime()
}]->(j)

// Query a workflow's execution history (newest first)
MATCH (w:Entity {id: $workflowId, graph_id: $graphId})
      -[r:RELATED_TO {relation_type: 'TRIGGERS', graph_id: $graphId}]->
      (j:Entity {graph_id: $graphId})
RETURN j ORDER BY r.created_at DESC
```

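### Business scenarios

- **Execution history**: view all execution records of a workflow
- **Troubleshooting**: locate the most recently failed job of a workflow
- **Run statistics**: measure a workflow's execution frequency and success rate

---
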
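## 8. DEPENDS_ON (depends on)

**Direction**: `Job → Job`

Represents an execution dependency between jobs: the source job can only run once the target job has completed.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `dependency_type` | String | No | Dependency type: `STRICT` (hard dependency, must succeed) / `SOFT` (soft dependency, execution may continue on failure) |

### Constraints

- Both the source and target entities are of type `Job`
- Self-references are not allowed
- Circular dependencies are not allowed (validated in the application layer)
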
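The application-layer checks listed above (no self-reference, no cycles) could look roughly like the sketch below. It assumes the existing `DEPENDS_ON` edges for the graph are already loaded into memory; the `Dependency` type and `wouldCreateCycle` function are hypothetical names for illustration, not the project's actual validation code.

```typescript
// A DEPENDS_ON edge: `jobId` depends on `dependsOnJobId` (names are assumptions).
interface Dependency {
  jobId: string;
  dependsOnJobId: string;
}

// Returns true if adding `candidate` would create a self-reference or a cycle.
function wouldCreateCycle(existing: Dependency[], candidate: Dependency): boolean {
  if (candidate.jobId === candidate.dependsOnJobId) return true;

  // Adjacency list: job -> jobs it depends on.
  const dependsOn = new Map<string, string[]>();
  for (const dep of existing) {
    const list = dependsOn.get(dep.jobId) ?? [];
    list.push(dep.dependsOnJobId);
    dependsOn.set(dep.jobId, list);
  }

  // Depth-first search from the candidate's target: if it can already reach
  // the candidate's source, the new edge would close a loop.
  const stack: string[] = [candidate.dependsOnJobId];
  const visited = new Set<string>();
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (current === candidate.jobId) return true;
    if (visited.has(current)) continue;
    visited.add(current);
    for (const next of dependsOn.get(current) ?? []) stack.push(next);
  }
  return false;
}
```

Running the check before the `CREATE` keeps the graph a DAG without relying on a database-level constraint.
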
### Cypher examples

```cypher
// Create a dependency relationship
MATCH (j1:Entity {id: $jobId, graph_id: $graphId})
MATCH (j2:Entity {id: $dependsOnJobId, graph_id: $graphId})
CREATE (j1)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'DEPENDS_ON',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 1.0,
  properties_json: '{"dependency_type":"STRICT"}',
  created_at: datetime()
}]->(j2)

// Query the full dependency chain of a job (up to 10 hops)
MATCH path = (j:Entity {id: $jobId, graph_id: $graphId})
      -[:RELATED_TO *1..10 {relation_type: 'DEPENDS_ON'}]->
      (dep:Entity {graph_id: $graphId})
RETURN path
```

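### Business scenarios

- **DAG scheduling**: determine the order in which jobs run
- **Failure propagation analysis**: when a job fails, find which downstream jobs are affected
- **Critical path analysis**: find the longest dependency chain to identify bottlenecks

---
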
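## 9. IMPACTS (impacts)

**Direction**: `Field → Field`

Represents an impact relationship between fields: a change to the source field affects the target field. This is field-level lineage across datasets.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `impact_type` | String | No | Impact type: `DIRECT` (direct mapping) / `TRANSFORM` (derived via transformation) / `AGGREGATE` (aggregate computation) |
| `transformation_rule` | String | No | Description of the transformation rule (e.g. "UPPER(source.name)") |
| `job_id` | String | No | ID of the job that established this impact relationship |

### Constraints

- Both the source and target entities are of type `Field`
- Usually spans different Datasets (field derivation within the same Dataset is also allowed)
- Self-references are not allowed
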
### Cypher examples

```cypher
// Create a field impact relationship
MATCH (f1:Entity {id: $sourceFieldId, graph_id: $graphId})
MATCH (f2:Entity {id: $targetFieldId, graph_id: $graphId})
CREATE (f1)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'IMPACTS',
  graph_id: $graphId,
  weight: 0.8,
  confidence: 0.9,
  properties_json: '{"impact_type":"TRANSFORM","transformation_rule":"TRIM(LOWER(source))","job_id":"d3e4f5a6-..."}',
  created_at: datetime()
}]->(f2)

// Query the downstream impact scope of a field (up to 5 hops)
MATCH (f:Entity {id: $fieldId, graph_id: $graphId})
      -[:RELATED_TO *1..5 {relation_type: 'IMPACTS'}]->
      (downstream:Entity {graph_id: $graphId})
RETURN downstream
```

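### Business scenarios

- **Field-level lineage**: trace the full chain of a field from source to target
- **Impact assessment**: before changing a field, assess its downstream impact
- **Data quality tracing**: when a downstream field has a quality problem, trace it back to its origin

---
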
## 10. SOURCED_FROM (sourced from)

**Direction**: `KnowledgeSet → Dataset`

Indicates that the content of a knowledge set comes from a dataset. It is the base relationship for knowledge provenance.

### Relationship properties

| Property | Type | Required | Description |
|------|------|------|------|
| `extraction_method` | String | No | Extraction method: `LLM` (LLM extraction) / `RULE` (rule-based extraction) / `MANUAL` (manually curated) |
| `extracted_at` | String | No | Extraction time (ISO 8601) |
| `item_count` | Integer | No | Number of knowledge items extracted from this dataset |

### Constraints

- The source entity type is `KnowledgeSet`
- The target entity type is `Dataset`
- A KnowledgeSet can be sourced from multiple Datasets

### Cypher examples

```cypher
// Create a source relationship
MATCH (k:Entity {id: $knowledgeSetId, graph_id: $graphId})
MATCH (d:Entity {id: $datasetId, graph_id: $graphId})
CREATE (k)-[r:RELATED_TO {
  id: randomUUID(),
  relation_type: 'SOURCED_FROM',
  graph_id: $graphId,
  weight: 1.0,
  confidence: 0.85,
  properties_json: '{"extraction_method":"LLM","extracted_at":"2026-02-10T14:30:00","item_count":120}',
  created_at: datetime()
}]->(d)

// List all data sources of a knowledge set
MATCH (k:Entity {id: $knowledgeSetId, graph_id: $graphId})
      -[r:RELATED_TO {relation_type: 'SOURCED_FROM', graph_id: $graphId}]->
      (d:Entity {graph_id: $graphId})
RETURN d, r.properties_json AS extraction_info
```

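### Business scenarios

- **Knowledge provenance**: see which data a knowledge set was built from
- **Data change notification**: when a source dataset is updated, flag that the knowledge set needs refreshing
- **Knowledge coverage analysis**: see which datasets have not yet been brought under knowledge management

---
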
## Relationship type summary

| Relationship type | Direction | `relation_type` value | Core purpose |
|---------|------|-----------------|---------|
| HAS_FIELD | Dataset → Field | `HAS_FIELD` | Dataset field structure |
| DERIVED_FROM | Dataset → Dataset | `DERIVED_FROM` | Dataset-level lineage |
| USES_DATASET | Job/LabelTask/Workflow → Dataset | `USES_DATASET` | Input dependency |
| PRODUCES | Job → Dataset | `PRODUCES` | Output |
| ASSIGNED_TO | LabelTask/Job → User | `ASSIGNED_TO` | Task assignment |
| BELONGS_TO | User/Dataset → Org | `BELONGS_TO` | Organizational ownership |
| TRIGGERS | Workflow → Job | `TRIGGERS` | Workflow triggering |
| DEPENDS_ON | Job → Job | `DEPENDS_ON` | Job dependency |
| IMPACTS | Field → Field | `IMPACTS` | Field-level lineage |
| SOURCED_FROM | KnowledgeSet → Dataset | `SOURCED_FROM` | Knowledge provenance |

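The direction column above can double as a lookup table for validating endpoints before a relationship is created. The following is a minimal sketch under that assumption; `RELATION_RULES` and `isValidEndpointPair` are illustrative names, and letting unknown (custom) relation types through unchecked is a design choice made for this example rather than documented behavior.

```typescript
type EntityType =
  | "Dataset" | "Field" | "Job" | "LabelTask"
  | "Workflow" | "User" | "Org" | "KnowledgeSet";

interface EndpointRule {
  sources: EntityType[];
  targets: EntityType[];
}

// Allowed endpoint types per core relation type, transcribed from the summary table.
const RELATION_RULES: Record<string, EndpointRule> = {
  HAS_FIELD: { sources: ["Dataset"], targets: ["Field"] },
  DERIVED_FROM: { sources: ["Dataset"], targets: ["Dataset"] },
  USES_DATASET: { sources: ["Job", "LabelTask", "Workflow"], targets: ["Dataset"] },
  PRODUCES: { sources: ["Job"], targets: ["Dataset"] },
  ASSIGNED_TO: { sources: ["LabelTask", "Job"], targets: ["User"] },
  BELONGS_TO: { sources: ["User", "Dataset"], targets: ["Org"] },
  TRIGGERS: { sources: ["Workflow"], targets: ["Job"] },
  DEPENDS_ON: { sources: ["Job"], targets: ["Job"] },
  IMPACTS: { sources: ["Field"], targets: ["Field"] },
  SOURCED_FROM: { sources: ["KnowledgeSet"], targets: ["Dataset"] },
};

// Relation types that are not in the table (custom types) pass without endpoint checks.
function isValidEndpointPair(
  relationType: string,
  source: EntityType,
  target: EntityType,
): boolean {
  if (!Object.prototype.hasOwnProperty.call(RELATION_RULES, relationType)) return true;
  const rule = RELATION_RULES[relationType];
  return rule.sources.indexOf(source) !== -1 && rule.targets.indexOf(target) !== -1;
}
```

Keeping the rules in one table means a new core relation type only needs one extra entry.
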
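## Typical query patterns

### 1. End-to-end data lineage

```cypher
// Trace back from a final dataset to its original datasets through every processing step
// Note: the pattern only follows outgoing edges. PRODUCES and USES_DATASET both start at a
// Job (or LabelTask/Workflow), so from a Dataset node only DERIVED_FROM hops can match here;
// reaching the producing job requires traversing PRODUCES in the reverse direction.
MATCH path = (final:Entity {id: $datasetId, graph_id: $graphId})
      -[:RELATED_TO *1..10]->
      (origin:Entity {graph_id: $graphId})
WHERE ALL(r IN relationships(path) WHERE r.relation_type IN ['DERIVED_FROM', 'USES_DATASET', 'PRODUCES'])
RETURN path
```
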
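When lineage has to pass through jobs as well (dataset ← `PRODUCES` ← job → `USES_DATASET` → input dataset), one option is to walk the edges in the application layer. The sketch below does this over an in-memory list of relationships; the `Relation` shape, the function name, and the default hop limit are all assumptions for illustration.

```typescript
// Minimal relationship record; field names are assumptions for this sketch.
interface Relation {
  sourceId: string;
  targetId: string;
  relationType: string;
}

// Collect upstream dataset ids, following DERIVED_FROM edges forward and
// PRODUCES edges backward to the producing job, then that job's USES_DATASET inputs.
function upstreamDatasets(relations: Relation[], datasetId: string, maxHops = 10): string[] {
  const seen = new Set<string>();
  const ordered: string[] = []; // discovery order, nearest ancestors first
  let frontier: string[] = [datasetId];

  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const current of frontier) {
      const parents: string[] = [];
      for (const rel of relations) {
        if (rel.relationType === "DERIVED_FROM" && rel.sourceId === current) {
          parents.push(rel.targetId);
        }
        if (rel.relationType === "PRODUCES" && rel.targetId === current) {
          // rel.sourceId is the producing job; gather the datasets it used.
          for (const used of relations) {
            if (used.relationType === "USES_DATASET" && used.sourceId === rel.sourceId) {
              parents.push(used.targetId);
            }
          }
        }
      }
      for (const parent of parents) {
        if (parent !== datasetId && !seen.has(parent)) {
          seen.add(parent);
          ordered.push(parent);
          next.push(parent);
        }
      }
    }
    frontier = next;
  }
  return ordered;
}
```

The nested scan is quadratic in the number of relationships, which is acceptable for a subgraph already loaded for display but would need indexing for larger inputs.
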
### 2. Dataset impact analysis

```cypher
// Find the entities directly affected when a dataset changes (its direct consumers)
MATCH (d:Entity {id: $datasetId, graph_id: $graphId})
      <-[:RELATED_TO {relation_type: 'USES_DATASET'}]-
      (consumer:Entity {graph_id: $graphId})
RETURN consumer.type AS entity_type, consumer.name AS entity_name, consumer.id AS entity_id
```

### 3. User work dashboard

```cypher
// Query a user's assigned tasks and organizations
MATCH (u:Entity {id: $userId, type: 'User', graph_id: $graphId})
OPTIONAL MATCH (task:Entity)-[:RELATED_TO {relation_type: 'ASSIGNED_TO'}]->(u)
OPTIONAL MATCH (u)-[:RELATED_TO {relation_type: 'BELONGS_TO'}]->(org:Entity)
RETURN u, collect(DISTINCT task) AS tasks, collect(DISTINCT org) AS orgs
```

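## Extension notes

- **Custom relationship types**: beyond the 10 core relationship types above, users can create custom relationship types through LLM extraction or manually. Custom relationships use the same `RELATED_TO` Neo4j relationship type and common property structure; the `relation_type` field can be any string.
- **Bidirectional relationships**: every relationship is one-way. To express a two-way relationship (such as "A and B impact each other"), create two relationships in opposite directions.
- **Relationship deduplication**: before creating a relationship, the application layer should check whether the same (source, target, relation_type) combination already exists, to avoid duplicates.
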
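The deduplication rule above can be sketched as a key built from (source, target, relation_type). The types and function names below are hypothetical; a real implementation would more likely check against the database than against an in-memory list.

```typescript
// Minimal relationship input; field names are assumptions for this sketch.
interface RelationInput {
  sourceId: string;
  targetId: string;
  relationType: string;
}

// Direction matters, so (A, B) and (B, A) produce different keys.
function relationKey(rel: RelationInput): string {
  return JSON.stringify([rel.sourceId, rel.targetId, rel.relationType]);
}

// Keep only relations whose key is not already present in `existing`
// or earlier in `incoming`.
function filterNewRelations(
  existing: RelationInput[],
  incoming: RelationInput[],
): RelationInput[] {
  const seen = new Set<string>(existing.map(relationKey));
  const fresh: RelationInput[] = [];
  for (const rel of incoming) {
    const key = relationKey(rel);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(rel);
  }
  return fresh;
}
```

Because the key keeps direction, the two opposite edges used to model a bidirectional relationship are both retained.
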
469
docs/knowledge-graph/schema/schema.cypher
Normal file
@@ -0,0 +1,469 @@
// =============================================================================
// DataMate Knowledge Graph - Neo4j schema initialization script
// Schema version: 1.0.0
// Last updated: 2026-02-17
//
// Usage:
//   1. Run through Cypher Shell:
//      cat schema.cypher | cypher-shell -u neo4j -p <password>
//   2. Or run section by section in Neo4j Browser
//
// Notes:
//   - All indexes and constraints use IF NOT EXISTS and are safe to re-run
//     (the sample data in parts 5 and 6 uses CREATE and should be run only once)
//   - Constraints create their backing index automatically; do not create it again
//   - Relationship property indexes are optional (see part 4); without them,
//     relationship filters rely on inline property matching
// =============================================================================

// =============================================================================
// Part 1: Node constraints
// =============================================================================

// Uniqueness constraint on Entity node id (creates an index automatically)
CREATE CONSTRAINT entity_id_unique IF NOT EXISTS
FOR (n:Entity) REQUIRE n.id IS UNIQUE;

// =============================================================================
// Part 2: Node indexes
// =============================================================================

// graph_id index: the core index for multi-tenant isolation; every query filters on graph_id
CREATE INDEX entity_graph_id IF NOT EXISTS
FOR (n:Entity) ON (n.graph_id);

// type index: filter by entity type
CREATE INDEX entity_type IF NOT EXISTS
FOR (n:Entity) ON (n.type);

// name index: search by name
CREATE INDEX entity_name IF NOT EXISTS
FOR (n:Entity) ON (n.name);

// source_id index: look up by source ID during MySQL → Neo4j sync
CREATE INDEX entity_source_id IF NOT EXISTS
FOR (n:Entity) ON (n.source_id);

// Composite index (graph_id, type): entities of a given type within a graph
CREATE INDEX entity_graph_id_type IF NOT EXISTS
FOR (n:Entity) ON (n.graph_id, n.type);

// Composite index (graph_id, id): exact entity lookup (the most common query path)
CREATE INDEX entity_graph_id_id IF NOT EXISTS
FOR (n:Entity) ON (n.graph_id, n.id);

// Composite index (graph_id, source_id): look up by source ID during sync
CREATE INDEX entity_graph_id_source_id IF NOT EXISTS
FOR (n:Entity) ON (n.graph_id, n.source_id);

// created_at index: sort by creation time
CREATE INDEX entity_created_at IF NOT EXISTS
FOR (n:Entity) ON (n.created_at);

// =============================================================================
// Part 3: Full-text index (for fuzzy search)
// =============================================================================

// Full-text index on Entity name + description
CREATE FULLTEXT INDEX entity_fulltext IF NOT EXISTS
FOR (n:Entity) ON EACH [n.name, n.description];

// =============================================================================
// Part 3.1: SyncHistory constraints and indexes (sync metadata nodes)
// =============================================================================

// Unique constraint on (graph_id, sync_id): prevents duplicate records from syncId collisions
CREATE CONSTRAINT sync_history_graph_sync_unique IF NOT EXISTS
FOR (h:SyncHistory) REQUIRE (h.graph_id, h.sync_id) IS UNIQUE;

// Index on (graph_id, started_at): speeds up sync-history queries by time range
CREATE INDEX sync_history_graph_started IF NOT EXISTS
FOR (h:SyncHistory) ON (h.graph_id, h.started_at);

// Index on (graph_id, status, started_at): speeds up queries filtered by status + time
CREATE INDEX sync_history_graph_status_started IF NOT EXISTS
FOR (h:SyncHistory) ON (h.graph_id, h.status, h.started_at);

// =============================================================================
// Part 4: Relationship properties
// =============================================================================

// Relationship property indexes are left disabled by default.
// Relationship queries locate nodes through the node indexes and then match
// relationship properties inline:
//   -[r:RELATED_TO {graph_id: $graphId, relation_type: $type}]->
//
// Relationship property indexes are available from Neo4j 4.3 onward, in
// Community Edition as well as Enterprise Edition. Uncomment the statements
// below to create them:
//
// CREATE INDEX rel_graph_id IF NOT EXISTS
// FOR ()-[r:RELATED_TO]-() ON (r.graph_id);
//
// CREATE INDEX rel_relation_type IF NOT EXISTS
// FOR ()-[r:RELATED_TO]-() ON (r.relation_type);
//
// CREATE INDEX rel_id IF NOT EXISTS
// FOR ()-[r:RELATED_TO]-() ON (r.id);

// =============================================================================
// Part 5: Sample data (optional, used to validate the schema)
// =============================================================================

// The sample data below uses a fixed graph_id and is intended for development and test environments.
// In production, graph_id is generated and managed by the application layer.

// --- Sample organization ---
CREATE (org:Entity {
  id: '00000000-0000-0000-0000-000000000001',
  name: '数据工程部',
  type: 'Org',
  description: '负责数据采集、清洗和标注',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'MANUAL',
  confidence: 1.0,
  properties_json: '{"org_code":"DE","level":1,"member_count":15}',
  created_at: datetime()
});

// --- Sample user ---
CREATE (user:Entity {
  id: '00000000-0000-0000-0000-000000000002',
  name: '张三',
  type: 'User',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"username":"zhangsan","email":"zhangsan@example.com","role":"USER","enabled":true}',
  created_at: datetime()
});

// --- Sample dataset (source) ---
CREATE (ds1:Entity {
  id: '00000000-0000-0000-0000-000000000010',
  name: '用户行为日志-原始',
  type: 'Dataset',
  description: '原始用户行为埋点数据',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_id: '100',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"dataset_type":"TEXT","status":"ACTIVE","category":"用户行为","format":"JSON","record_count":2000000,"size_bytes":3221225472}',
  created_at: datetime()
});

// --- Sample dataset (cleaned) ---
CREATE (ds2:Entity {
  id: '00000000-0000-0000-0000-000000000011',
  name: '用户行为日志-清洗后',
  type: 'Dataset',
  description: '经过去重和格式标准化的用户行为数据',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_id: '101',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"dataset_type":"TEXT","status":"ACTIVE","category":"用户行为","format":"JSON","record_count":1500000,"size_bytes":2147483648,"version":1}',
  created_at: datetime()
});

// --- Sample fields ---
CREATE (f1:Entity {
  id: '00000000-0000-0000-0000-000000000020',
  name: 'user_id',
  type: 'Field',
  description: '用户唯一标识符',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"data_type":"STRING","nullable":false,"is_primary_key":true}',
  created_at: datetime()
});

CREATE (f2:Entity {
  id: '00000000-0000-0000-0000-000000000021',
  name: 'event_type',
  type: 'Field',
  description: '事件类型',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"data_type":"STRING","nullable":false,"sample_values":["click","view","purchase"]}',
  created_at: datetime()
});

CREATE (f3:Entity {
  id: '00000000-0000-0000-0000-000000000022',
  name: 'user_id',
  type: 'Field',
  description: '用户唯一标识符(清洗后)',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"data_type":"STRING","nullable":false,"is_primary_key":true}',
  created_at: datetime()
});

// --- Sample workflow ---
CREATE (wf:Entity {
  id: '00000000-0000-0000-0000-000000000030',
  name: '文本去重清洗管道',
  type: 'Workflow',
  description: 'SimHash去重 + 格式标准化 + 空值过滤',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"workflow_type":"CLEANING","status":"ACTIVE","version":"1.0","operator_count":3}',
  created_at: datetime()
});

// --- Sample job ---
CREATE (job:Entity {
  id: '00000000-0000-0000-0000-000000000040',
  name: '清洗作业-20260215-001',
  type: 'Job',
  description: '用户行为日志去重清洗',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_id: '500',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"job_type":"CLEANING","status":"COMPLETED","started_at":"2026-02-15T10:00:00","completed_at":"2026-02-15T10:35:00","duration_seconds":2100,"input_count":2000000,"output_count":1500000}',
  created_at: datetime()
});

// --- Sample labeling task ---
CREATE (lt:Entity {
  id: '00000000-0000-0000-0000-000000000050',
  name: '情感分析标注-批次1',
  type: 'LabelTask',
  description: '用户评论情感标注(正面/负面/中性)',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_id: '600',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"task_mode":"MANUAL","data_type":"text","labeling_type":"sentiment_analysis","status":"IN_PROGRESS","progress":30.0,"template_name":"情感分析"}',
  created_at: datetime()
});

// --- Sample knowledge set ---
CREATE (ks:Entity {
  id: '00000000-0000-0000-0000-000000000060',
  name: '用户行为分析知识库',
  type: 'KnowledgeSet',
  description: '从用户行为数据中提取的业务规则和洞察',
  graph_id: '11111111-1111-1111-1111-111111111111',
  source_type: 'SYNC',
  confidence: 1.0,
  properties_json: '{"status":"PUBLISHED","domain":"用户行为","business_line":"数据分析","sensitivity":"INTERNAL","item_count":85}',
  created_at: datetime()
});

// =============================================================================
// Part 6: Sample relationships
// =============================================================================

// HAS_FIELD: source dataset → fields
MATCH (ds1:Entity {id: '00000000-0000-0000-0000-000000000010'})
MATCH (f1:Entity {id: '00000000-0000-0000-0000-000000000020'})
CREATE (ds1)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000001',
  relation_type: 'HAS_FIELD',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"ordinal":0,"required":true}',
  created_at: datetime()
}]->(f1);

MATCH (ds1:Entity {id: '00000000-0000-0000-0000-000000000010'})
MATCH (f2:Entity {id: '00000000-0000-0000-0000-000000000021'})
CREATE (ds1)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000002',
  relation_type: 'HAS_FIELD',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"ordinal":1,"required":true}',
  created_at: datetime()
}]->(f2);

// HAS_FIELD: cleaned dataset → field
MATCH (ds2:Entity {id: '00000000-0000-0000-0000-000000000011'})
MATCH (f3:Entity {id: '00000000-0000-0000-0000-000000000022'})
CREATE (ds2)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000003',
  relation_type: 'HAS_FIELD',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"ordinal":0,"required":true}',
  created_at: datetime()
}]->(f3);

// DERIVED_FROM: cleaned dataset → source dataset
MATCH (ds2:Entity {id: '00000000-0000-0000-0000-000000000011'})
MATCH (ds1:Entity {id: '00000000-0000-0000-0000-000000000010'})
CREATE (ds2)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000004',
  relation_type: 'DERIVED_FROM',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"derivation_type":"CLEANING","job_id":"00000000-0000-0000-0000-000000000040","transformation":"SimHash去重 + 空值过滤"}',
  created_at: datetime()
}]->(ds1);

// TRIGGERS: workflow → job
MATCH (wf:Entity {id: '00000000-0000-0000-0000-000000000030'})
MATCH (job:Entity {id: '00000000-0000-0000-0000-000000000040'})
CREATE (wf)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000005',
  relation_type: 'TRIGGERS',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"trigger_type":"MANUAL","triggered_at":"2026-02-15T10:00:00"}',
  created_at: datetime()
}]->(job);

// USES_DATASET: job → source dataset
MATCH (job:Entity {id: '00000000-0000-0000-0000-000000000040'})
MATCH (ds1:Entity {id: '00000000-0000-0000-0000-000000000010'})
CREATE (job)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000006',
  relation_type: 'USES_DATASET',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"usage_role":"INPUT"}',
  created_at: datetime()
}]->(ds1);

// PRODUCES: job → cleaned dataset
MATCH (job:Entity {id: '00000000-0000-0000-0000-000000000040'})
MATCH (ds2:Entity {id: '00000000-0000-0000-0000-000000000011'})
CREATE (job)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000007',
  relation_type: 'PRODUCES',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"output_type":"PRIMARY"}',
  created_at: datetime()
}]->(ds2);

// ASSIGNED_TO: labeling task → user
MATCH (lt:Entity {id: '00000000-0000-0000-0000-000000000050'})
MATCH (user:Entity {id: '00000000-0000-0000-0000-000000000002'})
CREATE (lt)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000008',
  relation_type: 'ASSIGNED_TO',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"assigned_at":"2026-02-14T09:00:00","role":"EXECUTOR"}',
  created_at: datetime()
}]->(user);

// USES_DATASET: labeling task → cleaned dataset
MATCH (lt:Entity {id: '00000000-0000-0000-0000-000000000050'})
MATCH (ds2:Entity {id: '00000000-0000-0000-0000-000000000011'})
CREATE (lt)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000009',
  relation_type: 'USES_DATASET',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"usage_role":"INPUT"}',
  created_at: datetime()
}]->(ds2);

// BELONGS_TO: user → organization
MATCH (user:Entity {id: '00000000-0000-0000-0000-000000000002'})
MATCH (org:Entity {id: '00000000-0000-0000-0000-000000000001'})
CREATE (user)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000010',
  relation_type: 'BELONGS_TO',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"membership_type":"PRIMARY","since":"2025-03-01T00:00:00"}',
  created_at: datetime()
}]->(org);

// BELONGS_TO: source dataset → organization
MATCH (ds1:Entity {id: '00000000-0000-0000-0000-000000000010'})
MATCH (org:Entity {id: '00000000-0000-0000-0000-000000000001'})
CREATE (ds1)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000011',
  relation_type: 'BELONGS_TO',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 1.0,
  source_id: '',
  properties_json: '{"membership_type":"PRIMARY"}',
  created_at: datetime()
}]->(org);

// IMPACTS: source field → cleaned field
MATCH (f1:Entity {id: '00000000-0000-0000-0000-000000000020'})
MATCH (f3:Entity {id: '00000000-0000-0000-0000-000000000022'})
CREATE (f1)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000012',
  relation_type: 'IMPACTS',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 0.95,
  source_id: '',
  properties_json: '{"impact_type":"DIRECT","job_id":"00000000-0000-0000-0000-000000000040"}',
  created_at: datetime()
}]->(f3);

// SOURCED_FROM: knowledge set → cleaned dataset
MATCH (ks:Entity {id: '00000000-0000-0000-0000-000000000060'})
MATCH (ds2:Entity {id: '00000000-0000-0000-0000-000000000011'})
CREATE (ks)-[:RELATED_TO {
  id: '00000000-0000-0000-0000-100000000013',
  relation_type: 'SOURCED_FROM',
  graph_id: '11111111-1111-1111-1111-111111111111',
  weight: 1.0,
  confidence: 0.85,
  source_id: '',
  properties_json: '{"extraction_method":"LLM","extracted_at":"2026-02-16T14:30:00","item_count":85}',
  created_at: datetime()
}]->(ds2);

// =============================================================================
// Part 7: Verification queries
// =============================================================================

// Verify node counts
// MATCH (n:Entity {graph_id: '11111111-1111-1111-1111-111111111111'})
// RETURN n.type AS type, count(*) AS count
// ORDER BY count DESC;

// Verify relationship counts
// MATCH (:Entity {graph_id: '11111111-1111-1111-1111-111111111111'})
//       -[r:RELATED_TO {graph_id: '11111111-1111-1111-1111-111111111111'}]->
//       (:Entity {graph_id: '11111111-1111-1111-1111-111111111111'})
// RETURN r.relation_type AS type, count(*) AS count
// ORDER BY count DESC;

// Verify end-to-end lineage
// MATCH path = (ds2:Entity {name: '用户行为日志-清洗后'})
//       -[:RELATED_TO *1..5]->
//       (origin:Entity)
// WHERE ALL(r IN relationships(path) WHERE r.graph_id = '11111111-1111-1111-1111-111111111111')
// RETURN path;

// =============================================================================
// Part 8: Clean up sample data (optional)
// =============================================================================

// To remove the sample data, run the following statement:
// MATCH (n:Entity {graph_id: '11111111-1111-1111-1111-111111111111'})
// DETACH DELETE n;

1444
frontend/package-lock.json
generated
File diff suppressed because it is too large
@@ -20,7 +20,8 @@
|
||||
"react-dom": "^18.1.1",
|
||||
"react-redux": "^9.2.0",
|
||||
"react-router": "^7.8.0",
-    "recharts": "2.15.0"
+    "recharts": "2.15.0",
+    "@antv/g6": "^5.0.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@eslint/js": "^9.33.0",
|
||||
|
||||
@@ -22,6 +22,8 @@ export const PermissionCodes = {
|
||||
taskCoordinationAssign: "module:task-coordination:assign",
|
||||
contentGenerationUse: "module:content-generation:use",
|
||||
agentUse: "module:agent:use",
|
||||
knowledgeGraphRead: "module:knowledge-graph:read",
|
||||
knowledgeGraphWrite: "module:knowledge-graph:write",
|
||||
userManage: "system:user:manage",
|
||||
roleManage: "system:role:manage",
|
||||
permissionManage: "system:permission:manage",
|
||||
@@ -39,6 +41,7 @@ const routePermissionRules: Array<{ prefix: string; permission: string }> = [
|
||||
{ prefix: "/data/orchestration", permission: PermissionCodes.orchestrationRead },
|
||||
{ prefix: "/data/task-coordination", permission: PermissionCodes.taskCoordinationRead },
|
||||
{ prefix: "/data/content-generation", permission: PermissionCodes.contentGenerationUse },
|
||||
{ prefix: "/data/knowledge-graph", permission: PermissionCodes.knowledgeGraphRead },
|
||||
{ prefix: "/chat", permission: PermissionCodes.agentUse },
|
||||
];
|
||||
|
||||
|
||||
274
frontend/src/pages/KnowledgeGraph/Home/KnowledgeGraphPage.tsx
Normal file
@@ -0,0 +1,274 @@
|
||||
import { useState, useCallback, useEffect } from "react";
|
||||
import { Card, Input, Select, Button, Tag, Space, Empty, Tabs, message } from "antd";
|
||||
import { Network, RotateCcw } from "lucide-react";
|
||||
import { useSearchParams } from "react-router";
|
||||
import GraphCanvas from "../components/GraphCanvas";
|
||||
import SearchPanel from "../components/SearchPanel";
|
||||
import QueryBuilder from "../components/QueryBuilder";
|
||||
import NodeDetail from "../components/NodeDetail";
|
||||
import RelationDetail from "../components/RelationDetail";
|
||||
import useGraphData from "../hooks/useGraphData";
|
||||
import useGraphLayout, { LAYOUT_OPTIONS } from "../hooks/useGraphLayout";
|
||||
import {
|
||||
ENTITY_TYPE_COLORS,
|
||||
DEFAULT_ENTITY_COLOR,
|
||||
ENTITY_TYPE_LABELS,
|
||||
} from "../knowledge-graph.const";
|
||||
|
||||
const UUID_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
|
||||
|
||||
export default function KnowledgeGraphPage() {
|
||||
const [params, setParams] = useSearchParams();
|
||||
const [graphId, setGraphId] = useState(() => params.get("graphId") ?? "");
|
||||
const [graphIdInput, setGraphIdInput] = useState(() => params.get("graphId") ?? "");
|
||||
|
||||
const {
|
||||
graphData,
|
||||
loading,
|
||||
searchResults,
|
||||
searchLoading,
|
||||
highlightedNodeIds,
|
||||
loadInitialData,
|
||||
expandNode,
|
||||
searchEntities,
|
||||
mergePathData,
|
||||
clearGraph,
|
||||
clearSearch,
|
||||
} = useGraphData();
|
||||
|
||||
const { layoutType, setLayoutType } = useGraphLayout();
|
||||
|
||||
// Detail panel state
|
||||
const [selectedNodeId, setSelectedNodeId] = useState<string | null>(null);
|
||||
const [selectedEdgeId, setSelectedEdgeId] = useState<string | null>(null);
|
||||
const [nodeDetailOpen, setNodeDetailOpen] = useState(false);
|
||||
const [relationDetailOpen, setRelationDetailOpen] = useState(false);
|
||||
|
||||
// Load graph when graphId changes
|
||||
useEffect(() => {
|
||||
if (graphId && UUID_REGEX.test(graphId)) {
|
||||
clearGraph();
|
||||
loadInitialData(graphId);
|
||||
}
|
||||
}, [graphId, loadInitialData, clearGraph]);
|
||||
|
||||
const handleLoadGraph = useCallback(() => {
|
||||
if (!UUID_REGEX.test(graphIdInput)) {
|
||||
message.warning("请输入有效的图谱 ID(UUID 格式)");
|
||||
return;
|
||||
}
|
||||
setGraphId(graphIdInput);
|
||||
setParams({ graphId: graphIdInput });
|
||||
}, [graphIdInput, setParams]);
|
||||
|
||||
const handleNodeClick = useCallback((nodeId: string) => {
|
||||
setSelectedNodeId(nodeId);
|
||||
setSelectedEdgeId(null);
|
||||
setNodeDetailOpen(true);
|
||||
setRelationDetailOpen(false);
|
||||
}, []);
|
||||
|
||||
const handleEdgeClick = useCallback((edgeId: string) => {
|
||||
setSelectedEdgeId(edgeId);
|
||||
setSelectedNodeId(null);
|
||||
setRelationDetailOpen(true);
|
||||
setNodeDetailOpen(false);
|
||||
}, []);
|
||||
|
||||
const handleNodeDoubleClick = useCallback(
|
||||
(nodeId: string) => {
|
||||
if (!graphId) return;
|
||||
expandNode(graphId, nodeId);
|
||||
},
|
||||
[graphId, expandNode]
|
||||
);
|
||||
|
||||
const handleCanvasClick = useCallback(() => {
|
||||
setSelectedNodeId(null);
|
||||
setSelectedEdgeId(null);
|
||||
setNodeDetailOpen(false);
|
||||
setRelationDetailOpen(false);
|
||||
}, []);
|
||||
|
||||
const handleExpandNode = useCallback(
|
||||
(entityId: string) => {
|
||||
if (!graphId) return;
|
||||
expandNode(graphId, entityId);
|
||||
},
|
||||
[graphId, expandNode]
|
||||
);
|
||||
|
||||
const handleEntityNavigate = useCallback(
|
||||
(entityId: string) => {
|
||||
setSelectedNodeId(entityId);
|
||||
setNodeDetailOpen(true);
|
||||
setRelationDetailOpen(false);
|
||||
},
|
||||
[]
|
||||
);
|
||||
|
||||
const handleSearchResultClick = useCallback(
|
||||
(entityId: string) => {
|
||||
handleNodeClick(entityId);
|
||||
// If the entity is not in the current graph, expand it
|
||||
if (!graphData.nodes.find((n) => n.id === entityId) && graphId) {
|
||||
expandNode(graphId, entityId);
|
||||
}
|
||||
},
|
||||
[handleNodeClick, graphData.nodes, graphId, expandNode]
|
||||
);
|
||||
|
||||
const handleRelationClick = useCallback((relationId: string) => {
|
||||
setSelectedEdgeId(relationId);
|
||||
setRelationDetailOpen(true);
|
||||
setNodeDetailOpen(false);
|
||||
}, []);
|
||||
|
||||
const hasGraph = graphId && UUID_REGEX.test(graphId);
|
||||
const nodeCount = graphData.nodes.length;
|
||||
const edgeCount = graphData.edges.length;
|
||||
|
||||
// Collect unique entity types in current graph for legend
|
||||
const entityTypes = [...new Set(graphData.nodes.map((n) => n.data.type))].sort();
|
||||
|
||||
return (
|
||||
<div className="h-full flex flex-col gap-4">
|
||||
{/* Header */}
|
||||
<div className="flex items-center justify-between">
|
||||
<h1 className="text-xl font-bold flex items-center gap-2">
|
||||
<Network className="w-5 h-5" />
|
||||
知识图谱浏览器
|
||||
</h1>
|
||||
</div>
|
||||
|
||||
{/* Graph ID Input + Controls */}
|
||||
<div className="flex items-center gap-3 flex-wrap">
|
||||
<Space.Compact className="w-[420px]">
|
||||
<Input
|
||||
placeholder="输入图谱 ID (UUID)..."
|
||||
value={graphIdInput}
|
||||
onChange={(e) => setGraphIdInput(e.target.value)}
|
||||
onPressEnter={handleLoadGraph}
|
||||
allowClear
|
||||
/>
|
||||
<Button type="primary" onClick={handleLoadGraph}>
|
||||
加载
|
||||
</Button>
|
||||
</Space.Compact>
|
||||
|
||||
<Select
|
||||
value={layoutType}
|
||||
onChange={setLayoutType}
|
||||
options={LAYOUT_OPTIONS}
|
||||
className="w-28"
|
||||
/>
|
||||
|
||||
{hasGraph && (
|
||||
<>
|
||||
<Button
|
||||
icon={<RotateCcw className="w-3.5 h-3.5" />}
|
||||
onClick={() => loadInitialData(graphId)}
|
||||
>
|
||||
重新加载
|
||||
</Button>
|
||||
<span className="text-sm text-gray-500">
|
||||
节点: {nodeCount} | 边: {edgeCount}
|
||||
</span>
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
|
||||
{/* Legend */}
|
||||
{entityTypes.length > 0 && (
|
||||
<div className="flex items-center gap-2 flex-wrap">
|
||||
<span className="text-xs text-gray-500">图例:</span>
|
||||
{entityTypes.map((type) => (
|
||||
<Tag key={type} color={ENTITY_TYPE_COLORS[type] ?? DEFAULT_ENTITY_COLOR}>
|
||||
{ENTITY_TYPE_LABELS[type] ?? type}
|
||||
</Tag>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
|
||||
{/* Main content */}
|
||||
<div className="flex-1 flex gap-4 min-h-0">
|
||||
{/* Sidebar with tabs */}
|
||||
{hasGraph && (
|
||||
<Card className="w-72 shrink-0 overflow-auto" size="small" bodyStyle={{ padding: 0 }}>
|
||||
<Tabs
|
||||
size="small"
|
||||
className="px-3"
|
||||
items={[
|
||||
{
|
||||
key: "search",
|
||||
label: "搜索",
|
||||
children: (
|
||||
<SearchPanel
|
||||
graphId={graphId}
|
||||
results={searchResults}
|
||||
loading={searchLoading}
|
||||
onSearch={searchEntities}
|
||||
onResultClick={handleSearchResultClick}
|
||||
onClear={clearSearch}
|
||||
/>
|
||||
),
|
||||
},
|
||||
{
|
||||
key: "query",
|
||||
label: "路径查询",
|
||||
children: (
|
||||
<QueryBuilder
|
||||
graphId={graphId}
|
||||
onPathResult={mergePathData}
|
||||
/>
|
||||
),
|
||||
},
|
||||
]}
|
||||
/>
|
||||
</Card>
|
||||
)}
|
||||
|
||||
{/* Canvas */}
|
||||
<Card className="flex-1 min-w-0" bodyStyle={{ height: "100%", padding: 0 }}>
|
||||
{hasGraph ? (
|
||||
<GraphCanvas
|
||||
data={graphData}
|
||||
loading={loading}
|
||||
layoutType={layoutType}
|
||||
highlightedNodeIds={highlightedNodeIds}
|
||||
onNodeClick={handleNodeClick}
|
||||
onEdgeClick={handleEdgeClick}
|
||||
onNodeDoubleClick={handleNodeDoubleClick}
|
||||
onCanvasClick={handleCanvasClick}
|
||||
/>
|
||||
) : (
|
||||
<div className="h-full flex items-center justify-center">
|
||||
<Empty
|
||||
description="请输入图谱 ID 加载知识图谱"
|
||||
image={<Network className="w-16 h-16 text-gray-300 mx-auto" />}
|
||||
/>
|
||||
</div>
|
||||
)}
|
||||
</Card>
|
||||
</div>
|
||||
|
||||
{/* Detail drawers */}
|
||||
<NodeDetail
|
||||
graphId={graphId}
|
||||
entityId={selectedNodeId}
|
||||
open={nodeDetailOpen}
|
||||
onClose={() => setNodeDetailOpen(false)}
|
||||
onExpandNode={handleExpandNode}
|
||||
onRelationClick={handleRelationClick}
|
||||
onEntityNavigate={handleEntityNavigate}
|
||||
/>
|
||||
<RelationDetail
|
||||
graphId={graphId}
|
||||
relationId={selectedEdgeId}
|
||||
open={relationDetailOpen}
|
||||
onClose={() => setRelationDetailOpen(false)}
|
||||
onEntityNavigate={handleEntityNavigate}
|
||||
/>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
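Review note on `KnowledgeGraphPage.tsx`: the page loads a graph only when `graphId` (from the input box or the `?graphId=...` URL parameter) matches `UUID_REGEX`, and it builds the legend from the unique entity types of the loaded nodes. A minimal sketch of those two rules as pure functions, which makes them easy to unit test. `parseGraphId` and `legendTypes` are hypothetical helper names and are not part of this commit.

```typescript
// Same pattern as UUID_REGEX in KnowledgeGraphPage.tsx.
const UUID_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: read "graphId" from a query string such as
// "?graphId=...". Returns null when the parameter is missing or not UUID-shaped.
function parseGraphId(search: string): string | null {
  const raw = new URLSearchParams(search).get("graphId");
  if (raw === null) return null;
  const candidate = raw.trim();
  return UUID_REGEX.test(candidate) ? candidate : null;
}

// Hypothetical helper: unique entity types in the current graph, sorted,
// mirroring the `entityTypes` expression used for the legend.
function legendTypes(nodes: Array<{ data: { type: string } }>): string[] {
  return Array.from(new Set(nodes.map((n) => n.data.type))).sort();
}
```

Keeping the validation in one function would also let the URL parameter and the input box share the same check.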
182
frontend/src/pages/KnowledgeGraph/components/GraphCanvas.tsx
Normal file
@@ -0,0 +1,182 @@
|
||||
import { useEffect, useRef, useCallback, memo } from "react";
|
||||
import { Graph } from "@antv/g6";
|
||||
import { Spin } from "antd";
|
||||
import type { G6GraphData } from "../graphTransform";
|
||||
import { createGraphOptions, LARGE_GRAPH_THRESHOLD } from "../graphConfig";
|
||||
import type { LayoutType } from "../hooks/useGraphLayout";
|
||||
|
||||
interface GraphCanvasProps {
|
||||
data: G6GraphData;
|
||||
loading?: boolean;
|
||||
layoutType: LayoutType;
|
||||
highlightedNodeIds?: Set<string>;
|
||||
onNodeClick?: (nodeId: string) => void;
|
||||
onEdgeClick?: (edgeId: string) => void;
|
||||
onNodeDoubleClick?: (nodeId: string) => void;
|
||||
onCanvasClick?: () => void;
|
||||
}
|
||||
|
||||
function GraphCanvas({
|
||||
data,
|
||||
loading = false,
|
||||
layoutType,
|
||||
highlightedNodeIds,
|
||||
onNodeClick,
|
||||
onEdgeClick,
|
||||
onNodeDoubleClick,
|
||||
onCanvasClick,
|
||||
}: GraphCanvasProps) {
|
||||
const containerRef = useRef<HTMLDivElement>(null);
|
||||
const graphRef = useRef<Graph | null>(null);
|
||||
|
||||
// Initialize graph
|
||||
useEffect(() => {
|
||||
if (!containerRef.current) return;
|
||||
|
||||
const options = createGraphOptions(containerRef.current);
|
||||
const graph = new Graph(options);
|
||||
graphRef.current = graph;
|
||||
|
||||
graph.render();
|
||||
|
||||
return () => {
|
||||
graphRef.current = null;
|
||||
graph.destroy();
|
||||
};
|
||||
}, []);
|
||||
|
||||
// Update data (with large-graph performance optimization)
|
||||
useEffect(() => {
|
||||
const graph = graphRef.current;
|
||||
if (!graph) return;
|
||||
|
||||
const isLargeGraph = data.nodes.length >= LARGE_GRAPH_THRESHOLD;
|
||||
if (isLargeGraph) {
|
||||
graph.setOptions({ animation: false });
|
||||
}
|
||||
|
||||
if (data.nodes.length === 0 && data.edges.length === 0) {
|
||||
graph.setData({ nodes: [], edges: [] });
|
||||
graph.render();
|
||||
return;
|
||||
}
|
||||
graph.setData(data);
|
||||
graph.render();
|
||||
}, [data]);
|
||||
|
||||
// Update layout
|
||||
useEffect(() => {
|
||||
const graph = graphRef.current;
|
||||
if (!graph) return;
|
||||
|
||||
const layoutConfigs: Record<string, Record<string, unknown>> = {
|
||||
"d3-force": {
|
||||
type: "d3-force",
|
||||
preventOverlap: true,
|
||||
link: { distance: 180 },
|
||||
      manyBody: { strength: -400 },
|
||||
collide: { radius: 50 },
|
||||
},
|
||||
circular: { type: "circular", radius: 250 },
|
||||
grid: { type: "grid" },
|
||||
radial: { type: "radial", unitRadius: 120, preventOverlap: true, nodeSpacing: 30 },
|
||||
concentric: { type: "concentric", preventOverlap: true, nodeSpacing: 30 },
|
||||
};
|
||||
|
||||
graph.setLayout(layoutConfigs[layoutType] ?? layoutConfigs["d3-force"]);
|
||||
graph.layout();
|
||||
}, [layoutType]);
|
||||
|
||||
// Highlight nodes
|
||||
useEffect(() => {
|
||||
const graph = graphRef.current;
|
||||
if (!graph || !highlightedNodeIds) return;
|
||||
|
||||
const allNodeIds = data.nodes.map((n) => n.id);
|
||||
if (highlightedNodeIds.size === 0) {
|
||||
// Clear all states
|
||||
allNodeIds.forEach((id) => {
|
||||
graph.setElementState(id, []);
|
||||
});
|
||||
data.edges.forEach((e) => {
|
||||
graph.setElementState(e.id, []);
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
allNodeIds.forEach((id) => {
|
||||
if (highlightedNodeIds.has(id)) {
|
||||
graph.setElementState(id, ["highlighted"]);
|
||||
} else {
|
||||
graph.setElementState(id, ["dimmed"]);
|
||||
}
|
||||
});
|
||||
data.edges.forEach((e) => {
|
||||
if (highlightedNodeIds.has(e.source) || highlightedNodeIds.has(e.target)) {
|
||||
graph.setElementState(e.id, []);
|
||||
} else {
|
||||
graph.setElementState(e.id, ["dimmed"]);
|
||||
}
|
||||
});
|
||||
}, [highlightedNodeIds, data]);
|
||||
|
||||
// Bind events
|
||||
useEffect(() => {
|
||||
const graph = graphRef.current;
|
||||
if (!graph) return;
|
||||
|
||||
const handleNodeClick = (event: { target: { id: string } }) => {
|
||||
onNodeClick?.(event.target.id);
|
||||
};
|
||||
const handleEdgeClick = (event: { target: { id: string } }) => {
|
||||
onEdgeClick?.(event.target.id);
|
||||
};
|
||||
const handleNodeDblClick = (event: { target: { id: string } }) => {
|
||||
onNodeDoubleClick?.(event.target.id);
|
||||
};
|
||||
const handleCanvasClick = () => {
|
||||
onCanvasClick?.();
|
||||
};
|
||||
|
||||
graph.on("node:click", handleNodeClick);
|
||||
graph.on("edge:click", handleEdgeClick);
|
||||
graph.on("node:dblclick", handleNodeDblClick);
|
||||
graph.on("canvas:click", handleCanvasClick);
|
||||
|
||||
return () => {
|
||||
graph.off("node:click", handleNodeClick);
|
||||
graph.off("edge:click", handleEdgeClick);
|
||||
graph.off("node:dblclick", handleNodeDblClick);
|
||||
graph.off("canvas:click", handleCanvasClick);
|
||||
};
|
||||
}, [onNodeClick, onEdgeClick, onNodeDoubleClick, onCanvasClick]);
|
||||
|
||||
// Fit view helper
|
||||
const handleFitView = useCallback(() => {
|
||||
graphRef.current?.fitView();
|
||||
}, []);
|
||||
|
||||
return (
|
||||
<div className="relative w-full h-full">
|
||||
<Spin spinning={loading} tip="加载中...">
|
||||
<div ref={containerRef} className="w-full h-full min-h-[500px]" />
|
||||
</Spin>
|
||||
<div className="absolute bottom-4 right-4 flex gap-2">
|
||||
<button
|
||||
onClick={handleFitView}
|
||||
className="px-3 py-1.5 bg-white border border-gray-300 rounded shadow-sm text-xs hover:bg-gray-50"
|
||||
>
|
||||
适应画布
|
||||
</button>
|
||||
<button
|
||||
onClick={() => graphRef.current?.zoomTo(1)}
|
||||
className="px-3 py-1.5 bg-white border border-gray-300 rounded shadow-sm text-xs hover:bg-gray-50"
|
||||
>
|
||||
重置缩放
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
export default memo(GraphCanvas);
|
||||
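Review note on `GraphCanvas.tsx`: the highlight effect calls `graph.setElementState` once per node and once per edge. A minimal sketch that computes the same state assignment as a single map, so the rule can be tested without a canvas. `computeElementStates` is a hypothetical helper, and `NodeLike` and `EdgeLike` are trimmed stand-ins for the G6 data types. Whether the resulting map can be applied in one `setElementState` call depends on the G6 version in use and should be checked against its API reference.

```typescript
type ElementState = "highlighted" | "dimmed";

interface NodeLike {
  id: string;
}

interface EdgeLike {
  id: string;
  source: string;
  target: string;
}

// Hypothetical helper: maps every element ID to the states the highlight
// effect would assign. An empty highlight set clears all states.
function computeElementStates(
  nodes: NodeLike[],
  edges: EdgeLike[],
  highlighted: ReadonlySet<string>,
): Record<string, ElementState[]> {
  const states: Record<string, ElementState[]> = {};
  const active = highlighted.size > 0;

  for (const node of nodes) {
    if (!active) {
      states[node.id] = [];
    } else {
      states[node.id] = highlighted.has(node.id) ? ["highlighted"] : ["dimmed"];
    }
  }

  for (const edge of edges) {
    // An edge stays undimmed when at least one endpoint is highlighted.
    const touchesHighlight = highlighted.has(edge.source) || highlighted.has(edge.target);
    states[edge.id] = !active || touchesHighlight ? [] : ["dimmed"];
  }

  return states;
}
```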
187
frontend/src/pages/KnowledgeGraph/components/NodeDetail.tsx
Normal file
@@ -0,0 +1,187 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import { Drawer, Descriptions, Tag, List, Button, Spin, Empty, message } from "antd";
|
||||
import { Expand } from "lucide-react";
|
||||
import type { GraphEntity, RelationVO, PagedResponse } from "../knowledge-graph.model";
|
||||
import {
|
||||
ENTITY_TYPE_LABELS,
|
||||
ENTITY_TYPE_COLORS,
|
||||
DEFAULT_ENTITY_COLOR,
|
||||
RELATION_TYPE_LABELS,
|
||||
} from "../knowledge-graph.const";
|
||||
import * as api from "../knowledge-graph.api";
|
||||
|
||||
interface NodeDetailProps {
|
||||
graphId: string;
|
||||
entityId: string | null;
|
||||
open: boolean;
|
||||
onClose: () => void;
|
||||
onExpandNode: (entityId: string) => void;
|
||||
onRelationClick: (relationId: string) => void;
|
||||
onEntityNavigate: (entityId: string) => void;
|
||||
}
|
||||
|
||||
export default function NodeDetail({
|
||||
graphId,
|
||||
entityId,
|
||||
open,
|
||||
onClose,
|
||||
onExpandNode,
|
||||
onRelationClick,
|
||||
onEntityNavigate,
|
||||
}: NodeDetailProps) {
|
||||
const [entity, setEntity] = useState<GraphEntity | null>(null);
|
||||
const [relations, setRelations] = useState<RelationVO[]>([]);
|
||||
const [loading, setLoading] = useState(false);
|
||||
|
||||
  useEffect(() => {
    if (!entityId || !graphId) {
      setEntity(null);
      setRelations([]);
      setLoading(false);
      return;
    }
    if (!open) {
      setLoading(false);
      return;
    }

    // Ignore responses that arrive after the selection changed or the drawer
    // closed, so a slow request cannot overwrite newer data.
    let cancelled = false;

    setLoading(true);
    Promise.all([
      api.getEntity(graphId, entityId),
      api.listEntityRelations(graphId, entityId, { page: 0, size: 50 }),
    ])
      .then(([entityData, relData]: [GraphEntity, PagedResponse<RelationVO>]) => {
        if (cancelled) return;
        setEntity(entityData);
        setRelations(relData.content);
      })
      .catch(() => {
        if (!cancelled) message.error("加载实体详情失败");
      })
      .finally(() => {
        if (!cancelled) setLoading(false);
      });

    return () => {
      cancelled = true;
    };
  }, [graphId, entityId, open]);
|
||||
|
||||
return (
|
||||
<Drawer
|
||||
title={
|
||||
<div className="flex items-center gap-2">
|
||||
<span>实体详情</span>
|
||||
{entity && (
|
||||
<Tag color={ENTITY_TYPE_COLORS[entity.type] ?? DEFAULT_ENTITY_COLOR}>
|
||||
{ENTITY_TYPE_LABELS[entity.type] ?? entity.type}
|
||||
</Tag>
|
||||
)}
|
||||
</div>
|
||||
}
|
||||
open={open}
|
||||
onClose={onClose}
|
||||
width={420}
|
||||
extra={
|
||||
entityId && (
|
||||
<Button
|
||||
type="primary"
|
||||
size="small"
|
||||
icon={<Expand className="w-3 h-3" />}
|
||||
onClick={() => onExpandNode(entityId)}
|
||||
>
|
||||
展开邻居
|
||||
</Button>
|
||||
)
|
||||
}
|
||||
>
|
||||
<Spin spinning={loading}>
|
||||
{entity ? (
|
||||
<div className="flex flex-col gap-4">
|
||||
<Descriptions column={1} size="small" bordered>
|
||||
<Descriptions.Item label="名称">{entity.name}</Descriptions.Item>
|
||||
<Descriptions.Item label="类型">
|
||||
{ENTITY_TYPE_LABELS[entity.type] ?? entity.type}
|
||||
</Descriptions.Item>
|
||||
{entity.description && (
|
||||
<Descriptions.Item label="描述">{entity.description}</Descriptions.Item>
|
||||
)}
|
||||
{entity.aliases && entity.aliases.length > 0 && (
|
||||
<Descriptions.Item label="别名">
|
||||
{entity.aliases.map((a) => (
|
||||
<Tag key={a}>{a}</Tag>
|
||||
))}
|
||||
</Descriptions.Item>
|
||||
)}
|
||||
{entity.confidence != null && (
|
||||
<Descriptions.Item label="置信度">
|
||||
{(entity.confidence * 100).toFixed(0)}%
|
||||
</Descriptions.Item>
|
||||
)}
|
||||
{entity.sourceType && (
|
||||
<Descriptions.Item label="来源">{entity.sourceType}</Descriptions.Item>
|
||||
)}
|
||||
{entity.createdAt && (
|
||||
<Descriptions.Item label="创建时间">{entity.createdAt}</Descriptions.Item>
|
||||
)}
|
||||
</Descriptions>
|
||||
|
||||
{entity.properties && Object.keys(entity.properties).length > 0 && (
|
||||
<>
|
||||
<h4 className="font-medium text-sm">扩展属性</h4>
|
||||
<Descriptions column={1} size="small" bordered>
|
||||
{Object.entries(entity.properties).map(([key, value]) => (
|
||||
<Descriptions.Item key={key} label={key}>
|
||||
{String(value)}
|
||||
</Descriptions.Item>
|
||||
))}
|
||||
</Descriptions>
|
||||
</>
|
||||
)}
|
||||
|
||||
<h4 className="font-medium text-sm">关系列表 ({relations.length})</h4>
|
||||
{relations.length > 0 ? (
|
||||
<List
|
||||
size="small"
|
||||
dataSource={relations}
|
||||
renderItem={(rel) => {
|
||||
const isSource = rel.sourceEntityId === entityId;
|
||||
const otherName = isSource ? rel.targetEntityName : rel.sourceEntityName;
|
||||
const otherType = isSource ? rel.targetEntityType : rel.sourceEntityType;
|
||||
const otherId = isSource ? rel.targetEntityId : rel.sourceEntityId;
|
||||
const direction = isSource ? "→" : "←";
|
||||
|
||||
return (
|
||||
<List.Item
|
||||
className="cursor-pointer hover:bg-gray-50 !px-2"
|
||||
onClick={() => onRelationClick(rel.id)}
|
||||
>
|
||||
<div className="flex items-center gap-1.5 w-full min-w-0 text-sm">
|
||||
<span className="text-gray-400">{direction}</span>
|
||||
<Tag
|
||||
className="shrink-0"
|
||||
color={ENTITY_TYPE_COLORS[otherType] ?? DEFAULT_ENTITY_COLOR}
|
||||
>
|
||||
{ENTITY_TYPE_LABELS[otherType] ?? otherType}
|
||||
</Tag>
|
||||
<Button
|
||||
type="link"
|
||||
size="small"
|
||||
className="!p-0 truncate"
|
||||
onClick={(e) => {
|
||||
e.stopPropagation();
|
||||
onEntityNavigate(otherId);
|
||||
}}
|
||||
>
|
||||
{otherName}
|
||||
</Button>
|
||||
<span className="ml-auto text-xs text-gray-400 shrink-0">
|
||||
{RELATION_TYPE_LABELS[rel.relationType] ?? rel.relationType}
|
||||
</span>
|
||||
</div>
|
||||
</List.Item>
|
||||
);
|
||||
}}
|
||||
/>
|
||||
) : (
|
||||
<Empty description="暂无关系" image={Empty.PRESENTED_IMAGE_SIMPLE} />
|
||||
)}
|
||||
</div>
|
||||
) : !loading ? (
|
||||
<Empty description="选择一个节点查看详情" />
|
||||
) : null}
|
||||
</Spin>
|
||||
</Drawer>
|
||||
);
|
||||
}
|
||||
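Review note on `NodeDetail.tsx`: each row of the relation list shows the entity at the other end of the relation, with an arrow for the direction relative to the selected entity. A minimal sketch of that derivation as a pure function. `toRelationRow` is a hypothetical helper, and `RelationLike` is a trimmed stand-in for `RelationVO` that lists only the fields the component reads.

```typescript
interface RelationLike {
  id: string;
  relationType: string;
  sourceEntityId: string;
  sourceEntityName: string;
  sourceEntityType: string;
  targetEntityId: string;
  targetEntityName: string;
  targetEntityType: string;
}

interface RelationRow {
  direction: "→" | "←";
  otherId: string;
  otherName: string;
  otherType: string;
}

// Hypothetical helper: describe a relation from the point of view of
// `entityId`. A self-loop is treated as outgoing, as in the component.
function toRelationRow(rel: RelationLike, entityId: string): RelationRow {
  const isSource = rel.sourceEntityId === entityId;
  return {
    direction: isSource ? "→" : "←",
    otherId: isSource ? rel.targetEntityId : rel.sourceEntityId,
    otherName: isSource ? rel.targetEntityName : rel.sourceEntityName,
    otherType: isSource ? rel.targetEntityType : rel.sourceEntityType,
  };
}
```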
173
frontend/src/pages/KnowledgeGraph/components/QueryBuilder.tsx
Normal file
@@ -0,0 +1,173 @@
|
||||
import { useState, useCallback } from "react";
|
||||
import { Input, Button, Select, InputNumber, List, Tag, Empty, message, Spin } from "antd";
|
||||
import type { PathVO, AllPathsVO, EntitySummaryVO, EdgeSummaryVO } from "../knowledge-graph.model";
|
||||
import {
|
||||
ENTITY_TYPE_LABELS,
|
||||
ENTITY_TYPE_COLORS,
|
||||
DEFAULT_ENTITY_COLOR,
|
||||
RELATION_TYPE_LABELS,
|
||||
} from "../knowledge-graph.const";
|
||||
import * as api from "../knowledge-graph.api";
|
||||
|
||||
type QueryType = "shortest-path" | "all-paths";
|
||||
|
||||
interface QueryBuilderProps {
|
||||
graphId: string;
|
||||
onPathResult: (nodes: EntitySummaryVO[], edges: EdgeSummaryVO[]) => void;
|
||||
}
|
||||
|
||||
export default function QueryBuilder({ graphId, onPathResult }: QueryBuilderProps) {
|
||||
const [queryType, setQueryType] = useState<QueryType>("shortest-path");
|
||||
const [sourceId, setSourceId] = useState("");
|
||||
const [targetId, setTargetId] = useState("");
|
||||
const [maxDepth, setMaxDepth] = useState(5);
|
||||
const [maxPaths, setMaxPaths] = useState(3);
|
||||
const [loading, setLoading] = useState(false);
|
||||
const [pathResults, setPathResults] = useState<PathVO[]>([]);
|
||||
|
||||
const handleQuery = useCallback(async () => {
|
||||
if (!sourceId.trim() || !targetId.trim()) {
|
||||
message.warning("请输入源实体和目标实体 ID");
|
||||
return;
|
||||
}
|
||||
setLoading(true);
|
||||
setPathResults([]);
|
||||
try {
|
||||
if (queryType === "shortest-path") {
|
||||
const path: PathVO = await api.getShortestPath(graphId, {
|
||||
sourceId: sourceId.trim(),
|
||||
targetId: targetId.trim(),
|
||||
maxDepth,
|
||||
});
|
||||
setPathResults([path]);
|
||||
onPathResult(path.nodes, path.edges);
|
||||
} else {
|
||||
const result: AllPathsVO = await api.getAllPaths(graphId, {
|
||||
sourceId: sourceId.trim(),
|
||||
targetId: targetId.trim(),
|
||||
maxDepth,
|
||||
maxPaths,
|
||||
});
|
||||
setPathResults(result.paths);
|
||||
if (result.paths.length > 0) {
|
||||
const allNodes = result.paths.flatMap((p) => p.nodes);
|
||||
const allEdges = result.paths.flatMap((p) => p.edges);
|
||||
onPathResult(allNodes, allEdges);
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
message.error("路径查询失败");
|
||||
} finally {
|
||||
setLoading(false);
|
||||
}
|
||||
}, [graphId, queryType, sourceId, targetId, maxDepth, maxPaths, onPathResult]);
|
||||
|
||||
const handleClear = useCallback(() => {
|
||||
setPathResults([]);
|
||||
setSourceId("");
|
||||
setTargetId("");
|
||||
onPathResult([], []);
|
||||
}, [onPathResult]);
|
||||
|
||||
return (
|
||||
<div className="flex flex-col gap-3">
|
||||
<Select
|
||||
value={queryType}
|
||||
onChange={setQueryType}
|
||||
className="w-full"
|
||||
options={[
|
||||
{ label: "最短路径", value: "shortest-path" },
|
||||
{ label: "所有路径", value: "all-paths" },
|
||||
]}
|
||||
/>
|
||||
|
||||
<Input
|
||||
placeholder="源实体 ID"
|
||||
value={sourceId}
|
||||
onChange={(e) => setSourceId(e.target.value)}
|
||||
allowClear
|
||||
/>
|
||||
|
||||
<Input
|
||||
placeholder="目标实体 ID"
|
||||
value={targetId}
|
||||
onChange={(e) => setTargetId(e.target.value)}
|
||||
allowClear
|
||||
/>
|
||||
|
||||
<div className="flex items-center gap-2">
|
||||
<span className="text-xs text-gray-500 shrink-0">最大深度</span>
|
||||
<InputNumber
|
||||
min={1}
|
||||
max={10}
|
||||
value={maxDepth}
|
||||
onChange={(v) => setMaxDepth(v ?? 5)}
|
||||
size="small"
|
||||
className="flex-1"
|
||||
/>
|
||||
</div>
|
||||
|
||||
{queryType === "all-paths" && (
|
||||
<div className="flex items-center gap-2">
|
||||
<span className="text-xs text-gray-500 shrink-0">最大路径数</span>
|
||||
<InputNumber
|
||||
min={1}
|
||||
max={20}
|
||||
value={maxPaths}
|
||||
onChange={(v) => setMaxPaths(v ?? 3)}
|
||||
size="small"
|
||||
className="flex-1"
|
||||
/>
|
||||
</div>
|
||||
)}
|
||||
|
||||
<div className="flex gap-2">
|
||||
<Button type="primary" onClick={handleQuery} loading={loading} className="flex-1">
|
||||
查询
|
||||
</Button>
|
||||
<Button onClick={handleClear}>清除</Button>
|
||||
</div>
|
||||
|
||||
<Spin spinning={loading}>
|
||||
{pathResults.length > 0 ? (
|
||||
<List
|
||||
size="small"
|
||||
dataSource={pathResults}
|
||||
renderItem={(path, index) => (
|
||||
<List.Item className="!px-2">
|
||||
<div className="flex flex-col gap-1 w-full">
|
||||
<div className="text-xs font-medium text-gray-600">
|
||||
路径 {index + 1}({path.pathLength} 跳)
|
||||
</div>
|
||||
<div className="flex items-center gap-1 flex-wrap">
|
||||
{path.nodes.map((node, ni) => (
|
||||
<span key={node.id} className="flex items-center gap-1">
|
||||
{ni > 0 && (
|
||||
<span className="text-xs text-gray-400">
|
||||
{path.edges[ni - 1]
|
||||
? RELATION_TYPE_LABELS[path.edges[ni - 1].relationType] ??
|
||||
path.edges[ni - 1].relationType
|
||||
: "→"}
|
||||
</span>
|
||||
)}
|
||||
<Tag
|
||||
color={ENTITY_TYPE_COLORS[node.type] ?? DEFAULT_ENTITY_COLOR}
|
||||
className="!m-0"
|
||||
>
|
||||
{ENTITY_TYPE_LABELS[node.type] ?? node.type}
|
||||
</Tag>
|
||||
<span className="text-xs">{node.name}</span>
|
||||
</span>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
</List.Item>
|
||||
)}
|
||||
/>
|
||||
) : !loading && sourceId && targetId ? (
|
||||
<Empty description="暂无结果" image={Empty.PRESENTED_IMAGE_SIMPLE} />
|
||||
) : null}
|
||||
</Spin>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
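Review note on `QueryBuilder.tsx`: for an all-paths query, the component flattens the nodes and edges of every returned path before calling `onPathResult`, so an entity shared by several paths is passed more than once. A minimal sketch that removes duplicates first. It assumes both node and edge summaries carry a unique `id`, which this diff shows for nodes but not for `EdgeSummaryVO`. `collectPathElements` and `uniqueById` are hypothetical helpers, and the merge utility behind `mergePathData` may already handle duplicates.

```typescript
interface NodeSummary {
  id: string;
  name: string;
  type: string;
}

interface EdgeSummary {
  id: string; // assumed field; EdgeSummaryVO is not shown in this diff
  relationType: string;
}

interface PathLike {
  nodes: NodeSummary[];
  edges: EdgeSummary[];
}

// Keep the first occurrence of each ID, preserving order.
function uniqueById<T extends { id: string }>(items: T[]): T[] {
  const seen = new Map<string, T>();
  for (const item of items) {
    if (!seen.has(item.id)) seen.set(item.id, item);
  }
  return Array.from(seen.values());
}

// Hypothetical helper: flatten several paths into de-duplicated node and edge lists.
function collectPathElements(paths: PathLike[]): { nodes: NodeSummary[]; edges: EdgeSummary[] } {
  const nodes: NodeSummary[] = [];
  const edges: EdgeSummary[] = [];
  for (const path of paths) {
    nodes.push(...path.nodes);
    edges.push(...path.edges);
  }
  return { nodes: uniqueById(nodes), edges: uniqueById(edges) };
}
```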
122
frontend/src/pages/KnowledgeGraph/components/RelationDetail.tsx
Normal file
@@ -0,0 +1,122 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import { Drawer, Descriptions, Tag, Spin, Empty, message } from "antd";
|
||||
import type { RelationVO } from "../knowledge-graph.model";
|
||||
import {
|
||||
ENTITY_TYPE_LABELS,
|
||||
ENTITY_TYPE_COLORS,
|
||||
DEFAULT_ENTITY_COLOR,
|
||||
RELATION_TYPE_LABELS,
|
||||
} from "../knowledge-graph.const";
|
||||
import * as api from "../knowledge-graph.api";
|
||||
|
||||
interface RelationDetailProps {
|
||||
graphId: string;
|
||||
relationId: string | null;
|
||||
open: boolean;
|
||||
onClose: () => void;
|
||||
onEntityNavigate: (entityId: string) => void;
|
||||
}
|
||||
|
||||
export default function RelationDetail({
|
||||
graphId,
|
||||
relationId,
|
||||
open,
|
||||
onClose,
|
||||
onEntityNavigate,
|
||||
}: RelationDetailProps) {
|
||||
const [relation, setRelation] = useState<RelationVO | null>(null);
|
||||
const [loading, setLoading] = useState(false);
|
||||
|
||||
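  useEffect(() => {
    if (!relationId || !graphId) {
      setRelation(null);
      setLoading(false);
      return;
    }
    if (!open) {
      setLoading(false);
      return;
    }

    // Ignore responses that arrive after the selection changed or the drawer
    // closed, so a slow request cannot overwrite newer data.
    let cancelled = false;

    setLoading(true);
    api
      .getRelation(graphId, relationId)
      .then((data) => {
        if (!cancelled) setRelation(data);
      })
      .catch(() => {
        if (!cancelled) message.error("加载关系详情失败");
      })
      .finally(() => {
        if (!cancelled) setLoading(false);
      });

    return () => {
      cancelled = true;
    };
  }, [graphId, relationId, open]);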
|
||||
|
||||
return (
|
||||
<Drawer title="关系详情" open={open} onClose={onClose} width={400}>
|
||||
<Spin spinning={loading}>
|
||||
{relation ? (
|
||||
<div className="flex flex-col gap-4">
|
||||
            <Descriptions column={1} size="small" bordered>
              <Descriptions.Item label="关系类型">
                <Tag color="blue">
                  {RELATION_TYPE_LABELS[relation.relationType] ?? relation.relationType}
                </Tag>
              </Descriptions.Item>
              <Descriptions.Item label="源实体">
                <div className="flex items-center gap-1.5">
                  <Tag
                    color={
                      ENTITY_TYPE_COLORS[relation.sourceEntityType] ?? DEFAULT_ENTITY_COLOR
                    }
                  >
                    {ENTITY_TYPE_LABELS[relation.sourceEntityType] ?? relation.sourceEntityType}
                  </Tag>
                  <a
                    className="text-blue-500 cursor-pointer hover:underline"
                    onClick={() => onEntityNavigate(relation.sourceEntityId)}
                  >
                    {relation.sourceEntityName}
                  </a>
                </div>
              </Descriptions.Item>
              <Descriptions.Item label="目标实体">
                <div className="flex items-center gap-1.5">
                  <Tag
                    color={
                      ENTITY_TYPE_COLORS[relation.targetEntityType] ?? DEFAULT_ENTITY_COLOR
                    }
                  >
                    {ENTITY_TYPE_LABELS[relation.targetEntityType] ?? relation.targetEntityType}
                  </Tag>
                  <a
                    className="text-blue-500 cursor-pointer hover:underline"
                    onClick={() => onEntityNavigate(relation.targetEntityId)}
                  >
                    {relation.targetEntityName}
                  </a>
                </div>
              </Descriptions.Item>
              {relation.weight != null && (
                <Descriptions.Item label="权重">{relation.weight}</Descriptions.Item>
              )}
              {relation.confidence != null && (
                <Descriptions.Item label="置信度">
                  {(relation.confidence * 100).toFixed(0)}%
                </Descriptions.Item>
              )}
              {relation.createdAt && (
                <Descriptions.Item label="创建时间">{relation.createdAt}</Descriptions.Item>
              )}
            </Descriptions>

            {relation.properties && Object.keys(relation.properties).length > 0 && (
              <>
                <h4 className="font-medium text-sm">扩展属性</h4>
                <Descriptions column={1} size="small" bordered>
                  {Object.entries(relation.properties).map(([key, value]) => (
                    <Descriptions.Item key={key} label={key}>
                      {String(value)}
                    </Descriptions.Item>
                  ))}
                </Descriptions>
              </>
            )}
          </div>
        ) : !loading ? (
          <Empty description="选择一条边查看详情" />
        ) : null}
      </Spin>
    </Drawer>
  );
}
102
frontend/src/pages/KnowledgeGraph/components/SearchPanel.tsx
Normal file
@@ -0,0 +1,102 @@
import { useState, useCallback } from "react";
import { Input, List, Tag, Select, Empty } from "antd";
import { Search } from "lucide-react";
import type { SearchHitVO } from "../knowledge-graph.model";
import {
  ENTITY_TYPES,
  ENTITY_TYPE_LABELS,
  ENTITY_TYPE_COLORS,
  DEFAULT_ENTITY_COLOR,
} from "../knowledge-graph.const";

interface SearchPanelProps {
  graphId: string;
  results: SearchHitVO[];
  loading: boolean;
  onSearch: (graphId: string, query: string) => void;
  onResultClick: (entityId: string) => void;
  onClear: () => void;
}

export default function SearchPanel({
  graphId,
  results,
  loading,
  onSearch,
  onResultClick,
  onClear,
}: SearchPanelProps) {
  const [query, setQuery] = useState("");
  const [typeFilter, setTypeFilter] = useState<string | undefined>(undefined);

  const handleSearch = useCallback(
    (value: string) => {
      setQuery(value);
      if (!value.trim()) {
        onClear();
        return;
      }
      onSearch(graphId, value);
    },
    [graphId, onSearch, onClear]
  );

  const filteredResults = typeFilter
    ? results.filter((r) => r.type === typeFilter)
    : results;

  return (
    <div className="flex flex-col gap-3">
      <Input.Search
        placeholder="搜索实体名称..."
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        onSearch={handleSearch}
        allowClear
        onClear={() => {
          setQuery("");
          onClear();
        }}
        prefix={<Search className="w-4 h-4 text-gray-400" />}
        loading={loading}
      />

      <Select
        allowClear
        placeholder="按类型筛选"
        value={typeFilter}
        onChange={setTypeFilter}
        className="w-full"
        options={ENTITY_TYPES.map((t) => ({
          label: ENTITY_TYPE_LABELS[t] ?? t,
          value: t,
        }))}
      />

      {filteredResults.length > 0 ? (
        <List
          size="small"
          dataSource={filteredResults}
          renderItem={(item) => (
            <List.Item
              className="cursor-pointer hover:bg-gray-50 !px-2"
              onClick={() => onResultClick(item.id)}
            >
              <div className="flex items-center gap-2 w-full min-w-0">
                <Tag color={ENTITY_TYPE_COLORS[item.type] ?? DEFAULT_ENTITY_COLOR}>
                  {ENTITY_TYPE_LABELS[item.type] ?? item.type}
                </Tag>
                <span className="truncate font-medium text-sm">{item.name}</span>
                <span className="ml-auto text-xs text-gray-400 shrink-0">
                  {item.score.toFixed(2)}
                </span>
              </div>
            </List.Item>
          )}
        />
      ) : query && !loading ? (
        <Empty description="未找到匹配实体" image={Empty.PRESENTED_IMAGE_SIMPLE} />
      ) : null}
    </div>
  );
}
106
frontend/src/pages/KnowledgeGraph/graphConfig.ts
Normal file
@@ -0,0 +1,106 @@
import { ENTITY_TYPE_COLORS, DEFAULT_ENTITY_COLOR } from "./knowledge-graph.const";

/** Node count threshold above which performance optimizations kick in. */
export const LARGE_GRAPH_THRESHOLD = 200;

/** Create the G6 v5 graph options. */
export function createGraphOptions(container: HTMLElement) {
  return {
    container,
    autoFit: "view" as const,
    padding: 40,
    animation: true,
    layout: {
      type: "d3-force" as const,
      preventOverlap: true,
      link: {
        distance: 180,
      },
      charge: {
        strength: -400,
      },
      collide: {
        radius: 50,
      },
    },
    node: {
      type: "circle" as const,
      style: {
        size: (d: { data?: { type?: string } }) => {
          return d?.data?.type === "Dataset" ? 40 : 32;
        },
        fill: (d: { data?: { type?: string } }) => {
          const type = d?.data?.type ?? "";
          return ENTITY_TYPE_COLORS[type] ?? DEFAULT_ENTITY_COLOR;
        },
        stroke: "#fff",
        lineWidth: 2,
        labelText: (d: { data?: { label?: string } }) => d?.data?.label ?? "",
        labelFontSize: 11,
        labelFill: "#333",
        labelPlacement: "bottom" as const,
        labelOffsetY: 4,
        labelMaxWidth: 100,
        labelWordWrap: true,
        labelWordWrapWidth: 100,
        cursor: "pointer",
      },
      state: {
        selected: {
          stroke: "#1677ff",
          lineWidth: 3,
          shadowColor: "rgba(22, 119, 255, 0.4)",
          shadowBlur: 10,
          labelVisibility: "visible" as const,
        },
        highlighted: {
          stroke: "#faad14",
          lineWidth: 3,
          labelVisibility: "visible" as const,
        },
        dimmed: {
          opacity: 0.3,
        },
      },
    },
    edge: {
      type: "line" as const,
      style: {
        stroke: "#C2C8D5",
        lineWidth: 1,
        endArrow: true,
        endArrowSize: 6,
        labelText: (d: { data?: { label?: string } }) => d?.data?.label ?? "",
        labelFontSize: 10,
        labelFill: "#999",
        labelBackground: true,
        labelBackgroundFill: "#fff",
        labelBackgroundOpacity: 0.85,
        labelPadding: [2, 4],
        cursor: "pointer",
      },
      state: {
        selected: {
          stroke: "#1677ff",
          lineWidth: 2,
        },
        highlighted: {
          stroke: "#faad14",
          lineWidth: 2,
        },
        dimmed: {
          opacity: 0.15,
        },
      },
    },
    behaviors: [
      "drag-canvas",
      "zoom-canvas",
      "drag-element",
      {
        type: "click-select" as const,
        multiple: false,
      },
    ],
  };
}
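Note on `LARGE_GRAPH_THRESHOLD`: `createGraphOptions` always returns `animation: true`, so the downgrade for large graphs described in the commit message has to be applied by the caller. The sketch below shows one way that could look. `applyPerformanceMode` and `GraphOptionsLike` are hypothetical names used only for illustration; they are not part of this commit, and the real wiring may live elsewhere (for example in `GraphCanvas.tsx`).

```typescript
// Mirrors the constant exported from graphConfig.ts.
const LARGE_GRAPH_THRESHOLD = 200;

// Minimal structural stand-in for the object returned by createGraphOptions.
interface GraphOptionsLike {
  animation: boolean;
  [key: string]: unknown;
}

// Hypothetical helper (assumption, not in the commit): returns the options
// unchanged for small graphs, and a copy with animation disabled once the
// node count exceeds the threshold.
function applyPerformanceMode<T extends GraphOptionsLike>(options: T, nodeCount: number): T {
  if (nodeCount <= LARGE_GRAPH_THRESHOLD) {
    return options;
  }
  return { ...options, animation: false };
}

const small = applyPerformanceMode({ animation: true, padding: 40 }, 120);
const large = applyPerformanceMode({ animation: true, padding: 40 }, 350);
console.log(small.animation, large.animation); // true false
```

Returning a copy rather than mutating keeps the base options reusable when the node count later drops back under the threshold.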
77
frontend/src/pages/KnowledgeGraph/graphTransform.ts
Normal file
@@ -0,0 +1,77 @@
import type { EntitySummaryVO, EdgeSummaryVO, SubgraphVO } from "./knowledge-graph.model";
import { ENTITY_TYPE_COLORS, DEFAULT_ENTITY_COLOR, RELATION_TYPE_LABELS } from "./knowledge-graph.const";

export interface G6NodeData {
  id: string;
  data: {
    label: string;
    type: string;
    description?: string;
  };
  style?: Record<string, unknown>;
}

export interface G6EdgeData {
  id: string;
  source: string;
  target: string;
  data: {
    label: string;
    relationType: string;
    weight?: number;
  };
}

export interface G6GraphData {
  nodes: G6NodeData[];
  edges: G6EdgeData[];
}

export function entityToG6Node(entity: EntitySummaryVO): G6NodeData {
  return {
    id: entity.id,
    data: {
      label: entity.name,
      type: entity.type,
      description: entity.description,
    },
  };
}

export function edgeToG6Edge(edge: EdgeSummaryVO): G6EdgeData {
  return {
    id: edge.id,
    source: edge.sourceEntityId,
    target: edge.targetEntityId,
    data: {
      label: RELATION_TYPE_LABELS[edge.relationType] ?? edge.relationType,
      relationType: edge.relationType,
      weight: edge.weight,
    },
  };
}

export function subgraphToG6Data(subgraph: SubgraphVO): G6GraphData {
  return {
    nodes: subgraph.nodes.map(entityToG6Node),
    edges: subgraph.edges.map(edgeToG6Edge),
  };
}

/** Merge new subgraph data into existing graph data, avoiding duplicates. */
export function mergeG6Data(existing: G6GraphData, incoming: G6GraphData): G6GraphData {
  const nodeIds = new Set(existing.nodes.map((n) => n.id));
  const edgeIds = new Set(existing.edges.map((e) => e.id));

  const newNodes = incoming.nodes.filter((n) => !nodeIds.has(n.id));
  const newEdges = incoming.edges.filter((e) => !edgeIds.has(e.id));

  return {
    nodes: [...existing.nodes, ...newNodes],
    edges: [...existing.edges, ...newEdges],
  };
}

export function getEntityColor(type: string): string {
  return ENTITY_TYPE_COLORS[type] ?? DEFAULT_ENTITY_COLOR;
}
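To make the merge semantics of `mergeG6Data` concrete, here is a small standalone sketch of the same id-based deduplication. The `NodeLike`, `EdgeLike`, and `GraphDataLike` types and the sample ids are simplified stand-ins invented for this example; only the filtering logic is taken from the function above.

```typescript
// Simplified stand-ins for G6NodeData / G6EdgeData / G6GraphData.
interface NodeLike { id: string; label: string }
interface EdgeLike { id: string; source: string; target: string }
interface GraphDataLike { nodes: NodeLike[]; edges: EdgeLike[] }

// Same logic as mergeG6Data: keep everything already present, append only
// incoming items whose id has not been seen yet.
function mergeGraphData(existing: GraphDataLike, incoming: GraphDataLike): GraphDataLike {
  const nodeIds = new Set(existing.nodes.map((n) => n.id));
  const edgeIds = new Set(existing.edges.map((e) => e.id));
  return {
    nodes: [...existing.nodes, ...incoming.nodes.filter((n) => !nodeIds.has(n.id))],
    edges: [...existing.edges, ...incoming.edges.filter((e) => !edgeIds.has(e.id))],
  };
}

// Graph currently on screen.
const onCanvas: GraphDataLike = {
  nodes: [{ id: "ds-1", label: "orders" }, { id: "field-1", label: "order_id" }],
  edges: [{ id: "e-1", source: "ds-1", target: "field-1" }],
};

// Result of expanding "ds-1": overlaps with what is already shown.
const expansion: GraphDataLike = {
  nodes: [{ id: "field-1", label: "order_id (renamed)" }, { id: "field-2", label: "amount" }],
  edges: [
    { id: "e-1", source: "ds-1", target: "field-1" },
    { id: "e-2", source: "ds-1", target: "field-2" },
  ],
};

const merged = mergeGraphData(onCanvas, expansion);
console.log(merged.nodes.map((n) => n.id)); // [ 'ds-1', 'field-1', 'field-2' ]
console.log(merged.edges.map((e) => e.id)); // [ 'e-1', 'e-2' ]
```

One consequence of this approach is that existing entries win on conflict: the renamed `field-1` in the expansion is dropped, so the canvas keeps the older label until the graph is reloaded.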
141
frontend/src/pages/KnowledgeGraph/hooks/useGraphData.ts
Normal file
@@ -0,0 +1,141 @@
import { useState, useCallback, useRef } from "react";
import { message } from "antd";
import type { SubgraphVO, SearchHitVO, EntitySummaryVO, EdgeSummaryVO } from "../knowledge-graph.model";
import type { G6GraphData } from "../graphTransform";
import { subgraphToG6Data, mergeG6Data } from "../graphTransform";
import * as api from "../knowledge-graph.api";

export interface UseGraphDataReturn {
  graphData: G6GraphData;
  loading: boolean;
  searchResults: SearchHitVO[];
  searchLoading: boolean;
  highlightedNodeIds: Set<string>;
  loadSubgraph: (graphId: string, entityIds: string[], depth?: number) => Promise<void>;
  expandNode: (graphId: string, entityId: string, depth?: number) => Promise<void>;
  searchEntities: (graphId: string, query: string) => Promise<void>;
  loadInitialData: (graphId: string) => Promise<void>;
  mergePathData: (nodes: EntitySummaryVO[], edges: EdgeSummaryVO[]) => void;
  clearGraph: () => void;
  clearSearch: () => void;
}

export default function useGraphData(): UseGraphDataReturn {
  const [graphData, setGraphData] = useState<G6GraphData>({ nodes: [], edges: [] });
  const [loading, setLoading] = useState(false);
  const [searchResults, setSearchResults] = useState<SearchHitVO[]>([]);
  const [searchLoading, setSearchLoading] = useState(false);
  const [highlightedNodeIds, setHighlightedNodeIds] = useState<Set<string>>(new Set());
  const abortRef = useRef<AbortController | null>(null);

  const loadInitialData = useCallback(async (graphId: string) => {
    setLoading(true);
    try {
      const entities = await api.listEntitiesPaged(graphId, { page: 0, size: 100 });
      const entityIds = entities.content.map((e) => e.id);
      if (entityIds.length === 0) {
        setGraphData({ nodes: [], edges: [] });
        return;
      }
      const subgraph: SubgraphVO = await api.getSubgraph(graphId, { entityIds }, { depth: 1 });
      setGraphData(subgraphToG6Data(subgraph));
    } catch {
      message.error("加载图谱数据失败");
    } finally {
      setLoading(false);
    }
  }, []);

  const loadSubgraph = useCallback(async (graphId: string, entityIds: string[], depth = 1) => {
    setLoading(true);
    try {
      const subgraph = await api.getSubgraph(graphId, { entityIds }, { depth });
      setGraphData(subgraphToG6Data(subgraph));
    } catch {
      message.error("加载子图失败");
    } finally {
      setLoading(false);
    }
  }, []);

  const expandNode = useCallback(
    async (graphId: string, entityId: string, depth = 1) => {
      setLoading(true);
      try {
        const subgraph = await api.getNeighborSubgraph(graphId, entityId, { depth, limit: 50 });
        const incoming = subgraphToG6Data(subgraph);
        setGraphData((prev) => mergeG6Data(prev, incoming));
      } catch {
        message.error("展开节点失败");
      } finally {
        setLoading(false);
      }
    },
    []
  );

  const searchEntitiesFn = useCallback(async (graphId: string, query: string) => {
    if (!query.trim()) {
      setSearchResults([]);
      setHighlightedNodeIds(new Set());
      return;
    }
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;
    setSearchLoading(true);
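    try {
      const result = await api.searchEntities(graphId, { q: query, size: 20 }, { signal: controller.signal });
      setSearchResults(result.content);
      setHighlightedNodeIds(new Set(result.content.map((h) => h.id)));
    } catch {
      // Aborted requests are expected here; note that other failures are swallowed too.
    } finally {
      // Only clear the spinner if no newer search has replaced this request;
      // otherwise an aborted request would hide the loading state of its successor.
      if (abortRef.current === controller) {
        setSearchLoading(false);
      }
    }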
  }, []);

  const clearGraph = useCallback(() => {
    setGraphData({ nodes: [], edges: [] });
    setSearchResults([]);
    setHighlightedNodeIds(new Set());
  }, []);

  const clearSearch = useCallback(() => {
    setSearchResults([]);
    setHighlightedNodeIds(new Set());
  }, []);

  const mergePathData = useCallback(
    (nodes: EntitySummaryVO[], edges: EdgeSummaryVO[]) => {
      if (nodes.length === 0) {
        setHighlightedNodeIds(new Set());
        return;
      }
      const pathData = subgraphToG6Data({
        nodes,
        edges,
        nodeCount: nodes.length,
        edgeCount: edges.length,
      });
      setGraphData((prev) => mergeG6Data(prev, pathData));
      setHighlightedNodeIds(new Set(nodes.map((n) => n.id)));
    },
    []
  );

  return {
    graphData,
    loading,
    searchResults,
    searchLoading,
    highlightedNodeIds,
    loadSubgraph,
    expandNode,
    searchEntities: searchEntitiesFn,
    loadInitialData,
    mergePathData,
    clearGraph,
    clearSearch,
  };
}
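The search cancellation in `searchEntitiesFn` follows a "latest request wins" pattern: each new search aborts the previous `AbortController` and stores its own. A minimal sketch of that pattern outside React is shown below. `createLatestRequestTracker` is a hypothetical helper written for this note, not code from the commit; it assumes a runtime that provides the standard `AbortController` global.

```typescript
// Hypothetical helper illustrating the pattern used with abortRef in the hook.
function createLatestRequestTracker() {
  let current: AbortController | null = null;
  return {
    // Abort whatever was in flight and register a fresh controller.
    begin(): AbortController {
      current?.abort();
      current = new AbortController();
      return current;
    },
    // True only for the controller returned by the most recent begin().
    isLatest(controller: AbortController): boolean {
      return current === controller;
    },
  };
}

const tracker = createLatestRequestTracker();
const first = tracker.begin();  // user types "ord"
const second = tracker.begin(); // user types "order" before the first search returns

console.log(first.signal.aborted);     // true
console.log(tracker.isLatest(first));  // false
console.log(tracker.isLatest(second)); // true
```

Checking `isLatest` before touching shared state (results, loading flag) is what keeps a late or aborted response from overwriting the state of the newer request.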
61
frontend/src/pages/KnowledgeGraph/hooks/useGraphLayout.ts
Normal file
@@ -0,0 +1,61 @@
import { useState, useCallback } from "react";

export type LayoutType = "d3-force" | "circular" | "grid" | "radial" | "concentric";

interface LayoutConfig {
  type: LayoutType;
  [key: string]: unknown;
}

const LAYOUT_CONFIGS: Record<LayoutType, LayoutConfig> = {
  "d3-force": {
    type: "d3-force",
    preventOverlap: true,
    link: { distance: 180 },
    charge: { strength: -400 },
    collide: { radius: 50 },
  },
  circular: {
    type: "circular",
    radius: 250,
  },
  grid: {
    type: "grid",
    rows: undefined,
    cols: undefined,
    sortBy: "type",
  },
  radial: {
    type: "radial",
    unitRadius: 120,
    preventOverlap: true,
    nodeSpacing: 30,
  },
  concentric: {
    type: "concentric",
    preventOverlap: true,
    nodeSpacing: 30,
  },
};

export const LAYOUT_OPTIONS: { label: string; value: LayoutType }[] = [
  { label: "力导向", value: "d3-force" },
  { label: "环形", value: "circular" },
  { label: "网格", value: "grid" },
  { label: "径向", value: "radial" },
  { label: "同心圆", value: "concentric" },
];

export default function useGraphLayout() {
  const [layoutType, setLayoutType] = useState<LayoutType>("d3-force");

  const getLayoutConfig = useCallback((): LayoutConfig => {
    return LAYOUT_CONFIGS[layoutType] ?? LAYOUT_CONFIGS["d3-force"];
  }, [layoutType]);

  return {
    layoutType,
    setLayoutType,
    getLayoutConfig,
  };
}
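The hook only returns a layout config; something still has to hand it to the graph instance. A rough sketch of that step follows, assuming the instance exposes `setLayout` and `layout` the way the G6 v5 `Graph` class does. `LayoutCapableGraph` and `applyLayout` are hypothetical names introduced here, and the structural interface is used so the sketch does not depend on `@antv/g6` itself.

```typescript
type LayoutType = "d3-force" | "circular" | "grid" | "radial" | "concentric";

interface LayoutConfig {
  type: LayoutType;
  [key: string]: unknown;
}

// Structural subset of a graph instance (assumption: matches the setLayout /
// layout methods of the G6 v5 Graph class).
interface LayoutCapableGraph {
  setLayout(config: LayoutConfig): void;
  layout(): Promise<void>;
}

// Hypothetical helper: swap the layout config, then re-run the layout and
// wait for it to settle.
async function applyLayout(graph: LayoutCapableGraph, config: LayoutConfig): Promise<void> {
  graph.setLayout(config);
  await graph.layout();
}
```

In a component this would typically run in an effect keyed on `layoutType`, passing the value returned by `getLayoutConfig()`.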
148
frontend/src/pages/KnowledgeGraph/knowledge-graph.api.ts
Normal file
@@ -0,0 +1,148 @@
import { get, post, del, put } from "@/utils/request";
import type {
  GraphEntity,
  SubgraphVO,
  RelationVO,
  SearchHitVO,
  PagedResponse,
  PathVO,
  AllPathsVO,
} from "./knowledge-graph.model";

const BASE = "/knowledge-graph";

// ---- Entity ----

export function getEntity(graphId: string, entityId: string): Promise<GraphEntity> {
  return get(`${BASE}/${graphId}/entities/${entityId}`);
}

export function listEntities(
  graphId: string,
  params?: { type?: string; keyword?: string }
): Promise<GraphEntity[]> {
  return get(`${BASE}/${graphId}/entities`, params ?? null);
}

export function listEntitiesPaged(
  graphId: string,
  params: { type?: string; keyword?: string; page?: number; size?: number }
): Promise<PagedResponse<GraphEntity>> {
  return get(`${BASE}/${graphId}/entities`, params);
}

export function createEntity(
  graphId: string,
  data: { name: string; type: string; description?: string; properties?: Record<string, unknown> }
): Promise<GraphEntity> {
  return post(`${BASE}/${graphId}/entities`, data);
}

export function updateEntity(
  graphId: string,
  entityId: string,
  data: { name?: string; type?: string; description?: string; properties?: Record<string, unknown> }
): Promise<GraphEntity> {
  return put(`${BASE}/${graphId}/entities/${entityId}`, data);
}

export function deleteEntity(graphId: string, entityId: string): Promise<void> {
  return del(`${BASE}/${graphId}/entities/${entityId}`);
}

// ---- Relation ----

export function getRelation(graphId: string, relationId: string): Promise<RelationVO> {
  return get(`${BASE}/${graphId}/relations/${relationId}`);
}

export function listRelations(
  graphId: string,
  params?: { type?: string; page?: number; size?: number }
): Promise<PagedResponse<RelationVO>> {
  return get(`${BASE}/${graphId}/relations`, params ?? null);
}

export function createRelation(
  graphId: string,
  data: {
    sourceEntityId: string;
    targetEntityId: string;
    relationType: string;
    properties?: Record<string, unknown>;
    weight?: number;
    confidence?: number;
  }
): Promise<RelationVO> {
  return post(`${BASE}/${graphId}/relations`, data);
}

export function updateRelation(
  graphId: string,
  relationId: string,
  data: { relationType?: string; properties?: Record<string, unknown>; weight?: number; confidence?: number }
): Promise<RelationVO> {
  return put(`${BASE}/${graphId}/relations/${relationId}`, data);
}

export function deleteRelation(graphId: string, relationId: string): Promise<void> {
  return del(`${BASE}/${graphId}/relations/${relationId}`);
}

export function listEntityRelations(
  graphId: string,
  entityId: string,
  params?: { direction?: string; type?: string; page?: number; size?: number }
): Promise<PagedResponse<RelationVO>> {
  return get(`${BASE}/${graphId}/entities/${entityId}/relations`, params ?? null);
}

// ---- Query ----

export function getNeighborSubgraph(
  graphId: string,
  entityId: string,
  params?: { depth?: number; limit?: number }
): Promise<SubgraphVO> {
  return get(`${BASE}/${graphId}/query/neighbors/${entityId}`, params ?? null);
}

export function getSubgraph(
  graphId: string,
  data: { entityIds: string[] },
  params?: { depth?: number }
): Promise<SubgraphVO> {
  return post(`${BASE}/${graphId}/query/subgraph/export?depth=${params?.depth ?? 1}`, data);
}

export function getShortestPath(
  graphId: string,
  params: { sourceId: string; targetId: string; maxDepth?: number }
): Promise<PathVO> {
  return get(`${BASE}/${graphId}/query/shortest-path`, params);
}

export function getAllPaths(
  graphId: string,
  params: { sourceId: string; targetId: string; maxDepth?: number; maxPaths?: number }
): Promise<AllPathsVO> {
  return get(`${BASE}/${graphId}/query/all-paths`, params);
}

export function searchEntities(
  graphId: string,
  params: { q: string; page?: number; size?: number },
  options?: { signal?: AbortSignal }
): Promise<PagedResponse<SearchHitVO>> {
  return get(`${BASE}/${graphId}/query/search`, params, options);
}

// ---- Neighbors (entity controller) ----

export function getEntityNeighbors(
  graphId: string,
  entityId: string,
  params?: { depth?: number; limit?: number }
): Promise<GraphEntity[]> {
  return get(`${BASE}/${graphId}/entities/${entityId}/neighbors`, params ?? null);
}
46
frontend/src/pages/KnowledgeGraph/knowledge-graph.const.ts
Normal file
@@ -0,0 +1,46 @@
/** Entity type -> display color mapping */
export const ENTITY_TYPE_COLORS: Record<string, string> = {
  Dataset: "#5B8FF9",
  Field: "#5AD8A6",
  User: "#F6BD16",
  Org: "#E86452",
  Workflow: "#6DC8EC",
  Job: "#945FB9",
  LabelTask: "#FF9845",
  KnowledgeSet: "#1E9493",
};

/** Default color for unknown entity types */
export const DEFAULT_ENTITY_COLOR = "#9CA3AF";

/** Relation type -> Chinese label mapping */
export const RELATION_TYPE_LABELS: Record<string, string> = {
  HAS_FIELD: "包含字段",
  DERIVED_FROM: "来源于",
  USES_DATASET: "使用数据集",
  PRODUCES: "产出",
  ASSIGNED_TO: "分配给",
  BELONGS_TO: "属于",
  TRIGGERS: "触发",
  DEPENDS_ON: "依赖",
  IMPACTS: "影响",
  SOURCED_FROM: "知识来源",
};

/** Entity type -> Chinese label mapping */
export const ENTITY_TYPE_LABELS: Record<string, string> = {
  Dataset: "数据集",
  Field: "字段",
  User: "用户",
  Org: "组织",
  Workflow: "工作流",
  Job: "作业",
  LabelTask: "标注任务",
  KnowledgeSet: "知识集",
};

/** Available entity types for filtering */
export const ENTITY_TYPES = Object.keys(ENTITY_TYPE_LABELS);

/** Available relation types for filtering */
export const RELATION_TYPES = Object.keys(RELATION_TYPE_LABELS);
81
frontend/src/pages/KnowledgeGraph/knowledge-graph.model.ts
Normal file
@@ -0,0 +1,81 @@
export interface GraphEntity {
  id: string;
  name: string;
  type: string;
  description?: string;
  labels?: string[];
  aliases?: string[];
  properties?: Record<string, unknown>;
  sourceId?: string;
  sourceType?: string;
  graphId: string;
  confidence?: number;
  createdAt?: string;
  updatedAt?: string;
}

export interface EntitySummaryVO {
  id: string;
  name: string;
  type: string;
  description?: string;
}

export interface EdgeSummaryVO {
  id: string;
  sourceEntityId: string;
  targetEntityId: string;
  relationType: string;
  weight?: number;
}

export interface SubgraphVO {
  nodes: EntitySummaryVO[];
  edges: EdgeSummaryVO[];
  nodeCount: number;
  edgeCount: number;
}

export interface RelationVO {
  id: string;
  sourceEntityId: string;
  sourceEntityName: string;
  sourceEntityType: string;
  targetEntityId: string;
  targetEntityName: string;
  targetEntityType: string;
  relationType: string;
  properties?: Record<string, unknown>;
  weight?: number;
  confidence?: number;
  sourceId?: string;
  graphId: string;
  createdAt?: string;
}

export interface SearchHitVO {
  id: string;
  name: string;
  type: string;
  description?: string;
  score: number;
}

export interface PagedResponse<T> {
  page: number;
  size: number;
  totalElements: number;
  totalPages: number;
  content: T[];
}

export interface PathVO {
  nodes: EntitySummaryVO[];
  edges: EdgeSummaryVO[];
  pathLength: number;
}

export interface AllPathsVO {
  paths: PathVO[];
  pathCount: number;
}
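`PagedResponse<T>` carries `totalPages`, but `loadInitialData` only requests page 0 with `size: 100`. If a caller ever needs more than the first page, a loop along the following lines could walk the remaining pages. This is only a sketch: `collectAllPages` and its `maxPages` cap are hypothetical, and zero-based page numbering is assumed from the `page: 0` call in `useGraphData.ts`.

```typescript
// Same shape as PagedResponse<T> in knowledge-graph.model.ts.
interface PagedResponse<T> {
  page: number;
  size: number;
  totalElements: number;
  totalPages: number;
  content: T[];
}

// Hypothetical helper: fetch pages 0, 1, 2, ... until the server reports no
// more pages or the safety cap is reached.
async function collectAllPages<T>(
  fetchPage: (page: number) => Promise<PagedResponse<T>>,
  maxPages = 10
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 0; page < maxPages; page++) {
    const response = await fetchPage(page);
    items.push(...response.content);
    if (page + 1 >= response.totalPages) {
      break;
    }
  }
  return items;
}
```

With the API module from this commit, `fetchPage` could be `(page) => listEntitiesPaged(graphId, { page, size: 100 })`. The cap matters for the browser: without it, a very large graph would pull every entity before anything renders.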
@@ -10,6 +10,7 @@ import {
  Shield,
  Sparkles,
  ListChecks,
  Network,
  // Database,
  // Store,
  // Merge,
@@ -56,6 +57,14 @@ export const menuItems = [
    description: "管理知识集与知识条目",
    color: "bg-indigo-500",
  },
  {
    id: "knowledge-graph",
    title: "知识图谱",
    icon: Network,
    permissionCode: PermissionCodes.knowledgeGraphRead,
    description: "知识图谱浏览与探索",
    color: "bg-teal-500",
  },
  {
    id: "task-coordination",
    title: "任务协调",
@@ -55,6 +55,7 @@ import ContentGenerationPage from "@/pages/ContentGeneration/ContentGenerationPa
import LoginPage from "@/pages/Login/LoginPage";
import ProtectedRoute from "@/components/ProtectedRoute";
import ForbiddenPage from "@/pages/Forbidden/ForbiddenPage";
import KnowledgeGraphPage from "@/pages/KnowledgeGraph/Home/KnowledgeGraphPage";

const router = createBrowserRouter([
  {
@@ -287,6 +288,10 @@ const router = createBrowserRouter([
          },
        ],
      },
      {
        path: "knowledge-graph",
        Component: withErrorBoundary(KnowledgeGraphPage),
      },
      {
        path: "task-coordination",
        children: [
@@ -18,6 +18,11 @@ export default defineConfig({
    //   "Origin, X-Requested-With, Content-Type, Accept",
    // },
    proxy: {
      "^/knowledge-graph": {
        target: "http://localhost:8080",
        changeOrigin: true,
        secure: false,
      },
      "^/api": {
        target: "http://localhost:8080", // local backend service address
        changeOrigin: true,