Files

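q792602257 9b6ff59a11 feat(kg): implement Phase 3.3 performance optimization

Core features:
- Neo4j index optimization (entityType, graphId, properties.name)
- Redis cache (Java side, 3 cache regions, configurable TTL)
- LRU cache (Python side, KG + Embedding, thread-safe)
- Fine-grained cache eviction (graphId prefix matching)
- Cache eviction on failure paths (finally blocks)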

New files (Java side, 7):
- V2__PerformanceIndexes.java - Flyway migration that creates 3 indexes
- IndexHealthService.java - index health monitoring
- RedisCacheConfig.java - Spring Cache + Redis configuration
- GraphCacheService.java - cache eviction manager
- CacheableIntegrationTest.java - integration tests (10 tests)
- GraphCacheServiceTest.java - unit tests (19 tests)
- V2__PerformanceIndexesTest.java, IndexHealthServiceTest.java

New files (Python side, 2):
- cache.py - in-memory TTL+LRU cache (cachetools)
- test_cache.py - unit tests (20 tests)

Modified files (Java side, 9):
- GraphEntityService.java - add @Cacheable and cache eviction
- GraphQueryService.java - add @Cacheable (including the user permission context)
- GraphRelationService.java - add cache eviction
- GraphSyncService.java - add cache eviction (finally blocks, failure paths)
- KnowledgeGraphProperties.java - add the Cache configuration class
- application-knowledgegraph.yml - add Redis and cache TTL settings
- GraphEntityServiceTest.java - add verify(cacheService) assertions
- GraphRelationServiceTest.java - add verify(cacheService) assertions
- GraphSyncServiceTest.java - add tests for cache eviction on failure paths

Modified files (Python side, 5):
- kg_client.py - integrate the cache (fulltext_search, get_subgraph)
- interface.py - add the /cache/stats and /cache/clear endpoints
- config.py - add cache configuration fields
- pyproject.toml - add the cachetools dependency
- test_kg_client.py - add the _disable_cache fixture

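Security fixes (3 rounds of iteration):
- P0: per-user isolation of cache keys (prevents cross-user data leaks)
- P1-1: cache eviction after sync sub-steps (18 methods)
- P1-2: search cache eviction after entity creation
- P1-3: cache eviction on failure paths (finally blocks)
- P2-1: fine-grained cache eviction (graphId prefix matching, avoids flushing other graphs)
- P2-2: add verify(cacheService) assertions to service-layer tests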
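
The P0, P1-3, and P2-1 items above can be illustrated with a short Python sketch. It is not the code from this commit: the key layout graph_id:user_id:query and the names make_key, evict_graph, and sync_graph are assumptions made up for illustration, and a plain dict stands in for the real cache.

```python
from typing import Any, Callable


def make_key(graph_id: str, user_id: str, query: str) -> str:
    """Build a cache key scoped to one graph and one user (hypothetical layout)."""
    # Putting the user id in the key keeps one user's cached result
    # from being served to another user who runs the same query.
    return f"{graph_id}:{user_id}:{query}"


def evict_graph(cache: dict[str, Any], graph_id: str) -> int:
    """Remove every entry whose key starts with this graph's prefix."""
    # The trailing ":" stops "g1" from also matching keys of graph "g10".
    prefix = f"{graph_id}:"
    stale = [key for key in cache if key.startswith(prefix)]
    for key in stale:
        del cache[key]
    return len(stale)


def sync_graph(cache: dict[str, Any], graph_id: str, do_sync: Callable[[], None]) -> None:
    """Run a sync step and evict the graph's entries even when the step fails."""
    try:
        do_sync()
    finally:
        # The finally block covers the failure path: a half-applied sync
        # must not leave stale entries behind.
        evict_graph(cache, graph_id)
```

Evicting by prefix leaves the entries of other graphs untouched, which is the point of the P2-1 item.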

Test results:
- Java: 280 tests pass ✅ (270 → 280, +10 new)
- Python: 154 tests pass ✅ (140 → 154, +14 new)

Cache configuration:
- kg:entities - entity cache, TTL 1h
- kg:queries - query result cache, TTL 5min
- kg:search - full-text search cache, TTL 3min
- KG cache (Python) - 256 entries, 5min TTL
- Embedding cache (Python) - 512 entries, 10min TTL
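
To make the Python-side settings concrete, here is a minimal standard-library sketch of a thread-safe TTL+LRU cache. The commit itself uses cachetools; the TTLLRUCache class below is a hypothetical stand-in written for illustration, and only the sizes and TTLs come from the list above.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """Bounded cache: entries expire after `ttl` seconds, LRU entry evicted when full."""

    def __init__(self, maxsize: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._maxsize = maxsize
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._data: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            expires_at, value = item
            if self._clock() >= expires_at:
                del self._data[key]  # expired: drop it and report a miss
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


# Sizes and TTLs taken from the configuration list above.
kg_cache = TTLLRUCache(maxsize=256, ttl=5 * 60)
embedding_cache = TTLLRUCache(maxsize=512, ttl=10 * 60)
```

A single lock around every operation keeps the sketch simple. Note that a miss and a cached None are indistinguishable here, so a real implementation would likely use a sentinel.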

2026-02-20 18:28:33 +08:00

app

feat(kg): implement Phase 3.3 performance optimization

2026-02-20 18:28:33 +08:00

deploy

feat(auto-annotation): integrate YOLO auto-labeling and enhance data management (#223)

2026-01-05 14:22:44 +08:00

examples

feat: Enhance file tag update functionality with automatic format conversion (#84)

2025-11-14 12:42:39 +08:00

.env.example

feat: File and Annotation 2-way sync implementation (#63)

2025-11-07 15:03:07 +08:00

.gitignore

feat: File and Annotation 2-way sync implementation (#63)

2025-11-07 15:03:07 +08:00

poetry.lock

feat(runtime): add the Pillow image processing library dependency

2026-02-06 13:21:01 +08:00

pyproject.toml

feat(kg): implement Phase 3.3 performance optimization

2026-02-20 18:28:33 +08:00

README.md

docs: update README and Makefile for clarity and new development instructions (#147)

2025-12-10 12:25:25 +08:00

uvicorn_start.sh

feat: File and Annotation 2-way sync implementation (#63)

2025-11-07 15:03:07 +08:00

README.md

DataMate Python Service (DataMate)

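This is the Python service of DataMate. It handles data synthesis, data annotation, and data evaluation for DataMate.

Overview

Framework: FastAPI
Async database/ORM: SQLAlchemy (async)
Database migrations: Alembic
Server: uvicorn

Quick start (development)

Prerequisites

Python 3.11+
poetry package manager

Clone the repository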

git clone git@github.com:ModelEngine-Group/DataMate.git

cd DataMate/runtime/datamate-python

Install dependencies. The project manages its dependencies with poetry, so install them with:

poetry install

Or install directly with pip (if poetry is not available):

pip install -e .

Configure environment variables. Copy the example environment file and adjust it:

cp .env.example .env

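Edit the .env file and set the required environment variables, such as the database connection and the Label Studio configuration.

Run the database migrations (development environment):

alembic upgrade head

Start the development server (examples and common options):

Local development (default host, port 18000, auto-reload):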

set -a && source .env && set +a && poetry run uvicorn app.main:app --port 18000 --log-level debug --reload

Or

poetry run python -m app.main

Specify the host and port and enable debug logging:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload --log-level debug

Use multiple workers in production (without --reload):

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4 --log-level info --proxy-headers

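Start with environment variables set (example; the uvicorn CLI reads UVICORN_HOST and UVICORN_PORT rather than plain HOST and PORT, which are only passed through to the application's environment):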

HOST=0.0.0.0 PORT=8000 uvicorn app.main:app --reload

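Notes:

--reload is for development only. It watches for file changes and restarts the process; do not use it in production.
--workers provides concurrent request handling but increases memory usage; in production it is usually combined with a process manager or container orchestration (Kubernetes).
For a full production deployment, consider running under a process manager (for example gunicorn with uvicorn workers, or plain uvicorn in a container with process management).

Access the API documentation (adjust the port to the one the server was started with):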

Swagger UI: http://127.0.0.1:8000/docs
ReDoc: http://127.0.0.1:8000/redoc (recommended)

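Developing new features

Add a dependency (replace xxx with the package name):

poetry add xxx

Usage (brief)

All API paths are registered under the /api prefix (see app.include_router(api_router, prefix="/api") in app/main.py).
The root path / returns service information and links to the documentation.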

For more details, see doc/usage.md (API usage) and doc/development.md (development notes).