Core features:
- Neo4j index optimization (entityType, graphId, properties.name)
- Redis cache (Java side, 3 cache regions, configurable TTL)
- LRU cache (Python side, KG + Embedding, thread-safe)
- Fine-grained cache eviction (graphId prefix matching)
- Cache eviction on failure paths (finally block)

New files (Java side, 7):
- V2__PerformanceIndexes.java - Flyway migration, creates 3 indexes
- IndexHealthService.java - index health monitoring
- RedisCacheConfig.java - Spring Cache + Redis configuration
- GraphCacheService.java - cache eviction manager
- CacheableIntegrationTest.java - integration tests (10 tests)
- GraphCacheServiceTest.java - unit tests (19 tests)
- V2__PerformanceIndexesTest.java, IndexHealthServiceTest.java

New files (Python side, 2):
- cache.py - in-memory TTL+LRU cache (cachetools)
- test_cache.py - unit tests (20 tests)

Modified files (Java side, 9):
- GraphEntityService.java - added @Cacheable and cache eviction
- GraphQueryService.java - added @Cacheable (key includes the user permission context)
- GraphRelationService.java - added cache eviction
- GraphSyncService.java - added cache eviction (finally block, failure path)
- KnowledgeGraphProperties.java - added Cache configuration class
- application-knowledgegraph.yml - added Redis and cache TTL configuration
- GraphEntityServiceTest.java - added verify(cacheService) assertions
- GraphRelationServiceTest.java - added verify(cacheService) assertions
- GraphSyncServiceTest.java - added failure-path cache eviction tests

Modified files (Python side, 5):
- kg_client.py - integrated the cache (fulltext_search, get_subgraph)
- interface.py - added /cache/stats and /cache/clear endpoints
- config.py - added cache configuration fields
- pyproject.toml - added the cachetools dependency
- test_kg_client.py - added the _disable_cache fixture

Security fixes (3 iterations):
- P0: per-user isolation of cache keys (prevents cross-user data leakage)
- P1-1: cache eviction after sync sub-steps (18 methods)
- P1-2: search cache eviction after entity creation
- P1-3: cache eviction on failure paths (finally block)
- P2-1: fine-grained cache eviction (graphId prefix matching, avoids flushing other graphs)
- P2-2: added verify(cacheService) assertions to service-layer tests

Test results:
- Java: 280 tests pass ✅ (270 → 280, +10 new)
- Python: 154 tests pass ✅ (140 → 154, +14 new)

Cache configuration:
- kg:entities - entity cache, TTL 1h
- kg:queries - query result cache, TTL 5min
- kg:search - full-text search cache, TTL 3min
- KG cache (Python) - 256 entries, 5min TTL
- Embedding cache (Python) - 512 entries, 10min TTL
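The Python-side cache above combines three ideas: a size bound with LRU eviction, a per-entry TTL, and eviction of all entries for one graph by key prefix. The following is a minimal, standard-library-only sketch of those ideas. It is not the contents of cache.py, which the summary says is built on cachetools. The class name `TTLLRUCache`, the helpers `make_key` and `graph_prefix`, and the `graphId:userId:operation:args` key layout are all assumptions made for illustration.

```python
import threading
import time
from collections import OrderedDict


class TTLLRUCache:
    """Thread-safe cache with a size bound (LRU) and a per-entry TTL."""

    def __init__(self, maxsize, ttl, clock=time.monotonic):
        self._maxsize = maxsize
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._data = OrderedDict()  # key -> (expires_at, value)
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return default
            expires_at, value = item
            if self._clock() >= expires_at:
                del self._data[key]  # expired: drop lazily on access
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key, value):
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used

    def evict_prefix(self, prefix):
        """Remove every entry whose key starts with prefix; return the count."""
        with self._lock:
            doomed = [k for k in self._data if k.startswith(prefix)]
            for k in doomed:
                del self._data[k]
            return len(doomed)


def make_key(graph_id, user_id, operation, *args):
    """Hypothetical key layout: graph first (for prefix eviction), then user."""
    return ":".join([graph_id, user_id, operation, *map(str, args)])


def graph_prefix(graph_id):
    """Prefix shared by every key of one graph (assumes ids contain no ':')."""
    return graph_id + ":"


# Usage: sizes taken from the cache configuration above (256 entries, 5 min).
kg_cache = TTLLRUCache(maxsize=256, ttl=300)
kg_cache.set(make_key("g1", "alice", "fulltext_search", "neo4j"), ["hit"])
kg_cache.set(make_key("g2", "alice", "fulltext_search", "neo4j"), ["other"])
kg_cache.evict_prefix(graph_prefix("g1"))  # removes only the g1 entry
```

Putting the user id in the key means two users never share a cached result, which is the property the P0 fix describes. Putting the graph id first, followed by a separator, lets a write to one graph evict that graph's entries without flushing the others.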
DataMate All-in-One Data Work Platform
DataMate is an enterprise-grade data processing platform for model fine-tuning and retrieval-augmented generation (RAG). Its core functions are data collection, data management, an operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.
If you like this project, please give it a Star⭐️!
🌟 Core Features
- Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
- Visual Orchestration: Drag-and-drop data processing workflow design.
- Operator Ecosystem: Rich built-in operators and support for custom operators.
🚀 Quick Start
Prerequisites
- Git (for pulling source code)
- Make (for building and installing)
- Docker (for building images and deploying services)
- Docker-Compose (for service deployment - Docker method)
- Kubernetes (for service deployment - k8s method)
- Helm (for service deployment - k8s method)
This project supports two deployment methods: docker-compose and helm. When you run a deployment command (for example, make install), you are prompted to choose a method; enter the corresponding number. The prompt looks like this:
Choose a deployment method:
1. Docker/Docker-Compose
2. Kubernetes/Helm
Enter choice:
Clone the Code
git clone git@github.com:ModelEngine-Group/DataMate.git
cd DataMate
Deploy the basic services
make install
If make is not installed on your machine, run the following commands to deploy instead:
# Windows (cmd.exe)
set REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d
# Linux/Mac
export REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d
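The two platform-specific variants above differ only in how the REGISTRY environment variable is set. As a rough cross-platform alternative, the same steps can be sketched in Python. The compose file paths and registry value are copied from the commands above; the function names `compose_up_commands` and `deploy` and the `dry_run` switch are illustrative and not part of the project.

```python
import os
import subprocess

# Paths and registry value copied from the manual commands above.
COMPOSE_FILES = [
    "./deployment/docker/datamate/docker-compose.yml",
    "./deployment/docker/milvus/docker-compose.yml",
]
REGISTRY = "ghcr.io/modelengine-group/"


def compose_up_commands(files=COMPOSE_FILES):
    """Build one `docker compose ... up -d` argument list per compose file."""
    return [["docker", "compose", "-f", path, "up", "-d"] for path in files]


def deploy(dry_run=True):
    """Run the commands with REGISTRY set, or only print them when dry_run."""
    env = dict(os.environ, REGISTRY=REGISTRY)  # same effect as set/export
    for cmd in compose_up_commands():
        if dry_run:
            print("REGISTRY=" + REGISTRY, " ".join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)  # stop on first failure


if __name__ == "__main__":
    deploy(dry_run=True)  # switch to dry_run=False to actually deploy
```

Run it from the repository root so the relative compose paths resolve.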
Once the containers are running, open http://localhost:30000 in a browser to view the front-end interface.
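Containers can take a while to start, so the page may not load immediately. A small polling script is one way to wait for the front end before opening the browser. This is a sketch using only the standard library; the helper names `is_reachable` and `wait_until_up`, the retry count, and the delay are arbitrary choices, not project defaults.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url without a server error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        return err.code < 500  # the server answered, e.g. with a 404
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or not speaking HTTP yet


def wait_until_up(url, attempts=30, delay=2.0, probe=is_reachable, sleep=time.sleep):
    """Probe url up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:30000")` returns True as soon as the front end responds, or False after roughly a minute with the values shown.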
To list all available Make targets, flags and help text, run:
make help
Build and deploy MinerU Enhanced PDF Processing
make build-mineru
make install-mineru
Deploy the DeerFlow service
make install-deer-flow
Local Development and Deployment
After modifying the code locally, run the following commands to build the images and deploy from the locally built images.
make build
make install dev=true
Uninstall
make uninstall
When running make uninstall, the installer will prompt once whether to delete volumes; that single choice is applied to all components. The uninstall order is: milvus -> label-studio -> datamate, which ensures the datamate network is removed cleanly after services that use it have stopped.
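The ordering rule above can be illustrated with a short sketch that plans teardown commands. It assumes, purely for illustration, that each component is removed with `docker compose down`; the real make uninstall target may work differently. The function `plan_uninstall` and its parameters are hypothetical. Only the component order and the single volume choice come from the description above.

```python
# Order taken from the description above: datamate goes last so its
# network is removed only after the services that use it have stopped.
UNINSTALL_ORDER = ("milvus", "label-studio", "datamate")


def plan_uninstall(compose_files, delete_volumes):
    """Return `docker compose down` commands in the documented order.

    compose_files maps a component name to its compose file path;
    components without an entry are skipped. delete_volumes is the
    single answer that is applied to every component.
    """
    commands = []
    for name in UNINSTALL_ORDER:
        path = compose_files.get(name)
        if path is None:
            continue
        cmd = ["docker", "compose", "-f", path, "down"]
        if delete_volumes:
            cmd.append("-v")  # also remove the volumes declared in the file
        commands.append(cmd)
    return commands


# Usage with the two compose files named earlier in this README.
plan = plan_uninstall(
    {
        "datamate": "./deployment/docker/datamate/docker-compose.yml",
        "milvus": "./deployment/docker/milvus/docker-compose.yml",
    },
    delete_volumes=False,
)
for cmd in plan:
    print(" ".join(cmd))
```

Whatever order the mapping is written in, the milvus command is emitted before the datamate one.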
🤝 Contribution Guidelines
Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it's submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.
- 📮 GitHub Issues: Submit bugs or feature suggestions.
- 🔧 GitHub Pull Requests: Contribute code improvements.
📄 License
DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.