实现功能: - Neo4j Docker Compose 配置(社区版,端口 7474/7687,数据持久化) - Makefile 新增 Neo4j 命令(neo4j-up/down/logs/shell) - knowledge-graph-service Spring Boot 服务(完整的 DDD 分层架构) - kg_extraction Python 模块(基于 LangChain LLMGraphTransformer) 技术实现: - Neo4j 配置:环境变量化密码,统一默认值 datamate123 - Java 服务: - Domain: GraphEntity, GraphRelation 实体模型 - Repository: Spring Data Neo4j,支持 graphId 范围查询 - Service: 业务逻辑,graphId 双重校验,查询限流 - Controller: REST API,UUID 格式校验 - Exception: 实现 ErrorCode 接口,统一异常体系 - Python 模块: - KnowledgeGraphExtractor 类 - 支持异步/同步/批量抽取 - 支持 schema-guided 模式 - 兼容 OpenAI 及自部署模型 关键设计: - graphId 权限边界:所有实体操作都在正确的 graphId 范围内 - 查询限流:depth 和 limit 参数受配置约束 - 异常处理:统一使用 BusinessException + ErrorCode - 凭据管理:环境变量化,避免硬编码 - 双重防御:Controller 格式校验 + Service 业务校验 代码审查: - 经过 3 轮 Codex 审查和 2 轮 Claude 修复 - 所有 P0 和 P1 问题已解决 - 编译通过,无阻塞性问题 文件变更: - 新增:Neo4j 配置、knowledge-graph-service(11 个 Java 文件)、kg_extraction(3 个 Python 文件) - 修改:Makefile、pom.xml、application.yml、pyproject.toml
DataMate All-in-One Data Work Platform
DataMate is an enterprise-level data processing platform for model fine-tuning and RAG retrieval, supporting core functions such as data collection, data management, operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.
If you like this project, please give it a Star⭐️!
🌟 Core Features
- Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
- Visual Orchestration: Drag-and-drop data processing workflow design.
- Operator Ecosystem: Rich built-in operators and support for custom operators.
🚀 Quick Start
Prerequisites
- Git (for pulling source code)
- Make (for building and installing)
- Docker (for building images and deploying services)
- Docker-Compose (for service deployment - Docker method)
- Kubernetes (for service deployment - k8s method)
- Helm (for service deployment - k8s method)
This project supports deployment via two methods: docker-compose and helm. After executing the command, please enter the corresponding number for the deployment method. The command echo is as follows:
Choose a deployment method:
1. Docker/Docker-Compose
2. Kubernetes/Helm
Enter choice:
Clone the Code
git clone git@github.com:ModelEngine-Group/DataMate.git
cd DataMate
Deploy the basic services
make install
If the machine you are using does not have make installed, please run the following command to deploy it:
# Windows
set REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d
# Linux/Mac
export REGISTRY=ghcr.io/modelengine-group/
docker compose -f ./deployment/docker/datamate/docker-compose.yml up -d
docker compose -f ./deployment/docker/milvus/docker-compose.yml up -d
Once the container is running, access http://localhost:30000 in a browser to view the front-end interface.
To list all available Make targets, flags and help text, run:
make help
Build and deploy Mineru Enhanced PDF Processing
make build-mineru
make install-mineru
Deploy the DeerFlow service
make install-deer-flow
Local Development and Deployment
After modifying the local code, please execute the following commands to build the image and deploy using the local image.
make build
make install dev=true
Uninstall
make uninstall
When running make uninstall, the installer will prompt once whether to delete volumes; that single choice is applied to all components. The uninstall order is: milvus -> label-studio -> datamate, which ensures the datamate network is removed cleanly after services that use it have stopped.
🤝 Contribution Guidelines
Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it's submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.
• 📮 GitHub Issues: Submit bugs or feature suggestions.
• 🔧 GitHub Pull Requests: Contribute code improvements.
📄 License
DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.