* refactor: 修改调整数据归集实现,删除无用代码,优化代码结构 * feature: 每天凌晨00:00扫描所有数据集,检查数据集是否超过了预设的保留天数,超出保留天数的数据集调用删除接口进行删除 * fix: 修改删除数据集文件的逻辑,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录 * fix: 增加参数校验和接口定义,删除不使用的接口 * fix: 数据集统计数据默认为0 * feature: 数据集状态增加流转,创建时为草稿状态,上传文件或者归集文件后修改为活动状态 * refactor: 修改分页查询归集任务的代码 * fix: 更新后重新执行;归集任务执行增加事务控制 * feature: 创建归集任务时能够同步创建数据集,更新归集任务时能更新到指定数据集
DataMate All-in-One Data Work Platform
DataMate is an enterprise-level data processing platform for model fine-tuning and RAG retrieval, supporting core functions such as data collection, data management, operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.
If you like this project, please give it a Star⭐️!
🌟 Core Features
- Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
- Visual Orchestration: Drag-and-drop data processing workflow design.
- Operator Ecosystem: Rich built-in operators and support for custom operators.
🚀 Quick Start
Prerequisites
- Git (for pulling source code)
- Make (for building and installing)
- Docker (for building images and deploying services)
- Docker-Compose (for service deployment - Docker method)
- Kubernetes (for service deployment - k8s method)
- Helm (for service deployment - k8s method)
Clone the Code
git clone git@github.com:ModelEngine-Group/DataMate.git
cd DataMate
Build Images
make build
Docker Installation
make install INSTALLER=docker
Kubernetes Installation
make install INSTALLER=k8s
🤝 Contribution Guidelines
Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it's submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.
• 📮 GitHub Issues: Submit bugs or feature suggestions.
• 🔧 GitHub Pull Requests: Contribute code improvements.
📄 License
DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.