DataMate

Author	SHA1	Message	Date
Dallas98	8fc4455b57	Configuration file changes (#186 ) * feat(generation_service): add image URL extraction and random QA generation logic * fix(generation_service): increase batch size from 20 to 100 for improved chunk processing * fix(generation_service): increase batch size from 20 to 100 for improved chunk processing	2025-12-22 09:29:00 +08:00
Dallas98	85eb5a99ba	feat(generation_service): add image URL extraction and random QA generation logic (#182 )	2025-12-19 17:36:00 +08:00
Dallas98	d70a3eda0d	feat(generation_service): add document filtering to remove short documents based on chunk size (#180 ) * fix(chart): update Helm chart helpers and values for improved configuration * feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths * feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthDataDetail): add delete action for chunks with confirmation prompt * feat(SynthDataDetail): update edit and delete buttons to icon-only format * feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion * feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection * feat(DataSynthesis): refactor data synthesis models and update task handling logic * feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic * feat(DataSynthesis): refactor data synthesis models and update task handling logic * fix(generation_service): ensure processed chunks are incremented regardless of question generation success * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic * feat(generation_service): add document filtering to remove short documents based on chunk size	2025-12-19 09:34:02 +08:00
Dallas98	27b1cc8e09	feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178 ) * fix(chart): update Helm chart helpers and values for improved configuration * feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths * feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthDataDetail): add delete action for chunks with confirmation prompt * feat(SynthDataDetail): update edit and delete buttons to icon-only format * feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion * feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection * feat(DataSynthesis): refactor data synthesis models and update task handling logic * feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic * feat(DataSynthesis): refactor data synthesis models and update task handling logic * fix(generation_service): ensure processed chunks are incremented regardless of question generation success * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic	2025-12-18 19:19:54 +08:00
Dallas98	e0e9b1d94d	feat: optimize the question generation process and CoT data generation (#169 ) * fix(chart): update Helm chart helpers and values for improved configuration * feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths * feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthDataDetail): add delete action for chunks with confirmation prompt * feat(SynthDataDetail): update edit and delete buttons to icon-only format * feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion * feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection * feat(DataSynthesis): refactor data synthesis models and update task handling logic * feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic * feat(DataSynthesis): refactor data synthesis models and update task handling logic * fix(generation_service): ensure processed chunks are incremented regardless of question generation success * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options * feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options	2025-12-18 16:51:18 +08:00
Dallas98	ec87e4f204	feat(frontend): improve the UX of the Synthesis Data Detail page (#163 ) * fix(chart): update Helm chart helpers and values for improved configuration * feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths * feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthFileTask): enhance file display with progress tracking and delete action * feat(SynthDataDetail): add delete action for chunks with confirmation prompt * feat(SynthDataDetail): update edit and delete buttons to icon-only format * feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion	2025-12-11 21:02:44 +08:00
hefanli	8f529952f6	Fix ratio (#162 ) * fix: fixed the issue where an error would be reported when only setting the proportioning quantity when creating a proportioning task * fix: prevent adding the same file multiple times * fix: implement a more flexible matching strategy, allowing only the tag name to be configured for matching	2025-12-11 17:45:16 +08:00
Dallas98	2f3ae21f8a	feat: enhance dataset file fetching with improved pagination and document loading support (#156 )	2025-12-10 22:39:24 +08:00
hefanli	99fd46cb70	fix: fix the Data Evaluation Detail page (#154 ) * fix: the Data Evaluation Detail Page should show the model used * fix: fix the time format displayed * fix: fix the Data Evaluation Detail page	2025-12-10 18:35:29 +08:00
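hefanli	f87060490c	feature: data management supports nested folders (#150 ) * fix: mount the storage the backend-python service needs in k8s deployments * fix: add an interface definition for copy-free dataset files * fix: initialize evaluation results to an empty value so the API does not error before evaluation completes * feature: data management supports nested folders (displayed following the file system layout; batch downloads include relative paths) * fix: remove redundant file renaming logic * refactor: remove unused imports	2025-12-10 16:42:45 +08:00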
Dallas98	015e738a7f	feat(SynthDataDetail): add chunk/synthesis data management with edit/delete & UI enhancements (#139 ) * feat(synthesis): add evaluation task creation functionality and UI enhancements * feat(synthesis): implement synthesis data management features including loading, editing, and deleting * feat(synthesis): add endpoints for deleting and updating synthesis data and chunks * fix: Correctly extract file values from selectedFilesMap in AddDataDialog	2025-12-09 09:59:40 +08:00
hefanli	744d15ba24	fix: fix read errors caused by malformed JSON in model output during evaluation (#133 ) * feature: add cot data evaluation function * fix: added verification to evaluation results * fix: fix the prompt for evaluating * fix: fix read failures when the evaluation result is empty	2025-12-04 18:49:50 +08:00
Dallas98	31c4966608	feat(synthesis): add functionality to archive synthesis tasks to existing datasets (#132 )	2025-12-04 17:11:43 +08:00
Dallas98	7012a9ad98	feat: enhance backend deployment, frontend file selection and synthesis task management (#129 ) * feat: Implement data synthesis task management with database models and API endpoints * feat: Update Python version requirements and refine dependency constraints in configuration * fix: Correctly extract file values from selectedFilesMap in AddDataDialog * feat: Refactor synthesis task routes and enhance file task management in the API * feat: Enhance SynthesisTaskTab with tooltip actions and add chunk data retrieval in API	2025-12-04 09:57:13 +08:00
hefanli	1d19cd3a62	feature: add data-evaluation * feature: add evaluation task management function * feature: add evaluation task detail page * fix: delete duplicate definition for table t_model_config * refactor: rename package synthesis to ratio * refactor: add eval file table and refactor related code * fix: calling large models in parallel during evaluation	2025-12-04 09:23:54 +08:00
Dallas98	8b164cb012	feat: Implement data synthesis task management with database models and API endpoints (#122 )	2025-12-02 15:23:58 +08:00
hefanli	c1352ab91f	feature: multiple ratio configurations can be set for the data set. (#103 ) feature: multiple ratio configurations can be set for the data set.	2025-11-24 15:28:17 +08:00
hefanli	cddfe9b149	feature: allow data ratio to be configured by update time (#95 ) * feature: allow data ratio to be configured by update time * fix: fix passing of the ratio time parameter	2025-11-20 18:50:51 +08:00
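hefanli	a07fba23f2	feature: dataset import supports selecting a collection task as the source (#92 ) * feature: implement OBS collection * feature: add handling for files with duplicate names in a dataset * feature: allow selecting a collection task when importing data into a dataset from the frontend	2025-11-19 11:05:33 +08:00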
Jason Wang	df853a5177	feat: Enhance file tag update functionality with automatic format conversion (#84 ) - Updated `update_file_tags` to support both simplified and full tag formats. - Introduced `TagFormatConverter` to handle conversion from simplified external tags to internal storage format. - Added logic to fetch and utilize the appropriate annotation template for conversion. - Improved error handling for missing templates and unknown controls during tag updates. - Created example script demonstrating the usage of the new tag format conversion feature. - Added unit tests for `TagFormatConverter` to ensure correct functionality and edge case handling.	2025-11-14 12:42:39 +08:00
Jason Wang	45743f39f5	feat: add labeling template. refactor: switch to Poetry, build and deploy of backend Python (#79 ) * feat: Enhance annotation module with template management and validation - Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support. - Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates. - Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation. - Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats. - Updated database schema for annotation templates and labeling projects to include new fields and constraints. - Seeded initial annotation templates for various use cases including image classification, object detection, and text classification. * feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support * feat: Update docker-compose.yml to mark datamate dataset volume and network as external * feat: Add tag configuration management and related components - Introduced new components for tag selection and browsing in the frontend. - Added API endpoint to fetch tag configuration from the backend. - Implemented tag configuration management in the backend, including loading from YAML. - Enhanced template service to support dynamic tag rendering based on configuration. - Updated validation utilities to incorporate tag configuration checks. - Refactored existing code to utilize the new tag configuration structure. 
* feat: Refactor LabelStudioTagConfig for improved configuration loading and validation * feat: Update Makefile to include backend-python-docker-build in the build process * feat: Migrate to poetry for better deps management * Add pyyaml dependency and update Dockerfile to use Poetry for dependency management - Added pyyaml (>=6.0.3,<7.0.0) to pyproject.toml dependencies. - Updated Dockerfile to install Poetry and manage dependencies using it. - Improved layer caching by copying only dependency files before the application code. - Removed unnecessary installation of build dependencies to keep the final image size small. * feat: Remove duplicated backend-python-docker-build target from Makefile * fix: airflow is not ready for adding yet * feat: update Python version to 3.12 and remove project installation step in Dockerfile	2025-11-13 15:32:30 +08:00
Jason Wang	c5ccc56cca	feat: Add labeling template (#72 ) * feat: Enhance annotation module with template management and validation - Added DatasetMappingCreateRequest and DatasetMappingUpdateRequest schemas to handle dataset mapping requests with camelCase and snake_case support. - Introduced Annotation Template schemas including CreateAnnotationTemplateRequest, UpdateAnnotationTemplateRequest, and AnnotationTemplateResponse for managing annotation templates. - Implemented AnnotationTemplateService for creating, updating, retrieving, and deleting annotation templates, including validation of configurations and XML generation. - Added utility class LabelStudioConfigValidator for validating Label Studio configurations and XML formats. - Updated database schema for annotation templates and labeling projects to include new fields and constraints. - Seeded initial annotation templates for various use cases including image classification, object detection, and text classification. * feat: Enhance TemplateForm with improved validation and dynamic field rendering; update LabelStudio config validation for camelCase support * feat: Update docker-compose.yml to mark datamate dataset volume and network as external	2025-11-11 09:14:14 +08:00
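Vincent	60e2289019	fix: fix ratio task operation issues (#66 ) * fix: ratio tasks must be able to jump to the target dataset * feature: add a ratio task detail endpoint * fix: remove the nonexistent ratio detail page * fix: use the production logic to display tags * fix: remove the extra "-" from parameter default values * fix: fix ratio task operations	2025-11-07 19:01:45 +08:00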
Jason Wang	78f50ea520	feat: File and Annotation 2-way sync implementation (#63 ) * feat: Refactor configuration and sync logic for improved dataset handling and logging * feat: Enhance annotation synchronization and dataset file management - Added new fields `tags_updated_at` to `DatasetFiles` model for tracking the last update time of tags. - Implemented new asynchronous methods in the Label Studio client for fetching, creating, updating, and deleting task annotations. - Introduced bidirectional synchronization for annotations between DataMate and Label Studio, allowing for flexible data management. - Updated sync service to handle annotation conflicts based on timestamps, ensuring data integrity during synchronization. - Enhanced dataset file response model to include tags and their update timestamps. - Modified database initialization script to create a new column for `tags_updated_at` in the dataset files table. - Updated requirements to ensure compatibility with the latest dependencies.	2025-11-07 15:03:07 +08:00
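Vincent	1686f56641	fix: ratio tasks can jump to the target dataset (#59 ) * fix: ratio tasks must be able to jump to the target dataset * feature: add a ratio task detail endpoint * fix: remove the nonexistent ratio detail page	2025-11-06 12:16:20 +08:00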
Jason Wang	b5fe787c20	feat: Labeling Frontend adaptations + Backend build and deploy + Logging improvement (#55 ) * feat: Front-end data annotation page adaptation to the backend API. * feat: Implement labeling configuration editor and enhance annotation task creation form * feat: add python backend build and deployment; add backend configuration for Label Studio integration and improve logging setup * refactor: remove duplicate log configuration	2025-11-05 01:55:53 +08:00
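hefanli	08bd4eca5c	feature: add data ratio functionality (#52 ) * refactor: rework the data collection implementation, remove dead code, and improve code structure * feature: scan all datasets daily at 00:00, check whether each has exceeded its configured retention period, and delete those that have via the delete API * fix: change the dataset file deletion logic: for files uploaded to a dataset, delete both the database record and the file on the file system; for collected files, delete only the database record * fix: add parameter validation and interface definitions, remove unused interfaces * fix: default dataset statistics to 0 * feature: add dataset status transitions: draft on creation, active after files are uploaded or collected * refactor: rework the paginated query for collection tasks * fix: re-run after update; add transaction control to collection task execution * feature: allow creating a dataset together with a collection task, and updating a collection task to target a specified dataset * fix: do not raise an error when a collection task is created without creating a dataset * fix: fix dataset statistics not updating when files are deleted * feature: return the file tag distribution when querying dataset details * fix: skip analysis when tags is empty * fix: change status to ACTIVE * fix: change the tag parsing method * feature: implement creating, paginated querying, and deleting ratio tasks * feature: implement frontend interactions for creating, paginated querying, and deleting ratio tasks * fix: fix page errors caused by faulty progress calculation	2025-11-03 10:17:39 +08:00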
Jason Wang	ba0c69086a	feat: data annotation page adaptation to backend API. Improve labeling project creation module. * feat: data annotation page adaptation to the backend API. * feat: Implement labeling configuration editor and enhance annotation task creation form	2025-10-31 15:56:29 +08:00
Jason Wang	e0884ab048	Develop py update schema (#37 ) * feature: implement endpoints with multi-level response models * refactor: move `/health` and `/config` endpoints to system module, remove example from base schemas * refactor: remove unused get_standard_response_model()	2025-10-30 16:24:37 +08:00
Jason Wang	2f7341dc1f	refactor: Reorganize datamate-python (#34 ) refactor: Reorganize datamate-python (previously label-studio-adapter) into a DDD style structure.	2025-10-30 01:32:59 +08:00


80 Commits