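feat(data-management): extend document parsing to support DOC and DOCX formats

- Add constant definitions and support for the DOC and DOCX file types
- Extend file type validation from PDF only to PDF/DOC/DOCX
- Integrate Docx2txtLoader to parse Word documents
- Change error messages to Chinese descriptions to improve the user experience
- Refactor the file parsing method to support multiple document formats
- Record parser metadata to track which parsing tool was used
- Update file path validation and construction logic for the new file types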
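
The validation and parser-metadata changes described above are not part of the hunk shown on this page, so the following is only a minimal sketch of how they might fit together. The constant names, the `resolve_parser` helper, and the PDF parser label are all hypothetical; only the `Docx2txtLoader` name comes from the commit message.

```python
from pathlib import Path

# Hypothetical constants; the commit's real names are not shown in this diff.
PDF_FILE_TYPE = "pdf"
DOC_FILE_TYPE = "doc"
DOCX_FILE_TYPE = "docx"

# Parser label recorded in metadata for each supported file type.
# "Docx2txtLoader" is named in the commit message; the PDF label is a placeholder.
PARSER_BY_FILE_TYPE = {
    PDF_FILE_TYPE: "pdf-loader (placeholder)",
    DOC_FILE_TYPE: "Docx2txtLoader",
    DOCX_FILE_TYPE: "Docx2txtLoader",
}


def resolve_parser(file_name: str) -> dict:
    """Validate the file extension and return parser metadata for it."""
    file_type = Path(file_name).suffix.lower().lstrip(".")
    if file_type not in PARSER_BY_FILE_TYPE:
        supported = "/".join(t.upper() for t in PARSER_BY_FILE_TYPE)
        raise ValueError(
            f"Unsupported file type {file_type!r}; expected {supported}"
        )
    return {"file_type": file_type, "parser": PARSER_BY_FILE_TYPE[file_type]}
```

Keeping the supported types and their parser labels in one mapping means the validation check and the recorded metadata cannot drift apart when another format is added.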
@@ -2,20 +2,20 @@ from pydantic import BaseModel, Field
 
 
 class PdfTextExtractRequest(BaseModel):
-    dataset_id: str = Field(..., alias="datasetId", description="Dataset ID")
-    file_id: str = Field(..., alias="fileId", description="PDF file ID")
+    dataset_id: str = Field(..., alias="datasetId", description="数据集ID")
+    file_id: str = Field(..., alias="fileId", description="源文件ID")
 
     class Config:
         populate_by_name = True
 
 
 class PdfTextExtractResponse(BaseModel):
-    dataset_id: str = Field(..., alias="datasetId", description="Dataset ID")
-    source_file_id: str = Field(..., alias="sourceFileId", description="Source PDF file ID")
-    text_file_id: str = Field(..., alias="textFileId", description="Generated text file ID")
-    text_file_name: str = Field(..., alias="textFileName", description="Generated text file name")
-    text_file_path: str = Field(..., alias="textFilePath", description="Generated text file path")
-    text_file_size: int = Field(..., alias="textFileSize", description="Generated text file size")
+    dataset_id: str = Field(..., alias="datasetId", description="数据集ID")
+    source_file_id: str = Field(..., alias="sourceFileId", description="源文件ID")
+    text_file_id: str = Field(..., alias="textFileId", description="解析后的文本文件ID")
+    text_file_name: str = Field(..., alias="textFileName", description="解析后的文本文件名")
+    text_file_path: str = Field(..., alias="textFilePath", description="解析后的文本文件路径")
+    text_file_size: int = Field(..., alias="textFileSize", description="解析后的文本文件大小")
 
     class Config:
         populate_by_name = True
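
Both models pair camel-case aliases with `populate_by_name = True`. A small sketch of what that combination allows, assuming Pydantic v2 (where `populate_by_name` is the config key): the sketch uses `ConfigDict`, the v2 spelling of the `class Config` block in the diff, and the IDs `ds-1` and `f-1` are made-up example values.

```python
from pydantic import BaseModel, ConfigDict, Field


class PdfTextExtractRequest(BaseModel):
    # Same fields as in the diff; ConfigDict is the Pydantic v2 form of `class Config`.
    model_config = ConfigDict(populate_by_name=True)

    dataset_id: str = Field(..., alias="datasetId", description="数据集ID")
    file_id: str = Field(..., alias="fileId", description="源文件ID")


# Camel-case aliases, as a JSON client would send them.
by_alias = PdfTextExtractRequest.model_validate({"datasetId": "ds-1", "fileId": "f-1"})

# Snake-case field names, accepted because populate_by_name is enabled.
by_name = PdfTextExtractRequest(dataset_id="ds-1", file_id="f-1")

# Serialize back to the camel-case wire format.
payload = by_alias.model_dump(by_alias=True)
```

Only the `description` strings change in this commit, so the accepted and emitted field names stay the same for existing clients.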