* feature: add CoT data evaluation function
* fix: add verification of evaluation results
* fix: correct the evaluation prompt
* fix: fix read failure when the evaluation result is empty
* feature: implement endpoints with multi-level response models
* refactor: move `/health` and `/config` endpoints to system module, remove example from base schemas
* refactor: remove unused get_standard_response_model()