This table summarizes how widely used large language models perform on common benchmark leaderboards. The leaderboards cover several capabilities: human preference, knowledge and reasoning, mathematics, coding, and multimodal understanding.