Tennisatw的博客 - Blog of Tennisatw

大语言模型综合排行榜 - LLM Composite Rankings – 250914

发表于 - Posted on 2025/09/14 系列 - Series LLM排行榜 - LLM Leaderboard 字数 - Word count: 894 阅读时间 - Reading time ≈ 3 mins.

大语言模型综合排行榜 - LLM Composite Rankings – 250907

发表于 - Posted on 2025/09/07 系列 - Series LLM排行榜 - LLM Leaderboard 字数 - Word count: 422 阅读时间 - Reading time ≈ 2 mins.

本表格汇总了常用大语言模型在主流评测排行榜上的表现。评测范围涵盖：人类偏好（文字和视觉），知识与推理，数学能力，代码能力，和长文本推理。在整合各项评测结果的基础上，计算出综合排名。
This table compiles the performance of commonly used large language models on major benchmark leaderboards. Evaluation categories include human preference (text and vision), knowledge and reasoning, mathematics, coding, and long-context reasoning. An overall ranking is computed from the aggregated results.
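
The post does not spell out how the overall ranking is computed. One common approach, assumed here purely for illustration, is to rank the models within each category and then sort them by their mean rank. The model names, categories and scores below are hypothetical placeholders, not real leaderboard data.

```python
from statistics import mean

# Hypothetical scores: model -> {category: score}; higher is better.
# Placeholder numbers for illustration only, not real leaderboard results.
scores = {
    "model_a": {"text": 1450, "math": 88.0, "coding": 70.0},
    "model_b": {"text": 1420, "math": 92.0, "coding": 65.0},
    "model_c": {"text": 1400, "math": 80.0, "coding": 75.0},
}


def per_category_ranks(scores):
    """Rank models within each category (1 = best).

    Ties are not merged: sorted() is stable, so tied models keep
    their insertion order and receive consecutive ranks.
    """
    categories = next(iter(scores.values())).keys()
    ranks = {model: {} for model in scores}
    for cat in categories:
        ordered = sorted(scores, key=lambda m: scores[m][cat], reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks[model][cat] = position
    return ranks


def composite_ranking(scores):
    """Return (model, mean rank) pairs, best (lowest mean rank) first."""
    ranks = per_category_ranks(scores)
    avg = {model: mean(r.values()) for model, r in ranks.items()}
    return sorted(avg.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for model, avg_rank in composite_ranking(scores):
        print(f"{model}: mean rank {avg_rank:.2f}")
```

Averaging ranks rather than raw scores sidesteps the fact that categories use different scales (for example Elo-style ratings versus percentages); a real aggregation might instead normalize scores or weight categories differently.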

干涉花纹 - Interference Patterns

发表于 - Posted on 2025/09/02 编辑于 - Edited on 2025/09/07 字数 - Word count: 114 阅读时间 - Reading time ≈ 1 min.

LLM排行榜：25/08/31 - LLM Leaderboard: 25/08/31

发表于 - Posted on 2025/08/30 编辑于 - Edited on 2025/08/31 系列 - Series LLM排行榜 - LLM Leaderboard 字数 - Word count: 311 阅读时间 - Reading time ≈ 1 min.

本表格汇总了常用大语言模型在常用评测排行榜上的表现，并计算出综合排名。排行榜涵盖人类偏好、知识与推理能力、数学能力、代码能力等多个方面。
This table summarizes the performance of popular large language models on well-known benchmark leaderboards and combines the results into an overall ranking. The leaderboards cover a range of capabilities, including human preference, knowledge and reasoning, mathematics, and coding.

夏末 - Late Summer

发表于 - Posted on 2025/08/28 编辑于 - Edited on 2025/08/29 字数 - Word count: 104 阅读时间 - Reading time ≈ 1 min.

LLM排行榜：25/08/24 - LLM Leaderboard: 25/08/24

发表于 - Posted on 2025/08/24 编辑于 - Edited on 2025/08/30 系列 - Series LLM排行榜 - LLM Leaderboard 字数 - Word count: 322 阅读时间 - Reading time ≈ 1 min.

本表格汇总了常用大语言模型在常用评测榜单上的表现，整合评测结果，得到综合排名。榜单涵盖人类偏好、知识与推理能力、数学能力、代码能力等多个方面。
This table summarizes the performance of popular large language models on well-known benchmark leaderboards and combines the results into an overall ranking. The leaderboards cover a range of capabilities, including human preference, knowledge and reasoning, mathematics, and coding.

野湖边的野草 - Wild Grasses by a Wild Lake

发表于 - Posted on 2025/08/23 编辑于 - Edited on 2025/08/28 字数 - Word count: 1k 阅读时间 - Reading time ≈ 4 mins.

关于埃德蒙顿的野外湖边的一些常见的野草的记录
Notes on some common wild grasses found along the shores of wild lakes around Edmonton.

LLM排行榜及测评：25/08/17 - LLM Leaderboard and Evaluation: 25/08/17

发表于 - Posted on 2025/08/17 编辑于 - Edited on 2025/08/30 系列 - Series LLM排行榜 - LLM Leaderboard 字数 - Word count: 2.6k 阅读时间 - Reading time ≈ 9 mins.

本表格汇总了常用大语言模型在常用评测榜单上的表现。榜单涵盖人类偏好、知识与推理能力、数学能力、代码能力、多模态能力等多个方面。
This table summarizes the performance of popular large language models on well-known benchmark leaderboards. The leaderboards cover a range of capabilities, including human preference, knowledge and reasoning, mathematics, coding, and multimodal performance.

文章精读 - Paper Reading 2: Machine learning potentials for metal-organic frameworks using an incremental learning approach

发表于 - Posted on 2025/07/09 字数 - Word count: 2.3k 阅读时间 - Reading time ≈ 9 mins.

哥德尔不完备性定理 - Gödel's Incompleteness Theorems

发表于 - Posted on 2025/06/08 编辑于 - Edited on 2025/08/29 字数 - Word count: 4.1k 阅读时间 - Reading time ≈ 15 mins.

尝试用通俗易懂的方法证明哥德尔不完备性定理
An attempt to prove Gödel's incompleteness theorems in plain, accessible terms.