These benchmarks collectively encompass over 2.3 million questions across 45 languages and 172 medical specialties. Traditional knowledge-based benchmarks show saturation with leading models achieving ...