Assessing Large Language Models on climate information

Abstract

As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.

Publication
In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, PMLR 235.
Figure 2: Bootstrapped mean rating, and 95% confidence intervals, for all presentational (left) and epistemological (right) dimensions.
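The caption reports bootstrapped means with 95% confidence intervals. The sketch below shows one common way to compute such intervals (the percentile bootstrap) for a single dimension's ratings. It is a minimal illustration, not the authors' code: the function name `bootstrap_mean_ci`, the resampling of individual ratings (rather than, say, whole answers or raters), the number of resamples, and the example 1–5 ratings are all hypothetical assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(ratings, n_resamples=10_000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile-bootstrap mean and (1 - alpha) CI.

    Assumes each rating is an independent unit to resample; the paper's
    actual resampling unit is not stated here.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(ratings)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(ratings, k=n)  # draw n ratings with replacement
        means.append(statistics.fmean(sample))
    means.sort()

    def percentile(q):
        # Nearest-rank percentile over the sorted bootstrap means.
        idx = min(len(means) - 1, max(0, round(q * (len(means) - 1))))
        return means[idx]

    point = statistics.fmean(means)  # mean of the bootstrap means
    return point, (percentile(alpha / 2), percentile(1 - alpha / 2))


if __name__ == "__main__":
    # Hypothetical ratings for one dimension on an assumed 1-5 scale.
    ratings = [4, 3, 5, 2, 4, 4, 3, 5]
    mean, (lo, hi) = bootstrap_mean_ci(ratings)
    print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Resampling with replacement approximates the sampling distribution of the mean without assuming normality, which suits bounded, discrete rating scales; the 2.5th and 97.5th percentiles of the resampled means give the 95% interval.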

Niels G. Mede
Science Communication Researcher

I am a Senior Research and Teaching Associate at the Department of Communication and Media Research (IKMZ) of the University of Zurich, where I also completed a PhD in communication studies. My work focuses on science communication, digital media, public perceptions of science, threats and attacks against scholars, climate change communication, and survey methodology. In recent years, I have been a visiting researcher at the Department of Life Sciences Communication of the University of Wisconsin–Madison (2022), the Oxford Internet Institute at the University of Oxford (2023), and the Digital Media Research Centre at the Queensland University of Technology (2024).