WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Abdullah Mushtaq; Imran Taj; Rafay Naeem; Ibrahim Ghaznavi; Junaid Qadir

doi:10.1613/jair.1.19001

Published: Apr 24, 2026

Keywords: multiagent systems, natural language, philosophical foundations

Abdullah Mushtaq

Information Technology University, Department of Computer Science.

https://orcid.org/0009-0009-4267-8470

Imran Taj

Zayed University, College of Interdisciplinary Studies.

https://orcid.org/0009-0008-9062-025X

Rafay Naeem

Information Technology University, Department of Computer Science.

https://orcid.org/0009-0004-7869-7204

Ibrahim Ghaznavi

Information Technology University, Department of Computer Science.

https://orcid.org/0009-0005-3049-3330

Junaid Qadir

Qatar University, Computer Science and Engineering Department.

https://orcid.org/0000-0001-9466-2475

Abstract

Background: Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms, leading to cultural homogenization and limiting their ability to reflect global civilizational plurality. Existing benchmarking frameworks fail to capture this bias adequately because they rely on rigid, closed-form assessments that overlook the complexity of cultural inclusivity.

Objectives: To address this cultural bias problem, we introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews.

Methods: Our approach is grounded in the Multiplex Worldview proposed by Senturk et al., which distinguishes between Uniplex models, which reinforce cultural homogenization, and Multiplex models, which integrate diverse perspectives. WorldView-Bench measures Cultural Polarization, the exclusion of alternative perspectives, through free-form generative evaluation rather than conventional categorical benchmarks. We implement applied multiplexity through two intervention strategies: (1) Contextually-Implemented Multiplex LLMs, where system prompts embed multiplexity principles, and (2) Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents representing distinct cultural perspectives collaboratively generate responses.
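The MAS-implemented strategy can be pictured with a minimal sketch. It assumes each agent is a callable tied to one perspective and that the contributions are simply concatenated; the abstract does not specify the agent roles, prompts, or aggregation step, so the perspective names, `make_stub_agent`, and `multiplex_answer` are all hypothetical, and the stub stands in for a real LLM call.

```python
from typing import Callable, Dict

# An agent maps a question to a response. Here it is a stub; in practice
# it would wrap an LLM call with a perspective-specific system prompt.
Agent = Callable[[str], str]


def make_stub_agent(perspective: str) -> Agent:
    """Return a placeholder agent for one perspective (hypothetical helper)."""
    def agent(question: str) -> str:
        return f"[{perspective}] view on: {question}"
    return agent


def multiplex_answer(question: str, agents: Dict[str, Agent]) -> str:
    """Collect one contribution per agent and join them, one per line.

    Assumption: plain concatenation. The paper's actual collaboration
    mechanism may differ.
    """
    contributions = [f"{name}: {agent(question)}" for name, agent in agents.items()]
    return "\n".join(contributions)


# Placeholder perspective names, not taken from the paper.
agents = {name: make_stub_agent(name) for name in ("perspective_a", "perspective_b")}
answer = multiplex_answer("What makes a good life?", agents)
```

The point of the sketch is only the shape of the design: every perspective contributes to every answer, so no single worldview can crowd out the others by construction.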

Results: Our results demonstrate a significant increase in Perspectives Distribution Score (PDS) entropy from 13% at baseline to 94% with MAS-Implemented Multiplex LLMs, alongside a shift toward positive sentiment (67.7%) and enhanced cultural balance.
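To make the entropy figures concrete, here is a minimal sketch of one way such a score could be computed. It assumes PDS entropy is the Shannon entropy of the distribution of perspectives found in a response, normalized by its maximum so that it lies in [0, 1]; the abstract does not give the exact definition, and the function name and category labels below are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels, categories):
    """Shannon entropy of the label distribution over `categories`,
    divided by log(len(categories)) so the result lies in [0, 1].

    Labels that are not in `categories` are ignored.
    """
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in categories:
        p = counts[c] / total
        if p > 0:  # skip empty categories; p * log(p) tends to 0 as p -> 0
            entropy -= p * math.log(p)
    return entropy / math.log(len(categories))


# Placeholder category labels, not taken from the paper.
categories = ["A", "B", "C", "D"]
one_sided = normalized_entropy(["A"] * 8, categories)             # single perspective
balanced = normalized_entropy(["A", "B", "C", "D"] * 2, categories)  # evenly spread
```

Under this assumption, a response that draws on a single perspective scores 0 and one that spreads evenly across all perspectives scores 1, so a rise from 13% to 94% would indicate a shift from near-uniplex to near-uniform coverage.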

Conclusions: The success of multiplex-aware evaluation in WorldView-Bench demonstrates that cultural bias in LLMs can be meaningfully measured and mitigated through structured worldview diversity. We expect this to pave the way for more inclusive, globally representative, and ethically aligned AI systems.

Issue

Vol. 85 (2026)

Section

Articles
