AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

arXiv category: cs.CV

Statistics

Citations: 2
References: 63
Authors

Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh
Project Resources

ArXiv Paper (Paper) via arXiv
Semantic Scholar Paper (Paper) via Semantic Scholar
GitHub Repository (Code Repository) via GitHub
Abstract

Following the success of Large Language Models (LLMs), expanding their boundaries to new modalities represents a significant paradigm shift in multimodal understanding. Human perception is inherently multimodal, relying not only on text but also on auditory and visual cues for a complete understanding of the world. In recognition of this fact, audio-visual LLMs have recently emerged. Despite promising developments, the lack of dedicated benchmarks poses challenges for understanding and evaluating these models. In this work, we show that audio-visual LLMs struggle to discern subtle relationships between audio and visual signals, leading to hallucinations and highlighting the need for reliable benchmarks. To address this, we introduce AVHBench, the first comprehensive benchmark specifically designed to evaluate the perception and comprehension capabilities of audio-visual LLMs. Our benchmark includes tests for assessing hallucinations, as well as the cross-modal matching and reasoning abilities of these models. Our results reveal that most existing audio-visual LLMs struggle with hallucinations caused by cross-interactions between modalities, due to their limited capacity to perceive complex multimodal signals and their relationships. Additionally, we demonstrate that simple training with our AVHBench improves the robustness of audio-visual LLMs against hallucinations. Dataset: https://github.com/kaist-ami/AVHBench
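
The abstract describes evaluating audio-visual LLMs on hallucination, cross-modal matching, and reasoning tests. Below is a minimal sketch of how one might score a model on yes/no hallucination questions from such a benchmark; the file name, field names ("video_id", "question", "answer"), and the ask_model interface are illustrative assumptions, not the actual AVHBench schema or API, which should be taken from the GitHub repository.

```python
import json


def load_benchmark(json_path: str) -> list[dict]:
    """Load benchmark items from a JSON file (assumed schema, for illustration)."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def normalize(answer: str) -> str:
    """Map a free-form model response to a binary yes/no label."""
    return "yes" if answer.strip().lower().startswith("yes") else "no"


def evaluate(items: list[dict], ask_model) -> float:
    """Score a model on yes/no questions.

    `ask_model(video_id, question)` is a placeholder for whatever inference
    call an audio-visual LLM exposes; it should return a text response.
    """
    correct = 0
    for item in items:
        prediction = normalize(ask_model(item["video_id"], item["question"]))
        correct += int(prediction == item["answer"].strip().lower())
    return correct / len(items)


if __name__ == "__main__":
    # Trivial baseline that always answers "no", useful as a lower bound
    # before plugging in a real audio-visual LLM.
    items = load_benchmark("avhbench_qa.json")  # hypothetical file name
    print(f"Accuracy: {evaluate(items, lambda vid, q: 'no'):.3f}")
```

Reporting such a naive always-"no" baseline alongside model scores helps reveal whether apparent performance simply reflects an imbalanced yes/no answer distribution.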
