Abstract: Audio-visual scene classification (AVSC) aims to classify a video recording into one of a set of predefined scene categories, using both audio and visual modalities, which is a fundamental yet ...