AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
March 6, 2025
Authors: Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sein Kwon, Inpyo Hong, Sanghyun Park
cs.AI
Abstract
Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for each new environment, which requires machine-learning expertise, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes a customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects the frames of a video that contain the specified event. We implemented AnyAnomaly effectively using context-aware visual question answering, without fine-tuning the large vision-language model (LVLM). To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.
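The mechanism the abstract describes, scoring each frame by asking an LVLM whether a user-defined event is visible, can be sketched in a few lines of Python. The sketch below rests on assumptions not taken from the paper: it uses BLIP-2 (Salesforce/blip2-flan-t5-xl via Hugging Face transformers) as a stand-in LVLM, a plain yes/no prompt, and a fixed frame stride, whereas AnyAnomaly's context-aware visual question answering is more elaborate than this bare per-frame query.

# Minimal sketch of the C-VAD idea: score video frames by asking an
# off-the-shelf LVLM whether a user-defined anomalous event is present.
# Model choice, prompt wording, and frame stride are illustrative
# assumptions, not the paper's exact pipeline.
import cv2
import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xl")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl", torch_dtype=torch.float16, device_map="auto"
)

def frame_anomaly_scores(video_path, event_text, stride=8):
    """Yield (frame_index, score); score is 1.0 when the LVLM answers 'yes'."""
    question = f"Question: Is there {event_text} in this image? Answer:"
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes as BGR
            inputs = processor(images=rgb, text=question,
                               return_tensors="pt").to(model.device, torch.float16)
            out = model.generate(**inputs, max_new_tokens=5)
            answer = processor.batch_decode(out, skip_special_tokens=True)[0]
            yield idx, 1.0 if "yes" in answer.lower() else 0.0
        idx += 1
    cap.release()

# Example: flag frames matching a user-defined event text (hypothetical file).
# scores = dict(frame_anomaly_scores("surveillance.mp4", "a person riding a bicycle"))

Thresholding or temporally smoothing such per-frame scores then yields anomaly intervals; the paper's contribution lies in making this kind of zero-shot querying reliable through context-aware question answering rather than the naive single-question form shown here.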