With the rapid growth of video surveillance applications and services, the volume of surveillance video has become extremely "big", making human monitoring tedious and difficult. There is therefore a huge demand for smart surveillance techniques that can perform monitoring automatically or semi-automatically. A number of challenges have arisen in the area of big surveillance data analysis and processing. Firstly, with the huge amount of surveillance video in storage, video analysis tasks such as event detection, action recognition, and video summarization are of increasing importance in applications such as events-of-interest retrieval and abnormality detection. Secondly, semantic data (e.g., object trajectories and bounding boxes) has become an essential data type in surveillance systems, owing to the growth in its size and complexity, and has introduced new and challenging topics to the community, such as efficient semantic data processing and compression. Thirdly, with the rapid shift from static, centralized processing to dynamic computing among distributed video processing nodes/cameras, new challenges such as multi-camera analysis, person re-identification, and distributed video processing have emerged. To meet these challenges, there is a great need to extend existing approaches and to explore new feasible techniques.
This is the 3rd edition of our workshop. The first two editions were organized in conjunction with ICME 2019 (Shanghai, China) and ICME 2020 (London, UK).
This workshop is intended to provide a forum for researchers and engineers to present their latest innovations and share their experiences on all aspects of the design and implementation of new surveillance video analysis and processing techniques. Topics of interest include, but are not limited to:
Important Dates
Format Requirements & Templates
Submission Details
Time | Talk/Presentation
14.00-14.10 | Opening Remarks
14.10-15.00 | Invited Keynote: Toward Human-Level General Video Understanding, Yu Qiao (SIAT, CAS)
15.00-15.48 (12 mins per talk) | Track 1: Large-scale Surveillance Tasks
- Hierarchical Attention Image-Text Alignment Network for Person Re-Identification. Kajal Kansal (IIITD)*; A Subramanyam (IIITD); Zheng Wang (National Institute of Informatics); Shin'ichi Satoh (National Institute of Informatics)
- Cluster-based Distribution Alignment for Generalizable Person Re-identification. Chengzhang Zhu (Central South University); Zhe Chang (Central South University); Yalong Xiao (Central South University); Beiji Zou (Central South University); Bozhou Li (Central South University); Shu Liu (Central South University)*
- Deep4Air: A Novel Deep Learning Framework for Airport Airside Surveillance. Phat Van Thai (Nanyang Technological University)*; Sameer Alam (Nanyang Technological University); Nimrod Lilith (Nanyang Technological University); Phu Tran (Nanyang Technological University); Thanh Binh Nguyen (University of Science)
- Dense Point Prediction: A Simple Baseline for Crowd Counting and Localization. Yi Wang (Nanyang Technological University); Xinyu Hou (Nanyang Technological University); Lap-Pui Chau (Nanyang Technological University)*
15.48-16.36 (12 mins per talk) | Track 2: Detection, Tracking & Recognition for Surveillance
- A Dataset and Benchmark of Underwater Object Detection for Robot Picking. Chongwei Liu (Dalian University of Technology); Haojie Li (Dalian University of Technology); Shuchang Wang (Dalian University of Technology); Ming Zhu (Dalian University of Technology); Dong Wang (Dalian University of Technology); Xin Fan (Dalian University of Technology); Zhihui Wang (Dalian University of Technology)*
- Oriented Object Detection for Remote Sensing Images Based on Weakly Supervised Learning. Yongqing Sun (NTT, Japan); Ran Jie (Chongqing University of Posts and Telecommunications); Feng Yang (Chongqing Key Laboratory of Signal and Information Processing, Chongqing University of Posts and Telecommunications)*; Chenqiang Gao (Chongqing University of Posts and Telecommunications); Takayuki Kurozumi (NTT Media Intelligence Laboratories); Hideaki Kimata (NTT); Ziqi Ye (Chongqing University of Posts and Telecommunications)
- Multi-Object Tracking with Tracked Object Bounding Box Association. Nanyang Yang (Nanyang Technological University); Yi Wang (Nanyang Technological University); Lap-Pui Chau (Nanyang Technological University)*
- Generate and Adjust: A Novel Framework for Semi-supervised Pedestrian Attribute Recognition. Xuebo Shan (Peking University Shenzhen Graduate School)*; Peixi Peng (Peking University); Yunpeng Zhai (Peking University Shenzhen Graduate School); Chong Zhang (Peking University Shenzhen Graduate School); Tiejun Huang (Peking University); Yonghong Tian (Peking University)
16.36-16.48 | Short Break
16.48-17.36 (12 mins per talk) | Track 3: Complementary Topics to Surveillance
- Correcting Perspective Distortion in Incremental Video Stitching. Yinqi Chen (Jihua Lab); Huicheng Zheng (Sun Yat-sen University)*; Junyu Lin (Sun Yat-sen University)
- Topic-guided Local-global Graph Neural Network for Image Captioning. Jichao Kan (University of Sydney)*; Kun Hu (The University of Sydney); Zhiyong Wang (The University of Sydney); Qiuxia Wu (South China University of Technology, China); Markus Hagenbuchner (University of Wollongong, Australia); Ah Chung Tsoi (University of Wollongong)
- Adaptive Multi-Scale Semantic Fusion Network for Zero-Shot Learning. Jing Song (Peking University Shenzhen Graduate School)*; Peixi Peng (Peking University); Yunpeng Zhai (Peking University Shenzhen Graduate School); Chong Zhang (Peking University Shenzhen Graduate School); Yonghong Tian (Peking University)
- Global Feature Fusion Attention Network for Single Image Dehazing. Jie Luo (Northwest University); Qirong Bu (Northwest University)*; Lei Zhang (Northwest University); Jun Feng (Northwest University)
17.36-17.45 | Closing Remarks
Abstract: Video understanding is an important yet challenging problem in computer vision. Compared with images, videos comprise multiple frames with complex motions and dynamic structures. Recent years have witnessed significant progress in video classification, driven by deep learning models and larger video datasets. However, a clear gap remains between human-level understanding and state-of-the-art algorithms. This talk will summarize recent progress on video understanding from the perspectives of datasets, tasks, and models. We will also discuss future directions toward human-level General Video Understanding (GVU), including large video datasets with fine-grained tasks, more effective and efficient deep networks, and generalization to long-tail distributions.
Biodata: Yu Qiao is a Professor with the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, and the Shanghai AI Laboratory. His research interests include computer vision, deep learning, and bioinformation. He has published more than 180 papers in international journals and conferences, including T-PAMI, IJCV, T-IP, T-SP, CVPR, and ICCV. His h-index is 62, with 25,000+ citations on Google Scholar. He is a recipient of a distinguished paper award at AAAI 2021, the first prize of the Guangdong Technological Invention Award, and the Jiaxi Lv Young Researcher Award from the Chinese Academy of Sciences. His group was the first runner-up in scene recognition at the ImageNet Large Scale Visual Recognition Challenge 2015 and the winner in video classification at the ActivityNet Large Scale Activity Recognition Challenge 2016.
Please feel free to send any questions or comments to:
johnsee AT ieee.org, wylin AT sjtu.edu.cn, eddy.zhuxt AT gmail.com