3rd International Workshop on Big Surveillance Data Analysis and Processing (Big-SURV) @ ICME 2021

Introduction

With the rapid growth of video surveillance applications and services, the volume of surveillance video has become extremely large, which makes human monitoring tedious and difficult. There is therefore a huge demand for smart surveillance techniques that can perform monitoring automatically or semi-automatically. A number of challenges have arisen in the area of big surveillance data analysis and processing. First, with the huge amount of surveillance video in storage, video analysis tasks such as event detection, action recognition, and video summarization are of increasing importance in applications including events-of-interest retrieval and abnormality detection. Second, semantic data (e.g., object trajectories and bounding boxes) has become an essential data type in surveillance systems owing to its growing size and complexity, introducing new challenging topics to the community, such as efficient semantic data processing and compression. Third, with the rapid shift from static, centralized processing to dynamic computing among distributed video processing nodes and cameras, new challenges such as multi-camera analysis, person re-identification, and distributed video processing now confront us. To meet these challenges, there is a great need to extend existing approaches or explore new feasible techniques.

This is the 3rd edition of our workshop. The first two editions were organized in conjunction with ICME 2019 (Shanghai, China) and ICME 2020 (London, UK).

Scope & Topics

This workshop is intended to provide a forum for researchers and engineers to present their latest innovations and share their experiences on all aspects of the design and implementation of new surveillance video analysis and processing techniques. Topics of interest include, but are not limited to:

Action/activity recognition, and event detection in surveillance videos
Multi-camera surveillance networks and applications
Surveillance scene parsing, segmentation, and analysis
Crowd parsing, estimation and analysis
Person, group, or object re-identification
Summarization and synopsis of surveillance videos
Big Data processing in large-scale surveillance systems
Distributed, edge and fog computing for surveillance systems
Low-resolution video analysis and processing: recognition and object detection, restoration, denoising, enhancement, super-resolution
Scalable surveillance video analysis with fast model inference and low memory footprint
Surveillance from multiple modalities, including but not limited to UAVs, satellite imagery, dash cams, and wearables

Call for Papers

Important Dates

~~March 13, 2021~~

~~March 27, 2021~~

Camera-Ready Due Date: April 13, 2021

9 July 2021, 2.00pm - 6.00pm

Virtual

Format Requirements & Templates

Papers must be no longer than 6 pages, including all text, figures, and references.

Workshop papers have the same format as regular papers. See the templates below. Submitted papers do not need to be double-blind.

Word Template & Sample (zip)

LaTeX Template & Sample (zip)

Important: A complete paper should be submitted using the above templates.

Submission Details

https://cmt3.research.microsoft.com/ICMEW2021

Reviews will be handled directly by the Organizers and the Technical Program Committee (TPC).

As with accepted Regular and Special Session papers, accepted Workshop papers must be registered by the author deadline and presented at the conference; otherwise they will not be included in IEEE Xplore. A workshop paper is covered by a full-conference registration only.

Conference Location: Virtual

Schedule

Time	Talk/Presentation
14.00-14.10	Opening Remarks
14.10-15.00	Invited Keynote: Toward Human-Level General Video Understanding, Yu Qiao (SIAT, CAS)
15.00-15.48 (12 mins per talk)	Track 1: Large-scale Surveillance Tasks
	Hierarchical Attention Image-Text Alignment Network for Person Re-Identification
		Kajal Kansal (IIITD); A Subramanyam (IIITD); Zheng Wang (National Institute of Informatics); Shin'ichi Satoh (National Institute of Informatics)*
	Cluster-based Distribution Alignment for Generalizable Person Re-identification
		Chengzhang Zhu (Central South University); Zhe Chang (Central South University); Yalong Xiao (Central South University); Beiji Zou (Central South University); Bozhou Li (Central South University); Shu Liu (Central South University)*
	Deep4Air: A Novel Deep Learning Framework for Airport Airside Surveillance
		Phat Van Thai (Nanyang Technological University); Sameer Alam (Nanyang Technological University); Nimrod Lilith (Nanyang Technological University); Phu Tran (Nanyang Technological University); Thanh Binh Nguyen (University of Science)
	Dense Point Prediction: A Simple Baseline for Crowd Counting and Localization
		Yi Wang (Nanyang Technological University); Xinyu Hou (Nanyang Technological University); Lap-Pui Chau (Nanyang Technological University)
15.48-16.36 (12 mins per talk)	Track 2: Detection, Tracking & Recognition for Surveillance
	A Dataset and Benchmark of Underwater Object Detection for Robot Picking
		Chongwei Liu (Dalian University of Technology); Haojie Li (Dalian University of Technology); Shuchang Wang (Dalian University of Technology); Ming Zhu (Dalian University of Technology); Dong Wang (Dalian University of Technology); Xin Fan (Dalian University of Technology); Zhihui Wang (Dalian University of Technology)*
	Oriented Object Detection for Remote Sensing Images Based on Weakly Supervised Learning
		Yongqing Sun (NTT, Japan); Ran Jie (Chongqing University of Posts and Telecommunications); Feng Yang (Chongqing Key Laboratory of Signal and Information Processing, Chongqing University of Posts and Telecommunications); Chenqiang Gao (Chongqing University of Posts and Telecommunications); Takayuki Kurozumi (NTT Media Intelligence Laboratories); Hideaki Kimata (NTT); Ziqi Ye (Chongqing University of Posts and Telecommunications)
	Multi-Object Tracking with Tracked Object Bounding Box Association
		Nanyang Yang (Nanyang Technological University); Yi Wang (Nanyang Technological University); Lap-Pui Chau (Nanyang Technological University)
	Generate and Adjust: a Novel Framework for Semi-supervised Pedestrian Attribute Recognition
		Xuebo Shan (Peking University Shenzhen Graduate School)*; Peixi Peng (Peking University); Yunpeng Zhai (Peking University Shenzhen Graduate School); Chong Zhang (Peking University Shenzhen Graduate School); Tiejun Huang (Peking University); Yonghong Tian (Peking University)
16.36-16.48	Short Break
16.48-17.36 (12 mins per talk)	Track 3: Complementary Topics to Surveillance
	Correcting Perspective Distortion in Incremental Video Stitching
		Yinqi Chen (Jihua Lab); Huicheng Zheng (Sun Yat-sen University); Junyu Lin (Sun Yat-sen University)*
	Topic-guided Local-global Graph Neural Network for Image Captioning
		Jichao Kan (University of Sydney); Kun Hu (The University of Sydney); Zhiyong Wang (The University of Sydney); Qiuxia Wu (South China University of Technology, China); Markus Hagenbuchner (The University of Wollongong, Australia); Ah Chung Tsoi (University of Wollongong)
	Adaptive Multi-Scale Semantic Fusion Network for Zero-Shot Learning
		Jing Song (Peking University Shenzhen Graduate School); Peixi Peng (Peking University); Yunpeng Zhai (Peking University Shenzhen Graduate School); Chong Zhang (Peking University Shenzhen Graduate School); Yonghong Tian (Peking University)
	Global Feature Fusion Attention Network for Single Image Dehazing
		Jie Luo (Northwest University); Qirong Bu (Northwest University); Lei Zhang (Northwest University); Jun Feng (Northwest University)*
17.36-17.45	Closing Remarks

Invited Keynote Speaker

Yu Qiao

Toward Human-Level General Video Understanding

Abstract: Video understanding is an important yet challenging problem in computer vision. Compared with images, videos comprise multiple frames with complex motions and dynamic structures. Recent years have witnessed significant progress in video classification, driven by deep learning models and larger video datasets. However, there is still a clear gap between human-level understanding and SOTA algorithms. This talk will summarize recent progress in video understanding from the perspective of datasets, tasks, and models. We will also discuss future trends toward human-level General Video Understanding (GVU), including large video datasets with fine-grained tasks, more effective and efficient deep networks, and generalization to long-tail distributions.

Biodata: Yu Qiao is a Professor with the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, and with Shanghai AI Laboratory. His research interests include computer vision, deep learning, and bioinformation. He has published more than 180 papers in international journals and conferences, including T-PAMI, IJCV, T-IP, T-SP, CVPR, and ICCV. His H-index is 62, with 25,000+ citations on Google Scholar. He received the Distinguished Paper Award at AAAI 2021, the First Prize of the Guangdong Technological Invention Award, and the Jiaxi Lv Young Researcher Award from the Chinese Academy of Sciences. His group was the first runner-up in scene recognition at the ImageNet Large Scale Visual Recognition Challenge 2015 and the winner in video classification at the ActivityNet Large Scale Activity Recognition Challenge 2016.

Organizers

Weiyao Lin
wylin AT sjtu.edu.cn

John See
johnsee AT ieee.org

Xiatian Zhu (Eddy)
eddy.zhuxt AT gmail.com

Contact

Weiyao Lin wylin AT sjtu.edu.cn

John See johnsee AT ieee.org

Xiatian Zhu (Eddy) eddy.zhuxt AT gmail.com
