Deeply Learned Attributes for Crowded Scene Understanding

Jing Shao1, Kai Kang1, Chen Change Loy2, and Xiaogang Wang1

1Department of Electronic Engineering, 2Department of Information Engineering, The Chinese University of Hong Kong.

[PDF] [Extended Abstract] [Presentation (ppt)] [Presentation (oral video)] [Demo (Dataset preview)] [Demo (Experimental results)] [Dataset Download (Baidu disk version)] [Dataset Download (Dropbox version)] [Code] [Homepage]

 

Introduction

During the last decade, the field of crowd analysis has undergone a remarkable evolution in crowded scene understanding, including crowd behavior analysis, crowd tracking, and crowd segmentation. Much of this progress was sparked by the creation of crowd datasets as well as new and robust features and models for profiling intrinsic crowd properties. However, most existing studies on crowd understanding are scene-specific: the crowd model is learned from a particular scene and therefore generalizes poorly to other scenes. Attributes are particularly effective at characterizing generic properties across scenes. Indeed, attributes can express richer information about a crowd video, since they describe it by answering “Who is in the crowd?”, “Where is the crowd?”, and “Why is the crowd here?”, rather than merely assigning a categorical scene or event label. For instance, an attribute-based representation might describe a crowd video as a “conductor” and “choir” performing on a “stage” with an “audience” “applauding”, in contrast to a categorical label such as “chorus”.
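
As a minimal illustration of the difference (the attribute vocabulary and encoding below are hypothetical examples drawn only from the description above, not the actual attribute set of this work), a multi-label attribute vector can encode Who, Where, and Why jointly, whereas a categorical label carries a single value:

    # Illustrative only: a crowd video described by one categorical label
    # versus a multi-label attribute vector covering Who / Where / Why.
    ATTRIBUTES = ["conductor", "choir", "audience",   # Who
                  "stage", "street",                  # Where
                  "applaud", "perform", "walk"]       # Why

    def to_attribute_vector(active):
        """Binary indicator vector over the attribute vocabulary."""
        return [1 if a in active else 0 for a in ATTRIBUTES]

    categorical_label = "chorus"                      # a single label, little detail
    attribute_labels = {"conductor", "choir", "stage", "audience", "applaud"}

    print(categorical_label)
    print(to_attribute_vector(attribute_labels))      # [1, 1, 1, 1, 0, 1, 0, 0]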

The contributions of this work are detailed in the sections below.

 

WWW Crowd Dataset

Most existing public crowd datasets contain only one or two specific scenes, and even the CUHK Crowd dataset provides just 474 videos from 215 crowded scenes. In contrast, our proposed WWW dataset provides 10,000 videos with over 8 million frames from 8,257 diverse scenes, offering a far more comprehensive benchmark for crowd understanding. The videos are collected from abundant sources, which further enriches the diversity and completeness of the dataset.
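
As a rough sketch of how such a collection might be indexed in practice (the file name, column layout, and loader below are assumptions for illustration only, not the released format; please refer to the download links above for the actual data organization):

    import csv

    def load_index(path="www_crowd_index.csv"):
        """Read a hypothetical index file: video_id, scene_id, attr1;attr2;..."""
        videos = []
        with open(path, newline="") as f:
            for video_id, scene_id, attrs in csv.reader(f):
                videos.append({"video_id": video_id,
                               "scene_id": scene_id,
                               "attributes": attrs.split(";")})
        return videos

    videos = load_index()
    print(len(videos))                             # 10,000 videos expected
    print(len({v["scene_id"] for v in videos}))    # 8,257 distinct scenes expected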

 
A quick glance at the WWW Crowd Dataset (left) with its attributes (right). Red represents the location (Where), green represents the subject (Who), and blue refers to the event/action (Why). The area of each word is proportional to the frequency of that attribute in the WWW dataset.

A preview of video samples covering all the attributes in the WWW dataset.

Deeply Learned Crowd Features

The traditional input to a deep model is a map of a single frame (RGB channels) or of multiple frames [17]. Well-known motion features such as optical flow cannot characterize motion patterns well in crowded scenes, especially across different scenes. In this paper, we propose three scene-independent motion channels as a complement to the appearance channels, as shown in the figure on the right.

The first row gives an example that briefly illustrates the construction procedure of the three motion channels. For each channel, two examples are shown in the second and third rows. Individuals in a crowd moving randomly indicate low collectiveness, while coherent crowd motion reveals high collectiveness. Individuals have low stability if their topological structure changes considerably, and high stability if it changes little. Conflict occurs when individuals move in different directions.
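
As an illustrative sketch of how the appearance and motion cues could be assembled into a multi-channel input map for a convolutional network (a simplified assumption rather than the exact pipeline of the paper; it presumes per-pixel collectiveness, stability, and conflict maps have already been computed):

    import numpy as np

    def build_input(rgb, collectiveness, stability, conflict):
        """Stack appearance (RGB) with three scene-independent motion maps
        into a single H x W x 6 input.

        rgb:            H x W x 3 appearance frame, values in [0, 255]
        collectiveness, stability, conflict:
                        H x W per-pixel motion maps (assumed precomputed)
        """
        appearance = rgb.astype(np.float32) / 255.0
        motion = np.stack([collectiveness, stability, conflict], axis=-1).astype(np.float32)
        return np.concatenate([appearance, motion], axis=-1)

    # Toy example with random maps, just to show the resulting shape.
    h, w = 256, 256
    x = build_input(np.random.randint(0, 256, (h, w, 3)),
                    np.random.rand(h, w), np.random.rand(h, w), np.random.rand(h, w))
    print(x.shape)  # (256, 256, 6)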

     

Experimental Results

     

     

Reference and Acknowledgments

If you use our dataset, please cite our paper:

Jing Shao, Kai Kang, Chen Change Loy, and Xiaogang Wang. "Deeply Learned Attributes for Crowded Scene Understanding." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 (oral).

This work is partially supported by the General Research Fund sponsored by the Research Grants Council of Hong Kong (Project Nos. CUHK 419412, CUHK 417011, CUHK 14206114, and CUHK 14207814), the Hong Kong Innovation and Technology Support Programme (Project reference ITS/221/13FP), the Shenzhen Basic Research Program (JCYJ20130402113127496), and a hardware donation from NVIDIA Corporation. We thank Lu Sheng and Tong Xiao for valuable discussions and support.

     

Contact Me

If you have any questions, please feel free to contact me (amandajshao@gmail.com).


Last update: May 18, 2015