Human behaviour recognition from video

Organisers: Remco Veltkamp, University of Utrecht, and Nico van der Aa, Noldus IT

Schedule: Friday 29th August 10:00 - 12:30, Kleine Veerzaal

Understanding human behavior implies measuring it. Video technology provides an unobtrusive way to capture image sequences of a scene over time. Although human observers are perfectly capable of recognizing behavior in these video streams, and annotation tools like The Observer XT are available to facilitate annotation and analysis, the manual recognition step is tedious and expensive. Automated activity recognition is a research area that combines the fields of computer vision and machine learning. This special session is meant to give researchers in the field of measuring human behavior insight into what is currently possible and what the main challenges are.
The process of automated activity recognition consists of two basic steps: (1) feature selection and (2) classification. In the classification step, the goal is to assign a label to each (set of) features. Standard classification methods, such as hidden Markov models and support vector machines, can be applied straightforwardly to obtain these labels. In the feature selection step, the image sequence is transformed into a feature that serves as the input for the classification step. The feature can be defined as the actual pixel values, the edges, or an optical flow field, but it can also be designed specifically for the application under investigation. Crucial in the choice of features is that the features of each class should be distinguishable from the features of the other classes. A frequently used example is the skeleton of a person, which represents the pose or motion of the person and can be used to recognize behavior in applications such as gaming or physiology. For researchers investigating a relatively new field, the choice of features can be a matter of trial and error. A prerequisite is a benchmark to test and validate on. For example, to study how a person consumes their lunch, a dataset can be constructed consisting of video streams and the ground-truth labeling. With such a benchmark, researchers can validate their behavior recognition system, including their choice of features.
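The two steps above can be sketched in a few lines. The example below is a minimal, self-contained illustration on synthetic clips: the motion-energy feature (a crude stand-in for optical flow) and the nearest-centroid classifier (standing in for an SVM, to keep the sketch NumPy-only) are illustrative assumptions, not the methods used by any of the systems presented in this session.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(frames):
    """Step 1: feature selection. Summarise a clip by motion-energy
    statistics: mean and spread of frame-to-frame absolute differences
    (a crude stand-in for optical flow)."""
    diffs = np.abs(np.diff(frames, axis=0))
    return np.array([diffs.mean(), diffs.std()])

# Synthetic "video" clips of 10 frames of 16x16 pixels:
# label 0 = a low-motion activity, label 1 = a high-motion activity.
clips, labels = [], []
for label in (0, 1):
    for _ in range(50):
        steps = rng.normal(0.0, 0.1 + 0.9 * label, size=(10, 16, 16))
        clips.append(np.cumsum(steps, axis=0))  # random-walk frames
        labels.append(label)

X = np.array([extract_features(c) for c in clips])
y = np.array(labels)

# Shuffle, then split into train and held-out test sets.
idx = rng.permutation(len(y))
X, y = X[idx], y[idx]
X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]

# Step 2: classification. A nearest-centroid rule stands in for an SVM here:
# each class is represented by the mean of its training features.
centroids = {c: X_train[y_train == c].mean(axis=0) for c in (0, 1)}
pred = np.array([min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
                 for f in X_test])
acc = (pred == y_test).mean()
print(f"held-out accuracy: {acc:.2f}")
```

Because the two synthetic classes differ sharply in motion energy, even this simple feature separates them; the point is the pipeline shape (clip → feature vector → label), not the classifier.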
Many sensor and analysis systems, such as eye trackers and facial expression analysis tools, are already available to researchers. The output of such systems can be combined to achieve higher-level activity recognition. Other types of sensor data, such as sound recordings, physiological data, or accelerometer data, can also help in recognizing certain activities for specific applications. For example, to capture the stress level of a knowledge worker behind a computer, we can record eye movements, facial expressions, and other modalities that reflect stress. This special session addresses the challenge of how to combine such features.
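One common baseline for combining modalities is feature-level fusion: normalise each modality's features so that no single sensor dominates, then concatenate them into one vector per time window for a downstream classifier. The sketch below assumes hypothetical per-window feature matrices for two modalities (eye-tracking and facial-expression features); the names and dimensions are made up for illustration.

```python
import numpy as np

def fuse(feature_sets):
    """Feature-level fusion: z-normalise each modality's feature matrix
    (rows = time windows, columns = features), then concatenate the
    columns of all modalities."""
    normed = []
    for F in feature_sets:
        F = np.asarray(F, dtype=float)
        normed.append((F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8))
    return np.hstack(normed)

# Hypothetical modalities, 5 time windows each:
eye = np.random.default_rng(1).random((5, 3))   # e.g. fixation statistics
face = np.random.default_rng(2).random((5, 4))  # e.g. expression scores
fused = fuse([eye, face])
print(fused.shape)  # one 7-dimensional vector per time window
```

The fused matrix can then feed any of the standard classifiers mentioned above; more elaborate schemes (decision-level fusion, learned weighting) follow the same window-aligned structure.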


Time | Authors | Title
10:00-10:20 | Arthur Truong, Hugo Boujut and Titus Zaharia (Institut Mines-Telecom, Telecom SudParis, France) | Laban movement analysis for action recognition
10:20-10:40 | Nico van der Aa, Coert van Gemeren, Lucas Noldus and Remco Veltkamp (Noldus IT) | Articulated Tracking of Humans by Video Technology
10:40-11:10 | Break |
11:10-11:30 | Ronald Poppe and Mark Ter Maat (University of Twente, the Netherlands) | Observing human behaviour to identify risk in task performance
11:30-11:50 | Varun Kakra, Nico van der Aa, Lucas Noldus and Oliver Amft (Noldus IT, the Netherlands) | A Multimodal Benchmark Tool for Automated Eating Behaviour Recognition
11:50-12:10 | Ben Krose, Tim Van Oosterhout and Gwenn Englebienne (University of Amsterdam, the Netherlands) | Video Surveillance for Behaviour Monitoring in Home Health Care