Repository logo
 
Loading...
Profile Picture
Person

Farrajota, Miguel

Search Results

Now showing 1 - 1 of 1
  • Human pose and action recognition
    Publication . Farrajota, Miguel; du Buf, J. M. H.; Rodrigues, J. M .F.
    This thesis focuses on detection of persons and pose recognition using neural networks. The goal is to detect human body poses in a visual scene with multiple persons and to use this information in order to recognize human activity. This is achieved by rst detecting persons in a scene and then by estimating their body joints in order to infer articulated poses. The work developed in this thesis explored neural networks and deep learning methods. Deep learning allows to employ computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have greatly improved the state-of-the-art in many domains such as speech recognition and visual object detection and classi cation. Deep learning discovers intricate structure in data by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation provided by the previous one. Person detection, in general, is a di cult task due to a large variability of representation due to di erent factors such as scales, views and occlusion. An object detection framework based on multi-stage convolutional features for pedestrian detection is proposed in this thesis. This framework extends the Fast R-CNN framework for the combination of several convolutional features from di erent stages of a CNN (Convolutional Neural Network) to improve the detector's accuracy. This provides high quality detections of persons in a visual scene, which are then used as input in conjunction with a human pose estimation model in order to estimate human body joint locations of multiple persons in an image. Human pose estimation is done by a deep convolutional neural network composed of a series of residual auto-encoders. These produce multiple predictions which are later combined to provide a heatmap prediction of human body joints. In this network topology, features are processed across all scales capturing the various spatial relationships associated with the body. Repeated bottom-up and top-down processing with intermediate supervision for each auto-encoder network is applied. This results in very accurate 2D heatmaps of body joint predictions. The methods presented in this thesis were benchmarked against other topperforming methods on popular datasets for human pedestrian and pose estimation, achieving good results compared with other state-of-the-art algorithms.