Human Pose Estimation by a Series of Residual Auto-Encoders

Farrajota, Miguel; Rodrigues, João; du Buf, Hans

http://hdl.handle.net/10400.1/13271

Use this identifier to reference this record.

Abstract(s)

Pose estimation is the task of predicting the pose of an object in an image or in a sequence of images. Here, we focus on articulated human pose estimation in scenes with a single person. We employ a series of residual auto-encoders to produce multiple predictions, which are then combined to provide a heatmap prediction of body joints. In this network topology, features are processed across all scales, which captures the various spatial relationships associated with the body. Repeated bottom-up and top-down processing is applied, with intermediate supervision for each auto-encoder network. We propose several improvements to this type of regression-based network to further increase performance: (a) increasing the number of parameters of the auto-encoder networks in the pipeline, (b) using stronger regularization together with heavy data augmentation, (c) using sub-pixel precision for more precise joint localization, and (d) combining the output heatmaps of all auto-encoders into a single prediction, which further increases body joint prediction accuracy. We demonstrate state-of-the-art results on the popular FLIC and LSP datasets.
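
Improvements (c) and (d) can be illustrated with a small sketch. The abstract does not say how the heatmaps are combined or how sub-pixel positions are obtained, so both choices below are assumptions made for illustration only: the per-stage heatmaps are averaged element-wise, and the peak is refined with a parabolic fit through its two neighbours along each axis. The names `combine_heatmaps`, `parabolic_offset` and `locate_joint` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # heatmap[y][x] is the score at row y, column x


def combine_heatmaps(heatmaps: List[Heatmap]) -> Heatmap:
    """Merge per-stage heatmaps by element-wise averaging (assumed rule)."""
    n = len(heatmaps)
    height, width = len(heatmaps[0]), len(heatmaps[0][0])
    return [
        [sum(h[y][x] for h in heatmaps) / n for x in range(width)]
        for y in range(height)
    ]


def parabolic_offset(before: float, peak: float, after: float) -> float:
    """Offset of the vertex of the parabola through three adjacent samples."""
    curvature = before - 2.0 * peak + after
    if curvature >= 0.0:
        # Flat region or not a strict maximum: keep the integer position.
        return 0.0
    return 0.5 * (before - after) / curvature


def locate_joint(heatmap: Heatmap) -> Tuple[float, float]:
    """Return the (x, y) joint position with sub-pixel refinement (assumed method)."""
    height, width = len(heatmap), len(heatmap[0])
    # Integer location of the highest score.
    y, x = max(
        ((row, col) for row in range(height) for col in range(width)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    # Refine each axis only when both neighbours exist.
    dx = 0.0
    if 0 < x < width - 1:
        dx = parabolic_offset(heatmap[y][x - 1], heatmap[y][x], heatmap[y][x + 1])
    dy = 0.0
    if 0 < y < height - 1:
        dy = parabolic_offset(heatmap[y - 1][x], heatmap[y][x], heatmap[y + 1][x])
    return x + dx, y + dy
```

Calling `locate_joint(combine_heatmaps(stage_outputs))` on a list of same-sized heatmaps yields one refined coordinate per joint channel. With a peak that is a strict maximum, the refinement never moves the estimate more than half a pixel away from the integer peak.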