Farrajota, Miguel; Rodrigues, João; du Buf, Hans
Alexandre, L. A.; Sanchez, J. S.; Rodrigues, J. M. F.
2019-11-20
2019-11-20
2017
978-3-319-58838-4
978-3-319-58837-7
0302-9743
1611-3349
http://hdl.handle.net/10400.1/13271

Pose estimation is the task of predicting the pose of an object in an image or in a sequence of images. Here, we focus on articulated human pose estimation in scenes with a single person. We employ a series of residual auto-encoders to produce multiple predictions, which are then combined to provide a heatmap prediction of body joints. In this network topology, features are processed across all scales, which captures the various spatial relationships associated with the body. Repeated bottom-up and top-down processing is applied, with intermediate supervision for each auto-encoder network. We propose some improvements to this type of regression-based network to further increase performance, namely: (a) increase the number of parameters of the auto-encoder networks in the pipeline, (b) use stronger regularization along with heavy data augmentation, (c) use sub-pixel precision for more precise joint localization, and (d) combine the output heatmaps of all auto-encoders into a single prediction, which further increases body joint prediction accuracy. We demonstrate state-of-the-art results on the popular FLIC and LSP datasets.

eng
Human Pose Estimation by a Series of Residual Auto-Encoders
conference object
10.1007/978-3-319-58838-4_15
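
Items (c) and (d) of the abstract can be illustrated with a small sketch. The abstract does not say how the heatmaps are combined or how sub-pixel precision is obtained, so both choices below are assumptions made for illustration: the per-stage heatmaps are averaged element-wise, and the peak is refined with a one-dimensional parabolic fit along each axis. The names `combine_heatmaps` and `locate_joint` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of scores for a single body joint


def combine_heatmaps(heatmaps: List[Heatmap]) -> Heatmap:
    """Average per-stage heatmaps element-wise (assumed combination rule)."""
    n = len(heatmaps)
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    return [
        [sum(h[r][c] for h in heatmaps) / n for c in range(cols)]
        for r in range(rows)
    ]


def _parabolic_offset(prev: float, peak: float, nxt: float) -> float:
    """Vertex offset of the parabola through three equally spaced samples."""
    denom = prev - 2.0 * peak + nxt
    if denom == 0.0:
        return 0.0  # flat neighbourhood: keep the integer position
    return 0.5 * (prev - nxt) / denom


def locate_joint(heatmap: Heatmap) -> Tuple[float, float]:
    """Return (x, y) of the heatmap peak, refined to sub-pixel precision."""
    rows, cols = len(heatmap), len(heatmap[0])
    # Integer argmax over the whole map (first maximum wins on ties).
    y, x = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    dx = dy = 0.0
    # Refine only where both neighbours exist (peak not on the border).
    if 0 < x < cols - 1:
        dx = _parabolic_offset(heatmap[y][x - 1], heatmap[y][x], heatmap[y][x + 1])
    if 0 < y < rows - 1:
        dy = _parabolic_offset(heatmap[y - 1][x], heatmap[y][x], heatmap[y + 1][x])
    return x + dx, y + dy
```

In this sketch, a peak whose right-hand neighbour scores higher than its left-hand one is shifted slightly to the right of the integer argmax. Other refinements, such as a fixed quarter-pixel shift toward the stronger neighbour, would fit the same interface.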