Search Results
Now showing 1 - 10 of 18
- The SmartVision navigation prototype for the blind
  Publication. du Buf, J. M. H.; Rodrigues, J. M. F.; Paredes, Hugo; Barroso, João; Farrajota, Miguel; José, João; Teixeira, Victor; Saleiro, Mário
  The goal of the project "SmartVision: active vision for the blind" is to develop a small and portable but intelligent and reliable system for assisting the blind and visually impaired while navigating autonomously, both outdoors and indoors. In this paper we present an overview of the prototype, design issues, and its different modules, which integrate a GIS with GPS, Wi-Fi, RFID tags and computer vision. The prototype addresses global navigation by following known landmarks, local navigation with path tracking and obstacle avoidance, and object recognition. The system does not replace the white cane, but extends it beyond its reach. The user-friendly interface consists of a 4-button hand-held box, a vibration actuator in the handle of the cane, and speech synthesis. A future version may also employ active RFID tags for marking navigation landmarks, and speech recognition may complement speech synthesis.
- A biological and real-time framework for hand gestures and head poses
  Publication. Saleiro, Mário; Farrajota, Miguel; Terzic, Kasim; Rodrigues, J. M. F.; du Buf, J. M. H.
  Human-robot interaction is an interdisciplinary research area that aims at the development of social robots. Since social robots are expected to interact with humans and understand their behavior through gestures and body movements, cognitive psychology and robot technology must be integrated. In this paper we present a biological and real-time framework for detecting and tracking hands and heads. This framework is based on keypoints extracted by means of cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. Through the combination of annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated and tracked over time. By using hand templates with lines and edges at only a few scales, a hand’s gestures can be recognized. Head tracking and pose detection are also implemented, which can be integrated with detection of facial expressions in the future. Through combinations of head poses and hand gestures, a large number of commands can be given to a robot.
  (A code sketch of the end-stopped keypoint mechanism follows the result list.)
- Human pose and action recognition
  Publication. Farrajota, Miguel; du Buf, J. M. H.; Rodrigues, J. M. F.
  This thesis focuses on the detection of persons and pose recognition using neural networks. The goal is to detect human body poses in a visual scene with multiple persons and to use this information in order to recognize human activity. This is achieved by first detecting persons in a scene and then by estimating their body joints in order to infer articulated poses. The work developed in this thesis explored neural networks and deep learning methods. Deep learning makes it possible to employ computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have greatly improved the state-of-the-art in many domains such as speech recognition and visual object detection and classification. Deep learning discovers intricate structure in data by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation provided by the previous one. Person detection is, in general, a difficult task due to the large variability in appearance caused by factors such as scale, view and occlusion. An object detection framework based on multi-stage convolutional features for pedestrian detection is proposed in this thesis. This framework extends the Fast R-CNN framework by combining several convolutional features from different stages of a CNN (Convolutional Neural Network) to improve the detector's accuracy. This provides high-quality detections of persons in a visual scene, which are then used as input, in conjunction with a human pose estimation model, to estimate the body joint locations of multiple persons in an image. Human pose estimation is done by a deep convolutional neural network composed of a series of residual auto-encoders. These produce multiple predictions which are later combined to provide a heatmap prediction of human body joints. In this network topology, features are processed across all scales, capturing the various spatial relationships associated with the body. Repeated bottom-up and top-down processing with intermediate supervision for each auto-encoder network is applied. This results in very accurate 2D heatmaps of body joint predictions. The methods presented in this thesis were benchmarked against other top-performing methods on popular datasets for human pedestrian detection and pose estimation, achieving good results compared with other state-of-the-art algorithms.
  (A code sketch of multi-stage feature fusion follows the result list.)
- Multi-scale cortical keypoints for realtime hand tracking and gesture recognition
  Publication. Farrajota, Miguel; Saleiro, Mário; Terzic, Kasim; Rodrigues, J. M. F.; du Buf, J. M. H.
  Human-robot interaction is an interdisciplinary research area which aims at integrating human factors, cognitive psychology and robot technology. The ultimate goal is the development of social robots. These robots are expected to work in human environments, and to understand the behavior of persons through gestures and body movements. In this paper we present a biological and realtime framework for detecting and tracking hands. This framework is based on keypoints extracted from cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand’s gestures can be recognized.
- Region segregation by linking keypoints tuned to colour
  Publication. Farrajota, Miguel; Rodrigues, J. M. F.; du Buf, J. M. H.
  Coloured regions can be segregated from each other by using colour-opponent mechanisms, colour contrast, saturation and luminance. Here we address segmentation by using end-stopped cells tuned to colour instead of to colour contrast. Colour information is coded in separate channels. By using multi-scale cortical end-stopped cells tuned to colour, keypoint information in all channels is coded and mapped by multi-scale peaks. Unsupervised segmentation is achieved by analysing the branches of these peaks, which yields the best-fitting image regions.
  (A code sketch of colour-opponent channel coding follows the result list.)
- Human Pose Estimation by a Series of Residual Auto-Encoders
  Publication. Farrajota, Miguel; Rodrigues, João; du Buf, Hans; Alexandre, L. A.; Sanchez, J. S.; Rodrigues, J. M. F.
  Pose estimation is the task of predicting the pose of an object in an image or in a sequence of images. Here, we focus on articulated human pose estimation in scenes with a single person. We employ a series of residual auto-encoders to produce multiple predictions which are then combined to provide a heatmap prediction of body joints. In this network topology, features are processed across all scales, which captures the various spatial relationships associated with the body. Repeated bottom-up and top-down processing with intermediate supervision for each auto-encoder network is applied. We propose some improvements to this type of regression-based network to further increase performance, namely: (a) increase the number of parameters of the auto-encoder networks in the pipeline, (b) use stronger regularization along with heavy data augmentation, (c) use sub-pixel precision for more precise joint localization, and (d) combine all auto-encoder output heatmaps into a single prediction, which further increases body joint prediction accuracy. We demonstrate state-of-the-art results on the popular FLIC and LSP datasets.
  (A code sketch of sub-pixel joint localization follows the result list.)
- A disparity energy model improved by line, edge and keypoint correspondences
  Publication. Martins, J. C.; Farrajota, Miguel; Lam, Roberto; Rodrigues, J. M. F.; Terzic, Kasim; du Buf, J. M. H.
  Disparity energy models (DEMs) estimate local depth information on the basis of V1 complex cells. Our recent DEM (Martins et al., 2011, ISSPIT, pp. 261-266) employs a population code. Once the population's cells have been trained with random-dot stereograms, it is applied at all retinotopic positions in the visual field. Despite producing good results in textured regions, the model needs to be made more precise, especially at depth transitions.
- Disparity energy model with keypoint disparity validation
  Publication. Farrajota, Miguel; Martins, J. C.; Rodrigues, J. M. F.; du Buf, J. M. H.
  A biological disparity energy model can estimate local depth information by using a population of V1 complex cells. Instead of applying an analytical model which explicitly involves cell parameters like spatial frequency, orientation, binocular phase and position difference, we developed a model which only involves the cells’ responses, such that disparity can be extracted from a population code, using only a set of cells previously trained with random-dot stereograms of uniform disparity. Despite good results in smooth regions, the model needs complementary processing, notably at depth transitions. We therefore introduce a new model to extract disparity at keypoints such as edge junctions, line endings and points with large curvature. Responses of end-stopped cells serve to detect keypoints, and those of simple cells are used to detect the orientations of their underlying line and edge structures. Annotated keypoints are then used in the left-right matching process, with a hierarchical, multi-scale tree structure and a saliency map to segregate disparity. By combining both models we can (re)define depth transitions and regions where the disparity energy model is less accurate.
  (A code sketch of a population-code disparity readout follows the result list.)
- The SmartVision local navigation aid for blind and visually impaired persons
  Publication. José, J.; Farrajota, Miguel; Rodrigues, J. M. F.; du Buf, J. M. H.
  The SmartVision prototype is a small, cheap and easily wearable navigation aid for blind and visually impaired persons. Its functionality addresses global navigation for guiding the user to some destination, and local navigation for negotiating paths, sidewalks and corridors, with avoidance of static as well as moving obstacles. Local navigation applies to both in- and outdoor situations. In this article we focus on local navigation: the detection of path borders and obstacles in front of the user and just beyond the reach of the white cane, such that the user can be assisted in centering on the path and alerted to looming hazards. Using a stereo camera worn at chest height, a portable computer in a shoulder-strapped pouch or pocket, and only one earphone or small speaker, the system is inconspicuous, it is no hindrance while walking with the cane, and it does not block normal surround sounds. The vision algorithms are optimised such that the system can work at a few frames per second.
  (A code sketch of path-border detection follows the result list.)
- The SmartVision Navigation Prototype for Blind Users
  Publication. du Buf, J. M. H.; Barroso, João; Rodrigues, J. M. F.; Paredes, Hugo; Farrajota, Miguel; Fernandes, Hugo; José, João; Teixeira, Victor; Saleiro, Mário
  The goal of the Portuguese project "SmartVision: active vision for the blind" is to develop a small, portable and cheap yet intelligent and reliable system for assisting the blind and visually impaired while navigating autonomously, both in- and outdoors. In this article we present an overview of the prototype, design issues, and its different modules, which integrate GPS and Wi-Fi localisation with a GIS, passive RFID tags, and computer vision. The prototype addresses global navigation for going to some destination, by following known landmarks stored in the GIS in combination with path optimisation, and local navigation with path and obstacle detection just beyond the reach of the white cane. The system does not replace the white cane but complements it, in order to alert the user to looming hazards. In addition, computer vision is used to identify objects on shelves, for example in a pantry or refrigerator. The user-friendly interface consists of a four-button hand-held box, a vibration actuator in the handle of the white cane, and speech synthesis. In the near future, passive RFID tags will be complemented by active tags for marking navigation landmarks, and speech recognition may complement or substitute the vibration actuator.
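Several of the results above (the hand-gesture and keypoint papers) build on keypoints detected by cortical V1 end-stopped cells. As a minimal illustrative sketch, not the authors' implementation: end-stopped responses can be approximated Heitger-style as differences of complex-cell (Gabor modulus) responses shifted along the cell's orientation. All kernel sizes and parameters below are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=21, wavelength=8.0, sigma=4.0, theta=0.0):
    """Quadrature (even/odd) Gabor pair modelling a V1 simple-cell pair."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the carrier
    env = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return (env * np.cos(2 * np.pi * xr / wavelength),
            env * np.sin(2 * np.pi * xr / wavelength))

def complex_cell(img, theta):
    """Complex-cell response: modulus of the quadrature pair (img is a
    float grayscale array)."""
    even, odd = gabor_pair(theta=theta)
    return np.hypot(fftconvolve(img, even, mode="same"),
                    fftconvolve(img, odd, mode="same"))

def end_stopped(img, theta, d=2):
    """End-stopped response: difference of complex responses displaced by
    +/- d pixels along the cell's orientation; it peaks at line endings
    and junctions, i.e. at candidate keypoints."""
    c = complex_cell(img, theta)
    dy, dx = int(round(d * np.sin(theta))), int(round(d * np.cos(theta)))
    plus = np.roll(c, shift=(dy, dx), axis=(0, 1))
    minus = np.roll(c, shift=(-dy, -dx), axis=(0, 1))
    return np.maximum(plus - minus, 0.0)

# A keypoint map is the sum of end-stopped responses over a set of
# orientations; its local maxima are the keypoints.
```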
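The thesis entry combines convolutional features from several stages of a CNN to improve pedestrian detection. A minimal sketch of that general idea, using a toy PyTorch backbone rather than the thesis architecture: feature maps from early and late stages are pooled to a common resolution and concatenated, so a detection head sees both fine detail and semantic context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStageFeatures(nn.Module):
    """Toy backbone: fuse feature maps from three stages (a hypothetical
    stand-in for combining multi-stage CNN features in a detector)."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

    def forward(self, x):
        f1 = self.stage1(x)   # high resolution, low-level features
        f2 = self.stage2(f1)  # mid-level features
        f3 = self.stage3(f2)  # low resolution, semantic features
        size = f3.shape[-2:]
        # pool the earlier stages down to the coarsest resolution and stack
        fused = torch.cat([F.adaptive_max_pool2d(f1, size),
                           F.adaptive_max_pool2d(f2, size),
                           f3], dim=1)
        return fused  # 32 + 64 + 128 = 224 channels for a detection head

if __name__ == "__main__":
    out = MultiStageFeatures()(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 224, 16, 16])
```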
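The colour-keypoint paper codes colour information in separate channels. A minimal sketch of classical colour-opponent coding; the paper's exact channel definitions are not given in the abstract, so these formulas are assumptions.

```python
import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (H, W, 3, floats in [0, 1]) into classical
    opponent channels: red-green, blue-yellow and luminance."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                  # red-green opponency
    by = b - (r + g) / 2.0      # blue-yellow opponency
    lum = (r + g + b) / 3.0     # luminance channel
    return rg, by, lum

# Colour-tuned keypoint detection can then be run on each channel
# separately and the multi-scale peaks of all channels combined.
```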
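The residual auto-encoder paper uses sub-pixel precision for joint localization from heatmaps. One common way to do this, shown as an assumption rather than the paper's exact method, is to refine the heatmap argmax with a quarter-pixel offset towards the larger neighbour.

```python
import numpy as np

def subpixel_argmax(heatmap):
    """Sub-pixel joint localization: take the heatmap argmax and nudge it
    a quarter pixel towards the larger neighbour in each axis, a common
    refinement in heatmap-based pose estimation."""
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    dx = dy = 0.0
    if 0 < x < w - 1:
        dx = 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:
        dy = 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    return x + dx, y + dy  # refined (x, y) joint location
```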
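The two disparity energy entries extract disparity from a population code of trained cells. A minimal sketch of one such readout, assuming disparity is decoded as the response-weighted average of each cell's trained (preferred) disparity; the weighting scheme is illustrative, not the authors' model.

```python
import numpy as np

def decode_disparity(responses, preferred_disparities):
    """Population-code readout: estimate local disparity as the
    response-weighted average of the cells' trained disparities."""
    r = np.maximum(np.asarray(responses, dtype=float), 0.0)  # rectify
    if r.sum() == 0.0:
        return 0.0  # no evidence at this retinotopic position
    return float(np.dot(r, preferred_disparities) / r.sum())

# e.g. four cells trained on disparities of -2..1 pixels with
# random-dot stereograms:
print(decode_disparity([0.1, 0.2, 0.9, 0.3], [-2, -1, 0, 1]))
```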
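The local navigation aid detects path borders just beyond the white cane. A rough sketch of one standard approach (edge detection plus a probabilistic Hough transform); the OpenCV pipeline and all thresholds below are assumptions, not the paper's algorithm.

```python
import cv2
import numpy as np

def detect_path_borders(frame):
    """Candidate path borders: Canny edges in the lower half of the frame,
    followed by a probabilistic Hough transform, keeping only lines that
    are steep enough to be converging path borders."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    edges[: frame.shape[0] // 2] = 0  # the walkable path is in the lower half
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=60, maxLineGap=10)
    borders = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
            if 30 < angle < 150:  # discard near-horizontal clutter
                borders.append((x1, y1, x2, y2))
    return borders
```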