Search Results
Now showing 1 - 10 of 21
- Multi-scale lines and edges in V1 and beyond: brightness, object categorization and recognition, and consciousness. Publication. Rodrigues, J. M. F.; du Buf, J. M. H. In this paper we present an improved model for line and edge detection in cortical area V1. This model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.
- Minimalistic vision-based cognitive SLAM. Publication. Saleiro, Mário; Rodrigues, J. M. F.; du Buf, J. M. H. The interest in cognitive robotics is still increasing, a major goal being to create a system which can adapt to dynamic environments and which can learn from its own experiences. We present a new cognitive SLAM architecture, but one which is minimalistic in terms of sensors and memory. It employs only one camera with pan and tilt control and three memories, without additional sensors or any odometry. Short-term memory is an egocentric map which holds information at close range at the actual robot position. Long-term memory is used for mapping the environment and registration of encountered objects. Object memory holds features of learned objects which are used as navigation landmarks and task targets. Saliency maps are used to focus sequentially on important areas for object and obstacle detection, but also for selecting directions of movements. Reinforcement learning is used to consolidate or enfeeble environmental information in long-term memory. The system is able to achieve complex tasks by executing sequences of visuomotor actions, decisions being taken by goal-detection and goal-completion tasks. Experimental results show that the system is capable of executing tasks like localizing specific objects while building a map, after which it manages to return to the start position even when new obstacles have appeared.
- Object segregation and local gist vision using low-level geometry. Publication. Martins, J. C.; Rodrigues, J. M. F.; du Buf, J. M. H. Multi-scale representations of lines, edges and keypoints on the basis of simple, complex, and end-stopped cells can be used for object categorisation and recognition. These representations are complemented by saliency maps of colour, texture, disparity and motion information, which also serve to model extremely fast gist vision in parallel with object segregation. We present a low-level geometry model based on a single type of self-adjusting grouping cell, with a circular array of dendrites connected to edge cells located at several angles. Different angles between active edge cells allow the grouping cell to detect geometric primitives like corners, bars and blobs. Such primitives forming different configurations can then be grouped to identify more complex geometry, like object shapes, without much additional effort. The speed of the model permits it to be used for fast gist vision, assuming that edge cells respond to transients in colour, texture, disparity and motion. The big advantage of combining this information at a low level is that local (object) gist can be extracted first, i.e., which types of objects are about where in a scene, after which global (scene) gist can be processed at a semantic level.
- Cortical multiscale line-edge disparity model. Publication. Rodrigues, J. M. F.; Martins, Jaime; Lam, Roberto; du Buf, J. M. H. Most biological approaches to disparity extraction rely on the disparity energy model (DEM). In this paper we present an alternative approach which can complement the DEM. This approach is based on the multiscale coding of lines and edges, because surface structures are composed of lines and edges and contours of objects often cause edges against their background. We show that the line/edge approach can be used to create a 3D wireframe representation of a scene and the objects therein. It can also significantly improve the accuracy of the DEM, such that our biological models can compete with some state-of-the-art algorithms from computer vision.
- Multi-scale cortical keypoints for realtime hand tracking and gesture recognition. Publication. Farrajota, Miguel; Saleiro, Mário; Terzic, Kasim; Rodrigues, J. M. F.; du Buf, J. M. H. Human-robot interaction is an interdisciplinary research area which aims at integrating human factors, cognitive psychology and robot technology. The ultimate goal is the development of social robots. These robots are expected to work in human environments, and to understand behavior of persons through gestures and body movements. In this paper we present a biological and realtime framework for detecting and tracking hands. This framework is based on keypoints extracted from cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand’s gestures can be recognized.
- A cortical framework for invariant object categorization and recognition. Publication. Rodrigues, J. M. F.; du Buf, J. M. H. In this paper we present a new model for invariant object categorization and recognition. It is based on explicit multi-scale features: lines, edges and keypoints are extracted from responses of simple, complex and end-stopped cells in cortical area V1, and keypoints are used to construct saliency maps for Focus-of-Attention. The model is a functional but dichotomous one, because keypoints are employed to model the “where” data stream, with dynamic routing of features from V1 to higher areas to obtain translation, rotation and size invariance, whereas lines and edges are employed in the “what” stream for object categorization and recognition. Furthermore, both the “where” and “what” pathways are dynamic in that information at coarse scales is employed first, after which information at progressively finer scales is added in order to refine the processes, i.e., both the dynamic feature routing and the categorization level. The construction of group and object templates, which are thought to be available in the prefrontal cortex with “what” and “where” components in PF46d and PF46v, is also illustrated. The model was tested in the framework of an integrated and biologically plausible architecture.
- Recognition of facial expressions by cortical multi-scale line and edge coding. Publication. Sousa, R.; Rodrigues, J. M. F.; du Buf, J. M. H. Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Models of visual perception are based on image representations in cortical area V1 and beyond, which contain many cell layers for feature extraction. Simple, complex and end-stopped cells provide input for line, edge and keypoint detection. Detected events provide a rich, multi-scale object representation, and this representation can be stored in memory in order to identify objects. In this paper, the above context is applied to face recognition. The multi-scale line/edge representation is explored in conjunction with keypoint-based saliency maps for Focus-of-Attention. Recognition rates of up to 96% were achieved by combining frontal and 3/4 views, and recognition was quite robust against partial occlusions.
- A cortical framework for scene categorization. Publication. Rodrigues, J. M. F.; du Buf, J. M. H. Human observers can very rapidly and accurately categorise scenes. This is context or gist vision. In this paper we present a biologically plausible scheme for gist vision which can be integrated into a complete cortical vision architecture. The model is strictly bottom-up, employing state-of-the-art models for feature extraction. It combines five cortical feature sets: multiscale lines and edges and their dominant orientations, the density of multiscale keypoints, the number of consistent multiscale regions, dominant colours in the double-opponent colour channels, and significant saliency in covert attention regions. These feature sets are processed in a hierarchical set of layers with grouping cells, which serve to characterise five image regions: left, right, top, bottom and centre. Final scene classification is obtained by a trained decision tree.
- A disparity energy model improved by line, edge and keypoint correspondences. Publication. Martins, J. C.; Farrajota, Miguel; Lam, Roberto; Rodrigues, J. M. F.; Terzic, Kasim; du Buf, J. M. H. Disparity energy models (DEMs) estimate local depth information on the basis of V1 complex cells. Our recent DEM (Martins et al., 2011, ISSPIT, 261-266) employs a population code. Once the population's cells have been trained with random-dot stereograms, it is applied at all retinotopic positions in the visual field. Despite producing good results in textured regions, the model needs to be made more precise, especially at depth transitions.
- Disparity energy model with keypoint disparity validation. Publication. Farrajota, Miguel; Martins, J. C.; Rodrigues, J. M. F.; du Buf, J. M. H. A biological disparity energy model can estimate local depth information by using a population of V1 complex cells. Instead of applying an analytical model which explicitly involves cell parameters like spatial frequency, orientation, binocular phase and position difference, we developed a model which only involves the cells’ responses, such that disparity can be extracted from a population code, using only a set of cells previously trained with random-dot stereograms of uniform disparity. Despite good results in smooth regions, the model needs complementary processing, notably at depth transitions. We therefore introduce a new model to extract disparity at keypoints such as edge junctions, line endings and points with large curvature. Responses of end-stopped cells serve to detect keypoints, and those of simple cells are used to detect orientations of their underlying line and edge structures. Annotated keypoints are then used in the left-right matching process, with a hierarchical, multi-scale tree structure and a saliency map to segregate disparity. By combining both models we can (re)define depth transitions and regions where the disparity energy model is less accurate.