VISIONTRAIN: Computational and Cognitive Vision Systems: A Training European Network

Visiontrain Thematic Schools Les Houches

Computational and Neurophysiological Models for Visual Perception


The school is held at the Les Houches Physics School, Les Houches-Servoz, Mont-Blanc, France, from Sunday 25 to Friday 30 March 2007.

The school is jointly organized by two FP6 European consortia: the Marie Curie network VISIONTRAIN and the IST STREP project POP (Perception on Purpose).

The scientific program involves both tutorial talks and research talks focusing on computational, neurobiological, and psychophysical methods and models of visual perception. Each day will focus on a particular topic as follows:

  • Monday: Vision and representation
  • Tuesday: Depth perception and stereopsis
  • Wednesday: Object representation and categorization
  • Thursday: Attention (linking sensing and perception)
  • Friday: Vision and learning

Invited speakers
  • Prof. Irving Biederman, University of Southern California, Los Angeles
  • Prof. Peter König, University of Osnabrück
  • Dr. Peter Neri, City University London
  • Prof. John Tsotsos, York University, Toronto
  • Prof. Shimon Ullman, Weizmann Institute, Rehovot

Other speakers
  • Radu Horaud, INRIA Rhône-Alpes, Montbonnot
  • Ales Leonardis, University of Ljubljana
  • Miles Hansard, INRIA Rhône-Alpes, Montbonnot
  • Jörg Hipp, University Hospital Hamburg-Eppendorf
  • Shay Ohayon, The Technion, Haifa
  • Ehud Rivlin, The Technion, Haifa

Organizers

Scientific coordination
  • Ehud Rivlin

Scientific committee
  • Ehud Rivlin
  • Radu Horaud
  • Ales Leonardis

Administration
  • Danièle Herzog
  • Carole Bienvenu


Programme at a glance

There will be a welcome party on Monday 26 March at 7 pm.

Day
Morning (9-11)
Afternoon (4-6)
Afternoon (6:15 - 7:15)
Monday
Tuesday

M. Hansard & R. Horaud: Binocular gaze estimation (download the presentation)

J. Hipp: Basics of the noninvasive electrophysiological methods EEG and MEG (download the presentation)

Wednesday
Thursday
Friday
-
-


Titles and abstracts

P. König: The visual cortex

In this lecture we outline the general organization of the mammalian visual system and give examples of characteristic steps in its processing hierarchy. Next, we study the properties and mechanisms of attention, and how attention is controlled at different processing levels.

I. Biederman: Neurocomputational basis of object and face recognition

Whereas people can readily describe how they distinguish between two highly similar objects (such as birds on the same page of a bird guide), they are at a loss to describe the difference between the faces of Tom Cruise and John Travolta. A remarkably simple account, namely that the representation of faces relies on a stage that retains aspects of early cortical spatial (e.g., Gabor) filtering, may explain the ineffability of faces as well as a wide variety of other phenomena that distinguish face recognition from object recognition. These include why the recognition of faces, but not of objects, is so severely disrupted by contrast negation (as when viewing a photographic negative) or by orientation inversion, and why the primary complaint of prosopagnosics is not that faces are blurry but that they all look the same. This account will be evaluated with respect to recent behavioral and fMRI studies in humans and single-unit recordings in primates.
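One way to see why a representation that keeps Gabor-like filter outputs would be sensitive to contrast negation: for a linear, zero-mean filter, negating the image contrast flips the sign of every response. The Python sketch below illustrates this with a small, hypothetical filter bank (kernel size, wavelength and envelope width are arbitrary choices) applied to a random patch standing in for a face image. It illustrates the filtering argument only; it is not the model presented in the talk.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0, phase=0.0):
    """Real 2-D Gabor kernel, made zero-mean so it ignores uniform luminance."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier varies along orientation theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + phase)
    kernel = envelope * carrier
    return kernel - kernel.mean()

def gabor_responses(patch, n_orientations=4):
    """Linear (phase-sensitive) responses of a small filter bank at one location."""
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        for phase in (0.0, np.pi / 2):  # even- and odd-symmetric filters
            kernel = gabor_kernel(size=patch.shape[0], theta=theta, phase=phase)
            responses.append(float(np.sum(kernel * patch)))
    return np.array(responses)

rng = np.random.default_rng(0)
patch = rng.random((15, 15))   # stand-in for a face patch, intensities in [0, 1]
negative = 1.0 - patch         # contrast negation, as in a photographic negative

r_pos = gabor_responses(patch)
r_neg = gabor_responses(negative)
print(np.allclose(r_neg, -r_pos))  # True: every linear response flips sign
```

Because the kernels are zero-mean, the uniform offset introduced by `1.0 - patch` contributes nothing, so the negated patch produces exactly the opposite pattern of responses.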

P. König: The brain formula or the origin of invariant representations

Building on what we learned in the first lecture block, we investigate to what extent processing in sensory systems can be described by general principles. I will present a specific hypothesis that gives great weight to the statistical properties of natural stimuli and to unsupervised learning mechanisms.

P. Neri: Stereoscopic look at the visual cortex

This talk will start with a brief general introduction to stereopsis and then discuss three important topics in the field: the binocular energy model and its implementation in V1, the transition from V1 to the immediately adjacent prestriate cortex (V2), and the organization of extrastriate cortex with respect to the coding of absolute versus relative disparity. Two of the studies discussed recorded from single neurons in two different visual areas of the monkey brain, one in dorsal cortex (V5/MT) and one in ventral cortex (V4). While V5/MT neurons respond similarly to neurons in primary visual cortex (V1), V4 neurons appear to reflect a more advanced stage in the analysis of retinal disparity, closer to the perceptual experience of stereoscopic depth. Both single-neuron studies are consistent with a third study that used fMRI to address similar questions in humans. Together with previous evidence, these results suggest a new framework for understanding stereoscopic processing based on the separation between the ventral and dorsal streams of visual cortex.
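The binocular energy model mentioned above can be illustrated with a minimal 1-D sketch, assuming a position-shift formulation: each binocular simple cell sums a left-eye and a right-eye Gabor response, the right-eye receptive field is displaced by the cell's preferred disparity, and a complex cell squares and sums the outputs of an even/odd (quadrature) pair. All parameter values (sigma, wavelength, stimulus length, number of trials) are assumptions chosen for illustration.

```python
import numpy as np

def gabor_1d(x, sigma, wavelength, phase):
    """1-D Gabor receptive-field profile."""
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * x / wavelength + phase)

def binocular_energy(left, right, pref_disparity, sigma=3.0, wavelength=8.0):
    """Complex-cell energy: squared binocular simple-cell outputs summed over a
    quadrature pair. The right-eye receptive field is shifted by pref_disparity
    pixels (position-shift model)."""
    n = left.size
    x = np.arange(n) - n // 2  # receptive field centred on the middle pixel
    energy = 0.0
    for phase in (0.0, -np.pi / 2):  # even (cosine) and odd (sine) filters
        rf_left = gabor_1d(x, sigma, wavelength, phase)
        rf_right = gabor_1d(x - pref_disparity, sigma, wavelength, phase)
        simple = rf_left @ left + rf_right @ right  # binocular simple cell
        energy += simple**2
    return energy

rng = np.random.default_rng(1)
pref = 4                                  # preferred disparity of the model cell (pixels)
disparities = np.arange(-8, 9, 2)         # stimulus disparities to test
n_trials = 500
curve = np.zeros(disparities.size)
for _ in range(n_trials):
    left = rng.standard_normal(128)       # 1-D random-dot-like pattern
    for k, d in enumerate(disparities):
        right = np.roll(left, d)          # right image = left image shifted by d pixels
        curve[k] += binocular_energy(left, right, pref)
curve /= n_trials
print(disparities[np.argmax(curve)])      # disparity with the largest mean energy
```

Averaged over random patterns, the energy response is expected to peak when the stimulus disparity matches the preferred disparity, because the left and right simple-cell inputs are then identical and add constructively.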

M. Hansard & R. Horaud: Binocular gaze estimation

Binocular image pairs contain information about the three-dimensional structure of the visible scene, which can be recovered by identifying corresponding points. However, the resulting disparity field also depends on the orientation of the eyes. If we assume that the exact eye positions cannot be obtained from oculomotor feedback, then the gaze parameters must also be recovered from the images in order to interpret the retinal disparity field correctly.

Existing models of biological stereopsis have addressed this issue independently of the binocular-correspondence problem. It has been correctly assumed that if the correspondence problem can be solved, then the disparity field can be decomposed into gaze and structure components, as described above. In this work we take a different approach; we emphasize that although the complete point-wise disparity field is sufficient for gaze estimation, it is not in fact necessary. We show that the gaze parameters can be recovered directly from the images, independently of the point-wise correspondences.

The relationship between binocular vergence and the resulting epipolar geometry is derived. Our algorithm is then based on the simultaneous representation of all epipolar geometries that are feasible with respect to a fixating oculomotor system. This is done in an essentially two-dimensional space, parameterized by azimuth and viewing distance. We define a cost function that measures the compatibility of each geometry with the observed images. The true gaze parameters are estimated by a simple voting scheme, which runs in parallel over the parameter space. We describe an implementation of the algorithm and show results obtained from real images.

Our algorithm requires binocular units with large receptive fields, such as those found in area MT. The model is also consistent with the finding that depth judgments can be biased by microstimulation in MT: if the artificial signal generates an 'incorrect' set of gaze parameters, then we would expect the subsequent interpretation of the disparity field to be biased. Our model could be tested using binocular stimuli based on the patterns of disparity that we describe. We note that these patterns are geometrically analogous to parametric motion fields. It has already been shown that such flow fields are effective stimuli for motion-sensitive cells in area MST; we predict an analogous binocular 'gaze tuning' in extrastriate cortex.
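The voting scheme described in this abstract can be sketched as an exhaustive evaluation of a cost over a grid of (azimuth, viewing distance) hypotheses, keeping the lowest-cost cell. The sketch assumes a simple fixation geometry (eyes at ±b/2 on a horizontal baseline, azimuth measured from the cyclopean point midway between them) and an assumed baseline of 6.5 cm. The image-based compatibility cost used in the talk is not reproduced here, so `toy_cost` is a hypothetical stand-in that compares against the eye angles of a known fixation, purely to exercise the search.

```python
import numpy as np

BASELINE = 0.065  # assumed interocular distance in metres

def eye_angles(azimuth, distance, baseline=BASELINE):
    """Horizontal rotation (radians, positive = rightward) of each eye when both
    fixate the point at this azimuth and viewing distance from the cyclopean point."""
    px, pz = distance * np.sin(azimuth), distance * np.cos(azimuth)
    left = np.arctan2(px + baseline / 2, pz)   # left eye sits at x = -b/2
    right = np.arctan2(px - baseline / 2, pz)  # right eye sits at x = +b/2
    return left, right

def vote_for_gaze(cost, azimuths, distances):
    """Score every (azimuth, distance) hypothesis and return the lowest-cost one."""
    A, D = np.meshgrid(azimuths, distances, indexing="ij")
    scores = cost(A, D)  # evaluated independently (i.e. in parallel) over the grid
    i, j = np.unravel_index(np.argmin(scores), scores.shape)
    return azimuths[i], distances[j]

# Hypothetical stand-in for the image-based cost: mismatch with a known fixation.
true_left, true_right = eye_angles(0.1, 0.5)

def toy_cost(a, d):
    l, r = eye_angles(a, d)
    return (l - true_left) ** 2 + (r - true_right) ** 2

azimuths = np.linspace(-0.3, 0.3, 61)   # radians, step 0.01
distances = np.linspace(0.2, 2.0, 91)   # metres, step 0.02
print(vote_for_gaze(toy_cost, azimuths, distances))  # approximately (0.1, 0.5)
```

Because every grid cell is scored independently, the search parallelizes trivially; the vergence angle of each hypothesis is simply the difference between the left and right eye rotations.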

J. Hipp: Basics of the noninvasive electrophysiological methods EEG and MEG

P. Neri: Visual discrimination of interacting human agents

Although the ability to interpret and predict other people's actions is believed to play a central role in human cognitive behavior, there is no direct evidence that this ability confers a tangible benefit on sensory processing. This talk describes quantitative behavioral experiments showing that visual discrimination of a human agent is influenced by the presence of a second agent. This effect depends on whether the two agents interact (by fighting or dancing) in a meaningful, synchronized fashion that allows the actions of one agent to serve as predictors of the expected actions of the other, even though synchronization is irrelevant to the visual discrimination task. These results demonstrate that action understanding has a pervasive impact on the human ability to extract visual information from the actions of others, providing quantitative evidence of its significance for sensory performance.

S. Ullman: Object classification by informative feature hierarchies

The talks will describe a general approach to visual classification, recognition and segmentation. The approach is based on representing shapes within a class by a hierarchy of shared sub-structures called fragments. The fragments are sub-images selected automatically from a training set of images by maximizing the mutual information between the fragments and the class they represent. By repeated application of the same feature-extraction process, the classification fragments are broken down successively into their own optimal components. The resulting feature hierarchy is used to classify new images by a feed-forward sweep from low to high levels of the hierarchy, followed by a sweep from high to low levels. The first talk will describe the hierarchical feature representation, how it is learned from examples, and how it is used to classify objects as well as their parts and sub-parts at different levels.
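The fragment-selection criterion can be illustrated by estimating the mutual information between a binary "fragment detected" variable and a binary class label over a set of training images. This Python sketch assumes the detections have already been computed (for example by thresholding a similarity score between fragment and image); it only ranks candidate fragments individually and omits how further fragments and the hierarchy are built. The toy detection data are made up.

```python
import numpy as np

def mutual_information(detected, labels):
    """Mutual information (bits) between binary fragment detection F and class C,
    estimated from counts over the training images."""
    detected = np.asarray(detected, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    mi = 0.0
    for f in (False, True):
        for c in (False, True):
            p_fc = np.mean((detected == f) & (labels == c))
            if p_fc == 0:
                continue  # 0 * log 0 is taken as 0
            p_f = np.mean(detected == f)
            p_c = np.mean(labels == c)
            mi += p_fc * np.log2(p_fc / (p_f * p_c))
    return mi

def select_fragment(detections, labels):
    """Index of the candidate fragment most informative about the class, plus all scores."""
    scores = [mutual_information(row, labels) for row in detections]
    return int(np.argmax(scores)), scores

labels = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = image contains the class
detections = [
    [1, 1, 1, 1, 0, 0, 0, 0],       # fragment 0: present exactly in class images
    [1, 0, 1, 0, 1, 0, 1, 0],       # fragment 1: independent of the class
    [1, 1, 1, 0, 0, 0, 0, 1],       # fragment 2: partly informative
]
best, scores = select_fragment(detections, labels)
print(best)  # 0
```

Fragment 0 carries the full 1 bit of class information, fragment 1 carries none, and fragment 2 falls in between, so fragment 0 is selected.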

I. Biederman: Dynamic binding in a neural network for shape recognition

Almost 20 years ago, a proposal was advanced that a considerable range of behavioral phenomena associated with human object recognition can be understood in terms of a representation positing an arrangement of simple part shapes distinguished by viewpoint invariant properties (= geons). How well does this proposal stand up given recent research on optical imaging and single cell recordings in primates and behavioral and fMRI studies in humans? Issues examined will include the extent to which the representation specifies parts rather than local features or templates, nonaccidental vs. metric properties, simple vs. irregular, complex shapes, and dimensions of generalized cones (such as axis curvature and expansion).

S. Ullman: Combined classification, recognition and segmentation

The talk will discuss individual object recognition, segmentation, and biological aspects of the scheme. For the task of individual recognition, the features at each node in the hierarchy are generalized to become extended fragments, which are equivalence sets of features, representing the same object part under different viewing conditions.

Finally, image segmentation into an object and background is combined in this approach with the classification process. This is in contrast with the more common view, in which image segmentation is performed first, in a bottom-up manner, followed by object recognition. I will describe computational aspects of these processes, and discuss some relations to the human visual system.

J. Tsotsos: A brief, selective history of visual attention: experiments and models

This presentation will briefly describe the key experimental observations that have contributed to our understanding of visual attention. Because the literature is enormous, much will unfortunately be omitted. The experimental work will be tied to the development of theories and models of attention, and a taxonomy of research will be presented that covers computational, mathematical, and algorithmic models as well as models that fall more directly into computer vision. The presentation will conclude by suggesting future lines of research.

J. Tsotsos: The selective tuning model of visual attention

The Selective Tuning (ST) model will be presented in detail. Its relationship to biological vision will also be discussed, and many examples of its performance will be shown. A number of our own human experimental studies will be presented that support the basic predictions of the model.

S. Ohayon & E. Rivlin: Interactions between scene content and visual attention in non-primates

In this presentation, I will discuss a recent study that investigates interactions between the barn owl's overt visual attention and scene content. Concepts taken for granted in primate research, such as the existence of a fovea or of eye movements (saccades), had to be re-examined to suit the owl's distinct anatomical and physiological visual system. Through behavioral experiments with a camera mounted on the owl's head, we discovered striking similarities to human fixations. We hypothesize that primates and barn owls may share underlying mechanisms guiding attention. Since this research is ongoing, I will conclude my presentation with intended future lines of research.

A. Leonardis: Hierarchically Learned Representations of Object Categories: From Pixels towards Semantic Parts

The question of how to represent visual information in an artificial cognitive system to enable fast and reliable execution of various cognitive tasks has been discussed throughout the history of computer vision. The theories have converged towards hierarchical architectures of parts composed of parts (the so-called compositional systems), starting with simple, frequent features that are gradually combined into increasingly complex entities. However, the automatic design of parts in hierarchical layers has been hindered by the enormous number of theoretically possible compositions. In this talk, I will describe a novel approach that overcomes the exponential complexity of unsupervised learning by exploiting the favorable statistics of natural images in a sequential, hierarchical manner. The parts recovered in the individual layers of the hierarchy range from simple to more complex ones and enable a fast indexing (bottom-up) and matching (top-down) scheme that can be used efficiently for a variety of cognitive tasks. I will show results of the proposed approach obtained on different data sets, yielding important insights for designing compositional systems.
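The idea of learning compositions from image statistics can be illustrated, in a much simplified form, by counting how often two lower-layer parts co-occur at the same relative offset and keeping only frequent pairs as candidate parts for the next layer. This generic co-occurrence count is an assumption for illustration, not the learning algorithm presented in the talk; the part labels, radius and count threshold are hypothetical.

```python
from collections import Counter
from itertools import permutations

def frequent_compositions(images, radius=2, min_count=2):
    """Count pairs of lower-layer parts that co-occur at the same relative offset
    (within `radius` pixels) and keep pairs seen at least `min_count` times.

    images: list of per-image detections, each detection a (part_label, x, y) tuple.
    Returns a Counter mapping (label_a, label_b, dx, dy) -> count."""
    counts = Counter()
    for detections in images:
        # Every ordered pair of detections in the same image is a candidate composition.
        for (la, xa, ya), (lb, xb, yb) in permutations(detections, 2):
            dx, dy = xb - xa, yb - ya
            if max(abs(dx), abs(dy)) <= radius:
                counts[(la, lb, dx, dy)] += 1
    return Counter({k: v for k, v in counts.items() if v >= min_count})

images = [
    [("edge_h", 0, 0), ("edge_v", 1, 0)],
    [("edge_h", 5, 5), ("edge_v", 6, 5)],
    [("edge_h", 3, 1), ("edge_v", 3, 9)],  # parts too far apart to compose
]
result = frequent_compositions(images)
print(result)
```

Each co-occurring pair is counted in both orders, so the same composition appears once from each part's point of view; the pair in the third image is ignored because its parts lie farther apart than `radius`.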

Registration (Download the registration form)

Registration fees (including full accommodation from Sunday evening before dinner to Friday after lunch):