Eero Simoncelli is an outstanding computational neuroscientist who is working to understand how sensory systems arrive at reliable interpretations of the world, allowing us to make predictions and perform difficult tasks with surprising accuracy. His work specifically aims to answer several key questions in this area: How do populations of neurons encode sensory information, and how do subsequent populations extract that information for recognition, decisions, and action? And from a more theoretical perspective, why do sensory systems use these particular representations, and how can we use these principles to design better man-made systems for processing sensory signals? Dr. Simoncelli uses a combination of computational theory and modeling, coupled with perceptual and physiological experiments, and has provided crucial insights that address these key questions in neuroscience.
One area of study by his group is the optimal encoding of visual information. Since the mid-1990s, his group has developed successively more powerful models describing the statistical properties of local regions of natural images, used them in parallel to understand the structure and function of both visual and auditory neurons, and more recently pioneered the development of new forms of signal-adaptive representations. He has also contributed to the experimental characterization of neural and perceptual responses, developing models of neural responses in the retina, primary visual cortex (V1), and the middle temporal (MT) area. Finally, he has worked on problems in optimal decoding and its relationship to human visual perception.
His work has been widely recognized by the scientific community. He has held the prestigious position of Howard Hughes Medical Institute (HHMI) Investigator since 2000. He received an NSF CAREER Award (1996-2000) and a Sloan Research Fellowship (1998-2000), and was named a Hilgard Visiting Scholar at Stanford in 2013. In 2008, he was elected a Fellow of the Institute of Electrical and Electronics Engineers (IEEE). He serves as associate editor of the Annual Review of Vision Science, and is on the editorial board of the Journal of Vision. In the past, he has served as an associate editor of IEEE Transactions on Image Processing and as a member of the Faculty of 1000 Theoretical Neuroscience section. In 2015, he was awarded an Engineering Emmy Award from the Television Academy for his work on computational modeling of perceived visual quality of images.
Although his teaching has primarily focused on graduate-level education, he did teach the Undergraduate Tutorial Research course, and has mentored a number of undergraduates doing research projects in his group.
Our sensory systems provide us with a remarkably reliable interpretation of the world, allowing us to make predictions and perform difficult tasks with surprising accuracy. How do these capabilities arise from the underlying neural circuitry? Specifically, how do populations of neurons encode sensory information, and how do subsequent populations extract that information for recognition, decisions, and action? And from a more theoretical perspective, why do sensory systems use these particular representations, and how can we use these principles to design better man-made systems for processing sensory signals? Broadly speaking, my research aims to answer these questions, through a combination of computational theory and modeling, coupled with perceptual and physiological experiments. These endeavors can be categorized into three general classes.
Optimal Encoding of Visual Information. It has long been assumed that visual systems are adapted, at evolutionary, developmental, and behavioral timescales, to the images to which they are exposed. Since not all images are equally likely, it is natural to assume that the system uses its limited resources to process best those images that occur most frequently. Thus, it is the statistical properties of the environment that are relevant for sensory processing. Such concepts are fundamental in engineering disciplines -- compression, transmission, and enhancement of images all rely heavily on statistical models. Since the mid-1990s, we've developed successively more powerful models describing the statistical properties of local regions of natural images [ref1, ref2, ref3], demonstrated the power of these models by using them to develop state-of-the-art solutions to classical engineering problems of compression and noise removal, and used them in parallel to understand the structure and function of both visual and auditory neurons. These same ideas can be used to construct new nonlinear forms of image representation [ref] that provide a framework for assessing perceptual distortion [ref1, ref2, ref3]. In a related line of research, we've explored image representations that offer various forms of "invariance". This includes the development of the steerable pyramid image representation, which serves as a substrate for most of our image processing and computer vision applications, as well as for modeling the receptive fields of populations of neurons in primary visual cortex. Recent work includes the development of new forms of signal-adaptive representations [ref1, ref2, ref3].
Experimental Characterization of Neural and Perceptual Responses. Our models for sensory representations serve as precise instantiations of scientific hypotheses, and must therefore be tested and refined through comparison to experimental measurements. A component of our work is aimed at developing new experimental paradigms, including novel stimuli and analysis methods, for such experiments. In the retina, we find that a "generalized linear model" (GLM), in which spiking responses arise from the superposition of a filtered stimulus signal, a feedback signal (embodying refractoriness and other forms of suppression or excitation derived from the spike history), and a lateral connectivity signal (embodying influences from the spiking activity of other cells), provides a remarkably precise account of spike timing in populations of ganglion cells. In primary visual cortex (area V1), we've developed and fit a model that can capture the stimulus selectivity and gain control properties of a wide range of cells. In extrastriate cortex, we've developed and refined a model for motion representation in the middle temporal (MT) area of the dorsal stream. More recently, we've developed targeted stochastic motion stimuli that allow us to characterize the specific properties of individual MT neurons in terms of their V1 afferents. And by examining human detection performance for such stimuli, we have produced strong evidence for the existence of such mechanisms in the human visual system. We've also developed a model for the representation of visual texture, and used it to synthesize texture images that humans perceive as similar (we've developed analogous models for auditory textures).
By coupling this model with known receptive field properties of neurons in the ventral stream (specifically, the growth of receptive field size with eccentricity), we've been able to generate new forms of stimuli that exhibit severe peripheral distortion (scrambling of visual patterns, and a complete loss of recognizability) but are indistinguishable from intact photographs [ref]. We've used perceptual experiments to determine the sizes of neural receptive fields underlying these ambiguities, which allows us to identify the locus of this representation as area V2. Physiological experiments are currently underway (in collaboration with the Movshon lab) to further elucidate these neural mechanisms.
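The structure of the retinal model described above (a filtered stimulus signal, a spike-history feedback signal, and a coupling signal from other cells, summed to drive spiking) can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the fitted model from the work cited: the exponential nonlinearity, the per-bin Bernoulli approximation to Poisson spiking, the function name `simulate_glm_cell`, and all filter shapes and parameter values are hypothetical choices made for the example.

```python
import numpy as np


def simulate_glm_cell(stimulus, stim_filter, hist_filter, coupling_filter,
                      other_spikes, base_rate_hz=20.0, dt=0.001, seed=0):
    """Simulate one cell of a spiking GLM (illustrative sketch only).

    The instantaneous rate is exp(bias + stimulus drive + coupling drive
    + spike-history drive). The exponential nonlinearity and every
    parameter value here are assumptions made for this example.
    """
    rng = np.random.default_rng(seed)
    n = len(stimulus)
    bias = np.log(base_rate_hz)

    # Causal stimulus filtering: drive[t] = sum_k stim_filter[k] * stimulus[t - k]
    stim_drive = np.convolve(stimulus, stim_filter)[:n]

    # Coupling from another cell; shifted by one bin so that only that
    # cell's *past* spikes influence the current bin.
    coup_drive = np.convolve(other_spikes, coupling_filter)[:n]
    coup_drive = np.concatenate(([0.0], coup_drive[:-1]))

    spikes = np.zeros(n)
    rate = np.zeros(n)
    for t in range(n):
        # Feedback from this cell's own recent spikes (most recent first).
        k = min(t, len(hist_filter))
        recent = spikes[t - k:t][::-1]
        hist_drive = np.dot(hist_filter[:k], recent)

        rate[t] = np.exp(bias + stim_drive[t] + coup_drive[t] + hist_drive)
        # Probability of at least one spike in a bin of width dt.
        p_spike = 1.0 - np.exp(-rate[t] * dt)
        spikes[t] = rng.random() < p_spike
    return spikes, rate


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    stim = rng.standard_normal(5000)                      # white-noise stimulus
    neighbor = (rng.random(5000) < 0.02).astype(float)    # hypothetical neighbor
    spikes, rate = simulate_glm_cell(
        stim,
        stim_filter=np.array([0.3, 0.2, 0.1]),
        hist_filter=np.array([-5.0, -2.0, -1.0]),         # refractory suppression
        coupling_filter=np.array([0.5, 0.25]),            # excitatory coupling
        other_spikes=neighbor,
    )
    print("spike count:", int(spikes.sum()))
```

A strongly negative first entry in `hist_filter` plays the role of refractoriness: immediately after a spike, the rate is pushed toward zero. Fitting such filters to recorded data (rather than choosing them by hand, as here) is a separate estimation problem that this sketch does not address.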
Optimal Decoding & Perception. Our everyday experience deludes us into believing that perception is a direct reflection of the physical world around us. But scientists have recognized for centuries that it is more akin to a process of inference, in which incoming measurements are fused with internal expectations. In the 20th century, this concept was formalized in theories of Bayesian statistical inference, and since the early 1990s, I've used this framework to understand the means by which percepts arise from neural responses. An interesting example arises in the perception of retinal motion. If one assumes that the light intensity pattern falling on a local patch of retina is undergoing translational motion, that the neural representation of this information is noisy, and that in the absence of visual information, the distribution of retinal velocities that are typically encountered is broad but centered at zero (no motion), one can derive an optimal estimator for image velocity [ref1, ref2]. The resulting estimates are strongly biased toward slower speeds when the incoming stimulus is weakened (e.g., at low contrast). This behavior is also seen in humans, and we've used perceptual measurements to determine the internal preferences of human observers. We've obtained analogous results for human perception of local orientation, where observer preferences for horizontal and vertical orientations are well-matched to their prevalence in the natural world. The inferential computations required for these percepts are compatible with the simple neural models described above, and our current work (both theoretical and experimental) aims to elucidate the means by which prior preferences can be learned and embedded in neural populations.
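The slow-speed bias described above can be illustrated with a deliberately simplified, one-dimensional sketch. It assumes a Gaussian likelihood around the measured velocity and a zero-mean Gaussian prior, in which case the posterior mean is the measurement scaled by a shrinkage factor. The Gaussian forms, the rule that measurement noise grows inversely with contrast, the function name `estimate_velocity`, and the parameter values are all assumptions made for illustration; they are not taken from the cited derivation.

```python
import numpy as np


def estimate_velocity(measured_velocity, contrast, prior_sd=1.0, noise_scale=0.5):
    """Posterior-mean velocity under Gaussian assumptions (illustrative sketch).

    Prior:      v ~ N(0, prior_sd^2)          (centered at zero: "no motion")
    Likelihood: m | v ~ N(v, noise_sd^2), with noise_sd = noise_scale / contrast
                (assumed form: weaker stimuli give noisier measurements)
    Posterior mean: m * prior_sd^2 / (prior_sd^2 + noise_sd^2)
    """
    noise_sd = noise_scale / np.asarray(contrast, dtype=float)
    shrink = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return shrink * measured_velocity


if __name__ == "__main__":
    for c in (1.0, 0.5, 0.1):
        print(f"contrast={c:4.1f}  estimate={estimate_velocity(2.0, c):.3f}")
```

Under these assumptions, the shrinkage factor falls toward zero as contrast falls, so the same measured velocity yields a slower estimate at low contrast, which is the qualitative behavior the text describes.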
Biography: I started my higher education as a physics major at Harvard, went to Cambridge University on a Knox Fellowship to study Mathematics for a year and a half, and then returned to the States to pursue a doctorate in Electrical Engineering and Computer Science at MIT. I received my Ph.D. in 1993, and joined the faculty of the Computer and Information Science department at the University of Pennsylvania. I came to NYU in September of 1996, as part of the Sloan Center for Theoretical Visual Neuroscience. I received an NSF Faculty Early Career Development (CAREER) grant in September 1996, for research and teaching in "Visual Information Processing", and a Sloan Research Fellowship in February of 1998. In August 2000, I became an Investigator of the Howard Hughes Medical Institute, under their new program in Computational Biology.