Marr’s Theory of Vision

David Marr (1945-1980), a British neuroscientist and physiologist, developed one of the best-known theories of how our vision works. According to him, visual input arrives as an image on the retina, and the brain then processes that image in three successive steps.

Primal Sketch

When you first receive an image of an object, your retina simply registers the light intensity at each point of that image. From those intensities you start detecting vertices, boundaries between objects, shapes and junctions. In short, you get the fundamental points of the object, but not yet its depth or textures. This primal sketch depends entirely on the perspective from which the object is presented to you.
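To make the idea of going from light intensities to boundaries more concrete, here is a minimal sketch in Python. It is only a toy illustration, not Marr's actual method: the small intensity grid, the function name `horizontal_edges` and the threshold value are all assumptions made up for this example. It marks a boundary wherever the intensity changes sharply between two neighbouring points.

```python
# Toy illustration only: a tiny grayscale "image" as a grid of light intensities.
# The grid values and the threshold are made-up numbers for this example.
IMAGE = [
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
]


def horizontal_edges(image, threshold=40):
    """Mark positions where intensity changes sharply between neighbouring points."""
    edges = []
    for row in image:
        edge_row = []
        # Compare each point with the point immediately to its right.
        for left, right in zip(row, row[1:]):
            edge_row.append(abs(right - left) >= threshold)
        edges.append(edge_row)
    return edges


EDGE_MAP = horizontal_edges(IMAGE)

# Draw the result: "|" where a boundary was found, "." elsewhere.
for edge_row in EDGE_MAP:
    print("".join("|" if is_edge else "." for is_edge in edge_row))
```

Note that the result contains only the position of the boundary between the dark and the bright region. Nothing in it says how far away the surfaces are or what they are made of, which matches the limits of this first step.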

The 2.5D Sketch

The second step still depends on the perspective from which the object is presented to you, but now you have more information to work with, such as textures and depth, and you can start to see how the visible surfaces of the object relate to one another.

3D Model

In this last step, you obtain information that does not depend on perspective. These representations are built from the visual input and compared with the representations stored in your long-term memory. This is the point at which object recognition takes place.


Marr’s theory ties together early vision and eventual object recognition, providing us with a theory that can explain the whole system of recognition and identification of objects and patterns presented to us visually.


Perception: What you see and what you think you see

Perception is the meaning you give to the stimuli you receive from your senses: the significance you attach to objects, the way you see or feel them, and the way you place them in space.

Vision is a very important sense for most people: it’s through your eyes that you recognize the world and receive information. However, you must also be able to represent this information, or it won’t be enough.

In order to recognize an object, there are a lot of systems working closely together in your mind and, sometimes, all it takes is a change in perspective for you not to be able to recognize a once familiar object. So, what makes you recognize the objects and the world around you?

Perception refers to a representation of stimuli based on a pattern, which makes it possible to recognize an object and detect its location in space. It’s an active process of the mind, in which prior knowledge and information are used to recognize other objects and entities.


Visual Cognition

Visual recognition is, as you might imagine, quite complex. So what are the cognitive processes involved in it?

There are three major theories that try to explain the way our brain makes these recognitions: Template-matching Theory, Conceptual Categories and Structural Models.

According to the Template-matching Theory, you have images of all objects stored in your mind, and when you receive a new stimulus, you search your memory for an image that matches the current one. As you can see, it’s a very simple theory (too simple), and because of that it raises a lot of questions. It cannot explain how you recognize an object seen from an unfamiliar perspective, or how you process such a large amount of information so quickly. It’s inflexible and inherently flawed.
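The inflexibility is easy to see in a small Python sketch. This is only an illustration under simplifying assumptions: the `TEMPLATES` dictionary, the 3x3 patterns and the function names are hypothetical, and the matching is strict equality rather than anything the theory itself specifies.

```python
# Hypothetical memory: each object is remembered as one fixed pixel pattern.
TEMPLATES = {
    "letter L": ("X..", "X..", "XXX"),
    "letter T": ("XXX", ".X.", ".X."),
}


def match_template(stimulus, templates=TEMPLATES):
    """Return the name of the stored template identical to the stimulus, or None."""
    for name, pattern in templates.items():
        if pattern == stimulus:
            return name
    return None


def rotate_clockwise(pattern):
    """Rotate a pattern by 90 degrees, standing in for a change of perspective."""
    width = len(pattern[0])
    return tuple(
        "".join(row[col] for row in reversed(pattern)) for col in range(width)
    )


upright_l = ("X..", "X..", "XXX")
rotated_l = rotate_clockwise(upright_l)

print(match_template(upright_l))  # the familiar view finds its template
print(match_template(rotated_l))  # the rotated view matches nothing stored
```

The only way to make the rotated view recognizable in this scheme is to store one more template for it, and another for every further viewpoint, which is exactly the storage and search problem described above.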

The Conceptual Categories Theory deals with concepts, which are mental representations of objects. According to this theory, our brain groups concepts into categories, which can be subordinate to one another. Subordinate categories inherit the properties of their superordinate categories and add some new properties of their own.
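The inheritance of properties can be sketched in a few lines of Python. The category names, their properties and the helper `all_properties` are invented for illustration and are not taken from the theory itself.

```python
# Hypothetical category tree: each category names its superordinate category
# ("parent") and lists only the properties that are new at its own level.
CATEGORIES = {
    "animal": {"parent": None, "properties": {"breathes", "moves"}},
    "bird": {"parent": "animal", "properties": {"has feathers", "lays eggs"}},
    "canary": {"parent": "bird", "properties": {"is yellow", "sings"}},
}


def all_properties(category, categories=CATEGORIES):
    """Collect a category's own properties plus everything it inherits."""
    collected = set()
    # Walk upwards from the category to the top of the hierarchy.
    while category is not None:
        node = categories[category]
        collected |= node["properties"]
        category = node["parent"]
    return collected


print(sorted(all_properties("canary")))
```

In this sketch "canary" ends up with six properties although only two are stored on it directly; the other four come from "bird" and "animal".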

This theory isn’t free of problems. One of the major ones is the lack of a formal definition for some of its concepts, together with the very thin lines and frail differences between some of the conceptual categories.

With regard to Structural Models, there are two main theories, the Geon Theory and the Recognition-by-Components Theory, both by Biederman. The Geon Theory holds that there are primitive forms (geons) which are combined according to specific rules, and that an object is the spatial arrangement of its geons, as you can see in the image below. It’s a very simple theory and, because of that, reductive in nature.

So Biederman elaborated the Recognition-by-Components Theory (RBC), according to which our brain detects the object’s non-accidental properties (the ones that do not change much, even across different perspectives) and its concavities. From these two elements, the brain is able to determine the components, which are then compared with the object representations in our memory.
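The final comparison step can be sketched as follows. This is a loose illustration, not Biederman's model: it skips the detection of non-accidental properties and concavities entirely and assumes the components have already been determined. The `MEMORY` contents, the component labels and the arrangement strings are all hypothetical.

```python
# Hypothetical memory: each object is described by its components and how
# they are arranged, not by a stored picture of one particular view.
MEMORY = {
    "cup": (frozenset({"cylinder", "curved handle"}), "handle on side"),
    "pail": (frozenset({"cylinder", "curved handle"}), "handle on top"),
    "suitcase": (frozenset({"brick", "curved handle"}), "handle on top"),
}


def recognize(components, arrangement, memory=MEMORY):
    """Return the stored object whose structural description matches, or None."""
    description = (frozenset(components), arrangement)
    for name, stored in memory.items():
        if stored == description:
            return name
    return None


# The order in which components are listed does not matter,
# but their spatial arrangement does.
print(recognize(["curved handle", "cylinder"], "handle on side"))
print(recognize(["cylinder", "curved handle"], "handle on top"))
```

Because the description contains no viewpoint information, the same lookup works whichever side the object is seen from, as long as the components and their arrangement can still be determined. That is the main contrast with the template sketch, where every new view needs its own stored image.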