According to Plato's allegory, we are all sitting in a cave and only see the shadows of reality. What would it be like if those shadows were explained to us?
Augmented reality promises to expand the real world using supplementary information, much like a footnote in a book, a highlighted sentence or subtitles for a foreign-language film – these are all forms of augmentation.
It will be interesting to see what happens when augmented reality is combined with standard glasses in the form of – somewhat clunky – head mounted devices. The first such attempt to attract media attention was Google Glass, a product which supplemented a person's perception of everyday reality with 'footnotes.' If this information appears in the direct vicinity of the annotated objects, this is called contact-analog superimposition. This technology requires a much larger field of view, one that corresponds to a person's visual field – an advancement that Google Glass, with its small display off to the side, was not able to offer. And there's a further catch: the person wearing the glasses must come to terms with the new visual experience. We'll come back to this point later.
If it is already possible to annotate reality, then the step to mixed reality can't be far behind. Older readers probably remember the film Who Framed Roger Rabbit – an early example of cartoon characters coming to life in the real world. Augmentation not only offers an explanation of what is visible, but can also add realistic – yet non-existent – objects to the scene. This technology can be quite practical: imagine using augmented reality to try out new furniture or new shoes virtually. And it would undoubtedly be exciting to find virtual objects suddenly embedded in the real world. But anyone not wearing AR glasses would probably wonder what on earth is going on, because outsiders don't perceive the virtual world. It's a bit like that invisible rabbit in the film Harvey! But let's return to the technical details.
The human eye is a true marvel of nature. We distinguish between the visual field of an unmoving eye and the field of vision of a moving eye. With unmoving eyes, you can perceive approximately 180° horizontally and 135° vertically. If you move your eyes, you perceive an angle of approximately 210° horizontally – you can even look slightly behind you. Yet we only have sharp vision in a range of approximately 2° – this is referred to as foveal vision. For example, when reading, you move your eyes and 'scan' the letters along the lines. Visual acuity diminishes considerably outside of the fovea. Our brain makes us believe that we see a world which we actually do not see at all: peripheral vision is based partly on hypotheses – a weakness magicians love to take advantage of. As soon as light stimuli or changes appear in the peripheral area – such as in a magic show – we look in that direction, turning our eyes and, for gaze shifts beyond about 15°, our head as well. As you might expect, head mounted devices rotate along with the wearer's head.
The maximum mobility of the human eye is about 50° in each direction of rotation. This determines a person's field of clear vision if they do not move their head. However, we seldom use this full range, preferring to rotate our eyes by no more than about 15° to either side and to turn our head for anything further. Looked at from this perspective, a field of view of only about 30° would suffice for an AR device. But, as mentioned above, the field of perception is significantly larger horizontally – approximately 210°. It is thus no wonder that several testers found the field of view of the Microsoft Hololens – around 35° diagonally – too small. Pure virtual reality headsets, such as the ZEISS VR ONE, already offer a lot more: diagonal fields of view of around 100° are normal – of course without a direct view of the surroundings. Currently, not many people have experienced virtual reality headsets, and the appropriate field of view for a pair of augmented reality glasses is primarily determined by the application: a small field of view is sufficient for footnotes, news and unimpaired vision, but a larger field of view is required for full immersion in a virtual environment.
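Diagonal figures like the Hololens' ~35° or a VR headset's ~100° can be split into horizontal and vertical components. A minimal sketch, assuming a flat 16:9 image plane (so the tangents of the half-angles scale with the frame's width, height and diagonal) – the helper name and the tangent approximation are illustrative choices, not from the article:

```python
import math

def fov_components(diag_deg: float, aspect_w: int = 16, aspect_h: int = 9):
    """Split a diagonal field of view into horizontal and vertical angles,
    assuming a flat display plane (tangent geometry)."""
    d = math.tan(math.radians(diag_deg / 2))
    diag_units = math.hypot(aspect_w, aspect_h)
    h = 2 * math.degrees(math.atan(d * aspect_w / diag_units))
    v = 2 * math.degrees(math.atan(d * aspect_h / diag_units))
    return h, v

# HoloLens-class device: ~35 degrees diagonal
h, v = fov_components(35)
print(f"35 deg diagonal -> ~{h:.0f} x {v:.0f} deg")   # about 31 x 18 deg

# VR-class device: ~100 degrees diagonal
h, v = fov_components(100)
print(f"100 deg diagonal -> ~{h:.0f} x {v:.0f} deg")  # about 92 x 61 deg
```

The ~31° horizontal span of a 35° diagonal sits right at the "eyes only, no head turn" comfort zone described above – which is exactly why testers wanting immersion found it too small.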
To demonstrate this difficulty, let us make a 100° field of view the goal for AR headsets. In principle you can see clearly right up to the edge of your field of view by rotating your eyes. In doing so, however, your pupils – i.e. your own entrance aperture for light – sweep across a relatively large surface. This surface, referred to as the eyebox of an optical instrument, should be filled with light everywhere; otherwise, at least parts of the virtual image disappear when you move your eyes. You can observe this with a basic pair of binoculars: these have an eyebox of just a few millimeters, which is generally too small. But with binoculars this does not matter, because you can move them relative to your head: if you move your eyes, e.g. to look up or to the left, then with just a bit of practice you will automatically move the binoculars as well. Viewfinder cameras work in much the same way, but it is different for head mounted devices. An eyebox with a diagonal of over 2 cm would be ideal for a 100° field of view – that's huge.
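The 2 cm figure can be made plausible with a little geometry. The eye rotates about a point behind the pupil, so as the eye turns, the pupil sweeps along an arc. A sketch of that estimate – the ~13 mm pupil-to-rotation-center distance is an assumed anatomical value, not stated in the article:

```python
import math

# Assumed distance from the eye's center of rotation to the pupil (mm).
# This is an illustrative anatomical estimate, not a figure from the text.
ROTATION_RADIUS_MM = 13.0

def pupil_sweep_mm(half_fov_deg: float,
                   radius_mm: float = ROTATION_RADIUS_MM) -> float:
    """Lateral distance the pupil sweeps when the eye rotates +/- half_fov_deg."""
    return 2 * radius_mm * math.sin(math.radians(half_fov_deg))

# Rotating the eye +/-50 degrees (its mechanical limit) sweeps the pupil across:
print(f"{pupil_sweep_mm(50):.1f} mm")   # just under 20 mm
```

An eyebox has to be filled with light over this whole sweep, which is why a 100° field of view pushes the required eyebox diagonal past 2 cm.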
When we look at objects, our eyes usually focus on – or accommodate to – the object without us noticing. This process matters little when looking at objects that are far away, but things become more complicated if we want to bring virtual objects into our immediate vicinity: the device has to project the virtual images of the objects at an appropriate distance so that the wearer sees them clearly where they are supposed to be. Keep in mind that our two eyes do not only refocus: they also rotate towards each other to place the object being observed at the intersection point of the two lines of sight. This process is called convergence and prevents double images. Here is a simple example you can try out yourself: hold out your arm in front of you with a pen in your hand and look at a distant point behind the pen. You will see two pens – this is called binocular disparity. When we perceive objects in our immediate surroundings in 3D, our eyes converge in step with the refocusing process to minimize the disparity between the two images; this coupling is called the accommodation-convergence reflex. It can even be measured, because the reflex causes the interpupillary distance to shrink by a good 5 mm. What does all this have to do with head mounted displays? If we look at an object at infinity, the two visual axes are parallel, as are the light rays entering our two eyes. If we focus on a virtual object up close, the visual axes rotate towards each other and the light rays consequently have to come from different directions for our two eyes. Any rendering calculations must allow for this. It is generally thought that failure to take this into account can lead to a feeling of nausea, because the brain cannot match the projected image with the expected distance at which such images are normally perceived. Moreover, the accommodation distance must also be adjusted for the near zone.
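The vergence geometry above is easy to compute: each visual axis rotates inward by the angle whose tangent is half the interpupillary distance over the fixation distance, and because the pupil sits in front of the eye's rotation center, the measured distance between the pupils shrinks. A sketch, assuming a 64 mm interpupillary distance and a ~13 mm pupil-to-rotation-center offset (both illustrative values, not from the article):

```python
import math

IPD_MM = 64.0           # assumed interpupillary distance
PUPIL_OFFSET_MM = 13.0  # assumed pupil-to-rotation-center distance

def vergence_deg(distance_mm: float, ipd_mm: float = IPD_MM) -> float:
    """Inward rotation of each eye, in degrees, when fixating at distance_mm."""
    return math.degrees(math.atan((ipd_mm / 2) / distance_mm))

def ipd_shrink_mm(distance_mm: float) -> float:
    """Total reduction of the interpupillary distance at that fixation distance."""
    return 2 * PUPIL_OFFSET_MM * math.sin(math.radians(vergence_deg(distance_mm)))

for d in (150, 250, 1000):
    print(f"object at {d} mm: vergence {vergence_deg(d):.1f} deg per eye, "
          f"IPD shrinks by {ipd_shrink_mm(d):.1f} mm")
```

At a close working distance of around 15 cm, the shrinkage comes out at roughly 5 mm – the same order as the figure quoted in the text – while at 1 m it is already below 1 mm.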
But our vision poses even more challenges. We can resolve up to 1.3 line pairs per minute of arc. On a 16:9 field of view with a 100° diagonal, this means that approximately 35 megapixels – times three colors – must be supplied with image information, which corresponds roughly to an 8K video format. The ambient brightness during the day lies between 2,000 cd/m² and 8,000 cd/m². The virtual information is superimposed on this ambient light and – so that we can see it at all – should be just as bright. If we include the transmission losses in the head-up system, we quickly arrive at displays that must offer more than 10,000 cd/m² of luminance and approximately 150 mW of average optical power. At present this can only be achieved using lasers; the self-illuminating displays currently available fall far short of it and thus only enable the use of AR systems in darker environments. Older readers will remember sitting down in front of CRT TVs in the evening: back then you could only watch TV once the sun had set, or in a darkened room, because TVs only had a brightness of 200 cd/m². It is a small consolation that use in a dark environment relaxes the necessary image frequency, because the flicker fusion threshold of approx. 90 hertz in daylight conditions sinks to approx. 25 hertz in dark rooms – as expressed by the Ferry–Porter law.
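The megapixel estimate can be reproduced with simple arithmetic. A sketch of that back-of-the-envelope calculation, using the article's acuity figure of 1.3 samples per arcminute and assuming – as the numbers suggest the original estimate did – that the 100° span is the horizontal side of a 16:9 frame and that pixel count scales linearly with angle:

```python
# Article's acuity figure: up to 1.3 line pairs resolvable per arcminute,
# taken here as 1.3 samples per arcminute in each direction (an assumption).
SAMPLES_PER_ARCMIN = 1.3

h_deg = 100.0              # assumed horizontal field of view
v_deg = h_deg * 9 / 16     # 56.25 deg vertical for a 16:9 frame

h_px = h_deg * 60 * SAMPLES_PER_ARCMIN   # 7800 px, close to 8K's 7680
v_px = v_deg * 60 * SAMPLES_PER_ARCMIN   # about 4388 px, close to 8K's 4320

print(f"{h_px:.0f} x {v_px:.0f} px = {h_px * v_px / 1e6:.0f} megapixels")
```

This lands at roughly 34 megapixels per color channel and a horizontal pixel count almost exactly matching the 8K format (7680 x 4320), in line with the figures quoted above.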
These are only a few of the issues related to augmented reality. We have not even examined, for example, why positioning virtual objects relative to real objects in real time requires a very good model of the environment and high computing power. However, one thing is clear: we can certainly look forward to further technical advances and can be sure that our eyes will adjust accordingly. Returning to Plato's metaphor: we will have to decide whether a small data field is sufficient to explain the shadows, or whether we would rather head to the pub with a 2-meter-tall rabbit by the name of Harvey...