Digital Worlds – The Blogged Uncourse

Digital Worlds – Interactive Media and Game Design was originally developed as a free learning resource on computer game design, development and culture, authored as part of an experimental approach to the production of online distance learning materials. Many of the resources presented on this blog also found their way into a for credit, formal education course from the UK’s Open University.

This blog was rebooted at the start of summer 2016 to act as a repository for short pieces relating to mixed and augmented reality, and related areas of media/reality distortion, as preparation for a unit on the subject in a forthcoming first level Open University course.

Augmented TV Sports Coverage

In the post From Magic Lenses to Magic Mirrors and Back Again we saw how magic lenses allow users to look through a screen at a mediated view of the scene in front of them, and magic mirrors allow users to look at a mediated view of themselves. In this post, we will look at how a scene might be captured remotely and then mediated in some way before being presented to the viewer in near-real-time. In particular, we will consider how live televised sporting events may be augmented to enhance the viewer’s understanding or appreciation of the event.

Ever since the early days of television, TV graphics have been used to overlay information – often in the “lower third” of the screen – to provide a mediated view of the scene being displayed. For example, one of the most commonly seen lower third effects is to display a banner giving the name and affiliation of a “talking head”, such as a politician being interviewed in a news programme.

But in recent years, realtime annotation of elements within the visual scene has become possible, providing the producers of sports television in particular with a very rich and powerful way of enhancing the way that a particular event is covered.

EXERCISE: from your own experience, try to recall two or three examples of how “augmented reality” style effects can be used to enhance televised sporting events in a real-time or near-realtime way.

Educators often use questions to focus the attention of the learner on a particular matter. For example, an educator reading an academic paper may identify things of interest (to them) that they want the learner to pick up on. The educator then needs to find a way of directing the attention of the learner to those points of interest. This is often what motivates the questions they set around a resource: the purpose is to help students learn how to focus their attention on a resource and reflect on why something in the paper might be interesting, by posing a question to which the item in the paper is the answer. When addressing a question, the learner also needs to appreciate that they are expected to answer the question in an academic way. More generally, when you read something, read it with a set of questions in mind that may have been raised by reading the abstract. You can also annotate the reading with questions which that part of the reading answers. Another trick is to spot when part of the reading answers a question or addresses a topic you didn’t fully understand: “Ah, so that means if this, then that…”. This is a simple trick, but a really powerful one nonetheless, and can help you develop your own self-learning skills.

EXERCISE: Read through the following abstract taken from a BBC R&D department white paper written in 2012 (Sports TV Applications of Computer Vision, originally published in ‘Visual Analysis of Humans: Looking at People’, Moeslund, T. B.; Hilton, A.; Krüger, V.; Sigal, L. (Eds.), Springer 2011):

This chapter focuses on applications of Computer Vision that help the sports broadcaster illustrate, analyse and explain sporting events, by the generation of images and graphics that can be incorporated in the broadcast, providing visual support to the commentators and pundits. After a discussion of simple graphics overlay on static images, systems are described that rely on calibrated cameras to insert graphics or to overlay content from other images. Approaches are then discussed that use computer vision to provide more advanced effects, for tasks such as segmenting people from the background, and inferring the 3D position of people and balls. As camera calibration is a key component for all but the simplest applications, an approach to real-time calibration of broadcast cameras is then presented. The chapter concludes with a discussion of some current challenges.

How might the techniques described be relevant to / relate to AR?

Now read through the rest of the paper, and try to answer the following questions as you do so:

  • what is a “free viewpoint”?
  • what is a “telestrator” – to what extent might you claim this is an example of AR?
  • what approaches were taken to providing “Graphics overlay on a calibrated camera image”? How does this compare with AR techniques? Is this AR?
  • what is Foxtrax and how does it work?
  • what effects are possible once you “segment people or other moving objects from the background”? What practical difficulties must be overcome when creating such an effect?
  • how might prior knowledge help when constructing tracking systems? What additional difficulties arise when tracking people?
  • how can environmental features/signals be used to help calibrate camera settings? what does it even mean to calibrate a camera?
  • what difficulties are associated with segmentation, identification and tracking?
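
To make the idea of a "calibrated camera" a little more concrete, the following is a minimal sketch, not the method used in the white paper, of how a known point on a pitch might be mapped to a pixel position so that a graphic can be drawn over it. It assumes a simplified pinhole camera that can only pan about the vertical axis; the function name `project_point`, the coordinate convention (x to the right, y up, z away from the camera) and all the numbers are illustrative assumptions. Calibrating a camera then amounts to estimating values such as the camera position, pan angle and focal length from the image itself.

```python
import math


def project_point(point, camera_pos, pan_deg, focal_px, image_size):
    """Project a 3D world point to pixel coordinates (simplified pinhole model).

    Assumptions (illustrative only): world axes are x right, y up, z forward;
    the camera only pans about the vertical (y) axis; focal length is given
    in pixels; the principal point is the image centre.
    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    # Step 1: express the point relative to the camera position.
    dx = point[0] - camera_pos[0]
    dy = point[1] - camera_pos[1]
    dz = point[2] - camera_pos[2]

    # Step 2: undo the camera pan so that the camera looks along +z.
    pan = math.radians(pan_deg)
    xc = dx * math.cos(pan) - dz * math.sin(pan)
    yc = dy
    zc = dx * math.sin(pan) + dz * math.cos(pan)

    if zc <= 0:
        return None  # the point is behind the camera, so it cannot be drawn

    # Step 3: perspective divide, then shift to the image centre.
    width, height = image_size
    u = width / 2 + focal_px * xc / zc
    v = height / 2 - focal_px * yc / zc  # pixel rows increase downwards
    return (u, v)


if __name__ == "__main__":
    # A point 10 m in front of the camera and 1 m to the right.
    print(project_point((1, 0, 10), (0, 0, 0), 0, 1000, (1920, 1080)))
```

If the estimated pan angle or focal length is slightly wrong, the projected pixel position drifts away from the real object, which is one reason why accurate, real-time calibration matters so much for broadcast overlays.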

The white paper also identifies the following challenges to “successfully applying computer vision techniques to applications in TV sports coverage”:

  • The environment in which the system is to be used is generally out of the control of the system developer, including aspects such as lighting, appearance of the background, clothing of the players, and the size and location of the area of interest. For many applications, it is either essential or highly desirable to use video feeds from existing broadcast cameras, meaning that the location and motion of the cameras is also outside the control of the system designer.
  • The system needs to fit in with existing production workflows, often needing to be used live or with a short turn-around time, or being able to be applied to a recording from a single camera.
  • The system must also give good value-for-money or offer new things compared to other ways of enhancing sports coverage. There are many approaches that may be less technically interesting than applying computer vision techniques, but nevertheless give significant added value, such as miniature cameras or microphones placed in a cricket stump, a ‘flying’ camera suspended on wires above a football pitch, or high frame-rate cameras for super-slow-motion.

To what extent do you think those sorts of issues apply more generally to augmented and mediated reality systems?

In the rest of this post, you will see some examples of how computer vision driven television graphics have been used in recent years. As you watch the videos, try to relate the techniques demonstrated to the issues raised in the white paper.

From 2004 to 2010, the BBC R&D department, in association with Red Bee Media, worked on a system known as Piero, now owned by Ericsson, that explored a wide range of augmentation techniques. Watch the following videos and see how many different sorts of “augmentation” effect you can identify. In each case, what sorts of enabling technology do you think are required in order to put together a system capable of generating such an effect?

In the US, SportVision provide a range of real-time enhancements for televised sports coverage. The following video demonstrates car and player tracking in motor-racing and football respectively, ball tracking in baseball and football (soccer), and a range of other “event” related enhancements, such as offside lines or player highlighting in football (soccer).

EXERCISE: watch the SportVision 2012 showreel on the SportVision website. How many different augmented reality style effects did you see demonstrated in the showreel?

The videos show several examples of how items tracked in realtime can be visualised, either to highlight a particular object or feature (such as tracking a player, or highlighting the position of a ball, puck, or car), or to trace out the trajectory followed by the object (for example, highlighting in realtime the path followed by a ball).

Having seen some examples of the techniques in action, and perhaps started to ask yourself “how did they do that?”, skim back over the BBC white paper to see if any of the sections jump out at you in answer to your self-posed questions.

In the UK, Hawk-Eye Innovations is one of the best-known providers of such services to TV sports viewers.

The following video describes in a little more detail how the Hawk-Eye system can be used to enhance snooker coverage.

And how Hawk-Eye is used in tennis:

In much the same way as sportsmen compete on the field of play, so too do rival technology companies. In the 2010 Ashes series, Hawk-Eye founder Paul Hawkins suggested that a system provided by rivals VirtualEye could lead to inaccurate adjudications due to human operator error compared to the (at the time) more completely automated Hawk-Eye system (The Ashes 2010: Hawk-Eye founder claims rival system is not being so eagle-eyed).

The following video demonstrates how the Virtual Eye ball tracking software worked to highlight the path of a cricket ball as it is being bowled:

EXERCISE: what are the benefits to sports producers from using augmented reality style, realtime television graphics as part of their production?

The following video demonstrates how the SportVision Liveline effect can be used to help illustrate what’s actually happening in an America’s Cup yacht race, which can often be hard to follow for the casual viewer:

EXERCISE: To what extent might such effects be possible in a magic lens style application that could be used by a spectator actually witnessing a live sporting event?

EXERCISE: review some of the video graphics effects projects undertaken in recent years by the BBC R&D department. To what extent do the projects require: a) the modeling of the world with a virtual representation of it; b) the tracking of objects within the visual scene; c) the compositing of multiple video elements, or the introduction of digital objects within the visual scene?

As a quick review of the BBC R&D projects in this area suggests, the development of on-screen graphics that can track objects in real time may be complemented by the development of 3D models of the televised view, so that it can be inspected from virtual camera positions that provide a view of the scene reconstructed from a model built up from the real camera positions.

Once again, though, there may be a blurring of reality: is the view actually taken from a virtual camera, or from a real one, such as a Spidercam?

As well as overlaying actual footage with digital effects, sports producers are also starting to introduce virtual digital objects into the studio to provide an augmented reality style view of the studio to the viewer at home.

3D graphics are also increasingly being used in TV studios to dress other elements of the set. In addition, graphics are being used to enhance TV sports through the use of virtual advertising. Both these approaches will be discussed in another post.

More generally, digital visual effects are used widely across film and television, as we shall also explore in a later post…

From Magic Lenses to Magic Mirrors and Back Again

In recent years, commercial outdoor advertising has made increasing use of screen based digital signage. These can be used for video based advertising campaigns as well as “carousel” style displays where the same screen can be used to display different adverts in turn. But in a spirit of playfulness, they may also be used as magic lens style displays, similar in kind to the handheld magic lens applications described in the post “Magic Lenses” and See-Through Displays. In 2014, the Pepsi Max “Unbelievable” ad campaign by Abbott Mead Vickers BBDO tricked passengers waiting in London bus shelters into thinking a customised bus shelter had a transparent side wall, when in fact it was a large magic lens – the Pepsi Max “Unbelievable Bus Shelter”.

Magic lenses provide a view of the world in front of the display as well as a mediated, augmented or transformed version of it. But what if we replace the idea of a lens with that of a mirror that augments the scene captured by a front-mounted, user facing camera?

Another part of the Pepsi Max “Unbelievable” campaign replaced a real mirror with a “magic mirror” that transformed the “reflection” seen by the subject by replacing their face with a virtually face-painted version of it:

Reference: Campaign, Pepsi Max “unbelievable” by Abbott Mead Vickers BBDO.

Just as mobile phones provide a convenient device for viewing the scene directly in front of the user via a screen, with all that entails in terms of re-presenting the scene digitally, front mounted cameras on smart phones allow the user to display a live video feed of their own face on the screen, essentially using the user-facing camera+live video display combination as a mirror. But can such things also be used as magic mirrors?

Indeed they can. Several cosmetics manufacturers already publish make-up styling applications that show the effect of applying different styles of make-up selected by the user. The applications rely on identifying particular facial features, such as lips, or eyes, and then allow the user to apply the make-up virtually. (You will see how this face-capture works in another post.)

Another application, ModiFace, offers a similar range of features.

In much the same way that the Pepsi Max bus shelter used a large size display as a magic lens, so too can human size displays be used to implement magic mirrors.

Once again, the fashion industry has made use of full length magic mirrors to help consumers “try on” clothes using augmented reality. The mirror identifies the customer and then overlays their “reflection” with the items to be tried on. The following video shows the FXGear FXMirror being used as part of a shop floor fitting room.

EXERCISE: Read the blurb about the FXGear FXMirror. What data is collected about users who model clothes using the device? How might such data be used?

EXERCISE: How else have marketers and advertisers used augmented and mediated reality? Try searching through various marketing trade/industry publications to find reports of recent campaigns using such techniques. If you find any, please provide a quick review of them, along with a link, in the comments.

Augmented Reality Apps for the Design Conscious

When the 2013 Ikea catalogue was first released at the start of August 2012, as part of a campaign developed in association with the McCann advertising agency, it was complemented by an augmented reality application that allowed customers to place catalogue items as if in situ in their own homes. Each year since then, the augmented reality app has been updated with the latest catalogue items, demonstrating Ikea’s ongoing commitment to this form of marketing.

For an early report, see for example: Wired, So Smart: New Ikea App Places Virtual Furniture in Your Home, August 2013.

Perhaps not surprisingly, the use of augmented reality in the context of interior design extends far beyond just an extension of the Ikea catalogue.

One of the drawbacks of the current generation of augmented reality interior design applications is the low quality of the rendering of the digital 3D object. As we shall see elsewhere, the higher powered computer processors available in today’s desktop and laptop computers, compared to mobile devices, mean that it is becoming possible to render photorealistic objects in a reasonable amount of time with a personal computer. However, meeting the realtime rendering requirement of augmented reality apps, as well as ensuring that the rendered object is appropriately shaded given the lighting conditions of the environment and the desired location of the artificial object, presents further technological challenges.

EXERCISE: read the Accenture report from 2014 Life on the digital edge: How augmented reality can enhance customer experience and drive growth and then answer the following questions:

  • what does the report describe as “one of the main goals of any retailer’s digital investment”? How do they claim augmented reality might achieve that goal? To what extent do you think that claim is realistic? What obstacles can you think of that might stand in the way of achieving such a goal using augmented or mediated reality?
  • according to the report, how might augmented reality be used in retail? The report was published in 2014 – can you find any recent examples of augmented reality being used in ways described in the report? Is it being used for retail in ways not identified in the report?
  • what does the report identify as the possible business value benefits of using augmented reality? In that section, a table poses the question “What augmented reality use case would increase your likelihood of purchasing the product?”. Can you find one or more current or historical examples of the applications described? Do such applications seem to be being used more – or less – frequently in recent times?

A lot of hype surrounds augmented reality, although in many respects its value other than as a novelty is yet to be determined. To what extent do you think augmented reality applications are a useful everyday contribution to the marketer’s toolkit, and to what extent are they simply a marketing novelty fit only for short lived campaigns? What are the challenges to using such applications as part of an everyday experience?

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation or realtime manipulation of another modality such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: car navigation systems augment spatial location with audio messages describing when to turn, and audio guides in heritage settings let you listen to a story that “augments” a particular location. Noise cancelling earphones transform the environment by subtracting, or tuning out, background noise, and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term in which information may be added to or subtracted from a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, where the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft.

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.

EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sat inside a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.
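
The basic principle behind active noise cancellation can be illustrated with a few lines of code. The following is a minimal sketch under idealised assumptions, not a description of how any particular product works: the noise is a pure tone, it is measured perfectly, and the function names (`make_tone`, `anti_noise`, `mix`, `delay`, `rms`) and the numbers used are purely illustrative. The "anti-noise" is simply the measured noise with its sign flipped, so that when the two are mixed they sum to silence. The sketch also shows what happens when the anti-noise arrives a few samples late.

```python
import math


def make_tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a sine wave as a list of samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def anti_noise(noise_samples):
    """Invert the measured noise (equivalent to a 180 degree phase shift)."""
    return [-s for s in noise_samples]


def mix(*signals):
    """Add several signals together, sample by sample."""
    return [sum(values) for values in zip(*signals)]


def delay(samples, n):
    """Delay a signal by n samples, padding the start with silence."""
    return [0.0] * n + samples[:len(samples) - n]


def rms(samples):
    """Root mean square level, a rough measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    sample_rate = 8000
    music = make_tone(440, sample_rate, 800, amplitude=0.5)
    hum = make_tone(100, sample_rate, 800)

    # Ideal case: the inverted noise exactly cancels the noise.
    print("noise level without cancellation:", rms(hum))
    print("noise level with cancellation:", rms(mix(hum, anti_noise(hum))))

    # What the listener hears is then (almost exactly) just the music.
    heard = mix(music, hum, anti_noise(hum))
    print("largest difference from music:",
          max(abs(h - m) for h, m in zip(heard, music)))

    # Less ideal case: the anti-noise arrives 4 samples (0.5 ms) late.
    for freq in (100, 1000):
        tone = make_tone(freq, sample_rate, 800)
        residual = mix(tone, delay(anti_noise(tone), 4))
        print(freq, "Hz residual level:", rms(residual))
```

In this idealised model, a small timing error leaves far more residual noise for the higher frequency tone than for the lower one, which gives one hint as to why some kinds of noise are easier to cancel than others.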

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

“Magic Lenses” and See-Through Displays

In the post Taxonomies for Describing Mixed and Alternate Reality Systems we introduced various schemes for categorising and classifying the various components of mixed and augmented reality systems. In this post, we will see how one particular class of display – see-through displays – can be put to practical purpose. 

Using a phone, or tablet, with a forward facing, back-mounted camera as a see-through video display, you can relay the camera view to the screen in a realtime view mode and manipulate the current scene. This approach has been referred to as a “magic lens, … a see-through interface/metaphor that affords the user a modified view of the scene behind the lens” (D. Baričević, C. Lee, M. Turk, T. Höllerer and D. A. Bowman, “A hand-held AR magic lens with user-perspective rendering,” Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, Atlanta, GA, 2012, pp. 197-206, doi: 10.1109/ISMAR.2012.6402557 [PDF]). (See also M. Rohs and A. Oulasvirta. Target acquisition with camera phones when used as magic lenses, Proceedings of the 26th international conference on Human Factors in Computing Systems, CHI ’08, pages 1409–1418. ACM, 2008, who define a magic lens as an “augmented reality interface that consists of a camera-equipped mobile device being used as a see-through tool. It augments the user’s view of real world objects by graphical and textual overlays”).

However, as the paper noted at the time:

Many existing concept images of AR magic lenses show that the magic lens displays a scene from the user’s perspective, as if the display were a smart transparent frame allowing for perspective-correct overlays. This is arguably the most intuitive view. However, the actual magic lens shows the augmented scene from the point of view of the camera on the hand-held device. The perspective of that camera can be very different from the perspective of the user, so what the user sees does not align with the real world. … We define the user-perspective view as the geometrically correct view of a scene from the point-of-view of the user, in the direction of the user’s view, and with the exact view the user should have in that direction.

Whilst head-up displays are also examples of see-through displays, many head-up displays do not necessarily situate the virtual digital objects as direct augmentations of the perceived physical world. Rather, they are frequently pop-up style dashboards that open as “desktop windows” or pop-up menus within the visual scene, rather than as direct augmentations of physical objects perceived within the visual scene.

The first wave of consumer augmented reality applications relied on printing out registration images or QR codes that could act as fiducial markers and be easily recognised using image recognition software, and then overlaid with a 3D animation.

If an image could reliably be detected, it could be used as part of an augmented reality system, resulting in some innovative marketing campaigns.
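
As a rough illustration of what "recognising" a fiducial marker can involve, the following minimal sketch assumes that the marker has already been located in the camera image and reduced to a small grid of black (1) and white (0) cells. Real toolkits also have to find and straighten the marker in the image first, which is skipped here. The marker patterns, their names and the function `identify_marker` are hypothetical examples, not part of any particular toolkit. The observed grid is compared against each known pattern in all four orientations, so the result says both which marker was seen and how it is rotated, which is what allows an overlay to be drawn the right way round.

```python
def rotate_cw(grid):
    """Rotate a square grid of cells by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def identify_marker(observed, known_markers):
    """Match an observed grid against known marker patterns.

    Returns (marker_name, degrees), where degrees is the clockwise rotation
    that must be applied to the observed grid to match the stored pattern,
    or None if no known marker matches in any orientation.
    """
    candidate = [list(row) for row in observed]
    for quarter_turns in range(4):
        for name, pattern in known_markers.items():
            if candidate == [list(row) for row in pattern]:
                return (name, quarter_turns * 90)
        candidate = rotate_cw(candidate)
    return None


# Hypothetical 3x3 marker patterns. Each is deliberately asymmetric so that
# its orientation can be determined unambiguously.
KNOWN_MARKERS = {
    "marker_a": [[1, 0, 0],
                 [0, 1, 0],
                 [0, 1, 1]],
    "marker_b": [[1, 1, 0],
                 [0, 0, 1],
                 [1, 0, 1]],
}

if __name__ == "__main__":
    # Simulate seeing marker_a turned a quarter turn anticlockwise.
    seen = KNOWN_MARKERS["marker_a"]
    for _ in range(3):
        seen = rotate_cw(seen)
    print(identify_marker(seen, KNOWN_MARKERS))
```

A pattern that looked the same after rotation would be a poor choice of marker, because the software would have no way of telling which way up the overlay should be drawn.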

The same idea can be used to enhance two-dimensional print publications. With a suitable device and the appropriate app installed, you can recognise a particular page of print and “unlock” additional content, an approach taken by the Layar augmented reality app, among others, which also allows you to create your own augmented reality enhanced content.

For more confident programmers, toolkits such as the open source ARToolkit, one of the earliest widely available augmented reality programming toolkits (which is still being developed today and is distributed for free), and the Wikitude SDK (software development kit) allow professional and hobbyist programmers alike to create their own augmented reality demonstrations. (See also commercial services such as Catchoom: CraftAR.)

Within all these applications, we see how there is a need for “enabling technologies, … advances in the basic technologies needed to build compelling AR environments. Examples of these technologies include displays, tracking, registration, and calibration” (Azuma, Ronald, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. “Recent advances in augmented reality.” IEEE computer graphics and applications 21, no. 6 (2001): 34-47) that make the development of such systems possible by developers outside of advanced research and development labs.

One popular category of ARToolkit demonstration, and an approach that hints at a particular category of potential augmented reality applications, was the development of interactive Lego model assembly manuals. These could recognise a registration image associated with a particular model and could then step through the sequential steps required to build the model, overlaying the next piece to be added to the model in a stepwise fashion. The known size of the marker, the fixed geometry of the model, and the availability of open source Lego CAD tools based around LDraw meant that many of the physical and computational building blocks required for creating such applications were already in place.

SAQ: What’s wrong with the demonstration shown in the video above?

Answer: The virtual block does not appear to be placed in the correct position on the model, but is offset slightly. This might arise from a combination of issues, including the placement of the registered image or the positioning of the see-through device or the camera used to record the video.

An earlier demonstration of a Lego construction model instruction manual includes some additional humour in the form of an animated Lego figure mechanic who fetches an appropriate piece at each step and then demonstrates where to attach it to a model based around the original Lego Mindstorms Robot Invention System.

The demonstration also shows how augmented reality can be used to test the operation of the completed assembly, stepping the user through a test sequence and virtually animating the expected behaviour. The ability of the RCX computer brick at the heart of the model to communicate back to the computer hosting the manual also allowed information captured by the brick (the light sensor readings) to be displayed in the augmented reality layer.

SAQ: how might advances in 3D image recognition technology be used to further improve the functionality of the manual, for example, in terms of checking the correct assembly of the model? What other enabling technologies may also help in this endeavour?

Answer: as the ability to recognise, identify and orientate 3D objects improves, generating digital overlays on three dimensional objects becomes more tractable. This means that it may be possible to recognise pieces picked up by the person building the model and have the interactive manual check them against the expected part. Erroneous parts could be highlighted with a warning sign. Additionally, the state of the model after each step is completed could be visually checked to see that the correct piece appears to have been placed in the correct position, although this is likely to present a more complex task and may not be possible. If a piece could be identified as incorrectly placed (for example, in a “likely possible” misplaced position) the instruction manual might show how to move the part to the correct place, or rewind to show where the piece should have been placed.

The availability of programming building blocks capable of recognising individual Lego bricks and associating them with a part number also used by CAD tools such as LDraw could be seen as an enabling technology for the further development of such diagnostic AR tools.

As a proof-of-concept idea, augmented reality Lego construction manuals provide a realistic, if toy example (in several senses of the word!) of how such techniques might be used in a practical setting. So it’s not surprising that augmented reality instruction manuals were among the first application areas described when the possibility of AR first began to emerge.

Optional Reading: two relatively early descriptions of augmented reality instruction and assembly manuals can be found in: Caudell, Thomas P., and David W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes“, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, vol. 2, pp. 659-669. IEEE, 1992.; and Feiner, Steven, Blair Macintyre, and Dorée Seligmann, “Knowledge-based augmented reality.” Communications of the ACM 36, no. 7 (1993): 53-62.

A similar approach is also being used to develop service manuals for use in industry, via magic lens displays, and presumably also in smart helmet displays:

Tablet, or phone, based magic lens apps are also being used to support car maintenance in the form of augmented reality car manuals that are capable of recognising particular features of a car dashboard, or engine, and then interactively annotating them with a virtual overlay, as described in this press release from Hyundai about their augmented reality car owner’s manual:

On the other hand, maybe such applications are just so much hype and not actually of interest to a wider public? For example, in 2013, one company attempted to crowdsource funding for a general purpose app that could act as an AR style guide for a range of car models:

At the time of writing, in mid-2016, the site appears to have raised almost $500 of the $90,000 goal. Maybe augmented reality is not that compelling for the mass market?!

However, it seems as if the development of augmented reality technical documentation is still an area of academic research, at least.

In the next post on this theme, we’ll see how augmented reality can be used to implement “magic mirrors”, in contrast to “magic lenses”. But first – what “lenses” can we apply to another modality: sound?

Real or Virtual Objects?

In the post Taxonomies for Describing Mixed and Alternate Reality Systems, we provided a framework for talking about the various physical components of an augmented reality system. But how should we talk about the different elements within the perceived augmented reality scene?

Milgram and Kishino (Milgram, Paul, and Fumio Kishino, “A taxonomy of mixed reality visual displays”, IEICE Transactions on Information and Systems 77, no. 12 (1994): 1321-1329) started by clarifying the notions of real and virtual in an augmented reality sense:

  • Real objects are objects that have a physical, tangible existence, whereas virtual objects are purely digital representations, without a physical correlate, within the rendered visual scene (although they may be digital representations of things that do exist).
  • A directly viewed object has an existence in the real world and is viewed as such by the viewer. A non-directly viewed object is one that has been sampled and re-presented to the viewer via a display medium, or a virtual object whose existence can only be viewed via such a medium. This is referred to as the image quality.
  • A real image is one that has “some luminosity at the location at which it appears to be located”, such as a directly viewed object or an image viewed on a screen. Virtual images are produced by optical tricks, such as holograms and mirror images, and have no luminosity at the location at which they appear.


Whilst these distinctions are helpful when considering the representation of a single object, they may become confused when trying to analyse a view composed of multiple objects, both real and virtual. For example, in the Google Translate example described in Augmenting Reality With Digital Overlays, the screen is a physical display, that is, a real image, that provides a non-direct view. But is the text a real object or a virtual object?

To help us talk about objects within the augmented visual scene, we might add an additional correspondence dimension, that describes whether an object within the scene, or component of it, is presented as:

  • a raw, otherwise untouched, part of the image (that is, a faithful re-presentation of the object represented in that part of the image);
  • an overlay, where an additional layer of information is added to the scene, as in the case of a HUD dashboard;
  • a re-touch, where the object is still recognisable but has been reshaped and/or recoloured;
  • a replacement, where an object has been detected and then replaced.
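As a minimal sketch of how the correspondence dimension might be used to label the elements of a scene, the following Python fragment encodes the four categories and applies them to the translated street sign example. The class and function names, and the particular labelling of the sign, are illustrative assumptions rather than part of any published scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Correspondence(Enum):
    """The four correspondence categories listed above."""
    RAW = "raw"
    OVERLAY = "overlay"
    RETOUCH = "re-touch"
    REPLACEMENT = "replacement"


@dataclass
class SceneElement:
    """A named part of the perceived scene (hypothetical helper type)."""
    name: str
    correspondence: Correspondence


# One possible labelling of the translated street sign scene (an assumption).
scene = [
    SceneElement("street and sign board", Correspondence.RAW),
    SceneElement("translated sign text", Correspondence.REPLACEMENT),
]


def mediated(elements):
    """Return the names of elements that are not a faithful re-presentation."""
    return [e.name for e in elements if e.correspondence is not Correspondence.RAW]


print(mediated(scene))
```

Labelling each element separately sidesteps the question of whether the whole view is "real" or "virtual": the sign board is passed through untouched, while the text is a replacement.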

We now have various tools at our disposal for helping us see – and talk about – the various components of a mixed reality system from a range of critical perspectives.

Augmenting Reality With Digital Overlays

Typically, head up displays of the sort referred to in Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays represent one or more layers of “dashboard” style information to a forward viewer without them having to look down at an instrument panel. But augmented reality displays can go further by registering or identifying items within the visual scene and then overlaying information on top of the scene that directly relates to those entities, or transforming it directly, in real time. In this section, we will introduce several examples of how augmented reality has been implemented, and the uses to which it has been put, over the last few years, and identify further ways of describing the various components that make up a mixed reality system.

In the examples of augmented reality that follow, try to relate the “problem” being solved with the sort of AR apparatus being used as described in Taxonomies for Describing Mixed and Alternate Reality Systems. Ask yourself why that technique might have been chosen and whether it appears to be the most appropriate one. Would alternative implementations also work, and if so, how would they compare in terms of their relative advantages and disadvantages?

Projection based displays

The augmented reality church organ/equaliser we met earlier represents an example of what researchers Ramesh Raskar, Greg Welch, and Henry Fuchs referred to as Spatially Augmented Reality (SAR) (Raskar, Ramesh, Greg Welch, and Henry Fuchs, “Spatially augmented reality”, First IEEE Workshop on Augmented Reality (IWAR’98), pp. 11-20. 1998):

In Spatially Augmented Reality (SAR), the user’s physical environment is augmented with images that are integrated directly in the user’s environment, not simply in their visual field. For example, the images could be projected onto real objects using digital light projectors, or embedded directly in the environment with flat panel displays.

The Virtual Watershed Table / Augmented Reality Sandbox provides another example of SAR, in which the vertical relief of a table of sand moulded in three dimensions by the user is tracked in real time by a Microsoft Kinect device. A virtual model of the extracted shape of the surface is then used as the basis for a topographic map projection onto the surface of the sand, along with animated displays of waterflows across the sculpted sand model.
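The step from a tracked surface to a projected topographic map can be sketched in a few lines of Python. This is only an illustration: it assumes the depth sensor readings have already been converted into a grid of sand heights in millimetres, and the band limits and colours are invented for the example rather than taken from the AR Sandbox software.

```python
# Hypothetical elevation bands: (upper height limit in mm, colour name).
BANDS = [(20, "blue"), (60, "green"), (120, "brown"), (float("inf"), "white")]


def band_colour(height_mm):
    """Return the colour of the first band whose upper limit exceeds the height."""
    for upper, colour in BANDS:
        if height_mm < upper:
            return colour


def colour_map(heights):
    """Map a 2D grid of sand heights to a grid of projection colours."""
    return [[band_colour(h) for h in row] for row in heights]


# A tiny 3x3 "sandbox": a hill in the top-left corner falling away to a basin.
heights = [
    [130, 90, 40],
    [90, 50, 15],
    [40, 15, 5],
]

for row in colour_map(heights):
    print(row)
```

In a live system this mapping would be recomputed for every depth frame, so that the projected colours follow the sand as it is reshaped.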

SAQ: What difficulties might be associated with projection based displays?

Answer: one obvious problem is that the viewer may occlude the projected imagery, casting a shadow over parts of it. Another is that a projection system is required, and must be calibrated so that it maps the digital imagery appropriately onto the matching physical substrate.

Augmented Reality Apps

Although the AR Sandbox provides a compelling demonstration of how augmented reality can be used to enrich a learning or discussion activity, augmented reality applications have yet to prove they can make it in the consumer marketplace. Do users really want to stand looking through a camera as a see-through display, or would they be happier grabbing a photo and then looking at an augmentation or transformation of it?

A good example of this is shown by the Word Lens augmented reality application that was acquired by Google and is now part of Google Translate. It not only detects text, in realtime, in a visual scene, but also identifies the language and then translates the text, as required, replacing the original text with the translated version.

If you’ve ever found yourself in a foreign city with a script you don’t recognise, such as Greek, or Russian, you might appreciate the value of this sort of application! But does this really need to be an augmented reality video application? Or would it work equally well if the user looked up to take a photo of the street sign that was causing them confusion and then looked down at their phone to inspect a translated version of it, much as they might preview a photo they had just taken?

SAQ: how would you categorise the previous examples of augmented reality in terms of the AR technology frameworks?

With a conceptual scheme (the technology framework) already in place for categorising the various approaches to implementing the optical components of an augmented reality system, we now need some way of talking about the visual components that make up the augmented reality scene.

Taxonomies for Describing Mixed and Alternate Reality Systems

In the post Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays, we saw how a Victorian illusion could be repurposed as the basis of a modern day augmented reality application. In this post, we’ll start to pick apart the various ways in which mixed and alternate reality systems can be put together, and explore how we can distinguish such systems from each other.

When coming to a new topic, it can often be hard to know how people who work in that area, or who are experts in it, make sense of it. If presented with a photograph of a bird and asked to identify it, an ornithologist (bird watcher) would almost certainly see different, and distinctive, things in the image than I would! So as we embark on our journey into augmented reality, what sort of things do we need to be looking out for to help us get our bearings?

A taxonomy describes a classification scheme that allows you to categorise related items within a particular frame of reference in a meaningful way. Milgram and Kishino’s “A taxonomy of mixed reality visual displays” helps us to identify a range of methods for displaying mixed reality scenes for viewing by individuals in a non-immersive way (I am using “non-immersive” in the sense that the participant can still see the physical world around them). Their classification includes the following:

  • “Monitor based … video displays – i.e. ‘window-on-the-world’ (WoW) displays – upon which computer generated images are electronically or digitally overlaid”; there is no implication of being able to “see through” these displays. Rather, the viewed scene may be remote in terms of time and/or space and the focus is on the manipulation of an already captured video scene. A window-on-the-world view might be something as simple as a television view of a swimming race with a virtual line overlaid on the scene showing where the race leader would have to be at that point in time if they were setting a world record pace.
  • Displays, such as HMDs (Head Mounted Displays), “equipped with a[n optical] see-through [ST] capability, with which computer generated graphics can be optically superimposed, using half-silvered mirrors, onto directly viewed real-world scenes”. A head up display on a smart helmet is a good example of an optical see through display.
  • Displays that use “video, rather than optical, viewing of the ‘outside’ world. … the displayed world should correspond orthoscopically [that is, size, shape and perspective should be maintained] with the immediate outside real world, thereby creating a ‘video see-through’ system, analogous with the optical see-through [approach]”. Someone viewing the world through a camera view on their smartphone would be looking at a video see through system.
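To make the window-on-the-world swimming example in the first item above more concrete, here is a rough sketch of where such a virtual line might be drawn. It assumes, purely for illustration, that the record was swum at a constant pace; the function name, the 50 m pool default and the times used are all hypothetical.

```python
def record_line_position(elapsed_s, record_time_s, race_distance_m, pool_length_m=50.0):
    """Distance from the start wall (in metres) of a virtual record-pace line.

    Assumes a constant record pace, which is a simplification.
    """
    # Distance a record-pace swimmer would have covered, capped at the race distance.
    covered = min(race_distance_m * elapsed_s / record_time_s, race_distance_m)
    lengths_done, along = divmod(covered, pool_length_m)
    # Odd-numbered lengths are swum back towards the start wall.
    if int(lengths_done) % 2 == 1:
        return pool_length_m - along
    return along


# Hypothetical 100 m race with a made-up record time of 50 seconds.
for t in (10, 25, 30, 50):
    print(t, record_line_position(t, record_time_s=50, race_distance_m=100))
```

The overlay itself would then need to be drawn in perspective over the pool in the video image, which is a separate registration problem not covered by this sketch.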

A second paper from the same lab (Milgram P, Takemura H, Utsumi A, Kishino F. Augmented reality: A class of displays on the reality-virtuality continuum. Photonics for industrial applications, 1995 Dec 21 (pp. 282-292). International Society for Optics and Photonics) further classified these display types in terms of: whether the principal depicted world (the substratum world) was real or computer generated (CG), providing a basis for distinguishing augmented reality systems from virtual reality ones; whether the substrate was “scanned” or directly viewed (that is, directly perceived without mediation through a video screen or projection); and whether the view was a first person, egocentric view (that is, from the viewer’s perspective) or an exocentric view (from some other perspective).

Class of MR System                      | Real (R) or CG world? | Direct (D) or Scanned (S) view of substrate? | Exocentric (EX) or Egocentric (EG) reference?
Monitor-based video, with CG overlays   | R                     | S                                            | EX
HMD-based optical ST, with CG overlays  | R                     | D                                            | EG
HMD-based video ST, with CG overlays    | R                     | S                                            | EG
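The table above can also be treated as a small lookup structure. The sketch below encodes its three rows in Python and filters them by attribute; the field names (world, view, reference) are shorthand invented here for the three column headings.

```python
from collections import namedtuple

# Field names are illustrative shorthand for the table's column headings.
MRClass = namedtuple("MRClass", "name world view reference")

MR_CLASSES = [
    MRClass("Monitor-based video, with CG overlays", "R", "S", "EX"),
    MRClass("HMD-based optical ST, with CG overlays", "R", "D", "EG"),
    MRClass("HMD-based video ST, with CG overlays", "R", "S", "EG"),
]


def matching(**criteria):
    """Return the names of the classes whose fields match every given criterion."""
    return [
        c.name
        for c in MR_CLASSES
        if all(getattr(c, field) == value for field, value in criteria.items())
    ]


print(matching(view="S"))                  # classes with a scanned view
print(matching(view="S", reference="EG"))  # scanned and egocentric
```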

We might further refine the exocentric notion into 2nd and 3rd person views, where we imagine the second person view is capable of including the presence of the viewer, and the third person view is completely remote from them.

A later paper by Bimber, Oliver, and Ramesh Raskar, “Modern approaches to augmented reality”, ACM SIGGRAPH 2006 Courses, p. 1. ACM, 2006, also considered the sort of physical system, or apparatus, required to augment a visual scene with digital imagery. (The idea is not that all of these methods are employed at the same time – only one of them is!)

  • retinal display;
  • head-mounted display;
  • hand-held display;
  • spatial optical see-through display;
  • projected display on object.


(We might also add contact lens mounted displays between retinal and head mounted displays.)

A related classification is used by Van Krevelen, D. W. F., & Poelman, R. (2010). A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality, 9(2), 1, which groups the approaches as retinal, optical see-through, video see-through, and projective.

Drawing on all these ideas, the following classification allows us to talk about a range of visual displays capable of rendering mixed and augmented realities, whether locally or remotely situated with respect to the reality being augmented, to individuals or groups:

  • proximity dimension:
    • proximal: retinal and head mounted displays, which may be grouped together as augmented visual field devices (AVFDs)
    • hand-held: hand-held devices such as phones or tablets
    • distal: free standing displays (e.g. monitors or projected displays)
  • optical dimension:
    • video screen based window on the world displays, which overlay a given video image
    • see through displays to augment the visual scene perceived through the display, which may be video based, and as such provide a “scanned” (or indirect) view of the substrate, or optically based, where the substrate is directly perceived
    • projected displays, which directly enhance the environment
  • viewpoint
    • first degree (first person?): first person view
    • second degree (bystander?): colocated with viewer and capable of presenting them in the visual scene
    • third degree (third party? remote?): representing a non-local visual scene.

On the one hand, the classification allows us to refer to an augmented reality phone-app as a hand-held see-through video screen based display used to indirectly perceive the visual scene from a first degree viewpoint. On the other, it allows us to refer to a mixed reality scene such as a televised sporting event with overlaid graphics as an indirect view of the scene from a third degree viewpoint using a hand-held or distal window on the world video display.
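One way to check that the three dimensions really can describe such systems is to encode them directly. The following Python sketch is a hypothetical encoding: the class name, the value strings and the decision to split see-through displays into video and optical variants are choices made for this example.

```python
from dataclasses import dataclass

# Allowed values for each dimension (wording adapted from the lists above).
PROXIMITY = ("proximal", "hand-held", "distal")
OPTICAL = ("window on the world", "video see-through", "optical see-through", "projected")
VIEWPOINT = ("first degree", "second degree", "third degree")


@dataclass(frozen=True)
class Display:
    """A display described along the proximity, optical and viewpoint dimensions."""
    proximity: str
    optical: str
    viewpoint: str

    def __post_init__(self):
        checks = (
            (self.proximity, PROXIMITY),
            (self.optical, OPTICAL),
            (self.viewpoint, VIEWPOINT),
        )
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")

    @property
    def indirect_view(self):
        """Video based displays give a scanned (indirect) view of the substrate."""
        return self.optical in ("window on the world", "video see-through")

    def describe(self):
        suffix = " (indirect view)" if self.indirect_view else ""
        return f"{self.proximity} {self.optical} display, {self.viewpoint} viewpoint{suffix}"


phone_app = Display("hand-held", "video see-through", "first degree")
tv_sport = Display("distal", "window on the world", "third degree")
print(phone_app.describe())
print(tv_sport.describe())
```

Rejecting values outside the three lists makes any gap in the classification visible: a system that cannot be constructed as a Display is one the scheme does not yet cover.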



