
Blurred Edges – Dual Reality

The launch of several virtual reality headsets into the consumer market in the first half of 2016 saw a flurry of new hype around this rather old idea, at least in terms of computer technology: a technical report from the Institute of Computer Graphics and Algorithms at the Vienna University of Technology saw fit to report on Virtual Reality History, Applications, Technology and Future over twenty years ago, in 1996. However, one of the disadvantages of immersive virtual reality, other than the oft-reported visually induced “VR sickness”, is that the apparatus required to enter it covers your eyes and completely occludes your direct view of the physical world with a computer-generated one. In an augmented reality system, on the other hand, you still directly perceive physical world elements, even if they are overlaid, or annotated, with additional digital information.

One of the seminal papers in augmented reality research (Milgram, Paul, and Fumio Kishino, “A taxonomy of mixed reality visual displays”, IEICE Transactions on Information and Systems 77, no. 12 (1994): 1321-1329) describes a Mixed Reality environment as “one in which real world and virtual world objects are presented together within a single display, that is, anywhere between the extrema of the virtuality continuum”.


The paper also describes an operational definition of Augmented Reality (AR) as “any case in which an otherwise real environment is ‘augmented’ by means of virtual (computer graphic) objects…. not for lack of a better name, but simply out of conviction that the term Augmented Reality is quite appropriate for describing the essence of computer graphic enhancement of video images of real scenes” and we shall find it convenient to adopt a similar definition, although other definitions exist.

For example, as Azuma et al. (Azuma, Ronald, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. “Recent advances in augmented reality.” IEEE computer graphics and applications 21, no. 6 (2001): 34-47) define it:

An AR system supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world. [A]n AR system [is defined] to have the following properties:

  • combines real and virtual objects in a real environment;
  • runs interactively, and in real time; and
  • registers (aligns) real and virtual objects with each other.

Note that this definition is not limited to any particular display technology or sensory modality.

Augmented reality is itself a form of mediated reality, or computer mediated reality. Mediated realities may themselves be thought of in terms of the extent to which information is added to an environment (augmented reality) or subtracted from an environment (which we might term a diminished reality). In addition, the notion of hyper-reality describes a system where no externally derived information is added to the system.

To implement a mixed reality system requires the presence of some sort of physical system, or apparatus, that can typically capture a visual scene, often from the viewer’s perspective, and render it back to the viewer, replete with augmentations. In a visually based system, we also need a computational system that is capable not only of registering and tracking, in real time, objects or locations within the scene, but also transforming them in some way in order to generate the augmented view of the physical reality.

In this series of posts, created as part of a scoping activity for a short unit in a new Open University introductory computing course, we’ll be stopping short of discussing fully immersive virtual environments, but we will be looking at augmented reality, exploring how digital technologies are blurring the line between the physical and the digital in what we see whenever we look at – or through – a screen, as well as how physical depictions of form and movement can be captured so that they can be rendered within mixed reality spaces. We will also consider non-visual mixed realities, such as mixed realities that we can listen to, rather than see.

When taken to extremes, such technologies may present us with a nightmarish, rather than compelling, vision of the future, as imagined by Keiichi Matsuda in his video short, “Hyper-Reality” [review]. Fortunately, perhaps, the physical technology required to implement such a system is still several years away!

See also: Infinity AR Augmented Reality Concept Video.

That isn’t to say, however, that frivolities such as Pokemon Go, released to global audiences at the start of July 2016, won’t have their five minutes of global appeal!

Across the posts, we will be focusing primarily on the idea of virtual overlays and real or virtual transformations of real objects, looking at how we can overlay virtual scenes and information onto views of the real world, as well as how to get representations of physical objects into the virtual world so that they can be virtually transformed. This will include a consideration of how to capture real objects so that they can be represented as faithful virtual objects which provide the basis for the virtual transformation, where the real and the virtual are combined in a composite view of the world, as well as a consideration of the apparatus required to implement such techniques.

As the posts are produced (and they may well be subject to change after posting!), I’ll add them to the list here:



Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays

When viewing things through a screen, can you distinguish what’s “real” from what isn’t? How much of the background in the last blockbuster movie you saw was footage of real buildings, a set put together by scenic carpenters, a scale model, or computer generated imagery? In the last fantasy film you saw, were the mythical creatures puppets, “pure animations”, or animations based on human performance capture? Are the adverts that you see in televised sporting events really displayed on the pitch or hoardings? And are the dashboard instruments in your car display “real” or “virtual”?

At the time it was released in 1997, Jamiroquai’s Virtual Insanity video prompted the sort of confusion one might imagine Victorian audiences experiencing on seeing Pepper’s ghost – an illusion we will return to – for the first time. Watching the video today, you might imagine it was created using digital trickery, but in fact the illusion was purely a physical one.

I’ve been unable to find any behind the scenes footage from the making of that video, but the technique, or something akin to it, was reused to make an advert for the Spanish beer, Estrella Galicia:

There is little, if anything, in this form of production that would have prevented a similar sequence being filmed over a hundred years ago, well before the advent of digital technologies.

But some Victorian theatrical effects can benefit from a dash of the digital…

Pepper’s ghost

If you’ve ever seen a floating “holographic head” as part of a Ghost Train or Haunted House fairground ride, you’ve most likely been presented with a version of Pepper’s Ghost. Taken into the theatre by the popular scientist John Henry Pepper, who built on a technique developed a few years earlier by Henry Dircks – an engineer, inventor, and debunker of Victorian spiritualists and pseudo-scientists – Pepper’s ghost appeared to place a ghostly apparition alongside a “real” actor on stage.

The effect is an optical one: a piece of glass is placed at an angle, and the audience sees the “real” on-stage characters through it. The glass reflects an otherwise hidden area and “projects” its ghostly image onto the stage. What the audience sees is the reflection of the ghost in the glass, and the main stage actors through the glass.
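The geometry can be sketched in a few lines of code. In this toy, top-down 2D model (the coordinates and the 45-degree glass angle are illustrative assumptions, not measurements from any real staging), the audience perceives the "ghost" at the mirror reflection of the hidden actor's position across the plane of the glass:

```python
import math

# Toy 2D model of the Pepper's ghost geometry, viewed from above.
# The glass sheet lies along the line y = x (angled at 45 degrees);
# a hidden, brightly lit actor stands at `hidden`, off to one side.
# The audience perceives a virtual image of the actor at the mirror
# reflection of that point across the plane of the glass.

def reflect(point, normal):
    """Reflect a 2D point across a line through the origin with unit normal `normal`."""
    px, py = point
    nx, ny = normal
    d = px * nx + py * ny          # signed distance from the glass plane
    return (px - 2 * d * nx, py - 2 * d * ny)

# Unit normal to the 45-degree glass (the line y = x has normal (1, -1)/sqrt(2)).
n = (1 / math.sqrt(2), -1 / math.sqrt(2))

hidden = (3.0, 1.0)                # the hidden actor, off-stage
ghost = reflect(hidden, n)         # where the "ghost" appears to stand
print(ghost)                       # coordinates swapped across y = x
```

Note that the reflected "ghost" position is exactly the original point mirrored in the glass, which is why the apparition appears to stand on the stage while the actor producing it stays hidden.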


As well as theatrical use, the effect can be used on a large scale in amusement park rides.

If you are happy with an illuminated scene, the effect can be used to float static objects, and even actors, within the visual field of the audience. However, the same technique can be employed with a projector casting an image onto the glass plate – which means you can also use a digital projector. This has the advantage that you can now float (animated) digital creations, as well as filmed ones, onto the stage.

If you’ve ever seen pop statistician Hans Rosling’s OU co-produced BBC Two statistics lectures, you’ll have seen this effect being used to cast huge “holographic” data visualisations onto the stage via the Musion 3D projection system.

The effectiveness of the technique is not limited to the large theatrical scale either. Indeed, you can create your own floating three dimensional “holographic” display using just a few pieces of acetate, or CD/DVD cases, and a mobile phone…

Creating your own 3D display:

The 3D effect is created by having four separate points of view, each with its own animation, one projected onto each face of the four sided pyramid.

A wide range of “how to” videos showing how to make these viewers are available online. You can also find a range of “4 sided” pre-made videos to use with them by searching social video websites for: pyramid hologram screen up.
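The layout of those "4 sided" videos can be sketched programmatically: the same subject is rendered four times, once per pyramid face, each copy rotated so that its reflection appears upright from that side. In this minimal sketch a tiny ASCII "sprite" stands in for one rendered view, and the particular rotation assigned to each face is an assumption (it depends on which way up the pyramid sits on the phone):

```python
def rot90(sprite):
    """Rotate a square 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*sprite[::-1])]

def blit(frame, sprite, top, left):
    """Copy a sprite into the frame at the given row/column offset."""
    for r, row in enumerate(sprite):
        for c, cell in enumerate(row):
            frame[top + r][left + c] = cell

def pyramid_frame(sprite):
    """Lay out four rotated copies of one view around a blank 3s x 3s frame."""
    s = len(sprite)
    frame = [['.'] * (3 * s) for _ in range(3 * s)]
    blit(frame, rot90(rot90(sprite)), 0, s)            # far face: 180 degrees
    blit(frame, rot90(sprite), s, 2 * s)               # right face: 90 degrees
    blit(frame, sprite, 2 * s, s)                      # near face: unrotated
    blit(frame, rot90(rot90(rot90(sprite))), s, 0)     # left face: 270 degrees
    return frame

for row in pyramid_frame([['A', 'B'], ['C', 'D']]):
    print(''.join(row))
```

Playing a video composed this way beneath the four-sided acetate pyramid gives each face its own correctly oriented reflection, which is what produces the floating "3D" effect.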

Making Wider Use of Pepper’s Ghost

The Pepper’s Ghost illusion provides one way of casting the digital into the physical world. Heads-up displays (HUDs) provide another opportunity for overlaying projected digital imagery onto our view of the world. Head-up displays represent a simple form of augmented reality in which the “real” visual scene is overlaid, or augmented, with additional visual information.

Head-up displays have been a feature of military aircraft for many years, more recently appearing in civilian aircraft. Head-up displays use a transparent screen mounted inside the cockpit and in the field of view of the pilot onto which aircraft related information is projected.

(More recently, HUD technology has started migrating inside the pilot’s helmet, as with the Rockwell Collins F35 Helmet Display System.)

Head-up displays are now also starting to appear in top-end production cars, with the display projected onto the windscreen. HUD-style attachments are also starting to appear as freestanding peripherals, either using a built-in “screen” or, as in the case of the Garmin Head-Up Display, projecting the display onto a transparent film attached to the windscreen.

SAQ: The HUDWAY Glass demo shows vehicle lanes projected onto the display. How do you think those lanes are generated?

Answer: As the device appears to be free standing, and is not apparently attached to a camera, I suspect that the lanes are generated using GPS and map data. The fact that the lane appears to predict, rather than track, the actual view of the road would appear to confirm this.
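The speculation in that answer – lanes generated from GPS position and map data rather than a camera – can be sketched with a toy pinhole projection. Points on the road ahead (lateral offset x and forward distance z, in metres, assumed to come from map data) are projected onto display coordinates; the focal length and eye height below are invented illustrative values, not parameters of any real device:

```python
# Sketch of drawing map-derived lane geometry on a HUD with a simple
# pinhole-camera model. All numeric values are illustrative assumptions.

def project(x, z, f=500.0, eye_height=1.2):
    """Project a road point (x lateral, z forward, metres) to (u, v) display pixels."""
    u = f * x / z                  # horizontal position on the display
    v = f * eye_height / z         # distance below the horizon line
    return (u, v)

# Centre line of the lane ahead, sampled at increasing distances.
lane = [(0.0, z) for z in (5, 10, 20, 40)]
print([project(x, z) for x, z in lane])
```

As the forward distance z grows, v shrinks towards zero, so the projected lane converges towards the horizon – exactly the predictive "lane ahead" appearance described in the answer.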

Head-up display units can also be built into “smart helmets”, such as this example from BMW:

Here’s another, more elaborate, concept video, from start-up company LiveMap:

These helmet displays project the display onto a transparent screen within the field of view of one of the wearer’s eyes. This approach is rather more elegant than other “in-sight” displays such as the original version of Google Glass or the Garmin Varia Vision cycling glasses. Indeed, we would probably not class these as true head-up displays, because the intention is not to overlay a transparent display layer onto the visual scene. Instead, a small screen is inserted into the field of view and occludes the scene over the visual angle it intrudes into.

Given the requirement for a physical layer onto which the head-up display layer or layers can be projected, more “natural” forms of eyewear, in the form of glasses, may be required for mass adoption of everyday head-up displays or augmented reality wear. However, as with the bulky and not particularly becoming 3D glasses used for watching 3D cinema films, frames such as Sony’s SmartEyeglass look as if they still have some way to go in the fashion design stakes, and miniaturisation of the projection technology seems to be an issue for other display innovators such as Magic Leap.

In the workplace, however, where protective equipment may be the norm, there may be more freedom to develop augmented reality displays mounted within protective headwear. Once again, smart helmets may provide the answer, such as the DAQRI Smart Helmet.

SAQ: How can the Pepper’s Ghost illusion be used to render augmented reality layers in the field of view of a viewer?

SAQ: what practical problems does the use of Pepper’s Ghost style projection introduce into the design of a head-up augmented reality display?

SAQ: what other uses can you think of – or discover – for head-up displays?

Extension SAQ: what other ways might there be of projecting visual imagery into a physical space?

Extension answer:

  • a glass or plastic screen can be replaced using a fog screen:

If you have a physical surface that can sensibly act as a screen, you may be able to produce a site-specific visual augmentation simply by using a well-directed projector – for example, treating the pipes of a church organ as sound level indicators for each pipe:

The video also suggests how audio processing may be used to dynamically alter the perception of the sound, to give us “augmented reality audio”.

The Art of Sound – Algorithmic Foley Artists?

As well as being a visual medium, films also rely on a rich audio environment to communicate emotion and affect (sic). In some cases, it may not be possible to capture the sound associated with a particular action, either because of noise in the environment (literally), or because the props themselves do not have the physical properties of the thing they portray. For example, two wooden swords used in a sword fight that are painted to look like metal will not sound like metal swords when they come into contact with each other. When a film is dubbed, and the original speech recording replaced by a post-production recording, any original sound effects also need to be replaced.

Foley artists add sounds to a film in post-production (that is, after the film has been shot). As foley artist John Roesch describes, “whatever we see on that screen, we are making the most honest representation thereof, sonically” (“Where the Sounds From the World’s Favorite Movies Are Born“, Wired, 0m42s).

One of the aims of the foley artist is to represent the sounds that the viewer expects to hear when watching a particular scene. As Roesch says of his approach, “when I look at a scene, I hear the sounds in my head” (0m48s). So can a visual analysis of the scene be used to identify material interactions and then automatically generate sounds corresponding to our expectations of what those interactions should sound like?

This question was recently asked by a group of MIT researchers (Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. “Visually Indicated Sounds.” arXiv preprint arXiv:1512.08512 [PDF] (2015)) and summarised in the MIT News article “Artificial intelligence produces realistic sounds that fool humans“.

“On many occasions, … sounds are not just statistically associated with the content of the images – the way, for example, that the sounds of unseen seagulls are associated with a view of a beach – but instead are directly caused by the physical interaction being depicted: you see what is making the sound. We call these events visually indicated sounds, and we propose the task of predicting sound from videos as a way to study physical interactions within a visual scene. To accurately predict a video’s held-out soundtrack, an algorithm has to know about the physical properties of what it is seeing and the actions that are being performed. This task implicitly requires material recognition…”

In their study, the team trained an algorithm using thousands of videos of a drum stick interacting with a wide variety of material objects, in an attempt to associate particular sounds with different materials, as well as with the mode of interaction (hitting, scraping, and so on).

The next step was then to show the algorithm a silent video, and see if it could generate an appropriate soundtrack, in effect acting as a synthetic foley artist (Visually-Indicated Sounds, MITCSAIL).
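The real system learns features from video and synthesises waveforms; a deliberately tiny, hypothetical version of just the "match against known material/interaction examples" step might look like the following. Here each training video is assumed to have been reduced to an invented two-element feature vector (roughly: hardness of the struck material, sharpness of the motion), and a new silent clip is labelled by its nearest neighbour:

```python
import math

# Hypothetical (material hardness, motion sharpness) features -> sound label,
# standing in for the thousands of drumstick videos in the MIT study.
training = [
    ((0.9, 0.8), "metal-hit"),
    ((0.9, 0.2), "metal-scrape"),
    ((0.2, 0.8), "cushion-hit"),
    ((0.5, 0.3), "leaf-rustle"),
]

def predict_sound(features):
    """Return the sound label of the closest labelled training example."""
    return min(training, key=lambda ex: math.dist(features, ex[0]))[1]

print(predict_sound((0.85, 0.75)))   # lands nearest the "metal-hit" example
```

Even this toy makes the study's difficulty visible: everything hinges on whether the features extracted from the video genuinely separate the different materials and interactions.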


SAQ: to what extent do you think foley artists like John Roesch might be replaced by algorithms?

Answer: whilst the MIT demo is an interesting one, it is currently limited to a known object – the drumstick – interacting with an arbitrary object. The video showed how even then, the algorithm occasionally misinterpreted the sort of interaction being demonstrated (e.g. mistaking a hit). For a complete system, the algorithm would have to identify both materials involved in the interaction, as well as the sort of interaction, and synthesize an appropriate sound. If the same sort of training method was used for this more general sort of system, I think it would be unlikely that a large enough corpus of training videos could be created (material X interacts with material Y in interaction Z) to provide a reliable training set. In addition, as foley artist John Roesch pointed out, “what you see is not necessarily what you get” (1m31s)…!

Taxonomies for Describing Mixed and Alternate Reality Systems

In the post Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays, we saw how a Victorian illusion could be repurposed as the basis of a modern-day augmented reality application. In this post, we’ll start to pick apart the various ways in which mixed and alternate reality systems can be put together, and explore how we can distinguish such systems from each other.

When coming to a new topic, it can often be hard to know how people who work in that area, or who are experts in it, make sense of it. If presented with a photograph of a bird and asked to identify it, an ornithologist (bird watcher) would almost certainly see different, and distinctive, things in the image than I would! So as we embark on our journey into augmented reality, what sort of things do we need to be looking out for to help us get our bearings?

A taxonomy describes a classification scheme that allows you to categorise related items within a particular frame of reference in a meaningful way. Milgram and Kishino’s “A taxonomy of mixed reality visual displays” helps us to identify a range of methods for displaying mixed reality scenes for viewing by individuals in a non-immersive way (I am using “non-immersive” in the sense that the participant can still see the physical world around them). Their classification includes the following:

  • “Monitor based … video displays – i.e. ‘window-on-the-world’ (WoW) displays – upon which computer generated images are electronically or digitally overlaid”; there is no implication of being able to “see through” these displays. Rather, the viewed scene may be remote in terms of time and/or space and the focus is on the manipulation of an already captured video scene. A window-on-the-world view might be something as simple as television view displaying a swimming race with an overlaid virtual line placed on top of the scene showing where the race leader would have to be at that point in time if they were setting a world record pace.
  • Displays, such as HMDs (Head Mounted Displays), “equipped with a[n optical] see-through [ST] capability, with which computer generated graphics can be optically superimposed, using half-silvered mirrors, onto directly viewed real-world scenes”. A head up display on a smart helmet is a good example of an optical see through display.
  • Displays that use “video, rather than optical, viewing of the ‘outside’ world. … the displayed world should correspond orthoscopically [that is, size, shape and perspective should be maintained] with the immediate outside real world, thereby creating a ‘video see-through’ system, analogous with the optical see-through [approach]”. Someone viewing the world through a camera view on their smartphone would be looking at a video see through system.

A second paper from the same lab (Milgram, P., Takemura, H., Utsumi, A., and Kishino, F., “Augmented reality: A class of displays on the reality-virtuality continuum”, Photonics for Industrial Applications, December 1995, pp. 282-292, International Society for Optics and Photonics) further classified these display types in terms of: whether the principal depicted world (the substratum world) was real or computer generated (CG), providing a basis for distinguishing augmented reality systems from virtual reality ones; whether the substrate was “scanned” or directly viewed (that is, directly perceived without mediation through a video screen or projection); and whether the view was a first-person, egocentric view (that is, from the viewer’s perspective) or an exocentric view (from some other perspective).

| Class of MR System | Real (R) or CG world? | Direct (D) or Scanned (S) view of substrate? | Exocentric (EX) or Egocentric (EG) reference? |
| --- | --- | --- | --- |
| Monitor-based video, with CG overlays | R | S | EX |
| HMD-based optical ST, with CG overlays | R | D | EG |
| HMD-based video ST, with CG overlays | R | S | EG |

We might further refine the exocentric notion into 2nd and 3rd person views, where we imagine the second person view is capable of including the presence of the viewer, and the third person view is completely remote from them.

A later paper by Bimber, Oliver, and Ramesh Raskar, “Modern approaches to augmented reality“, ACM SIGGRAPH 2006 Courses, p. 1, ACM, 2006, also considered the sort of physical system, or apparatus, required to augment a visual scene with digital imagery. (The idea is not that all of these methods are employed at the same time – only one of them is!)

  • retinal display;
  • head-mounted display;
  • hand-held display;
  • spatial optical see-through display;
  • projected display on object.


(We might also add contact lens mounted displays between retinal and head mounted displays.)

A related classification is used by Van Krevelen, D. W. F., & Poelman, R. (2010). A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality, 9(2), 1, which groups the approaches as retinal, optical see-through, video see-through, and projective.

Drawing on all these ideas, the following classification allows us to talk about a range of visual displays capable of rendering mixed and augmented realities, whether locally or remotely situated with respect to the reality being augmented, to individuals or groups:

  • proximity dimension:
    • proximal: retinal and head mounted displays, which may be grouped together as augmented visual field devices (AVFDs)
    • hand-held: hand-held devices such as phones or tablets
    • distal: free standing displays (e.g. monitors or projected displays)
  • optical dimension:
    • video screen based window on the world displays, which overlay a given video image
    • see through displays to augment the visual scene perceived through the display, which may be video based, and as such provide a “scanned” (or indirect) view of the substrate, or optically based, where the substrate is directly perceived
    • projected displays, which directly enhance the environment
  • viewpoint
    • first degree (first person?): first person view
    • second degree (bystander?): colocated with viewer and capable of presenting them in the visual scene
    • third degree (third party? remote?): representing a non-local visual scene.

On the one hand, the classification allows us to refer to an augmented reality phone app as a hand-held see-through video screen based display used to indirectly perceive the visual scene from a first degree viewpoint. On the other, it allows us to refer to a mixed reality scene such as a televised sporting event with overlaid graphics as an indirect view of the scene from a third degree viewpoint using a hand-held or distal window-on-the-world video display.
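One way of making the classification concrete is to encode its three dimensions as enumerations and describe each system as a small record. The dimension and category names below follow the text; the record type and its fields are my own illustrative framing:

```python
from dataclasses import dataclass
from enum import Enum

class Proximity(Enum):
    PROXIMAL = "retinal / head-mounted display (AVFD)"
    HAND_HELD = "phone or tablet"
    DISTAL = "free-standing monitor or projection"

class Optical(Enum):
    WINDOW_ON_WORLD = "overlays a captured video image"
    SEE_THROUGH = "video or optical see-through"
    PROJECTED = "projected directly onto the environment"

class Viewpoint(Enum):
    FIRST_DEGREE = "first person view"
    SECOND_DEGREE = "colocated, may include the viewer"
    THIRD_DEGREE = "non-local visual scene"

@dataclass
class MRDisplay:
    name: str
    proximity: Proximity
    optical: Optical
    viewpoint: Viewpoint

# The two worked examples from the text, expressed as records.
phone_app = MRDisplay("AR phone app", Proximity.HAND_HELD,
                      Optical.SEE_THROUGH, Viewpoint.FIRST_DEGREE)
sports_tv = MRDisplay("televised sport with overlaid graphics", Proximity.DISTAL,
                      Optical.WINDOW_ON_WORLD, Viewpoint.THIRD_DEGREE)

for d in (phone_app, sports_tv):
    print(d.name, "->", d.proximity.name, d.optical.name, d.viewpoint.name)
```

Writing the scheme down this way also makes it easy to check that any system we discuss has been assigned a value on every dimension, rather than being described only partially.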


Augmenting Reality With Digital Overlays

Typically, head-up displays of the sort referred to in Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays present one or more layers of “dashboard” style information to a forward-facing viewer without them having to look down at an instrument panel. But augmented reality displays can go further, registering or identifying items within the visual scene and then overlaying information on top of the scene that directly relates to those entities, or transforming them directly, in real time. In this section, we will introduce several examples of how augmented reality has been implemented, and the uses to which it has been put, over the last few years, and identify further ways of describing the various components that make up a mixed reality system.

In the examples of augmented reality that follow, try to relate the “problem” being solved to the sort of AR apparatus being used, as described in Taxonomies for Describing Mixed and Alternate Reality Systems. Ask yourself why that technique might have been chosen and whether it appears to be the most appropriate one. Would alternative implementations also work, and if so, how would they compare in terms of their relative advantages and disadvantages?

Projection based displays

The augmented reality church organ/equaliser we met earlier represents an example of what Ramesh Raskar, Greg Welch, and Henry Fuchs referred to as Spatially Augmented Reality (SAR) (Raskar, Ramesh, Greg Welch, and Henry Fuchs, “Spatially augmented reality“, First IEEE Workshop on Augmented Reality (IWAR’98), pp. 11-20, 1998):

In Spatially Augmented Reality (SAR), the user’s physical environment is augmented with images that are integrated directly in the user’s environment, not simply in their visual field. For example, the images could be projected onto real objects using digital light projectors, or embedded directly in the environment with flat panel displays.

The Virtual Watershed Table / Augmented Reality Sandbox provides another example of SAR, in which the vertical relief of a table of sand moulded in three dimensions by the user is tracked in real time by a Microsoft Kinect device. A virtual model of the extracted shape of the surface is then used as the basis for a topographic map projection onto the surface of the sand, along with animated displays of waterflows across the sculpted sand model.
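The sandbox's "topographic projection" step can be sketched as a simple mapping from sensed height to colour: each depth value reported by the Kinect is assigned to a colour band, which the projector then paints back onto the sand at that point. The band boundaries and colour names below are invented for illustration; the real system also simulates water flow, which this ignores:

```python
# Height above the table (cm, assumed) -> colour band, lowest threshold first.
BANDS = [
    (0, "deep-water blue"),
    (5, "shallow-water cyan"),
    (10, "shoreline sand"),
    (20, "lowland green"),
    (35, "highland brown"),
    (50, "summit white"),
]

def colour_for_height(h):
    """Return the colour band whose threshold is the highest one at or below h."""
    colour = BANDS[0][1]
    for threshold, name in BANDS:
        if h >= threshold:
            colour = name
    return colour

print([colour_for_height(h) for h in (2, 12, 40, 60)])
```

Applying this function to every point of the sensed surface, and calibrating the projector so each coloured pixel lands back on the sand that produced it, gives the live contour-map effect.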

SAQ: What difficulties might be associated with projection based displays?

Answer: one obvious problem is that the viewer may occlude the projected imagery, casting a shadow over parts of it. Another is that a projection system is required, and must be calibrated so that it maps the digital imagery appropriately onto the matching physical substrate.

Augmented Reality Apps

Although the AR Sandbox provides a compelling demonstration of how augmented reality can be used to enrich a learning or discussion activity, augmented reality applications have yet to prove they can make it in the consumer marketplace. Do users really want to stand looking through a camera as a see-through display, or would they be happier grabbing a photo and then looking at an augmentation or transformation of it?

A good example of this is shown by the Word Lens augmented reality application that was acquired by Google and is now part of Google Translate. It not only detects text, in realtime, in a visual scene, but also identifies the language and then translates the text, as required, replacing the original text with the translated version.

If you’ve ever found yourself in a foreign city with a script you don’t recognise, such as Greek, or Russian, you might appreciate the value of this sort of application! But does this really need to be an augmented reality video application? Or would it work equally well if the user looked up to take a photo of the street sign that was causing them confusion and then looked down at their phone to inspect a translated version of it, much as they might preview a photo they had just taken?

SAQ: how would you categorise the previous examples of augmented reality in terms of the AR technology frameworks?

With a conceptual scheme (the technology framework) already in place for categorising the various approaches to implementing the optical components of an augmented reality system, we now need some way of talking about the visual components that make up the augmented reality scene.

Real or Virtual Objects?

In the post Taxonomies for Describing Mixed and Alternate Reality Systems, we provided a framework for talking about the various physical components of an augmented reality system. But how should we talk about the different elements within the perceived augmented reality scene?

Milgram and Kishino (Milgram, Paul & Fumio Kishino, “A taxonomy of mixed reality visual displays”IEICE TRANSACTIONS on Information and Systems 77, no. 12 (1994): 1321-1329) started by clarifying the notions of real and virtual in an augmented reality sense:

  • Real objects are objects that have a physical, tangible existence, whereas virtual objects are purely digital representations, without a physical correlate, within the rendered visual scene (although they may be digital representations of things that do exist).
  • An object viewed directly has an existence in the real world and is perceived as such by the viewer. A non-directly viewed object is one that has been sampled and re-presented to the viewer via a display medium, or a virtual object whose existence can only be viewed via such a medium. This is referred to as the image quality.
  • A real image is one that has “some luminosity at the location at which it appears to be located”, such as a directly viewed object or an image viewed on a screen. Virtual images are produced by optical tricks, such as holograms and mirror images, and have no luminosity at the location at which they appear.


Whilst these distinctions are helpful when considering the representation of a single object, they may become confused when trying to analyse a view composed of multiple objects, both real and virtual. For example, in the Google Translate example described in Augmenting Reality With Digital Overlays, the screen is a physical display, that is, a real image, that provides a non-direct view. But is the text a real object or a virtual object?

To help us talk about objects within the augmented visual scene, we might add an additional correspondence dimension, that describes whether an object within the scene, or component of it, is presented as:

  • a raw, otherwise untouched, part of the image (that is, a faithful re-presentation of the object represented in that part of the image);
  • an overlay, where an additional layer of information is added to the scene, as in the case of a HUD dashboard;
  • a re-touch, where the object is still recognisable but has been reshaped and/or recoloured;
  • a replacement, where an object has been detected and then replaced.
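The four correspondence categories can be read as a small decision procedure over how a scene component was produced. The boolean flags below are my own framing of the text, not terms from the literature:

```python
def correspondence(modified, added_layer, still_recognisable):
    """Classify a scene component on the proposed correspondence dimension."""
    if not modified and not added_layer:
        return "raw"            # a faithful re-presentation of the object
    if added_layer and not modified:
        return "overlay"        # extra information layered onto the scene
    # the object itself has been altered:
    return "re-touch" if still_recognisable else "replacement"

# The translated text in the Google Translate example: the original sign
# text is detected and swapped for new text, i.e. a replacement.
print(correspondence(modified=True, added_layer=False, still_recognisable=False))
```

Framed this way, the earlier puzzle about the translated street sign has a crisp answer: the screen provides a non-direct view, and the text within it is a replacement.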

We now have various tools at our disposal for helping us see – and talk about – the various components of a mixed reality system from a range of critical perspectives.

“Magic Lenses” and See-Through Displays

In the post Taxonomies for Describing Mixed and Alternate Reality Systems we introduced various schemes for categorising and classifying the various components of mixed and augmented reality systems. In this post, we will see how one particular class of display – see-through displays – can be put to practical purpose. 

Using a phone, or tablet, with a forward facing, back-mounted camera as a see-through video display, you can relay the camera view to the screen in a realtime view mode and manipulate the current scene. This approach has been referred to as a “magic lens, … a see-through interface/metaphor that affords the user a modified view of the scene behind the lens” (D. Baričević, C. Lee, M. Turk, T. Höllerer and D. A. Bowman, “A hand-held AR magic lens with user-perspective rendering,” Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, Atlanta, GA, 2012, pp. 197-206, doi: 10.1109/ISMAR.2012.6402557 [PDF]). (See also M. Rohs and A. Oulasvirta, “Target acquisition with camera phones when used as magic lenses”, Proceedings of the 26th International Conference on Human Factors in Computing Systems, CHI ’08, pp. 1409–1418, ACM, 2008, who define a magic lens as an “augmented reality interface thats consist of a camera-equipped mobile device being used as a see-through tool. It augments the user’s view of real world objects by graphical and textual overlays”.)

However, as the paper noted at the time:

Many existing concept images of AR magic lenses show that the magic lens displays a scene from the user’s perspective, as if the display were a smart transparent frame allowing for perspective-correct overlays. This is arguably the most intuitive view. However, the actual magic lens shows the augmented scene from the point of view of the camera on the hand-held device. The perspective of that camera can be very different from the perspective of the user, so what the user sees does not align with the real world. … We define the user-perspective view as the geometrically correct view of a scene from the point-of-view of the user, in the direction of the user’s view, and with the exact view the user should have in that direction.

Whilst head-up displays are also examples of see-through displays, many do not necessarily situate virtual digital objects as direct augmentations of the perceived physical world; rather, they frequently present pop-up style dashboards that open as “desktop windows” or pop-up menus within the visual scene, instead of annotations attached to the physical objects perceived within it.

The first wave of consumer augmented reality applications relied on printing out registration images or QR codes that could act as fiducial markers: these could be easily recognised using image recognition software and then overlaid with a 3D animation.

If an image could reliably be detected, it could be used as part of an augmented reality system, resulting in some innovative marketing campaigns.
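In outline, the detection step amounts to locating a high-contrast pattern in the camera frame and anchoring the virtual content to it. The toy sketch below, in which a solid dark square stands in for a printed fiducial marker, illustrates only the 2D case; real toolkits recover the marker’s full 3D pose so that overlays can be drawn in perspective:

```python
import numpy as np

def find_marker(gray, dark=50):
    """Return the bounding box (top, left, bottom, right) of the dark
    marker pixels in a greyscale frame, or None if no marker is seen."""
    ys, xs = np.nonzero(gray < dark)
    if ys.size == 0:
        return None
    return ys.min(), xs.min(), ys.max(), xs.max()

# Synthetic frame: light background with a printed black square on it
frame = np.full((120, 160), 200, dtype=np.uint8)
frame[40:80, 60:100] = 10  # the "marker"

box = find_marker(frame)
top, left, bottom, right = box
# Anchor the virtual content at the centre of the detected marker
centre = ((top + bottom) // 2, (left + right) // 2)
```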

The same idea can be used to enhance two-dimensional print publications. With a suitable device and the appropriate app installed, you can recognise a particular page of print and “unlock” additional content, an approach taken by the Layar augmented reality app, which, among other things, allows you to create your own augmented reality enhanced content.

For more confident programmers, one of the earliest widely available augmented reality programming toolkits is the open source ARToolKit, which is still being developed today and is distributed for free. Together with commercial offerings such as the Wikitude SDK (software development kit), such toolkits allow professional and hobbyist programmers alike to create their own augmented reality demonstrations. (See also commercial services such as Catchoom: CraftAR.)

Within all these applications, we see how there is a need for “enabling technologies, … advances in the basic technologies needed to build compelling AR environments. Examples of these technologies include displays, tracking, registration, and calibration” (Azuma, Ronald, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. “Recent advances in augmented reality.” IEEE computer graphics and applications 21, no. 6 (2001): 34-47) that make the development of such systems possible by developers outside of advanced research and development labs.

One popular category of ARToolKit demonstration, and an approach that hints at a particular category of potential augmented reality applications, was the development of interactive Lego model assembly manuals. These could recognise a registration image associated with a particular model and then step through the sequential steps required to build it, overlaying the next piece to be added to the model in a stepwise fashion. The known size of the marker, the fixed geometry of the model, and the availability of open source Lego CAD tools based around LDraw meant that many of the physical and computational building blocks required for creating such applications were already in place.
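Conceptually, such a manual is little more than an ordered list of assembly steps, each pairing a part with a pose expressed relative to the detected marker. A minimal sketch follows (the part numbers are illustrative, in the style of LDraw identifiers, and are not taken from any real model):

```python
from dataclasses import dataclass

@dataclass
class Step:
    part: str        # part identifier, LDraw-style (illustrative only)
    position: tuple  # (x, y, z) offset, expressed relative to the marker
    note: str

# A hypothetical three-step manual; because each pose is relative to the
# fiducial marker, the overlay stays registered as the model is viewed
# from different angles.
manual = [
    Step("3001", (0, 0, 0), "Place a 2x4 brick on the baseplate"),
    Step("3003", (2, 0, 1), "Add a 2x2 brick on top"),
    Step("3024", (2, 0, 2), "Finish with a 1x1 plate"),
]

def next_step(manual, completed):
    """Return the next piece to overlay, or None once the model is built."""
    return manual[completed] if completed < len(manual) else None
```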

SAQ: What’s wrong with the demonstration shown in the video above?

Answer: The placing of the virtual block on the model does not appear to be in the correct place, but is offset slightly. This might arise from a combination of issues, including the placement of the registration image, the positioning of the see-through device, or the camera used to record the video.

An earlier demonstration of a Lego construction model instruction manual includes some additional humour in the form of an animated Lego figure mechanic who fetches an appropriate piece at each step and then demonstrates where to attach it to a model based around the original Lego Mindstorms Robot Invention System.

The demonstration also shows how augmented reality can be used to test the operation of the completed assembly, stepping the user through a test sequence and virtually animating the expected behaviour. The ability of the RCX computer brick at the heart of the model to communicate back to the computer hosting the manual also allowed information captured by the brick (the light sensor readings) to be displayed in the augmented reality layer.

SAQ: how might advances in 3D image recognition technology be used to further improve the functionality of the manual, for example, in terms of checking the correct assembly of the model? What other enabling technologies may also help in this endeavour?

Answer: as the ability to recognise, identify and orientate 3D objects improves, generating digital overlays on three-dimensional objects becomes more tractable. This means it may be possible to recognise pieces picked up by the person building the model and check them against the part the interactive manual expects; erroneous parts could be highlighted with a warning sign. Additionally, the state of the model after each step could be visually checked to confirm that the correct piece appears to have been placed in the correct position, although this is likely to present a more complex task and may not be possible. If a piece could be identified as incorrectly placed (for example, in a likely misplaced position), the instruction manual might show how to move the part to the correct place, or rewind to show where the piece should have been placed.

The availability of programming building blocks capable of recognising individual Lego bricks and associating them with part numbers also used by CAD tools such as LDraw could be seen as an enabling technology for the further development of such diagnostic AR tools.

As a proof-of-concept idea, augmented reality Lego construction manuals provide a realistic, if toy example (in several senses of the word!) of how such techniques might be used in a practical setting. So it’s not surprising that augmented reality instruction manuals were among the first application areas described when the possibility of AR first began to emerge.

Optional Reading: two relatively early descriptions of augmented reality instruction and assembly manuals can be found in: Caudell, Thomas P., and David W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes“, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, vol. 2, pp. 659-669. IEEE, 1992.; and Feiner, Steven, Blair Macintyre, and Dorée Seligmann, “Knowledge-based augmented reality.” Communications of the ACM 36, no. 7 (1993): 53-62.

A similar approach is also being used to develop service manuals for use in industry, via magic lens displays, and presumably also in smart helmet displays:

Tablet, or phone, based magic lens apps are also being used to support car maintenance in the form of augmented reality car manuals that are capable of recognising particular features of a car dashboard, or engine, and then interactively annotating them with a virtual overlay, as described in this press release from Hyundai about their augmented reality car owner’s manual:

On the other hand, maybe such applications are just so much hype and not actually of interest to a wider public? For example, in 2013, one company attempted to crowdsource funding for a general purpose app that could act as an AR style guide for a range of car models:

At the time of writing, in mid-2016, the site appears to have raised almost $500 of its $90,000 goal. Maybe augmented reality is not that compelling for the mass market?!

However, it seems as if the development of augmented reality technical documentation is still an area of academic research, at least.

In the next post on this theme, we’ll see how augmented reality can be used to implement “magic mirrors”, in contrast to “magic lenses”. But first – what “lenses” can we apply to another modality: sound?

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation, or realtime manipulation, of another modality, such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: car navigation systems augment spatial location with audio messages describing when to turn, and audio guides in heritage settings let you listen to a story that “augments” a particular location. Noise cancelling earphones transform the environment by subtracting, or tuning out, background noise, and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term for systems in which information may be added to, or subtracted from, a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, where the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft. By removing noisy components from the real world audio, noise cancellation may be thought of as producing a form of diminished reality, in the sense that environmental components have been removed, rather than added, even though the overall salient signal to noise ratio may have increased.

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.
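One possible answer, sketched as code rather than prose: sample the ambient noise with a reference microphone, invert it, and add the inverted “anti-noise” to the signal played into the ear. The idealised sketch below assumes the reference microphone hears exactly the noise the listener does; real systems must also cope with latency and imperfect noise estimates:

```python
import numpy as np

fs = 8000                                   # sample rate (Hz)
t = np.arange(fs) / fs                      # one second of audio

desired = np.sin(2 * np.pi * 440 * t)       # the signal we want to hear
noise = 0.8 * np.sin(2 * np.pi * 100 * t)   # low-frequency ambient noise

# 1. A reference microphone samples the ambient noise.
mic = noise
# 2. Invert it (a 180 degree phase shift) to produce the "anti-noise".
anti_noise = -mic
# 3. Mix the anti-noise into the headphone output: at the ear, the
#    ambient noise and the anti-noise sum to (almost) nothing.
at_ear = (desired + noise) + anti_noise

# Residual noise level after cancellation
residual = np.sqrt(np.mean((at_ear - desired) ** 2))
```

The cancellation is perfect here only because the noise estimate is perfect; in practice the low-frequency, periodic noise this idealisation resembles is exactly the kind active systems handle best.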

EXERCISE: Increasingly, top-end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as the cabin of a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

Interlude – Cleaning Audio Tracks With Audacity

Noise cancelling headphones remove background noise by sampling the ambient noise and subtracting it from the signal presented to the listener. So for recordings made in noisy situations, where we no longer have access to the noise on its own, are we stuck with just the noisy signal?

Not necessarily.

Audio editing tools like Audacity can also be used to remove constant background noise from an audio track by building a simple model of the noise component and then removing it from the audio track.

The following tutorial shows how a low level of background noise may be attenuated by generating a model of the baseline noise on a supposedly quiet part of an audio track and then removing it from the whole of the track. (The effect referred to as Noise Removal in the following video has been renamed Noise Reduction in more recent versions of Audacity.)
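The underlying idea can be illustrated with a simplified spectral-subtraction sketch (this is not Audacity’s actual algorithm, just the general principle): take the magnitude spectrum of the “quiet” passage as a noise profile, then, frame by frame, silence any frequency bin that does not rise clearly above that profile:

```python
import numpy as np

def reduce_noise(signal, noise_profile, frame=256, floor=2.0):
    """Crude spectral subtraction: silence any frequency bin whose
    magnitude does not exceed `floor` times the noise profile."""
    noise_mag = np.abs(np.fft.rfft(noise_profile[:frame]))
    out = np.zeros_like(signal)
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        keep = np.abs(spec) > floor * noise_mag  # louder than the noise model
        out[start:start + frame] = np.fft.irfft(spec * keep, frame)
    return out

rng = np.random.default_rng(0)
fs, frame = 8000, 256
t = np.arange(32 * frame) / fs               # 32 whole frames of audio
tone = np.sin(2 * np.pi * 1000 * t)          # the wanted signal
noise = 0.05 * rng.standard_normal(t.size)   # constant background hiss

quiet = noise[:frame]                        # a "silent" passage: noise only
cleaned = reduce_noise(tone + noise, quiet)
```

Audacity’s Noise Reduction effect is considerably more sophisticated (overlapping windows, smoothing, adjustable sensitivity), but it rests on the same idea of a per-frequency noise profile taken from a quiet passage.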

SAQ: As the speaker records his test audio track, we see Audacity visualising the waveform in real time. To what extent might we consider this a form of augmented reality?

Other filters can be used to remove noise components with a different frequency profile such as the “pops” and “clicks” you might hear on a recording made from a vinyl record.
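Clicks and pops are short, impulsive disturbances, so a quite different filter is appropriate: a running median follows the slowly varying waveform but ignores isolated spikes, so samples that deviate sharply from it can be patched. A minimal sketch of the principle (not the algorithm of any particular click-removal tool):

```python
import numpy as np

def declick(signal, window=5, threshold=0.3):
    """Patch samples that deviate sharply from a running median,
    leaving the rest of the waveform untouched."""
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    # Running median over a short, centred window
    views = np.lib.stride_tricks.sliding_window_view(padded, window)
    median = np.median(views, axis=1)
    clicked = np.abs(signal - median) > threshold  # impulsive outliers
    out = signal.copy()
    out[clicked] = median[clicked]                 # patch only the clicks
    return out

t = np.arange(4000) / 4000
audio = 0.5 * np.sin(2 * np.pi * 5 * t)  # slowly varying "music"
audio[1000] += 1.0                       # a vinyl-style click
cleaned = declick(audio)                 # click patched, waveform intact
```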

In each of the above examples, Audacity’s visual representation of the audio waveform creates a visual reality from an audio one. This reinforces through visualisation what the original problems with the audio signals were, and the consequences of applying a particular audio effect when trying to clean them.

DO: if you have a noisy audio file to hand and fancy trying to clean it up, why not try out the techniques shown in the videos above – or see if you can find any more related tutorials.

From Magic Lenses to Magic Mirrors and Back Again

In recent years, commercial outdoor advertising has made increasing use of screen based digital signage. These screens can be used for video based advertising campaigns as well as “carousel” style displays where the same screen displays different adverts in turn. But in a spirit of playfulness, they may also be used as magic lens style displays, similar in kind to the handheld magic lens applications described in the post “Magic Lenses” and See-Through Displays. In 2014, the Pepsi Max “Unbelievable” ad campaign by Abbott Mead Vickers BBDO tricked passengers waiting in London bus shelters into thinking a customised bus shelter had a transparent side wall, when in fact it was a large magic lens – the Pepsi Max “Unbelievable Bus Shelter”.

Magic lenses provide both a view of the world in front of the display and a mediated, augmented or transformed version of it. But what if we replace the idea of a lens with that of a mirror that augments the scene captured by a front-mounted, user-facing camera?

Another part of the Pepsi Max “Unbelievable” campaign replaced a real mirror with a “magic mirror” that transformed the “reflection” seen by the subject by replacing their face with a virtually face-painted version of it:

Reference: Campaign, Pepsi Max “unbelievable” by Abbott Mead Vickers BBDO.

Just as mobile phones provide a convenient device for viewing the scene directly in front of the user via a screen, with all that entails in terms of re-presenting the scene digitally, front-mounted cameras on smartphones allow the user to display a live video feed of their own face on the screen, essentially using the user-facing camera and live video display combination as a mirror. But can such things also be used as magic mirrors?

Indeed they can. Several cosmetics manufacturers already publish make-up styling applications that show the effect of applying different styles of make-up selected by the user. The applications rely on identifying particular facial features, such as lips, or eyes, and then allow the user to apply the make-up virtually. (You will see how this face-capture works in another post.)

Another application, ModiFace, offers a similar range of features.

For an academic take on how an augmented reality make-up application can be used for make-up application tutorial purposes, see de Almeida, D. R. O., Guedes, P. A., da Silva, M. M. O., e Silva, A. L. B. V., do Monte Lima, J. P. S., & Teichrieb, V. (2015, May). Interactive Makeup Tutorial Using Face Tracking and Augmented Reality on Mobile Devices. In Virtual and Augmented Reality (SVR), 2015 XVII Symposium on (pp. 220-226). IEEE.

In much the same way that the Pepsi Max bus shelter used a large display as a magic lens, so too can human-sized displays be used to implement magic mirrors.

Once again, the fashion industry has made use of full length magic mirrors to help consumers “try on” clothes using augmented reality. The mirror identifies the customer and then overlays their “reflection” with the items to be tried on. The following video shows the FXGear FXMirror being used as part of a shop floor fitting room.

EXERCISE: Read the blurb about the FXGear FXMirror. What data is collected about users who model clothes using the device? How might such data be used?

EXERCISE: How else have marketers and advertisers used augmented and mediated reality? Try searching through various marketing trade/industry publications to find reports of recent campaigns using such techniques. If you find any, please provide a quick review of them, along with a link, in the comments.

Augmented Reality Apps for the Design Conscious

When the 2013 Ikea catalogue was first released at the start of August 2012, as part of a campaign developed in association with the McCann advertising agency, it was complemented by an augmented reality application that allowed customers to place catalogue items as if in situ in their own homes. Each year since then, the augmented reality app has been updated with the latest catalogue items, demonstrating Ikea’s ongoing commitment to this form of marketing.

For an early report, see for example: WiredSo Smart: New Ikea App Places Virtual Furniture in Your Home, August 2013.

Perhaps not surprisingly, the use of augmented reality in the context of interior design extends far beyond just an extension of the Ikea catalogue.

One of the drawbacks of the current generation of augmented reality interior design applications is the low quality of the rendering of the digital 3D object. As we shall see elsewhere, the higher powered processors available in today’s desktop and laptop computers, compared to mobile devices, mean that it is becoming possible to render photorealistic objects in a reasonable amount of time on a personal computer. However, meeting the realtime rendering requirement of augmented reality apps, as well as ensuring that the rendered object is appropriately shaded given the lighting conditions of the environment and the desired location of the artificial object, presents further technological challenges.

EXERCISE: read the Accenture report from 2014 Life on the digital edge: How augmented reality can enhance customer experience and drive growth and then answer the following questions:

  • what does the report describe as “one of the main goals of any retailer’s digital investment”? How do they claim augmented reality might achieve that goal? To what extent do you think that claim is realistic? What obstacles can you think of that might stand in the way of achieving such a goal using augmented or mediated reality?
  • according to the report, how might augmented reality be used in retail? The report was published in 2014 – can you find any recent examples of augmented reality being used in ways described in the report? Is it being used for retail in ways not identified in the report?
  • what does the report identify as the possible business value benefits of using augmented reality? In that section, a table poses the question “What augmented reality use case would increase your likelihood of purchasing the product?”. Can you find one or more current or historical examples of the applications described? Do such applications seem to be being used more – or less – frequently in recent times?

A lot of hype surrounds augmented reality, although in many respects its value, other than as a novelty, is yet to be determined. To what extent do you think augmented reality applications are a useful everyday contribution to the marketer’s toolkit, and to what extent are they simply a marketing novelty fit only for short lived campaigns? What are the challenges to using such applications as part of an everyday experience?