In Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays, we saw how the Pepper’s Ghost effect could be used to display information in a car using a head-up display projected onto a car windscreen as a driver aid. In this post, we’ll explore the extent to which the digital models of the world that underpin augmented reality effects may also be used to support other forms of behaviour…
Constructing a 3D model of an object in the world can be achieved by measuring the object directly, or, as we have seen, measuring the distance to different points on the object from a scanning device and then using these points to construct a model of the surface corresponding to the size and shape of the object. According to IEEE Spectrum’s report describing A Ride In Ford’s Self-Driving Car, “Ford’s little fleet of robocars … stuck to streets mapped to within two centimeters, a bit less than an inch. The car compared that map against real-time data collected from the lidar, the color camera behind the windshield, other cameras pointing to either side, and several radar sets—short range and long—stashed beneath the plastic skin. There are even ultrasound sensors, to help in parking and other up-close work.”
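To get a feel for the kind of data involved, here is a minimal sketch (using numpy, with entirely made-up range readings) of how a single sweep from a 2D scanning rangefinder might be converted from range-and-bearing measurements into Cartesian points, and how crudely a fresh scan could be compared against a previously stored map:

```python
import numpy as np

# Hypothetical single sweep from a 2D scanning rangefinder:
# one range reading (metres) per bearing, sweeping 0..360 degrees.
bearings = np.deg2rad(np.arange(0, 360, 1.0))            # sensor bearings (radians)
ranges = 5.0 + np.random.rand(bearings.size) * 0.05      # stand-in range readings

# Convert polar (range, bearing) readings into Cartesian points
# in the sensor's own frame of reference.
points = np.column_stack((ranges * np.cos(bearings),
                          ranges * np.sin(bearings)))

# A crude "how far does this scan sit from a prior map?" check:
# here the prior map is just another set of points, and we measure
# the mean nearest-neighbour distance between the two clouds.
prior_map = points + np.random.normal(scale=0.02, size=points.shape)
dists = np.linalg.norm(points[:, None, :] - prior_map[None, :, :], axis=2)
print("mean nearest-neighbour error (m):", dists.min(axis=1).mean())
```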
Whilst the domain of autonomous vehicles may seem to be somewhat distinct from the world of facial capture on the one hand, and augmented reality on the other, autonomous vehicles rely on having a model of the world around them. One of the techniques currently used in detecting distances to objects surrounding an autonomous vehicle is LIDAR, in which a laser is used to accurately detect the distance to a nearby object. But recognising visual imagery also has an important part to play in the control of autonomous and “AI-enhanced” vehicles.
For example, consider the case of automatic lane detection:
Here, an optical view of the world is used as the basis for detecting lanes on a motorway. The video also shows how other vehicles in the scene can be detected and tracked, along with the range to them.
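As a flavour of how a simple lane detector might work, the following hedged sketch uses the classic edge-detection-plus-Hough-transform approach with OpenCV; the input file name is hypothetical, and production systems are considerably more sophisticated:

```python
import cv2
import numpy as np

frame = cv2.imread("road.jpg")                     # hypothetical dashcam frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)                # detect strong edges

# Keep only the lower half of the image, where the road surface is.
mask = np.zeros_like(edges)
mask[edges.shape[0] // 2:, :] = 255
edges = cv2.bitwise_and(edges, mask)

# Fit straight line segments to the surviving edges.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("road_lanes.jpg", frame)
```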
A more recent video from Ford shows the model of the world perceived from the range of sensors on one of their autonomous vehicles.
Part of the challenge of proving autonomous vehicle technologies to regulators, as well as development engineers, is the ability to demonstrate what the vehicle thinks it can see and what it might do next. To this end, augmented reality displays may be useful in presenting, in real-time, a view of a vehicle’s situational awareness of the environment it currently finds itself in.
DO: See if you can find some further examples of the technologies used to demonstrate the operation of self-driving and autonomous vehicles. To what extent do these look like augmented reality views of the world? What sorts of digital models do the autonomous vehicles create? To what extent could such models be used to support augmented reality effects, and what effects might they be?
If, indeed, there is crossover between the technology stack that underpins autonomous vehicles and the one that supports augmented reality, computational devices developed to support autonomous vehicle operation may also be useful to augmented and mixed reality developers.
DO: read through the description of the NVIDIA DRIVE PX 2 system and software development kit. To what extent do the tools and capabilities described sound as if they may be useful as part of an augmented or mixed reality technology stack? See if you can find examples of augmented or mixed reality developers using such toolkits originally developed or marketed for autonomous vehicle use and share them in comments below.
In the post From Magic Lenses to Magic Mirrors and Back Again we reviewed several consumer facing alternate reality phone applications, such as virtual make-up apps. In this post, we’ll review some simple face-based reality-distorting effects with an alternative reality twist.
In the world of social networks, Snapchat provides a network for sharing “disposable” photographs and video clips, social objects that persist on the phone for a short period before disappearing. One popular feature of Snapchat comes in the form of its camera and video filters, also referred to as Snapchat Lenses, that can be used to transform or overlay pictures of faces in all sorts of unbecoming ways.
As the video shows, the lenses allow digital imagery to be overlaid on top of the image, although the origin of the designs is sometimes open to debate as the intellectual property associated with facepainting designs becomes contested (for example, Swiped – Is Snapchat stealing filters from makeup artists?).
Behind the scenes, facial features are captured using a crude form of markerless facial motion capture to create a mesh that acts as a basis for the transformations or overlays as described in From Motion Capture to Performance Capture and 3D Models from Imagery.
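A rough sketch of that capture step, using the dlib library’s 68-point landmark model (the model file is downloaded separately from the dlib project; the image file name here is hypothetical): detected landmark points are triangulated to give a simple mesh that overlays or distortions could be mapped onto.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# The 68-point model file is distributed separately by the dlib project.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("face.jpg")                       # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    shape = predictor(gray, face)
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)

    # A Delaunay triangulation of the landmarks gives a crude face mesh
    # that overlays or distortions can be mapped onto, triangle by triangle.
    rect = cv2.boundingRect(pts)
    subdiv = cv2.Subdiv2D(rect)
    for p in pts:
        subdiv.insert((float(p[0]), float(p[1])))
    for x1, y1, x2, y2, x3, y3 in subdiv.getTriangleList():
        cv2.polylines(img, [np.int32([(x1, y1), (x2, y2), (x3, y3)])],
                      True, (0, 255, 0), 1)

cv2.imwrite("face_mesh.jpg", img)
```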
Another class of effect supported by “faceswap” style applications is an actual faceswap, in which one person’s face is swapped with another’s – or even your own.
As well as swapping two human faces, faceswapping can be used to swap a human face with the face of a computer game character. For computer gamers wanting to play a participating role in the games they are playing, features such as EA Sports GameFace allow users to upload two photos of their face – a front view and a side view – and then use their face on one of the game’s character models.
The GameFace interface requires the user to physically map various facial features on the uploaded photograph so that these can then be used to map the facial mesh onto an animated character mesh. The following article shows how facial features registered as a simple mesh on two photographs can be used to achieve a faceswap effect “from scratch” using open source programming tools.
DO: read through the article Switching Eds: Face swapping with Python, dlib, and OpenCV by Matthew Earl to see how a faceswap style effect can be achieved from scratch using some openly available programming libraries. What process is used to capture the facial features used to map from one face to the other? How is the transformation of swapping one face with another actually achieved? What role does colour manipulation play in creating a realistic faceswap effect?
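This is not the pipeline the article uses (it relies on Procrustes alignment and explicit colour correction), but as a heavily simplified sketch, the final paste-and-blend step might look something like the following, assuming the source face has already been warped into alignment with the target and the target’s landmarks are available (all file names are hypothetical):

```python
import cv2
import numpy as np

# Assume warped_src is the source face already warped (and the same size as
# the target frame) so that its landmarks line up with those of the target;
# the alignment and warping steps are covered in the article itself.
target = cv2.imread("target.jpg")
warped_src = cv2.imread("warped_source.jpg")
landmarks = np.load("target_landmarks.npy")        # (N, 2) facial points

# Build a mask covering the face region from the convex hull of the landmarks.
hull = cv2.convexHull(landmarks.astype(np.int32))
mask = np.zeros(target.shape[:2], dtype=np.uint8)
cv2.fillConvexPoly(mask, hull, 255)

# Poisson ("seamless") cloning blends the pasted face into the target,
# smoothing over differences in skin tone and lighting at the seam.
x, y, w, h = cv2.boundingRect(hull)
centre = (x + w // 2, y + h // 2)
output = cv2.seamlessClone(warped_src, target, mask, centre, cv2.NORMAL_CLONE)
cv2.imwrite("swapped.jpg", output)
```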
Developing algorithms and approaches to face tracking is an active area of research, both in academia and commercially. The outputs of academic research are often written up in academic publications. Sometimes the implementation code is made available by researchers, although at other times it is not. Academic reports should also provide enough detail about the algorithms described for independent third parties to be able to implement them, as is the case with Audun Mathias Øygard’s clmtrackr.
DO: What academic paper provided the inspiration for clmtrackr? Try running examples listed on auduno/clmtrackr and read about the techniques used in the posts Fitting faces – an explanation of clmtrackr and Twisting faces: some use cases of clmtrackr. How does the style of writing and explanation in those posts compare to the style of writing used in the academic paper? What are the pros and cons of each style of writing? Who might the intended audience be in each case?
UPDATE: it seems as if Snapchat may be doing a line of camera-enabled sunglasses – Snapchat launches sunglasses with camera. How much harder is it to imagine the same company doing a line in novelty AR specs that morph those around you in a humorous and amusing way whenever you look at them…?! Think: X-ray specs ads from the back of old comics…
In the post Augmented TV Sports Coverage & Live TV Graphics, we saw how live TV graphics could be used to overlay sports events in order to highlight particular elements of the sports action.
One of the things you may have noticed in some of the broadcasts was that as well as live “telestrator” style effects, such as highlighting the trajectory of a ball, or participant tracking effects, many of the scenes also included on-pitch advertising. So was the pitch really painted with large adverts, or were they digital effects? The following showreel from Namadgi Systems (which in its full form demonstrates many of the effects shown in the previously mentioned post) suggests that the on-pitch adverts are, in fact, digital creations. Other vendors of similar services include Broadcast Virtual and BrandMagic.
So-called virtual advertising allows digitally rendered adverts to be embedded into live broadcast feeds in a way that makes the adverts appear as if they are situated on or near the field of play. As such, to the viewer of the broadcast, it may appear as if the advert would be visible to the spectators present at the event. In fact, it may be the case that the insert is an entirely digital creation, an overlay on top of some sort of distinguished marker or location (determined relative to an easily detected pitch boundary, for example), or a replacement of a static, easily recognised and masked local advert.
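A minimal sketch of the underlying “paint an advert onto a known patch of pitch” idea: if the image coordinates of that patch are known (here they are simply made up; in practice they would come from camera calibration or detected pitch lines), a flat advert can be warped into the camera’s view with a perspective transform and composited into the frame.

```python
import cv2
import numpy as np

frame = cv2.imread("pitch_frame.jpg")              # hypothetical broadcast frame
advert = cv2.imread("advert.png")                  # flat advert artwork

# Four corners of the advert artwork, and the (assumed known) image
# coordinates of the patch of pitch it should appear to be painted on.
h, w = advert.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
dst = np.float32([[420, 300], [780, 310], [760, 420], [400, 405]])

# Warp the advert into the camera's view of that patch and composite it in.
H = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(advert, H, (frame.shape[1], frame.shape[0]))
mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H,
                           (frame.shape[1], frame.shape[0]))
frame[mask > 0] = warped[mask > 0]

cv2.imwrite("pitch_with_virtual_ad.jpg", frame)
```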
EXERCISE: Watch the following video and see how many different forms of virtual advertising you can detect.
So how many different ways of delivering mediated reality ads did you find?
The following marketing video from Supponor advertises their “digital billboard replacement” (DBRLive) product that is capable of identifying and tracking track-side or pitch-side advertising hoardings and replacing them with custom adverts.
EXERCISE: what do you think are the advantages of using digital signage over fixed advertising billboards? What further advantages do “replacement” techniques such as DBRLive have over traditional digital signage? To what extent do you think DBRLive is a mediated reality application?
As well as transforming the perimeter, and even the playing area, with digital adverts, sports broadcasters often present a mediated view of the studio set inhabited by the host and selected pundits to provide continuity during breaks in the action, as the following corporate video from vizrt describes:
So how do virtual sets work and how do they compare with the “chroma key” effects used in TV and film production since the 1940s? We’ll need another post for that…
In the post Augmented TV Sports Coverage & Live TV Graphics, we saw how sports broadcasters increasingly make use of effects that highlight tracked elements in a sporting event, from the players in a football match to the ball they are playing with. So how else might we apply such tracking technologies?
According to Melvin Kranzberg’s first law of technology, “Technology is neither good nor bad; nor is it neutral”. In the sports context, we may be happy to think that cameras can be used to track – and annotate – each player’s every move. But what if we take such technological capabilities and apply them elsewhere?
EXERCISE: As well as being used to support referees making decisions about boundary line events, such as whether a tennis ball landed “in” or “out”, or whether a football crossed the goal line, how might virtual boundaries be used as part of a video surveillance system? To what extent could image tracking systems also be used as part of a video surveillance system?
One way of using virtual boundaries as part of a video based surveillance system might be to use them as virtual trip wires, where breaches of a virtual boundary or fence can be used to flag a warning about a possible physical security breach and perhaps start a detailed recording of the scene.
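A hedged sketch of such a virtual tripwire, assuming OpenCV and some hypothetical CCTV footage: moving objects are picked out by background subtraction, and an alert is raised when an object’s centroid changes sides of a virtual line. (A real system would track multiple objects individually; this sketch keeps just one side-of-line state for simplicity.)

```python
import cv2
import numpy as np

# Virtual tripwire defined by two image points (illustrative coordinates).
p1, p2 = (100, 400), (540, 380)

def side_of_line(point):
    # Sign of the 2D cross product tells us which side of the line the point is on.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return np.sign(dx * (point[1] - p1[1]) - dy * (point[0] - p1[0]))

cap = cv2.VideoCapture("cctv.mp4")                 # hypothetical footage
subtractor = cv2.createBackgroundSubtractorMOG2()
last_side = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)
    fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:               # ignore small blobs/noise
            continue
        x, y, w, h = cv2.boundingRect(c)
        side = side_of_line((x + w / 2, y + h / 2))
        if last_side is not None and side != last_side:
            print("Tripwire crossed - start detailed recording here")
        last_side = side
cap.release()
```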
ASIDE: The notion of virtual tripwires extends into other domains too. For example, for objects tracked using GPS, “geo-fences” can be defined that raise an alert when a tracked object enters, or leaves, a particular geographic area. The AIS ship identification system used to uniquely identify ships – and their locations – can be used as part of a geofenced application to raise an alert whenever a particular boat, such as a ferry, enters or leaves a port.
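Geofencing itself reduces to a point-in-polygon test. The following sketch uses the standard ray-casting algorithm, with entirely made-up coordinates for a “port area” polygon and a reported ship position:

```python
def inside_geofence(point, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (lon, lat) pairs."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        crosses = (y1 > y) != (y2 > y)
        if crosses and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Illustrative (made-up) polygon around a port and a reported ship position.
port_area = [(-1.30, 50.90), (-1.25, 50.90), (-1.25, 50.94), (-1.30, 50.94)]
ship_fix = (-1.27, 50.92)

was_inside = False
now_inside = inside_geofence(ship_fix, port_area)
if now_inside != was_inside:
    print("Geofence alert: vessel has entered the port area")
```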
Video surveillance might also be used to track individuals through a videoed scene. For example, if a person of interest has been detected in a particular piece of footage, they might be automatically tracked through that scene. If multiple cameras cover the same area, persons of interest may be tracked across multiple video feeds, as described by Khan, Sohaib, Omar Javed, Zeeshan Rasheed, and Mubarak Shah. “Human tracking in multiple cameras.” In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 1, pp. 331-336. IEEE, 2001.
Where the environment is rather more constrained, such as an office block, tools such as the FXPAL DOTS Video Surveillance System allow for individuals to be tracked throughout the building. Optional filters also allow tracking or identification based on the colour of clothing, which may be meaningful in an environment where different colour uniforms or protective clothing are used to identify people by role – and perhaps by different access permission levels.
Once a hard computer science problem to solve, a wide variety of programming libraries and tools now support object identification and tracking. There are even Javascript libraries available, such as tracking.js, that are capable of tracking objects and faces streamed from a laptop camera using code that runs just in your browser.
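tracking.js runs entirely in the browser; for consistency with the other sketches here, the following is a rough desktop analogue in Python using OpenCV’s bundled Haar cascade face detector, re-running detection on each webcam frame rather than performing true frame-to-frame tracking:

```python
import cv2

# OpenCV ships Haar cascade files; this one detects frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                          # default laptop camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):          # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```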
Tracking is one thing – but identification of tracked entities is another. In some situations, however, tracked entities may carry clearly seen identifiers – such as car number plates. Automatic Number Plate Recognition (ANPR) is now a mature technology and is widely deployed against moving, as well as stationary, vehicles.
With technology firmly in place for tracking objects, and perhaps even identifying them, analysts are now turning their attention to systems that are capable of automatically identifying different events, or behaviours, within a visual scene, a step up from the simple “threshold crossing” behaviours used to implement virtual tripwires.
Once behaviours have been automatically identified, the visual scene may be overlaid with a statement of, or interpretation of, those behaviours.
Many technologies are developed for a particular purpose, but that does not prevent them being adopted for other purposes. When new technologies emerge, there are often many opportunities for businesses and entrepreneurs to find ways of using those technologies either on their own or in combination with other technologies. However, there are also risks, not least that the technology is used for a harmful purpose, or one that we do not approve of. More difficult is to try to predict what the consequences of using such technologies widely may be. As technologists, it’s our job to try to think critically about how emerging technologies may be used, whether for good or ill, and to contribute to debates about whether we want to approve the use of such technologies, or limit them in some way.
In the post From Magic Lenses to Magic Mirrors and Back Again we saw how magic lenses allow users to look through a screen at a mediated view of the scene in front of them, and magic mirrors allow users to look at a mediated view of themselves. In this post, we will look at how a remotely captured scene might be mediated in some way before being presented to the viewer in near-real-time. In particular, we will consider how live televised sporting events may be augmented to enhance the viewer’s understanding or appreciation of the event.
Ever since the early days of television, TV graphics have been used to overlay information – often in the “lower third” of the screen – to provide a mediated view of the scene being displayed. For example, one of the most commonly seen lower-third effects is to display a banner giving the name and affiliation of a “talking head”, such as a politician being interviewed in a news programme.
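A lower-third banner requires no scene understanding at all: it is simply drawn over the frame. A minimal sketch (with a made-up name and a hypothetical file name) might look like this:

```python
import cv2

frame = cv2.imread("interview_frame.jpg")          # hypothetical still
h, w = frame.shape[:2]

# Draw a semi-transparent banner across the lower third of the frame.
overlay = frame.copy()
cv2.rectangle(overlay, (0, int(h * 0.75)), (w, int(h * 0.88)), (40, 40, 40), -1)
frame = cv2.addWeighted(overlay, 0.7, frame, 0.3, 0)

# Add the name and affiliation caption (illustrative text only).
cv2.putText(frame, "Jane Doe, MP", (40, int(h * 0.82)),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
cv2.imwrite("interview_lower_third.jpg", frame)
```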
But in recent years, realtime annotation of elements within the visual scene has become possible, providing the producers of sports television in particular with a very rich and powerful way of enhancing the way that a particular event is covered with live TV graphics.
EXERCISE: from your own experience, try to recall two or three examples of how “augmented reality” style effects can be used to enhance televised sporting events in a real-time or near-realtime way.
Educators often use questions to focus the attention of the learner on a particular matter. For example, an educator reading an academic paper may identify things of interest (to them) that they want the learner to pick up on. The educator then needs to find a way of directing the attention of the learner to those points of interest. This is often what motivates the questions they set around a resource (the purpose is to help students learn how to focus their attention on a resource and immediately reflect back why something in the paper might be interesting – by casting a question to which the item in the paper is the answer). When addressing a question, the learner also needs to appreciate that they are expected to answer the question in an academic way. More generally, when you read something, read it with a set of questions in mind that may have been raised by reading the abstract. You can also annotate the reading with the questions which that part of the reading answers. Another trick is to spot when part of the reading answers a question or addresses a topic you didn’t fully understand: “Ah, so that means if this, then that…”. This is a simple trick, but a really powerful one nonetheless, and it can help you develop your own self-learning skills.
EXERCISE: Read through the following abstract taken from a BBC R&D department white paper written in 2012 (Sports TV Applications of Computer Vision, originally published in ‘Visual Analysis of Humans: Looking at People’, Moeslund, T. B.; Hilton, A.; Krüger, V.; Sigal, L. (Eds.), Springer 2011):
This chapter focuses on applications of Computer Vision that help the sports broadcaster illustrate, analyse and explain sporting events, by the generation of images and graphics that can be incorporated in the broadcast, providing visual support to the commentators and pundits. After a discussion of simple graphics overlay on static images, systems are described that rely on calibrated cameras to insert graphics or to overlay content from other images. Approaches are then discussed that use computer vision to provide more advanced effects, for tasks such as segmenting people from the background, and inferring the 3D position of people and balls. As camera calibration is a key component for all but the simplest applications, an approach to real-time calibration of broadcast cameras is then presented. The chapter concludes with a discussion of some current challenges.
How might the techniques described be relevant to / relate to AR?
Now read through the rest of the paper, and try to answer the following questions as you do so:
what is a “free viewpoint”?
what is a “telestrator” – to what extent might you claim this is an example of AR?
what approaches were taken to providing “Graphics overlay on a calibrated camera image”? How does this compare with AR techniques? Is this AR?
what is Foxtrax and how does it work?
what effects are possible once you “segment people or other moving objects from the background”? What practical difficulties must be overcome when creating such an effect?
how might prior knowledge help when constructing tracking systems? What additional difficulties arise when tracking people?
how can environmental features/signals be used to help calibrate camera settings? what does it even mean to calibrate a camera?
what difficulties are associated with Segmentation, identification and tracking?
The white paper also identifies the following challenges to “successfully applying computer vision techniques to applications in TV sports coverage”:
The environment in which the system is to be used is generally out of the control of the system developer, including aspects such as lighting, appearance of the background, clothing of the players, and the size and location of the area of interest. For many applications, it is either essential or highly desirable to use video feeds from existing broadcast cameras, meaning that the location and motion of the cameras is also outside the control of the system designer.
The system needs to fit in with existing production workflows, often needing to be used live or with a short turn-around time, or being able to be applied to a recording from a single camera.
The system must also give good value-for-money or offer new things compared to other ways of enhancing sports coverage. There are many approaches that may be less technically interesting than applying computer vision techniques, but nevertheless give significant added value, such as miniature cameras or microphones placed in a cricket stump, a ‘flying’ camera suspended on wires above a football pitch, or high frame-rate cameras for super-slow-motion.
To what extent do you think those sorts of issues apply more generally to augmented and mediated reality systems?
In the rest of this post, you will see some examples of how computer vision driven television graphics have been used in recent years. As you watch the videos, try to relate the techniques demonstrated to the issues raised in the white paper.
From 2004 to 2010, the BBC R&D department, in association with Red Bee Media, worked on a system known as Piero, now owned by Ericsson, that explored a wide range of augmentation techniques. Watch the following videos and see how many different sorts of “augmentation” effect you can identify. In each case, what sorts of enabling technology do you think are required in order to put together a system capable of generating such an effect?
In the US, SportVision provide a range of real-time enhancements for televised sports coverage. The following video demonstrates car and player tracking in motor-racing and football respectively, ball tracking in baseball and football (soccer), and a range of other “event” related enhancements, such as offside lines or player highlighting in football (soccer).
EXERCISE: watch the SportVision 2012 showreel on the SportVision website. How many different augmented reality style effects did you see demonstrated in the showreel?
For further examples, see the case studies published by vizrt.
Watching the videos, there are several examples of how items tracked in realtime can be visualised, either to highlight a particular object or feature (such as tracking a player, highlighting the position of a ball, puck, or car), or trace out the trajectory followed by the object (for example, highlighting in realtime the path followed by a ball).
Having seen some examples of the techniques in action, and perhaps started to ask yourself “how did they do that?”, skim back over the BBC white paper to see if any of the sections jump out at you in answer to your self-posed questions.
In the UK, Hawk-Eye Innovations is one of the most well known providers of such services to UK TV sports viewers.
The following video describes in a little more detail how the Hawk-Eye system can be used to enhance snooker coverage.
And how Hawk-Eye is used in tennis:
In much the same way as sportsmen compete on the field of play, so too do rival technology companies. In the 2010 Ashes series, Hawk-Eye founder Paul Hawkins suggested that a system provided by rivals VirtualEye could lead to inaccurate adjudications due to human operator error compared to the (at the time) more completely automated Hawk-Eye system (The Ashes 2010: Hawk-Eye founder claims rival system is not being so eagle-eyed).
The following video demonstrates how the Virtual Eye ball tracking software worked to highlight the path of a cricket ball as it is being bowled:
EXERCISE: what are the benefits to sports producers from using augmented reality style, realtime television graphics as part of their production?
The following video demonstrates how the SportVision Liveline effect can be used to help illustrate what’s actually happening in an Americas Cup yacht race, which can often be hard to follow for the casual viewer:
EXERCISE: To what extent might such effects be possible in a magic lens style application that could be used by a spectator actually witnessing a live sporting event?
EXERCISE: review some of the video graphics effects projects undertaken in recent years by the BBC R&D department. To what extent do the projects require: a) the modeling of the world with a virtual representation of it; b) the tracking of objects within the visual scene; c) the compositing of multiple video elements, or the introduction of digital objects within the visual scene?
As a quick review of the BBC R&D projects in this area suggests, the development of on-screen graphics that can track objects in real time may be complemented by the development of 3D models of the televised view so that it can be inspected from virtual camera positions that provide a view of the scene reconstructed from a model built up from the real camera positions.
Once again, though, there may be a blurring of reality – because is the view actually taken from a virtual camera, or a real one, such as a Spidercam?
As well as overlaying actual footage with digital effects, sports producers are also starting to introduce virtual digital objects into the studio to provide an augmented reality style view of the studio to the viewer at home.
3D graphics are increasingly being used in TV studios to dress other elements of the set. In addition, graphics are also being used to enhance TV sports through the use of virtual advertising. Both these approaches will be discussed in another post.
More generally, digital visual effects are used widely across film and television, as we shall also explore in a later post…
PS In the absence of a more recent round-up, here’s an application reviewed in late 2017:
And here’s an example of an application for annotating sports scenes:
Such is the power of today’s web browsers, on smartphones as well as laptops, that it’s possible to run a simple augmented reality demo in your phone or laptop browser using just the code contained in a small javascript library.
DO: visit the online Github code repository jeromeetienne/threex.webar. You can run the demo in several ways:
if you have a laptop computer with a camera, make a copy of the registration marker image, either by printing it or grabbing a photograph of it with a smartphone and then show the marker to the demo page;
load the demo page on your smartphone, allow the page to make use of the phone camera, and then use it to view the marker image displayed on a computer screen or the screen of someone else’s smartphone.
What enabling technologies made the threex.webar demonstration possible?
In recent years, commercial outdoor advertising has made increasing use of screen based digital signage. These can be used for video based advertising campaigns as well as “carousel” style displays where the same screen can be used to display different adverts in turn. But in a spirit of playfulness, they may also be used as magic lens style displays, similar in kind to the handheld magic lens applications described in the post “Magic Lenses” and See-Through Displays. In 2014, the Pepsi Max “Unbelievable” ad campaign by Abbott Mead Vickers BBDO tricked passengers waiting in London bus shelters into thinking a customised bus shelter had a transparent side wall, when in fact it was a large magic lens – the Pepsi Max “Unbelievable Bus Shelter”.
Magic lenses provide both a view of the world in front of the display and a mediated, augmented or transformed version of it. But what if we replace the idea of a lens with that of a mirror that augments the scene captured by a front-mounted, user facing camera?
Another part of the Pepsi Max “Unbelievable” campaign replaced a real mirror with a “magic mirror” that transformed the “reflection” seen by the subject by replacing their face with a virtually face-painted version of it:
Just as mobile phones provide a convenient device for viewing the scene directly in front of the user via a screen, with all that entails in terms of re-presenting the scene digitally, front mounted cameras on smart phones allow the user to display a live video feed of their own face on the screen, essentially using the user-facing camera+live video display combination as a mirror. But can such things also be used as magic mirrors?
Indeed they can. Several cosmetics manufacturers already publish make-up styling applications that show the effect of applying different styles of make-up selected by the user. The applications rely on identifying particular facial features, such as lips, or eyes, and then allow the user to apply the make-up virtually. (You will see how this face-capture works in another post.)
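As a hedged illustration of the “identify the lips, then recolour them” step, the following sketch again uses dlib’s 68-point landmark model, in which points 48 to 67 outline the mouth; the input file name and the blend settings are illustrative only:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = cv2.imread("selfie.jpg")                     # hypothetical selfie frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    shape = predictor(gray, face)
    # Landmarks 48-67 outline the mouth in the 68-point model.
    lips = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                    dtype=np.int32)

    # Fill the lip region with a "lipstick" colour, then blend it back in
    # softly so some of the original skin texture shows through.
    mask = np.zeros_like(img)
    cv2.fillPoly(mask, [cv2.convexHull(lips)], (60, 40, 200))   # BGR red-ish
    img = cv2.addWeighted(img, 1.0, mask, 0.4, 0)

cv2.imwrite("virtual_lipstick.jpg", img)
```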
Another application, ModiFace, offers a similar range of features.
For an academic take on how an augmented reality make-up application can be used for make-up application tutorial purposes, see de Almeida, D. R. O., Guedes, P. A., da Silva, M. M. O., e Silva, A. L. B. V., do Monte Lima, J. P. S., & Teichrieb, V. (2015, May). Interactive Makeup Tutorial Using Face Tracking and Augmented Reality on Mobile Devices. In Virtual and Augmented Reality (SVR), 2015 XVII Symposium on (pp. 220-226). IEEE.
In much the same way that the Pepsi Max bus shelter used a large display as a magic lens, so too can human-sized displays be used to implement magic mirrors.
Once again, the fashion industry has made use of full length magic mirrors to help consumers “try on” clothes using augmented reality. The mirror identifies the customer and then overlays their “reflection” with the items to be tried on. The following video shows the FXGear FXMirror being used as part of a shop floor fitting room.
EXERCISE: Read the blurb about the FXGear FXMirror. What data is collected about users who model clothes using the device? How might such data be used?
EXERCISE: How else have marketers and advertisers used augmented and mediated reality? Try searching through various marketing trade/industry publications to find reports of recent campaigns using such techniques. If you find any, please provide a quick review of them, along with a link, in the comments.
Augmented Reality Apps for the Design Conscious
When the 2013 Ikea catalogue was first released at the start of August 2012, as part of a campaign developed in association with the McCann advertising agency, it was complemented by an augmented reality application that allowed customers to place catalogue items as if in situ in their own homes. Each year since then, the augmented reality app has been updated with the latest catalogue items, demonstrating Ikea’s ongoing commitment to this form of marketing.
Perhaps not surprisingly, the use of augmented reality in the context of interior design extends far beyond just an extension of the Ikea catalogue.
One of the drawbacks of the current generation of augmented reality interior design applications is the low quality of the rendering of the digital 3D object. As we shall see elsewhere, the higher powered computer processors available in today’s desktop and laptop computers, compared to mobile devices, mean that it is becoming possible to render photorealistic objects in a reasonable amount of time with a personal computer. However, meeting the realtime rendering requirement of augmented reality apps, as well as the ability to ensure that the rendered object is appropriately shaded given the lighting conditions of the environment and the desired location of the artificial object, presents further technological challenges.
what does the report describe as “one of the main goals of any retailer’s digital investment”? How do they claim augmented reality might achieve that goal? To what extent do you think that claim is realistic? What obstacles can you think of that might stand in the way of achieving such a goal using augmented or mediated reality?
according to the report, how might augmented reality be used in retail? The report was published in 2014 – can you find any recent examples of augmented reality being used in ways described in the report? Is it being used for retail in ways not identified in the report?
what does the report identify as the possible business value benefits of using augmented reality? In that section, a table poses the question “What augmented reality use case would increase your likelihood of purchasing the product?”. Can you find one or more current or historical examples of the applications described? Do such applications seem to be being used more – or less – frequently in recent times?
A lot of hype surrounds augmented reality, although in many respects its value, other than as a novelty, is yet to be determined. To what extent do you think augmented reality applications are a useful everyday contribution to the marketer’s toolkit, and to what extent are they simply a marketing novelty fit only for short lived campaigns? What are the challenges to using such applications as part of an everyday experience?
Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation or realtime manipulation of another modality such as sound.
EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.
ANSWER: Car navigation systems augment spatial location with audio messages describing when to turn, and audio guides in heritage settings let you listen to a story that “augments” a particular location. Noise cancelling earphones transform the environment by subtracting, or tuning out, background noise, and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.
Noise Cancellation
As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term in which information may be added to or subtracted from a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.
Noise cancellation provides one such form of mediated reality, where the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft. By removing noisy components from the real world audio, noise cancellation may be thought of as producing a form of diminished reality, in the sense that environmental components have been lost, rather than added to, even though the overall salient signal-to-noise ratio may have increased.
Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.
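The core anti-phase idea can be sketched in a few lines of numpy with synthetic signals: estimate the noise, invert it, and add it back in so the unwanted component cancels. Real systems have to estimate the noise adaptively and cope with microphone placement and latency, so treat this as an idealised illustration only:

```python
import numpy as np

rate = 16000
t = np.arange(0, 1.0, 1.0 / rate)

speech = 0.6 * np.sin(2 * np.pi * 440 * t)    # stand-in "wanted" signal
noise = 0.4 * np.sin(2 * np.pi * 100 * t)     # low-frequency, engine-like drone
mic = speech + noise                           # what the ear/microphone receives

# Ideal case: a reference microphone hears only the noise, so playing the
# inverted (anti-phase) reference alongside the original cancels it out.
anti_noise = -noise
heard = mic + anti_noise

print("residual noise power:", np.mean((heard - speech) ** 2))   # ~0 in this ideal case
```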
EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:
how does “active” noise cancellation differ from passive noise cancellation?
what sorts of noise are active noise cancellation systems most effective at removing, and why?
what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?
Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.
EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sat inside a car?
Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.
As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?
EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?
It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!
In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…
In the post Taxonomies for Describing Mixed and Alternate Reality Systems we introduced various schemes for categorising and classifying the various components of mixed and augmented reality systems. In this post, we will see how one particular class of display – see-through displays – can be put to practical purpose.
Using a phone, or tablet, with a forward facing, back-mounted camera as a see-through video display, you can relay the camera view to the screen in a realtime view mode and manipulate the current scene. This approach has been referred to as a “magic lens, … a see-through interface/metaphor that affords the user a modified view of the scene behind the lens” (D. Baričević, C. Lee, M. Turk, T. Höllerer and D. A. Bowman, “A hand-held AR magic lens with user-perspective rendering,” Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, Atlanta, GA, 2012, pp. 197-206, doi: 10.1109/ISMAR.2012.6402557 [PDF]). (See also M. Rohs and A. Oulasvirta, Target acquisition with camera phones when used as magic lens, Proceedings of the 26th international conference on Human Factors in Computing Systems, CHI ’08, pages 1409–1418. ACM, 2008, who define a magic lens as an “augmented reality interface that consists of a camera-equipped mobile device being used as a see-through tool. It augments the user’s view of real world objects by graphical and textual overlays”).
However, as the paper noted at the time:
Many existing concept images of AR magic lenses show that the magic lens displays a scene from the user’s perspective, as if the display were a smart transparent frame allowing for perspective-correct overlays. This is arguably the most intuitive view. However, the actual magic lens shows the augmented scene from the point of view of the camera on the hand-held device. The perspective of that camera can be very different from the perspective of the user, so what the user sees does not align with the real world. … We define the user-perspective view as the geometrically correct view of a scene from the point-of-view of the user, in the direction of the user’s view, and with the exact view the user should have in that direction.
Whilst head-up displays are also examples of see-through displays, many head-up displays do not necessarily situate the virtual digital objects as direct augmentations of the perceived physical world – rather, they frequently take the form of pop-up style dashboards that open as “desktop windows” or pop-up menus within the visual scene, rather than as direct augmentations of physical objects perceived within the visual scene.
The first wave of consumer augmented reality applications relied on printing out registration images or QR codes that could act as fiducial markers, be easily recognised using image recognition software, and then be overlaid with a 3D animation.
If an image could reliably be detected, it could be used as part of an augmented reality system, resulting in some innovative marketing campaigns.
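A minimal sketch of fiducial marker detection using OpenCV’s ArUco module (this assumes an OpenCV build that includes the aruco contrib module; the exact function names have shifted between OpenCV releases, so treat the calls as indicative):

```python
import cv2

# Requires an OpenCV build with the aruco module (e.g. opencv-contrib-python);
# the exact function names have changed between OpenCV versions.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

img = cv2.imread("scene_with_marker.jpg")          # hypothetical camera frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
if ids is not None:
    # Each detected marker gives four corner points: enough to estimate the
    # marker's pose and so anchor a 3D overlay to it.
    cv2.aruco.drawDetectedMarkers(img, corners, ids)
    print("found markers:", ids.ravel().tolist())

cv2.imwrite("detected_markers.jpg", img)
```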
The same idea can be used to enhance two-dimensional print publications. With a suitable device and the appropriate app installed, you can recognise a particular page of print and “unlock” additional content, an approach taken by the Layar augmented reality app, which, among other things, allows you to create your own augmented reality enhanced content.
For more confident programmers, toolkits such as the open source ARToolkit – one of the earliest widely available augmented reality programming toolkits, which is still being developed today and is distributed for free at ARToolkit.org – and the Wikitude SDK (software development kit) allow professional and hobbyist programmers alike to create their own augmented reality demonstrations. (See also commercial services such as Catchoom: CraftAR.)
Within all these applications, we see how there is a need for “enabling technologies, … advances in the basic technologies needed to build compelling AR environments. Examples of these technologies include displays, tracking, registration, and calibration” (Azuma, Ronald, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. “Recent advances in augmented reality.” IEEE computer graphics and applications 21, no. 6 (2001): 34-47) that make the development of such systems possible by developers outside of advanced research and development labs.
One popular category of AR Toolkit demonstration, and an approach that hints at a particular category of potential augmented reality applications, was the development of interactive Lego model assembly manuals. These could recognise a registration image associated with a particular model and could then step through the sequential steps required to build the model, overlaying the next piece to be added to the model in a stepwise fashion. The known size of the marker, the fixed geometry of the model, and the availability of open source Lego CAD tools based around LDraw meant that many of the physical and computational building blocks required for creating such applications were already in place.
SAQ: What’s wrong with the demonstration shown in the video above?
Answer: The placing of the virtual block on the model does not appear to be in the correct place, but is offset slightly. This might arise from a combination of issues, including the placement of the registration image, the positioning of the see-through device, or the camera used to record the video.
An earlier demonstration of a Lego construction model instruction manual includes some additional humour in the form of an animated Lego figure mechanic who fetches an appropriate piece at each step and then demonstrates where to attach it to a model based around the original Lego Mindstorms Robot Invention System.
The demonstration also shows how augmented reality can be used to test the operation of the completed assembly, stepping the user through a test sequence and virtually animating the expected behaviour. The ability of the RCX computer brick at the heart of the model to communicate back to the computer hosting the manual also allowed information captured by the brick (the light sensor readings) to be displayed in the augmented reality layer.
SAQ: how might advances in 3D image recognition technology be used to further improve the functionality of the manual, for example, in terms of checking the correct assembly of the model? What other enabling technologies may also help in this endeavour?
Answer: as the ability to recognise, identify and orientate 3D objects improved, the potential for generating digital overlays on three-dimensional objects became more tractable. This means that it may be possible to recognise pieces picked up by the person building the model and for the interactive manual to check them against the expected part. Erroneous parts could be highlighted with a warning sign. Additionally, the state of the model after each step is completed could be visually checked to see that the correct piece appears to have been placed in the correct position, although this is likely to present a more complex task and may not be possible. If a piece could be identified as incorrectly placed (for example, in a “likely possible” misplaced position), the instruction manual might show how to move the part to the correct place, or rewind to show where the piece should have been placed.
The availability of programming building blocks capable of recognising individual Lego bricks and associating them with a part number also used by CAD tools such as LDraw could be seen as an enabling technology for the further development of such diagnostic AR tools.
As a proof-of-concept idea, augmented reality Lego construction manuals provide a realistic, if toy example (in several senses of the word!) of how such techniques might be used in a practical setting. So it’s not surprising that augmented reality instruction manuals were among the first application areas described when the possibility of AR first began to emerge.
Optional Reading: two relatively early descriptions of augmented reality instruction and assembly manuals can be found in: Caudell, Thomas P., and David W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes“, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, vol. 2, pp. 659-669. IEEE, 1992.; and Feiner, Steven, Blair Macintyre, and Dorée Seligmann, “Knowledge-based augmented reality.” Communications of the ACM 36, no. 7 (1993): 53-62.
A similar approach is also being used to develop service manuals for use in industry, via magic lens displays, and presumably also in smart helmet displays:
Tablet, or phone, based magic lens apps are also being used to support car maintenance in the form of augmented reality car manuals that are capable of recognising particular features of a car dashboard, or engine, and then interactively annotating them with a virtual overlay, as described in this press release from Hyundai about their augmented reality car owner’s manual:
On the other hand, maybe such applications are just so much hype and not actually of interest to a wider public? For example, in 2013, one company attempted to crowdsource funding for a general purpose app that could act as an AR style guide for a range of car models:
At the time of writing, in mid-2016, the site appears to have raised almost $500 of the $90,000 goal. Maybe augmented reality is not that compelling for the mass market?!
However, it seems as if the development of augmented reality technical documentation is still an area of academic research, at least.
In the next post on this theme, we’ll see how augmented reality can be used to implement “magic mirrors”, in contrast to “magic lenses”. But first – what “lenses” can we apply to another modality: sound?
In the post Taxonomies for Describing Mixed and Alternate Reality Systems, we provided a framework for talking about the various physical components of an augmented reality system. But how should we talk about the different elements within the perceived augmented reality scene?
Milgram and Kishino (Milgram, Paul & Fumio Kishino, “A taxonomy of mixed reality visual displays”, IEICE TRANSACTIONS on Information and Systems 77, no. 12 (1994): 1321-1329) started by clarifying the notions of real and virtual in an augmented reality sense:
Real objects are objects that have a physical, tangible existence, whereas virtual objects are purely digital representations, without a physical correlate, within the rendered visual scene (although they may be digital representations of things that do exist).
An object viewed directly has an existence in the real world and is viewed as such by the viewer. A non-directly viewed object is one that has been sampled and re-presented to the viewer via a display medium, or a virtual object whose existence can only be viewed via such a medium. This is referred to as the image quality.
A real image is one that has “some luminosity at the location at which it appears to be located”, such as a directly viewed object or an image viewed on a screen. Virtual images are produced by optical tricks, such as holograms and mirror images, and have no luminosity at the location at which they appear.
Whilst these distinctions are helpful when considering the representation of a single object, they may become confused when trying to analyse a view composed of multiple objects, both real and virtual. For example, in the Google Translate example described in Augmenting Reality With Digital Overlays, the screen is a physical display, that is, a real image, that provides a non-direct view. But is the text a real object or a virtual object?
To help us talk about objects within the augmented visual scene, we might add an additional correspondence dimension, that describes whether an object within the scene, or component of it, is presented as:
a raw, otherwise untouched, part of the image (that is, a faithful re-presentation of the object represented in that part of the image);
an overlay, where an additional layer of information is added to the scene, as in the case of a HUD dashboard;
a re-touch, where the object is still recognisable but has been reshaped and/or recoloured;
a replacement, where an object has been detected and then replaced.
We now have various tools at our disposal for helping us see – and talk about – the various components of a mixed reality system from a range of critical perspectives.