
From Magic Lenses to Magic Mirrors and Back Again

In recent years, commercial outdoor advertising has made increasing use of screen based digital signage. Such screens can be used for video based advertising campaigns as well as “carousel” style displays where the same screen is used to display different adverts in turn. But in a spirit of playfulness, they may also be used as magic lens style displays, similar in kind to the handheld magic lens applications described in the post “Magic Lenses” and See-Through Displays. In 2014, the Pepsi Max “Unbelievable” ad campaign by Abbott Mead Vickers BBDO tricked passengers waiting in London bus shelters into thinking a customised bus shelter had a transparent side wall, when in fact it was a large magic lens – the Pepsi Max “Unbelievable Bus Shelter”.

Magic lenses provide both a view of the world in front of the display as well as a mediated, augmented or transformed version of it. But what if we replace the idea of a lens with that of a mirror that augments the scene captured by a front-mounted, user facing camera?

Another part of the Pepsi Max “Unbelievable” campaign replaced a real mirror with a “magic mirror” that transformed the “reflection” seen by the subject, replacing their face with a virtually face-painted version of it:

Reference: Campaign, Pepsi Max “unbelievable” by Abbott Mead Vickers BBDO.

Just as mobile phones provide a convenient device for viewing the scene directly in front of the user via a screen, with all that entails in terms of re-presenting the scene digitally, front-mounted cameras on smartphones allow the user to display a live video feed of their own face on the screen, essentially using the user-facing camera+live video display combination as a mirror. But can such things also be used as magic mirrors?

Indeed they can. Several cosmetics manufacturers already publish make-up styling applications that show the effect of applying different styles of make-up selected by the user. The applications rely on identifying particular facial features, such as lips, or eyes, and then allow the user to apply the make-up virtually. (You will see how this face-capture works in another post.)

Another application, ModiFace, offers a similar range of features.
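A later post looks at how this face capture works in more detail, but as a taste, locating facial features such as the lips can be sketched with an off-the-shelf landmark detector. The following is a minimal sketch using the MediaPipe FaceMesh model; the input filename and the particular landmark indices used are illustrative assumptions on my part, not anything taken from the apps above.

```python
import cv2
import mediapipe as mp

# A small, illustrative subset of lip landmark indices in MediaPipe's
# 468-point FaceMesh topology (assumed for this sketch).
LIP_LANDMARKS = [61, 291, 0, 17]

img = cv2.imread("face.jpg")                      # hypothetical input image
mp_face_mesh = mp.solutions.face_mesh

with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
    results = mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    h, w = img.shape[:2]
    landmarks = results.multi_face_landmarks[0].landmark
    for idx in LIP_LANDMARKS:
        x, y = int(landmarks[idx].x * w), int(landmarks[idx].y * h)
        # A make-up app would recolour the whole lip region; here we simply
        # mark the located points.
        cv2.circle(img, (x, y), 3, (0, 0, 255), -1)

cv2.imwrite("face_annotated.jpg", img)
```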

In much the same way that the Pepsi Max bus shelter used a large-size display as a magic lens, so too can human-sized displays be used to implement magic mirrors.

Once again, the fashion industry has made use of full length magic mirrors to help consumers “try on” clothes using augmented reality. The mirror identifies the customer and then overlays their “reflection” with the items to be tried on. The following video shows the FXGear FXMirror being used as part of a shop floor fitting room.

EXERCISE: Read the blurb about the FXGear FXMirror. What data is collected about users who model clothes using the device? How might such data be used?

EXERCISE: How else have marketers and advertisers used augmented and mediated reality? Try searching through various marketing trade/industry publications to find reports of recent campaigns using such techniques. If you find any, please provide a quick review of them, along with a link, in the comments.

Augmented Reality Apps for the Design Conscious

When the 2013 Ikea catalogue was first released at the start of August 2012, as part of a campaign developed in association with the McCann advertising agency, it was complemented by an augmented reality application that allowed customers to place catalogue items as if in situ in their own homes. Each year since then, the augmented reality app has been updated with the latest catalogue items, demonstrating Ikea’s ongoing commitment to this form of marketing.

For an early report, see for example: Wired, “So Smart: New Ikea App Places Virtual Furniture in Your Home”, August 2013.

Perhaps not surprisingly, the use of augmented reality in the context of interior design extends far beyond just an extension of the Ikea catalogue.

One of the drawbacks of the current generation of augmented reality interior design applications is the low quality of the rendering of the digital 3D object. As we shall see elsewhere, the higher powered computer processors available in today’s desktop and laptop computers, compared to mobile devices, mean that it is becoming possible to render photorealistic objects in a reasonable amount of time with a personal computer. However, meeting the realtime rendering requirement of augmented reality apps, as well as the ability to ensure that the rendered object is appropriately shaded given the lighting conditions of the environment and the desired location of the artificial object, presents further technological challenges.
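By way of a toy illustration of the lighting problem, one crude approach is to estimate the overall brightness of the camera frame and scale the virtual object’s colours to match. Real AR renderers use far more sophisticated light estimation; the filename and colours below are purely illustrative.

```python
import numpy as np
from PIL import Image

# Load the current camera frame (illustrative filename).
frame = np.asarray(Image.open("camera_frame.jpg").convert("RGB"),
                   dtype=np.float32) / 255.0

# Crude ambient light estimate: mean luminance of the scene.
luminance = 0.2126 * frame[..., 0] + 0.7152 * frame[..., 1] + 0.0722 * frame[..., 2]
ambient = float(luminance.mean())

# Shade the virtual object's base colour by the estimated ambient level so a
# sofa rendered into a dim room does not glow unrealistically.
object_albedo = np.array([0.8, 0.2, 0.2])      # base colour of the virtual object
shaded_colour = np.clip(object_albedo * ambient, 0, 1)
print("Estimated ambient level:", round(ambient, 2), "shaded colour:", shaded_colour)
```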

EXERCISE: read the Accenture report from 2014 Life on the digital edge: How augmented reality can enhance customer experience and drive growth and then answer the following questions:

  • what does the report describe as “one of the main goals of any retailer’s digital investment”? How do they claim augmented reality might achieve that goal? To what extent do you think that claim is realistic? What obstacles can you think of that might stand in the way of achieving such a goal using augmented or mediated reality?
  • according to the report, how might augmented reality be used in retail? The report was published in 2014 – can you find any recent examples of augmented reality being used in ways described in the report? Is it being used for retail in ways not identified in the report?
  • what does the report identify as the possible business value benefits of using augmented reality? In that section, a table poses the question “What augmented reality use case would increase your likelihood of purchasing the product?”. Can you find one or more current or historical examples of the applications described? Do such applications seem to be being used more – or less – frequently in recent times?

A lot of hype surrounds augmented reality, although in many respects its value, other than as a novelty, is yet to be determined. To what extent do you think augmented reality applications are a useful everyday contribution to the marketer’s toolkit, and to what extent are they simply a marketing novelty fit only for short lived campaigns? What are the challenges to using such applications as part of an everyday experience?

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation or realtime manipulation of another modality such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: Car navigation systems augment spatial location with audio messages describing when to turn; audio guides in heritage settings let you listen to a story that “augments” a particular location; noise cancelling earphones transform the environment by subtracting, or tuning out, background noise; and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term covering systems in which information may be added to or subtracted from a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, where the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft.

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.

EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sat inside a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.
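As a toy numerical sketch of the underlying superposition idea (not a description of any particular product), a noise estimate captured by a reference microphone can be inverted and added back to the signal so that the noise components cancel. In the sketch below we cheat by reusing the noise signal directly; real systems must estimate it and compensate for latency and speaker response.

```python
import numpy as np

rate = 16000                                   # samples per second
t = np.linspace(0, 1, rate, endpoint=False)

desired = 0.5 * np.sin(2 * np.pi * 440 * t)    # the signal we want to hear
noise = 0.3 * np.sin(2 * np.pi * 100 * t)      # low frequency background drone

heard_without_anc = desired + noise

# Active cancellation: invert the (estimated) noise and add it to the mix so
# that the two noise components sum to (approximately) zero.
anti_noise = -noise
heard_with_anc = heard_without_anc + anti_noise

print("residual noise power:", np.mean((heard_with_anc - desired) ** 2))
```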

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and re-presentation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

“Magic Lenses” and See-Through Displays

In the post Taxonomies for Describing Mixed and Alternate Reality Systems we introduced various schemes for categorising and classifying the various components of mixed and augmented reality systems. In this post, we will see how one particular class of display – see-through displays – can be put to practical purpose. 

Using a phone, or tablet, with a forward facing, back-mounted camera as a see-through video display, you can relay the camera view to the screen in a realtime view mode and manipulate the current scene. This approach has been referred to as a “magic lens, … a see-through interface/metaphor that affords the user a modified view of the scene behind the lens” (D. Baričević, C. Lee, M. Turk, T. Höllerer and D. A. Bowman, “A hand-held AR magic lens with user-perspective rendering,” Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, Atlanta, GA, 2012, pp. 197-206, doi: 10.1109/ISMAR.2012.6402557 [PDF]). (See also M. Rohs and A. Oulasvirta, Target acquisition with camera phones when used as magic lenses, Proceedings of the 26th international conference on Human Factors in Computing Systems, CHI ’08, pages 1409–1418. ACM, 2008, who define a magic lens as an “augmented reality interface that consists of a camera-equipped mobile device being used as a see-through tool. It augments the user’s view of real world objects by graphical and textual overlays”).

However, as the paper noted at the time:

Many existing concept images of AR magic lenses show that the magic lens displays a scene from the user’s perspective, as if the display were a smart transparent frame allowing for perspective-correct overlays. This is arguably the most intuitive view. However, the actual magic lens shows the augmented scene from the point of view of the camera on the hand-held device. The perspective of that camera can be very different from the perspective of the user, so what the user sees does not align with the real world. … We define the user-perspective view as the geometrically correct view of a scene from the point-of-view of the user, in the direction of the user’s view, and with the exact view the user should have in that direction.

Whilst head-up displays are also examples of see-through displays, many head-up displays do not necessarily situate the virtual digital objects as direct augmentations of the perceived physical world – rather, they frequently present pop-up style dashboards that open as “desktop windows” or pop-up menus within the visual scene, rather than as direct augmentations of physical objects perceived within the visual scene.

The first wave of consumer augmented reality applications relied on printing out registration images or QR codes that could act as fiducial markers and be easily recognised using image recognition software, and then overlaid with a 3D animation.
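As a concrete illustration of marker recognition, OpenCV’s ArUco module detects printed fiducial markers in a camera frame. The sketch below assumes the class-based API introduced in more recent OpenCV releases (around version 4.7) and an illustrative input image; earlier versions expose the same functionality through module-level functions instead.

```python
import cv2

img = cv2.imread("scene_with_marker.jpg")           # illustrative input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

# Each detected marker returns its four corner pixels and a numeric id; an AR
# app would use these to decide what 3D content to overlay, and where.
corners, ids, _ = detector.detectMarkers(gray)
if ids is not None:
    cv2.aruco.drawDetectedMarkers(img, corners, ids)
    cv2.imwrite("scene_annotated.jpg", img)
    print("Found markers:", ids.ravel())
```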

If an image could reliably be detected, it could be used as part of an augmented reality system, resulting in some innovative marketing campaigns.

The same idea can be used to enhance two-dimensional print publications. With a suitable device and the appropriate app installed, you can recognise a particular page of print and “unlock” additional content, an approach taken by the Layar augmented reality app, which, among other things, allows you to create your own augmented reality enhanced content.

For more confident programmers, one of the earliest widely available augmented reality programming toolkits, the open source ARToolkit (which is still being developed today and is distributed for free at ARToolkit.org), and the Wikitude SDK (software development kit) allow professional and hobbyist programmers alike to create their own augmented reality demonstrations. (See also commercial services such as Catchoom: CraftAR.)

Within all these applications, we see how there is a need for “enabling technologies, … advances in the basic technologies needed to build compelling AR environments. Examples of these technologies include displays, tracking, registration, and calibration” (Azuma, Ronald, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and Blair MacIntyre. “Recent advances in augmented reality.” IEEE computer graphics and applications 21, no. 6 (2001): 34-47) that make the development of such systems possible by developers outside of advanced research and development labs.

One popular category of ARToolkit demonstration, and an approach that hints at a particular class of potential augmented reality applications, was the development of interactive Lego model assembly manuals. These could recognise a registration image associated with a particular model and then step through the sequence of steps required to build it, overlaying the next piece to be added to the model in a stepwise fashion. The known size of the marker, the fixed geometry of the model, and the availability of open source Lego CAD tools based around LDraw meant that many of the physical and computational building blocks required for creating such applications were already in place.
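To make the “known size of the marker” point concrete, here is a minimal sketch of how a marker’s detected corner positions, together with its known physical size and an assumed camera model, yield the pose needed to draw the next brick in the right place. All numeric values here are illustrative, not taken from any particular demonstration.

```python
import cv2
import numpy as np

MARKER_SIZE = 0.05  # known marker side length in metres (assumed)

# 3D coordinates of the marker corners in the marker's own frame.
object_points = np.array([
    [-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
    [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
    [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0],
    [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0],
], dtype=np.float32)

# 2D pixel coordinates of the same corners as detected in the camera image
# (illustrative values; in practice these come from the marker detector).
image_points = np.array([[310, 220], [420, 225], [415, 330], [305, 325]],
                        dtype=np.float32)

# A simple pinhole camera model (assumed intrinsics; real apps calibrate these).
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0,   0,   1]], dtype=np.float32)
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)

# Project the anchor point of the next virtual brick (here, 2 cm above the
# marker centre) into the image so it can be drawn over the live view.
brick_anchor = np.array([[0, 0, 0.02]], dtype=np.float32)
pixel, _ = cv2.projectPoints(brick_anchor, rvec, tvec,
                             camera_matrix, dist_coeffs)
print("Draw the next brick at pixel:", pixel.ravel())
```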

SAQ: What’s wrong with the demonstration shown in the video above?

Answer: The placing of the virtual block on the model does not appear to be in the correct place, but is offset slightly. This might arise from a combination of issues, including the placement of the registration image, the positioning of the see-through device, or the camera used to record the video.

An earlier demonstration of a Lego construction model instruction manual includes some additional humour in the form of an animated Lego figure mechanic who fetches an appropriate piece at each step and then demonstrates where to attach it to a model based around the original Lego Mindstorms Robotics Invention System.

The demonstration also shows how augmented reality can be used to test the operation of the completed assembly, stepping the user through a test sequence and virtually animating the expected behaviour. The ability of the RCX computer brick at the heart of the model to communicate back to the computer hosting the manual also allowed information captured by the brick (the light sensor readings) to be displayed in the augmented reality layer.

SAQ: how might advances in 3D image recognition technology be used to further improve the functionality of the manual, for example, in terms of checking the correct assembly of the model? What other enabling technologies may also help in this endeavour?

Answer: as the ability to recognise, identify and orientate 3D objects improves, the potential for generating digital overlays on three dimensional objects becomes more tractable. This means it may be possible to recognise pieces picked up by the person building the model and check them against the expected part, with erroneous parts highlighted with a warning sign. Additionally, the state of the model after each step is completed could be visually checked to see that the correct piece appears to have been placed in the correct position, although this is likely to present a more complex task and may not be possible. If a piece could be identified as incorrectly placed (for example, in a “likely possible” misplaced position), the instruction manual might show how to move the part to the correct place, or rewind to show where the piece should have been placed.

The availability of programming building blocks capable of recognising individual Lego bricks and associating them with a part number also used by CAD tools such as LDraw could be seen as an enabling technology for the further development of such diagnostic AR tools.

As a proof-of-concept idea, augmented reality Lego construction manuals provide a realistic, if toy example (in several senses of the word!) of how such techniques might be used in a practical setting. So it’s not surprising that augmented reality instruction manuals were among the first application areas described when the possibility of AR first began to emerge.

Optional Reading: two relatively early descriptions of augmented reality instruction and assembly manuals can be found in: Caudell, Thomas P., and David W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes“, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, vol. 2, pp. 659-669. IEEE, 1992.; and Feiner, Steven, Blair Macintyre, and Dorée Seligmann, “Knowledge-based augmented reality.” Communications of the ACM 36, no. 7 (1993): 53-62.

A similar approach is also being used to develop service manuals for use in industry, via magic lens displays, and presumably also in smart helmet displays.

Tablet, or phone, based magic lens apps are also being used to support car maintenance in the form of augmented reality car manuals that are capable of recognising particular features of a car dashboard, or engine, and then interactively annotating them with a virtual overlay, as described in this press release from Hyundai about their augmented reality car owner’s manual.

On the other hand, maybe such applications are just so much hype and not actually of interest to a wider public? For example, in 2013, one company attempted to crowdsource funding for a general purpose app that could act as an AR style guide for a range of car models.

At the time of writing, in mid-2016, the site appears to have raised almost $500 of the $90,000 goal. Maybe augmented reality is not that compelling for the mass market?!

However, it seems as if the development of augmented reality technical documentation is still an area of academic research, at least.

In the next post on this theme, we’ll see how augmented reality can be used to implement “magic mirrors”, in contrast to “magic lenses”. But first – what “lenses” can we apply to another modality: sound?

Real or Virtual Objects?

In the post Taxonomies for Describing Mixed and Alternate Reality Systems, we provided a framework for talking about the various physical components of an augmented reality system. But how should we talk about the different elements within the perceived augmented reality scene?

Milgram and Kishino (Milgram, Paul & Fumio Kishino, “A taxonomy of mixed reality visual displays”, IEICE Transactions on Information and Systems 77, no. 12 (1994): 1321-1329) started by clarifying the notions of real and virtual in an augmented reality sense:

  • Real objects are objects that have a physical, tangible existence, whereas virtual objects are purely digital representations, without a physical correlate, within the rendered visual scene (although they may be digital representations of things that do exist).
  • An object viewed directly has an existence in the real world and is viewed as such by the viewer. A non-directly viewed object is one that has been sampled and re-presented to the viewer via a display medium, or a virtual object whose existence can only be viewed via such a medium. This is referred to as the image quality.
  • A real image is one that has “some luminosity at the location at which it appears to be located”, such as a directly viewed object or an image viewed on a screen. Virtual images are produced by optical tricks, such as holograms and mirror images, and have no luminosity at the location at which they appear.

[Figure: from Milgram and Kishino (1994).]

Whilst these distinctions are helpful when considering the representation of a single object, they may become confused when trying to analyse a view composed of multiple objects, both real and virtual. For example, in the Google Translate example described in Augmenting Reality With Digital Overlays, the screen is a physical display, that is, a real image, that provides a non-direct view. But is the text a real object or a virtual object?

To help us talk about objects within the augmented visual scene, we might add an additional correspondence dimension, that describes whether an object within the scene, or component of it, is presented as:

  • a raw, otherwise untouched, part of the image (that is, a faithful re-presentation of the object represented in that part of the image);
  • an overlay, where an additional layer of information is added to the scene, as in the case of a HUD dashboard;
  • a re-touch, where the object is still recognisable but has been reshaped and/or recoloured;
  • a replacement, where an object has been detected and then replaced.

We now have various tools at our disposal for helping us see – and talk about – the various components of a mixed reality system from a range of critical perspectives.

Augmenting Reality With Digital Overlays

Typically, head up displays of the sort referred to in Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays present one or more layers of “dashboard” style information to a forward-facing viewer without them having to look down at an instrument panel. But augmented reality displays can go further by registering or identifying items within the visual scene and then overlaying information on top of the scene that directly relates to those entities, or transforming it directly, in real time. In this section, we will introduce several examples of how augmented reality has been implemented, and the uses to which it has been put, over the last few years, and identify further ways of describing the various components that make up a mixed reality system.

In the examples of augmented reality that follow, try to relate the “problem” being solved with the sort of AR apparatus being used as described in Taxonomies for Describing Mixed and Alternate Reality Systems. Ask yourself why that technique might have been chosen and whether it appears to be the most appropriate one. Would alternative implementations also work, and if so, how would they compare in terms of their relative advantages and disadvantages?

Projection based displays

The augmented reality church organ/equaliser we met earlier represents an example of what researchers Ramesh Raskar, Greg Welch, and Henry Fuchs referred to as Spatially Augmented Reality (SAR) (Raskar, Ramesh, Greg Welch, and Henry Fuchs, “Spatially augmented reality”, First IEEE Workshop on Augmented Reality (IWAR’98), pp. 11-20, 1998):

In Spatially Augmented Reality (SAR), the user’s physical environment is augmented with images that are integrated directly in the user’s environment, not simply in their visual field. For example, the images could be projected onto real objects using digital light projectors, or embedded directly in the environment with flat panel displays.

The Virtual Watershed Table / Augmented Reality Sandbox provides another example of SAR, in which the vertical relief of a table of sand moulded in three dimensions by the user is tracked in real time by a Microsoft Kinect device. A virtual model of the extracted shape of the surface is then used as the basis for a topographic map projection onto the surface of the sand, along with animated displays of waterflows across the sculpted sand model.
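A very rough sketch of the rendering half of that pipeline: given a height map (here synthetic, standing in for the Kinect depth data), produce a colour-coded, contoured image that could be projected back onto the sand. The surface shape and output filename are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# A stand-in for a Kinect depth frame: a synthetic "sand surface" height map.
x, y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
height = np.exp(-(x**2 + y**2)) + 0.5 * np.exp(-((x - 1.5)**2 + (y + 1)**2))

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(height, cmap="terrain", origin="lower")            # colour-coded elevation
ax.contour(height, levels=10, colors="k", linewidths=0.5)    # contour lines
ax.set_axis_off()
plt.savefig("projection.png", bbox_inches="tight")           # image sent to the projector
```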

SAQ: What difficulties might be associated with projection based displays?

Answer: one obvious problem is that the viewer may occlude the projected imagery, casting a shadow over parts of it. Another is that a projection system is required, and must be calibrated so that it maps the digital imagery appropriately onto the matching physical substrate.

Augmented Reality Apps

Although the AR Sandbox provides a compelling demonstration of how augmented reality can be used to enrich a learning or discussion activity, augmented reality applications have yet to prove they can make it in the consumer marketplace. Do users really want to stand looking through a camera as a see-through display, or would they be happier grabbing a photo and then looking at an augmentation or transformation of it?

A good example of this is shown by the Word Lens augmented reality application that was acquired by Google and is now part of Google Translate. It not only detects text, in realtime, in a visual scene, but also identifies the language and then translates the text, as required, replacing the original text with the translated version.
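The overall pipeline can be sketched, for a single still image rather than live video, using an OCR library such as pytesseract. The translate() step below is a hypothetical placeholder for whatever translation service or on-device model a real app would call, and the filenames are illustrative.

```python
import cv2
import pytesseract

def translate(text, target="en"):
    """Hypothetical placeholder for a translation step; a real app would call
    a translation service or an on-device model here."""
    return text.upper()

img = cv2.imread("street_sign.jpg")   # hypothetical input photo
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

for i, word in enumerate(data["text"]):
    if word.strip():
        x, y, w, h = (data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i])
        # Paint over the original word, then draw the "translated" text on top.
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), -1)
        cv2.putText(img, translate(word), (x, y + h),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 0), 2)

cv2.imwrite("street_sign_translated.jpg", img)
```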

If you’ve ever found yourself in a foreign city with a script you don’t recognise, such as Greek, or Russian, you might appreciate the value of this sort of application! But does this really need to be an augmented reality video application? Or would it work equally well if the user looked up  to take a photo of the street sign that was causing them confusion and then looked down at their phone to inspect a translated version of it, much as they might preview a photo they had just taken?

SAQ: how would you categorise the previous examples of augmented reality in terms of the AR technology frameworks?

With a conceptual scheme (the technology framework) already in place for categorising the various approaches to implementing the optical components of an augmented reality system, we now need some way of talking about the visual components that make up the augmented reality scene.

Taxonomies for Describing Mixed and Alternate Reality Systems

In the post Introducing Augmented Reality Apparatus – From Victorian Stage Effects to Head-Up Displays, we saw how a Victorian illusion could be repurposed as the basis of a modern day augmented reality application. In this post, we’ll start to pick apart the various ways in which mixed and alternate reality systems can be put together, and explore how we can distinguish such systems from each other.

When coming to a new topic, it can often be hard to know how people who work in that area, or who are experts in it, make sense of it. If presented with a photograph of a bird and asked to identify it, an ornithologist (bird watcher) would almost certainly see different, and distinctive, things in the image than I would! So as we embark on our journey into augmented reality, what sort of things do we need to be looking out for to help us get our bearings?

A taxonomy describes a classification scheme that allows you to categorise related items within a particular frame of reference in a meaningful way. Milgram and Kishino’s “A taxonomy of mixed reality visual displays” helps us to identify a range of methods for displaying mixed reality scenes for viewing by individuals in a non-immersive way (I am using “non-immersive” in the sense that the participant can still see the physical world around them). Their classification includes the following:

  • “Monitor based … video displays – i.e. ‘window-on-the-world’ (WoW) displays – upon which computer generated images are electronically or digitally overlaid”; there is no implication of being able to “see through” these displays. Rather, the viewed scene may be remote in terms of time and/or space and the focus is on the manipulation of an already captured video scene. A window-on-the-world view might be something as simple as a television view displaying a swimming race with an overlaid virtual line placed on top of the scene showing where the race leader would have to be at that point in time if they were setting a world record pace.
  • Displays, such as HMDs (Head Mounted Displays), “equipped with a[n optical] see-through [ST] capability, with which computer generated graphics can be optically superimposed, using half-silvered mirrors, onto directly viewed real-world scenes”. A head up display on a smart helmet is a good example of an optical see through display.
  • Displays that use “video, rather than optical, viewing of the ‘outside’ world. … the displayed world should correspond orthoscopically [that is, size, shape and perspective should be maintained] with the immediate outside real world, thereby creating a ‘video see-through’ system, analogous with the optical see-through [approach]”. Someone viewing the world through a camera view on their smartphone would be looking at a video see through system.

A second paper from the same lab (Milgram P, Takemura H, Utsumi A, Kishino F. Augmented reality: A class of displays on the reality-virtuality continuum, Photonics for Industrial Applications, 1995 Dec 21 (pp. 282-292). International Society for Optics and Photonics) further classified these display types in terms of whether the principal depicted world (the substratum world) was real or computer generated (CG), providing a basis for distinguishing augmented reality systems from virtual reality ones, whether the substrate was “scanned” or directly viewed (that is, directly perceived without mediation through a video screen or projection), and whether the view was a first person, egocentric view (that is, from the viewer’s perspective) or an exocentric view (from some other perspective).

Class of MR System  Real (R) or CG world?  Direct (D) or Scanned (S) view of substrate?  Exocentric (EX) or Egocentric (EG) Reference? 
Monitor-based video, with CG overlays  R S EX
HMD-based optical ST, with CG overlays  R D EG
HMD-based video ST, with CG overlays R S EG

We might further refine the exocentric notion into 2nd and 3rd person views, where we imagine the second person view is capable of including the presence of the viewer, and the third person view is completely remote from them.

A later paper by Bimber, Oliver, and Ramesh Raskar, “Modern approaches to augmented reality“, ACM SIGGRAPH 2006 Courses, p. 1. ACM, 2006, also considered the sort of physical system, or apparatus, required to augment a visual scene with digital imagery. (The idea is not that all of these methods are employed at the same time – only one of them is!)

  • retinal display;
  • head-mounted display;
  • hand-held display;
  • spatial optical see-through display;
  • projected display on object.

[Figure: augmented reality display approaches, from Bimber and Raskar (2006).]

(We might also add contact lens mounted displays between retinal and head mounted displays.)

A related classification is used by Van Krevelen, D. W. F., & Poelman, R. (2010). A survey of augmented reality technologies, applications and limitations. International Journal of Virtual Reality, 9(2), 1, which groups the approaches as retinal, optical see-through, video see-through, and projective.

Drawing on all these ideas, the following classification allows us to talk about a range of visual displays capable of rendering mixed and augmented realities, whether locally or remotely situated with respect to the reality being augmented, to individuals or groups:

  • proximity dimension:
    • proximal: retinal and head mounted displays, which may be grouped together as augmented visual field devices (AVFDs)
    • hand-held: hand-held devices such as phones or tablets
    • distal: free standing displays (e.g. monitors or projected displays)
  • optical dimension:
    • video screen based window on the world displays, which overlay a given video image
    • see through displays to augment the visual scene perceived through the display, which may be video based, and as such provide a “scanned” (or indirect) view of the substrate, or optically based, where the substrate is directly perceived
    • projected displays, which directly enhance the environment
  • viewpoint
    • first degree (first person?): first person view
    • second degree (bystander?): colocated with viewer and capable of presenting them in the visual scene
    • third degree (third party? remote?): representing a non-local visual scene.

On the one hand, the classification allows us to refer to an augmented reality phone app as a hand-held, see-through, video screen based display used to indirectly perceive the visual scene from a first degree viewpoint. On the other, it allows us to refer to a mixed reality scene such as a televised sporting event with overlaid graphics as an indirect view of the scene from a third degree viewpoint using a hand-held or distal window on the world video display.
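For readers who think better in code, the same classification can be expressed as a small data structure. The naming below is my own shorthand for the dimensions described above, not anything drawn from the papers cited.

```python
from dataclasses import dataclass
from enum import Enum

class Proximity(Enum):
    PROXIMAL = "retinal or head mounted display (AVFD)"
    HAND_HELD = "phone or tablet"
    DISTAL = "free standing monitor or projection"

class Optics(Enum):
    WINDOW_ON_WORLD = "overlay on an already captured video scene"
    VIDEO_SEE_THROUGH = "scanned (indirect) view of the substrate"
    OPTICAL_SEE_THROUGH = "directly perceived substrate"
    PROJECTED = "imagery projected into the environment"

class Viewpoint(Enum):
    FIRST_DEGREE = "first person"
    SECOND_DEGREE = "bystander, colocated with the viewer"
    THIRD_DEGREE = "remote, third party"

@dataclass
class MixedRealityDisplay:
    name: str
    proximity: Proximity
    optics: Optics
    viewpoint: Viewpoint

# The two worked examples from the text:
ar_phone_app = MixedRealityDisplay(
    "AR phone app", Proximity.HAND_HELD,
    Optics.VIDEO_SEE_THROUGH, Viewpoint.FIRST_DEGREE)

tv_sports_overlay = MixedRealityDisplay(
    "televised sport with graphics", Proximity.DISTAL,
    Optics.WINDOW_ON_WORLD, Viewpoint.THIRD_DEGREE)
```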

 

The Art of Sound – Algorithmic Foley Artists?

As well as being a visual medium, films also rely on a rich audio environment to communicate emotion and affect (sic). In some cases, it may not be possible to capture the sound associated with a particular action, either because of noise in the environment (literally), or because the props themselves do not have the physical properties of the thing they portray. For example, two wooden swords used in a sword fight that are painted to look like metal would not sound like metal swords when coming into contact with each other. When a film is dubbed, and the original speech recording replaced by a post-production recording, any original sound effects also need to be replaced.

Foley artists add sounds to a film in post-production (that is, after the film has been shot). As foley artist John Roesch describes, “whatever we see on that screen, we are making the most honest representation thereof, sonically” (“Where the Sounds From the World’s Favorite Movies Are Born“, Wired, 0m42s).

One of the aims of the foley artist is to represent the sounds that the viewer expects to hear when watching a particular scene. As Roesch says of his approach, “when I look at a scene, I hear the sounds in my head” (0m48s). So can a visual analysis of the scene be used to identify material interactions and then automatically generate sounds corresponding to our expectations of what those interactions should sound like?

This question was recently asked by a group of MIT researchers (Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. “Visually Indicated Sounds.” arXiv preprint arXiv:1512.08512 [PDF] (2015)) and summarised in the MIT News article “Artificial intelligence produces realistic sounds that fool humans“.

“On many occasions, … sounds are not just statistically associated with the content of the images – the way, for example, that the sounds of unseen seagulls are associated with a view of a beach – but instead are directly caused by the physical interaction being depicted: you see what is making the sound. We call these events visually indicated sounds, and we propose the task of predicting sound from videos as a way to study physical interactions within a visual scene. To accurately predict a video’s held-out soundtrack, an algorithm has to know about the physical properties of what it is seeing and the actions that are being performed. This task implicitly requires material recognition…”

In their study, the team trained an algorithm using thousands of videos of a drum stick interacting with a wide variety of material objects in an attempt to associate particular sounds with different materials, as well as the mode of interaction (hitting, scraping, and so on).

The next step was then to show the algorithm a silent video, and see if it could generate an appropriate soundtrack, in effect acting as a synthetic foley artist (Visually-Indicated Sounds, MIT CSAIL).
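A toy sketch of the example-based idea: summarise the frames around each impact in the training videos as a feature vector, and for a new silent clip retrieve the training sound whose features are closest. The features and filenames below are entirely made up; the real system predicts sound features with a trained neural network rather than comparing raw visual features.

```python
import numpy as np

# Toy "training set": each entry pairs a visual feature vector (a summary of
# the frames around an impact) with the recorded sound for that impact.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 64))                 # hypothetical features
train_sounds = [f"sound_{i:03d}.wav" for i in range(500)]   # hypothetical clips

def predict_sound(query_features):
    """Return the training sound whose visual features are closest to the
    query - a crude stand-in for the example-based synthesis step."""
    distances = np.linalg.norm(train_features - query_features, axis=1)
    return train_sounds[int(np.argmin(distances))]

# Features extracted from a silent test video (again, purely illustrative).
new_clip_features = rng.normal(size=64)
print(predict_sound(new_clip_features))
```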

 

SAQ: to what extent do you think foley artists like John Roesch might be replaced by algorithms?

Answer: whilst the MIT demo is an interesting one, it is currently limited to a known object – the drumstick – interacting with an arbitrary object. The video showed how even then, the algorithm occasionally misinterpreted the sort of interaction being demonstrated (e.g. mistaking a hit). For a complete system, the algorithm would have to identify both materials involved in the interaction, as well as the sort of interaction, and synthesize an appropriate sound. If the same sort of training method was used for this more general sort of system, I think it would be unlikely that a large enough corpus of training videos could be created (material X interacts with material Y in interaction Z) to provide a reliable training set. In addition, as foley artist John Roesch pointed out, “what you see is not necessarily what you get” (1m31s)…!

