September 22, 2022
As we continue to explore bringing accessibility to XR technology, we identified the need to examine audio cues in immersive environments. XRA and XR Access partnered to host a 90-minute session that brought together industry experts and the public to explore whether lessons learned from audio cues in 2D games can be applied to immersive experiences to benefit the blind and low vision communities. Nearly 70 people registered for the event, and three featured speakers joined us to probe the topic. A recording of the session can be viewed here. Notes from the session, including helpful resources, relevant links, discussion topics, and transcripts, can all be found here.
About the Speakers
Robert Ridihalgh (Microsoft)
Robert has worked in audio cue design for almost two decades as part of Xbox and has a wealth of industry experience in making audio cues reliable and useful to the player. Spatial audio is paramount to giving users the ability to place themselves in a virtual environment, and there are many ways to render spatiality; for example, a home theater can bounce sound waves off the ceiling to create a highly immersive atmosphere. Accurately representing the nature of a sound plays a key role in the realism of a scene: a sound originating behind a rock differs from one originating behind a tree, and that difference should be captured.
To navigate a virtual world, ambient sounds alone are not enough; one must include a beacon or similar function to help with navigation. The frequency and intensity of a sound are important factors to consider in boosting the realism of a scene.
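To make the beacon idea concrete, here is a minimal sketch in Python of how a navigation beacon might modulate gain, stereo pan, and pitch from the listener's distance and bearing. The function name, coordinate convention, and constants are hypothetical illustrations, not anything presented at the session.

```python
import math

def beacon_cue(listener_pos, listener_yaw, beacon_pos,
               max_dist=30.0, base_freq=440.0):
    """Derive gain, stereo pan, and pitch for a navigation beacon.

    Hypothetical convention: positions are (x, y) in metres and
    listener_yaw is the facing angle in radians.
    """
    dx = beacon_pos[0] - listener_pos[0]
    dy = beacon_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)

    # Intensity falls off with distance, so nearer beacons sound louder.
    gain = max(0.0, 1.0 - dist / max_dist)

    # Bearing relative to where the listener is facing, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - listener_yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))

    # Pan from -1 (hard left) to +1 (hard right).
    pan = math.sin(bearing)

    # Raise the pitch as the user closes in, a simple "getting warmer" cue.
    freq = base_freq * (1.0 + 0.5 * gain)
    return gain, pan, freq

# Beacon 5 m away, off to one side of a listener facing along the x-axis.
print(beacon_cue((0.0, 0.0), 0.0, (3.0, 4.0)))
```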
Saqib Shaikh (Microsoft Seeing AI)
Saqib and his team created the Seeing AI app to help blind people navigate the world around them. This work includes using the LiDAR sensor now commonly found in phones to tell how far away things are, build a 3D model of the world, and present it through spatial audio.
Saqib has approached audio cues in a few ways. By placing a virtual beacon on a door, for example, the app lets users navigate to the door and walk through it. Seeing AI can also describe an object in front of the user, which can help with navigation.
This is similar to Microsoft Soundscape, which focuses on mimicking the sounds a user may hear when walking outdoors, for example, describing an intersection or nearby buildings. This is accomplished by using GPS metadata and placing audio cues in the real world.
Tim Stutts (Cognixion)
Tim is the Director of Product Design at Cognixion, a neurotech startup working on a haptic/brain interface system. Tim has held audio specialist roles at a number of organizations, including Magic Leap, where he was introduced to audio cue work, and Vuforia, where he helped create products geared toward factory work, where loud sounds can be very distracting. Now at Cognixion, his direct focus is on accessibility, primarily helping people with cerebral palsy or other severe motor disabilities regain movement.
Community Discussion
Following the presentations from Saqib, Tim, and Robert, we opened the discussion to the participants, answering questions and addressing the most pressing issues that were raised. Below is a summary of the discussion.
Beacons
The question was raised whether a beacon must be attached to a moving object or whether it could be assigned through other means to help identify an object. The group discussed the game Swamp, which has a beacon system to track NPCs, other players, and objects in space. Pitch can be used to indicate specific events; for example, a pitch-up sound can play when the player opens a menu, and a pitch-down sound when the menu is closed. The beacon used in Microsoft Soundscape can change its texture based on the user's orientation to the beacon.
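As an illustration of the pitch-up/pitch-down idea, here is a minimal sketch that renders rising and falling sine sweeps as menu earcons using only Python's standard library. The frequencies, durations, and file names are arbitrary assumptions, not values from the session.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050

def pitch_sweep(start_hz, end_hz, seconds=0.25, volume=0.4):
    """Render a short mono sine sweep as 16-bit PCM frames."""
    n = int(SAMPLE_RATE * seconds)
    frames = bytearray()
    phase = 0.0
    for i in range(n):
        # Interpolate the instantaneous frequency, then accumulate phase
        # so the sweep stays smooth with no clicks.
        hz = start_hz + (end_hz - start_hz) * (i / n)
        phase += 2 * math.pi * hz / SAMPLE_RATE
        frames += struct.pack('<h', int(volume * 32767 * math.sin(phase)))
    return bytes(frames)

def write_earcon(path, data):
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(data)

write_earcon('menu_open.wav', pitch_sweep(440, 880))   # pitch up: menu opens
write_earcon('menu_close.wav', pitch_sweep(880, 440))  # pitch down: menu closes
```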
Frequency and rhythm
Frequency and rhythm are two critical factors in an immersive audio experience. High-frequency sounds localize better, while low-frequency sounds spread out, so to give the user a pinpoint sense of where something is, use a high-frequency sound. People are also very attuned to rhythm and remember rhythms better than tones or frequencies. In any given situation, only the important details and audio cues should be presented to the user; too many sounds happening at once can confuse them. One possible solution is a ducking algorithm that prevents too much sound from too many sources playing at full volume simultaneously.
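A ducking algorithm can take many forms; one simple hypothetical version ranks concurrent cues by priority and attenuates all but the top few. The cue names, priorities, and gain values below are illustrative assumptions, not an implementation discussed at the session.

```python
def duck(active_cues, keep=3, duck_gain=0.25):
    """Attenuate all but the highest-priority concurrent cues.

    active_cues: list of (name, priority) pairs; higher priority wins.
    Returns a {name: gain} map.
    """
    ranked = sorted(active_cues, key=lambda cue: cue[1], reverse=True)
    gains = {}
    for i, (name, _priority) in enumerate(ranked):
        gains[name] = 1.0 if i < keep else duck_gain
    return gains

cues = [('footsteps', 1), ('ambience', 0), ('nav_beacon', 5),
        ('dialogue', 9), ('ui_click', 7), ('rain', 0)]
print(duck(cues))
# dialogue, ui_click, and nav_beacon stay at full volume;
# everything else is ducked to 25%.
```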
Audio defaults and earcons
There should be sensible default sounds, but also customizable options: if you'd like to adjust the volume of one source of sound, that should be selectable in the in-app menu, and the user should be able to set the volume of each earcon. Sounds should be individually discernible, and consistency for earcons is very important; for example, the text message indicator is the same every time, so we know what that specific sound means.
The question was raised as to which audio cues are required to make VR most approachable for the average user. A problem with games and apps today is discoverability: you must learn a whole new audio language with each experience. Players should also be allowed to adjust the volume of each kind of sound individually. The group discussed whether a mix of general system sounds and in-game sounds could address this problem.
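One hypothetical way to combine sensible defaults with per-category user control is a settings object that layers user overrides on top of default volumes, with separate system and in-game categories. The category names and values below are invented for illustration.

```python
from dataclasses import dataclass, field

# Sensible per-category defaults; users may override any of them.
DEFAULT_VOLUMES = {
    'system.earcon': 0.8,     # consistent system-level cues, e.g. notifications
    'system.speech': 1.0,
    'game.music': 0.6,
    'game.effects': 0.8,
    'game.nav_beacon': 1.0,
}

@dataclass
class AudioSettings:
    overrides: dict = field(default_factory=dict)

    def volume(self, category: str) -> float:
        # User override wins; otherwise fall back to the default.
        return self.overrides.get(category, DEFAULT_VOLUMES.get(category, 1.0))

    def set_volume(self, category: str, value: float) -> None:
        # Clamp so a bad slider value can't blow out the mix.
        self.overrides[category] = min(1.0, max(0.0, value))

settings = AudioSettings()
settings.set_volume('game.music', 0.2)    # user turns the music down
print(settings.volume('game.music'))      # 0.2 (user override)
print(settings.volume('system.earcon'))   # 0.8 (default)
```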
Additional discussion
Haptics can also enhance the experience by adding an element of directionality.
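For instance, a single haptic cue might be split across left and right actuators according to the target's bearing. The sketch below is a hypothetical illustration of that idea; the angle convention (0 = straight ahead, positive = to the right) is an assumption.

```python
import math

def directional_rumble(bearing_rad, strength=1.0):
    """Split one haptic cue across left/right actuators by direction."""
    pan = math.sin(bearing_rad)              # -1 left ... +1 right
    left = strength * (1.0 - pan) / 2.0
    right = strength * (1.0 + pan) / 2.0
    return left, right

print(directional_rumble(math.pi / 2))   # target on the right -> (0.0, 1.0)
print(directional_rumble(0.0))           # straight ahead -> equal rumble
```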
The group discussed passive vs. active sound. Words are generally too slow to relay information that needs to be communicated instantaneously; pings and haptic responses are much more efficient. This is especially relevant for fast-paced experiences, like racing games. Different experiences will need to be equipped with different tools.
The XR Access/XRA GitHub is a place to post accessible code that developers can pull. Equipping developers with tools to make accessible experiences will be instrumental in ensuring accessible experiences. Plug-in models may work best for adding third-party accessibility features so that developers aren't burdened with starting from scratch on every new project.
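A plug-in model along those lines might look like the following hypothetical sketch: the host experience raises semantic events, and each accessibility plug-in decides how to render them (audio, haptics, captions). The class names and event fields are invented for illustration.

```python
from abc import ABC, abstractmethod

class AccessibilityPlugin(ABC):
    """Hypothetical plug-in contract: the host app raises semantic events
    and each plug-in renders them in its own modality."""

    @abstractmethod
    def on_event(self, event: dict) -> None: ...

class AudioCuePlugin(AccessibilityPlugin):
    def on_event(self, event: dict) -> None:
        if event.get('type') == 'object_nearby':
            print(f"play beacon at bearing {event['bearing']:.2f} rad")

class HapticPlugin(AccessibilityPlugin):
    def on_event(self, event: dict) -> None:
        if event.get('type') == 'object_nearby':
            print("pulse controller toward target")

# The host broadcasts one event; every installed plug-in handles it,
# so developers add accessibility features without rebuilding the app.
plugins = [AudioCuePlugin(), HapticPlugin()]
for plugin in plugins:
    plugin.on_event({'type': 'object_nearby', 'bearing': 0.4})
```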