Vocabulary
- Level: The level of an audio signal is its loudness. The term has spread beyond the context of audio signals (like those carried in microphone cables) into talk about audio in general.
- Phase: A position in a waveform cycle.
- Walk next to someone else with the same stride length. Steps in sync? You're in phase. If the person next to you has a different stride length, that's a difference in wavelength.
- Stereophonic Sound/Stereo: Using multiple speakers to reproduce audio in a way that is perceived as directional. A broad term.
- Timbre: Tone. Perceived sound quality. Characteristics of sound. Think of it as the properties of the sound that make that sound unique compared to other sounds (like different instruments all playing the same note).
Hearing Environments
Many things around us produce sound. There are the obvious sound-producing things like people, speakers, and background noise like cars and birds. But there's so much more. Air conditioners. Computer fans and hard disks. Refrigerators. My cat. Radiators. A house "settling". The hum of certain electronics. There are the sounds of everything we and others do - rustling clothes, wheeled chairs, breathing, footsteps, and more. All of these sounds are unique to their environment: the particular timbre, the volume, the collection and arrangement. Every environment is unique, and every environment uniquely affects the sounds within it, through reverberation. Every space has its own sound.
How we perceive a space is largely informed by this sound. We also use it to locate ourselves within the space, and we can get a sense of its scale, its "character" or "mood", and more from its acoustic properties.
Even if a user is not trying or able to precisely identify the location of a particular sound source, being capable of doing so - having more localization information - helps the player feel present in the environment. In other words, effective environmental sound design can significantly lower the barrier to a feeling of immersion.
Never underestimate sound design.
Everything around us affects the way things sound.
Scientists studying hearing aids in the 1950s came to an important conclusion: it is more comfortable to listen to somebody when we can accurately locate them auditorily. (Raymond Carhart)
Design-wise, consider that it is more comfortable not just to listen to somebody, but to interact with anything when the interaction is reinforced auditorily, and not just visually. And, outside of interactions, for an environment to impact us - for us to feel the space, to achieve that sense of presence in VR - we need environments to sound appropriate.
Sound Localization - Why
“Localization” is identifying the location of the source of a sound, specifically or generally. We are not pin-point accurate, but even broad/blurry understandings can be helpful!
While I will be discussing the specifics of determining the location of an audio source, keep in mind that this information is used to understand all sorts of properties of an environment. Directly, it informs our understanding of a space's scale, shape, and the materials it's made of. Indirectly, it informs where one may be able to go, whether one is alone or with others, how one should feel about a space, how that space is connected to its surrounding environment, and more.
For example, even the broadest of understandings can help us locate ourselves within an environment and understand our location when we move or turn. Localization is a process relative to the listener. We can use sounds to locate ourselves, not just to locate the sound's position.
Consider the last room you watched TV in. You could be teleported blind to anywhere in that room and you would know where you were from the audio cues alone.
We use sound to stay aware of moving objects in an environment. I am typing right now and I know exactly where my cat is. I know where conversations are taking place at a house party, and whether someone approaching my office is about to enter or just walk past.
Sound Localization - Okay, but how does it work?
We are better at localizing sound on the horizontal plane than the vertical.
Level Difference
Sound loses energy as it travels, and even the difference in level between our two ears is significant enough to work with. Is a sound louder in this ear? It's to the left.
Of course, it's also louder because of the direction our ears are pointing. A cat can angle its ears in different directions in order to sense the environment in different ways.
Consider how when you are trying to listen for something and you don’t know where it’s coming from, you rotate your head about in a sort of “scanning” behavior.
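To make the level cue concrete, here is a minimal Python sketch - the signal and numbers are mine, not from any particular tool - that compares the loudness of each channel of a stereo signal in decibels:

```python
import numpy as np

def interaural_level_difference(left, right):
    """Rough level difference (dB) between two channels of a stereo signal."""
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    # Positive result: louder in the left ear, so the source is likely left.
    return 20 * np.log10(rms_left / rms_right)

# Toy example: the same tone, attenuated in the right channel.
t = np.linspace(0, 1, 44100)
tone = np.sin(2 * np.pi * 440 * t)
print(interaural_level_difference(tone, 0.5 * tone))  # ~ +6 dB -> "to the left"
```

Even a few dB of difference is a usable directional hint.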
Time Difference
Sound travels fast! But not that fast.
Sound in front of us will reach both ears at about the same time, but sound to the sides will reach one ear before the other.
For impulse sounds, we can identify the difference in the time it took the sound to reach each ear, and localize it from that. Neat!
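This cue can even be estimated numerically. A minimal sketch, assuming two NumPy channels at a known sample rate, that recovers the arrival-time difference with cross-correlation:

```python
import numpy as np

def estimate_itd(left, right, sample_rate):
    """Estimate the interaural time difference via cross-correlation.

    Returns seconds; negative means the sound hit the left ear first.
    """
    correlation = np.correlate(left, right, mode="full")
    lag = np.argmax(correlation) - (len(right) - 1)
    return lag / sample_rate

# Toy example: a click that reaches the left ear 30 samples (~0.68 ms) early.
rate = 44100
click = np.zeros(1000)
click[100] = 1.0                # arrives early at the left ear...
delayed = np.roll(click, 30)    # ...and 30 samples later at the right
print(estimate_itd(click, delayed, rate))  # ~ -0.00068 s -> source on the left
```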
Stereo Level/Time Delay Example
Stereo Recording Example
Street musician with saxophone in a park, recorded by Gregor Quendel.
Phase Difference
But what about continuous sounds? Surely if there is no start or stop time, this doesn't help us at all? Wrong! Our brains can identify the phase difference between our ears and localize sound from that.
This, of course, works better for certain frequencies (wavelengths) - very high-pitched sounds have such short wavelengths that it can be hard to identify the phase difference. The same goes for very low frequencies.
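To put rough numbers on it: the phase cue becomes ambiguous once half a period of the sound is shorter than the largest possible delay between the ears. A back-of-the-envelope sketch - the spherical-head radius and the Woodworth-style approximation are my assumptions, not from the text above:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, in air at ~20 C
HEAD_RADIUS = 0.0875     # m, a common average used in spatial audio models

# Woodworth's approximation for the maximum interaural time difference,
# with the source directly to one side (angle = 90 degrees).
angle = math.pi / 2
max_itd = HEAD_RADIUS * (angle + math.sin(angle)) / SPEED_OF_SOUND

# Phase is unambiguous only while half a period exceeds the maximum delay.
ambiguity_frequency = 1 / (2 * max_itd)

print(f"max ITD: {max_itd * 1e6:.0f} microseconds")      # ~ 656 us
print(f"ambiguous above: {ambiguity_frequency:.0f} Hz")  # ~ 762 Hz
```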
Sound above or below us is more challenging to locate, as well as sound in highly reverberant environments.
BONUS THINKS: What makes a sound, as we colloquially describe and think about it, "high" or "low" in pitch? What makes its pitch "normal"? What informs this categorization? Could our ability to localize sounds in certain frequencies more easily affect how we feel or think about a sound? Does our ability to localize a sound affect how comfortable it is to listen to that sound?
Head Masking Effect/Head Shadow Effect
Sound headed for the far ear can get muffled by our head, and sound from different directions gets muffled in different ways at each ear. We use this information to inform our localization.
By "muffled", I mean the sound will generally have a different frequency response in addition to simply being quieter.
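One crude way to picture the effect in code is to attenuate and low-pass the channel headed for the far ear. This is a cartoon of the real physics; the cutoff and gain below are arbitrary illustrative values:

```python
import numpy as np
from scipy.signal import butter, lfilter

def apply_head_shadow(signal, sample_rate, cutoff_hz=1500.0, gain=0.6):
    """Cartoon head shadow: the far ear hears a quieter, duller copy.

    cutoff_hz and gain are illustrative values, not measurements.
    """
    b, a = butter(2, cutoff_hz / (sample_rate / 2), btype="low")
    return gain * lfilter(b, a, signal)

# A source off to the left: the right (far) ear gets the shadowed copy.
rate = 44100
source = np.random.randn(rate)  # one second of broadband noise
near_ear = source
far_ear = apply_head_shadow(source, rate)
```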
"Contribution of Head Shadow and Pinna Cues to Chronic Monaural Sound Localization" by Marc M. Van Wanrooij and A. John Van Opstal.
"Directionality and the head-shadow effect" by Cherish Oberzut and Laurel Olson.
Ear Shape
The shape of our ear matters. You know, ears. They have that little flap (the pinna), the ridges, all that weird stuff. These introduce slight differences in frequency response to a sound depending on the angle from which the sound enters our ear.
Sound coming from the front will travel into our ear a different way than sound coming from the side, and that difference will affect the sound. We can use the differences to localize the audio’s source.
Our brain, in processing this information, generally disregards and normalizes these differences - you don't actively "notice" them. The sound sounds the same to you regardless of where it comes from. In fact, even differences we can hear can be hard to notice.
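Digitally, this whole family of cues (head shadow plus ear shape) is typically captured in measured head-related impulse responses (HRIRs) and applied by convolution. A minimal sketch - the stand-in HRIRs here just delay and attenuate one ear, where a real implementation would load a measured dataset:

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source for headphones by convolving it with a
    pair of head-related impulse responses (HRIRs)."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])

# Real HRIRs come from measured datasets (one pair per source direction).
# These stand-ins just delay and quiet the right ear, as for a source
# off to the left.
hrir_left = np.zeros(128); hrir_left[0] = 1.0
hrir_right = np.zeros(128); hrir_right[25] = 0.5

mono = np.random.randn(44100)
stereo = binauralize(mono, hrir_left, hrir_right)
```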
Another way to reproduce this effect is to not simulate it digitally at all. Instead, just record with a microphone that has the same shape. In other words: a microphone with weird rubber ears. This is called a binaural microphone. Listening to these recordings with headphones can be quite remarkable. Compare the effect to listening without headphones.
Headphones are necessary to listen to the following recording.
Reverberation, Signature and Tone
Every environment sounds different. Sound travels directly from the source to the listener. Sound also travels all around the environment, where it bounces off of things ("picking up" new acoustic properties) before traveling on to the listener. These sounds take longer to reach the listener than the direct sound. We call these indirect sounds "reverberation".
Reverb Example
Mono Recording with Examples of Reverb. Reverb added digitally.
Reverberation and room tone are very important for VR, but this is not the space for a detailed discussion or investigation. In regard to localization, reverberation can be localized just like direct sound can, though often with less precision. Reverb gives us a general sense of a room. As we turn around in it, reverb is part of what allows us to place ourselves in the room. This technique, taken to a precise extreme, could be called echolocation. Humans have, with practice, learned to echolocate. For most of us, the awareness of reverberation as it relates to our location is extremely vague, and generally just cues us into rough orientation.
When it comes to design, reverb is far more important in informing the user about architectural properties of the environment - particularly scale and material. A small underground cave will sound different than a large cavern, which will sound different than a wooden hut.
Consider this next time you walk down a stairwell. Many stairwells are highly reverberant, so it's easy to identify the sounds. The direct sounds of your footsteps never really change location, but as you approach the bottom of the stairwell, the reverberation of the footsteps changes. Your ears tell you that you are near the bottom (i.e., it's time to look up from your phone).
Reverb is easy to fake and "inject" into audio recordings. Adding "fake" reverb to a recording is a fundamental tool in a sound engineer's toolkit.
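The usual digital trick is convolution reverb: convolve the dry recording with an impulse response captured in (or synthesized for) a space. A minimal sketch, using a synthetic decaying-noise impulse response as a stand-in for a measured one:

```python
import numpy as np
from scipy.signal import fftconvolve

rate = 44100

# A synthetic impulse response: exponentially decaying noise, a crude
# stand-in for one recorded by popping a balloon in a real room.
length = rate * 2  # two seconds of reverb tail
impulse_response = np.random.randn(length) * np.exp(-np.arange(length) / (rate * 0.4))

def add_reverb(dry, ir, wet_mix=0.3):
    """Blend a dry signal with its convolution against an impulse response."""
    wet = fftconvolve(dry, ir)[: len(dry)]
    return (1 - wet_mix) * dry + wet_mix * wet

dry = np.zeros(rate)
dry[0] = 1.0  # a single click, so the tail is easy to hear
wet = add_reverb(dry, impulse_response)
```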
It’s challenging, but not impossible, to simulate reverb in VR. As designers, we should be aware of the ways to add reverb to our sounds in order to help sell a particular environment, and make it feel impactful.
Challenges to Audio Localization
We must be careful in our sound design when creating environments.
Constant Audio
Hums, buzzing, noise, and continuous tones can be very difficult to locate in space. There is basically no time difference and often a hard-to-perceive level difference. We must rely largely on phase difference, which often - particularly in the case of noise - just isn't enough information.
Constant Frequencies
A single frequency - such as a single note played on a violin, or a computer-generated sine wave - is challenging to locate. There is less information contained in all cues, but particularly in the head shadow and ear shape cues.
Certain Frequencies
Our auditory system does best within certain frequency ranges, where we are more perceptive - such as the range of frequencies that make up human speech. Higher and lower frequencies are challenging: our ears are less sensitive to them, they don't distort or change as much while reverberating around a room, and they can be basically meaningless in terms of phase difference.
Unexpected Sounds
When an audio cue catches us off guard, we may not be paying enough - or the right kind of - attention to it. We may need to stop and focus on the sound in order to locate it, which is 1) bad for sound design and 2) useless when the sound only happens once, unexpectedly, or at unexpected intervals.
The Annoy-A-Tron
Using the above information, can we imagine a device that makes noises one can't find? One that chirps single, high-pitched, briefly sustained tones at random intervals, with slightly random changes in pitch and volume. Hide this device and it becomes really annoying. Enter the "Annoy-a-tron", a device created and sold by the now-defunct thinkgeek.com.
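As a sketch of why this recipe is so hard to localize - every number below is my guess at plausible behavior, not the device's actual spec:

```python
import random

def annoyatron_schedule(duration_s=3600, base_pitch_hz=4000.0):
    """Sketch of the anti-pattern: brief, high-pitched, randomly timed
    chirps with slight random pitch/volume changes - each one defeats a
    different localization cue."""
    events, t = [], 0.0
    while t < duration_s:
        t += random.uniform(120, 600)            # random gap between chirps
        pitch = base_pitch_hz * random.uniform(0.9, 1.1)
        volume = random.uniform(0.5, 1.0)
        events.append((t, pitch, volume, 0.2))   # time, Hz, gain, 0.2 s chirp
    return events
```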
As VR designers, our goal is not to prank our users with horribly annoying audio. We can use this information to know what kind of sound design we want to avoid.
Takeaways
Most of the above information deals with sound traveling directly to our ears. What can we do with this information?
The Horizontal Plane
We are better at localizing sounds on the horizontal plane than the vertical. Avoid putting key acoustic information directly above or below the player.
Consider a direct example of the need to localize sound: a "beacon" system that lets players point out a location by pointing at it, producing a visual and acoustic effect there. Beacons are often used in multiplayer experiences. Suppose someone points near another player's feet, or nearby. We want it to be easy for that player to tell whether the beacon fired to their side or behind them. We could check if the beacon lands close to a user, and if so, add a second invisible sound source above the ground, closer to the horizontal plane of the user's ears, to better communicate the beacon's directionality. It could just play the same sound effect, or carry direct, non-reverberant audio only.
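A sketch of that idea - the engine callback (play_sound_at), the ear height, and the radius are all hypothetical, just to show the shape of the logic:

```python
import math

EAR_HEIGHT = 1.6     # meters; rough standing ear height (my assumption)
NEARBY_RADIUS = 3.0  # meters; tune per experience

def horizontal_distance(a, b):
    # Positions are (x, y, z) tuples with y as the up axis.
    return math.hypot(a[0] - b[0], a[2] - b[2])

def fire_beacon(beacon_pos, listener_pos, play_sound_at):
    """Fire a beacon sound; play_sound_at(position, reverb) stands in
    for a real engine API."""
    play_sound_at(beacon_pos, reverb=True)
    if horizontal_distance(beacon_pos, listener_pos) < NEARBY_RADIUS:
        # Add a second, invisible source lifted to ear height, carrying
        # only direct (non-reverberant) audio, so nearby listeners can
        # read the beacon's direction on the easier horizontal plane.
        raised = (beacon_pos[0], EAR_HEIGHT, beacon_pos[2])
        play_sound_at(raised, reverb=False)
```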
Certain frequencies are easier to localize than others. Ensure the sounds of important elements include these "normal" frequencies in their design.
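One way to sanity-check an audio asset against this advice - the 300-3400 Hz "speech band" default is my choice of range, not a standard from the text:

```python
import numpy as np

def band_energy_fraction(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of a signal's spectral energy inside a frequency band.

    The 300-3400 Hz default roughly brackets human speech - a range
    we localize comfortably.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return spectrum[in_band].sum() / spectrum.sum()

# A pure 4 kHz tone has almost no energy in the speech band.
rate = 44100
t = np.arange(rate) / rate
print(band_energy_fraction(np.sin(2 * np.pi * 4000 * t), rate))  # ~ 0.0
```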
Avoid Acoustic Minimalism
With the goal of creating immersive environments, we want to place the user in a rich soundscape.
Within reason, and with plenty of exceptions, it's probably better to have more things in your environment producing noise than fewer. Obviously we don't want to drown out, overwhelm, or distract the user with excessive sound design. But a minimalist approach where only the things that matter make sound? In the words of one of my students referencing something I've never heard of: "That ain't it, chief".
Ambiance in VR Design
Sound design theory from games involves making sounds big, punchy, and attention-getting - high impact. That's great for reactive and informative sounds in our environments! But we do more than that. We also need ambient sounds - sounds designed to subtly reinforce the sense of presence, environment, and mood.
They help the user locate themselves in the environment as they move about. They are also important for design: ambient sound tells the player a lot about the mood and atmosphere of a space.
Background audio also helps mask sounds from the "real world". How can a quiet VR experience happen inside a room that is not quiet? The user's real-world sounds conflict with the experience and lower the degree of immersion.
These sounds should not be designed in a way that grabs the user's attention or incorrectly indicates importance. This is a challenge. It goes against many users' learned preconceptions about designed environments: "That thing is buzzing - let's investigate!"
One strategy is to have sounds come from important things. Give your environment ambiance via its significant elements. In other words: bypass the problem! Excellent.
Another strategy is to use items that users have already been trained in real life to ignore or deemphasize. For example, speakers playing diegetic background music, birds chirping outside of a window, cars rumbling past a road outside, or fans. (I would tend to avoid things that sound unique but don’t have a visible or identifiable source, like an air conditioner hum, unless it contributes to the environment or storytelling in other ways).
Interesting/Related
- The field of "psychoacoustics" is worth a Wikipedia dive, at least
- Duophonic Sound
- Cocktail Party Effect