August 18, 2018

192k Experiments

I was just reading an article on Shaun Farley’s blog in which David Sonnenschein posed an interesting question about some of the techniques I use in my sound design. You can find the article in this post.

[SF] He recorded it at 192, and then I believe he played it back at 44.1. He didn’t go in with pitch shifting, he went and played back the existing samples at a slower rate. So that was a good way to take advantage of that technical capability.

[DS] I’d be curious, and I haven’t done this myself, to hear a side-by-side of a sample rate speed change, and then a pitch shift at a fixed sample rate, to hear what the difference is like. I don’t know how they would compare to our ears; if they would be identical, or if they would have some other subjective quality.

This is the recording they were talking about:



One of the things I like to do in my sound design is to record at 192k. It opens up a world of options: you can do lots of cool things like major pitch shifting (since 192k will capture up to 96kHz, and most pro recorders are capable of 40 – 50k) and major time stretches (since you have 4x the data of a 48k recording) without incurring any major sonic degradation. I love recording at 192kHz and then forcing it to play back at 44.1kHz. You get a lot of really cool sounds, some of them very deep yet sonically full.

David and Shaun bring up a good point though. If you were to alter the file by manually adjusting the pitch and time, would the results be different at all? I explored this and here’s what I found:

But first! A technical lesson (in case you’re unfamiliar with the techniques)

Here is the original recording I’m going to be using for all of these demonstrations. It is recorded with an Aquarian H2a hydrophone (freq resp approx: 10hz – 100k) at 192kHz, 24bit, into a Sound Devices 702T recorder (response tested up to 50kHz).




Let’s talk about pitch first. In the recording below, you can hear a “pinging” at about 9.5k. However, the recording has been pitched down by 4x (time not altered), so the original sound was somewhere in the vicinity of 38k (which you definitely cannot hear in the original recording), something you would never be able to capture without recording at a high sample rate. If you try to pitch down a recording made at 44.1kHz or 48kHz, you are going to lose a lot of high end in your final product. For instance, if you recorded at 48kHz and pitched that down 4x, the bottom end of your signal would be nearly identical to the bottom end of the 192kHz recording. However, since 48kHz is only capable of recording up to 24k, your top end would peter out at 6k in the final product. That’s a far cry from a full spectrum sound! If you’re just in it for LFE content or to add low end oomph to your track, sure, that’ll work, but that’s about the only thing it would work for. With 192kHz (and 96kHz in this instance), you can pitch your sound down 4x to 5x and your result would still be full spectrum (all this assuming your recorder and microphone are capable of recording high frequencies above 20k; if not, all this is moot).
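If you want to check that arithmetic for other sample rates, here is a rough Python sketch. The function names are made up for illustration, and it assumes an ideal chain where the microphone and recorder really do capture everything up to half the sample rate.

```python
def top_frequency_after_pitch_down(sample_rate_hz, factor):
    """Highest frequency left in the result after pitching down by `factor`."""
    nyquist_hz = sample_rate_hz / 2  # highest frequency the file can hold
    return nyquist_hz / factor


def shifted_frequency(original_hz, factor):
    """Where a single component lands after pitching down by `factor`."""
    return original_hz / factor


# Ceiling of the result for a 4x pitch-down at common recording rates.
for rate_hz in (48_000, 96_000, 192_000):
    print(rate_hz, top_frequency_after_pitch_down(rate_hz, 4))

# The ~38k ping from the hydrophone recording, pitched down 4x.
print(shifted_frequency(38_000, 4))
```

The 48kHz case lands at 6k and the 38k ping lands at 9.5k, which matches the numbers above.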





The second half of this idea is time shift. As you most likely know, sound is recorded in “samples”, or, more basically, slices of time – still frames of audio. Much like frames in a film. In an American movie theater, you will see a film played back at 24fps (traditionally). So, basically, your eyes are seeing 24 still pictures per second, and your brain accepts this as moving picture. But, if you slow down the movie, say to 20, or 15 frames per second, your brain notices that something is wrong, and you no longer believe this is a moving picture.

The same goes for audio. If you are working with 44.1kHz (or 44,100 samples per second – apparently our ears are much more discerning than our eyes!), your ears believe that you are hearing true sound. What you notice when you start stretching a 44.1kHz file out is that it becomes gritty. There isn’t enough information between samples to fool your ears. There are now programs out there that help interpolate data between samples (like paulstretch) – basically the algorithm analyzes the audio and makes up information to go between the samples – but that’s not really natural, and while the results can sound cool, they aren’t generally very useful when you’re trying to hide the fact that you’ve stretched something, or when you’re trying to preserve the “non-digital” feeling of a sound.

It is, however, important to note that with time shift, pitch shift comes naturally! If you play something back at half speed, you automatically cut all the frequencies in half as well (one octave down). Think about this: if you’ve recorded a 100hz tone for 1 second, then you have 100 cycles of that tone (frequency = cycles per second). However, if you stretch that 1 second to 2 seconds, you still only have 100 cycles. So now you have 100 cycles per 2 seconds (or 50hz, which is an octave lower than 100hz). So the only way to stretch out a sound without altering its frequency is to have an additional algorithm to correct the pitch. This (often) doesn’t sound very good, but I’ll let you explore that one on your own.
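The cycle-counting argument is easy to try out in code. This is only a toy sketch in plain Python (the helper names are hypothetical, and the 48k/24k rates are picked just to keep the numbers round): it generates one second of a 100hz tone, then works out the frequency you would hear if the very same samples were played at the original rate and at half that rate.

```python
import math


def sine(freq_hz, rate_hz, seconds, phase=0.1):
    """Samples of a sine tone; the small phase offset keeps samples off exact zero."""
    count = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz + phase) for n in range(count)]


def measured_frequency(samples, playback_rate_hz):
    """Count upward zero crossings (one per cycle), then divide by playback duration."""
    cycles = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration_s = len(samples) / playback_rate_hz
    return cycles / duration_s


tone = sine(100, 48_000, 1)                     # 100 cycles, recorded at 48k
normal = measured_frequency(tone, 48_000)       # played back at the recorded rate
half_speed = measured_frequency(tone, 24_000)   # same samples, half the rate

print(normal, half_speed)
```

The samples never change, only the rate at which they are read out, so the same 100 cycles get spread over twice the time and the tone drops from 100hz to 50hz.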

Here’s an example of the original sound simply slowed down (all pitch shifting is natural).




Enter 192kHz. When recording at 192kHz, you are gathering about 4.35x as much data as you would at 44.1kHz (or 4x as much as at 48kHz). That means, if my session is at 48kHz, and I have a 192kHz file, I can make it up to 4x as long before the audio starts to break down to my ears (you would need to be using the original file, not one converted to 48kHz!). Very cool if you’re trying to get some long drones or something from a shorter sound, but want to keep things sounding natural.



So now we come to what I actually learned from my experiment (which I haven’t even talked about yet…)

I created the original file (which is at the top of this article) from a larger file (just to keep it short). I then processed it in two ways. (In case you want to do this yourself…) The first thing I did was to create a 48k/24b session in Pro Tools. I went to the “Import Audio” menu and selected my file to import. Here’s the important step in making this work: UNCHECK the SRC (Sample Rate Conversion) box! You do NOT want Pro Tools to correct the sample rate, or your file will sound identical to your original. Once you click Import, a dialogue will pop up warning you that the file is not in the same sample rate as the session, which will cause the file to play back slower. Just hit OK, because this is what you’re trying to achieve! I ended up with a region that is exactly 4x the length of my original file. I saved my session to come back to in a few minutes.
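If you don’t have Pro Tools handy, the same “no SRC” trick can be approximated outside a DAW. The sketch below is not what Pro Tools does internally; it just uses Python’s standard `wave` module to copy the samples untouched and change only the sample rate written in the file header. The function name `force_playback_rate` is made up, and the demo uses a second of silence in memory instead of a real recording.

```python
import io
import wave


def force_playback_rate(src, dst, new_rate_hz):
    """Copy a WAV's samples untouched, changing only the declared sample rate."""
    with wave.open(src, "rb") as reader:
        channels = reader.getnchannels()
        sample_width = reader.getsampwidth()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(new_rate_hz)  # the only thing that changes
        writer.writeframes(frames)


# Demo: one second of 24-bit mono silence at 192k, held in memory.
source = io.BytesIO()
with wave.open(source, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(3)            # 3 bytes = 24 bit
    writer.setframerate(192_000)
    writer.writeframes(bytes(3 * 192_000))
source.seek(0)

slowed = io.BytesIO()
force_playback_rate(source, slowed, 48_000)
slowed.seek(0)

with wave.open(slowed, "rb") as reader:
    frame_count = reader.getnframes()
    declared_rate = reader.getframerate()
    duration_s = frame_count / declared_rate

print(frame_count, declared_rate, duration_s)
```

With real files you would pass file paths instead of the in-memory buffers. The frame count stays the same while the duration goes from 1 second to 4, which is the same 4x longer region I got in the session.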


Next, I opened up a new session at 192k/24b. I imported my original file (which doesn’t require any conversion, since it’s already at 192k/24b), threw it on a track, and opened up the Time Shift AudioSuite plugin. I set it to make my file 4x longer (aka 25% speed) and pitch it down by 4x (also 25%), then processed the file. I listened to it, and it sounded just like my other file. Just to be sure, I wanted to do a side-by-side comparison of the two files. So, I exported the file (Shift + Command + K) at 48k/24b (since I was going to bring it into my other session next), saved my work and closed the session.
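Some pitch tools ask for semitones rather than a percentage. That is an assumption about other tools, not something this plugin required of me, but the conversion is simple enough to sketch in Python (the function name is just for illustration):

```python
import math


def speed_to_semitones(speed_ratio):
    """Pitch change, in semitones, that naturally goes with a playback speed ratio."""
    return 12 * math.log2(speed_ratio)


quarter_speed = speed_to_semitones(0.25)                 # 192k material in a 48k session
forced_to_44k = speed_to_semitones(44_100 / 192_000)     # 192k material played at 44.1k

print(quarter_speed, forced_to_44k)
```

25% speed works out to two octaves down (24 semitones), while forcing a 192k file to play at 44.1k is a slightly bigger drop, roughly 25.5 semitones.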





Final step, I opened up the first session and imported the newest file I’d created. I lined the two files up and inspected them. To my (almost, but not quite) disbelief, they were absolutely identical! Not a single difference between them, even at the sample level!
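I did this comparison by eye in Pro Tools, but the same check can be scripted. Here is a rough sketch using Python’s standard `wave` module; the function and helper names are hypothetical, and the demo compares tiny made-up buffers rather than my actual files.

```python
import io
import wave


def read_wav(source):
    """Return (format, raw_frames); format is (channels, bytes per sample, rate)."""
    with wave.open(source, "rb") as reader:
        fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
        return fmt, reader.readframes(reader.getnframes())


def first_differing_frame(source_a, source_b):
    """Return None if the samples are identical, else the index of the first mismatch."""
    fmt_a, data_a = read_wav(source_a)
    fmt_b, data_b = read_wav(source_b)
    if fmt_a != fmt_b:
        raise ValueError(f"formats differ: {fmt_a} vs {fmt_b}")
    if data_a == data_b:
        return None
    frame_size = fmt_a[0] * fmt_a[1]
    shared = min(len(data_a), len(data_b))
    for offset in range(0, shared, frame_size):
        if data_a[offset:offset + frame_size] != data_b[offset:offset + frame_size]:
            return offset // frame_size
    return shared // frame_size  # same up to here, one file is simply longer


def make_wav(raw_frames, rate_hz=48_000):
    """Demo helper: wrap raw 24-bit mono frames in an in-memory WAV."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(3)
        writer.setframerate(rate_hz)
        writer.writeframes(raw_frames)
    buffer.seek(0)
    return buffer


original = bytes(range(30))      # ten 3-byte frames of dummy data
altered = bytearray(original)
altered[12] ^= 1                 # flip one bit inside frame 4

identical = first_differing_frame(make_wav(original), make_wav(original))
changed = first_differing_frame(make_wav(original), make_wav(bytes(altered)))

print(identical, changed)
```

A result of `None` means the two files match at the sample level. Anything else tells you where to zoom in.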




Comparison between the Pitch/Time manual shift in PT vs the forced 4x slower playback:



Interestingly, this seems to be some function of the plugin itself. If you go in and simply use the TCE tool (time compressor / expander) and then pitch shift, or even do the process the other way around, it doesn’t work. It sounds awful!

Example:



Ick…

My final conclusions:

All of that just to say: if you perform essentially the same process, either via forced playback or manual conversion, there is absolutely zero difference in the resulting file, as long as you have a plugin (like the Pro Tools Time Shift AudioSuite plugin) that can do it properly. However, if you are looking for a little more control over the file, the manual (plugin) way is the way to go. You can fine-tune your pitch and speed, something you cannot do with forced playback, and, as I’ve found, you don’t risk losing any sonic quality either way. If you don’t have access to a plugin like the one above, just go with the forced playback method. It will yield much better results than stretching and pitch shifting in separate steps!

One more thought on the subject: should you record everything in 192kHz? No! Only things that you’re interested in significantly altering in post. I would still record foley and such at 48k. There’s no reason to use that much disk space for something like that. I choose my sample rate on a project by project basis. If I think I’ll find something interesting by recording in 192k, I will. If not, I usually won’t (my typical sample rate for sessions is 96k).

Thanks to Shaun and David for the cool question!

If you’d like some further information on higher sample rate recording, Tim Prebble published this article a while ago about why plugins like higher sample rates.

Comments

  1. Nicely done experiment, Colin. Thanks for taking the time to try that out. I’ll make sure to point this post out to David.

  2. Love the idea of ‘forced playback’ in another sampling rate. I can’t wait to do my own experiments. Thanks!

  3. Colin -
    Thanks for following up on this. If the results had been any different, I would have been really curious why and how. But it does make sense what you got. We’ll keep on asking more questions, see what we discover in this collaborative group of sound explorers!

Trackbacks

  1. [...] over to Colin’s site to find out the results of his experiment. colin hart, high sample rate recording, pitch [...]
