Question

Proper AVAudioRecorder Settings for Recording Voice?

I am adding a voice memo capability using AVAudioRecorder and I need to know the best settings for the recorder for recording voice.

Unfortunately, I know so little about audio that I am not even sure what terms to google.

Currently, I am using the following which I copied from somewhere for testing purposes:

recorderSettingsDict=[[NSDictionary alloc] initWithObjectsAndKeys:[NSNumber numberWithInt:kAudioFormatAppleIMA4],AVFormatIDKey,
                        [NSNumber numberWithFloat:44100.0],AVSampleRateKey,
                        [NSNumber numberWithInt: 2],AVNumberOfChannelsKey,
                        [NSNumber numberWithInt:16],AVLinearPCMBitDepthKey,
                        [NSNumber numberWithBool:NO],AVLinearPCMIsBigEndianKey,
                        [NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
                        nil];

or:

defaultSettings =     {
    AVFormatIDKey = 1768775988;
    AVLinearPCMBitDepthKey = 16;
    AVLinearPCMIsBigEndianKey = 0;
    AVLinearPCMIsFloatKey = 0;
    AVNumberOfChannelsKey = 2;
    AVSampleRateKey = 44100;
};

This works but I don't know if it's optimal for voice in terms of quality, speed, file size etc.

The AVAudioRecorder Class Reference lists many settings constants but I have no clue which ones to use for voice.

Barring that, if someone knows of a good "AudioFormats for Dummies" resource, I will take that as well. (Note: I've been through the Apple docs, and they assume a knowledge base in digital audio that I do not possess.)


1 Jan 1970

Solution

You'll want to read the iPhone Application Programming Guide section titled Using Sound in iPhone OS, and the Audio Queue Services Programming Guide. (Edit: These links are outdated. The Using Sound in iPhone OS section has been edited out of the current Application Programming Guide, but the Audio Queue Services Programming Guide has been updated and moved.)

Most sounds in human voices are in the middle range of human hearing. Recorded speech is easily understood even when digitized with very low data rates. You can stomp all over a voice recording, yet still have a useful file. Therefore, your ultimate use for these recordings will guide your decisions on these settings.

First you need to choose the audio format. Your choice will be determined by what you want to do with the audio after you record it. Your current choice is IMA4. Maybe you'll want a different format, but IMA4 is a good choice for the iPhone. It's a fast encoding scheme, so it won't be too taxing for the limited iPhone processor, and it supplies 4:1 compression, so it won't take up too much storage space. Depending upon the format you choose, you'll want to make further settings.

Your current sample rate, 44.1 kHz, is the same as the standard for CD audio. Unless you're after a high-fidelity recording, you don't need such a high rate, but you shouldn't pick an arbitrary rate either. Most audio software only understands rates at specific steps like 32 kHz, 24 kHz, 16 kHz, or 12 kHz.

Your number of channels is set to 2, for stereo. The iPhone has only one microphone unless you're using additional hardware, so one mono channel should be sufficient. This cuts your data needs in half.
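To put rough numbers on these choices, here is a small sketch of the arithmetic. It is plain C, so it can also be dropped into an Objective-C source file unchanged. The function names are hypothetical, and the IMA4 figure simply applies the 4:1 ratio mentioned above to 16-bit PCM, ignoring any per-packet overhead, so treat it as an estimate only.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second of uncompressed linear PCM audio. */
uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz,
                              uint32_t channels,
                              uint32_t bits_per_sample)
{
    /* This sketch only handles whole-byte sample sizes (8, 16, 24, 32). */
    assert(bits_per_sample % 8 == 0);
    return sample_rate_hz * channels * (bits_per_sample / 8);
}

/* Rough bytes per second for IMA4, ASSUMING a flat 4:1 ratio against
   16-bit PCM; real files carry some extra packet overhead. */
uint32_t ima4_bytes_per_second(uint32_t sample_rate_hz, uint32_t channels)
{
    return pcm_bytes_per_second(sample_rate_hz, channels, 16) / 4;
}
```

With the settings from the question (44.1 kHz, stereo) this gives 176,400 bytes per second uncompressed and roughly 44,100 bytes per second as IMA4. At 16 kHz mono it drops to 32,000 and roughly 8,000 bytes per second.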

The three Linear PCM settings you are using seem to be just for Linear PCM format recordings. I think they have no effect in your code, since you are using the IMA4 format. I don't know the IMA4 format well enough to tell you which settings you'll need to make, so you'll have to do some additional research if you decide to continue using that setting.
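Putting the points above together, a settings dictionary for voice might look like the sketch below. The helper name VoiceMemoRecorderSettings is hypothetical, and 16 kHz is just one of the standard rates listed above, picked as an untested starting point. The Linear PCM keys are left out on the assumption, stated above, that they do not apply to IMA4.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: a starting point for voice memos, not a tested recommendation.
static NSDictionary *VoiceMemoRecorderSettings(void) {
    return @{
        AVFormatIDKey: @(kAudioFormatAppleIMA4), // IMA4: cheap to encode, roughly 4:1 compression
        AVSampleRateKey: @16000.0,               // a standard rate well below CD quality (assumed sufficient for speech)
        AVNumberOfChannelsKey: @1,               // mono: the built-in microphone is a single channel
    };
}
```

The dictionary would be passed to AVAudioRecorder's initWithURL:settings:error: in place of recorderSettingsDict. Check the returned NSError, and listen to a test recording before settling on a rate.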

2010-02-02

Solution

One thing to consider is that for a long time the traditional land-line voice companies, since going digital, used 8-bit samples at 8 kHz, which works out to 64 kbit/s per voice channel. This is why trunk lines come in the sizes they come in. A T1 carries 24 64k channels, and with in-band signaling each one has about 56k available for the voice data, leaving a little overhead for whatever management metadata they need.

So if you want POTS quality, 8-bit/8 kHz should be fine. Adjust up based on your needs.

2013-07-28