25 Feb 2022
In this post we’ll learn how to obtain audio data from a microphone in MacOS, using Audio Queues.
This is heavily based on the chapter Recording from the book Learning Core Audio by Adamson and Avila, which describes how to record audio to AAC (Advanced Audio Coding) format.
The main difference is that in our post we are interested in intercepting the actual audio samples instead of writing straight to disk.
To that end, we’ll use the PCM (Pulse-code modulation) audio format instead of AAC, since the former is uncompressed, which is needed if we are to have access to the raw samples.
Finally, we’ll use C++ classes in our code, while the book uses plain C.
We’ll rely on a few key concepts in this post which are worth defining.
Sample. At a high level, a sample corresponds to a discrete snapshot of a signal at a specific point in time. More concretely, we can see it as the signal’s amplitude(s) plus a timestamp.
Sample Rate. Frequency in which Samples are obtained from analog signal (given in Hz).
Channel. Analogous to channels in images (RGB), digital audio can have channels for the left and right speakers, while more advanced 5.1 surround-sound formats might have 6.
Frame. Analogous to a pixel in an image, it contains the amplitudes of each channel of a sample.
Packet. Compressed formats might group several frames together in a packet, and different packets might have different numbers of frames. For uncompressed formats the number of frames per packet is assumed to be 1.
For a more comprehensive terminology list, check the Core Audio cheat sheet.
Apple’s Audio APIs can work with a variety of file formats. The metadata about formats is stored in the structure
AudioStreamBasicDescription (sometimes abbreviated as ASBD). It can represent uncompressed or compressed, lossless or lossy formats.
For compressed formats it might need additional metadata, since they often group samples in non-uniform packet sizes. In this post we’ll use PCM, which simplifies the metadata a bit.
To get started, let’s define a function to fill in an AudioStreamBasicDescription with PCM settings:
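A sketch of such a function (the name MakePCMFormat is my own; the field values follow the discussion below):

```cpp
#include <AudioToolbox/AudioToolbox.h>

// Fills an AudioStreamBasicDescription with uncompressed, mono,
// 32-bit signed integer PCM settings.
AudioStreamBasicDescription MakePCMFormat(Float64 sample_rate) {
  AudioStreamBasicDescription format = {};
  format.mFormatID = kAudioFormatLinearPCM;
  format.mSampleRate = sample_rate;      // obtained from the input device
  format.mChannelsPerFrame = 1;          // mono microphone
  format.mBitsPerChannel = 32;           // 32-bit samples
  format.mFormatFlags = kAudioFormatFlagIsSignedInteger;
  // PCM is uncompressed, so each packet has exactly one frame and the
  // byte sizes are derived from the fields above.
  format.mFramesPerPacket = 1;
  format.mBytesPerFrame =
      format.mBitsPerChannel / 8 * format.mChannelsPerFrame;
  format.mBytesPerPacket = format.mBytesPerFrame * format.mFramesPerPacket;
  return format;
}
```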
Let’s discuss some of the fields.
mFormatID = kAudioFormatLinearPCM indicates this metadata is describing the PCM format. In PCM there’s no compression so each packet has exactly one frame, so
mFramesPerPacket = 1. The microphone I’m using only has one channel, so
mChannelsPerFrame = 1.
We’ll use a 32-bit integer to represent the amplitude of the sample, so
mFormatFlags = kAudioFormatFlagIsSignedInteger and
mBitsPerChannel = 32.
In the case of PCM some properties are derived, such as
mBytesPerFrame = mBitsPerChannel / 8 * mChannelsPerFrame and
mBytesPerPacket = mBytesPerFrame * mFramesPerPacket.
The sample rate is set based on the input device. First we get the ID of the default input microphone using the AudioObjectGetPropertyData() API:
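A sketch of this lookup (the function name is mine; on SDKs older than macOS 12, kAudioObjectPropertyElementMaster replaces kAudioObjectPropertyElementMain):

```cpp
#include <CoreAudio/CoreAudio.h>

// Returns the ID of the system's default input device (error handling omitted).
AudioDeviceID GetDefaultInputDevice() {
  AudioDeviceID device_id = kAudioObjectUnknown;
  UInt32 size = sizeof(device_id);
  AudioObjectPropertyAddress address = {
      kAudioHardwarePropertyDefaultInputDevice,
      kAudioObjectPropertyScopeGlobal,
      kAudioObjectPropertyElementMain};
  AudioObjectGetPropertyData(kAudioObjectSystemObject, &address,
                             /*qualifier_size=*/0, /*qualifier=*/nullptr,
                             &size, &device_id);
  return device_id;
}
```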
Once we have the device ID, we can use
AudioObjectGetPropertyData again to get its sample rate:
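A sketch of the sample rate query, this time addressed at the device itself:

```cpp
#include <CoreAudio/CoreAudio.h>

// Returns the nominal sample rate (in Hz) of the given device.
Float64 GetDeviceSampleRate(AudioDeviceID device_id) {
  Float64 sample_rate = 0;
  UInt32 size = sizeof(sample_rate);
  AudioObjectPropertyAddress address = {
      kAudioDevicePropertyNominalSampleRate,
      kAudioObjectPropertyScopeGlobal,
      kAudioObjectPropertyElementMain};
  AudioObjectGetPropertyData(device_id, &address,
                             /*qualifier_size=*/0, /*qualifier=*/nullptr,
                             &size, &sample_rate);
  return sample_rate;
}
```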
AudioQueue is a data structure from Apple’s Audio API. It implements a message queue that is useful to connect a source (in our case a microphone) with a destination (in our case our callback).
The high-level operation of the queue is that it will receive samples from the microphone and accumulate them in buffers. Once the buffer fills up, it will invoke a callback. The major reason for this is debouncing: we don’t have to invoke the callback on every sample received, which could be expensive.
AudioQueue keeps a queue of $n$ buffers (configurable) so that it can send the buffer to the callback as soon as it fills up, and start working on the next buffer.
In theory 2 buffers would suffice, but having more helps absorb temporary variability in throughput. For example, suppose the callback becomes temporarily slow due to some I/O issue. With only two buffers the microphone would quickly have no free buffer to write to, whereas with more buffers it can keep filling the spares until the callback catches up.
An alternative would be to increase the buffer size but then the callback would be called less often which could be disruptive if it’s a real-time application (e.g. playback).
As mentioned earlier, we’ll create a class to represent a PCM queue wrapping the Audio APIs. The overall structure of our class is:
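A sketch of what the class might look like (only the names mentioned in this post come from the real code; the Options fields and exact signatures are assumptions):

```cpp
#include <AudioToolbox/AudioToolbox.h>

// Hypothetical parameters: number of buffers and how long each holds data.
struct Options {
  int num_buffers;
  double buffer_duration_secs;
};

class PCMQueue {
 public:
  explicit PCMQueue(const Options& options);
  virtual ~PCMQueue();

  void Start();
  void Stop();

 protected:
  // Implemented by derived classes to consume the raw samples.
  virtual void OnReceiveData(AudioQueueBufferRef buffer) = 0;

 private:
  // Plain-C callback registered with AudioQueueNewInput().
  static void CallbackWithoutThis(
      void* self, AudioQueueRef queue, AudioQueueBufferRef buffer,
      const AudioTimeStamp* start_time, UInt32 num_packets,
      const AudioStreamPacketDescription* packet_descs);
  void CallbackWithThis(AudioQueueBufferRef buffer);
  void AddBuffer(double seconds);

  AudioQueueRef queue_ = nullptr;
  AudioStreamBasicDescription format_ = {};
};
```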
The first thing we’ll do is to register the callback via
AudioQueueNewInput(). Among other things, it takes the audio format structure (
format), a pointer to the callback (
PCMQueue::CallbackWithoutThis), the additional input for it (
this) and the queue itself (
queue_, a member variable).
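Inside the constructor, the call might look roughly like this (error handling omitted):

```cpp
// Sketch: register the callback and create the input queue.
OSStatus status = AudioQueueNewInput(
    &format_,                       // the PCM format metadata
    PCMQueue::CallbackWithoutThis,  // plain-C callback function pointer
    this,                           // user data forwarded to the callback
    nullptr,                        // run loop: nullptr = internal thread
    nullptr,                        // run loop mode (unused here)
    0,                              // flags, reserved: must be 0
    &queue_);                       // out parameter: the created queue
```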
We can’t pass instance methods as function pointers, but we can pass this as the additional input and then invoke a method on it inside the static callback. This helps convert the plain-C API into a more OOP version. In
CallbackWithThis() we call a virtual method
OnReceiveData() to be implemented by a derived class, so that we can keep this as a generic PCM queue without specific business logic.
This function also re-enqueues the buffer once it’s consumed so that the source can pick it up, hiding this detail from the derived class.
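A sketch of this plain-C-to-OOP bridge:

```cpp
// Sketch: the static callback casts the user data back to the instance...
void PCMQueue::CallbackWithoutThis(
    void* self, AudioQueueRef queue, AudioQueueBufferRef buffer,
    const AudioTimeStamp* /*start_time*/, UInt32 /*num_packets*/,
    const AudioStreamPacketDescription* /*packet_descs*/) {
  static_cast<PCMQueue*>(self)->CallbackWithThis(buffer);
}

// ...which delegates to the virtual method and re-enqueues the buffer.
void PCMQueue::CallbackWithThis(AudioQueueBufferRef buffer) {
  OnReceiveData(buffer);  // business logic lives in the derived class
  // Hand the buffer back to the queue so the microphone can fill it again.
  AudioQueueEnqueueBuffer(queue_, buffer, /*num_packet_descs=*/0,
                          /*packet_descs=*/nullptr);
}
```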
Once the queue is set up, we can set some extra metadata in the AudioStreamBasicDescription. I have no idea what this does exactly, but if I omit it the recorded audio is a lot noisier.
In the book the authors mention it might be setting some codecs, but from a quick inspection of the fields of
format_ I couldn’t find what changed.
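For reference, the call is roughly the following (I'm assuming the kAudioQueueProperty_StreamDescription property ID here; the actual code may use a different one):

```cpp
// Sketch: ask the queue for its fully filled-out stream description,
// overwriting format_ with whatever the queue resolved internally.
UInt32 size = sizeof(format_);
AudioQueueGetProperty(queue_, kAudioQueueProperty_StreamDescription,
                      &format_, &size);
```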
We need to add the buffers to the queue. We parametrize this function (via Options) with the number of buffers and the amount of time each can hold data for.
AddBuffer() creates and enqueues a buffer of a given size using the AudioQueueAllocateBuffer() and AudioQueueEnqueueBuffer() APIs:
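A sketch of AddBuffer(), sizing the buffer from the format and the requested duration:

```cpp
// Sketch: allocate a buffer that can hold `seconds` worth of audio
// and hand it to the queue (error handling omitted).
void PCMQueue::AddBuffer(double seconds) {
  // bytes = samples/sec * bytes/frame * seconds (1 frame per sample in mono).
  UInt32 buffer_size = static_cast<UInt32>(
      format_.mSampleRate * format_.mBytesPerFrame * seconds);
  AudioQueueBufferRef buffer;
  AudioQueueAllocateBuffer(queue_, buffer_size, &buffer);
  AudioQueueEnqueueBuffer(queue_, buffer, /*num_packet_descs=*/0,
                          /*packet_descs=*/nullptr);
}
```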
Creating, starting, stopping and destroying the queue is simple given the methods defined above and other Audio Queue APIs:
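A sketch of these lifecycle methods:

```cpp
// Sketch: thin wrappers over the Audio Queue lifecycle APIs.
void PCMQueue::Start() {
  AudioQueueStart(queue_, /*start_time=*/nullptr);  // nullptr = start ASAP
}

void PCMQueue::Stop() {
  AudioQueueStop(queue_, /*immediate=*/true);
}

PCMQueue::~PCMQueue() {
  // Disposing the queue also frees the buffers allocated for it.
  AudioQueueDispose(queue_, /*immediate=*/true);
}
```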
We can subclass
PCMQueue to provide an implementation of
OnReceiveData() which appends the samples to a file.
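A sketch of such a subclass (the class name and the use of std::ofstream are my own choices):

```cpp
#include <fstream>
#include <string>

// Sketch: a derived class that appends the raw samples to a binary file.
class FilePCMQueue : public PCMQueue {
 public:
  FilePCMQueue(const Options& options, const std::string& path)
      : PCMQueue(options), file_(path, std::ios::binary) {}

 protected:
  void OnReceiveData(AudioQueueBufferRef buffer) override {
    file_.write(static_cast<const char*>(buffer->mAudioData),
                buffer->mAudioDataByteSize);
  }

 private:
  std::ofstream file_;
};
```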
To put everything together we define a
main() function and block the thread until Enter is typed.
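A sketch of main(), assuming a hypothetical subclass FilePCMQueue that writes samples to disk (the option values and file name are made up):

```cpp
#include <iostream>

int main() {
  // 3 buffers of half a second each: assumed values, not from the real code.
  Options options{/*num_buffers=*/3, /*buffer_duration_secs=*/0.5};
  FilePCMQueue queue(options, "samples.bin");
  queue.Start();
  std::cout << "Recording... press Enter to stop." << std::endl;
  std::cin.get();  // block the main thread until Enter is typed
  queue.Stop();
  return 0;
}
```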
To verify the samples we collected make sense, we can convert those amplitudes to
wav format using Python. We only need to take note of the sample rate when running the C++ code.
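A sketch of this conversion (the file names and sample rate are assumptions; use the rate reported by the C++ code):

```python
import wave


def raw_pcm_to_wav(raw_path, wav_path, sample_rate):
    """Wrap raw mono 32-bit signed PCM samples in a .wav container."""
    with open(raw_path, "rb") as f:
        raw = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)      # mono, matching mChannelsPerFrame = 1
        w.setsampwidth(4)      # 4 bytes = 32-bit samples
        w.setframerate(sample_rate)
        w.writeframes(raw)
```

For example, `raw_pcm_to_wav("samples.bin", "out.wav", 44100)` would produce a playable file if the device recorded at 44.1 kHz.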
The full C++ code is available on Github and can be built using
make build (you probably need the MacOS SDK that ships with XCode).
In this post we learned how to “intercept” samples from a microphone in MacOS.
For this particular case we wrote the data to a file, which is not much different from the chapter Recording in the book, but being able to get the raw audio samples in real-time is a key ingredient for doing processing, such as speech recognition, on the fly.