First-generation mobile devices could not play any sounds except monophonic ringtones. Today, every smartphone platform supports playback, recording and, to some extent, manipulation of digital sound, often up to CD quality.
Android is no exception. Even older 1.5 devices can do a lot with digital audio. However, due to a lack of guidance, a developer implementing their first audio feature might miss some of the available API facilities, or choose an approach that is not optimal for their use case. This article aims to clarify the three major APIs for working with audio on Android. As you will see, with a correctly chosen API you can do a lot with audio, including:
- Streaming audio from the Web
- Playing game sound effects with low latency
- Modifying and mixing audio on the fly
- Generating your own custom audio data from scratch
The APIs that we will review are MediaPlayer, SoundPool and AudioTrack. As a bonus, we will mention the future API that is planned for forthcoming versions of Android – OpenSL ES. Let’s start with the most popular audio API – MediaPlayer.
API #1: MediaPlayer
MediaPlayer is the most widely used class for audio playback on Android. It has a good official javadoc, is widely shown in examples and is quite easy to understand.
In fact, MediaPlayer is designed to play not only audio, but video as well. It is meant to be a high-level API for playing back various media sources, including media from the app resources, from the SD card and even streaming media from the Web, such as an MP3 file located on an HTTP server.
The API is quite rich, and clearly designed for “long” media streams, such as voice recordings, music and videos. For example, seeking operations are provided, and care is taken to buffer the source in portions. However, there is a big price to pay for the power and convenience of MediaPlayer: it is heavyweight resource-wise and slow to initialize. Latency (i.e. the time between a “play” button click and the moment the sound actually starts) is fine for an MP3 player app, but dreadful and unpredictable for sound effects, such as in games.
Also, the platform clearly does not expect you to have more than one or two MediaPlayers working at the same time. To summarize, MediaPlayer is exactly what its name suggests: a powerful player for media streams, including audio and video, with typical media player operations supported and stable playback of long streams guaranteed. It is not a good idea to use it if you need quick (low-latency) playback of short audio samples, especially if more than one sample is expected to be playing simultaneously.
In those cases, you should really check out the SoundPool.
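To make the MediaPlayer workflow concrete, here is a minimal sketch of streaming an MP3 over HTTP. The URL is a placeholder, and error handling is trimmed for brevity; the key point is preparing asynchronously so that network buffering never blocks the UI thread.

```java
import android.media.MediaPlayer;

import java.io.IOException;

// Minimal sketch: stream an MP3 from an HTTP URL with MediaPlayer.
// The URL passed to play() is a placeholder; real code needs error
// handling (OnErrorListener) and a proper lifecycle.
public class StreamingPlayerDemo {
    private MediaPlayer player;

    public void play(String url) throws IOException {
        player = new MediaPlayer();
        player.setDataSource(url);          // e.g. "http://example.com/song.mp3"
        // prepareAsync() buffers in the background; start playback once ready.
        player.setOnPreparedListener(MediaPlayer::start);
        player.prepareAsync();
    }

    public void stop() {
        if (player != null) {
            player.release();               // always free the heavyweight native resources
            player = null;
        }
    }
}
```

Note the mandatory `release()` call: each MediaPlayer instance holds native resources, which is one more reason not to create many of them.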
API #2: SoundPool
SoundPool is the polar opposite of MediaPlayer, except that it plays audio too! It is a relatively obscure class designed for quick playback of a predefined set of relatively short sound samples.
The most typical use case is the sound effect pool in a game. Usually you have a set of audio samples (gunshot, monster growling, scream of pain, power-up sound etc.) that are played over and over again, often intermixed as they need to sound at the same time.
In such a scenario, you need low latency: you want the gunshot sound to play at the same time the game canvas shows the gun shooting (and preferably as fast as possible after the user presses the shoot button). You also want to be able to play several sounds at once, since sound-generating events can happen simultaneously or within a short period.
On the other hand, you know all the sounds in advance and you are OK to preload them before the game starts.
Another use case for a SoundPool would be a UI app that plays sound effects in response to user actions. For example, Skype plays sound effects when you send and receive messages. Doing that with a high latency, such as when you use a MediaPlayer, would annoy the user.
In addition to the basic operations of preloading and playing sounds, SoundPool supports such goodies as:
- Setting the maximum number of sounds to play at the same time
- Prioritizing the sounds so that the low-priority ones will be dropped when the maximum threshold is reached
- Pausing and stopping sounds before they finish playing
- Looping sounds
- Changing the playback rate (effectively, the pitch of each sound)
- Setting stereo volume (separate for left and right channels)
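A minimal sketch of the game-effects use case might look as follows. The resource id `R.raw.gunshot` is a placeholder for illustration; note also that `load()` is asynchronous, so a real app should wait for `OnLoadCompleteListener` before the first `play()`.

```java
import android.content.Context;
import android.media.AudioManager;
import android.media.SoundPool;

// Minimal sketch: preload a short sample once, then fire it with low latency.
public class SoundEffects {
    // Up to 4 simultaneous streams; srcQuality (last arg) is unused, pass 0.
    private final SoundPool pool = new SoundPool(4, AudioManager.STREAM_MUSIC, 0);
    private int gunshotId;

    public void load(Context context) {
        // R.raw.gunshot is a placeholder resource id; load() returns immediately.
        gunshotId = pool.load(context, R.raw.gunshot, 1 /* priority */);
    }

    public void fire() {
        // Args: sound id, left volume, right volume, priority,
        // loop (0 = play once, -1 = loop forever), playback rate (1.0 = normal).
        pool.play(gunshotId, 1.0f, 1.0f, 1, 0, 1.0f);
    }

    public void release() {
        pool.release();
    }
}
```

The `play()` parameters map directly onto the feature list above: per-channel volume, priority, looping and playback rate are all set per call.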
In most cases, either MediaPlayer or SoundPool will cover your use case. However, sometimes you will want to do something really powerful: mess with the raw sound bits and push sound data to the audio hardware yourself. This is where AudioTrack comes into play.
API #3: AudioTrack
AudioTrack is the lowest-level audio API we have on Android. Contrary to what you might think, there is no native audio API, so AudioTrack is the most powerful, but also the most difficult to use, sound API on the platform.
What AudioTrack gives you is a “channel” that you configure in terms of sample rate (e.g. 22050 or 44100 samples per second), stereo or mono, audio format (such as 16-bit PCM) and buffer size. (If any of those concepts are unclear to you, you really should learn more about digital audio before using AudioTrack.)
After you create an AudioTrack “channel”, you can push raw audio data (as byte buffers) into it whenever you find it suitable (likely from a non-UI thread), and AudioTrack will interpret it according to its configuration and play it on the audio hardware.
Clearly, this allows you to do a lot of cool things, such as:
- Decoding audio from any format that is not supported by the platform
- Modifying or enhancing audio on the fly (such as applying distortion, vocoder or reverb effects – but you have to code them yourself!)
- Loading audio from custom sources to play it on the fly (but you have to deal with buffering and stability yourself – be sure not to develop another MediaPlayer!)
The price is that you are on your own regarding buffering, latency and correctness of the sound you produce with AudioTrack.
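As a sketch of generating audio from scratch, the following synthesizes one second of a 440 Hz sine tone as 16-bit mono PCM and pushes it into an AudioTrack. The sample rate and frequency are arbitrary choices for illustration; run this off the UI thread, since `write()` blocks.

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Minimal sketch: synthesize a 440 Hz sine tone and play it via AudioTrack.
public class ToneDemo {
    public void playTone() {
        final int sampleRate = 44100;
        short[] samples = new short[sampleRate];           // 1 second, mono
        for (int i = 0; i < samples.length; i++) {
            double t = (double) i / sampleRate;
            samples[i] = (short) (Math.sin(2 * Math.PI * 440.0 * t) * Short.MAX_VALUE);
        }

        // Ask the platform for the smallest workable buffer for this configuration.
        int minBuf = AudioTrack.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                Math.max(minBuf, samples.length * 2), AudioTrack.MODE_STREAM);

        track.play();
        track.write(samples, 0, samples.length);           // blocks while the buffer drains
        track.stop();
        track.release();
    }
}
```

Replacing the sine formula with your own math, a software decoder or a network source is exactly the kind of flexibility AudioTrack offers – along with full responsibility for what ends up in the buffer.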
With all three APIs in mind, let’s see what we can expect from the future versions of Android.
Future API #4: OpenSL ES
OpenSL ES is a cross-platform audio API designed specifically for mobile devices. As its name suggests, it is like OpenGL, but for audio. Some readers might be familiar with OpenAL. The difference is that OpenSL is designed from scratch for resource-limited devices, while OpenAL mostly targets desktop machines. The difference in their names underlines the fact that the two APIs are not designed to be compatible.
It is not yet clear when OpenSL is going to be supported on Android, but it seems like it will be supported from native code (only).
You can learn more about OpenSL here: http://www.khronos.org/opensles/
Conclusion: Choosing the right audio API
Here’s a table that might help you find your way between the three audio APIs that you have on Android.
- Do you need low latency, such as in games or sound effects? Use SoundPool; if it’s not flexible enough, use AudioTrack.
- Do you need to play video that has an audio track? Use MediaPlayer.
- Do you have a set of short sounds that you expect to play many times? Use SoundPool.
- Do you need to stream audio from an external source, e.g. an HTTP or TCP stream? Try MediaPlayer; if it doesn’t support your case, use AudioTrack.
- Do you need to generate audio from scratch, such as by using math formulas / frequency modulation? Use AudioTrack.
- Do you need to play background music? Use MediaPlayer.
- Do you need to modify the audio on the fly? Use AudioTrack.
This table is far from complete, but hopefully it will help you choose the right API for your case.
Now that you know the tools, be sure to have a lot of fun implementing your digital sound ideas on Android!