Thursday, October 29, 2009

Creating Cisco ring tone files with GStreamer & GNonLin

Cisco IP phones support custom ring tones. Unfortunately, many people have difficulty creating them because the file format the phones expect is hard to produce with the average audio application. The phone expects a raw audio file (raw files have no header identifying the encoding of their contents) with one channel, 8000 samples per second, and eight bits per sample, encoded with μ-law.
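To make the "eight bits per sample, μ-law" requirement concrete, here is a minimal sketch of how one signed 16-bit sample can be squeezed into a single μ-law byte. It follows one common formulation of G.711 μ-law companding. The function name and constants are my own illustrative choices, and GStreamer's mulawenc element may be implemented differently.

```python
# Illustrative constants for a common G.711 mu-law formulation (assumed here,
# not taken from the GStreamer source).
BIAS = 0x84   # added before the segment search
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Compress one signed 16-bit PCM sample into one mu-law byte (0-255)."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): the position of the highest set bit,
    # counted down from bit 14 (exponent 7) to bit 7 (exponent 0).
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    # Keep the four bits just below the leading bit as the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F

    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Each input sample becomes exactly one output byte, which is why one second of ring tone audio at 8000 samples per second occupies 8000 bytes.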

GStreamer can easily create files in this format with a command like this:

gst-launch filesrc location=file.wav ! decodebin2 ! audioconvert ! audioresample ! mulawenc ! audio/x-mulaw,rate=8000,channels=1 ! filesink location=file.raw

However, ring tone files have one other twist: they cannot be more than 16080 bytes long, which is about 2.01 seconds of audio! Fortunately, GStreamer has GNonLin, a set of plugins that add non-linear editing functions to GStreamer. Unfortunately, they can't be used from the gst-launch command line. Fortunately, the GStreamer Python bindings allow us to create a short script that will solve our problem.
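The 2.01 second figure follows directly from the format: at 8000 samples per second and one byte per sample, every second of audio costs 8000 bytes. A small sketch of that arithmetic, with helper names that are hypothetical and not part of the script:

```python
# Figures from the post: 8000 one-byte samples per second, 16080-byte cap.
SAMPLE_RATE = 8000
BYTES_PER_SAMPLE = 1
MAX_RING_TONE_BYTES = 16080


def max_ring_tone_seconds():
    """Longest ring tone, in seconds, that fits within the size limit."""
    return MAX_RING_TONE_BYTES / float(SAMPLE_RATE * BYTES_PER_SAMPLE)


def fits_on_phone(size_in_bytes):
    """True if a raw mu-law file of this size is within the limit."""
    return size_in_bytes <= MAX_RING_TONE_BYTES
```

A check like fits_on_phone(os.path.getsize('file.raw')) could be run on the output of the gst-launch command above before uploading the file.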

The code can be found in a Git repository or you can get the script directly here.

Let's take a look at the section of the code that deals with GNonLin:

self.gnlcomposition = gst.element_factory_make('gnlcomposition')
self.gnlcomposition.connect('pad-added', self.on_pad_added)
self.pipeline.add(self.gnlcomposition)

The gnlcomposition element is the top-level GNonLin element that brings together a number of GNonLin source elements.

self.gnlfilesource = gst.element_factory_make('gnlfilesource')

Create the source element.

self.gnlfilesource.set_property('caps', gst.caps_from_string('audio/x-raw-int; audio/x-raw-float'))

The caps property tells the gnlfilesource that we want audio data out of the file. This is in case we feed the script a video file and we need to ignore the video data.

self.gnlfilesource.set_property('location', source)

Tell the gnlfilesource element where the original file is.

self.gnlfilesource.set_property('start', 0 * gst.SECOND)
self.gnlfilesource.set_property('duration', int(duration * gst.SECOND))

The start and duration properties tell the gnlcomposition element where in the destination audio stream to place the audio we extract from the source audio. If we were writing an application that manipulated multiple audio streams we could rearrange and/or overlap the audio in the destination.

GStreamer uses nanoseconds internally to communicate times, which is why we multiply the duration in seconds by gst.SECOND, a constant that will convert seconds to nanoseconds (there are 1,000,000,000 nanoseconds in a second).
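As a standalone illustration of that conversion (no GStreamer required), the sketch below uses a local constant as a stand-in for gst.SECOND. The function name is hypothetical.

```python
# Stand-in for gst.SECOND so the example runs without GStreamer installed.
NANOSECONDS_PER_SECOND = 10 ** 9


def seconds_to_nanoseconds(seconds):
    """Convert seconds (int or float) to whole nanoseconds.

    int() truncates toward zero, mirroring the int(duration * gst.SECOND)
    calls in the script.
    """
    return int(seconds * NANOSECONDS_PER_SECOND)
```

Because int() truncates, a fractional value that cannot be represented exactly as a float may come out a nanosecond short. Using round() instead would be one way to avoid that, though for a ring tone the difference is inaudible.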

self.gnlfilesource.set_property('media-start', int(start * gst.SECOND))
self.gnlfilesource.set_property('media-duration', int(duration * gst.SECOND))

The media-start and media-duration properties tell the gnlfilesource what part of the source audio stream to extract.
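To summarize how the four timing properties relate, here is a sketch that computes the values the script passes to set_property. The helper name gnlfilesource_timing is hypothetical, and SECOND is a local stand-in for gst.SECOND so the example runs on its own.

```python
SECOND = 10 ** 9  # stand-in for gst.SECOND (nanoseconds per second)


def gnlfilesource_timing(start, duration):
    """Return the gnlfilesource timing properties for a clip.

    start    -- offset into the source file, in seconds
    duration -- length of the clip, in seconds
    """
    length = int(duration * SECOND)
    return {
        # Where the clip is placed in the output (always at the beginning).
        'start': 0,
        'duration': length,
        # Which part of the source file is extracted.
        'media-start': int(start * SECOND),
        'media-duration': length,
    }
```

For example, gnlfilesource_timing(3, 2) describes taking two seconds of audio beginning three seconds into the source and placing it at the very start of the output.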

self.gnlcomposition.add(self.gnlfilesource)

Add the gnlfilesource to the gnlcomposition.

The script that I came up with started out as tutorial code originally posted by Jono Bacon on his blog.

1 comment:

bilboed said...

Interesting usage of GNonLin :) If you're only going to use one gnlfilesource, you don't even need the composition, you can just use the gnlfilesource directly (and therefore should be able to use it from gst-launch)