CSC108H Assignment 2, Summer 2009

Introduction

The purpose of this assignment is to give you practice with strings, lists, functions, and sound processing.

You'll use our media module (called PyGraphics when you install it) for several parts of this assignment. Make sure that you have installed that module properly; you can test that it is working by typing import media at the Python shell. Then place this media.py and sound.py in the same directory in which you are working on your assignment. These modules exist as part of the media module, but I have made several important fixes. When these files are in your assignment directory, Python will use them instead of the ones supplied as part of the media module.

Start Early, Please

This is a long handout. I recommend reading it through once or twice before starting, so you get an overall picture of how everything fits together. I also recommend starting early. It is easier to do the assignment in small steps rather than all at once, because each function on its own is not large.

Background

Sounds are waves of air pressure. When a sound is generated, a sound wave consisting of compressions (increases in pressure) and rarefactions (decreases in pressure) moves through the air. This is similar to what happens if you throw a stone into a pond: the water rises and falls in a repeating wave.

When a microphone records sound, it takes a measure of the pressure in front of the microphone and returns it as a value. These values are called samples and can be positive or negative corresponding to increases or decreases in air pressure. Each time the air pressure is recorded, we are sampling the sound. Each sample records the sound at an instant in time; the faster we sample, the more accurate is our representation of the sound. The sampling rate refers to how many times per second we sample the sound. For example, CD-quality sound uses a sampling rate of 44100 samples per second; sampling someone's voice for use in a VOIP conversation uses far less than this. Sampling rates of 11025 (voice quality), 22050, and 44100 (CD quality) are common; we will create sounds with a sampling rate of 22050 in this assignment.
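To make the relationship between sample counts and time concrete, here is a small sketch. The function name duration_in_seconds is made up for this illustration and is not part of the media module; only the sampling rate of 22050 comes from this handout.

```python
SAMPLING_RATE = 22050  # samples per second, the rate used in this assignment


def duration_in_seconds(num_samples, rate=SAMPLING_RATE):
    """Return how many seconds num_samples samples last at the given rate.

    Hypothetical helper for illustration only.
    """
    return num_samples / float(rate)


print(duration_in_seconds(22050))  # one full second of sound
print(duration_in_seconds(11025))  # half a second at our sampling rate
```

So at our sampling rate, a sound of a few thousand samples lasts only a fraction of a second.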

A sample is simply a positive or negative integer that represents the amount of compression (positive or negative) in the air at the point the sample was taken. We will use 16 bits for each sample. Note that for stereo sounds, a sample is actually made up of two integer values: one for the left speaker and one for the right. We will be working with only mono (i.e. one channel) sound files in this assignment.
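It can help to see what 16 bits per sample means in terms of actual numbers. The sketch below assumes that samples are stored as signed integers in the usual two's-complement form, as is standard for 16-bit wav files; the constant names are made up for this illustration.

```python
BITS_PER_SAMPLE = 16

# One of the 16 bits distinguishes negative from non-negative values,
# so (assuming signed two's-complement storage) the range is asymmetric.
MIN_SAMPLE = -(2 ** (BITS_PER_SAMPLE - 1))
MAX_SAMPLE = 2 ** (BITS_PER_SAMPLE - 1) - 1

print(MIN_SAMPLE)
print(MAX_SAMPLE)
```

Under that assumption, every sample value must lie between -32768 and 32767 inclusive.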

The media module we have been using contains functionality for working with sound files. You will use its features to complete the various subparts of this assignment, after which you will have created a Song Parser. The Song Parser will be able to take a string of note data and play the song it represents. In part 1 of this assignment, you'll write some functions that manipulate sounds. In part 2, you will write the song parser, some of whose features will rely on your functions from part 1. Some of the sound files in this assignment came from acoustica.com.

Part 0 - Messing Around

The media module contains functions for loading sounds from existing wav files, and creating new, empty sounds that contain as many samples as you request. The relevant functions are load_sound and create_sound. You should familiarize yourself with these functions (by using Python help) before continuing.

For example, here is a Sample Wav File that plays the notes C, D and E. You could verify this by loading it into any media player, but you can also play the file much as you displayed a picture in assignment 1:

>>> import media
>>> a = media.load_sound('cde.wav')
>>> a.play()

Being able to play wav files with the media module will help you test some of the functions you will write.

You may be more familiar with MP3 files than wav files. The major difference between the two is that MP3 files are sound files that use a form of lossy compression to make them smaller than their wav counterparts. Wav files typically store sound in an uncompressed format, so they are usually far bigger (sometimes ten times bigger) than the same sounds stored as MP3 files.

Part 1 - Sound Functions

All functions in this part should be stored together in a file called sound_functions.py. Done correctly, each function can fit comfortably in 15 lines (though it certainly does not have to, as long as your code is clear), and the functions all follow similar looping strategies.

1.1 Reversing a Sound

The first function we'll write is reverse(snd): it takes a sound, and creates a new sound that is the reverse of the original sound. (The original sound is not modified.) For example, the sample wav file we gave above is reversed here. Notice that reversing a sound simply means "play it backwards". Reversing everyday noises can sound strange: here is a door slamming and a door slamming in reverse!

To reverse a sound, we want to reverse the order of its samples. If we conceive of a sound as a sequence of samples played from left to right, we reverse a sound by instead ordering its samples from right to left. For example, if a sound has sample values 2, 3, and 4, then the reversed sound will have sample values of 4, 3, and 2. (Of course, sounds usually have hundreds or thousands of samples, not just three.) Investigate the function get_sample, which gives you access to single samples of sounds, and get_value and set_value, which allow you to retrieve and modify the value of a sample, respectively. In case your solution requires it, you can obtain the length of a sound in samples using the len function that we used on strings.
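The index arithmetic behind reversal can be sketched on an ordinary Python list of integers. This is only an illustration of the idea: it does not use sound objects or the media module, and the name reversed_values is made up. Your reverse function will need to do the corresponding work with samples.

```python
def reversed_values(values):
    """Return a new list with the items of values in reverse order.

    Illustration on a plain list; values itself is not modified.
    """
    result = [0] * len(values)
    for i in range(len(values)):
        # The item at position i ends up i places from the end.
        result[len(values) - 1 - i] = values[i]
    return result


original = [2, 3, 4]
print(reversed_values(original))  # [4, 3, 2]
print(original)                   # [2, 3, 4], unchanged
```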

1.2 Mixing Sounds

Next, write a function mix(snds) that takes a list of sound objects, and "mixes" them into a new sound that is returned. (None of the original sounds in the list is modified.) By "mixing", we mean that the original sounds are played at the same time, so that every sound is heard together with the others. For example, if we mix this three-note sound and this sound of water bubbling, we get this combination of notes and water. Your function should work with an arbitrary list of sounds, not just two sounds. The length of the resultant sound should be the length of the longest sound in the input list; otherwise, part of one or more sounds will be cut off! In our example, the three notes were longer than the bubbling water, so the length of the mixed sound was the same as the notes sound.

Mixing two or more sounds involves adding corresponding samples together. For example, if one sound has three samples: 2, 4, and 6; and another sound has four samples: 10, 11, 12, and 13; mixing them yields a sound of four samples: 12, 15, 18, and 13.
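The arithmetic in this example can be checked with a short sketch that works on plain lists of integers rather than sound objects. The name mix_values is made up for this illustration, and the sketch makes no attempt to keep the sums within the range a sample can store.

```python
def mix_values(value_lists):
    """Return a new list holding the position-by-position sums.

    Illustration on plain lists. The result is as long as the longest
    input list; shorter lists simply contribute nothing past their end.
    """
    longest = 0
    for values in value_lists:
        if len(values) > longest:
            longest = len(values)

    mixed = [0] * longest
    for values in value_lists:
        for i in range(len(values)):
            mixed[i] += values[i]
    return mixed


print(mix_values([[2, 4, 6], [10, 11, 12, 13]]))  # [12, 15, 18, 13]
```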

Note that if we try to mix too many sounds together -- or just a couple loud ones! -- the resultant, mixed sound will sound distorted. This phenomenon is called clipping. For example, if you mix together four or five copies of the above water sound, it sounds more like static than like water. Why do you think this occurs? (You should submit a short, two-or-three sentence explanation in a file called clip.txt.) Hint: remember, we use only 16 bits to store each sample.

1.3 Changing Volume

To increase or decrease the volume of a sound wave, we increase or decrease its amplitude, respectively. In terms of our digital representation of sounds, we will achieve this by multiplying or dividing each sample by a constant in order to increase or decrease the volume. If we multiply each sample by 2, for example, we double the volume; if we divide each sample by 2, we halve it.

Write a function change_volume(snd, factor) that returns a new sound resulting from multiplying each sample in snd by factor. (The original sound is not modified.) We will then be able to use this function to increase the volume (by providing a factor larger than 1) or decrease it (by providing a factor between 0 and 1). For example, if we use a factor of 0.5 on this crow cawing sound, we get this crow at half volume sound.
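Here is the same scaling idea sketched on a plain list of integers. The name scale_values is made up for this illustration, and converting each product back to an integer with int (which drops any fractional part) is an assumption of the sketch, made because samples are integers.

```python
def scale_values(values, factor):
    """Return a new list with every item multiplied by factor.

    Illustration on a plain list; values itself is not modified.
    int() drops the fractional part so the results stay integers.
    """
    result = []
    for value in values:
        result.append(int(value * factor))
    return result


print(scale_values([100, -200, 50], 0.5))  # [50, -100, 25]
print(scale_values([100, -200, 50], 2))    # [200, -400, 100]
```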

1.4 Adding Echo

Taking the lessons on mixing and volume from the previous two exercises, write a function echo(snd, delay) that takes a sound object and a delay, and returns a new sound that adds an echo to snd; snd itself is not modified. To add an echo to a sound, we will mix in another, lower-volume copy of that sound starting delay samples from the beginning. The lower-volume copy of the sound should be at 25% of the original volume. The number of samples in your new sound will be the number of samples in the original sound plus delay. The new sound is longer than the original because otherwise the echoing copy of the sound would be cut off before it completes. Of course, you can write this function by directly manipulating sound samples, but it would be pretty hyper if you relied on your volume and mixing functions...

Here is an example. This Crow Cawing sound has an echo added to it to create this echoing crow cawing sound. I used a delay of 5000 samples. As another example, this welcome sound has an echo added to it to create this welcome sound with echo. Here, I used a delay of 10000 samples.
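To see how the delay and the 25% copy fit together, here is a sketch on a plain list of integers with a tiny delay. The name echo_values is made up for this illustration; it manipulates list items directly rather than reusing volume and mixing functions, and it is not written in terms of sound objects.

```python
def echo_values(values, delay):
    """Return a new list: values plus a quarter-volume copy delayed by delay.

    Illustration on a plain list; values itself is not modified.
    """
    # Start with the original followed by delay zeros (silence), so the
    # result has len(values) + delay items.
    result = values + [0] * delay
    for i in range(len(values)):
        # Add the quiet copy, shifted delay positions to the right.
        result[i + delay] += int(values[i] * 0.25)
    return result


print(echo_values([100, 200, 300], 2))  # [100, 200, 325, 50, 75]
```

The third value, 325, is where the original (300) and the start of the quiet copy (25) overlap.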

Part 2 - Song Parser

In this section, you will write a function song_parse(notestring) that takes a notestring (to be described) and returns its representative sound object. The sound returned by your function can then be played like any sound loaded from a wav file. However, you will directly generate the returned sound; you are not to load any wav files at all for this part. Where appropriate, you should call and reuse your functions from part 1. The function for this part should be saved as file song_parser.py.

Let's begin with the simplest notestrings, and incrementally describe all of the features you must support. First, consider the notestring "CDEFGAB". Passing this string to your song_parse function should result in the sound object which, when saved to a wav file, results in this sound file. The sound is composed of the note C, followed by the note D, followed by the note E, and so on, until the note B. The simplest notestrings, then, are composed of the letters A, B, C, D, E, F, and G, corresponding to the seven notes of a scale. The media module has a create_note method for creating notes based on these note names that you should use to create notes and append them together to create a sound that represents the entire notestring. Notes created in this way are like the sounds we have been using all along: they support the same methods and have the same functionality. When creating a default note, it should last for 5000 samples, use the default volume of the note when it is created, and use the default octave. These parameters are influenced by other features of notestrings, described shortly.

Between each pair of notes, you should introduce 500 samples of silence. It is OK if there are 500 silent samples after the last note. (There is an easy way to generate silence with the media module.) The reason we do this is so that a string such as "CC" plays two distinct C notes, not one longer C note. If you listen carefully, you will hear that the first of these files contains two distinct notes, whereas the second contains one double-length note.

The second feature of notestrings is evident in a string such as "2CD2E". If a positive integer n directly precedes a note, the note should last n times its normal length. The integer n may be several digits long; you should support these multi-digit integers. Note that "CC" is different from "2C"; the first plays two distinct C notes, whereas the second plays a double-length C note.

Here is a string for you to parse once your function supports all of the syntax described so far; it is the first ten notes of Canada's national anthem: 4E3GG6C2D2E2F2G2A6D.
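One possible way to read note letters and their optional length prefixes is sketched below. It only produces (letter, multiplier) pairs and a total sample count; it does not create any sounds. The names parse_notes and total_samples are made up for this illustration, and the sketch assumes that 500 samples of silence follow every note, including the last one.

```python
NOTE_LENGTH = 5000     # samples in a default-length note
SILENCE_LENGTH = 500   # samples of silence after each note


def parse_notes(notestring):
    """Return a list of (letter, multiplier) pairs, one per note.

    Illustration only: handles note letters A-G and integer prefixes,
    and ignores every other character.
    """
    notes = []
    digits = ''
    for ch in notestring:
        if ch.isdigit():
            digits += ch          # collect multi-digit numbers
        elif ch in 'ABCDEFG':
            if digits:
                multiplier = int(digits)
            else:
                multiplier = 1
            notes.append((ch, multiplier))
            digits = ''
    return notes


def total_samples(notes):
    """Return the sample count, assuming silence follows every note."""
    total = 0
    for letter, multiplier in notes:
        total += multiplier * NOTE_LENGTH + SILENCE_LENGTH
    return total


print(parse_notes('CC'))                 # [('C', 1), ('C', 1)]
print(parse_notes('2C'))                 # [('C', 2)]
print(total_samples(parse_notes('CC')))  # 11000
print(total_samples(parse_notes('2C')))  # 10500
```

Note how "CC" and "2C" differ in both the number of notes and the total length.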

The third feature of notestrings is the ability to change octaves. A > symbol means "increase the octave by 1" and a < symbol means "decrease the octave by 1". The new octave is active until changed by another greater-than or less-than sign. That is, all of the notes following an octave-changing sign will be in that new octave until the octave is changed again. When we increase the octave, all of the notes still "sound the same" except they have a higher pitch. Similarly, when we decrease the octave, notes have a lower pitch. (Interestingly, the sound frequency doubles each time we increase the octave by 1. That is, in terms of frequencies, corresponding notes in successive octaves become more and more distant as the octaves increase, even though it sounds like a linear increase in pitch to us! ... But you don't have to care about this for the assignment.)
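The doubling can be worked out with a few lines of arithmetic. This handout does not say what frequencies the media module uses, so the sketch below takes 440 Hz (the usual concert pitch for the note A) purely as an illustrative starting point, and the name frequency_at_octave is made up.

```python
def frequency_at_octave(base_frequency, octave_shift):
    """Return base_frequency moved up (or down) by octave_shift octaves.

    Each octave up doubles the frequency; each octave down halves it.
    """
    return base_frequency * 2.0 ** octave_shift


for shift in range(-1, 4):
    print(frequency_at_octave(440, shift))
```

The gap between successive octaves grows from 220 Hz to 440 Hz to 880 Hz and so on, even though each step sounds like the same increase in pitch.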

As an example, the string 4E>4E>4E>4E<<<<4E>4E sounds like E's in different octaves. The note E is played in four increasing octaves; then played one octave below the default octave; then played again at the default octave. Here's another example: 4C>4C<2BGA2B>2C; do you know this famous song?
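If you want to check your reading of the octave symbols, the sketch below records which octave each note falls in. It treats the default octave as 0 and counts relative to it, which is a convention of this illustration only; the name octaves_per_note is made up, and the sketch ignores every character other than note letters and the two octave symbols.

```python
def octaves_per_note(notestring):
    """Return a list giving the octave offset in effect for each note.

    Illustration only: 0 stands for the default octave.
    """
    octave = 0
    octaves = []
    for ch in notestring:
        if ch == '>':
            octave += 1
        elif ch == '<':
            octave -= 1
        elif ch in 'ABCDEFG':
            octaves.append(octave)
    return octaves


print(octaves_per_note('4E>4E>4E>4E<<<<4E>4E'))  # [0, 1, 2, 3, -1, 0]
```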

The fourth feature of notestrings is the ability to change volume. A + symbol means "increase the volume by 1" and a - symbol means "decrease the volume by 1". The new volume setting is active until changed in the string. The default volume is 0. A volume of 1 causes notes to play at twice the default volume; a volume of 2 plays notes at three times the default volume; and so on. A volume of -1 causes notes to play at half of the default volume; a volume of -2 plays notes at one-third of the default volume, and so on.

As an example, the string 2C+2C+2C+2C+2C----2C sounds like C's at various volumes. The double-length C note is played at the default volume, then at each of four increasing volume levels, before being played again at default volume.
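The mapping from volume settings to multiplication factors, and the way the setting changes through a string, can be sketched as follows. The names volume_factor and volume_levels are made up for this illustration, and the sketch ignores every character other than note letters and the two volume symbols.

```python
def volume_factor(level):
    """Return the multiplier for a volume setting.

    0 -> 1, 1 -> 2, 2 -> 3, ...; -1 -> 1/2, -2 -> 1/3, ...
    """
    if level >= 0:
        return float(level + 1)
    return 1.0 / (1 - level)


def volume_levels(notestring):
    """Return a list giving the volume setting in effect for each note."""
    level = 0
    levels = []
    for ch in notestring:
        if ch == '+':
            level += 1
        elif ch == '-':
            level -= 1
        elif ch in 'ABCDEFG':
            levels.append(level)
    return levels


print(volume_levels('2C+2C+2C+2C+2C----2C'))  # [0, 1, 2, 3, 4, 0]
print(volume_factor(2))                       # 3.0
print(volume_factor(-1))                      # 0.5
```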

The fifth feature of notestrings allows us to include multiple "channels" that are mixed together in the final sound. The | symbol in a notestring indicates that the current channel has ended, and a new channel is beginning. For example, the string "8C|8F|8A|>4C" sounds like a sound with multiple channels. It plays four channels at the same time: the first channel plays a C, the second an F, the third an A, and the fourth a C in the next octave. This final note is shorter than the others, so it ends first; the remaining three notes keep playing the harmonious chord. Note that any octave or volume changes are restricted to the channel in which they occur; in particular, octave and volume commands have no effect on any channel descriptions that follow them in the string.

Another way to think of what the | does is to think in terms of "hands" playing a piano. The stuff before the first | is what your left hand is playing; the stuff after the first | and before the second | (or until the end of the string if there are only two channels) is what your right hand is playing. At this point, we run out of hands for our metaphor, but your supported strings should not be restricted to just two channels.
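Separating a notestring into its channel descriptions is a natural first step, and the string method split does exactly this. The function name split_channels below is made up for this illustration; what you then do with each channel description is up to you.

```python
def split_channels(notestring):
    """Return the list of channel descriptions in notestring.

    A string with no | symbol yields a one-item list.
    """
    return notestring.split('|')


print(split_channels('8C|8F|8A|>4C'))  # ['8C', '8F', '8A', '>4C']
print(split_channels('CDE'))           # ['CDE']
```

Because each channel description is then handled on its own, an octave or volume change in one channel has no way of leaking into the next.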

The following string is a larger example that collects most of the functionality discussed so far. It contains notes, notes with numbers preceding them (for increasing their length), octave changes, volume changes, and two channels played simultaneously:
">+CCGGAA2GFFEEDD2CGGFFEE2DGGFFEE2DCCGGAA2GFFEEDD2C|+CGEGCA2EBFCGBF2CEGFAEG<2B>EGFAEG<2B>CGEGCA2EBFCGBF2C"
The result is this nice little Twinkle Twinkle tune. The first channel contains the melody of the song, and the second channel contains the accompanying harmony. The harmony plays along with the melody to give the song a fuller sound.

The final feature of our notestrings is that they may begin with the substring [x] (including the square brackets), where x is an integer indicating that an echo with delay x should be applied to all channels defined in the string. For example, prepending [43000] to the Twinkle Twinkle string gives us Twinkle Twinkle With Echo. You'll hear the song start; shortly after, another copy of the song -- at reduced volume -- is overlaid on the first. Once the first copy of the song ends, you'll hear only the lower-volume copy ending on its own.

The only place an echo delay specifier can appear is at the very beginning of the entire notestring. In particular, it is not permitted to occur at the beginning of any channel's description besides the first. If the first character of a notestring is not the [ symbol, the resultant sound will have no echo.
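One way to peel off the echo specifier before looking at the rest of the string is sketched below. The name split_echo_prefix is made up for this illustration, as is the choice of None to mean "no echo"; the sketch assumes the notestring is well formed, so that an opening [ is always matched by a ].

```python
def split_echo_prefix(notestring):
    """Return (delay, rest), where delay is None if there is no echo.

    Illustration only: assumes a well-formed notestring.
    """
    if notestring.startswith('['):
        close = notestring.index(']')
        delay = int(notestring[1:close])
        return delay, notestring[close + 1:]
    return None, notestring


print(split_echo_prefix('[43000]CD|EF'))  # (43000, 'CD|EF')
print(split_echo_prefix('CDE'))           # (None, 'CDE')
```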

Some further information and tips:

Marking

These are the aspects of your work that we will focus on in the marking:

What to Hand In

Hand in the following files:

Remember that spelling of filenames, including case, counts: your files must be named exactly as above.