Pitch Shift Calculations

Music Theory Background

Every musical note is a tone with a characteristic frequency, known as its fundamental frequency. Table 1 shows the fundamental frequency of each note in the first 7 octaves. For reference, ‘middle C’ is denoted C₄ and has a frequency of 261.6 Hz (more precisely, 261.625565 Hz (Google, 2013)).

In music, an octave is the interval between one musical pitch and another with half or double its frequency. The octave relationship is a natural phenomenon that has been referred to as the "basic miracle of music". It can be derived from the harmonic series, since it is the interval between the first and second harmonics: the second harmonic always has exactly twice the frequency of the first, so the two are always exactly one octave apart.

The following example illustrates this halving or doubling of frequency to go down or up an octave: if one note has a frequency of 440 Hz, the note an octave above it is at 880 Hz, and the note an octave below is at 220 Hz. The ratio of the frequencies of two notes an octave apart is therefore 2:1. Further octaves of a note occur at 2ⁿ times its frequency, where n is an integer: multiples such as 2, 4, 8 and 16, and their reciprocals ½, ¼, ⅛ and so on. For example, 55 Hz and 440 Hz are one and two octaves away from 110 Hz because they are 0.5 (2⁻¹) and 4 (2²) times that frequency, respectively.
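The octave rule can be checked numerically. The minimal Python sketch below lists the frequencies 2ⁿ × f for a range of integer n, and goes the other way by taking log₂ of a frequency ratio. The function names `octave_frequencies` and `octaves_between` are illustrative only, not from any library.

```python
import math


def octave_frequencies(f: float, n_min: int, n_max: int) -> dict[int, float]:
    """Return {n: f * 2**n} for each integer octave offset n from n_min to n_max."""
    return {n: f * 2.0 ** n for n in range(n_min, n_max + 1)}


def octaves_between(f_from: float, f_to: float) -> float:
    """Number of octaves from f_from to f_to (negative if f_to is lower)."""
    return math.log2(f_to / f_from)


print(octave_frequencies(110.0, -1, 2))  # {-1: 55.0, 0: 110.0, 1: 220.0, 2: 440.0}
print(octaves_between(110.0, 55.0), octaves_between(110.0, 440.0))  # -1.0 2.0
```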

Table 1: Musical notes and their respective fundamental frequencies
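Table 1 itself is not reproduced here. As a hedged sketch, a table like it can be regenerated by inverting the MTS relation in Equation 1, which gives f = 440 × 2^((p − 69)/12) for note number p. The note spellings (sharps rather than flats), the reading of "the first 7 octaves" as octaves 0 to 6, and the helper names are all assumptions for illustration.

```python
# Pitch-class names using sharps (an assumption: Table 1 may spell some notes as flats).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_frequency(p: int) -> float:
    """Frequency in Hz of MIDI note number p: Equation 1 solved for f (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((p - 69) / 12)


def note_name(p: int) -> str:
    """Scientific pitch name of MIDI note p, e.g. 60 -> 'C4' (note 0 is C-1)."""
    return f"{NOTE_NAMES[p % 12]}{p // 12 - 1}"


def frequency_table(first_octave: int = 0, last_octave: int = 6) -> list[tuple[str, float]]:
    """(name, frequency) rows for every note from C<first_octave> to B<last_octave>."""
    rows = []
    for octave in range(first_octave, last_octave + 1):
        for semitone in range(12):
            p = 12 * (octave + 1) + semitone  # C of octave n has MIDI number 12 * (n + 1)
            rows.append((note_name(p), note_frequency(p)))
    return rows


for name, freq in frequency_table(4, 4):  # octave 4 only, as a short demonstration
    print(f"{name:<3} {freq:8.3f} Hz")
```

With the defaults, `frequency_table()` returns 84 rows (7 octaves × 12 notes), and middle C comes out at about 261.626 Hz, matching the value quoted above.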

$$p = 69 + 12\log_2\left(\frac{f}{440\,\text{Hz}}\right)$$

Equation 1: MTS definition of a musical note’s frequency: p is the absolute number of semitones by which a frequency f (Hz) lies above the C five octaves below middle C (C₋₁)

These frequencies are standardized by the MIDI Tuning Standard (MTS) and are defined by Equation 1. The quantity log₂(f/440) is the number of octaves above the 440 Hz concert A (A₄ in Table 1), and is negative if the frequency is below that pitch. Multiplying it by 12 gives the number of semitones above concert A, as there are 12 semitones in an octave. Adding 69 gives the number of semitones above the C five octaves below middle C (C₋₁). As a worked example of Equation 1, take middle C (C₄) as the frequency.

For middle C, use f = 261.6 Hz:

$$\log_2\left(\frac{261.6}{440}\right) \approx -0.75$$

Equation 2: Number of octaves middle C (C₄) is above A₄. The result is negative, so C₄ is 0.75 octaves below A₄.

$$12\log_2\left(\frac{261.6}{440}\right) \approx 12 \times (-0.75) = -9$$

Equation 3: Number of semitones middle C (C₄) is above A₄. This shows that C₄ is 9 semitones below A₄.

$$p = 69 + 12\log_2\left(\frac{261.6}{440}\right) \approx 69 - 9 = 60$$

Equation 4: Number of semitones C₄ is above C₋₁ (8.18 Hz, five octaves below C₄). C₄ is 60 semitones above C₋₁, and since 12 × 5 = 60 this is exactly five octaves, as expected.
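The worked example can be reproduced step by step. This Python sketch splits Equation 1 into the three quantities above (octaves above A₄, semitones above A₄, and p); the function names are hypothetical, chosen only to mirror the prose.

```python
import math


def octaves_above_a4(f: float) -> float:
    """log2(f / 440): octaves above concert A4 (negative below it)."""
    return math.log2(f / 440.0)


def semitones_above_a4(f: float) -> float:
    """12 * log2(f / 440): semitones above concert A4."""
    return 12.0 * octaves_above_a4(f)


def mts_note_number(f: float) -> float:
    """Equation 1: p = 69 + 12 * log2(f / 440), the semitones above C-1."""
    return 69.0 + semitones_above_a4(f)


for f in (261.6, 261.625565):  # middle C, rounded and more precise
    print(f"{f} Hz: {octaves_above_a4(f):+.4f} octaves, "
          f"{semitones_above_a4(f):+.4f} semitones, p = {mts_note_number(f):.4f}")
```

With f = 261.6 Hz the results round to −0.75 octaves, −9 semitones and p = 60; the small residue comes from rounding middle C to 261.6 Hz.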

Pitch-Shifting

Musically, pitch-shifting is shifting the melody by a number of semitones (1 or more) up or down. From a mathematical or signals perspective it involves scaling the fundamental frequency of a note by a specific factor to achieve the frequency of the desired pitch-shifted note. The formula for this can be seen in Equation 5.

$$f_{\text{final}} = f_{\text{initial}} \times 2^{s/12}$$

Equation 5: s is the number of semitones to shift by; f_final is the output frequency (Hz) and f_initial is the input frequency (Hz).
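Equation 5 translates directly into code. In this minimal sketch (`shift_frequency` is a hypothetical name), shifting by ±12 semitones reproduces the octave doubling and halving described earlier.

```python
def shift_frequency(f_initial: float, s: float) -> float:
    """Equation 5: f_final = f_initial * 2**(s / 12); s > 0 shifts up, s < 0 shifts down."""
    return f_initial * 2.0 ** (s / 12.0)


print(shift_frequency(440.0, 12))   # 880.0: one octave up
print(shift_frequency(440.0, -12))  # 220.0: one octave down
print(round(shift_frequency(440.0, 3), 2))  # 523.25: A4 up 3 semitones (C5)
```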

The frequency of a note can be likened to the tempo, or beats per minute (BPM), of a song: changing the speed at which a song is played changes the pitch of every note by the same scaling factor as the speed change. If the entire song is in a single musical key, changing the pitch by a whole number of semitones therefore transposes the key by that number of semitones. The corresponding formula is Equation 6. It starts from Equation 5 with s = 1, which gives an exponent of 1 ÷ 12 ≈ 0.0833 per semitone, and the parameter k (the number of semitones to pitch shift by) then multiplies that exponent. This may seem confusing, but raising a power to a further power simply multiplies the exponents: (2^(1/12))^k = 2^(k/12).

$$y = x \times 2^{\frac{1}{12}k} \approx x \times 2^{0.0833k}$$

Equation 6: y is the new BPM, x is the original BPM, k is the number of semitones to pitch shift by (positive or negative)

$$y = x \times a^{k}, \qquad a = 2^{1/12}$$

Equation 7: a is the constant 2^(1/12) ≈ 1.0595; y, x and k are as defined for Equation 6
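Equations 6 and 7 are the same calculation written two ways. The sketch below implements both with plain floating-point arithmetic and checks that they agree; the helper names are hypothetical and 128 BPM is only an illustrative tempo.

```python
A = 2.0 ** (1.0 / 12.0)  # the constant a from Equation 7, about 1.0594631


def shifted_bpm_eq6(x: float, k: float) -> float:
    """Equation 6: y = x * 2**(k / 12)."""
    return x * 2.0 ** (k / 12.0)


def shifted_bpm_eq7(x: float, k: float) -> float:
    """Equation 7: y = x * a**k; equals Equation 6 up to floating-point rounding."""
    return x * A ** k


for k in range(-3, 4):
    y6, y7 = shifted_bpm_eq6(128.0, k), shifted_bpm_eq7(128.0, k)
    print(f"k = {k:+d}: {y6:.3f} BPM (Eq. 6), {y7:.3f} BPM (Eq. 7)")
```

A shift of ±12 semitones doubles or halves the tempo, just as it doubles or halves a note's frequency.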

In practice this calculation should use the full-precision constant a = 2^(1/12) = 1.059463094359295265 (to 18 d.p.) rather than a rounded value such as 1.06. Applications that use it, such as Serato, do not support BPM precision beyond 1 decimal place, so mixing two pieces of music with these calculations will leave a slight key mismatch between them anyway. Even so, the error from rounding a to 2 d.p. grows with the number of semitones transposed, and soon changes the output BPM by more than 0.1: at 70 BPM, a shift of +3 semitones gives 83.24 BPM with the exact constant but 83.37 BPM with a = 1.06.
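The size of the rounding error can be tabulated directly. This sketch compares the exact constant with a = 1.06 and reports the first upward shift whose error exceeds 0.1 BPM; the helper names and the 0.1 BPM threshold (taken from the 1 d.p. display precision above) are assumptions for illustration.

```python
from typing import Optional

EXACT_A = 2.0 ** (1.0 / 12.0)  # about 1.0594630943592953 in double precision
ROUNDED_A = 1.06               # a rounded to 2 d.p.


def rounding_error(x: float, k: int) -> float:
    """Output-BPM error from using ROUNDED_A instead of EXACT_A for a k-semitone shift."""
    return x * ROUNDED_A ** k - x * EXACT_A ** k


def first_shift_exceeding(x: float, threshold: float = 0.1, max_k: int = 12) -> Optional[int]:
    """Smallest upward shift k (1..max_k) whose rounding error exceeds threshold BPM."""
    for k in range(1, max_k + 1):
        if abs(rounding_error(x, k)) > threshold:
            return k
    return None


for k in range(1, 7):
    print(f"k = {k:+d}: exact {70 * EXACT_A ** k:6.2f}, rounded {70 * ROUNDED_A ** k:6.2f}, "
          f"error {rounding_error(70.0, k):+.3f} BPM")
print(first_shift_exceeding(70.0), first_shift_exceeding(128.0))  # 3 2
```

Faster tempos cross the 0.1 BPM threshold sooner, because the error scales with the original BPM.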

Although a plot of the data appears to show a linear relationship, this is not the case: fitting a straight line to the data yields a different linear equation depending on the range of input values. For example, Table 2 below shows results for pitch shifts from +1.0 to +6.0 semitones (half an octave upward) and Table 3 shows results from -1.0 to -6.0 semitones (half an octave downward). For 70 BPM, the equation of the straight line that passes through (or as close as possible to) these points is:

Equation 8: Linear equation (slope-intercept form) closest to results (data points) from pitch range +0.5 to +3.0

If, however, the range of input values (the pitch range) is extended to between +1.0 and +12.0 semitones (a full octave upward), the straight line closest to the data points is the one shown in Equation 9:

Equation 9: Linear equation (slope-intercept form) closest to results (data points) from pitch range +0.5 to +5.0

This result is not what would be expected if the data really were linear. To see why, consider zero pitch change. In the slope-intercept form y = mx + c, where x is now the pitch shift in semitones, the mx term vanishes at x = 0 and leaves y = c, so the intercept c should equal the original tempo of 70 BPM. Instead, c is 69.681 for the fit to fewer data points (Equation 8) and 69.452 for the fit to more data points (Equation 9). The crux is that the wider the pitch range (and the more data points are added), the further the intercept c moves from 70. This is because the relationship is exponential rather than linear. Since the constant a is so close to 1 (≈ 1.06), the curve looks linear on inspection, especially when k is small. It is not, however, and with this in mind the correct equation for the results when the original BPM is 70 is:

$$y = 70 \times a^{k} = 70 \times 2^{k/12}$$

Equation 10: Correct equation for pitch change at 70 BPM; a and k are as defined for Equations 6 and 7
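The drifting intercept can be reproduced with an ordinary least-squares line fitted to exact values from Equation 10. In this sketch the pitch ranges follow the captions of Equations 8 and 9 (+0.5 to +3.0 and +0.5 to +5.0) with an assumed step of 0.5 semitones. Because it fits unrounded values, its intercepts should not be expected to match 69.681 and 69.452 exactly, only to show the same trend: c falls below 70, and further below it as the range widens. The helper names are hypothetical.

```python
def exact_bpm(x: float, k: float) -> float:
    """Equation 7 / 10: y = x * a**k with a = 2**(1/12)."""
    return x * 2.0 ** (k / 12.0)


def fit_line(ks: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = m*k + c; returns (m, c)."""
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    s_kk = sum((k - mean_k) ** 2 for k in ks)
    s_ky = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    m = s_ky / s_kk
    return m, mean_y - m * mean_k


ORIGINAL_BPM = 70.0
for top in (3.0, 5.0):
    ks = [0.5 * i for i in range(1, int(top / 0.5) + 1)]  # 0.5, 1.0, ..., top
    m, c = fit_line(ks, [exact_bpm(ORIGINAL_BPM, k) for k in ks])
    print(f"range +0.5..+{top}: y = {m:.3f}k + {c:.3f} (true value at k = 0 is {ORIGINAL_BPM})")
```

The fitted line is only a local approximation of the exponential curve, which is why the exact formula has to be used for every calculation.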

This means that the exponential equation (Equation 7) must be used for every calculation; Equation 10 is an example of its use. I have included two practical tools for mixing songs when a key shift is required:

A lookup table of common BPMs, pitch shifts and output results (Table 2 and 3)
A ruler for finding the number of semitones between notes in an octave (Table 4)
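Tables 2 and 3 are not reproduced here. The sketch below shows how a lookup table of that kind can be generated from Equation 7 and rounded to 1 decimal place, the BPM precision mentioned above for tools such as Serato. The list of "common" BPMs is a hypothetical selection, not the one used in the original tables.

```python
A = 2.0 ** (1.0 / 12.0)  # constant a from Equation 7

# Hypothetical selection of common tempos; substitute your own.
COMMON_BPMS = [70, 90, 100, 120, 128, 140]


def bpm_lookup_table(bpms, shifts, decimals=1):
    """Map each original BPM to {shift: output BPM rounded to `decimals` places}."""
    return {bpm: {k: round(bpm * A ** k, decimals) for k in shifts} for bpm in bpms}


def print_table(table, shifts):
    """Print one row per original BPM and one column per semitone shift."""
    shifts = list(shifts)
    print("BPM " + "".join(f"{k:+8d}" for k in shifts))
    for bpm, row in table.items():
        print(f"{bpm:<4}" + "".join(f"{row[k]:8.1f}" for k in shifts))


UP = range(1, 7)          # +1 .. +6 semitones (cf. Table 2)
DOWN = range(-1, -7, -1)  # -1 .. -6 semitones (cf. Table 3)
print_table(bpm_lookup_table(COMMON_BPMS, UP), UP)
print_table(bpm_lookup_table(COMMON_BPMS, DOWN), DOWN)
```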

Table 2: Output BPM values for given original BPM values with an applied positive pitch shift to change the key (in semitones)

Table 3: Output BPM values for given original BPM values with an applied negative pitch shift to change the key (in semitones)

Table 4: Pitch and Key relationship to Semitones and Notes
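Table 4 is likewise not reproduced here. A hypothetical helper like the one below does the same job as the ruler: it counts the semitones between two note names within an octave and picks the smaller signed shift, which can then be used as k in Equation 7. Restricting enharmonic spellings to the sharps and flats listed is an assumption about how Table 4 names the notes.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FLAT_TO_SHARP = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}


def pitch_class(note: str) -> int:
    """Position of a note name within the octave (C = 0 ... B = 11)."""
    return NOTE_NAMES.index(FLAT_TO_SHARP.get(note, note))


def semitones_up(from_note: str, to_note: str) -> int:
    """Semitones moving upward from from_note to to_note (0 to 11)."""
    return (pitch_class(to_note) - pitch_class(from_note)) % 12


def smallest_shift(from_note: str, to_note: str) -> int:
    """Signed shift in the range -5..+6 that takes from_note to to_note."""
    up = semitones_up(from_note, to_note)
    return up - 12 if up > 6 else up


for src, dst in [("C", "A"), ("A", "C"), ("Bb", "D"), ("E", "F")]:
    print(f"{src} -> {dst}: {semitones_up(src, dst)} up, smallest shift {smallest_shift(src, dst):+d}")
```

For example, moving a 70 BPM track from A to C is a shift of k = +3, so by Equation 7 its tempo becomes about 83.2 BPM.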

incidentNormal._github.io
