Two basic types of sound sources are available on a modular sound synthesizer: internal sources and external sources. An external sound source can be literally anything that produces sound, while internal sound sources refer specifically to modules that are at the heart of a sound, in general the modules that are also responsible for the pitch of the sound. To be able to use external sources the synthesizer must have audio inputs. On an analog modular synthesizer the audio inputs of the modules expect signals that are much stronger than the line level signals produced by standard audio equipment like CD players. All signals that come from such equipment, and also signals generated by microphones, electric guitars, etc., need a preamplifier to be useful in an analog system. Many of the old analog modular systems offered a special external input module that would amplify the external signal up to a level where it could be used with other modules. On digital systems special inputs must be present that convert the audio signals into digital information, so the audio signals can be processed at the digital level.
In essence pitched musical sounds are a dynamically changing complex structure of repetitious waveforms with a certain pitch sensation, loudness contour and characteristic timbre. One single instance of a repetitious waveform is named a cycle. Many synthesis techniques simply try to produce and manipulate these waveform cycles. Mathematically every single waveform cycle in a short sound clip can be seen as the accumulation of a series of sine and cosine partials of certain amplitudes, and by being able to handle these partials individually any conceivable sound can in theory be made. So, a single cycle of a waveform can be broken down into little parts, each part being a sinewave of a number of cycles that fits exactly into the ‘space’ of the waveform cycle of interest. It is a bit difficult to imagine how and why a waveform cycle should be broken down into these sinewave partials, and the math to do this is pretty complex, but the reason is that the ear really does hear these partials. The hearing mechanism of the human ear translates the partial information in a sound into the sound sensation that the mind experiences, with all the sense of timbre, loudness, harmonicity and even the sense of recognition and meaning that sounds can have. By using the amplitude information about these partials the structure of the sound can be defined in the frequency domain, which is basically a description of which partials will be present in a single cycle at a given moment in time. This means that every cycle can be described in a separate spectral plot. The whole sound clip can be described by creating a series of spectral plots, one for each consecutive waveform cycle in the clip. Next, all this data in the frequency domain can be converted into the time domain, which is the actual signal or recording. In essence the time domain describes the vibrations of the air pressure during the time the sound is actually heard. The method that handles all partials as single entities and controls them in time is named additive synthesis. Regrettably this method needs so much data on all the possible partials for all the waveform cycles in a sound, and this data needs to be made available at such a fast rate, that in practice it is very hard to design an instrument using this additive synthesis method that can be played easily and expressively. Which means that in practice about all sound synthesis techniques used in musical instruments are about simplification, as the theoretically perfect additive synthesis technique is practically just too cumbersome to be implemented in an instrument. However, simplified models of additive synthesis, like in the drawbar organ, have become very popular. Still, the drawbar organ does not allow for precise imitations of acoustic instruments, as in general any simplification will imply a certain characteristic sound caused by the simplification. This will automatically put the instrument in a class of its own. The last statement is a very important notion, as many believe that sound synthesizers are designed to be used to imitate already existing traditional instruments. Which is a limited view, as the best one can ever get when imitating is a close approximation, where the difference in sound between the real world instrument and the synthesized approximation adds a little characteristic of its own to the imitation. In general it is better to see electronic music instruments as a distinct class of their own, just as these instruments can be so much more than just imitators.
In fact, when an electronic instrument can do imitations well, it can most certainly do proprietary stuff even better.
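To make the additive idea a bit more tangible, the little sketch below builds one single waveform cycle by summing a handful of sine partials; the Python notation and the function name are just for illustration, and the 1/n amplitude roll-off used in the example is an arbitrary choice. A real additive instrument would have to update such an amplitude list for every consecutive cycle, which illustrates where the huge amount of control data comes from.

    import math

    def additive_cycle(partial_amps, samples_per_cycle=256):
        # Build one waveform cycle as a sum of sine partials.
        # partial_amps[0] is the amplitude of the fundamental,
        # partial_amps[1] that of the second harmonic, and so on.
        cycle = []
        for i in range(samples_per_cycle):
            phase = i / samples_per_cycle                    # 0 .. 1 over one cycle
            value = 0.0
            for n, amp in enumerate(partial_amps, start=1):
                value += amp * math.sin(2 * math.pi * n * phase)
            cycle.append(value)
        return cycle

    # One 'spectral plot' turned into one cycle: eight partials with a 1/n roll-off.
    one_cycle = additive_cycle([1.0 / n for n in range(1, 9)])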
To overcome the need to handle the big bulk of data in additive synthesis, the analog modular synthesizers from the sixties used the subtractive synthesis method. The popular conception is that this method does not build brick by brick, but takes the opposite approach by using a signal that contains at least all the partials needed and later simply removing what is not needed in the final sound. The modules that are used as the primary sound sources are named oscillators. Oscillator modules provide the musician with a tuneable, single pitched raw sound with a static and in general very rich timbre that lends itself well to filtering. There are several types of oscillators, each optimized for certain fields of application. All oscillators have at least an input for a control signal that will define the pitch of the sound signal it will produce, plus at least one output where the sound signal can be taken from. Depending on the type of oscillator there can be one or more extra inputs for specific modulation purposes. Some oscillator types even need an audio signal from somewhere else before they can produce anything, an example is the type of oscillator which is commonly used in a technique named physical modelling or waveguide synthesis.
The advantage of using oscillators and filters is that they suit the earlier described exciter/resonator model very well, as in this model the oscillator will function as the exciter. A resonant filter that removes what is not needed and emphasizes what characterizes the sound will function as the resonator for the oscillator signal. When making a comparison to a violin the oscillator relates to the string and the bowing action, while the resonant filter relates to the wooden violin body acting as the resonance box.
There are two commonly used waveforms which are very simple to generate and have the very rich sound that is useful for later filtering. These two waveforms are named the sawtooth waveform and the pulse waveform. When plotted graphically, the sawtooth waveform may ramp up or slope down, but the human ear does not notice any difference whether it slopes up or down. Still, it can sometimes make a difference if a sawtooth waveform slopes up or down when it is processed later. Sonically the sawtooth sounds very rich and bright. When two or three sawtooth oscillators are closely tuned to create a unison effect, and their mix is filtered with the right sort of filter, a very rich sound with a spacious, reverberant character is created. Exactly this sound was very easy to patch on the first modular system designed by Bob Moog halfway through the sixties, and it has become one of the hallmark sounds of the synthesizer. This type of sound is still very popular in dance music, where it is the foundation of a thick unison sound named a hoover. This hoover sound is thickened even more by a chorusing unit and then played in a dramatic way with lots of pitchbend at the start of the notes.
The sawtooth waveform contains all the possible harmonics of the pitch it is tuned to, with the amplitude of each harmonic rolling off gradually as the harmonic number increases. This makes the sawtooth an ideal waveform to be filtered, as in a sense the basic timbre of a sawtooth is neutral.
The pulse waveform is a signal that is basically only on or off, and in this respect it is similar to a binary signal. There is a ratio between the time that the signal is on and the time it is off; this ratio is named the pulsewidth and can be expressed as a percentage. The pulsewidth has a pronounced effect on the basic sound of the pulse waveform: if the pulse is perfectly symmetric, meaning that the time it is on is exactly the same as the time it is off, the sound has a distinct hollow character, a bit similar to the hollow sound of a clarinet. Such a symmetric pulse waveform, where the pulsewidth is exactly 50%, is named a squarewave. The important property of this 50% pulse waveform is that only the odd harmonics of the basic pitch are present. It is the absence of even harmonics that creates this typical hollow sound. The moment the pulsewidth is changed from 50% to a smaller pulse some of the even harmonics will return. There are pulsewidth settings where other harmonics disappear, e.g. when the pulsewidth is 33.3% the third harmonic will disappear but the second harmonic will be significantly present. On virtually all analog synthesizers the pulsewidth can be controlled dynamically, a feature named pulsewidth modulation. Every pulsewidth setting has a different harmonic spectrum, and a very lively effect is created when the pulsewidth is dynamically changed. This pulsewidth modulation effect sounds close to the unison effect of two closely tuned oscillators. A common method to modulate the pulsewidth is to use a low frequency oscillator set to a triangle waveform running at around 1 Hz. Another common trick is to use an AD envelope with a fast attack time and a decay time between 300 msec and 1 second to smoothly glide the pulsewidth from e.g. 20% to 50%. When this same AD envelope is also used to sweep a lowpass filter which filters the oscillator signal, the typical snappy sound is produced that was often used in the sequenced or arpeggiated synthlines of the electropop genre of the eighties.
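A rough way to see why certain harmonics drop out at certain pulsewidths is to look at the relative Fourier amplitudes of an ideal pulse wave, which are proportional to |sin(π · n · pulsewidth)| / n for the nth harmonic. The sketch below simply prints these relative levels; it is a small calculation for insight, not a synthesis algorithm, and the names are arbitrary.

    import math

    def pulse_harmonic_levels(pulsewidth, harmonics=8):
        # Relative amplitude of the first harmonics of an ideal pulse wave,
        # proportional to |sin(pi * n * pulsewidth)| / n for harmonic n.
        return [abs(math.sin(math.pi * n * pulsewidth)) / n
                for n in range(1, harmonics + 1)]

    print(pulse_harmonic_levels(0.5))      # even harmonics are zero: the hollow squarewave
    print(pulse_harmonic_levels(1 / 3))    # the 3rd harmonic vanishes, the 2nd is back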
After generating a basic signal with one or more oscillators, one or more filters can do the removal of all unwanted partials. The quality and controlling possibilities of the filters define how accurate the method will be in practice. But to be theoretically perfect the filter would have to be so complex and need so much dynamic control data that probably the same amount of data would be needed as when using additive synthesis. Again simplifications are made. In fact subtractive synthesis as used in ‘analog’ synthesizers and their ‘virtual analog’ digital equivalents can better be seen as a form of formant synthesis, where resonant filters are used to create a strong but easily controllable formant at the resonant frequency of the filter. The reason why the sawtooth and pulse waveforms are used as the raw material to be filtered has much more to do with how these waveforms excite the resonant filters than with the spectral content of the waveforms. The sharp transients in the waveform, the flanks in the waveform plot where the level suddenly changes from one extreme to the other, are what ‘fires’ the resonance in a resonant filter. Transients contain an enormous amount of ‘energy’. They have to, as when such a waveform directly drives a speaker, this is the moment when all the mass of the speaker cone has to be moved from one extreme to the opposite extreme. In the resonant filter this energy is transformed into a ‘ripple’ lagging the transient in the waveform, which creates a strong formant at the resonant frequency of the filter. Sweeping the resonant frequency of the filter creates a musically expressive sweeping formant with only a single parameter to be controlled. More expressive results can be obtained by sweeping two or more formants, at the cost of extra filters and controllers.
An important experiment to gain some more insight into this matter is to connect a sawtooth oscillator directly to the output of the synthesizer. Don’t set the volume of the amplifier too loud, to avoid damage to your speakers! Now set the pitch as low as possible, somewhere around 1 Hz is perfect for this experiment. If the oscillator cannot go this low, a low frequency oscillator (LFO) with a sawtooth waveform or a pulse waveform can be used as well. When the sawtooth is at 1 Hz only clicks are heard, in fact one click per second. In between clicks there appears to be nothing. Now put a resonant filter in between the oscillator output and the synthesizer output, set the filter cutoff frequency to somewhere around 1 kHz and slowly open up the resonance knob. The click will gain more timbre and slowly transform into a percussive ‘claves’ sound. The short 1 kHz resonance ripple lagging the transient can easily be heard. This little experiment shows that it is indeed the transient which fires the resonance in the filter. Now slowly raise the pitch of the oscillator and you get an idea of what a sawtooth waveform does to a resonant filter and how both act together to create the final timbre.
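The same experiment can be sketched in a few lines of code: a naive sawtooth at about 1 Hz drives a simple two-pole resonator. The filter below is only one of many possible resonator recipes (poles at the resonant frequency, zeros at DC and Nyquist so the slow ramp itself is largely rejected), so treat it as an assumption-level illustration of the ripple that lags each transient rather than as the filter of any particular synthesizer.

    import math

    def resonator_ping(num_samples, saw_hz=1.0, center_hz=1000.0, r=0.999,
                       sample_rate=96000.0):
        # A very slow naive sawtooth driving a simple two-pole resonator.
        # Every flank of the saw 'fires' a decaying ripple at center_hz,
        # the percussive claves-like ping described above.
        w = 2.0 * math.pi * center_hz / sample_rate
        a1, a2 = 2.0 * r * math.cos(w), -(r * r)
        x1 = x2 = y1 = y2 = 0.0
        phase, out = 0.0, []
        for _ in range(num_samples):
            x = 2.0 * phase - 1.0                             # the raw sawtooth, -1..+1
            y = (1.0 - r) * (x - x2) + a1 * y1 + a2 * y2      # resonant bandpass
            out.append(y)
            x2, x1 = x1, x
            y2, y1 = y1, y
            phase = (phase + saw_hz / sample_rate) % 1.0
        return out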
The sawtooth waveform is extremely easy to generate with both analog and digital circuitry. In an analog sawtooth oscillator the waveform is created by charging a capacitor, an electronic component that can be charged by a current, similar to the rechargeable batteries in modern mobile phones. Compared to a battery a capacitor can store only very little charge, but a capacitor can be fully charged and discharged almost instantly. By gradually charging the capacitor at a controlled rate the voltage over the capacitor rises. When the voltage reaches a certain level a switching circuit, like a switching transistor, is used to instantly discharge the capacitor, after which it is slowly charged again, discharged, charged, etc. The gradual charging creates the rising slope of the sawtooth and the instantaneous discharging moment creates the flank in the sawtooth waveform. When the discharging is indeed instantaneous the pitch of the sawtooth will depend on the charge rate only. By controlling the charge rate with a knob or a control signal the pitch of the sawtooth wave can be precisely set. Discharging the capacitor will still take a little time on analog oscillators, a discharge time of about 1 to 2 microseconds is not uncommon. As the discharge time is fixed it will make the frequency behaviour of the oscillator slightly non-linear, which can sometimes be corrected by a trimmer control named ‘high frequency tracking’.
The relationship between the charging current and the generated frequency of an analog sawtooth oscillator is linear, doubling the current will double the frequency. The ear however perceives frequency in an exponential way, it ‘hears in octaves’. This means that a frequency perceived by the ear as three octaves higher than another frequency has an actual frequency that is eight times higher when measured in Hertz. The calculation here is simple: raise the number 2 to the power of the number of octaves of the pitch transposition and the result is the factor by which the actual frequency in Hz is multiplied, in the previous example 2^3 = 8. The analog sawtooth oscillator needs a circuit to easily transform the equally tempered scale note data from a keyboard into the correct charging current for the capacitor. This device is named an exp/lin converter. The synthesizers built by Moog in the sixties used a 1 Volt/Octave translation in the exp/lin converter to drive the oscillator and this has become the de facto standard for analog synthesizers. The circuitry that does this conversion can easily drift with changing temperatures, so temperature compensation must be built in. The quality of analog oscillators depends largely on the temperature drift behaviour, the accuracy of the exp/lin converter and the presence of a proper high frequency tracking trimmer control. These three factors must be implemented with top quality components, which causes good quality analog oscillators to be costly.
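The exp/lin conversion itself is nothing more than a power of two. The fragment below sketches the 1 Volt/Octave relation; the reference frequency at 0 Volt is an assumption, as the actual reference differs per instrument and per tuning.

    def volt_per_octave_to_hz(control_voltage, ref_hz=261.63):
        # 1V/Oct: every added volt doubles the frequency.
        # ref_hz is the assumed pitch at 0 Volt, here middle C.
        return ref_hz * 2.0 ** control_voltage

    volt_per_octave_to_hz(3.0)    # three octaves up: 8 times the reference frequency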
The digital sawtooth oscillator algorithm is incredibly simple, in essence it is just a single addition instruction in the DSP chip. By repeatedly adding a certain fixed value to a register the value in the register will increase, just like the charge in the capacitor increases by the charging current. At a certain moment the register will overflow and an overflow condition will be set in the DSP. If simple integer arithmetic is used and the register is allowed to simply wrap around on overflow it is not even needed to ‘discharge’ the register as this is implied in the wrap around. If the DSP does not allow for wrap around the register can be ‘discharged’ by subtracting the maximum value the register can hold. This can in many cases be conveniently done by an AND instruction with an operand that has all bits set. If floating point arithmetic is used a modulus function can be used to ‘discharge’ the register. Or alternatively rounding the result in the register to the nearest integer, which in this case will be the number one, and subtracting the rounded result from the value in the register. In this particular case the value to be added must be a fractional value between zero and one. The preferred way to implement the digital sawtooth is by using 24 or 32 bit integer arithmetic, running at a sample rate of 96kHz or higher and allow for wrap around. It’s the simplest, most efficient and fastest method. It also allows for a frequency parameter with a ‘negative’ value, which will produce the waveform in antiphase. The integer result of the addition can instantly be used to scan waveform tables and read and write index points in delay lines, but to be able to use the result as an audio waveform it must be bandwidth-limited and probably rescaled to get the best sound quality. Bandwidth limiting is necessary as the digital sawtooth is actually too perfect. Let’s assume a sawtooth at a pitch of 100 Hz is generated at a sample rate of 96kHz. Not only the audible harmonics up to 20kHz will be present but a lot of harmonics above the hearing range will be present as well. Between 20kHz and 48kHz there are 280 harmonics present at 100 Hz intervals. These very high harmonics are up to no good, as they can intermodulate with the sample rate and cause an audible distortion named aliasing. Audibly the best sound is achieved when all possible harmonics above 20 kHz are not present at all and the harmonics between 5kHz and 20kHz gradually decrease in energy. The best thing is if the harmonics above 20 kHz are not generated at all by the algorithm used in the sawtooth oscillator. This will make the algorithm for a good audio quality sawtooth oscillator much more complex than the just described accumulation method.
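In floating point the whole accumulation method fits in a few lines. The sketch below is the deliberately naive version: it is not bandwidth-limited and will therefore alias exactly as described above, but it shows the add-and-wrap-around principle; the names and the choice of Python are just for illustration.

    def naive_sawtooth(freq_hz, num_samples, sample_rate=96000.0):
        # Phase-accumulator sawtooth scaled to the -1..+1 range.
        # Naive version: harmonics above half the sample rate fold back
        # into the audio band, so this oscillator will audibly alias.
        out = []
        phase = 0.0
        increment = freq_hz / sample_rate          # the value added every sample
        for _ in range(num_samples):
            out.append(2.0 * phase - 1.0)          # rescale 0..1 to -1..+1
            phase = (phase + increment) % 1.0      # 'discharge' by wrapping around
        return out

    one_second = naive_sawtooth(100.0, 96000)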
With the sawtooth signal a lot of things can be done, in fact most synthesis methods use a sawtooth signal at their heart to drive their synthesis engine. On both the traditional and the virtual analog synthesizers it can drive a resonant filter very well. But the waveform can also be manipulated in a more ‘constructing’ way to obtain different waveforms with specific desirable properties. E.g. the pulse waveform is constructed from the sawtooth waveform. The way to do this is by comparing the level of the sawtooth waveform to a fixed or slowly varying control signal and providing an output signal that is either on or off, depending on whether the pulsewidth control signal or the sawtooth signal has the higher momentary value. The circuit that can do this type of comparison is named a comparator and the output of the comparator circuit is the pulse waveform. This comparator circuit is commonly built into the oscillator and provides an extra output with the pulse signal. A triangle waveform is also constructed from a sawtooth waveform, by folding down the upper half of the sawtooth waveform. From this triangle a sine wave can be constructed by passing the triangle through a device with the right non-linear function, in cheap synthesizers a pair of diodes, or more properly a more expensive circuit using a balanced modulator. In a digital oscillator these pulse, triangle and sine waveforms can be derived from a sawtooth in a similar way. There are basically two methods. The first method uses the sawtooth signal to scan a wavetable, a small part of memory where a ‘graphic’ representation of the waveform is stored. The second method uses functions to construct the other waveforms, e.g. a simple compare instruction in the DSP can create the pulse waveform from the sawtooth. But a good quality digital pulse waveform will need bandwidth limiting just like the sawtooth. The triangle wave can be constructed with some more instructions and from the triangle waveform a sine waveform can be constructed by using suitable mathematical functions, some of which can be executed quite efficiently. There are other ways to generate these waveforms directly in a digital system, but going into these details is beyond the scope of this book.
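Sticking to the second method, the comparator and the fold can be written directly as small functions of the momentary sawtooth value. The sketch below assumes a sawtooth in the -1 to +1 range and, like the sawtooth above, leaves out the bandwidth limiting a good implementation needs.

    def saw_to_pulse(saw_value, pulsewidth=0.5):
        # Comparator: high while the saw is below the pulsewidth control
        # value, low otherwise (saw first rescaled from -1..+1 to 0..1).
        return 1.0 if (saw_value + 1.0) / 2.0 < pulsewidth else -1.0

    def saw_to_triangle(saw_value):
        # Fold the upper half of the saw down: ABS(Saw) * 2 - 1.
        return abs(saw_value) * 2.0 - 1.0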
There can be a little difference in sound between similar waveforms on an analog system and a digital system. Analog systems are said to have a warmer sound and digital systems are said to sound more brilliant in the very high range. These are in many cases quite subjective differences, it all depends a lot on the quality of the analog oscillators, the bandwidth limiting of the digital oscillators and the quality and bandwidth of the DA converters used in a digital system. The main issue here is the area between 10 kHz and 20 kHz, analog oscillators tend to have a little less energy in this very high part of the sound spectrum. Much analog circuitry is bandlimited to somewhere between 5 kHz and 10 kHz to fight analog noise. Filtering away this area on a digital system can make it sound warmer and additionally less conflicting with e.g. cymbal sounds or the ‘air’ in vocals.
Roughly there are three types of manipulations possible on a waveform on the oscillator level. First, the oscillator output can be modulated in amplitude by passing the output signal through a controllable amplifier or multiplier.
The second manipulation is to modulate the waveform in time by smoothly shifting the waveform forwards and backwards in time. This will compress and expand the waveform in a rhythmic manner and when done at audio rate it creates new partials in the sound. The third possibility is to ‘make a jump in time’ by prematurely restarting the cycle of the waveform. These three techniques are respectively called amplitude modulation or AM, frequency modulation or FM and oscillator synchronisation or sync. If the aim of these techniques is to create a new waveform from an existing one it is common to talk about waveshaping. The purpose of waveshaping is to change the sonic properties of the waveform into other sonic properties that are special to the new waveform. Waveshaping can change the sound of a certain waveform dramatically, which means that it is musically a very interesting technique. In subtractive synthesis it is equally important as filtering, simply because shaping the waveform into a new waveform can remove certain aspects of the sound that are hard to remove with filters. Additionally, when a waveform can be shaped into another one in a smooth transition over time, special musical effects can be created. Waveshaping can be present on many levels in a sound synthesizer. It can be used in an oscillator to create a new set of static waveforms from one reference waveform. But when the waveshaping is dynamic, it can be used to interactively and expressively play the timbre of the sound, either under manual control or under control of a control signal from a modulation source, like a low frequency oscillator, an envelope generator, all sorts of sensors that can produce a useable control signal, or a changing MIDI control signal received from e.g. a sequencer program running on a computer.
To control the amplitude of an oscillator an extra modulatable gain control module is needed after the output of the oscillator. On analog synthesizers the multiplication circuit is either a VCA or a ringmodulator, digitally it is a single signed or unsigned multiply instruction. VCA stands for Voltage Controlled Amplifier. The module has an audio input and a control input plus an output. Often there is a knob that sets the initial gain of the module. A ringmodulator has two identical audio inputs and one output. The main difference is that a VCA can modulate the audio signal by a positive control signal only, whenever the control signal is zero or becomes negative the VCA suppresses the output fully. The ringmodulator is a true signed multiplier, meaning that just like in an arithmetical multiplication it can accept both positive and negative values on its inputs and the output is the arithmetical product of the input signals. In theory the inputs are identical and it doesn’t matter which of the signals is fed into which of the inputs, but in practice there might be small differences, depending on the quality of the circuit. Ringmodulators that will accept both audio signals and fixed or slowly varying control signals are only found on the most expensive analog modular synthesizers. On the simpler systems a ringmodulator will in general only accept audio rate signals and block control signals on both its inputs. One of the big advantages of digital modular synthesizers over analog modular synthesizers is that the analog modulars invariably have a very limited number of VCAs and ringmodulators. Analog modules are probably not very accurate, due to component tolerances that might be up to 10%, and they most likely exhibit leakage of controlling and modulating signals on the output. In contrast, the digital multiply instruction is at least accurate within the bit depth of the system and does not exhibit leakage. And as it is only a single DSP instruction many multiply operations can easily be done, although some scaling of the inputs and output might be necessary, depending on the actual system.
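Per sample the difference between the two circuits boils down to whether the control input is clipped to positive values before the multiply. A minimal sketch, with hypothetical function names:

    def vca(audio_sample, control_sample):
        # VCA: unsigned multiply, a zero or negative control signal
        # closes the amplifier completely.
        return audio_sample * max(control_sample, 0.0)

    def ring_modulator(sample_a, sample_b):
        # Ringmodulator: true signed multiply, both inputs are equivalent.
        return sample_a * sample_b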
For now let’s assume that the multiplier is capable of handling both positive and negative values on both of its inputs. The multiplier can be controlled by a fixed value, which will change the volume level. A fixed control signal of negative polarity will bring the signal into antiphase. When a wildly varying control signal is used, like an audio signal, several sonically interesting things happen, as this can create new partials that are not yet present in either of the two input signals. The new partials can be harmonics of the original waveform, but can also be enharmonic partials. The multiplier can also be controlled by a signal that is derived from the input signal. This last case means that a transfer function is applied to the oscillator waveform. An example of a transfer function is when distortion is applied. E.g. in the case of a saturation distortion the input signal itself will ‘control’ the transfer function, the higher the momentary signal level the more saturation will be applied, caused by compressing the signal at audio rate. Note that there is a difference between the global sound level or volume and the momentary signal level, which is the momentary value of the signal at the extremely short moment named the now. Many complex mathematical functions can be made by using multipliers, mixers to do additions and subtractions, and constant values. In general these constant values are on a scale between arithmetically minus one and plus one. The transfer function is implemented either as a piece of programming code on a DSP system or, on an analog system, as a patch of several ringmodulators and mixers.
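As an illustration of such a transfer function, the sketch below uses a tanh-shaped curve for the saturation; the curve itself is an arbitrary choice, the point is only that the amount of compression depends on the momentary signal level, not on the global volume.

    import math

    def saturate(sample, drive=2.0):
        # Saturation as a transfer function: the higher the momentary level,
        # the more the signal is compressed towards the -1..+1 extremes.
        # Divided by tanh(drive) so a full scale input still reaches full scale.
        return math.tanh(drive * sample) / math.tanh(drive)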
As an example for a digital system, the function to generate a triangle waveform from a sawtooth waveform at amplitude 1 is to take the absolute value of the sawtooth times two minus one, or ABS(Saw)*2-1. Using a Taylor-series approximation of the sine function this triangle can be transformed into a sinewave. Chebyshev polynomials are well known functions with the property that, when driven by a sinewave at amplitude 1, the polynomial of order n produces the nth harmonic of that sinewave, again at amplitude 1. They can be used to generate the harmonic partials from an amplitude 1 sine wave. If coded efficiently these functions can in many instances be faster than interpolated table lookup methods.
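A sketch of the Chebyshev approach, using the standard recurrence relation for the polynomials; the function names are arbitrary and a practical waveshaper would of course mix several orders together.

    import math

    def chebyshev(order, x):
        # Chebyshev polynomial T_n(x) via the recurrence
        # T_0 = 1, T_1 = x, T_n = 2 * x * T_(n-1) - T_(n-2).
        # Driving T_n with a full scale sinewave yields its nth harmonic.
        t_prev, t_cur = 1.0, x
        if order == 0:
            return t_prev
        for _ in range(order - 1):
            t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
        return t_cur

    # A full scale sine waveshaped into its 3rd harmonic, one cycle of 64 samples:
    third_harmonic = [chebyshev(3, math.sin(2 * math.pi * i / 64)) for i in range(64)]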
Basically any non-linear function can be used this way to amplitude modulate any audio signal, results may range from a great sound to total havoc, but there are no rules, anything is allowed as in the end it’s all a matter of taste. In some cases there might be only one input and an output, meaning that the effect will fully depend on the signal level of the input signal. In other cases there might be knobs for controllable parameters that allow for dynamic timbre control. When the waveshaping technique is fully mastered an enormous range of basic sounds becomes available, many of them allowing for intuitive and expressive play.
Frequency modulation is based on shifting the waveform smoothly backwards and forwards in time at audio rates. To do this, another waveform is required to control this dynamic shift. The waveform to be modulated is named the carrier wave, generated by the carrier oscillator. The waveform that is used to modulate the carrier oscillator is named the modulator. The modulation process can be applied at several points in the carrier oscillator, both the exponential frequency value and the linear frequency value can be modulated. The change in frequency will in effect cause the timeshift of the waveform. On a digital system there is also the possibility to modulate the phase of the waveform in time. Applying the modulation to the exponential frequency input can quickly create enharmonic results, so using this input is of less practical value. Within an analog oscillator linear modulation can be implemented by adding the modulating signal as a current to the current that is charging the timing capacitor. Regrettably it is difficult to do this in an accurate and stable way, so only a few of the more expensive quality analog oscillators offer an input for linear frequency modulation that also works properly. Digitally it is no problem at all, the momentary value of the modulating waveform is simply added to the linear frequency value on the output of the exp/lin converter. To avoid enharmonic results the ratio between the frequency of the modulating wave and the frequency of the carrier wave should be kept constant in simple ratios like 2:1, 3:2, 5:2, etc.
The amount of modulation applied is denoted by the modulation index. The value of the modulation index is the frequency deviation of the carrier divided by the frequency of the modulating waveform. If this ratio is constant, meaning that the modulation index is constant, a waveform is produced that has the same harmonic spectrum for all musical notes. This harmonic spectrum also depends on the phase relationship of the carrier and modulator, so preferably these should be locked in phase to get a stable waveform with a stable harmonic spectrum. Phase locking must be used to get predictable results. On a digital system phase locking between oscillators is much easier to implement than with two analog oscillators, which is one of the reasons why in practice all synthesizers that use any form of FM to generate their sounds are digital systems. In most FM synthesizers the system itself takes care of phase locking between oscillators, so the musician does not need to delve deeper into this matter.
To get a better understanding of what the modulation index really is, imagine a sinewave with a pitch of 500 Hz, which is modulated by a square wave at a very low frequency of 1 Hz. This will result in two steady tones alternating at a rate of two tones a second. The two tones are pitched around 500 Hz, one tone is higher when the modulating square is at its high level and the other tone is lower, due to the modulating square being at a low and negative value. The frequency shift of the two tones compared to the 500 Hz is equal, but one of the tones has a negative frequency shift giving it a lower pitch. Let’s assume that the two tones have a 100 Hz shift. This results in one tone at a lower pitch of 500 - 100 = 400 Hz, and the other tone at a higher pitch of 500 + 100 = 600 Hz. This 100 Hz shift is named the frequency deviation, it tells by how many Hz the new pitches deviate from the original pitch. As the linear frequency parameter is modulated, this shift of 100 Hz depends on the signal level of the modulating waveform. When the signal level of the squarewave is increased the two pitches will deviate further away from the 500 Hz. But if this signal level remains the same all the time, the frequency shift for a 1000 Hz sinewave will also be 100 Hz and this would result in two tones of 900 Hz and 1100 Hz. And here is the catch, the shift at the 500 Hz pitch is 20%, as 100 Hz is 20% of 500 Hz. But the shift at 1000 Hz is only 10%, which you can imagine is up to no good. In fact the basic trick in FM is to create a constant percentage of shift and use this as the reference to manipulate the modulation depth. By remembering that the percentage of shift should by default be constant the FM technique can be better understood. And this percentage of shift is in fact directly related to the modulation index. It happens that the amount of frequency shift can be expressed as the frequency deviation of the carrier divided by the frequency of the modulating waveform, which results in nice and easy to work with numbers. In the case of the low frequency squarewave it was easy to imagine how it works, as only two distinct pitches are produced. When instead of the squarewave a sinewave is used, the frequency glides smoothly between two extremes instead of jumping from one pitch to the other. In this case it is the maximum frequency shift caused by the sinewave that is used in the formula. So, if the resulting pitch glides between 400 Hz and 600 Hz the deviation is 100 Hz up and down compared to the 500 Hz pitch when no modulation is applied.
It should be clear that to keep the modulation index constant the amplitude of the modulating waveform should be corrected for each pitch on the musical scale. Luckily the relation between the overall amplitude of the modulating waveform and the pitch of the carrier is very simple, it suffices to multiply the modulating waveform amplitude by the original linear frequency parameter on the carrier oscillator before it is added to the internal carrier frequency parameter. If this condition is met, increasing the amplitude of the modulating waveform will simply brighten the timbre to a richer sound and create a similar type of timbre control as sweeping the resonance frequency of a resonant filter. Which effectively creates a single expressive parameter that can be easily played by a controller like a knob or a modulation wheel. And the timbral effect tracks the keyboard in the same way as a filter can track the keyboard. On analog oscillators there are actually two points in the exp/lin converter circuitry where linear frequency modulation can be applied, and one of them has the inherent property to keep the modulation index constant over the pitch range. The other point keeps the deviation constant, which results in a strong formant that stays fixed to a certain frequency area. This can give a nasal effect to the sound when it is played over several octaves. By applying a little of the modulating waveform to both these modulatable points the keyboard tracking effect can be steplessly set between no tracking and 100% tracking. On an oscillator in a digital system there might be a button to choose between tracking and no tracking. Additionally FM synths have a feature that is named keyboard scaling or level scaling which can be used to control the keyboard tracking of the timbral effect of the FM modulation.
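Put into code, keeping the modulation index constant means that the frequency deviation is derived from the modulator frequency, which in turn is derived from the carrier frequency. The sketch below is a bare phase-accumulator model of linear FM with a sine carrier and a sine modulator, not a transcription of any particular synthesizer; the parameter names are assumptions.

    import math

    def fm_sine(carrier_hz, ratio, index, num_samples, sample_rate=96000.0):
        # Linear FM with a constant modulation index:
        #   ratio = modulator frequency / carrier frequency
        #   index = frequency deviation / modulator frequency
        # so the deviation (index * modulator_hz) tracks the played pitch.
        modulator_hz = ratio * carrier_hz
        deviation = index * modulator_hz
        out = []
        car_phase = mod_phase = 0.0
        for _ in range(num_samples):
            modulator = math.sin(2.0 * math.pi * mod_phase)
            momentary_hz = carrier_hz + deviation * modulator    # linear frequency modulation
            out.append(math.sin(2.0 * math.pi * car_phase))
            car_phase = (car_phase + momentary_hz / sample_rate) % 1.0
            mod_phase = (mod_phase + modulator_hz / sample_rate) % 1.0
        return out

    # A 2:1 ratio keeps the result harmonic; raising the index simply brightens the timbre.
    tone = fm_sine(500.0, 2.0, 1.5, 96000)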
The modulating waveform can be basically any waveform, but for the carrier oscillator it is best to use a waveform without any strong transients, as these transients can get shifted in and out of the resulting waveform, which might in some cases sound quite harsh. An exception is when a square wave is used as both the modulator and the carrier and a deep modulation is used, this will have the effect of a deep and bright pulsewidth modulation effect. The sine wave and triangle wave seem to always perform very well as a carrier wave, but the sawtooth waveform is definitely tricky as a carrier.
On a digital sawtooth oscillator not only the exponential and linear frequency parameters can be modulated, but the actual output can also be phase modulated, by adding the modulating signal directly to the output signal and applying a ‘wrap around’ or modulo function on the result to make the resulting waveform fold back into the minus one to plus one signal level range. This sounds the same as when a frequency parameter is modulated. From the phase modulated sawtooth waveform other waveforms can be derived by the proper waveshaping functions or the table lookup method mentioned before. When the phase is modulated and the modulation index must by default remain constant, it is again possible to multiply the modulating waveform by the internal linear frequency parameter of the oscillator before the actual modulation is applied.
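In code the phase modulated sawtooth is a single add-and-wrap on top of the plain accumulator output; a sketch per output sample, with hypothetical names:

    def pm_saw_sample(saw_value, modulator_value, pm_amount):
        # Phase modulation of a -1..+1 sawtooth: add the modulator to the
        # output and wrap the result back into the -1..+1 range.
        shifted = saw_value + pm_amount * modulator_value
        return ((shifted + 1.0) % 2.0) - 1.0

The wrapped result can then be fed straight into the waveshaping functions or the wavetable lookup mentioned earlier to obtain phase modulated versions of the other waveforms.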
Modulation at audio rates of the phase of sinewave oscillators was explored in the sixties by Chowning. Later the Japanese synth manufacturer Yamaha would use Chowning’s work to build their hugely successful DX7 FM synthesizer and the whole range of FM synths that followed. The advantage of phase modulation on a sinewave over modulation of the frequency parameter is that if selfmodulation is applied, meaning that the carrier wave is routed back to its own modulating input, there will be no unwanted pitch shift when the modulation amount is increased and the oscillator remains neatly in tune. Increasing the depth of the selfmodulation will gradually change the sinewave into a sawtooth-like waveform and even deeper modulation will force the oscillator into a chaotic range that sounds like white noise. When instead of selfmodulation of the phase, selfmodulation of the frequency parameter is used, the basic pitch will drift away. This drifting away of the basic pitch is due to an inherent increase of a DC component in the modulated output signal, which will bring the oscillator badly out of tune. A workaround for this drifting away is to use a high pass filter on the modulation input. But even a simple 6 dB high pass filter tends to oscillate at a very high frequency if it is fed back, even through the carrier oscillator, and this will make the carrier oscillator unstable at higher modulation levels and not produce the proper chaotic behaviour. The rule is that when the pitch of an FM modulated oscillator should remain the same and selfmodulation is applied, only phase modulation should be used. But to create chaotic and noise sounds it is sometimes better to selfmodulate the linear frequency modulation input.
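Selfmodulation of the phase can be sketched as a one-sample feedback loop around a sine oscillator. The version below is only a minimal model of the idea (a real FM operator may smooth the feedback path to tame it), but it shows the behaviour described above: low feedback gives a sine, moderate feedback a sawtooth-like wave and high feedback the noisy chaotic range.

    import math

    def feedback_pm_sine(freq_hz, feedback, num_samples, sample_rate=96000.0):
        # Sine oscillator whose own previous output sample is added to its phase.
        out = []
        phase = 0.0
        previous = 0.0
        for _ in range(num_samples):
            sample = math.sin(2.0 * math.pi * phase + feedback * previous)
            out.append(sample)
            previous = sample
            phase = (phase + freq_hz / sample_rate) % 1.0
        return out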
This chaotic range is quite interesting to explore, and to get much better results in this range a lowpass filter can be inserted in the feedback patch, the steeper the filter the more interesting the chaotic waveforms that result. Other filter types like a variable width bandpass filter can give very good results as well. The variable width bandpass filter works very well because the highpass part will prevent a pitch drift and the lowpass part will give more control over the brightness of the chaotic range and prevents the highpass part from oscillating at a very high frequency.
An interesting case of FM is when the carrier oscillator frequency is set to zero Hertz by using a value of zero for the linear frequency parameter. This will in fact stop the oscillator. This technique can only be done on digital oscillators that can also be set to a negative frequency by negating the frequency parameter, which should bring the oscillator output waveform into antiphase. Applying a modulation signal to the linear frequency parameter which tracks the keyboard will rhythmically start, stop, start in antiphase and again stop the oscillator. The musical importance of this frequency modulation of a zero frequency carrier oscillator is an audio signal that will always inherit its pitch from the modulating oscillator and has a strong formant area in its spectrum whose location depends directly on the modulation index. The rule of thumb is again that the sound brightens if the modulation depth is increased. Frequency modulation of an oscillator at a 0 Hz pitch can never produce enharmonic results if the modulating signal isn’t already enharmonic. Using a square wave as the modulating waveform will produce the timbral result of an analog technique named softsync.
This is an interesting example of frequency modulation, as although the frequency parameter is modulated the pitch will always be the pitch of the modulation waveform. In this respect the technique behaves much more like waveshaping done with amplitude modulation. Later there will be practical examples of amplitude modulation techniques where a steady detuning effect is created but the waveform remains the same. In fact amplitude modulation and frequency modulation are intimately related, and both can shape a waveform at a fixed frequency, and additionally amplitude modulation can change a frequency without changing the waveform.
Oscillator synchronisation lets an oscillator restart its waveform in synchronization with another waveform. Analog oscillators that are capable of synchronizing commonly use the flank or transient of another waveform to synchronize to. In such an oscillator a circuit named a transient detector generates a very small pulse that is used to prematurely discharge the capacitor. This implies that on an analog sawtooth oscillator the synchronized sawtooth restarts at the maximum negative value, from where it ramps up. On digital oscillators it is common to restart the waveform at the upward zero crossing point. It is also common to let the oscillator synchronize on an upward zero crossing point in the synchronizing waveform. To detect this zero crossing point the current sample is compared with the previous one, and if the current one has a positive value and the previous one a negative value the zero crossing point is detected. At this moment the register that holds the current sawtooth waveform value is filled with a certain value instead of doing the addition.
Oscillator synchronization introduces a new flank in the synchronized waveform at the moment it is synchronized. This makes the current level change to a certain fixed level of either zero or the maximum negative extreme value. This ‘sync’ transient is very audible and the characteristic timbre effect of a sync sweep is caused partly by the changing magnitude of this transient. On waveforms like the sawtooth this magnitude changes gradually and doesn’t contrast too much with the timbre of the original wave. But with sine and triangle waves the contrast is greater and doesn’t always sound very good. In many cases the sound can be improved dramatically by suppressing this transient by applying an envelope over the waveform. This envelope is called the mask and the technique is named masked sync. The mask must be synchronized to the synchronizing waveform, so the obvious choice is to construct the mask from the synchronizing waveform. It is preferable that when the mask is applied the gradient at the start of the next cycle is equal to the gradient of the previous cycle, but this depends a bit on the waveform to be synchronized. If this waveform is a sinewave it is best to use the first half of a bell-shaped curve as the mask, if it is a sawtooth, a square or a triangle wave a simple downward slope can be used for the mask as well. This downward slope can easily be derived from a rising sawtooth by applying the function x’ = -0.5 * x + 1, in other words by inverting the sawtooth, halving the amplitude and adding just enough fixed voltage to make the result a positive only signal. On an analog synth the VCA can probably be modulated by an amount control that fades between full modulation and full signal, so in many cases it suffices to invert the sawtooth, feed it to the VCA modulation input, set the amount control half open and tune the sound by ear. To get the half bell-shaped mask the sawtooth can be soft clipped by maybe a log-type function before it is converted into the mask.
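A rough sketch of masked sync with a downward slope as the mask is given below. The slave sawtooth is restarted at its upward zero crossing whenever the master sawtooth wraps around, and the inverted master ramp is used as the mask; the normalisation to a 0..1 mask and the exact restart point are assumptions for the sake of a compact example.

    def masked_sync_saw(master_hz, slave_hz, num_samples, sample_rate=96000.0):
        # Slave sawtooth hard-synced to a master sawtooth, with a falling
        # ramp derived from the master used as the mask over the slave output.
        # Phases run 0..1, output waveforms are scaled to -1..+1.
        out = []
        master, slave = 0.0, 0.5
        for _ in range(num_samples):
            mask = 1.0 - master            # downward slope: 1 right after sync, 0 just before the next
            out.append(mask * (2.0 * slave - 1.0))
            master += master_hz / sample_rate
            slave += slave_hz / sample_rate
            if master >= 1.0:              # the master flank: the sync moment
                master -= 1.0
                slave = 0.5                # restart the slave at its upward zero crossing (value 0)
            elif slave >= 1.0:
                slave -= 1.0
        return out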
The three basic techniques can be combined in all possible ways to create even more waveshaping possibilities. As an example an expressive waveshaping oscillator can be patched by using two synchronized sinewaves and multiplying them before a half bell-shaped mask is applied to their multiplied result. The gradient of a sin^2 wave is zero degrees at the start of its cycle, and multiplying the two sinewaves also gives a gradient of zero degrees, as the startpoints of the cycles are synchronized to the synchronizing sawtooth oscillator. In this case it is the synchronized waves that produce the zero degree gradient at the start of the cycle and the mask that causes the zero degree gradient at the end of the cycle. Setting the two sinewaves to different frequencies above the frequency of the synchronizing sawtooth oscillator that supplies the mask will create a timbre with an expressive character with two distinctly audible sweepable formants. The sinewaves can be manipulated before or after they are multiplied together, but before the mask is applied. Transfer functions like a sin^3 or a sin*abs(sin) function perform very well to make the timbre even more talkative. Applying some heavy saturation distortion can add a lot of beef to the resulting sounds as well. Using a joystick or any other X-Y controller to offset the frequencies of the synchronized sine waves allows for very expressive timbre control. Envelopes and LFOs can be used equally well to create slowly evolving timbre changes. Applying FM to the two sine waves can also give expressive results. This waveshaping oscillator can easily be patched on both analog and digital modular synthesizers. On an analog modular one sawtooth and two syncable sinewave oscillators are needed, plus a single ringmodulator and a VCA with a level control and level modulation. On a digital modular these modules will be plentiful and even more complex variations with more sinewave oscillators can be patched in various configurations.
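A compact sketch of this patch for a digital system is given below; the half bell-shaped mask is approximated with a half raised cosine derived from the master ramp, which is an assumption, and the frequencies of the two sines would normally be the parameters played by the joystick or the envelopes.

    import math

    def waveshaping_voice(master_hz, sine1_hz, sine2_hz, num_samples, sample_rate=96000.0):
        # Two sinewaves hard-synced to a master sawtooth are multiplied together
        # and the product is shaped by a half bell-shaped mask derived from the
        # master ramp (1 at the sync point, 0 at the end of the master cycle).
        out = []
        master = phase1 = phase2 = 0.0
        for _ in range(num_samples):
            mask = 0.5 * (1.0 + math.cos(math.pi * master))
            product = math.sin(2.0 * math.pi * phase1) * math.sin(2.0 * math.pi * phase2)
            out.append(mask * product)
            master += master_hz / sample_rate
            phase1 += sine1_hz / sample_rate
            phase2 += sine2_hz / sample_rate
            if master >= 1.0:              # master flank: restart both synced sines
                master -= 1.0
                phase1 = phase2 = 0.0
            else:
                phase1 %= 1.0
                phase2 %= 1.0
        return out

    # Two formant-like sweepable frequencies above the 110 Hz master pitch:
    voice = waveshaping_voice(110.0, 550.0, 900.0, 96000)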
Another interesting example is to sync a pulsewave to a sawtooth wave and again use the sawtooth wave as a mask over the pulsewave output. By applying pulsewidth modulation and routing the sawtooth wave also to a linear FM input on the pulse oscillator several interesting timbres with smooth changes in brightness can be made.