Waveshaping and distortion

Introduction

Audio waves can be plotted in two dimensions on paper or on a computer screen. In such a plot the horizontal axis represents time and the vertical axis represents the momentary value of the air pressure at the point in time marked directly below it on the horizontal axis. In an electronic system the vertical axis commonly represents a voltage level or a number. The series of values follows as closely as possible how the air pressure will fluctuate when the electronic signal represented by the graph is fed to a quality speaker system. Since the introduction of hard disk recording software for computers it has become more and more common to look at waveforms as graphs, as in this software each track can be shown on screen as a long graphic strip that in essence shows how the air vibrates when that track is played back. This type of graphic representation implies that sound is actually a two-dimensional phenomenon. The first dimension always represents time. The second dimension represents the momentary air pressure at the place where the microphone is located. The air pressure can be expressed as a numerical or a voltage value. This also shows the limits of what can be done to sound once it is in the electronic domain, as there are only two possible directions in which the sound can be altered. A momentary level can be changed or transformed to another value on the vertical axis, and a momentary level can be pushed forwards or backwards in time on the horizontal axis. Everything that can be done to a sound is based on one of these two possibilities or a combination of both. A smooth repetitive compression or expansion on the horizontal axis is named frequency modulation, while smoothly varying changes in the vertical direction are named amplitude modulation. It is also possible to make jumps on the horizontal time axis, which creates a displacement in time that will delay the audio.
Techniques like oscillator synchronization, echo delays, granular synthesis, but also techniques like filtering, are all based on creating displacements in time and how the time delays caused by these displacements are handled.

The reason that there are so many possible ways to process sound by electronic means is based on the notion that sound consists of wave patterns that span a certain amount of time. These wave patterns have their own properties, like harmonics and partials, each with their distinct frequencies, and a volume envelope. Certain processes have specific and well defined effects on each of the individual partials, e.g. odd harmonic distortion will create a series of odd numbered harmonics out of each partial that is present in the sound, and all these newly created partials will be added to the original waveform.

Waveshaping and distortion are techniques where the original waveform is manipulated on one or both axes in such a way that the basic pitch of the sound is left unaltered. If the pitch is left unaltered there must be a change in amplitude, in timbre, or in both. In general, waveshaping includes all techniques where the waveforms are changed on the graphic level. Distortion, on the other hand, includes all techniques that work on the individual partials in the sound. Waveshaping techniques in general create a lot of new and often very high harmonics, resulting in a very bright and fuzzy sound. Examples are wavewrapping, clipping and soft clipping. The individual levels of the partials do not matter much, as there is no clear relation between the individual partials present in the original waveform and the partials that the waveshaping process generates. The advantage of the more elaborate waveshaping techniques is that they can create distinct formant areas in the processed sound that can give the effect a very pronounced character. Distortion on the other hand can be much more subtle. Newly generated harmonics depend on the individual partials that are present in the unprocessed signal. This means that if there are few high harmonics in the original sound, most distortion techniques will have a grungy character in the middle of the sound spectrum. When applied with taste, distortion can enhance the apparent presence of instruments in a mix instead of creating ‘a distorted sound’. Distortion works out particularly well if it is applied to only a small portion of the overall sound spectrum. Distortion can work very well on already recorded audio signals that contain chords and inharmonic signals like percussion and cymbal sounds. In fact, a certain amount of properly applied distortion is highly desirable in synthesized sounds. In contrast, waveshaping doesn’t work out very well on audio material containing chords, etc.
It is rather used on the oscillator or single voice level, e.g. to produce more character in the separate voices themselves in a polyphonic sound. Very often waveshaping is used right after a single oscillator and before a filter, while distortion works very well after a filter. So, although in general both waveshaping and distortion do not change the apparent pitch of sounds, they do have their own specific fields of application.

Distortion is inherently present in any analog electronic device, although modern electronics are so good that the artifacts produced by this distortion usually fall below the threshold of hearing. Loudspeakers also inherently distort, and the distortion figures of the cheaper ones can be quite serious. There are three basic types of distortion: even harmonic distortion, odd harmonic distortion and total harmonic distortion. Even harmonic distortion appears in radio tubes. Tubes have an amplification curve that is slightly bent like an exponential curve, though not as extreme as a true exponential curve. The effect is that the negative part of a signal is amplified slightly less than the positive part of the signal. This asymmetrical amplification will cause even harmonic distortion. An example of odd harmonic distortion is the saturation effect of magnetic recording tape. Recording tape has a limit to the strength of the signal it can record, similar to soft clipping. The more the signal strength approaches this limit, the more the tape will resist recording at that strength. This effect is the same for both the positive and the negative part of a waveform, so it is a symmetric effect. This effect will cause odd harmonic distortion. Analog VCA circuits also exhibit this effect and when overdriven will cause odd harmonic distortion. Even harmonic distortion is said to sound cleaner and more natural compared to the more grungy sounding odd harmonic distortion, but qualifications like this are actually quite subjective and depend a lot on how the distortion is applied. When simulating even harmonic distortion it is much harder to keep the effect in check than with odd harmonic distortion, as the asymmetric effect can cause the positive part of a signal to quickly reach headroom levels and cause clipping.
Even harmonic distortion can make a filter sound steeper, a technique that will be explained later, but it is a tricky technique that needs attention to prevent the mentioned possible clipping. Simulating odd harmonic distortion is less accident-prone, as both the positive and negative parts of a waveform are attenuated more as the signal level rises, so it can actually prevent signals from clipping. In fact, on many analog synthesizers the VCA circuit that comes after the filter circuit is allowed to be overdriven to reduce jumps in signal levels when the filter is set to a very high resonance value. This way the overdrive effect acts as a sort of signal level limiter. In practice analog electronic components have a limit to their working range, e.g. it is impossible to amplify a signal to a level that would exceed the power supply voltages. When a level is close to a power supply voltage the amplifier starts to refuse to amplify further, which creates a saturation effect. As this is a symmetrical effect, devices like radio tubes create not only even but also odd harmonics. In this case it is common to talk about total harmonic distortion.
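
The relation between the symmetry of the curve and the kind of harmonics it produces can be checked numerically. The following is only a minimal sketch: the two curves (`asymmetric` and `symmetric`) and their parameters are hypothetical stand-ins for a tube-like and a tape-like characteristic, not models of real devices. A pure sine is fed through each curve and the level of the second and third harmonic is measured.

```python
import math

def harmonic_level(signal, harmonic):
    """Amplitude of one harmonic, assuming `signal` holds exactly one period."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * harmonic * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * harmonic * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def asymmetric(x, amount=0.3):
    # Hypothetical tube-like curve: the positive half is amplified
    # slightly more than the negative half.
    return x + amount * x * x

def symmetric(x, drive=2.0):
    # Hypothetical tape-like saturation: identical for both polarities.
    return math.tanh(drive * x)

N = 1024
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
even_case = [asymmetric(s) for s in sine]   # gains a 2nd harmonic, no 3rd
odd_case = [symmetric(s) for s in sine]     # gains a 3rd harmonic, no 2nd
```

Note that the asymmetric curve also adds a constant offset to the signal, which is one reason why the positive part reaches the headroom limit sooner.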

When there are chords or inharmonic sounds in the audio material that is distorted, the partials start to interact with each other and create intermodulation or IM distortion. Note that both even and odd harmonic distortion also create IM distortion. As distortion is always the result of some nonlinear effect, the new partials produced by the distortion will have frequencies that are the sums and differences of the frequencies of the partials in the original signal. This means that if a pure fifth is played with pitches at 220 Hz and 330 Hz, a partial at 110 Hz will be produced, as 110 Hz is the difference between 220 Hz and 330 Hz. This 110 Hz partial will start to act as a subharmonic that gives a low bottom to the sound. Electric guitar players use this IM distortion principle almost unconsciously. The main reason why guitar amps and speaker cabinets have relatively high distortion figures is to have the IM distortion produce a grungy low bottom end in the guitar sound when chords are played. It is quite important to have the instrument well tuned, as when it is not properly in tune the IM distortion also creates a strong and probably unwanted beating at a low frequency. An important thing to note is that IM distortion is like a recursive process, meaning that the partials produced by the distortion will also immediately intermodulate with the original and newly created partials. This effect increases exponentially when the distortion depth is increased. E.g. if the fifth from the previous example was tuned at 220 Hz and 331 Hz there would be a new partial at 111 Hz (331-220). This partial would also intermodulate with the 220 Hz partial and create a new partial at 109 Hz (220-111). And the new 111 Hz partial would intermodulate with the new 109 Hz partial to produce a beating at 2 Hz (111-109). A guitar player can tune his guitar strings to get just the right effect for the relatively few chords used in most pop songs.
He can also correct the tuning of chords by bending some strings and even use the beating as an expressive effect. But on common synthesizers this type of individual voice bending control is lacking. So, distortion should be used with care on synthetic sounds to prevent unwanted strong beating effects. E.g. a lot of distortion on a chorused or unison sound will in general sound very nervous, as the distortion strongly exaggerates the subtle beating that is already present in the unison effect. Deep distortion on a reverberated sound is considered pretty awful by most people, and is indeed hardly usable, even as a special effect.
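
The difference tone from the example of the fifth can be reproduced with a few lines of code. This is a sketch under stated assumptions: the nonlinearity `distort` is a hypothetical curve with a single squared term, and the sample rate and signal length are chosen so that each DFT bin is exactly 1 Hz wide.

```python
import math

SAMPLE_RATE = 8000  # one second of audio, so analysis bin k corresponds to k Hz

def level_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (single-bin DFT over one second)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def two_tone(f1, f2):
    """Two sine partials of equal level, one second long."""
    return [0.5 * math.sin(2 * math.pi * f1 * i / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * f2 * i / SAMPLE_RATE)
            for i in range(SAMPLE_RATE)]

def distort(x, amount=0.5):
    # Hypothetical asymmetric nonlinearity; any nonlinear curve intermodulates.
    return x + amount * x * x

clean = two_tone(220, 330)
fifth = [distort(s) for s in clean]                    # new partial at 110 Hz
detuned = [distort(s) for s in two_tone(220, 331)]     # new partial at 111 Hz
```

With this simple curve only the first-order sum and difference tones (and the second harmonics) appear. The further products mentioned above, such as the 109 Hz partial, need a curve with higher-order terms or a deeper distortion.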

The trick to applying distortion is to apply it only to specific frequency bands. To do so the sound must first be split up into different frequency bands by using crossover filters. It hardly pays to apply distortion to the frequency band above 2.5 kHz if the sound is already quite bright. But when used with care it can freshen up a dull sound, e.g. aural exciters are based on adding subtle distortions to the very high ranges of the sound spectrum. Subtle distortion in the range between 500 Hz and 2.5 kHz can greatly enhance the presence of a sound in a mix and can be an important method to improve the overall sound. Distortion below 500 Hz can easily make the bass range sound muddy, so it should be used quite consciously. It depends a lot on how the bass and the kick drum work together. In general it is best to apply distortion separately to the bass and the kick before they are mixed together, to prevent strong IM distortion between the bass and the kick.
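
A band-limited distortion along these lines could be sketched as follows. The assumptions are considerable: the crossover is built from simple one-pole lowpass filters (real crossovers are usually much steeper), the function names are hypothetical, and `tanh` stands in for whatever distortion curve is preferred. Only the band between the two crossover frequencies is distorted, and the three bands are then summed again.

```python
import math

def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """Very gentle (6 dB/octave) lowpass filter."""
    coeff = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state, out = 0.0, []
    for s in signal:
        state += coeff * (s - state)
        out.append(state)
    return out

def split_three_bands(signal, sample_rate, low_hz=500.0, high_hz=2500.0):
    """Split into low, mid and high bands that sum back to the input."""
    low = one_pole_lowpass(signal, low_hz, sample_rate)
    below_high = one_pole_lowpass(signal, high_hz, sample_rate)
    mid = [b - l for b, l in zip(below_high, low)]
    high = [s - b for s, b in zip(signal, below_high)]
    return low, mid, high

def distort_mid_band(signal, sample_rate, drive=2.0):
    """Apply soft saturation to the 500 Hz - 2.5 kHz band only."""
    low, mid, high = split_three_bands(signal, sample_rate)
    shaped = [math.tanh(drive * m) / drive for m in mid]  # unity gain for small levels
    return [l + m + h for l, m, h in zip(low, shaped, high)]
```

Because the bands are derived by subtraction they add up to the original signal again, so with very low levels in the mid band the output is practically identical to the input.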

Transfer function

All types of waveshaping and distortion that work by manipulating the momentary amplitude level can be drawn in a simple graph that shows the transfer function. The horizontal axis of the graph spans the range of all possible input values and the vertical axis spans the range of all possible output values. In most cases this graph will have linear scales on both axes. To work with the graph, a momentary value that is found on the vertical axis of the earlier mentioned waveform plot is looked up on the horizontal axis of the transfer function plot. The transformed value can be found on the vertical axis of the transfer function plot and this value will substitute the value in the original waveform plot. If the line on the transfer function plot is curved or has sudden corners the function is nonlinear. A straight line would only give linear amplification or attenuation, depending on the angle of the line. Basically any function that produces a curved or cornered line will produce some waveshaping or distortion effect. If the input is a sawtooth waveform that spans the full dynamic range, the resulting waveform will have the same shape as the line in the graph. This particular case is very useful to understand what actually happens in the distortion process. A first observation is that, as the sawtooth contains all possible harmonics with smoothly decaying amplitudes for the higher harmonic numbers and so is free of formants, the new waveform will have some harmonics enhanced and others attenuated. So, the new waveform will have formants. These formants will have a place in the audio spectrum that is relative to the pitch of the waveform; increasing the pitch will also shift the new formant areas up in the audio spectrum. Another observation is that if there are corners in the transfer function the new waveform will also have corners, and these will contain much sonic energy in the highest part of the audio spectrum.
So, if the graph is not smoothly curved the resulting waveform will sound fuzzy. But if the graph is a smoothly curved line the new waveform will have a more grungy character, meaning that the harmonics just above the fundamental will be enhanced, and little energy is added in the very high parts of the audio spectrum. This means that the energy in the melodic part of the audio spectrum is enhanced, which can increase the perceived presence of the sound in a mix without having to boost the overall volume of that sound. This effect is mainly based on psychoacoustic principles, or how and where the mind tends to focus in a mix. Needless to say this is an important technique in mixing and mastering. Still, using distortion to improve a mix is a subtle and delicate art that needs quite an amount of practice. First it must be determined where in the mix more presence is needed, and then a proper technique must be applied with subtlety to get a result that is not overdone but leads to just about the right balance, as only the presence should be increased and the effect should not sound distorted. It is impossible to give recipes that always work, as different material will probably need different treatment. There is a lot of intuition involved here. The only available tools to judge the final results are your ears, so careful listening is highly recommended.
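
In code, applying a transfer function simply means replacing every momentary value by the value the function maps it to. The sketch below illustrates the sawtooth case: the output samples trace out the transfer curve itself. The curve `cubic_curve` is a hypothetical smooth example chosen only for illustration.

```python
def apply_transfer(signal, transfer):
    """Replace every momentary value by the value the transfer function maps it to."""
    return [transfer(x) for x in signal]

def sawtooth(num_samples):
    """One period of a full-range sawtooth rising from -1 to +1."""
    return [-1.0 + 2.0 * i / (num_samples - 1) for i in range(num_samples)]

def cubic_curve(x):
    # Hypothetical smooth, symmetric transfer curve (a gentle soft saturation).
    return 1.5 * x - 0.5 * x ** 3

def straight_line(x, gain=0.5):
    # A straight line through the origin: plain attenuation, no new harmonics.
    return gain * x

saw = sawtooth(9)
shaped = apply_transfer(saw, cubic_curve)      # same shape as the curve itself
quieter = apply_transfer(saw, straight_line)   # still a sawtooth, only softer
```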

Waveshaping

On an analog oscillator with multiple waveform outputs the waveforms are internally derived from one basic waveform by means of a technique named waveshaping. In most cases the oscillator itself generates a sawtooth waveform. As the input to the waveshaping transfer function is a sawtooth, the graph of the transfer function is equal to the new waveform to be created from the sawtooth waveform. On a digital system a lookup table can be used; the momentary sawtooth value will in this case be the index to get a value from the lookup table. By describing the new waveform in the lookup table a sawtooth waveform can be transformed into virtually any new waveform. This technique is sometimes named wavetable synthesis. If there are more lookup tables stored in the system, dynamically changing waveforms can be created by smoothly crossfading between the results of two or more lookup table transfers. Instead of lookup tables specific functions can be used to get specific waveforms, e.g. using the momentary sawtooth value as input for a sine function will generate a sine wave. Analog systems will use the specific properties of certain electronic components like diodes, or devices like opamps or comparators, to create the transfer functions that transform the sawtooth waveform into other waveforms.
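
A lookup table transfer with crossfading between two tables might look like the sketch below. The table size, the linear interpolation between table entries and all function names are assumptions made for this illustration.

```python
import math

TABLE_SIZE = 256

def make_table(shape_fn):
    """Sample one period of shape_fn(phase), with phase running from 0 to 1."""
    return [shape_fn(i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def lookup(table, saw_value):
    """Use a sawtooth value in the range -1..+1 as the index into a lookup table."""
    position = (saw_value + 1.0) / 2.0 * TABLE_SIZE
    index = int(position) % TABLE_SIZE
    nxt = (index + 1) % TABLE_SIZE
    frac = position - int(position)
    # Linear interpolation between neighbouring table entries.
    return table[index] + frac * (table[nxt] - table[index])

def morph(table_a, table_b, saw_value, mix):
    """Crossfade between the results of two table lookups (mix 0 = A, 1 = B)."""
    return (1.0 - mix) * lookup(table_a, saw_value) + mix * lookup(table_b, saw_value)

sine_table = make_table(lambda p: math.sin(2.0 * math.pi * p))
square_table = make_table(lambda p: 1.0 if p < 0.5 else -1.0)
```

Sweeping `mix` slowly from 0 to 1 while the sawtooth keeps running gives a waveform that gradually changes from a sine into a square.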

Following is a description of common methods to create the more common waveforms found on analog oscillators. The pulse waveform is derived from a sawtooth by comparing the current level of the sawtooth to a constant value. When the current level is greater than this value the pulse output will be positive, and when the current level is less the pulse output will be negative. The transfer function plot will show a straight vertical line. Every input value to the left of this line will transform to the maximum negative value and every value to the right of the line will transform to the maximum positive value. Varying the compare level by e.g. a slow triangle waveform will achieve pulsewidth modulation. Basically the vertical line in the transfer function will be shifted from left to right and back again. A triangle waveform can be derived from the sawtooth waveform by folding down the upper half of the sawtooth waveform. Alternatively the upper quarter of the sawtooth can be folded down while the lower quarter of the sawtooth is folded upwards, until their ends meet. The triangle waveform can be changed into a sinewave. On an analog oscillator this is often done by feeding the triangle through a device that has a voltage dependent resistance. On a cheaper system two diodes are used, although diodes will not produce a very pure sinewave. This method also needs careful trimming to get the least harmonic distortion in the sinewave. On a digital system a much purer sinewave can be created by either using a lookup table that describes a sine wave or by using a mathematical function based on what is known as a Taylor series evaluation. This last method can be computed quite efficiently and can produce a very pure sinewave without having to use a long sine function lookup table stored in memory.
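
The three derivations described above can be sketched in a few lines. This is an illustration only: the sawtooth is assumed to span -1 to +1, the triangle is rescaled to the full range after folding, and the Taylor series is cut off after four terms, which is an arbitrary choice.

```python
import math

def pulse(saw, compare_level=0.0):
    """Comparator: +1 when the sawtooth is above the compare level, otherwise -1."""
    return 1.0 if saw > compare_level else -1.0

def triangle(saw):
    """Fold the upper half of the sawtooth down, then rescale to the full range."""
    return 1.0 - 2.0 * abs(saw)

def taylor_sine(x):
    """Four-term Taylor series for sin(x), intended for x within -pi/2..+pi/2."""
    x2 = x * x
    # x - x^3/6 + x^5/120 - x^7/5040, written in nested form.
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))

def sine_from_triangle(tri):
    """Shape a full-range triangle value into a sine value."""
    return taylor_sine(tri * math.pi / 2.0)
```

Modulating `compare_level` slowly gives pulsewidth modulation. With four terms the sine approximation deviates by less than about 0.0002 from the true sine, and more terms reduce the error further.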

The mentioned techniques are used to create the waveforms that are commonly used on analog synthesizers, but those waveforms can be manipulated further to create more waveforms with certain desirable sonic properties. The basic waveforms, except for asymmetrical pulse waveforms, all have a harmonic series that falls off smoothly, meaning that there are no strong formant properties in the timbre. To create more characteristic timbres waveshaping should introduce formants, and in most cases it will.

A common approach to creating suitable transfer functions for waveshaping is to divide the input range into two or more segments. In the transfer function graph these segments show on the horizontal axis. The angle of the transfer curve line differs for each segment. The graph lines for each segment do not necessarily have to join. If they do not join, a sharp vertical transient is created in the final waveform when the input value crosses the border between the two segments. If the segment lines do join, a corner is created in the final waveform. Technically the segments can be created by using one or more voltage or level comparators that control a set of switches. Each switch passes on the input signal with a controllable amplification factor plus an additional variable level offset. The offset levels can be set in such a way that the line segments in the transfer function graph join ends to suppress unwanted transients. There are several variations possible on how the comparators and switches can be set up, which is up to the synth designer. The G2 system offers a module named a control sequencer, which is a very convenient setup that divides the input range into sixteen equally spaced segments. This module can be set to interpolate between sixteen slider values that provide the parameters for each segment. If the input signal amplitude varies between 0 and +60 units a very flexible waveshaper is created. The input waveform oscillator that drives this waveshaper can e.g. be a shaper oscillator set to the waveform that morphs between a triangle and a sawtooth. The basic waveform can be set by ‘drawing’ the waveform with the sliders, and then the timbre can be dynamically altered by modulating the triangle-to-sawtooth input waveform with e.g. a low frequency oscillator.
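
A segmented transfer function with joined line segments can be sketched as a piecewise-linear interpolation between equally spaced ‘slider’ values. This is a generic illustration and not a model of the G2 control sequencer: the input is assumed to be normalised to the range 0..1, and `segment_shaper` is a hypothetical name.

```python
def segment_shaper(x, sliders):
    """Piecewise-linear transfer function through equally spaced slider values.

    x is the input value in the range 0..1; sliders holds the output level
    at each of the equally spaced points along the input range.
    """
    x = min(max(x, 0.0), 1.0)
    position = x * (len(sliders) - 1)
    index = min(int(position), len(sliders) - 2)
    frac = position - index
    return sliders[index] + frac * (sliders[index + 1] - sliders[index])

# Sixteen hypothetical slider settings that 'draw' a waveform.
sliders16 = [0.0, 0.4, 0.9, 1.0, 0.6, 0.2, 0.3, 0.7,
             0.5, 0.0, -0.4, -0.2, -0.8, -1.0, -0.6, 0.0]
ramp = [i / 63.0 for i in range(64)]
drawn = [segment_shaper(x, sliders16) for x in ramp]
```

Because neighbouring segments share their end points, the resulting waveform has corners but no vertical jumps.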

Clipping

Clipping clips off the top or both the top and the bottom of a waveform. Depending on the original waveform the effect can range from subtle to quite extreme. The transfer function plot is divided into three segments. The middle segment shows a straight diagonal line at an angle of 45 degrees going through the centre or origin of the plot. When the graph line reaches the left and the right segments the line makes a corner and becomes horizontal in both outer segments. Clipping can produce a lot of high harmonics and often works best on a raw waveform before it is fed into a filter. When a moderate amount of clipping is used on a sawtooth or a triangle waveform it will increase the presence of the fundamental in the waveform, giving the final sound a bit more beef without destroying the basic character of the sawtooth or triangle waveforms. In most cases the clip levels are controlled by a fixed value and cannot be modulated. But an interesting modulation effect is created by adding a slow triangle waveform to the audio waveform before it enters a clipper module. The sonic effect is that of a lively change in timbre that sounds related to pulsewidth modulation. Note that clipping only changes the timbre of waveforms that have smoothly rising or falling slopes, e.g. on square and pulse waveforms clipping has no effect on the timbre at all, it can at most change their level.
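
A hard clipper and the described modulation trick fit in a few lines. The names and default clip levels below are assumptions for this sketch.

```python
def clip(x, low=-0.5, high=0.5):
    """Three-segment transfer function: unity gain in the middle, flat outside."""
    return max(low, min(high, x))

def clip_modulated(x, offset, low=-0.5, high=0.5):
    """Add a slowly varying offset (e.g. a slow triangle) before the clipper."""
    return clip(x + offset, low, high)

triangle_wave = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
square_wave = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]

clipped_triangle = [clip(s) for s in triangle_wave]  # tops are flattened
clipped_square = [clip(s) for s in square_wave]      # still a square, only softer
```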

A disadvantage of many clipping modules is that when the clipping levels are changed, the overall amplitude of the output is limited to those same levels. This means that the volume can drop significantly when clipping levels are set to more extreme values. There is a very simple way to overcome this by using the clipper in a feedback loop with a mixer. The idea is that even with great amounts of feedback the clipper will always clip the feedback signal to the set levels. So, by setting the clip levels to fixed values and controlling the amount of feedback, the output amplitude will remain constant and it will become easier to work with clipping. Of course this only works when both the top and the bottom are clipped. By using a three input mixer both the amount of clipping and the clipping modulation can be conveniently controlled. The output of the mixer is fed into the input of the clipper, while the output of the clipper is fed back into one of the mixer inputs. The second mixer input receives the audio signal, while the third input can receive a slowly varying modulation signal. The final output is taken from the output of the clipper module. Instead of a slowly varying waveform a second audio waveform can be used as well. Tuning this second waveform to a just pitch ratio, e.g. to 2:3 or 3:4, can create thick and bright sonic results. Adding even more audio waveforms and using inharmonic detune ratios can give thick metallic timbres, useful as basic material for bright and metallic percussive sounds like metal can hits, gong sounds, etc.

An alternative to the G2 Clipper module is to use a modulatable crossfader module that crossfades between two fixed values. The advantage of the modulatable crossfader is that the clipping action takes place on the modulation input: if the value on the modulation input exceeds either -64 or +64 the crossfader will stay fixed to either the A or the B crossfader input. This means that when e.g. a triangle waveform is fed into the crossfade position modulation input, the crossfader output will vary between the two fixed levels on the A and B inputs. The modulation level setting will set the clipping sensitivity, while the values on the A and the B input set the minimum and maximum levels the waveform will clip to. As these A and B values can be set to any value, this clipper setup can force a waveform into a clearly defined amplitude range that it can never exceed. So, the two fixed A and B values define the levels of the horizontal lines in the outer two segments, while the middle segment line always joins at its two ends with the horizontal lines of the outer two segments.
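
The crossfader-as-clipper idea can be sketched as follows. The -64..+64 range for the modulation input is taken from the text; the linear crossfade law and the `sensitivity` parameter are assumptions, and `crossfader_clipper` is a hypothetical name rather than the behaviour of a specific module.

```python
def crossfader_clipper(mod, a, b, sensitivity=1.0):
    """Crossfade between the fixed levels a and b.

    The crossfade position saturates when the scaled modulation input
    goes beyond -64 or +64, which is where the clipping takes place.
    """
    position = max(-64.0, min(64.0, mod * sensitivity))
    mix = (position + 64.0) / 128.0   # 0 = fully A, 1 = fully B
    return a + mix * (b - a)

# A triangle on the modulation input, clipped into the range -0.5..+0.25.
triangle_mod = [-128.0, -64.0, 0.0, 64.0, 128.0, 64.0, 0.0, -64.0]
shaped = [crossfader_clipper(m, -0.5, 0.25) for m in triangle_mod]
```

Raising `sensitivity` makes the input reach the saturation points sooner, so more of the waveform is clipped while the output range stays exactly between the A and B levels.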

Soft clipping

The transfer function for soft clipping is very similar to the transfer function of a clipping module. The difference is that as the middle line segment in the clipper transfer function approaches the minimum or maximum limits it starts to bend smoothly towards the limits, so the corners are softened. Soft clipping produces much less energy in the very high parts of the audio spectrum, making it sound less fuzzy. Soft clipping is often more useful than straight clipping, unless a very large amount of very high harmonics is needed. Soft clipping is often created by using a slightly curved line that is derived from a simple exponential function. This method is computationally simple and quite effective. An even better result is achieved by using a sine function where the range between -90 degrees and +90 degrees fills up the middle segment. This will result in a nicely grungy soft clipping effect where the newly produced harmonics are well balanced and sound a bit more organic than when using exponential functions.
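
The sine-based soft clipper can be written down directly. This is a minimal sketch in which the clip limit is a parameter and the input is first limited to that range, so that everything beyond the limits maps onto the flat outer segments.

```python
import math

def sine_soft_clip(x, limit=1.0):
    """Soft clipper: -90..+90 degrees of a sine fills the segment between the limits."""
    x = max(-limit, min(limit, x))
    return limit * math.sin(0.5 * math.pi * x / limit)

def hard_clip(x, limit=1.0):
    """Straight clipper with sharp corners, for comparison."""
    return max(-limit, min(limit, x))
```

In this form the curve arrives at the limits with a slope of zero, so it joins the horizontal outer segments without a corner. Note that the slope around zero is about 1.57 instead of 1, so low level signals come out somewhat louder than with the straight clipper.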

Wavewrapping

Wavewrapping uses folding circuitry similar to what is used to derive a triangle wave from a sawtooth wave. But wavewrapping offers the possibility to dynamically fold the top and bottom in such a way that ‘multiple folds’ can be created. When applied to a triangle waveform, modulating the wavewrapping amount creates an effect that is sonically very similar to using hardsync on a triangle wave, meaning that it results in a strong sweeping formant effect. The effect on other waveforms can be quite harsh, as it can create even more high harmonics than clipping does. Wavewrapping works very well on both triangle and sawtooth waves when the amount of wrapping is set to a fixed level and a slow triangle waveform is mixed with the audio waveform before it is fed into the wavewrapper module. The amount of the low frequency modulation signal is best set to only one third or less of the fixed amount of wavewrapping. This will create a lively effect that can be enhanced by using an extra chorus. Setting the overall envelope attack rather slow and using long note decay times will result in characteristic pad sounds with a slightly ethereal quality. Replacing the low frequency oscillator with an AD envelope generator can create a characteristic attack.
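
Folding with ‘multiple folds’ can be sketched as a reflection that is repeated as often as needed. The function names and the way the amount and the modulation offset are combined are assumptions for this illustration, not a description of a particular module.

```python
def fold(x, limit=1.0):
    """Reflect x back into the range -limit..+limit as often as needed."""
    period = 4.0 * limit
    y = (x + limit) % period
    if y > 2.0 * limit:
        y = period - y
    return y - limit

def wavewrap(x, amount, offset=0.0):
    """Scale the input by the wrap amount, add a slowly varying offset, then fold."""
    return fold(amount * x + offset)

triangle_wave = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
wrapped = [wavewrap(s, 3.0) for s in triangle_wave]
```

With an amount of 3 a full-range triangle is folded back at both the top and the bottom, which turns one triangle cycle into three smaller ones. Sweeping the amount gives the sweeping formant effect mentioned above.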

Nonlinear waveshaping

Virtually any mathematical function that uses one input and one output value can be used to produce nonlinear waveshaping. Such functions can use extra control values that dynamically alter the transfer function. In computer graphics smooth curves can be drawn with functions named Bezier curves and B-spline or cubic spline curves. For smooth graphic curves these functions are used in two dimensions for curved line segments or in three dimensions for curved surfaces. They can also be used in one single dimension and then can be used directly to modify a waveform into another waveform. The idea behind such functions is actually quite simple. Imagine first that there are two control values and a crossfader that fades between these two values. The audio input signal is used to control the position of the crossfader. What will happen is that the waveform on the output of the crossfader is a copy of the input waveform, but its minimum and maximum amplitude values will be equal to the two control values. By changing the two control values the waveform amplitude is attenuated and shifted up and down. Basically, if the crossfader is at one end only that control value will define the output, and if the crossfader is at the other end the other control value will define the output. In the middle the effect of both control values will be fifty-fifty. Now imagine that the crossfader is adapted in such a way that in the middle the effect of both control values is only 25%. But a third control value is added in such a way that its effect in the middle is full but at both extreme ends it is zero. The curve for this third value must be smooth, like a bell shape. The three curves for the control values can be chosen in such a way that if the end values are +1 and -1, the middle value is zero and the input waveform is a sinewave, the output is also a sinewave. But when the three control values are set to randomly chosen values the input sinewave will be shaped into another smooth waveform.
The curves for the three control values are named blending curves and define how much effect the corresponding control value will have on the output value for a certain input value. The more control values the finer the control on the resulting waveshapes.
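
One possible choice of blending curves is the set of quadratic Bernstein functions that is also used for Bezier curves. This is an assumption for the sketch below, not necessarily the set of curves meant above: the two end curves indeed have a 25% effect in the middle, but the bell-shaped middle curve peaks at 50% rather than at full effect. With the control values -1, 0 and +1 the output is an exact copy of the input, so a sinewave stays a sinewave.

```python
def blend_weights(t):
    """Quadratic Bernstein blending curves for three control values; t runs 0..1."""
    return ((1.0 - t) ** 2, 2.0 * t * (1.0 - t), t ** 2)

def blend_shaper(x, low, mid, high):
    """Shape an input value x in the range -1..+1 with three control values."""
    t = (min(max(x, -1.0), 1.0) + 1.0) / 2.0   # crossfader position 0..1
    w_low, w_mid, w_high = blend_weights(t)
    return w_low * low + w_mid * mid + w_high * high
```

Setting the control values to something else, e.g. `blend_shaper(x, -1.0, 0.8, 0.2)`, bends the transfer function into a smooth curve and so shapes the input into another smooth waveform. More control values need higher-order blending curves of the same family.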