Audio Asylum Thread Printer
This has been bothering me for a long time. It's one of those things I don't understand that haunts me.
It is generally accepted that we can't hear even surprising amounts of phase distortion. If you look at the phonograph record processing and reproduction chain (bunches of filters, and each RIAA filter has its own design and phase distortions in recording and playback) there is an absolute boatload of phase distortion between original performance and the signal hitting your speakers. And yet records can sound awfully good.
So what? Just accept it and move on. I can't. Here's the pickle:
I don't think it's just a matter of phase distortion. I think that phase distortion of real-world signals MUST affect dv/dt and slew rate requirements in unpredictable ways.
A square wave can be thought of as being the sum of an infinite number of odd harmonics at specific amplitude ratios, all in phase. Passing a perfect square wave requires infinite frequency response and slew rate, so square waves can only be approximated in the real world.
Now picture two sine waves or other signals of fixed "shape" but shifting frequency. I'm thinking here of two (or more) voices singing in unison but with vibrato. As they independently slide around the central frequency the phase relationship varies unpredictably. So the needed slew rate varies too, but the peak slew rate could be very high. Now add a chorus of voices or instruments all doing vibrato and you have a Fourier function that has a whole mess of high-frequency components. Add the phase vagaries of the recording/playback chain, and the problems compound. I think that may be why recording choruses is especially difficult: the slew rate/frequency requirement may be just awful, far in excess of what's apparent from the natural frequency content of the human voice. And there are woefully few good chorus recordings.
Walt Jung and others have written extensively about the need for high slew rate. I think I've become a believer. But that's not going to fix recordings already made; you can't unclip peaks.
I tried to model sine waves with variable phase relations (setting d^2(sin x + sin y)/dt^2 = 0 to find the maximum rate of change) but no longer have the math skills.
A couple of thoughts:
Whatever that group of voices or instruments presents to a single microphone is simply the vector sum of all those magnitudes and phases. The concern is what comes out of the loudspeaker when that signal, faithfully captured, is fed in.
Looking at frequency is useful when one has steady-state signals, BUT the idea of frequency doesn't exist until some period of time has passed. For example, take a super-low-distortion oscillator and an FFT analyzer and examine the oscillator output. It has one frequency component and "no" other harmonics (being very low distortion). Now what happens if the signal isn't steady? Change the oscillator output's amplitude and you see the signal covers a bandwidth and is no longer a single frequency. Modulate that signal so that it has a 5-cycle "envelope" with a Gaussian shape and you see the low-distortion oscillator signal is about 1/3 of an octave wide. The point being that large, fast changes in level require more bandwidth than a steady-state sine-wave view suggests.
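A rough numerical illustration of the bandwidth point, as a sketch under stated assumptions: a 1 kHz tone, a 48 kHz sample rate, and a Gaussian envelope whose full width at half maximum is 5 cycles (one possible reading of "5-cycle envelope"; the width you get depends on how both the envelope and the bandwidth are defined). The steady tone falls in a single 1 Hz FFT bin, while the burst spreads over about 125 Hz at its -3 dB points, in line with the textbook Gaussian result sqrt(ln 2)/(pi*sigma). The helper name `minus_3db_bandwidth_hz` is hypothetical.

```python
import numpy as np

def minus_3db_bandwidth_hz(x, fs):
    """Width of the spectral region within 3 dB of the peak, at FFT-bin resolution.

    Counts bins above peak/sqrt(2), so it assumes a single contiguous spectral peak.
    """
    mag = np.abs(np.fft.rfft(x))
    bin_hz = fs / len(x)
    return np.count_nonzero(mag >= mag.max() / np.sqrt(2.0)) * bin_hz

fs = 48_000                    # sample rate, Hz (assumed)
f0 = 1_000.0                   # oscillator frequency, Hz (assumed)
t = np.arange(fs) / fs         # one second of samples, so FFT bins are 1 Hz apart

steady = np.sin(2 * np.pi * f0 * t)

# Gaussian envelope centred at 0.5 s whose full width at half maximum is 5 cycles of f0
fwhm_s = 5.0 / f0
sigma_s = fwhm_s / (2.0 * np.sqrt(2.0 * np.log(2.0)))
burst = np.exp(-0.5 * ((t - 0.5) / sigma_s) ** 2) * steady

# Textbook -3 dB width of a Gaussian-windowed tone
predicted_hz = np.sqrt(np.log(2.0)) / (np.pi * sigma_s)

print(f"steady tone: {minus_3db_bandwidth_hz(steady, fs):.0f} Hz (one bin)")
print(f"5-cycle burst: {minus_3db_bandwidth_hz(burst, fs):.0f} Hz, predicted {predicted_hz:.0f} Hz")
```

Making the envelope shorter (fewer cycles) widens the spectrum in inverse proportion, which is the time/bandwidth trade-off described above.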
You're right to be concerned about dynamics too. Many recordings are compressed within an inch of their lives, and then on playback the signal chain is very often clipped somewhere, often at the amplifier. You can't hear short-term clipping as a flaw the way you can when it lasts longer, but it does sound different: less dynamic than the same signal unclipped. If you have an oscilloscope, examine the amplifier output for flat-topping on transients.
The maximum output level, maximum frequency, and maximum slew rate are related by:
(dV/dt)max = 2 * pi * Vmax * fmax
where (dV/dt)max is the maximum slew rate required
fmax is the maximum frequency the circuit may be expected to encounter
Vmax is the maximum output voltage
So the wider the bandwidth, the higher the slew rate required. And for a power amp, the higher the power output, the higher the slew rate required. For a given maximum slew rate, you can trade off maximum output against bandwidth.
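The formula comes from differentiating a full-scale sine, Vmax*sin(2*pi*fmax*t), whose steepest slope is 2*pi*fmax*Vmax. A small sketch of the formula and of the inverse trade-off (the highest full-amplitude frequency a given slew rate allows, often called full-power bandwidth); the function names are illustrative only.

```python
import math

def required_slew_v_per_us(v_max, f_max_hz):
    """Slew rate a full-amplitude sine at f_max needs: 2*pi*Vmax*fmax, in V/us."""
    return 2 * math.pi * v_max * f_max_hz / 1e6

def full_power_bandwidth_hz(slew_v_per_us, v_max):
    """Inverse trade-off: highest full-amplitude sine frequency a given slew rate allows."""
    return slew_v_per_us * 1e6 / (2 * math.pi * v_max)

print(f"{required_slew_v_per_us(40, 22_050):.2f} V/us")     # 5.54 V/us
print(f"{required_slew_v_per_us(50, 100_000):.1f} V/us")    # 31.4 V/us
print(f"{full_power_bandwidth_hz(40, 50) / 1e3:.0f} kHz")   # 127 kHz
```

The first two lines reproduce the 40 V / 22.05 kHz and 50 V / 100 kHz figures worked out later in the thread; the last shows that an amplifier slewing 40 V/us with a 50 V peak swing could reproduce a full-amplitude sine up to roughly 127 kHz.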
Often, discussions of slew rate drift into TIM (transient intermodulation distortion) and how it relates to NFB (negative feedback). I've heard it said anecdotally several times that early solid-state designs had low open-loop bandwidth and needed a lot of NFB to achieve low distortion and the desired bandwidth (Google "gain-bandwidth tradeoff"). But if their slew rate was also low, they could run into slew-rate limiting that NFB cannot fix.
Now picture two sine waves or other signals of fixed "shape" but shifting frequency. I'm thinking here of two (or more) voices singing in unison but with vibrato. As they independently slide around the central frequency the phase relationship varies unpredictably. So the needed slew rate varies too, but the peak slew rate could be very high. Now add a chorus of voices or instruments all doing vibrato and you have a Fourier function that has a whole mess of high-frequency components.
If the circuit is linear, then summing up a bunch of different voices at different pitches superimposes their frequency content but doesn't generate new frequencies. For example, summing a sine wave at f1 with another sine wave at f2 just results in a signal containing f1 and f2. If f1 and f2 are close together, then you may hear a beat frequency at f1-f2 but there is no such frequency component in the signal. In this case, human auditory processing is working like an envelope detector.
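A small numerical check of the beat-frequency point, using two hypothetical tones at 440 Hz and 444 Hz. The spectrum of the summed signal has energy only at 440 and 444 Hz; squaring the signal (a crude stand-in for an envelope detector, assumed here for illustration and not a model of hearing) is what creates a 4 Hz component.

```python
import numpy as np

fs = 8_000
t = np.arange(fs) / fs                     # one second, so FFT bins are 1 Hz apart
f1, f2 = 440.0, 444.0                      # hypothetical pitches 4 Hz apart
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

beat_bin = int(f2 - f1)                    # 4 Hz lands in bin 4
linear = np.abs(np.fft.rfft(x))            # spectrum of the summed signal itself
detected = np.abs(np.fft.rfft(x ** 2))     # after a square-law (envelope-style) detector

print(f"4 Hz in the sum:     {linear[beat_bin] / linear.max():.1e} of the peak")
print(f"4 Hz after squaring: {detected[beat_bin] / linear.max():.2f} of the sum's peak")
```

The nonlinearity, not the summing, is what produces the difference frequency; a linear amplifier passing x unchanged adds nothing at 4 Hz.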
Assuming we're working with linear circuits (or at least approximately linear), the required slew rate is dependent on the bandwidth of the signal and the output level per the formula above. It doesn't really matter how complex the signal is, just its maximum bandwidth.
I think you're missing the point about summing sliding sine waves. If two (or two dozen) sine waves are "vibratoing", then occasionally some or (pathological case) all of them could be momentarily in phase, resulting in horrendous amplitude peaks and correspondingly high dV/dt.
Perhaps another way of modelling it would be as noise in the frequency domain. Picture the noise (vibrato) components adding randomly, analogous to the way noise voltages add in the time domain. As in the time domain, occasionally the peaks are large.
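One way to put numbers on this is a quick simulation. The sketch below assumes 24 unit-amplitude voices around 220 Hz, each with its own randomly drawn vibrato rate (4.5 to 6.5 Hz), vibrato depth (2 to 6 Hz) and starting phase; every one of those values is hypothetical. It compares the largest |dV/dt| seen over one second with the worst case in which every voice is at its highest frequency and peaking together (the sum of 2*pi*f over all voices).

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so runs are repeatable
fs = 48_000                           # sample rate, Hz (assumed)
t = np.arange(fs) / fs                # one second of samples
n_voices = 24                         # hypothetical chorus size
f_centre = 220.0                      # hypothetical unison pitch, Hz

# Hypothetical per-voice vibrato parameters
vib_rate = rng.uniform(4.5, 6.5, n_voices)          # Hz
vib_depth = rng.uniform(2.0, 6.0, n_voices)         # Hz of pitch swing
vib_phase = rng.uniform(0.0, 2 * np.pi, n_voices)
start_phase = rng.uniform(0.0, 2 * np.pi, n_voices)

# Instantaneous frequency of each voice (rows) over time (columns)
inst_f = f_centre + vib_depth[:, None] * np.sin(
    2 * np.pi * vib_rate[:, None] * t + vib_phase[:, None])
# Phase is the running integral of 2*pi*frequency
phase = start_phase[:, None] + 2 * np.pi * np.cumsum(inst_f, axis=1) / fs
v = np.sin(phase).sum(axis=0)         # all voices summed at one microphone

slew = np.gradient(v, 1 / fs)         # numerical dV/dt, units per second
peak_slew = np.max(np.abs(slew))
rms_slew = np.sqrt(np.mean(slew ** 2))
# Upper bound: every voice at its highest frequency and peaking together
worst_case = np.sum(2 * np.pi * inst_f.max(axis=1))

print(f"peak |dV/dt| = {peak_slew / worst_case:.0%} of the all-in-phase worst case")
print(f"peak/RMS of dV/dt = {20 * np.log10(peak_slew / rms_slew):.1f} dB")
```

Changing the seed, the number of voices, or the duration changes the numbers, so any single run is only illustrative; running many seeds would give the kind of distribution of "events" discussed further down the thread.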
I do know that I have heard few good choral recordings. Most are accompanied by an amplitude-related ripping sound that I think is intermodulation distortion or TIM or slew-induced distortion. It sounds awful and unmusical. Since it is related to amplitude of complex wave forms (many human voices) I think it's related to the cause I described. And if even a single component in the recording or reproduction chain can't keep up, you get that "wonderful" distortion.
You are correct that some sound waves can line up in phase at certain times, but aside from the harmonics resulting from a single voice or instrument, most won't. As Geoff mentioned, you need some luck. I will try to show that via math at the bottom of this post.
I think what we really care about is crest factor, the ratio of the highest peak level of the music over the RMS (average) level of the music. I've heard that the crest factor of the human voice is around 12 dB, whereas the crest factor for percussion could be over 24 dB. So I would not expect choral music to be the most stressing. Back in the days before 24-bit digital recording, it was typical to use a compressor in the recording chain, at least for percussion. These days, the compression is usually applied after the fact in software. Either way, the end product usually has some compression in it. So the recordings we play back through our systems have crest factors of no more than about 16-18 dB at the high end, in the case of a relatively uncompressed recording from before the loudness wars.
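Crest factor as defined here is easy to measure on a sampled signal. A minimal sketch: a pure sine comes out at about 3 dB, and hard clipping (here at an arbitrarily chosen half of full scale) pushes it lower, which is one concrete sense in which a clipped signal is "less dynamic". For real program material the result depends on the length of the measurement window; `crest_factor_db` is an illustrative name.

```python
import numpy as np

def crest_factor_db(x):
    """Peak-to-RMS ratio of a signal, in dB."""
    x = np.asarray(x, dtype=float)
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

fs = 48_000
t = np.arange(fs) / fs                     # one second
sine = np.sin(2 * np.pi * 1000 * t)        # steady 1 kHz tone
clipped = np.clip(sine, -0.5, 0.5)         # hard clip at half the peak (assumed level)

print(f"sine: {crest_factor_db(sine):.2f} dB")         # about 3.01 dB
print(f"clipped sine: {crest_factor_db(clipped):.2f} dB")
```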
Given the crest factor, the average level you would like to listen at, loudspeaker sensitivity and impedance, how far away you sit from the speakers, and an approximate room gain, you can calculate how much peak amplifier power you need and what the peak voltage will be. Suppose it's 200W and 40V.
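A hedged sketch of that calculation for a single loudspeaker. The simplifying assumptions are mine, not the thread's: sensitivity is dB SPL at 1 m for 1 W into the nominal impedance, level falls by 20*log10(distance) dB (free-field inverse square law), room gain is a flat dB boost, and the load is treated as a plain resistor. The function name and example inputs are hypothetical.

```python
import math

def peak_power_and_voltage(avg_spl_db, crest_db, sens_db_1w_1m,
                           distance_m, room_gain_db, impedance_ohm):
    """Peak electrical power (W) and peak voltage (V) needed at one loudspeaker.

    Assumes sensitivity in dB SPL @ 1 W / 1 m, inverse-square distance loss,
    flat room gain, and a resistive load (all simplifications).
    """
    peak_spl = avg_spl_db + crest_db
    power_db = peak_spl - sens_db_1w_1m + 20 * math.log10(distance_m) - room_gain_db
    peak_power = 10 ** (power_db / 10)                     # dB re 1 W -> watts
    peak_voltage = math.sqrt(peak_power * impedance_ohm)   # from P = V^2 / R
    return peak_power, peak_voltage

# Hypothetical example: 85 dB average, 18 dB crest factor, 87 dB/W/m speaker,
# 3 m listening distance, 3 dB room gain, 8 ohm load
p, v = peak_power_and_voltage(85, 18, 87, 3.0, 3, 8)
print(f"{p:.0f} W peak, {v:.1f} V peak")
```

With those made-up inputs the result lands in the same ballpark as the 200 W / 40 V example; real speaker impedance varies with frequency, so the voltage figure is the more robust of the two.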
Then you pick a maximum frequency. For CD, 22.05 kHz. The formula from my previous post gives a maximum slew rate requirement of 2 * pi * 40 * 22050 = 5.54V/usec. This calculation is conservative because the power spectrum of music falls off at high frequencies, so you would never get a 20 kHz component at 0 dBFS.
But suppose you want to be ultra-conservative. Let's say the amplifier has a wide 100 kHz bandwidth and its voltage rails are at 50V. In order to make sure there is no possibility whatsoever of slew rate limiting regardless of what input signal you feed it, you would need a maximum slew rate of 2 * pi * 50 * 100000 = 31.4V/usec. For modern solid-state amplifiers where a maximum slew rate specification is available, a typical value is around 40V/usec.
Here is the math bit I promised:
Start with the formula for a sine wave:
V(t) = A * sin(2*pi*f*(t-t0))
V is volts, A is the amplitude, and 2*pi*f*(t-t0) is the phase. The phase has a time-varying term 2*pi*f*t and a constant term -2*pi*f*t0, where t0 is the time offset and f is the frequency of the sine wave.
If you have two sine waves:
V1(t) = A1 * sin(2*pi*f1*(t-t1))
V2(t) = A2 * sin(2*pi*f2*(t-t2))
The sum V(t) = V1(t) + V2(t)
The derivative dV/dt = dV1/dt + dV2/dt
dV1/dt = 2*pi*f1 * A1 * cos(2*pi*f1*(t-t1))
dV2/dt = 2*pi*f2 * A2 * cos(2*pi*f2*(t-t2))
Now we're concerned with the maximum slew rate, i.e. the maximum absolute value of dV/dt. You can see that IF the phases of the two sine waves line up, the maximum will be 2*pi*(f1*A1 + f2*A2). So the important question is under what conditions the phases line up.
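As a sanity check on the derivative: differentiating A*sin(2*pi*f*(t-t0)) brings out a factor of 2*pi*f (chain rule), so two phase-aligned components peak at 2*pi*(f1*A1 + f2*A2). The sketch below compares that analytic slope with a finite-difference estimate for an assumed case of f1 = 3 kHz, f2 = 1 kHz, unit amplitudes and t1 = t2 = 0; the function name `dv_dt` is illustrative.

```python
import numpy as np

def dv_dt(t, a1, f1, t1, a2, f2, t2):
    """Analytic slope of V1 + V2, including the 2*pi*f chain-rule factor."""
    return (2 * np.pi * f1 * a1 * np.cos(2 * np.pi * f1 * (t - t1))
            + 2 * np.pi * f2 * a2 * np.cos(2 * np.pi * f2 * (t - t2)))

a1, f1, t1 = 1.0, 3000.0, 0.0   # assumed example values
a2, f2, t2 = 1.0, 1000.0, 0.0
fs = 1_000_000                  # fine time step for the finite difference
t = np.arange(10_000) / fs      # 10 ms

v = a1 * np.sin(2 * np.pi * f1 * (t - t1)) + a2 * np.sin(2 * np.pi * f2 * (t - t2))
numeric = np.gradient(v, t)                 # finite-difference estimate of dV/dt
analytic = dv_dt(t, a1, f1, t1, a2, f2, t2)
bound = 2 * np.pi * (f1 * a1 + f2 * a2)     # both cosines equal to 1 at once

# Ignore the end points, where np.gradient falls back to one-sided differences
err = np.max(np.abs(numeric[1:-1] - analytic[1:-1])) / bound
print(f"max relative error {err:.1e}; peak slope {np.max(np.abs(analytic)):.0f} of bound {bound:.0f}")
```

Because the 3 kHz term carries three times the weight of the 1 kHz term at equal amplitude, the higher harmonic dominates the slew requirement even though it contributes no more to the peak voltage.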
The maximum value of cos(x) is 1 and it occurs at x=0, x=2*pi, x=4*pi, ... The minimum value is -1 and it occurs at x=pi, x=3*pi, x=5*pi, ...
So the maximum value of dV1/dt occurs at t = t1 + n * 1/f1 where n=0,1,2,3...,
and the maximum value of dV2/dt occurs at t = t2 + m * 1/f2 where m=0,1,2,3...
Likewise, the minimum value of dV1/dt occurs at t = t1 + 1/(2*f1) + n * 1/f1 where n=0,1,2,3...,
and the minimum value of dV2/dt occurs at t = t2 + 1/(2*f2) + m * 1/f2 where m=0,1,2,3...
In order for a maximum of dV1/dt to line up in time with a maximum of dV2/dt, or alternatively for two minima to line up, there has to be a pair of integers m,n where t1 + n * 1/f1 = t2 + m * 1/f2. Rearrange this to be n = m * f1/f2 + f1*(t2-t1). Since n and m have to be integers, the sum of m * f1/f2 + constant has to result in an integer value.
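The integer condition can be checked mechanically. This sketch treats the frequencies and offsets as exact rational numbers (an idealisation; real pitches are never exact) and lists the times in a window where a maximum of dV1/dt lands exactly on a maximum of dV2/dt. The function name and example values are hypothetical.

```python
from fractions import Fraction

def aligned_maxima(f1, t1, f2, t2, t_end):
    """Times t in [t1, t_end) with t = t1 + n/f1 = t2 + m/f2 for integers n >= 0, m >= 0.

    All arguments are converted to Fraction so "lines up" means exactly equal.
    """
    f1, t1, f2, t2, t_end = (Fraction(x) for x in (f1, t1, f2, t2, t_end))
    hits = []
    n = 0
    while t1 + n / f1 < t_end:
        t = t1 + n / f1            # n-th maximum of dV1/dt
        m = f2 * (t - t2)          # index a dV2/dt maximum would need at this time
        if m.denominator == 1 and m >= 0:
            hits.append(t)
        n += 1
    return hits

# Harmonic case: t1 = t2 = 0 and f1 = 3*f2, so maxima coincide every 3/f1
print(aligned_maxima(3000, 0, 1000, 0, Fraction(7, 1000)))
# Nearly equal, non-harmonic pitches: only the t = 0 alignment in the first half second
print(aligned_maxima(1000, 0, 1001, 0, Fraction(1, 2)))
```

In the second case the maxima do line up again at t = 1 s (n = 1000, m = 1001), i.e. once per beat period, which is the "lucky" single-point kind of solution rather than a sustained pattern.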
In the lucky case of t1=t2, there is at least one solution: m=n=0. That is, if t1=t2 then the two maxima are guaranteed to line up at least once, at time t=t1=t2, but not necessarily before or after. In the similarly lucky case where f1*(t2-t1) happens to fall on an integer value other than 0, there are certain combinations of f1 and f2 which will also produce a solution at one time but not necessarily before or after. These lucky occurrences aren't very interesting solutions because they only produce the combined maximum at one time value. I'm more interested in knowing under what conditions this can recur, and the answer is time/phase-aligned harmonics.
If t1=t2 and f1 is an integer multiple of f2 or vice versa, then you will get a periodic series of solutions. This would occur if f1 is a harmonic of f2 or vice versa. Suppose you have the case of t1=t2=0 and f1 = 3*f2. Then there will be solutions at t=0, t=3/f1, t=6/f1, and so on.
Any time you are adding harmonics, you will get a pattern repeating at the frequency of the fundamental. But in order for the pattern to include the combined maximum, the harmonics have to align in time/phase. That itself is a lucky occurrence, given that the phase response of a microphone is generally not constant vs. frequency.
Also, this is an analysis of just two sine waves. It can be extended to account for additional frequency components, but as you increase the number of components, solutions that produce a pathological all-in-phase maximum become significantly rarer. Which explains why we don't see these in music.
Thanks for the analysis. I followed it (but am too rusty to do it).
I think you reinforced my argument. If you picture a bunch of voices (or instruments, or a mix) playing vibrato (or even just a little out of tune) then, while the probability of most of them lining up at a peak or valley is low at any one time, there are a hell of a lot of temporal opportunities (cycles) for that to happen. So I think it probably does happen, some, and the distribution of "slew-induced distortion events" would be predictable, and the intensity of such events would follow some kind of normal curve.
Thanks again for the insight and analysis. It helps.
The longer you sample, the greater the chance of a random collection of waves adding up and creating a large peak. Most recordings that are made cleanly (uncompressed and not subject to tape compression from loud recordings) will have a crest factor around 18 dB. With mild compression the crest factor will be around 15 dB. Percussive recordings can have much greater crest ratios, of course. Also, the recording style affects crest ratios, due to reverberations, etc.
These figures relate to a short averaging period (0.5 to 2 seconds) during which the music is homogeneous. If you average over longer periods there will be occasional larger peaks, and how likely they are depends on the number of individual waveform sources. If musical dynamics come into play, the crest factor can easily reach 25 dB or more.
If there is a rare peak over the ability of the recording or playback this may or may not be audible, depending on the technology. For example, momentary clipping of a stable amplifier on a percussive transient will not be audible if this happens infrequently in the course of a track. However, if the amplifier is unstable (i.e. incompetently designed) then a temporary overload on one peak may cause the amplifier to misbehave for an extended period. Similar problems happen with clipping in digital recording or playback. With LP playback peaks can cause needles to jump out of the groove, etc...
These are all factors well familiar to recording engineers, especially mastering engineers.
"Diversity is the law of nature; no two entities in this universe are uniform." - P.R. Sarkar
The math seems to indicate that the only waves which will add coherently are harmonics that are time aligned, i.e. harmonics from a single instrument or voice. The voices in a chorus will add incoherently, so the total acoustic power will be the sum of their individual acoustic powers.
Measurements of crest factor indicate that the pathological cases you're thinking of don't really happen, or at least they are so rare that they aren't a factor in recording. The highest observed crest factors are from impulses generated by percussive instruments, not from massed voices.
Anyway, it's a moot point with modern solid-state amplifiers, because most are capable of slew rates faster than 2*pi times the product of their rail voltage and maximum frequency, which guarantees that they can't be driven into slew-rate limiting with any input. That was not the case back in the 1970s, when Walt Jung was writing about slew-rate distortion and amplifier slew rates were one to two orders of magnitude slower than today's.
I just calculated the probability that all twenty-four sine waves would be in phase with each other at the SAME TIME. The probability is extremely low. Even the probability of several of the twenty-four waves being in phase is quite low. This assumes the sine waves are independent, which they might not be. Of course, there are any number of reasons why massed voices might sound congealed, harsh, irritating, or distorted. I could make a list of candidate causes, but it would be rather long.
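The calculation itself isn't shown, so here is one hedged way to set it up, not necessarily the model used above: assume each wave's phase is independent and uniformly distributed, and call the waves "in phase" when every wave is within +/- tol radians of the first one. The probability is then (tol/pi)^(N-1), and a Monte Carlo run checks the formula for a small case. The tolerance values and function names are hypothetical.

```python
import math
import random

def p_in_phase(n_waves, tol):
    """P(all n_waves phases lie within +/- tol radians of the first), phases i.i.d. uniform."""
    return (tol / math.pi) ** (n_waves - 1)

def p_in_phase_mc(n_waves, tol, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ref = rng.uniform(0.0, 2 * math.pi)
        ok = True
        for _ in range(n_waves - 1):
            # Wrapped phase difference in [-pi, pi)
            d = (rng.uniform(0.0, 2 * math.pi) - ref + math.pi) % (2 * math.pi) - math.pi
            if abs(d) > tol:
                ok = False
                break
        hits += ok
    return hits / trials

tol = math.radians(10)   # hypothetical: "in phase" means within +/- 10 degrees
print(f"24 waves: {p_in_phase(24, tol):.1e}")
print(f"3 waves, +/-45 deg: formula {p_in_phase(3, math.pi / 4):.4f}, "
      f"simulated {p_in_phase_mc(3, math.pi / 4, 200_000):.4f}")
```

This gives the probability at one instant; over a long passage there are many independent-ish instants, so the expected number of near-alignments grows with duration, which is the point raised earlier about "a hell of a lot of temporal opportunities".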
Pretty sure they figured out a long time ago that slew rate should be just so high but no higher. You get into trade-off territory, you know, just like with total harmonic distortion. Having vanishingly low THD is not a guarantee of anything. Certainly slew rate had a good run; there was a time when everybody craved high slew rate. Besides, room anomalies produce a very large amount of phase distortion, so which is worse?