Why am I (currently) building what I build ?

At first, it was only a gut feeling that not all reflections are bad by nature, although too many people still believe that they are. Then I found out for myself that wide and uniform dispersion are desirable properties, something that had been missing for most of my audio voyage!

Rhyme and reason were provided by Wolfgang Klippel [1], [2] with his work around 1990, although I found his papers only after my first "new" experiments. These had been inspired by Siegfried Linkwitz, who provides a wealth of complementary and additional knowledge and experience.
“Room Reflections Misunderstood ?” [3], for example, is a major supporting pillar in this picture.
Klippel's work is also summarized in Toole's book [4].
In a nutshell, Klippel’s findings suggest that a certain increased amount of indirect/reflected sound (5 dB for music) was rated positively in a large study, mainly for the sensations of “pleasantness” and “naturalness”. The main contributor to both sensations was the subcategory “feeling of space”.

When you go to a classical concert, you hopefully find that the air is full of sound, produced, to name just one (!) example, by 15+ first violins and 15+ second violins. Such an auditory experience comes at least close to giving you goose bumps. It is highly satisfying for anyone who likes music, even if classical music is not on their regular playlist. So many highs, full of soft and velvety energy, cannot be reproduced at home (actually, not much of a big live unamplified concert can be reproduced at home, and not only because of the sheer energy and dynamics). But the speakers that come closest to this very feeling and sensation do have the properties mentioned above.
Another desirable result is that the speakers disappear, partly because of the uniform polar response and partly because of the amount of reflections they create. Ideally, they don't provide any clues as to where they are, so you cannot really locate them and your brain can concentrate on enjoying the music. You hear less of the speaker and more of the auditory scene, although this might sound contradictory to some people. This is a bit like in electronics: sometimes adding an active component lowers noise instead of increasing it.
Taken together, these attributes lead to a higher auditory satisfaction.

Now how do dipoles, which actually have a pretty narrow dispersion with -6 dB at 60° and no radiation at 90°, fit into the picture? It must be their rear-radiated sound that produces the required reflections, and interestingly this does not make a huge auditory difference compared to omnis.
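The dipole figures quoted above follow from the cosine (figure-of-eight) pattern of an ideal dipole. The following is only a small sketch under that idealization; the function name is my own and real dipoles deviate from the ideal pattern, especially towards higher frequencies.

```python
import math

def dipole_level_db(angle_deg: float) -> float:
    """Level of an ideal dipole (cosine pattern) in dB relative to on-axis."""
    amplitude = abs(math.cos(math.radians(angle_deg)))
    if amplitude < 1e-12:
        return float("-inf")  # the null of the figure-of-eight at 90 degrees
    return 20.0 * math.log10(amplitude)

for angle in (0, 30, 60, 90, 180):
    print(f"{angle:3d} deg: {dipole_level_db(angle):.1f} dB")
```

At 60° the ideal pattern is down by about 6 dB, at 90° there is a null, and at 180° the rear lobe is as strong as the front lobe, which is where the additional reflections come from.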
Other constant/controlled directivity speakers, such as boxes with a low-crossed tweeter in a big waveguide/horn, are not able to create these sensations. At least not to a satisfying extent, because they produce too few reflections in typical small rooms. They might have other applications, such as sound reinforcement, where you listen to such speakers from a greater distance.

Speakers with a constantly changing/alternating radiation pattern (directivity sawtooth at the crossover points), such as regular consumer box speakers, can normally be disregarded right away. This issue can be detected easily even by untrained listeners. More often than not, they also suffer from diffraction at the cabinet edges, which creates secondary sound sources, typically in the 2 kHz...5 kHz region. That re-radiation can be so strong that the sound seems to stick to the speakers, revealing their position in the room and thereby drawing attention away from the phantom image.

Update 17-May-2012:

Now, why do I believe that constant directivity (CD) is the way to go?

First and foremost, constant directivity meets the requirement that the reflected sound be a delayed and attenuated copy of the direct sound [3], which supports the precedence effect, provided that the speakers are set up correctly. The precedence effect can be divided into three sub-phenomena:

Terminology used by...

     W. M. Hartmann [5]            JASA [6]
1.   localization phenomenon
2.   Haas Effect                   localization dominance
3.   De-reverberation              lag-discrimination suppression

In addition, constant directivity results in at least two more effects, which directly influence the speaker/room/listener interface:

a. The Initial Time Delay Gap (ITDG) is constant.
So the perception of distance, or the feeling of intimacy, does not vary with frequency, no matter how far the listener is from the speakers. With box speakers there are more and stronger reflections in the bass than in the highs because the bass is typically omnidirectional. As a result, the ITDG is longer in the highs. Hence tweeters appear closer to the listener, or they seem to somehow stick out. This effect is maximized by horn tweeters, which I perceive as "blowing into the face".
Check out this nice animation.

b. The critical/reverberation distance of a CD speaker is also constant.
With consumer box speakers, the critical distance is smaller in the bass than in the highs because, again, the bass is typically omnidirectional. So you hear more bass reflected from the room compared to the highs. The critical distance increases in the highs as the directivity index rises, and there is more direct sound and fewer reflections. This creates an imbalance, and the tonality/timbre of a speaker varies from room to room and with the listening distance.
Only beyond the critical distance of the tweeter does the room finally dominate completely.

In the above considerations, the term "box speakers" includes speakers with a monopole bass and a tweeter in a big waveguide, although the latter features CD behavior.

For best results, a speaker should exhibit CD behavior over the entire audio spectrum or at least above the transition zone/Schroeder frequency.
Only acoustically small omnis, bipoles, dipoles and cardioids are practical solutions for typical listening rooms. They are perceptually neutral transducers even though the on-axis frequency response may need alteration for a balanced sound.

What is the Optimum Polar Response for a Loudspeaker ?
(initial version: 11-Aug-2012)

This is probably the most burning question in speaker design among pros and DIYers.
It is my current understanding that there is no universal answer to it. That is simply because of the room the loudspeakers are going to be placed in. Rooms vary vastly from customer to customer in size (volume and dimensions) and energy decay properties. So the room is the biggest variable in the room/speaker/listener interface. But exactly this complex interface is the key to sound reproduction. I would like to draw a picture of this system, but that is difficult to do and might require several drawings. So for the time being I will try to describe it in words. For that I begin again with the biggest variable:

The room
The listening room has a multitude of physical properties such as size, which relates to volume, surface area and distances. The dimensions of a room also dictate with a little freedom where the speakers are going to be placed and where the listener sits. In addition, the size controls how much other stuff like furniture will find its way into the room.
In the end, those properties result acoustically in a time value that describes how long it takes for the energy that the loudspeaker throws into the room to decay until it no longer bothers the human ear. A room where the energy decays very quickly sounds very dry. If the energy bounces back and forth for too long, you will not understand a single spoken word anymore.
This time value is typically expressed as reverberation time RT30 or RT60. I will use RT60 in the rest of this write-up. This is the time it takes from switching off an acoustical signal with 0 dB reference level until the sound pressure level has dropped to -60 dB, which corresponds to one thousandth of the initial sound pressure.
The RT60 value mainly depends on the volume of the room and on the surface area, whose absorption coefficients vary depending on the materials used.
A dry sounding domestic room has an RT60 of 200...300 ms. Speech intelligibility is excellent.
A modern living room has an RT60 of 400...600 ms, depending on the materials of the floor, ceiling and walls, the openings in the room and the amount of furniture placed in the space.
I consider values above 700...800 ms as less than optimum for sound reproduction, no matter what type of speaker you are using.

Here is the related Sabine equation. However, RT60 is best measured with an appropriate system (e.g. Arta) and loudspeaker (omni) to excite the room uniformly.

RT60 = k · V / A
k = (24 · ln 10) / c ≈ 0.161 s/m
V = volume of the room [m³]
A = a · S = equivalent absorption area [m²]
a = absorption coefficient(s)
RT60 = reverberation time [s]
S = absorbing surface [m²]
A = a1 · S1 + a2 · S2 + a3 · S3 + ...
c = speed of sound at 20 °C [343 m/s]
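As a rough illustration of the Sabine equation, here is a minimal sketch in Python. The room dimensions and absorption coefficients are purely hypothetical numbers chosen for the example, not measured values, and the function name is my own.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at 20 °C
K_SABINE = 24.0 * math.log(10.0) / SPEED_OF_SOUND  # about 0.161 s/m

def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time in seconds.

    surfaces: iterable of (area in m², absorption coefficient) pairs.
    """
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return K_SABINE * volume_m3 / absorption_area

# Hypothetical 5 m x 4 m x 2.5 m room (50 m³); all coefficients are assumptions.
room_surfaces = [
    (20.0, 0.30),  # floor with carpet
    (20.0, 0.10),  # ceiling
    (45.0, 0.15),  # four walls, 2 * (5 + 4) * 2.5 m²
]
print(f"RT60 = {sabine_rt60(50.0, room_surfaces):.2f} s")
```

With these assumed values the result lands at roughly 0.55 s, i.e. in the range given above for a modern living room. A measurement remains the better choice, because real absorption coefficients vary with frequency.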

Also the Schroeder frequency depends on the volume of the room and the decay time RT60.

fs = 2000 · (RT60 / V)^(1/2)   (fs in Hz, RT60 in s, V in m³)

Below this frequency we have to deal with discrete, sparse room modes, which, once excited, prolong the energy decay to typically unacceptable levels. So it becomes clear that woofers and subwoofers that operate around the transition zone or below need different measures than drivers for the mids and highs.
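To get a feeling for where the transition zone lies, the Schroeder frequency fs = 2000 · (RT60 / V)^(1/2) can be evaluated for a few rooms. This is only a sketch; the example rooms are hypothetical and the function name is my own.

```python
import math

def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    """Schroeder frequency in Hz for RT60 in seconds and room volume in m³."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)

# Hypothetical rooms: (volume in m³, RT60 in s)
for volume, rt60 in ((30.0, 0.3), (50.0, 0.5), (100.0, 0.6)):
    print(f"V = {volume:5.0f} m³, RT60 = {rt60:.1f} s -> "
          f"fs = {schroeder_frequency(rt60, volume):.0f} Hz")
```

For a 50 m³ room with an RT60 of 0.5 s this gives 200 Hz, so in typical domestic rooms the woofer range largely falls into the modal region.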

RT60 should be fairly constant, at least above the Schroeder frequency. Below the transition zone that is difficult to achieve because damping is inefficient for long wavelengths.
The ideal room is symmetrical.

The speaker/room interface
Now the speaker with its radiation pattern is placed into the room. Ideally, the speakers should be set up symmetrically to the walls at distances >= 1 m to support the precedence effect. Depending on the radiation pattern/dispersion characteristics, the speakers excite the room more or less and create reflections.
The decay property (RT60 value) of the room in combination with the radiation pattern determines the critical distance/reverberation distance of the speaker. The critical distance is where the ratio of direct vs. reflected sound = 1. So at the critical distance we hear as much of the room (reflected sound) as from the speaker itself.

dc = 0.1 · (G · V / (π · RT60))^(1/2)   (dc in m, V in m³, RT60 in s)

G is the directivity gain.
Monopole G=1
Cardioid G=1.71...4 (depending on the shape)
Dipole G=3

In a live sounding room with an omnidirectional speaker, the critical distance can be very small, about 1 m or less. For larger rooms you therefore need speakers that are more directional, which results in a longer critical distance. Or you need a heavily damped room, which results in a longer critical distance as well.
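The influence of directivity and room damping on the critical distance can be sketched numerically. I read the formula as dc = 0.1 · (G · V / (π · RT60))^(1/2); the room values below are hypothetical, the function name is my own, and diffuse-field theory only holds approximately in small rooms.

```python
import math

def critical_distance(gain: float, volume_m3: float, rt60_s: float) -> float:
    """Critical distance in m from directivity gain G, volume (m³) and RT60 (s)."""
    return 0.1 * math.sqrt(gain * volume_m3 / (math.pi * rt60_s))

# Hypothetical 50 m³ room, once fairly live and once heavily damped.
for rt60 in (0.5, 0.25):
    for name, gain in (("monopole", 1.0), ("dipole", 3.0)):
        dc = critical_distance(gain, 50.0, rt60)
        print(f"RT60 = {rt60:.2f} s, {name:8s}: dc = {dc:.2f} m")
```

With RT60 = 0.5 s this yields about 0.56 m for the monopole and about 0.98 m for the dipole. Halving RT60 or raising G both lengthen the critical distance, in line with the two options described above.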

The speaker/room/listener interface
Now the listener enters the room. The person will typically sit at a distance where the direct and the reflected sound blend into each other to a certain degree. But what is the right listening distance?
Can you sit too close or too far away?
Yes! According to Klippel [1], [2] there is an optimum. The optimum he describes for music is where the sound reflected in the room is 5 dB higher than the direct sound. And how do you know where to sit then?

a. You can hear it (at least with a constant directivity speaker) ! If you are too far out in the room, it does not sound good anymore. Try it yourself. I bet you will find the spot where the room starts to dominate too much.

b. A reflected sound level 5 dB above the direct sound corresponds to "a bit less than twice the critical distance" (at twice the critical distance the difference is 6 dB).
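The "bit less than twice" can be put into a number under the usual diffuse-field idealization: the direct sound falls by 6 dB per doubling of distance while the reverberant level stays constant, so the reflected-to-direct ratio in dB is 20 · log10(d / dc). This is a sketch under that assumption only; the function name and the example critical distance are hypothetical.

```python
def listening_distance(critical_distance_m: float, reflected_over_direct_db: float) -> float:
    """Distance at which the reflected sound exceeds the direct sound by the given dB.

    Assumes an idealized diffuse field: ratio_dB = 20 * log10(d / dc).
    """
    return critical_distance_m * 10.0 ** (reflected_over_direct_db / 20.0)

dc = 1.0  # hypothetical critical distance in m
for ratio_db in (0.0, 5.0, 6.0):
    print(f"{ratio_db:.0f} dB -> {listening_distance(dc, ratio_db):.2f} m")
```

For 5 dB the factor is about 1.78, so with a critical distance of 1 m the preferred seat would be roughly 1.8 m from the speakers.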

Too far out in the room is no good because you mainly get to hear the reflections. Maybe you know the issue from audio shows: the room is too full and you have to stand all the way at the back. All of a sudden, all speakers sound the same. Too close is not ideal either, because too much direct sound is not as pleasing or live sounding. If you went to zero distance, you could hear lobing errors that result from driver separation and crossovers.

So you can already see here why constant directivity is a must: the critical distance of the speaker remains constant over frequency. You could argue now that the RT60 of the room is not constant either. But in a room with a reasonably flat RT60 you can still hear the advantages over a speaker whose directivity keeps changing, e.g. at the crossover points or simply because the tweeter beams.

Further, we assume that our person will set up the speakers in the equilateral stereo triangle and sit at its apex. If you now follow the 5 dB rule, the size of the triangle is predetermined. If you make it too small because the critical distance is too short, the speakers come too close together, and if it becomes too big, the speakers cannot be placed at a sufficient distance (>= 1 m) from the walls.

After looking at the situation a bit closer, the question "What is the Optimum Polar Response for a Loudspeaker ?" would only answer one aspect of the described complex room/speaker/listener system. However, the question can easily be extended to
What is the Optimum Polar Response for a Loudspeaker in a particular room and listening scenario (setup) ?

You can also see what loudspeakers are worth that have a constantly changing radiation pattern and were developed for a standard IEC room, which may or may not come close to your room at home: the experience will be less than optimum, or even only mediocre.

On the contrary, the described system yields some very clear demands for a loudspeaker to get the best possible out of Stereo for a given room:

1. The directivity pattern has to be constant. As a minimum, it has to be very uniform above the Schroeder frequency. The precedence effect will be at play if there are no very early reflections. In addition, CD results in a constant critical distance of the speaker and a constant Initial Time Delay Gap.

2. The dispersion pattern (wide vs. narrow) has to be selected according to the listening scenario (setup) and the room properties (RT60). Also, the listener should not sit farther away than approximately twice the critical distance of the speaker and also not too close. It is well known that some listeners prefer a higher degree of direct sound. In this case the listening distance could be decreased to the critical distance as a minimum.
If the dispersion pattern (wide vs. narrow) is selected based on preference, the possible consequences have to be observed.

3. Below the Schroeder frequency you need a system that does not couple excess energy into the room modes, so that the sound decays quickly [7].

4. The on-axis frequency response has to take into account the radiation pattern (width of dispersion / sound power) and the acoustical properties of the room, and it has to satisfy the psychoacoustic demands of domestic rooms. There is no general requirement to produce a ruler-flat and constant on-axis frequency response. While this may be correct under anechoic conditions, the speakers and the room form a new acoustical system with new behaviors, which have to be taken into account during the design and equalization process.

5. The loudspeaker must not have a Gestalt. It must only reproduce sound and add nothing to it; no distortion, no uncontrolled diffraction etc. Otherwise the fragile phantom image may be compromised.

6. The room should be symmetrical and the speakers should be set up symmetrically as well. There should be no room boundary too close to the listener.

7. The reverberation time above the Schroeder frequency should be as uniform as possible.


It should be clear that the sound field in a small room is not really diffuse. In the above considerations, terms of the diffuse-field theory have been used because they simply describe best what is happening approximately.
So, e.g. critical distance does not apply exactly to small rooms. Yet, the direct sound diminishes anyway (approx. at a rate of 3dB per double-distance) as you move away from a speaker and the reverberant sound becomes dominant. Hence, the listening distance vs. directionality of a speaker is perfectly applicable and very relevant for sound reproduction in small rooms.

[1] W. Klippel, "Multidimensional relationship between subjective listening impression and objective loudspeaker parameters", Acustica 70
[2] W. Klippel, "Assessing the subjectively perceived loudspeaker quality on the basis of objective parameters", 88th Convention, AES, preprint 2929
[3] Siegfried Linkwitz, "Room Reflections Misunderstood?", AES Convention Paper, presented at the 123rd Convention, October 2007, Preprint 7162
[4] Floyd E. Toole, "Sound Reproduction, The Acoustics and Psychoacoustics Of Loudspeakers And Rooms", first edition, 2008, pages 457 - 461
[5] William M. Hartmann, "Listening in a Room and the Precedence Effect", Chapter 10 in "Binaural and Spatial Hearing in Real and Virtual Environments" R. Gilkey and T. Anderson (Eds.), Lawrence Erlbaum Associates, Hillsdale, NJ, 1997
[6] Litovsky, R.Y., Colburn, H.S., Yost, W.A., and Guzman, S.J. (1999). "The precedence effect". The Journal of the Acoustical Society of America (JASA) Volume 106, Issue 4, pp. 1633–1654.
[7] L. Elmer, B. Fazenda, J. Hargreaves, J. Hirst, M. Wankling, “Subjective Preference of Modal Control Methods in Listening Rooms”, Journal of the AES, Volume 60 Issue 5, pages 338-349, May 2012

What makes loudspeakers “disappear” ?

According to my experience and the recent speaker projects, the following criteria (in arbitrary order) are important in order to create the sensation of disappearing speakers:

  • minimized diffraction on the speaker itself, so that the cabinet or any other structures don't act as secondary sound sources.
  • no discontinuities in the directivity index/sound power response.

    Both of the above bullets can be summarized as the requirement of a smooth/uniform off-axis behavior.
    Constant directivity is not a requirement.
  • wide dispersion.

The above criteria are nearly automatically fulfilled by building speakers that remain acoustically small compared to the radiated wavelength.

"Wet" recordings as well as live sounding rooms contribute to the phenomenon by masking.

In addition to the above, low distortion, absence of venting sounds and minimized re-radiation through the cones or vents of the speakers caused by internal reflections are of some importance as well.
There must be no very early reflections (greater distance to large objects/surfaces) to establish and maintain the precedence effect.

So far, Demokrit-T is the best speaker that I know at performing this trick. They are simply nowhere, and the equalization method, which puts increased focus on the phantom image between the speakers, supports the hiding trick additionally. Sounds hard-panned to one channel only hover in the air somewhere near the pipes, but they don't seem to have any fixed relation to the physical speakers.

The sound is floating freely and all that remains is a fascinating and realistic auditory scene from two simple stereo speakers.

Last updated 03-Feb-2019