Skype, FaceTime, YouTube… I’m not going to talk about them, though they use networks to carry audio. I’m talking about networks for professional audio: you know, CobraNet, EtherSound, Dante, Ravenna, AVB and so on. I can think of at least ten different types of professional audio network that I have used during the last decade. All of them claim to carry uncompressed digital audio around a studio/concert hall/festival site/other entertainment venue. They all have slightly different features and advantages. But which one has the best sound?

First let’s consider the parameters by which we measure the quality of digital audio: sample rate and bit depth. 48kHz, 24-bit is used by most professional audio equipment. Some take it even further, to 96kHz and beyond, and to 32-bit. Recent network solutions such as Dante and Ravenna support these higher values, but most audio networks are run at 48kHz, 24-bit, so let’s consider this a level playing field so far.
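
As a rough illustration of what those numbers mean for a network, here is a small Python sketch that works out the raw data rate of uncompressed audio. The function name is my own, and it counts audio payload only: real networks add packet headers and other overhead that differ from one protocol to the next and are not modelled here.

```python
def raw_bit_rate(sample_rate_hz, bit_depth, channels=1):
    """Raw audio payload in bits per second (no packet overhead)."""
    return sample_rate_hz * bit_depth * channels


# One channel at the common professional setting: 48kHz, 24-bit.
one_channel = raw_bit_rate(48_000, 24)

# 64 channels at the same setting.
sixty_four_channels = raw_bit_rate(48_000, 24, channels=64)

# One channel at 96kHz, 32-bit.
one_channel_hi_res = raw_bit_rate(96_000, 32)

print(f"1 ch  @ 48kHz/24-bit: {one_channel / 1e6:.3f} Mbit/s")
print(f"64 ch @ 48kHz/24-bit: {sixty_four_channels / 1e6:.3f} Mbit/s")
print(f"1 ch  @ 96kHz/32-bit: {one_channel_hi_res / 1e6:.3f} Mbit/s")
```

Whatever the network, a 48kHz, 24-bit channel is the same 1.152 Mbit/s of payload. The networks differ in how they move it, not in what it contains.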

Next, let’s think about latency. Early CobraNet systems had a 5.33ms latency in the network. Now some networks have as little as 0.125ms latency. This is convenient progress, but does it improve their sound? Well, it could help to provide better sound at a concert where acoustic instruments are amplified and reinforced by a networked audio system, because the audience hears the direct sound and the delayed, amplified sound together. But when the audience is completely separated from the acoustic source, will there be a difference? Listen to one of your favourite CDs of a live music recording. Then listen to it using the same equipment in the same room one hour later. It will sound the same. It started ‘life’ as acoustic sound waves, then was electrified, digitised, mixed and stored (onto CD). There was latency (shipping time, waiting in the shop, resting on the shelf), and it was turned back into acoustic sound waves by the amplifier and loudspeakers. It’s no different for networked audio, except the latency is way shorter!
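
To put those latency figures in perspective, here is a rough Python sketch that converts them into a number of samples at 48kHz, and into the distance sound travels through air in the same time. The function names are mine, and the 343 m/s speed of sound (dry air at about 20°C) is an assumption for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def latency_in_samples(latency_ms, sample_rate_hz=48_000):
    """Latency expressed as a whole number of samples."""
    return round(latency_ms * sample_rate_hz / 1000)


def equivalent_distance_m(latency_ms):
    """Distance sound travels through air during the given latency."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000


for label, latency_ms in [("early CobraNet", 5.33), ("recent networks", 0.125)]:
    samples = latency_in_samples(latency_ms)
    distance = equivalent_distance_m(latency_ms)
    print(f"{label}: {latency_ms}ms = {samples} samples, about {distance:.2f} m of air")
```

Under those assumptions, 5.33ms is 256 samples and is roughly like standing a couple of metres further from the source, while 0.125ms is six samples and a few centimetres.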

How about clock synchronisation? That could affect the quality of sound when listening to audio blended from two separate paths through the network. Older solutions such as CobraNet and EtherSound could keep devices in sync, but not in phase. This would result in small timing differences between audio being output from different networked devices. A network standard called ‘Precision Time Protocol’ solved this for later systems including Dante, Ravenna and AES67. But if all our audio outputs come from just one device, it’s not going to be a problem.

How else do these audio network types differ from conventional digital audio like AES/EBU? They, to use an awkward Americanism, ‘packetise’. In other words, for digital audio data to be squeezed into a computer-style network, the data must be formed into small packets. Imagine the audio network as a road and each audio sample is a person travelling along the road. AES/EBU would be two people on a motorbike travelling down a one-way street. CobraNet would be eight people in a minibus, but now there are eight lanes on each side of a wide highway: space for eight minibuses travelling in each direction. EtherSound would be 64 people in a coach travelling down a road with a single lane each side. Dante would be four people in a car, but the highway now has more than 100 lanes on each side! All the vehicles travel at the same speed. It just takes a little longer for the people to get in and out of the bigger vehicles.

Does ‘packetising’ the data change the sound quality? Imagine a large Lego model being transported. Whether you pull the model apart and carry each brick separately or in groups of four, eight, 64… as long as they get reconstructed correctly at the destination, no viewer will care how it was couriered. It's the same with audio networks: they just carry the data without changing it. If you want to improve the sound quality of your digital audio system, you will first need a better Lego model designer (AD converter). And then think about how to bring the Lego model to life again (DA converter). The network just carries the bricks!
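
The Lego idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the packet format of any real audio network: `packetise` and `reassemble` are hypothetical helpers, the "audio" is just random 24-bit integers, and the packets are assumed to arrive complete and in order.

```python
import random


def packetise(samples, samples_per_packet):
    """Split a list of samples into consecutive groups (toy packets)."""
    return [
        samples[i:i + samples_per_packet]
        for i in range(0, len(samples), samples_per_packet)
    ]


def reassemble(packets):
    """Join the packets back into one list of samples, in order."""
    return [sample for packet in packets for sample in packet]


random.seed(0)
# 1000 random signed 24-bit sample values standing in for real audio.
audio = [random.randrange(-2**23, 2**23) for _ in range(1000)]

# Carry the bricks one at a time, or in groups of four, eight or 64.
results = {n: reassemble(packetise(audio, n)) == audio for n in (1, 4, 8, 64)}
print(results)
```

However the samples are grouped for the journey, the list that comes out is identical to the list that went in, which is the whole point.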

A couple of years ago I took part in a test with around 50 audio professionals and students familiar with listening comparisons. We used two Yamaha devices with multiple card slots, and linked them with four different audio networks and also AES/EBU. The same high-fidelity audio source, AD/DA converters and powered speakers were used for all. We asked the listeners to choose their favourites while listening ‘blind’. We didn’t allow them to rank any two the same. The results were as random as a roll of the dice!