KLANG - 1 more Linux audio disaster waiting for slashdot/reddit

Recently this page surfaced with a description of KLANG which aims/claims to be a new framework for audio on Linux. It contains a number of troubling errors, mistakes and wrong-headed thinking.
  • As an opening note, I find it a bit odd that this announcement of KLANG is so anonymous. No name, no email addresses. Of course, we do know who the author is; it's just strange that this page has no contact information, no links, nothing.
  • Just in case nobody realized: audio is already done in the kernel. That's where device drivers live, and device drivers are how all I/O with audio interfaces takes place (theoretically, it might be possible to do user-space I/O with USB, but in practice it doesn't work well).
  • Dropouts ("xruns") are not really caused by being in user space. They arise from two different types of events:
    1. poor kernel scheduling
    2. incorrect application implementation
    You can't solve the first without just doing everything in an interrupt handler. This is 2012, and this is not how you write general purpose operating systems anymore. You can't solve the second with a kernel framework, and incorrect application implementation will remain an issue with any audio API. Remember: most of what an audio app does happens in user space regardless of how the kernel side of things operates.
  • The unix calls (they are not "OSS calls") open/read/write/ioctl are NOT the right API for streaming media, not because they don't work but because they allow sloppy developers to write code with the wrong design. Every piece of software on Linux that does audio i/o ultimately calls these functions, but they are wrapped in ways that (gently) encourage developers to structure their software in the right way(s) to avoid dropouts and other issues.
  • KLANG as documented does not appear to offer any approach to inter-application audio, which JACK does in a way that is completely seamless with actual device audio i/o. This is not a small thing.
  • open/read/write/ioctl are also the wrong API because no interposition is possible without using grotesque hacks like LD_PRELOAD. By using system calls, you basically mandate that everything on the other side of them is done in the kernel. This makes it very inefficient to implement inter-application audio, since everything has to make extra transfers across the kernel/user space boundary.
  • KLANG is talking about putting mixing (and possibly some other kinds of processing) into the kernel. Since such processing is almost always done in floating point format by almost every piece of software on the planet, this is problematic, since there is no floating point allowed in the kernel.
  • JACK consumes barely any more CPU than an in-kernel design would. If you don't understand why this is true, then you don't understand enough to be crafting a replacement.
  • JACK's impact on power consumption is a function of low latency realtime audio, not JACK (i.e. there is no way to do low latency realtime without affecting power consumption). If a user or an application wants to handle audio data with 1msec of latency, you cannot avoid the CPU staying active. You cannot buffer your way out of the requirement to handle very frequent interrupts from the audio interface.
  • The KLANG "documentation" suggests the need to reimplement kernel side drivers for every audio interface, which is just an absurd effort.
  • Finally, I would note that ESD is irrelevant and has been for nearly a decade - I don't know why anyone would even mention it.

A few comments on the comments

I'd like to address some points given here:

As an opening note, I find it a bit odd that this announcement of KLANG is so anonymous. No name, no email addresses. Of course, we do know who the author is; it's just strange that this page has no contact information, no links, nothing.

That is because the project was not supposed to be officially announced for a long time yet. This page was meant as a placeholder, so that the people I discuss this with had something to bookmark that doesn't 404. It's still far too early for a release, and this page was sort of a FAQ for those who kept asking.

Somebody among those people preempted this by posting it to Reddit.

Dropouts ("xruns") are not really caused by being in user space. They arise from two different types of events:

  1. poor kernel scheduling
  2. incorrect application implementation

Exactly. Scheduling is indeed the point here. You can't influence the scheduling from user space, but you can from kernel space. If the buffers of a process doing audio playback are running low, you can reschedule that process for early continuation. If the buffers of a reading process are getting full, you can reschedule it for early continuation as well.
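
The idea, as a minimal sketch in the style of Linux kernel code (the struct and function names here are hypothetical, not actual KLANG code):

    #include <linux/types.h>
    #include <linux/wait.h>

    /* Hypothetical per-stream state for a playback endpoint. */
    struct klang_stream {
        wait_queue_head_t wq;    /* the writing process sleeps here  */
        size_t fill;             /* bytes currently buffered         */
        size_t low_watermark;    /* wake the writer below this level */
    };

    /* Called from the audio interrupt handler after the hardware has
     * consumed another period of samples: if the buffer runs low, the
     * writing process is woken for early continuation. */
    static void klang_period_elapsed(struct klang_stream *s, size_t consumed)
    {
        s->fill -= consumed;
        if (s->fill < s->low_watermark)
            wake_up_interruptible(&s->wq);
    }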

KLANG as documented does not appear to offer any approach to inter-application audio, which JACK does in a way that is completely seamless with actual device audio i/o. This is not a small thing.

Actually, KLANG is all about inter-application audio. You open /dev/dsp and your FD becomes an endpoint in the KLANG routing system. FDs opened O_RDONLY are sinks, FDs opened O_WRONLY are sources, and FDs opened O_RDWR create a sink and a source. You can connect any sink with any source in KLANG, just as you can with JACK, and you can route process endpoints to process endpoints, or HW endpoints to HW endpoints.
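
In code, the endpoint model looks roughly like this. This is a sketch of the semantics just described; the routing/control calls are left out because that interface isn't public yet:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_WRONLY: this process feeds audio into the routing
         * system, i.e. the FD is a source. */
        int src = open("/dev/dsp", O_WRONLY);

        /* O_RDONLY: this process consumes audio, i.e. a sink. */
        int sink = open("/dev/dsp", O_RDONLY);

        /* O_RDWR: creates a sink and a source at once. */
        int duplex = open("/dev/dsp", O_RDWR);

        /* ... connect any sink with any source via the routing
         * system, just like connecting ports in JACK ... */

        close(duplex);
        close(sink);
        close(src);
        return 0;
    }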

A lot of design decisions in KLANG were directly influenced by JACK. Two of the key points in KLANG's design were:

- Everything that goes with JACK should be possible with KLANG.
- KLANG should not replace JACK but actually provide a nice environment for it to live in.

In fact, I had planned to get in touch with the JACK developers rather soon, so that we could implement a libjack.so that does all routing via KLANG instead of going through a user space server; routing over a user space server adds some additional, expensive context switches. Yes, I know that JACK uses floating point, but this is not a problem: mixing is simple addition, and that can be done without the help of an FPU, and rather efficiently at that.

This makes it very inefficient to implement inter-application audio, since everything has to make extra transfers across the kernel/user space boundary.

Sorry, but this is just FUD. How do you think data is exchanged between user space processes? You cross the user-space/kernel boundary twice doing so. Any sort of IPC always involves system calls, even if it goes over shared memory: shared memory is shared address space, and all sorts of kernel machinery breaks loose when touching it.
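
To illustrate the point: even the fastest shared-memory IPC needs a system call the moment one side has to wait for the other. On Linux that is typically a futex. The fragment below is just an illustration of the boundary crossings, not KLANG or JACK code:

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    /* Block until *addr changes away from 'expected': one crossing of
     * the user/kernel boundary to go to sleep here, and another one
     * when the producer issues the matching FUTEX_WAKE. */
    static long futex_wait(uint32_t *addr, uint32_t expected)
    {
        return syscall(SYS_futex, addr, FUTEX_WAIT, expected,
                       NULL, NULL, 0);
    }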

Since such processing is almost always done in floating point format by almost every piece of software on the planet, this is problematic, since there is no floating point allowed in the kernel.

KLANG uses integer arithmetic for its whole signal chain. You can do everything with integers just fine, and in fact with higher precision. The main reason to use floating point numbers in audio is for space efficiency when storing large dynamic range audio. A 32 bit float has 24 bits of effective precision in the mantissa. The exponent is "just" a gain factor (so to speak).

KLANG's internal stream format gives at least 8 bits of foot- and headroom for all samples in it. Gain/attenuation is applied by factoring the multiplier into the closest power of two and a remainder: a bitshift is applied first, followed by a multiplication with the remainder.
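
A minimal sketch of both points, integer mixing as plain addition and gain factored into a power of two plus a remainder. The Q16 remainder format and the names are my illustration here, not KLANG's actual internals:

    #include <stddef.h>
    #include <stdint.h>

    /* Mixing really is just addition; the >= 8 bits of headroom in
     * the internal format keep intermediate sums from clipping. */
    static void mix2(int32_t *dst, const int32_t *a, const int32_t *b,
                     size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    /* Apply gain ~ 2^shift * (rem_q16 / 65536): the multiplier is
     * factored into the closest power of two and a remainder in
     * [1, 2) stored as Q16 fixed point. Bitshift first, then the
     * multiplication with the remainder. */
    static int32_t apply_gain(int32_t s, int shift, uint32_t rem_q16)
    {
        int64_t v = shift >= 0 ? (int64_t)s << shift
                               : (int64_t)s >> -shift;
        return (int32_t)((v * (int64_t)rem_q16) >> 16);
    }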

JACK consumes barely any more CPU than an in-kernel design would. If you don't understand why this is true, then you don't understand enough to be crafting a replacement.

JACK itself doesn't. But the added context switches between applications do. That's the main problem here.

If a user or an application wants to handle audio data with 1msec of latency, you cannot avoid the CPU staying active.

I thought so, too, for a long time. But then Lennart Poettering (re-)discovered a rather old method – this is one of the few cases where I think he did something good – for getting low latency even when operating with large buffer sizes. This might sound impossible, but only as long as you assume that a filled buffer is untouchable. If you accept that one may actually perform updates on an already submitted buffer, just slightly ahead of where it's currently being read from (for example by a DMA transfer), you can get latency down even with larger buffers. Lennart implemented this in PA when they were approaching very long buffers (256ms and longer) on mobile devices, but still needed low latency for audio events.

The only drawback of PA's implementation is that it uses a rather crude scheme to estimate the position of the "readhead", which is prone to phase oscillations in the readhead position. Basically, one wants to use a PLL for this, but PA uses sort of an FLL.
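
The core of the trick fits in a few lines (all names hypothetical). The hard part is not this code, but how good the readhead estimate and the safety margin are:

    #include <stddef.h>
    #include <stdint.h>

    struct ring {
        int16_t *buf;
        size_t size;    /* ring size in samples */
    };

    /* Patch fresh samples into an already submitted ring buffer,
     * just ahead of the estimated hardware read position. 'margin'
     * must cover the error of the readhead estimate, which is
     * exactly where the PLL-vs-FLL issue above bites. */
    static void rewrite_ahead(struct ring *r, size_t readhead,
                              size_t margin, const int16_t *ev, size_t n)
    {
        size_t pos = (readhead + margin) % r->size;
        for (size_t i = 0; i < n; i++)
            r->buf[(pos + i) % r->size] = ev[i];
    }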

The KLANG "documentation" suggests the need to reimplement kernel side drivers for every audio interface, which is just an absurd effort.

True, and this is actually the biggest roadblock. But if you look at the state of the sound drivers, many, if not every single one of them, require a major overhaul.

Finally, I would note that ESD is irrelevant and has been for nearly a decade - I don't know why anyone would even mention it.

This was meant more as a joke and was there for completeness. I added it to mention the problems you run into when designing audio multiplexing systems. Also, I recently actually used the ESD protocol in a crazy hack where I had an Atmel AVR with the Ethersex network stack playing tracker music via ESD (it generated and uploaded samples into a PA esd protocol module, then triggered playback). Totally crazy and useless, but fun.

Please don't feel that I want to replace JACK. JACK has its proper place in the Linux ecosystem. It just doesn't fit as a kitchen sink audio system (although I know plenty of people who actually use JACK as their universal audio backend on their boxes). Actually my intention was to build a healthy relationship with the JACK community for mutual benefit. KLANG's design has been heavily influenced by JACK and its API.

The unix calls (they are not "OSS calls") open/read/write/ioctl are NOT the right API for streaming media, not because they don't work but because they allow sloppy developers to write code with the wrong design.

Actually, I'm very interested in what you mean by this. I know a certain coder (he goes by the handle mazzoo) who does things like realtime signal processing for very-low-frequency SDR and exclusively uses the OSS compatibility API, because native ALSA is a drag to use. And I don't see any problems there.

Playing audio: you write(2) the samples to the FD, and the write call returns when the buffer is almost finished playing. Almost, because the process must be given enough time to prepare another buffer.

For asynchronous operation you can use either mmap or the async I/O syscalls (on Linux).

Reading is even simpler: you request a certain number of samples, and read returns either having read that amount, or maybe a bit early, while buffering further samples so that nothing (beyond a certain timeout) is lost.
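
For reference, the whole blocking playback model described above fits in a dozen lines (the sample rate/format setup via ioctl is omitted here):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/dsp", O_WRONLY);
        if (fd < 0)
            return 1;

        int16_t buf[1024];
        for (;;) {
            /* ... fill buf with the next block of samples ... */

            /* write(2) blocks until the buffer is almost finished
             * playing, leaving enough time to prepare the next one. */
            if (write(fd, buf, sizeof buf) < 0)
                break;
        }
        close(fd);
        return 0;
    }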

I'm very interested in examples of bad audio program design, though. So if you have some, I'd like to read them and learn from them what not to do or encourage.

I don't do discussions on forums ...

I am no fan of web forums, and especially not Drupal forums. There's a lot that is still really wrong in your comments, but if we're going to have a conversation about it, it needs to be on a mailing list or, at the very least, a phpBB-powered forum.

How about discussing this in

How about discussing this in the forum thread that already exists over at Phoronix, where you already answered some questions?

I'm going to set up a mailing list for KLANG in the near future, but first I want to get that MTA switch (Postfix to qmail) done on the eudyptula MX.

phoronix ...

done (initial main comment is in moderation)

KLANG ... the sound of metal

KLANG ... the sound of metal crap crashing down ...

Just read the whole thing (time to waste tonight ...). Apart from some odd statements, I am very curious to see how a "professional grade" studio will fare with an "Intel HDA chipset" ...

Hmm looks like it could be a

Hmm looks like it could be a bad case of - http://xkcd.com/927/

I'd like to reiterate a few

I'd like to reiterate a few points based on our experience with Linux audio at Harrison.

We have a product called Xdubber that was developed 5 years ago. The Xdubber uses a custom JACK driver (no ALSA). The system operates with 8-sample buffers (compared to the much more common 1024 or, at best, 64 samples provided in most systems) for extremely low latency. We send 64 channels of 96kHz audio in and out of the system. I have tested this system for days using an Audio Precision bit-test with error-checking turned on.

Our findings with an actual commercial product have shown that:
*JACK has a very minimal CPU/memory footprint.
*JACK has nothing to do with xruns. It reports xruns that happen at the driver and/or application level, which would otherwise go unreported.
*There is no fundamental issue with the coding style of JACK. It's just very hard work to do this kind of plumbing.
*An ultra-high-performance, ultra-low-latency system can be built with JACK.

I've had similarly good experiences with the best-implemented ALSA devices, such as RME and older M-Audio.

There ARE a lot of issues with Linux audio. But they stem mostly from the unbelievably wide range of use-cases across applications and devices, and from the fact that Linux users actually have higher expectations of their audio system. For example, Windows still doesn't have the concept of virtual MIDI ports, much less inter-application routing. OSX's CoreAudio is better, but it still lacks fundamental features of ALSA and JACK.

-Ben Loftis
Harrison Consoles