GPU Computing again....

10 replies
User offline. Last seen 4 years 28 weeks ago. Offline
Joined: 2007-06-07

Hi all,
I've been following the news on GPU computing around OpenCL, nVidia's CUDA and AMD's stream computing for a while now, and it sounds very exciting for the realm of audio processing. There have been quite a number of posts here of the kind "hey, we need GPU computing", but I was wondering where exactly the problems lie, what the possible solutions are, and what GPU computing could realistically be used for.
The obvious advantage of GPUs, as has been stated everywhere, is their enormous processing power when it comes to data-parallel tasks. That power is very applicable to digital audio processing, if I am not mistaken, which made me think about getting into a little coding in this respect (once I finish my thesis)...
Now Paul wrote the following in a post on GPU-Computing:
"GPGPU's are very powerful but also very latency-inducing. They are not designed for use in realtime, low latency contexts. If you were doing "offline rendering" they would offer huge amounts of power, but their current design adds many milliseconds to delays to signal processing."
I was wondering if someone with the knowledge could maybe elaborate a little further on that.
Ardour does have complete latency compensation (again if I am not mistaken) meaning that at least some of the audio effects could well be processed online on a GPU, as long as that compensation works correctly. Why would this not be viable?
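To make the latency compensation idea concrete, here is a minimal sketch of the general principle: every track is delayed so that it lines up with the slowest one. This is not Ardour's actual code; the function name, the track names and the latency figures are all made up for illustration.

```python
def compensation_delays(latencies):
    """Return the extra delay (in samples) each track needs so all tracks align.

    `latencies` maps a track name to the total latency, in samples, reported
    by the processing on that track (hypothetical values, not measurements).
    """
    worst = max(latencies.values(), default=0)
    return {name: worst - lat for name, lat in latencies.items()}


SAMPLE_RATE = 48000  # assumed sample rate in Hz

# Hypothetical session: one track runs through a slow (e.g. GPU-hosted) effect.
tracks = {"drums": 0, "bass": 64, "vocals_gpu_reverb": 2400}
delays = compensation_delays(tracks)

# The whole mix is only as early as its slowest track.
worst_ms = 1000.0 * max(tracks.values()) / SAMPLE_RATE

print(delays)
print(f"the aligned mix is heard {worst_ms:.1f} ms late")
```

Compensation of this kind keeps recorded tracks aligned with each other on playback, but it works by delaying everything else. It cannot make the slow effect respond any sooner to a live input.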
I am not aware of how Ardour (or other programs for that matter) handles the effect processing and also not very familiar with LV2 or LADSPA (although I did read up on it a little).
Judging from the complexities being discussed, I suspect that effect plugins do not open their own threads on the CPU, or do they? If they do, wouldn't it be possible to code them in OpenCL and use latency compensation?
Just to clarify: This isn't meant as a "please do this post", but rather I am wondering whether there is any sense to my idea of trying this as a personal project...
Hope this all doesn't sound too ridiculous, given that my knowledge of these things at the moment is still very rudimentary...
Thanks in advance for a reply,

User offline. Last seen 4 years 28 weeks ago. Offline
Joined: 2007-06-07

Oh yeah.
And maybe someone could point me in the right direction to read up a little on these things...

User offline. Last seen 2 years 5 weeks ago. Offline
Joined: 2006-12-14
Posts: would be a nice place to start I think.
As to the LADSPA and LV2 specs, they can be found at and respectively.
They are both very simple to use, and have pretty good documentation (use the source Luke ;-) ).

User offline. Last seen 4 years 28 weeks ago. Offline
Joined: 2007-06-07

Thanks for the reply. I checked out Unfortunately most of the links there seem to be dead. I also briefly looked into some sample code of an LV2 plugin (again). The thing I don't understand (and that was my main question before) is how plugins work and interact with the host software. Can an LV2 plug-in open its own thread (i.e. does it have its own process), or does it somehow run "through" the host? (I remember, for example, Paul saying that the Ardour audio processing section is not yet multi-threading capable. So does that only relate to the actual audio routing or also to the effect plugins?)
I stumbled into this very interesting link:
So at least with VST it seems to be possible to do GPU audio processing using CUDA. I haven't tried out the plug-in yet, but I will, once I get around to it.
So maybe, if someone with insight into the deeper workings of LV2 (or the compiling process for that matter) could comment on this, I'd be grateful.

User offline. Last seen 4 years 28 weeks ago. Offline
Joined: 2007-06-07

Couldn't anyone just give a short reply on this?
I'd be very thankful,

linuxdsp's picture
User offline. Last seen 2 days 9 hours ago. Offline
Joined: 2009-02-04

For what it's worth, this is my understanding of LV2:

The LV2 plugin is a shared library. The shared library contains a few functions to instantiate the plugin (set up various data structures etc.) and also a 'run' function that is called by the host every time it wants to process a block of audio samples. (The host gives the plugin pointers to the buffers containing the audio through 'connect_port' and tells 'run' the number of samples to process.) The function does its thing with the audio samples and returns. Essentially the shared library just provides a few functions that the host will call as and when it needs them. When you load a plugin into the host, the host loads the shared library into memory and calls the various functions to instantiate the plugin. There is nothing to prevent you starting up a new thread from within the plugin when it is instantiated, or even fork/exec-ing another process, but there are various rules governing what should or can execute in which thread. Have a look in the lv2.h header, which you need to compile an LV2 plugin - this pretty much encapsulates the API. I'm not entirely convinced about LV2's ability to handle GUI extensions etc. across all hosts yet, but this seems to be a topic of some debate. There is a FAQ that covers some of the issues I've encountered while trying to develop LV2 plugins on my site:

You can use the contact info on my site and I can give a more detailed description. The LV2 devs may be able to shed some light on things - it looks to me as though LV2 is still evolving...
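The host/plugin contract described above can be modelled in a few lines. This is a toy Python model, not the real C API: the method names mirror the instantiate, connect_port and run callbacks declared in lv2.h, but the port numbering, the class and the host loop are assumptions made for illustration.

```python
class GainPlugin:
    """Toy stand-in for a plugin; method names mirror callbacks in lv2.h."""

    def __init__(self, sample_rate):  # plays the role of instantiate()
        self.sample_rate = sample_rate
        self.ports = {}

    def connect_port(self, index, buffer):
        # The host hands over a buffer; the plugin only stores a reference.
        self.ports[index] = buffer

    def run(self, n_samples):
        # Called by the host, in the host's own audio thread, once per block.
        # Hypothetical port layout: 0 = gain control, 1 = input, 2 = output.
        gain, inp, out = self.ports[0][0], self.ports[1], self.ports[2]
        for i in range(n_samples):
            out[i] = inp[i] * gain


def host_process(plugin, audio, block_size):
    """Toy host: slice the audio into blocks and call run() once per block."""
    result = []
    gain = [0.5]  # one-element list standing in for a control port
    for start in range(0, len(audio), block_size):
        block = audio[start:start + block_size]
        out = [0.0] * len(block)
        plugin.connect_port(0, gain)
        plugin.connect_port(1, block)
        plugin.connect_port(2, out)
        plugin.run(len(block))
        result.extend(out)
    return result


plugin = GainPlugin(48000)
processed = host_process(plugin, [1.0, -1.0, 0.5, 0.25, 2.0], block_size=2)
print(processed)
```

The point of the model is that the plugin never drives anything itself: it does nothing until the host calls run, and it is expected to finish before the next block is due. Any extra thread or GPU hand-off would have to live behind that call.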

SkinDeepHouse's picture
User offline. Last seen 1 day 2 hours ago. Offline
Joined: 2015-04-12

Quoted from: "I have been working on a jack client that uses the GPU to process audio. The benefit of this is that it lightens the load off the CPU to process audio. I used the NVIDIA CUDA toolchain to create a jack-cuda client. Currently I have made a gain plugin that uses 256 parallel threads to amplify a jack audio stream in realtime. There is a bit of overhead because it copies the stream to video ram first, then processes the audio and copies it back to main ram, but the PCI-e bus is pretty fast so it’s still overall pretty fast."
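The copy overhead mentioned in that quote can be put against the time budget of one audio period. The sketch below is only arithmetic: the upload, kernel and download times are made-up placeholders, not measurements of the jack-cuda client or of any real hardware.

```python
def period_ms(frames, sample_rate):
    """Duration of one audio period (buffer) in milliseconds."""
    return 1000.0 * frames / sample_rate


def fits_in_period(frames, sample_rate, upload_ms, kernel_ms, download_ms):
    """Return (total GPU round-trip time, whether it fits in one period)."""
    total = upload_ms + kernel_ms + download_ms
    return total, total <= period_ms(frames, sample_rate)


# Placeholder timings (assumptions): copy to video RAM, process, copy back.
UPLOAD_MS, KERNEL_MS, DOWNLOAD_MS = 1.0, 0.25, 1.0

for frames in (256, 64):
    total, ok = fits_in_period(frames, 48000, UPLOAD_MS, KERNEL_MS, DOWNLOAD_MS)
    budget = period_ms(frames, 48000)
    verdict = "fits" if ok else "misses the deadline"
    print(f"{frames} frames: budget {budget:.2f} ms, round trip {total:.2f} ms, {verdict}")
```

With these invented numbers the round trip fits comfortably at 256 frames but not at 64, which is the general shape of the problem: the transfer cost is roughly fixed per block, while the deadline shrinks with the buffer size.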

allank's picture
User offline. Last seen 3 days 2 hours ago. Offline
Joined: 2006-12-07

Doesn't sound like a great solution. The newer AMD APUs offer a method of sharing the RAM between the CPU part of the chip and the GPU part of the chip. This would solve the latency issues, but would be limited to AMD only.

TW's picture
User offline. Last seen 2 days 1 hour ago. Offline
Joined: 2015-04-13

I seem to remember reading that there is some sort of effort to get CPU/GPU concurrency going in the Linux kernel. I think on a very basic level the idea is that the resources available from both the CPU and GPU are placed into a single pool, and are then drawn from that pool as they are required.
Combining concurrency and cgroups has the potential to allow hardware to be used more efficiently and to give access to previously unavailable resources under heavy load, for example in a machine with a GPU expansion card where the GPU would previously just be idling. The effects of concurrency on CPUs with hyper-threading, and also the combination of a GPU expansion card with a CPU which offers hyper-threading, will be particularly interesting.
In relation to audio, I believe that cgroups will be useful in combination with concurrency to achieve low latencies.

paul's picture
User offline. Last seen 1 hour 50 min ago. Offline
Joined: 2006-03-16

GPUs are still largely irrelevant for audio work. The latencies involved in moving data to and from the device are too great for real-time work under most conditions.

CGroups are already part of the Linux kernel and have been for many years. They do not provide any features to improve latency or concurrency, they just provide mechanisms to manage resource allocation, and are already active and in use. CGroups are the reason that SCHED_FIFO scheduling is no longer dangerous on Linux kernels - they prevent even a SCHED_FIFO process from stealing all the CPU time.

User offline. Last seen 1 day 5 hours ago. Offline
Joined: 2009-12-08

"I seem to remember reading that there is some sort of effort to get CPU GPU concurrency going in the Linux kernel."

What you are probably remembering is heterogeneous system architecture (HSA), which allows the CPU and GPU to share a single address space. That would in principle allow passing a pointer to a buffer from the CPU to the GPU, rather than copying the data from CPU address space to GPU address space as is currently done. It is available now on some CPUs from AMD.
It should improve latency a lot compared to the current segregated address spaces, but is probably still only marginally useful for true real-time operation.

A first design for GPU processed audio would probably be something like an external jack client, rather than a plugin, so that you could hide the specifics of passing data to the GPU, and could report the latency through the jack api.

Perhaps a useful first question would be: in what situations do you currently run out of processing power, even when increasing the latency? It would not be fair to compare, for example, CPU capabilities at 5ms or 10ms or 15ms latency with a GPU processing solution at 50ms or 100ms latency. So if you have a situation where you max out the CPU, the first step would be to increase the latency and see whether you still hit the limit, or whether you are really just seeing the limits of low-latency real-time operation on that system.
Only after clearly stating the problem could you determine if GPGPU is a solution to that problem (if in fact there is a problem to be solved).
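One way to picture that test is a toy load model, assuming each processing callback costs a fixed overhead plus a per-frame amount. Both cost figures below are invented for illustration; real numbers would have to be measured on the system in question.

```python
def dsp_load(frames, sample_rate, overhead_us, cost_per_frame_us):
    """Fraction of each audio period spent processing (above 1.0 = overload).

    Toy model: busy time = fixed per-callback overhead + per-frame cost.
    """
    period_us = 1_000_000.0 * frames / sample_rate
    busy_us = overhead_us + cost_per_frame_us * frames
    return busy_us / period_us


# Invented costs: 800 us of fixed overhead per callback, 10 us per frame.
for frames in (64, 256, 1024):
    load = dsp_load(frames, 48000, overhead_us=800.0, cost_per_frame_us=10.0)
    latency_ms = 1000.0 * frames / 48000
    print(f"{frames:5d} frames ({latency_ms:5.2f} ms): {100 * load:.0f}% load")
```

In this model the smallest buffer overloads while the larger ones do not, so the limit being hit is the low-latency one, not raw processing power. If the load stayed above 100% even at large buffer sizes, that would be the case where more compute (GPU or otherwise) might actually help.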