Frequency-color representation for audio regions
It's easy to align the rhythms of rhythm instruments / tracks: the onsets and offsets are clearly visible in the drawn waveform. The same is not true for less percussive instruments or vocal tracks. I'm personally concerned with vocals, since I mostly deal with a cappella music, but the same applies to a variety of instruments. It's not always evident where an "ooh" changes to an "aah," and pitch changes on the same syllable are even harder to detect visually (at least at a zoom level suitable for quickly aligning multiple parts).
Enter colors with a direct relationship to the frequency content of the waveform at that location. The more I think about it, the more complicated I realize it is; yet if it were pulled off, it would be a huge convenience to many engineers / producers: a simple, elegant, unobtrusive addition to the visual information already being presented. The basic concept could take one of two forms. The first would be to consider only the fundamental frequency (a la pitch-tracking algorithms), though that wouldn't help at all with making an "ooh"-to-"aah" transition visible. The second would therefore color the waveform according to the entire spectral content, much like how a mixture of different frequencies of light produces a single perceived color. I'm not entirely sure the latter would work, since I'm not too familiar with the principles of color perception for wide-band light, but if it could be pulled off it would certainly be more robust than considering only the fundamental (pitch tracking is also more error-prone than straight-up spectral analysis).
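To make the second idea concrete, here is a minimal sketch of one way the "light mixing" analogy could work, assuming NumPy is available. The `frame_color` helper is hypothetical (not from any existing DAW): it takes one short frame of samples, maps each FFT bin's frequency to a hue on a logarithmic red-to-violet scale, and averages the per-bin colors weighted by spectral magnitude, so an "ooh" and an "aah" on the same pitch land on different colors because their overtone balance differs.

```python
import colorsys
import numpy as np

def frame_color(frame, sample_rate, f_lo=20.0, f_hi=20000.0):
    """Blend per-bin colors weighted by spectral magnitude (additive mixing).

    Hypothetical sketch: red at f_lo, violet at f_hi, logarithmic in between.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Keep only bins inside the audible mapping range (drops DC as well).
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    spectrum, freqs = spectrum[mask], freqs[mask]
    if spectrum.sum() == 0:
        return (0.0, 0.0, 0.0)  # silent frame -> black
    # Logarithmic frequency -> hue (0.0 = red at 20 Hz, 0.8 = violet at 20 kHz).
    hues = 0.8 * np.log(freqs / f_lo) / np.log(f_hi / f_lo)
    rgb = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues])
    weights = spectrum / spectrum.sum()
    return tuple(weights @ rgb)  # magnitude-weighted average color
```

One caveat this sketch ignores: averaging RGB values is not how human vision mixes wide-band light (that would need CIE color-matching functions), but it may be close enough for "these two regions look different" purposes.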
First thought for a mapping: red = 20 Hz, violet = 20 kHz, logarithmic in between. That might not yield enough contrast between, say, C3 (~131 Hz) and D3 (~147 Hz) to perceive. If so, a potential solution is a modular map that repeats every decade: 11 Hz = red, 100 Hz = violet; 101 Hz = red, 1 kHz = violet; 1.001 kHz = red, 10 kHz = violet; you get the idea. How that would interact with the spectral-content color mixing, I'm not really sure.
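Both variants of the mapping reduce to one small function. This is a sketch under the assumptions above (red at 20 Hz, violet at 20 kHz, hue expressed as a fraction in [0, 0.8]); the name `freq_to_hue` and the `wrap_decades` flag are made up for illustration. The wrapped version gives roughly three times the hue contrast per semitone, since one decade spans about a third of the full 20 Hz to 20 kHz range.

```python
import math

def freq_to_hue(freq, f_lo=20.0, f_hi=20000.0, wrap_decades=False):
    """Map a frequency in Hz to a hue: 0.0 = red, 0.8 = violet.

    wrap_decades=False: one red-to-violet sweep over the whole f_lo..f_hi range.
    wrap_decades=True : the sweep repeats every decade (10-100 Hz, 100 Hz-1 kHz, ...),
                        trading absolute pitch readability for per-semitone contrast.
    """
    if wrap_decades:
        # Fractional part of log10 gives a ramp that resets at each decade boundary.
        position = math.log10(freq) % 1.0
    else:
        # Single logarithmic sweep across the full range.
        position = math.log10(freq / f_lo) / math.log10(f_hi / f_lo)
    return 0.8 * min(max(position, 0.0), 1.0)
```

For the C3/D3 case: the global map separates them by about 0.013 in hue, the decade-wrapped map by about 0.040, which is the kind of difference that starts to be visible at a glance.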
The goal is not necessarily to read the pitch accurately from the color of the waveform, but simply to tell by inspection when each singer moves from A to G, so that all of the singers (or instruments) can be aligned visually even when onset, offset, or intensity cues are absent or subtle.
Has anyone seen an idea like this in implementation? Any ideas on how the spectral content color-mixing could work?