Understanding the Spectrogram/Waveform display

Overview

The RX Audio Editor features a rich visual environment for editing and repairing audio. The central focus of the interface is the Spectrogram/Waveform display. It combines an advanced Spectrogram with a waveform transparency overlay to provide frequency and amplitude information in one highly configurable window.

Using the spectrogram to identify audio problems

ANATOMY OF THE SPECTROGRAM DISPLAY

The spectrogram allows you to visualize both frequency and amplitude information of an audio recording in one display.

FREQUENCY

The Spectrogram shows frequency information along the vertical axis. The lowest frequency content is displayed at the bottom and the highest frequency content at the top.

Spectrogram Sine Sweep
This image shows the spectrogram of a sine sweep over pink noise. The sine sweep starts at 20 Hz (bottom of the display) and sweeps to 20 kHz (top of the display) over 4 minutes.
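
The layout described above can be reproduced in miniature. The following is a minimal sketch, not RX's implementation: it assumes a short linear sweep (100 Hz to 3 kHz over one second at an 8 kHz sample rate, with no pink noise) instead of the 4-minute sweep in the image, and uses a plain Hann-windowed DFT. All function names are hypothetical. Each spectrogram column is one analysis frame, and the bin with the most energy climbs from column to column as the sweep rises.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, fft_size, hop):
    """One column of magnitudes per frame; bin 0 is the lowest frequency."""
    window = hann(fft_size)
    columns = []
    for start in range(0, len(samples) - fft_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(fft_size)]
        columns.append(magnitude_spectrum(frame))
    return columns


# Assumed test signal: a linear sine sweep, scaled down from the one in the image.
SAMPLE_RATE = 8000
DURATION = 1.0
F_START, F_END = 100.0, 3000.0
FFT_SIZE = 256

sweep = []
for i in range(int(SAMPLE_RATE * DURATION)):
    t = i / SAMPLE_RATE
    phase = 2 * math.pi * (F_START * t + (F_END - F_START) * t * t / (2 * DURATION))
    sweep.append(math.sin(phase))

columns = spectrogram(sweep, fft_size=FFT_SIZE, hop=1024)

# Frequency (Hz) of the strongest bin in each column.
peak_hz = [
    max(range(len(col)), key=col.__getitem__) * SAMPLE_RATE / FFT_SIZE
    for col in columns
]
print(peak_hz)
```

Plotting `columns` with time on the horizontal axis and bin index on the vertical axis would give a rising diagonal line, which is the shape the sweep takes in the image.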

AMPLITUDE & COLOR

The amplitude of frequency content is indicated by variations in color in the Spectrogram. The color map ruler (to the right of the frequency ruler) shows the color being used to represent a given amplitude value.

spectrogram color map
In this example, louder events (speech) are indicated by brighter colors (yellow, bright orange) and quieter events (breaks in speech and the noise floor) are indicated by darker colors (dark orange, blue, black).

Spectrogram Settings

The RX Spectrogram is highly configurable. In the Spectrogram Settings window you can adjust the default configuration, load a preset, or save your own preset.

The Spectrogram Settings window can be opened:

  • From the “View” menu of the RX Audio Editor
  • By right-clicking on the spectrogram display and selecting “Spectrogram Settings” from the context menu
  • Using a keyboard shortcut: Command+Shift+, (on Mac) or Ctrl+Shift+, (on Windows)

  • SPECTROGRAM TYPE: RX offers different methods for displaying time and frequency information in the Spectrogram. RX’s advanced Spectrogram modes allow you to see sharper time (horizontal) and frequency (vertical) resolution simultaneously. There is always a trade-off of display quality versus processing time, so keep in mind that some modes will take longer to draw on the screen than others.

    • REGULAR STFT: The most common spectrogram type (also found in other editors). It has a fixed, uniform time-frequency resolution. This is the simplest and fastest drawing mode in RX.
    • AUTO-ADJUSTABLE STFT: Automatically adjusts FFT size (i.e. time and frequency resolution of a Spectrogram) according to the zoom level. For example, if you zoom in horizontally (time) you’ll see that percussive sounds and transients will be more clearly defined. When you zoom in vertically (frequency), you’ll see individual musical notes and frequency events will appear more clearly defined.
    • MULTI-RESOLUTION: Calculates the Spectrogram with better frequency resolution at low frequencies and better time resolution at high frequencies. This mimics psychoacoustic properties of our perception, allowing the Spectrogram display to show you the most important information clearly.
    • ADAPTIVELY SPARSE: Automatically varies the time and frequency resolution of the Spectrogram to achieve the best Spectrogram sharpness in every area of the time-frequency plane. This often lets you see the most details for a thorough analysis, but it’s the slowest mode to calculate.

  • FFT SIZE: The greater the FFT size, the greater the frequency resolution, i.e. notes and tonal events will be clearer at larger sizes. However, choosing a larger number here will make time events less sharply defined because of the way this type of processing is done. Choosing Auto-adjustable or Multi-resolution modes allows you to get a good combination of frequency and time resolution without having to change this setting as you work.

What does FFT mean?

Fast Fourier Transform: a procedure for calculating the frequency spectrum of a signal. The greater the FFT size, the greater the frequency resolution, i.e., notes and tonal events will be clearer at larger sizes.
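
The trade-off between frequency and time resolution follows from simple arithmetic. As a rough sketch (the 48 kHz sample rate and the function names are assumptions chosen for illustration, not RX settings): the spacing between FFT bins is the sample rate divided by the FFT size, while the stretch of audio each FFT frame covers is the FFT size divided by the sample rate.

```python
def bin_spacing_hz(sample_rate, fft_size):
    """Distance between adjacent FFT bins: smaller means finer frequency detail."""
    return sample_rate / fft_size


def window_length_ms(sample_rate, fft_size):
    """Duration of audio covered by one FFT frame: larger means blurrier timing."""
    return 1000.0 * fft_size / sample_rate


SAMPLE_RATE = 48000  # assumed for this example

for fft_size in (512, 2048, 8192):
    print(
        f"FFT size {fft_size:5d}: "
        f"{bin_spacing_hz(SAMPLE_RATE, fft_size):8.3f} Hz per bin, "
        f"{window_length_ms(SAMPLE_RATE, fft_size):8.3f} ms per frame"
    )
```

Each time the FFT size doubles, the bin spacing halves (tonal events separate more cleanly) and the frame length doubles (transients smear over a longer time span).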

  • ENABLE REASSIGNMENT: Enables a special technique for Spectrogram calculation that allows very precise pitch tracking for any harmonic components of the signal. When used together with Frequency Overlap/Time Overlap controls, this option can provide virtually unlimited time and frequency resolution simultaneously for signals consisting of tones.

    Example image: the effect of enabling reassignment

  • WINDOW: Selects between the different weighting functions (or windows) that are used for the FFT analysis. Window functions control the amount of signal leakage between frequency bins of the FFT. “Weak” windows, such as Rectangular, allow a lot of leakage, which may blur your Spectrogram vertically. “Strong” windows, such as Kaiser or cos3, eliminate leakage at the expense of a slight loss of frequency resolution.

  • FREQUENCY SCALE: Using different frequency scales can help you see useful information more easily. Different scales have different characteristics for displaying the vertical (frequency) information in the Spectrogram display.

    • LINEAR: Displays frequencies spread out in a uniform way. This is most useful when you want to analyze higher frequencies.
    • LOGARITHMIC: Puts more emphasis on lower frequencies.
    • MEL: The Mel scale (derived from the word Melody) is a frequency scale based on how humans perceive sound. This selection is one of the more intuitive choices because it corresponds to how we hear differences in pitch.
    • BARK: The Bark scale is also based on how we perceive sound, and corresponds to a series of critical bands.

  • FREQUENCY OVERLAP: Controls the amount of oversampling on the frequency scale of Spectrogram. When used together with the Reassignment option, it will increase the resolution of the Spectrogram vertically (by frequency).

  • TIME OVERLAP: This controls the time oversampling of the Spectrogram. In most cases, overlap of 4x or 8x is a good setting to start with. However, using higher overlap together with the Reassignment option will increase the time resolution of a Spectrogram, letting you see transient events clearly.

  • COLOR MAP: The Spectrogram display allows you to choose between several different color schemes. There is no right or wrong color setting to use and we recommend you try them all to determine your preference. Sometimes certain color modes will make different types of noise stand out more clearly. Experiment!

  • HIGH-QUALITY RENDERING: Accurate max-bilinear interpolation of the Spectrogram (recommended). Turning this control off makes Spectrogram rendering slightly faster, but you’ll lose some detail and clarity in the Spectrogram image.

  • REDUCE QUALITY ABOVE: RX’s Spectrogram uses very accurate rendering, letting you see audio problems, such as clicks, even at low zoom levels. However, performing such rendering for long files can be somewhat slow. When the length of the visible Spectrogram is above the specified number of seconds, the Spectrogram calculation is changed to a fast and less accurate preview mode. When you zoom in, the Spectrogram calculation becomes accurate again.

  • CACHE SIZE (MB): Limits the amount of memory used by the Spectrogram.
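
To make the leakage behavior described under the WINDOW setting more concrete, here is a small sketch. It is an illustration under stated assumptions, not RX's analysis code: it compares a Rectangular window against a Hann window (used here as a stand-in for a "stronger" window, since the exact definitions of RX's Kaiser and cos3 windows are not given in this document), and the function names are hypothetical. A sine that falls between two FFT bins is analyzed with each window, and the energy found in a bin far from the tone is reported relative to the peak.

```python
import cmath
import math


def rectangular(n):
    """Rectangular window: no weighting at all (a "weak" window)."""
    return [1.0] * n


def hann(n):
    """Hann window: tapers the frame edges toward zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame, k):
    """Magnitude of a single DFT bin k, computed directly."""
    n = len(frame)
    return abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))


def leakage_db(window, cycles=10.5, far_bin=40):
    """Level at far_bin relative to the spectral peak, in dB.

    The sine completes 10.5 cycles per frame, so it sits exactly between
    bins 10 and 11, which is the worst case for leakage.
    """
    n = len(window)
    frame = [window[i] * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    peak = max(dft_magnitude(frame, k) for k in range(n // 2 + 1))
    return 20 * math.log10(dft_magnitude(frame, far_bin) / peak)


FFT_SIZE = 256
rect_leak = leakage_db(rectangular(FFT_SIZE))
hann_leak = leakage_db(hann(FFT_SIZE))
print(f"Rectangular: {rect_leak:.1f} dB, Hann: {hann_leak:.1f} dB")
```

The more negative the number, the less energy has leaked into the distant bin. In a spectrogram, heavy leakage shows up as vertical blur around strong tones.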

Rulers

On the right side of the Spectrogram/Waveform display are the Amplitude ruler for the Waveform, Frequency ruler for the Spectrogram, and Color Map ruler for the Spectrogram.

Amplitude Rulers

You can right-click on the spectral Amplitude ruler to reveal a selection of amplitude scales:

  • dB: Shows Waveform levels in decibels, relative to digital full scale (it is the most common type of scale used for spectrum analyzers).
  • NORMALIZED: Shows Waveform levels relative to the full scale level of 1.
  • 16 BIT: Shows Waveform levels as quantization steps of a 16-bit audio format (−32768 to +32767).
  • PERCENT: Shows Waveform levels as a percentage of full scale.
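
The four scales are different ways of labeling the same sample value. The sketch below shows one plausible set of conversions, starting from a normalized sample in the range −1 to +1. The function names are hypothetical, and the 16-bit mapping (scale by 32768, then clamp to the range listed above) is an assumption; RX may round or scale differently.

```python
import math


def to_db(x):
    """Level in dB relative to digital full scale (|x| == 1.0 gives 0 dB)."""
    if x == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(abs(x))


def to_16bit(x):
    """Quantization step of a 16-bit format, clamped to -32768..+32767."""
    return max(-32768, min(32767, round(x * 32768)))


def to_percent(x):
    """Percentage of full scale."""
    return x * 100.0


for sample in (1.0, 0.5, 0.1, -1.0):
    print(
        f"normalized {sample:+.2f} -> {to_db(sample):7.2f} dB, "
        f"16-bit {to_16bit(sample):+6d}, {to_percent(sample):+7.1f} %"
    )
```

Note that halving the amplitude lowers the level by about 6 dB, which is why the dB ruler is not evenly spaced in terms of normalized values.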

Color Map Ruler

This ruler shows what color represents what amplitude in the Spectrogram. The range of this display is the dynamic range of the RX Spectrogram. You can click and drag the map to change the range and use the scroll wheel to make the range larger or smaller. This is useful for seeing very quiet noises without using gain to change the level of your audio.
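
One way to picture what dragging and scrolling on the Color Map ruler do is to treat the ruler as a window onto the dB axis. The sketch below is a hypothetical model, not RX's code: the function names and the default range of −120 dB to 0 dB are assumptions. A level is mapped to a position between 0.0 (darkest color) and 1.0 (brightest color); dragging shifts the window, and scrolling changes its width.

```python
def color_position(level_db, floor_db=-120.0, ceiling_db=0.0):
    """Map a level to 0.0 (darkest) .. 1.0 (brightest), clamping outside the range."""
    fraction = (level_db - floor_db) / (ceiling_db - floor_db)
    return min(1.0, max(0.0, fraction))


def shift_range(floor_db, ceiling_db, delta_db):
    """Model of click-and-drag: move the whole range up or down."""
    return floor_db + delta_db, ceiling_db + delta_db


def scale_range(floor_db, ceiling_db, factor):
    """Model of the scroll wheel: widen (factor > 1) or narrow (factor < 1) the range."""
    center = (floor_db + ceiling_db) / 2
    half = (ceiling_db - floor_db) / 2 * factor
    return center - half, center + half


# A quiet noise at -90 dB is close to the dark end of the assumed default range...
print(color_position(-90.0))                  # 0.25
# ...but sits mid-scale once the range is narrowed around it.
print(color_position(-90.0, -100.0, -80.0))   # 0.5
```

This is why adjusting the ruler can reveal quiet noise without touching the audio: only the mapping from level to color changes.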

Frequency Rulers

Right-clicking on the frequency ruler will display the frequency scale options:

  • LINEAR: Frequencies in Hertz are spaced linearly on the screen.
  • MEL (default) & BARK: Mel and Bark are frequency scales commonly found in psychoacoustics, and reflect how our ears detect pitch. They are approximately linear below 500 Hz and approximately logarithmic above 500 Hz.

    • The MEL scale reflects our perception of pitch: equal subjective pitch increments produce equal increments in screen coordinates.
    • The BARK scale reflects our subjective loudness perception and energy integration. It is similar to the Mel scale, but puts more emphasis on low frequencies.
  • LOG: In this mode, different octaves occupy equal screen space. The screen coordinates are proportional to the logarithm of the frequency in Hertz, down to 100 Hz.

  • EXTENDED LOG: Extends the logarithmic scale down to 10 Hz, putting even more emphasis on lower frequencies.

  • PIANO ROLL OVERLAY: A representation of how specific frequency ranges correlate to the western musical scale can be displayed by right-clicking on the Frequency ruler and selecting Show Piano Roll. If you would like to hide the frequency indicators so they don’t obscure this piano roll, you can disable Show Frequencies and Ticks (which is enabled by default).
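
The differences between the frequency scales can be compared numerically. The following sketch uses widely published textbook approximations for Mel and Bark; whether RX uses these exact formulas is not stated in this document, so treat them as assumptions. The helper `ruler_position` and the 100 Hz to 20 kHz display range are also hypothetical. It reports where a given frequency would land on the ruler, from 0.0 (bottom) to 1.0 (top), under each scale.

```python
import math


def hz_to_mel(f):
    """Common Mel approximation (assumed; RX's exact formula is not documented here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_bark(f):
    """Common Bark approximation (assumed; RX's exact formula is not documented here)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def ruler_position(f, scale, f_min=100.0, f_max=20000.0):
    """Fraction of the ruler height (0.0 bottom, 1.0 top) at which f is drawn."""
    low, high = scale(f_min), scale(f_max)
    return (scale(f) - low) / (high - low)


SCALES = {
    "linear": lambda f: f,
    "mel": hz_to_mel,
    "bark": hz_to_bark,
    "log": math.log10,
}

for name, scale in SCALES.items():
    print(f"{name:6s}: 1 kHz is drawn at {ruler_position(1000.0, scale):.3f} of the ruler height")
```

On the linear scale, everything below 1 kHz is squeezed into a thin strip at the bottom of the display, while the Mel, Bark, and Log scales give that region progressively more room. On the Log scale, each octave (100 to 200 Hz, 200 to 400 Hz, and so on) takes up the same height.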

Waveform Displays

Waveform Transparency Balance Slider

Transparency Slider

The Spectrogram Display features a transparency slider that lets you superimpose a Waveform display over the Spectrogram, allowing you to see both frequency and overall amplitude at the same time. This can be invaluable for quickly identifying clipping, clicks and pops, and other events.

Below are examples of the same clip shown with different transparency balance values:
Spectrogram only; wave & spectrogram; mostly waveform; wave only

Waveform Overview

Audio Waveform Above Main Display

An overview of the entire audio file’s Waveform is displayed above the main Spectrogram/Waveform display in order to provide a handy reference point when zooming and making audio selections in RX.

The Waveform overview will always display the entire audio file, and will also display any selections made in the main display.

When you zoom in on your audio, the currently visible audio region is highlighted in the Waveform overview. Click and drag the highlighted region to scroll the main audio display left or right, and click and drag the edges of the highlighted region to make the zoom tighter or wider. To zoom out fully, double-click the highlighted visible region.

Note

With your mouse hovering over the Waveform overview, you can also use the mouse wheel to scale the amplitude of the Waveform display to provide a clearer overview. This will not affect the amplitude scaling in the main Spectrogram/Waveform display.