Gaborator reference: gaborator.h

template<class T> class analyzer;

Frequency scales

The Gaborator supports three types of frequency scales: logarithmic, linear, and mel scales. Each is represented by a class deriving from the abstract base class fq_scale.

class fq_scale {

Comparison

Comparison operators are provided for compatibility with standard container classes.

bool operator<(const fq_scale &rhs) const;
bool operator==(const fq_scale &rhs) const;
};

Logarithmic frequency scale

The log_fq_scale class represents a logarithmic frequency scale, corresponding to a constant-Q transform or constant-Q spectrogram.

class log_fq_scale: public fq_scale {

Constructor

log_fq_scale(double bands_per_octave,
             double ff_min,
             double ff_ref = 0.5);
bands_per_octave
The number of frequency bands per octave. Values from 4 to 384 (inclusive) are supported. Integer values yield the best performance, but non-integer values are also supported. The minimum of 4 applies when the default overlap of 0.7 is used; if the overlap is increased from the default, the minimum should be increased proportionally. Values greater than 384 probably work but are not regularly tested.
ff_min
The lower limit of the analysis frequency range, in units of the sample rate. The analysis filter bank will extend low enough in frequency that the lowest bandpass filter center frequency falls at or below ff_min. Values from 0.0001 up to but not including 0.5 are supported. Values below 0.0001 probably work but are not regularly tested.
ff_ref
The reference frequency, in units of the sample rate. This allows fine-tuning of the analysis and synthesis filter banks such that the center frequency of one of the filters is aligned with ff_ref. If ff_ref falls outside the frequency range of the bandpass filter bank, this works as if the range were extended to include ff_ref. Must be positive. A typical value when analyzing music is 440.0 / fs, where fs is the sample rate in Hz.
};

Linear frequency scale

The lin_fq_scale class represents a linear frequency scale, corresponding to a short-term Fourier transform (STFT) or constant-bandwidth spectrogram.

class lin_fq_scale: public fq_scale {

Constructor

lin_fq_scale(double size);
size

The size of the transform. This corresponds to the "FFT size" parameter that is typically used with FFT based spectrum analyzers: the frequency spacing between bands is the sample rate divided by size, and the number of frequency bands produced is roughly half of size because negative frequencies are omitted. However, in the Gaborator, the size parameter is not limited to powers of two, or even to integer values, so any arbitrary band spacing of x Hz can be achieved by specifying a size of fs / x where fs is the sample rate in Hz.

Values from 2 to 10000 are supported. Larger values probably work but are not regularly tested.

};

Mel frequency scale

The mel_fq_scale class represents a mel frequency scale, a scale designed to mimic human pitch perception such that one mel is close to the smallest perceptible difference in frequency. It is defined by the formula mel = 2595 log10(1 + f / 700), where f  is the frequency in Hz.

The mel scale is approximately logarithmic at high frequencies, and approximately linear at low frequencies, with a smooth transition between the two.

class mel_fq_scale: public fq_scale {

Constructor

mel_fq_scale(double bands_per_mel,
             double fs);
bands_per_mel

The number of frequency bands per mel. Supported values range from 0.003 (yielding a total of 12 frequency bands between 0 and 22.05 kHz at a 44.1 kHz sample rate) to 1.0 (yielding some 3924 bands). Values larger than 1.0 probably work but are not regularly tested.

The minimum of 0.003 applies when the default overlap of 0.7 is used; if the overlap is increased from the default, the minimum should be increased proportionally. Also, if the sample rate is lower than 44.1 kHz, the minimum should be increased in inverse proportion to the sample rate. For example, at a sample rate 22.05 kHz and the default overlap of 0.7, the minimum number of bands per mel is 0.008).

fs
The sample rate in Hz. Because the mel scale is defined in terms of absolute frequencies in Hz, the sample rate needs to be specified explicitly; this is an exception to the sample rate agnostic design of the rest of the Gaborator API. Values from 8000 to 96000 are supported. Values outside this range probably work but are not regularly tested.
};

Phase conventions

enum class coef_phase { global, local };

The spectrogram coefficients produced by the Gaborator are complex numbers, and as such they have both a magnitude and a phase. The phase can be defined in several different ways, using different reference points for what is considered a phase of zero.

Under one possible definition, an input signal of a real and positive impulse at some time t  will cause all spectrogram coefficients at that time t  to also be real and positive, or in other words, to have a phase of zero, for any frequency. We call this the local phase convention, and it is the default beginning with version 2 of the Gaborator.

Under an alternative definition, an input signal of a cosine wave at some frequency f  will cause all spectrogram coefficients at that frequency f  to have a phase of zero, at any point in time. We call this the global phase convention, and it was the default in version 1 of the Gaborator.

The local phase convention is preferred because it guarantees that the coefficients for a given point in time only depend on the local signal behavior around that point in time, and not on how far that point in time is from the reference point of t=0. This avoids a loss of floating point precision for signals far from t=0.

Spectrum Analysis Parameters

A parameters object holds a set of parameters for the spectrum analysis and resynthesis.

class parameters {

Constructor

The only required parameter is the frequency scale, which is passed as an argument to the parameters constructor. Optional parameters may be specified by assigning to data members of the parameters object after construction.

parameters(const fq_scale &scale);

Legacy Constructor

For backwards compatibility with version 1 of the Gaborator, the following constructor is also supported:

parameters(double bands_per_octave,
           double ff_min,
           double ff_ref = 1.0);

This constructs a set of parameters with a logarithmic frequency scale, like parameters(log_fq_scale(bands_per_octave, ff_min, ff_ref)) except that the phase member is initialized to the version 1 default of coef_phase::global rather than the version 2 default of coef_phase::local.

Phase convention

The phase convention to be used for the analysis and resynthesis can be set or examined using the public data member phase.

coef_phase phase;

Overlap

The amount of overlap between adjacent bandpass bands can be set or examined using the public data member overlap. The default is 0.7, meaning each band will have a standard deviation of 0.7 times the band spacing.

double overlap;

Comparison

Comparison operators are provided for compatibility with standard container classes.

bool operator<(const parameters &rhs) const;
bool operator==(const parameters &rhs) const;
};

Spectrogram Coefficients

A coefs object stores a set of spectrogram coefficients. It is a dynamic data structure and will be automatically grown to accommodate new time ranges, for example as newly recorded audio is analyzed. The template argument T must match that of the analyzer (usually float). The template argument C is the data type used to store each coefficient value; there is usually no need to specify it explicitly as it will default to std::complex<T>.

template <class T, class C = std::complex<T>>
class coefs {

Constructor

coefs(analyzer<T> &a);

Construct an empty set of coefficients for use with the spectrum analyzer a. This represents a signal that is zero at all points in time.

};

Spectrum Analyzer

The analyzer object performs spectrum analysis and/or resynthesis according to the given parameters. The template argument T is the floating-point type to use for the calculations. This is typically float; alternatively, double can be used for increased accuracy at the expense of speed and memory consumption.

template <class T>
class analyzer {

Constructor

analyzer(const parameters &params);
params
The spectrum analysis parameters.

Analysis and synthesis

void
analyze(const T *signal,
        int64_t t0,
        int64_t t1,
        coefs<T> &coefs) const;

Spectrum analyze the samples at *signal and add the resulting coefficients to coefs.

signal
The signal samples to analyze, beginning with the sample from time t0 and ending with the last sample before time t1, for a total of t1 - t0 samples.
t0
The point in time when the sample at signal[0] was taken, in samples. For example, when analyzing an audio recording, this is typically 0 for the first sample in the recording, but this reference point is arbitrary, and negative times are valid. When using the global phase convention, accuracy begins to successively decrease outside the range of about ±108 samples, so using large time values should be avoided. When using the local phase convention, no such loss of accuracy occurs.
t1
The point in time of the sample one past the end of the array of samples at signal, in samples.
coefs
The coefficient object that the results of the spectrum analysis are added to.

If the coefs object already contains some coefficients, the new coefficients are summed to those already present. Because the analysis is a linear operation, this allows a signal to be analyzed in blocks, by making multiple calls to analyze() with non-overlapping ranges that together cover the entire signal. For efficiency, the blocks should be large, as in analyze(first_131072_samples, 0, 131072, coefs), analyze(next_131072_samples, 131072, 262144, coefs), etc.

void
synthesize(const coefs<T> &coefs,
           int64_t t0,
           int64_t t1,
           T *signal) const;

Synthesize signal samples from the coefficients coefs and store them at *signal.

coefs
The coefficients to synthesize the signal from.
t0
The point in time of the first sample to synthesize, in samples, using the same time scale as in analyze().
t1
The point in time of the sample one past the last one to synthesize.
signal
The synthesized signal samples will be written here, beginning with the sample from time t0 and and ending with the last sample before time t1, for a total of t1 - t0 samples.

The time range t0...t1 may extend outside the range analyzed using analyze(), in which case the signal is assumed to be zero in the un-analyzed range.

A signal may be synthesized in blocks by making multiple calls to analyze() with different sample ranges. For efficiency, the blocks should be large, and each t0 should be a multiple of a large power of two.

Frequency Band Numbering

The frequency bands of the analysis filter bank are numbered by non-negative integers that increase towards lower (sic) frequencies. Although this numbering may seem backwards, it makes a certain amount of sense for constant-Q analysis where the frequency range has a hard upper limit (the Nyquist frequency) but no hard lower limit. Numbering the bands from high to low frequency allows the frequency range to be extended into arbitrarily low frequencies without using negative band numbers or having to renumber existing bands. It also maps logically to computer graphics coordinate systems where the Y coordinate increases downwards.

When using a logarithmic frequency scale, there is a number of bandpass bands corresponding to the logarithmically spaced bandpass analysis filters, from near 0.5 (the Nyquist frequency, or half the sample rate) to near fmin, and a single lowpass band containing the residual signal from frequencies below fmin. Note that there is no special highpass band; frequencies at or close to the Nyquist frequency are included in the highest-frequency bandpass band(s).

When using a linear or mel frequency scale, all the bands are considered bandpass bands and there is no special lowpass band.

The band numbering can be examined using the following methods:

int bands_begin() const;

Return the lowest valid band number. This is always 0.

int bands_end() const;

Return the highest valid band number plus one. This is also the total number of bands.

int bandpass_bands_begin() const;

Return the lowest valid bandpass band number, corresponding to the highest-frequency bandpass band. This is currently 0, but should a future version of the Gaborator support a highpass band, that will become band 0 and bandpass_bands_begin() will return 1.

int bandpass_bands_end() const;

Return the highest valid bandpass band number plus one, corresponding to one past the lowest-frequency bandpass band.

int band_lowpass() const;

For an analyzer using a logarithmic frequency scale only, return the band number of the lowpass band. This returns the same number as bandpass_bands_end(), but is preferred for clarity when referring to the lowpass band rather than the excluded upper bound of the range of bandpass bands.

int band_ref() const;

For an analyzer using a logarithmic frequency scale only, return the band number corresponding to the reference frequency ff_ref. If ff_ref falls within the frequency range of the bandpass filter bank, this is a valid bandpass band number, otherwise it is not.

Band properties

double band_ff(int band) const;

Return the center frequency of band number band, in units of the sample rate. The center frequency of the lowpass band (if present) is 0.

double bandpass_band_ff(double band) const;

Return the center frequency of bandpass band number band, in units of the sample rate. Unlike band_ff, this takes a floating point argument and supports interpolating between bands by giving a non-integer band number, and/or extrapolating outside the valid range of bandpass band numbers.

double band_q(double band) const;

Return the Q (quality factor) of the analysis filter of bandpass band band. Q is defined as the -3 dB bandwidth divided by the center frequency. When using a logarithmic frequency scale, the bands form a constant-Q filter bank and Q is the same for all bands, and when using a linear or mel frequency scale, Q will vary from band to band.

Support

double band_analysis_support(int band) const;

Return the one-sided time domain support of the analysis filter for band band. When calling analyze() with a sample at time t, the spectrogram coefficients of band band will change significantly only within the time range t ± support. Coefficients outside the range may change slightly, but the changes will sufficiently small that they may be ignored without significantly reducing accuracy.

double analysis_support() const;

Return the largest band_analysis_support(band) of any band.

double band_synthesis_support(int band) const;

Returns the one-sided time domain support of the reconstruction filter for band band. When calling synthesize() to synthesize a sample at time t, the sample will only be significantly affected by spectrogram coefficients of band band in the time range t ± support. Coefficients outside the range may be used in the synthesis, but substituting zeroes for the actual coefficient values will not significantly reduce accuracy.

double synthesis_support() const;

Return the largest band_synthesis_support(band) of any band.

};

Functions

Iterating Over Existing Coefficients

template <class T, class F, class C0, class... CI>
void process(F f,
             int b0,
             int b1,
             int64_t t0,
             int64_t t1,
             coefs<T, C0> &coefs0,
             coefs<T, CI>&... coefsi);

Process one or more coefficient sets coefs0... by applying the function f to each coefficient present in coefs0, in an indeterminate order.

This can be optionally limited to coefficients whose band number b and sample time t satisfy b0b < b1 and t0t < t1. To process every coefficient present in coefs0, pass INT_MIN, INT_MAX, INT64_MIN, INT64_MAX for the arguments b0, b1, t0, and t1, respectively.

The function f should have the call signature

template <class T>
void f(int b, int64_t t, std::complex<T> &c0, std::complex<T> &ci...);

where

b
The band number of the frequency band the coefficients c0 and ci... pertain to. This may be either a bandpass band or the lowpass band.
t
The point in time the coefficients c0 and ci... pertain to, in samples
c0
A reference to a complex coefficient from coefs0
ci...
Optional references to complex coefficients from the additional coefficient sets coefsi....

The function f may read and/or modify each of the coefficients passed through c0 and each ci....

The first coefficient set c0 is a special case when it comes to the treatment of missing values. Coefficients missing from c0 will not be iterated over at all, but when a coefficient is iterated over and is missing from one of the additional coefficient sets ci..., it will be automatically created and initialized to zero in that additional coefficient set.

For example, in the common case where the processing takes one input and produces one output, c0 should be the input and c1 should be the output. This ensures that the entire input is iterated over, and that the output coefficients get created as needed.

Note: The template parameters C0 and CI... exist to support the processing of coefficient sets containing data of types other than std::complex<T>, which is not currently part of the documented API. In typical use, there is no need to specify them because the template parameter list can be deduced, but if they are explicitly specified, they should all be std::complex<T>.

Creating New Coefficients

template <class T, class F, class C0, class... CI>
void fill(F f,
          int b0,
          int b1,
          int64_t t0,
          int64_t t1,
          coefs<T, C0> &coefs0,
          coefs<T, CI>&... coefsi);

Fill a region of the time-frequency plane with coefficients and apply the function f to each.

This works like process() except that it is not limited to processing coefficients that already exist in coefs0; instead, any missing coefficients in coefs0 as well as any of the coefsi... are created and initialized to zero before f is called.

The t0 and t1 arguments must specify an explicit, bounded time range — they must not be given as INT64_MIN and/or INT64_MAX as that would mean creating coefficients for an an astronomically large time range, requiring a correspondingly astronomical amount of memory.

Forgetting Coefficients

template <class T>
void forget_before(const analyzer<T> &a,
                   coefs<T> &c,
                   int64_t limit);

Allow the coefficients for points in time before limit (a time in units of samples) to be forgotten. Streaming applications can use this to free memory used by coefficients that are no longer needed. Coefficients that have been forgotten will read as zero. This does not guarantee that all coefficients before limit are forgotten, only that ones for limit or later are not, and that the amount of memory consumed by any remaining coefficients before limit is bounded.

Legacy API For Iterating Over Existing Coefficients

Prior to version 1.5, iteration over the coefficients was done using the apply() function. It is similar to process(), except that it

In new code, process() is preferred.

template <class T, class F>
void apply(const analyzer<T> &a,
           coefs<T> &c,
           F f,
           int64_t t0 = INT64_MIN,
           int64_t t1 = INT64_MAX);

Apply the function f to each coefficient in the coefficient set c for points in time t that satisfy t0t < t1. If the t0 and t1 arguments are omitted, f is applied to every coefficient.

a
The spectrum analyzer that produced the coefficients c
c
A set of spectrogram coefficients
f
A function to apply to each coefficient in c, with the call signature
template <class T>
void f(std::complex<T> &coef, int band, int64_t t);
coef
A reference to a single complex coefficient. This may be read and/or modified.
band
The band number of the frequency band the coefficient coef pertains to. This may be either a bandpass band or the lowpass band.
t
The point in time the coefficient coef pertains to, in samples
t0
When not INT64_MIN, only apply f to the coefficients for time ≥ t0
t1
When not INT64_MAX, only apply f to the coefficients for time < t1