F0 estimation for stable intervals
The F0 estimation method for stable intervals finds, for each frame, a quadruple of signal peaks P = (pL, p0, p1, pR), consisting of either maximum peaks or minimum peaks pi, i = L, 0, 1, R, such that the center position of the frame lies between p0 and p1. A peak is defined as either a local minimum or a local maximum in the sequence of signal samples. For each peak pk, k = 0, ..., n - 1, in a speech segment, we maintain a triple of values (xk, yk, ck), where xk and yk are the peak coordinates and ck is the peak classification, either a minimum or a maximum. The F0 estimate is the inverse of the mean of the period lengths found in P, i.e. the mean of the distances between peaks pL and p0, p0 and p1, and p1 and pR. The tuple P is selected from a series of candidate tuples according to a similarity score. Furthermore, the selected tuple is checked to ensure that its period is not a multiple of the presumed true F0 period; if it is, a different peak tuple is selected. In the following, we describe the algorithm to find such a peak tuple P for each stable frame.
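As an illustration of the peak representation and of the F0 estimate derived from a peak tuple, the following is a minimal sketch rather than the method's actual implementation. The names Peak, find_peaks, f0_from_tuple and the parameter sample_rate are hypothetical; the sketch assumes that x is a sample index and that plateaus (runs of equal samples) are not treated as peaks.

```python
from typing import List, NamedTuple, Sequence


class Peak(NamedTuple):
    x: int      # sample index of the peak (assumed unit: samples)
    y: float    # signal value at the peak
    c: str      # classification: "max" or "min"


def find_peaks(samples: Sequence[float]) -> List[Peak]:
    """Return all strict local maxima and minima of a sample sequence."""
    peaks = []
    for k in range(1, len(samples) - 1):
        prev, cur, nxt = samples[k - 1], samples[k], samples[k + 1]
        if cur > prev and cur > nxt:
            peaks.append(Peak(k, cur, "max"))
        elif cur < prev and cur < nxt:
            peaks.append(Peak(k, cur, "min"))
    return peaks


def f0_from_tuple(xs: Sequence[int], sample_rate: float) -> float:
    """F0 in Hz as the inverse of the mean distance between consecutive peaks.

    xs holds the x-coordinates of a peak quadruple (or triple) in ascending order.
    """
    distances = [b - a for a, b in zip(xs, xs[1:])]
    mean_period = sum(distances) / len(distances)   # mean period in samples
    return sample_rate / mean_period
```

For example, four peaks spaced 100 samples apart at an assumed sampling rate of 16 kHz give a mean period of 100 samples and hence an F0 of 160 Hz.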
We start by finding the peak in the frame that has the highest absolute value. We then look for candidate peaks that have a similar absolute height and whose distance from the highest peak is within the permissible range of period lengths. The search for candidate peaks is performed in the direction of the center position of the frame. Given a peak pair p0 and p1, consisting of the peak with the highest absolute value and a candidate peak, the algorithm looks for peaks to the left and to the right to complete the quadruple. We select the peaks with the highest absolute values that lie to the left of p0 and to the right of p1 at about the same distance as the distance between p0 and p1. The peak quadruple may reduce to a peak triple if no such peak can be found on one side of the middle peak pair. Each candidate peak quadruple or peak triple is scored, and the tuple with the highest score is selected as the tentative best candidate.
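The candidate search described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual implementation: the function names and the tolerances height_tol and rel_tol are hypothetical, since the text only speaks of "similar" heights and "about the same" distance. Peaks are represented as (x, y) pairs.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, float]  # (x, y): sample index and signal value


def candidate_pairs(peaks: List[Peak], center: int, min_period: int,
                    max_period: int, height_tol: float = 0.3):
    """Pair the highest-|y| peak with similar peaks lying towards the frame center."""
    top = max(peaks, key=lambda p: abs(p[1]))
    direction = 1 if center >= top[0] else -1      # search towards the center
    pairs = []
    for p in peaks:
        dist = (p[0] - top[0]) * direction
        if not (min_period <= dist <= max_period):
            continue                               # outside permissible periods
        if p[1] * top[1] <= 0:
            continue                               # assumed: same sign as top peak
        if abs(p[1]) < (1 - height_tol) * abs(top[1]):
            continue                               # not of similar height
        pairs.append(tuple(sorted((top, p))))      # ordered as (p0, p1)
    return pairs


def outer_peak(peaks: List[Peak], target_x: float, dist_tol: float) -> Optional[Peak]:
    """Highest-|y| peak within dist_tol samples of target_x, or None."""
    near = [p for p in peaks if abs(p[0] - target_x) <= dist_tol]
    return max(near, key=lambda p: abs(p[1])) if near else None


def complete_tuple(peaks: List[Peak], p0: Peak, p1: Peak, rel_tol: float = 0.2):
    """Extend (p0, p1) to a quadruple; a missing outer peak yields a triple."""
    d = p1[0] - p0[0]
    left = outer_peak(peaks, p0[0] - d, rel_tol * d)
    right = outer_peak(peaks, p1[0] + d, rel_tol * d)
    return [p for p in (left, p0, p1, right) if p is not None]
```

Each tuple returned by complete_tuple would then be scored, and the highest-scoring one kept as the tentative best candidate.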
The proposed score measures the equality of peak distances and absolute peak heights of a peak tuple. The score s for peak tuple P = (pL, p0, p1, pR) is the product of partial scores sx and sy. The value sx measures the equality of the peak intervals, whereas sy is a measure for the similarity of the absolute peak heights. The partial score sx is defined as 1 - a, where a is the root of the mean squared relative differences between the peak distances at the edges and the distance between the middle peaks. The partial score sy is given as 1 - b, where b is the root of the mean squared difference of the absolute peak heights from the maximum absolute peak height in the given peak tuple. The equations below show how the score s is computed for a peak quadruple in detail. The formulas are easily adapted for tuples with only three peaks:
The partial score sx is defined as follows:
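The equation is not reproduced here. A form consistent with the verbal definition sx = 1 - a, under the assumption that both edge distances are taken relative to the middle distance x1 - x0, would be:

```latex
s_x = 1 - \sqrt{\frac{1}{2}\left[
  \left(\frac{(x_0 - x_L) - (x_1 - x_0)}{x_1 - x_0}\right)^{2}
+ \left(\frac{(x_R - x_1) - (x_1 - x_0)}{x_1 - x_0}\right)^{2}
\right]}
```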
The value xi, i = L, 0, 1, R, refers to the x-coordinate of peak pi as mentioned above.
The partial score sy is given by:
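The equation is not reproduced here. A form consistent with the verbal definition sy = 1 - b would be the following, where the normalization by the maximum absolute peak height is an assumption made so that the score does not depend on the signal scale:

```latex
s_y = 1 - \sqrt{\frac{1}{4}\sum_{i \in \{L,0,1,R\}}
  \left(\frac{|y_i| - y_{\max}}{y_{\max}}\right)^{2}},
\qquad y_{\max} = \max_{i \in \{L,0,1,R\}} |y_i|
```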
The value yi denotes the peak height of peak pi, i = L, 0, 1, R. The score s is exactly 1 if the peak heights and peak intervals are equal, and less than 1 if they differ.
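The score computation for a peak quadruple can be sketched as below. This is a sketch under assumptions, not the definitive formulas: the differences of the edge distances are taken relative to the middle distance x1 - x0, and the height differences are normalized by the maximum absolute height. The function names are hypothetical.

```python
import math
from typing import Sequence


def score_x(xs: Sequence[float]) -> float:
    """Partial score for equality of peak intervals; xs = (xL, x0, x1, xR)."""
    x_l, x_0, x_1, x_r = xs
    mid = x_1 - x_0                                  # distance of the middle peaks
    rel = [((x_0 - x_l) - mid) / mid,                # left edge vs. middle
           ((x_r - x_1) - mid) / mid]                # right edge vs. middle
    return 1.0 - math.sqrt(sum(r * r for r in rel) / len(rel))


def score_y(ys: Sequence[float]) -> float:
    """Partial score for similarity of absolute peak heights; ys = (yL, y0, y1, yR)."""
    heights = [abs(y) for y in ys]
    y_max = max(heights)
    rel = [(h - y_max) / y_max for h in heights]     # assumed normalization by y_max
    return 1.0 - math.sqrt(sum(r * r for r in rel) / len(rel))


def score(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Total score s = sx * sy of a peak quadruple."""
    return score_x(xs) * score_y(ys)
```

With equal intervals and equal heights, both partial scores are 1 and so is their product; any deviation in the intervals or the heights lowers the score below 1.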
The period of the peak tuple with the highest score may be a multiple of the true period. Thus, we check for the existence of equidistant partial peaks between the peaks p0 and p1. Such partial peaks must have about the same absolute height as the original candidate peaks p0 and p1. If such partial peaks are found on both sides of the x-axis, we look for a candidate peak tuple whose period equals the partial peak distance and adopt it as the current best candidate.
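A simplified sketch of the check for partial peaks is given below. It only examines peaks of the same sign as p0 and does not implement the requirement that partial peaks exist on both sides of the x-axis. The function name and the parameters max_div, pos_tol and height_tol are hypothetical, since the text does not quantify "about the same absolute height" or the positional tolerance.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, float]  # (x, y): sample index and signal value


def partial_period(peaks: List[Peak], p0: Peak, p1: Peak, max_div: int = 4,
                   pos_tol: float = 0.1, height_tol: float = 0.3) -> Optional[float]:
    """Return a shorter period if equidistant partial peaks lie between p0 and p1.

    Tries to split the distance p0..p1 into n equal parts, largest n first,
    and returns the part length for the first n whose grid positions are all
    occupied by peaks of similar absolute height. Returns None otherwise.
    """
    d = p1[0] - p0[0]
    min_height = (1 - height_tol) * min(abs(p0[1]), abs(p1[1]))
    for n in range(max_div, 1, -1):
        step = d / n
        found_all = True
        for k in range(1, n):
            target = p0[0] + k * step
            if not any(abs(p[0] - target) <= pos_tol * step
                       and p[1] * p0[1] > 0          # same sign as p0
                       and abs(p[1]) >= min_height   # similar absolute height
                       for p in peaks):
                found_all = False
                break
        if found_all:
            return step
    return None
```

If a shorter period is returned, a new candidate tuple with that peak distance would be searched for and adopted as the current best candidate.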
Next, we find the peak tuple in the center of the frame that has the same period length as the best candidate tuple. This is achieved by looking for peaks to the left or to the right of the best candidate, at distances of one period length, until a peak tuple is found in which the frame’s center lies between the two middle peaks p0 and p1.
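The stepping towards the frame center can be sketched as follows for the middle peak pair; the outer peaks pL and pR could then be located in the same way. This is a minimal sketch under assumptions: the function name and the positional tolerance pos_tol are hypothetical, and the highest peak near each expected position is chosen.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, float]  # (x, y): sample index and signal value


def shift_to_center(peaks: List[Peak], p0: Peak, p1: Peak, center: int,
                    pos_tol: float = 0.1) -> Optional[Tuple[Peak, Peak]]:
    """Move the pair (p0, p1) period by period until it encloses the frame center.

    Returns the enclosing pair, or None if no peak is found at an expected position.
    """
    period = p1[0] - p0[0]
    if period <= 0:
        raise ValueError("p0 must lie to the left of p1")
    while not (p0[0] <= center <= p1[0]):
        step = period if center > p1[0] else -period
        anchor = p1 if step > 0 else p0
        target = anchor[0] + step                    # expected next peak position
        near = [p for p in peaks if abs(p[0] - target) <= pos_tol * period]
        if not near:
            return None
        nxt = max(near, key=lambda p: abs(p[1]))
        p0, p1 = (p1, nxt) if step > 0 else (nxt, p0)
    return p0, p1
```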
Finally, we detect sequences of roughly equal F0 estimates in a stable interval. These sequences are referred to as equal sections. The F0 estimates of the frames in an equal section must not deviate from the mean F0 of that equal section by more than a given percentage, currently set to 10%. The longest such equal section with a minimum length of 3 is set as the equal section of the stable interval. The remaining equal sections are maintained in a list and may be used during F0 propagation (see section 9.4).
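The detection of equal sections can be sketched as below. The greedy left-to-right segmentation is an assumption, since the text does not specify how sections are delimited. The function names are hypothetical, and sections are returned as half-open index ranges (start, end) over the per-frame F0 estimates.

```python
from typing import List, Optional, Sequence, Tuple

Section = Tuple[int, int]  # half-open frame index range [start, end)


def equal_sections(f0: Sequence[float], max_dev: float = 0.10) -> List[Section]:
    """Greedily split F0 estimates into sections whose values stay within
    max_dev (relative) of the section mean."""
    sections = []
    start = 0
    while start < len(f0):
        end = start + 1
        while end < len(f0):
            window = f0[start:end + 1]               # section extended by one frame
            mean = sum(window) / len(window)
            if any(abs(v - mean) > max_dev * mean for v in window):
                break                                # extension violates the limit
            end += 1
        sections.append((start, end))
        start = end
    return sections


def pick_main(sections: List[Section],
              min_len: int = 3) -> Tuple[Optional[Section], List[Section]]:
    """Return the longest section of at least min_len frames and the remaining ones."""
    long_enough = [s for s in sections if s[1] - s[0] >= min_len]
    main = max(long_enough, key=lambda s: s[1] - s[0]) if long_enough else None
    rest = [s for s in sections if s != main]
    return main, rest
```

For the estimates 100, 102, 98, 101, 150, 152, 200 Hz, the first four frames form the main equal section, and the remaining sections are kept in the list.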