Use of phoneme dedicated artificial neural networks to predict segmental durations

The results of two alternative models for predicting segmental durations in speech synthesis, both based on Artificial Neural Networks (ANNs), are discussed. The ANN model consists of a single ANN trained to predict the segmental durations of all phonemes. The phoneme-dedicated ANN model consists of a set of ANNs, each dedicated to predicting the segmental duration of one specific phoneme. Both models are compared using the same input information, extracted from a single European Portuguese database. Objective and subjective measurements of the performance of both approaches are compared. A slight preference was observed for the phoneme-dedicated ANN model.
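The structural difference between the two models can be illustrated with a minimal sketch. It is not the authors' implementation: `MeanDurationModel` is a hypothetical stand-in that simply predicts the mean training duration where the paper uses an ANN, the feature vectors are placeholders that the stand-in ignores, and the toy data are made up. Only the organisation is the point: one predictor trained on all segments versus one predictor per phoneme, selected by phoneme identity.

```python
from collections import defaultdict
from statistics import mean


class MeanDurationModel:
    """Hypothetical stand-in for an ANN: predicts the mean training duration."""

    def __init__(self):
        self.mean_ms = 0.0

    def fit(self, samples):
        # samples: list of (features, duration_ms); features are ignored here
        self.mean_ms = mean(d for _, d in samples)
        return self

    def predict(self, features):
        return self.mean_ms


def train_single(samples):
    """Single-model setup: one predictor trained on segments of all phonemes."""
    return MeanDurationModel().fit([(f, d) for _, f, d in samples])


def train_dedicated(samples):
    """Dedicated setup: one predictor per phoneme, trained only on its segments."""
    by_phoneme = defaultdict(list)
    for phoneme, features, duration in samples:
        by_phoneme[phoneme].append((features, duration))
    return {p: MeanDurationModel().fit(s) for p, s in by_phoneme.items()}


# Made-up toy data: (phoneme, placeholder features, duration in ms)
samples = [
    ("a", [1.0], 90.0),
    ("a", [0.0], 110.0),
    ("s", [1.0], 60.0),
    ("s", [0.0], 80.0),
]

single = train_single(samples)
dedicated = train_dedicated(samples)

single_pred = single.predict([1.0])             # same predictor for every phoneme
dedicated_pred = dedicated["a"].predict([1.0])  # predictor chosen by phoneme identity
```

In this sketch the dedicated setup needs a lookup by phoneme at prediction time, and each predictor sees only a fraction of the training data, which is the trade-off the comparison in the paper addresses.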

Conference Paper