Time-Frequency Energy Features for Articulator Position Inference on Stop Consonants

Alexander Sepulveda-Sepulveda; German Castellanos-Domínguez

doi:10.17230/ingciencia.8.16.2

Alexander Sepulveda-Sepulveda

Universidad Nacional de Colombia

https://orcid.org/0000-0002-9643-5193

German Castellanos-Domínguez

Universidad Nacional

Keywords

Acoustic-to-articulatory inversion, Gaussian mixture models, articulatory phonetics, time-frequency features.

Abstract

Acoustic-to-articulatory inversion offers new perspectives and interesting applications in the speech processing field; however, it remains an open issue. This paper presents a method to estimate the distribution of the articulatory information contained in the acoustics of stop consonants, which are parametrized using the wavelet packet transform. The main focus is on measuring the acoustic information that is relevant, in terms of statistical association, for inferring the position of the critical articulators involved in stop consonant production. The Kendall rank correlation coefficient is used as the relevance measure. Maps of relevant time-frequency features are calculated for the MOCHA-TIMIT database, from which stop consonants are extracted and analysed. The proposed method obtains a set of time-frequency components closely related to the articulatory phenomenon, which offers a deeper understanding of the relationship between articulatory and acoustic phenomena. The maps of relevant features are tested in an acoustic-to-articulatory mapping system based on Gaussian mixture models, where they are shown to be suitable for improving the performance of such systems on stop consonants. The method could be extended to other manner-of-articulation categories, e.g. fricatives, in order to adapt it to acoustic-to-articulatory mapping systems over whole speech.
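As an illustration of the relevance measure named in the abstract, the following is a minimal sketch of ranking acoustic features by their Kendall rank correlation with an articulator position. It is not the authors' implementation: the function names `kendall_tau_a` and `relevance_map`, the feature labels, and the toy values are hypothetical, and the tie-ignoring tau-a variant is an assumption made for brevity.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall rank correlation, tau-a variant (assumption: no tie correction).

    Returns (concordant pairs - discordant pairs) / total number of pairs.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1   # pair ordered the same way in x and y
        elif product < 0:
            score -= 1   # pair ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def relevance_map(features, articulator):
    """Map each feature name to |tau| against the articulator trajectory.

    `features` is a dict: hypothetical feature label -> list of per-frame values.
    """
    return {
        name: abs(kendall_tau_a(values, articulator))
        for name, values in features.items()
    }


# Toy data (made up for illustration, not taken from MOCHA-TIMIT).
articulator_position = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_features = {
    "band_a": [0.1, 0.2, 0.3, 0.4, 0.5],  # rises with the articulator
    "band_b": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the articulator rises
    "band_c": [2.0, 1.0, 4.0, 3.0, 5.0],  # partially ordered
}
relevance = relevance_map(toy_features, articulator_position)
```

Taking the absolute value treats strongly increasing and strongly decreasing features as equally relevant, which is one possible design choice; a signed map would keep the direction of the association.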

PACS: 87.85.Ng

MSC: 68T10

Published

Nov 30, 2012

DOI https://doi.org/10.17230/ingciencia.8.16.2

How to Cite

Sepulveda-Sepulveda, A., & Castellanos-Domínguez, G. (2012). Time-Frequency Energy Features for Articulator Position Inference on Stop Consonants. Ingeniería Y Ciencia, 8(16), 37–56. https://doi.org/10.17230/ingciencia.8.16.2

Issue

Vol. 8 No. 16 (2012)

Section

Articles

Author Biographies

Alexander Sepulveda-Sepulveda, Universidad Nacional de Colombia

Master's in Automation, PhD candidate in Engineering (Automation)

German Castellanos-Domínguez, Universidad Nacional

PhD in Telecommunications
