Speech recognition based on Spanish accent acoustic model
DOI: https://doi.org/10.29019/enfoqueute.839

Keywords: Automatic Speech Recognition, Language Model, CMUSphinx

Abstract
The objective of this article was to generate an Automatic Speech Recognition (ASR) model that translates human speech to text, a task regarded as one of the branches of artificial intelligence. Voice analysis yields information about the acoustics, phonetics, syntax, and semantics of words, among other elements; it can expose ambiguous terms, pronunciation errors, and utterances with similar syntax but different semantics, all of which are characteristics of the language. The model focused on the acoustic analysis of words, proposing a methodology for acoustic recognition based on speech transcripts of audio recordings containing human voice; the word error rate (WER) was used to measure the accuracy of the model. The recordings were emergency calls registered by the Integrated Security Service ECU911. The model was trained with the CMUSphinx toolkit for the Spanish language, without an internet connection. The results showed that the word error rate varies with the number of recordings: the more recordings used, the fewer erroneous words and the greater the accuracy of the model. The investigation concluded by emphasizing the duration of each recording as a variable that affects the accuracy of the model.
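As a minimal sketch of the evaluation metric named in the abstract: WER is conventionally computed as the word-level Levenshtein edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name and example transcripts below are illustrative, not taken from the article.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for the Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical reference vs. ASR output: one deletion, one substitution.
print(wer("llamada de emergencia registrada", "llamada emergencia registradas"))  # 0.5
```

Under this definition a perfect transcript gives WER = 0, and the value can exceed 1 when the hypothesis contains many insertions, which is consistent with the abstract's observation that accuracy improves as errors per word decrease.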
License
Copyright (c) 2022 The Authors
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The articles and research published by the UTE University are carried out under the Open Access regime in electronic format. This means that all content is freely available without charge to the user or his/her institution. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, without asking prior permission from the publisher or the author. This is in accordance with the BOAI definition of open access. By submitting an article to any of the scientific journals of the UTE University, the author or authors accept these conditions.
The UTE applies the Creative Commons Attribution (CC-BY) license to articles in its scientific journals. Under this open access license, as an author you agree that anyone may reuse your article in whole or in part for any purpose, free of charge, including commercial purposes. Anyone can copy, distribute or reuse the content as long as the author and original source are correctly cited. This facilitates freedom of reuse and also ensures that content can be extracted without barriers for research needs.
The Enfoque UTE journal guarantees and declares that authors always retain all copyrights and full publishing rights without restrictions [© The Author(s)]. Attribution (BY): Any exploitation of the work is permitted, including for commercial purposes, as well as the creation of derivative works, which may likewise be distributed without any restriction.