Implementación de un sistema de reconocimiento de comandos de voz en sistemas embebidos orientado a entornos domóticos y al desarrollo de una guía de laboratorio
| dc.contributor.advisor | Mojica Casallas, Carlos Javier | |
| dc.contributor.author | Vargas Cruz, Sergio Emiro | |
| dc.contributor.author | Melo Suarez, David Ricardo | |
| dc.contributor.corporatename | Universidad Santo Tomás | |
| dc.contributor.cvlac | https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000639214 | |
| dc.contributor.orcid | https://orcid.org/0000-0002-3757-9410 | |
| dc.date.accessioned | 2026-01-19T19:21:07Z | |
| dc.date.available | 2026-01-19T19:21:07Z | |
| dc.date.issued | 2025-06-01 | |
| dc.description | This project presents the design and implementation of a voice command recognition system in an embedded environment, aimed at controlling home automation devices. It uses the TMS320C5535 digital signal processor, integrating the TLV320AIC3204 codec and a UART interface with an Arduino to activate actuators through voice commands. The system includes complete stages of audio acquisition, preprocessing (silence removal, normalization, pre-emphasis, and windowing), feature extraction using Mel-frequency cepstral coefficients (MFCC), and classification with three models: Dynamic Time Warping (DTW), Gaussian Mixture Models (GMM), and Euclidean distance with CMVN normalization. The results show that the GMM model offers the best balance between accuracy (86 % in noisy environments) and computational efficiency. In addition, a laboratory guide was developed that reinforces the learning of digital signal processing and fosters student interest in this area, which is critical to modern electronic engineering. | |
| dc.description.abstract | This project presents the design and implementation of a voice command recognition system on an embedded platform aimed at controlling home automation devices. The system is based on the TMS320C5535 digital signal processor, integrating the TLV320AIC3204 audio codec and a UART interface with an Arduino to toggle actuators through voice commands. It includes complete stages of audio acquisition, preprocessing (silence removal, Z-score normalization, pre-emphasis, and windowing), feature extraction using Mel-frequency cepstral coefficients (MFCC), and classification using three models: Dynamic Time Warping (DTW), Gaussian Mixture Models (GMM), and Euclidean distance with CMVN normalization. Results indicate that GMM achieves the best trade-off between accuracy (86 % under noisy conditions) and computational efficiency. A complementary lab guide was developed to enhance the learning of digital signal processing and to encourage student engagement in this key area of electronic engineering. | |
| dc.description.degreelevel | Pregrado | spa |
| dc.description.degreename | Ingeniero Electrónico | spa |
| dc.format.mimetype | application/pdf | |
| dc.identifier.citation | Melo Suarez, D. R., y Vargas Cruz, S. E. (2025). Implementación de un sistema de reconocimiento de comandos de voz en sistemas embebidos orientado a entornos domóticos y al desarrollo de una guía de laboratorio. [Trabajo de Grado, Universidad Santo Tomás]. Repositorio Institucional. | |
| dc.identifier.instname | instname:Universidad Santo Tomás | spa |
| dc.identifier.reponame | reponame:Repositorio Institucional Universidad Santo Tomás | spa |
| dc.identifier.repourl | repourl:https://repository.usta.edu.co | spa |
| dc.identifier.uri | http://hdl.handle.net/11634/70873 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Santo Tomás | spa |
| dc.publisher.branch | CRAI-USTA Bogotá | |
| dc.publisher.faculty | Facultad de Ingeniería Electrónica | spa |
| dc.publisher.program | Pregrado Ingeniería Electrónica | spa |
| dc.rights | Attribution 2.5 Colombia | en |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.coar | http://purl.org/coar/access_right/c_abf2 | |
| dc.rights.local | Abierto (Texto Completo) | spa |
| dc.rights.uri | http://creativecommons.org/licenses/by/2.5/co/ | |
| dc.subject.keyword | Voice recognition | |
| dc.subject.keyword | Embedded systems | |
| dc.subject.keyword | Digital signal processing | |
| dc.subject.keyword | Home automation | |
| dc.subject.keyword | MFCC | |
| dc.subject.keyword | Dynamic Time Warping | |
| dc.subject.keyword | Gaussian Mixture Models | |
| dc.subject.keyword | TMS320C5535 | |
| dc.subject.lemb | Ingeniería Electrónica | |
| dc.subject.lemb | Sistemas de control por voz | |
| dc.subject.lemb | Guías de laboratorio | |
| dc.subject.proposal | Reconocimiento de voz | |
| dc.subject.proposal | Sistemas embebidos | |
| dc.subject.proposal | Procesamiento digital de señales | |
| dc.subject.proposal | Domótica | |
| dc.subject.proposal | MFCC | |
| dc.subject.proposal | Gaussian Mixture Models | |
| dc.subject.proposal | TMS320C5535 | |
| dc.title | Implementación de un sistema de reconocimiento de comandos de voz en sistemas embebidos orientado a entornos domóticos y al desarrollo de una guía de laboratorio | |
| dc.type | bachelor thesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.drive | info:eu-repo/semantics/bachelorThesis | |
| dc.type.local | Trabajo de grado | spa |
| dc.type.version | info:eu-repo/semantics/acceptedVersion |

