Reducción de Tiempos de Entrenamiento de Algoritmos de Aprendizaje de Máquina a través de Tutoría por parte de un Experto Externo

Salazar Villareal, Carlos Enrique

dc.contributor.advisor	Calderon Chavez, Juan Manuel
dc.contributor.author	Salazar Villareal, Carlos Enrique
dc.contributor.corporatename	Universidad Santo Tomás	spa
dc.contributor.cvlac	https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000380938
dc.contributor.orcid	https://orcid.org/0000-0002-4471-3980
dc.contributor.orcid	https://orcid.org/0000-0002-3454-3711
dc.date.accessioned	2023-09-26T00:26:45Z
dc.date.available	2023-09-26T00:26:45Z
dc.date.issued	2023-09-25
dc.description	Este proyecto de grado presenta el planteamiento y la implementación de una política de aprendizaje para redes neuronales que se basa en técnicas ya existentes y las refina, entrenada mediante las experiencias de un agente externo (experto) y tomando como referencia una arquitectura de red neuronal convolucional. Se detalla el proceso de selección del ambiente del problema, la arquitectura y el paradigma seleccionados. Asimismo, se realizan múltiples pruebas para confirmar el desempeño, entrenando con distintas políticas de entrenamiento. Finalmente, se evalúa el rendimiento de las distintas políticas entrenadas con respecto a una política base de exploración/explotación. Como producto final se presentan el dataset experto de referencia y un repositorio con los programas desarrollados junto con la implementación de la política.	spa
dc.description.abstract	This degree project presents the design and implementation of a learning policy for neural networks that builds on and refines existing techniques, trained on the experience of an external agent (expert) and using a convolutional neural network architecture as reference. It details the selection of the problem environment, the architecture, and the learning paradigm. Multiple tests are run to confirm performance, training with several different training policies. Finally, the performance of the trained policies is evaluated against a baseline exploration/exploitation policy. The final deliverables are the reference expert dataset and a repository containing the programs developed for the project, together with the implementation of the policy.	spa
dc.description.degreelevel	Pregrado	spa
dc.description.degreename	Ingeniero Electrónico	spa
dc.format.mimetype	application/pdf
dc.identifier.citation	Salazar Villareal, C. E. (2023). Reducción de Tiempos de Entrenamiento de Algoritmos de Aprendizaje de Máquina a través de Tutoría por parte de un Experto Externo. [Trabajo de Grado, Universidad Santo Tomás]. Repositorio Institucional.	spa
dc.identifier.instname	instname:Universidad Santo Tomás	spa
dc.identifier.reponame	reponame:Repositorio Institucional Universidad Santo Tomás	spa
dc.identifier.repourl	repourl:https://repository.usta.edu.co	spa
dc.identifier.uri	http://hdl.handle.net/11634/52386
dc.language.iso	spa
dc.publisher	Universidad Santo Tomás	spa
dc.publisher.branch	CRAI-USTA Bogotá	spa
dc.publisher.faculty	Facultad de Ingeniería Electrónica	spa
dc.publisher.program	Pregrado Ingeniería Electrónica	spa
dc.relation.references	A. Rimassa, J. Luciano, C. Zurita, J. Paul, V. Bautista y S. Francisco, «Reconocimiento de tumores y patologías cerebrales mediante inteligencia artificial,» 2022. dirección: http://www.dspace.uce.edu.ec/handle/25000/27244.	spa
dc.relation.references	S. Ben-David, E. Kushilevitz e Y. Mansour, «Online Learning versus Offline Learning,» Machine Learning, vol. 29, mayo de 1997. doi: 10.1023/A:1007465907571.	spa
dc.relation.references	Y. Lecun, L. Bottou, Y. Bengio y P. Haffner, «Gradient-based learning applied to document recognition,» Proceedings of the IEEE, vol. 86, n.o 11, págs. 2278-2324, 1998. doi: 10.1109/5.726791.	spa
dc.relation.references	E. Strubell, A. Ganesh y A. McCallum, Energy and Policy Considerations for Deep Learning in NLP, 2019. doi: 10.48550/ARXIV.1906.02243. dirección: https://arxiv.org/abs/1906.02243.	spa
dc.relation.references	S. Chaudhury, D. Kimura, T. Inoue y R. Tachibana, Model-based imitation learning from state trajectories, 2018. dirección: https://openreview.net/forum?id=S1GDXzb0b.	spa
dc.relation.references	K. Judah, A. Fern, P. Tadepalli y R. Goetschalckx, «Imitation Learning with Demonstrations and Shaping Rewards,» Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, n.o 1, jun. de 2014. doi: 10.1609/aaai.v28i1.9024. dirección: https://ojs.aaai.org/index.php/AAAI/article/view/9024.	spa
dc.relation.references	X. Guo, S. Chang, M. Yu, M. Liu y G. Tesauro, Faster Reinforcement Learning with Expert State Sequences, 2018. dirección: https://openreview.net/forum?id=BJ7d0fW0b.	spa
dc.relation.references	I. Radosavovic, X. Wang, L. Pinto y J. Malik, «State-Only Imitation Learning for Dexterous Manipulation,» en 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, págs. 7865-7871. doi: 10.1109/IROS51168.2021.9636557.	spa
dc.relation.references	F. Torabi, G. Warnell y P. Stone, Recent Advances in Imitation Learning from Observation, 2019. arXiv: 1905.13566 [cs.RO].	spa
dc.relation.references	Y. Yue y H. M. Le, Imitation Learning Tutorial, 2018. dirección: https://sites.google.com/view/icml2018-imitation-learning/.	spa
dc.relation.references	S. Singh, What is Imitation Learning? 2019. dirección: https://deeplearninguniversity.com/what-is-imitation-learning/.	spa
dc.relation.references	M. Schaarschmidt, A. Kuhnle, B. Ellis, K. Fricke, F. Gessert y E. Yoneki, LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations, 2018. doi: 10.48550/ARXIV.1808.07903. dirección: https://arxiv.org/abs/1808.07903.	spa
dc.relation.references	R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh y T. Kraska, «Bao: Making Learned Query Optimization Practical,» en Proceedings of the 2021 International Conference on Management of Data, ép. SIGMOD ’21, Virtual Event, China: Association for Computing Machinery, 2021, págs. 1275-1288, isbn: 9781450383431. doi: 10.1145/3448016.3452838. dirección: https://doi.org/10.1145/3448016.3452838.	spa
dc.relation.references	R. Marcus, P. Negi, H. Mao et al., «Neo,» Proceedings of the VLDB Endowment, vol. 12, n.o 11, págs. 1705-1718, jul. de 2019. doi: 10.14778/3342263.3342644. dirección: https://doi.org/10.14778%2F3342263.3342644.	spa
dc.relation.references	M. F. Argerich, J. Fürst y B. Cheng, «Tutor4RL: Guiding Reinforcement Learning with External Knowledge,» en AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering (1), 2020.	spa
dc.relation.references	O. Rivlin, Reinforcement Learning Using a Single Demonstration, 2019. dirección: https://towardsdatascience.com/reinforcement-learning-using-a-single-demonstration-7889fe5e9f41.	spa
dc.relation.references	X. Guo, S. Chang, M. Yu, G. Tesauro y M. Campbell, Hybrid Reinforcement Learning with Expert State Sequences, 2019. doi: 10.48550/ARXIV.1903.04110. dirección: https://arxiv.org/abs/1903.04110.	spa
dc.relation.references	R. Zhang, F. Torabi, L. Guan, D. H. Ballard y P. Stone, Leveraging Human Guidance for Deep Reinforcement Learning Tasks, 2019. arXiv: 1909.09906 [cs.AI]	spa
dc.relation.references	A. Aflakian, A. Rastegharpanah y R. Stolkin, «Boosting Performance of Visual Servoing Using Deep Reinforcement Learning From Multiple Demonstrations,» IEEE Access, vol. 11, págs. 26512-26520, 2023. doi: 10.1109/ACCESS.2023.3256724	spa
dc.relation.references	A. M. Metelli, M. Pirotta y M. Restelli, «Compatible Reward Inverse Reinforcement Learning,» en The Thirty-first Annual Conference on Neural Information Processing Systems - NIPS 2017, Long Beach, United States, dic. de 2017. dirección: https://hal.inria.fr/hal-01653328.	spa
dc.relation.references	R. M. J. V. M., «Efectividad, eficacia y eficiencia en equipos de trabajo,» Espacios, 2017.	spa
dc.relation.references	E. Elibol, J. Calderon, M. Llofriu, C. Quintero, W. Moreno y A. Weitzenfeld, «Power usage reduction of humanoid standing process using q-learning,» en RoboCup 2015: Robot World Cup XIX 19, Springer, 2015, págs. 251-263.	spa
dc.relation.references	G. Cardona, C. Bravo, W. Quesada et al., «Autonomous navigation for exploration of unknown environments and collision avoidance in mobile robots using reinforcement learning,» en 2019 SoutheastCon, IEEE, 2019, págs. 1-7.	spa
dc.relation.references	L. J. P. Reyes, N. B. Oviedo, E. C. Camacho y J. M. Calderon, «Adaptable Recommendation System for Outfit Selection with Deep Learning Approach,» IFAC-PapersOnLine, vol. 54, n.o 13, págs. 605-610, 2021.	spa
dc.relation.references	J. A. Cárdenas, U. E. Carrero, E. C. Camacho y J. M. Calderón, «Optimal PID ø axis Control for UAV Quadrotor based on Multi-Objective PSO,» IFAC-PapersOnLine, vol. 55, n.o 14, págs. 101-106, 2022.	spa
dc.relation.references	J. A. Cardenas, U. E. Carrero, E. C. Camacho y J. M. Calderon, «Intelligent Position Controller for Unmanned Aerial Vehicles (UAV) Based on Supervised Deep Learning,» Machines, vol. 11, n.o 6, pág. 606, 2023.	spa
dc.relation.references	A. M., Pedro Larrañaga e Iñaki Inza, Tema 8. Redes Neuronales.	spa
dc.relation.references	J. Schmidhuber, «Deep Learning in Neural Networks: An Overview,» CoRR, vol. abs/1404.7828, 2014. arXiv: 1404.7828. dirección: http://arxiv.org/abs/1404.7828.	spa
dc.relation.references	A. Gleave, M. Taufeeque, J. Rocamonde et al., imitation: Clean Imitation Learning Implementations, 2022. arXiv: 2211.11972 [cs.LG].	spa
dc.relation.references	B. Zheng, S. Verma, J. Zhou, I. Tsang y F. Chen, Imitation Learning: Progress, Taxonomies and Challenges, 2022. arXiv: 2106.12177 [cs.LG].	spa
dc.relation.references	A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez y V. Koltun, CARLA: An Open Urban Driving Simulator, 2017. arXiv: 1711.03938 [cs.LG].	spa
dc.relation.references	W. O. Quesada, J. I. Rodriguez, J. C. Murillo et al., «Leader-follower formation for UAV robot swarm based on fuzzy logic theory,» en Artificial Intelligence and Soft Computing: 17th International Conference, ICAISC 2018, Zakopane, Poland, June 3-7, 2018, Proceedings, Part II 17, Springer, 2018, págs. 740-751.	spa
dc.relation.references	D. Paez, J. P. Romero, B. Noriega, G. A. Cardona y J. M. Calderon, «Distributed particle swarm optimization for multi-robot system in search and rescue operations,» IFAC-PapersOnLine, vol. 54, n.o 4, págs. 1-6, 2021.	spa
dc.relation.references	J. León, G. A. Cardona, A. Botello y J. M. Calderón, «Robot swarms theory applicable to seek and rescue operation,» en Intelligent Systems Design and Applications: 16th International Conference on Intelligent Systems Design and Applications (ISDA 2016) held in Porto, Portugal, December 16-18, 2016, Springer, 2017, págs. 1061-1070.	spa
dc.relation.references	G. A. Cardona y J. M. Calderon, «Robot swarm navigation and victim detection using rendezvous consensus in search and rescue operations,» Applied Sciences, vol. 9, n.o 8, pág. 1702, 2019.	spa
dc.relation.references	N. Gómez, N. Peña, S. Rincón, S. Amaya y J. Calderon, «Leader-follower behavior in multi-agent systems for search and rescue based on pso approach,» en SoutheastCon 2022, IEEE, 2022, págs. 413-420	spa
dc.relation.references	B. Pallares, T. Rozo, E. C. Camacho, J. G. Guarnizo, J. M. Calderon et al., «Design and construction of a cost-oriented mobile robot for domestic assistance,» IFAC-PapersOnLine, vol. 54, n.o 13, págs. 293-298, 2021.	spa
dc.relation.references	L. G. Jaimes y J. M. Calderon, «An UAV-based incentive mechanism for Crowdsensing with budget constraints,» en 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), IEEE, 2020, págs. 1-6.	spa
dc.relation.references	G. A. Cardona, J. Ramirez-Rugeles, E. Mojica-Nava y J. M. Calderon, «Visual victim detection and quadrotor-swarm coordination control in search and rescue environment,» International Journal of Electrical and Computer Engineering, vol. 11, n.o 3, pág. 2079, 2021.	spa
dc.relation.references	G. Cardona, M. Arevalo-Castiblanco, D. Tellez-Castro, J. Calderon y E. Mojica-Nava, «Robust Adaptive Synchronization of Interconnected Heterogeneous Quadrotors Transporting a Cable-Suspended Load,» en 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021, págs. 31-37.	spa
dc.relation.references	G. Cardona, D. Tellez-Castro, J. Calderon y E. Mojica-Nava, «Adaptive Multi-Quadrotor Control for Cooperative Transportation of a Cable-Suspended Load,» en 2021 European Control Conference (ECC), IEEE, 2021, págs. 696-701	spa
dc.relation.references	E. Elibol, J. Calderon, M. Llofriu, W. Moreno y A. Weitzenfeld, «Analyzing and reducing energy usage in a humanoid robot during standing up and sitting down tasks,» International Journal of Humanoid Robotics, vol. 13, n.o 04, pág. 1650014, 2016.	spa
dc.rights	Atribución-NoComercial-SinDerivadas 2.5 Colombia
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.coar	http://purl.org/coar/access_right/c_abf2	spa
dc.rights.local	Abierto (Texto Completo)	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/co/
dc.subject.keyword	Artificial Intelligence	spa
dc.subject.keyword	Learning Policies	spa
dc.subject.keyword	Imitation Learning	spa
dc.subject.keyword	Reinforcement Learning	spa
dc.subject.keyword	Expert Learning	spa
dc.subject.lemb	Ingeniería Electrónica	spa
dc.subject.lemb	Teología	spa
dc.subject.lemb	Tendencia de la Investigación	spa
dc.subject.proposal	Inteligencia Artificial	spa
dc.subject.proposal	Políticas de Aprendizaje	spa
dc.subject.proposal	Aprendizaje por Imitación	spa
dc.subject.proposal	Aprendizaje por Refuerzo	spa
dc.subject.proposal	Aprendizaje por Experto	spa
dc.title	Reducción de Tiempos de Entrenamiento de Algoritmos de Aprendizaje de Máquina a través de Tutoría por parte de un Experto Externo	spa
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.drive	info:eu-repo/semantics/bachelorThesis
dc.type.local	Trabajo de grado	spa
dc.type.version	info:eu-repo/semantics/acceptedVersion