Undergraduate Statistics

Permanent URI for this collection: http://hdl.handle.net/11634/178

Showing 1 - 20 of 105

Synthetic Data: An Introduction to Generative Techniques and Quality Evaluation
(Universidad Santo Tomás, 2026-02-04) Cleves Leguízamo, Diego Andrés; , Javier Mauricio; Universidad Santo Tomás; https://orcid.org/0009-0003-5914-4156
This work studies synthetic data from its theoretical conception to its generation. Several models are implemented to synthesize categorical and numerical data and are compared in terms of data masking capability (propensity), statistical similarity, and execution time. The results indicate that rule-based simulation is the most effective approach for categorical variables, while numerical variables could not be adequately synthesized due to the models’ inability to capture the copula structure. The conclusions discuss the identified limitations and potential improvements.
Tools for Data Cleaning and Analysis in Market Research
(Universidad Santo Tomás, 2015) Guevara Núñez, Jeferson Alejandro; Zea Castro, José Fernando; Universidad Santo Tomás
Analysis of Criminal Activity Incidents in Colombia (2023) Using Regression Models for Count Data
(Universidad Santo Tomás, 2024-12-10) Montes Montes, Laura Valentina; Pineda-Ríos, Wilmer Darío; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001454199; https://scholar.google.es/citations?user=4-t7xVcAAAAJ&hl=es&oi=ao; https://orcid.org/0000-0001-7774-951X
This study presents a statistical analysis of thefts from businesses in Colombia, using data from the country’s 32 departments. The primary objective is to identify and model the economic, social, and spatial factors that explain the incidence of this type of crime, with the aim of providing analytical tools for the formulation of more effective public policies. The analysis begins with an initial exploration of the data, during which significant spatial patterns of thefts were identified, confirmed by a positive and statistically significant Moran’s Index. This finding suggests spatial dependence between departments. Based on this diagnostic, Poisson and Negative Binomial Regression Models were implemented, adjusted for population as an exposure variable, to model theft rates instead of absolute counts, thereby facilitating comparability across regions with different population sizes. The parsimonious Poisson Model proved to be a robust tool for analysis; however, the presence of overdispersion in the data justified the implementation of the Negative Binomial Model, which includes an additional parameter to capture excess variability. The results of both models identified key factors influencing the reduction of thefts from businesses: GDP, levels of monetary poverty, crime rates, and a spatial effect measured by the lag of thefts in neighboring departments. In particular, GDP showed a negative and statistically significant effect, underscoring the role of economic development in crime mitigation. The lagged thefts term highlighted the importance of spatial effects in the propagation or containment of thefts, indicating that shared departmental dynamics significantly influence the observed outcomes. The evaluation of residuals from both models, through graphs and Moran’s Index analysis, confirmed the absence of spatial autocorrelation in the residuals, validating the statistical and spatial specifications of the adjusted models. 
Moreover, the Negative Binomial Model demonstrated a superior fit according to the AIC, with an estimated overdispersion parameter of α = 0.1149. In conclusion, the study underscores the importance of economic and spatial factors in explaining thefts from businesses, highlighting the need for coordinated regional policies and approaches based on economic development to address this issue. The implementation of joint strategies between neighboring departments and the continuous use of advanced models to monitor and evaluate crime patterns are recommended. This work provides a robust and replicable analytical framework for the study of criminological phenomena in spatial contexts.
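The spatial diagnostic named in the abstract above, Moran's Index, can be illustrated with a minimal sketch. The function name `morans_i`, the toy values, and the binary chain-adjacency weights are assumptions for illustration only. They are not the study's data or code, and the significance test the abstract refers to is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in weights)       # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )                                           # weighted cross-products
    ss = sum(d * d for d in dev)                # total sum of squares
    return (n / s0) * (cross / ss)


# Hypothetical layout: four regions in a row, each neighboring the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Similar values sit next to each other: positive autocorrelation.
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain)
# High and low values alternate: negative autocorrelation.
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
print(clustered, alternating)
```

A positive value, as reported for the theft counts, indicates that neighboring units tend to have similar values. Values near zero in the model residuals are what one would hope to see after the spatial lag term is included.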
Spatial Model with a Bayesian Approach for the Study of Victims in Colombia in 2015
(Universidad Santo Tomás, 2016) García Reina, Diana Patricia; Universidad Santo Tomás
Comparison of Confidence and Credible Regions for the Optimum Point of a Response Surface, Applied to a Real Data Set
(Universidad Santo Tomás, 2017) Morales Salazar, Juan Carlos; Universidad Santo Tomás
Cost-Sensitive Classification Models for a Churn Database
(Universidad Santo Tomás, 2016) Barón Mora, Fredi Alexsander; Universidad Santo Tomás
A New Test for the Equality of Variances Problem
(Universidad Santo Tomás, 2017) García Calvo, Mario Felipe; Universidad Santo Tomás
A Small Area Model for Estimating the Average Purchase and Rental Costs of Housing in Each Zonal Planning Unit of Bogotá
(Universidad Santo Tomás, 2025-02-17) Rozo Álvarez, Angela Lucia; Tellez Pinerez, Cristian Fernando; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000016463; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001664845; https://orcid.org/0000-0003-3869-1831; https://orcid.org/0000-0001-5902-8460
This research presents estimates of the average costs of purchasing and renting housing in each Zonal Planning Unit (UPZ) of Bogotá, obtained through small area estimation using data from the 2021 Multipurpose Survey. The UPZs with the highest estimated average costs are Chico Lago and El Refugio in Chapinero, and Country Club, Santa Barbara, and Usaquén in Usaquén. Additionally, the results are consistent with the city's real estate structure: the zones most prominent in the real estate sector coincide geographically with those that have the highest estimated averages.
Evaluation and Optimization of Investment in Advertising Media
(Universidad Santo Tomás, 2024-06-09) Contreras Rodriguez, Nicolás; Sierra, Javier Mauricio; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001567974; https://scholar.google.com/citations?hl=es&user=WPVb1csAAAAJ; https://orcid.org/0009-0003-5914-4156
In a volatile economic environment, many financial institutions have seen considerable fluctuations in their share prices following the launch of digital advertising campaigns. While these campaigns increase brand recognition, they are often associated with significant volatility in the market. This phenomenon underscores the importance of strategically aligning marketing efforts with financial objectives to mitigate risks and take advantage of opportunities. Marketing in the banking sector is crucial not only for attracting new customers, but also for maintaining the loyalty of existing customers and improving brand perception. Marketing investments can directly influence share prices. This study will analyze how investments in different advertising media specifically impact the stock performance of financial institutions. Detailed data will be collected on advertising investments and stock prices. This data will be analyzed to forecast and optimize the long-term impact. In addition, optimization methods will be applied to determine the best allocation of the advertising portfolio and minimize risk. A crucial aspect is the sensitivity and robustness of the model. Various approaches and techniques will be tested to ensure that the final recommendations are reliable and practical. Various methodologies will be evaluated to provide a holistic view. The results will provide valuable insights on how to maximize the return on marketing investment and improve financial performance through a well-planned strategy.
In summary, the aim is to link marketing efforts with stock performance from a statistical point of view, offering statistical results and procedures together with practical, data-driven recommendations to improve the effectiveness of advertising investments and strengthen the competitive position in the marketplace.
Application of Machine Learning to Estimate Employee Turnover in the BPO Sector
(Universidad Santo Tomás, 2024) Rey Guanumen, Camilo Andres; Sierra, Javier Mauricio; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001567974; https://scholar.google.es/citations?hl=es&user=WPVb1csAAAAJ; https://orcid.org/0009-0003-5914-4156
This study applies statistical techniques and Machine Learning to the study of employee turnover in Business Process Outsourcing (BPO). In Human Resources terms, BPO involves outsourcing certain functions and processes related to human talent management to a company specialized in this service. This may include activities such as recruitment and selection, payroll administration, benefits management, training and development, and performance management, among others. Throughout this work, the variables that affect this phenomenon will be studied and a Machine Learning model for the prediction of employee turnover will be created, with the objective of reducing the costs associated with employee turnover and the negative effects it has at the organizational level.
Spatial GAM Model for the Suicide Rate in the Departments of Colombia (2014-2019)
(Universidad Santo Tomás, 2024) Sánchez Cardona, Brahian; Bermudez Rubio, Dagoberto; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000014678; https://orcid.org/0000-0002-2651-5665
This study aims to analyze the behavior of suicide rates in Colombia by department between 2014 and 2019 using a spatiotemporal beta model. This model seeks to capture both spatial and temporal dependencies, providing a clear view of the factors influencing suicide rates. Recognizing that the suicide rate follows a beta distribution, the analysis considers the separable space-time correlation. Socioeconomic information by department is included to understand how these factors affect variations in suicide rates at the regional level. The results show significant relationships between spatial effects and socioeconomic factors, while the temporal effect varies in its importance. This study helps identify spatiotemporal patterns in suicide rates, providing valuable information for the formulation of public policies and intervention strategies in Colombia.
Predicting the Number of Passengers on SITP Route 330 Using Machine Learning Techniques: Random Forest and Silverkyte
(Universidad Santo Tomás, 2024) Mesa Cantillo, Yanela Alexandra; Moreno Lopez, Edna Carolina; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001381730; https://scholar.google.es/citations?hl=es&user=3HPuekUAAAAJ; https://orcid.org/0000-0002-1364-0096
This study utilizes data provided by Transmilenio, which includes access records to the services of the Integrated Public Transport System (SITP) corresponding to route 330 from the years 2020 to 2023. Various factors that could influence the users’ decision to use the SITP, such as holidays, weekends, and vacation periods, were analyzed. The main objective of this work is to understand the future demand for the use of the Integrated Public Transport System (SITP) through the analysis of passenger number data. Additionally, other relevant variables such as holidays, weekends, and vacation periods will be included to improve the model’s predictions.
Application of the Expectation-Maximization with Bootstrap Algorithm for Imputing Indicators in Bayesian Multilevel Beta Regression Models with Logit Link: A Panel Analysis to Explain the Relationship of Public Spending on Education, the National Productive Factor, and the Demographic Factor with Education Poverty from 2015 to 2023
(Universidad Santo Tomás, 2024) Montoya Casas, Michael Stibenson; Pacheco López, Mario José; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000775479; https://scholar.google.es/citations?hl=es&user=a5SEoPgAAAAJ; https://orcid.org/0000-0003-4752-703X
In 2015, the United Nations (UN) established 17 Sustainable Development Goals (SDGs), with the primary aim of eradicating poverty, ensuring prosperity, and protecting the environment. These goals have a follow-up and evaluation agenda until 2030. Using official national data, historical variables were identified related to education poverty indicators, public spending on education, demographic factors (population density and migration), and per capita gross domestic product (GDP) at the departmental level. To analyze education poverty data at the departmental level between 2015 and 2018, the Expectation-Maximization with Bootstrap algorithm was applied, improving the robustness of the model due to the lack of historical data. Subsequently, a Multilevel Beta Regression model with logit link was used to assess the relationship between education poverty and variables such as public spending, demographic factors, and economic production at the departmental level from 2015 to 2023. The results showed no significant relationship between population density and education poverty, while the relationship between public spending and education poverty proved to be significant and relevant. Furthermore, national economic production had a positive and important relationship with education poverty indicators. The study also highlighted inequality in access to education in peripheral departments of the country. Although public spending is higher in areas with worse indicators, the focus and control of resources still need improvement. Finally, it was demonstrated that the imputation through the EMB algorithm better captured trends and relevant information about the educational context in Colombia.
Building Predictive Models to Classify Transactions as Legitimate or Fraudulent Using Machine Learning Algorithms
(Universidad Santo Tomás, 2024-02-06) Blanco Soler, Sergio Alfredo; Rubriche Cardenas, Juan Carlos; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001343533; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001425785; https://orcid.org/0000-0001-6812-2838
In the current era, the nature of transactions has evolved dramatically, shifting from traditional face-to-face interactions between cardholders and merchants to digital transactions through various channels, such as mobile applications, virtual banking, digital wallets, and electronic payment systems. This diversification has increased susceptibility to various forms of fraud, such as phishing, skimming, fraud by acquaintances, impersonation, and theft. These challenges have driven the need to implement more advanced tools than traditional rules to detect and prevent fraud. In this work, we address this challenge by proposing an approach based on supervised machine learning models. These models aim to detect fraudulent transactions in real-time or near real-time, minimizing false positives and alerting or declining transactions with a high likelihood of fraud.
Estimating the Suicide Rate in Colombia by Department for the Period 2014-2019 Using Classical, Semiparametric, and Bayesian Beta Regression
(Universidad Santo Tomás, 2024-01-31) Jiménez Jiménez, Michelle Vanesa; Bermúdez Rubio, Dagoberto; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000014678; https://orcid.org/0000-0002-2651-5665
This thesis examines suicide rates in Colombia, disaggregated by department, during the period from 2014 to 2019. The estimation of these rates was conducted using generalized linear models and generalized additive models, addressing aspects of location, scale, and shape. Because the response variable, the suicide rate, is assumed to follow a beta distribution, GLM, GAMLSS, and Bayesian beta models were employed for the analysis. The Bayesian models enhance the analysis with an adaptive perspective based on observed and prior data. Socioeconomic information by department was included, allowing an understanding of the role of these factors in the variations of suicide rates at the regional level. The aim is to comprehend the contextual determinants of suicide rates in Colombia.
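For reference, a beta regression for a rate in (0, 1) is often written in the mean-precision form sketched below. The symbols ($y_i$, $\mu_i$, $\phi$, $x_i$, $\beta$) and the choice of a logit link are assumptions for illustration. The abstract does not state the exact specification used in the thesis.

```latex
y_i \sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1-\mu_i)\phi\bigr), \qquad
\operatorname{logit}(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = x_i^{\top}\beta,
\qquad
\mathbb{E}[y_i] = \mu_i, \quad
\operatorname{Var}(y_i) = \frac{\mu_i(1-\mu_i)}{1+\phi}.
```

Under this form, $\mu_i$ is the expected rate for department $i$ and a larger precision $\phi$ means less variability around that mean.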
Extension of the ClustImpute Algorithm to Qualitative and Mixed Variables: An Application to the Bogotá D.C. Crops Chapter of the III National Agricultural Census
(Universidad Santo Tomás, 2023) Rojas Pulido, William Camilo; Pacheco Lopéz, Mario José; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000775479; https://scholar.google.com/citations?hl=es&user=a5SEoPgAAAAJ; https://orcid.org/0000-0003-4752-703X
In the current Colombian context marked by demographic, economic, and environmental changes, coupled with the significant role of the agricultural sector in the economy, the implementation of the III National Agricultural Census in 2014 emerges as a crucial tool to comprehend the multifaceted variables impacting this sector. With an operational coverage of 98.9%, this census provided detailed and updated information on the agricultural sector nationwide, including municipalities, indigenous territories, lands of black communities, and national parks. However, data analysis presents challenges such as the existence of 4% of records with missing data, which are addressed through statistical approaches like missing value imputation. Focusing on the Cultivation chapter in Bogotá, this work proposes the application of an extended version of the ClustImpute algorithm. By combining imputation techniques with the k-means method, this algorithm aims to address both quantitative and qualitative variables present in the census, offering an innovative alternative to conventional imputation methods. The ultimate goal is to provide a more comprehensive and reliable data analysis to contribute to the understanding and improvement of policies and efforts related to rural development and the quality of life in rural areas in Bogotá and, consequently, throughout the country.
Study for Forecasting Calls Received at a Telephone Experience Center of a Financial Institution
(Universidad Santo Tomás, 2023-10-05) Ruiz Cotrino, Juan Pablo; Sánchez Segura, Deniz Andrea; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001657362; https://scholar.google.com/citations?hl=es&user=NO0xayoAAAAJ; https://orcid.org/0000-0001-9573-6704
Currently, many companies worldwide use various communication channels to respond to customer inquiries and reach out to potential clients to offer products. Among these channels is telephone contact, as it is widely used by the majority of the population, enabling companies to expand their reach and the benefits they can offer. To facilitate this contact, there are several call centers that act as a link between the company and the end customer. However, one of the major challenges in operating these centers is determining call flows during a specific period, as it is difficult to predict when they will significantly increase or decrease. Additionally, the exact number of agents working is not always known, making it challenging to assess response effectiveness. Consequently, customers may experience dissatisfaction due to extended wait times and a lack of availability to address their calls. On the other hand, the organization also has uncertainties about agent distribution, such as how many agents are needed each day to ensure the highest level of service, when during the day call volumes decrease to schedule agent breaks, and to determine the average duration of each call.
Classification Model for Advertisements on Three Colombian Job Portals According to ISCO-08
(Universidad Santo Tomás, 2023-10-04) Leon Rocha, Laura Nathalia; Pacheco López, Mario José; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000775479; https://scholar.google.com/citations?hl=es&user=a5SEoPgAAAAJ; https://orcid.org/0000-0003-4752-703X
The International Standard Classification of Occupations (ISCO) is a tool adopted by the International Conference of Labor Statisticians, which allows different types of jobs to be grouped through the activities and tasks of each of them. This work provides an overview of Machine Learning models for classification processes and Natural Language Processing (NLP) by implementing the Topic Modeling textual analysis tool in the descriptions of advertisements in three job portals in Colombia, with the aim of classifying them according to the ISCO to one digit. The classification methods Ada Boost, Naive Bayes, Random Forest, Knn, Decision Trees and Support Vector Machines are used to find the one that best fits the data and to properly order the ads. The Random Forest model was the one that had the greatest success in the nine binary models (one for each ISCO class), given that, for an advertisement, there are different professions that meet the requirements of the position.
Determining Causal Factors of School Dropout in the Municipality of Zipaquirá Using a Logit Regression Model
(Universidad Santo Tomás, 2023-08-28) Rentería Guzmán, Matthew Enrique; Fonseca Gómez, Lida Rubiela; Beltrán Cortés, Oscar Javier; Universidad Santo Tomás; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000125977; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001823744; https://scholar.google.com/citations?hl=es&user=uwl_sDgAAAAJ; https://scholar.google.com/citations?hl=es&user=kg-6NCsAAAAJ; https://orcid.org/0000-0002-3597-728X
School dropout is a problem that affects the development of a nation: it shapes the future of children, adolescents and young adults by making it harder for them to obtain stable and/or well-paid jobs. The characteristics of school dropout have been the subject of ongoing research by the Ministry of National Education (MEN), whose advances and setbacks at the national level have been disclosed through technical reports such as its technical notes. The Zipaquirá Secretary of Education is no stranger to the issue of dropout, which is why it used to measure this situation through the SIMPADE platform, which worked until 2019. From 2020 to date, it has not had an adequate system to measure dropout, a situation that has had an impact on annual enrollment and other indicators. The present study seeks to identify the factors that most influence school dropout in the municipality of Zipaquirá, in both private and official schools, through the application of a Logit model.
Application of a Multinomial Logistic Model with Ordinal and Binary Categories to Study Homicides in Valle del Cauca from 2016 to 2019
(Universidad Santo Tomás, 2023-09-19) Novoa Acosta, Johnny Alexander; Bermúdez Rubio, Dagoberto; https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000014678; https://orcid.org/0000-0002-2651-5665
Homicide is a problem that has been latent for many decades in Valle del Cauca, a department located in the Colombian Pacific, and has greatly permeated its history. Homicides (also called intentional deaths or violent deaths) occur for many reasons, including the settling of accounts, quarrels, and theft. This social situation draws the attention of the national and international media every year because of the department's homicide rates, which are sometimes worrying. For this reason, this research implements an ordinal logistic multinomial model with different categories to analyze and characterize the violent deaths that occurred in the department of Valle del Cauca, Colombia, between 2016 and 2019. Considering the level of schooling (illiterate, primary, secondary, technical/technologist and higher education) reached by the victims, it is proposed to model and identify the factors associated with these deaths and to understand their influence and incidence in the region. Data from official sources are collected, and relevant variables such as demographic, socioeconomic and geographical characteristics of the municipalities of Valle del Cauca are considered. These data are used to develop and fit the model, which allows one to examine the relationships between the predictor variables. The ordinal logistic multinomial model uses classical inferential statistical techniques and generalized linear models to estimate model parameters and provide uncertainty measures for these estimates. The analysis of the results contributes to a better understanding of the factors associated with violent deaths in the region and provides valuable information for the design of social policies and crime prevention and control strategies in Valle del Cauca.
