Recent seminars

Europe/Lisbon
Online

Gustavo Soutinho, Faculdade de Economia da Universidade do Porto and Instituto Superior de Saúde Pública da Universidade do Porto
Methods for Checking the Markov Assumption in Multi-State Models: Application to Real Data Using the R Package markovMSM

Multi-state models describe complex processes in which individuals can move among a finite number of states over time. In biomedical applications, these models make it possible to analyse the progression of a disease, to investigate the effect of predictors on the risk of transition between states, or to predict transition probabilities to future states given the history of events. In all these cases, a prior assessment of the Markov assumption is essential to avoid, for example, inconsistent estimates. The seminar will introduce the fundamental concepts of multi-state models, as well as different methods for inference and for validating the Markov assumption (some drawn from the literature, others published by the speaker). Finally, practical examples will be presented in which the methods are applied to real health data using the R package markovMSM.
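To make the idea of checking the Markov assumption concrete, the following Python sketch uses a much simpler setting than the talk: individuals observed at equally spaced times in a discrete-time chain. It computes the classical likelihood-ratio (G²) statistic comparing a first-order Markov chain with a second-order one. In other words, it asks whether the next state still depends on the previous state once the current state is known. This test is swapped in purely for illustration. It is not one of the methods presented in the seminar and does not use markovMSM. The function name and the input format (one list of state labels per individual) are assumptions.

```python
import math
from collections import Counter


def markov_lr_test(sequences):
    """G^2 statistic for H0: first-order Markov chain vs H1: second-order chain.

    `sequences` is a list of per-individual state sequences observed at
    equally spaced times. Returns (statistic, degrees_of_freedom)."""
    triple = Counter()  # n(a, b, c): states a -> b -> c at consecutive times
    pair = Counter()    # n(a, b, .): how often (a, b) is followed by any state
    trans = Counter()   # n(., b, c): transitions b -> c that have a predecessor
    origin = Counter()  # n(., b, .): visits to b with a predecessor and successor
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            triple[(a, b, c)] += 1
            pair[(a, b)] += 1
            trans[(b, c)] += 1
            origin[b] += 1
    g2 = 0.0
    for (a, b, c), n_abc in triple.items():
        p_second = n_abc / pair[(a, b)]      # estimate of P(c | a, b)
        p_first = trans[(b, c)] / origin[b]  # estimate of P(c | b)
        g2 += 2.0 * n_abc * math.log(p_second / p_first)
    k = len(states)
    return g2, k * (k - 1) ** 2  # df when every transition is possible
```

Under the first-order (Markov) null hypothesis, G² is approximately chi-square with the returned number of degrees of freedom. A p-value can therefore be obtained, for instance, with `scipy.stats.chi2.sf(g2, df)`. The degrees-of-freedom formula assumes that every transition is possible. Absorbing states such as death, which are common in multi-state models, create structural zeros and reduce the degrees of freedom.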

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Erida Gjini, CEMAT, Instituto Superior Técnico
Studying co-infection systems with many strains using the replicator equation

Understanding co-infection systems with multiple interacting strains remains difficult: high dimensionality and complex nonlinear feedbacks make such systems very challenging to study analytically. When strains are similar, trait variation can be modelled as perturbations of the parameters, which simplifies the analysis. By applying singular perturbation theory to such a multi-strain system, we obtained the explicit collective dynamics in terms of fast (neutral) dynamics and slow (non-neutral) dynamics. The slow dynamics are given by the replicator equation for strain frequencies. This is a key equation in evolutionary game theory, and in our case it governs selection among N strains. In this talk, I will highlight some key features of this derivation and show how the replicator equation helps us better understand such multi-strain systems. I will also discuss links with diversity data in both epidemiology and ecology.
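For readers unfamiliar with the replicator equation, a minimal Python sketch follows. It integrates the standard form dx_i/dt = x_i[(A x)_i - xᵀ A x] with explicit Euler steps. Here x holds the strain frequencies and A is a generic, hypothetical payoff matrix. In the multi-strain setting of the talk, its entries would encode pairwise interactions among strains, but the exact mapping from the model parameters is not reproduced here. The step size and number of steps are arbitrary choices.

```python
def replicator_step(x, A, dt):
    """One explicit Euler step of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    new = [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]
    total = sum(new)  # equals 1 up to round-off, since the increments sum to zero
    return [v / total for v in new]


def simulate(x0, A, dt=0.01, steps=5000):
    """Integrate the replicator equation from frequencies x0 (summing to 1)."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, A, dt)
    return x
```

With A = [[0, 1], [-1, 0]], strain 1 outcompetes strain 2 and its frequency tends to 1. With A = [[0, 1], [1, 0]], the two strains coexist at equal frequencies.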

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room 6.4.30, Faculty of Sciences of the Universidade de Lisboa — Online

Bruno Santos, Stockholm University, Sweden
New Approaches to Study Over-Coverage in Population Registers

Obtaining accurate population estimates is always a challenge, and the difficulty grows in scenarios with high migrant mobility. In these cases, bias arising from over-coverage (that is, individuals recorded as residents whose death or emigration has not been registered) may have significant consequences for policymaking and research. In this talk, we will consider different approaches to obtaining over-coverage estimates, using Swedish population registers for the period 2003-2016. We will discuss the difficulties posed by the high dimensionality of the data and present current developments and ideas for future work.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Ana M. Bianco and Graciela Boente, Universidad de Buenos Aires & CONICET
Addressing robustness in covariate-specific ROC curves

The Receiver Operating Characteristic curve (ROC curve) is a graphical tool that assesses the accuracy of a classification method based on a continuous random variable, usually known as the marker. It is now a well-accepted technique for showing how well such a classifier discriminates between two different groups or classes.
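The empirical ROC curve and the usual estimate of the area under it can be sketched in a few lines of Python. As is conventional, the sketch assumes that larger marker values point to the diseased group. It uses ROC(p) = 1 - F_D(F_H⁻¹(1 - p)), where F_H and F_D are the marker distributions in the healthy and diseased groups and p is the false-positive rate. The function names are illustrative. This is the classical unconditional estimator, with no covariates and no robustness.

```python
import math


def empirical_quantile(sample, q):
    """F^{-1}(q) = min{y : F_n(y) >= q} for 0 < q <= 1 (F_n: empirical CDF)."""
    ys = sorted(sample)
    k = max(math.ceil(q * len(ys)), 1)
    return ys[k - 1]


def roc_point(healthy, diseased, p):
    """Empirical ROC(p) = 1 - F_D(F_H^{-1}(1 - p)), larger marker = diseased."""
    if p >= 1.0:
        return 1.0
    threshold = empirical_quantile(healthy, 1.0 - p)
    # 1 - F_D(threshold) is the share of diseased markers above the threshold
    return sum(d > threshold for d in diseased) / len(diseased)


def auc(healthy, diseased):
    """AUC estimate via the Mann-Whitney statistic:
    P(Y_D > Y_H) + 0.5 * P(Y_D = Y_H)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(healthy) * len(diseased))
```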

In this talk, we will focus on situations where covariates that affect the performance of the ROC curve are recorded, so it is advisable to incorporate this additional information into the study. We take the covariate effect into account through regression models. More precisely, for each population, the distribution of the marker is modelled separately in terms of the covariates, and the induced ROC curve is then computed.
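As a worked formula, a standard way to write the induced covariate-specific ROC curve under a location-scale regression model for each group (H: healthy, D: diseased) is shown below. The notation is assumed here and may differ from the talk's own.

```latex
\begin{aligned}
Y_j &= \mu_j(\mathbf{x}) + \sigma_j(\mathbf{x})\,\varepsilon_j,
  \qquad \varepsilon_j \sim G_j \text{ independent of } \mathbf{x},\quad j \in \{H, D\},\\
F_j(y \mid \mathbf{x}) &= G_j\!\left(\frac{y - \mu_j(\mathbf{x})}{\sigma_j(\mathbf{x})}\right),\\
\mathrm{ROC}_{\mathbf{x}}(p) &= 1 - F_D\!\left(F_H^{-1}(1 - p \mid \mathbf{x}) \mid \mathbf{x}\right)\\
  &= 1 - G_D\!\left(\frac{\mu_H(\mathbf{x}) - \mu_D(\mathbf{x})
     + \sigma_H(\mathbf{x})\,G_H^{-1}(1 - p)}{\sigma_D(\mathbf{x})}\right),
  \qquad 0 < p < 1.
\end{aligned}
```

The last step uses F_H⁻¹(1 - p | x) = μ_H(x) + σ_H(x) G_H⁻¹(1 - p). Estimating μ_j, σ_j and G_j from each sample and plugging the estimates in gives the induced estimator. Atypical data can therefore distort the curve through all three estimates.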

The talk is motivated by the widespread belief that ROC curves are robust. We address robustness in the sense of protection against anomalous data in the sample. Aware of the impact that outlying values may have on the accuracy of a diagnostic test, we focus on the robustness of the estimation procedures for the conditional ROC curve. Moreover, since regression models are involved in both the direct and the induced approaches, atypical data among the responses or the covariates may severely affect the estimation methods. Achieving robustness is even more complex with functional data, since different types of atypical data may arise in that setting.

Because classical ROC curve estimators are unstable when there are outliers among the observations, we will introduce a procedure for obtaining robust estimators within the framework of the induced methodology. The proposal is semi-parametric. For each sample, a regression model for the marker is fitted robustly, and adaptive estimators of the error distribution functions are used to down-weight large residuals. Robust procedures will be introduced for both real-valued and functional covariates.
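A minimal Python sketch of the induced approach with a robust fit follows. It rests on simplifying assumptions of our own: one real covariate, and linear mean functions with constant scale in each group. Under these assumptions, ROC_x(p) = 1 - G_D(μ_H(x) - μ_D(x) + G_H⁻¹(1 - p)), where G_H and G_D are the residual distributions. The regression line is fitted by a generic Huber M-estimator, computed by iteratively reweighted least squares from a Theil-Sen starting value with MAD scale. The residual distributions are estimated by plain empirical CDFs. This is not the Bianco and Boente estimator: their adaptive estimators of the error distributions are not reproduced, and functional covariates are not covered. All function names are hypothetical.

```python
import math
import statistics


def theil_sen(x, y):
    """Robust starting line: median of pairwise slopes, then median intercept."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    b = statistics.median(slopes)
    a = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    return a, b


def huber_line(x, y, c=1.345, iters=100, tol=1e-10):
    """Fit y = a + b*x with a Huber M-estimator via iteratively reweighted
    least squares; the residual scale is re-estimated by the normalised MAD."""
    a, b = theil_sen(x, y)
    for _ in range(iters):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(r)
        s = max(statistics.median(abs(ri - med) for ri in r) / 0.6745, 1e-12)
        # Huber weights: 1 inside [-c*s, c*s], decreasing like c*s/|r| outside
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        b_new = sxy / sxx
        a_new = ym - b_new * xm
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b


def residuals(x, y, fit):
    a, b = fit
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def ecdf(sample, t):
    """Empirical CDF of `sample` evaluated at t."""
    return sum(v <= t for v in sample) / len(sample)


def quantile(sample, q):
    """Empirical quantile min{v : ecdf(v) >= q}, for 0 < q <= 1."""
    ys = sorted(sample)
    return ys[max(math.ceil(q * len(ys)), 1) - 1]


def induced_roc(fit_h, res_h, fit_d, res_d, x0, p):
    """ROC_x0(p) = 1 - G_D(mu_H(x0) - mu_D(x0) + G_H^{-1}(1 - p)), 0 <= p < 1."""
    mu_h = fit_h[0] + fit_h[1] * x0
    mu_d = fit_d[0] + fit_d[1] * x0
    threshold = mu_h + quantile(res_h, 1.0 - p)  # healthy (1 - p)-quantile at x0
    return 1.0 - ecdf(res_d, threshold - mu_d)
```

Consider a healthy sample that contains one gross outlier in the marker. A least-squares line would be pulled well away from the bulk of the data, while the Huber fit stays close to it. This is the kind of protection the robust induced approach aims for.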

We will present results on the uniform consistency of the estimators. A finite-sample numerical study illustrates the robustness of the proposal, and a real data set is also analysed.

The methods to be described are based on the following papers:

  • Bianco, A. M. and Boente, G. (2022). Addressing robust estimation in covariate specific ROC curves. Econometrics and Statistics.
  • Bianco, A. M., Boente, G. and González-Manteiga, W. (2022). Robust consistent estimators for ROC curves with covariates. Electronic Journal of Statistics, 16, 4133-4161.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Catarina Padrela Loureiro, Departamento de Matemática, Instituto Superior Técnico
Air Quality Data Analysis with Symbolic Principal Components

Air pollution is a global challenge with deep implications for public health and the environment. We examine air quality data from a monitoring station in Entrecampos, Lisbon, using Symbolic Data Analysis. The dataset consists of hourly concentrations of nine pollutants over three years. These are log-transformed and aggregated into daily intervals bounded by the minimum and maximum values. The symbolic mean and variance of each variable are estimated by the method of moments, and pairwise dependencies are captured with a bivariate copula. Symbolic principal component scores are obtained from the estimated covariance matrix and used to fit generalized extreme value distributions. Control charts based on the quantiles of these distributions are then used to identify outlying observations. The results are compared with those of outlier detection methods based on daily averages, and they show the relevance of Symbolic Data Analysis in revealing new insights into air quality.
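The first steps described above (log transform, daily [min, max] aggregation, symbolic mean and variance) can be sketched in Python as follows. As an assumption, the symbolic mean and variance use the classical Bertrand and Goupil definitions, which treat values as uniformly spread within each interval. These may differ from the method-of-moments estimators used in the talk. The copula, the symbolic principal components and the extreme-value control charts are not reproduced. The input format (pairs of day label and positive hourly concentration for a single pollutant) and the function names are hypothetical.

```python
import math
from collections import defaultdict


def daily_intervals(hourly):
    """Log-transform hourly concentrations and aggregate them per day into
    intervals [daily minimum, daily maximum].

    `hourly` is a list of (day, concentration) pairs with concentration > 0."""
    by_day = defaultdict(list)
    for day, value in hourly:
        by_day[day].append(math.log(value))
    return {day: (min(vals), max(vals)) for day, vals in by_day.items()}


def symbolic_mean_var(intervals):
    """Symbolic mean and variance of interval data [a_i, b_i], assuming values
    are uniformly spread within each interval (Bertrand and Goupil)."""
    n = len(intervals)
    mean = sum((a + b) / 2.0 for a, b in intervals) / n
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3.0 * n)
    return mean, second_moment - mean ** 2
```

For one pollutant, `symbolic_mean_var(list(daily_intervals(data).values()))` returns the symbolic (mean, variance) pair on the log scale. A single interval [0, 2] gives mean 1 and variance 1/3, which is the variance of a uniform distribution on that interval.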

Joint seminar CEMAT and CEAUL