2002 seminars

Europe/Lisbon
Online

Ivo Sousa-Ferreira, Department of Mathematics, University of Madeira

An Additive Shared Frailty Model Using The Non-Central Chi-Squared Distribution With Zero Degrees Of Freedom

Shared frailty models are particularly useful in recurrent events analysis to account for the within-subject dependence among event times. Usually, such models rely on the assumption that frailty acts multiplicatively on the hazard/rate function. However, in certain scenarios, it may be more realistic for frailty to be included in an additive way. Furthermore, the unobserved heterogeneity may be due to the presence of some subjects who are non-susceptible to the event of interest, and others with a varying degree of susceptibility.

This talk aims to introduce a new additive shared frailty model for recurrent gap time data, characterized by a Weibull rate function derived from a non-homogeneous Poisson process and by a mixed frailty following a non-central chi-squared distribution with zero degrees of freedom. It will be shown that the resulting model may have a competing risk interpretation. Additionally, the Weibull rate model and the classical homogeneous Poisson process are two special cases of degenerate frailty. A frequentist approach for parameter estimation using the maximum likelihood method will be discussed. An application to a well-known data set is provided for illustrative purposes.

The seminar will be delivered in Portuguese, but the presentation slides will be in English.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Catarina Padrela Loureiro, Departamento de Matemática, Instituto Superior Técnico

Air Quality Data Analysis with Symbolic Principal Components

Air pollution is a global challenge with deep implications for public health and the environment. We examine air quality data from a monitoring station in Entrecampos, Lisbon, using Symbolic Data Analysis. The dataset consists of hourly concentrations of nine pollutants over three years, which are logarithmically transformed and aggregated into intervals by taking the daily minimum and maximum values. The symbolic mean and variance are estimated for each variable through the method of moments, and the pairwise dependencies are captured using a bivariate copula. Symbolic principal component scores are obtained from the estimated covariance matrix and used to fit generalized extreme value distributions. Control charts, based on these distributions' quantiles, are used to identify outlying observations. A comparative analysis with daily average-based outlier detection methods is conducted. The results show the relevance of Symbolic Data Analysis in revealing new insights into air quality.
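As a rough illustration of the aggregation step described in the abstract, the sketch below turns hourly readings into daily intervals of log-concentrations. The function name `daily_log_intervals` and the `(day, concentration)` input layout are assumptions made for this example; they are not taken from the speaker's work.

```python
import math
from collections import defaultdict


def daily_log_intervals(records):
    """Aggregate hourly readings into daily [min, max] intervals of log-concentration.

    `records` is an iterable of (day, concentration) pairs with concentration > 0.
    The layout is a hypothetical one chosen for this sketch.
    """
    logs_by_day = defaultdict(list)
    for day, concentration in records:
        # Logarithmic transform applied before aggregation, as in the abstract.
        logs_by_day[day].append(math.log(concentration))
    # Each day becomes one interval-valued (symbolic) observation.
    return {day: (min(values), max(values)) for day, values in logs_by_day.items()}


if __name__ == "__main__":
    hourly = [("2021-01-01", 12.0), ("2021-01-01", 30.5), ("2021-01-02", 8.2)]
    for day, (low, high) in daily_log_intervals(hourly).items():
        print(day, round(low, 3), round(high, 3))
```

Each pollutant would yield one such interval per day; the symbolic mean, variance and copula steps mentioned in the abstract are not covered here.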

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Ana M. Bianco and Graciela Boente, Universidad de Buenos Aires & CONICET

Addressing robustness in covariate-specific ROC curves

The Receiver Operating Characteristic curve (ROC curve) is a graphical tool that assesses the accuracy of a classification method based on a continuous random variable, usually known as the marker. Nowadays it is a well-accepted technique that reflects how well this classifier discriminates between two different groups or classes.

In this talk, our focus will be on situations where some covariates that affect the performance of the ROC curve are recorded, so it is advisable to incorporate this additional information into the study. We take the covariate effect into account through regression models. More precisely, for each population, the marker's distribution is modelled separately in terms of the covariates, and the induced ROC curve is computed afterwards.

The motivation for this talk is the widespread belief that ROC curves are robust. Our talk tackles the concept of robustness in the sense of protection against anomalous data in the sample. Aware of the impact that outlying values may have on the diagnostic test accuracy, we center our attention on the robust aspects of the estimation procedures of the conditional ROC curve. Moreover, since regression models are involved in both the direct and induced approaches, atypical data among the responses or the covariates may severely affect the estimation methods. Achieving robustness is even more complex when dealing with functional data since, in such a situation, different types of atypical data may arise.

Because the classical ROC curve estimators lack stability when there are outliers among the observations, we will introduce a procedure to obtain robust estimators within the framework of the induced methodology. The proposal is based on a semi-parametric approach in which, for each sample, a regression model is robustly fitted to the marker, and estimators of the distribution functions of the adaptive errors are considered to down-weight large residuals. Robust procedures will be introduced both for real covariates and for functional covariates.

We will present results regarding the uniform consistency of the estimators. The finite-sample numerical study illustrates the robustness of the proposal. A real data set is also analysed.

The methods to be described are based on the following papers:

  • Bianco, A. M. and Boente, G. (2022). Addressing robust estimation in covariate specific ROC curves. Econometrics and Statistics.
  • Bianco, A. M., Boente, G. and Gonzalez Manteiga, W. (2022). Robust consistent estimators for ROC curves with covariates. Electronic Journal of Statistics, 16, 4133-4161.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room 6.4.30, Faculty of Sciences of the Universidade de Lisboa — Online

Bruno Santos, Stockholm University, Sweden

New Approaches To Study Over-Coverage In Population Registers

Obtaining accurate population estimates is always a challenge, and it becomes harder in scenarios with high migrant mobility. In these cases, bias emanating from over-coverage, i.e., resident individuals whose death or emigration is not registered, may have significant ramifications for policymaking and research. In this talk, we will consider different approaches to obtain over-coverage estimates, using Swedish Population registers for the period 2003-2016. We will discuss difficulties regarding the high dimensionality of the data and we will show current developments and ideas for the future.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Erida Gjini, CEMAT, Instituto Superior Técnico

Studying co-infection systems with many strains using the replicator equation

Understanding co-infection systems with multiple interacting strains remains difficult. High dimensionality and complex nonlinear feedbacks make the analytical study of such systems very challenging. When strains are similar, we can model trait variation as perturbations in parameters, which simplifies analysis. Applying singular perturbation theory to such a multi-strain system, we have obtained the explicit collective dynamics in terms of a fast (neutral) dynamics and a slow (non-neutral) dynamics. The slow dynamics are given by the replicator equation for strain frequencies, a key equation in evolutionary game theory, which in our case governs selection among N strains. In this talk, I will highlight some key features of this derivation, the use of the replicator equation to better understand such multi-strain systems, and discuss links with diversity data both in epidemiology and ecology.
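For readers unfamiliar with the replicator equation, the sketch below integrates its standard form, dz_i/dt = z_i ((A z)_i - z·A z), for strain frequencies z with a simple forward Euler scheme. The payoff matrix `A`, the step size and the function names are illustrative assumptions; the abstract does not specify how the talk's model defines these quantities.

```python
def replicator_step(z, A, dt):
    """One forward Euler step of dz_i/dt = z_i * ((A z)_i - z.A z)."""
    n = len(z)
    # Fitness of each strain against the current frequency mix.
    fitness = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
    # Frequency-weighted mean fitness of the whole population.
    mean_fitness = sum(z[i] * fitness[i] for i in range(n))
    new_z = [z[i] + dt * z[i] * (fitness[i] - mean_fitness) for i in range(n)]
    # Renormalise to guard against floating-point drift off the simplex.
    total = sum(new_z)
    return [value / total for value in new_z]


def simulate(z0, A, dt=0.01, steps=5000):
    """Integrate the replicator equation from initial frequencies z0."""
    z = list(z0)
    for _ in range(steps):
        z = replicator_step(z, A, dt)
    return z


if __name__ == "__main__":
    # Hypothetical 2-strain payoff matrix in which strain 0 gains from meeting strain 1.
    A = [[0.0, 1.0], [0.0, 0.0]]
    print(simulate([0.5, 0.5], A))
```

With this made-up matrix, strain 0 gradually takes over; a smaller step or a higher-order integrator would be preferable for stiff or long-horizon runs.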

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Gustavo Soutinho, Faculdade de Economia da Universidade do Porto and Instituto Superior de Saúde Pública da Universidade do Porto

Methods for Checking the Markov Assumption in Multi-State Models: Application to Real Data Using the R Package markovMSM

Multi-state models describe complex processes in which individuals can move among a finite number of states over time. In biomedical applications, such models make it possible to analyse the progression of a disease; to investigate the effect of predictors on the increased risk of transition between states; or to predict transition probabilities to future states given the event history. In all of these cases, a prior assessment of the Markov assumption is essential to avoid, for example, inconsistencies in the resulting estimates. The seminar will introduce the fundamental concepts of multi-state models, as well as different methods for inference and for checking the Markov assumption (some taken from the literature and others published by the speaker). Finally, practical examples will be presented in which the methods are applied to real health data using the R package markovMSM.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Taban Baghfalaki, Bordeaux University, Bordeaux, France

Dynamic Prediction of an Event Using Multiple Longitudinal Markers: a Model Averaging Approach

Dynamic event prediction, using joint modeling of survival time and longitudinal variables, is extremely useful in personalized medicine. However, estimating joint models that include multiple longitudinal markers remains a computational challenge due to the large number of random effects and parameters that need to be estimated. We propose a model-averaging strategy to combine predictions from several joint models for the event, including models with only one longitudinal marker or pairwise longitudinal markers. The prediction is computed as the weighted mean of the predictions from the one-marker or two-marker models, with the time-dependent weights estimated by minimizing the time-dependent Brier score. This method enables us to combine a large number of predictions issued from joint models to achieve a reliable and accurate individual prediction. The advantages and limitations of the proposed methods are highlighted by comparing them with the predictions from well-specified and misspecified all-marker joint models, as well as one-marker and two-marker joint models, using the available PBC2 dataset. The method is used to predict the risk of death in patients with primary biliary cirrhosis. The method is also used to analyze a French cohort study called the 3C data. In our study, seventeen longitudinal markers are being considered to predict the risk of death.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Ben Stevenson, University of Auckland, New Zealand

Penalised Regression Splines For Spatial Capture-Recapture

Understanding co-infection systems with multiple interacting strains remains difficult. High dimensionality and complex nonlinear feedbacks make the analytical study of such systems very challenging. When strains are similar, we can model trait variation as parameter perturbations, simplifying analysis. Applying singular perturbation theory to such a multi-strain system, we have obtained the explicit collective dynamics in terms of fast (neutral) dynamics and slow (non-neutral) dynamics. The slow dynamics are given by the replicator equation for strain frequencies, a key equation in evolutionary game theory, which in our case governs selection among N strains. In this talk, I will highlight some key features of this derivation, the use of the replicator equation to understand such a multi-strain system better, and discuss links with diversity data both in epidemiology and ecology.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Diogo Pereira, CEMAT, Instituto Superior Técnico

A new algorithm for inference in Hidden Markov models with lower span complexity

The maximum likelihood problem for Hidden Markov Models is usually numerically solved by the Baum-Welch algorithm, which uses the Expectation-Maximization algorithm to find the estimates of the parameters. This algorithm has a recursion depth equal to the data sample size and cannot be computed in parallel, which limits the use of modern GPUs to speed up computation time. A new algorithm is proposed that provides the same estimates as the Baum-Welch algorithm, requiring about the same number of iterations, but is designed in such a way that it can be parallelized. As a consequence, it leads to a significant reduction in the computation time. We illustrate this by means of numerical examples, where we consider simulated data as well as real datasets.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Fernando Moura, Universidade Federal do Rio de Janeiro, Brazil

Beta-Beta Prime Model for Indices and Their Precisions with Application to Small Area Estimation

National statistical agencies around the world have experienced a growing need to provide reliable estimates of economic and social indices, such as proportions or rates, at the level of small areas or small domains from sample survey data. However, because of the small sample sizes in these areas, it is not feasible to obtain estimates with an acceptable level of precision without using model-based approaches. This work proposes to jointly model the direct estimator of indices in the interval (0,1) and their respective precisions using the Beta and Beta prime distributions. The novelty is to also model the sampling precision estimator with a Beta prime distribution. An evaluation study with real data shows that there is an additional gain from jointly modelling the direct estimator and its precision estimator, relative to the Beta model that does not use sample information on the precision of the estimates. An application to estimating the food insecurity index in small areas of the State of Minas Gerais, using data from the Pesquisa Nacional de Orçamentos Familiares (POF) for 2018, is also presented.

Joint work with Soraia Pereira (CEAUL/FCUL) and Giovani Silva (CEAUL/IST).

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Heliton Tavares, Universidade Federal do Pará, Brazil

Statistical Models for Fraud Detection and Applications

The development of statistical models for detecting fraud in tests has gained relevance in recent years, particularly models based on Item Response Theory (IRT). Exams and assessments may raise suspicions of fraud when their results are tied to financial advantages or to places at educational institutions. The talk will present the main models, their associated statistical behaviour, their computational performance, and an application to real data. An R package has been developed, which will be presented and made available to the public.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Amphitheatre Fa2, IST — Online

Renata M. C. R. Souza, Universidade Federal de Pernambuco

Advances in Data Science within the Symbolic Data Analysis (SDA) Paradigm

Advances in information technology and computing have made it possible to store large and multiple databases, and these data are often unstructured, with variables defined by multiple values or multiple units. Examples include daily temperatures recorded as minimum and maximum values, or users' preference for analysing phenomena by region rather than by inhabitant. To reduce the size of such data and improve the efficiency of the associated models, one solution is to obtain new statistical units that describe the phenomena through multi-valued data. In Symbolic Data Analysis (SDA), the database entries are new units described by variables that are not restricted to real values, since they can be drawn from a broader list: sets, intervals, histograms, trees, graphs, functions, fuzzy data, etc. The aim of SDA is to extend statistical and machine learning techniques (decision trees, classification rules, neural networks, factor analysis) to more complex data, called symbolic data. Over the last decade, different regression and clustering methods for multi-valued data have been proposed in the SDA literature. Different applications illustrate the use of these methods.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Cláudia Neves, King’s College London and CEAUL

One way to estimate an out-of-sample quantile of an unknown distribution through extreme value theory

Within the general aim of extreme value statistics lies the estimation of an event so rare that it might never have been witnessed in the past. Whilst the parametric estimation of an extreme quantile has found its way into the lore of many applied sciences, in terms of evaluating return levels, analogous non-parametric methodology is far less explored. This is an interesting topic because there are different albeit equivalent ways to define an (extreme) out-of-sample quantile, as underpinned by different constructs arising from the same foundational extreme value theorem.

In this talk, I will address two of these definitions through the domains of attraction framework and will explain how we succeeded in generalising one of them to allow for either a finite or an infinite upper bound of the distribution underlying the sampled data.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Nicoleta Serban, H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology, USA

Computational Methods for Healthcare Access Modeling

This seminar will begin with an introduction to the multidimensional construct of healthcare access, providing a well-established definition and common objectives in access measurement and inference. Different approaches will be presented, focusing on rigorous mathematical models to estimate access, including optimization and simulation under uncertainty of the model inputs. Important aspects will be covered, including spatial dependence in the decision parameters of optimization models used to estimate healthcare access and Bayesian hierarchical models used to specify the sampling distributions of model inputs. The models will be illustrated by modeling access to mental healthcare in Georgia, United States.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Luis Gimeno-Sotelo, CEAUL and University of Vigo, Spain

Dependence modelling of extreme hydrological events in current and future climates

In this seminar, Dr. Luis Gimeno-Sotelo will provide an overview of his most recent advances in the extreme value analysis of the main hydrological extreme events (heavy rainfall and droughts) in terms of their main drivers. The most relevant statistical methods for non-stationary extreme value modelling will be presented, as well as a variety of methods from copula theory to study bivariate extremes and conditional probabilities. He will explain the main applications of these statistical methodologies in the aforementioned environmental context, allowing for the identification of hotspot regions of high statistical dependence between the drivers and the hydrological extremes, as well as the analysis of the projected changes in the probabilities of occurrence of these extreme events in a global warming context.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

João Torrado Malato, CEAUL and IMM, University of Lisbon, Portugal and Warsaw University, Poland

Impact of misdiagnosis in case-control association studies: the case of myalgic encephalomyelitis/chronic fatigue syndrome

Misdiagnosis can occur when clinicians use different case definitions (relative misdiagnosis) or when the genuine diagnosis of another disease is missed (misdiagnosis in a strict sense). In complex diseases, such as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), this problem translates to a recurrent difficulty in reproducing research findings. To explore these effects, we simulated data from case-control studies under the assumption of misdiagnosis in a strict sense. We estimated the power to detect a genuine association between a potential causal factor and ME/CFS and demonstrated how current research studies may have suboptimal power. To address the implications of these findings, suggestions for how power can be improved are given and explained within the context of the disease.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
SASlab (6.4.29) Faculty of Sciences of the Universidade de Lisboa — Online

Miguel Pereira, Cogitars, UK

Bayesian clinical trials: a short workshop based on a very well-known trial

Bayesian statistics has been increasingly used in clinical trials, offering greater flexibility and efficiency in the development of new drugs.

In this seminar we will address this topic using as a running example a large, very well-known clinical trial that few people know used Bayesian methods. We will explore in detail the methodology used in the trial and how it applies to other trials. We will also discuss how to choose the type of prior distribution and how to choose the parameters of a distribution.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Joaquin Cavieres, University of Göttingen, Germany

Approximated Gaussian random field under different parameterizations for MCMC

Fitting spatial models with a Gaussian random field as spatial random effect poses computational challenges for Markov Chain Monte Carlo (MCMC) methods, primarily due to two factors: computational speed and convergence of chains for the hyperparameters. To deal with this, a Gaussian random field can be approximated by a Gaussian Markov random field using stochastic partial differential equations. This methodology is commonly used in “latent Gaussian models”, where the inference is done by the Integrated Nested Laplace Approximations, but rarely used in an MCMC method. In this contribution, we evaluated different parameterizations of the approximated Gaussian random field, specifically using the Hamiltonian Monte Carlo algorithm of the Stan software. A simulation study demonstrated that models using the hyperparameters ρ and σu were better able to estimate the values used to simulate the spatial random field. Their computation was also faster than that of models parameterized with κ and τ. In a real data application, the index of relative abundance estimated for pollock indicates similar trends for the six models proposed. However, models incorporating ρ and σu demonstrated faster computation compared to those utilizing κ and τ, corroborating the results found in the simulation. Even more important, none of these models encountered convergence issues, as indicated by the Rhat statistic.

Joint seminar CEMAT and CEAUL