2022 seminars

Europe/Lisbon
Online

Luis Carvalho, Boston University, Massachusetts

Latent Association Graph Inference for Binary Transaction Data

We discuss a novel approach for modeling multivariate binary transaction data and inferring co-purchase patterns in market basket data. To this end, we exploit a latent graph capturing these purchase associations, in which each transaction is a clique, and set meaningful priors based on expected transaction sizes and frequencies. We present an MCMC sampling procedure that handles large datasets and conclude that this model provides sparser representations of inferred associations than traditional frequent itemset mining (FIM) approaches, without sacrificing predictive accuracy. This is joint work with David Reynolds.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Ariel Duarte-López, Data Management Group (DAMA-UPC), Department of Statistics and OR, Technical University of Catalonia

Zipf Extensions and Their Applications for Modeling the Degree Sequences of Real Networks

In this talk, I will present four bi-parametric extensions of the Zipf distribution. The first two belong to the class of Random Stopped Extreme distributions. The third extension results from applying the concept of Poisson-Stopped-Sum to the Zipf distribution, and the last one is obtained by including an additional parameter in the probability generating function of the Zipf. An interesting characteristic of three of the models presented is that they allow for a parameter interpretation that gives some insight into the mechanism generating the data. I also analyze the performance of these models when fitting the degree sequences of real networks from different areas, such as social networks, protein interaction networks, and collaboration networks. The fits obtained are compared with those of other bi-parametric models such as the Zipf-Mandelbrot, the discrete Weibull, and the negative binomial.
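As context for the baseline these extensions build on, the following Python sketch fits the one-parameter Zipf distribution by maximum likelihood to synthetic data; the exponent value and sample size are arbitrary, and the bi-parametric models from the talk are not reproduced here.

```python
import numpy as np
from scipy.stats import zipf
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
true_exponent = 2.5
sample = zipf.rvs(true_exponent, size=5000, random_state=rng)

# MLE of the Zipf exponent: maximize the log-likelihood of the sample,
# i.e. minimize its negative, over the admissible range a > 1.
res = minimize_scalar(lambda a: -zipf.logpmf(sample, a).sum(),
                      bounds=(1.05, 10.0), method="bounded")
print(res.x)  # estimate of the exponent, close to 2.5
```

The same likelihood-based recipe extends to two-parameter models once their probability mass function is available.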

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Abdelhakim Aknouche, Department of Mathematics of Qassim University, Saudi Arabia

On Integer-valued GARCH Modeling

This talk presents a concise review of integer-valued GARCH (INGARCH) modeling for time series of counts. Attention is paid to some commonly used specifications, the main approaches for studying their ergodic properties, and their estimation methods. In particular, the focus is on the class of INGARCH processes with equal conditional stochastic and mean orders. Some recent mixture INGARCH extensions, in particular Markov-switching INGARCH models, are also presented.
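As a minimal illustration of the basic recursion (a generic Poisson INGARCH(1,1), not any particular specification from the talk), the conditional mean evolves as lambda_t = omega + alpha*X_{t-1} + beta*lambda_{t-1}; the parameter values below are arbitrary.

```python
import numpy as np

def simulate_ingarch(omega, alpha, beta, n, seed=0):
    """Simulate a Poisson INGARCH(1,1) process:
    X_t | past ~ Poisson(lam_t),  lam_t = omega + alpha*X_{t-1} + beta*lam_{t-1}.
    Stationarity requires alpha + beta < 1."""
    rng = np.random.default_rng(seed)
    lam = omega / (1.0 - alpha - beta)  # start at the stationary mean
    x = np.empty(n, dtype=int)
    for t in range(n):
        x[t] = rng.poisson(lam)
        lam = omega + alpha * x[t] + beta * lam
    return x

counts = simulate_ingarch(omega=1.0, alpha=0.3, beta=0.4, n=5000)
# Stationary mean is omega/(1-alpha-beta) = 10/3; the sample mean should
# be close, and the counts are overdispersed (variance above the mean).
print(counts.mean(), counts.var())
```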

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Thomas Opitz, INRAE - Biostatistique et Processus Spatiaux

Spatiotemporal modeling of extreme-wildfire risk

Accurate spatiotemporal modeling of the conditions leading to moderate and large wildfires provides a better understanding of the mechanisms driving fire-prone ecosystems and improves risk management. Here we develop a joint model for the occurrence intensity and the wildfire size distribution by combining extreme-value theory and point processes within a novel Bayesian hierarchical model, and use it to study daily summer wildfire data for the French Mediterranean basin during 1995-2018. The occurrence component models wildfire ignitions as a spatiotemporal log-Gaussian Cox process. Burnt areas are numerical marks attached to points and are considered extreme if they exceed a high threshold. The size component is a two-component mixture varying in space and time that jointly models moderate and extreme fires. We capture the non-linear influence of covariates (Fire Weather Index, forest cover) through component-specific smooth functions, which may vary with season. We propose estimating shared random effects between model components to reveal and interpret common drivers of different aspects of wildfire activity. This leads to increased parsimony and reduced estimation uncertainty, with better predictions. Fast approximate (but accurate) Bayesian estimation is carried out in the framework of the integrated nested Laplace approximation. Our methodology provides a holistic approach to explaining and predicting the drivers of wildfire activity and the associated uncertainties.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Matheus B. Guerrero, King Abdullah University of Science and Technology

Conex-Connect: Learning Patterns in Extremal Brain Connectivity From Multi-Channel EEG Data

Epilepsy is a chronic neurological disorder affecting more than 50 million people globally. An epileptic seizure acts like a temporary shock to the neuronal system, disrupting normal electrical activity in the brain. Epilepsy is frequently diagnosed with electroencephalograms (EEGs). Current methods study only the time-varying spectra and coherence but do not directly model changes in extreme behavior, neglecting the fact that neuronal oscillations exhibit non-Gaussian heavy-tailed probability distributions. To overcome this limitation, we propose a new approach to characterize brain connectivity based on the joint tail (i.e., extreme) behavior of the EEGs. Our proposed method, the conditional extremal dependence for brain connectivity (Conex-Connect), is a pioneering approach that links extreme values of higher oscillations at a reference channel with those at the other brain network channels. Using the Conex-Connect method, we discover changes in the extremal dependence driven by the activity at the foci of the epileptic seizure. Our model-based approach reveals that, pre-seizure, the dependence is notably stable for all channels when conditioning on extreme values of the focal seizure area. By contrast, the dependence between channels is weaker during the seizure, and dependence patterns are more "chaotic." Using the Conex-Connect method, we identify the high-frequency oscillations as the most relevant features explaining the conditional extremal dependence of brain connectivity.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Soraia Pereira and Lídia André, Faculty of Sciences, University of Lisbon and Lancaster University

Introduction to Extremes with R

This workshop aims to introduce extreme value analysis. We will start by motivating the need for modelling extreme observations and then present the most common methodologies for doing so: the block maxima and peaks-over-threshold approaches. We will apply these methods to real data using the most common R packages for extremes, with inference carried out under a frequentist framework.

Participants are advised to install RStudio, together with the packages "ismev" and "evd", prior to the session.
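For readers who want a language-neutral preview of the two approaches (the workshop itself uses R and the "ismev"/"evd" packages), here is a hypothetical Python/scipy sketch on synthetic data: yearly block maxima fitted with a GEV distribution, and threshold exceedances fitted with a generalized Pareto distribution.

```python
import numpy as np
from scipy.stats import genextreme, genpareto

rng = np.random.default_rng(42)
data = rng.standard_exponential(365 * 30)  # 30 "years" of daily values

# Block maxima: take the maximum of each yearly block and fit a GEV.
block_max = data.reshape(30, 365).max(axis=1)
shape_gev, loc_gev, scale_gev = genextreme.fit(block_max)

# Peaks over threshold: fit a GPD to exceedances of a high threshold
# (location fixed at zero, since exceedances start at the threshold).
u = np.quantile(data, 0.95)
exceedances = data[data > u] - u
shape_gpd, loc_gpd, scale_gpd = genpareto.fit(exceedances, floc=0.0)

# Exponential data lie in the Gumbel domain of attraction, so both
# fitted shape parameters should be close to zero.
print(shape_gev, shape_gpd)
```

Note that scipy parameterizes the GEV shape with the opposite sign to the usual extreme value index; for this Gumbel-domain example both conventions give a value near zero.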

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Frederico Caeiro, Universidade Nova de Lisboa

Adaptive tail inference using Probability Weighted Moments

In statistics of extremes, upper tail inference is usually based on the sample values above a high threshold. In a semiparametric framework, we consider the probability-weighted moment estimator of a positive extreme value index. Due to the specific properties of this estimator, direct estimation of an "optimal" threshold is not straightforward. In this talk, we consider two adaptive procedures for choosing such a threshold. The performance of the methods is analysed in a simulation study. An illustration with a real dataset from the field of insurance is also provided.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Regina Bispo, Department of Mathematics, Nova School of Science and Technology

Spatial modelling and mapping of urban fire occurrence in Portugal

Fires continue to be a leading cause of property damage, psychological effects, physical injury and death in modern society. Since 43% of the Portuguese population lives in urban areas, these numbers potentially carry severe consequences. In the literature, several approaches are used to predict and model fire occurrences. Regardless of the approach, most studies emphasize the need to use spatial techniques to model urban fire occurrences. Spatial econometric models may present benefits, as they offer the possibility of accounting for spatial autocorrelation in the response variable, the explanatory variables and/or the random error terms. Hence, this research aims at modelling urban fire occurrences while making a comparative analysis of different strategies to account for spatial autocorrelation. In addition, we intend to identify factors that explain the relationship between fire events and the urban pattern. Ultimately, we seek to map the probability of urban fire occurrence in Portugal.

To the best of our knowledge, this is the first study to model urban fire incidence using spatial modelling techniques in relation to socio-economic characteristics at a national scale in Portugal. We conclude by suggesting that spatial analytical techniques should be further applied in the main districts to explore local dynamics and to model the relationship with socio-economic and demographic features using micro-level fire incident data.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Vanda Inácio, School of Mathematics of the University of Edinburgh

The covariate-adjusted ROC curve: the concept, its importance, and a new Bayesian estimator

Accurate diagnosis of disease is of fundamental importance in clinical practice and medical research. Before a medical diagnostic test is routinely used in practice, its ability to distinguish between diseased and nondiseased states must be rigorously assessed. The receiver operating characteristic (ROC) curve is the most popular tool for evaluating the diagnostic accuracy of continuous-outcome tests. It has been acknowledged that several factors (e.g., subject-specific characteristics such as age and/or gender) can affect test outcomes and accuracy beyond disease status. Recently, the covariate-adjusted ROC curve has been proposed and successfully applied as a global summary measure of diagnostic accuracy that takes covariate information into account. In this talk I will motivate the importance of including covariate information, whenever available, in ROC analysis and, in particular, show how the covariate-adjusted ROC curve is an important tool in this context. I will also detail the development of a highly flexible Bayesian method, based on the combination of a Dirichlet process mixture of additive normal models and the Bayesian bootstrap, for conducting inference about the covariate-adjusted ROC curve. Illustrations with simulated and real data will be provided.
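To fix ideas, here is a plain empirical ROC curve and AUC on synthetic binormal data, without any covariate adjustment; the Bayesian covariate-adjusted estimator discussed in the talk is far more elaborate, and the data below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, 2000)    # nondiseased test outcomes
diseased = rng.normal(1.5, 1.0, 2000)   # diseased test outcomes

def empirical_roc(y0, y1, n_grid=101):
    """Empirical ROC curve: ROC(p) = 1 - G(F^{-1}(1 - p)), where F and G
    are the nondiseased and diseased outcome distributions."""
    p = np.linspace(0.0, 1.0, n_grid)
    thresholds = np.quantile(y0, 1.0 - p)
    return p, np.array([(y1 > t).mean() for t in thresholds])

fpr, tpr = empirical_roc(healthy, diseased)

# The area under the ROC curve equals P(Y1 > Y0), which can be computed
# directly as a Mann-Whitney-type statistic without gridding.
auc = (diseased[:, None] > healthy[None, :]).mean()
print(auc)
```

For this binormal setup the true AUC is Phi(1.5 / sqrt(2)), roughly 0.86, so the empirical estimate should land nearby.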

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Jochem Oorschot, Erasmus University Rotterdam

Tail inference using extreme U-statistics

Extreme U-statistics arise when the kernel of a U-statistic has a high degree but depends on its arguments only through a small number of top order statistics. As the kernel degree of the U-statistic grows to infinity with the sample size, estimators built out of such statistics form an intermediate family between those constructed in the block maxima and peaks-over-threshold frameworks of extreme value analysis. The asymptotic normality of extreme U-statistics based on location-scale invariant kernels is established. Although the asymptotic variance coincides with that of the Hájek projection, the proof goes beyond considering the first term in Hoeffding's variance decomposition; instead, a growing number of terms must be incorporated.

To show the usefulness of extreme U-statistics, we propose a kernel depending on the three highest order statistics leading to an unbiased estimator of the shape parameter of the generalized Pareto distribution. When applied to samples in the max-domain of attraction of an extreme value distribution, the extreme U-statistic based on this kernel produces a location-scale invariant estimator of the extreme value index which is asymptotically normal and whose finite-sample performance is competitive with that of the pseudo-maximum likelihood estimator.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Elias Krainski, King Abdullah University of Science and Technology

Implementing Non-Stationary Non-Separable Spacetime Models

In this talk we will briefly introduce models that account for correlation over the spacetime domain.

We consider recent results for a class of non-separable spacetime models and outline a computational implementation, including a simple way to introduce non-stationarity.

We discuss some practical details through a working example based on a new R package dedicated to this work.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Wolfgang Schmid, European University Viadrina, Department of Statistics, Frankfurt, Germany

Sequential Monitoring of High Dimensional Time Series

In this presentation, new types of multivariate EWMA control charts are presented. They are based on the Euclidean distance and on the distance defined using the inverse of the diagonal matrix of the variances. The design of the proposed control schemes does not involve computing the inverse covariance matrix, and thus they can be used in the high-dimensional setting. The distributional properties of the control statistics are obtained and used to determine the new control procedures. In an extensive simulation study, the new approaches are compared with multivariate EWMA control charts based on the Mahalanobis distance.
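A generic sketch of the Euclidean-distance idea (not the authors' exact chart design, whose control limits come from the derived distributional properties) is the EWMA recursion Z_t = lam*(X_t - mu0) + (1 - lam)*Z_{t-1} monitored through the squared norm ||Z_t||^2, which never touches an inverse covariance matrix; all values below are illustrative.

```python
import numpy as np

def mewma_euclidean(X, lam=0.1, mu0=None):
    """Multivariate EWMA statistic based on the squared Euclidean norm:
    Z_t = lam*(X_t - mu0) + (1 - lam)*Z_{t-1},  T_t = ||Z_t||^2.
    No covariance matrix is inverted, so it scales to high dimensions."""
    n, p = X.shape
    if mu0 is None:
        mu0 = np.zeros(p)
    z = np.zeros(p)
    stats = np.empty(n)
    for t in range(n):
        z = lam * (X[t] - mu0) + (1.0 - lam) * z
        stats[t] = z @ z
    return stats

rng = np.random.default_rng(0)
p = 200                                    # high-dimensional setting
in_control = rng.normal(0.0, 1.0, (100, p))
shifted = rng.normal(0.3, 1.0, (100, p))   # mean shift in every coordinate
stats = mewma_euclidean(np.vstack([in_control, shifted]))
# The statistic grows noticeably after the shift at t = 100.
print(stats[:100].mean(), stats[100:].mean())
```

In a real chart, an alarm is raised when the statistic crosses a control limit calibrated to a target in-control average run length.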

The presented results are based on a joint work with Rostyslav Bodnar and Taras Bodnar.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Oswaldo Gressani, Hasselt University

Approximate inference with Bayesian P-splines in epidemic models

Statistical methods play an important role in infectious disease epidemiology. They provide the main set of tools to compute estimates of key epidemiological parameters and to shed light on the transmission dynamics of a pathogen. Markov chain Monte Carlo (MCMC) methods are powerful simulation techniques used to explore the posterior parameter space and carry out inference under the Bayesian paradigm. As MCMC samplers are iterative by design, drawing samples from the target posterior distribution often requires huge computational resources. This computational bottleneck is particularly unwelcome when analysis of epidemic data and estimation of model parameters is required in (near) real-time, as is often the case during epidemic outbreaks where massive datasets are updated on a daily basis. We explore the synergy between the Laplace approximation and Bayesian P-splines in epidemic models to deliver a flexible inference methodology with fast and nimble algorithms that outperform MCMC-based approaches from a computational perspective. The so-called "Laplacian-P-splines" method is illustrated in the context of nowcasting (i.e. the real-time assessment of the current epidemic situation corrected for imperfect data caused by reporting delays) and in the recently proposed EpiLPS framework for estimating the time-varying reproduction number, with applications to SARS-CoV-2 data.
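To illustrate the core idea of the Laplace approximation in isolation (a toy conjugate Poisson-Gamma example where the exact posterior is known, not the Laplacian-P-splines machinery itself), the posterior is approximated by a Gaussian centred at its mode with variance given by the negative inverse curvature of the log-posterior at the mode.

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.poisson(4.0, size=200)        # observed counts
a, b = 2.0, 1.0                       # Gamma(a, rate=b) prior on the Poisson rate
A, B = a + x.sum(), b + len(x)        # exact posterior is Gamma(A, rate=B)

# Log-posterior l(lam) = (A-1)*log(lam) - B*lam (up to a constant).
# Mode from l'(lam) = 0; curvature l''(lam) = -(A-1)/lam^2 at the mode.
mode = (A - 1.0) / B
sd_laplace = np.sqrt(A - 1.0) / B     # sqrt(-1/l''(mode))

# Exact posterior mean and standard deviation for comparison.
exact_mean, exact_sd = A / B, np.sqrt(A) / B
print(mode, sd_laplace, exact_mean, exact_sd)
```

With this much data the Gaussian approximation is nearly indistinguishable from the exact Gamma posterior, which is the regime in which Laplace-based inference is both fast and accurate.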

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Adelaide Cerveira, Universidade de Trás-os-Montes e Alto Douro

Optimization in Engineering Problems: Wind Farm Design and Management of Renewable Energy Sources in Smart Grids

The growth of the world population and the development of new economies have increased the demand for energy resources on a large scale.

Wind energy is becoming an important source of electricity production. In an onshore wind farm, the electrical energy produced by the wind turbines is collected at a substation through electrical cables laid in underground trenches. In this work we consider the problem of optimizing the wind farm layout, assuming that the locations of the substation and the wind turbines are known and that a set of electrical cable types is available. The problem is formulated as an Integer Linear Programming model, which is then strengthened with different sets of valid inequalities.

In addition, renewable generation sources and storage batteries emerge as options for the development of Smart Grids. Through a Mixed Integer Linear Programming (MILP) optimization model, we study the effect of renewable generation sources and storage systems on an electrical grid, so as to maximize the profit of the Virtual Power Plant (VPP).

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Thiago de Paula Oliveira, The University of Edinburgh, The Roslin Institute

A method for partitioning trends in genetic mean and variance to understand breeding practices

In breeding programmes, the observed genetic change is a sum of contributions from different groups of individuals. Quantifying these sources of genetic change is essential for identifying the key breeding actions and optimizing breeding programmes. However, it is difficult to disentangle the contribution of individual groups due to the inherent complexity of breeding programmes. Here we extend a previously developed method for partitioning genetic mean by paths of selection to work with both the mean and the variance of breeding values. We first extended the partitioning method to quantify the contribution of different groups to genetic variance, assuming breeding values are known. Second, we combined the partitioning method with a Markov chain Monte Carlo approach to draw samples from the posterior distribution of breeding values and used these samples to compute point and interval estimates of the partitions of the genetic mean and variance. We implemented the method in the R package AlphaPart.

We demonstrated the method with a simulated cattle breeding programme, showing how to quantify the contribution of different groups of individuals to genetic mean and variance. We showed that the contributions of different selection paths to genetic variance are not necessarily independent. Finally, we observed some limitations of the partitioning method under a misspecified model, suggesting the need for a genomic partitioning method. The method can help breeders and researchers understand the dynamics of genetic mean and variance in a breeding programme, how different paths of selection interact within it, and how they can be optimised.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

John Einmahl, Tilburg University, The Netherlands

Extreme value inference for general heterogeneous data

We extend extreme value statistics to independent data with possibly very different distributions. In particular, we present novel asymptotic normality results for the Hill estimator, which now estimates the positive extreme value index of the average distribution. Due to the heterogeneity, the asymptotic variance can be substantially smaller than that in the i.i.d. case. As a special case, we consider a heterogeneous scales model where the asymptotic variance can be calculated explicitly. The primary tool for the proofs is the functional central limit theorem for a weighted tail empirical process. A simulation study shows the good finite-sample behavior of our limit theorems. We present an application to assess the tail heaviness of earthquake energies. This is joint work with Yi He (Univ. of Amsterdam).
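For reference, the classical i.i.d. Hill estimator that the talk generalizes is H = (1/k) * sum_{i=1}^{k} log(X_{(n-i+1)} / X_{(n-k)}), built from the k largest order statistics; a minimal Python sketch on exact Pareto data (with an arbitrary true index of 0.5) follows.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of a positive extreme value index from the k
    largest order statistics of the sample x."""
    xs = np.sort(x)
    top = xs[-k:]                          # the k largest values
    return np.mean(np.log(top / xs[-k - 1]))  # ratios to the (n-k)th order stat

rng = np.random.default_rng(7)
n, gamma = 20000, 0.5                      # true extreme value index 0.5
pareto = rng.uniform(size=n) ** (-gamma)   # Pareto sample, tail index 1/gamma

print(hill_estimator(pareto, k=500))       # close to 0.5
```

In the heterogeneous setting of the talk, the same statistic estimates the positive extreme value index of the average distribution, with a possibly smaller asymptotic variance than in the i.i.d. case.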

Joint seminar CEMAT and CEAUL