2002 seminars

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Manuel Scotto, Instituto Superior Técnico and CEMAT

News from the Max-INAR(1) process for time series of counts

In this talk we discuss some extremal properties of the so-called max-INAR process of order one based on the binomial thinning operator, with a marginal distribution exhibiting a regularly varying right tail. In particular, attention is paid to the limiting distribution of the number of exceedances of high levels and the joint limiting law of the maximum and the minimum. Furthermore, we also look at the extremal behavior of the max-INAR process of order one under the assumption that its thinning parameter is random. Finally, the periodic case will also be addressed.
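As a rough illustration of the process discussed above, the sketch below simulates a max-INAR(1) path under the usual recursion $X_t = \max(\alpha \circ X_{t-1}, Z_t)$, where $\alpha \circ X$ denotes binomial thinning. The innovation law (integer part of a Pareto draw) and all function names are assumptions chosen only to give a regularly varying right tail; they are not taken from the talk.

```python
import random


def binomial_thinning(alpha, x, rng):
    """alpha o x: number of successes among x independent Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def heavy_tailed_count(tail_index, rng):
    """Integer part of a Pareto(tail_index) draw on [1, inf).

    Assumed innovation law: P(Z > z) = (z + 1) ** (-tail_index) for integer z >= 0.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return int(u ** (-1.0 / tail_index))


def simulate_max_inar1(n, alpha, tail_index, seed=0):
    """Simulate n steps of X_t = max(alpha o X_{t-1}, Z_t)."""
    rng = random.Random(seed)
    x = heavy_tailed_count(tail_index, rng)
    path = [x]
    for _ in range(n - 1):
        z = heavy_tailed_count(tail_index, rng)
        x = max(binomial_thinning(alpha, x, rng), z)
        path.append(x)
    return path
```

Calling `simulate_max_inar1(200, 0.5, 2.0, seed=1)` returns a list of 200 positive integers; counting how many exceed a high level gives an empirical view of the exceedance counts mentioned in the abstract.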

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Ana Cristina Moreira Freitas, Faculdade de Economia e Centro de Matemática da Universidade do Porto

Extremal behaviour and rare events point processes for chaotic dynamical systems

We consider stochastic processes arising from dynamical systems by evaluating an observable function along the orbits of the system. We associate the existence of an Extremal Index less than 1 to the occurrence of periodic phenomena. For generic points, the exceedances, in the limit, are singular and occur at Poisson times. But around periodic points, the respective point processes of exceedances converge to a compound Poisson process. The extremal index usually coincides with the reciprocal of the mean of the limiting cluster size distribution. Here, we build dynamically generated stochastic processes with an extremal index for which that equality does not hold.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Mafalda Viana, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Scotland, UK

Bayesian state-space models for understanding and managing infectious disease challenges

Understanding the ecological and epidemiological processes that govern the transmission of complex multi-host, multi-pathogen systems remains challenging. One of the key reasons is that these processes are difficult to observe directly, which makes it necessary to rely on less direct, and often ‘weak’, sources of inference. In this talk I will show the power of Bayesian state-space models to overcome some of these difficulties and reveal hidden patterns and relationships from field and experimental data from wildlife and human diseases. Specifically, I will show examples from the Canine Distemper Virus and Canine Parvovirus in lions and dogs in the Serengeti, and mosquito vectors of human Malaria, for which these approaches enabled us to identify the disease dynamics, quantify the impacts of intervention on those dynamics and ultimately identify optimal control strategies for these infectious diseases.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Patrícia de Zea Bermudez, Departamento de Estatística e Investigação Operacional e CEAUL, Universidade de Lisboa

Modelling dependence between observed and simulated wind speed data using copulas

In real applications, associations between variables are often non-linear and data commonly exhibit strong asymmetries and/or heavy tails. Copula models make it possible to construct the joint distribution of a vector of random variables separately from their marginal distributions. This work aims to analyse and characterise the dependence between daily maximum wind speeds, $X$, observed in Portugal and simulated daily maximum wind speeds, $Y$, produced by a numerical-physical model. One of the major benefits of using simulated data is their availability at high spatial and temporal resolutions, in contrast to observed data, which are commonly scarce. The main problem is that the simulated and the observed winds, in some stations, do not match well and tend to differ mostly in the right tail. Consequently, it is very important to understand the dependence between $X$ and $Y$. The ultimate purpose is to calibrate the simulated data and bring them in line with observed data, which offers practitioners richer data sources. The results showed that, overall, Gamma and Lognormal are the most suitable marginal distributions for our data and the Gumbel copula is the most adequate to model the dependence structure. Finally, the classical modelling is compared with a Bayesian approach.
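To make the Gumbel copula mentioned above concrete, here is a minimal sketch of its distribution function together with two standard summaries of its dependence (Kendall's tau and the upper tail dependence coefficient). The function names and the parameter name `theta` are illustrative; nothing here reproduces the fitted model from the talk.

```python
import math


def gumbel_copula_cdf(u, v, theta):
    """C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), with theta >= 1.

    u and v are probabilities in (0, 1]; theta = 1 gives independence.
    """
    if theta < 1:
        raise ValueError("theta must be >= 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def kendall_tau(theta):
    """Kendall's tau implied by the Gumbel copula: 1 - 1/theta."""
    return 1.0 - 1.0 / theta


def upper_tail_dependence(theta):
    """Upper tail dependence coefficient of the Gumbel copula: 2 - 2^(1/theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)
```

In a workflow like the one described, `u` and `v` would be the fitted marginal distribution functions (for example Gamma or Lognormal) evaluated at an observed and a simulated wind speed. The positive upper tail dependence for `theta > 1` is one reason this family is a natural candidate when discrepancies concentrate in the right tail.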

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Maria do Rosário Ramos, Universidade Aberta e CMAFcIO

A Contribution To The Assessment Of Apparent Losses In Water Usage

The management of non-revenue water (NRW) is one key issue for improving water use efficiency, reducing gaps between water supply and demand.

One component of NRW is apparent (commercial) losses: water volumes taken from the network and consumed but not accounted for. They are often related to water meter malfunction, such as under-registration, which is difficult to detect and quantify.

This study was motivated by the challenge proposed by Infraquinta at the 140th European Study Group with Industry (ESGI140), held in Portugal in June 2018. The aim was to develop a strategy supported by statistical methods to detect an anomalous decrease in water usage patterns, contributing to meter performance assessment. The basis of the approach is a combination of methods to analyse billed water consumption time series. First, the series is decomposed using Seasonal-Trend decomposition based on Loess. Next, breakpoint analysis is performed on the seasonally adjusted time series. We then search for decreasing changes in the periods defined by the breakpoints using the Mann-Kendall test and Sen’s slope estimator, and an indicator for this change is presented. The strategy is applied to data on water consumption from the Algarve, Portugal.
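The last step of the strategy above can be sketched as follows: a Mann-Kendall trend test and Sen's slope computed on one segment of a series. The decomposition and breakpoint steps are omitted, the input is assumed to be an already seasonally adjusted segment, the variance formula assumes no ties, and the function names are illustrative.

```python
import math
from statistics import median


def mann_kendall(x):
    """Return (S, Z) for the Mann-Kendall trend test.

    S sums the signs of all pairwise differences x[j] - x[i], i < j.
    Z is the continuity-corrected normal score (variance assumes no ties).
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def sen_slope(x):
    """Sen's slope: median of all pairwise slopes (x[j] - x[i]) / (j - i)."""
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)
```

A strongly negative `Z` together with a negative Sen's slope on a segment would flag a decreasing consumption pattern, which is the kind of signal the study associates with possible meter under-registration.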

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Juan Juan Cai, Department of Econometrics and Data Science, School of Business and Economics, Vrije Universiteit Amsterdam

A nonparametric estimator of the extremal index

Clustering of extremes has a large societal impact. The extremal index, a number in the unit interval, is a key parameter in modelling the clustering of extremes. We build a connection between the extremal index and the stable tail dependence function, which enables us to compute the value of extremal indices for some time series models. We also construct a nonparametric estimator of the extremal index and an estimation procedure to verify the $D^{(d)}(u_n)$ condition, a local dependence condition often assumed for studying the extremal index.

We prove that the estimator is asymptotically normal. A simulation study comparing our estimator with two existing methods shows that our method has better finite-sample properties. We apply our method to estimate the expected durations of heatwaves in the Netherlands and in Greece.
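For readers unfamiliar with the extremal index, the sketch below implements the classical runs estimator. This is not the nonparametric estimator proposed in the talk, whose construction is not given in the abstract. It is only a simple baseline that shows the quantity being estimated: the ratio of clusters to exceedances. The threshold `u`, the run length `r` and the treatment of the end of the series are assumptions of this sketch.

```python
def runs_estimator(x, u, r):
    """Classical runs estimator of the extremal index.

    Counts exceedances of u that are followed by r consecutive
    non-exceedances (each such exceedance closes a cluster) and divides by
    the total number of exceedances. Near the end of the series, only the
    observations that are available are checked.
    """
    exceed = [xi > u for xi in x]
    n_exceed = sum(exceed)
    if n_exceed == 0:
        raise ValueError("no exceedances of the threshold u")
    n_clusters = 0
    for i, is_exceedance in enumerate(exceed):
        if is_exceedance and not any(exceed[i + 1 : i + 1 + r]):
            n_clusters += 1
    return n_clusters / n_exceed
```

For the toy series `[0, 5, 5, 0, 0, 0, 5, 0, 0, 5, 5, 5, 0, 0]` with `u = 1` and `r = 2`, there are six exceedances in three clusters, so the estimate is 0.5, corresponding to a mean cluster size of two.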

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Helena Ramalhinho Lourenço, Universidade Pompeu Fabra, Barcelona

Applications and Extensions of the Iterated Local Search

Iterated Local Search (ILS) is a well-known, conceptually simple and efficient metaheuristic. The main idea behind ILS is to drive the search not on the full space of all candidate solutions but on the solutions that are returned by some underlying algorithm; typically, these are locally optimal solutions obtained by applying a local search heuristic. This method has been applied to many different optimization problems, with more than 10,000 entries in Google Scholar. In this talk, we will briefly review the ILS method, emphasizing its extensions. We will describe three relevant types of extensions: the hybrid ILS approaches combining ILS with other metaheuristics and/or exact methods; the SimILS (Simulation+ILS) to solve Stochastic Combinatorial Optimization Problems. We will discuss the advantages and disadvantages of these extensions and present some applications, including real ones in areas like Supply Chain Management, Economic Development or Health Care. Finally, future research topics will be presented.
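The basic ILS loop described above (local search, perturbation, local search again, acceptance test) can be sketched on a toy travelling-salesman instance. The choice of problem, the 2-opt local search, the swap-based perturbation and the "accept only improvements" rule are all illustrative assumptions, not the specific variants covered in the talk.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def local_search(tour, pts):
    """2-opt: keep reversing segments while that shortens the tour."""
    best = tour[:]
    best_len = tour_length(best, pts)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i : j + 1][::-1] + best[j + 1 :]
                cand_len = tour_length(cand, pts)
                if cand_len < best_len - 1e-12:
                    best, best_len, improved = cand, cand_len, True
    return best


def perturb(tour, rng):
    """Random kick: two swaps of distinct positions (hypothetical choice)."""
    new = tour[:]
    for _ in range(2):
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def iterated_local_search(pts, iterations=50, seed=0):
    """ILS: search over local optima, accepting only strict improvements."""
    rng = random.Random(seed)
    current = local_search(list(range(len(pts))), pts)
    for _ in range(iterations):
        candidate = local_search(perturb(current, rng), pts)
        if tour_length(candidate, pts) < tour_length(current, pts):
            current = candidate
    return current
```

The acceptance rule is the main design lever: accepting only improvements gives a greedy walk over local optima, while looser rules (for example accepting slightly worse candidates) trade intensification for diversification.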

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Marta Belchior Lopes, CMA, FCT NOVA e NOVA LINCS, FCT NOVA

Biomarker discovery in cancer transcriptomic data using network-based regularization

Tumor heterogeneity plays a critical role in cancer progression and therapy resistance. Not only does intertumoral heterogeneity lead to the definition of distinct tumor subtypes, but intratumoral heterogeneity also shows in distinct cell clones with different selective advantages. Emerging biomedical technologies, in particular those generating omics data (e.g., genomics, transcriptomics, proteomics), now make it possible to ask which molecular entities govern tumor heterogeneity and can be candidates for disease biomarkers and therapeutic targets. Omics data are high-dimensional, with the number of features greatly outnumbering the number of observations. This calls for statistical and machine learning methods able to translate vast amounts of data into meaningful biological solutions. Learning from high-dimensional omics data poses many challenges, in particular for parameter estimation and the generation of interpretable solutions. In this talk, I will cover strategies for unveiling relevant information from high-dimensional omics data, including model regularization for feature selection and network-based modeling, with examples of application in the cancer research domain.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Clara Cordeiro, FCT, Universidade do Algarve e CEAUL

Detecting tail probabilities

Nowadays, the availability of high-quality data such as smart meter data poses new challenges for researchers. Such data can include extreme values due to meter malfunction, burst water pipes, etc. Therefore, special care must be given to these types of events in the series, and specific statistical procedures based on the behaviour of extremes are required to handle them. Our aim is to model the statistical characteristics of such time series and understand the probabilities of extreme events. The key ideas will be illustrated using hourly water consumption data from a water company in Portugal, Infraquinta.

Joint work with Yi He (University of Amsterdam) and Rob Hyndman (Monash University).

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Rafael Izbicki, Federal University of São Carlos, Brazil

Diagnostic Tools to Conditional Density Models

Conditional density models $f(y|x)$, where $x$ represents a potentially high-dimensional feature vector, are an integral part of uncertainty quantification in prediction and Bayesian inference. However, such models can be difficult to calibrate. While existing validation techniques can determine whether an approximated conditional density is compatible overall with a data sample, they lack practical procedures for identifying, localizing, and interpreting the nature of (statistically significant) discrepancies over the entire feature space. We present more discerning diagnostics such as (i) the "Local Coverage Test" (LCT), which is able to distinguish an arbitrarily misspecified model from the true conditional density of the sample, and (ii) "Amortized Local PP plots" (ALP), which can quickly provide interpretable graphical summaries of distributional differences at any location $x$ in the feature space. Our validation procedures scale to high dimensions, and can potentially adapt to any type of data at hand. We demonstrate the effectiveness of LCT and ALP through a simulated experiment and a realistic application to parameter inference for galaxy images.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Rodrigo Wiff, Pontificia Universidad Católica de Chile - Departamento de Ecología

Combining Stakeholder Interviews and Censored Regressions to Correcting Misreported Catches

The most popular method to reconstruct catches is based on qualitative information and is applied mostly to data-poor fisheries. On the other hand, there are only preliminary quantitative methods to correct catches, which are based on censored data (when an observation is only partially known). We use the philosophy underpinning these qualitative and quantitative methods to propose a robust framework to correct catches, combining and improving both approaches. The main improvements of the proposed framework rely on combining interviews with stakeholders and fisheries/biological information from fishing monitoring. The proposed framework was applied to common sardine (Strangomera bentincki) and anchoveta (Engraulis ringens), both species fished off central-southern Chile. Fishery information was used to construct an interview which was applied to 88 stakeholders including fishers and administrative workers (fisheries managers, scientists). We used linear models for censored data conditional on the interviews. A model based purely on interviews and another combining interviews and censored regression were used to estimate the proportion of correction between declared landings and real catches. Information from surveys and estimates from censored data models were weighted and combined, producing two time series of corrected catches for each fishery. Finally, these time series were used in the current stock assessment models to evaluate changes in the abundance estimates and current exploitation status. Six periods of variable length were determined between 1990 and 2016 in which the percentage of catch correction is kept constant. The most heavily censored period in both fisheries was between 2009 and 2013, with differences between reported landings and catches of around 35%. Likewise, the least censored period took place between 1997-200, with a difference of around 5% between reported landings and catches.
The proposed framework gives a promising and robust method to estimate catches in data-medium/rich species, merging stakeholder interviews with fisheries/biological information. The application to common sardine and anchoveta shows credible and robust results; this framework can therefore be applied to exploited fish populations classified as data-medium/rich species for which a stock assessment could be implemented or is already in place.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Lígia Henriques‐Rodrigues, Departamento de Matemática, Universidade de Évora

Box-Cox transformations in Statistics of Extremes

In the statistical literature, Box-Cox transformations are used to make data more suitable for statistical analysis. When the extreme data are strictly positive, what is the effect of a Box-Cox power transformation on the data? We know from the literature that this transformation can increase the rate of convergence of the tail of the distribution to the generalized extreme value distribution and, as a by-product, reduce the bias of the estimation procedure. The reduction of the bias of the Hill estimator has been extensively addressed in the extreme value theory literature. Several techniques have been used to achieve such a reduction, either by removing the main component of the bias of the Hill estimator of the extreme value index (EVI) or by constructing new estimators based on generalized means or norms that generalize the Hill estimator. In this talk we study the Box-Cox Hill estimator introduced by Teugels and Vanroelen in 2004. We shall prove the consistency and asymptotic normality of the estimator and address the choice and estimation of the power and shift parameters of the Box-Cox transformation, not only for EVI estimation, but also for the estimation of other parameters of extreme events. The performance of the estimators under study will be illustrated for finite samples through small-scale Monte Carlo simulation studies.
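As background for the abstract above, the sketch below shows the two ingredients it refers to: the classical Hill estimator of a positive EVI and a Box-Cox transformation with power and shift parameters. It does not implement the Box-Cox Hill estimator studied in the talk, whose exact definition is not given here; the function names and the parameter names `lam` and `shift` are illustrative.

```python
import math
import random


def hill_estimator(sample, k):
    """Classical Hill estimator based on the k largest observations.

    H(k) = (1/k) * sum_{i=1..k} ln X_(n-i+1) - ln X_(n-k),
    where X_(1) <= ... <= X_(n) are the order statistics.
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < n")
    if xs[k] <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    log_threshold = math.log(xs[k])
    return sum(math.log(x) for x in xs[:k]) / k - log_threshold


def box_cox(x, lam, shift=0.0):
    """Box-Cox transform of x + shift: log for lam == 0, else ((x+shift)^lam - 1)/lam."""
    y = x + shift
    if y <= 0:
        raise ValueError("x + shift must be positive")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam
```

For example, on a simulated strict Pareto sample with EVI 0.5, such as `[(1.0 - rng.random()) ** -0.5 for _ in range(5000)]` with `rng = random.Random(0)`, `hill_estimator(sample, 500)` should land close to 0.5. Bias becomes a concern when the tail is only approximately Pareto, which is the setting the talk addresses.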

Joint work with M. Ivette Gomes

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Paul Northrop, Department of Statistical Science, University College London

Extreme value threshold selection and uncertainty

A common form of extreme value modelling involves modelling excesses of a threshold by a generalised Pareto (GP) distribution. The GP model arises by considering the possible limiting distributions of excesses as the threshold increases. Selecting too low a threshold leads to bias from model misspecification; raising the threshold increases the variance of estimators: a bias-variance trade-off. Some threshold selection methods do not address this trade-off directly, but rather aim to select the lowest threshold above which the GP model is judged to hold approximately. We use Bayesian cross-validation to address the trade-off by comparing thresholds based on predictive ability at extreme levels. Extremal inferences can be sensitive to the choice of a single threshold. We use Bayesian model averaging to combine inferences from many thresholds, thereby reducing sensitivity to the choice of a single threshold. The methodology is illustrated using significant wave height datasets from the North Sea and from the Gulf of Mexico.
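The idea of a "lowest threshold above which the GP model holds" rests on the threshold stability of the GP family: if excesses over a threshold are GP with scale $\sigma$ and shape $\xi$, then excesses over a threshold raised by $v$ are again GP with the same shape and scale $\sigma + \xi v$. The sketch below checks this numerically; it is only an illustration of that property, not of the Bayesian cross-validation or model averaging used in the talk, and the function names are assumptions.

```python
import math


def gp_survival(y, sigma, xi):
    """P(Y > y) for Y ~ GP(sigma, xi) and y >= 0."""
    if xi == 0:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    # For xi < 0 the distribution has a finite upper endpoint at -sigma / xi.
    return base ** (-1.0 / xi) if base > 0 else 0.0


def raised_threshold_scale(sigma, xi, v):
    """Scale of the GP law of excesses after raising the threshold by v."""
    return sigma + xi * v


def conditional_excess_survival(y, v, sigma, xi):
    """P(Y - v > y | Y > v) computed directly from the GP(sigma, xi) law."""
    return gp_survival(v + y, sigma, xi) / gp_survival(v, sigma, xi)
```

With `sigma = 2.0`, `xi = 0.2`, `v = 1.5` and `y = 3.0`, `conditional_excess_survival(y, v, sigma, xi)` agrees with `gp_survival(y, raised_threshold_scale(sigma, xi, v), xi)` up to floating-point error. In practice, shape estimates that stay roughly constant as the threshold rises are read as evidence that the GP approximation has set in.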

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Dae-Jin Lee, Basque Center for Applied Statistics

Modelling and prediction in recurrent time-to-event sports injury data: a penalized Cox regression approach

Sports injuries are complex phenomena that result from the dynamic interaction of multiple risk factors and have serious consequences for athletes' health. Recently, statistical models have received special attention in the study of sports injuries as a way to gain an in-depth understanding of their risk factors and mechanisms. In this talk, we evaluate statistical modelling strategies and methods based on the Cox regression model for high-dimensional data and recurrent injury data. Predictive performance is also studied via simulations. The methods are illustrated with a real case study of injuries among female football players at a Spanish football club.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Ana Ferreira, Dept. Matemática do Instituto Superior Técnico

Measuring the fine structure of jump processes through extreme value theory

We shall discuss how the level of activity of jump processes arising from Lévy processes can be understood from the extreme value index.

We present a new formulation arising from Extreme Value Theory for understanding the fine structure of these continuous-time stochastic processes. New estimators and asymptotic properties can be established under first and second order regular variation assumptions.

Proposals for future work will be mentioned.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Kristen Campbell, University of Colorado Anschutz Medical Campus, USA

Development and assessment of risk models for interval-censored events post kidney transplant using the variability of a longitudinal biomarker

This talk discusses methods for using the variability of a longitudinal biomarker to dynamically predict an interval-censored time-to-event outcome. We first investigate a shared random effects model with longitudinal and interval-censored survival sub-models. In our motivating clinical example, the biomarker values were highly variable, and a higher variance meant the patient was likely to be non-adherent to treatment. Thus, individual variance of the longitudinal biomarker was thought to be important in the prediction of adverse events. The shared random effects model incorporates the sharing of an individual-specific variance component, along with a traditional intercept and slope. Using this model, we develop a dynamic prediction framework to calculate individualized predicted probabilities of event-free survival for new subjects, based on historical biomarker measurements and demographic data.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

John Nolan, American University CAS- Math and Statistics

Dense classes of multivariate extreme value distributions

We explore tail dependence modeling in multivariate extreme value distributions through the use of the scale function. The correspondences between the scale function and the spectral measure or the stable tail dependence function are given. Combining scale functions by simple operations, semi-parametric classes of laws are described and analyzed, and resulting nested and structured models are discussed. Finally, the denseness of each of these classes is shown.

Joint work with Anne-Laure Fougères and Cécile Mercadier at the University of Lyon.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Joana Gonçalves de Sá, Instituto Superior Técnico and SPAC-LIP

Are the least knowledgeable unaware of it? A statistical revisitation of the Dunning-Kruger effect

It has been argued that “no problem in judgment and decision making is more prevalent and more potentially catastrophic than overconfidence”, with Nobel laureate Daniel Kahneman going as far as stating that if he could eliminate just one judgement bias, overconfidence would be his choice. Most notably, Dunning and Kruger have shown that there is little to no correlation between knowledge and confidence, with the least knowledgeable also being more likely to overestimate their skills. This can play a role in misinformation control and science communication in general, as several studies have identified evidence of the Dunning-Kruger Effect (DKE) in highly controversial anti-science movements concerning vaccinations, biotechnology use, and climate change. However, both the original Dunning and Kruger paper and early subsequent work, have been revisited for different reasons including 1) being tested mostly on small populations, often of elite college students, 2) in the USA. More recent criticism focuses on methodological issues, namely 3) ignoring a possible expected regression to the mean effect, and 4) not discriminating between lack of metacognition or just having inappropriate priors. Finally, as knowledge levels often do not follow a perfect normal distribution, 5) presenting these results in quartiles, the common practice, might hide important variance, particularly in the most extreme quartiles, where several knowledge and confidence bins are treated as one.

In this talk, I will present a different methodological and analytical approach to address the above issues. First, regarding measurement, we introduce a new method, based on the premise that “don’t know” versus “wrong” answers to knowledge questionnaires can be used as a proxy for confidence, and we examine how this non-self-reported confidence varies with knowledge. Regarding sampling, we apply our metric to several large surveys, conducted over 25 years in Europe and the USA. We also introduce a new analytical approach, whereby we analyze these surveys over their full range of variability (instead of considering only performance quartiles) and compare their results to the two described models: the metacognition model expectation that confidence should grow linearly with knowledge, and the Dunning-Kruger effect of almost no relationship between the two variables. A null model of zero correlation is also considered, as are different answering strategies.

We find that, contrary to previous work, and unlike the DKE, overconfidence is not highest among the least knowledgeable: the relationship between knowledge and confidence is non-linear, with overconfidence peaking at intermediate knowledge levels and leading to populations that have some knowledge but strongly overestimate it. Finally, we investigated public attitudes towards science and found that this intermediate knowledge group also corresponds to the one displaying the most negative attitudes.

I’ll discuss the impact of our findings in the broad field of overconfidence studies, in the context of science communication, and considering current methods of data analysis in social psychology.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Jennifer Israelsson and Helga Kristín Ólafsdóttir, University of Reading and Gothenburg University/Chalmers

Rainfall Extremes

In this theme session, two speakers will share their work on the extreme aspects of rainfall phenomena, in shorter 25-minute presentations.

TALK 1: Jennifer Israelsson (University of Reading)

Estimating the dependence structure for extreme tropical rainfall; many issues and some success.
Abstract:
A great deal of research has been done on rainfall extremes over Europe and the US, both in a univariate and bivariate setting, thanks to the availability of high-quality data. There has, however, been a very limited amount of work done over Africa, and close to none in a multivariate setting, due to the general lack of rain gauge observations and the poor performance of weather models. In this talk, I will present some of my PhD work on estimating the dependence structure in extreme daily rainfall over West Africa and how this connects with the monsoon cycle. I will also talk about some of the many issues and limitations we faced and how some of these might be addressed.
Bio:
Jennifer Israelsson is a postdoctoral researcher at the University of Reading where she currently works on creating risk scenarios for local hospitals by translating regional climate projections to admissions at a hospital level. Her PhD was in the intersection of Statistics and Meteorology and focused on developing new methods to estimate dependence structures in daily tropical rainfall, and the application of those to better understand differences between rainfall intensities.

TALK 2: Helga Kristín Ólafsdóttir (Gothenburg University/Chalmers)

Frequency changes in extreme rainfall in the Northeastern USA
Abstract:
Extreme daily rainfall can increase with the individual extreme rainfalls becoming more frequent, more intense, or both more intense and more frequent. Based on the Generalized Extreme Value (GEV) distribution for annual maxima series and the Generalized Pareto (GP) distribution for exceedances over a threshold for the partial duration series, we develop a new statistical extreme value model, the PGEV model, allowing the use of high-quality annual maximum series data instead of less well-checked daily data to estimate trends in intensity and frequency separately. The method is applied to annual maxima data from the NOAA Atlas 14, Volume 10. In the Northeastern US, the frequency of extreme rainfall events increases as mean temperature increases, while the distribution of the intensities of individual extreme rainfall events remains constant. We also study three other large areas in the contiguous US, the Midwest, the Southeast, and Texas, where similar but weaker trends than those in the Northeast are found.
Bio:
Helga is a PhD student at Gothenburg University/Chalmers in Applied Mathematics and Statistics with a focus on modelling and model evaluation of extremes, with applications to extreme rainfall under climate change.

Joint seminar CEMAT and CEAUL