Recent seminars

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Rui Pires da Silva Castro, Eindhoven University of Technology, The Netherlands
Detecting a (late) changepoint in the preferential attachment model

Motivated by the problem of detecting a change in the evolution of a network, we consider the preferential attachment random graph model with a time-dependent attachment function. We frame this as a hypothesis testing problem where the null hypothesis is a preferential attachment model with $n$ vertices and a constant affine attachment with parameter $\delta_0$, and the alternative hypothesis is a preferential attachment model where the affine attachment parameter changes from $\delta_0$ to $\delta_1$ at an unknown changepoint time $\tau_n$. For our analysis we focus on a scenario where one only sees the final network realization (and not its evolution), and the changepoint occurs “late”, namely $\tau_n = n - cn^\gamma$ with $c \geq 0$ and $\gamma\in(0,1)$. This corresponds to the relevant scenario where we aim to detect the changepoint shortly after it has happened. We present two asymptotically powerful tests that are able to distinguish between the null and alternative hypotheses when $\gamma\gt 1/2$. The first test requires knowledge of $\delta_0$, while the second test is significantly more involved and does not require knowledge of $\delta_0$, yet still achieves the same performance guarantees. Furthermore, we determine the asymptotic distribution of the test statistics, which allows us to easily calibrate the tests in practice. Finally, we conjecture that in the setting considered there are no powerful tests when $\gamma\lt 1/2$. Our theoretical results are complemented with numerical evidence that illustrates the finite sample characteristics of the proposed procedures.
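
To make the setting concrete, here is a minimal simulation sketch of a preferential attachment model with a late changepoint. It is not the speakers' code and does not implement their tests. It assumes the simplest tree version of the model (each new vertex adds one edge, starting from a single edge), with attachment probability proportional to degree plus $\delta$; the function names `late_changepoint` and `simulate_pa_tree` are hypothetical.

```python
import random


def late_changepoint(n, c, gamma):
    """Changepoint tau_n = n - c * n**gamma, rounded to an integer (assumed rounding)."""
    return n - int(c * n ** gamma)


def simulate_pa_tree(n, delta0, delta1=None, tau=None, seed=0):
    """Grow a preferential attachment tree on n vertices and return its degrees.

    Assumption: vertex t (0-based) attaches to one existing vertex v chosen with
    probability proportional to degree(v) + delta, where delta = delta0 while
    t <= tau and delta = delta1 afterwards. With delta1=None or tau=None the
    parameter never changes, which corresponds to the null model.
    """
    if n < 2:
        raise ValueError("need at least two vertices")
    for delta in (delta0, delta1):
        if delta is not None and delta <= -1:
            raise ValueError("delta must exceed -1 so that all weights are positive")
    rng = random.Random(seed)
    degrees = [1, 1]  # initial graph: a single edge between vertices 0 and 1
    for t in range(2, n):
        changed = delta1 is not None and tau is not None and t > tau
        delta = delta1 if changed else delta0
        weights = [d + delta for d in degrees]
        target = rng.choices(range(t), weights=weights, k=1)[0]
        degrees[target] += 1  # the chosen existing vertex gains one edge
        degrees.append(1)     # the new vertex arrives with degree one
    return degrees


n = 200
tau = late_changepoint(n, c=1.0, gamma=0.75)
null_degrees = simulate_pa_tree(n, delta0=0.0)
alt_degrees = simulate_pa_tree(n, delta0=0.0, delta1=2.0, tau=tau)
```

Only the final degree sequence is returned, mirroring the scenario in which one observes the final network and not its evolution.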

Joint work with Gianmarco Bet, Kay Bogerd, and Remco van der Hofstad.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Katiane S. Conceição, Universidade de São Paulo, Brazil
Regression Model for Zero-Modified Count Data

In this work, we present a family of distributions for count data, named Zero-Modified Power Series (ZMPS), an extension of the Power Series family of distributions whose support starts at zero. This extension consists of modifying the probability of observing zero of each Power Series distribution, allowing the new zero-modified distribution to appropriately accommodate datasets that have any amount of zero observations (for instance, zero-inflated or zero-deflated datasets). Power Series distributions included in the Zero-Modified Power Series family are Poisson, Generalized Poisson, Geometric, Binomial, Negative Binomial, and Generalized Negative Binomial. In addition, we introduce the Zero-Modified Power Series regression models and propose a Bayesian approach. Two real datasets are analyzed: the first corresponds to leptospirosis notifications in cities of Bahia State in Brazil; the second corresponds to the number of goals scored by a team in a sports competition.
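
As an illustration of how modifying the zero probability can work, the sketch below builds a zero-modified Poisson distribution. The abstract does not give a formula, so the parameterization is an assumption: $P(Y=y) = (1-p)\,\mathbb{1}\{y=0\} + p\,P_{\text{Poisson}}(y)$ with $0 \le p \le 1/(1-e^{-\lambda})$. Under this assumption $p \lt 1$ inflates zeros, $p = 1$ recovers the Poisson, and $p \gt 1$ deflates zeros. The function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Ordinary Poisson probability mass function."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zero_modified_poisson_pmf(y, lam, p):
    """Zero-modified Poisson pmf under the assumed parameterization.

    P(Y = y) = (1 - p) * [y == 0] + p * Poisson(y; lam),
    valid for 0 <= p <= 1 / (1 - exp(-lam)) so that P(Y = 0) stays non-negative.
    """
    p_max = 1.0 / (1.0 - poisson_pmf(0, lam))
    if not 0.0 <= p <= p_max + 1e-12:
        raise ValueError("p outside the admissible range for this lam")
    base = p * poisson_pmf(y, lam)
    return (1.0 - p) + base if y == 0 else base


lam = 2.0
inflated_zero = zero_modified_poisson_pmf(0, lam, p=0.6)  # more zeros than Poisson
deflated_zero = zero_modified_poisson_pmf(0, lam, p=1.1)  # fewer zeros than Poisson
```

Setting $p$ to its upper bound removes zeros entirely, which gives the zero-truncated case.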

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Peter Rousseeuw, KU Leuven
New graphical displays for classification

Classification is a major tool of statistics and machine learning. Several classifiers have interesting visualizations of their inner workings. Here we pursue a different goal, which is to visualize the cases being classified, either in training data or in test data. An important aspect is whether a case has been classified to its given class (label) or whether the classifier wants to assign it to a different class. This is reflected in the probability of the alternative class (PAC). A high PAC indicates label bias, i.e. the possibility that the case was mislabeled. The PAC is used to construct a silhouette plot which is similar in spirit to the silhouette plot for cluster analysis. The average silhouette width can be used to compare different classifications of the same dataset. We will also draw quasi residual plots of the PAC versus a data feature, which may lead to more insight into the data. One of these data features is how far each case lies from its given class, yielding so-called class maps. The proposed displays are constructed for discriminant analysis, k-nearest neighbors, support vector machines, CART, random forests, and neural networks. The graphical displays are illustrated and interpreted on data sets containing images, mixed features, and texts.
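
The following sketch shows one way a PAC and a silhouette width could be computed from a classifier's posterior probabilities. The abstract does not state the formulas, so both are assumptions here: the PAC is taken as the highest posterior among the non-given classes divided by the sum of that value and the posterior of the given class, and the silhouette width as $1 - 2\,\mathrm{PAC}$. The function names are hypothetical.

```python
def pac(posteriors, given):
    """Probability of the alternative class for one case (assumed definition).

    posteriors: list of class posterior probabilities for the case.
    given: index of the case's given class (label).
    """
    if len(posteriors) < 2:
        raise ValueError("need at least two classes")
    p_given = posteriors[given]
    # Best competing class: highest posterior among all classes except the given one.
    p_alt = max(p for k, p in enumerate(posteriors) if k != given)
    return p_alt / (p_given + p_alt)


def silhouette_width(posteriors, given):
    """Assumed silhouette width in [-1, 1]; negative values mean PAC above 0.5."""
    return 1.0 - 2.0 * pac(posteriors, given)


well_classified = pac([0.7, 0.2, 0.1], given=0)  # below 0.5
suspect_label = pac([0.1, 0.8, 0.1], given=0)    # above 0.5
```

With these assumed definitions, a PAC above 0.5 means the classifier prefers a class other than the given label.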

Joint work with: Jakob Raymaekers, Mia Hubert

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Paula Moraga, KAUST, Saudi Arabia
Bayesian spatial modeling of misaligned data using INLA and SPDE

Spatially misaligned data are becoming increasingly common due to advances in data collection and management. We present a Bayesian geostatistical model for the combination of data obtained at different spatial resolutions. The model assumes that, underlying all observations, there is a spatially continuous variable that can be modeled using a Gaussian random field process. The model is fitted using the integrated nested Laplace approximation (INLA) and the stochastic partial differential equation (SPDE) approaches. In order to allow the combination of spatially misaligned data, a new SPDE projection matrix for mapping the Gaussian Markov random field from the observations to the triangulation nodes is proposed. We show the performance of the new approach by means of simulation and an application to PM2.5 prediction in the USA. The approach presented provides a useful tool in a wide range of situations where information at different spatial scales needs to be combined.

Joint seminar CEMAT and CEAUL

Europe/Lisbon
Online

Jorge Tendeiro, Hiroshima University, Japan
Perspectives on the Bayes Factor

In this talk I will discuss the Bayes factor: What it is, why (or why not) it should be used, and how to use it. My emphasis will be more on conceptual understanding and less on technicalities, as much as possible. My talk will include both theoretical and practical features, hopefully catering for an informed use of the Bayes factor.

Joint seminar CEMAT and CEAUL