Program VII International Workshop on Proximity Data

Welcome to WAMyC 2026!

Below you will find the final schedule for the workshop. All sessions, including keynote lectures, oral communications, and poster sessions, are now confirmed. We invite you to explore the timing for each day to make the most of your experience in Salamanca.

All academic sessions will take place at the Edificio San Boal (Centro Hispano japonés) in the historic center of Salamanca.

Day 1 - 16th April
Day 2 - 17th April

Day 1 - 16th April

Time	Activity	Details
10:00	Registration & Opening
10:30	INVITED SPEAKER: Michael Greenacre	Michael Greenacre, Universitat Pompeu Fabra, Barcelona
Title: "Post-Aitchison Compositional Data Analysis"
Abstract: Compositional data analysis (CoDA) was essentially established by the Scottish statistician John Aitchison in the 1980s, although many of its ideas existed earlier, for example in the biomedical literature of Paul Lewi in Belgium and the teachings of Jean-Paul Benzécri in France. The foundational principle on which Aitchison-style CoDA is based is that of subcompositional coherence: that is, results for a given compositional data set should be invariant to analyzing subcompositions or extending the data set to include more compositional parts. For a review, see Greenacre (2021). As Aitchison himself said, CoDA is easy. It is simply the application of a logratio transformation to the compositional data set, and then carrying on with well-known methods of univariate and multivariate statistics on the transformed data. The only “difficulty” is the interpretation of the logratio-based results and back-transforming them to the original, more familiar compositional scale.
The most serious problem of CoDA’s use of logratios is how to cope with the many zeros that occur in compositional data, which I have called the “Achilles heel” of CoDA. A major development in the early 2000s was the introduction of what was called the isometric logratio (ILR) into the CoDA toolbox, recently rebranded by some authors, for no clear reason, as the orthonormal logratio (OLR). In the opinion of Aitchison, myself and many others working in this area, this development has been unfortunate, since the ILR transformation is (i) complicated in its definition, (ii) difficult, if not impossible, to interpret, and (iii) problematic to choose in the context of a research objective, since there are billions of possibilities even for a moderate-sized data set. In spite of these difficulties, the ILR was sold in the literature as the most favourable transformation to use in practical applications, even though Aitchison himself opposed the idea, preferring simpler alternatives that give equivalent results. See Greenacre et al. (2023) for a full discussion of this issue.
Another unfortunate consequence of this ILR dogma has been its infiltration into the research literature where logratio transformations are not necessary at all, for example in time-use research, where subcompositional coherence is a non-issue since all the compositional parts (daily activities) are included in the 24-hour day. In bioinformatics, the “omics” fields of genomics, microbiomics, metabolomics, transcriptomics, etc., involve data that are compositional, but the issue of coherence is considerably diluted by the high dimensionality of the data features.
In this overview of CoDA, of its raison d’être, and of its use in many research areas, I will describe how this important area of multivariate statistics has evolved over four decades. In particular, I will discuss the thorny problem of data zeros, and the many situations when one can effectively ignore the logratio transformation and analyse compositional data on the original scale, or by applying power transformations, logarithmic transformations or normalizing transformations such as the chi-squared in correspondence analysis. For an update on the practical aspects of CoDA, for both unsupervised and supervised statistical learning, see the second edition of my book Compositional Data Analysis in Practice, to be published later this year (Greenacre 2026).
References
Greenacre M (2021) Compositional data analysis. Annual Review of Statistics and its Application, 8:21.1-21.29.
Greenacre M, Grunsky E, Bacon-Shone J, Erb I, and Quinn T (2023) Aitchison’s compositional data analysis 40 years on: a reappraisal. Statistical Science 38(3): 386-410.
Greenacre M (2026) Compositional Data Analysis in Practice, Second Edition. Chapman & Hall / CRC Press. To appear.
11:30	Coffee Break
12:00	Session I: Advances in Multivariate Modelling, Dimensionality Reduction, Robust Methods, and Classification (I)	Chair: Aurea Grané
Model-based Clustering of Official Distributional Data. Paula Brito, A. Pedro Duarte Silva.
Statistical depth-based estimation for multivariate NSUM. Rosa E. Lillo, Antía Enriquez, Belén Pulido.
Sheaf Diffusion: A Topological Approach to Fairness in Classification. Arturo Pérez-Peralta, Sandra Benítez-Peña, Rosa E. Lillo.
A Priori Control of Censoring Scenarios in Mixture Cure Models: An Empirical-Based Simulation Framework for Robust Model Validation. Miguel Ramos, P. González-Barquero, R.E. Lillo, A. Méndez-Civieta.
New distance-based robust clustering algorithms for large mixed-type data. Fabio Scielzo-Ortiz, Aurea Grané.
Burnout syndrome among security professionals: a PLSc-SEM approach under semicontinuous covariance. Luís Miguel Lindinho da Cunha Mendes Grilo, Tiago F. Braz, Helena L. Grilo, Jean P. Maidana, Milan Stehlík.
14:00	Lunch Break
15:30	INVITED SPEAKER: Adelaide Freitas	Adelaide de Fátima Baptista Valente Freitas, DMat & CIDMA, University of Aveiro, Portugal
Title: "A biplot for univariate time series"
Abstract: Extracting essential features from real-valued time series is crucial for exploration, modeling, and forecasting. Singular Spectrum Analysis (SSA) embeds a time series into a Hankel trajectory matrix and applies Singular Value Decomposition, enabling a low-rank representation whose grouped rank-one terms are commonly interpreted as trend, oscillatory components, and noise. Because the time series structure is tightly linked to the eigenvectors of the trajectory matrix, graphical representations of these objects can support intuitive interpretation. The biplot method provides a joint low-dimensional display of the row and column spaces of a data matrix, allowing relationships among observations and among variables, and their associations, to be interpreted geometrically through angles, distances, and projections.
In this talk, we propose a biplot-based visualization framework built from the SSA trajectory matrix and its decomposition, using a Partial Least Squares formulation. We focus on the HJ-biplot, which provides a high-quality simultaneous display of rows and columns, and we discuss how its geometry can be interpreted specifically for Hankel trajectory matrices. We present illustrative examples to show how the biplot method can be used to reveal key features of univariate time series.
Co-author: Alberto Silva (CIDMA - University of Aveiro).
16:30	Session II: Advances in Multivariate Modelling, Dimensionality Reduction, Robust Methods, and Classification (II)	Chair: Eva Boj
Multivariate analysis on estimable functions. Carles M. Cuadras.
Explainable Robust Distance-Based Predictive Models. Eva Boj, Aurea Grané, Marcos Álvarez.
Distance-based dimensionality reduction for big data. Pedro Delicado, Adrià Casanova-Lloveras, Cristian Pachón-García.
Co-Inertia Analysis for Ordinal Data Using Polychoric Correlations and Associated Biplot. Laura Vicente-González, F. Javier del Río Olvera, José Luis Vicente-Villardon.
18:30	Guided Walking Tour and Ieronimus Visit
🚶 Guided Walking Tour (18:30): A professional journey through Salamanca's history. Let's meet at 6:20 p.m. under the clock in the main square.
🏰 Ieronimus Cathedral Towers (19:30): A unique experience walking among the battlements and rooftops of the Cathedral. Let's meet at 7:25 p.m. next to the cathedral.
Note: Ieronimus involves climbing narrow medieval stairs and heights. Optional for those with vertigo or mobility concerns.
21:30	Conference Dinner	🍽️ Join us for an unforgettable evening at the renowned Lilicook Gastronomía, one of Salamanca’s most acclaimed culinary spots. Experience creative local cuisine in a modern atmosphere, perfect for networking and celebrating the workshop's first day. Price: €40
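
The logratio idea summarised in Michael Greenacre's abstract (Day 1, 10:30) can be illustrated with a small sketch. It is not taken from the talk: the function names and the four-part composition are hypothetical, and only the Python standard library is assumed. It shows that a pairwise logratio is unchanged when a subcomposition is re-closed to sum to 1 (subcompositional coherence), while the raw proportions are not.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred logratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]  # math.log fails on zero parts
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pairwise_logratio(parts, i, j):
    """Log of the ratio between part i and part j."""
    return math.log(parts[i] / parts[j])


# Hypothetical four-part composition (illustrative values only).
full = closure([10.0, 30.0, 40.0, 20.0])

# Subcomposition: keep the first three parts and re-close them.
sub = closure(full[:3])

# The ratio between parts 0 and 1 survives the re-closing ...
lr_full = pairwise_logratio(full, 0, 1)
lr_sub = pairwise_logratio(sub, 0, 1)

# ... whereas the raw proportion of part 0 does not.
print(full[0], sub[0])
print(lr_full, lr_sub)
print(clr(full))
```

Because `math.log` is undefined at zero, `clr` raises an error as soon as one part is zero, which is the practical face of the zeros problem the abstract mentions.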

Day 2 - 17th April

Time	Activity	Details
10:00	Session III: Biostatistics, Public Health, and Risk Modeling	Chair: Laura Vicente-González
Multivariate Cox Proportional Hazards Models for accurate prediction of individual Patient Risk. Alberto Berral González, María Sánchez-Martin, Emma Pérez-García, Natalia Alonso-Moreda, Santiago Bueno-Fortes, Manuel Martin-Merino, Jose M. Sánchez-Santos, Javier De Las Rivas.
Development of a bioinformatics workflow for survival analysis and patient risk stratification based on tumor omics data. María Sánchez Martín, Alberto Berral-González, Emma Pérez-García, Natalia Alonso-Moreda, Jose M. Sánchez-Santos, Javier De Las Rivas.
Modeling Competing Risks to Characterize the Risk of Transplant-Associated Thrombotic Microangiopathy in Adults Undergoing Hematopoietic Stem Cell Transplantation. Jesús M. Zahinos, Pedro I. Dorado, Diego Clavo, Carmen Rodríguez, María Cortes, Mónica Cabrero.
Additive Neural Networks to Identify Patients at Low Risk of Choledocholithiasis and Reduce Unnecessary Invasive Procedures. Breyner Chacón Cordero, Pedro Ignacio Dorado Díaz, Jaime López Sánchez, María Fuentes Sánchez, Francisco Blanco Antona.
Data-driven decision making for health area balancing: Evidence-based patient reassignment in Salamanca. Rocío de Andrés Calle, S. Prieto-Herráez, J.M. Cascón.
Multiple Correspondence Analysis and Clustering Methods in Occupational Exposure to COVID-19 in Healthcare Workers in Castilla y León, Spain. Verónica Carrasco Bonal, María Purificación Vicente Galindo.
11:30	Coffee Break
12:00	INVITED SPEAKER: Francesca Condino	Francesca Condino, University of Calabria, Italy
Title: "New perspectives in analysing income inequality: discrepancies and similarities among European countries"
Abstract: Income inequality remains a central topic in economic and social research, as disparities in the distribution of resources can have far-reaching implications for well-being, social cohesion and economic development. Traditional approaches often rely on synthetic measures, such as the Gini inequality index or Theil indices, to summarize income concentration in a population. While these indicators are undeniably useful and widely adopted, they provide only a partial picture. A systematic framework is proposed for analysing income inequality by exploiting the rich information content of share densities, functions directly linked to the Lorenz curves (Lorenz, 1905) and capable of describing multiple aspects of how income is distributed.
When comparing income concentration across groups or geographic areas, synthetic indicators such as the Gini or Theil indices provide useful initial insights, but often fail to reveal differences in the underlying structure of inequality. Indeed, populations with similar Gini values may exhibit very different distributive patterns. Share densities, defined as the derivative of the Lorenz curve (Farris, 2010), quantify how income shares are allocated across percentiles of the population, offering a probabilistic and fine-grained interpretation of income concentration. Despite their descriptive potential, they are rarely employed explicitly in empirical analyses.
This study adopts a parametric approach, modelling directly the probability that a randomly selected unit of income belongs to a specific percentile range. From this perspective, a weighted maximum likelihood function is derived to estimate share densities and the corresponding Lorenz curves. Particular attention is devoted to parametric families widely used for modelling income distributions (Kleiber and Kotz, 2003), with a focus on the Dagum model (Dagum, 1977), whose associated share density belongs to the Generalized Beta distribution of the first kind (GB1). The likelihood function is constructed using ordered income data and their associated population shares, allowing for the estimation of parameters that determine inequality measures such as the Gini and Theil indices (Rohde, 2008).
EU-SILC 2021 data for the 27 European countries are considered to explore the potential of the proposed method. The application shows that the GB1 model provides an excellent approximation of the empirical share densities and Lorenz curves. The close correspondence between observed and fitted Gini and Theil values confirms the effectiveness of the parametric approach in capturing the main features of income inequality.
To compare the inequality structures of different countries, the Jensen–Shannon (JS) divergence is employed as a measure of dissimilarity between densities. The JS divergence captures differences across the entire distribution, reflecting the full behaviour of share densities. As a result, any type of deviation, such as increased inequality in the tails or greater concentration around the centre of the income distribution, contributes to the measure. This makes the JS divergence particularly suitable for comparing populations characterised by heterogeneous inequality patterns.
On the basis of JS dissimilarities, both hierarchical (Condino, 2023a) and non-hierarchical (Condino, 2023b) clustering algorithms for unconventional data are applied. The hierarchical method starts from the JS dissimilarity matrix among pairs of countries and produces clear groupings of countries with similar inequality structures. The Dynamic Clustering Algorithm (DCA) further refines these results by iteratively minimising within-cluster divergence and representing each cluster through a prototype, defined as the mixture of the share densities of countries in that cluster. Three distinct clusters emerge, corresponding respectively to low, intermediate and high inequality regimes. Moreover, the proposed framework enables the decomposition of total divergence into within-cluster and between-cluster components. This allows the quality of the obtained partition to be evaluated through the ratio $D_{JS}^{W} / D_{JS}^{T}$ (within-cluster over total JS divergence), which quantifies the degree of homogeneity of countries in terms of income inequality within clusters.
References
Condino, F. (2023a). Hierarchical Clustering of Income Data Based on Share Densities. In: Grilli, L., Lupparelli, M., Rampichini, C., Rocco, E., & Vichi, M. (eds), Statistical Models and Methods for Data Science. Springer, Cham.
Condino, F. (2023b). Share density-based clustering of income data. Statistical Analysis and Data Mining, 16(4), 336–347.
Dagum, C. (1977). A new model of personal income distribution: specification and estimation. Economie Appliquée, 30, 413–437.
Farris, F.A. (2010). The Gini Index and Measures of Inequality. The American Mathematical Monthly, 117(10), 851–864.
Kleiber, C., Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley & Sons.
Lorenz, M.O. (1905). Methods of Measuring the Concentration of Wealth. Publications of the American Statistical Association, 9(70), 209–219.
Rohde, N. (2008). Lorenz Curves and Generalised Entropy Inequality Measures. In: Chotikapanich, D. (ed), Modeling Income Distributions and Lorenz Curves. Springer.
Keywords: income inequality, share density, Lorenz curve, Jensen-Shannon divergence, unconventional data.
13:00	Session IV: Applied Algorithms and Spatio-Temporal Data Analysis	Chair: Ana Belén Nieto Librero
Selective State Space Models for Financial Return Prediction: A Comparative Study Using the Mamba Architecture. Josu Sabin Iriondo Delgado, Pedro Ignacio Dorado Díaz, Jesús Ángel Román Gallego.
A Computational Entropy-Based Approach to Intention-Vote Transitions. Patrizia Pérez Asurmendi, Rocío de Andrés Calle, José Manuel Pavía.
Comparison of classical multivariate and machine learning methods in sex determination using dental morphometric and morphoscopic variables. Nery Sofía Huerta Pacheco, Rosa A. Sepúlveda Correa, Ivet Gil-Chavarría.
Multivariate Linear Models with Error-Prone Variables. Li-Pang Chen.
14:00	Lunch Break
15:30	Poster Session
A Monte Carlo Simulation in R: Analysis of the Game "Happy Little Dinosaurs". Irene Gutiérrez Moreno, Laura Varela Anido.
A Multivariate Statistical Index for the Water–Energy–Climate Nexus: Global Sustainability Patterns. Teresa Pérez-Labrador, Carmen Patino-Alonso.
A New Record-Counting Methodology for Nonparametric Cointegration Analysis in Multivariate Time Series. M. Teresa Santos-Martín, Ana E. Sipols, Clara Simón, Lynda Atill, Hocine Fellag.
Advances in Maritime Safety: The Use of Multivariate Statistical Techniques in Port State Control Inspections. Jose Manuel Prieto, Víctor Amor-Esteban, Emilio Rodriguez, Nieves Endrina, David Almorza.
An integrated algorithmic workflow for analyzing high-dimensional single-cell biogenomic data. Emma Perez Garcia, Enrique de la Rosa Moron, Maria Sanchez-Martin, Alberto Berral Gonzalez, Jose M Sanchez-Santos, Javier De Las Rivas.
Assessment of Gender Inequality in Europe and Spain: A Multivariate and Machine Learning Approach. Marta Ruiz de la Hermosa González de la Aleja.
Breast Cancer Diagnosis Using Machine Learning and Deep Learning Techniques. Sergio Galán López.
COSTATIS vs. GPA in multi-way data: A Comparative Structural Perspective. María Concepción Vega Hernández, Carmen Patino Alonso.
dbrobust R-package for Mixed-Type Data: Distance-Based Visualization and Analysis. Marcos Álvarez Martín, Eva Boj, Aurea Grané.
Detection and interpretation of collective outliers. Alexander Trilleras Martínez, Carmen Patino Alonso.
Dual STATIS of Age-Structured Cerebral Cortex Transcriptomics. Ángel Tejero-Aznar, Carmen Patino-Alonso, Elisa Frutos-Bernal.
Dynamics of Urban Housing Precarity in Bolivia: A Partial Triadic Analysis Approach. Carlos Fernando Silva Viamonte, Nerea González García, Carmen Patino Alonso.
Estimating dominant variables in multivariate time series through WLMC and VisualDom. Josué M. Polanco-Martínez.
Higher Lymphocyte-to-Monocyte Ratio Predicts Favorable 90-Day Functional Outcome After Reperfusion-Treated Ischemic Stroke. Diana L. Tarruella-Hernández, Antonio García-Molina, Marta M. Dolcet-Negre, Alicia Aliena-Valero, Álvaro Lucero-Garófano, Manuel Pedrero-Prieto, Lluís Morales-Caba, Gerardo Fortea-Cabo, Fernando Aparici-Robles, María Jesús Rivas-López, Juan B Salom, José I Tembl, Irene Escudero-Martínez.
Linear Mixed Models for the Analysis of Mitochondrial Dynamics in Cancer. Lorenzo Castro, Edurne Almeida, Dirk Fennema, Nuria Ferrándiz.
Multivariate modelling of gene expression data for AML prognosis: an interpretable framework. Carla Ijurko, Nerea González-García, Purificación Galindo-Villardón, Ángel Hernández-Hernández.
Optimal design and modeling for Chemotherapy Clearance. Juan M. Rodríguez Díaz, M. Teresa Santos-Martín, M. Isabel Asensio, Irene Mariñas-Collado.
Health and Well-Being Risk Profiles in the Panamanian Population: A Study Based on the ENSPA Survey. Kathia Díaz-Arias, Irene Albarrán, Aurea Grané.
Performance evaluation of machine learning models across multiple independent datasets for predicting bladder cancer recurrence. Karoline Brito Caetano Andrade Coelho, Alberto Berral-Gonzalez, Jose M Sanchez-Santos, Javier De Las Rivas.
Review of the psychometric properties of the Female Sexual Function Index (FSFI) in Spanish female university students. Francisco Javier del Río Olvera, José Luis Vicente Villardón, Manuel Antonio García Sedeño, Laura Vicente González, Silvia V. Navarro Murcia.
Selection and Integration of a Measurement Instrument for Multivariate Research on Digital Consumer Behaviour. Hugo-Matías Speratti-Mendoza, Carmen Patino-Alonso, Nerea González-García.
Statistical Analysis of Coping with Socially Stressful Situations: Assessment of Skills, Psychosocial Variables, and Predictive Models. Ángela Rodríguez Laguna, Cristina Jenaro Río, Rosa Milagros Mesón García.
The PCovR Biplot: A Graphical Tool for Principal Covariates Regression. Elisa Frutos-Bernal, José Luis Vicente-Villardon.
16:30	INVITED SPEAKER: José Luis Vicente Villardón	José Luis Vicente Villardón, Universidad de Salamanca
Title: "Logistic Biplots for Ordinal Variables based on alternating gradient descent on the cumulative probabilities"
Abstract: Biplot methodology provides a joint low-dimensional representation of the row and column entities of a data matrix. Classical formulations are based on Principal Component Analysis (PCA) and are therefore restricted to continuous variables under linear model assumptions. Subsequent extensions to binary and nominal data, known as logistic biplots (LBs), replace the linear structure with generalized linear models using logistic link functions. Nevertheless, these approaches do not adequately accommodate ordinal data, which exhibit inherent ordering but violate both metric and nominal assumptions.
In this work, we generalize the biplot framework to ordinal responses through the introduction of the ordinal logistic biplot (OLB). The proposed model assumes a cumulative logit formulation in which row scores act as latent traits generating ordinal responses across multiple dimensions, while column parameters define category thresholds and discrimination vectors. The resulting response functions correspond to ordinal logistic surfaces which, when evaluated over the latent space, induce a linear biplot representation. This construction is embedded within a latent variable model and yields a multidimensional structure formally equivalent to Samejima’s graded response model in Item Response Theory (IRT).
We analyze the geometric structure of the OLB, characterizing the relationship between category boundaries, discrimination parameters, and the induced partitioning of the latent space. For parameter estimation, we develop an iterative optimization scheme based on alternating gradient descent, jointly updating row scores and item parameters under identifiability constraints. Additionally, we derive prediction directions that enable coherent visualization of ordinal response probabilities within the biplot space. The OLB can thus be interpreted as a multidimensional IRT model augmented with a biplot-based graphical representation, enhancing both interpretability and exploratory capabilities. Its objective is to recover latent structures and variable relationships in ordinal data through a unified geometric and probabilistic framework.
The methodology is illustrated through an application to job satisfaction data from PhD holders in Spain. The estimated latent structure reveals two dominant dimensions, interpretable as intellectual satisfaction and extrinsic job conditions (e.g., salary and benefits). Comparative evaluation against alternative methods demonstrates improved discriminatory capacity of the proposed model with respect to item parameters.
17:30	AMyC Group Meeting and Closing Address
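
The cumulative logit formulation mentioned in José Luis Vicente Villardón's abstract (Day 2, 16:30) can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's implementation: the sign convention P(Y <= k) = logistic(threshold_k - score · discrimination) is one common choice, all names and numeric values are hypothetical, and the alternating gradient descent used for estimation is not shown.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(row_scores, discrimination, thresholds):
    """Return P(Y = k), k = 1..K, under a cumulative logit model.

    Assumed convention: P(Y <= k) = sigmoid(thresholds[k-1] - eta),
    where eta is the inner product of the row scores and the item's
    discrimination vector; thresholds holds K-1 increasing values.
    """
    eta = sum(a * b for a, b in zip(row_scores, discrimination))
    # Cumulative probabilities, with P(Y <= K) = 1 appended.
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probs


# Hypothetical two-dimensional row score and one four-category item.
scores = [0.5, -1.0]
discrimination = [1.2, 0.4]
thresholds = [-1.0, 0.0, 1.5]

probs = category_probabilities(scores, discrimination, thresholds)
print(probs)

# Moving the row along the discrimination direction shifts probability
# mass towards the highest category.
shifted = category_probabilities([2.0, -1.0], discrimination, thresholds)
print(shifted)
```

Under this convention, the inner product of row scores and discrimination vector is what a biplot axis for the item would display, which is why the category probabilities depend on the row only through its projection onto that direction.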

San Boal Building, the conference venue