Publications

GeoRF: a geospatial random forest

Margot Geerts, Seppe vanden Broucke, Jochen De Weerdt (2024), Data Mining and Knowledge Discovery, DOI: 10.1007/s10618-024-01046-7; Published online: 2024-06-19

Keywords: Random forest · Spatial data · Real estate · Explainability


A Binarization Approach to Model Interactions Between Categorical Predictors in Generalized Linear Models

Emilio Carrizosa, Marcela Galvis Restrepo, Dolores Romero Morales (2024), Applied Intelligence, DOI: 10.1007/s10489-024-05576-x; Published online: 2024-06-24

Keywords: Generalized linear models, Interpretability, Categorical predictors, Interactions, Clustering of categories


SHINE: A Scalable Heterogeneous Inductive Graph Neural Network for Large Imbalanced Datasets

Rafaël Van Belle, Jochen De Weerdt (2024), IEEE Transactions on Knowledge and Data Engineering, DOI: 10.1109/TKDE.2024.3381240; Published online: 2024-03-25

Keywords: Class imbalance, fraud detection, graph neural network (GNN), heterogeneous graph, inductive node classification


Counterfactual analysis and target setting in benchmarking

Peter Bogetoft, Jasone Ramírez-Ayerbe, Dolores Romero Morales (2024), European Journal of Operational Research, DOI: 10.1016/j.ejor.2024.01.005; Published online: 2024-01-08

Keywords: Data envelopment analysis, Benchmarking, DEA targets, Counterfactual explanations, Bilevel optimization


Mathematical optimization modelling for group counterfactual explanations

Emilio Carrizosa, Jasone Ramírez-Ayerbe, Dolores Romero Morales (2024), European Journal of Operational Research, DOI: 10.1016/j.ejor.2024.01.002; Published online: 2024-01-05

Keywords: Machine learning, Interpretability, Mathematical optimization, Counterfactual explanations, Location analysis


CATCHM: A novel network-based credit card fraud detection method using node representation learning

Rafaël Van Belle, Bart Baesens, Jochen De Weerdt (2023), Decision Support Systems, DOI: 10.1016/j.dss.2022.113866; Published online: 2022-09-22

Keywords: Network representation learning, DeepWalk, Credit card fraud, Fraud detection


Simulation-based optimization of user interfaces for quality-assuring machine learning model predictions

Y. Zhang, M. Tennekes, T. De Jong, L. Curier, B. Coecke, and M. Chen (2024), ACM Transactions on Interactive Intelligent Systems, DOI: 10.1145/3594552; Published online: 2023-05-23

Keywords: model-based evaluation, quality assurance, interactive machine learning, data labeling, classification


Radial Icicle Tree (RIT): Node separation and area constancy

Y. Jin, T. J. A. de Jong, M. Tennekes, and M. Chen (2024), IEEE Transactions on Visualization and Computer Graphics, DOI: 10.1109/TVCG.2023.3327178; Published online: 2023-10-26

Keywords: Tree visualization, icicle tree, sunburst tree, size encoding, area constancy, node separation, radial icicle tree, RIT


Supervised feature compression based on counterfactual analysis

V. Piccialli, D. Romero Morales, C. Salvatore (2024), European Journal of Operational Research, DOI: 10.1016/j.ejor.2023.11.019; Published online: 2023-11-15.

Keywords: Machine learning, Supervised classification, Interpretability, Feature compression, Counterfactual analysis


Generating collective counterfactual explanations in score-based classification via mathematical optimization

Emilio Carrizosa, Jasone Ramírez-Ayerbe, Dolores Romero Morales (2024), Expert Systems with Applications, DOI: 10.1016/j.eswa.2023.121954; Published online: 2023-10-13.

Keywords: Machine Learning, Classification model, Mathematical Optimization, Optimization Problem


A new model for counterfactual analysis for functional data

Emilio Carrizosa, Jasone Ramírez-Ayerbe, Dolores Romero Morales
(2023), Advances in Data Analysis and Classification, DOI: 10.1007/s11634-023-00563-5; Published online: 2023-10-25.

Keywords: Counterfactual explanations, Mathematical optimization, Functional data, Prototypes, Random forests

 


Cost-sensitive probabilistic predictions for support vector machines

Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo
(2023), European Journal of Operational Research, DOI: 10.1016/j.ejor.2023.09.027; Published online: 2023-10-23.

Keywords: Machine Learning, Support Vector Machines, Probabilistic Classification, Cost-Sensitive Classification

 


On clustering and interpreting with rules by means of mathematical optimization

Emilio Carrizosa, Kseniia Kurishchenko, Alfredo Marín, Dolores Romero Morales
(2023), Computers & Operations Research, DOI: 10.1016/j.cor.2023.106180; Published online: 2023-03-02.

Keywords: Machine learning, Interpretability, Cluster analysis, Rules, Mixed-integer programming

 


On optimal regression trees to detect critical intervals for multivariate functional data

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales
(2023), Computers & Operations Research, DOI: 10.1016/j.cor.2023.106152; Published online: 2023-01-13.

Keywords: Optimal randomized regression trees, Multivariate functional data, Critical intervals detection, Nonlinear programming

 


A bounded measure for estimating the benefit of visualization (Part I): theoretical discourse and conceptual evaluation

M. Chen and M. Sbert
(2022), Entropy, DOI: 10.3390/e24020228; Published online: 2022-01-31.

Keywords: information theory, theory of visualization, cost–benefit analysis, divergence measure, benefit of visualization, human knowledge in visualization, abstraction, deformation, volume visualization, metro map

 


Predicting student performance using sequence classification with time-based windows

Galina Deeva, Johannes De Smedt, Cecilia Saint-Pierre, Richard Weber, Jochen De Weerdt
(2022), Expert Systems with Applications, DOI: 10.1016/j.eswa.2022.118182; Published online: 2022-12-15.

Keywords: Machine learning, Sequence mining, Feature engineering, Success prediction, Behavioral patterns

 


On mathematical optimization for clustering categories in contingency tables

Emilio Carrizosa, Vanesa Guerrero, Dolores Romero Morales
(2022), Advances in Data Analysis and Classification, DOI: 10.1007/s11634-022-00508-4; Published online: 2022-06-28.

Keywords: Contingency tables, Mathematical optimization, Relational constraints, Clustering

 

The tree based linear regression model for hierarchical categorical variables

Emilio Carrizosa, Laust Hvas Mortensen, Dolores Romero Morales, M. Remedios Sillero-Denamiel
(2022), Expert Systems with Applications, DOI: 10.1016/j.eswa.2022.117423; Published online: 2022-10-01.

Keywords: Hierarchical categorical variables, Linear regression models, Accuracy vs. model complexity, Mixed integer convex quadratic problem with linear constraints

 


On sparse optimal regression trees

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales
(2021), European Journal of Operational Research, DOI: 10.1016/j.ejor.2021.12.022; Published online: 2021-12-18.

Keywords: Machine Learning, Classification and regression trees, Optimal regression trees, Sparsity, Nonlinear Programming.

 

Relevant open-source routines can be found here


Interpreting clusters via prototype optimization

Emilio Carrizosa, Kseniia Kurishchenko, Alfredo Marín, Dolores Romero Morales
(2021), Omega, DOI: 10.1016/j.omega.2021.102543; Published online: 2021-09-23.

Keywords: Machine Learning, Interpretability, Cluster Analysis, Prototypes, Mixed-Integer Programming.

 


Constrained Naïve Bayes with application to unbalanced data classification

Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez Cobo, M Remedios Sillero-Denamiel
(2021), Central European Journal of Operations Research, DOI: 10.1007/s10100-021-00782-1; Published online: 2021-10-20.

Keywords: Probabilistic Classification, Constrained optimization, Parameter estimation, Efficiency measures, Naïve Bayes.

 


Variable selection for Naïve Bayes classification

Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez Cobo, M Remedios Sillero-Denamiel
(2021), Computers & Operations Research, DOI: 10.1016/j.cor.2021.105456; Published online: 2021-06-02.

Keywords: Clustering, Conditional Independence, Dependence measures, Heuristics, Probabilistic Classification, Cost-sensitive Classification. 

 

 


Design space of origin-destination data visualization

M. Tennekes and M. Chen (2021), Computer Graphics Forum, DOI: 10.1111/cgf.14310; Published online: 2021-06-21.

Keywords: 

 


On Clustering Categories of Categorical Predictors in Generalized Linear Models

Emilio Carrizosa, Marcela Galvis Restrepo, Dolores Romero Morales (2021), Expert Systems with Applications, DOI: 10.1016/j.eswa.2021.115245; Published online: 2021-05-24.

Keywords: Statistical Learning, Interpretability, Greedy Randomized Adaptive Search Procedure, Proximity between categories. 

 


On sparse ensemble methods: An application to short-term predictions of the evolution of COVID-19

Sandra Benítez-Peña, Emilio Carrizosa, Vanesa Guerrero, M Dolores Jiménez-Gamero, Belén Martín-Barragán, Cristina Molero-Río, Pepa Ramírez-Cobo, Dolores Romero Morales, M Remedios Sillero-Denamiel (2021), European Journal of Operational Research, DOI: 10.1016/j.ejor.2021.04.016; Published online: 2021-04-18.

Keywords: Machine Learning, Ensemble Method, Mathematical Optimization, Selective Sparsity, COVID-19.

 


Mathematical optimization in classification and regression trees

Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales (2021), TOP, DOI: 10.1007/s11750-021-00594-1; Published online: 2021-03-17.

Keywords: Classification and regression trees, Tree ensembles, Mixed-integer linear optimization, Continuous nonlinear optimization, Sparsity, Explainability. 

 


Optimal randomized classification trees

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales (2021), Computers & Operations Research, DOI: 10.1016/j.cor.2021.105281; Published online: 2021-03-08.

Keywords: Classification and regression trees, Cost-sensitive classification, Nonlinear programming. 

 


A cost-sensitive constrained Lasso

Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo, M. Remedios Sillero-Denamiel (2020), Advances in Data Analysis and Classification, DOI: 10.1007/s11634-020-00389-5; Published online: 2020-03-12.

Keywords: Performance constraints, Cost-sensitive learning, Sparse solutions, Sample average approximation, Heterogeneity, Lasso.

 


Expert-driven trace clustering with instance-level constraints

Pieter De Koninck, Klaas Nelissen, Seppe vanden Broucke, Bart Baesens, Monique Snoeck, Jochen De Weerdt (2020), Knowledge and Information Systems, DOI: 10.1007/s10115-021-01548-6; Published online: 2020-03-01.

Keywords: Trace clustering, Process mining, Semi-supervised learning, Constrained clustering.

 

Relevant open-source software can be found here and here


Sparsity in Optimal Randomized Classification Trees

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales (2019), European Journal of Operational Research, Elsevier Ltd., DOI: 10.1016/j.ejor.2019.12.002; Published online: 2019-12-16

Keywords: Data mining, Optimal Classification Trees, Global and Local Sparsity, Nonlinear Programming.

 


Feature Selection in Data Envelopment Analysis: A Mathematical Optimization approach