An evaluation framework for input variable selection algorithms for environmental data-driven models

Stefano Galelli, Greer B. Humphrey, Holger R. Maier, Andrea Castelletti, Graeme C. Dandy, Matthew S. Gibbs

Research output: Contribution to journal › Article

102 Citations (Scopus)

Abstract

Input Variable Selection (IVS) is an essential step in the development of data-driven models and is particularly relevant in environmental modelling. While new methods for identifying important model inputs continue to emerge, each has its own advantages and limitations, and no single method is best suited to all datasets and modelling purposes. Rigorous evaluation of new and existing input variable selection methods would allow the effectiveness of these algorithms to be properly identified in various circumstances. However, such evaluations are largely neglected due to the lack of guidelines or precedent to facilitate consistent and standardised assessment. In this paper, a new framework is proposed for the evaluation and inter-comparison of IVS methods, which takes into account: (1) a wide range of dataset properties that are relevant to real-world environmental data, (2) assessment criteria selected to highlight algorithm suitability in different situations of interest, and (3) a website for sharing data, algorithms and results (http://ivs4em.deib.polimi.it/). The framework is demonstrated on four IVS algorithms commonly used in environmental modelling studies and twenty-six datasets exhibiting different typical properties of environmental data. The main aim at this stage is to demonstrate the application of the proposed evaluation framework, rather than to provide a definitive answer as to which of these algorithms has the best overall performance. Nevertheless, the results indicate interesting differences in the algorithms' performance that have not been identified previously.

Original language: English
Pages (from-to): 33-51
Number of pages: 19
Journal: Environmental Modelling and Software
Volume: 62
Publication status: Published - 1 Dec 2014

Keywords

  • Artificial neural networks
  • Data-driven modelling
  • Evaluation framework
  • Input variable selection
  • Large environmental datasets

ASJC Scopus subject areas

  • Software
  • Environmental Engineering
  • Ecological Modelling
