Port DECORATE to Scikit-Learn

This idea is part of the A Dollar Worth of Ideas series, with potential open source, research or data science projects or contributions for people to pursue. I would be interested in mentoring some of them. Just contact me for details.


There is an old algorithm in Weka, DECORATE, that could be brought back to modern ML. Given my interest in feature engineering, working at the level of the training instances (in DECORATE's case, adding artificially constructed ones) seems like an approach worth exploring further.

Getting patches into scikit-learn is not easy, but with close to 400 citations, this is by no means a fringe paper.

Here is the abstract of:

Melville P, Mooney RJ. Constructing diverse classifier ensembles using artificial training examples. In IJCAI 2003 Aug 9 (Vol. 3, pp. 505-510).

Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general metalearner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and bagging (whereas boosting can occasionally decrease accuracy), and also obtains higher accuracy than boosting early in the learning curve when training data is limited.
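To make the port more concrete, here is a minimal sketch of what a scikit-learn-compatible DECORATE might look like, based on my reading of the paper. Each round draws artificial inputs from a per-feature Gaussian fitted to the training data, labels them with probability inversely proportional to the current committee's predictions, trains a new member on real plus artificial data, and keeps that member only if the committee's training error does not increase. The class name `DecorateClassifier` and the parameters `ensemble_size`, `max_iter` and `artificial_ratio` are hypothetical; they are not an existing scikit-learn API. The sketch assumes numeric features only, a base estimator that implements `predict_proba`, and a floor of 1e-6 on predicted probabilities before inverting them, which is my own choice rather than something taken from the paper.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class DecorateClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical DECORATE sketch (not part of scikit-learn); numeric features only."""

    def __init__(self, estimator=None, ensemble_size=15, max_iter=50,
                 artificial_ratio=1.0, random_state=None):
        self.estimator = estimator                # base learner; a decision tree if None
        self.ensemble_size = ensemble_size        # target committee size
        self.max_iter = max_iter                  # cap on total training attempts
        self.artificial_ratio = artificial_ratio  # artificial examples per real example
        self.random_state = random_state

    def _proba(self, members, X):
        # Committee prediction: mean of the members' class probabilities.
        return np.mean([m.predict_proba(X) for m in members], axis=0)

    def _error(self, members, X, y_enc):
        # Committee error rate on the (real) training data.
        return np.mean(self._proba(members, X).argmax(axis=1) != y_enc)

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        rng = np.random.default_rng(self.random_state)
        # Encode labels as 0..K-1 so every member shares the same column order.
        self.classes_, y_enc = np.unique(y, return_inverse=True)
        base = self.estimator if self.estimator is not None else DecisionTreeClassifier()
        mean, std = X.mean(axis=0), X.std(axis=0)
        n_art = max(1, int(self.artificial_ratio * len(X)))

        members = [clone(base).fit(X, y_enc)]
        err = self._error(members, X, y_enc)
        trials = 1
        while len(members) < self.ensemble_size and trials < self.max_iter:
            # Artificial inputs: each feature drawn from a Gaussian fitted to the training data.
            X_art = rng.normal(mean, std, size=(n_art, X.shape[1]))
            # Labels drawn with probability inversely proportional to the committee's
            # prediction; the 1e-6 floor avoids dividing by zero (an assumption).
            inv = 1.0 / np.clip(self._proba(members, X_art), 1e-6, None)
            inv /= inv.sum(axis=1, keepdims=True)
            y_art = np.array([rng.choice(len(self.classes_), p=p) for p in inv])
            # The new member sees real and artificial data; real data keeps all classes present.
            candidate = clone(base).fit(np.vstack([X, X_art]),
                                        np.concatenate([y_enc, y_art]))
            new_err = self._error(members + [candidate], X, y_enc)
            if new_err <= err:  # keep the member only if training error does not increase
                members.append(candidate)
                err = new_err
            trials += 1
        self.estimators_ = members
        return self

    def predict_proba(self, X):
        check_is_fitted(self)
        return self._proba(self.estimators_, check_array(X))

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

Because it subclasses `BaseEstimator` and uses `clone`, the sketch should work with `GridSearchCV`, `cross_val_score` and pipelines like any other classifier, and any probabilistic base learner can be swapped in through `estimator`, matching the "general metalearner" framing of the abstract. A real port would still need to handle categorical features (the paper samples nominal attributes from their observed value frequencies) and pass scikit-learn's `check_estimator` suite.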