I have been hit by this issue multiple times, going back to the early years when individual algorithms such as C4.5 or Ripper were distributed as C source code. When integrated platforms such as Weka (or, later, scikit-learn) arrived, their re-implementations of the old algorithms (e.g., JRip) never exactly reproduced the behavior of the original C programs.

Why? In a few specific cases, while migrating a production system, I tracked the differences down. They boiled down to:

  • Different default parameters.
  • Different capabilities for automatically handling missing or undefined values.
  • Feature representation issues (e.g., whether set-valued or categorical features can be handled directly).
  • More obscure issues (floating-point precision, etc.).
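To make the feature-representation point concrete, here is a minimal sketch (toy data of my own, not from any of the migrations above): scikit-learn's trees only accept numeric arrays, so categorical features must be encoded first, whereas C4.5 split on categorical attributes natively. The choice of encoding alone can change the learned model, before any default-parameter differences come into play.

```python
# Sketch with assumed toy data: show that the categorical encoding
# by itself changes the tree scikit-learn learns.
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data: the positive class is exactly category "b".
X_raw = np.array([["a"], ["b"], ["c"], ["b"], ["a"], ["c"]])
y = np.array([0, 1, 0, 1, 0, 0])

trees = {}
for name, enc in [("one-hot", OneHotEncoder()), ("ordinal", OrdinalEncoder())]:
    X = enc.fit_transform(X_raw)
    X = X.toarray() if hasattr(X, "toarray") else X  # densify sparse output
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    trees[name] = clf.tree_.node_count

# One-hot isolates "b" with a single split (3 nodes); the ordinal
# encoding places "b" between "a" and "c", forcing two splits (5 nodes).
print(trees)
```

A native categorical split, as in C4.5, sidesteps this choice entirely, which is one reason a faithful migration is harder than swapping in a re-implementation.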

A systematic study of the behavioral differences between toolkits would clearly advance the state of the art for data science practitioners.