
Pip Install UIMA

With the emphasis on end-to-end deep learning models, there may be less perceived need for interoperability among NLP toolkits. But many valuable NLP problems still need the "processing" bit of the acronym.

The Apache UIMA framework is still one of the best frameworks for interoperability, featuring stand-off annotations, XML serialization and a wide array of other features.
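To make "stand-off annotations" concrete, here is a minimal sketch in plain Python. It does not use the UIMA API: the `Document` and `Annotation` classes and the two toy annotators are hypothetical names made up for illustration. The idea it tries to show is that the text is never modified; each annotator only adds records holding character offsets and a type, so independent tools can layer their output over the same text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: character offsets into the text plus a type label."""
    begin: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Hypothetical container: the original text and the annotations pointing into it."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, begin, end, type_, **features):
        ann = Annotation(begin, end, type_, features)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        # The annotation stores no text of its own; it is recovered from the offsets.
        return self.text[ann.begin:ann.end]

    def select(self, type_):
        return [a for a in self.annotations if a.type == type_]


def tokenize(doc):
    """Toy annotator: one Token annotation per run of non-space characters."""
    for match in re.finditer(r"\S+", doc.text):
        doc.add(match.start(), match.end(), "Token")


def tag_capitalized(doc):
    """Toy annotator that builds on Token annotations left by an earlier step."""
    for token in doc.select("Token"):
        if doc.covered_text(token)[0].isupper():
            doc.add(token.begin, token.end, "Capitalized")


doc = Document("Apache UIMA uses stand-off annotations.")
tokenize(doc)
tag_capitalized(doc)
capitalized = [doc.covered_text(a) for a in doc.select("Capitalized")]
```

Because the second annotator only reads the offsets left by the first, either one could be swapped for a different tool, which is the interoperability argument in a nutshell.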

The current convergence on the Python programming language and the rise of the spaCy toolkit have left the framework somewhat on the sidelines. However, integrating UIMA with spaCy and other annotators shouldn't be particularly difficult if UIMA is well packaged for Python, using its C++ implementation and a captive JVM to run Java annotators. The success of PySpark shows that such packaging is doable.

I used UIMA from Python ten years ago and it worked just fine, but it was not installable from PyPI. Making it easy to install will allow the Python community to use the Apache UIMA Ruta (Rule-based Text Annotation) framework, which is excellent.