This idea is part of the A Dollar Worth of Ideas series, with potential open source, research or data science projects or contributions for people to pursue. I would be interested in mentoring some of them. Just contact me for details.


Back at IBM I had the pleasure of using JuruXML, a search engine that indexed XML fragments. This technology allowed for advanced semantic search.

The ideas behind JuruXML have been published, so it would be interesting to put together an open source implementation of it, or of the related ideas that followed.

The goal is to be able to search, for example, for a phone number in an email that appears neither in the signature nor in quoted text. (A big issue with this approach is how to give users access to such functionality. Back in the 2000s we were using dialogue interfaces such as the ones later popularized by Siri and Alexa.)
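
To make the phone number example concrete, here is a minimal sketch in Python. It is not JuruXML's query language or API, just a plain tree walk over a small email. It assumes the email has already been marked up as XML, and the element names (`email`, `body`, `p`, `quote`, `signature`), the phone number pattern, and the helper `phones_outside` are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are assumptions for illustration,
# not a format used by JuruXML.
EMAIL = """<email>
  <body>
    <p>Please call me at 555-123-4567 before noon.</p>
    <quote>On Monday Bob wrote: my number is 555-987-6543</quote>
    <signature>Alice, Example Corp, 555-000-1111</signature>
  </body>
</email>"""

# Simplified phone pattern (an assumption): three groups of digits.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
EXCLUDED = {"signature", "quote"}


def phones_outside(element, excluded=EXCLUDED):
    """Collect phone-like strings from text not nested under an excluded tag."""
    if element.tag in excluded:
        return []
    found = PHONE.findall(element.text or "")
    for child in element:
        found.extend(phones_outside(child, excluded))
        # Text after a child's closing tag belongs to the current element,
        # so it is searched even when the child itself is excluded.
        found.extend(PHONE.findall(child.tail or ""))
    return found


print(phones_outside(ET.fromstring(EMAIL)))  # ['555-123-4567']
```

A real engine would answer this kind of structural constraint from an index instead of walking every document at query time, but the sketch shows the shape of the query: a content match combined with a condition on where in the XML structure the match sits.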