The concept is straightforward: can the large amount of available compiled Java bytecode be used to distill informative keywords for obfuscated compiled code?
For this, I took all the Java code in the Debian project and ran it through an obfuscator, then generated features for statistical machine learning. This produced mixed results, which were reported in a talk at RECON, the leading conference in reverse engineering. A talk for scientists at the Université de Montréal followed.
After much work and several paper rejections, a final version of the approach was put together, one that tries to predict the first word of a Java method name (get, set, etc.), and a paper was presented at the Argentinian Symposium of Artificial Intelligence.
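To make the prediction target concrete, here is a minimal sketch of how the first word of a method name could be extracted as a label. It is an illustration only, not the procedure from the paper: the function name `first_word` and the camelCase splitting rule are assumptions.

```python
import re


def first_word(method_name: str) -> str:
    """Return the lowercased leading word of a camelCase Java method name.

    Hypothetical helper: the splitting rule is an assumption made for
    illustration, not the labeling used in the original work.
    """
    # Ignore leading underscores and dollar signs, which are legal in Java identifiers.
    stripped = method_name.lstrip("_$")
    # Leading word: an acronym run (e.g. "XML" in "XMLParse"),
    # or an optional capital followed by lowercase letters, or a digit run.
    match = re.match(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", stripped)
    return match.group(0).lower() if match else ""


if __name__ == "__main__":
    for name in ["getName", "setValue", "isEmpty", "toString", "XMLParse"]:
        print(name, "->", first_word(name))
```

Under this rule, `getName` and `getValue` share the label `get`, so a classifier only has to choose among a small set of common verbs rather than reconstruct a full method name.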
There are now much better approaches to the problem using deep learning. I also learned that multidisciplinary research is very tough when it comes to communication.