This was a DEFCON submission for which I wrote a PoC experiment.
Much effort goes into protecting a company's secrets from unauthorized access. However, it is well understood in the information retrieval (i.e., search engine) community that a poorly configured index can leak those secrets.
How much information can be exposed?
Enough to reveal the general topic of the hidden documents. That knowledge can then be used to obtain the documents themselves through legal (e.g., subpoenas) or technical means.
How can this information be obtained?
By harvesting the visible documents and then performing a differential attack on the scores returned by the search engine. To streamline the attack, this talk shows how to build an open-source tool, the "Visual Elastic Search Reverse Engineering" (VESRE) tool.
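The differential idea can be illustrated with a minimal sketch. It is not the VESRE implementation, only one plausible form of the attack under stated assumptions. Relevance scores such as BM25 (Elasticsearch's default similarity) contain an IDF term computed over the whole index, hidden documents included. If the attacker can recover a term's IDF from the returned scores (how to isolate it depends on the engine and is not shown) and knows the total document count N (also an assumption), inverting the IDF gives the term's true document frequency. Subtracting the frequency seen in the harvested, visible documents estimates how many hidden documents contain the term. The function names and the toy corpus below are hypothetical.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """Lucene/Elasticsearch BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def doc_freq_from_idf(idf: float, N: int) -> float:
    """Invert bm25_idf to recover the document frequency n of a term."""
    e = math.exp(idf) - 1  # e = (N - n + 0.5) / (n + 0.5)
    return (N + 0.5 - 0.5 * e) / (e + 1)


def hidden_term_counts(observed_idf: dict[str, float],
                       visible_docs: list[set[str]], N: int) -> dict[str, int]:
    """Estimate, per term, how many documents the attacker cannot see contain it.

    Assumes observed_idf was recovered from search scores and that N
    (total documents in the index) is known; both are hypothetical inputs.
    """
    counts = {}
    for term, idf in observed_idf.items():
        total = round(doc_freq_from_idf(idf, N))
        visible = sum(1 for doc in visible_docs if term in doc)
        counts[term] = total - visible
    return counts


# Toy demo: 4 visible documents and 2 hidden ones the attacker cannot retrieve.
visible = [{"quarterly", "report"}, {"report", "holiday"},
           {"holiday", "party"}, {"quarterly", "party"}]
hidden = [{"merger", "report"}, {"merger", "layoffs"}]
N = len(visible) + len(hidden)

# Simulate what the engine leaks: IDF computed over the WHOLE index.
observed = {t: bm25_idf(sum(t in d for d in visible + hidden), N)
            for t in ["report", "holiday", "merger", "layoffs"]}
print(hidden_term_counts(observed, visible, N))
# prints {'report': 1, 'holiday': 0, 'merger': 2, 'layoffs': 1}
```

Terms with a positive count ("merger", "layoffs") hint at the topic of the hidden documents, even though none of them was ever returned. In a real index, per-shard statistics, field-length normalization, and unknown N make the inversion noisier than in this toy example.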