This benchmark indexes and searches a 20 M document subset of the
New York City taxi ride corpus, in both a sparse and a dense configuration. Green taxi rides make up ~11.5% of the 20 M documents, and yellow taxi rides make up the remaining ~88.5%. See
this blog post for details.