Compression and Indexing of Big Data: what's new?
The last thirty years have witnessed an upsurging interest towards the design of algorithms and data structures that could scale with the deluge of data produced from an ever growing number of sources. That has resulted in the development of four main research areas: External Memory, Compressed, Cache-Oblivious and Streaming. Thanks to those developments, we have now theoretically sound results to cope with Big Data which are also implemented in widely used software libraries. In the first part of the talk I'll sketch some of these results on the case of strings and graphs. Then, I'll concentrate on a limitation of the proposed solutions that is becoming more and more evident, namely they offer a collection of different trade-offs in terms of e.g. I/Os, working space, time, etc., among which software engineers have (not easily, indeed) to choose the one that best fits the needs of their application. Hence I'll talk about a new series of results which combine novel machine-learning tools with classic data structures thus opening new avenues of research and application opportunities.