https://arxiv.org/pdf/1703.08219.pdf
We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark. We demonstrate order of magnitude speedups both for relational workloads such as TPC-H, as well as for a range of machine learning kernels that combine relational and iterative functional processing.
Flare achieves these results through (1) compilation to native code, (2) replacing parts of the Spark runtime system, and (3) extending the scope of optimization and code generation to large classes of UDFs.
We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark. We demonstrate order of magnitude speedups both for relational workloads such as TPC-H, as well as for a range of machine learning kernels that combine relational and iterative functional processing.
Flare achieves these results through (1) compilation to native code, (2) replacing parts of the Spark runtime system, and (3) extending the scope of optimization and code generation to large classes of UDFs.