Matei Zaharia, Databricks, at #SparkSummit on #theCUBE
What new apps are chugging on Spark 2.2’s real-time engine?
Apache Spark 2.2 has achieved event-by-event data streaming by trimming some fat from its execution process. So what new applications will the leaner, meaner engine drive online?
“Since we began structured streaming, we tried to make sure the API [Application Programming Interface] is not tied in with micro-batching in any way, and so this is the next step to actually eliminate that from the engine,” said Matei Zaharia (pictured), chief technologist and co-founder of Databricks Inc., a cloud big data service founded by the creators of Spark.
Untying that knot for good frees Spark 2.2 to stream a single event at a time with 1 millisecond of latency — effectively, true real-time, Zaharia told George Gilbert (@ggilbert41) and David Goad (@davidgoad), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during Spark Summit 2017 in San Francisco, California. (* Disclosure below.)
To take full advantage of Spark 2.2’s streaming engine, however, users may want to integrate it with their own databases, and Spark APIs are available for that. Conversely, “If you want to do these transactions on a file system, there will be basically some performance constraints to doing that,” Zaharia said.
The engine enables various next-gen continuous streaming data applications. Automated decision-making apps on websites for loan approval, for instance, are one type. “But it could be in an even lower latency, like say stock-market style of place or Internet of Things or industrial monitoring and making decisions there,” he said.
Continuous stream-to-stream Extract-Transform-Load can produce new data streams from existing ones without losing anything to latency, Zaharia continued. This may not sound exciting, but it could boost the performance of microservices-based applications, he added.
Databricks Serverless announcement
In the same next-gen, microservices-oriented vein, Databricks announced at the Summit its Databricks Serverless platform for running Spark and data applications.
“Serverless computing is this idea of: Users can just submit a query or a computation. They don’t have to configure the hardware at all,” Zaharia said. “So far, [serverless computing] has been very successful with stateless workloads, such as SQL [Structured Query Language] or Amazon Lambda [serverless compute], which is just functions serving a webpage.”
Databricks now extends this to Spark and big data, Zaharia concluded.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s independent editorial coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)