Serving Machine Learning Models

A Guide to Architecture, Stream Processing Engines, and Frameworks

Boris Lublinsky, Principal Architect, Lightbend, Inc.



Machine learning is certainly one of the hottest topics in software engineering today, but one aspect of this field demands more attention: how to serve models that have been trained. Typically, two different groups are responsible for model training and model serving. Data scientists often introduce their own machine-learning tools, causing software engineers to create complementary model-serving frameworks to keep pace. It’s not a very efficient system.

This practical report demonstrates a more standardized approach to model serving and model scoring. Author Boris Lublinsky, Principal Architect at Lightbend, introduces an architecture for serving models in real time as part of input stream processing. This approach also enables data science teams to update models without restarting existing applications.

Using Python, Beam, Flink, Spark, Kafka Streams, and Akka code examples (available on GitHub), Lublinsky examines different ways to build this model-scoring solution with several popular stream processing engines and frameworks.

You’ll explore:

  • Methods for exporting models, using Predictive Model Markup Language (PMML) and TensorFlow as examples
  • Implementing Lightbend’s architecture with stream processing engines: Spark, Flink, and Beam
  • Implementing the same solution with stream processing libraries: Kafka Streams and Akka Streams
  • Methods for monitoring the architecture with queryable state
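The core pattern the report describes — treating models themselves as a stream, so a serving application can hot-swap a new model without restarting — can be sketched in a few lines. This is a simplified illustration with hypothetical names (`Model`, `serve`), not the report's actual code: a real implementation would deserialize PMML or TensorFlow models and merge two Kafka topics rather than iterate over an in-memory list.

```python
class Model:
    """Stand-in for a deserialized model (e.g., loaded from PMML or TensorFlow)."""
    def __init__(self, version, weight):
        self.version = version
        self.weight = weight

    def score(self, record):
        return record * self.weight


def serve(events):
    """Process one interleaved stream of ('model', Model) and ('data', value) events.

    Model events hot-swap the current model; data events are scored with
    whatever model is current — no application restart needed.
    """
    current = None
    results = []
    for kind, payload in events:
        if kind == "model":
            current = payload  # swap in the updated model
        elif kind == "data" and current is not None:
            results.append((current.version, current.score(payload)))
    return results


events = [
    ("model", Model(1, 2.0)),
    ("data", 3.0),
    ("model", Model(2, 10.0)),  # an updated model arrives mid-stream
    ("data", 3.0),
]
print(serve(events))  # [(1, 6.0), (2, 30.0)]
```

In the stream processing engines and libraries the report covers, the same idea is expressed by joining or merging a data stream with a model-update stream and keeping the current model in managed state.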