That’s OctoML’s offering in a nutshell. We think it paints a representative picture of today’s landscape in AI application deployment, a domain also known as MLOps. We identified MLOps as a key part of the ongoing shift to machine learning-powered applications, and introduced OctoML in March 2021, on the occasion of its Series B funding round. OctoML’s new release, launched today at TVMcon 2021, the conference around the open source Apache TVM framework for machine learning acceleration, brings a number of new features. We caught up with OctoML CEO and co-founder Luis Ceze to discuss the company’s progress, as a proxy for MLOps progress at large.
Exceeding expectations
The first thing to note in this progress report of sorts is that OctoML has exceeded the goals set out by Ceze in March 2021. Ceze noted back then that the goals for the company were to grow its headcount, expand to the edge, and make progress towards adding support for training machine learning models, beyond the inference already supported. All of those goals were met to one extent or another, with support for training machine learning models covered in depth by ZDNet’s own Tiernan Ray recently. Ceze said that OctoML is making good progress on that front, with the roadmap being to release this capability on the OctoML platform at some point in 2022.

What was never listed as a goal, but happened anyway, was another funding round. This took place in November 2021, when OctoML received $85 million in a Series C round. As Ceze noted, it’s a sign of the times. We have literally lost count of the seemingly never-ending funding rounds in the AI space lately. Ceze said that although OctoML was not looking to raise more money, the company decided the round would help it grow faster.

And grow it did. OctoML exceeded its recruitment goals, now has a headcount of over 100, and keeps growing. This is notable, given the hard-to-find combination of machine learning and hardware expertise that OctoML is looking for.

Let’s see what else OctoML has accomplished, and what new features it’s announcing today. First, it has expanded the choice of deployment targets to include Microsoft Azure. OctoML now provides choice across all three major clouds, including AWS and Google Cloud Platform, with AMD and Intel CPUs and Nvidia GPUs as target options in each cloud. Interestingly, OctoML recently published some experimentation with Apple’s M1 processor as well. We asked Ceze whether support for it is coming, and whether support for up-and-coming hardware vendors such as Blaize, Graphcore or SambaNova is on the roadmap too.
Ceze replied that the goal of the M1 exercise was to show that OctoML can easily onboard any hardware target, whether that is done in collaboration with the vendor or independently. Support for the M1, or for any other hardware target, will be added on a market-driven basis. Most of the up-and-coming vendors are aware of OctoML, and many of them ping the company to work with them or do so on their own, he went on to add.

The other front on which OctoML is expanding its support is the edge. OctoML now supports the Nvidia Jetson AGX Xavier and Jetson Xavier NX, alongside Arm Cortex-A72 CPUs running 32- and 64-bit operating systems. Ceze confirmed what we have been noting as well: there is tremendous growth in demand for edge machine learning applications.
More acceleration engines, more choice
On the software side of things, OctoML is announcing expanded model format support that includes ONNX, TensorFlow Lite, and several TensorFlow model packaging formats, so that users can upload their trained models without conversion. But that’s not all there is to it. The corresponding new acceleration engines (ONNX Runtime, TensorFlow, and TensorFlow Lite) are now supported alongside OctoML’s “native” TVM support. This way, Ceze said, users can compare and contrast the engines and choose which one they want to use.

This is a departure from what was previously a tight coupling between the open source Apache TVM project and OctoML’s offering. Essentially, OctoML offered the software-as-a-service version of TVM. Now, OctoML also offers additional choices in terms of acceleration engines. Users, Ceze noted, now have the ability to do very comprehensive benchmarking of their workflows: “You upload the model, and then you can choose what hardware targets you want in one single workflow, that can be against all clouds or specific edge targets. And then we do the optimization, or the tuning, packaging and benchmarking, and provide this comprehensive data to help you make decisions in how you’re going to deploy in the cloud or your edge devices”, said Ceze.

In addition, OctoML now comes with what the company dubs “a pre-accelerated Model Zoo”: a collection of machine learning models that includes a computer vision (image classification and object detection) set featuring ResNet, YOLO, MobileNet, Inception, and others, as well as a natural language processing (NLP) set that includes BERT, GPT-2, and more.

As far as the Apache TVM community goes, Ceze noted that there has been 50% growth compared to last year, and the momentum is not slowing down. Ceze also mentioned some interesting use cases of OctoML adoption, including Microsoft Watch For, and Woven Planet Holdings.
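To make the benchmarking workflow Ceze describes concrete, here is a minimal, hypothetical Python sketch of the underlying idea: time the same model on several engine/target combinations and pick the fastest. Everything here is a placeholder we introduce for illustration, not OctoML’s actual API; the engine and target names are made up, and the lambdas stand in for real inference sessions (a real harness would wrap TVM, ONNX Runtime, or TensorFlow Lite runtimes on actual cloud or edge hardware).

```python
import time
from statistics import median

def benchmark(run_fn, warmup=3, repeats=10):
    """Time one inference function: warm up first, then report the median latency."""
    for _ in range(warmup):
        run_fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_fn()
        timings.append(time.perf_counter() - start)
    return median(timings)

# Hypothetical (engine, target) combinations. In practice, each entry would be
# a real runtime session for the uploaded model on a specific cloud instance
# or edge device; here, dummy workloads stand in for inference calls.
candidates = {
    ("tvm", "aws-cpu-instance"): lambda: sum(i * i for i in range(10_000)),
    ("onnxruntime", "gcp-cpu-instance"): lambda: sum(i * i for i in range(20_000)),
    ("tflite", "edge-device"): lambda: sum(i * i for i in range(15_000)),
}

# Benchmark every combination in one pass and pick the lowest-latency option.
results = {combo: benchmark(fn) for combo, fn in candidates.items()}
best = min(results, key=results.get)
print("fastest engine/target:", best)
```

The value of doing this in a single workflow, rather than by hand, is that every engine/target pair is measured the same way, so the resulting latency table is directly comparable when deciding where to deploy.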
All in all, considering it has only been 6 months since OctoML’s Series B, growth has been remarkable. We see it as exemplifying the growth in MLOps at large, and we expect this trajectory to continue for the foreseeable future.