Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

Summary

One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to support concurrent use by transactional, application oriented, and analytical, high volume, workloads on the same hardware. In this episode the CEO of MemSQL describes how the company and database got started, how it is architected for scale and speed, and how it is being used in production. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise grade data management.

Your Data Scientist finished a new Machine Learning model, so he sends you his python script and wishes you good luck. Now you have to figure out where to put it and plead with DevOps to deploy it. Not to mention write the API to consume the model’s results.

Wouldn’t it make your job easier if the Data Science team could build, train, deploy and monitor their models independently? Metis Machine agrees.

Meet Skafos, the machine learning platform that enables teams of data scientists to drastically speed up the time to market by providing tools and workflows that are familiar and easy. Serverless ML production deployment is as simple as “git push”. Skafos orchestrates your jobs seamlessly, guaranteeing they will run.

Skafos handles the tedious and time-consuming work of applying Machine Learning at scale so you can focus on what you do best.

The team here at Metis Machine shipped a proof-of-concept integration between our powerful machine learning platform Skafos, and the business intelligence software Tableau. BI teams can now invoke custom-built machine learning models built by in-house science teams.

Does that sound awesome? It is.

Join Metis Machine’s free webinar to walk through the architecture of this extension, demonstrate its capabilities in real time, and lay out a use case for empowering your BI team to modify machine learning models independently and immediately see the results, right from Tableau. You have to see it to believe it. So join us on October 11th at 2 PM ET (11 AM PT) and see what
Skafos + Tableau can do.

To register, go to metismachine.com/webinars

linode-banner-sponsor-largeDo you want to try out some of the tools and applications that you heard about on the Data Engineering Podcast? Do you have some ETL jobs that need somewhere to run? Check out Linode at promo.linode.com/dataengineeringpodcast or use the code dataengineering2018 and get a $20 credit (that’s 4 months free!) to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


Preamble

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
  • And the team at Metis Machine has shipped a proof-of-concept integration between the Skafos machine learning platform and the Tableau business intelligence tool, meaning that your BI team can now run the machine learning models custom built by your data science team. If you think that sounds awesome (and it is) then join the free webinar with Metis Machine on October 11th at 2 PM ET (11 AM PT). Metis Machine will walk through the architecture of the extension, demonstrate its capabilities in real time, and illustrate the use case for empowering your BI team to modify and run machine learning models directly from Tableau. Go to metismachine.com/webinars now to register.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Nikita Shamgunov about MemSQL, a newSQL database built for simultaneous transactional and analytic workloads

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what MemSQL is and how the product and business first got started?
  • What are the typical use cases for customers running MemSQL?
  • What are the benefits of integrating the ingestion pipeline with the database engine?
    • What are some typical ways that the ingest capability is leveraged by customers?
  • How is MemSQL architected and how has the internal design evolved from when you first started working on it?
    • Where does it fall on the axes of the CAP theorem?
    • How much processing overhead is involved in the conversion from the column oriented data stored on disk to the row oriented data stored in memory?

    • Can you describe the lifecycle of a write transaction?
  • Can you discuss the techniques that are used in MemSQL to optimize for speed and overall system performance?

    • How do you mitigate the impact of network latency throughout the cluster during query planning and execution?
  • How much of the implementation of MemSQL is using custom built code vs. open source projects?
  • What are some of the common difficulties that your customers encounter when building on top of or migrating to MemSQL?
  • What have been some of the most challenging aspects of building and growing the technical and business implementation of MemSQL?
  • When is MemSQL the wrong choice for a data platform?
  • What do you have planned for the future of MemSQL?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA