Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.


09 October 2018

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - E51




Summary
One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to support concurrent transactional (application-oriented) and analytical (high-volume) workloads on the same hardware. In this episode Nikita Shamgunov, the CEO of MemSQL, describes how the company and database got started, how it is architected for scale and speed, and how it is being used in production. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise-grade data management.
Preamble
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, NodeBalancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
  • And the team at Metis Machine has shipped a proof-of-concept integration between the Skafos machine learning platform and the Tableau business intelligence tool, meaning that your BI team can now run the machine learning models custom built by your data science team. If you think that sounds awesome (and it is) then join the free webinar with Metis Machine on October 11th at 2 PM ET (11 AM PT). Metis Machine will walk through the architecture of the extension, demonstrate its capabilities in real time, and illustrate the use case for empowering your BI team to modify and run machine learning models directly from Tableau. Go to metismachine.com/webinars now to register.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Nikita Shamgunov about MemSQL, a NewSQL database built for simultaneous transactional and analytical workloads
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what MemSQL is and how the product and business first got started?
  • What are the typical use cases for customers running MemSQL?
  • What are the benefits of integrating the ingestion pipeline with the database engine? 
    • What are some typical ways that the ingest capability is leveraged by customers?
  • How is MemSQL architected and how has the internal design evolved from when you first started working on it?
    • Where does it fall on the axes of the CAP theorem?
    • How much processing overhead is involved in the conversion from the column oriented data stored on disk to the row oriented data stored in memory?
    • Can you describe the lifecycle of a write transaction?
  • Can you discuss the techniques that are used in MemSQL to optimize for speed and overall system performance?
    • How do you mitigate the impact of network latency throughout the cluster during query planning and execution?
  • How much of the implementation of MemSQL is using custom built code vs. open source projects?
  • What are some of the common difficulties that your customers encounter when building on top of or migrating to MemSQL?
  • What have been some of the most challenging aspects of building and growing the technical and business implementation of MemSQL?
  • When is MemSQL the wrong choice for a data platform?
  • What do you have planned for the future of MemSQL?

Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA



