Self Service Real Time Data Integration Without The Headaches With Meroxa - Episode 153

Summary

Analytical workloads require a well engineered and well maintained data integration process to ensure that your information is reliable and up to date. Building a real-time pipeline for your data lakes and data warehouses is a non-trivial effort, requiring a substantial investment of time and energy. Meroxa is a new platform that aims to automate the heavy lifting of change data capture, monitoring, and data loading. In this episode founders DeVaris Brown and Ali Hamidi explain how their tenure at Heroku informed their approach to making data integration self service, how the platform is architected, and how they have designed their system to adapt to the continued evolution of the data ecosystem.

Datadog LogoDatadog is a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog delivers complete visibility into the performance of modern applications in one place through its fully unified platform—which improves cross-team collaboration, accelerates development cycles, and reduces operational and development costs.

Try Datadog in your environment with a free 14-day trial—and get a complimentary t-shirt if you install the agent. Go to datadog.com/dataengineeringpodcast to get started!

Immuta LogoBusinesses are increasingly faced with the challenge of satisfying several, often conflicting, demands regarding sensitive data. From sharing and using sensitive data to complying with regulations and navigating new cloud-based platforms, Immuta helps solve these needs and more.

With automated, scalable data access and privacy controls, and enhanced collaboration between data and compliance teams, Immuta empowers data teams to easily access the data they need, when they need it – all while protecting sensitive data and ensuring their customers’ privacy. Immuta integrates with leading technology and solutions providers so you can govern your data on your desired analytic system.

Start a free trial of Immuta​ to see the power of automated data governance for yourself.


Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $60 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!


Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
  • Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host is Tobias Macey and today I’m interviewing DeVaris Brown and Ali Hamidi about Meroxa, a new platform as a service for data integration

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what you are building at Meroxa and what motivated you to turn it into a business?
  • What are the lessons that you learned from your time at Heroku which you are applying to your work on Meroxa?
  • Who are your target users and what are your guiding principles for designing the platform interface?
  • What are the common difficulties that engineers face in building and maintaining data infrastructure?
  • There are a variety of platforms that offer solutions for managing data integration, or powering end-to-end analytics, or building machine learning pipelines. What are the shortcomings of those existing options that might lead someone to choose Meroxa?
  • How is the Meroxa platform architected?
    • What are some of the initial assumptions that you had which have been challenged as you proceed with implementation?
  • What new capabilities does Meroxa bring to someone who uses it for integrating their application data?
  • What are the growth options for organizations that get started with Meroxa?
  • What are the core principles that you are focused on to allow for evolving your platform over the long run as the surrounding ecosystem continues to mature?
  • When is Meroxa the wrong choice?
  • What do you have planned for the future?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com) with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Liked it? Take a second to support the Data Engineering Podcast on Patreon!