Decoupling Data Operations From Data Infrastructure Using Nexla

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

25 August 2021

Decoupling Data Operations From Data Infrastructure Using Nexla - E215

Summary

The technological and social ecosystem of data engineering and data management has recently been reaching a stage of maturity. As part of this stage in our collective journey, the focus has been shifting toward operation and automation of the infrastructure and workflows that power our analytical workloads. It is an encouraging sign for the industry, but it is still a complex and challenging undertaking. In order to make this world of DataOps more accessible and manageable, the team at Nexla has built a platform that decouples the logical unit of data from the underlying mechanisms so that you can focus on the problems that really matter to your business. In this episode, Saket Saurabh (CEO) and Avinash Shahdadpuri (CTO) share the story behind the Nexla platform, discuss the technical underpinnings, and describe how their concept of a Nexset simplifies the work of building data products for sharing within and between organizations.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Schema changes, missing data, and volume anomalies caused by your data sources can happen without any advance notice if you lack visibility into your data-in-motion. That leaves DataOps reactive to data quality issues and can make your consumers lose confidence in your data. By connecting to your pipeline orchestrator like Apache Airflow and centralizing your end-to-end metadata, Databand.ai lets you identify data quality issues and their root causes from a single dashboard. With Databand.ai, you’ll know whether the data moving from your sources to your warehouse will be available, accurate, and usable when it arrives. Go to dataengineeringpodcast.com/databand to sign up for a free 30-day trial of Databand.ai and take control of your data quality today.
We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial.
Your host is Tobias Macey, and today I’m interviewing Saket Saurabh and Avinash Shahdadpuri about Nexla, a platform for powering data operations and sharing within and across businesses.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Nexla is and the story behind it?
What are the major problems that Nexla is aiming to solve?
- What are the components of a data platform that Nexla might replace?
What are the use cases and benefits of being able to publish data sets for use outside and across organizations?
What are the different elements involved in implementing DataOps?
How is the Nexla platform implemented?
- What have been the most complex engineering challenges?
- How has the architecture changed or evolved since you first began working on it?
- What are some of the assumptions that you had at the start which have been challenged or invalidated?
What are some of the heuristics that you have found most useful in generating logical units of data in an automated fashion?
Once a Nexset has been created, what are some of the ways that it can be used or further processed?
What are the attributes of a Nexset? (e.g. access control policies, lineage, etc.)
- How do you handle storage and sharing of a Nexset?
What are some of your grand hopes and ambitions for the Nexla platform and the potential for data exchanges?
What are the most interesting, innovative, or unexpected ways that you have seen Nexla used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Nexla?
When is Nexla the wrong choice?
What do you have planned for the future of Nexla?