With the constant evolution of technology for data management it can seem impossible to make an informed decision about whether to build a data warehouse, or a data lake, or just leave your data wherever it currently rests. What’s worse is that any time you have to migrate to a new architecture, all of your analytical code has to change too. Thankfully it’s possible to add an abstraction layer to eliminate the churn in your client code, allowing you to evolve your data platform without disrupting your downstream data users. In this episode AtScale co-founder and CTO Matthew Baird describes how the data virtualization and data engineering automation capabilities that are built into the platform free up your engineers to focus on your business needs without having to waste cycles on premature optimization. This was a great conversation about the power of abstractions and appreciating the value of increasing the efficiency of your data team.
Datacoral is this week’s Data Engineering Podcast sponsor. Datacoral provides an AWS-native, serverless, data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to construct its infrastructure. Datacoral’s customers report that their data engineers are able to spend 80% of their work time invested in data transformations, rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral built data infrastructures at Yahoo! and Facebook, scaling from mere terabytes to petabytes of analytic data. He started Datacoral with the goal to make SQL the universal data programming language. Visit dataengineeringpodcast.com/datacoral for more information.
Do you want to try out some of the tools and applications that you heard about on the Data Engineering Podcast? Do you have some ETL jobs that need somewhere to run? Check out Linode at dataengineeringpodcast.com/linode or use the code dataengineering2019 and get a $20 credit (that’s 4 months free!) to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.
What happens when your expanding log & event data threatens to topple your Elasticsearch strategy? Whether you’re running your own ELK Stack or leveraging an Elasticsearch-based service, unexpected costs and data retention limits quickly mount. Now try CHAOSSEARCH. Run your entire logging infrastructure on your AWS S3. Never move your data. Fully managed service. Half the cost of Elasticsearch. Check out this short video overview of CHAOSSEARCH today! Forget Elasticsearch! Try – search analytics on your AWS S3.
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- This week’s episode is also sponsored by Datacoral, an AWS-native, serverless, data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to manage any infrastructure, meaning you can spend your time invested in data transformations and business needs, rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral built data infrastructures at Yahoo! and Facebook, scaling from terabytes to petabytes of analytic data. He started Datacoral with the goal to make SQL the universal data programming language. Visit dataengineeringpodcast.com/datacoral today to find out more.
- Having all of your logs and event data in one place makes your life easier when something breaks, unless that something is your Elastic Search cluster because it’s storing too much data. CHAOSSEARCH frees you from having to worry about data retention, unexpected failures, and expanding operating costs. They give you a fully managed service to search and analyze all of your logs in S3, entirely under your control, all for half the cost of running your own Elastic Search cluster or using a hosted platform. Try it out for yourself at dataengineeringpodcast.com/chaossearch and don’t forget to thank them for supporting the show!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing Matt Baird about AtScale, a platform that
- How did you get involved in the area of data management?
- Can you start by describing the AtScale platform and how it fits in the ecosystem of data tools?
- What was your motivation for building the platform and what were some of the early challenges that you faced in achieving your current level of success?
- How is the AtScale platform architected and what have been some of the main areas of evolution and change since you first began building it?
- How has the surrounding data ecosystem changed since AtScale was founded?
- How are current industry trends influencing your product focus?
- Can you talk through the workflow for someone implementing AtScale?
- What are some of the main use cases that benefit from data virtualization capabilities?
- How does it influence the relevancy of data warehouses or data lakes?
- What are some of the types of tools or patterns that AtScale replaces in a data platform?
- What are some of the most interesting or unexpected ways that you have seen AtScale used?
- What have been some of the most challenging aspects of building and growing the platform?
- When is AtScale the wrong choice?
- What do you have planned for the future of the platform and business?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat