The Data Engineering Podcast is supported by some great companies. Check out the sponsors that have helped to bring you the stories behind the projects that you use every day and the people who built them.
Do you want to try out some of the tools and applications that you heard about on the Data Engineering Podcast? Do you have some ETL jobs that need somewhere to run? Check out Linode at dataengineeringpodcast.com/linode or use the code dataengineering2019 and get a $20 credit (that’s 4 months free!) to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.
Managing data in your warehouse today is hard. Often you’ll find yourself relying on manual work and hacks to get something done. Data knowledge is fragmented across the team and the data is often unreliable. So you hire a data engineer and they spend all their time managing this custom infrastructure when what they want to be doing is focusing on writing code.
Dataform enables data teams to own the end-to-end data transformation process by giving them a collaborative web-based platform where they can develop SQL code with real-time error checking, run schedules, write data quality tests, and use a data catalog to document their data.
This enables analysts to publish tables and maintain complex SQL workflows without requiring the help of engineers, and lets data engineers focus their time on transformation code instead of maintaining custom infrastructure.
Alluxio provides an open source unified data orchestration layer for hybrid and multi-cloud environments, making data accessible wherever data computation and processing is done. By seamlessly pulling data from underlying data silos, Alluxio unlocks the value of data and allows for modern data-intensive workloads to become truly elastic and flexible for the cloud.
Want a free Alluxio t-shirt? Sign up below and we’ll send one to you!
Datacoral is this week’s Data Engineering Podcast sponsor. Datacoral provides an AWS-native, serverless data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to construct the underlying infrastructure. Datacoral’s customers report that their data engineers are able to spend 80% of their work time on data transformations, rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from mere terabytes to petabytes of analytic data. He started Datacoral with the goal of making SQL the universal data programming language. Visit dataengineeringpodcast.com/datacoral for more information.
DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality.
Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end to end DataOps solution with minimal programming that uses the tools you love.
Join the DataOps movement and sign up for our newsletter at datakitchen.io/de today.
Hate data conferences that are swarming with salespeople? So do we! That’s why we created a better one! Data Council helps technical professionals stay abreast of the latest advancements in data engineering, science & machine intelligence. This April we will host 6 unique tracks and 50+ speakers over 2 full days of deeply technical learning and fun. We are offering a $200 discount to listeners of the Data Engineering Podcast. Use code DEP-200 at checkout.
Listen, I’m sure you work for a ‘data driven’ company – who doesn’t these days?? Does your company use Amazon Redshift? Have you ever groaned over slow queries, or are you just afraid that Amazon Redshift is gonna fall over at some point??
Well, you GOTTA talk to the folks over at intermix.io. They have built the “missing” Amazon Redshift console – it’s an amazing analytics product that helps data engineers find and rewrite slow queries and gives actionable recommendations to optimize data pipelines. WeWork, Postmates, and Medium are just a few of their customers.
DEP listeners get a $50 discount! Just go to dataengineeringpodcast.com/intermix and use promo code DEP at sign up.
Your Data Scientist finished a new Machine Learning model, so he sends you his Python script and wishes you good luck. Now you have to figure out where to put it and plead with DevOps to deploy it. Not to mention writing the API to consume the model’s results.
Wouldn’t it make your job easier if the Data Science team could build, train, deploy and monitor their models independently? Metis Machine agrees.
Meet Skafos, the machine learning platform that enables teams of data scientists to drastically speed up the time to market by providing tools and workflows that are familiar and easy. Serverless ML production deployment is as simple as “git push”. Skafos orchestrates your jobs seamlessly, guaranteeing they will run.
Skafos handles the tedious and time-consuming work of applying Machine Learning at scale so you can focus on what you do best.
The team here at Metis Machine shipped a proof-of-concept integration between our powerful machine learning platform, Skafos, and the business intelligence software Tableau. BI teams can now invoke custom machine learning models built by in-house science teams.
Does that sound awesome? It is.
Join Metis Machine’s free webinar to walk through the architecture of this extension, demonstrate its capabilities in real time, and lay out a use case for empowering your BI team to modify machine learning models independently and immediately see the results, right from Tableau. You have to see it to believe it. So join us on October 11th at 2 PM ET (11 AM PT) and see what Skafos + Tableau can do.
To register, go to metismachine.com/webinars
Integrating data across the enterprise has been around for decades, and so have the techniques to do it. But a new way of integrating data and improving streams has evolved. By integrating each silo independently, data can be integrated without any direct relation. At CluedIn they call it “eventual connectivity”. If you want to learn more about how to deliver fast access to your data across the enterprise by leveraging this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin.
Join us at the Data Orchestration Summit on November 7 at the Computer History Museum in Mountain View, hosted by Alluxio! This one-day community conference is focused on the key data engineering challenges and solutions around building analytics and AI platforms. Attendees will hear from companies including Walmart, Netflix, Google, and DBS Bank on how they leveraged technologies such as Alluxio, Presto, Spark, and TensorFlow, and you will also hear from creators of open source projects including Alluxio, Presto, Airflow, Iceberg, and more! Use discount code PODCAST for 25% off tickets. Admission also includes a free training session on getting started with Presto and Alluxio in AWS, run by the creators of Presto and Alluxio. Attendees will take away learnings, swag, a free voucher to visit the museum, and a chance to win the latest iPad Pro!
Data Engineering Podcast listeners get 25% off with discount code PODCAST. Register here!
Segment provides the reliable data infrastructure companies need to easily collect, clean, and control their customer data. Once you try it, you’ll understand why Segment is one of the hottest companies coming out of Silicon Valley. Segment recently launched a Startup Program so that early-stage startups can get a Segment account totally free up to $25k, plus exclusive deals from some favorite vendors and other resources to become data experts. Go to dataengineeringpodcast.com/segmentio today and see if you or a startup you know qualifies for the program.
Have you ever found yourself lost in a pile of directories, each only differing by a cryptic and poorly considered version number, wishing that you could just dump it all into your source control system to track changes and change history? Lucky for you the fine folks at Quilt Data were in the same boat and decided to build something just for you! Quilt is an open source platform for managing your data sets in the same way that you manage your software. It includes metadata management, version history, and distributed delivery so that you can build a workflow that works for your whole team.
Stop by their booth at JupyterCon in New York City on August 22nd through the 24th to say Hi and tell them that the Data Engineering Podcast sent you! After that, keep an eye on the AWS marketplace for a pre-packaged version of Quilt for Teams to deploy into your own environment and stop fighting with your data.
strongDM enables you to easily manage and audit access to databases and servers. Leading organizations including Hearst, SoFi, and Peloton rely on strongDM to eliminate the manual-heavy work required to onboard, offboard, and audit staff’s access to everything. Simplify your access control strategy today and schedule a demo to see how much easier your life can be.
This episode of the Data Engineering Podcast is brought to you by Clubhouse, the first project management platform for software development that brings everyone together so that teams can focus on what matters – creating products their customers love. Clubhouse provides the perfect balance of simplicity and structure for better cross-functional collaboration. Its fast, intuitive interface makes it easy for people on any team to focus in on a specific task or project, while also being able to “zoom out” to see how that work is contributing towards the bigger picture. With a simple API and a robust set of integrations, Clubhouse also seamlessly integrates with the tools you use every day, getting out of your way so that you can deliver quality software on time.
Listeners of the Data Engineering Podcast can sign up for two free months of Clubhouse by visiting dataengineeringpodcast.com/clubhouse.
What happens when your expanding log & event data threatens to topple your Elasticsearch strategy? Whether you’re running your own ELK Stack or leveraging an Elasticsearch-based service, unexpected costs and data retention limits quickly mount. Now try CHAOSSEARCH. Run your entire logging infrastructure on your AWS S3. Never move your data. Fully managed service. Half the cost of Elasticsearch. Check out this short video overview of CHAOSSEARCH today! Forget Elasticsearch! Try CHAOSSEARCH: search analytics on your AWS S3.
Mode is the only data platform built by data experts for data experts. With Mode, analysts and data scientists work how they want to, with a powerful end-to-end workflow that covers everything from exploration stages to final, shareable product. They get the flexibility to work with raw or modeled data without moving between different programs, and Mode’s robust collaboration tools make it easy to work with other data experts on their team. As a result, they can mine for more opportunities, diagnose bigger business problems, predict outcomes, and make recommendations for the future faster than ever before.
Check out the data analysis platform that Lyft trusts at dataengineeringpodcast.com/mode-lyft