The Data Engineering Podcast is supported by some great companies. Check out the sponsors that have helped to bring you the stories behind the projects that you use every day and the people who built them.
Tidy Data is a monitoring platform to help you monitor your data pipeline. Custom in-house solutions are costly, laborious, and fragile. Replacing them with Tidy Data’s consistent managed data ops platform will solve these issues. Monitor your data pipeline like you monitor your website. It’s like Pingdom for data. No credit card required to sign up. Go to dataengineeringpodcast.com/tidydata today to get started with their free tier.
Machine learning is finding its way into every aspect of software engineering, making understanding it critical to future success. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype.
Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to dataengineeringpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
Do you want to try out some of the tools and applications that you heard about on the Data Engineering Podcast? Do you have some ETL jobs that need somewhere to run? Check out Linode at dataengineeringpodcast.com/linode or use the code dataengineering2019 and get a $20 credit (that’s 4 months free!) to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.
Data pipelines are the backbone of modern data systems and yet existing solutions require excessive coding and don’t operationally scale. The Ascend Unified Data Engineering Platform makes it possible for teams to quickly and easily build autonomous data pipelines that dynamically adapt to changes in data, code, and environment. Data engineers using Ascend can now ingest, build, integrate, run, and govern advanced data pipelines with 95% less code. Start building on Ascend today with a free 30-day trial and partner with a dedicated data engineer to help you get started.
Have you ever found yourself lost in a pile of directories, each differing only by a cryptic and poorly considered version number, wishing that you could just dump it all into your source control system to track changes and change history? Lucky for you, the fine folks at Quilt Data were in the same boat and decided to build something just for you! Quilt is an open source platform for managing your data sets in the same way that you manage your software. It includes metadata management, version history, and distributed delivery so that you can build a workflow that works for your whole team.
Stop by their booth at JupyterCon in New York City on August 22nd through the 24th to say hi and tell them that the Data Engineering Podcast sent you! After that, keep an eye on the AWS Marketplace for a pre-packaged version of Quilt for Teams to deploy into your own environment and stop fighting with your data.
Datacoral is this week’s Data Engineering Podcast sponsor. Datacoral provides an AWS-native, serverless data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to construct the underlying infrastructure. Datacoral’s customers report that their data engineers are able to spend 80% of their working time on data transformations rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from mere terabytes to petabytes of analytic data. He started Datacoral with the goal of making SQL the universal data programming language. Visit dataengineeringpodcast.com/datacoral for more information.
Enabling real-time analytics is a huge task. Without a data warehouse that keeps up with your customers’ demands at a fraction of the cost and time, it can also prove challenging. But it doesn’t have to be tiring or difficult with ClickHouse, an open-source analytical database that deploys and scales wherever and whenever you want it to and turns data into actionable revenue. Altinity is the leading ClickHouse software and service provider, on a mission to help data engineers and DevOps managers. Go to dataengineeringpodcast.com/altinity to find out how with a free consultation.
Join us at the Data Orchestration Summit on November 7 at the Computer History Museum in Mountain View, hosted by Alluxio! This one-day community conference focuses on the key data engineering challenges and solutions around building analytics and AI platforms. Attendees will hear from companies including Walmart, Netflix, Google, and DBS Bank on how they leveraged technologies such as Alluxio, Presto, Spark, and TensorFlow, and will also hear from the creators of open source projects including Alluxio, Presto, Airflow, Iceberg, and more! Use discount code PODCAST for 25% off tickets. Admission also includes a free training session on getting started with Presto and Alluxio in AWS, run by the creators of Presto and Alluxio. Attendees will take away learnings, swag, a free voucher to visit the museum, and a chance to win the latest iPad Pro!
DataKitchen offers the first end-to-end DataOps Platform that empowers teams to reclaim control of their data pipelines and deliver business value instantly, without errors. The platform automates and coordinates all the people, tools, and environments in your entire data analytics organization – everything from orchestration, testing and monitoring to development and deployment. It’s DataOps Delivered.
Alluxio provides an open source unified data orchestration layer for hybrid and multi-cloud environments, making data accessible wherever data computation and processing is done. By seamlessly pulling data from underlying data silos, Alluxio unlocks the value of data and allows for modern data-intensive workloads to become truly elastic and flexible for the cloud.
Want a free Alluxio t-shirt? Sign up below and we’ll send one to you!
Hate data conferences that are swarming with salespeople? So do we! That’s why we created a better one. Data Council helps technical professionals stay abreast of the latest advancements in data engineering, data science, and machine intelligence. This April we will host 6 unique tracks and 50+ speakers over 2 full days of deeply technical learning and fun. We are offering a $200 discount to listeners of the Data Engineering Podcast. Use code DEP-200 at checkout.
What happens when your expanding log and event data threatens to topple your Elasticsearch strategy? Whether you’re running your own ELK Stack or leveraging an Elasticsearch-based service, unexpected costs and data retention limits quickly mount. Now try CHAOSSEARCH. Run your entire logging infrastructure on your AWS S3. Never move your data. Fully managed service. Half the cost of Elasticsearch. Check out this short video overview of CHAOSSEARCH today! Forget Elasticsearch! Try search analytics on your AWS S3.
strongDM enables you to easily manage and audit access to databases and servers. Leading organizations including Hearst, SoFi, and Peloton rely on strongDM to eliminate the manual-heavy work required to onboard, offboard, and audit staff’s access to everything. Simplify your access control strategy today and schedule a demo to see how much easier your life can be.
This episode of the Data Engineering Podcast is brought to you by Clubhouse, the first project management platform for software development that brings everyone together so that teams can focus on what matters: creating products their customers love. Clubhouse provides the perfect balance of simplicity and structure for better cross-functional collaboration. Its fast, intuitive interface makes it easy for people on any team to focus on a specific task or project, while also being able to “zoom out” to see how that work contributes to the bigger picture. With a simple API and a robust set of integrations, Clubhouse also works seamlessly with the tools you use every day, getting out of your way so that you can deliver quality software on time.
Listeners of the Data Engineering Podcast can sign up for two free months of Clubhouse by visiting dataengineeringpodcast.com/clubhouse.
Enterprises have been integrating data for decades, and the techniques for doing it are just as old. But a new way of integrating data and improving data streams has evolved: by integrating each silo independently, data can be integrated without any direct relationship between sources. At CluedIn they call it “eventual connectivity”. If you want to learn more about how to deliver fast access to your data across the enterprise using this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin.
Your data scientist finished a new machine learning model, so he sends you his Python script and wishes you good luck. Now you have to figure out where to put it and plead with DevOps to deploy it, not to mention write the API to consume the model’s results.
Wouldn’t it make your job easier if the Data Science team could build, train, deploy and monitor their models independently? Metis Machine agrees.
Meet Skafos, the machine learning platform that enables teams of data scientists to drastically speed up the time to market by providing tools and workflows that are familiar and easy. Serverless ML production deployment is as simple as “git push”. Skafos orchestrates your jobs seamlessly, guaranteeing they will run.
Skafos handles the tedious and time-consuming work of applying Machine Learning at scale so you can focus on what you do best.
The team here at Metis Machine shipped a proof-of-concept integration between our powerful machine learning platform, Skafos, and the business intelligence software Tableau. BI teams can now invoke custom machine learning models built by in-house data science teams.
Does that sound awesome? It is.
Join Metis Machine’s free webinar to walk through the architecture of this extension, demonstrate its capabilities in real time, and lay out a use case for empowering your BI team to modify machine learning models independently and immediately see the results, right from Tableau. You have to see it to believe it. So join us on October 11th at 2 PM ET (11 AM PT) and see what Skafos + Tableau can do.
To register, go to metismachine.com/webinars
Segment provides the reliable data infrastructure companies need to easily collect, clean, and control their customer data. Once you try it, you’ll understand why Segment is one of the hottest companies coming out of Silicon Valley. Segment recently launched a Startup Program so that early-stage startups can get a Segment account totally free up to $25k, plus exclusive deals from some favorite vendors and other resources to become data experts. Go to dataengineeringpodcast.com/segmentio today and see if you or a startup you know qualifies for the program.
Mode is the only data platform built by data experts for data experts. With Mode, analysts and data scientists work how they want to, with a powerful end-to-end workflow that covers everything from exploration stages to final, shareable product. They get the flexibility to work with raw or modeled data without moving between different programs, and Mode’s robust collaboration tools make it easy to work with other data experts on their team. As a result, they can mine for more opportunities, diagnose bigger business problems, predict outcomes, and make recommendations for the future faster than ever before.
Check out the data analysis platform that Lyft trusts at dataengineeringpodcast.com/mode-lyft
Listen, I’m sure you work for a ‘data-driven’ company – who doesn’t these days?? Does your company use Amazon Redshift? Have you ever groaned over slow queries, or are you just afraid that Amazon Redshift is gonna fall over at some point??
Well, you GOTTA talk to the folks over at intermix.io. They have built the “missing” Amazon Redshift console: an amazing analytics product that helps data engineers find and rewrite slow queries and gives actionable recommendations to optimize data pipelines. WeWork, Postmates, and Medium are just a few of their customers.
DEP listeners get a $50 discount! Just go to dataengineeringpodcast.com/intermix and use promo code DEP at sign up.
Managing data in your warehouse today is hard. Often you’ll find yourself relying on manual work and hacks to get something done. Data knowledge is fragmented across the team and the data is often unreliable. So you hire a data engineer and they spend all their time managing this custom infrastructure when what they want to be doing is focusing on writing code.
Dataform enables data teams to own the end-to-end data transformation process by giving them a collaborative web-based platform where they can develop SQL code with real-time error checking, run schedules, write data quality tests, and document their data in a data catalog.
This enables analysts to publish tables and maintain complex SQL workflows without requiring the help of engineers, and lets data engineers focus their time on transformation code instead of maintaining custom infrastructure.
Many data engineers say the most frustrating part of their job is spending too much time maintaining and monitoring their data pipeline. Snowplow works with data-informed businesses to set up a real-time event data pipeline, taking care of installation, upgrades, autoscaling, and ongoing maintenance so you can focus on the data.
Snowplow runs in your own cloud account, giving you complete control and flexibility over how your data is collected and processed. Best of all, Snowplow is built on top of open source technology, which means you have visibility into every stage of your pipeline, with zero vendor lock-in.
At Snowplow, we know how important it is for data engineers to deliver high-quality data across the organization. That’s why the Snowplow pipeline is designed to deliver complete, rich and accurate data into your data warehouse of choice. Your data analysts define the data structure that works best for your teams, and we enforce it end-to-end so your data is ready to use.
Get in touch with our team to find out how Snowplow can accelerate your analytics. Go to dataengineeringpodcast.com/snowplow. Set up a demo and mention you’re a listener for a special offer!