Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects without the guidance or support of engineering or IT staff. The availability of cloud platforms and managed services makes this a viable option, but it can lead to downstream challenges. In this episode Sean Knapp and Charlie Crocker share their experiences of working in and with companies that have dealt with shadow IT projects and the importance of enabling and empowering the use and exploration of data and analytics. If you have ever been frustrated by seemingly draconian policies or struggled to align everyone on your supported platform, then this episode will help you gain some perspective and set you on a path to productive collaboration.
Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $60 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!
Many data engineers say the most frustrating part of their job is spending too much time maintaining and monitoring their data pipeline. Snowplow works with data-informed businesses to set up a real-time event data pipeline, taking care of installation, upgrades, autoscaling, and ongoing maintenance so you can focus on the data.
Snowplow runs in your own cloud account, giving you complete control and flexibility over how your data is collected and processed. Best of all, Snowplow is built on top of open source technology, which means you have visibility into every stage of your pipeline, with zero vendor lock-in.
At Snowplow, we know how important it is for data engineers to deliver high-quality data across the organization. That’s why the Snowplow pipeline is designed to deliver complete, rich and accurate data into your data warehouse of choice. Your data analysts define the data structure that works best for your teams, and we enforce it end-to-end so your data is ready to use.
Get in touch with our team to find out how Snowplow can accelerate your analytics. Go to dataengineeringpodcast.com/snowplow. Set up a demo and mention you’re a listener for a special offer!
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- Are you spending too much time maintaining your data pipeline? Snowplow empowers your business with a real-time event data pipeline running in your own cloud account without the hassle of maintenance. Snowplow takes care of everything from installing your pipeline in a couple of hours to upgrading and autoscaling so you can focus on your exciting data projects. Your team will get the most complete, accurate and ready-to-use behavioral web and mobile data, delivered into your data warehouse, data lake and real-time streams. Go to dataengineeringpodcast.com/snowplow today to find out why more than 600,000 websites run Snowplow. Set up a demo and mention you’re a listener for a special offer!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing Sean Knapp and Charlie Crocker about shadow IT in data and analytics
- How did you get involved in the area of data management?
- Can you start by sharing your definition of shadow IT?
- What are some of the reasons that members of an organization might start building their own solutions outside of what is supported by the engineering teams?
- What are some of the roles in an organization that you have seen involved in these shadow IT projects?
- What kinds of tools or platforms are well suited for being provisioned and managed without involvement from the platform team?
- What are some of the pitfalls that these solutions present as a result of their initial ease of use?
- What are the benefits to the organization of individuals or teams building and managing their own solutions?
- What are some of the risks associated with these implementations of data collection, storage, management, or analysis that have no oversight from the teams typically tasked with managing those systems?
- What are some of the ways that compliance or data quality issues can arise from these projects?
- Once a project has been started outside of the approved channels it can quickly take on a life of its own. What are some of the ways you have identified the presence of "unauthorized" data projects?
- Once you have identified the existence of such a project how can you revise their implementation to integrate them with the "approved" platform that the organization supports?
- What are some strategies for removing the friction in the collection, access, or availability of data in an organization that can eliminate the need for shadow IT implementations?
- What are some of the inherent complexities in data management which you would like to see resolved in order to reduce the tensions that lead to these bespoke solutions?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Shadow IT
- Google Sawzall
- M&A == Mergers and Acquisitions
- Waterfall Development
- Data Governance
- Data Lineage
- Pioneers, Settlers, and Town Planners