For business analytics, the way you model the data in your warehouse has a lasting impact on which types of questions can be answered quickly and easily. The major strategies in use today were created decades ago, when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin, of Airflow and Superset fame, shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing Max Beauchemin about the concept of entity-centric data modeling for analytical use cases
- How did you get involved in the area of data management?
Can you describe what entity-centric modeling (ECM) is and the story behind it?
- How does it compare to dimensional modeling strategies?
- What are some of the other competing methods?
- Comparison to activity schema
What impact does this have on ML teams? (e.g. feature engineering)
What role does the tooling of a team have in the ways that they end up thinking about modeling? (e.g. dbt vs. Informatica vs. ETL scripts, etc.)
- What is the impact of the underlying compute engine on the modeling strategies used?
What are some examples of data sources or problem domains for which this approach is well suited?
- What are some cases where entity-centric modeling techniques might be counterproductive?
What are the ways that the benefits of ECM manifest in use cases that are downstream from the warehouse?
What are some concrete tactical steps that teams should be thinking about to implement a workable domain model using entity-centric principles?
- How does this work across business domains within a given organization (especially at "enterprise" scale)?
What are the most interesting, innovative, or unexpected ways that you have seen ECM used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ECM?
When is ECM the wrong choice?
What are your predictions for the future direction/adoption of ECM or other modeling techniques?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org with your story.
- To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.
- Entity Centric Modeling Blog Post
- Max's Previous Appearances
- Apache Airflow
- Apache Superset
- Ralph Kimball
- The Rise Of The Data Engineer
- The Downfall Of The Data Engineer
- The Rise Of The Data Scientist
- Dimensional Data Modeling
- Star Schema
- Database Normalization
- Feature Engineering
- DRY == Don't Repeat Yourself
- Activity Schema
- Corporate Information Factory (affiliate link)