Summary
Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a shoe-string budget makes it even more challenging. In this episode Tyler Colby shares his experiences working as a data professional in the non-profit sector. From managing Salesforce data models to wrangling a multitude of data sources and compliance challenges, he describes the biggest challenges that he is facing.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on great conferences. We have partnered with organizations such as ODSC, and Data Council. Upcoming events include the Observe 20/20 virtual conference and ODSC East which has also gone virtual. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing Tyler Colby about his experiences working as a data professional in the non-profit arena, most recently at the Natural Resources Defense Council
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing your responsibilities as the director of data infrastructure at the NRDC?
- What specific challenges are you facing at the NRDC?
- Can you describe some of the types of data that you are working with at the NRDC?
- What types of systems are you relying on for the source of your data?
- What kinds of systems have you put in place to manage the data needs of the NRDC?
- What are your biggest influences in the build vs. buy decisions that you make?
- What heuristics or guidelines do you rely on for aligning your work with the business value that it will produce and the broader mission of the organization?
- Have you found there to be any extra scrutiny of your work as a member of a non-profit in terms of regulations or compliance questions?
- Your career has involved a significant focus on the Salesforce platform. For anyone not familiar with it, what benefits does it provide in managing information flows and analysis capabilities?
- What are some of the most challenging or complex aspects of working with Salesforce?
- In light of the current global crisis posed by COVID-19 you have established a new non-profit entity to organize the efforts of various technical professionals. Can you describe the nature of that mission?
- What are some of the unique data challenges that you anticipate or have already encountered?
- How do the data challenges of this new organization compare to your past experiences?
- What have you found to be most useful or beneficial in the current landscape of data management systems and practices in your career with non-profit organizations?
- What are the areas that need to be addressed or improved for workers in the non-profit sector?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
- NRDC
- AWS Redshift
- Time Warner Cable
- Salesforce
- Cloud For Good
- Tableau
- Civis Analytics
- EveryAction
- BlackBaud
- ActionKit
- MobileCommons
- XKCD 1667
- GDPR == General Data Protection Regulation
- CCPA == California Consumer Privacy Act
- Salesforce Apex
- Salesforce.org
- Salesforce Non-Profit Success Pack
- Validity
- OpenRefine
- JitterBit
- Skyvia
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, a 40 gigabit public network, fast object storage, and a brand new managed Kubernetes platform, you've got everything you need to run a fast, reliable, and bulletproof data platform. And for your machine learning workloads, they've got dedicated CPU and GPU instances.
Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on some great conferences. We have partnered with organizations such as ODSC and Data Council, with upcoming events including the Observe 20/20 virtual conference on April 6th and ODSC East, which has also gone virtual, starting April 16th.
Go to dataengineeringpodcast.com/conferences to learn more about these and other events and take advantage of our partner discounts to save money when you register today. Just because you're stuck at home doesn't mean you can't still learn something. Your host is Tobias Macey. And today, I'm interviewing Tyler Colby about his experiences working as a data professional in the nonprofit arena, most recently at the Natural Resources Defense Council. So Tyler, can you start by introducing yourself? Absolutely.
[00:01:43] Unknown:
I'm Tyler Colby. I work at the Natural Resources Defense Council as a data infrastructure director, currently overseeing all of our data infrastructure needs, most prevalent among which is our Salesforce instance, which leads our fundraising and development efforts, and our Amazon Redshift, which sits in the middle of all of our disparate data platforms. And do you remember how you first got involved in the area of data management? Absolutely. I've had a twisting and turning path through my career. So it started way back in 2005 with Time Warner Cable. I was working my way through college as a part time telemarketer.
And my manager at the time was fed up with the ridiculous order entry training. It was about a 6 week process in order to get everybody onto the telephones. And, you know, he was looking at the team and saying, these are people who signed up to be telemarketers, not order entry specialists. And so he went out and bought Salesforce. And as is the case with so many Salesforce professionals, I fell into what's called kind of an accidental admin role and was just put at the, you know, core of standing up the Salesforce instance because I knew computers, which was pretty much the only qualification at that time. We went through some really rapid transformations there, and were able to take the team from 8 part time telemarketers up to 250 covering the entire Midwest in a matter of just a couple of years. And what that really showed me was, you know, when we start to look at these, at the time, new technologies, right? Cloud was brand new at the time. Just how powerful it was to not have to, you know, wait on an IT queue or stand up a whole bunch of servers in order to manage these different processes, and that there was a new and better way to do these things.
As it's progressed through the years, I've tackled everything from, you know, doing pro bono consulting for nonprofits. I've worked on the for profit side, then at Salesforce itself. And then, did a lot of consulting before this as an integration architect as well with a company called Cloud For Good, which dealt specifically with nonprofits working on Salesforce, but connecting them to usually legacy systems and making sure that their data could flow between legacy systems and Salesforce so that they could you know, still have that rapid transformation in the future with this, you know, better platform, but still keep some of those platforms, that they had on-site and on premises.
[00:04:08] Unknown:
You mentioned that in your current role as the director of data infrastructure for the NRDC, you're working pretty heavily with Salesforce and Redshift. I'm curious if you can just give a bit more of a description of the overall scope of your responsibilities there and some of the challenges that you're facing in terms of being able to manage those elements of the data platform and any other components that you're using to support it. Absolutely.
[00:04:32] Unknown:
So the NRDC started its rapid modernization of its data systems approximately 4 years ago. And if we look at the landscape at the NRDC 4 years ago, it was a lot of on premise and very siloed systems. And a lot of them were starting to become or just were being deprecated or sunset by the vendors that supplied them. So looking at, you know, end of life scenarios on a lot of those platforms forced us into making some really rapid decisions. So specifically with Salesforce, we had a very tight turnaround on the platform and a very abbreviated implementation. So a lot of times, in the last few years, we would have to go back to our old data systems to pull through either data or reconfigure a process in order to move that through. But today, we are moving to a more traditional data warehouse model, specifically with Amazon Redshift sitting at the center of all of our platforms. So Salesforce, as we mentioned, is the key for our fundraising and development efforts. We do also have ActionKit for our advocacy work, Mobile Commons for our SMS based messaging. We do Tableau for dashboarding.
And our direct mail efforts are run off of Heroku Postgres with Salesforce Connect. I've been in this space, and specifically the Salesforce nonprofit space, for quite a while. And what I found unique about my time at the NRDC is really the scale of data. A lot of times when people think nonprofit, they think, you know, small amounts of data. And I know we're not looking into the billions of records or into the petabytes, but we are still working with the millions and tens of millions of records across our datasets. So as we look at, you know, how do we move this data around? How do we keep this data in sync? And how do we make sure that we know which constituent is which across each of these disparate platforms, so that we're able to know who this is, develop the metrics around them, and make sure that things like their communication preferences are honored? We do have to consider a large data volume. And also
[00:06:50] Unknown:
for a lot of the organizations that are dealing with petabyte or exabyte scales of data, their primary challenge is just in terms of the volume. For smaller organizations, and particularly, I imagine, in the nonprofit space, the challenge isn't so much in the volume of data and being able to scale to meet that. It's in the variety and the overall cleanliness of the data that you're dealing with and being able to scale to those challenges, which is a much different type of scaling and a much different type of complexity that you're dealing with. Where if you have these petabyte and exabyte scale datasets, then there's a high probability that you've already put in the effort to make sure that everything that's landing in that is already clean and fairly homogeneous. And so, again, it's just a matter of being able to process at scale rather than being able to do a bunch of cleanup after the fact or being able to integrate all of the data from the disparate sources and the different ways of representing it, which I'm sure is probably more of what you're dealing with. And so I'm curious a bit about the types of data sources and some of the ways that it manifests that you're having to deal with in your work. That's a really insightful way to put it because,
[00:08:03] Unknown:
you know, we're looking not at, you know, petabytes of log data. We're gonna be looking at gigabytes of very complex data, data that has a number of rules attached to each of those records. And there's, you know, a lot of different scenarios that it can fall into. There's an old XKCD comic, number 1667. It's one of my favorites. And it describes the complexity of algorithms on a spectrum. It has left pad and quicksort on the far left. And on the far right is a sprawling Excel sheet from a nonprofit. And, you know, in my experience, that is so the case with nonprofits. You know, if we look at either the traditional ETL or ELT models, that T is so heavy.
And there's so much complexity built into what exactly this data needs to do. And even getting the business analysis around what we need this data to do is difficult. You know, if we look internally at the NRDC, we have things called giving levels. So as giving levels increase, that may mean a different set of choices and a whole different rubric for where that data could go, what we think of the data, what team it's being routed to. And that's just one of many decisions. And when you start to compound these issues on top of each other, it gets really difficult just to make sure that, again, that T within either the ETL or ELT framework is correctly attributing those records and moving them to where they need to go. And moving beyond that, really, there's a large segment of nonprofits that are really just limited by funding.
So, you know, during my time consulting for nonprofits, especially as an integration architect, solutions really had to meet some very strict funding limitations. So typically, you know, services would be covered by a grant or a specific donation, with very little left over for choosing the perfect solution or the most robust solution. So a lot of times, you know, we'd be left with tooling that was just as good as it could be, and then designing processes that were as good as they could be given the limitations. So when we look at the NRDC specifically in today's world, we are fortunate that we are able to spend a little extra time, and we have a great internal team in order to build out some more robust data pipelines, especially because we have Redshift and a great analytics team that's assisting us. So we work with Civis Analytics, specifically, to do a lot of our more complex data pipelines. But, you know, because we have those, we are able to get to more automated data flows and make sure that our data is moving where it needs to go.
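To make that heavy "T" a little more concrete, here is a minimal, hypothetical sketch of the kind of routing rule described above, where a constituent's giving level drives which team a record is assigned to. The thresholds, team names, and fields are invented for illustration and are not the NRDC's actual rubric.

```python
# Hypothetical transformation step from the "T" in an ELT pipeline: route each
# constituent record to a team based on its giving level. The thresholds and
# team names below are invented for illustration only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Constituent:
    constituent_id: str
    lifetime_giving: float
    assigned_team: Optional[str] = None

def assign_team(record: Constituent) -> Constituent:
    """Assign a fundraising team according to a simple giving-level rubric."""
    if record.lifetime_giving >= 10_000:
        record.assigned_team = "major_gifts"
    elif record.lifetime_giving >= 1_000:
        record.assigned_team = "mid_level"
    else:
        record.assigned_team = "membership"
    return record

records = [Constituent("C-001", 25_000.0), Constituent("C-002", 150.0)]
routed = [assign_team(r) for r in records]
print([(r.constituent_id, r.assigned_team) for r in routed])
```

In a real pipeline a rule like this usually lives in SQL or the orchestration layer, and it compounds with many other rules, which is exactly the complexity being described here.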
[00:10:53] Unknown:
One thing that I imagine is less pronounced in for profit institutions is the need to very closely align the work that you're doing with the value that it's going to produce, to ensure that you're not wasting cycles on something that might be technically elegant and useful, but is not necessarily going to give the immediate impact that's necessary to ensure that you're meeting the mission and the specific financial needs of the organization that you're working with. And I'm curious how that manifests in terms of the ways that you approach the technical and design decisions of your work and any of the aspects of the build versus buy dichotomy in terms of how you're building things out? Build versus buy is a constant,
[00:11:34] Unknown:
question, and it comes across daily. You know, I was on a call even this morning, and we were talking about specifically doing roll ups within our instance and starting to look at more time series and snapshot data. And, you know, there's a lot of different ways that we could approach the situation. And we do have Tableau on-site currently, but talking through just where do we store this data, where does it move out to, and do we need something specific to solve that one specific issue? So, you know, as giving level changes or as the primary team assigned changes, getting a grasp on what population falls into that giving level, how much funding, how many activities, is a constant struggle. Again, at the NRDC, we have a lot of different options, and, you know, we have a lot of different tools at our disposal in order to decide, you know, is this a build or a buy? Typically, we do end up more on the buy side, you know, buying the best solution. And, you know, we can stand up and build if we need to. But with smaller nonprofits, it does become more of a challenge. So, you know, in my work consulting, a lot of times it would end up being, you know, a design limit. You'd end up designing around it, and it's less around the buy and more about what can we do as far as almost a minimum viable product just to make things work. And so
[00:12:58] Unknown:
you mentioned that at your current state, you have Salesforce and Redshift as the primary data sources that you're working with. And you have multiple other sources of information that feed into those. And you also mentioned that as of about 4 years ago is when this transformation started at NRDC. I'm curious what you saw to be the state of the ecosystem at the time, and any challenges that you see in terms of the available tooling and available systems as it pertains to the needs of nonprofit organizations, where a lot of these platforms were built out either from industry with the needs of suiting the for profit spectrum, or in academia with a larger research bent that was then repurposed for use in industry. And so given that there are a lot of different influences going into the tools that are available, how well do you find that they meet the needs of the specifics of nonprofits, and how did you approach the overall effort of navigating the landscape of what was available to determine what would best fit your needs at the NRDC?
[00:14:04] Unknown:
Absolutely. There are a lot of vendors in the nonprofit space that really tailor their solutions and their platforms for nonprofits. So it was a long and lengthy decision before we decided on Salesforce. And when we looked at the entire, you know, landscape of possibilities and where we could go, there are a few main players within the nonprofit space, specifically when we look at something like Salesforce to lead our fundraising and development. So, you know, when we looked at competitors like EveryAction, and Blackbaud, which has a number of solutions, really it came down to the ease with which we could implement. So, you know, looking at the partner network and looking at the robust support that we could get from Salesforce. And then also looking at, you know, what could we do off of this platform?
So as we speak, we are talking in the middle of the COVID 19 crisis and self isolating. And one of the projects that came down across all of our platforms was making sure that these platforms are mobile ready, so that as people are making the shift from working in an office to working from home, we don't have to go through and stand up these systems to make them ready wherever somebody is. And what's great about, at least, Salesforce as a platform is all of that was ready for us from day 1. So we didn't have to go through and do a bunch of system configuration or, you know, even adjust our page layouts for it. It was all ready for our users from day 1, with all the security and everything else already built in. So having that box checked on day 1 makes crisis planning that much easier. And as we look through, you know, the rest of the other decisions, the time to get somebody from walking into Salesforce as just an admin to being very proficient with the system is just a matter of years sometimes versus, you know, on some systems, a matter of decades, including long periods of college training.
So Salesforce really allowed us to just move a lot faster as far as getting our system ready for users. As we start to look towards our data side, it's a very similar decision to why we chose Redshift. So Redshift has allowed us to very quickly, you know, pull a lot of data in. And we're ingesting data from all of our systems into Redshift and really using that as our central hub to do a lot of the more ELT work, so that we are combining these things. We just had a significant release solving one of our main issues, which was identity resolution. So I think I mentioned this a little bit before, but, you know, who is who across each system? It's this problem that I believe has a lot of solutions on the for profit side, but not a lot that had been done on the nonprofit side until just very recently. So we ingest Salesforce, ActionKit, Mobile Commons, and a few other data sources, and we resolve them all down to a single ID, which we call the NRDC ID. And we just had, again, a very significant release that allows us to do this, and we'll be building off of this. So our next big challenge is making sure that communication preferences are unique across all systems. So really making sure that we honor the data privacy legislation that's coming out, so GDPR and CCPA, across all these systems.
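As a rough illustration of the identity resolution idea, here is a deliberately simplified, hypothetical sketch that collapses records from the source systems named above onto one shared ID by matching on a normalized email address. The real NRDC ID work is certainly more involved (fuzzy matching, survivorship rules, and so on); the field names and ID scheme here are invented.

```python
# Simplified, hypothetical identity-resolution pass: records from several
# source systems are keyed on a normalized email and collapsed onto one
# shared ID. The source-system names follow the episode; everything else
# (fields, ID format) is invented for illustration.
import hashlib

def normalize_email(email: str) -> str:
    return email.strip().lower()

def shared_id(email: str) -> str:
    """Derive a stable cross-system ID from the normalized email (illustrative only)."""
    digest = hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()
    return "ID-" + digest[:12]

source_records = [
    {"system": "salesforce", "record_id": "003xx0001", "email": "Donor@Example.org"},
    {"system": "actionkit", "record_id": "ak-4821", "email": "donor@example.org"},
    {"system": "mobile_commons", "record_id": "mc-77", "email": "donor@example.org "},
]

resolved = {}
for rec in source_records:
    resolved.setdefault(shared_id(rec["email"]), []).append(rec)

# All three records now sit under a single key, the stand-in for an "NRDC ID".
for person_id, recs in resolved.items():
    print(person_id, [r["system"] for r in recs])
```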
And making sure also that our constituents are being contacted the way they want to be contacted. So, you know, sometimes, because we are a large environmental nonprofit, some people don't like to receive direct mail from us, and they only want to receive email communications. So making sure that that message gets across all systems, so we're not accidentally sending mail and, you know, upsetting donors, is something that we have to keep in mind. So, you know, some of that tooling and some of that work we are doing and building ourselves, but it isn't readily available, and there are no out of the box solutions for it today. Going further on the subject of regulations and compliance,
[00:18:08] Unknown:
and as you mentioned, some of the less formally stated, but still important aspects of user privacy, such as, you know, not, accidentally exposing somebody's affiliation or support for some particular organization. What are some of the data challenges that exist in terms of the nonprofit space for being able to comply with any of this increased scrutiny of regulations
[00:18:36] Unknown:
or different compliance regimes and things like that? And what are some of the ways that you have found to be useful strategies for approaching that? You know, if you asked me that question 6 months ago, I wouldn't have had a good answer. The Salesforce platform has pushed out a number of updates, and we are able to tag each specific piece of data within Salesforce as far as what regulation it may fall under. So whether it's just tagging it as PII and making sure that it's masked in any system that's not Salesforce or on any export, all the way up to tagging it for GDPR or any of the right to be forgotten legislation.
We're able to do that on the metadata level inside of Salesforce now, which has been a great help. Because as we started to work through the architecting of that solution and how we would actually pattern this out, it would have been built on top of our identity resolution platform with custom rules. So we had started to build this on our own, but Salesforce did come out with a significant release which allows us to tag it there and then start to handle a lot of that more on the Salesforce side. Since most of our PII and constituent data is housed inside of Salesforce today, making sure that that is our main hub for most of those decisions, and tracking which of that data falls into which category, has, again, been a great save of time. Moving beyond today, though, and, you know, where we go: we have gone through our system and tagged all of the field level data with those metadata options.
We are now moving that data to our Amazon Redshift and starting to pattern that out using that NRDC ID. So making sure that each of these systems has the same column metadata as we do in Salesforce is our current effort, to make sure that it's marked across the system. And then, while that data is being ingested into Redshift, we'll make sure that it follows the same masking and identification rules, as well as delete policies. And that ripples even into our backups as well, making sure that if we do receive a delete request, that that's being placed through to all of our systems and our backups as well. So currently that effort is manual. We have not received many; I think in the last 6 months I've only received 2 delete requests.
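The following is a small, hypothetical sketch of what carrying those field-level classification tags into the warehouse load might look like: columns tagged as PII are masked before the rows land in Redshift, while other columns pass through. The tag values, field names, and masking rule are all assumptions for illustration, not the actual Salesforce metadata model or the NRDC's pipeline.

```python
# Hypothetical sketch: apply field-level classification tags during ingestion
# so that columns marked as PII are masked (here, hashed) before loading into
# the warehouse. Tags, field names, and the masking rule are illustrative only.
import hashlib

FIELD_CLASSIFICATION = {
    "email": "PII",
    "home_phone": "PII",
    "giving_level": "Public",
}

def mask(value: str) -> str:
    """One-way hash so the value is unreadable but rows can still be joined."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def apply_policies(row: dict) -> dict:
    return {
        column: mask(str(value)) if FIELD_CLASSIFICATION.get(column) == "PII" else value
        for column, value in row.items()
    }

row = {"email": "donor@example.org", "home_phone": "555-0100", "giving_level": "Mid"}
print(apply_policies(row))  # PII columns are hashed, the rest pass through unchanged
```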
But as these right to be forgotten legislations become more prevalent, I'm assuming that there will be an uptick, especially as we see, you know, more states adopt them and, most likely, at some point, a US federal legislation come. Digging further into Salesforce,
[00:21:19] Unknown:
I'm tangentially familiar with it as a CRM and sales platform, particularly from when it was first introduced several years ago. And for anybody who isn't familiar with it, can you talk through a bit of the elements that are useful for it and some of the overall workflow that's involved in being able to take advantage of Salesforce, particularly as a data professional?
[00:21:41] Unknown:
So the reason that we like to use Salesforce, and what I've seen with Salesforce over the years, is a lot of work and focus being placed into the ability to export data and the ability to work with the data within their system. So a lot of work has been put into their bulk API and the ability to move records in and out quickly from Salesforce, in the tens of thousands or sometimes even hundreds of thousands of records within a batch. They also have their streaming API, which has enabled a lot more of those rapid, real time changes, as well as platform events and change data capture, which allow more of the modern integration design patterns. So a lot of work has been put into the platform from a data side that allows all of the new design patterns. And we're using a number of those capabilities and a lot of those different APIs today. So when we need to move records around in bulk, we're able to get records in and out very, very quickly, especially for processes that don't require a lot of those complex changes. So, you know, whether it's an update across, you know, millions of records or taking data out to do roll ups in an outside system, all of that is done extremely quickly. When we do get to some of those more complex data patterns, so, you know, things that do require multiple levels of transformation or a lot of rules within the transformation, we do need to look to outside tooling. So we could put this inside of Salesforce, and their coding platform, which is very similar to Java, is called Apex. We can put those transformations into, you know, a trigger and allow things to work there. But a lot of times, we're able to handle this off platform and then push the data in. Again, every nonprofit is going to be a little bit different when they're working with Salesforce. So some will do all of those transformations right on platform if they're dealing with a lower data volume. But because of our data volume and because of all the complexity in our system, a lot of times we'll do those transformations off platform and then use those different APIs to push everything in. But one of the key things with Salesforce is, again, that ease of adoption.
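As one way to picture the bulk pattern described above, here is a minimal sketch that pushes a batch of record updates through the Salesforce Bulk API using the third-party simple_salesforce Python library; that library choice, the credentials, and the record IDs are assumptions for illustration, and a production pipeline (or an ETL tool) would add retries and error routing.

```python
# Minimal sketch of a bulk update against Salesforce, assuming the third-party
# simple_salesforce library. Credentials, record IDs, and field values are
# placeholders; real jobs should add error handling and a batching policy.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="user@example.org",   # placeholder credentials
    password="********",
    security_token="********",
)

records = [
    {"Id": "003xx000004TmiQAAS", "MailingCity": "New York"},
    {"Id": "003xx000004TmiRAAS", "MailingCity": "Chicago"},
]

# The bulk handler submits the records as an asynchronous Bulk API job and
# returns one result per record.
results = sf.bulk.Contact.update(records)
for result in results:
    if not result.get("success"):
        print("Failed record:", result)
```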
It can be extremely difficult for a system's implementation to move from any legacy system over to Salesforce. But what I found in my career is that, you know, when you talk to people about, you know, how they used to work with these older systems, so, you know, Oracle or Microsoft, that those systems implementations,
[00:24:13] Unknown:
you know, were years long, where many times with Salesforce, it's a months long process. So it just takes that time down because so many of the decisions have already been made for you. 1 of the roles that I've seen is a Salesforce architect, which points to the level of flexibility and customization that you mentioned. And I'm wondering what you have found to be some of the common stumbling blocks or the innate complexities of the platform that users should be aware of as they're starting to either modify their existing implementation or start to onboard an organization onto Salesforce?
[00:24:49] Unknown:
I think Salesforce, like any other platform, can really suffer from just technical debt. And because Salesforce has implemented so many tools that are very friendly for admins and very declarative, so instead of, you know, requiring a CS degree to make a process or make a trigger, make something scalable, they have put a lot of tools into the hands of admins. So point and click tools where you can stand up workflow rules or triggers, and a lot of these data processes that are, you know, very seemingly complex and would require code in other systems. The problem with that is when you get to, you know, pages upon pages of automation that have been put into place by an admin without oversight or without long term thinking. So, hey, this business rule came up, and I made a workflow rule, and then it's in the system. Or they have process builders and flow and all of these declarative tools that they put into people's, tool belts. And that's great. And that's, you know, really fantastic for, again, that ease of adoption. But when you start to look at year 5 or year 10 on Salesforce, and when I would work with organizations that are in year 5 or year 10, I would see, you know, extremely slow system performance.
I would even see, you know, sometimes an inability to use specific portions of the system or get access to records because so much automation had been put into place. So when we do look at system architects, or technical architects, or engaging with a consultant, a lot of times our first work with an existing implementation and an existing org is starting to take a look through what is already in place, and where are the pain points within the system, where are things going slowly. As we look to organizations that are functioning well with an architect today, a lot of that work starts to become around, well, what can we do to make sure that page load times are working correctly?
So on the front end, making sure that things look correct and that these systems are architected for large data volumes. So again, if we build a Salesforce system with not a lot of automation in place and make it a more simple system, a lot of those design decisions don't have to be taken at the architect level. They could be taken at the admin level. But if we're starting to look across millions of records, we need to make sure that, you know, anything that goes into the system is well architected and has a design plan that makes sense. And little design decisions can make a big difference in things like record locking, user performance, ability to open records, ability to search for the correct records.
And all of that bubbles up and really falls into that role of an architect or, a trusted consulting partner. In addition to your current role at the NRDC,
[00:27:42] Unknown:
you have also taken it upon yourself to spin up a new nonprofit organization in light of the current global crisis that we're going through with the COVID 19 virus. I'm wondering if you can describe a bit of the nature of that mission and the organization that you're building up around it, and some of the goals that you have for that organization, and how you're hoping to make an impact on the current state of affairs. Absolutely. Yeah. So I was sitting around as we record about 2 weeks ago now and just looking at kind of the state of affairs and the state of the crisis and where experts were saying it was going to trend to. And having been in the nonprofit
[00:28:21] Unknown:
vertical for so long, I knew that, you know, not only was the virus going to impact it and, you know, the public health crisis, but as we look to all of the other impacts of this crisis as it unfolds, the market crash is a very big impact as well, because usually the first thing to dry up during this time is funding to nonprofits. And it's these community facing organizations that are really going to fill in those critical infrastructure gaps that are left as we go. And so making sure that our food banks, our animal shelters, our homeless populations, and so many other key areas of our society that are already served by these organizations still have a path to receive these services, in this new and changing way that we interact with each other, was really a focus. So when I took a look at, you know, all the things that are really unique and special about the Salesforce nonprofit community, one of the main things that we do is called a community sprint. And the Salesforce nonprofit team, called Salesforce.org, has been holding these community sprints for a number of years. And typically, these are in person, 2 day events where a whole bunch of professionals, so I think the last one that I was at was somewhere around 250 professionals, all fly in from around the country. We sit in a conference room, we break apart into small groups, so similar to a hackathon, and we just work on issues. And then we donate all that code back to the community and back to Salesforce's nonprofit platform, which is called the Nonprofit Success Pack. So all of that is open and available to any nonprofit that's on the Salesforce platform.
So really, what I was looking to do was say, hey. We need to have a sprint, and we need to have multiple sprints that can be done virtually so that we can start to bring these technology professionals together and start to give our time back to these community organizations during this time. So we need to instead of, you know, taking these off the calendar, we need to ramp up our efforts at this time. But more than that, it's, you know, not just about the Salesforce community, and it's not just about, you know, what we can do on this side of the fence. It's really a call to all technology professionals to stop and say, you know, during this time of self isolation, instead of just clearing out the Netflix queue or, you know, getting to those video games that you're looking to, you know, spend a little bit extra time on or whatever it is that you're doing to, you know, kill all this extra time. Turn your focus to the community. Who do you know in the community that could use a little bit of help? So whether that's helping somebody set up Zoom in an afternoon. You know, I had multiple conversations. We're just setting up Zoom so that people can meet face to face, have a conference call. These are things that we take for granted on a daily basis, a lot of these technologies. And just allowing that business continuity can mean the difference between the life and death of some of these organizations. So, you know, turn the focus around. So even if it's not specifically within the confines of, you know, what we're doing within these sprints, Just take a look at the community, see what you can do, and don't take the technology background that you have for granted.
And making sure that you're using it at this time is really our message.
[00:31:34] Unknown:
1 of the challenges with that type of approach, I imagine, is being able to scalably identify the specific needs of different organizations and then do some matchmaking with people who have an appropriate skill set to help fulfill that need. And I'm wondering what your approach is in terms of being able to handle that matchmaking. And particularly, since this is the data engineering podcast, what you see as being some of the unique data challenges that are posed by this situation and some of the needs for data professionals that might exist in the organizations and communities that you are currently working with? Absolutely.
[00:32:13] Unknown:
That's a great question. So, you know, when we take a look at things like case intake and, volunteer skill matching, that's an issue that the nonprofit community has been focused on for a very long time. So, you know, if you look at organizations like the United Way or other organizations that do mass amounts of volunteerism and really rely on the community to come in and lend a helping hand. Those are data challenges that we've been working on for years. So we have a lot of different case intake platforms, and a lot of different case intake processes.
And then matching those people to the correct volunteers, especially in a pro bono environment, is really about making sure that they're vetted and matched to the, you know, correct people with the right expertise, and having a QA person and a QA team that sits on top of that, that can kind of veto that decision as well. So it's not just people picking up a project, but really making sure that we have a QA team from the very get go, looking at who is being assigned and tasked to this initiative. And what we're creating is a hub that sits at the center of a lot of different groups. So this hub would be responsible for mostly the complex issues that can come out of this unfolding crisis. So if it's a food bank, we can take a look at, you know, designing a solution that links to the popular on-site platforms that are currently in place. So Microsoft has most of the food bank ecosystem currently with a platform called Microsoft Series. And if it's, you know, developing something so that we can have an integration between the 2, that's out of the box and allows these custom designs to come over to Salesforce.
So maybe it's an inventory tracking implementation where we can start to move that inventory over to Salesforce and into the Salesforce world. The reason that I bring up that example is, after Hurricane Harvey, I was in Houston and worked with the Houston Food Bank. And that was one of the initiatives that we did, starting to migrate their data from Microsoft Series over to Salesforce. And they were keeping their Microsoft Series instance. So this was a close to real time integration. We ended up doing a batch time of about 15 minutes between the 2 so that we could keep the 2 in sync. And what that allowed them to do was then use Salesforce's power to scale out. So they were able to go from a centralized distribution model to a more remote distribution model as well. So instead of just distributing from a single massive warehouse, they were able to set up different distribution sites and know exactly what food was needed at each different site, who they were distributing to at each different site, and it just allowed them to scale a lot. So they went up 5x on their distribution within the first few months after Hurricane Harvey hit. So without a technology that allows you to, you know, go off platform and do that, that transformation never would have happened. Again, this is on the complex side of all the issues. And when we take a look at solutions like that, that's a solution that a lot of data professionals can start to bring to the table, saying, you know, hey, if we need to make a very large scale transformation, is this something that I can engage with so that people can either migrate their data to a new platform in order to allow them to scale, or is this something where we can make an integration between these 2 platforms and allow this, you know, rapid transformation of scale? It doesn't just have to be food banks. It can be, you know, a lot of different community organizations.
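To sketch the shape of that near-real-time pattern, the skeleton below polls a source system every 15 minutes for changed rows and upserts them into Salesforce. Every function and name here is a hypothetical placeholder; the actual Hurricane Harvey integration was built with different tooling, and this is only meant to show the cadence-based sync idea.

```python
# Skeletal sketch of a 15-minute incremental sync: pull rows changed since the
# last run from a source system and upsert them into Salesforce on an external
# ID. All functions and names here are hypothetical placeholders.
import time
from datetime import datetime, timedelta, timezone

SYNC_INTERVAL = timedelta(minutes=15)

def fetch_changed_inventory(since: datetime) -> list:
    """Placeholder for querying the source system for rows modified after `since`."""
    return []  # a real implementation would query the source database or API

def upsert_into_salesforce(rows: list) -> None:
    """Placeholder for pushing rows to Salesforce keyed on an external ID field."""
    print(f"Would upsert {len(rows)} rows into Salesforce")

def run_sync_loop() -> None:
    last_run = datetime.now(timezone.utc) - SYNC_INTERVAL
    while True:
        changed = fetch_changed_inventory(since=last_run)
        if changed:
            upsert_into_salesforce(changed)
        last_run = datetime.now(timezone.utc)
        time.sleep(SYNC_INTERVAL.total_seconds())
```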
[00:35:48] Unknown:
And in terms of the challenges of the situation above and beyond what you have seen in your career with nonprofits, what are some of the things that stand out to you that you are currently working on trying to find solutions for and some of the support that would be most valuable to you as you continue on, this path and as the crisis sort of continues somewhat unabated?
[00:36:16] Unknown:
That's a really good question. And, you know, honestly, I've been trying to also find the balance for my personal life as well. Because, you know, as this crisis unfolds, I think it's impacting everybody in a different emotional way, outside of just technology systems as well. So I've been trying to build this really as a grassroots effort, allowing different leaders to speak and different leaders to step up and take action with my own initiative, and making sure that, you know, when there's a better solution, I get out of the way as well. Because I think as this unfolds, it will impact us all in just a number of ways. I know personally in my life, I've already had, you know, multiple phone conversations that in typical times I've just never had.
Whether it's, you know, people being laid off or regarding quarantine plans or making sure that, you know, coworkers and family are just safe, and making sure that they're taking the appropriate precautions. So there's a lot as this unfolds, and it will impact a lot of us on a personal level. So 1 of the things I'll say is, you know, this is again, it's a pro bono effort. I urge, you know, all technology professionals to say, what is my capacity to give? Don't give too much. So don't overpromise and say that you're gonna, you know, give everything. See what you can give. So if it's, you know, a couple hours on a weekend that you can set up a Zoom session and start to take a look through and offer your services, that can sometimes mean the difference between, again, business continuity, your community organization being able to stay alive and fill that critical infrastructure need, and possibly not. So but take a look at your own capacity first before you decide to give.
[00:38:01] Unknown:
And going back out to the point of tools and platforms that exist that are available off the shelf, either in terms of open source where you can build to fit or on hosted platforms. What have you found to be some of the most useful or beneficial in the current landscape of data management systems and best practices? And what are the areas that you feel need to be addressed or improved particularly for workers in the nonprofit sector? It's a great question. So we've had a lot of success,
[00:38:31] Unknown:
with a number of different ETL platforms and a number of different solutions. So specific to Salesforce, there is a platform called Validity. They offer a number of different tools specifically around data management. So whether it's merging records, identifying duplicates within your dataset, or just keeping it clean on an automated basis, it's a very robust platform. It's very similar to the old Google Refine, which I believe switched to OpenRefine a few years ago. So it's a very similar set of functionality, but very robust specifically for Salesforce.
So we've been working with their platform for a long time, specifically around keeping our data clean on the Salesforce platform. And that's really within our entire data architecture; Salesforce is where most of our data comes in. When we look at the rest of our data ecosystem currently, you know, there are things like Amazon Redshift. Amazon Redshift has allowed us to just move at lightning speed compared to a lot of the other platforms that we had been considering, and especially compared to if we had tried to build this ourselves and host this, you know, on a SQL Server or something like that internally. So the ability to just ingest a ton of data, the compression around that, the ability to manipulate that data inside and then push it out to these respective systems: again, it was something in my career that I didn't know I was missing until I had it. And so, you know, I wish that I had worked with that a lot sooner. Besides that, you know, there are some tools off the shelf that I've worked with at a number of nonprofits that are very helpful.
So Jitterbit, you know, has mixed reviews within some of the community, but I've had great success with a number of nonprofits using that as an ETL platform. And even as we go into kind of the micro side, there's a great platform called Skyvia, S-K-Y-V-I-A, which offers some limited functionality. But when we just need some bare bones, you know, batch functionality, typically to move a CSV, the design pattern is usually a CSV from an FTP over to Salesforce. It allows even an admin to come in and stand up a lot of that without a lot of cost overhead. So the technology has come down in cost so that it is accessible to a lot of these nonprofits.
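For readers who want to see that "CSV from an FTP over to Salesforce" pattern spelled out, here is a hand-rolled Python sketch of the same flow that a tool like Skyvia or Jitterbit sets up declaratively. The host, file path, object, external ID field, and the use of the simple_salesforce library are all assumptions for illustration.

```python
# Hand-rolled sketch of the batch pattern described above: fetch a CSV from an
# FTP drop and upsert its rows into Salesforce. Hostnames, paths, fields, and
# the simple_salesforce dependency are placeholders/assumptions.
import csv
import io
from ftplib import FTP
from simple_salesforce import Salesforce

def download_csv(host: str, path: str) -> list:
    buffer = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login()  # anonymous login for the sketch; real drops use credentials
        ftp.retrbinary(f"RETR {path}", buffer.write)
    buffer.seek(0)
    return list(csv.DictReader(io.TextIOWrapper(buffer, encoding="utf-8")))

def push_to_salesforce(rows: list) -> None:
    sf = Salesforce(username="user@example.org", password="****", security_token="****")
    # Upsert on a hypothetical external ID field so reruns stay idempotent.
    sf.bulk.Contact.upsert(rows, "External_Id__c")

rows = download_csv("ftp.example.org", "/exports/daily_donors.csv")
if rows:
    push_to_salesforce(rows)
```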
And that has been a great help in order to just make sure that this data is moving from system to system or from place to place no matter how complex it is. Are there any other aspects of your work at the NRDC
[00:41:03] Unknown:
or the tech workers' task force or just your overall experience of working in nonprofits as a data professional that we didn't discuss that you'd like to cover before we close out the show? I think this is a good overview of,
[00:41:16] Unknown:
those. I would just mention again, you know, if you are looking at the tech worker task force, that it is meant mostly for Salesforce professionals, though we could use more data professionals in there. But really, the focus is, again, just turning back to your community and reaching out to either local user groups or just local organizations and saying, what help do you need at this time? Because you'll be shocked at how often it's a very simple issue that may not be 100% what you're doing today, but something that, you know, could be,
[00:41:52] Unknown:
setting up a Slack community in an afternoon so that, discussions can continue. So, you know, very easy things can happen, and very easy solutions can have a very large impact at this time. Well, for anybody who wants to follow along with you or get in touch or offer their help, I'll have you add your preferred contact information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. So I'll answer this specifically for nonprofits. And really, the biggest gap I see is just a standard unified
[00:42:24] Unknown:
tool for nonprofits to use that is easy to use across all these different platforms. So right now, there are some platforms, so Workato, and, again, I mentioned Skyvia before, that do allow a more admin experience, so a more declarative, point and click integration experience. But on one side, you're hit by limitations with Skyvia because of its low cost; it's not a very robust feature set. And on the other end, with Workato, a lot of the nonprofits get hit just by the price. So it's kind of a hard sell between the 2. So finding a nice middle ground where we can, you know, start to automate some of these data platforms and automate these data flows for smaller nonprofits and smaller datasets has just been difficult, especially given that they don't have the funding in order to hire an on-site developer and have the staff. So I would like to see some tooling that falls into that kind of middle ground. So easy to use, easy to stand up for an admin, but not hitting the budget in a very harsh way. Well, thank you very much for taking the time today to share your experiences
[00:43:32] Unknown:
working with Salesforce and in the nonprofit community. It's definitely an interesting and valuable area of effort. So thank you for all of your time and effort on that front. And I hope you enjoy the rest of your day. Thank you, Tobias. It was a pleasure being here. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Tyler Colby and His Role
Tyler's Journey into Data Management
Modernizing Data Systems at NRDC
Challenges in Data Variety and Cleanliness
Build vs. Buy Decisions in Nonprofits
Choosing Salesforce and Redshift
Data Privacy and Compliance in Nonprofits
Salesforce as a Data Platform
Common Stumbling Blocks in Salesforce
Tyler's New Nonprofit Initiative
Matching Volunteer Skills with Needs
Balancing Personal Life and Professional Efforts
Useful Tools and Platforms for Nonprofits
Final Thoughts and Call to Action