Summary
Data is often messy or incomplete, requiring human intervention to make sense of it before being usable as input to machine learning projects. This is problematic when the volume scales beyond a handful of records. In this episode Dr. Cheryl Martin, Chief Data Scientist for Alegion, discusses the importance of properly labeled information for machine learning and artificial intelligence projects, the systems that they have built to scale the process of incorporating human intelligence in the data preparation process, and the challenges inherent to such an endeavor.
Preamble
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- Your host is Tobias Macey and today I’m interviewing Cheryl Martin, chief data scientist at Alegion, about data labelling at scale
Interview
- Introduction
- How did you get involved in the area of data management?
- To start, can you explain the problem space that Alegion is targeting and how you operate?
- When is it necessary to include human intelligence as part of the data lifecycle for ML/AI projects?
- What are some of the biggest challenges associated with managing human input to data sets intended for machine usage?
- For someone who is acting as a human-intelligence provider as part of the workforce, what does their workflow look like?
- What tools and processes do you have in place to ensure the accuracy of their inputs?
- How do you prevent bad actors from contributing data that would compromise the trained model?
- What are the limitations of crowd-sourced data labels?
- When is it beneficial to incorporate domain experts in the process?
- When doing data collection from various sources, how do you ensure that intellectual property rights are respected?
- How do you determine the taxonomies to be used for structuring data sets that are collected, labeled or enriched for your customers?
- What kinds of metadata do you track and how is that recorded/transmitted?
- Do you think that human intelligence will be a necessary piece of ML/AI forever?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Alegion
- University of Texas at Austin
- Cognitive Science
- Labeled Data
- Mechanical Turk
- Computer Vision
- Sentiment Analysis
- Speech Recognition
- Taxonomy
- Feature Engineering
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40 gigabit network, all controlled by a brand new API, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don't have time? DataKitchen's DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality.
Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement today and sign up for the newsletter at datakitchen.io/de. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. And go to dataengineeringpodcast.com
[00:01:18] Unknown:
to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey. And today, I'm interviewing Cheryl Martin, chief data scientist at Alegion, about data labeling at scale. So, Cheryl, could you start by introducing yourself?
[00:01:33] Unknown:
Sure. I'm Cheryl Martin. I recently joined Alegion as the chief data scientist after working more than 10 years as a research scientist at the University of Texas at Austin. My background as a data scientist is on the machine learning, AI, and software engineering side of the disciplines, and I also have some background in psychology and cognitive science. The intersection of all those disciplines is a pretty exciting place for me to work. And how did you first get involved in the area of data management? Through necessity.
The first thing you do as a data scientist is look at your data. And it doesn't take long before you find yourself needing to wrangle it a bit, or a lot in some cases, to get it to fit your needs. Data scientists end up doing a lot of data management work before they get to do the science part most of the time. You have to clean up some missing values or correct some errors, add some additional information from different sources, you name it. It all needs to be done before the data science part can start. As I mentioned,
[00:02:43] Unknown:
we're gonna be talking largely about data labeling and collecting source data for use in machine learning and AI projects, and that's a big part of what you're doing at Alegion. So can you just describe a bit about what the company does, how you operate, and how you approach that problem space?
[00:03:03] Unknown:
Sure. Yes. So Alegion provides a platform that uses human intelligence to perform a variety of data operations. That includes things like data collection, extraction, evaluation, cleaning, enrichment, normalization, and so forth. We usually talk about our platform in terms of AI enablement and building up training data for machine learning. That ranges across a lot of different applications like computer vision initiatives, speech recognition, sentiment analysis, image moderation, and things like that. Once an AI model is trained, our platform also supports model validation, which is basically checking the model's results against human-level judgments.
And it also supports exception handling. Once a model is in operation, that allows the customer to send input to us for human processing if their AI model is producing low-confidence results. The data operations that we provide aren't exclusively useful to AI developers, though. Many of our customers are thinking about doing AI one day, or they're just exploring future AI opportunities, but they need the results in their operations already, and they can use our workforce to provide those data operations.
[00:04:34] Unknown:
There have been some other attempts and approaches at incorporating human intelligence and human feedback in terms of data collection and data processing. I'm wondering if you can compare and contrast with some of the prior models, such as what's offered with things like TaskRabbit or Mechanical Turk, versus what you're able to do, both in terms of your approach and in terms of the scalability of those solutions?
[00:05:03] Unknown:
So we actually started out as a partner for Mechanical Turk, and we still work with Mechanical Turk, or MTurk, through their interfaces. We send some of our tasks to the MTurk public crowd for processing. So I'd say we're similar to Mechanical Turk in some ways, in that we provide that kind of human intelligence task interaction with data. What we also provide in our platform is access to a set of private crowds. We provide a service layer where we interact with the customers to provide a quality assurance layer on top of the results. So we're looking out for the quality of the results, the quality of the workers, fraud detection, and those sorts of things. And we work very closely with our customers to provide the results that they need without them having to become experts in a particular platform or its interfaces, APIs, or technology.
[00:06:23] Unknown:
And what are some of the types of projects for machine learning or AI pipelines where it's necessary to incorporate that human intelligence and human feedback as part of the data life cycle for those projects?
[00:06:39] Unknown:
This actually extends beyond AI, but, basically, if you can provide a human with instructions for how to do something that you need done with data, but you don't have the capability to program a computer to do it, or to do it well yet, even if you're working on it, then it's extremely valuable and cost effective to apply human intelligence to do that task. Some of the best examples of this type of task are in computer vision, where the task may be to identify and mark all the moving cars in a video or find all the faces in an image.
Humans across the board are extremely good at doing tasks like this with very little additional training, because we use our visual systems constantly to interact with and move through the world without bumping into things. There's even evidence that infants from very early days are distinguishing faces from other objects. So our visual systems are optimized, biologically and through experience, to perform these tasks, and human intelligence in this case far outpaces what we can do with AI today. So if we need that data to build these AI systems, then sending it out to a human workforce to get the targets for what we want the AI to do is definitely the way to go. Is it necessary?
I would say that's kind of a strong word. It's generally a matter of practicality and return on investment rather than it being strictly necessary. You can maybe go faster, explore different approaches, or avoid having to solve the whole problem at once if you're using human intelligence to deploy incrementally or try different approaches. But you can maybe do it without human intelligence. It just might not be as practical.
[00:08:45] Unknown:
Yeah. And one of the common use cases for that manual data labeling is for these visually oriented tasks, because of the inherent difficulty in training computer vision systems to recognize arbitrary types of objects if you don't have that labeled training data. As you said, it is potentially possible, but it requires a lot more data and a lot more engineering time to ensure the accuracy of those results. Whereas if you have an accurately labeled dataset, where you're confident in the labels being applied to those images, you can train with a much smaller set of actual data and produce much greater confidence in the trained model's outputs.
Yes. That's absolutely right. And what are some of the other types or formats of data or approaches that you find you're most commonly asked to help with, besides visually oriented training data?
[00:09:42] Unknown:
So a lot of the data work is verification. You may have data entry errors where you're trying to correct OCR output, and it's filling out date fields with values that are not dates, so they've got characters that don't belong in date fields, or maybe Social Security numbers with characters that don't fit there. So there's data cleaning. There's sentiment analysis, where you read a section of text and label it: yes, this is a positive sentiment toward this movie review or book review, this is negative, this is neutral. There are lots of natural language processing or speech recognition tasks, like simple transcription: training a computer to recognize what people are saying by giving it the spoken audio and then the transcribed text that it should be targeting.
There are also images where you're not necessarily labeling what's in the picture for object recognition, but deciding whether they fit a certain set of guidelines. Lots of websites that accept user-generated content don't want inappropriate or illegal things placed on their site, so they have to have a review process with a very quick turnaround that determines whether it meets their guidelines or not. Those are the kinds of things you label yes or no, and they're things that computer vision type tasks are able to do to some extent, but not in a complete way at this point.
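As a rough illustration of the verification work described above, here is a minimal sketch; the field names, patterns, and routing logic are hypothetical, not Alegion's pipeline. It flags records whose date or Social Security number fields contain characters that don't belong, so a data specialist can correct them.

```python
import re

# Hypothetical format checks; real tasks would use customer-specific rules.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # e.g. 2018-05-21
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")    # e.g. 123-45-6789

def needs_human_review(record: dict) -> bool:
    """Return True if any field fails its format check and should be
    routed to a human specialist for correction."""
    checks = {"date": DATE_RE, "ssn": SSN_RE}
    return any(not pattern.fullmatch(record.get(field, ""))
               for field, pattern in checks.items())

records = [
    {"date": "2018-O5-21", "ssn": "123-45-6789"},  # OCR read '0' as 'O'
    {"date": "2018-05-21", "ssn": "123-45-6789"},
]
review_queue = [r for r in records if needs_human_review(r)]
print(len(review_queue))  # 1 record flagged for human correction
```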
[00:11:36] Unknown:
And we've been talking a lot about the labeling aspect of human intelligence in the process of data collection, but there's also the possibility of having unlabeled datasets and using a human to collect them, because they're either not easily obtainable or not already in a digital form, and you need someone to do that manual process of either transcription or just collecting the assets and aggregating them together. So I'm wondering if you can describe a bit about the types of datasets that might be useful in that way as well.
[00:12:10] Unknown:
That's a less common use case for our platform. We do data collection, but it's often, say, address verification: I have an image of this building, and it's on this street, but I don't have the address field filled out for it completely. So it's a little bit more granular than collecting, say, large works or even generating creative text. There's a lot of crowdsourcing-type work, which is related to what we do, that really shows benefits from sourcing creative content from a crowd. We certainly have the ability to do that, but our typical customers are focused on the more granular labels: yes or no, an image tag or a tagging-type thing, or a free text field, maybe with strict descriptive labels.
[00:13:18] Unknown:
And in the process of all of this, the underlying factor that a lot of people don't necessarily have the skill set or knowledge or capacity for is actually being able to find, train, and manage the actual human workforce behind all of these collection and labeling efforts. So what have you found to be some of the biggest challenges associated with that aspect of the work? I'd definitely say that the workforce
[00:13:47] Unknown:
curation, as we call it, and then training are some of the biggest challenges, along with the ongoing quality control that we do to ensure that the results being produced by the human workforce are accurate and to the specifications that the customer needs. Those are kind of the fundamentals that make all of this work, along with having the engineering platform to dispatch the tasks and move the data around. The types of tasks that people are doing on our platform typically ask for very specific answers. It could be down to yes or no: is this image appropriate for our user-generated content guidelines? It could be selecting from a list of labels for sentiment analysis, or typing in free text fields. It could be drawing a point on an image or a bounding box, whatever data is required by the customer. So I think the biggest challenges are these user interface challenges: are the tasks presented well to the user, and are the instructions understandable? One of the most important challenges that we really focus on is taking good care of the people in our workforce, and that allows us to curate and keep good crowds. It's something that we're always striving to do. We train people to work on specific customer tasks on an ongoing basis, and we call them data specialists. One thing that I would like to emphasize is that we always know that these are real people out there working on these tasks, and most likely they're trying to use the money they earn to meet basic needs for themselves and their families. We want to make sure that they can earn a wage that they can live on, which means that they need to be able to do these tasks efficiently. So the user interface design and user experience design needs to allow them to avoid a lot of extra steps and more effort than is needed. There's a lot of technology that goes into the platform to make that a smooth process for them, and that is a big challenge.
It's human nature to want to do a good job, and we have to design our user interface to make sure that they can do that with minimal friction. Our lead UX designer is working all the time with members of our workforce to see how we can do a better job on that. It's not necessarily the first thing you think of in terms of a challenge here, but that part is really critical for making the workers' experience, and essentially their livelihood, work in this whole system as an economy while providing these labels and this data processing as human providers.
[00:16:47] Unknown:
And is the work that you're sending to these individuals largely undifferentiated, such that anybody who has been onboarded onto your platform can just pick it up and run with it? Or do you find that it's occasionally necessary to have certain domain experts available for particular sets of tasks? Or do you have maybe people who are focused primarily on things like sentiment analysis or image labeling? We do have
[00:17:16] Unknown:
some workers who, through preference in terms of the tasks that they like to do, or feel they're good at, or feel they're efficient at, or whatever internal reward they get, will choose certain tasks. There's also a set of training for each type of task. It could be as simple as reading a set of instructions and answering a few verification questions, but every data specialist that we have has to qualify to do a particular type of task. So there are some restrictions in that: as a person on the platform, to work a task you need to pass that qualification, whatever the bar is. Sometimes it's learning a new technical vocabulary, or the ability to provide a kind of grade on some abstract concept like darkness, or something that the customer wants to be able to automatically recognize. There are also reasons we might restrict a person's ability to work on a task. For example, if a customer has proprietary data, they might need a set of workers who have signed an NDA. So some restrictions may come into play there, and we provide on our platform a lot of different ways of putting workers into groups based on qualifications that may be skill based, experience based, or based on these kinds of legal or contractual arrangements.
And for somebody who is working on your platform, what does their typical workflow look like? Our data specialists first sign up on our platform and get an account, then establish their payment mechanisms and whatever reporting information is required wherever they might be located in the world. Right now, we only have private access to our sign-on process and platform UI, although we still have that relationship with MTurk to provide tasks to workers in the public MTurk crowd through their interface. The private sign-ons that we have allow us to support one of our missions of social outreach, where we are working with developing countries and nonprofit organizations throughout the world, including the US, to provide work to people in the communities served by those organizations.
They actually help us connect with people who need work, and they sometimes even provide the infrastructure, like the building and the computers, for these people to come and do the work. Once a person has signed up to work and be paid, we open up various options for qualifying for the set of tasks available to them, provided they meet any of those legal restrictions. Most of our tasks are available to a wide variety of people, and they can do the qualification and training to gain entry into that workforce group. Then batches of tasks that are relevant to their qualifications appear on a list on their personal dashboard that they can select from, and they look at the different types of tasks they have available. It may be a wide variety of things based on the qualifications and interests that they have. They select a set to work on, and then they work tasks in that batch, of that type, for as many as they want until they're done, and then they can go back and switch to a different type of task, or stop and take a break. It's up to them. These tasks are generally pretty short, sometimes just yes/no questions, so they pretty much have control over the number of tasks or the length of time they work each session.
[00:21:36] Unknown:
And when you're doing this kind of work, it's easy to potentially get into a rhythm where you might accidentally mislabel data or maybe miss something. So I'm wondering what kinds of tools or processes you have in place to help ensure the accuracy of the inputs that are provided, or if there's any means for them to self-identify that an error was made and go back and correct their mistake before it gets submitted into the training set. Answers
[00:22:02] Unknown:
are typically very short. So once they submit a task, we typically don't see workers realizing, maybe after two or three tasks have gone by, that they've made a mistake on a previous one, although that does happen, and they have out-of-band ways to handle it. They'll contact us and say, I just realized that I didn't understand something. They really do want to do a good job, and they really care about providing quality work. So if they do contact us, that's out of band in terms of the task workflow.
It's not part of working the task; it's through a separate reporting mechanism, but then we can go back and identify their task. In general, for the online performance of the individual tasks, we start out with the training qualification and make sure that they initially understand the task. But then we have a lot of quality control mechanisms in place in the platform, and some of these are fairly typical. You have consensus, where you send a task to multiple workers and get a plurality of judgments. We have a review-and-escalate framework, where the specialists check each other's work, so they get to review what another specialist has done and verify or correct it. We also have quality sampling that we do across tasks, and we check those results.
And we also have continuous testing against known data, where a specialist will get a task that isn't actually producing new results but that we already know the answer to. This is where a domain expert is very helpful in setting up these known good answers, so that we can compare a worker's generated result to the known good result, and that can be an automatic comparison. So there's a lot of constant online monitoring and checks and balances in the platform.
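To make the consensus and known-answer checks concrete, here is a minimal sketch of how those two mechanisms are commonly implemented; the function names and data shapes are illustrative assumptions, not Alegion's platform code. It takes a plurality vote over several workers' answers to one task, and scores a worker against seeded gold tasks whose answers a domain expert has already provided.

```python
from collections import Counter

def consensus_label(judgments):
    """Plurality vote over the answers several workers gave for one task."""
    label, votes = Counter(judgments).most_common(1)[0]
    return label, votes / len(judgments)   # winning label and agreement rate

def gold_accuracy(worker_answers, gold_answers):
    """Fraction of seeded known-answer tasks this worker got right."""
    scored = [t for t in gold_answers if t in worker_answers]
    if not scored:
        return None  # worker hasn't seen any gold tasks yet
    return sum(worker_answers[t] == gold_answers[t] for t in scored) / len(scored)

# Three workers label the same image-moderation task:
print(consensus_label(["approve", "approve", "reject"]))   # ('approve', 0.666...)

# One worker's answers compared against two seeded gold tasks:
print(gold_accuracy({"t1": "approve", "t2": "reject"},
                    {"t1": "approve", "t2": "approve"}))    # 0.5
```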
[00:24:35] Unknown:
And I imagine that that also helps mitigate situations where someone may be purposefully adding bad labels or incorrect data to try to poison the dataset. I'm sure it's a rare occurrence, or maybe something that's never happened, but there is always the potential for that when you have humans with their own motivations or foibles.
[00:24:58] Unknown:
Right. I think the most common type of bad results from individuals would be the fraudulent actor, where someone may be trying to build a script that will just go in and submit as many answers as possible, kind of randomly, to try to get paid for work that they're just scripting. That's not unheard of. Fortunately for us, the vast majority of the people that come to work on our platform are generally there to work hard and earn money, and we appreciate them so much.
Some of our mission partners will do some amount of screening for people as they're coming to be onboarded onto our platform. But even in the majority of cases where that doesn't happen, we are seeing high-quality workers who want to come in. You do have bad actors in every kind of environment, though, who are trying to game the system. The quality control measures that I was talking about earlier help us identify when bad results are coming in, and we can generally eliminate those from the result batch before the specialists get paid and before the results are sent to the customers. And we also have some techniques in place to look for the types of activities that would be indicative of fraudulent behavior, like looking for a script: looking at the timing of the submissions, and if it's too fast for human actions, then that is a tip-off that it could be fraudulent activity.
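A simple version of the timing check she describes might look like the sketch below; the two-second floor, the submission log format, and the use of a median are assumptions for illustration rather than details from the episode. Workers whose typical gap between submissions is faster than any human could read and answer a task get flagged before payment or delivery.

```python
from statistics import median

# Assumed minimum plausible time, in seconds, to read and answer a task;
# a real platform would tune this per task type.
MIN_HUMAN_SECONDS = 2.0

def flag_suspicious(submissions):
    """Flag workers whose median gap between submissions is too fast to be
    human. Each submission is a (worker_id, unix_timestamp) pair."""
    by_worker = {}
    for worker_id, ts in sorted(submissions, key=lambda s: s[1]):
        by_worker.setdefault(worker_id, []).append(ts)

    flagged = set()
    for worker_id, stamps in by_worker.items():
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        if gaps and median(gaps) < MIN_HUMAN_SECONDS:
            flagged.add(worker_id)
    return flagged

print(flag_suspicious([("w1", 0.0), ("w1", 0.3), ("w1", 0.6),     # scripted pace
                       ("w2", 0.0), ("w2", 8.0), ("w2", 17.0)]))  # human pace
# -> {'w1'}
```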
[00:27:00] Unknown:
And what have you found to be the limitations of crowdsourced data labeling, or the types of datasets that don't lend themselves well to having that human feedback at a large scale?
[00:27:12] Unknown:
There are some limitations intrinsic to human performance. I'm not sure that it's related to the types of tasks or whether it's a large-scale task, but humans are notoriously inconsistent at times. They get tired, they have a bad day, whatever. We understand that no one's perfect all the time. Even domain experts get things wrong; they are not immune to the inconsistency issues that humans always have. So the quality control mechanisms help us guard against that.
In terms of these large-scale collections, scale is not actually a limitation of our system; scale is actually one of its strengths. For any given customer on a given day, we typically turn around tens of thousands of data records, and that might come in spurts. So a customer sends us tens of thousands of data records for three weeks, and then they go quiet and do some other things with the data, and then they come back. It wouldn't be unusual to see a million records a day, and we already see that kind of volume if you add customers together on our platform. If you have a good platform with a good interface and a good workforce, and you have these quality control mechanisms in place, I think there are fewer limitations in this type of work than in other approaches, because humans can bring so much basic experience and common sense to tasks that we just can't achieve with automation right now. So the limitations that we see are really only limited by our imagination. We work with customers to help them get their tasks into a format, and develop the user interfaces, so that a set of workers can provide the input they need, and it happens at scale.
[00:29:31] Unknown:
And one of the potential complications for these types of datasets is if you have a complex taxonomy that you need to record, or if there's a lot of additional metadata that needs to be gathered or filled in. So I'm wondering what your approach is as far as developing these taxonomies or structuring the datasets that are collected or enriched for customers, to help ensure the accuracy and the utility of the generated data, and also make it easy for the individuals working with that data to ensure that there is consistency in how that data is structured.
[00:30:16] Unknown:
Right, and that's actually an excellent set of questions. In terms of a taxonomy, customers usually come to us with a set of things they need labeled, or with the organization of the taxonomy they're trying to get data into. But if they're early in their process, we would usually work with them to develop one, and we've definitely done that. We may suggest standards that could be used, based on our experience. But then there are complex tasks that have multiple judgments, say, where you're making multiple judgments on an image or a piece of text, and they may be layered judgments.
If the volume is small enough, you know, if it's a thousand records, we may just send it through multiple workers and then try sending it through again in a different order. But if the volume is large, we spend a lot of time working on the user interface and different ways of decomposing how the questions are presented to the worker, so that they're in cognitively bite-sized chunks, in the right way that gives the workers the best opportunity to give the best results. We might do some A/B testing, where we send it to a set of data specialists one way and then we try another. Then we work with the customer at the beginning of a large-volume project and say, let's have your experts look at this small set of results.
What kinds of trends are you seeing? Which one seems to be working out the best? So the task design and the user interface design for complex tasks can be quite a project in itself, and that's something that we work really hard on, using a lot of input from the psychology and user experience literature to understand, for example, if people are annotating a video, is it better for them to focus on one kind of object in every frame, or is it better for them to label all the objects in a certain frame at once? In a lot of cases, there are actually studies about how people process that for the most efficiency and accuracy, and we'll pull that in whenever we can.
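One way to picture the decomposition of a layered judgment into bite-sized questions is the hypothetical sketch below; the taxonomy, prompts, and two-step flow are invented for illustration, not taken from Alegion's task designs. A single layered judgment about an annotated object becomes a top-level category question plus a conditional follow-up.

```python
# Hypothetical two-level taxonomy: each top-level category has sub-labels.
TAXONOMY = {
    "vehicle": ["car", "truck", "bus"],
    "person": ["pedestrian", "cyclist"],
    "other": [],
}

def decompose(object_id):
    """Split one layered judgment into small single-choice questions:
    first the top-level category, then a follow-up only where one exists."""
    questions = [{
        "object": object_id,
        "prompt": "Which category best fits this object?",
        "options": list(TAXONOMY),
    }]
    for category, sublabels in TAXONOMY.items():
        if sublabels:  # follow-up shown only if the first answer matches
            questions.append({
                "object": object_id,
                "ask_if_category": category,
                "prompt": f"What kind of {category} is it?",
                "options": sublabels,
            })
    return questions

for q in decompose("frame12_box3"):
    print(q["prompt"], q["options"])
```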
[00:33:04] Unknown:
You've referenced a few different studies as we've been speaking. So I'm just curious if you have participated in or conducted any of your own or contributed data to any studies that are either ongoing or recently completed?
[00:33:17] Unknown:
Not here. I have only been at Alegion for a short time. These are mostly studies that I'm familiar with from the literature, or things from my previous life. My own experiments, in my background, were primarily with algorithmic machine learning performance. The studies that I'm interested in from my psychology and cognitive science background are ones that I haven't actually done, but I'm familiar with them through reading them and being an avid fan of that type of work. I think it's very important to consider how people are perceiving things and how they are able to be the most successful, by understanding the human-computer interaction factors as well.
[00:34:16] Unknown:
And what kinds of metadata do you track as part of the overall process of working with these datasets? And how do you record and transmit that information and ensure that it stays collocated and easily associated with the data as it progresses through its various stages?
[00:34:34] Unknown:
So for metadata, our records and our tasks are typically very small, in that a single task has an input record and a result record, and compared to the volume of the overall platform, that record is quite small. We monitor information about where it came from, how it's being dispatched, how long it's been waiting in a queue, and how long it's been taking, as well as the activities of the workforce as they act on the task. We also monitor the health and welfare statistics of our platform, in case things are acting in an odd way for technology reasons, like tasks timing out when they're loaded before a data specialist even has a chance to view them. We're monitoring those kinds of health and welfare stats all the time for our platform. So I think those are the two categories: the health and welfare of the platform, and the activity of the workers.
The records themselves are fairly small packages that can be treated fairly independently. So a lot of our metadata is collected in the browser, associated with the results and the input record, and then stored in our back end.
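As a rough picture of what such a small, self-contained record might carry, here is a hypothetical sketch; the field names and timing properties are illustrative assumptions, not Alegion's actual schema. Each result travels with the identifiers and timestamps needed to compute queue wait and working time later.

```python
from dataclasses import dataclass, field
import time

@dataclass
class TaskResult:
    """One small, self-contained result record with its own metadata."""
    task_id: str
    input_record_id: str          # ties the answer back to the source record
    worker_id: str
    answer: dict                  # e.g. {"label": "approve"}
    source: str                   # where the input record came from
    enqueued_at: float            # when the task entered the queue
    dispatched_at: float          # when a specialist picked it up
    submitted_at: float = field(default_factory=time.time)

    @property
    def queue_wait_seconds(self) -> float:
        return self.dispatched_at - self.enqueued_at

    @property
    def work_seconds(self) -> float:
        return self.submitted_at - self.dispatched_at

now = time.time()
result = TaskResult("t-42", "rec-7", "w-3", {"label": "approve"},
                    source="customer-upload", enqueued_at=now - 90,
                    dispatched_at=now - 12)
print(round(result.queue_wait_seconds), round(result.work_seconds))  # 78 12
```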
[00:36:15] Unknown:
Then we sometimes use the metadata for calculations on different kinds of metrics and the learning needs that we might have. Yeah. And I imagine that keeping the individual records small is also very beneficial given that, as you mentioned, some of your workforce is in developing nations or rural areas without easy access to high-speed Internet or broadband for pulling down large data files or large chunks of datasets at once. So I imagine that factors into those design considerations as well, as far as how the data is distributed and transmitted, and the amount of extra information that you track, so that you're not artificially inflating the amount of data that's getting passed back and forth and reducing the usability of the platform as a result. That's
[00:37:00] Unknown:
absolutely right. We don't want our workforce to have to sit and wait through long load times for a task, because that impacts their experience, their ability to get work done, and essentially their ability to make a livable wage. So that's exactly right. And in terms of the
[00:37:27] Unknown:
sort of storage and distribution characteristics of those datasets, do you have a sort of standard set of technologies that you're using for managing that? Or is it largely based on sort of how the data is delivered to you from the customer and how you are able to portion it out as a result? Fairly
[00:37:48] Unknown:
customer focused, in terms of being flexible with the types of input records or data that they are providing, and making our platform very flexible to accommodate a wide variety of what they're bringing to us. I'm actually not deeply familiar with our full technology stack currently, so I'm not the best person to ask about the details of what technology stack we're using to move data around, unfortunately.
[00:38:22] Unknown:
As you work more in this space of human intelligence as an input to building, enhancing, and enriching datasets, do you think it will remain a practice that is useful and makes sense from a cost-benefit standpoint as we continue to progress in our ability to build machine learning and AI pipelines? Or do you think there will eventually be a tipping point where the algorithms that we have reach a level of sophistication where we can begin working more with largely unlabeled datasets that don't require as much human intervention?
[00:39:05] Unknown:
So I think we do see that progression, moving from highly feature-engineered approaches to the ability to deal with more raw data and essentially skip the step of this human overlay of conceptual structure that we tend to use as humans to explain our reasoning. We have already seen that increases in computing power and storage, and the algorithms associated with them, as well as the amount of data that we have in some applications, have essentially eliminated the need for that conceptual reduction of the features in some ways; you can just use them all, with trillions of records of data.
So I think we'll always see advances where, in certain areas, we'll have the need for human labels, and then the technology will progress to the point where we won't need those particular labels anymore. But even though I'm a never-say-never kind of person, the practical part of me says that we'll never actually finish everything that we want to do with AI. We're always going to be reaching for that next step, and we're not going to be sure how to do it. We are not, right now, making self-driving cars that just have cameras, are connected directly to the braking and steering systems, and are told, just learn from all the things you can observe, because we frankly don't have the amount of data that would be required to build that system, and we're still learning how to do this. So we're always going to need that human performance and that human-level support to reach for the next step. And I think this ability to incorporate human intelligence in the development of AI is where we'll always see the next opportunity.
I also really believe that this idea of providing meaningful work to people who need and want to work is a hopeful message for the future of AI. It's really the counternarrative to the common perception that automation and AI and machine learning are taking jobs away from people; instead, it actually can provide jobs and improve the quality of AI services at the same time.
[00:42:20] Unknown:
And also, I think that as some of these algorithms become more sophisticated and able to process more unlabeled datasets, the greater need will continue to be at the small and medium scale of data, where you don't have enough raw data to process to create these trained models. So you need to have confidence in the smaller set of data in order to produce a model with a sufficient degree of accuracy for the task you're putting it to. That's absolutely right. The smaller the dataset,
[00:42:54] Unknown:
the more that quality matters, because you cannot overcome the inaccuracies in your dataset without a bulk of data in which the noise is essentially evenly distributed. The noise needs to cancel out, and you need a lot of data in order for that to happen. So you're absolutely right: the smaller the set of data that you have to work with, the more you need a high degree of accuracy in the data that you're using.
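Her point about noise only canceling out at volume can be illustrated with a tiny simulation; this is not from the episode, and it assumes zero-mean labeling noise on a numeric target. The error of an aggregate estimate shrinks roughly with the square root of the number of records, so a small dataset cannot rely on averaging and needs cleaner labels from the start.

```python
import random

random.seed(42)

TRUE_VALUE = 10.0   # the quantity the labels are supposed to capture
NOISE_STD = 2.0     # zero-mean labeling noise, assumed evenly distributed

def mean_label_error(n_records, trials=20):
    """Average absolute error of the mean of n noisy labels, over a few trials."""
    errors = []
    for _ in range(trials):
        labels = (TRUE_VALUE + random.gauss(0, NOISE_STD) for _ in range(n_records))
        errors.append(abs(sum(labels) / n_records - TRUE_VALUE))
    return sum(errors) / trials

for n in (30, 3_000, 300_000):
    print(f"{n:>7} noisy records -> typical error: {mean_label_error(n):.3f}")
# The error shrinks roughly with the square root of the record count, so a
# small dataset cannot count on noise averaging out and needs cleaner labels.
```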
[00:43:33] Unknown:
And are there any other aspects of this problem space or the work that you're doing at Alegion that you think we should discuss further before we start to close out the show?
[00:43:42] Unknown:
I would just like to say that we find this work, with the human intelligence aspect and the workforce that we have, to be very rewarding and hopeful for the people we have working with us. And we always want to say how much we appreciate their efforts and how hard they work, because
[00:44:14] Unknown:
none of this would be possible without them. Well, thank you very much for that. And for anybody who wants to get in touch with you or follow the work that you're up to, I'll have you add your preferred contact information to the show notes. And as a final question before I let you go, from your perspective, what do you see as being the biggest gap in the tooling or technology that's available for data management today?
[00:44:42] Unknown:
Connecting the data that you have to the ability to use that data for what you want to use it for. We are solving a part of that gap by bringing in human intelligence to add content and context and interpretations that aren't currently possible with computer processing alone, but there are vast opportunities in building additional scripting and tools that allow us to move data around and transform it in a way that makes using that data frictionless. If it can be automated, we absolutely should automate it. If it can't yet be automated, we should use the tools, the judgments, that we have available through human intelligence. But as a community, I think together we are always seeking the increased efficiency that would make this frictionless
[00:45:47] Unknown:
use of data possible. Alright. Well, thank you very much for taking the time out of your day to join me and discuss the work that you're up to. It's definitely been an interesting conversation, and it's an interesting and challenging problem domain. So thank you for that, and I hope you enjoy the rest of your day. Thank you. You as well. It's been my pleasure.
Introduction to Cheryl Martin and Alegion
Understanding Data Labeling and Collection
Comparing Alegion with Mechanical Turk
Human Intelligence in Machine Learning Projects
Types of Data and Common Tasks
Challenges in Workforce Curation and Training
Workflow of Data Specialists
Ensuring Accuracy and Quality Control
Limitations and Challenges of Crowdsourcing
Complex Taxonomies and Data Structuring
Future of Human Intelligence in AI
Importance of High-Quality Data in Small Datasets
Closing Remarks and Contact Information