Hype and Reality: How to Evaluate AI/ML in Cybersecurity
- [Diana] Well, thank you. First of all, thank you so much for being here and I think we're in the overflow room. So thank you also to the folks in the overflow. I know that your time is very valuable.
So I wanna level set a little bit about what we're gonna do in the talk. First, I'll introduce myself. That's probably the best thing, and I'll try not to hit the mic again and shock everybody. (laughs) But my name is Diana Kelley and I am the Chief Security Officer at a startup called Cybrize. And I developed these slides in concert with Dr. Char Sample, who is at ICF International.
She's a cybersecurity researcher and scientist. And Char unfortunately cannot be here today, or we would both be here happily presenting. So the level set: this is intermediate technical.
What this is is that if you have never heard of AI or ML before, it's a little bit past that, but this is not gonna go into the advanced technical level. So I'm gonna be talking about the terminology, de-hyping it, giving you examples of the kinds of models that are in use right now, what they're good for, what they're not so good for, to kind of bring down a lot of this magic that we hear spoken about AI and ML. What we're not going to do is to show you any proof of concept code.
We're not gonna get into any reverse engineering, anything like that. So if you're looking for a hands-on-keyboard technical, looking-at-code kind of talk, this is not gonna be that talk. So I want you to be able to go to another talk if that's really what you want. If you really wanna get a kind of deeper dive on what's behind all this marketing, from a little bit more technical, scientific viewpoint, and then to also understand the failure modes, what can go wrong in AI and ML, why we have to protect it, and what to ask your vendors, then this is the talk for you. So with that, I'm gonna get going. All right.
The first thing is hype. We've heard a lot of hype in AI and ML. I've got some recent ones up here, but does anybody, and anybody who's maybe been in computers a little bit longer like me, anybody remember Eliza? Eliza before Megan? Any Eliza? No. Oh, okay. We got a couple of hands up.
So Eliza was a program that was written in the 1960s at MIT and it was a natural language model, and it spoke back to you. You could prompt it. You could put things into the input and it would write back to you. There was one module called doctor that felt kinda like talking to a shrink.
It asked you a lot about how do you feel. And people thought Eliza was sentient and that basically computers were going to take over the world in the next decade or so. That was the 1960s. We sometimes forget about things from the past.
Back here in the one for Elon Musk, he said that cars were gonna be driving themselves in two years. It's a little bit hard to see that date, but it's 2015. So we were gonna have autonomous vehicles in 2017. How many people have a fully autonomous car? Fully autonomous.
You don't have to even get in the driver's seat. Me neither. (laughs) We got Megan, right? If you've seen her, she's a delightful doll that kills her child's enemies for them. And then you know, we also have things like 100% fully automated cyber defense. I think I may have seen that at some point down the RSA show floor.
There's another one. How many folks here have that fully automated, to go with your fully automated car? Yeah, so that's a little bit of the hype. The reality, when we look at where autonomous cars are going, we see something very recently coming out where the self-driving car project is a road to nowhere. I think that's a little bit intense. It's a little bit harsh. But certainly we don't have fully autonomous cars.
The challenge was tougher than we realized. We couldn't just go from "we've got lane detection, we can identify a stop sign" to fully autonomous cars. Doesn't mean we won't have them.
It just means that we haven't had it at that hype, that speed. And that's the point I wanna make. When you look at AI and ML, it's the same thing. And generative AI as we get into ChatGPT, right? Everyone's going, where's ChatGPT? Where we get into that? That's the kind of thing that as you look at that, keep in mind that things don't always move as quickly as we may think. And then what's the other reality? A lot of companies are concerned about what's gonna happen with AI and ML. Has anybody here been on a committee or a council at your company in the last year where one of the tasks was figure out what the risk is of AI and ML or figure out the threat model for this app or whatever.
Like just some kind of, basically somebody at the company charged you with go figure out if this is a risk or how it's a risk or what the, I mean, I know I have. I think most of us, yeah, a lot of us have. Because there's a concern.
The reality is, how do we use this properly? How do we use this well? The other thing that comes up a lot that I think if you're gonna do a hype versus reality is that AI is the superset and ML is the subset. So I've heard people say, "It doesn't matter. They're the same thing. They're totally different." You know, they're not. There's a difference. AI is officially, it's kind of this cognitive ability.
It's being able to pass the Turing test. The Turing test is if you can have a person interact with a machine and a human and they don't know which is which, that just based on the answers, they would not identify the machine as a machine. That's what the Turing test is. AI is also looking to apply things that have been lessons that were learned.
ML is good old math, mathematical models. It's machines and data. In some cases, we hope that ML can be better for us. It can actually make humans a lot better at what we do. You learn your lessons from previous events. And then ML can really help with the prioritization of those lessons.
So they are not completely interchangeable, but you may not care so much about that as you're trying to buy or adopt a solution. It's just they are different. And if someone does say to you as you're trying to figure out what their solution is doing, if they can't explain the differences, then that might be a good indicator that you're either talking to the wrong person at the company or possibly they may not know their own technology.
So maybe a little bit of hype. So narrow versus general. This is the part that gets to a lot of people.
Narrow AI is very focused. It can be something like robotic automation that you've got just on the shop floor. It does it again and again and is smart enough to be able to maybe pick something up if it gets dropped. It's very repetitive. It's a very specific task that you have it do. And it repeats that task.
A lot of times we forget how much calculation and how much math we do as human beings every day without even thinking about it. And you're like, I don't do complex math every day. You actually do. Your DNA does. Yourself does.
'Cause every time you pick up something to drink, she just picked up and it was in a paper cup, right? Somebody's got a glass bottle over there. We've got a plastic bottle over there. Guess what? When you pick that up, it's a different weight.
You need to use different amounts of force. The amount of liquid that's in there also impacts how much you can, how quickly you can pick it up, right? Because if you didn't, what you'd be doing, you'd be like throwing it in back of you. You wouldn't be able to lift it. It's a paper cup but I can't lift it, right? But we do all of that calculation in our mind, in our bodies automatically.
And we don't even think about it. We don't even give ourselves a pat on the back, do we? You don't say, "That was a whole lot of work that I did." But it is. Now, imagine trying to get a machine to do that, to think ahead. General AI is getting into this realm of artificial intelligence where the systems can think for themselves. When you see science fiction and you see the old HAL robot, for example, in "2001," or Skynet being smarter than the humans and controlling the humans.
Most of that, or even, I don't know where it's going, but if there are any early adopters of the "Mrs. Davis" TV show, it's amazing. But there's some AI algorithm running the world in that. Hasn't ended yet. So I don't know what's the ending.
But that's all general. Narrow is what we see now. General is sci-fi for now.
I'm not saying it's impossible, but what we're looking at now is narrow AI most of the time. And the reason that I bring that out is that hopefully you won't be as scared as some of what we've seen in the news that really has started to make it seem like these algorithms, right? One of them said that they were watching the Microsoft developer's work, (laughs) you know. Another one told the journalist to leave his wife for it. I mean, this is sentience, but that's really not what's happening. It's much more like what we saw with Eliza.
So narrow, this is what we're seeing already. It's already here for narrow AI. It's already helping us and it's already actually providing a lot of good, especially helping humans to do our jobs better. And I hope that we can continue to generate AI that helps people do their jobs better. I saw somebody with a jackhammer on, you know, the sidewalk the other day and I thought the amount of abuse that a body must take as you jackhammer. What if we could have AI do that? We've got AI doing jobs that humans don't wanna do right now, like going into parking decks where a bomb may be, right? The AI bot goes to that.
So that's narrow. We're already using it. It's already here and it's already helping us. But it's not sentient.
So what else are we seeing in the wild with AI and ML? Generative. Okay, who's played with ChatGPT 3 or 4? How many folks? Is that it? Nah, I think most of you. (laughs) If you haven't, go give it a try after this. Just, you know, play around with it 'cause it's a really, the best way to understand a tool and to understand the pros and the cons is to actually use it. So generative AI.
And then we'll talk about supervised and unsupervised machine learning. And the reason that I wanted to do that in this talk is that I have personally been on the RSA show floor and asked questions of vendors when I'm trying to figure out hype versus reality of what they're selling and said things like, is it supervised or unsupervised machine learning, and why? And have gotten answers that I know are not technically accurate. So it's very hard then to identify the hype versus the reality.
So I wanted to give you guys a good level set on these. So generative, this is what ChatGPT is. It's also DALL-E for example. They're trained on large amounts of data, generally large language models.
But GPT-4 is being trained on both images and text. They tend to respond in a way that's gonna feel very human. They're not entirely, they're not going to just... They're not designed to pass the Turing test, but a lot of them do pass the Turing test, at least as you're doing some initial prompts and back and forth with them.
They can be a very broad corpus of knowledge that it's trained on or a very narrow corpus of knowledge. So a very broad corpus of knowledge is ChatGPT, right? You can pretty much ask it almost anything. And it comes back with a relatively good answer. You can do a very narrow corpus of knowledge. And this was something that when I was at IBM was what we were doing with Watson for Cyber.
We wanted Watson for Cyber, I think it's called QRadar with Watson now. But what we wanted it to do was to be able to answer questions from security analysts. So we cared about the security information it had, so we trained it on a very narrow corpus, which was cybersecurity and things related to cybersecurity. But you can also have this very broad corpus, but they get trained on that. And then that's what they use for probability to generate the responses, the responses that they generate. And this is not me.
This is a data scientist saying it. They're kind of like type ahead. So when you're putting in something in a text or you put it into your browser and you see that it tries to fill it out, you type C and it goes Craigslist and it's like, no, I want CNN, right? That's a type ahead. That's a form of what's happening with ChatGPT. It doesn't feel that way.
It feels like it's so smart, it can pass the MCATs. But wait a second. The MCATs, which you need to get into medical school: are all those questions known and probably somewhere on the internet? Could that be something that ChatGPT trained on? How about the answers? Do you need fully original answers if you're going into the MCAT, an answer nobody's seen before? Or is it probably an answer that is well known, right? It's in the medical books. It's accepted medical truth. So when they say type ahead, that's what they mean. It's the difference between generating something based on probability, using that corpus, versus creating something entirely new. And if part of you is thinking, but what is creation and what is entirely new? Yes, that's a whole other metaphysical discussion that philosophers are getting into.
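The type-ahead comparison can be made concrete with a toy sketch. This is not how ChatGPT is built; it is a minimal word-pair (bigram) counter, assuming a tiny made-up corpus, that shows what "generate the next thing based on probability from the training data" means. The names `train_bigrams`, `next_word_probabilities`, and `complete` are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def next_word_probabilities(follows, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def complete(follows, word):
    """Suggest the most probable next word, or None if `word` was never seen."""
    probs = next_word_probabilities(follows, word)
    return max(probs, key=probs.get) if probs else None

# Hypothetical training corpus, invented for this example.
corpus = [
    "the analyst reviewed the alert",
    "the analyst reviewed the log",
    "the analyst closed the alert",
]
model = train_bigrams(corpus)

print(complete(model, "analyst"))                 # most likely word after "analyst"
print(next_word_probabilities(model, "analyst"))  # the probabilities behind that choice
print(complete(model, "firewall"))                # a word the model never saw
```

The suggestion is whatever was most frequent in the corpus, not whatever is true, which is the same reason a much larger model can produce a plausible but wrong answer.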
But for this, let's go back to generative, is probabilities based on what it's been trained on. So you've seen it in things like ChatGPT. If you've seen some of those weird pictures that are starting to win photography contests and graphic arts contests. DALL-E is an image generative AI that uses images and now we're starting to see it more in CyberAssistants.
There's the QRadar, oh, I have the right name up there. QRadar Advisor with Watson and Microsoft Security Copilot. And those are, again, gonna be on very narrow corpus.
So where's it really good? Where's the best fit? Oh, and this by the way is a, that's an AI of a dog. I didn't draw that. You never thought I drew that anyway. Okay, so where's generative really good? It's great at brainstorming. It's great at prompting you as a human to give you different ideas. If you're doing research and asking questions against a very trusted corpus of knowledge, it could be great there. And it's really good at generating puppies in the forest.
So it's good for things like the CyberAssistants. Where is it not such a good fit? And I apologize that that's a little bit hard to read, but I'll tell you what it says. It says what security books did Diana Kelley co-author or author? And we've got, and you guys could look that up if you want to. We've got five books here and in one of 'em they have like, Diana Kelley and her co-author provided detailed explanation of Zero Trust networking principles. Good for Diana. That sounds like a good book and she did it well.
But here's the thing, guys, I didn't write any of those books. (audience laughing) I have tried this multiple times with ChatGPT. It has never once got, one point it had me writing "Secrets and Lies" which was the New York Times best seller by Bruce Schneier and had a nice explanation about how awesome my writing was and I was like, well, I'll have to tell Bruce all about this. But no.
But the thing is, oh, and also it never once actually got the two books I have co-authored, which was a little bit depressing 'cause it actually outright told me I didn't write one of them, which kind of bummed me out. But so once it's telling us about, so when we're saying ChatGPT, it's so brilliant, it's so amazing, right? I asked it a very simple question, very public knowledge, very easy to find on the internet. You can find it much faster than ChatGPT could accurately. It keeps getting it wrong. I said the magic word earlier and if anybody's thinking it and wants to shout it out, it's probability, I probably could have written these books.
There's actually a high likelihood. I didn't happen to. I wrote other books. But there's a high probability I could have written those books.
So it's probability. And that's why it's not always as good if you don't know the answer or you haven't tuned it on a very specific corpus of knowledge so that you feel very confident it's going to give the right answers. Because how many of us doubt what the computer tells us? If you have two 10-digit numbers that you need to multiply, do you put them into your calculator of choice? And then when you get the answer back, do you go, I'm gonna check that. I'm just gonna make sure that that's right.
And then you do the multiplication by hand, right? We don't, we trust what the computer tells us. And if there's one thing that you leave here with, I hope it's that, in addition to feeling a little bit more empowered when people start saying magical AI/ML fairy dust. Also, I hope that you take away that one of the reasons we have to be so careful to separate the hype from the reality is that we are gonna trust these systems. We're gonna trust their outputs.
So they have to be good outputs. They have to be outputs that we can trust. Okay, so unsupervised learning. So this one is sort of like the million mice kind of thing.
This is when you have the training data. The big difference between supervised and unsupervised is if you label the training data. 'Cause if you label the training data, you're giving more information to the model, to the system on how it's gonna learn and classify. If you don't label it, it's gonna start looking for patterns itself.
So with unsupervised, if you give it tons and tons of data, then it can actually find patterns. And this is a wonderful thing about these systems is that when you get volumes and volumes of data, just like we, as human beings, you know, when we get two 10-digit numbers, it can feel a little bit like that's a lot of numbers. When you get huge volumes of data, it's very hard to see patterns if you're a person.
But if you're a machine, it's much easier to see those patterns. So you've actually seen this, you've probably seen this in action, if you go to a movie site or to a retail site and it says, "Hey, you shopped for this. You might also like this other thing." It's that most likely what's underneath that is an unsupervised algorithm that's looking for patterns. 'Cause shoppers like you bought this, or viewers like you, like, enjoyed that movie. So most likely you're gonna like this other thing.
And that's why they can also change dynamically, because it's as you're using the system. So finding patterns in those massive amounts of data: in cyber, we use it oftentimes to do clustering or grouping. So an example there would be like, if you're starting to see connections again and again, patterns.
And again, I know there's a difference between correlation and causation, but this is about pattern matching. So it's looking for pattern matches. Things like if we're constantly getting malware with this signature, it always happens to come from this nation state. It's not saying that they made it.
It's just this pattern is emerging. So unsupervised learning to find patterns is really very, very useful. You can also find associations you might not have seen before. So things like, if you have a ton of users and all of a sudden you start finding out that certain users are always clicking on the phishing links, but then you might not have this other piece of data, but that may be seen with the association in the model that says, and they all happen to be people who have reused their passwords or something, or they're all executives or they're all, you know, classic, right? Everybody always picks on marketing, you know, they're all in marketing. But that, you know, seeing these kind of hidden associations that human beings may not be able to see in all the data or may not have been looking for. And then the other thing where it can be really useful is anomalies.
Because if you are unsupervised looking for patterns, patterns, patterns, then if you've got a good level set on the pattern, something that deviates from that is going to show up. It's not as good for everything, though. So when don't you wanna use it? If getting it wrong is a really, really, really, really big problem.
So if you're looking for, like, hey, who tends to not be great with reuse of passwords, getting that wrong isn't catastrophic. You're investigating. You're learning. This isn't the same as you're an oncologist and you're telling somebody whether or not the lesion on their x-ray is cancerous, for example.
So you don't wanna use it most of the time if it really matters when it's wrong, if invalid decisions are gonna follow. So, oh, we're seeing some malware, and then something catastrophic: shut down the whole business. It may be, you know, it overreacts.
So you wanna probably have a human in the mix as you're taking a beat and you're making a decision based on the output of the model. Also not as great for short training times. And the classic Black Box, if you need to understand 100% of how the weights are working in the model for the output to come out for that classification and that confidence level, then this probably isn't the best for you.
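The "learn the pattern, then notice what deviates" idea from the unsupervised discussion above can be sketched in a few lines. This is a deliberately simple statistical stand-in (a mean and standard deviation baseline), not what any particular product does. The login counts, the `threshold` of 3 standard deviations, and the function names are all assumptions made up for the illustration. Note that it only flags items for a person to look at, in keeping with the human-in-the-mix point.

```python
from statistics import mean, stdev

def fit_baseline(values):
    """Learn the 'normal' pattern from unlabeled history: just mean and spread."""
    return mean(values), stdev(values)

def find_anomalies(baseline, observations, threshold=3.0):
    """Flag observations that deviate from the baseline by more than
    `threshold` standard deviations. Returns (label, z_score) pairs
    for a human to review; it takes no action on its own."""
    mu, sigma = baseline
    flagged = []
    for label, value in observations:
        z = (value - mu) / sigma if sigma else 0.0
        if abs(z) > threshold:
            flagged.append((label, round(z, 1)))
    return flagged

# Hypothetical unlabeled history: daily login counts for one account.
history = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20]
baseline = fit_baseline(history)

# Hypothetical new observations to check against the learned pattern.
new_days = [("mon", 21), ("tue", 19), ("wed", 95)]
flagged = find_anomalies(baseline, new_days)
print(flagged)  # only the day that breaks the pattern should appear
```

No one told the code what "bad" looks like; it only knows what "usual" looks like, which is why the quality of the baseline data matters so much.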
So how about supervised? Unsupervised was what? It was unlabeled. So it's pretty easy. So supervised means that it's labeled and you label the input and the output and you see, based on this input, what output did I get? So guess what you can start to do as you're training that model. Now it's not just looking over vast amounts of information and saying, "I see this pattern and I see this pattern."
Now, you want it to give you very specific responses. Some things to remember here is that how you, because of the labeling, it has to be labeled very carefully and very well, and you have to monitor what comes out of it. So that matters a lot. Also, the data set completeness really matters. And this one is really, I hope that you're gonna go to more AI and ML talks, but the completeness issue is a big one because if we just give a portion of the data to the model, it's never going to learn how to actually give out accurate outputs.
So being able to improve the accuracy and also to correct as that model drifts, which is actually something that most models do. They will drift towards bias over time. There's also bias of the data that you put in, but the models will drift toward bias over time. So you need to keep watching them and make sure that they're not drifting. And then where you've probably seen supervised learning, actually weather, weather forecasting, 'cause that's a whole lot of data that you can put in, a whole lot of data that comes out, and you wanna predict with accuracy of the future. But you can also retag it, you can tell it where it was wrong.
It's a really good, it's a good set. It's a fantastically large and unique data set. Who owns weather.com? When did they buy it? Right around the time Watson (laughs) was being brought to market, yeah. So, okay, so supervised, where are the best fits? In things like predictions. So you've got, right, that's what weather is.
We've got all this existing data, all this extant data. And now I wanna know with that and a little bit of new data, what I might be able to predict. So in cyber, where would we wanna have things like predictions and classifications? About phishing, for example: is this phishing or not? I have seen past phishing emails. I knew they were phishing.
It was tagged, it was labeled, it was told it was phishing, but now here's a new email, right? This is a good use for this kind of thing because you can't, right? If you wanna have ML helping with your phishing, it's got to be able to recognize the new phish. So looking at that. You know, also things like if it's gonna predict the cost of a novel attack. So we've had past incidents. We know what they cost. We know what this really matters to the company. What if we had another incident like this or we had one that was slightly different, can be really useful in that space.
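The phishing case above, labeled past emails in and a prediction on a new email out, can be sketched with a small word-count classifier. This is a minimal naive Bayes sketch in plain Python, assuming a made-up six-email training set; real phishing detection uses far more data and features. The labels `"phish"` and `"ham"` and the function names are hypothetical choices for the example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_emails):
    """Learn per-label word counts from (text, label) pairs. The labels are the supervision."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_emails:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest naive Bayes score for a new, unseen email."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # prior from label frequency
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so a word unseen for this label does not zero the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training data, invented for this example.
training = [
    ("verify your account password now", "phish"),
    ("urgent verify your payment details", "phish"),
    ("click here to reset your password", "phish"),
    ("meeting agenda for monday attached", "ham"),
    ("lunch on friday with the team", "ham"),
    ("quarterly report attached for review", "ham"),
]
model = train(training)

print(classify(model, "urgent reset your account password"))
print(classify(model, "agenda for the team meeting attached"))
```

Everything the classifier knows comes from the labels, so mislabeled or incomplete training data flows straight into its predictions.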
Where it's not so good. Again, if you don't have time to train, so there may be a little bit of a theme here, but if you can't train it, if you can't label the data, if you can't get the right amount of data, all of these are things where supervised is probably not gonna be the best fit. And then also if you really wanna do the automated classification, you're better off using unsupervised 'cause it's the best fit for the model. There are also, and these are the really high level models. Again, there's a lot here to deep dive into, but I just wanted to give you guys a really good base as you go through the rest of the conference. One thing about reinforcement learning, you may have heard about this and I have seen some, you know, like, oh, everything's reinforcement, so it's all automatic, it all learns on itself.
And I did, this actually Char wanted to really call this one out, that, you know, when you look at fully supervised and unsupervised, reinforcement learning is an admission that the models are flawed. And that means that it's really about retraining the models with the reinforcement. But do keep in mind that reinforcement learning has its own ways to go sideways. And that's what that dog is about, because a dog got reinforcement learning in France many years ago. This dog saved a child from falling in the water, and the dog got rewarded.
What a wonderful dog. You get all this food. We love you, right? If you have dogs, you can imagine, you know, the dog liked that.
So the next day, a child fell in the water again and this dog saved it again. And the French were like, this is amazing. This must be a dangerous part of the Seine. But this wonderful hero dog is there. And they gave him, like, food again.
The third day they realized the dog had started to push the kids into the... (audience chattering) Yeah, so we wanna, and that actually, believe it or not, there are analogs with robotic systems, like systems that get reinforcement learning to clean up a mess will start making messes themselves. So, you know, what else, what can go wrong? Why does this matter? As you assess what can go wrong and what can't in your hype versus reality journey in AI and ML, I wanna give you some baselines of things to use. NIST has put out an artificial intelligence risk management framework.
And on the side there, you can see valid, safe, secure and resilient. These are all going to trust and what we need to look at for trust within our machine learning and our AI. So accountability. So this is a fantastic, if you haven't looked at this resource and you're getting deeper on AI and ML, I strongly recommend taking a look at that. As you're doing threat models or you're thinking about adopting AI and ML and separating that hype, keep in mind that AI is really very much about that trust point that do you trust what's coming out of it. You also need to take a look at the impacts from trusting it too much and then the learning curve of understanding how that information's gonna be used and then how it goes across networks.
So this is big. I don't, I mean, you know, in security, I've always been one of these people I'm like, don't scare people, don't make fear, uncertainty, and doubt. It's kind of the worst.
But I really strongly believe and know that if we do not get AI and ML right, there are gonna be some repercussions that are not gonna be positive. And I think it's gonna be beyond just, you know, where back when I got started, we were like, what if somebody took all our money out of our bank account? That was sort of our biggest question, you know. I mean, now, we're talking about some pretty serious implications that can come.
We've already seen infrastructure attacks, you know, misuse of AI and ML, and deep fakes for example. So that's what's different. What's different is there's a really, really high impact of getting it wrong. So the other thing about, like, looking at this and what can go wrong is that the data sets themselves and how we train the systems on the data sets really is gonna impact the outcome overall. Sometimes people just think throwing more data at the, you know, the larger the corpus the better. But what if I, if you were gonna train a system on cybersecurity, would you have it look at the entire internet for every word that said security or cybersecurity? Or would you pick and choose the information you were gonna give it to train on? I said at one point that one of the systems that was being trained better have a really good BS detector, because if you trained it on everything on the internet, it's gonna get a lot of stuff that's wrong and it's not gonna have as much guidance.
So if you train it on something smaller, so larger isn't always better, that's something that sometimes gets confused. And then there are things like the data itself: manipulating the data, poisoning the data, whether you know it or not. Models with Trojan back doors. There's a lot of stuff that can go wrong here with your ML, especially around the data and around the training, that could lead to the negative impacts. And these are both, there are intentional and unintentional failures in ML and AI.
And as you're talking to your vendors, I would ask them, have they thought about intentional and unintentional failures? How have they threat modeled those systems? What are they doing to make sure that they're going to give you the right and most correct output for your data and for your prompts? So in here in intentional failures, and this is some great research that came out of Harvard and Microsoft's Data Cowboy Ram Shankar. So I always, I love this paper that you wrote. I've got it at the end of the slides.
I also have a LinkedIn Learning class that covers all of the intentional and unintentional failure modes. We go into a really deep dive on it. And I think that all the LinkedIn Learning classes on AI and ML are free till the end of June. So you can go and get a deeper dive in that if you want.
But so some of the things that can go on: supply chain attacks, for example. There are now model zoos. So model zoos are freely available machine learning models that people are giving to each other and saying, "Look, I had this. It does this thing very well. Do you wanna try it? Train it maybe with some different data, try it in a different environment." So model zoos are a wonderful collaborative way that we are now sharing information and data. They're also an opportunity for nefarious people to put in things like back doors in the supply chain of your machine learning. So you wanna know the provenance of the data, the provenance of the models that your vendors are using.
For example, membership inference. A lot of times a model will say, "We've got all this data but no one can see it. It's all hidden, you know. It's all behind our API. It's all nice and safe."
No one's gonna actually be able to understand the underlying healthcare information, for example. But that's not the case. There have been provable examples of healthcare information that was thought to be masked that with the right prompts, attackers were able to figure out things like what kind of surgery somebody got. Then there are the, and those are intentional.
These are when people try and make it go wrong, unintentional things like the reward hacking I already talked about with the dog. And that does happen with systems. Distributional shifts over time and something has changed dramatically in the data, but the system may not interpret it properly or may not know so that you're now getting outputs that are not as useful.
If you're trying to predict things like how much traffic there's gonna be, not getting that right can be pretty impactful, for example. Incomplete testing is another, that's a big one that goes to that thing I was talking about, the smaller and the larger data sets, thinking you've totally tested that system when you haven't. And then it gets new information. And instead of being able to interpret it and to respond in an accurate way, it responds in an inaccurate way. Although you won't know because you don't know the answer, right? That's the big thing, right? We don't know the answer to anything that we're asking these systems in advance most of the time except when we're training them. So these are some other things that happened, not because an attacker attacked our systems but because the system wasn't designed or trained properly.
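One way to watch for the distributional shift described above is to compare the data a model was trained on with the data it is seeing now. The sketch below uses the Population Stability Index (PSI), a common drift measure; it is a swapped-in illustration, not something named in the talk. The traffic numbers, the bin edges, and the 0.25 alert level (a widely quoted rule of thumb, treated here as an assumption) are all made up for the example.

```python
import math

def bin_fractions(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]); the last bin includes its upper edge.
    Values outside the edges are not counted. Empty bins get a tiny floor to avoid log(0)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [max(c / len(values), 1e-6) for c in counts]

def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Hypothetical requests-per-minute data.
edges = [0, 20, 40, 60, 80]
training_traffic = [10, 12, 15, 18, 20, 22, 25, 28, 30, 35]
similar_traffic = [11, 14, 16, 19, 21, 23, 26, 27, 31, 33]
shifted_traffic = [45, 50, 55, 58, 62, 65, 70, 72, 75, 78]

psi_similar = population_stability_index(training_traffic, similar_traffic, edges)
psi_shifted = population_stability_index(training_traffic, shifted_traffic, edges)

ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold, not a standard
for name, psi in [("similar", psi_similar), ("shifted", psi_shifted)]:
    status = "investigate" if psi > ALERT_LEVEL else "ok"
    print(f"{name}: PSI={psi:.3f} -> {status}")
```

A check like this does not say whether the model's answers are still right; it only says the inputs no longer look like what the model learned from, which is the cue to retest or retrain.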
So how can you evaluate? So we're gonna quickly go into some evaluation points, and I did wanna just really briefly talk about a way to measure performance, if you have data scientists at your company or you're gonna start expanding into a data science team, or ideally, and I think this is the next frontier, cyber data scientists, so people who are security scientists and data scientists. So for that, there's MLCommons, which is a nonprofit, so it's an org where you can go and get MLPerf Inference, which is a machine learning benchmarking suite that will help you understand the speed and the accuracy of your machine learning models. And inference means that it's after it's been trained, so you're actually viewing what it's doing after the fact, and the benchmarks are defined by both the data set and the quality target. So if you wanna start getting into some testing, this is a free way to do it.
I know there are a lot of companies that are starting to spring up around this, but if you wanna get started right now with something that's just freely available, take a look at what's going on over at the MLCommons site. They also have something which is an AI leaderboard. What they're trying to get into is the stuff I was alluding to. Hopefully you got that the data itself really matters to how well the system works and operates in the long run.
So it's looking at the data performance itself and understanding that data to try and address some of the bottleneck issues as people are looking through the data, ensuring that it's good, useful data. And this is really looking at the quantity, I mean, I'm sorry, looking at the quality as much as the quantity. So yes, we need a lot of data, but it needs to be the right data. And another big piece of research that's going on at MLCommons is their leaderboard on data performance. So both of those, if you want, take a look at them. I strongly recommend you do.
A little bit more on what you're looking at in the data performance side, DataPerf: things like the data selection for vision, for speech, you know, NLP, and looking at the acquisition strategy for the best training data from multiple training datasets. So this is a really good reference for you to go take a look at afterwards and get a little bit more of a handle on how you can do your own measurements if you're creating your own AI or ML, or if you're in talks with a company to acquire some and you have a team that can do this kind of testing and analysis.
And then there are some questions just to ask, right? I mean the biggest one, and somebody actually said this to me when I said what this talk was about, somebody said, "Oh, I know how you can separate hype from reality. You set it up in your environment and you test it. And if it gives you good information, then it's good and there's no hype. And if it gives you bad information then it's bad."
And I said, "Yeah. (laughs) I mean, yes, that is true." But you know what? There's actually a lot to that. As you're trying to differentiate hype from reality, ask if you can see that system or that model in action. That's really a good way for you to start looking at whether or not it's gonna work for you. No matter what a company tells you, if it works and does what you need it to, that's actually one of the most important things. So I would ask, you know, as you're going around the show floor, anybody here gonna talk to vendors about AI? Are you here to maybe potentially acquire? Yeah, okay.
So these are some of the questions. I ask them why, because in marketing it's usually like, it'll solve all the problems. But why? Like, why are we using this technology? Why are we using that here? So I actually did have somebody, I said, you know, what kind of models are you using? Are they supervised? And he said, "We have a supermodel so we only need one model. It's completely unsupervised and it corrects itself perfectly."
And I thought, again, I'm either talking to the wrong person at the company or this company may need to hire some more data scientists. But yeah, so ask them, you know, why, to get an understanding of why they put this in there. Why these algorithms? And why AI? How is AI strengthening this solution? AI isn't just a solution for everything 'cause it's the new buzzword. It's not like, anybody here remember back in late 1996? It's not PKI back in the '90s.
Yeah, okay, I got some. (laughs) But you know, PKI was PKI. It wasn't a magical solution. This is not a magical solution either, but it's incredibly advanced technology and math, and if they know what they're doing, it can be really, really powerful.
Okay, the training data. This came up quite a bit. I hope you got, you know, a good feel for why that training data really, really matters. But the big one: biased data is gonna lead to biased outputs. You also will have bias or model drift in your system. Anybody know the famous story of the Jareds who played lacrosse? So there was an applicant tracking system tool that a technology company was using to look at resumes, and I believe it was because of their existing developer, you know, group, and then also who they were hiring.
What happened was this thing actually started to drift very significantly, to the point where it was very clear that it was prioritizing candidates whose names were Jared and who had played lacrosse at some point in their school or previous career. And it was now like saying these are our top candidates, all the lacrosse-playing Jareds, right? So we can see that and we go, oh no, you know, that's very biased. But it can get really tricky, because that one is so obvious after the fact. When you start having subtle bias being introduced into these systems, you may not see it.
With an applicant tracking system, Jared and lacrosse is pretty obvious, but maybe it's selecting for people on some less obvious criteria. So you wanna make sure that you understand the training sets, how they were labeled, and also if they're being monitored for bias and drift. Things like those model Zoos I was talking about, are they using them? One of the most famous ones or well known ones is Hugging Face.
And again, I am not saying don't use Zoos. Zoos are a wonderful collaboration and research tool. Just be careful, and if you're buying from a vendor, ask if they're being careful to avoid things like those backdoors that I was talking about, or tainted data, for example. And then that model inference, being able to extract information from a model that you shouldn't necessarily be able to see.
Make sure you ask them: what are you doing to make sure that the data is protected, that access to your API is protected, for example? What have you done to make sure that your training data stays good and stays non-poisoned? And then lastly, I always ask if there's a human in the loop, and maybe they expect you to have a human in the loop, and that's okay. Then you're the human, we're the humans, you know. But are they saying that with this system, you turn it on and it's gonna do 100% of everything for you? Did they think about resilience with the ML and AI?
Software development life cycles: they still really, really matter here. In fact, I argue that they matter more because the impacts are so important. So it's not just about the training of the model, but how is that model fitting into your architecture, for example? And is it protected? Are the APIs that provide access protected? Is the data protected? Are the outputs protected? Did you think about privacy? Model inference is that sort of surprise: oh, we got a little bit more outta here. But what about other kinds of privacy? I mean, here is one that I hadn't even thought of until someone brought it up to me last week. When you're entering your information in one of these generative chatbots, is that always information that is public, that you would feel comfortable being on the internet? And we've talked about the search engines: they look at our searches and they anonymize them, and then use them either to target market to us or to give good stats on how many people are looking at something.
But with ChatGPT, we're a lot more generative. So I have heard of people asking things like, hey, we're about to merge with this other company, both of them, you know, SEC (laughs) traded companies. What would be a good way to write a press release about this? Well, that's a lot. That's not the kind of thing that you'd put in a search engine, right? But we are starting to put it in there. So think about your relationship with ChatGPT. What have you put in there? And is that potentially a problem if you don't know who's looking or monitoring on the other side?
So privacy by design. Once more around the horn, but we always have to think about it and it's no different here in ML and AI. Threat model before you buy, which I strongly advise you do before you bring in really corporation-changing AI and ML technology, any technology actually. Make sure that you do the threat model. And I put privacy in that threat model too.
And then the 4Rs of resilience, which are robustness, redundancy, rapidity, and resourcefulness. How are they measuring that? Are they using tools? Are they using MLPerf? Are they using DataPerf? What other tools are they using? How are they able to give you an assurance that you are going to have the resilience that you need in that system? And this one goes to that person who said to me, "I don't know, just set it up and test it. You know, see what it is." So here, what's the ROI versus the claims? And this would go for any product that you're buying.
I recognize that, but you know, with ML and AI, I've noticed that it seems to be a little bit easier for some marketing to be a little bit more over the top. I think it's because a lot of us are all kind of learning about this together and it's newer technology for a lot of us. So you know, if you see something like, you know, 100% catch rate, like, okay, show me, you know. (laughs)
I mean, I would love that. I'm excited but you gotta show me 'cause I've never seen that before. So yeah, ask them how they're validating those claims. If it's something like your time to write a report's gonna go down from four days to, you know, a few hours, which is similar to something we actually had at IBM when we were looking at Watson, then just say, "Okay, good, show me that. Show me how that looks in your testing with your customers, your references, and then let me do it at my company." And then this is a good one that a friend of mine recommended, and I had not thought of this before, but it's a really good one, is that if you are gonna test it at your company, ask the vendor if you can just really test it on your own because you don't want the vendor actually doing all the work for you, setting it up and running it because they can tune it and they're not gonna be there tuning it when they've sold it to you.
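As a rough illustration of what "show me" could look like in a bake-off you run yourself, the sketch below scores a product's verdicts against samples you have already labeled. It is an assumption-laden example rather than anything presented in the talk: the function name, the dictionary layout, and the choice to count a missing verdict as a miss are all hypothetical.

```python
def bakeoff_metrics(ground_truth, vendor_verdicts):
    """Score a product against your own labeled samples.

    Both arguments map a sample id to True (malicious) or False (benign).
    A sample the product returned no verdict for is counted as not flagged.
    """
    tp = fp = tn = fn = 0
    for sample_id, is_malicious in ground_truth.items():
        flagged = vendor_verdicts.get(sample_id, False)
        if is_malicious and flagged:
            tp += 1
        elif is_malicious and not flagged:
            fn += 1
        elif not is_malicious and flagged:
            fp += 1
        else:
            tn += 1
    return {
        # Share of known-bad samples that were caught (the "catch rate").
        "catch_rate": tp / (tp + fn) if (tp + fn) else None,
        # Share of known-good samples that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "missed": fn,
        "false_alarms": fp,
    }


if __name__ == "__main__":
    labels = {"a": True, "b": True, "c": False, "d": False}   # your own labels
    verdicts = {"a": True, "b": False, "c": True}             # what the product said
    print(bakeoff_metrics(labels, verdicts))
```

Reporting the false positive rate next to the catch rate matters here, since a product that flags everything would also reach a 100% catch rate.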
So if you're doing some of these ROI kind of bake-offs yourselves, try and do it on your own because that's how you're gonna be using it in the real world. And always ask for data and for references. I mean, I get it, if somebody's a reference, they love the product, but if they can't give you those references, then that's kind of a sign, you know. At least they should have a few references that really, really like the product and can explain to you what it's like.
And start networking. Look at this. You've got a room of your peers and there are thousands more out there. Start talking to each other. Ask about what they've used, what they like, which vendors they think are good, 'cause you are the best assessors. And then, is it actually solving a problem? So again, you know, this is sort of a little bit repetitive of what was said before, but truly, is the AI necessary for what it's gonna solve? And this one, this is Char.
Char has seen this a lot in her research. So this is her example, you know, like how is the malware being counted? Because a lot of times they'll say that they've got, you know, thousands and thousands of pieces of malware that they're catching. But then if you actually look at it, it's the same piece. It's just on thousands of machines. So you wanna know, is it unique individual pieces of malware? Is it unique, new phishing attacks? So ask them how they're calculating that out and why, again, that AI or ML is going to be the way that it's going to solve the problem for you.
Next week, I hope that you'll go... this is a lot, and you guys are at the very beginning of the conference, so most of this is probably gonna go whoosh outta your head. If you're like me, that is. If you're not, then I apologize, but it would go outta my head.
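To make the malware-counting question raised a moment ago concrete, here is a small sketch that separates total alerts from distinct samples. It assumes, purely for illustration, that the vendor can export detections as (host, SHA-256 of the sample) pairs. The function and field names are hypothetical, and an exact file hash is a crude notion of "unique", since repacked or polymorphic variants of one family will hash differently.

```python
from collections import Counter


def summarize_detections(detections):
    """Summarize alerts given (host, sample_sha256) pairs, one pair per alert."""
    per_sample = Counter(sha for _host, sha in detections)
    return {
        "total_alerts": sum(per_sample.values()),
        "unique_samples": len(per_sample),
        # The single most widespread sample and how many alerts it produced.
        "most_common": per_sample.most_common(1),
    }


if __name__ == "__main__":
    alerts = [
        ("host-1", "aaa"),  # placeholder digests, not real hashes
        ("host-2", "aaa"),
        ("host-3", "aaa"),
        ("host-1", "bbb"),
    ]
    print(summarize_detections(alerts))
```

If a claim of thousands of detections shrinks to a handful of distinct samples under a count like this, that is the gap the question is meant to surface.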
But, so take a look at these slides next week when you get back, just to kind of refresh, especially those five questions, and then start putting those five questions into whatever you're doing as you're assessing your ML and AI. And hopefully, in six months, what you'll have is an institutionalized approach to understanding the de-hype, the reality of AI and ML, and being able to assess it in your organization and bring it into your DevOps and dev build environment. I do have, as I said, some recommended reading here at the end. You know, it goes from some more tech to less tech. Also that LinkedIn Learning class I talked to you about, which I know is available now. But yeah, so you can kind of go higher tech, lower tech depending on where your interests are.
But yeah, I just wanted to leave you 'cause I know there's a lot here to cover and we only had 50 minutes so I wanted to make sure you had some good resources for after. So with that, I think we have time for a couple of questions if anybody has any, yes! (indistinct) Ah, yeah. (indistinct) - [Participant 1] My question is- (indistinct) about Hugging Face (indistinct) and begin the pre-trained models, right? And data sets. Oh, thank you. So one of these slides you mentioned was about pre-trained models with Hugging Face- - Hugging Face, yeah. - or TensorFlow Hub and all of this.
Are there any particular tips on what to look for, whether that is an authentic pre-trained model or a data source that our developers can just directly download and use? - Yeah. - How do we check that? - [Diana] Yeah, at this point, there's a lot on reputation. Do you know that? So the question was if you do use a pre-trained model out of like one of the Zoos, like Hugging Face. Yeah, looking for reputation. Who created it? Is it an entity or person that you trust? Maybe even write to them and ask about it. But yeah, at this point, there's a lot of, you know, a lot of work we have to do still to figure out if they're good.
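The answer above rests on reputation. One small, complementary step that is not from the talk is to confirm that the file your developers downloaded is byte-for-byte the one the publisher released. The sketch below assumes the publisher has shared a SHA-256 digest through a channel you trust, and the function names are hypothetical. A matching digest says nothing about whether the model itself is free of backdoors. It only rules out a swapped or corrupted download.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True only if the file's SHA-256 equals the publisher's digest.

    Assumption: expected_hex was obtained from the publisher out of band,
    not from the same place the file was downloaded.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A team could run a check like this in its build pipeline and refuse to load any model file whose digest does not match the value it pinned when the model was first reviewed.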
Thank you. - [Participant 2] I have a question. - [Diana] Okay. - [Participant 2] I have a question. - Yes. No, I think Cameron is, yeah. And this will be the last one, and then I'll wait outside if you guys have more questions 'cause I know that they wanna keep us going.
Oh, okay. - [Participant 2] Yeah, I just had a question. Any more insight into Zero Trust in AI? That's all. Any high level statements or what have you on where it's going? - [Diana] Yeah, I mean, I think one of the nice things about Zero Trust is that we're looking, so the question was Zero Trust in AI, and I think Zero Trust and AI can go together very nicely, especially as you look at things like step-up authentication. If somebody's being unusual or anomalous, maybe you quarantine them temporarily or you don't grant them the access to the higher value asset that you're talking about.
And this is the kind of thing where ML can absolutely start to see patterns there and I think really contribute quite a bit if it's done properly. So, okay. - Thank you. - [Diana] So thank you. I hope you have a great rest of the conference.
(audience clapping) Thank you so much.