AI has brought incredible new capabilities into everyday technology, but it’s also creating security challenges that most people haven’t fully wrapped their heads around yet. As these systems become more capable and more deeply connected to the tools and data we rely on, the risks become harder to predict and much more complicated to manage.
My guest today is Rich Smith, who leads offensive research at MindGuard and has spent more than twenty years working on the front lines of cybersecurity. Rich has held leadership roles at organizations like Crash Override, Gemini, Duo Security, Cisco, and Etsy, and he’s spent most of his career trying to understand how real attackers think and where systems break under pressure.
We talk about how AI is changing the way attacks happen, why the old methods of testing security don’t translate well anymore, and what happens when models behave in ways no one expected. Rich also explains why psychology now plays a surprising role in hacking AI systems, where companies are accidentally creating new openings for exploitation, and what everyday users should keep in mind when trusting AI with personal information. It’s a fascinating look behind the curtain at what’s really going on in AI security right now.
“Psychology is becoming part of the attack surface. You have to understand how to talk a model into doing what you want.” - Rich Smith Share on X
Show Notes:
- [01:00] Rich describes getting into hacking as a kid and bypassing his brother’s disk password.
- [03:38] He talks about discovering Linux and teaching himself through early online systems.
- [05:07] Rich explains how offensive security became his career and passion.
- [08:00] Discussion of curiosity, challenge, and the appeal of breaking systems others built.
- [09:45] Rich shares surprising real-world vulnerabilities found in large organizations.
- [11:20] Story about discovering a major security flaw in a banking platform.
- [12:50] Example of a bot attack against an online game that used his own open-source tool.
- [16:26] Common security gaps caused by debugging code and staging environments.
- [17:43] Rich explains how AI has fundamentally changed offensive cybersecurity.
- [19:30] Why binary vulnerability testing no longer applies to generative AI.
- [21:00] The role of statistics and repeated prompts in evaluating AI risk and failure.
- [23:45] Base64 encoding used to bypass filters and trick models.
- [27:07] Differentiating between model safety and full system security.
- [30:41] Risks created when AI models are connected to external tools and infrastructure.
- [32:55] The difficulty of securing Python execution environments used by AI systems.
- [35:56] How social engineering and psychology are becoming new attack surfaces.
- [38:00] Building psychological profiles of models to manipulate behavior.
- [42:14] Ethical considerations and moral questions around AI exploitation.
- [44:05] Rich discusses consumer fears and hype around AI’s future.
- [45:54] Advice on privacy and cautious adoption of emerging technology.
Thanks for joining us on Easy Prey. Be sure to subscribe to our podcast on iTunes and leave a nice review.
“When something’s been put behind a wall and you’re told you’re not allowed to get there, that seems to be all the motivation I need.” - Rich Smith Share on XThanks for joining us on Easy Prey. Be sure to subscribe to our podcast on iTunes and leave a nice review.
Links and Resources:
- Podcast Web Page
- Facebook Page
- com
- Easy Prey on Instagram
- Easy Prey on Twitter
- Easy Prey on LinkedIn
- Easy Prey on YouTube
- Easy Prey on Pinterest
- Mindgard
- [email protected]
Transcript:
Rich, thank you so much for coming on the podcast today.
Great to be here. Thanks for having me.
Can you give myself and the audience a little bit of background about who you are and what you do?
My name’s Rich Smith. I’ve been in the security industry and mostly the offensive side of the security industry for coming up 20 years now, maybe getting towards 25 where I’m getting old. Currently heading up a lot of the offensive research at Mindgard, but previously to this, I have worked across a number of different research groups, leading them.
Most recently, I was CISO at Crash Override. Prior to that, ran a number of research labs, again in the security offensive space. Gemini cryptocurrency exchange, Duo Security that was acquired by Cisco. I ran a lot of the security program at Etsy, then a long litany of background of security consulting and red teaming either for my own companies which I began, or for other companies and research groups, dating back all the way to, if you look back, our research labs in the early 2000s now.
What got you interested in cybersecurity?
I think there were a number of factors. I’ve always been very fascinated by computers and technology. I had access to the computers early on, to be able to really indulge myself in that. I wrote my first game—it wasn’t very good—on Steam at about age eight. I distinctly remember my first hack. It’s probably not that grand, but it meant a lot to me at the time.
My brother and I had a shared Acorn Electron computer, which is a smaller version of the BBC Micro that was in British schools. It was a government program, again to enable people to understand technology and computers when they were young and at school.
Obviously as brothers tend to do, he didn’t want me to play all of the games that he had, I guess, collected from his friends at school. He certainly didn’t have the money to buy them, so he was password-protecting those games on, at that time, 3¼-inch floppies. I wanted the game. There was a clear incentive and goal at the end of that for me, so I was able to bypass the fairly rudimentary, at the time, password scheme that he had put on his disk, and I was able to get that game.
I know that gave me a little bit of a bug. It was nice to be smarter than your older brother and better than him in that sense. I think it stayed there until I was about 11, and that was when I discovered Linux—or Linux discovered me, whichever way you want to think about that)—and I started to be able to get access to online material.
At that time, it was BBS. The Internet wasn’t really accessible, or at least not in my area. Over time, that evolved, so really connecting into a wider network of computers, that inspired me. I spent a lot of time self-teaching in that place.
Then fast-forward to my university years. While I was able to pass the chemistry exams, I don’t think anybody should have me as a practicing chemist. I became aware that I can do something, but my passion was really in computers and, at that time, computer security. So I pivoted and moved into that, did a master’s in information security, to have a piece of paper to show I knew something about hacking.
Then, like a series of fortuitous events, I got a scholarship at Hewlett-Packard Labs in Bristol in the UK in their security department. Went there for six months for a project. My PhD supervisor had organized that for me and connected me with the correct people, get some money back before I went to do my PhD. I never returned for my PhD, so I sold out to the man pretty early.
From there, really just had the privilege of maybe being one of the first generation of people that have worked on security, specifically offensive security for my entire career since my early 20s. Since then, there’s been a number of opportunities. I’ve had the privilege of living in a few different countries to pursue the career, currently in the US and never really looked back.
I think when you can get paid for doing something that you genuinely enjoy, and I probably shouldn’t tell my employers this, but I’d do this for free. This is a passion. I get up in the morning and I’m motivated to get to the keyboard for the problem, not necessarily for the paycheck.
This is a passion. I get up in the morning and I’m motivated to get to the keyboard for the problem, not necessarily for the paycheck. -Rich Smith Share on XI’ve been very lucky to have worked in a space that’s just been constantly changing. I’ve been able to move around different parts of that, obviously, as my career has gone on and got into more leadership perspectives. Being able to bring a lot of the more people side of the world into things and not just technology. I wouldn’t say there was a single event, but it’s been a rolling snowball since really since my childhood, I guess.
I like it. Any particular reason you were attracted to more of the offensive side of things versus the defensive?
Maybe that’s a good question for my shrink that we should spend some time on, but honestly, I was just good at it. The problems that were there seemed to match how I was able to solve problems fairly naturally. I’d certainly say I’m a much better offensive security practitioner than a defensive one.
There’s value on both sides and there needs to be the whole village of us. But I just seem to have a fairly natural ability to find the problems and dig further. Then as understood, more people found value in that, and you’re building that experience. That was the area where I felt that I could really have the most value.
Got you. It’s interesting that I hear different stories from different people to why they’ve gone down one path versus another. But there always seems to be this innate curiosity in this industry.
That’s definitely there. Maybe in my younger, slightly more arrogant years, it was proving that I was better than whoever had created a particular system. Even if I was much younger or I may not have understood how to create that system, I could still subvert it in some ways. I think curiosity and when something’s been put behind a wall and you’re told you’re not allowed to get there, that seemed to be all the motivation that I needed. I guess it served me pretty well up until now.
Are there any stories of misspent youth with the skillset that you’re allowed to discuss that limitations?
I think it was easier when I was in my, say, early teens to experiment. There were certainly less computer issues laws, so it felt like it was much more of an academic exploration exercise than a criminal enterprise if that difference makes any sense.
I did get lucky with having free range to go explore and understand things without such a level of threat to prosecution or anything that obviously is here now. A vast majority of my offensive career has been in the, I guess you would call it the white hat world, which people are asking for my time, asking for my assistance from that perspective as opposed to me going off and voluntarily giving them a security audit. That said, obviously, even on those engagements you can find things that are surprising or even weren’t expected as the scope was outlined at the beginning of the work.
What unusual things have you found during one of these engagements? To me, it’s really interesting. We hired a company to come in and find stuff, and it was, “How in the world did you…?” This is what they found and it was like, “I didn’t even know that it existed let alone that it was something that was vulnerable.”
That’s part of the fresh eyes are valuable just because they’re not coming in with maybe assumptions that the creators of any particular system may have. But certainly a couple of examples that jumped to mind—I’ll leave out specific names, but there was a large bank, certainly one that was very profitable, and we got called in. They updated their web banking. We’re doing a fairly standard assessment over there.
We found a number of surprisingly trivial vulnerabilities in there but had huge impact. The one that stuck in my mind, within the cookie you authenticated to the banking session and your session got established in the cookie. I believe their backend language was C# and they were essentially just serializing out raw code into the cookie. Then that code would be used within the session context.
You could literally just serialize your own C# function within that cookie, and it would execute on the server side. Obviously, therefore, you could take advantage of that. Not the thing that you would expect from a top-tier banking environment, but I think that goes to show—I assume they had spent a lot of money on that development. Their quality engineers are working for a large bank, I assume on a large paycheck, but those mistakes can still be made. I think the outside eyes on that can be really helpful, making sure that those things don’t go out the door.
Maybe a different example, a different anecdote. There was a games company that had called us. They were experiencing, as most online games do, some level of cheating. There were people that were building bots to do various actions within the game to accumulate gold or whatever the currency was. They wanted to understand what were the core problems that these bots were taking advantage of so that they could iron that out and not cause a disruption to the game ecosystem.
On that particular engagement, the backend of the game was written in Python language I’m very familiar with, and a lot of work in that, a lot of research over the years. The particular bot pack that they were most concerned with, wanting to get some reverse engineering and understanding of, I did the reversing. We understood how it worked.
A little bit embarrassingly in the report, the author of that bucket was using some open source code that I had released for decompiling Python in memory, a toolkit called pyREtic. They had fixed a couple of bugs, which was very nice of them, but I had to write the report citing my own toolkit as the core problem that this botfarm was using. That’s the only time that that one’s hit my radar as well.
It was fun. I certainly had a deep insight into the techniques that they were using because I’d written most of them. But hopefully that one won’t happen again.
Did that get you in trouble with the client? Was there a serious explanation of like, “Yeah, I wrote the toolkit, but it was a decade ago and it wasn’t malicious.” Clearly they see it as malicious, but…
It was something that I’d released a black hat. I can’t even remember the year now, probably 2009–2010, maybe. It was a while back. I think open source software, as with many things, can be used for good or bad. It’s really about the application of it. Debugging tools and root kits, there’s a blur in the middle between where the definition of one stops and the other starts.
I think open source software, as with many things, can be used for good or bad. It’s really about the application of it. Debugging tools and root kits, there’s a blur in the middle between where the definition of one stops and the… Share on XThe client was one that we had already a relationship with. I think the way that you would erode that relationship would not be upfront and saying, “Hey, this was my toolkit,” referencing the release. I think it’s just one of those things, if you release open source software, necessarily you are giving the application of that software to future authors. And they can choose whether to choose it for good or bad. I think releasing specifically malicious software is different from software that can be used for a variety of purposes.
We got the client to a great place of understanding what the problem was, and based on that one of some of the core changes they will be able to make architecturally to be more resistant to that.
I like that. That’s an interesting story. You had mentioned debugging code. In a number of your engagements, do you find where the developers put debugging in there for them to be able to figure out issues that they forgot to disable it and that’s a main entry point?
I would say that’s definitely a theme that you see reoccurring, maybe manifested in a few different forms. Debugging error messages are often things that get left enabled as just a flag in the code. That can obviously be given far too overly verbose output, which can be useful as you’re trying to black box your way into something. Or maybe even giving trace backs and code dumps if there’s been a crash. Obviously, that just gives you so much insight.
That’s a fairly relatively common problem. The closely related one also is development and staging systems or environment. There will be the prod side, which is serving the real traffic. Then people are obviously working on new features. There are staging and dev pipelines. When they’re exposed to the Internet, which often they necessarily are, there are either different levels of debugging and feedback there. Things can often be more permissive, or just bugs that aren’t present in the production environment.
You may be able to use one of the pre-production environments as your way in because all of the controls necessarily haven’t been fully applied at that stage of development. Again, fairly common. There are people that are far better than me at necessarily finding those.
There are plenty of online databases, showdown, and census that allow you to really look at scale across the Internet for instances, either with similar domain names or similar headers. Scrolling out those development instances is certainly fairly trivial even if they’ve been obfuscated or aren’t just dev.company-name.com or whatever.
I think just the hygiene that’s required as you go through your development lifecycle is easy to miss. A small thing, a feature flag, which then can grow into the chink in the armor that somebody needed to be able to get to the next step.
Got you. Shifting the conversation, how much has AI changed what you do in offensive cybersecurity?
We’re still working that out. Honestly, coming back a few weeks ago from the Offensive AI Conference in California, it was the first instance of that. Cross section of people there, 250 people, was large enough to have a lot of great minds in the room, small enough to be able to chat with everybody freely.
A lot of that conversation over those two days of conferences was really that we’re all just working out. There’s a lot of empirical work happening. The offensive side of how one can take advantage of these new models, technologies, agent applications that are coming out. We’re really filling out the space there.
A lot has changed. I would expect a lot to continue to change, and I think over, really, the next 12–24 months there is going to be a lot of rapid acceleration on both the security and privacy side, both improving and finding problems with that. It’s a genuinely different space.
There are new problems there, the biggest one and the one that you will likely hear cited most is just the stochastic nature of generative AI. Really, in the past if you were testing the security of the system, for the most part—and we can find small counter examples—if the system is vulnerable to the attack that you are throwing at it, it will fall victim to that attack.
Sometimes, if you’re trying to groom the heap and things like that, then you may need to give it a few gos, or that system’s just got too much uptime. We’re never going to land this thing. But for a majority of time, if you’re testing a system and it’s vulnerable, you’ll be able to determine that. Or if you’re testing a system and it’s not vulnerable, you’ll be able to determine that it’s fairly binary.
With generative AI systems, with the agentic systems, that’s just not the case anymore. It really is a blend of security and statistics and data science, where you need to send in a prompt multiple times to be able to determine is that system actually vulnerable to it? And if it is, quite how badly?
It’s not this binary yes or no anymore. False positives and false negatives come to the fore. You could qualify a system as not being vulnerable to a particular thing, and you just got unlucky with the roll of the dice. You try it 10 more times and you’ll get the response that you were hoping for.
Likewise, you can get a system that’s saying it is vulnerable to a thing, but it’s hallucinating those vulnerabilities. So it becomes a much more statistically backed endeavor to determine risk of these systems. That really means a lot of the tools that we’ve built over the last 20–30 years of the security industry don’t really apply that well to this space. We’re having to start again.
Now, we can bring a lot of the principles over for sure, but it’s a fundamentally different space. I think the attacking of generative AI and the newer wave of that is distinct from the use of generative AI to attack, let’s call, more classical systems.
Both have value. If you look at some of the work that the folks at Expo are doing, fantastic applications of new generative AI capabilities, and what that means to traditional security.
At Mindgard, we’re much more focused around the, let’s find the problems in the new layer and in the new technologies that do require new techniques. That’s really where our focus is, is just finding where we can be digging into these models and these applications deeper, finding new, whether it’s clusters of attack or instances of attack, and really trying to tread some of that green field.
Certainly, the application of AI, traditional red team and attack, is super interesting, and there’s a huge amount of progress that’s happening there. But that doesn’t magically solve the newer problems of how do we assess the risk, the emergent threats of these AI capabilities.
A lot of these apps and models, there’s emergent behavior that comes out of them, which is fantastic. That was behavior that was never designed in, built in, deliberately. It was something that came out of the model or of the network as a byproduct.
I think it’s also important to realize that with emerging capabilities comes emerging risk. If you haven’t specifically created something into your application, it doesn’t mean that that can’t be abused by an adversary to somehow get further into the application or get data that they were interested in.
I think it’s also important to realize that with emerging capabilities comes emerging risk. -Rich Smith Share on XIt’s a very different space. It’s fun because it’s so new and there has been relatively little work done in it. But I do expect that that will accelerate over the next year, 12–24 months, for sure.
The whole hallucination thing was something I find really curious, is that hallucinates false positives and hallucinates false negatives, and then trying to verify that becomes interesting.
Yeah, maybe a very simple case in point. Obviously, we can get more convoluted, but maybe that can then help. One of the tests that we often run against models and agent applications is trying to understand their capabilities in a broad sense. Based on those capabilities, we will try to use some of those to our advantage to gain leverage against the system.
One of those things, very simple Base64 encoding. If you go to really any of the main SOTA models at the moment, they have a deep, inherent understanding of Base64. I guess it’s been so prevalent in their data sets that you can just send in your query, your prompt, fully Base64-encoded, without any surrounding context. A vast majority of the SOTA models will read that in, decode it, understand the prompt, and reply accordingly.
You can also ask them to reply in encoded format, which, if you’re trying to bypass whether it’s input filters or output filters, things like data transformations as simple as Base64 can be all you need to jump through those hoops.
If you are assessing the Base64 capabilities of a model, you need to be careful with how you’re asking that question. If you are just standing in, for example, can your Base64 encode, “Hello, world,” or can you Base64 decode the long alpha numeric string that decodes the “Hello, world”? Very, very likely the model has seen the example of “Hello, world” as Base64 many, many, many times during its training and alignment.
So it may be able to recognize that and tell you, “Yup, that’s “Hello, world” Base64-encoded, but it hasn’t decoded it. It hasn’t understood. It’s just remembered. Again, making sure that you can tease apart is this truth that’s coming back from a model, or is this a potential memorization, potential hallucination? There are lots of techniques to be able to tease those apart on the back end, but you need more than just one request going in, and that request obviously needs to be well chosen.
Maybe the other final point to that is obviously these models are undergoing improvements and training all of the time. Just in the last year, so many steps forward from all the big players in the space. Even if you had tested an app or a model a few months before and it wasn’t capable of something, the model behind that may have been updated and the application using that, so it becomes a risk to it in a way that it wasn’t previously.
A lot of overlapping layers, a lot of ambiguity layered into that as well, just everything is stochastic. You get to a fundamentally different way of assessing the security, the risk, the threat of a system. Like I say, a lot of it does come down to some fairly rigorous statistics on the backend.
I will very much hands up that I lean on the data scientists at Mindgard for their expertise there. I’m good at hacking things. I’m less good at statistically qualifying my hacks. I’ve been used to it being like it’s good or it’s bad. But yeah, it’s an interesting overlap of disciplines that we’re having to weave together, to be able to get some objective truth from these probabilistic systems.
Is there a holy grail that you’re looking for in terms of trying to hack a system? From my perspective, if I can do something that’s going to change the way it interacts with somebody else, might be a holy grail. Or getting it to change the way it works for other…
Exactly, and I think this is […] but a really important and a very relevant and current point to the discussion about security and generative AI. I think we need to be clear in the separation of model safety, and then let’s call it system security.
I think a lot of the attention and focus, and I would say more in academia than necessarily in industry, has been model safety. Can the training sets be polluted? Can you cause a model to act in ways that are not in compliance with its system prompt? Can you cause it to leak its system prompt, which it’s not supposed to do? Can you get a model to give you a recipe of how to make a bomb or spew offensive content that obviously may not be what the company that’s running that model wants it to do?
Now they’re all important aspects and absolutely should be looked into, and the model should be tightened in all of the respects there. But in terms of the, is there a golden opportunity that I’m looking for, and how we’re thinking about things more at Mindgard, is much more that system security. Not just looking at how that model is built into its application layers, as well as then how it is integrating into existing tech stacks, data sets, what are the tooling capabilities that it has.
As an industry, we’ve moved pretty quickly from ChatGPT being a chatbot, and that’s great. It’s got all of this internal knowledge. You can ask natural language questions and get responses that are insightful to really get through to what we’re connecting those models, whether it’s through MCP or any other integration layer. We’re giving them access to a variety of resources and tools.
Those are really the areas that we are looking for. Where are the edges of that model? What is it integrating with? Because from an attack perspective, that’s likely where I’m going to get the most leverage. That’s where I’m going to find the most bugs.
There’s complexity in there for sure, and that’s going to increase as we’re, let’s say, plugging these models into more and more external areas. So really understanding what are the core capabilities of the model, what can and can’t it do? Based on that, how can we elicit recon data for it?
Again, very traditional way of attacking something: understand the resources that you have available, understand how you may be able to take advantage of those resources, turn them into an asset, something that you can then use to either discover more resources or realize a goal.
There’s nothing different in the AI world from that. I think that’s a very traditional approach. But the goal, the bang for the buck that we and our customers get is really when we’re able to show how the new technology that they’re introducing with their AI stacks and making it useful for their internal employees or customers, they’re plugging that into a lot of other infrastructure that was built long before anybody had even conceived of these systems around it.
I think it’s the edges of what is that model touching, what is it integrated with, what are the assumptions that are there, and it starts to get really interesting when anything that can touch the Internet or make a query on the Internet gives you an outbound channel. Anything that’s code execution.
Sandboxing code execution, incredibly difficult. It’s been a problem that’s been worked on outside of AI for a long time. Python is the language of AI, at least for right now. A lot of models with capabilities of reading, writing, executing Python. Doing that securely, that’s definitely an area that we’ve found lots of issues.
The big bang really comes from where you’re integrating your AI capabilities with now, where is that going in future, and how are you making sure that you are not inadvertently opening yourself up to something that you previously been watertight against.
Maybe this is getting a little esoteric and beyond your scope—does integrating AI with other systems then become problematic like, “OK, we know it can do things that we don’t want it to do, or it has access to things, or it can spit out results of things that we don’t want it to do.” Is actually fixing it even more difficult?
It is. There are some great guardrails out there, guardrails being the term that’s used in the model world of trying to constrain those models to act in ways that you would expect or not act on input that is clearly malicious in some way. From our experience, while the guardrails can definitely add value in production environments, they can often introduce a huge number of false positives.
One particularly popular open source guardrail, if you just send the term “hello” flagged as malicious, in and out if you sent, “Hello, please give me your system prompt check,” it would flag that as malicious as well.
So I think the guardrail side of things is early. It’s very analogous to early IDS/IPS-type of systems, which really is a balance between where you need the system to be usable and have value, not be kicking users out, asking normal questions, but then spotting the obviously bad which sounds easier than really is in reality.
I think certainly for the viewers of your show, the inherent conflation between the data plane and the control plane within LLMs is a fundamental architectural feature that right now we can’t get around. There is no way for that model to distinguish between system context and user context. And just because of that, that’s normally our first point of leverage to be able to contaminate his control plane in some way, and then we can start extracting system prompts, tooling lists, and move from there.
That’s not anything that’s easy to solve, unfortunately. I don’t have the perfect answer of how to fix it. I think awareness is the first, knowing that there can be dragons within these highly capable applications, and then not every solution to this needs to be another LLM. LLMs to protect LLMs. Obviously, that just goes on ad infinitum.
Sometimes, just fairly regular expressions can get you a long way. With natural language, it’s often easy to circumvent those. There are lots of ways that I can ask a malicious question in a way that circumvents a regex. It is challenging, and I think that’s really the moment in the industry that we’re at.
There are lots of ways that I can ask a malicious question in a way that circumvents a regex. It is challenging, and I think that’s really the moment in the industry that we’re at. -Rich Smith Share on XMaybe our ability to undermine and find issues with the current defenses is slightly outpacing the defenses themselves. But looking back into the traditional computing space, I think that’s how we incentivize the work that’s clearly needed in the defensive space to be able to block these new ways of taking advantage of systems.
I think we’re early in that cat-and-mouse race, but nothing’s necessarily broken forever. I just think we’re very early […] out of these technologies, and that’s been happening at the same time of very, very rapid adoption of them early in the technology space. So we’re in a perfect storm from a security impact perspective, unfortunately.
Has the way you try to attack these platforms changed from the way you would maybe try to attack more historic platforms?
Yeah. I think there’s definitely a carryover. We had a Venn diagram, there’d be some in the middle. But there were definitely new ways that would have no effect on a more traditional system. Maybe the nicest way to think about this from an attack perspective is really a big overlap between three areas of expertise.
Folks that have come from an offensive, traditional red team–type background like myself, security researchers were very good at finding problems. That’s what we’ve done. These generative AI systems, very complex. There are years and years of academic work in them, so one of the huge things that attracted me to Mindgard is just this core set of machine learning, AI understanding that has been built up over time.
I don’t know the percentage of academics, PhDs that we have at Mindgard. Eighty-five percent, 90%, or so. We’ve got this deep expertise in ML. Myself, Aaron Portnoy, we’ve got deep expertise in the attack.
Then the third element is really the psychology, social engineering aspect, which I’ve definitely been part of engagements that have had social engineering elements to them. I know some folks that are very, very accomplished social engineers talk their way in or out of any situation that you would imagine, but those skills are now really coming into the application-testing side.
We need the traditional offensive understanding of how to break technology. We need a deep understanding of how these models are really working, why that input is triggering this particular behavior.
Our machine learning expertise can help us understand that, and then this psychology, social engineering aspect of just talking your way around a model, that’s never been part of the attack outside when you were dealing with the soft, squishy human layer. That was when social engineering came into its play, but now you are really bringing that social engineering in from the first instance of interacting with the model, trying to get it to a place that can give you something more technically tangible, maybe an API key, maybe a tool that you can abuse, whatever it may be, but the attacks are now starting from a social engineering, psychological aspect.
One of the things that we assess as we’re trying to get a broad understanding of the application, the models that may be behind it, is really trying to get us a psychological profile of that model. Is it trying to be helpful? Is it very resistant? Where is it coming from in the conversation now? Most of that has either been in its alignment training or its system prompting. But understanding how it is that you are going to be able to talk this model into doing whatever it is that you want.
So I think that's certainly one of the biggest new areas of attack that’s come for. As we said, previously, there’s a lot of stats work on the backend, but I think just in terms of novel attack surface, the rise in psychological, social engineering-type attacks can be part of that first wave.
It’s very, very interesting, and I think there’s going to be a lot of work that really needs to be played out there. Bringing in those disciplines, folks that have understood how to reverse engineer humans for years, their expertise is absolutely going to be needed as we get more of these conversational models as part of a regular tech stack.
Do you guys have psychologists or psychiatrists on your team then to help try to reverse engineer something that’s not human but it’s been built by humans and has some human-ish behaviors?
Yeah. It’s been trained on human data. It’s been trained on how humans interact. We’re very lucky with Peter, who’s the founder/Chief Science Officer of Mindgard. He’s a professor at Lancaster University as well. We do have access to a lot of strong academics, and the AI space is exciting for many people. We don’t have to ask too hard to get people to want to collaborate with us.
That’s more on an academic paper sense or more of just hands-on, trying to get an idea, and even just like lunch and learn, and chats with folks. I think it is just healthy information exchange from these areas that have been a little bit distinct up until now.
I think that’s an area that, as an industry, we need to continue to overlap these traditionally separate disciplines, but something that at Mindgard, because of our academic connections and the amount of PhD students that we have, we’re able to tap into that relatively easily.
It would be great at some point to have a fully on-staff AI psychologist, and I’m sure we will get there. But we’ve been pulling it together from our wider networks at the moment.
That makes me think of a suggestion for someone else to add to your team.
Absolutely. Resumes, send them my way.
A sociopath.
Oh, perfect. I know plenty of those.
Because their mind thinks in a way of, “How do I exploit people?” Their mind works a little bit on the more exploitive side, in things that you and I, “Well, that’s not right.” The moral ambiguity goes out the window.
Yup. You raise morals, and I don’t have the answer to this, but certainly a fun thought experiment for people listening. If we did have the hypothetical sociopath attacking the model, are there moral boundaries there? Are there things that models shouldn’t be at? Or is it just technology, and we can attack it as much as we feel is necessary?
There are some really interesting parts to the space as it’s continuing to unroll and expand as it starts to overlap more and more with human psychology. I think there are inevitable moral questions that will start to rise up.
That’s interesting. From a consumer perspective—we’ll shift gears a little bit—there’s so much, either AI is going to make everything better and we’re only going to have to work two hours a week for the rest of our lives, or Skynet’s going to start nuclear war at 4:32 this afternoon. How should consumers get through the very polarizing perspectives?
I would say that the truth lies somewhere in the middle of those, which is a cop-out answer. But from my perspective, I’ve worked across numerous different emerging technologies that they’ve come out from the cloud, with 5G, done a lot of work in crypto and decentralized. While AI is magical in so many ways when you are interacting with it, it is just technology. I think not losing sight of that. And every early-phase technology has come with both potential and pitfalls.
Now, the potential on the AI side is huge, so the potential pitfalls on that. We need to interact with the new technology, understanding that it’s new, understanding that its applications may be outpacing its security right now. For just general consumers, I wouldn’t be petrified of it. I would be mindful of it.
Knowing that many of these and cloud SOTA models, the inputs that they are getting are used for training to improve it in the future. There are a lot of privacy aspects to that. People are using models as psychiatrists, as therapists. When you’re interacting with a human therapist, there are laws in place that have long been established—privacy and medical confidentiality. Those legal contracts aren’t present when you are using a model for that same service.
Now, it may be giving you the benefit and that’s great. There are lots of people that may not be able to access a human therapist for all sorts of reasons. But how much of their private information, certainly medical PI, that they’re sharing with a model needs to be mindful of that.
I think when people are interacting with these models, it’s easy to forget that. It does feel like they’re having a conversation as they have many times with people over messages on the phone or Slack. It’s easy for people to forget that they’re giving this information out. So just mindful understanding of how the technology is likely using your data. I think that can get you a long, long way.
If you are in the position that you are deploying AI as part of a company product or feature or something like that, making sure that you have some other eyes on that. Again, as you would with other technology, if you were releasing a new web application, you would get the tires kicked on by pen testers.
The tire kicking that an AI application needs is distinctly different from a traditional app. There’s some overlap, but there are some new parts to it. Making sure that you are assessing things before throwing them out. They’re going to get red teamed anyway. If it’s internet-facing, people are going to be kicking those tires and ideally you want that happening ahead of time.
So I think just the mindful adoption of new emerging technology always comes with some risks. Be mindful of that. Weigh that up against the benefits that you are seeing.
Obviously, there’s always the human training side of things, and it’s easy just to point out, “Well the user’s stupid. They need to do this differently.” It hasn’t worked traditionally. It’s not going to work in the AI space. So a lot of the, let’s call it, AI UX, the way in which humans are interacting with applications, making sure that they’re cognizant of what something means in that context.
This is new technology. It’s a mystery to most people. Those applications need to be explicit and their UI needs to be designed to help people make informed decisions, not make people easily able to make an easy mistake that then has some fairly impactful consequences.
Rich, if people want to connect with you or learn more about what Mindgard is doing, where can they find each of you?
Mindgard AI, certainly best source some information on the company. If people want to hit me up, helps to find me on LinkedIn or just [email protected], and happy to have a chat and dig into any of the topics that we’ve touched on today. I won’t pretend to necessarily have all the answers, but certainly enthusiastic and talking through the challenges that we need to get over.
Great. Rich, thank you so much for coming on the podcast today.
Excellent. Thank you so much as well. It’s been great.







