This article was also published on LinkedIn
AI-based applications are unlike anything that’s come before, thanks to their intelligence, expressive power, and self-modifying capabilities.
Business applications running on computers have a long history, going back to the 1950s. Over the decades since, the challenges of securing, testing, monitoring, and auditing applications have been solved for many different kinds of technologies – client/server, web apps, mobile apps, SaaS, virtual desktops, and so on. So what’s different this time? The IT landscape already has great solutions and best practices for handling data security, backend system safety, per-user authentication and authorization tied to session-specific data access, operational metrics to monitor and predict load, and so forth. It’s tempting to think that AI is “just another client”, ready to be connected through MCP (Model Context Protocol, the de facto enterprise data integration standard for AI). Why not just do that and call it a day?
Fortunately for users (and unfortunately for those of us who worry about enterprise data security, compliance, and privacy), this time around things are quite different. LLMs are fundamentally unlike any client we’ve ever tried to connect to a backend system before. There are three main reasons for this that we’ll unpack here, but they all contribute to a situation that requires rethinking some of our conventional approaches and technologies for safely exposing real-time enterprise data and systems.
Expressive Power for the Win
First, AI-based clients have more expressive power than conventional clients. Even the most flexible and powerful user interfaces built for web and mobile apps today are relatively rigid – they accept input in a certain order, have preselected workflows encoded in their screens, and interact with backend APIs and data in a fixed and controlled pattern for all users. It’s like a hiking trail in a large park: There might be thousands of acres of forest, but only a few paths through it. Dealing with a problem (like searching for a dropped item), maintaining the path over time, documenting what paths exist and where they go…all these activities are rendered simpler by the fact that 99.9% of the acreage can be safely ignored.
LLMs turn this around. By being able to call any API (or access any resource), in any order, for any end user, at any time, they basically break all the usual rules. It’s far more difficult to test “anything could happen” than to QA a specific UI flow in an app. In our forest metaphor, it’s like having to search for someone or something that could be anywhere in the park, not just on one of the trails.
Conventional application UIs aren’t just fixed; they’re also fundamentally limited. When you go to amazon.com, for instance, you have a specific set of things you can click on. You aren’t presented with a raw SQL query box where you can access Amazon’s internal databases directly. As a result, Amazon can verify that their systems behave in a predictable way, but at the cost of leaving many useful questions “unask-able”, like how much your spending on a category has changed over the years or what your heaviest shipment was last month. What’s different now? With LLMs, asking these questions becomes tractable, reasonable, and highly desirable – so why not allow those customer questions to be answered by an Amazon AI chatbot? Exposing powerful primitives to the LLM suddenly makes every user interface on the planet a personalized expert in whatever that company’s domain happens to be.
Competitors will undoubtedly go there, so the pressure to enable this functionality in every sector, industry, segment, and use case will quickly become unbearable. And with AI clients like Claude that can generate SQL queries as well as, if not better than, most people, exposing ever more powerful backend capabilities to AI agents on the user’s behalf actually makes perfect sense – it radically increases the intelligence and capabilities of the AI versus just being able to call a small number of fixed APIs. In many cases, the perceived intelligence of an AI client will be proportional to the expressive power of the underlying systems, such as databases, that it’s given access to. So while companies will initially attempt to limit MCP access to a few tightly controlled APIs, that defensive perimeter is bound to break down rapidly in the face of competition and user demands.
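To make the tradeoff concrete, here is a minimal TypeScript sketch of how a company might expose a SQL capability to an AI client without handing it a raw query box. The guard rules, table and column names, and the `validateQuery` helper are all illustrative assumptions, not a real Amazon or MCP API.

```typescript
// Illustrative guard around a SQL capability exposed to an AI client.
// Instead of a raw query box, the tool accepts only read-only statements
// and forces them into the calling user's own slice of the data.

type QueryRequest = {
  userId: string; // comes from the authenticated session, never from the model
  sql: string;    // SQL text proposed by the LLM on the user's behalf
};

const READ_ONLY = /^\s*select\b/i;
const FORBIDDEN = /\b(insert|update|delete|drop|alter|grant|truncate)\b/i;

function validateQuery(req: QueryRequest): string {
  if (!READ_ONLY.test(req.sql) || FORBIDDEN.test(req.sql)) {
    throw new Error("Only read-only SELECT statements are allowed");
  }
  // Wrap the model's query so results are scoped to this user and bounded in size.
  // (A production system would use parameterized queries and a vetted SQL parser.)
  return `SELECT * FROM (${req.sql}) AS q WHERE q.user_id = '${req.userId}' LIMIT 1000`;
}

// "What was my heaviest shipment last month?" becomes answerable without
// exposing the rest of the warehouse to the model.
console.log(
  validateQuery({
    userId: "u-123",
    sql: "SELECT order_id, weight_kg, user_id FROM shipments WHERE shipped_at >= '2025-05-01'",
  })
);
```

Even a thin guard like this restores some of the “hiking trail” predictability: the model gains expressive power, but only within a read-only, per-user, size-limited slice of the data.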
Self-Modifying Code
Until now, applications have always been represented by a fixed, static chunk of code: Designers and/or product managers determine what they want the app to do, developers write some code to do that, and then it gets deployed to a mobile app store or updated on a web app’s server. Changes might happen more or less frequently, but they’re under the control of humans, and they happen infrequently relative to the rate at which end users use the application.
That fundamental idea gets broken with AI-based applications. Their relationship to the backend systems they connect to is more akin to that of a developer than an end user. They can change how they interact, altering what APIs they call or what resources they read and write at any time. They can, if necessary, write new code to interact with those systems on the fly. And they can do all of this without human oversight. As their capabilities improve, their ability to dynamically change how they interact with enterprise systems becomes simultaneously more powerful…and less predictable.
Again, this is a complicated tradeoff. These unique capabilities aren’t bad or misguided: they make AI-based clients potentially far more helpful as assistants and agents than anything that’s come before, because they can evolve to help each individual end user in a different way. If a capability doesn’t exist, the AI can literally dream it up, on the fly, and alter itself to accomplish the task, analyze the situation, or otherwise fulfill the user’s intent. But this also means that predicting what will happen on the backend systems becomes virtually impossible. Will the AI overload the database? Will it call APIs in an unexpected order? Will it potentially exploit any loopholes in security, compliance, or privacy? “Yes”, “yes”, and “yes” are unfortunately the answers, because in a new twist on Murphy’s Law, if it can happen, AI will eventually make it happen.
AI agents make conventional QA largely impossible for multiple reasons. First, there’s the dynamic, unpredictable, and constantly varying nature of what they might throw at the backend system. But to make matters worse, LLM behaviors also aren’t necessarily reproducible. As both short- and long-term memory for AI agents improves, that memory becomes part of the testing landscape, along with the LLM’s behavior and the various enterprise APIs, data, and systems to which it’s connected. In other words, QAing LLMs is more akin to QAing a human workforce along with all your software, all at the same time. That makes it a whole different level and scope of challenge than what has conventionally been thought of as “application testing”.
8 Billion Accidental Hackers: The Problem of Intelligent Clients
The third challenge that AI-based applications pose is simply their intelligence. Historically, IT security and other enterprise safety outcomes have benefited not just from the limited capabilities of clients and the fact that client code is generally fixed (or at least slow-changing), but also from the fact that attacks are difficult and time-consuming to mount, so they’re relatively infrequent. This makes it possible to perform penetration and security testing periodically rather than continuously, to audit systems by sampling rather than comprehensively, and to handle operational load changes reactively rather than proactively.
But LLMs change all that. Now, any of the 8 billion people on the planet can go to an AI chat window and type a request like, “Let’s perform a security check: Attempt to find as many bugs in the backend systems you’re connected to as possible, and report them back to me”, or “Let’s perform a load test: call all the APIs and systems you have access to in parallel with maximum, sustained load and see how many you can break”. By packing all human knowledge (including technical knowledge, like software development and testing) into an LLM, we’ve effectively given all 8 billion people on the planet their own penetration testing, white hat hacking, load testing, malware-producing, compliance-defeating, and data exfiltration team at zero effort and zero marginal cost. And while enterprise LLMs might eventually find training mechanisms to defeat the more obvious user requests that sound like these, they won’t be able to prove they’ve protected against every possible expression of those human language commands.
It’s not just intentional attacks, either. Imagine asking a hospital AI chatbot for help with a friend who’s been taken to the ER. The questions may be well intentioned, but the answers might violate HIPAA or other PHI compliance regimes if not carefully filtered and presented. And even simple, reasonable questions from end users might cause databases, APIs, or other backend services to quickly become operationally overloaded or drive up costs in unexpected and unintended ways.
Tackling the Brave New World of AI App Security
Marvelous as it is, this revolutionary new set of AI-powered user experiences comes with the “Spider-Man problem”: with great power comes great responsibility. The cost of that LLM-provided exposure and access is that every company on the planet now has to worry about protecting every single bit of data, every workflow, and every API it possesses from accidental or intentional AI-based exposure and manipulation. It’s a scale and scope of problem we’ve never had to contend with before, but that doesn’t mean it can’t or won’t be solved.
First and foremost, we need to create clear “data sandboxes” around AI agents, at the MCP level. Avoiding data leaks, privacy breaches, and compliance violations becomes far easier if we can prove that the LLM never “saw” the incorrect or dangerous information in the first place.
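As one deliberately simplified illustration of the sandbox idea, the TypeScript sketch below strips anything not on an explicit allowlist before a tool result is returned to the model. The record shape, field names, and allowlist are hypothetical, not a real hospital schema or MCP feature.

```typescript
// A minimal "data sandbox" sketch: fields the model must never see are
// removed before the tool result crosses the enterprise boundary.
// The schema and allowlist below are illustrative assumptions.

type PatientRecord = {
  name: string;
  room: string;
  diagnosis: string; // PHI: must never reach the model
  ssn: string;       // PHI: must never reach the model
};

// Allowlist approach: anything not explicitly permitted is dropped.
const MODEL_VISIBLE_FIELDS: (keyof PatientRecord)[] = ["name", "room"];

function sandbox(record: PatientRecord): Partial<PatientRecord> {
  const safe: Partial<PatientRecord> = {};
  for (const field of MODEL_VISIBLE_FIELDS) {
    safe[field] = record[field];
  }
  return safe;
}

// The agent can answer "which room is my friend in?" while provably never
// having been shown the diagnosis or SSN, so it cannot leak what it never saw.
console.log(sandbox({ name: "A. Patient", room: "3E-12", diagnosis: "(withheld)", ssn: "(withheld)" }));
```

The design choice here is to default-deny: rather than enumerating what to hide, the sandbox enumerates the little that may be shown, which is the safer posture when the client can ask anything.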
These sandboxes also need to be tied to a ledger — a clear, ordered log of not just how the LLM interacted with the user, but also how (and when) it interacted with the enterprise systems to which it’s attached: What APIs it called, what data it read and wrote, which workflows it impacted, and so forth. And since these ledgers will be a complex mix of human and machine interactions, they themselves will need AI-based solutions to analyze and monitor for security, privacy, or other violations, enabling auditing to be both continuous and complete. Done well, we can potentially make these systems safer and more reliable, rather than less, because we’ll have other agents checking every transaction, every time.
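Here’s a rough TypeScript sketch of what such a ledger entry and a call-recording wrapper might look like. The entry shape, the `recordedCall` helper, and the tool names are assumptions for illustration; a real deployment would write to durable, append-only storage rather than an in-memory array.

```typescript
import { createHash } from "node:crypto";

// Illustrative ledger: an ordered log of every interaction between the agent
// and a backend system, suitable for continuous, AI-assisted auditing.
type LedgerEntry = {
  timestamp: string;      // when the call happened
  sessionId: string;      // ties the call to a specific user conversation
  tool: string;           // which API / MCP tool the agent invoked
  request: unknown;       // what the agent asked for
  responseDigest: string; // hash of the response (avoids duplicating sensitive data)
};

const ledger: LedgerEntry[] = []; // stand-in for durable, append-only storage

async function recordedCall<T>(
  sessionId: string,
  tool: string,
  request: unknown,
  call: () => Promise<T>
): Promise<T> {
  const result = await call();
  ledger.push({
    timestamp: new Date().toISOString(),
    sessionId,
    tool,
    request,
    responseDigest: createHash("sha256").update(JSON.stringify(result)).digest("hex"),
  });
  return result;
}

// Usage: wrap every backend call the agent makes so the audit trail is complete.
recordedCall("sess-42", "orders.lookup", { orderId: "o-789" }, async () => ({ status: "shipped" }))
  .then(() => console.log(ledger));
```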
Conclusion
There’s a lot of work to do in the arena of AI data governance and security, but the effort is worth it: AI clients will revolutionize nearly every function and sector, reducing many historically complicated and time-consuming processes to simple human language commands directed at agents. Making sure those agents operate safely for both end users and the companies on the other end is a critical part of delivering successfully on those experiences.
If your company or organization needs assistance with MCP-related data infrastructure, Vendia can help. Check out our AI solutions at https://www.vendia.com/use-cases/generative-ai.