[COMPLIANCE & ENTERPRISE]

2/24/26

The First AI Agent Lawsuit Is Already Being Written

By Amjad Fatmi

Not the first AI lawsuit. That already happened.

The first lawsuit specifically about what an autonomous agent did on your company's behalf, without explicit human authorization, is being written right now. Somewhere, a plaintiff attorney is sitting across from a client who lost money because an agent took an action nobody intended to authorize. They are reading deployment logs that show what happened but cannot show why it was permitted. They are asking whether the company can prove the action was within the agent's authorized scope.

In most cases, the company cannot.

This post is about what that lawsuit looks like, why the legal framework is further along than most technology teams realize, and what the specific technical controls are that determine whether your organization is defensible.

The Cases That Already Established the Framework

The legal theory for AI agent liability did not start with agents. It started with a chatbot and an airline.

In Moffatt v. Air Canada, decided in 2024 by the British Columbia Civil Resolution Tribunal, a passenger was given inaccurate bereavement fare information by Air Canada's customer support chatbot. Air Canada argued, in its defense, that it could not be held responsible for the chatbot's statements because the chatbot was a separate entity. The tribunal rejected this argument entirely, finding that Air Canada was responsible for its chatbot and that the chatbot's representations amounted to negligent misrepresentation.

This ruling established a principle that has since been cited in every serious legal analysis of AI agent liability: you cannot deploy an autonomous system to speak and act on your behalf and then disclaim responsibility for what it says and does. The AI is your agent. Its acts are your acts.

Mobley v. Workday moved the framework from chatbots to autonomous decision-making systems. Derek Mobley applied to over 100 positions through Workday's AI-powered applicant screening system and was rejected within minutes each time, indicating no human review occurred. In July 2024, a federal court allowed discrimination claims against Workday to proceed on an agency theory, finding that the Workday system was "essentially acting in place of the human" and had been delegated responsibility for a function that would otherwise be performed by a person. In May 2025, the court granted preliminary certification of a nationwide collective covering applicants aged 40 and over who were rejected through Workday's AI system.

This is the first time a federal court applied agency theory to hold an AI vendor directly liable for discriminatory outcomes. The legal reasoning is simple and, for anyone deploying agents, should be alarming: if you delegate a function to an AI system, and that system performs the function, you are responsible for how it performs that function. The fact that the decision-maker was a model rather than a person does not reduce your accountability. In the words of the University of Chicago Law Review's analysis of this framework, "people should not be able to obtain a reduced duty of care by substituting an AI agent for a human agent."

Consequential actions by autonomous systems are now a live legal category. Courts are finding liability. Class actions are being certified. The framework is established.

The next chapter is not whether AI agents create liability. It is what evidence determines whether a specific organization is defensible when its agent does something that causes harm.

What "Authorized" Means When a Lawyer Asks the Question

When an agent takes a harmful action, the question a lawyer will ask your team is not "did you intend for the agent to do this?" The question is: "was this action within the scope of what the agent was authorized to do?"

These are different questions with different evidentiary burdens.

Intent is hard to prove either way. Authorization is documented or it is not.

In traditional agency law, an agent's authority comes in two forms. Express authority is what the principal explicitly authorized. Implied authority is what the agent reasonably believed was authorized based on the relationship. Both forms require that someone, somewhere, made a decision about scope.

For AI agents, express authority should exist in the policy that governs the agent's actions. What tools can it use? What operations can it perform? At what parameter ranges? Under what conditions? If that policy is documented, version-controlled, and enforced at execution time, you can answer the authorization question. You can produce the policy that was in effect at the time of the action and demonstrate that the action was either within or outside its terms.

If your agent's "policy" lives in a system prompt, you are in a significantly weaker position. System prompts are not policies in any legal sense. They are instructions to a probabilistic model. They have no version history that a court would recognize as authoritative. They cannot be enforced deterministically. They can be overridden by sufficiently sophisticated inputs. When a lawyer asks what policy governed the agent's action, you will not be able to produce a document with a version hash, a change history, and evidence that it was actually enforced.

The difference between a documented, enforced policy and a system prompt instruction is the difference between an organization that can defend itself and one that cannot.

The Regulatory Timeline Is Not Theoretical

Three regulatory frameworks are converting from guidance to enforcement in 2026, and all three have specific implications for organizations deploying autonomous agents.

The EU AI Act becomes fully applicable in August 2026 for most operators. The penalty regime includes fines of up to €35 million or 7% of global annual turnover for prohibited practices, and up to €15 million or 3% of turnover for other compliance failures. High-risk AI systems operating in the EU must maintain technical documentation, ensure human oversight, keep access logs, and demonstrate that their systems perform as intended. The Act applies extraterritorially: if your agent's output is used in the EU, the Act applies to you regardless of where you are incorporated.

The EU Product Liability Directive, required to be implemented by EU member states by December 2026, explicitly includes software and AI systems as "products." This matters because product liability is strict liability: you do not need to prove negligence, only that the product was defective and caused harm. For AI systems capable of autonomous action, a single unauthorized transaction, a single erroneous financial decision, a single data exposure could constitute a product defect.

Colorado's AI Act, taking effect June 2026, requires that developers and deployers of high-risk AI systems implement risk management programs, conduct annual impact assessments, disclose the use of AI in consequential decisions, and maintain records that allow for post-hoc investigation of outcomes. Other states are following. The EU AI Act's framework is increasingly being used as a reference point even in jurisdictions where it does not directly apply.

In each framework, the compliance burden falls primarily on deployers, the organizations that put agents into production and operate them against real customers and real data. Vendor contracts matter for indemnification negotiations, but as the Mobley case demonstrated, deployer liability does not disappear because the vendor's contract limits its own exposure.

The Three Questions That Determine Defensibility

When a plaintiff attorney, regulator, or insurer investigates an AI agent incident, three questions determine whether an organization can mount a credible defense.

First: Can you show what the agent was authorized to do at the time it acted?

This requires a policy that was in effect at the specific moment of the action, with evidence that the policy was actually enforced (not merely advisory), in a form that survived without modification from the time of the action to the time of the investigation.

A system prompt satisfies none of these requirements. It has no timestamp proving it was the version active at the time. It has no evidence of enforcement, since a model that received instructions may or may not follow them. It has no tamper-evidence, meaning it can be modified after the fact without detection.

A version-controlled policy file, hashed at the time of each decision and recorded in a tamper-evident audit trail alongside the action it governed, satisfies all three requirements. Every decision record contains the hash of the policy that produced it. The policy cannot be retroactively changed without the hash changing and breaking the audit chain. You can prove, to the standard a court would accept, what rules were in effect and that they were actually enforced.
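The binding mechanism can be sketched in a few lines of Python. The policy text, field names, and record shape below are illustrative assumptions, not Faramesh's actual format; the point is that each decision record carries the hash of the exact policy bytes that governed it.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical policy document; in practice this would be a
# version-controlled file checked out at deploy time.
policy_text = b"allow: refund\nmax_amount: 500\n"
policy_hash = sha256_hex(policy_text)

def record_decision(action: dict, decision: str) -> dict:
    """Bind a decision record to the exact policy version that produced it."""
    return {
        "action": action,
        "decision": decision,
        "policy_hash": policy_hash,  # proves which rules were in effect
    }

record = record_decision({"tool": "refund", "amount": 120}, "allow")

# Any after-the-fact edit to the policy changes its hash, so it can
# no longer match the hash stored in historical decision records.
assert record["policy_hash"] == sha256_hex(policy_text)
assert record["policy_hash"] != sha256_hex(b"allow: refund\nmax_amount: 50000\n")
```

The same hash then feeds into the audit trail, so the policy version and the action it governed are verifiable together rather than reconciled by hand after an incident.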

Second: Can you show that the action was reviewed by a human where the consequence warranted it?

For consequential actions such as financial transactions above material thresholds, modifications to customer accounts, communications that could constitute commitments, or access to sensitive data, the question of whether a human was in the loop is directly relevant to both negligence analysis and regulatory compliance.

Most agent deployments have no documentation of when human review occurred and when it did not. The approval, if it happened, was a click in a UI that generated no record tied cryptographically to the specific action being approved. If you need to demonstrate, in litigation or in a regulatory investigation, that a human reviewed a specific high-value agent action, you need a record that ties the human approval to the canonical description of the action being approved, at a specific timestamp, in a form that cannot be altered.

Third: Can you show that your audit records are complete and have not been modified?

Standard logging systems are not designed to be tamper-evident. Logs can be deleted. Retention policies can be changed. Records can be modified. This is acceptable for operational logging, where the purpose is debugging and monitoring. It is not acceptable for compliance documentation where the legal question is whether the records accurately reflect what happened.

Davis Wright Tremaine's analysis of agentic AI risks notes that these risks "may be amplified by the fact that agentic AI systems may not provide fully transparent audit trails for ongoing oversight and review." This is the current state of most agent deployments. The audit trail exists. Its integrity cannot be proven.

Cryptographically chained records, where each record's hash depends on the hash of the record before it, change this property. Deleting or modifying any record breaks the chain. The chain can be verified independently. You can prove the record is complete.
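A minimal sketch of such a chain, with illustrative record fields (not any specific product's schema): each entry's hash covers both the previous entry's hash and the record body, so any deletion or modification is detectable by re-walking the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first record

def chain_append(chain: list, record: dict) -> None:
    """Append a record whose hash depends on the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"record": record, "prev": prev_hash, "hash": entry_hash})

def chain_verify(chain: list) -> bool:
    """Recompute every link; any tampering breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

chain = []
chain_append(chain, {"action": "read_account", "id": 1})
chain_append(chain, {"action": "issue_refund", "id": 2})
assert chain_verify(chain)

chain[0]["record"]["action"] = "delete_account"  # tamper with history
assert not chain_verify(chain)
```

Verification requires no trust in the logging infrastructure itself: an auditor holding only the chain can recompute every hash independently.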

The Insurance Problem

Cyber insurance is the immediate, practical pressure point for most organizations, and it is arriving ahead of litigation.

Some insurers are already offering AI Security Riders that require documented evidence of specific controls as a prerequisite for coverage. The controls being required map closely to what execution governance produces: documented authorization policies, human-in-the-loop checkpoints for high-consequence actions, audit trails that support post-incident investigation.

The incident described in the Faramesh incident report ("How an Authorized Agent Cost Us $340,000 in Four Hours") illustrates this specifically. In that scenario, an insurer's forensic team reviewed the agent's logs and found a complete record of what the agent did. They could see every API call. What they could not find was any record of why each action was authorized, under which policy, by whose authority. The claim was denied on the grounds that the organization could not demonstrate that the financial losses resulted from unauthorized agent behavior rather than behavior within the agent's intended scope.

The argument that agents acted "within their authorized scope" cuts both ways. It protects you if you can prove the scope was narrow and the action fell outside it. It exposes you if you cannot prove the scope was narrow at all.

The Director and Officer Question

The Squire Patton Boggs analysis of agentic AI legal risks notes that under the UK Companies Act 2006, directors are required to exercise reasonable care, skill, and diligence, and face potential liability for failures in governance or supervision of AI systems.

This is the D&O exposure that most corporate legal teams have not yet fully internalized. When an agent causes material harm, the investigation will not stop at the corporate entity. It will ask whether the board and executive team were aware of the risks, whether they implemented reasonable governance controls, and whether they exercised appropriate oversight.

"We had monitoring and observability tools in place" is not a governance control. Monitoring records what happened. Governance controls what happens. A board that approved agent deployment and received regular operational metrics without ever reviewing what authorization framework governed the agents' actions will have a difficult time demonstrating that it exercised reasonable oversight.

A board that approved a documented agent authorization policy, reviewed it annually, received attestations that the policy was being enforced at execution time, and maintained records of that review is in a materially different position.

The governance infrastructure is not only a technical requirement. It is the documentation trail that demonstrates corporate leadership fulfilled its duty of care.

What Defensibility Requires Technically

This is not abstract. The legal requirements translate directly to specific technical controls.

Authorization must be documented and enforced deterministically. Policy-as-code, stored in version control, with every action decision tied to the specific policy version that governed it. Not a system prompt. Not a configuration comment. A versioned, hashed policy file evaluated by a deterministic function before each action executes.
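What "evaluated by a deterministic function" means can be shown in a short sketch. The policy structure and tool names here are hypothetical; the essential properties are that evaluation involves no model inference and that anything not explicitly permitted is denied (fail closed).

```python
# Hypothetical policy, in practice loaded from a versioned, hashed file.
# Keys are (tool, operation); values are parameter constraints.
POLICY = {
    ("payments", "refund"): {"max_amount": 500},
    ("crm", "read_contact"): {},
}

def evaluate(tool: str, operation: str, params: dict) -> bool:
    """Deterministic check run before every action executes.
    Anything not explicitly listed in the policy is denied."""
    rule = POLICY.get((tool, operation))
    if rule is None:
        return False  # default deny: unlisted actions never execute
    max_amount = rule.get("max_amount")
    if max_amount is not None and params.get("amount", 0) > max_amount:
        return False  # parameter outside the authorized range
    return True

assert evaluate("payments", "refund", {"amount": 120})       # in scope
assert not evaluate("payments", "refund", {"amount": 9000})  # over limit
assert not evaluate("payments", "wire_transfer", {})         # never authorized
```

The same inputs always produce the same answer, which is what lets you demonstrate after the fact that a given action was or was not within scope.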

Audit records must be tamper-evident. Hash-chained records where the chain can be independently verified. Not logs that can be deleted or modified. Records where the integrity of the complete history can be demonstrated to an auditor or court without requiring them to trust the logging infrastructure.

Human review must be documented at the action level. Approval records tied cryptographically to the canonical description of the action being approved. Not "there was a human in the loop somewhere." Evidence that a specific human reviewed a specific action at a specific time and that the action that subsequently executed matched what was reviewed.
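Binding an approval to a canonical action description is straightforward to sketch. The serialization choice (sorted-key JSON) and record fields below are illustrative assumptions; any canonical encoding with a collision-resistant hash serves the same purpose.

```python
import hashlib
import json
import time

def canonical_hash(action: dict) -> str:
    """Hash a canonical serialization: sorted keys, fixed separators,
    so semantically identical actions always hash identically."""
    body = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest()

def approve(action: dict, approver: str) -> dict:
    """A human approval record tied to one specific action."""
    return {
        "approver": approver,
        "action_hash": canonical_hash(action),
        "timestamp": time.time(),
    }

def verify_before_execute(action: dict, approval: dict) -> bool:
    # The action that executes must hash to exactly what was approved.
    return canonical_hash(action) == approval["action_hash"]

action = {"tool": "payments", "op": "refund", "amount": 450}
approval = approve(action, "reviewer@example.com")
assert verify_before_execute(action, approval)

tampered = dict(action, amount=4500)  # modified after approval
assert not verify_before_execute(tampered, approval)
```

The approval is evidence about one action, not a blanket sign-off: if the executed action differs in any field from what the human saw, the hashes diverge and execution is refused.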

Credentials must not be held by the agent. Ephemeral injection from a secrets manager at execution time. No ambient keys in process memory that a compromised or injected agent could access and exfiltrate. The credential access log showing each fetch tied to a specific authorized action.
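The pattern can be sketched with a toy in-memory broker standing in for a real secrets manager; the class and method names are illustrative, not any specific product's API. The point is that the credential exists only in the scope of one authorized action, and every fetch is logged against that action.

```python
class CredentialBroker:
    """Toy stand-in for a secrets manager that logs every fetch."""

    def __init__(self, secrets: dict):
        self._secrets = secrets
        self.access_log = []  # each fetch tied to a specific action id

    def fetch(self, name: str, action_id: str) -> str:
        self.access_log.append({"secret": name, "action_id": action_id})
        return self._secrets[name]

def execute_action(broker: CredentialBroker, action_id: str) -> bool:
    # The credential is fetched at execution time and held only in this
    # local scope; the agent object never stores it, so there is no
    # ambient key for a compromised agent to exfiltrate later.
    token = broker.fetch("payments_api_key", action_id)
    # ... use token for the API call, then let it fall out of scope ...
    return len(token) > 0

broker = CredentialBroker({"payments_api_key": "sk_test_123"})
assert execute_action(broker, "act-001")
assert broker.access_log == [
    {"secret": "payments_api_key", "action_id": "act-001"}
]
```

The access log is itself audit evidence: each credential fetch maps to one authorized action, so an unexplained fetch is immediately visible.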

Fail-closed behavior must be documented and testable. Evidence that actions outside the defined policy are denied by default, not permitted. This is directly responsive to the question of whether the agent's scope was genuinely bounded.

The Timing

The legal framework is established. Moffatt v. Air Canada and Mobley v. Workday settled the principle. The regulatory deadlines are known dates on a calendar. Insurance requirements are current, not future.

Most litigation around AI agent liability will be filed after incidents have occurred. The evidentiary question is whether documentation that should have been captured from the beginning of the deployment exists to answer the authorization questions.

That documentation cannot be created retroactively. A hash-chained audit trail started in March 2026 cannot contain records from November 2025. A policy with a version history starting in 2026 cannot demonstrate what authorized an agent action in 2025.

The window between now and the first significant AI agent litigation is the window to establish the documentation infrastructure. Not to avoid deploying agents. The competitive pressure to deploy is real and the benefits are real. But to deploy with the governance infrastructure that makes the organization defensible when, not if, a material incident occurs.

The lawsuit is being written. The question is whether the discovery process finds documentation that supports a defense or documentation that supports a claim.

Faramesh produces the specific technical controls described in this post: version-controlled policy-as-code with hash binding to individual action decisions, cryptographically chained Decision Provenance Records, documented human approval workflows tied to specific actions, and ephemeral credential brokering with no ambient key exposure. The core is open source at github.com/faramesh/faramesh-core. The formal security specification is at arxiv.org/pdf/2601.17744. The managed platform is at faramesh.dev.

