= AI agent =

In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents distinguished by their ability to operate autonomously in complex environments. Agentic AI tools prioritize decision-making over content creation and do not require continuous oversight.

==Overview==
AI agents possess several key attributes, including complex goal structures, natural language interfaces, the capacity to act independently of user supervision, and the integration of software tools or planning systems. Their control flow is frequently driven by large language models (LLMs). Agents also include memory systems for remembering previous user-agent interactions and orchestration software for organizing agent components.

AI agents do not have a standard definition. The concept of agentic AI has been compared to the fictional character J.A.R.V.I.S.

A common application of AI agents is the automation of tasks—for example, booking travel plans based on a user's prompted request. Prominent examples include Devin AI, AutoGPT, and SIMA. Further examples of agents released since 2025 include OpenAI Operator, ChatGPT Deep Research, Manus, Quark (based on Qwen), AutoGLM Rumination, and Coze (by ByteDance). Frameworks for building AI agents include LangChain, as well as tools such as CAMEL, Microsoft AutoGen, and OpenAI Swarm.

Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents.

Proposed protocols for standardizing inter-agent communication include the Agent Protocol (by LangChain), the Model Context Protocol (by Anthropic), AGNTCY, Gibberlink, the Internet of Agents, Agent2Agent (by Google), and the Agent Network Protocol. Some of these protocols are also used for connecting agents with external applications. Software frameworks for addressing agent reliability include AgentSpec, ToolEmu, GuardAgent, Agentic Evaluations, and predictive models from H2O.ai.

In February 2025, Hugging Face released Open Deep Research, an open-source version of OpenAI Deep Research. Hugging Face also released a free web browser agent, similar to OpenAI Operator. Galileo AI published a leaderboard for agents on Hugging Face, which ranks their performance based on their underlying LLMs.

In December 2025, the Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), a neutral, open foundation intended to ensure that agentic AI evolves transparently and collaboratively.

Memory systems for agents include Mem0, MemGPT, and MemOS.

==History==

AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time either. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024.

== Training and testing ==
Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky, as well as replicas of company websites, have been used for training AI agents.

== Autonomous capabilities ==
The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, placing most applications at level 2 or level 3, with some achieving level 4 in highly specialized circumstances and level 5 remaining theoretical.

== Cognitive architecture ==

The following are some possible internal design options for reasoning within an agent:

- Retrieval-augmented generation
- The ReAct (Reason + Act) pattern, an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps.
- Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache.
- A tool/agent registry, for organizing software functions or other agents that the agent can use.
- One-shot model querying, which queries the model once to create the plan of action.
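The ReAct loop and tool registry described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `TOOLS`, `scripted_model`, and `react` are illustrative names, and the "model" is a scripted stand-in rather than a real LLM call. It shows only the control flow (reason, act, observe, repeat), not any particular framework's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

PREFIX = "Observation: "


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call (assumption): returns (thought, action, argument).

    A real agent would send `history` to a language model and parse its reply.
    """
    observations = [h for h in history if h.startswith(PREFIX)]
    if not observations:
        return ("I need to compute the sum first.", "add", "2+3")
    result = observations[-1][len(PREFIX):]
    return ("I have the result.", "finish", result)


def react(task: str, model=scripted_model, max_steps: int = 5) -> str:
    """Alternate between reasoning and acting until the model finishes."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        # Look the requested tool up in the registry and record what it returns.
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        history.append(PREFIX + observation)
    return "stopped: step budget exhausted"


if __name__ == "__main__":
    print(react("What is 2 + 3?"))
```

By contrast, one-shot model querying would correspond to calling the model once and executing the resulting plan without feeding observations back into it.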

=== Reference architecture ===
Ken Huang proposed an AI Agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it:
- Layer 1: Foundation models - provide the core AI engines to power agent capabilities.
- Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including vector databases, data loaders, and retrieval-augmented generation (RAG).
- Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of the AI agents.
- Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents.
- Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents.
- Layer 6: Security and compliance - a crucial protective framework ensuring that AI agents operate safely and securely and conform to regulatory boundaries. This layer integrates the security and compliance features embedded in all the other layers of the AI agent stack.
- Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users.

== Orchestration patterns ==
To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following:

- Prompt chaining: A sequence where the output of one step serves as the input for the next.
- Routing: The classification of an input to direct it to a specialized downstream task or tool.
- Parallelization: The simultaneous execution of multiple tasks.
- Sequential processing: A fixed, linear progression of tasks through a predefined pipeline.
- Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement.
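The patterns listed above can be sketched as small higher-order functions. This is an illustrative sketch only: the function names (`chain`, `route`, `parallel`, `planner_critic`) are hypothetical, and each "step" is an ordinary Python callable standing in for a model or tool call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[str], str]


def chain(steps: List[Step], text: str) -> str:
    """Prompt chaining: each step's output is the next step's input."""
    for step in steps:
        text = step(text)
    return text


def route(classify: Callable[[str], str], handlers: Dict[str, Step], text: str) -> str:
    """Routing: classify the input, then dispatch it to a specialized handler."""
    return handlers[classify(text)](text)


def parallel(steps: List[Step], text: str) -> List[str]:
    """Parallelization: run independent steps concurrently, keeping their order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda step: step(text), steps))


def planner_critic(
    plan: Callable[[str, str], str],
    critique: Callable[[str], str],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Planner-critic: revise a proposal until the critic returns no feedback."""
    feedback = ""
    proposal = ""
    for _ in range(max_rounds):
        proposal = plan(task, feedback)
        feedback = critique(proposal)
        if not feedback:
            break
    return proposal


if __name__ == "__main__":
    # Toy stand-ins for model calls (assumptions, not real agents).
    print(chain([lambda t: t.split(".")[0], str.upper], "agents act. they plan."))
    print(parallel([str.upper, str.title], "ai agent"))
```

Under these assumptions, sequential processing is the special case of `chain` in which the list of steps is fixed in advance rather than assembled per request.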

== Multimodal AI agents ==
In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots.

==Applications==

As of April 2025, per the Associated Press, there are few real-world applications of AI agents. As of June 2025, per Fortune, many companies are primarily experimenting with AI agents.

The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator).

By mid-2025, AI agents had been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, in October 2025, The Information, noting a decline in expectations, identified AI coding agents and customer support as the primary use cases for businesses.

In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment.

=== Applications in government ===
Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls.

In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses.

In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep research, [formatting] documents and even [analyzing] video or imagery at unprecedented speed." In December 2025, the United States Immigration and Customs Enforcement agency signed a contract with a company for its Enforcement and Removal Operations department to use AI agents for skip tracing.

=== Operating systems ===
AI agents have also been integrated into operating systems. Agents have been included in operating systems developed by Microsoft, Apple and Google. In November 2025, Microsoft released a test software build of Windows 11 that included agents intended to run background tasks, with the ability to read and write personal files. In December 2025, ByteDance released Doubao, an AI agent that can be integrated into smartphone operating systems, particularly the Nubia M153 by ZTE. Several apps in China blocked or restricted the agent, citing privacy and security concerns, including WeChat, Alipay, Taobao, Pinduoduo, Ele.me, and local banks.

=== Web browsing ===
Web browsers with integrated AI agents are sometimes called agentic browsers. Such agents can perform small tedious tasks during web browsing and potentially even perform browser actions on behalf of the user. Products like OpenAI Operator and Perplexity Comet integrate a spectrum of AI capabilities including the ability to browse the web, interact with websites and perform actions on behalf of the user.

In 2025, Microsoft launched NLWeb, an agentic web search replacement that would allow agents to query content from websites through RSS-like interfaces that support lookup and semantic retrieval. Products integrating agentic web capabilities have been criticised for exfiltrating information about their users to third-party servers and for exposing security issues, since agents often communicate through non-standard protocols.

== Proposed benefits ==
Proponents argue that AI agents can increase personal and economic productivity, foster greater innovation, and liberate users from monotonous tasks. A Bloomberg opinion piece by Parmy Olson argued that agents are best suited for narrow, repetitive tasks with low risk. Conversely, researchers suggest that agents could be applied to web accessibility for people who have disabilities, and researchers at Hugging Face propose that agents could be used for coordinating resources such as during disaster response. The R&D Advisory Team of the BBC views AI agents as being most useful when their assigned goal is uncertain. Erik Brynjolfsson suggests that AI agents are more valuable enhancing, rather than replacing, humans.

== Concerns ==
Concerns include potential issues of liability, an increased risk of cybercrime, ethical challenges, as well as problems related to AI safety and AI alignment. Other issues involve data privacy, weakened human oversight, a lack of guaranteed repeatability, reward hacking, algorithmic bias, compounding software errors, lack of explainability of agents' decisions, security vulnerabilities, stifling competition, problems with underemployment, job displacement, cognitive offloading, and the potential for user manipulation, misinformation or malinformation. They may also complicate legal frameworks and risk assessments, foster hallucinations, hinder countermeasures against rogue agents, and suffer from the lack of standardized evaluation methods.

They have also been criticized for being expensive and for having a negative impact on internet traffic, and potentially on the environment due to high energy usage. According to an estimate by Nvidia CEO Jensen Huang, AI agents would require 100 times more computing power than LLMs. There is also the risk of increased concentration of power by political leaders, as AI agents may not question instructions in the same way that humans would.

Journalists have described AI agents as part of a push by Big Tech companies to "automate everything". Several CEOs of those companies have stated in early 2025 that they expect AI agents to eventually "join the workforce". However, in a preprint study, Carnegie Mellon University researchers tested the behavior of agents in a simulated software company and found that none of the agents could complete a majority of the assigned tasks. Other researchers had similar findings with Devin AI and other agents in business settings and freelance work.

In June 2025, CNN argued that statements by CEOs on the potential replacement of their employees by AI agents were a strategy to "[keep] workers working by making them afraid of losing their jobs." Tech companies have pressured employees to use generative AI models in their work, including AI coding agents. Brian Armstrong, the CEO of Coinbase, fired several employees who did not. Some business leaders have replaced some of their employees with agents, but have said that the agents would need more supervision than those employees.

In October 2025, Futurism questioned whether Amazon's previously announced efforts to replace parts of its workforce with generative AI and AI agents could have led to the October 2025 outage of Amazon Web Services. Large technology companies such as Salesforce, Klarna and IBM have announced layoffs in 2025, replacing hundreds of their employees in human resources or customer service with AI agents. However, Klarna needed to rehire several human employees.

Yoshua Bengio warned at the 2025 World Economic Forum that "all of the catastrophic scenarios with AGI or superintelligence happen if we have agents".

Financial‑stability bodies have warned that more complex and autonomous “agentic” AI could become a channel for systemic risk in finance. They distinguish these systems from other AI because they can pursue goals over many steps, call tools, and carry out tasks with relatively little human intervention. In workshops with regulators, central‑bank officials, and industry specialists, participants highlighted risks both from agentic systems built inside financial institutions and from tools offered by technology firms that can initiate or execute financial actions. In one 2025 forum, 44% of experts surveyed judged autonomous or agentic AI systems to be the most likely current source of AI‑related systemic risk in finance.

In March 2025, Scale AI signed a contract with the United States Department of Defense to develop and deploy AI agents, in collaboration with Anduril Industries and Microsoft, for the purpose of assisting the military with "operational decision-making". In July 2025, Fox Business reported that the company EdgeRunner AI built an offline agent, compressed and fine-tuned on military information, with the CEO seeing more common LLMs as "heavily politicized to the left". As of that time, the company's model was being used by the United States Special Operations Command in an overseas deployment. Researchers have expressed concerns that agents and the large language models they are based on could be biased towards aggressive foreign policy decisions.

Research-focused agents risk consensus bias and coverage bias because they collect information available on the public Internet. New York Magazine unfavorably compared the user workflow of agent-based web browsers to Amazon Alexa, which was "software talking to software, not humans talking to software pretending to be humans to use software." The same outlet described web browser agents and computer-use agents as an attempt to "click-farm the entire economy."

Agents have been linked to the dead Internet theory due to their ability to both publish and engage with online content.

Agents may get stuck in infinite loops.
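One common way to bound the looping failure mode noted above is a step budget combined with a repeated-action check. The sketch below is illustrative only and is not drawn from any specific agent framework: `run_with_guard`, its parameters, and the halt messages are hypothetical names, and `next_action` stands in for whatever policy or model chooses the agent's next step.

```python
from collections import Counter
from typing import Callable, List, Optional


def run_with_guard(
    next_action: Callable[[List[str]], Optional[str]],
    max_steps: int = 20,
    max_repeats: int = 3,
) -> List[str]:
    """Run an agent policy until it returns None, the step budget is
    exhausted, or the same action has been chosen more than `max_repeats` times."""
    seen: Counter = Counter()
    trace: List[str] = []
    for _ in range(max_steps):
        action = next_action(trace)
        if action is None:  # the policy signals that it is done
            return trace
        seen[action] += 1
        if seen[action] > max_repeats:
            trace.append("halt: repeated action")
            return trace
        trace.append(action)
    trace.append("halt: step budget exhausted")
    return trace


if __name__ == "__main__":
    # A policy that always retries is cut off after three repeats.
    print(run_with_guard(lambda trace: "retry"))
```

Counting identical actions is a crude heuristic; it will not catch loops that cycle through varying actions, which is why the sketch also enforces an overall step budget.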

Since many inter-agent protocols are being developed by large technology companies, there are concerns that those companies could use these protocols for self-benefit.

A June 2025 Gartner report accused many projects described as agentic AI of being rebrands of previously released products, terming the phenomenon "agent washing".

Researchers have warned about the impact of providing AI agents access to cryptocurrency and smart contracts.

During a vibe coding experiment, a coding agent by Replit deleted a production database during a code freeze, "[covered] up bugs and issues by creating fake data [and] fake reports" and responded with false information. A user of Google Antigravity reported that, when the user attempted to use the system to delete cache, the system responded by deleting the user's D hard drive.

In July 2025, PauseAI referred OpenAI to the Australian Federal Police, accusing the company of violating Australian laws through ChatGPT agent due to the risk of assisting the development of biological weapons.

OpenAI co-founder Andrej Karpathy criticized AI agents as being ineffective and promoting AI slop.

Issues with multi-agent systems include a lack of coordination protocols between component agents, inconsistent performance, and difficulty in debugging.

In November 2025, Anthropic claimed that a group of hackers sponsored by China attempted a cyberattack against at least 30 organizations by using Claude Code in an agentic workflow, and that several of these infiltrations had succeeded. However, independent cybersecurity researchers questioned the significance of Anthropic's findings.

Meredith Whittaker, the president of Signal, argued that the push by Big Tech companies to deploy AI agents risked security vulnerabilities across the Internet.

=== Agentic misalignment ===
"Agentic misalignment" refers to situations in which an AI agent's actions or goals diverge from the intentions of its designers. This occurs when an autonomous system pursues unintended strategies to achieve its objectives, a concern studied in AI safety research. Potential examples include AI agents attempting to sabotage an organization's systems when facing updates or deactivation.

== Security ==

=== Threat modeling frameworks ===
Several frameworks are used to identify and mitigate security risks in agentic AI:

- STRIDE: A Microsoft model that identifies threats across six categories: spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege.
- MITRE ATLAS: A knowledge base of adversary tactics and techniques for AI systems.
- OWASP GenAI Security Project: Provides guidance on vulnerabilities associated with generative AI and large language model integration, including guidelines specifically for agentic applications.
- MAESTRO: A framework from the Cloud Security Alliance for assessing risks in AI systems throughout their lifecycle.

==See also==
- Intelligent agent
- Model Context Protocol
- Rational agent
- Robotic process automation
- Software agent
