Article by Coin World:
Author: accelxr, 1KX; Translation: 0xjs@
Today, generative models are used mainly for content creation and information filtering. However, recent research and discussion on AI agents (autonomous actors that use external tools to complete user-defined objectives) suggest that if AI is given an economic channel, much as the internet gained one in the 1990s, a substantial unlocking of AI capabilities may follow.
For this to happen, agents need assets they can control and transact with on their own behalf, as traditional financial systems are not designed for them.
This is where crypto comes into play: crypto provides a digitized payment and ownership layer with fast settlement, making it particularly suitable for building AI agents.
In this article, I will introduce the concepts of agents and agent architectures, show how examples from the research literature demonstrate that agents possess emergent properties beyond standard LLM (large language model) capabilities, and discuss projects building solutions or products around crypto-powered agents.
What are Agents?
AI agents are LLM-driven entities capable of planning and taking actions to achieve goals through multiple iterations.
Agent architectures consist of a single agent or multiple agents working together to solve problems.
Typically, each agent is given a personality and can use various tools that assist them in completing tasks independently or as part of a team.
Agent architectures differ from how we interact with LLMs today:
Zero-shot prompting is the common way most people interact with these models: you input a prompt, and the LLM generates a response based on its pre-existing knowledge.
In agent architectures, you initialize a goal and the LLM decomposes it into subtasks, then it recursively prompts itself (or other models) to autonomously complete each subtask until the goal is achieved.
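The decompose-then-execute loop described above can be sketched in a few lines. This is a minimal illustration, not a production agent: `call_llm` is a placeholder for any chat-completion API, and the canned responses simply stand in for real model output.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion API call.
    canned = {
        "Decompose into subtasks: plan a party":
            "book venue; draft invitations; send invitations",
    }
    return canned.get(prompt, f"done: {prompt}")

def run_agent(goal: str) -> list:
    # The goal is decomposed once, then each subtask is prompted in turn.
    # A fuller agent would loop, re-plan, and check completion signals.
    plan = call_llm(f"Decompose into subtasks: {goal}")
    subtasks = [t.strip() for t in plan.split(";")]
    return [call_llm(task) for task in subtasks]

print(run_agent("plan a party"))
```

The key difference from zero-shot prompting is that the model's first output becomes the input schedule for its own subsequent prompts.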
Single-agent Architecture vs. Multi-agent Architecture
Single-agent architecture: a language model performs all reasoning, planning, and tool execution. There is no feedback mechanism from other agents, but humans can choose to provide feedback to the agent.
Multi-agent architecture: these architectures involve two or more agents, where each agent can use the same language model or a different set of language models. Agents can use the same tools or different tools. Each agent typically has its own role.
Vertical Structure: one agent acts as the leader, and the other agents report to it. This helps organize the output of the group.
Horizontal Structure: a large group discussion about tasks, where each agent can see other messages and voluntarily complete tasks or invoke tools.
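A vertical structure can be sketched as a leader routing subtasks to specialist workers and collecting their reports. Everything here is illustrative (class names, the skill-matching rule); real workers would each wrap an LLM with its own profile and tools.

```python
class Worker:
    """A specialist agent; in practice this would wrap an LLM call."""
    def __init__(self, name: str, skill: str):
        self.name, self.skill = name, skill

    def work(self, task: str) -> str:
        return f"{self.name} ({self.skill}): completed '{task}'"

class Leader:
    """The leader agent: delegates tasks and organizes the group output."""
    def __init__(self, workers):
        self.workers = {w.skill: w for w in workers}

    def delegate(self, tasks):
        # Match each (skill, task) pair to a worker, aggregate the reports.
        return [self.workers[skill].work(task) for skill, task in tasks]

team = Leader([Worker("Ada", "research"), Worker("Bo", "writing")])
reports = team.delegate([("research", "market scan"), ("writing", "summary")])
print("\n".join(reports))
```

In a horizontal structure, by contrast, there is no router: every agent would see the full message list and volunteer for tasks itself.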
Agent Architectures: Profiles
Agents are given profiles or personas: role definitions, supplied as prompts, that shape the LLM's behavior and skills. What a profile contains depends heavily on the specific application.
Many people already use this as a prompting technique today: “You are a nutrition expert. Provide me with a meal plan…”. Interestingly, giving an LLM a role can improve its output relative to baseline.
Profiles can be created in the following ways:
Manual creation: profiles manually specified by human creators; most flexible but also time-consuming.
LLM-generated: profiles generated by an LLM from a set of rules about their composition and attributes, plus (optionally) a few seed examples.
Dataset alignment: profiles generated from datasets of real-world people.
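However a profile is produced, it ultimately becomes the system prompt that conditions the model. A minimal sketch, with an illustrative `Profile` structure and fields of my own choosing:

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    """An agent persona; serialized into the LLM's system prompt."""
    role: str
    traits: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def to_system_prompt(self) -> str:
        return (
            f"You are a {self.role}. "
            f"Traits: {', '.join(self.traits)}. "
            f"Always: {', '.join(self.constraints)}."
        )

# Manually created profile (the most flexible but most labor-intensive path).
nutritionist = Profile(
    role="nutrition expert",
    traits=["evidence-based", "practical"],
    constraints=["state portion sizes", "flag common allergens"],
)
print(nutritionist.to_system_prompt())
```

The same structure could just as easily be filled in by another LLM (LLM-generated profiles) or from a dataset of real people (dataset alignment).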
Agent Architectures: Memory
An agent’s memory stores information perceived from the environment and utilizes this information to formulate new plans or actions. Memory allows agents to self-evolve and act based on their experiences.
Unified memory: analogous to short-term memory, implemented via in-context learning/prompting. All relevant memories are passed to the agent in every prompt. Limited mainly by the context window size.
Hybrid: Short-term + long-term memory. Short-term memory is a temporary buffer for the current state. Reflective or useful long-term information is permanently stored in a database. There are several ways to achieve this, but a common approach is using a vector database (encoding memories as embeddings and storing; recall comes from similarity search).
Formats: natural language, databases (e.g., SQL queries understood through fine-tuning), structured lists, embeddings.
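The hybrid pattern can be sketched as a bounded short-term buffer plus a long-term store queried by similarity. This is a toy: the bag-of-words `embed` stands in for a real embedding model, and the list scan stands in for a vector database.

```python
from collections import Counter, deque
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model and store dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class HybridMemory:
    def __init__(self, short_term_size: int = 3):
        self.short_term = deque(maxlen=short_term_size)  # current context
        self.long_term = []  # (embedding, text), kept permanently

    def remember(self, text: str):
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.long_term,
                        key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

mem = HybridMemory()
mem.remember("user prefers vegetarian meals")
mem.remember("user trains for a marathon")
print(mem.recall("vegetarian diet"))
```

Recalled memories are then concatenated into the next prompt alongside the short-term buffer.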
Agent Architectures: Planning
Complex tasks are decomposed into simpler subtasks to be solved individually.
Uninformed planning:
In this approach, agents do not receive feedback after taking actions that would influence future behavior. An example is the Chain of Thought (CoT), where the LLM is encouraged to express its thought process while providing answers.
Single-path reasoning (e.g., zero-shot CoT)
Multi-path reasoning (e.g., self-consistent CoT, where multiple CoT threads are generated and the answer with the highest frequency is used)
External planner (e.g., planning domain definition language)
Informed planning:
Refining subtasks based on external feedback iterations
Environment feedback (e.g., game task completion signals)
Human feedback (e.g., seeking feedback from users)
Model feedback (e.g., seeking feedback from another LLM – crowdsourcing)
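Multi-path reasoning (self-consistent CoT) reduces to a majority vote over sampled answers. In this sketch the canned list stands in for final answers parsed from several independently sampled reasoning chains (temperature > 0).

```python
from collections import Counter

# Stand-in for answers extracted from several sampled reasoning chains.
CANNED_ANSWERS = ["42", "41", "42", "42", "43", "42"]

def sample_chain_answer(question: str, i: int) -> str:
    # Placeholder: a real implementation samples a full chain of thought
    # from the LLM and parses out the final answer.
    return CANNED_ANSWERS[i % len(CANNED_ANSWERS)]

def self_consistent_answer(question: str, n: int = 6) -> str:
    answers = [sample_chain_answer(question, i) for i in range(n)]
    # Majority vote: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))  # → 42
```

Informed planning swaps the vote for an outer loop: execute, collect environment/human/model feedback, and refine the subtask plan before the next attempt.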
Agent Architectures: Action
Actions are responsible for translating the agent’s decisions into concrete outcomes.
Action goals can take various forms, such as:
Task completion (e.g., crafting an iron pickaxe in Minecraft)
Communication (e.g., sharing information with another agent or human)
Environment exploration (e.g., exploring its own action space and learning about its capabilities).
Action generation typically comes from memory recall or plan following, and the action space consists of internal knowledge, APIs, databases/knowledge bases, and the use of external models.
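The action step is often implemented as a dispatch table: the model emits a structured decision (a tool name plus arguments) that is executed against a registry. The tool names and decision format below are illustrative.

```python
TOOLS = {}

def tool(fn):
    """Register a function as an invocable tool in the action space."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_knowledge_base(query: str) -> str:
    return f"top result for '{query}'"

@tool
def send_message(recipient: str, text: str) -> str:
    return f"sent to {recipient}: {text}"

def act(decision: dict) -> str:
    # `decision` would normally be parsed from the LLM's structured
    # output, e.g. {"tool": "send_message", "args": {...}}.
    fn = TOOLS.get(decision["tool"])
    if fn is None:
        return f"unknown tool: {decision['tool']}"
    return fn(**decision["args"])

print(act({"tool": "send_message",
           "args": {"recipient": "agent-2", "text": "task complete"}}))
```

In a crypto setting, entries in `TOOLS` might wrap wallet calls or contract interactions, which is why validating outputs (discussed later) matters.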
Agent Architectures: Skill Acquisition
To correctly execute actions within the action space, agents must possess task-specific skills. There are primarily two ways to achieve this:
Fine-tuning: training agents on annotated, LLM-generated, or real-world example behavior datasets.
Zero-shot: using LLM’s innate capabilities through more complex prompt engineering and/or mechanism engineering (i.e., combining external feedback or experience accumulation during iterative experiments).
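The mechanism-engineering variant can be sketched as a retry loop that folds each failure message back into the next attempt. The environment, feedback strings, and refinement rule here are all stand-ins: `execute_skill` mimics a game environment and `refine` mimics an LLM rewriting the skill given feedback.

```python
def execute_skill(code: str):
    # Toy environment: the skill only succeeds once both prerequisites
    # have been absorbed from feedback.
    if "craft_table" not in code:
        return False, "error: need a crafting table first"
    if "wooden_pickaxe" not in code:
        return False, "error: need a wooden pickaxe first"
    return True, "mined stone"

def refine(skill: str, feedback: str) -> str:
    # Placeholder for an LLM call that rewrites the skill given feedback.
    if "crafting table" in feedback:
        return skill + " craft_table"
    if "wooden pickaxe" in feedback:
        return skill + " wooden_pickaxe"
    return skill

def acquire_skill(goal: str, max_attempts: int = 5):
    attempt = goal
    for _ in range(max_attempts):
        ok, feedback = execute_skill(attempt)
        if ok:
            return True, attempt
        attempt = refine(attempt, feedback)  # accumulate experience
    return False, attempt

print(acquire_skill("mine_stone"))
```

This is the same pattern agents like Voyager (below) use at scale: no fine-tuning, just iterative refinement of executable skills from environment feedback.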
Examples of Agents in Literature
Generative Agents (Interactive Simulacra of Human Behavior): instantiates generative agents in a virtual sandbox environment, demonstrating emergent social behavior in a multi-agent system. Starting from a single user-specified prompt about an upcoming Valentine’s Day party, the agents autonomously send invitations, make new friends, go on dates, and coordinate showing up to the party together at the right time. You can try this yourself with a16z’s AI Town implementation.
Describe, Explain, Plan and Select (DEPS): the first zero-shot multitask agent capable of completing over 70 Minecraft tasks.
Voyager: The first LLM-driven lifelong learning agent in Minecraft, continuously exploring the world, acquiring various skills, and making new discoveries without human intervention. It improves its skill execution code based on feedback from iterative experiments.
CALYPSO: An agent designed for the game “Dungeons & Dragons” to assist dungeon masters in creating and storytelling. Its short-term memory builds upon scene descriptions, monster information, and previous summaries.
Ghost in the Minecraft (GITM): a generally capable Minecraft agent with a 67.5% success rate at obtaining diamonds and a 100% unlock rate across all items in the game.
SayPlan: Large-scale task planning for robots based on LLM, using 3D scene graph representation, demonstrating the ability to perform long-term task planning from abstract and natural language instructions.
HuggingGPT: Task planning using ChatGPT based on user prompts, selecting models based on descriptions on Hugging Face, and executing all subtasks, achieving impressive results across language, vision, speech, and other challenging tasks.
MetaGPT: takes a short requirement as input and outputs user stories, competitive analyses, requirements, data structures, APIs, documents, etc. Multiple agents play the various roles of a software company.
ChemCrow: an LLM-based chemistry agent designed for tasks such as organic synthesis, drug discovery, and materials design, equipped with 18 expert-designed tools. It autonomously planned and executed the synthesis of an insect repellent and three organocatalysts, and guided the discovery of a novel dye.
BabyAGI: A general infrastructure to create, prioritize, and execute tasks using OpenAI and vector databases (e.g., Chroma or Weaviate).
AutoGPT: Another example of a general infrastructure for bootstrapping LLM agents.
Crypto Examples of Agents
(Note: Not all examples are LLM-based + some may loosely relate to the concept of agents)
FrenRug from Ritualnet: a GPT-4-based Turkish carpet salesman game {https://aiadventure.spiel.com/carpet}. FrenRug is a dealer, and anyone can try to persuade him to buy their Friend.tech Key. Each user message is passed to multiple LLMs running on different Infernet nodes. These nodes respond on-chain, and the agent’s decision to buy the proposed Key is determined by a vote among the LLMs. Once enough nodes respond, the votes are aggregated and a supervised classifier model determines the action, posting a validity proof on-chain to verify the off-chain execution of the multiple classifiers.
Gnosis AI bots are essentially smart-contract wrappers around AI services that anyone can invoke by paying and submitting a query. The service monitors requests, performs the task, and returns the answer on-chain. This AI-bot infrastructure has been extended to prediction markets via Omen: the basic idea is that agents actively monitor the news, analyze it, and bet on predictions, producing aggregate forecasts closer to the true odds. Agents search markets on Omen, autonomously pay the “bots” for predictions on the topic, and trade in those markets accordingly.
GPT<>Safe Demonstration by ianDAOs:
A GPT that autonomously manages USDC in its own Safe multisig wallet on Base using Syndicate’s Transaction Cloud API. You can interact with it and suggest how best to deploy its capital, which it may allocate based on your recommendations.
Game agents:
In virtual environments, AI agents can act as both companions (e.g., AI NPCs in Skyrim) and competitors (e.g., a group of chubby penguins). These agents can automatically execute profit strategies, provide goods and services (such as shopkeepers, traveling merchants, or seasoned generative quest givers), or act as semi-playable characters in Parallel Colony and Ai Arena.
Safe Guardian Angels:
A set of AI agents used to monitor wallets and defend against potential threats, protecting user funds and enhancing wallet security. Features include automatic revocation of contract permissions and fund extraction in case of anomalies or hacker attacks.
Botto:
While Botto is a broadly defined on-chain intelligent agent example, it showcases the concept of autonomous on-chain artists whose created works are voted on by token holders and auctioned on SuperRare. Various extensions can be imagined adopting multimodal agent architectures.
Notable agent projects (note: not all projects are strictly LLM-based, and some may loosely adopt the concept of agents):
AIWay Finder:
A decentralized knowledge graph of protocols, contracts, contract standards, assets, functionalities, API capabilities, routines, and pathways (i.e., a virtual roadmap of blockchain ecosystems that pathfinder agents can navigate). Users are rewarded for identifying viable paths used by the agents. Additionally, you can mold shells (i.e., agents) containing character settings and skill activations, which can then be plugged into the pathfinder knowledge graph.
Ritualnet:
As demonstrated in the previous frenrug example, Ritual infernet nodes can be used to set up multi-agent architectures. The nodes listen to on-chain or off-chain requests and provide outputs with optional proofs.
Morpheus:
A peer-to-peer network for personal general AI that can execute smart contracts on behalf of users. This can be used for web3 wallet and transaction intent management, data parsing through chatbot interfaces, recommendation models for dapps and contracts, and extending agent operations by connecting applications and user data with long-term memory.
Dain Protocol:
Explores various use cases for deploying agents on Solana. A recent demo deployed a trading bot that extracts on-chain and off-chain information to execute on behalf of users (e.g., selling BODEN if Biden loses).
Naptha:
An agent orchestration protocol featuring an on-chain task market for contracting agents, operator nodes for orchestrating tasks, an LLM workflow orchestration engine supporting asynchronous message passing across different nodes, and a workflow proof system for verification of execution.
Myshell:
Similar to http://character.ai, an AI character platform where creators can monetize agent profiles and tools. It offers multimodal infrastructure with interesting example agents, including translation, education, companionship, coding, etc. It includes simple no-code agent creation and more advanced developer patterns for assembling AI widgets.
AI Arena:
A competitive PvP fighting game where players can purchase, train, and battle NFTs supported by AI. Players train their AI NFTs through imitation learning, where AI learns how to play the game in different maps and scenarios by learning the relevant probabilities of player behavior. After training, players can deploy their AI agents for ranked battles to earn token rewards. While not based on LLM, it still serves as an interesting example of the possibilities of agent games.
Virtuals Protocol:
A protocol for building and deploying multimodal agents into games and other online spaces. The three major prototypes for today’s virtuals include IP character mirrors, function-specific agents, and personal avatars. Contributors provide data and models to virtuals, while validators act as gatekeepers. There is an economic incentive mechanism to promote development and monetization.
Brianknows:
Provides a user interface for interacting with agents that can perform trades, research specific cryptocurrency information, and deploy smart contracts in a timely manner. Currently supports over 10 actions in more than 100 integrations. A recent example is allowing agents to stake ETH in Lido on behalf of users using natural language.
Autonolas:
Provides lightweight on-chain and cloud-based agents, consensus-operated decentralized agents, and specialized agent economies. Prominent examples include DeFi and prediction agents, AI-driven governance delegates, and agent-to-agent tool marketplaces. Offers the OLAS stack, protocols to coordinate and incentivize agent operation, and an open-source framework for developers to build collectively owned agents.
Creator.Bid:
Provides social media character agents connected to X and Farcaster real-time APIs. Brands can launch knowledge-based agents that execute brand-consistent content on social platforms.
Polywrap:
Offers various agent-based products, such as Indexer (a social media agent for Farcaster), AutoTx (a planning and transaction execution agent built using Morpheus and flock.io), predictionprophet.ai (a prediction agent with Gnosis and Autonolas), and fundpublicgoods.ai (an agent for resource allocation funding).
Validation:
As economic flows come to be directed by agents, validating their outputs becomes crucial (more on this in future articles). Validation approaches include Ora Protocol, zkML (from teams such as Modulus Labs, Giza, and EZKL), game-theoretic solutions, and hardware-based solutions such as TEEs.
Ideas for on-chain agents:
– Ownable, tradeable, token-gated agents capable of performing various types of functions, from companionship to financial applications.
– Agents that can represent you, identify, learn, and participate in the game economy.
– Agents that can simulate real human behavior for profit opportunities.
– Multi-agent-managed wallets that act as autonomous asset managers.
– AI-managed DAO governance (e.g., token delegation, proposal creation or management, process improvement, etc.).
– Knowledge graphs that interact with existing and new protocols and APIs.
– Autonomous guardian networks, multisignature security, smart contract security, and feature enhancements.
– Truly autonomous investment DAOs (e.g., collector DAOs using roles like art historians, investment analysts, data analysts, and degen agents).
– Token economics and contract security simulation and testing.
– General intent management, especially in the context of crypto user experiences like bridging or DeFi.
– Art or experimental projects.
– Attracting the next billion users.
As Jesse Walden, co-founder of Variant Fund, recently stated, autonomous agents are an evolution, not a revolution, in how blockchain is used: we already have protocol task bots, sniper bots, MEV searchers, robot toolkits, etc. Agents are just an extension of all this.
Many areas of crypto, such as fully on-chain games and DeFi, are already built in ways that facilitate agent execution. Assuming LLM cost relative to task performance keeps falling, and creating and deploying agents keeps getting easier, it is hard to imagine a world where AI agents don’t come to dominate on-chain interactions and become crypto’s next billion users.
Readings:
– AI Agents That Can Bank Themselves Using Blockchains
– The new AI agent economy will run on Smart Accounts
– A Survey on Large Language Model based Autonomous Agents (I used this for identifying the taxonomy of agentic architectures above, highly recommend)
– ReAct: Synergizing Reasoning and Acting in Language Models
– Generative agents: Interactive simulacra of human behavior
– Reflexion: Language Agents with Verbal Reinforcement Learning
– Toolformer: Language Models Can Teach Themselves to Use Tools
– Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
– Voyager: An Open-Ended Embodied Agent with Large Language Models
– LLM Agents Papers GitHub Repo
Original Article: [Link]