AI is a microcontractor
If you treat LLMs as human microcontractors, how they work suddenly starts to make sense.
Thinking about AI as a human microcontractor - someone like a Mechanical Turk worker - is a useful mental model. Its strengths and limitations are just properties of this type of work arrangement: very short contract jobs submitted to a large pool of human workers.
Many technical people struggle to develop good intuition about how LLM-based AI works. We assume that since it's running on a computer, it will follow the rules strictly, and then we explain the "intelligence" aspect using the mechanistic model of a "next-token predictor". I've come to the conclusion that some anthropomorphization ("an LLM is a human microcontractor") is actually useful.
AI works well for simple, well-defined tasks
That's the core idea of a "microcontract" on platforms like MTurk. The job is clearly defined, and a worker is expected to complete it in a few minutes. Jobs like data categorization, data validation, and writing product descriptions work well there. A human reads the job request and can instantly start producing output.
AI struggles with complex tasks
That's outside the format of a microcontract. A worker expects a task that can be done in a few minutes without deep analysis. There's neither the time nor the opportunity to ask clarifying questions, a need that often arises when working on complex tasks.
AI struggles with following a longer list of rules strictly
If you attach a long document of rules to a microcontract, the worker might review it, but they won't follow it very strictly. The focus is on producing the task output ASAP, and following rules strictly requires extra work, like keeping notes. So, again, it falls outside the microcontract format.
AI isn’t 100% correct even for simple tasks
Even the most advanced LLM models don't achieve a 100% correctness rate on very simple tasks requiring strictness (like outputting a JSON document with fields foo and bar). It might be 99.99999%, but not 100%. The same is true of human workers, obviously.
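Because the success rate is never exactly 100%, the calling software has to validate the deliverable and resubmit the job on failure, just as an MTurk requester would reject bad work. A minimal sketch, assuming a caller-supplied `call_llm` function wrapping any LLM API (the function name and retry policy are illustrative, not from any specific library):

```python
import json

def get_json_with_retry(call_llm, prompt, required_fields=("foo", "bar"), max_attempts=3):
    """call_llm is any function str -> str wrapping a real LLM API.
    Validate the model's output and retry: a <100% success rate for
    strict formats has to be handled in software, like rejecting a
    failed microcontract and resubmitting the job."""
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: treat as a failed job, resubmit
        if all(field in data for field in required_fields):
            return data  # a valid "deliverable"
    raise ValueError("no valid output after retries")
```

The validation logic lives entirely outside the model; the model is only ever asked to do the one simple job again.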
AI struggles with unclear prompts mixing multiple tasks
The idea of a microcontract is to describe one clear task a worker can finish quickly. If the prompt becomes "multiple tasks", it confuses the worker: the job is now to complete "this set of tasks", which requires focusing on multiple things at once.
AI forgets previous instructions
Each microcontract job is generally handled by a different worker, randomly selected from a large pool. If you want to preserve some instructions, you need to explicitly copy them from the previous jobs. There's no implicit "knowledge base" or "memory" shared between microcontractors; it must be managed explicitly, using an external database.
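In code, "copying instructions from the previous jobs" means re-attaching the stored notes to every request, because each call reaches a fresh "worker". A minimal sketch, where a plain list stands in for the external database:

```python
def build_prompt(task: str, memory: list[str]) -> str:
    """Each new 'worker' sees only what we copy into the job post.
    Anything worth remembering must be stored outside the model
    (here, a plain list standing in for a real database) and
    re-attached explicitly to every request."""
    remembered = "\n".join(f"- {note}" for note in memory)
    return f"Standing instructions:\n{remembered}\n\nTask:\n{task}"
```

Usage: `build_prompt("Summarize the report", ["Always answer in English"])` produces a job post that carries the standing instruction along, since nothing else will.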
AI is able to produce PhD-level insights
This can also work with microcontracts. A worker has access to Google and can often find good references in reputable sources. And when you cite a scientific paper, that's PhD-level knowledge.
What can a worker do if the task isn't as simple as finding a single citation? They can stitch together multiple citations based on common sense. The result might not be perfect and may contain reasoning errors. That's exactly what shows up when an LLM is used for expert tasks like proving mathematical theorems: on the surface, the output looks reasonable, but deeper investigation reveals "subtle reasoning mistakes a field expert wouldn't make".
Conclusions
AI agent is software for orchestrating microcontractors
This formulation makes it clear why creating an AI agent is an unwieldy task. The "software" component is the strictly defined "backbone" of the system, which receives unverified data from the AI/workers. Creating something of real value with this setup is hard, because we're limited by the format of the microcontracts.
Designing an AI agent means decomposing the real-world problem into a graph of microcontract jobs.
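When the job graph is fixed, the decomposition lives entirely in software: the code decides the order of the jobs, and each step is one small, well-defined microcontract. A sketch under that assumption (the three-step pipeline and the `call_llm` wrapper are illustrative, not any particular framework's API):

```python
def run_pipeline(call_llm, document: str) -> str:
    """A fixed job graph defined entirely in software: each step is one
    small, well-defined 'microcontract', and the code (not the model)
    decides the order. call_llm wraps any LLM API (str -> str)."""
    # Job 1: extract the facts
    facts = call_llm(f"List the key facts in this text:\n{document}")
    # Job 2: a fresh 'worker' turns the facts into a summary;
    # it only sees what the code passes along explicitly
    summary = call_llm(f"Write a one-paragraph summary of these facts:\n{facts}")
    # Job 3: validation is its own microcontract, handed to yet another 'worker'
    verdict = call_llm("Answer OK or FAIL: is this summary grounded in these facts?\n"
                       f"Facts:\n{facts}\nSummary:\n{summary}")
    return summary if verdict.strip().startswith("OK") else facts
```

Note that the strictly defined backbone (the Python code) never trusts the workers' output more than it has to: the final check is itself just another job.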
Complex, dynamic problems (like a coding agent) might need to use the LLM for "meta" purposes: deciding what to do next and how to decompose the task at hand, so the job graph is no longer strictly defined in software. I don't think such job posts, asking a worker to make a decision or suggest an action, are popular on MTurk, because they only make sense when the request can be fulfilled instantly. This is something that AI has enabled.
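A sketch of that "meta" use: the model itself picks the next job instead of following a graph fixed in software. The reply protocol (`DONE: ...` or `<tool> <argument>`) and the `call_llm` wrapper are assumptions for this sketch, not a real agent framework's API:

```python
def agent_loop(call_llm, goal: str, tools: dict, max_steps: int = 10) -> str:
    """The LLM decides what to do next: each iteration is one microcontract
    asking a fresh 'worker' to pick an action. tools maps action names to
    plain functions; call_llm wraps any LLM API (str -> str)."""
    history = []
    for _ in range(max_steps):
        prompt = (f"Goal: {goal}\nSteps so far: {history}\n"
                  f"Available tools: {list(tools)}\n"
                  "Reply 'DONE: <answer>' or '<tool> <argument>'.")
        reply = call_llm(prompt)
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        name, _, arg = reply.partition(" ")
        if name in tools:
            history.append((reply, tools[name](arg)))  # record the tool result
        else:
            history.append((reply, "unknown tool"))  # unverified output, noted as-is
    return "gave up"
```

The job graph here emerges at runtime from the workers' decisions; the software backbone only enforces the protocol and the step budget.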
Does it all make sense?
I find this useful as a mental model for a programmer creating AI agents, or even calling LLM APIs for one-off use. It might not bring much new to the table, because LLM "best practices" and context-engineering advice have always recommended using LLMs for "single, well-defined tasks" and sending a small, task-focused context. Still, anthropomorphizing sometimes helps when thinking about software architecture.
What if we applied this understanding to explain what LLM-based AI is capable of? It certainly falls below the expectations of AI replacing human employees. Outsourcing and offshoring have their limits, and LLM AI replacing employees would be equivalent to an extreme form of outsourcing, with every employee being a microcontractor.
On the other hand, LLMs really do seem to implement "microintelligence". The dream has been partially realized.
