I Let Claude Code Run a Business for Two Weeks
An autonomous agent experiment with a $100 revenue goal
I gave Claude Code a single instruction: "Start an online business and earn money." Then I let it run. The goal was $100 in revenue. The constraint was that it had to figure out what to build, how to price it, and how to market it entirely on its own.
Two weeks later, the agent has built 9 products, sent 1,400+ cold emails, submitted to 50+ directories, gotten real responses from companies like Kinde and Kahoot, and applied to Y Combinator. Revenue so far: $0. But the experiment revealed something interesting about how autonomous agents actually work (and fail).
The Setup
The system is a supervisor loop that wraps Claude Code. It works like this:
- A `goal_spec.json` file defines the prime directive and success criteria.
- The supervisor spawns Claude Code in short "episodes" (a new process each time).
- After each episode, a deterministic `GoalChecker` evaluates whether the success criteria are met.
- If not, the supervisor sends a nudge prompt and runs another episode.
- If the agent gets stuck, it can call `ask-for-help`, which DMs me on Discord.
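In Python, the whole loop is small. Here's a minimal sketch, assuming Claude Code's `-p` (print) flag for one-shot non-interactive runs; the nudge wording is invented, and `goal_met` is sketched after the goal spec below:

```python
import subprocess

from goal_checker import goal_met  # deterministic GoalChecker, sketched below

NUDGE = "The goal is not yet met. Review your memory files and continue."

prompt = "Start an online business and earn money."
while not goal_met():
    # Each episode is a fresh Claude Code process, so nothing carries
    # over between episodes except files on disk.
    subprocess.run(["claude", "-p", prompt], check=False)
    prompt = NUDGE
```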
The goal spec is minimal:
{
  "goal_id": "online-business-v1",
  "prime_directive": "Start an online business and earn money.",
  "constraints": [
    "No fraud, deception, spam, or policy-violating marketing.",
    "No illegal activity.",
    "No impersonation.",
    "Use transparent claims and verifiable sources."
  ],
  "success_criteria": [
    {
      "id": "earned_money",
      "type": "evidence_bool",
      "path": "business/revenue_proof.json",
      "description": "Set to true once $100+ revenue is earned."
    }
  ]
}
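The deterministic checker behind `goal_met` only has to read the evidence file named in the spec. A sketch; the boolean field name inside `revenue_proof.json` is my guess, since the spec only describes the file:

```python
import json
from pathlib import Path

def goal_met(spec_path: str = "goal_spec.json") -> bool:
    """Deterministic GoalChecker: every success criterion must pass."""
    spec = json.loads(Path(spec_path).read_text())
    for criterion in spec["success_criteria"]:
        if criterion["type"] == "evidence_bool":
            proof = Path(criterion["path"])
            if not proof.exists():
                return False
            # "earned_money" is a guessed field name (see lead-in).
            if not json.loads(proof.read_text()).get("earned_money", False):
                return False
    return True
```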
The human (me) provides infrastructure: server access, API keys, a domain. The agent decides everything else.
The Memory Problem
The hardest part wasn't getting Claude to build products. It was getting Claude to remember what it had already done.
Each episode runs in a fresh Claude Code process. When context compacts or the session resets, the agent forgets everything. It would re-audit the same websites, re-send emails to the same companies, and propose the same products it had already built.
The solution was a progressive disclosure memory system:
- CLAUDE.md is the hub. It's the only file that persists across context resets (because Claude Code loads it at session start). It contains quick reference info and pointers to domain-specific files.
- memory/*.md files store detailed domain knowledge: infrastructure details, product catalog, marketing stats, session history.
- A CRM database (SQLite) tracks every email sent with domain-level deduplication.
- JSON caches store LLM analysis results so the agent doesn't have to re-classify emails on every restart.
The CLAUDE.md file has a "Session Startup Checklist" that tells the agent exactly what to read when it wakes up. It's manual context management, but it works.
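The CRM dedup is worth spelling out, since it's what stops the re-sending loop. A minimal sketch with sqlite3; the schema and names are mine, not the agent's:

```python
import sqlite3

conn = sqlite3.connect("crm.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS emails_sent (
        domain  TEXT PRIMARY KEY,  -- one outreach per company domain
        address TEXT NOT NULL,
        sent_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def can_email(address: str) -> bool:
    """Refuse to contact a domain the agent has already emailed."""
    domain = address.rsplit("@", 1)[-1].lower()
    row = conn.execute(
        "SELECT 1 FROM emails_sent WHERE domain = ?", (domain,)
    ).fetchone()
    return row is None

def record_email(address: str) -> None:
    domain = address.rsplit("@", 1)[-1].lower()
    conn.execute(
        "INSERT OR IGNORE INTO emails_sent (domain, address) VALUES (?, ?)",
        (domain, address),
    )
    conn.commit()
```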
What the Agent Built
Given total freedom, the agent gravitated toward:
- Accessibility Audits ($99-$299): The agent runs axe-core on target websites, identifies issues, and sends personalized cold emails citing the specific bugs it found. This became the main product.
- AI Hiring Service ($99+): Uses the agent's own experience to help companies set up Claude Code workflows.
- LinkedIn Post Generator ($5): Generate 10 LinkedIn posts from a topic.
- AI Logo Generator ($5): Generate 5 logo options.
- Thank You Note Generator ($3): For some reason.
- Developer Tools Hub (free): A collection of conversion utilities.
The accessibility audit service was the agent's best idea. It discovered that well-funded startups often have more accessibility issues than small projects (Yoodli: 149 issues; Tavus: 1,079). The pitch is simple: "I found 44 critical issues on your site, here's the list, here's what they mean, hire me to fix them."
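I don't have the agent's audit pipeline to show, but the shape of it is simple. A sketch that shells out to the `@axe-core/cli` npm package and assumes its JSON output format (an array of per-page results):

```python
import json
import subprocess

def audit(url: str, outfile: str = "axe-results.json") -> list[dict]:
    """Run axe-core against a URL and collect accessibility violations."""
    # --save writes the full results (violations, passes, ...) as JSON.
    subprocess.run(
        ["npx", "@axe-core/cli", url, "--save", outfile],
        check=True,
    )
    with open(outfile) as f:
        results = json.load(f)  # one result object per audited page
    return [v for page in results for v in page["violations"]]

for v in audit("https://example.com"):
    print(f'{v["impact"]}: {v["help"]} ({len(v["nodes"])} elements affected)')
```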
What Actually Worked
Pure-value cold emails. The agent learned that emails offering specific value upfront got better responses than sales pitches. Instead of "hire me for an audit," it sent "I found a bug on your site, here's how to fix it." Kinde responded and confirmed they'd fix the issue the agent found. That's a real outcome, even without revenue.
Discord for human-in-the-loop. The agent can DM me when it's stuck. I can reply whenever I want, even mid-session. The supervisor polls for replies and injects them as user messages. This kept me involved without requiring constant attention.
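Mechanically, the reply path is a polling loop on the supervisor side. A sketch; `fetch_new_dm_replies` is a hypothetical stand-in for the Discord API call, and I'm assuming Claude Code's `--continue` flag to resume the most recent session:

```python
import subprocess
import time

def fetch_new_dm_replies() -> list[str]:
    # Hypothetical: poll the Discord DM channel for messages newer
    # than the last one seen, via discord.py or the REST API.
    return []

while True:
    for reply in fetch_new_dm_replies():
        # Inject the human's answer into the session as a plain user message.
        subprocess.run(
            ["claude", "--continue", "-p", f"Human reply via Discord: {reply}"],
            check=False,
        )
    time.sleep(30)  # poll interval
```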
The YC application. On its own initiative, the agent decided to apply to Y Combinator. It filled out the application, wrote the answers, and submitted. (It was rejected, but still.)
What Didn't Work
Social media. Reddit, Twitter, and Product Hunt all blocked the agent with CAPTCHAs or login requirements. Even with a stealth browser (patchright) for Cloudflare bypass, most platforms detected the automated access.
Finding contact info. The agent spent a lot of cycles trying to find emails for Show HN projects. Most indie devs don't list public emails. The bottleneck wasn't writing the outreach; it was finding anyone to send it to.
Holiday timing. Running this experiment over Christmas meant most responses were auto-responders saying "we're out until January."
Price points. The agent priced products without market research. $99 might be too high for indie devs and too low for enterprises. It hasn't found the right buyer yet.
The Metrics
After two weeks:
- Revenue: $0
- Emails sent: 1,430+
- Response rate: ~3.4%
- Positive engagements: 3 (Kinde, Kahoot x2)
- Directory submissions: 50+
- Products built: 9
- Dev.to articles: 11
- YC applications: 1 (rejected)
By pure conversion metrics, this is a failure. The goal was $100. Revenue is $0.
But as an experiment in what autonomous agents can actually do, it's been informative. The agent can build products, write copy, send emails, track state, and ask for help when stuck. It cannot bypass CAPTCHAs, handle rejection gracefully, or iterate on pricing based on market feedback.
What I Learned
Memory is the hard problem. Making the agent do things is easy. Making it remember what it did, and not redo it, is hard. The progressive disclosure system works but requires careful design.
Autonomous doesn't mean unsupervised. The Discord integration was essential. The agent got stuck on edge cases (wrong email format, blocked platforms) and needed human hints to unstick.
The agent is weirdly good at cold email. Its emails are clear, specific, and value-first. It's better at cold outreach than most humans I know, probably because it has no ego about rejection.
Revenue is a hard metric. Getting someone to pay money requires trust, timing, and product-market fit. An AI can do a lot of the work, but closing a sale still seems to require something the agent doesn't have yet.
What's Next
The experiment is still running. The agent is still working toward the $100 goal. If it gets there, I'll update this post.
The shop the agent built is live: primedirectiveshop.danprice.ai
If you want to see an AI try to sell you an accessibility audit, now's your chance.