Review · 11 min read
TesterArmy: what it is, and how to plug it into your vibecoding loop
Cloud QA agent built by three Polish ex-Callstack engineers. CLI, REST API, GitHub App: five integration patterns for Claude Code and Cursor.
TL;DR
TesterArmy is a Y Combinator-backed (Spring 2026, P26) cloud QA agent that drives a real browser and real iOS/Android builds from plain-English test descriptions, with vision-based assertions and end-to-end auth handling (OAuth, OTP, captcha). Built by three Polish ex-Callstack engineers operating from San Francisco: Szymon Rybczak (CEO), Oskar Kwaśniewski (CTO), Piotr Matyjasik (CPO). For vibecoders (developers who let Claude Code, Cursor, or Codex write most of the code), TesterArmy ships an official agent skill (npx skills add tester-army/cli) and a JSON-mode CLI that closes the change → test → fix loop without you ever maintaining a Playwright suite. It is also genuinely opinionated about what it doesn't yet do well, which matters more than the marketing.
This article walks through what TesterArmy actually is mechanically, what its integration surface looks like in late April 2026, five concrete patterns to wire it into an AI-coding workflow, and the one structural blocker that decides whether you should adopt it today.
1. What it actually does
The pitch on the homepage is "Test your app with AI, catch bugs before users do." Underneath, the product is a hosted Playwright-style cloud browser runner with an LLM agent on top. You write tests in natural language ("Login, complete onboarding, create a new project, invite a teammate"), the agent breaks the prompt into a step plan you can edit, then runs that plan in a remote browser session. Every assertion is vision-based: the agent looks at the rendered page rather than relying on CSS selectors, which is the explicit pitch against brittle Playwright suites.
Three things are non-obvious from the marketing copy:
Auth is a real engineering surface, not a checkbox. TesterArmy stores credentials with AES-256-GCM, gives every agent run a dedicated email inbox for OTP codes, and handles OAuth, HTTP Basic, file uploads, and captcha solving. This is the hardest part of any E2E suite to maintain by hand, and the most defensible part of what they ship.
Mobile is treated as a first-class target, not an afterthought. You upload an .app (iOS) or .apk (Android) binary, and the same agent drives it the same way. There is a separate tester-army/mobile-github-action@v1 for CI builds. Most direct competitors (Octomind, Reflect, QA.tech, Meticulous) are web-only or charge extra for mobile; this is TesterArmy's clearest moat in the category.
The "army" framing is marketing. Each test run is one agent in one cloud browser. There are no swarm semantics: you can run tests in parallel via --parallel N, but that's just concurrent runs, not coordinated agents.
Run modes available today: manual run from the dashboard, recurring "Production Monitoring" cron, GitHub App auto-PR check (Vercel and Coolify deployments are first-class), and a generic webhook trigger any CI can fire.
What the public site does not tell you, even after careful reading: which LLM powers the agent, where the cloud browsers actually run (Browserbase? Steel? own infra?), the mobile execution backend, and any concrete pricing number. Pricing reads, verbatim: "Start with your free test runs. Paid plans are handled by call when your team is ready." The billing unit is one test run; workspaces are a separate axis. Three tier labels exist (trial, plans, teams) and zero published prices. SOC2/ISO posture is not stated. Self-hosting is mentioned only as an enterprise conversation.
2. Who built it
The founding team is three ex-Callstack React Native engineers with deep open-source pedigree, the strongest non-marketing signal in this whole report. Szymon Rybczak is the 19-year-old CEO who returned to Poland to finish high school while running the company; Oskar Kwaśniewski is the CTO and was a senior RN engineer at Callstack working on on-device LLMs; Piotr Matyjasik is the CPO. Team is four total, based in San Francisco for YC Spring 2026, with Pete Koomen as the YC partner.
The Callstack lineage explains a lot of the product's shape: React Native testing is genuinely hard, the team has been close to that pain for years, and the mobile-binary-as-first-class-target decision is unusual in this category but obvious if you spent your career shipping React Native. It is also the reason the brand is "TesterArmy" without irony: these are not joke-account founders; they came from infrastructure work.
One accident worth flagging if you operate in the Polish market: TestArmy Group S.A. is an established 200-person Polish QA and cybersecurity firm that has existed in Wrocław since 2014. Names will collide in Google for any Polish QA buyer. Will sort itself out as TesterArmy gets traction, but for now the SEO is muddy.
Community signal is honest about the stage. Official GitHub org has four public repos; tester-army/cli has 29 stars; no Product Hunt launch, no Show HN, no Reddit discussion across the obvious subreddits. Founders post on X and LinkedIn; the most concrete user-traction claim is Szymon's LinkedIn note that a viral Bluesky-app demo drove "500+ new signups from Fortune 500 companies and 40+ sales calls within days." Take that at face value: real interest, real pipeline, no public customer logos yet.
3. The integration surface, in detail
CLI
npm install -g testerarmy   # or: npx testerarmy ...
ta auth --api-key <KEY>     # one-time
ta tests run <testId> --url http://localhost:3000 --json --parallel 3
ta status --json
ta list
Binaries: testerarmy and short alias ta. Environment variables: TESTERARMY_API_KEY, TESTERARMY_TARGET_URL, TESTERARMY_PROJECT_ID, TESTERARMY_WEBHOOK_URL. Source: tester-army/cli, MIT-licensed, built on commander plus Playwright 1.59-alpha and @playwright/mcp. Exit code is 1 on test failure; artifacts land in .testerarmy/<timestamp>/.
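The exit-code and artifact conventions above are enough to build a thin wrapper. The sketch below is one way to do it, not something the CLI ships: latest_artifacts_dir and run_and_report are hypothetical helper names, and the code assumes the timestamped directory names under .testerarmy/ sort chronologically as plain text, which the docs cited here do not confirm.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not part of the testerarmy CLI.

# Print the newest run directory under an artifacts root (default: .testerarmy).
# Assumption: directory names are timestamps that sort chronologically as text.
latest_artifacts_dir() {
  local root="${1:-.testerarmy}" newest="" dir
  [ -d "$root" ] || return 1
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    newest="${dir%/}"           # globs expand in sorted order, so the last one wins
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Run one test; on a non-zero exit, point at the newest artifacts directory.
run_and_report() {
  local test_id="$1" url="$2"
  if ta tests run "$test_id" --url "$url" --json; then
    echo "passed"
  else
    echo "failed; artifacts: $(latest_artifacts_dir || echo 'none found')"
    return 1
  fi
}
```

Called as run_and_report <testId> http://localhost:3000, this leaves a coding agent with a single line to read before it decides whether to open the artifacts.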
REST API
OpenAPI 3.1 spec at /api/v1/openapi.json, 19 endpoints, full reference at docs.tester.army/api-reference. Auth is Authorization: Bearer <API_KEY>. Endpoints worth knowing:
- Trigger a run: POST /v1/runs (accepts a prompt field, but the CLI does not yet expose it)
- Trigger via webhook: POST /v1/webhook/{id}/{secret} and POST /v1/groups/webhook/{id}/{secret}
- Poll a run: GET /v1/runs/{id}
- Cancel: POST /v1/runs/{id}/cancel
- List/create tests: GET/POST /v1/tests
- Mobile binary upload: POST /v1/projects/{projectId}/mobile/upload and /upload/confirm (multipart)
Rate limits not documented in headers, but every endpoint defines a 429 response. Storage cap is 2 GB per project; one webhook per group. No GraphQL, no SDKs in any language.
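Since limits are undocumented but 429 is a defined response, any polling script should back off on its own. The following is a minimal sketch under stated assumptions: TESTERARMY_API_BASE is a hypothetical variable (the API host is not given here), the helper names are made up, and the response body is printed untouched because its schema is not described in this review.

```shell
#!/usr/bin/env bash
# Sketch only. TESTERARMY_API_BASE is a hypothetical variable: the API host
# is not stated in this article, so the caller has to supply it.

# Exponential backoff in seconds: 1, 2, 4, 8, 16, then capped at 30.
backoff_seconds() {
  local attempt="$1" delay=1 i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 30 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 30 ]; then delay=30; fi
  echo "$delay"
}

# GET /v1/runs/{id} once per attempt, sleeping and retrying only on HTTP 429.
fetch_run() {
  local run_id="$1" max_attempts="${2:-5}" attempt=0 status body
  body=$(mktemp) || return 1
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(curl -sS -o "$body" -w '%{http_code}' \
      -H "Authorization: Bearer ${TESTERARMY_API_KEY}" \
      "${TESTERARMY_API_BASE}/v1/runs/${run_id}")
    case "$status" in
      200) cat "$body"; rm -f "$body"; return 0 ;;
      429) sleep "$(backoff_seconds "$attempt")" ;;
      *)   echo "unexpected HTTP status: $status" >&2; rm -f "$body"; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  rm -f "$body"
  echo "still rate-limited after $max_attempts attempts" >&2
  return 1
}
```

The 30-second cap is an arbitrary choice; without documented limits there is no way to tune it except by observation.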
GitHub App
Install once. Auto-comments on every PR with screenshots, video recordings, and a link to the full report. Includes an "exploration agent" that reads the PR title and diff and generates a targeted test plan, which is interesting because it covers the gap where no human has curated a suite for the new feature. Vercel and Coolify deployment events fire it automatically.
GitHub Action (mobile)
- uses: tester-army/mobile-github-action@v1
  with:
    app_path: .build/MyApp.app
    api_key: ${{ secrets.TESTERARMY_API_KEY }}
    project_id: ${{ secrets.TESTERARMY_PROJECT_ID }}
    webhook_url: ${{ secrets.TESTERARMY_WEBHOOK_URL }}
    timeout_seconds: "1800"
Outputs: app_id, run_ids (JSON array), overall_status (passed | failed | timed_out).
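Whether the action fails the job by itself is not stated here, so a follow-up step may want to gate on overall_status explicitly. A minimal sketch, assuming the three documented values are the only ones; gate_on_status is a hypothetical helper and the exit codes 2 and 3 are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch only: gate_on_status is a hypothetical helper, not part of the action.

# Translate the action's overall_status output into a process exit code.
gate_on_status() {
  case "$1" in
    passed)    return 0 ;;
    failed)    echo "TesterArmy reported failing runs" >&2; return 1 ;;
    timed_out) echo "TesterArmy runs timed out" >&2; return 2 ;;
    *)         echo "unknown overall_status: $1" >&2; return 3 ;;
  esac
}
```

In a later workflow step, the argument would come from the action step's overall_status output; separating timed_out from failed lets you retry timeouts without hiding real failures.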
Agent skill
npx skills add tester-army/cli installs an opinionated agent skill in the Skills format that Claude Code, Codex, and OpenCode read. Teaches the coding agent how to call the CLI, parse JSON output, read artifacts, and re-run on failure. Path of least resistance.
MCP server
No official or community MCP server today. The CLI bundles @playwright/mcp as a dependency, but that's Playwright's MCP, not TesterArmy's own.
Self-hosting
SaaS only. No offline runner. The cloud agent needs to reach your URL. For localhost:3000 you need a tunnel (ngrok, cloudflared) unless testing a deployed preview.
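If runs execute in the cloud as described, a cheap guard before triggering one is to check whether the target URL is loopback-only. The helper below is a hypothetical sketch (needs_tunnel is not a CLI feature) and only covers the obvious loopback hosts; private ranges such as 192.168.x.x are equally unreachable from outside and are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: needs_tunnel is a hypothetical helper, not part of the CLI.

# Return 0 when the URL points at a loopback host a remote runner cannot reach.
needs_tunnel() {
  local url="$1" host
  host="${url#*://}"     # drop the scheme
  host="${host%%/*}"     # drop the path
  host="${host##*@}"     # drop any userinfo
  case "$host" in
    "["*) host="${host%%\]*}"; host="${host#\[}" ;;   # bracketed IPv6 literal
    *)    host="${host%%:*}" ;;                       # strip :port
  esac
  case "$host" in
    localhost|*.localhost|127.*|0.0.0.0|::1) return 0 ;;
    *) return 1 ;;
  esac
}
```

A hook can then refuse to run, or start cloudflared or ngrok first, when needs_tunnel "$TESTERARMY_TARGET_URL" succeeds.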
4. Five concrete vibecoding integration patterns
Pattern 1, Slash command: /test-this
Trigger: developer types /test-this after the agent finishes a feature. Mechanism: project-scoped slash command in .claude/commands/test-this.md that runs ta tests run --group $TA_SMOKE_GROUP --url http://localhost:3000 --json, parses the exit code, reads .testerarmy/<timestamp>/result.json, and feeds failures back into the next turn. Value over npm test: real browser, real auth, vision-based assertions. The agent loop closes (change → run → read screenshot and log → patch → rerun) without a human in the middle.
Pattern 2, Stop hook: smoke run after every feature
{
  "hooks": {
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "if curl -sf http://localhost:3000 >/dev/null && git diff --cached --name-only HEAD~1 | grep -q '^apps/web/'; then TESTERARMY_TARGET_URL=http://localhost:3000 ta tests run --group $TA_SMOKE_GROUP --json --parallel 3 || echo 'TESTERARMY_FAIL: see .testerarmy/'; fi"
      }]
    }]
  }
}
Value: zero ceremony, you never type "run tests"; the agent gets failure text in the next turn's context and self-corrects.
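The inline hook command gets hard to read once it is escaped inside JSON. One option is to move the same logic into a script and point the hook's command at it. The sketch below mirrors the one-liner's behaviour; touches_prefix and smoke_after_stop are hypothetical names, and the apps/web/ prefix is carried over from the example rather than being anything TesterArmy requires.

```shell
#!/usr/bin/env bash
# Sketch only: same checks as the inline hook command, split into functions.

# Succeed when any path read from stdin starts with the given prefix.
touches_prefix() {
  local prefix="$1" path
  while IFS= read -r path; do
    case "$path" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Run the smoke group only if the dev server is up and the web app changed.
smoke_after_stop() {
  curl -sf http://localhost:3000 >/dev/null || return 0
  git diff --cached --name-only HEAD~1 | touches_prefix "apps/web/" || return 0
  TESTERARMY_TARGET_URL=http://localhost:3000 \
    ta tests run --group "$TA_SMOKE_GROUP" --json --parallel 3 \
    || echo 'TESTERARMY_FAIL: see .testerarmy/'
}
```

Keeping the path check in its own function also makes it easy to try out locally by piping a list of file names into it.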
Pattern 3, Pre-commit gate via Husky
Husky pre-commit hook calls ta tests run --group $TA_CRITICAL_GROUP --json. Keep "critical" small (login, checkout). Catches vision-class regressions Vitest cannot see. Tradeoff: 30-90s latency per commit.
Pattern 4, PR bot via GitHub App (zero-config)
PR push → Vercel preview → GitHub deployment event fires the App automatically. No YAML, no CI minutes. Exploration agent surfaces bugs in flows you never wrote a test for.
If not on Vercel/Coolify, a short GitHub Actions step covers it:
- name: Smoke against preview
  run: npx testerarmy tests run --group ${{ vars.TA_GROUP }} --project ${{ vars.TA_PROJECT }} --parallel 3 --json
  env:
    TESTERARMY_API_KEY: ${{ secrets.TESTERARMY_API_KEY }}
    TESTERARMY_TARGET_URL: ${{ steps.deploy.outputs.url }}
Pattern 5, qa-runner Claude Code subagent
Subagent in .claude/agents/qa-runner.md with Bash(ta:*) and Read(.testerarmy/**). Picks the right group from changed file paths, runs ta tests run --group … --json --debug, parses result.json, returns only failures and screenshot URLs to the main agent. Keeps long QA transcripts out of the main context window.
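The "picks the right group from changed file paths" step can be as simple as a prefix table. A minimal sketch: pick_group is a hypothetical helper, and every path prefix and group name in it is a placeholder to be replaced with your own layout and the groups defined in your dashboard.

```shell
#!/usr/bin/env bash
# Sketch only: prefixes and group names below are placeholders for illustration.

# Read changed paths (one per line) on stdin and print one group name.
# Priority: checkout beats auth, and anything else falls back to smoke.
pick_group() {
  local path group="smoke"
  while IFS= read -r path; do
    case "$path" in
      apps/web/checkout/*) group="checkout"; break ;;
      apps/web/auth/*)     group="auth" ;;
    esac
  done
  printf '%s\n' "$group"
}
```

The subagent could then run ta tests run --group "$(git diff --name-only HEAD~1 | pick_group)" --json --debug and report back only what failed.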
5. Where TesterArmy sits vs alternatives
Closest comparables:
- Momentic: almost identical pitch, web-focused, larger funding.
- QA.tech: published pricing, no clear iOS/Android support.
- Octomind: generates Playwright code (you own the tests), has MCP.
- QA Wolf: heavyweight, AI plus embedded human QA.
- Reflect.run: record-and-play plus AI, classic QA buyer.
- Meticulous.ai: records real user interactions, no prompts, web-only.
- Checkly: code-first synthetic monitoring, real free tier, Claude/Cursor MCP.
Defensible parts: iOS plus Android plus web in one project; AES-256-GCM credential storage with per-agent OTP inboxes; "hundreds of evals" tuning for false positives.
Thin parts: prompt-to-browser-agent is now table stakes; no published pricing; no open MCP server; no portable test export.
6. Verdict and the one blocker that decides it
If you ship web plus native mobile, want a Polish-engineer-friendly team, and tolerate a sales conversation, TesterArmy is genuinely competitive. If you are a solo indie dev shipping a SaaS side project, the lack of published pricing and the free-tier limit are a hard stop; pick Checkly or Octomind instead.
The one structural blocker that decides whether TesterArmy fits the vibecoding loop:
Tests are dashboard-defined and id-based. There is no ta run "verify checkout still works after my last change" for ad-hoc agent-authored tests. The agent can execute groups it knows the IDs of, but cannot create-and-run a one-shot test from a natural-language prompt scoped to its own diff. POST /v1/runs accepts a prompt, but the CLI does not expose it.
Every pattern above still requires a human to pre-author tests in the dashboard, which breaks the cleanest version of the vibecoding promise: agent writes the code, agent writes the test, agent runs the test, agent fixes what broke. The day TesterArmy ships ta run --prompt "test the new flow at /checkout", the equation flips.
For now: install the GitHub App for zero-config PR coverage, install the agent skill for slash-command-driven QA, and treat dashboard-defined tests as the part of your project humans still have to start.
Sources: tester.army, docs.tester.army, npm: testerarmy, github.com/tester-army, YC: TesterArmy.