AgentProbe uses a computer-use agent to drive your real UI, then judges the result against your success criteria using a vision-loop LLM call. No mocks. No record-and-replay. Just a GIF and a verdict.
Drives real Android apps via adb and Chrome extensions via CDP (the Chrome DevTools Protocol). No mocks, no record-and-replay. If the UI breaks, the agent finds it.
A second LLM call judges the final screenshot against your success criteria. Pass means the result was confirmed on screen, not just that the agent finished.
GitHub Actions reusable workflow. Add one step, spin up an Android emulator or Chrome, and get a GIF + result.json on every push.
from agentprobe import TestCase, run_case

case = TestCase(
    name="basic_smoke",
    package="com.android.calculator2",
    instruction="Verify the Calculator keypad is visible, then compute 5 + 3 = and confirm the result is 8.",
    successCriteria=["Calculator is open with a numeric keypad", "Result 8 is displayed"],
    failureCriteria=["App crashes or shows error dialog"],
    maxSteps=15,
)
result = run_case(case, output_dir="./agentprobe-output")
print(result["verdict"], "--", result["reason"])
# pass -- YES. The calculator shows 8 after tapping 5 + 3 =.
Any APK via adb, on an emulator or a physical device. Full touch and swipe interaction.
Load unpacked extensions in a real Chrome instance and test popup UIs, content scripts, and side panels.
Navigate any website via CDP. Confirm flows, forms, and visual states without writing selectors.
YAML test cases for any URL target. Portable across CI environments and local dev.
Every test run produces two artifacts dropped into your output directory.
A GIF with frame-by-frame agent reasoning. Share it in your PR for instant visual evidence that the UI works.
A result.json file with a machine-readable verdict, reason, and step count. Parse it in CI to fail the build or annotate the PR.
result.json
{
  "verdict": "pass",
  "reason": "YES. The calculator shows 8 after tapping 5 + 3 =.",
  "steps": 7,
  "gif": "./agentprobe-output/demo.gif"
}
Open source and available on PyPI. Drop it in CI or run locally in minutes.
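To fail a build on the verdict, a CI step can read result.json and turn it into an exit code. The following is a minimal sketch, not part of AgentProbe itself: the `gate` helper is a hypothetical name, it assumes the `verdict`, `reason`, and `steps` fields shown in the sample result.json, and it treats any verdict other than "pass" (or a missing file) as a failure.

```python
import json
from pathlib import Path


def gate(result_path):
    """Return 0 if result.json reports a "pass" verdict, 1 otherwise.

    Hypothetical helper: assumes the field names from the sample result.json.
    """
    path = Path(result_path)
    if not path.is_file():
        # No artifact means the run did not finish; treat it as a failure.
        print(f"AgentProbe result not found: {path}")
        return 1

    result = json.loads(path.read_text(encoding="utf-8"))
    verdict = result.get("verdict", "missing")
    reason = result.get("reason", "")
    steps = result.get("steps", "?")

    # One summary line for the CI log.
    print(f"AgentProbe verdict: {verdict} ({steps} steps) -- {reason}")
    return 0 if verdict == "pass" else 1
```

In a CI script this could be called as `sys.exit(gate("./agentprobe-output/result.json"))`, assuming result.json is written to the same output directory passed to `run_case`.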
Get started in your environment in seconds.
pip install agentprobe