AgentProbe
Available on PyPI

AI-driven UI testing for Android and browser

AgentProbe uses a computer-use agent to drive your real UI, then judges the result against your success criteria using a vision-loop LLM call. No mocks. No record-and-replay. Just a GIF and a verdict.

Real UI, Real Device

Drives actual Android apps via adb and Chrome extensions via the Chrome DevTools Protocol (CDP). Because nothing is mocked, a broken UI shows up as a failed run.

Vision-loop Verification

A second LLM call judges the final screenshot against your success criteria. A pass means the expected state was confirmed on screen, not just that the agent finished its steps.

CI-ready in One Line

Ships as a GitHub Actions reusable workflow. Add one step, spin up an Android emulator or Chrome, and get a GIF and a result.json on every push.

How it works

from agentprobe import TestCase, run_case

case = TestCase(
    name="basic_smoke",
    package="com.android.calculator2",
    instruction="Verify the Calculator keypad is visible, then compute 5 + 3 = and confirm the result is 8.",
    successCriteria=["Calculator is open with a numeric keypad", "Result 8 is displayed"],
    failureCriteria=["App crashes or shows error dialog"],
    maxSteps=15,
)

result = run_case(case, output_dir="./agentprobe-output")
print(result["verdict"], "--", result["reason"])
# pass -- YES. The calculator shows 8 after tapping 5 + 3 =.
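
To run more than one case, a thin wrapper can tally verdicts. The sketch below is not part of AgentProbe: run_suite is a hypothetical helper, and the only assumption it makes is that the runner returns a dict with "verdict" and "reason" keys, as in the example above. The runner is passed in as a callable so the helper can be tried without a device attached.

```python
from typing import Callable, Iterable


def run_suite(cases: Iterable, runner: Callable[[object], dict]) -> dict:
    """Run each case with `runner` and tally the verdicts (hypothetical helper)."""
    summary = {"pass": 0, "fail": 0, "failures": []}
    for case in cases:
        result = runner(case)
        # Anything other than an explicit "pass" counts as a failure.
        if result.get("verdict") == "pass":
            summary["pass"] += 1
        else:
            summary["fail"] += 1
            summary["failures"].append(result.get("reason", ""))
    return summary


# Possible use with AgentProbe (assumes run_case behaves as shown above):
#   from agentprobe import run_case
#   summary = run_suite(
#       [case],
#       lambda c: run_case(c, output_dir="./agentprobe-output"),
#   )
#   print(summary["pass"], "passed,", summary["fail"], "failed")
```

Giving each case its own output directory would likely be safer in a real suite, so that one run's artifacts do not overwrite another's.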

What it tests

Android Apps

Any APK via adb, on an emulator or a physical device. Full touch and swipe interaction.

Chrome Extensions

Load unpacked extensions in a real Chrome instance and test popup UIs, content scripts, and side panels.

Web UIs

Navigate any website via CDP. Confirm flows, forms, and visual states without writing selectors.

Any Browser Target

YAML test cases for any URL target. Portable across CI environments and local dev.

What you get

Every test run writes two artifacts to your output directory.

  • demo.gif

    Frame-by-frame agent reasoning. Share it in your PR for instant visual evidence that the UI works.

  • result.json

    Machine-readable verdict with reason and step count. Parse it in CI to fail the build or annotate the PR.

result.json

{
  "verdict": "pass",
  "reason": "YES. The calculator shows 8 after tapping 5 + 3 =.",
  "steps": 7,
  "gif": "./agentprobe-output/demo.gif"
}
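
A CI step can turn this file into a build status. The following is a minimal sketch, assuming result.json has the fields shown above and sits in the output directory next to demo.gif; check_result is a hypothetical helper, not an AgentProbe function.

```python
import json
from pathlib import Path


def check_result(path):
    """Return (exit_code, message) for a result.json file (hypothetical helper)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    verdict = data.get("verdict", "")
    reason = data.get("reason", "")
    steps = data.get("steps")
    message = f"{verdict} after {steps} steps: {reason}"
    # Only an explicit "pass" succeeds; any other verdict fails the build.
    exit_code = 0 if verdict == "pass" else 1
    return exit_code, message
```

In a CI script, print the message and pass the exit code to sys.exit, for example by calling check_result("./agentprobe-output/result.json"). That path is an assumption based on the output_dir used earlier.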

Ready to test your UI with an agent?

Open source and available on PyPI. Drop it in CI or run locally in minutes.

Install from PyPI

Get started in your environment in seconds.

pip install agentprobe

View on GitHub

Source, docs, and examples in one place.

dzianisv/agentprobe