Available for work

I test AI workflows and app behavior
so your team catches failures first.

Leo Mari Cuizon · AI Systems Operator · QA & AI Workflow Testing

Web apps · PWAs · API workflows · Edge cases · iOS & Android via BrowserStack · I also build the tools I test.

I validate AI systems in the conditions they'll actually run in.

I help teams ensure that AI features, workflows, and applications behave correctly in real-world conditions — not just in development environments.

Independent systems operator working across QA, AI workflow validation, and data execution. I use AI tools throughout my process to accelerate testing, debugging, analysis, and documentation.

I don't specialize in one stack. I specialize in understanding how systems are supposed to behave, then finding where they don't.

Location
Cebu City, Philippines
Focus
QA · AI Workflow Validation
Availability
Open to work · Remote
Approach
Systems thinking · Execution-first

Four tracks, each one hireable on its own.

01

Web App & PWA QA Testing

  • User flows, auth/session bugs, mobile behavior, UI state failures (a sample check is sketched after this list)
  • Form validation, broken links, offline behavior, service worker edge cases
  • Bug reports with steps to reproduce, severity, and fix path
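
To make the auth/session line above concrete, here's a minimal sketch of the kind of check I run, written with Playwright. The URL, labels, and credentials are placeholders, not details from any client project.

import { test, expect } from '@playwright/test';

// Minimal session-persistence check: log in, reload, confirm the
// session survives. All selectors and URLs below are placeholders.
test('auth session survives a page reload', async ({ page }) => {
  await page.goto('https://staging.example.com/login');
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('placeholder-password');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page).toHaveURL(/dashboard/);

  // The failure mode this catches: a refresh silently drops the session.
  await page.reload();
  await expect(page).toHaveURL(/dashboard/);
});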
02

AI & LLM Workflow Testing

  • Prompt evaluation — hallucinations, bad outputs, inconsistent behavior (harness sketch after this list)
  • Edge case testing across varied inputs, tones, and failure conditions
  • API response validation (OpenAI, Supabase, Groq) and fallback testing
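
A rough shape of how I structure LLM output checks: one harness runs varied inputs against explicit pass/fail criteria, so "inconsistent behavior" becomes a reproducible failure list. Type and function names here are illustrative, not from a client suite.

type EvalCase = {
  input: string;
  mustContain?: string[];   // strings a correct answer should include
  mustNotMatch?: RegExp[];  // patterns that signal hallucination or leakage
};

// `ask` is whatever calls the model under test (OpenAI, Groq, etc.).
async function runEval(
  ask: (input: string) => Promise<string>,
  cases: EvalCase[],
): Promise<{ input: string; reason: string }[]> {
  const failures: { input: string; reason: string }[] = [];
  for (const c of cases) {
    const output = await ask(c.input);
    for (const s of c.mustContain ?? []) {
      if (!output.includes(s)) failures.push({ input: c.input, reason: `missing "${s}"` });
    }
    for (const re of c.mustNotMatch ?? []) {
      if (re.test(output)) failures.push({ input: c.input, reason: `matched ${re}` });
    }
  }
  return failures; // each entry becomes a reproducible bug report line
}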
03

Research & Structured Data Execution

  • Primary-source research collected, verified, and organized systematically
  • Raw data cleaned and formatted into spreadsheets, docs, or structured JSON
  • Built for AI ingestion — consistent schema, no junk rows, source-attributed (example record shape below)
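
For instance, "built for AI ingestion" usually means every row conforms to a shape like this. The field names are an example schema, agreed per project rather than fixed.

// Example record shape: one fact per row, every row source-attributed.
type ResearchRecord = {
  id: string;           // stable key, deduplicated across the sheet
  claim: string;        // a single verified fact, stated once
  source_url: string;   // primary source it was taken from
  retrieved_at: string; // ISO 8601 date the source was checked
};

const example: ResearchRecord = {
  id: 'rec-0001',
  claim: 'Example fact, stated in one sentence.',
  source_url: 'https://example.com/primary-source',
  retrieved_at: '2024-01-01',
};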
04

I also build what I test

  • Building PWAs from scratch gives me a closer view of where systems actually break
  • Shipping and testing in the same loop means I understand failure modes from both sides
  • Not a separate service — context that makes the testing sharper

Things I built to test
things I wanted to understand.

These are personal experiments — not polished products. Each one was a reason to get closer to how systems actually work.

Personal hobby project

Stackr

Built and tested an offline-first AI notes PWA. Covered localStorage sync, AI response saving, auth persistence across sessions, and service worker caching behavior. Caught iOS Safari ITP (Intelligent Tracking Prevention) failures breaking Supabase auth on reinstall.

Visit project
Personal hobby project

Jungle Dash

Built a 2.5D endless runner PWA from scratch, then used it as a testing ground for continuous state management — collision detection, mobile control behavior, obstacle generation edge cases, state resets on death, and performance under sustained loops.

Visit project
Personal hobby project

Clarity

Built an AI reflection PWA on a structured three-part response loop, then tested output consistency, safety boundary behavior, fallback handling between Groq and OpenAI providers, and freemium paywall state management.

Visit project
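
The Groq-to-OpenAI fallback mentioned above, reduced to a sketch. `callGroq` and `callOpenAI` stand in for the real provider clients, which aren't shown here; the timeout value is illustrative.

// Stand-ins for the real provider clients (not shown here).
declare function callGroq(prompt: string): Promise<string>;
declare function callOpenAI(prompt: string): Promise<string>;

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) => setTimeout(() => reject(new Error('timeout')), ms)),
  ]);
}

// Primary provider first; any error or timeout routes to the fallback.
// Testing this path means forcing both branches: provider errors,
// slow responses, and malformed outputs.
async function generate(prompt: string): Promise<{ text: string; provider: 'groq' | 'openai' }> {
  try {
    return { text: await withTimeout(callGroq(prompt), 8_000), provider: 'groq' };
  } catch {
    return { text: await callOpenAI(prompt), provider: 'openai' };
  }
}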

Every engagement ends with
something you can act on.

Documented outputs, not activity summaries. Here's what that looks like in practice.

📋

Bug Reports

  • Steps to reproduce, exact conditions, expected vs actual behavior
  • Severity classification and suggested fix path
  • Delivered in your preferred format — doc, sheet, or Notion
🧪

Test Case Checklists

  • Edge cases mapped from your product logic and user flows
  • AI/LLM response evaluation notes with pass/fail criteria
  • Workflow failure summaries with reproduction paths
📁

Structured Research Sheets

  • Source-attributed, consistently formatted, ready for AI ingestion
  • Clean schema — no junk rows, no mixed formats
  • Delivered as CSV, XLSX, or JSON depending on your pipeline
🗂️

Corpus & Dataset Curation

  • Structured text datasets extracted and cleaned for LLM training pipelines
  • Consistent labeling, formatting, and deduplication across large collections (dedup sketch below)
  • Source-verified, schema-consistent, delivered in your required format
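
The deduplication step in that list, as a minimal sketch: normalize, hash, keep the first occurrence. Real pipelines add near-duplicate detection on top; this shows only the exact-match layer.

import { createHash } from 'node:crypto';

// Exact-match dedup: normalize whitespace and case, hash the result,
// keep the first document that produced each hash.
function dedupe(docs: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const doc of docs) {
    const normalized = doc.toLowerCase().replace(/\s+/g, ' ').trim();
    const key = createHash('sha256').update(normalized).digest('hex');
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(doc);
    }
  }
  return kept;
}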

Sample Bug Report

Real bug report (sanitized)
P1 — Blocker

System fails to generate downstream outputs after successful data processing

Context

Web application · Staging environment · Workflow: Data ingestion → Processing → Output generation

Steps to Reproduce

  1. Create a new workspace/entity
  2. Connect a data source and initiate processing
  3. Allow processing stage to complete successfully
  4. Trigger output generation step A, then step B
  5. Observe output status

Expected

Both output generation steps complete, producing valid output artifacts.

Actual

Processing completes. Both output steps fail. No artifacts created.

System Logs (Sanitized)

processing completed successfully (items_processed: 16, blocked: false)
output_a.asset_id = null
output_b.asset_id = null
last_failed_runs.output_a.status = failed
last_failed_runs.output_b.status = failed

Analysis

Processing layer executes correctly. Failure occurs in the downstream output generation pipeline. Likely issue: missing or invalid mapping between processed data and output generation inputs — a breakdown in the data handoff between modules.
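
If I were turning that analysis into a regression check, it would look roughly like this. The field names mirror the sanitized logs above, not a real API.

// Hypothetical post-run assertion: once processing completes, every
// output step must have been handed a non-null asset mapping.
type RunState = {
  processing: { completed: boolean; items_processed: number };
  outputs: Record<string, { asset_id: string | null }>;
};

function findHandoffGaps(run: RunState): string[] {
  if (!run.processing.completed) return [];
  return Object.entries(run.outputs)
    .filter(([, output]) => output.asset_id === null)
    .map(([step]) => `${step}: processing succeeded but no asset_id was mapped`);
}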

AI-assisted debugging & documentation

PWA login sessions disappearing on refresh

Problem

Users were logged out every time the PWA was refreshed or reinstalled on iOS Safari.

Tested

Service worker caching strategy, Supabase auth token storage, ITP cookie behavior across iOS versions.

Output

Surfaced a caching conflict blocking session persistence. Documented the issue and fix path with AI assistance.
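
The shape of the fix path, as a sketch: auth traffic has to bypass the service worker cache so a stale cached response can't shadow a live session. The '/auth/' path match is illustrative; actual Supabase auth URLs depend on the project setup.

// Service worker global (assumes the 'webworker' lib in tsconfig).
declare const self: ServiceWorkerGlobalScope;

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (url.pathname.includes('/auth/')) {
    // Network-only for auth: never answer these from cache.
    event.respondWith(fetch(event.request));
    return;
  }
  // ...cache-first / stale-while-revalidate for static assets...
});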

Need your AI app tested
before it ships?

Send me the flow, expected behavior, and access details. I'll return a clear issue list, edge cases, and reproduction notes. Available for 1–2 projects at a time — remote only.