Teachable · Product Manager, Internal Tools & Product Operations · 2018–2019

Product Bug Triage & Data Visibility

Overhauled prioritization framework, established SLAs, and built dashboards for product, tech leads, and tech support to proactively manage issues and set resolution targets.

Data Visibility · Product Operations · SQL · Looker · LookML · Stakeholder management · Internal Tools

The Goal: Add Clarity and Accountability to Bug Triage

Teachable Product & Engineering consistently shipped quickly, but we lacked a reliable way to understand the health of our bug reporting and resolution process. What I learned informally from peers in Engineering, Customer Care, and Product was simple, but instructive:

It turned out we couldn't answer basic questions:

  • How many open bugs do we have?
  • How many of them are valid?
  • How long do they take to resolve?
  • Where are things getting stuck?

Bug reports flowed into Customer Care (CC), then to Product Solutions (PS, a technical support team that vetted bugs), and then out to multiple engineering pods. However, ownership, prioritization, and timelines all lacked consistency. The result was friction between teams, unclear accountability, and a complete lack of confidence in the system.

My Goal: Treat Bug Triage as a product and make it measurable.

I launched an initiative to close the loop on information gaps, and lobbied the VPs of Product, Engineering, and Data to back me as I rolled out new tooling, new behaviors, and an overall approach to keeping our bugs squashed, our product healthy, and our customers feeling cared for.

The Output: Solutions We Implemented (High-Level, Not Exhaustive)

Before I regale you with my approach, let me hit you with why it mattered in the end:

What Changed

  • Standardized prioritization
    • Consolidated Impact × Severity into p1–p4
  • Introduced SLA-based visibility
    • Derived expected resolution timelines by priority
    • Tracked days_outstanding and over/under SLA (see the SQL sketch after this list)
  • Clarified ownership
    • Product Solutions trusted to escalate to engineering
    • Pod leads assign engineers
    • Single source of truth for dev workflow
  • Built analytics foundations
    • Requested event-level data from Clubhouse (project management software)
    • Designed Redshift-friendly schemas
    • Built Looker views for pods, Product Solutions, and leadership
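
To make the SLA-based visibility concrete, here is a minimal SQL sketch of the derived metrics, assuming a bugs table with priority, created_at, and resolved_at columns; the table, column names, and per-priority SLA targets are illustrative placeholders, not our actual schema or targets.

```sql
-- Illustrative sketch: table/column names and SLA targets are assumptions.
WITH scored AS (
    SELECT
        id,
        priority,                                   -- 'p1' .. 'p4'
        CASE priority                               -- placeholder targets, in days
            WHEN 'p1' THEN 2
            WHEN 'p2' THEN 7
            WHEN 'p3' THEN 21
            WHEN 'p4' THEN 60
        END AS sla_days,
        -- open bugs age against today; resolved bugs against their resolution date
        DATEDIFF(day, created_at, COALESCE(resolved_at, GETDATE())) AS days_outstanding
    FROM bugs
    WHERE priority IS NOT NULL
)
SELECT
    id,
    priority,
    sla_days,
    days_outstanding,
    CASE WHEN days_outstanding > sla_days THEN 'over_sla' ELSE 'within_sla' END AS sla_status
FROM scored;
```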

What Improved

  • Clearer prioritization across teams
  • Reduced manual follow-ups
  • Shared language around bug health
  • Increased confidence in decision-making

Here's how we came to these outcomes.

Step 1: Diagnose Before Designing

My primary stakeholders:

  • Customer Care (intake)
  • Product Solutions (technical triage)
  • Pod Leads and Engineers (execution)
  • Data Team (analytics foundations)

What I did:

  1. Mapped the end-to-end workflow
    1. Customer-reported issues → escalation → validation → resolution
    2. Identified blockers across handoffs, duplication, and ownership confusion

Bug workflow

  2. Conducted stakeholder interviews

    1. Focused on where context was lost
    2. Where follow-ups became manual
    3. Where trust broke down. From the archives:
      • "How can I even find out the status of that bug I reported?"
      • "I can't tell which of these is actually important at any given moment. I'm waiting for someone to ring alarms"
      • "I wish they would spend more time with customer inquiries before bringing problems to us that aren't actually bugs"
  3. Audited existing data

    We used a project management tool called Clubhouse -- think of it as somewhere between Jira and Trello. At first I had to export CSVs to discover what data we could actually rely on (spoiler alert: not much). I would eventually advocate for ingesting webhooks and leveraging the API so we could understand our efficacy and build new tooling (see the schema sketch after this list).

    1. Labeled states, priorities, timelines
    2. Identified drift, inconsistency, and missing information and history
  4. Partnered with Data

    1. Defined what data needed to exist before:
      1. Any dashboards or tooling could make a meaningful difference
      2. Crucially: We could actually quantifiably assess our gaps as an organization
    2. Prototyped analysis manually to validate questions
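
Before any dashboards could exist, we needed event-level history in the warehouse. Here is a hypothetical sketch of a Redshift-friendly table for Clubhouse story events; the table name, columns, and key choices are my assumptions for illustration, not the schema we actually shipped.

```sql
-- Hypothetical event table for Clubhouse webhook payloads (Redshift).
CREATE TABLE IF NOT EXISTS clubhouse_story_events (
    event_id        VARCHAR(64)    NOT NULL,  -- id from the webhook payload
    story_id        BIGINT         NOT NULL,  -- the bug/story being updated
    event_type      VARCHAR(32),              -- e.g. 'create', 'update', 'label_change'
    workflow_state  VARCHAR(64),              -- state after the event
    priority_label  VARCHAR(8),               -- 'p1' .. 'p4', if set
    occurred_at     TIMESTAMP      NOT NULL,
    payload         VARCHAR(65535)            -- raw JSON, kept for backfills
)
DISTKEY (story_id)      -- co-locate each story's history on one slice
SORTKEY (occurred_at);  -- time-ordered scans for SLA and cycle-time math
```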

Step 2: Uncover Silent Systemic Failures

Key Findings

  • Measurement in the current state was cumbersome

    Among other issues, the priority labels for reported bugs were clever but vastly overcomplicated. Product Solutions determined priority along two dimensions: Impact (scope: how many customers were affected) and Severity (how painful the issue was).

  • Ownership was diffuse

    Engineers weren't sure which tickets to follow. Product Solutions acted as manual workflow glue.

  • Tooling amplified social friction

    Duplicate stories, unclear sources of truth, and constant follow-ups between Tech leads, IC engineers, Product Solutions, and Customer Educators eroded confidence in the process.

  • Constant process changes to paper over systemic shortcomings

    Because of the confluence of factors that made our bug backlog an opaque mess, Product Solutions held a weekly meeting with Product and Tech leads to lobby for what they deemed the "Top 5" most pressing bugs.

    Some were new high-priority issues; others were bugs that Customer Care had been waiting on for a long time and wanted resolved.

    This meeting was art and science and politics. But it sure wasn't effective or optimal 😬

Step 3: Define Principles and Tradeoffs

Principle 1: Analytical integrity > False precision

  • I avoided granular metrics until we aligned on stable definitions
  • Prioritized directional insight first, optimization later

Principle 2: Actionable scope over total coverage

  • We deliberately chose subsets of data to ignore completely, to separate signal from noise
  • Metrics applied only to clearly scoped work that teams could actually act on
    • e.g. bugs categorized as "Unscheduled" or sitting in "Backlog" were excluded because they would pollute the analysis (see the filter sketch below)
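
As a minimal illustration of that scoping, assuming a bugs table with a workflow_state column (hypothetical names), the exclusion is just a filter applied before any metric is computed:

```sql
-- Exclude work the pods can't act on before computing any bug-health metrics.
SELECT *
FROM bugs
WHERE workflow_state NOT IN ('Unscheduled', 'Backlog');
```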

Principle 3: Visibility over enforcement

We knew we would define SLAs (service-level agreements, or in short: expected time-to-resolve based on priority) as part of this process. While we wanted to measure against them, we made clear that these were guidelines.

  • SLAs were designed as trust-building signals, not performance weapons
  • Ownership stayed decentralized by allowing pod leads (PM + TL) to intake bugs, assign them to devs, and work them into sprint planning.

Principle 4: System Clarity > New tooling > new process

  • Simplified priority taxonomy
  • Reduced dimensionality to improve adoption (see the mapping sketch after this list)
  • Reduced meetings, eliminated interim owners from the process, and consolidated Slack channels for communications
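
As a hedged sketch, collapsing Impact × Severity into p1–p4 can be expressed as a single mapping; the impact/severity values and the exact mapping below are illustrative, not the taxonomy Product Solutions actually used.

```sql
-- Illustrative mapping of the old two-dimensional labels onto p1-p4.
SELECT
    id,
    impact,      -- e.g. 'all_users', 'many_users', 'few_users' (assumed values)
    severity,    -- e.g. 'blocking', 'degraded', 'cosmetic'      (assumed values)
    CASE
        WHEN severity = 'blocking' AND impact IN ('all_users', 'many_users') THEN 'p1'
        WHEN severity = 'blocking' OR  impact = 'all_users'                  THEN 'p2'
        WHEN severity = 'degraded'                                           THEN 'p3'
        ELSE 'p4'
    END AS priority
FROM bugs;
```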

Principle 5: Data Visualization was our Primary Tool

  • In 2-3 stages, I defined schema needs, event histories, and derived metrics up front
    • Worked closely with data engineering and analysts to understand what was possible
  • Prototyped analysis before we built pipelines
    • It started with CSV exports and spreadsheets, and continued as we defined priorities in tandem.
    • While Data built pipelines into our Looker model, I wrote gnarly SQL queries and learned LookML to mock up the dashboards that would be most helpful to every team: from the tech support folks who reported bugs to the pods that resolved them. A sketch of that kind of query follows.
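
To give a flavor of those mockups, here is the shape of query the dashboards were built around: bug health rolled up per pod and priority. The bug_health view and its columns are assumptions for illustration, not the actual Looker model.

```sql
-- Pod-facing rollup: open bugs, average age, and share within SLA per priority.
SELECT
    pod,
    priority,
    COUNT(*)                                            AS open_bugs,
    AVG(days_outstanding::FLOAT)                        AS avg_days_outstanding,
    SUM(CASE WHEN days_outstanding <= sla_days THEN 1 ELSE 0 END)::FLOAT
        / COUNT(*)                                      AS pct_within_sla
FROM bug_health  -- hypothetical derived view joining bugs to SLA targets
WHERE workflow_state NOT IN ('Unscheduled', 'Backlog')
GROUP BY pod, priority
ORDER BY pod, priority;
```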