The Goal: Add Clarity and Accountability to Bug Triage
Teachable Product & Engineering consistently shipped quickly, but we lacked a reliable way to understand the health of our bug reporting and resolution process. What I learned informally from peers in Engineering, Customer Care, and Product was simple, but instructive:
We couldn't answer basic questions:
- How many bugs did we have?
- How many of them were valid?
- How long did they take to resolve?
- Where were things getting stuck?
Bug reports flowed into Customer Care (CC), then to Product Solutions (PS, a technical support team that vetted bugs), and then out to multiple engineering pods. However, ownership, prioritization, and timelines all lacked consistency. The result was friction between teams, unclear accountability, and a complete lack of confidence in the system.
My Goal: Treat Bug Triage as a product and make it measurable.
I launched an initiative to close the loop on information gaps, and lobbied the VPs of Product, Engineering, and Data to back me up as I rolled out new tooling, new behaviors, and an overall approach to keeping our bugs squashed, our product healthy, and our customers feeling cared for.
The Output: Solutions We Implemented (High-Level, Not Exhaustive)
Before I regale you with my approach, let me hit you with why it mattered in the end:
What Changed
- Standardized prioritization
  - Consolidated Impact × Severity into p1–p4
- Introduced SLA-based visibility
  - Derived expected resolution timelines by priority
  - Tracked `days_outstanding` and over/under SLA (sketched just after this list)
- Clarified ownership
  - Product Solutions trusted to escalate to engineering
  - Pod leads assign engineers
  - Single source of truth for dev workflow
- Built analytics foundations
  - Requested event-level data from Clubhouse (project management software)
  - Designed Redshift-friendly schemas
  - Built Looker views for pods, Product Solutions, and leadership
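To make the first two changes concrete, here's a minimal sketch of the query pattern this enabled. The table, columns, and SLA values (`bug_stories`, 2/7/30/90 days) are illustrative assumptions, not our production schema:

```sql
-- Minimal sketch: consolidated p1–p4 priorities mapped to assumed SLA windows,
-- plus the days_outstanding / over-under calculation.
-- Table, columns, and SLA values are illustrative, not the production schema.
WITH slas AS (
    SELECT 'p1' AS priority, 2 AS sla_days UNION ALL
    SELECT 'p2', 7 UNION ALL
    SELECT 'p3', 30 UNION ALL
    SELECT 'p4', 90
)
SELECT
    b.story_id,
    b.priority,
    s.sla_days,
    -- Open bugs age against today; closed bugs against their completion date.
    DATEDIFF(day, b.created_at, COALESCE(b.completed_at, GETDATE())) AS days_outstanding,
    CASE
        WHEN DATEDIFF(day, b.created_at, COALESCE(b.completed_at, GETDATE())) > s.sla_days
            THEN 'over_sla'
        ELSE 'within_sla'
    END AS sla_status
FROM bug_stories AS b
JOIN slas AS s ON s.priority = b.priority;
```

Keeping the SLA thresholds in one place meant they could be tuned later without rewriting every downstream view.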
What Improved
- Clearer prioritization across teams
- Reduced manual follow-ups
- Shared language around bug health
- Increased confidence in decision-making
Here's how we came to these outcomes.
Step 1: Diagnose Before Designing
My primary stakeholders:
- Customer Care (intake)
- Product Solutions (technical triage)
- Pod Leads and Engineers (execution)
- Data Team (analytics foundations)
What I did:
- Mapped the end-to-end workflow
  - Customer-reported issues -> escalation -> validation -> resolution
  - Identified blockers across handoffs, duplication, and ownership confusion
- Conducted stakeholder interviews
  - Focused on where context was lost
  - Where follow-ups became manual
  - Where trust broke down. From the archives:
    - "How can I even find out the status of that bug I reported?"
    - "I can't tell which of these is actually important at any given moment. I'm waiting for someone to ring alarms."
    - "I wish they would spend more time with customer inquiries before bringing problems to us that aren't actually bugs."
- Audited existing data
  We used a project management tool called Clubhouse -- think of it as somewhere between Jira and Trello. At first I had to export CSVs just to discover what data we could actually rely on (spoiler alert: not much). I would eventually advocate for ingesting webhooks and leveraging their API to understand our efficacy and build new tooling.
  - Labeled states, priorities, and timelines
  - Identified drift, inconsistency, and missing information and history
- Partnered with Data
  - Defined what data needed to exist before:
    - Any dashboards or tooling could make a meaningful difference
    - Crucially: we could quantifiably assess our gaps as an organization
  - Prototyped analysis manually to validate questions (the data and a prototype query are sketched just after this list)
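To give a flavor of what "data that needed to exist" meant, here's a rough sketch: an event-level history of story state changes (something the CSV exports couldn't give us), plus the kind of manual prototype query we ran to validate our questions. Table, column, and state names are assumptions for illustration:

```sql
-- Sketch of the event-level history we asked for (fed by Clubhouse webhooks),
-- plus the kind of manual prototype query we ran against early exports.
-- All names, types, and states are illustrative.
CREATE TABLE IF NOT EXISTS clubhouse_story_events (
    story_id   BIGINT,
    event_time TIMESTAMP,
    old_state  VARCHAR(64),   -- e.g. 'Ready for Dev'
    new_state  VARCHAR(64),   -- e.g. 'In Progress'
    changed_by VARCHAR(128)
);

-- Prototype question: how long do bugs sit in each workflow state?
SELECT
    old_state,
    AVG(DATEDIFF(day, prev_event_time, event_time)) AS avg_days_in_state
FROM (
    SELECT
        story_id,
        old_state,
        event_time,
        LAG(event_time) OVER (PARTITION BY story_id ORDER BY event_time) AS prev_event_time
    FROM clubhouse_story_events
) transitions
WHERE prev_event_time IS NOT NULL
GROUP BY old_state
ORDER BY avg_days_in_state DESC;
```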
Step 2: Uncover Silent Systemic Failures
Key Findings
- Measurement in the current state was cumbersome
  Among other issues, priority labels for reported bugs were clever but vastly overcomplicated. Product Solutions determined priority along two dimensions: Impact (scope, aka how many people it affects) and Severity (aka how painful it is).
- Ownership was diffuse
  Engineers weren't sure which tickets to follow. Product Solutions acted as manual workflow glue.
- Tooling amplified social friction
  Duplicate stories, unclear sources of truth, and constant follow-ups between tech leads, IC engineers, Product Solutions, and Customer Educators eroded confidence in the process.
- Constant process changes to paper over systemic shortcomings
  Because of the confluence of factors that made our bug backlog an opaque mess, Product Solutions held a weekly meeting with Product and tech leads to lobby for what they deemed the "Top 5" most pressing bugs.
  Some were new high-priority issues; others were ones that customer service folks had waited on for a long time and wanted resolved.
  This meeting was art and science and politics. But it sure wasn't effective or optimal 😬
Step 3: Defining Principles and Tradeoffs
Principle 1: Analytical integrity > false precision
- I avoided granular metrics until we aligned on stable definitions
- Prioritized directional insight first, optimization later
Principle 2: Actionable scope over total coverage
- We deliberately ignored some subsets of the data to separate signal from noise
- Metrics only applied to clearly scoped work that teams could actually act on
- e.g. bugs categorized as "Unscheduled" or sitting in "Backlog" would pollute analysis (see the filter sketch below)
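A minimal illustration of that scoping decision, with hypothetical state names -- aging metrics only counted work a pod had actually accepted:

```sql
-- Illustrative scoping filter: metrics only counted accepted work, so
-- parked stories didn't skew the averages. State names are examples.
SELECT
    priority,
    COUNT(*) AS open_bugs,
    AVG(DATEDIFF(day, created_at, GETDATE())) AS avg_days_open
FROM bug_stories
WHERE workflow_state NOT IN ('Unscheduled', 'Backlog')
  AND completed_at IS NULL
GROUP BY priority
ORDER BY priority;
```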
Principle 3: Visibility over enforcement
We knew we would define SLAs (service-level agreements, in short: expected time-to-resolve based on priority) as part of this process. While we wanted to measure against them, we made clear that these were guidelines.
- SLAs were designed as trust-building signals, not performance weapons
- Ownership stayed decentralized by allowing pod leads (PM + TL) to intake bugs, assign them to devs, and work them into sprint planning.
Principle 4: System clarity > new tooling > new process
- Simplified priority taxonomy
- Reduced dimensionality to improve adoption
- Reduced meetings, eliminated interim owners from process, consolidated Slack channels for communications
Principle 5: Data visualization was our primary tool
- In 2-3 stages, I defined schema needs, event histories, and derived metrics up front
- Worked closely with data engineering and analysts to understand what was possible
- Prototyped analysis before we built pipelines
- It started with exports and spreadsheets, and continued as we defined priorities in tandem.
- While Data built pipelines into our Looker model, I wrote gnarly SQL queries and learned LookML to mock up which dashboards would be helpful to every team, from the tech support folks who reported bugs to the pods that resolved them (one such mockup is sketched below).
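One of those mockups looked roughly like this; `bug_sla_view` and its columns are stand-ins for the eventual Looker model, not the real thing:

```sql
-- Rough shape of a mocked-up dashboard query: open bugs and SLA breaches
-- by pod and priority. bug_sla_view is a stand-in for the eventual Looker
-- model (built on logic like the earlier days_outstanding sketch).
SELECT
    pod_name,
    priority,
    COUNT(*) AS open_bugs,
    SUM(CASE WHEN days_outstanding > sla_days THEN 1 ELSE 0 END) AS over_sla,
    AVG(days_outstanding) AS avg_days_outstanding
FROM bug_sla_view
WHERE completed_at IS NULL
GROUP BY pod_name, priority
ORDER BY pod_name, priority;
```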