Bug Triage and Prioritization

Triage cadence

Employees can submit tickets to the Site Experience team through the #tech-requests Slack channel, which is linked to Linear Triage. Engineering and Product leadership should monitor this channel and assess incoming tickets for severity and priority. Once a week, this group meets to refine non-urgent requests and prioritize them into sprints.

Because not every issue starts out as a ticket, the team is also expected to monitor the following Slack channels on an ongoing basis:

  • #hydrogen-alerts: Sentry alerts - urgent errors, downtime detection, etc.
  • #tech-vendor-alerts: Feed of vendor status alerts
  • #site-escalations: Escalated support issues from external teams
  • #technology: Main technology org channel - initial reporting of issues can happen here
  • #engineering: Main engineering channel - initial reporting of issues can happen here
  • #engineering-site: Team channel - initial reporting and issue discussion can happen here

Triage and Response Flow

Critical issues can and should interrupt the sprint; non-critical bugs are refined into the backlog with clear reproduction steps and are scheduled into upcoming sprints according to product and engineering priorities.

  1. Triage Incoming Issues
    • As each issue is reported, classify it by user impact and urgency, and map it to target response and fix times.
    • Assign severity
      • Critical: Revenue Impacting - extended production or vendor outage, or an extended period of revenue flows failing.
      • High: Revenue Impacting but quickly actionable - production outage, vendor outage, cart crashing, add to cart not working, checkout failing.
      • Medium: Actionable but not critical - affects some users, still usable but bad experience.
      • Low/Backlog: Minor, non-urgent - may require future scheduling/investigation, not critical to user flows, new requests.
  2. Initial Stakeholder Communication (Low/Backlog items excluded)
    • Respond to the reporting party within 30 minutes.
    • Acknowledge the issue, confirm investigation has started.
    • Provide estimated resolution or time for next update/check-in.
  3. Resolution/Process
    • Critical Severity Issues
      • Immediately alert full engineering team.
      • Engineering Manager to lead the investigation and update stakeholders (every 15 or 30 minutes, depending on severity), or delegate the lead to a team member.
      • Create a dedicated Slack channel for triage and communication.
      • Complete postmortem when issues are resolved and systems are stable.
    • High Severity Issues
      • Immediately alert full engineering team.
      • Engineering Manager to lead the investigation and update stakeholders (every 15 or 30 minutes, depending on severity), or delegate the lead to a team member.
      • Escalate to leadership/critical severity if unable to resolve promptly.
      • Complete postmortem when issues are resolved and systems are stable.
    • Medium Severity Issues
      • EM to assign owner (can be self or delegate to team).
      • Owner responsible for updating relevant parties on status.
      • Owner sees issue through to completion through standard engineering process.
    • Low/Backlog Issues
      • Create ticket in Linear (if not created already).
      • Assign provisional priority; add context for backlog refinement.
      • Engineering leadership to triage during regular backlog reviews.
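
The flow above can be sketched as a small lookup from severity to response policy. This is an illustrative sketch only, not an existing tool: the names `Severity`, `ResponsePolicy`, `POLICIES`, and `classify` are hypothetical, and the three boolean inputs are a simplification of the severity descriptions. The source gives the update cadence only as "15/30min depending on severity", so mapping 15 minutes to Critical and 30 minutes to High is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"  # Low/Backlog


@dataclass(frozen=True)
class ResponsePolicy:
    ack_within_minutes: Optional[int]    # None: no initial stakeholder communication required
    update_every_minutes: Optional[int]  # None: owner updates relevant parties as needed
    alert_full_team: bool
    dedicated_channel: bool
    postmortem: bool


# ASSUMPTION: 15-minute updates for Critical and 30-minute updates for High.
# The process text only says "15/30min depending on severity".
POLICIES = {
    Severity.CRITICAL: ResponsePolicy(30, 15, True, True, True),
    Severity.HIGH: ResponsePolicy(30, 30, True, False, True),
    Severity.MEDIUM: ResponsePolicy(30, None, False, False, False),
    Severity.LOW: ResponsePolicy(None, None, False, False, False),
}


def classify(revenue_impacting: bool, extended: bool, degrades_experience: bool) -> Severity:
    """Hypothetical simplification of the severity definitions."""
    if revenue_impacting:
        # An extended outage or extended revenue-flow failure is Critical;
        # a revenue-impacting but quickly actionable issue is High.
        return Severity.CRITICAL if extended else Severity.HIGH
    if degrades_experience:
        # Affects some users: still usable, but a bad experience.
        return Severity.MEDIUM
    # Minor, non-urgent, or a new request.
    return Severity.LOW
```

For example, `POLICIES[classify(revenue_impacting=True, extended=False, degrades_experience=False)]` returns the High policy: alert the full team, acknowledge within 30 minutes, and complete a postmortem, without a dedicated channel. In practice severity is a judgment call made during triage; the function only makes the decision order explicit.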