Aman

The triage master

work · 3 min read

Hi,

I am Aman, and this safe space is where I am most raw with my thoughts. Hop on if you'd like to interact with them :))

Let's talk about bugs today. To a non-engineer they may not seem like real ones, but from an engineering perspective they are very real!

This is taken from my personal notes on working @commenda and reflects how I personally approach on-call engineering at an early-stage startup.

You may think that being an on-call engineer is a less important job, but it actually is not.

Being reliable[1], so that your teammates can keep making contributions, is the most positive-sum aspect of a great engineering team.


Welcome to the world of Triage

As you embark on your journey to become a triage master, you stand at the threshold of a path filled with both immense challenges and unparalleled opportunities. In this role, you'll be responsible for swiftly diagnosing and prioritising software bugs, ensuring that critical issues are addressed with urgency while balancing the overall workflow. This journey will demand your keen analytical mind, exceptional problem-solving skills, and the ability to stay calm under pressure. The reliability your company's users experience lies in your hands.

The Triage Department has three major components, which will be your best friends going forward:

  • Reported Bugs (Triage Section)

  • P0 Bugs (immediately blocking a customer, or will definitely block a customer in the future if not solved)

  • P1 Bugs (we think of these bugs as lower priority, but they are very important from a reliability perspective)

Here is a brief rundown of what your job as the triage master will look like during your reign of power.

How to kill Bugs (101)

Allocate some time to solve bugs (the priority order is very clear: P0, then P1). This time should be a focused block so that you don't have to context switch much.

  • Read all the bugs in the triage section. Some bugs that are reported as P1 can actually be a P0, so reading all of them is very important.

  • Move the bugs from the triage section to your active section, where you would typically prioritise them.

  • All necessary communication about the context of a bug should happen in Linear or whichever bug reporting tool your company uses. If for some reason you feel the response time is slow, you can message the stakeholder in the company workspace and ask them to check their tagged messages in the reporting tool.

  • If you decide not to prioritise a bug, or to put it in the backlog, please leave a comment on the ticket itself explaining your reasoning.

  • Always go the extra mile for small bugs like copy changes and CSS changes, which are very simple for you as a developer but a big deal for the user.

  • Always check your fix

  • Always check your inbox, because QA and other stakeholders will communicate from there. It is very important for you to stay at inbox zero.

  • A bug is not fixed until the stakeholder reporting the bug confirms the desired outcome is as expected

  • Some bugs are functionally working as intended but broken from a user's perspective. Please tag the stakeholders in such cases and flag the issue to them. Usually there are two solutions:

    • The ideal solution, which will usually take time and engineering resources

    • The quick but effective solution, which will improve the current UX

Consult with the stakeholder on whichever solution you are prioritising and act on it. UX issues are also bugs: if we are confused by them, our users likely will be too.
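If it helps to see the routine above as code, here is a minimal Python sketch. Every name in it (Bug, blocks_customer, build_active_queue and so on) is made up for illustration. It is not tied to Linear or to any real bug tracker's API, and it only models three of the rules: read everything and re-rank it, leave a comment when you backlog something, and treat a bug as fixed only after the reporter confirms.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    P0 = 0  # blocking a customer now, or certain to block one later
    P1 = 1  # lower priority, still matters for reliability


@dataclass
class Bug:
    # Hypothetical ticket shape, not a real tracker schema.
    title: str
    reported_priority: Priority
    blocks_customer: bool = False   # what you learn by actually reading the report
    fix_shipped: bool = False
    reporter_confirmed: bool = False
    comments: list[str] = field(default_factory=list)


def effective_priority(bug: Bug) -> Priority:
    """A bug reported as P1 is treated as P0 if it actually blocks a customer."""
    return Priority.P0 if bug.blocks_customer else bug.reported_priority


def build_active_queue(triage: list[Bug]) -> list[Bug]:
    """Read every reported bug, then order the active section P0 first."""
    return sorted(triage, key=effective_priority)


def move_to_backlog(bug: Bug, backlog: list[Bug], reason: str) -> None:
    """Deprioritising a bug always leaves a comment explaining why."""
    if not reason.strip():
        raise ValueError("explain the decision in a comment on the ticket")
    bug.comments.append(f"Backlog: {reason}")
    backlog.append(bug)


def is_fixed(bug: Bug) -> bool:
    """Fixed only when the fix is shipped and the reporter has confirmed it."""
    return bug.fix_shipped and bug.reporter_confirmed
```

For example, calling build_active_queue on a list that contains a "P1" which turns out to block a customer moves that bug ahead of the genuine P1s, while bugs of equal priority keep the order in which they were reported.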

[1] What does Reliability mean?

  • The app should behave as the user expects

  • The app can tolerate the user making mistakes

There is more, but these two are very important from the perspective of the triage master. Some users will deliberately make mistakes in our app with inputs, naming and UI stuff. In those cases:

  1. If the fix is simple in nature but annoying, please do it, because it makes your company more reliable for customers.

  2. If the fix will take significant time, put it in the backlog and tag the product team so that they can prioritise it.

The extra mile you take here as the triage master will take your company 10 steps forward.
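To make "simple in nature but annoying" a bit more concrete, here is a small Python sketch of what tolerating input mistakes can look like. The two helpers and their names are hypothetical, and the amount parser assumes commas are thousands separators, which will not hold in every locale.

```python
import re


def normalize_name(raw: str) -> str:
    """Trim the ends and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_amount(raw: str) -> float:
    """Accept inputs like ' 1,250.50 ' instead of rejecting them outright.

    Assumption: commas are thousands separators.
    """
    cleaned = raw.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # Say what went wrong and show an accepted format.
        raise ValueError(f"could not read {raw!r} as an amount, e.g. 1250.50") from None
```

Neither function is clever. The point is that a stray space or a pasted comma stops being the user's problem.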

Being an on-call engineer gives you the opportunity to:

  • Improve the codebase
  • Clear tech debt
  • Understand systems which you have not worked on

To sum it up

A tourist visiting England's Eton College asked the gardener how he got the lawns so perfect. "That's easy," he replied, "You just brush off the dew every morning, mow them every other day, and roll them once a week." "Is that all?" asked the tourist. "Absolutely," replied the gardener. "Do that for 500 years and you'll have a nice lawn, too." Great lawns need small amounts of daily care, and so do great programmers. Management consultants like to drop the word kaizen in conversations. "Kaizen" is a Japanese term that captures the concept of continuously making many small improvements. - The Pragmatic Programmer

© 2024 by Aman. All rights reserved.