It’s inevitable; there will come a day when one of your customers has an incident that impacts operations. A domain controller burns to a crisp, half the office is hit with ransomware, or a hurricane wipes the building out completely. Even if the incident in question only impacts one person, it’s still a problem that affects your customer’s ability to do business.
You need to be able to respond quickly and resolve the issue swiftly. As with every customer problem you’re tasked with solving, there’s a bit of diligence work to understand the nature of the issue, a period of time spent determining the correct course of action, and then the time it takes to actually rectify the problem (and don’t forget to include any unforeseen consequences, hiccups, etc. that may show themselves while you’re cleaning up the incident).
If your techs either don’t know how to resolve an incident or simply take too long, your customer’s operations suffer – as does your reputation with the customer.
So, what can you do to improve your ability to respond to incidents? Here’s a simple four-step guide to setting up a reliable incident response procedure.
The following steps will help make your response feel more like following a known process and less like putting out a fire. Consider working through them for each service you offer.
Step 1 – Build a List of Known Incidents
You should already know which areas of your customer’s tech you are responsible for based on the services you offer. The MSP community already has a ton of empirical and anecdotal data on what kinds of incidents are most common for each of those services. Begin by building out a list of the incidents you believe you should have a plan for. Don’t limit this list to major catastrophic events such as the loss of a major system or a widespread ransomware attack; include issues big and small – we’ll pare down the list in the next step.
Step 2 – Talk to Your Customer
Discuss your list of common incidents with your customers, seeking to understand which ones they think are most impactful to operations. You may already have some of this detail if you plan your customer’s DR strategy by first discussing which systems, applications, and data would hurt the business most if they were unavailable. At the end of this step, you should at the very least have a prioritized list of incidents, with your customer’s concerns at the top of the list.
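If it helps to picture what that prioritized list looks like, here is a minimal Python sketch. The 1-to-5 scoring scale, the field names, and the sample incidents and scores are all assumptions made for illustration; use whatever rating your customer conversations actually produce.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    customer_impact: int  # assumed scale: 1 (minor) to 5 (business-stopping), rated by the customer
    likelihood: int       # assumed scale: 1 (rare) to 5 (frequent), your own estimate


def prioritize(incidents):
    """Put the customer's biggest concerns first; likelihood only breaks ties."""
    return sorted(
        incidents,
        key=lambda i: (i.customer_impact, i.likelihood),
        reverse=True,
    )


# Hypothetical list from Step 1, scored during the customer conversation.
incidents = [
    Incident("Single laptop failure", customer_impact=2, likelihood=4),
    Incident("Widespread ransomware", customer_impact=5, likelihood=2),
    Incident("Domain controller loss", customer_impact=5, likelihood=1),
    Incident("Print server outage", customer_impact=1, likelihood=3),
]

for rank, incident in enumerate(prioritize(incidents), start=1):
    print(rank, incident.name)
```

Sorting on customer impact first keeps the customer’s concerns at the top, which is the point of this step; your own likelihood estimate only decides the order among incidents the customer rated equally.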
Step 3 – Build Incident Response Plans
If your use of a ticketing system is mature, you likely have a knowledgebase of processes to follow that address specific errors, user issues, etc. In many ways, you’re doing the same thing here: build out a plan for how to respond to each incident. One difference is that a response plan should also include contingency planning. For example, if you can’t get the CEO’s laptop back up and running, what’s the plan? Or if the on-prem recovery of a tier 1 workload isn’t possible because the local hardware is damaged, then what? Think of these plans as containing both tactical and strategic steps, so that your techs are prepared for anything.
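One way to picture a plan with built-in contingencies is as an ordered list of options: the primary steps first, then each fallback. The sketch below is only an illustration in Python; the class, its fields, and the specific steps are hypothetical placeholders, not a recommended recovery procedure.

```python
from dataclasses import dataclass, field


@dataclass
class ResponsePlan:
    incident: str
    primary_steps: list
    contingencies: list = field(default_factory=list)  # fallbacks, in the order to try them

    def next_option(self, failed_attempts):
        """Return the steps to try after `failed_attempts` options have failed.

        Returns None once every option is exhausted, which is the cue to escalate.
        """
        options = [self.primary_steps] + self.contingencies
        if failed_attempts < len(options):
            return options[failed_attempts]
        return None


# Hypothetical plan; the steps are placeholders for your own procedures.
tier1_plan = ResponsePlan(
    incident="Tier 1 workload down",
    primary_steps=["Recover the workload on-prem from the latest backup"],
    contingencies=[
        ["Recover the workload at an alternate location"],
    ],
)

print(tier1_plan.next_option(0))  # the primary, on-prem recovery
print(tier1_plan.next_option(1))  # the fallback if local hardware is damaged
print(tier1_plan.next_option(2))  # None: nothing left in the plan, so escalate
```

Writing the fallbacks down in order means a tech who hits a dead end knows the next move without having to improvise.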
Step 4 – Build an Incident Plan for the Unknown
It sounds a little odd, making a plan for the unknown. But it’s important to have a structured plan of the high-level steps to take when an incident occurs that you haven’t thought of. For example, what should you do if not one, but three of your customer’s critical applications fail simultaneously? Or what if you believe your customer has become the victim of data manipulation (where the data isn’t deleted or encrypted but maliciously changed, putting the entire data set into question), but you’re not sure of the extent of the attack? In instances like these, your plan should include some form of triaging the affected systems/applications/data, communicating with the customer, prioritizing where to place your response focus, and planning the tactical next steps.
Tools to Help You Along the Way
It’s all well and good to tell you that you should be doing all of the above, but what about toolsets to help with this? We work in technology; shouldn’t there be some applications out there that can help with this process? Well… yes! Consider the apps and tools below when working on this list and for ongoing operations.
Ticketing System – This was mentioned above, but it can’t be stressed enough how important this tool is. A ticketing system (such as ConnectWise) contains a historical list of all incidents with a given customer. Not only will you use this tool to track issues as they arise, but it is also a good place to look for repeating issues. Maybe customer A’s print server goes down on the 3rd Friday of every month. The ticket history will show you that this is happening, and it can also help you find and squash the underlying cause.
Wiki & Documentation – Let’s face it, no one wants to read extended fix information inside a trouble ticket. Ticketing systems are great at tracking issues, but some lack the ability to do meaningful, lengthy documentation. Consider a wiki or a professional documentation tool, such as IT Glue, for this.
Defined Playbook – You’ll have a better idea of what to plan for if you’re only offering a defined set of products/services. Try to define a primary and secondary offering for each product area. Pick 2 AV vendors, 2 encryption vendors, 2 storage vendors, and so on. Once your team becomes familiar with these defined products, not only will responses go better, but you’ll also be more prepared for the unknown.
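To make the 3rd-Friday example concrete, here is a rough Python sketch of how you might spot that kind of pattern in ticket history. It assumes you have already exported tickets as simple (date, asset) pairs; the data, the function names, and the threshold of three occurrences are all made up for illustration, and nothing here uses any particular ticketing system’s API.

```python
from collections import Counter
from datetime import date

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def weekday_slot(d):
    """Return (weekday name, which occurrence of that weekday in its month)."""
    return DAYS[d.weekday()], (d.day - 1) // 7 + 1


def recurring_patterns(tickets, min_count=3):
    """Count tickets per (asset, weekday, nth-in-month) and keep the repeat offenders."""
    counts = Counter((asset, *weekday_slot(opened)) for opened, asset in tickets)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical export: (date opened, affected asset).
tickets = [
    (date(2023, 1, 20), "print server"),
    (date(2023, 2, 7), "password reset"),
    (date(2023, 2, 17), "print server"),
    (date(2023, 3, 17), "print server"),
]

print(recurring_patterns(tickets))
```

With this sample data, the print server shows up three times on the 3rd Friday of the month, so it is the only pattern that clears the threshold. A real ticket export will be messier, so treat this as a starting point rather than a finished report.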
Your ability to address an incident is judged by how quickly you respond to the issue and how accurately you resolve it. Without a plan, your techs spend their time googling the problem and hoping to find answers. But by proactively building response plans (following the four steps above), your team will know what to do (or, at the very least, have some idea of how to start dealing with the problem) in just about any circumstance.
What about you? Have you put any incident response plans into place? Do you feel they’ve prepared you adequately? Why or why not? Let us know in the comments section below!
Thanks for reading!