The Incident Manager II plays a critical role in ensuring the organization can respond swiftly and effectively to incidents that disrupt production applications and services, as well as other outages. This role leads resources, responses, and communications during production outages and/or service degradations.
As a member of the Incident Management Team, this role will act as the primary Incident Manager during major incidents and events. Over time, the person in this position will develop in-depth knowledge of SelectQuote products, systems, and infrastructure. The role requires a high degree of experience in crisis management. This role will also assess and close gaps by developing mitigation plans to increase the stability of all production operations.
The Incident Manager II is responsible for driving Root Cause Analysis, analyzing incident response effectiveness, and driving actionable remediations to prevent future outages.
Supervisory Responsibilities:
- This position has no direct supervisory responsibilities.
Essential Duties and Responsibilities:
- Act as the primary Incident Manager to assess the severity of an incident, determine the appropriate response strategy, and coordinate resources without requiring approval from senior leadership
- Evaluate business impact and make real-time adjustments to the resolution plan to minimize disruption
- Lead incident calls to drive resolution of the incident by broadcasting alarm notifications to Incident Analysts, standing up war rooms/calls, communicating statuses to leadership, documenting the investigation in real time, and assigning post-incident review items
- Collect and document information to support incident details and activities, including but not limited to artifacts, troubleshooting steps, timelines, and impacted elements
- Conduct independent assessments of active/live incidents and communicate status to Business and IT leaders
- Prioritize, manage, and own the incident(s) from start to closure
- Document the results of the analysis/investigation, including impact, root cause, containment, recovery, remediation, and all other efforts
- Own post-incident analysis (RCA) with appropriate teams to identify the root cause and assess key areas of improvement in process, procedure, and systems
- Implement and manage action plans or improvements with appropriate teams to mitigate future impacts
- Track the implementation of post-incident recommendations and ensure timely resolution of identified gaps, risks, or weaknesses
- Analyze data to identify trends and areas for improvement in incident reporting
- Produce and maintain detailed incident reports, metrics, and analytics
- Continuously learn the SelectQuote suite of products and infrastructure to understand vulnerabilities and weaknesses
- Be available for on-call duties outside regular hours to manage urgent incidents as they arise
- Other duties as assigned
Skills/Abilities:
- Strong experience and understanding of incident response protocols and IT service management (ITSM) frameworks
- Excellent verbal and written communication skills for communicating effectively with business users, IT management, and technical professionals
- Ability to analyze incidents, diagnose and prioritize issues quickly, and implement solutions to minimize downtime
- Capability to lead small teams during incident responses and act decisively on critical decisions
- Proven ability to collaborate with various departments and external partners
- Solid experience in creating and maintaining procedural documentation, metrics/analytical reports, and visual presentations for management
- Focused and versatile team player who is comfortable managing multiple incidents simultaneously in a fast-paced environment
- Solid experience in Incident Management
- Strong project management skills with a track record of meeting deadlines
Education and Experience:
- BS/BA in Engineering, Computer Science, or equivalent work experience
- Experience supporting Disaster Recovery and/or Business Continuity Processes
- 4+ years of experience in an incident response team environment
- Experience in ITIL or similar problem management functions
- ITIL4 Foundations Certification preferred
- ITIL4 Incident Management Certification preferred
- Experience in a customer support function
- Experience in IT service management and operations
- Must be able to work in a fast-paced, multitasking, technical, and administrative environment
- The position supports a 24x7x365 environment (flexible on-call schedules)
- Preferred: 2+ years of hands-on experience in support of two or more of the following technical areas: AWS, Windows Server systems, Unix systems, networking, storage, server virtualization, Voice over Internet Protocol (VoIP), database administration (DBA) (e.g., Oracle, Microsoft SQL Server), application software support, network and application monitoring, and project or program management
Certificates/Licenses/Registration:
- Preferred: ITIL Certification in Incident Management
- APM or PMP certification is a plus
Physical Requirements:
- Work is performed indoors with potential for exposure to safety and health hazards related to office work. May periodically travel to other office and operational sites. The noise level in the work environment is usually moderate.
- Prolonged periods of sitting at a desk and working on a computer.
Disclaimer: The above statements are intended to describe the general nature and level of work being performed by people assigned to this job. They are not intended to be construed as an exhaustive list of all responsibilities, duties, and skills required.