This role is full-time and permanent with Converge Technology Solutions. We are searching for a detail-oriented and analytical Platform Operations Manager to oversee daily operations of the Converge IBM Power for Google Cloud (IP4G) platform. You will lead a team of cloud engineers, supporting customers through onboarding, incident management, and build activities, while collaborating with Product, Engineering, and Development teams to ensure operational goals align with customer outcomes. We are experiencing explosive growth and invest heavily in our team members.
What you will accomplish:
First 30 Days:
Familiarize yourself with the organization's strategy, objectives, and priorities
Build relationships with the cloud operations team, engineers, and other cross-functional teams to understand their challenges, strengths, and priorities
Identify and document current operational processes, key workloads, and performance metrics (uptime, incident response times, etc.)
Identify and document any urgent risks or critical performance issues that need attention
First 60 Days:
Address any high-priority issues or risks identified during the first 30 days
Define key performance indicators (KPIs) for cloud operations - e.g., uptime/availability, incident response metrics, status of internal initiatives, cost optimization, and team performance metrics
Aggregate relevant data sources to establish reporting mechanisms that support business-critical KPIs
Begin assessing team members' skills and identifying any gaps or training needs
First 90 Days:
Create a detailed strategy for cloud operations that aligns with the overall business goals and organizational imperatives
Ensure the team is aligned with the new processes and that they are empowered to take ownership of critical areas
Establish a regular reporting structure to keep stakeholders informed on cloud operations team performance, Customer satisfaction, cost, SLO/SLA adherence, project status, and other items as identified
Establish individual development plans, aligned to organizational MBOs and strategic business objectives
Key Responsibilities:
Oversee the response to and resolution of platform- and Customer-impacting incidents, minimizing service disruptions and downtime, with an emphasis on clear, concise, and structured Customer communications
Optimize and streamline incident response processes, tooling, and capabilities
Establish and oversee the capacity management and planning framework that considers both the technical and commercial aspects of platform scaling
Curate detailed and actionable operational and capacity data visualizations for stakeholders, including executive leadership and technical audiences
Manage adherence to security controls within the platform environment - SOC, PCI-DSS, and others as appropriate
Collaborate with Product Development and Platform Engineering teams to support platform deployments and ensure frictionless operations and maintenance activities
Lead process optimization efforts to drive efficiency, reduce manual intervention, and enhance the Customer experience through automation and orchestration
Qualifications:
7+ years of experience in cloud operations, infrastructure management, or a related IT operations role
Proven leadership experience managing cloud teams and complex multi-disciplinary projects
Demonstrated ability to lead teams of cloud engineers and operations staff with a focus on developing technical skills and operational excellence
Experience managing cross-functional teams and working closely with developers, product owners, and business stakeholders
Proven track record of managing large-scale cloud migrations or multi-cloud environments
Experience working in a managed service provider or cloud service provider business
Proven experience in cost management and financial optimization in cloud environments
Strong incident management and troubleshooting skills
Experience with monitoring and observability tools such as CloudWatch, Datadog, Prometheus, Grafana, or similar
Work Environment:
Remote within the United States
Total Rewards:
We offer a comprehensive total rewards package that includes base salary, healthcare benefits, 401(k) match, stock match program, PTO/holidays, training and development, promotional opportunities, and more.