Federal Reserve Bank of San Francisco
We are the Federal Reserve Bank of San Francisco, public servants with a mission to advance the nation’s monetary, financial, and payment systems to build a stronger economy for all Americans. We are a community-engaged bank, and are committed to understanding and serving the vibrant, expansive communities of the Twelfth District. That means we seek and appreciate new perspectives. We respect people for what they do and for who they are. We build opportunities to learn and grow. When you join the SF Fed, you become part of a diverse team united in its purpose to promote an economy that works for everyone.
As a Sr./Lead Site Reliability Engineer, you will work with the Cash Application Delivery Services (ADS) development, QA, DevOps, and National IT teams to manage the systems that support the Cash ADS application suite, both on-prem and in the cloud. Your main focus will be to ensure that all of our applications are operating optimally and that every aspect of each application is monitored, so that issues can be quickly troubleshot and resolved as they arise.
We empower our people to balance their life and work responsibilities. That’s why we offer a flexible hybrid work model that allows you to collaborate with office colleagues on some days, and work from home on others.
Responsibilities:
Establish and run playbooks to support the resolution of incidents that occur in production environments.
Help design dashboards for effective monitoring of infrastructure resources in cloud environments.
Work with development teams to establish Service-Level Objectives and key Service-Level Indicators.
Conduct Production Readiness Reviews to ensure services meet accepted standards of operational readiness before going live.
Ensure infrastructure aligns with security standards, assist in audits, and implement recommended practices to protect data and systems.
Facilitate the design and implementation of Disaster Recovery plans, including backups, failover, and recovery mechanisms, with the development and DBA subject matter experts.
As one of the SREs, drive improvement opportunities in infrastructure, tooling, and workflows using a continuous feedback loop between development and CloudOps
Ensure uptime and reliability of cloud-based infrastructure and systems by monitoring system performance and maintaining high availability of cloud-based assets.
Participate in incident response and troubleshooting by conducting root cause analysis and implementing solutions to prevent recurrence.
Establish thresholds for cloud-based services and capabilities, and set up and maintain monitoring systems to detect issues before they impact users.
Configure alerts for system anomalies, develop monitoring dashboards, and monitor resource usage, latency, and error rates.
Analyze system performance, establish metrics and thresholds, optimize service uptime, reduce latency, and improve customer experience through infrastructure modifications and configuration tuning.
Knowledge of technical troubleshooting approaches, tools and techniques, and the ability to anticipate, recognize, and resolve technical (hardware, software, application or operational) problems
Working experience in programming and scripting languages
Working tooling experience in Ansible, GitLab, Terraform, CloudWatch, Dynatrace, Grafana or equivalent is a must
Qualifications:
Bachelor’s degree in Computer Science, Information Systems, Computer Engineering, Systems Analysis or a related field or equivalent work experience
The Lead Site Reliability Engineer role typically requires 7+ years of industry experience building and supporting enterprise-level systems as a platform engineer or equivalent in a production environment. The Senior Site Reliability Engineer role typically requires 5+ years of hands-on experience implementing, supporting, and using the tools and services required for software orchestration, environment monitoring and management (DevOps) best practices. Experience in Ansible, GitLab, Terraform, CloudWatch, Dynatrace, and Grafana is required.
2+ years hands-on experience with AWS services - building, deploying, and monitoring using AWS tools and services such as AWS Lambda, AWS CloudWatch, and AWS X-Ray
Must be a U.S. Citizen or a Green Card holder with the intent to become a U.S. Citizen
Base Salary Range: Sr. Site Reliability Engineer: Min: $113,600 - Mid: $147,600 - Max: $181,600 (Location: San Francisco)
Base Salary Range: Lead Site Reliability Engineer: Min: $138,900 - Mid: $180,400 - Max: $221,900 (Location: San Francisco)
Final salary and offer will be determined by the applicant’s background, experience, skills, internal equity, and alignment with market data.
We offer a wonderful benefits package including: Medical, Dental, Vision, Pre-tax Flexible Spending Account, Backup Child Care Program, Pre-Tax Day Care Flexible Spending Account, Paid Family Care Leave, Vacation Days, Sick Days, Paid Holidays, Pet Insurance, Matching 401(k), and Retirement/Pension.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment. The SF Fed is an Equal Opportunity Employer.
Full Time / Part Time: Full time
Regular / Temporary: Regular
Job Exempt (Yes / No): Yes
Job Category: Information Technology
Work Shift: First (United States of America)
The Federal Reserve Banks believe that diversity and inclusion among our employees is critical to our success as an organization, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool. The Federal Reserve Banks are committed to equal employment opportunity for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.
Always verify and apply to jobs on Federal Reserve System Careers (https://rb.wd5.myworkdayjobs.com/FRS) or through verified Federal Reserve Bank social media channels.