SRE Manager
Company: Descope Inc.
Location: Los Altos
Posted on: May 20, 2025
Job Description:
We are seeking an experienced and driven SRE Manager to lead our
Site Reliability Engineering team. This role is critical to
ensuring the availability, scalability, and performance of our
production systems. As the SRE Manager, you will be responsible for
managing a team of engineers focused on building automation,
enhancing monitoring and observability, improving system
reliability, and fostering a culture of operational excellence. You
will work closely with development, infrastructure, and security
teams to support high-quality product delivery with minimal
downtime.
Key Responsibilities:
- Lead and grow a high-performing SRE team responsible for the
reliability, performance, and scalability of production
systems.
- Own the incident management process, postmortems, and root
cause analysis to improve system resilience.
- Drive implementation of SLAs, SLOs, and error budgets across
services to align operational goals with business objectives.
- Champion the use of automation to reduce manual work and
improve deployment and recovery times.
- Collaborate with software engineering and DevOps teams to
ensure systems are designed for reliability and operational
efficiency.
- Oversee system monitoring, alerting, and observability efforts
using tools like Prometheus, Grafana, Datadog, or similar.
- Manage on-call rotations and ensure that documentation,
runbooks, and playbooks are maintained.
- Identify and drive continuous improvement in system
architecture, capacity planning, and deployment strategies.
- Ensure compliance with security, privacy, and regulatory
requirements within the infrastructure.
- Provide mentorship, performance reviews, and career development
opportunities for SRE team members.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a
related field (or equivalent experience).
- 4+ years of experience in software engineering, DevOps, or SRE
roles.
- Strong experience with cloud platforms (AWS, GCP, or Azure) and
infrastructure-as-code tools (Terraform, Pulumi, etc.).
- Proficiency in programming/scripting languages such as Python,
Go, or JavaScript.
- Deep understanding of Linux systems, networking, and container
orchestration (Kubernetes, Docker).
- Strong knowledge of CI/CD pipelines and release
automation.
- Excellent leadership, communication, and project management
skills.
- Proven track record of building reliable systems at scale and
managing incident response in production environments.