Title: Senior Site Reliability Engineer 
 Location: Alpharetta, GA 
 Duration: 6-12+ Months 
 
 
 About the Role 
 We're seeking an experienced Senior Site Reliability Engineer to join our team and play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. You'll be a technical leader who combines deep operational expertise with strong automation skills to build and maintain highly available systems. As a Kubernetes expert, you'll drive our container orchestration strategy and serve as a technical authority for our platform teams. 
 
  Key Responsibilities: 
 Infrastructure & Automation 
 Design, deploy, and manage cloud infrastructure across AWS and Azure using Terraform and infrastructure-as-code principles 
 Architect, deploy, and maintain production-grade Kubernetes clusters with a focus on reliability, security, and performance 
 Serve as the subject matter expert on Kubernetes, providing guidance and best practices to engineering teams 
 Build and maintain automated provisioning pipelines to ensure consistent, repeatable deployments 
 Implement and maintain HashiCorp Vault on AWS for secrets management and security, including Vault integration with Kubernetes 
 Design and implement automated High Availability and Disaster Recovery (HA/DR) capabilities through CI/CD pipelines 
 Optimize cloud resources and Kubernetes workloads for performance, cost efficiency, and reliability 
 Observability & Monitoring 
 Architect and implement comprehensive observability solutions using Datadog for cloud-native applications and Kubernetes infrastructure 
 Build monitoring, logging, and alerting frameworks for containerized workloads that provide actionable insights into system health 
 Implement Kubernetes-native monitoring patterns and troubleshoot complex container orchestration issues 
 Integrate Datadog with PagerDuty and other incident management platforms 
 Define and track SLIs, SLOs, and error budgets to drive reliability improvements 
 Create custom dashboards and monitors to track infrastructure, application, and Kubernetes cluster performance 
 CI/CD & Pipeline Management 
 Design, build, and maintain robust CI/CD pipelines that enable rapid, safe deployments to Kubernetes 
 Implement GitOps workflows and automated deployment strategies for containerized applications 
 Implement automated testing, security scanning, and quality gates within pipelines 
 Drive solutions through test, QA, and production environments with appropriate controls and safeguards 
 Automate deployment strategies including blue-green, canary, and rolling deployments in Kubernetes 
 Security & Vulnerability Management 
 Identify, assess, and remediate security vulnerabilities in infrastructure, applications, and Kubernetes clusters 
 Implement Kubernetes security best practices including RBAC, pod security policies/standards, and network policies 
 Collaborate with security teams to implement and maintain security best practices 
 Manage and maintain HashiCorp Vault infrastructure for secure secrets management 
 Ensure compliance with security policies and industry standards across all environments 
 Incident Management & Response 
 Participate in 24/7 on-call rotation to respond to critical production incidents 
 Serve as Incident Commander, coordinating cross-functional response teams during major outages 
 Lead post-incident reviews and drive thorough root cause analysis across engineering teams 
 Troubleshoot complex Kubernetes and distributed systems issues under pressure 
 Develop and refine incident response procedures and runbooks 
 Collaboration & Leadership 
 Partner with engineering teams to improve system reliability and performance 
 Mentor junior SREs and promote SRE best practices across the organization 
 Lead Kubernetes adoption efforts and educate teams on container orchestration best practices 
 Drive initiatives to reduce toil through automation and process improvement 
 Contribute to architectural decisions with a reliability and operability lens 
 Required Qualifications: 
 5+ years of experience in Site Reliability Engineering, DevOps, or similar roles 
 Expert-level knowledge of Kubernetes, including architecture, operations, and troubleshooting in production environments 
 Proven track record as a go-to Kubernetes resource and technical authority 
 Deep understanding of container technologies (Docker, containerd) and orchestration patterns 
 Strong hands-on experience with AWS and Azure cloud platforms 
 Proficiency in Terraform for infrastructure automation and management 
 Expert-level knowledge of Datadog for monitoring, logging, and observability 
 Experience with HashiCorp Vault, including deployment and management on AWS and integration with Kubernetes 
 Deep understanding of CI/CD pipelines, including design, implementation, and optimization for containerized workloads 
 Proven ability to implement automated HA/DR solutions through CI/CD workflows 
 Strong programming skills in Python for automation, tooling, and analysis 
 Proven experience building observability solutions for distributed cloud applications 
 Experience configuring monitoring and alerting systems and integrating with paging platforms like PagerDuty 
 Demonstrated experience identifying and remediating security vulnerabilities 
 Experience driving deployments through multiple environments (test/QA/production) with proper gates and controls 
 Demonstrated experience participating in on-call rotations and responding to production incidents 
 Experience serving as Incident Commander or leading incident response efforts 
 Track record of conducting root cause analysis and driving systemic improvements 
 Strong understanding of networking, security, and cloud architecture principles 
 Excellent communication skills with ability to work across multiple teams and explain complex Kubernetes concepts 
 
 
 Preferred Qualifications: 
 Experience with Google Cloud Platform (GCP) and GKE 
 Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
 Experience with service mesh technologies (Istio, Linkerd, Consul)
 Knowledge of Helm, Kustomize, and other Kubernetes tooling 
 Experience with GitOps tools (ArgoCD, Flux)
 Familiarity with additional CI/CD tools (Jenkins, GitLab CI, GitHub Actions, CircleCI)
 Experience with configuration management tools (Ansible, Chef, Puppet)
 Background in software engineering or systems programming 
 Understanding of chaos engineering and reliability testing methodologies 
 Experience with cost optimization strategies in cloud and Kubernetes environments 
 Security certifications (AWS Security Specialty, CISSP, CKS, etc.)
 Experience with compliance frameworks (SOC 2, ISO 27001, etc.)
 Contributions to open-source Kubernetes projects or active participation in the Kubernetes community 
 What We Offer 
 Competitive salary and equity compensation 
 Comprehensive health, dental, and vision insurance 
 Flexible work arrangements 
 Professional development opportunities and certification support 
 Collaborative and inclusive team culture 
 Our Commitment 
 We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. 