Careers

Come Join Us!

Georgia System Operations is a progressive organization offering opportunities for engineers, technicians, project managers, and more. We've been honored as a Best Place to Work in Georgia.

Our people-over-profit culture and competitive compensation and benefits packages show our dedication to attracting and retaining the best candidates.

We offer comprehensive medical, dental, and vision coverage, a strong retirement program, career development, and flexible work schedules. We're focused on employee wellness and on being a supportive member of the community.

Benefits

Affordable health insurance options, including medical, dental, and vision coverage, are available to full-time employees.

Basic accidental death and dismemberment, long-term disability, and life insurance coverage is provided at no cost. Employees can opt to pay for additional coverage.

A competitive retirement plan, with a company match and company contributions, is available to full-time employees.

We offer many options for our employees’ well-being, including an employee assistance program, an on-site fitness center, and several wellness-focused programs.

Educational reimbursement is available to full-time employees. Employees can also participate in a 529 college savings plan.

Employees can also enroll in voluntary benefits covering hospitalization and critical illness, legal and identity theft protection, and pet insurance.

Full-time employees receive vacation and sick leave through the paid time off (PTO) program. GSOC is closed for nine national holidays annually.

We support growth and development for all our employees through an on-site training program, online learning tools, and programs designed to develop industry knowledge.

Our employees are given volunteer paid time off every year to contribute to the community service organization of their choice.

Azure Platform Engineer
Department: Shared Services IT
Full Time
$99,360 - $159,900 per year
Tucker, Georgia, United States - 30084
Description

Senior Azure Platform Engineer

Role Overview

We are seeking a Senior Azure Platform & Resiliency Engineer to design, build, and operate our cloud foundation in Microsoft Azure. This role combines cloud architecture, infrastructure engineering, and site reliability principles to ensure our platform is secure, scalable, and resilient by design.

You will lead the development of Azure landing zones, infrastructure as code (IaC), and reliability patterns, enabling engineering teams to deploy and operate workloads consistently and safely.

This is a hands-on engineering role, not a pure operations or oversight position.

Key Responsibilities

Azure Platform Architecture

  • Design and implement Azure landing zones aligned with the Microsoft Cloud Adoption Framework (CAF)
  • Define and manage:
    • Management group and subscription strategy
    • RBAC and identity models
    • Azure Policy and governance controls
  • Build and maintain shared services (networking, logging, identity)

Infrastructure as Code (IaC)

  • Develop and maintain infrastructure using Terraform (preferred), Bicep, or ARM templates
  • Create reusable modules for:
    • Networking (hub/spoke, private endpoints)
    • Compute (VMs, VMSS, AKS)
    • Storage and platform services
  • Integrate IaC into CI/CD pipelines (GitHub Actions or Azure DevOps)

Resiliency & Reliability Engineering

  • Design high availability and multi-region architectures
  • Define and implement disaster recovery (DR) strategies, including recovery time and recovery point objectives (RTO/RPO)
  • Conduct failover testing and resilience validation
  • Establish and track SLIs/SLOs and reliability metrics

Serverless & Platform Services

  • Design and support event-driven and serverless architectures:
    • Azure Functions
    • Logic Apps
    • Event Grid / Service Bus
  • Ensure scalability, fault tolerance, and observability of distributed systems

Observability & Operations

  • Implement monitoring and logging using:
    • Azure Monitor
    • Log Analytics
    • Application Insights
  • Define alerting strategies and reduce alert noise through meaningful thresholds
  • Support incident response and lead post-incident reviews

Security & Compliance

  • Partner with security teams to implement:
    • Identity-first architecture (Entra ID, Managed Identities)
    • Network security (NSGs, Private Endpoints, Zero Trust patterns)
    • Secrets management (Azure Key Vault)
  • Ensure infrastructure meets compliance and audit requirements

Required Qualifications

  • 6+ years of experience in cloud infrastructure, DevOps, or SRE roles
  • Strong hands-on experience with Microsoft Azure architecture and services
  • Proven experience designing and implementing Azure landing zones
  • Deep expertise in Terraform (or equivalent IaC tools)
  • Strong understanding of Azure networking (VNet, peering, DNS, private access)
  • Experience with CI/CD pipelines (GitHub Actions, Azure DevOps)
  • Proficiency in scripting (Python, PowerShell, or similar)

Preferred Qualifications

  • Experience with Kubernetes / AKS
  • Familiarity with Azure Front Door, Traffic Manager, or global routing
  • Experience implementing Azure Policy at scale
  • Exposure to chaos engineering or resilience testing tools
  • Understanding of FinOps / cloud cost optimization

Certifications (Preferred)

  • Microsoft Certified: Azure Solutions Architect Expert (AZ-305)
  • Microsoft Certified: Azure Administrator Associate (AZ-104)
  • HashiCorp Certified: Terraform Associate

What Success Looks Like

  • A well-architected, governed Azure platform that supports multiple teams and environments
  • Infrastructure fully defined and deployed via code
  • Systems designed for high availability and rapid recovery
  • Clear observability and actionable alerting across all services
  • Reduced operational toil through automation

Interview Focus Areas

Candidates should be prepared to discuss:

  • Designing an Azure landing zone for a multi-team organization
  • Structuring Terraform for scalability and governance
  • Multi-region architecture and failover strategies
  • Real-world incident response and system reliability improvements