Site Reliability Engineer
Amper Technologies
This job is no longer accepting applications
United States
Posted on Apr 4, 2026
Site Reliability Engineer (SRE) — Elevate Reliability and Performance at ECI
Location: US - Remote
Join ECI as a hands-on Site Reliability Engineer and become a pivotal force in ensuring the rock-solid reliability, peak performance, and seamless scalability of our Manufacturing ERP Portfolio. You’ll work across hybrid environments—both cloud and on-premises—to keep our critical production systems running flawlessly 24/7/365.
In this dynamic role, you’ll collaborate closely with Product, Development, Infrastructure, and Support teams to exceed uptime, SLA, security, and cost-efficiency goals. As an SRE, you’ll lead the charge in incident response, automation, observability, and continuous improvement, shaping the operational standards and best practices that drive our success.
What You’ll Own
Operational Excellence & Reliability
Be the guardian of our 24/7 production environments, swiftly responding to incidents, driving root cause analyses, and continuously enhancing uptime, error budgets, and recovery metrics. Your proactive mindset will identify risks before they impact our users.
Observability, Telemetry & Alerting
Design and maintain cutting-edge observability frameworks using tools like Coralogix and FireHydrant. Build intuitive dashboards and fine-tune alerting to ensure our teams have clear, actionable insights without the noise.
GitOps, Infrastructure & Automation
Champion GitOps principles and Terraform-driven infrastructure as code. Automate repetitive tasks, streamline CI/CD pipelines, and review pull requests to embed reliability and operational excellence into every deployment.
FinOps & Optimization
Drive cloud and infrastructure cost optimization initiatives, balancing performance with budget-conscious decisions. Collaborate on capacity planning and architect solutions that are both reliable and cost-effective.
Collaboration & Continuous Improvement
Work hand-in-hand with cross-functional teams in an Agile environment, contributing to sprint ceremonies, documenting runbooks, and fostering a culture of continuous learning and improvement.
What We’re Looking For
Required
- 3–5+ years of hands-on experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
- Deep expertise in at least one major cloud platform (AWS, Azure, or GCP).
- Fluency with Linux/Unix systems administration, including kernel internals, networking, file systems, and advanced scripting (Bash, Python) for troubleshooting and automation.
- Proven experience managing production systems in hybrid cloud and on-premises environments.
- Familiarity with GitOps workflows, Terraform, and observability tools.
- Active participation in incident response and on-call rotations.
- Exceptional troubleshooting, problem-solving, and communication skills.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Preferred
- Experience with Kubernetes or other container services.
- Experience supporting high-availability SaaS platforms.
- Cloud certifications (AWS, Azure, or Google Cloud).
- Agile/Scrum experience and proficiency with Jira.
- Knowledge of FinOps and cost-optimization best practices.
Why Join ECI
- Be a hands-on technical leader driving mission-critical system reliability.
- Work with modern SRE methodologies, observability platforms, and automation tools.
- Collaborate with passionate global engineering and product teams dedicated to innovation and excellence.
The International Traffic in Arms Regulations (ITAR) is the United States regulation that controls the manufacture, sale, and distribution of defense and space-related articles and services as defined in the United States Munitions List (USML). Besides rocket launchers, torpedoes, and other military hardware, the list also restricts the plans, diagrams, photos, and other documentation used to build ITAR-controlled military gear. This is referred to by ITAR as “technical data”. ITAR mandates that access to physical materials or technical data related to defense and military technologies is restricted to US Persons only.