About the company
Ava Labs makes it simple to deploy high-performance solutions for Web3, led by innovations on Avalanche. The company was founded by Cornell computer scientists, who partnered with Wall Street veterans and early Web3 leaders to execute a promising vision for how people build and use open, permissionless networks. Ava Labs is redefining the way people create value with Web3.
Job Summary
WHAT YOU WILL DO
📍Design, develop, and optimize highly reliable and scalable infrastructure focused on SRE principles.
📍Implement and maintain monitoring, logging, and tracing tools to gain insights into service behavior and health.
📍Establish and uphold SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets for critical systems.
📍Enhance the reliability and resiliency of critical systems by identifying single points of failure and implementing best practices.
📍Collaborate with software developers to build reliability and performance into applications from inception.
📍Automate and streamline incident management processes to minimize service disruption and improve response times.
📍Participate in on-call rotations, ensuring quick restoration of services and fostering a blameless post-mortem culture.
📍Foster a continuous improvement mindset by analyzing and learning from incidents and implementing preventive measures.
📍Leverage cloud technologies and IaC tools to ensure scalability and repeatability.
📍Advocate for best practices in reliability, security, and maintainability within the team and across the organization.
WHAT YOU WILL BRING
📍BS in Computer Science or a related field.
📍8+ years of experience as an SRE, DevOps, or Cloud Engineer.
📍Strong grasp of SRE principles, including error budgets, SLOs, and SLIs.
📍Experience with cloud networking and orchestration on AWS (EKS, ECS, VPC, S3, ELB).
📍Strong Kubernetes experience with Docker or rkt containerization.
📍Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, Terragrunt, and Ansible.
📍Experience with monitoring and observability tools like Prometheus, Grafana, or ELK Stack.
📍Experience building and maintaining CI/CD pipelines with GitHub Actions (preferred), Jenkins, Travis CI, or CircleCI.
📍Experience with automation and configuration management using Ansible, Puppet, or Chef.
📍Experience with Linux-based infrastructure (Ubuntu preferred).
📍Experience with scripting languages and writing scripts (Python and Go preferred).
📍Working knowledge of decentralized architecture design patterns and distributed systems.