Nethermind
Site Reliability Engineer Technical Lead
Posted about 2 months ago

Full-time
Remote, Europe
$112,000 to $156,000 per year

About the company

We are a team of world-class builders and researchers with expertise across several domains: Ethereum Protocol Engineering, Layer-2, Decentralized Finance (DeFi), Miner Extractable Value (MEV), Smart Contract Development, Security Auditing and Formal Verification. Working to solve some of the most challenging problems in the blockchain space, we frequently collaborate with renowned companies, such as Ethereum Foundation, StarkWare, Gnosis Chain, Aave, Flashbots, xDai, OpenZeppelin, Forta Protocol, Energy Web, POA Network and many more. We actively contribute to Ethereum core development, EIPs and network upgrades together with the Ethereum Foundation and other client teams.

Job Summary

Responsibilities:

šŸ“Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems šŸ“Design and implement automation to eliminate toil and improve system reliability and efficiency šŸ“Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure šŸ“Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features šŸ“Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions šŸ“Collaborate closely with Platform engineering team to build reliability into services from the ground up šŸ“Collaborate closely with Nethermindā€™s Infrastructure Leadership department to align SRE strategies with overall technical vision šŸ“Drive the adoption of observability best practices and implement comprehensive monitoring systems šŸ“Develop and maintain service level indicators (SLIs) and objectives (SLOs), working with product owners to define appropriate reliability targets šŸ“Mentor team members in SRE practices and foster a culture of continuous learning šŸ“Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges šŸ“Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation

Skills:

šŸ“5+ years of experience in Site Reliability Engineering or DevOps šŸ“Expert knowledge of cloud platforms (AWS, GCP) šŸ“Expert knowledge of Kubernetes šŸ“Proven experience in designing and implementing scalable, efficient, resilient systems šŸ“Deep understanding of Linux/Unix systems and networking protocols šŸ“Strong programming skills in Python or Go šŸ“Strong background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki) šŸ“Expertise in CI/CD tools (e.g. GitHub Actions, ArgoCD) šŸ“Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to various audiences šŸ“Experience in producing technical documentation, runbooks, presentations, and post-mortem reports šŸ“Experience and passion for mentoring and upskilling team members
