About the company

Blockdaemon offers a multi-chain, multi-cloud network management tool that can deploy nodes and connect them to blockchains within minutes. Blockdaemon also provides its own infrastructure for select projects, giving them faster deploy times and lower costs. The end goal is to offer all blockchain projects a global, decentralized network management tool connected to multiple infrastructure providers.

Job Summary

Your Impact

📍System Architecture and Design: Collaborate with software engineering teams to design scalable, highly available, and resilient systems. Drive architectural improvements to enhance system reliability and performance.
📍Automation and Tooling: Develop automation tools and scripts to streamline deployment, monitoring, and incident response processes. Implement and maintain infrastructure as code frameworks.
📍Monitoring and Alerting: Configure and maintain monitoring systems to detect and mitigate potential issues proactively. Define alerting thresholds and response procedures to ensure timely incident resolution.
📍Incident Management: Respond to and resolve critical incidents, perform root cause analysis, and implement preventive measures to minimize the likelihood of recurrence. Participate in an on-call rotation to provide 24/7 support as needed.
📍Capacity Planning and Performance Optimization: Analyze system performance metrics, identify bottlenecks, and propose optimizations to improve resource utilization and efficiency.
📍Security and Compliance: Work closely with security teams to implement best practices for data protection, access control, and compliance with regulatory requirements. Conduct periodic security audits and vulnerability assessments.
📍Documentation and Knowledge Sharing: Document system configurations, procedures, and troubleshooting steps. Share knowledge and best practices with team members to foster a culture of continuous learning and improvement.

Role Requirements

📍Proven experience in an independent contributor role working with cloud platform technologies (AWS, GCP, Azure, etc.), Infrastructure-as-Code tooling (Terraform, Pulumi, etc.), and CI/CD orchestration platforms (CircleCI, GitHub Actions, etc.).
📍Proficiency in scripting and programming languages such as Python, Golang, or TypeScript.
📍Experience with container and scheduling technologies (e.g., Docker, Kubernetes) and microservices architecture.
📍Hands-on experience with monitoring tools such as Prometheus, Grafana, and the ELK stack.
📍Excellent problem-solving skills and the ability to independently troubleshoot complex issues.
📍Strong understanding of Linux/Unix systems administration and networking concepts.
📍Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Site Reliability Engineer SRE EMEA
