About the company

Our team is working on the next generation of crypto solutions. Whether you are looking for a role as a Blockchain Software Engineer in San Francisco, a Partner Engineer in London or a Sales Representative in Singapore, Ripple is the place to build something transformative.

Job Summary

WHAT YOU’LL DO:

📍Keep your assigned site or service up and running, or get it back up and running quickly when a failure occurs
📍Actively troubleshoot issues that arise during testing and production, catching and solving them before launch
📍Automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more
📍Monitor and troubleshoot highly scalable, distributed server clusters that perform various functions, from web servers to machine learning processing
📍Be on a PagerDuty rotation to respond to availability incidents and support service engineers with customer incidents
📍Participate in and establish best practices in Site Reliability Engineering
📍Manage code deployments, fixes, updates, and related processes
📍Work with a close-knit team and brainstorm the best ways to tackle complex problems in infrastructure, security, and monitoring
📍Provide technical guidance and educate team members and coworkers on monitoring and logging (Have an interesting idea or solution? Present it!)
📍Automate any software maintenance processes that previously required a manual procedure

WHAT WE’RE LOOKING FOR:

📍3+ years’ experience with software engineering, software development, or system operations in highly available, high-traffic environments
📍Strong experience with Linux-based infrastructures, Linux/Unix administration, and Azure
📍Experience with databases such as PostgreSQL
📍Experience administering Linux servers as well as Docker-based infrastructure (such as Kubernetes, AKS, etc.) in a highly available environment
📍Experience with scripting languages such as Python and Bash
📍Experience with message broker/queue technologies like RabbitMQ
📍Experience with modern monitoring, logging, and observability tools in complex distributed systems, such as Application Insights, Grafana, New Relic, Splunk, Elastic Stack, Datadog, Prometheus, etc.
📍Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.)
📍Good understanding of cybersecurity fundamentals and best practices
📍Experience with containerization and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.)
📍Stellar problem-solving and troubleshooting skills, with the ability to spot issues before they become problems
📍Fluency in English
📍Excellent oral and written communication skills
📍Process-oriented with great documentation skills
📍Solid team player!

Site Reliability Engineer
