About the company

At PINTU, we are building the #1 crypto investment platform focused on new investors in Indonesia and Southeast Asia. We know that 99% of new investors are underserved because existing solutions cater to the 1% who are pros and early adopters, so we built an app that helps them learn about, invest in, and sell cryptocurrencies with just one click.

Job Summary

In this role, you will:

📍Lead efforts to improve the reliability and availability of our systems through automation, proactive monitoring, and capacity planning.
📍Respond to and manage incidents, identifying the root cause and implementing preventive measures to minimize future incidents.
📍Develop and maintain automation tools and scripts to streamline operational tasks, configuration management, and deployment processes.
📍Analyze system performance and identify bottlenecks, making recommendations for improvements and optimizations.
📍Work on designing and implementing scalable architectures to accommodate growth and increased user demand.
📍Utilize IaC tools (e.g., Terraform, Ansible) to manage and provision infrastructure components.
📍Set up and maintain monitoring systems to track system health and performance metrics. Configure alerting and notifications to respond to anomalies.
📍Collaborate with development teams to ensure that new applications and features are designed with reliability and operability in mind.
📍Provide guidance, mentorship, and technical leadership to junior members of the SRE team, fostering their professional growth and ensuring team cohesion.
📍Create and maintain documentation for systems, processes, and best practices.
📍Implement and maintain security best practices and participate in security reviews and audits.
📍Participate in an on-call rotation to provide 24/7 support and incident response.

Who We Are Looking For

📍Bachelor's degree in Computer Science, Information Technology, or a related field.
📍Several years of experience in a Site Reliability Engineer or DevOps role.
📍Proficiency in scripting and programming languages like Bash, Python, or Go.
📍Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and cloud platforms (e.g., AWS & Google Cloud).
📍Expertise in Kafka, including setting up, configuring, and managing Kafka clusters for real-time data streaming.
📍Hands-on experience with designing, implementing, and maintaining distributed systems and microservices architectures.
📍Experience with configuration management tools (e.g., Terraform and Ansible).
📍Deep understanding of networking, databases, and web services.
📍Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, ELK Stack).
📍Excellent problem-solving skills and the ability to work well in high-pressure situations.
📍Strong communication and collaboration skills.
📍Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional DevOps Engineer) are a plus.

Senior Site Reliability Engineer
