About the company
At PINTU, we are building the #1 crypto investment platform focused on new investors in Indonesia and Southeast Asia. We know that 99% of new investors are underserved because existing solutions cater to the 1% who are pros and early adopters. That is why we built an app that helps them learn about, invest in, and sell cryptocurrencies, all just one click away.
Job Summary
In this role, you will:
• Lead efforts to improve the reliability and availability of our systems through automation, proactive monitoring, and capacity planning.
• Respond to and manage incidents, identifying the root cause and implementing preventive measures to minimize future incidents.
• Develop and maintain automation tools and scripts to streamline operational tasks, configuration management, and deployment processes.
• Analyze system performance and identify bottlenecks, making recommendations for improvements and optimizations.
• Work on designing and implementing scalable architectures to accommodate growth and increased user demand.
• Utilize IaC tools (e.g., Terraform, Ansible) to manage and provision infrastructure components.
• Set up and maintain monitoring systems to track system health and performance metrics. Configure alerting and notifications to respond to anomalies.
• Collaborate with development teams to ensure that new applications and features are designed with reliability and operability in mind.
• Provide guidance, mentorship, and technical leadership to junior members of the SRE team, fostering their professional growth and ensuring team cohesion.
• Create and maintain documentation for systems, processes, and best practices.
• Implement and maintain security best practices and participate in security reviews and audits.
• Participate in an on-call rotation to provide 24/7 support and incident response.
Who We Are Looking For
• Bachelor's degree in Computer Science, Information Technology, or a related field.
• Several years of experience in a Site Reliability Engineer or DevOps role.
• Proficiency in scripting and programming languages like Bash, Python, or Go.
• Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and cloud platforms (e.g., AWS, Google Cloud).
• Expertise in Kafka, including setting up, configuring, and managing Kafka clusters for real-time data streaming.
• Hands-on experience with designing, implementing, and maintaining distributed systems and microservices architectures.
• Experience with configuration management tools (e.g., Terraform and Ansible).
• Deep understanding of networking, databases, and web services.
• Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, ELK Stack).
• Excellent problem-solving skills and the ability to work well in high-pressure situations.
• Strong communication and collaboration skills.
• Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional DevOps Engineer) are a plus.