About the company
Problem: DeFi, gaming, and social analytics needs are growing exponentially, but SLAs on scaled compute and storage are missing in the Web3 world.
Our Solution: Data warehouse clusters that use in-memory optimization for fast SQL and ML over massive data streams, with data automatically migrated between cache, SSDs, and IPFS.
Job Summary
As a Software Engineer on the Data Platform Engineering team, you will join skilled engineers and core database developers building new in-memory computing tools and technologies based on Apache Arrow, helping take data analytics from start to scale. You will collaborate with this diverse, talented team to drive innovations in data processing, implement performant network communication libraries, and build the core foundations of distributed data analytics solutions. The team is responsible for the automated, intelligent, and highly performant query planner and execution engines, RPC calls between data warehouse clusters, shared secondary cold storage, and more. This includes building new SQL features and customer-facing functionality, developing novel query optimization techniques for industry-leading performance, and building a database system that is highly parallel, efficient, and fault-tolerant. This is a vital role reporting to executive and senior engineering leadership.
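For a concrete flavor of the Arrow-centric work described above, here is a minimal sketch, illustrative only and not Space and Time's actual code, that builds an in-memory columnar record batch with the Rust `arrow` crate (the "block_number"/"chain" schema and the values are made up):

```rust
// Minimal illustration of Arrow's in-memory, columnar format in Rust.
// Assumes the `arrow` crate; the schema and data are hypothetical.
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), arrow::error::ArrowError> {
    // Each field describes one column; the data lives in contiguous
    // Arrow arrays rather than row-oriented structs.
    let schema = Arc::new(Schema::new(vec![
        Field::new("block_number", DataType::Int64, false),
        Field::new("chain", DataType::Utf8, false),
    ]));
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![100, 101, 102])),
        Arc::new(StringArray::from(vec!["ethereum", "ethereum", "polygon"])),
    ];
    let batch = RecordBatch::try_new(schema, columns)?;
    println!("{} rows x {} columns", batch.num_rows(), batch.num_columns());
    Ok(())
}
```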
Responsibilities
Leverage and improve upon Apache Arrow’s in-memory, columnar format for data processing
Utilize Apache Arrow to create new features for Space and Time’s hosted, multi-cluster data warehouse for Web3
Write Rust code, understand distributed systems, and contribute to building Arrow DataFusion and Ballista (see the DataFusion sketch after this list)
Improve and optimize performance at scale leveraging existing Arrow primitives
Develop new high-performance networking and communication primitives
Improve the performance of Spark DataFrame/RDD conversions
Develop database optimizers, query planners, query and data routing mechanisms, cluster-to-cluster communication, and workload management techniques
Scale up from proof of concept to “cluster scale” (and eventually to hundreds of clusters with hundreds of terabytes each), in terms of both infrastructure/architecture and problem structure
Analyze and enhance communication throughput in a massively parallel processing (MPP), distributed query engine
Collaborate with the team to write new code for a bigger, better, faster, more optimized HTAP (hybrid transactional/analytical processing) database, using Apache Spark, Apache Arrow, and a wealth of other open-source data tools
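The DataFusion sketch referenced above: a minimal, hedged example, not Space and Time's production code, of running SQL over Arrow data with the public DataFusion API (assumes the `datafusion` and `tokio` crates; the "events.csv" file and the query are hypothetical):

```rust
// Minimal sketch of SQL execution over Arrow record batches with DataFusion.
// Assumes `datafusion` and `tokio` (features "macros", "rt-multi-thread").
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Register a CSV file as a table; DataFusion reads it into Arrow batches.
    // "events.csv" is an illustrative file, not part of any real deployment.
    ctx.register_csv("events", "events.csv", CsvReadOptions::new())
        .await?;

    // DataFusion plans, optimizes, and executes the query in parallel over
    // the in-memory columnar data.
    let df = ctx
        .sql("SELECT chain, COUNT(*) AS txs FROM events GROUP BY chain")
        .await?;
    df.show().await?;
    Ok(())
}
```

Ballista extends this same model to distributed execution, scheduling DataFusion query stages across a cluster.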
Skills & Qualifications
Bachelor’s degree (or higher) in computer science or a related technical field
5+ years of experience engineering software and distributed systems / enterprise-scale data warehouses
2+ years of experience with Apache Arrow
Experience with data processing and database internals, including Apache Spark
Experience developing scalable distributed systems and deploying, scaling, and managing microservices
Knowledge of concurrency, multithreading, and synchronization a plus
Nice to have: experience contributing to open-source projects such as those under the Apache Software Foundation
Benefits
• Ultra-competitive salaries
• Medical, dental, and vision insurance; disability/life insurance
• 401(k) Plan
• Aggressive bonus structure and RSUs
• Very flexible PTO, paid holidays, and flexible workweeks
• Very flexible remote work options
• A massive list of perks, including discretionary add-on bonuses for hard work and attendance at exciting events, conferences, and parties; we're headquartered on the beach near LA (but don't mind you working remote)