About the company
Problem: DeFi, gaming, and social analytics needs are growing exponentially, but SLAs on scaled compute and storage are missing in the Web3 world.
Our Solution: Data warehouse clusters that use in-memory optimization for fast SQL and ML over massive data streams, with data automatically migrated between cache, SSDs, and IPFS.
Job Summary
As a Software Engineer on the Data Platform Engineering team, you will join skilled engineers and core database developers building new in-memory computing tools and technologies based on Apache Arrow, helping take data analytics from start to scale. You will collaborate with this diverse, talented team to drive innovations in data processing, implement performant network communication libraries, and build the core foundations of distributed data analytics solutions. The team is responsible for the automated, intelligent, and highly performant query planner and execution engines, RPC calls between data warehouse clusters, shared secondary cold storage, and more. This includes building new SQL features and customer-facing functionality, developing novel query optimization techniques for industry-leading performance, and building a database system that is highly parallel, efficient, and fault-tolerant. This is a vital role reporting to executive and senior engineering leadership.
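For a concrete flavor of the Arrow-centric work described above, here is a minimal sketch, illustrative only and not Space and Time's actual code, that builds an in-memory columnar record batch with the Rust `arrow` crate (the "block_number"/"chain" schema and the values are made up):

```rust
// Minimal illustration of Arrow's in-memory, columnar format in Rust.
// Assumes the `arrow` crate; the schema and data are hypothetical.
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), arrow::error::ArrowError> {
    // Each field describes one column; the data lives in contiguous
    // Arrow arrays rather than row-oriented structs.
    let schema = Arc::new(Schema::new(vec![
        Field::new("block_number", DataType::Int64, false),
        Field::new("chain", DataType::Utf8, false),
    ]));
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![100, 101, 102])),
        Arc::new(StringArray::from(vec!["ethereum", "ethereum", "polygon"])),
    ];
    let batch = RecordBatch::try_new(schema, columns)?;
    println!("{} rows x {} columns", batch.num_rows(), batch.num_columns());
    Ok(())
}
```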
Responsibilities
Leverage and improve upon Apache Arrow’s in-memory, columnar format for data processing
Utilize Apache Arrow to create new features for Space and Time’s hosted, multi-cluster data warehouse for Web3
Write Rust code, understand distributed systems, and contribute to building Arrow DataFusion and Ballista (see the DataFusion sketch after this list)
Improve and optimize performance at scale leveraging existing Arrow primitives
Develop new high-performance networking and communication primitives
Improve the performance of Spark DataFrame/RDD conversions
Develop database optimizers, query planners, query and data routing mechanisms, cluster-to-cluster communication, and workload management techniques
Scale up from proof of concept to “cluster scale” (and eventually to hundreds of clusters with hundreds of terabytes each), in terms of both infrastructure/architecture and problem structure
Analyze and enhance communication throughput in a massively parallel processing (MPP), distributed query engine
Collaborate with the team to write new code for a bigger, better, faster, more optimized HTAP (hybrid transactional/analytical processing) database, using Apache Spark, Apache Arrow, and a wealth of other open-source data tools
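The DataFusion sketch referenced above: a minimal, hedged example, not Space and Time's production code, of running SQL over Arrow data with the public DataFusion API (assumes the `datafusion` and `tokio` crates; the "events.csv" file and the query are hypothetical):

```rust
// Minimal sketch of SQL execution over Arrow record batches with DataFusion.
// Assumes `datafusion` and `tokio` (features "macros", "rt-multi-thread").
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Register a CSV file as a table; DataFusion reads it into Arrow batches.
    // "events.csv" is an illustrative file, not part of any real deployment.
    ctx.register_csv("events", "events.csv", CsvReadOptions::new())
        .await?;

    // DataFusion plans, optimizes, and executes the query in parallel over
    // the in-memory columnar data.
    let df = ctx
        .sql("SELECT chain, COUNT(*) AS txs FROM events GROUP BY chain")
        .await?;
    df.show().await?;
    Ok(())
}
```

Ballista extends this same model to distributed execution, scheduling DataFusion query stages across a cluster.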
Skills & Qualifications
Bachelor’s degree (or higher) in computer science or a related technical field
5+ years of experience engineering software and distributed systems / enterprise-scale data warehouses
2+ years of experience with Apache Arrow
Experience with data processing and database internals, including Apache Spark
Experience developing scalable distributed systems and deploying, scaling, and managing microservices
Knowledge of concurrency, multithreading, and synchronization a plus
Nice to have: experience contributing to open-source projects such as those under the Apache Software Foundation
Benefits
• Ultra-competitive salaries
• Medical, dental, and vision insurance; disability/life insurance
• 401(k) Plan
• Aggressive bonus structure and RSUs
• Very flexible PTO, paid holidays, and flexible workweeks
• Very flexible remote work options
• A massive list of perks, including discretionary add-on bonuses for hard work and attendance at exciting events, conferences, and parties; we're headquartered on the beach near LA (but don't mind you working remote)