About the company
We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change. We will deploy these systems to create a new kind of intelligent creative partner that can imagine with us, free from the pressure of being creative. It's for all of us whose imaginations have been constrained, who've had to channel vivid dreams through broken words, hoping others will see what we see in our mind's eye. A partner that can help us show — not just tell.
Job Summary
What’s expected
📍Strong programming skills in Python, with a deep understanding of PyTorch, NumPy, and core libraries for working with images and structured data
📍Ability to rapidly build and extend CLI tools and dashboards for visualization, comparison, and metric tracking to accelerate research
📍Experience filtering and preparing training data
📍Experience with large model training is a plus
What we would love to see
📍Experience working with large distributed systems such as SLURM, Ray, or similar technologies
📍Experience deploying models (building Docker images, Kubernetes basics)
📍Knowledge of graphics fundamentals, 3D formats, tools (e.g. Blender/UE4), and/or 3D-related Python libraries
📍Ability to not only use but also implement new models in codebases such as diffusers and transformers