ML Engineer, Large-Scale AI Infrastructure

Company: Genbio
Location: Palo Alto
Posted on: June 2, 2025

Job Description:

Headquartered in Silicon Valley, we are a newly established start-up, where a collective of visionary scientists, engineers, and entrepreneurs are dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers who reimagine a new paradigm for biology and medicine.

We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBM), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLM and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.

Job Description:

- GPU Cluster Management: Design, deploy, and maintain high-performance GPU clusters, ensuring their stability, reliability, and scalability. Monitor and manage cluster resources to maximize utilization and efficiency.
- Distributed/Parallel Training: Implement distributed computing techniques to enable parallel training of large deep learning models across multiple GPUs and nodes. Optimize data distribution and synchronization to achieve faster convergence and reduced training times.
- Performance Optimization: Fine-tune GPU clusters and deep learning frameworks to achieve optimal performance for specific workloads. Identify and resolve performance bottlenecks through profiling and system analysis.
- Deep Learning Framework Integration: Collaborate with data scientists and machine learning engineers to integrate distributed training capabilities into GenBio AI's model development and deployment frameworks.
- Scalability and Resource Management: Ensure that the GPU clusters can scale effectively to handle increasing computational demands. Develop resource management strategies to prioritize and allocate computing resources based on project requirements.
- Troubleshooting and Support: Troubleshoot and resolve issues related to GPU clusters, distributed training, and performance anomalies. Provide technical support to users and resolve technical challenges efficiently.
- Documentation: Create and maintain documentation related to GPU cluster configuration, distributed training workflows, and best practices to ensure knowledge sharing and seamless onboarding of new team members.

Job Requirements:
- Master's or Ph.D. degree in Computer Science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning.
- 2+ years of proven experience managing GPU clusters, including installation, configuration, and optimization.
- Strong expertise in distributed deep learning and parallel training techniques.
- Proficiency in popular deep learning frameworks such as PyTorch, Megatron-LM, and DeepSpeed.
- Programming skills in Python and experience with GPU-accelerated libraries (e.g., CUDA, cuDNN).
- Knowledge of performance profiling and optimization tools for HPC and deep learning.
- Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes).
- Strong background in distributed systems, cloud computing (AWS, GCP), and containerization (Docker, Kubernetes).

We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.