Principal Engineer - AI Infrastructure Abstractions
Company: Diversity Talent Scouts
Location: San Jose
Posted on: February 16, 2026
Job Description:
As a Principal AI Infrastructure
Abstraction Engineer, you will design and implement the
foundational systems that make shared AI compute environments
scalable, secure, and developer-friendly. Your work will focus on
creating abstractions that hide hardware complexity while providing
predictable, cloud-native interfaces for AI workloads. This
position bridges infrastructure and applied AI—turning raw GPUs and
accelerators into programmable, elastic, and multi-tenant resources
for both internal developers and enterprise clients.

Key Responsibilities:
- Architect abstractions that map logical compute constructs (vGPUs, GPU pools, workload queues) to physical devices.
- Build APIs, services, and control planes that expose GPU and accelerator resources with strong isolation and quality-of-service guarantees.
- Develop mechanisms for secure GPU sharing, including time-slicing, partitioning, and namespace isolation.
- Work with orchestration and scheduling systems to ensure intelligent mapping of resources based on utilization, priority, and network topology.
- Define policies for quotas, fair allocation, and resource elasticity in shared environments.
- Integrate with AI/ML frameworks (PyTorch, TensorFlow, Triton, etc.) to optimize model training and inference workflows.
- Deliver observability and monitoring capabilities that trace resource usage from logical abstractions to hardware.
- Partner with platform security teams to strengthen access controls, onboarding processes, and tenant isolation.
- Support internal developer adoption of abstraction APIs while maintaining high performance and low overhead.
- Contribute to long-term compute platform strategy with a focus on modularity, abstraction, and scale.

Minimum Qualifications:
- Bachelor’s degree with 15 years of experience, Master’s with 12 years, or PhD with 8 years.
- Proven track record building production-grade infrastructure systems, preferably in Go, Python, or C++.
- Strong experience with containerization and orchestration platforms (Kubernetes, Docker, KubeVirt).
- Background in designing logical abstractions for compute, storage, or networking in multi-tenant systems.
- Familiarity with integrating with machine learning platforms (e.g., PyTorch, TensorFlow, Triton, MLflow).

Preferred Qualifications:
- Hands-on experience with GPU sharing, scheduling, or isolation (MIG, MPS, vGPUs, time-slicing, or device plugin models).
- Deep knowledge of resource management: quotas, prioritization, fairness, elasticity.
- Strong ability to think across hardware/software boundaries and design abstractions that scale.
Keywords: Diversity Talent Scouts, Milpitas, Principal Engineer - AI Infrastructure Abstractions, IT / Software / Systems, San Jose, California