Senior Cloud HPC Engineer

U.S., Canada, U.K., and elsewhere

Position: Senior Cloud HPC Engineer

As a Senior Cloud HPC Engineer at Qvelo, you will play a crucial role in architecting, deploying, and optimizing high-performance computing (HPC) environments on cloud platforms. This role involves leveraging cloud technologies to build scalable, secure, and high-performance HPC solutions for complex workloads such as simulations, machine learning, big data analytics, and scientific research. You will collaborate with cross-functional teams, providing expertise in cloud-based HPC infrastructure, system integration, and performance tuning, helping clients maximize the value of their cloud HPC deployments.

Key Responsibilities:

  • Lead the design and architecture of cloud-based HPC solutions across platforms such as AWS, Azure, and Google Cloud.
  • Deploy and configure cloud HPC environments that support data-intensive applications, AI/ML workloads, and large-scale simulations.
  • Optimize HPC infrastructure for scalability, performance, and cost-efficiency, ensuring seamless integration with on-premises and hybrid HPC systems.
  • Implement and manage high-performance storage solutions in the cloud, including parallel file systems and distributed storage architectures.
  • Leverage cloud-native technologies like containers, Kubernetes, and serverless architectures to enhance the flexibility and agility of cloud HPC environments.
  • Configure and tune high-speed networking solutions in cloud environments, ensuring low-latency communication between compute nodes and storage.
  • Design and deploy automation frameworks to streamline the provisioning, scaling, and monitoring of cloud HPC resources.
  • Ensure security best practices are followed when deploying HPC clusters in the cloud, including data encryption, access control, and secure networking.
  • Collaborate with DevOps teams to create automated CI/CD pipelines for HPC applications, enabling rapid deployment and updates.
  • Monitor and troubleshoot cloud-based HPC systems, identifying performance bottlenecks and ensuring optimal resource utilization.
  • Provide expertise in cloud cost management, optimizing the use of cloud resources to minimize costs while maintaining performance.
  • Mentor and guide junior engineers and technical staff, providing leadership on cloud HPC technologies and best practices.
  • Stay updated on emerging cloud and HPC technologies, incorporating the latest advancements into client solutions and internal infrastructure.

Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • 5+ years of experience in architecting and managing HPC environments, with a strong emphasis on cloud-based HPC systems.
  • Proven expertise in cloud platforms such as AWS (with services like EC2, S3, FSx for Lustre), Microsoft Azure (Azure Batch, Azure HPC Cache), or Google Cloud (Google Cloud HPC).
  • Deep knowledge of cloud-native technologies such as containers (Docker, Singularity), Kubernetes, and orchestration tools for HPC environments.
  • Experience with high-performance interconnects (InfiniBand, RoCE) and optimizing network configurations in cloud environments.
  • Proficiency in configuring and managing parallel file systems (Lustre, GPFS) and distributed storage systems in the cloud.
  • Strong knowledge of cloud cost optimization strategies, including right-sizing instances, leveraging spot instances, and automating scaling policies.
  • Experience with automation and scripting (e.g., Terraform, Ansible, Python, Bash) for cloud resource provisioning, monitoring, and scaling.
  • Familiarity with cloud security best practices, including data encryption, secure access controls, and compliance with industry standards.
  • Strong problem-solving skills, with experience in troubleshooting and optimizing HPC applications for cloud environments.
  • Excellent communication and collaboration skills, with the ability to work with both technical and non-technical stakeholders.

Preferred Qualifications:

  • Cloud platform certification, such as AWS Certified Solutions Architect, Azure Solutions Architect Expert, or Google Cloud Professional Cloud Architect.
  • Experience with hybrid HPC architectures, integrating cloud and on-premises HPC systems.
  • Knowledge of AI/ML workloads in cloud HPC environments, including deep learning frameworks (TensorFlow, PyTorch) and GPU-accelerated computing.
  • Familiarity with quantum computing technologies and their integration into cloud-based HPC solutions.
  • Experience with DevOps practices in cloud environments, including CI/CD for HPC applications and infrastructure-as-code (IaC) principles.

Department
CTO Office

Employment Type
Contract

Location
Remote or Hybrid (depending on your flexibility)

Workplace Type
Hybrid/Remote

Compensation
Competitive, based on experience

Security Clearance
Canadian, U.S., or NATO clearance levels are desirable, but not mandatory. Some projects will require applicants to obtain a Secret-level clearance or higher.

Why Join Us?

As a Senior Cloud HPC Engineer at Qvelo, you will be at the forefront of cloud-based high-performance computing, working on projects that drive innovation across industries. Whether you’re building scalable cloud HPC solutions for scientific research, optimizing AI workloads in the cloud, or automating the deployment of large-scale simulations, you’ll play a key role in shaping the future of cloud HPC. We offer opportunities for continuous learning, professional development, and exposure to the latest cloud technologies, ensuring you stay at the cutting edge of the field. Join our collaborative team and help us push the boundaries of what’s possible in cloud-based high-performance computing.