Senior HPC Storage Architect

U.S., Canada, UK, and elsewhere

Position: Senior HPC Storage Architect

As a Senior HPC Storage Architect at Qvelo, you will be responsible for designing, implementing, and optimizing advanced storage architectures for high-performance computing (HPC) environments. You will work closely with clients, system architects, and engineering teams to develop scalable, high-performance storage solutions that meet the demanding needs of large-scale data processing, AI/ML workloads, and scientific research. Your expertise in parallel file systems, storage technologies, and data management will be critical in ensuring efficient data storage and retrieval in HPC infrastructures.


Key Responsibilities:

  • Lead the design and architecture of scalable storage systems for HPC environments, ensuring that they support high throughput, low latency, and massive data volumes.
  • Implement and optimize parallel file systems such as Lustre, GPFS (IBM Spectrum Scale), and other distributed storage solutions to meet client needs for performance and scalability.
  • Collaborate with HPC infrastructure teams to design data storage hierarchies, incorporating high-performance tiers (SSD, NVMe) and long-term archival storage (HDD, object storage).
  • Evaluate and recommend storage technologies that are best suited for specific workloads, including scientific simulations, AI model training, and big data analytics.
  • Ensure that storage solutions are optimized for HPC clusters, addressing issues such as I/O bottlenecks, data redundancy, and fault tolerance.
  • Design data management strategies that handle large-scale data processing, including data retention policies, backup, disaster recovery, and data lifecycle management.
  • Work with clients to assess their current storage infrastructure, identifying areas for improvement and proposing tailored solutions to meet future storage demands.
  • Collaborate with networking teams to ensure high-speed data transfers between storage and compute nodes, leveraging technologies like InfiniBand or high-performance Ethernet.
  • Monitor storage performance, troubleshoot issues, and implement improvements to optimize storage I/O throughput for HPC workloads.
  • Design cloud-based and hybrid storage architectures, integrating on-premises storage with cloud platforms (AWS, Azure, Google Cloud) for flexible and scalable data management.
  • Ensure data security and compliance in storage architectures, implementing encryption, access controls, and security best practices to protect sensitive data.
  • Provide technical leadership and mentorship to junior engineers and storage administrators, guiding best practices in HPC storage architecture and optimization.
  • Stay current with emerging storage technologies, trends, and best practices, continuously improving the efficiency and performance of HPC storage solutions.

Requirements:

  • 8+ years of experience in designing, deploying, and managing storage systems in HPC environments or large-scale IT infrastructures.
  • Expertise in parallel file systems (e.g., Lustre, GPFS, BeeGFS) and storage technologies for high-performance environments.
  • Strong knowledge of high-performance storage hardware, including SSD, NVMe, HDD, and tiered storage architectures.
  • Experience with data management strategies, including backup, disaster recovery, data migration, and lifecycle management.
  • Proficiency in configuring and optimizing high-speed networking technologies (InfiniBand, Ethernet) for data transfer in HPC environments.
  • Experience with cloud storage solutions and hybrid storage architectures that integrate cloud platforms with on-premises HPC infrastructure.
  • Strong understanding of performance tuning and I/O optimization for storage systems in data-intensive workloads.
  • Familiarity with AI/ML storage requirements, such as managing large datasets and optimizing storage for GPU-accelerated workloads.
  • Proficiency in scripting and automation (Python, Bash, Ansible) to automate storage tasks and improve operational efficiency.
  • Strong problem-solving skills, with the ability to troubleshoot complex storage issues and optimize systems for HPC workloads.

Preferred Qualifications:

  • Experience with object storage systems (e.g., Ceph, MinIO) and their integration into HPC environments.
  • Knowledge of data compression, deduplication, and other storage optimization techniques.
  • Certifications in storage technologies or related fields (e.g., SNIA Certified Storage Architect).
  • Familiarity with quantum storage solutions or cutting-edge storage technologies for next-generation HPC environments.
  • Experience with DevOps practices for managing and automating storage systems in HPC environments.

Department
CTO Office

Employment Type
Contract

Location
Remote or Hybrid (depending on your flexibility)

Workplace type
Hybrid/Remote

Compensation
Competitive, based on experience

Security Clearance
An active Canadian, U.S., or NATO security clearance is desirable but not mandatory. Some projects will require applicants to obtain Secret-level clearance or higher.

Why Join Us?

As a Senior HPC Storage Architect at Qvelo, you will have the opportunity to design and implement cutting-edge storage solutions that power some of the world’s most demanding HPC environments. You will collaborate with industry-leading experts in high-performance computing, AI, and big data, playing a key role in optimizing storage for breakthrough research, complex simulations, and advanced machine learning workloads. We offer a dynamic and innovative work environment where your expertise will have a direct impact on solving complex storage challenges in HPC. Join us and help build the future of data management in high-performance computing.