Senior HPC Storage Architect
U.S., Canada, UK, and elsewhere
Position: Senior HPC Storage Architect
As a Senior HPC Storage Architect at Qvelo, you will be responsible for designing, implementing, and optimizing advanced storage architectures for high-performance computing (HPC) environments. You will work closely with clients, system architects, and engineering teams to develop scalable, high-performance storage solutions that meet the demanding needs of large-scale data processing, AI/ML workloads, and scientific research. Your expertise in parallel file systems, storage technologies, and data management will be critical in ensuring efficient data storage and retrieval in HPC infrastructures.
Key Responsibilities:
- Lead the design and architecture of scalable storage systems for HPC environments, ensuring that they support high throughput, low latency, and massive data volumes.
- Implement and optimize parallel file systems such as Lustre, GPFS (IBM Spectrum Scale), and other distributed storage solutions to meet client needs for performance and scalability.
- Collaborate with HPC infrastructure teams to design data storage hierarchies, incorporating high-performance tiers (SSD, NVMe) and long-term archival storage (HDD, object storage).
- Evaluate and recommend storage technologies that are best suited for specific workloads, including scientific simulations, AI model training, and big data analytics.
- Ensure that storage solutions are optimized for HPC clusters, addressing issues such as I/O bottlenecks, data redundancy, and fault tolerance.
- Design data management strategies that handle large-scale data processing, including data retention policies, backup, disaster recovery, and data lifecycle management.
- Work with clients to assess their current storage infrastructure, identifying areas for improvement and proposing tailored solutions to meet future storage demands.
- Collaborate with networking teams to ensure high-speed data transfers between storage and compute nodes, leveraging technologies like InfiniBand or high-performance Ethernet.
- Monitor storage performance, troubleshoot issues, and implement improvements to optimize storage I/O throughput for HPC workloads.
- Design cloud-based and hybrid storage architectures, integrating on-premises storage with cloud platforms (AWS, Azure, Google Cloud) for flexible and scalable data management.
- Ensure data security and compliance in storage architectures, implementing encryption, access controls, and security best practices to protect sensitive data.
- Provide technical leadership and mentorship to junior engineers and storage administrators, guiding best practices in HPC storage architecture and optimization.
- Stay current with emerging storage technologies, trends, and best practices, continuously improving the efficiency and performance of HPC storage solutions.
Requirements:
- 8+ years of experience in designing, deploying, and managing storage systems in HPC environments or large-scale IT infrastructures.
- Expertise in parallel file systems (e.g., Lustre, GPFS, BeeGFS) and storage technologies for high-performance environments.
- Strong knowledge of high-performance storage hardware, including SSD, NVMe, HDD, and tiered storage architectures.
- Experience with data management strategies, including backup, disaster recovery, data migration, and lifecycle management.
- Proficiency in configuring and optimizing high-speed networking technologies (InfiniBand, Ethernet) for data transfer in HPC environments.
- Experience with cloud storage solutions and hybrid storage architectures that integrate cloud platforms with on-premises HPC infrastructure.
- Strong understanding of performance tuning and I/O optimization for storage systems in data-intensive workloads.
- Familiarity with AI/ML storage requirements, such as managing large datasets and optimizing storage for GPU-accelerated workloads.
- Proficiency in scripting and automation (Python, Bash, Ansible) to automate storage tasks and improve operational efficiency.
- Strong problem-solving skills, with the ability to troubleshoot complex storage issues and optimize systems for HPC workloads.
Preferred Qualifications:
- Experience with object storage systems (e.g., Ceph, MinIO) and their integration into HPC environments.
- Knowledge of data compression, deduplication, and other storage optimization techniques.
- Certifications in storage technologies or related fields (e.g., SNIA Certified Storage Architect).
- Familiarity with quantum storage solutions or cutting-edge storage technologies for next-generation HPC environments.
- Experience with DevOps practices for managing and automating storage systems in HPC environments.
Department
CTO Office
Employment Type
Contract
Location
Remote or Hybrid (depending on your flexibility)
Workplace type
Hybrid/Remote
Compensation
Competitive, based on experience
Security Clearance
Canadian, U.S., or NATO clearance levels are desirable, but not mandatory. Some projects will require applicants to obtain a Secret-level clearance or higher.
Why Join Us?
As a Senior HPC Storage Architect at Qvelo, you will have the opportunity to design and implement cutting-edge storage solutions that power some of the world’s most demanding HPC environments. You will collaborate with industry-leading experts in high-performance computing, AI, and big data, playing a key role in optimizing storage for breakthrough research, complex simulations, and advanced machine learning workloads. We offer a dynamic and innovative work environment where your expertise will have a direct impact on solving complex storage challenges in HPC. Join us and help build the future of data management in high-performance computing.