Expanse is a cutting-edge high-performance computing (HPC) resource designed for researchers and engineers. It offers powerful processing capabilities, scalability, and advanced features like Cloud Bursting and composable systems. The platform provides a user-friendly portal for easy access, file management, and job submission, making it an ideal solution for diverse computational needs.
What is Expanse?
Expanse is a high-performance computing (HPC) system designed to support advanced research and engineering applications. Built by Dell and the San Diego Supercomputer Center (SDSC), it delivers 5.16 peak petaflops of performance. The system features standard compute nodes with AMD EPYC 7742 processors, 256 GB of DDR4 memory, and 1 TB of NVMe storage per node. GPU nodes are equipped with NVIDIA V100 GPUs, enabling accelerated computing for demanding workloads. Expanse also supports composable systems and cloud bursting, allowing users to dynamically allocate resources and scale workflows. Its architecture is optimized for flexibility, scalability, and efficiency, making it a powerful tool for scientific simulations, data analysis, and machine learning tasks. The system is accessible via a user-friendly portal, designed to streamline job submission, file management, and monitoring.
Key Features and Capabilities
Expanse delivers 5.16 peak petaflops from 93,184 CPU cores and 208 NVIDIA V100 GPUs, backed by 220 TB of DRAM and 810 TB of NVMe storage. Standard compute nodes use AMD EPYC 7742 processors, with 128 cores and 256 GB of memory per node. GPU nodes pair their GPUs with 40 CPU cores and 384 GB of memory for accelerated computing. Composable systems allow resources to be allocated dynamically, and cloud bursting lets workflows scale beyond the on-premises cluster. This flexibility supports workloads ranging from scientific simulations to machine learning, and the user portal provides streamlined job submission, file management, and monitoring.
System Architecture
Expanse is a high-performance computing system designed by Dell and SDSC, delivering 5.16 peak petaflops. It combines standard and GPU nodes, with AMD EPYC processors and NVIDIA GPUs, ensuring scalable performance and efficient resource management.
Hardware Specifications
Expanse's hardware comprises two main node types. The 728 standard compute nodes each have dual 64-core AMD EPYC 7742 processors (128 cores per node), 256 GB of DDR4 memory, and 1 TB of NVMe storage. The 52 GPU nodes each have four NVIDIA V100 GPUs, 40 CPU cores, 384 GB of CPU DRAM, and 1.6 TB of NVMe storage. Together the system delivers 5.16 peak petaflops, with 220 TB of DRAM and 810 TB of NVMe storage in total.
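As a quick consistency check, the aggregate figures can be recomputed from the per-node specifications. This sketch assumes four GPUs per GPU node, which is what the 208-GPU total implies for 52 nodes (208 / 52 = 4), and reports memory in decimal terabytes (1 TB = 1000 GB).

```python
"""Recompute Expanse's aggregate figures from per-node specifications."""

# Per-node values from the hardware description; GPUS_PER_GPU_NODE is
# inferred from the 208-GPU total across 52 nodes (an assumption).
STANDARD_NODES = 728
CORES_PER_STANDARD_NODE = 2 * 64      # dual 64-core AMD EPYC 7742
MEM_PER_STANDARD_NODE_GB = 256

GPU_NODES = 52
GPUS_PER_GPU_NODE = 4
MEM_PER_GPU_NODE_GB = 384


def summarize() -> dict:
    """Return aggregate totals; memory is in decimal TB (1 TB = 1000 GB)."""
    return {
        "standard_cpu_cores": STANDARD_NODES * CORES_PER_STANDARD_NODE,
        "gpus": GPU_NODES * GPUS_PER_GPU_NODE,
        "standard_dram_tb": STANDARD_NODES * MEM_PER_STANDARD_NODE_GB / 1000,
        "gpu_node_dram_tb": GPU_NODES * MEM_PER_GPU_NODE_GB / 1000,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,}")
```

The 93,184-core figure equals 728 × 128 exactly, so it appears to count only the standard nodes' cores. The two node types described here account for roughly 206 TB of DRAM; the rest of the 220 TB total is not itemized in this section.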
Network and Storage Overview
Expanse's nodes are linked by a high-speed HDR InfiniBand network optimized for efficient data transfer and communication between nodes. Its 810 TB of NVMe storage is node-local, with each standard compute node contributing 1 TB, and its 220 TB of DRAM is distributed across the nodes. For data that must be shared between nodes and jobs, Expanse provides a Lustre parallel file system and a Ceph object store. This network and storage design supports job execution, file transfers, and data-intensive workflows at scale.
Getting Started
Getting started with Expanse involves obtaining an account, logging onto the system via login.expanse.sdsc.edu, and learning the basic Unix commands used for HPC work and file management.
Obtaining an Expanse Account
To access Expanse, you must first obtain an account. This typically involves requesting an allocation (Expanse is allocated through the NSF ACCESS program) or filling out an application form; for trial access, visit the official Expanse trial account page. Once your account is approved, you will receive login credentials. The login node name for Expanse is login.expanse.sdsc.edu, and sessions land on one of the hostnames login01 or login02. After obtaining your account, you can begin exploring the system, transferring files, and submitting jobs. Familiarity with basic Unix commands is essential for navigating the HPC environment, and resources like the Basic Skills Tutorial can help you get started with HPC operations.
Logging Onto Expanse
To log onto Expanse, use a secure shell (SSH) client to connect to the login node, which is accessible at login.expanse.sdsc.edu. The system supports SSH keys or two-factor authentication for secure access. Once connected, you will be directed to one of the login hosts, such as login01 or login02. Ensure you have your credentials ready, as they are required for authentication. After logging in, you can navigate the system, transfer files, and prepare for job submissions. For new users, the Basic Skills Tutorial is recommended to familiarize yourself with the HPC environment. The login node serves as the gateway to Expanse’s powerful computing resources, enabling you to leverage its capabilities effectively.
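For repeated logins, an OpenSSH client configuration entry saves retyping the full hostname. The sketch below generates one; the alias expanse, the username, and the key path are placeholders, not values Expanse prescribes. Once the entry is appended to ~/.ssh/config, ssh expanse is equivalent to ssh your_username@login.expanse.sdsc.edu.

```python
from typing import Optional

LOGIN_NODE = "login.expanse.sdsc.edu"


def ssh_config_entry(username: str, alias: str = "expanse",
                     identity_file: Optional[str] = None) -> str:
    """Return an ~/.ssh/config block that maps `alias` to the Expanse login node."""
    lines = [
        f"Host {alias}",
        f"    HostName {LOGIN_NODE}",
        f"    User {username}",
    ]
    if identity_file:
        # Only needed when authenticating with a specific SSH key
        lines.append(f"    IdentityFile {identity_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder username and key path; append the output to ~/.ssh/config
    print(ssh_config_entry("your_username", identity_file="~/.ssh/id_ed25519"))
```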
Basic Unix Commands for HPC
Familiarity with Unix commands is essential for navigating and managing resources on Expanse. Common commands include ls for listing files, cd for changing directories, pwd for showing the current directory, and mkdir for creating directories. Use cp or mv to copy or move files and rm to delete them. cat or less displays file contents, and grep searches files for patterns. touch creates an empty file, while text editors like vim create and edit files. chmod modifies permissions, and whoami displays your username. For remote access, ssh connects to Expanse, and scp or rsync transfers files. The quota command checks your storage limits. These commands provide the foundation for effective HPC workflow management on Expanse.
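The same file operations are available from Python's standard library, which is convenient inside analysis scripts that run on Expanse. This sketch maps each call to the shell command it mirrors and works in a temporary directory, so it changes nothing in your real home directory.

```python
import os
import re
import shutil
import tempfile
from typing import List


def grep(pattern: str, path: str) -> List[str]:
    """Return the lines of `path` that match `pattern` (like: grep pattern path)."""
    with open(path) as fh:
        return [line.rstrip("\n") for line in fh if re.search(pattern, line)]


def demo() -> List[str]:
    """Run mkdir, cp, mv, ls, rm and grep equivalents in a throwaway directory."""
    print("pwd:", os.getcwd())                                   # pwd
    with tempfile.TemporaryDirectory() as workdir:
        results = os.path.join(workdir, "results")
        os.makedirs(results, exist_ok=True)                      # mkdir -p results
        log = os.path.join(workdir, "run.log")
        with open(log, "w") as fh:                               # create a small file
            fh.write("step 1 ok\nstep 2 failed\nstep 3 ok\n")
        shutil.copy(log, os.path.join(results, "run.log"))       # cp run.log results/
        shutil.move(log, os.path.join(results, "old.log"))       # mv run.log results/old.log
        print("ls results:", sorted(os.listdir(results)))        # ls results
        os.remove(os.path.join(results, "old.log"))              # rm results/old.log
        return grep("failed", os.path.join(results, "run.log"))  # grep failed results/run.log


if __name__ == "__main__":
    print(demo())
```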
Obtaining Example Code
To help users get started, Expanse provides example code for various applications. These codes are hosted in the Expanse file system and can be accessed by logging into the system. Users can navigate to the /shared/expanse/examples directory, which contains sample scripts and programs for common tasks like MPI, OpenMP, and GPU acceleration. Additionally, users can download code examples from external repositories using git clone or wget. The Expanse User Portal also offers a “Code Examples” section, where users can find pre-optimized scripts for specific workflows. These resources are designed to help users familiarize themselves with the system and improve their productivity. For assistance, contact HPC support or refer to the Expanse documentation.
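Copying an example into your own space before editing keeps the shared originals intact. The helper below is a minimal sketch that assumes the /shared/expanse/examples location named above; the example subdirectory name mpi and the destination path used in the usage note are hypothetical.

```python
import shutil
from pathlib import Path

# Location named in the text above; adjust if it differs on the system.
EXAMPLES_ROOT = Path("/shared/expanse/examples")


def copy_example(name: str, dest: Path, root: Path = EXAMPLES_ROOT) -> Path:
    """Copy the example directory `name` under `root` into `dest`; return the copy's path."""
    src = root / name
    if not src.is_dir():
        raise FileNotFoundError(f"no example directory {name!r} under {root}")
    target = Path(dest).expanduser() / name
    shutil.copytree(src, target)  # raises FileExistsError if target already exists
    return target
```

For example, copy_example("mpi", Path("~/examples")) would place an editable copy at ~/examples/mpi, creating ~/examples if needed.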
Using the Expanse User Portal
The Expanse User Portal offers an intuitive interface for logging in, managing files, submitting jobs, and monitoring their status. It also provides access to support resources and documentation, ensuring a seamless experience for users.
Navigating the User Portal Interface
The Expanse User Portal features a clean and intuitive interface designed to streamline user interactions. Upon logging in, users are greeted by a dashboard displaying key system metrics and recent activity. The navigation menu, located on the left-hand side, provides access to primary functions such as file management, job submission, and monitoring. The main panel offers a file explorer for browsing and organizing directories, while tabs at the top allow quick switching between views like jobs, files, and settings. A search bar at the top facilitates quick access to specific resources. Additionally, a help icon in the header provides direct links to documentation and support resources, ensuring users can easily find assistance when needed. The interface is optimized for efficiency, enabling users to navigate seamlessly and accomplish tasks with minimal effort.
File Transfer and Management
Expanse provides robust tools for transferring and managing files, ensuring efficient workflow management. The web-based portal offers a drag-and-drop interface for uploading files from local devices to the system. Users can also transfer files using secure protocols like SFTP or SCP via command-line tools. Once uploaded, files can be organized into directories, renamed, or deleted directly through the portal. The file explorer interface allows users to navigate their storage spaces, including home, project, and scratch directories. Additionally, the portal supports sharing files with collaborators by setting appropriate permissions. Transfer progress is displayed in real-time, and users receive notifications upon completion. These features make file management straightforward and accessible, even for users new to HPC environments. The portal also provides options for archiving files to tape storage, ensuring data security and availability for future use.
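Command-line transfers can be scripted as well. The sketch below only builds the argument lists for rsync and scp (both named earlier in this guide), so you can inspect them before running anything; the username and remote paths are placeholders, and relative remote paths resolve against your Expanse home directory.

```python
import shlex
from typing import List

LOGIN_NODE = "login.expanse.sdsc.edu"


def rsync_upload_cmd(local_path: str, username: str, remote_dir: str) -> List[str]:
    """Build an rsync command copying local_path to remote_dir on Expanse.

    -a preserves permissions and timestamps, -v lists files, -z compresses
    in transit, and --partial keeps partly transferred files so an
    interrupted upload can resume.
    """
    return ["rsync", "-avz", "--partial", local_path,
            f"{username}@{LOGIN_NODE}:{remote_dir}"]


def scp_download_cmd(username: str, remote_path: str, local_dir: str) -> List[str]:
    """Build an scp command copying remote_path from Expanse into local_dir."""
    return ["scp", f"{username}@{LOGIN_NODE}:{remote_path}", local_dir]


if __name__ == "__main__":
    # Placeholder username and paths; print the shell-quoted command for review
    print(shlex.join(rsync_upload_cmd("data/", "your_username", "inputs/")))
```

Run the printed command as-is, or pass the list to subprocess.run(cmd, check=True) from a script once it looks right.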
Job Submission and Monitoring
The Expanse User Portal simplifies job submission and monitoring through an intuitive interface. Users can submit jobs by completing a job submission form, specifying job type, resource requirements, and execution parameters. The portal allows real-time monitoring of job status, including queued, running, and completed states, via the “My Jobs” section. Each job is assigned a unique ID, and users can view detailed logs and output files directly within the portal. Job monitoring includes visual indicators for success, failure, or errors, enabling quick troubleshooting. Advanced features allow users to filter jobs by status or submission time, ensuring efficient management of multiple workflows. The portal also provides tools to cancel or re-submit jobs as needed, enhancing productivity for users of all skill levels.
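The portal's job view has a command-line counterpart in Slurm's squeue, which is handy for scripting. The sketch below parses output produced by squeue -h -o "%i %T" (job ID and full state name, no header line) and tallies jobs by state. Note that squeue lists only queued and running jobs, so finished jobs drop out of its output shortly after they complete.

```python
import getpass
import subprocess
from collections import Counter
from typing import Dict


def parse_squeue(text: str) -> Dict[str, str]:
    """Map job ID to state, given output of: squeue -h -o "%i %T"."""
    jobs: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:          # skip blank or unexpected lines
            job_id, state = parts
            jobs[job_id] = state
    return jobs


def count_states(jobs: Dict[str, str]) -> Counter:
    """Tally jobs by state, e.g. how many are PENDING versus RUNNING."""
    return Counter(jobs.values())


def my_jobs() -> Dict[str, str]:
    """Query Slurm for the current user's jobs (requires squeue on PATH)."""
    cmd = ["squeue", "-u", getpass.getuser(), "-h", "-o", "%i %T"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    print(count_states(my_jobs()))
```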
File Editing and Remote Access
The Expanse User Portal offers robust tools for file editing and remote access, enabling seamless interaction with your files. Users can edit files directly within the portal using an integrated text editor, which supports syntax highlighting for various programming languages. Remote access options include SSH connectivity, allowing users to securely connect to Expanse from their local machines. The portal also supports file transfer protocols like SFTP and SCP, ensuring safe and efficient file management. Additionally, users can access and modify files stored on Expanse’s storage systems, including home directories and project spaces. These features streamline workflows, making it easy to manage and edit files without the need for additional software installations, thus enhancing overall user productivity and flexibility.
Advanced Features
Expanse offers advanced capabilities like Cloud Bursting, composable systems, and job scheduling optimization, enabling flexible resource allocation and enhanced performance for complex computational tasks and workflows.
Cloud Bursting
Cloud Bursting on Expanse allows seamless integration with cloud resources, enabling users to dynamically extend their compute capacity beyond the on-premises HPC cluster. This feature is particularly useful for handling peak workloads or specialized tasks that require additional resources. By leveraging cloud infrastructure, users can access a scalable and flexible environment while maintaining consistency with their existing workflows. The system automatically manages resource allocation, ensuring efficient utilization of both on-premises and cloud-based resources. This capability enhances overall system performance and provides a cost-effective solution for meeting variable computational demands without compromising on functionality or user experience.
Composable Systems
Composable Systems on Expanse enable dynamic allocation of resources, allowing users to tailor environments to specific workloads. This feature provides flexibility by enabling the creation of custom configurations from available resources, such as GPUs, storage, and network components. Composable Systems optimize resource utilization, reducing waste and improving efficiency. They support a wide range of applications, from high-performance computing to data analytics, by adapting to varying workload requirements. This capability ensures that users can maximize performance and scalability while maintaining ease of use. Composable Systems are a key feature of Expanse, enhancing its versatility for diverse computational demands.
Job Scheduling and Optimization
Expanse uses the Slurm Workload Manager to schedule jobs, enforce fair sharing, and prioritize access to resources. Users submit batch jobs with Slurm's sbatch command, specifying parameters such as node count, time limit, and memory requirements. To optimize performance, estimate resource needs accurately and choose an appropriate partition (Slurm's term for a queue). Expanse also supports job arrays, which submit many similar jobs from a single script and reduce submission overhead. Well-sized requests improve overall system utilization and turnaround times while keeping access to computing resources equitable.
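A batch script is a shell script whose #SBATCH lines carry the resource requests described above. The Python helper below writes one from parameters; it is a sketch, not an official Expanse template. The account code abc123 and the partition names in the example are placeholders, so check the Expanse user guide for the partitions and charging rules that apply to your allocation. When array is given, Slurm runs one task per index and exposes that index in SLURM_ARRAY_TASK_ID.

```python
from typing import Optional


def sbatch_script(job_name: str, account: str, partition: str, nodes: int,
                  ntasks_per_node: int, mem: str, minutes: int, command: str,
                  array: Optional[str] = None) -> str:
    """Return the text of a Slurm batch script; submit the saved file with sbatch."""
    hours, mins = divmod(minutes, 60)
    # %x = job name, %j = job ID; array jobs use %A (array job ID) and %a (task index)
    output = "%x.%A_%a.out" if array else "%x.%j.out"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        f"#SBATCH --output={output}",
    ]
    if array:
        # Each task sees its own index in $SLURM_ARRAY_TASK_ID
        lines.append(f"#SBATCH --array={array}")
    lines += ["", command, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder account and partition; check which apply to your allocation
    print(sbatch_script("sweep", "abc123", "shared", 1, 1, "2G", 90,
                        "python run.py --case $SLURM_ARRAY_TASK_ID",
                        array="0-9"))
```

The example call produces a ten-task array with a 90-minute limit per task; save the text to a file such as sweep.sb and submit it with sbatch sweep.sb.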
Environment Modules
The Expanse system uses the Lmod module system to manage software environments, enabling users to dynamically modify their environment to access specific software packages. This system allows for efficient management of conflicting software versions and dependencies. Key commands include module list to view loaded modules, module load to add software, and module unload to remove it. Users can also discover available modules using module spider. This approach ensures that users can tailor their environment to their specific needs without affecting others. Proper use of modules enhances workflow efficiency and consistency, making it easier to reproduce computational results. Always check available modules to optimize your workflow on Expanse.
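Inside a job or driver script it can help to confirm that the right modules are loaded before starting a long run. This sketch reads the LOADEDMODULES environment variable, which Lmod maintains as a colon-separated list of loaded name/version entries (the same information module list shows); treat that as an assumption to verify on Expanse. The module names in the example are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def loaded_modules(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List loaded modules from LOADEDMODULES (colon-separated name/version entries)."""
    env = os.environ if env is None else env
    return [m for m in env.get("LOADEDMODULES", "").split(":") if m]


def missing_modules(required: List[str],
                    env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` whose module is not loaded (version ignored)."""
    loaded = {m.split("/")[0] for m in loaded_modules(env)}
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    print("loaded:", loaded_modules())
    print("missing:", missing_modules(["gcc", "openmpi"]))  # hypothetical requirements
```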
Maintenance and Support
Expanse undergoes regular system maintenance to ensure optimal performance. Scheduled downtimes are communicated in advance, allowing users to plan accordingly and minimize workflow disruptions.
Regular System Maintenance
Expanse undergoes routine maintenance to ensure system stability and performance. Scheduled maintenance occurs quarterly, with downtime typically lasting 8-12 hours. Users receive advance notification via email and the portal. Updates include software patches, hardware optimizations, and security enhancements. Emergency maintenance may be required for critical issues, with notices provided as early as possible. The system is designed with redundancy to minimize unplanned outages. Maintenance ensures Expanse remains secure, efficient, and capable of supporting demanding workloads, aligning with its role as a high-performance computing resource.
Troubleshooting Common Issues
Common issues on Expanse include login problems, file transfer errors, and job submission failures. For login issues, verify your credentials and ensure you’re using the correct login node (login.expanse.sdsc.edu). If files aren’t transferring, check your network connection and confirm storage quotas. Job failures often result from incorrect resource requests or invalid script syntax. Review job logs for error messages and adjust submissions accordingly. For persistent issues, consult the Expanse user guide or contact HPC support via the portal. Regularly checking system status updates and maintenance schedules can also help prevent unexpected problems. Troubleshooting steps are documented in the user guide to ensure smooth operation and minimize downtime.
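Reviewing job logs can be partly automated. The sketch below scans a job's output for a few message fragments that commonly accompany Slurm failures (time-limit cancellations, out-of-memory kills, missing executables, account errors). The pattern list is an illustrative assumption: exact wording varies between Slurm versions and site configurations, so extend it with messages you actually encounter.

```python
import re
from typing import List

# Illustrative patterns only; wording differs across Slurm versions and sites.
KNOWN_PROBLEMS = [
    (re.compile(r"DUE TO TIME LIMIT"),
     "job hit its --time limit; request more time"),
    (re.compile(r"oom[-_]kill|out of memory", re.IGNORECASE),
     "job ran out of memory; raise --mem"),
    (re.compile(r"command not found"),
     "executable not on PATH; check the module load lines"),
    (re.compile(r"invalid account", re.IGNORECASE),
     "check the --account value in the script"),
]


def diagnose(log_text: str) -> List[str]:
    """Return a hint for each known problem pattern found in a job's output."""
    return [hint for pattern, hint in KNOWN_PROBLEMS if pattern.search(log_text)]
```

Typical use is diagnose(Path("myjob.123.out").read_text()) with pathlib.Path, where the output file name is whatever your job's --output setting produced.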
Accessing Support Resources
Accessing support resources for Expanse is straightforward and designed to assist users effectively. The SDSC support portal offers comprehensive documentation, including user guides and troubleshooting tips. Users can submit help requests directly through the portal or contact HPC support via email at help@sdsc.edu. Additionally, the Expanse User Portal provides quick links to training materials, system status updates, and user guides. Regularly updated FAQs and knowledge bases are available to address common issues. For urgent matters, users can contact the SDSC Help Desk at (858) 534-5000. These resources ensure that users can resolve issues promptly and make the most of Expanse’s capabilities.
Expanse is a powerful HPC resource supported by extensive documentation and expert assistance. For additional guidance, visit the SDSC website and explore user guides, tutorials, and FAQs.
Final Thoughts
Expanse is a powerful and versatile HPC system designed to meet the demands of modern research and computation. With its robust architecture, user-friendly portal, and advanced features like Cloud Bursting and composable systems, Expanse empowers users to tackle complex tasks efficiently. The system’s scalability, combined with its extensive support resources, makes it an ideal choice for researchers and engineers. By leveraging Expanse’s capabilities, users can achieve high-performance results while benefiting from a seamless and intuitive computing environment. For further assistance, the SDSC website offers comprehensive guides, tutorials, and troubleshooting tips to ensure optimal use of the system.
Additional Resources and Links
For further learning, visit the San Diego Supercomputer Center (SDSC) website, which offers comprehensive guides, tutorials, and troubleshooting tips. The Expanse User Portal provides direct access to system documentation and support. Because Expanse is allocated through ACCESS, ACCESS resources are also available for advanced computing needs. For hardware specifics, refer to the Dell and SDSC technical summary. Users can also explore community forums and knowledge bases for peer support and updates. Review the SDSC Staff Directory for contact information and assistance. These resources collectively enhance your Expanse experience and problem-solving capabilities.