Quiz: HPC Power Monitoring (Episodes 0 & 1)
Episode 0: Power Monitoring Introduction
Multiple choice, single answer
Energy vs Power Fundamentals:
What is the fundamental relationship between energy and power?
A) Energy is the rate at which power is consumed
B) Power is the rate at which energy is consumed
C) They are the same thing measured differently
D) Power is always zero if energy is constant
If a system consumes an average power of 50 kW for 2 hours, how much energy is consumed?
A) 25 kWh
B) 50 kWh
C) 100 kWh
D) 200 kWh
Measurement Approaches:
Why do component-level measurements (like RAPL) become unreliable for short jobs?
A) The CPU is too slow at short durations
B) They only capture partial node power, missing unmonitored components
C) Short jobs generate no heat
D) Linux cannot track short-running processes
Power Monitoring Hierarchy:
At which level of the monitoring hierarchy would you find CPU, memory, and NIC measurements?
A) Facility level
B) Rack level
C) Node-component level
D) PDU level
What is the main benefit of hierarchical power monitoring?
A) It costs less money
B) It provides different insights at different granularities for optimization
C) It eliminates the need for precise measurements
D) It requires no specialized hardware
In-Band vs Out-of-Band Monitoring:
Which monitoring approach provides real-time power data accessible to applications?
A) Out-of-band monitoring
B) In-band monitoring
C) PDU-based monitoring
D) Post-execution reporting
What is a key disadvantage of out-of-band monitoring?
A) High overhead on application execution
B) Cannot measure CPU power
C) No real-time feedback during job execution
D) Works only on Intel processors
Power Baseline Challenges:
What is the “measurement gap” in power monitoring?
A) The difference between wall power and utility-reported power
B) The gap between measured component power and actual total node power
C) The time delay in reading RAPL counters
D) The thermal margin on CPUs
Why must power baselines be system-specific?
A) It’s a vendor conspiracy to lock in customers
B) Different hardware, firmware, and environmental factors affect unmeasured component power consumption
C) Each data center is at a different altitude
D) BIOS updates always change power consumption
Conceptual questions
Energy Attribution Problem: RAPL reports that your CPU consumed 50 Joules during a computation, but node-level PDU measurements show 200 Joules. Where did the other 150 Joules go? List at least 3 possible components.
Measurement Strategy: You need to monitor power consumption of jobs on an HPC system for energy-aware scheduling. Would you choose in-band or out-of-band monitoring? Justify your choice considering overhead, accuracy, and real-time requirements.
Monitoring Hierarchy Design: Design a three-level power monitoring hierarchy for an HPC data center with 1000 nodes. Specify what gets measured at each level and why.
Episode 1: Power Monitoring Systems
Multiple choice, single answer
HDEEM (Bull/Atos):
What is the sampling frequency of HDEEM blade-level monitoring?
A) 100 Hz
B) 1 kHz
C) 10 Hz
D) 100 kHz
Which domains does HDEEM monitor at 100 Hz?
A) Only CPU power
B) CPU, DRAMs, NIC, VAUX
C) Only memory power
D) System-level power aggregates
What is HDEEM’s measurement uncertainty?
A) ±10%
B) ±5%
C) ±2%
D) ±0.5%
Intel RAPL:
Which RAPL domain is specific to server architectures and disabled by default?
A) Package
B) PP0/Core
C) DRAM
D) PSys/Platform
What are the two time windows in the Intel RAPL Package domain?
A) 1 ms and 10 ms
B) 100 ms and 1 second
C) 1.2× TDP (~ms) and 1× TDP (~second)
D) 1 second and 10 seconds
Which Intel architecture introduced the PSys/Platform domain?
A) Haswell
B) Broadwell
C) Skylake
D) Kaby Lake
AMD RAPL:
How does AMD RAPL differ from Intel RAPL in terms of power capping?
A) AMD supports more fine-grained capping
B) AMD provides read-only energy reporting, not power capping
C) AMD RAPL is faster
D) AMD RAPL works on client CPUs only
What is unique about AMD’s Core power domain?
A) It doesn’t exist
B) It provides per-core granularity
C) It’s not accessible to users
D) It only works with special firmware
Fujitsu A64FX:
What is exceptional about the A64FX’s sampling frequency?
A) 1 kHz like most systems
B) 100 kHz
C) Every cycle of the domain (cycle-accurate)
D) 10 Hz
Which cache level is separately monitored on A64FX?
A) L1 only
B) L2/LLC
C) L3 only
D) All cache levels combined
NVIDIA GPU Monitoring:
What does nvmlDeviceGetTotalEnergyConsumption() return?
A) Instantaneous power in Watts
B) Cumulative energy in millijoules since the driver was last reloaded
C) Estimated energy based on clock frequency
D) Memory bandwidth in GB/s
Which power domain represents memory subsystem power in NVIDIA GPUs?
A) GPU_POWER
B) MEMORY_POWER
C) MODULE_POWER
D) NVML_POWER
AMD GPU Monitoring:
Which groups must a user belong to in order to access GPU power data with AMD SMI?
A) root only
B) Any user
C) video and render groups
D) Admin and power-users
How does AMD SMI’s energy counter precision compare to NVIDIA?
A) More precise
B) Similar (±5-10%)
C) Much less precise
D) Cannot be compared
NVIDIA Grace CPU:
What interface does the NVIDIA Grace CPU use for power monitoring?
A) RAPL via MSR
B) NVIDIA-proprietary API
C) Linux HWMON via sysfs
D) Custom kernel module
What is the sampling window for HWMON power measurements on Grace?
A) 1-10 ms
B) 50-1000 ms
C) 1-10 seconds
D) 10-100 seconds
NVIDIA Grace Hopper:
How is the GPU power domain calculated in Grace Hopper?
A) Direct hardware measurement like RAPL
B) Derived by subtraction: Module - Grace
C) Not measured at all
D) Estimated from frequency scaling
What does the Module domain encompass in Grace Hopper?
A) CPU and GPU separately
B) CPU, GPU, and all interconnect power
C) GPU only
D) System IO only
Power Baseline Theory:
What is a characteristic of unmeasured component power in HPC nodes?
A) It’s always negligible
B) It varies significantly with workload type and intensity
C) It’s the same for all systems
D) It can be calculated from TDP alone
For jobs lasting only a few seconds, why are energy measurements problematic?
A) CPUs don’t consume power for short jobs
B) Short jobs are always energy-efficient
C) The unmeasured component baseline becomes significant relative to job energy
D) Measurement systems don’t work with short jobs
Coding and analysis questions
RAPL Energy Calculation: Assume you read MSR_PKG_ENERGY_STATUS at time t₀ = 0 s and get E₀ = 0x12345678. At time t₁ = 60 s, you read E₁ = 0x87654321. The MSR_RAPL_POWER_UNIT register gives an energy unit of 61e-6 Joules per count (about 61 µJ).
a) Calculate the energy consumed (accounting for 32-bit wraparound if needed)
b) Compute average power during this interval
c) What happens if wraparound occurs? Show the calculation
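Hint: the wraparound handling asked for in (a) and (c) can be sketched as a modular subtraction. The readings and the 2^-16 J unit below are hypothetical, not the quiz values, and the sketch assumes the counter wraps at most once between the two reads.

```python
COUNTER_BITS = 32  # the energy-status counter is treated as 32 bits wide


def energy_delta_joules(raw_start: int, raw_end: int, joules_per_unit: float) -> float:
    """Energy between two raw counter reads, tolerating a single wraparound."""
    # If raw_end < raw_start the counter wrapped once; the modulo adds
    # 2**32 back, so the same expression covers both cases.
    delta_units = (raw_end - raw_start) % (1 << COUNTER_BITS)
    return delta_units * joules_per_unit


def average_power_watts(energy_joules: float, interval_seconds: float) -> float:
    """Average power over the interval: P = E / t."""
    return energy_joules / interval_seconds


# Hypothetical readings with an assumed unit of 2**-16 J per count.
unit = 2.0 ** -16
no_wrap = energy_delta_joules(0x00001000, 0x00011000, unit)  # 0x10000 counts
wrapped = energy_delta_joules(0xFFFFF000, 0x0000F000, unit)  # also 0x10000 counts
```

Both calls cover the same number of counts, which shows that the wrapped case needs no special branch as long as only one wraparound happened.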
Power Baseline Determination: You measure a node under baseline (idle): 40W for CPU, 20W for memory, 100W total node power. After calibration with synthetic load: 150W for CPU, 80W for memory, 350W total node power.
a) Calculate unmonitored component power in both states
b) Develop a simple linear model: P_unmonitored = f(P_cpu, P_mem)
c) Predict total node power for CPU=120W, Mem=60W
GPU vs CPU Power Comparison: Write pseudocode to:
a) Query GPU power using NVIDIA NVML: nvmlDeviceGetPowerUsage()
b) Query CPU power using Intel RAPL from sysfs
c) Compare and report which component is consuming more power
d) Handle errors (missing sensors, permission issues)
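Hint: one possible shape for an answer, as a sketch rather than a reference solution. It assumes the `pynvml` Python bindings for NVML and the Linux powercap file `/sys/class/powercap/intel-rapl:0/energy_uj`; both are assumptions, and the path in particular differs between systems. The helper names are made up for illustration.

```python
import time
from pathlib import Path

# Assumed location of the package-0 energy counter (microjoules).
RAPL_ENERGY_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")


def read_energy_uj(path=RAPL_ENERGY_FILE):
    """Return the energy counter in microjoules, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):  # missing file, no permission, bad content
        return None


def cpu_power_watts(path=RAPL_ENERGY_FILE, interval_s=1.0):
    """Average CPU package power from two energy reads; None on failure."""
    first = read_energy_uj(path)
    time.sleep(interval_s)
    second = read_energy_uj(path)
    if first is None or second is None or second < first:
        return None  # sensor unavailable, or the counter wrapped in between
    if interval_s <= 0:
        return 0.0 if second == first else None
    return (second - first) / 1e6 / interval_s


def gpu_power_watts(index=0):
    """Query GPU power via NVML; None if the bindings or the GPU are absent."""
    try:
        import pynvml  # optional dependency
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return None


def compare(cpu_w, gpu_w):
    """Report which component draws more power, or which sensor is missing."""
    if cpu_w is None or gpu_w is None:
        missing = [n for n, v in (("CPU", cpu_w), ("GPU", gpu_w)) if v is None]
        return "unavailable: " + ", ".join(missing)
    if cpu_w == gpu_w:
        return f"CPU and GPU both draw {cpu_w:.1f} W"
    name = "CPU" if cpu_w > gpu_w else "GPU"
    return f"{name} draws more ({max(cpu_w, gpu_w):.1f} W vs {min(cpu_w, gpu_w):.1f} W)"
```

A call such as `compare(cpu_power_watts(), gpu_power_watts())` then yields a one-line report. Returning `None` instead of raising keeps the comparison step simple when a sensor is missing.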
Grace Hopper Domain Analysis: Given Grace Hopper power readings:
Module: 300W
Grace: 200W (CPU + SysIO + DRAM)
CPU cores: 80W
SysIO: 40W
DRAM: 80W
a) Verify the Grace domain equation
b) Calculate GPU power
c) Compute efficiency ratio: (CPU + GPU) / (Total Module)
d) Identify which component is wasting power as infrastructure overhead
Baseline Comparison: Compare baselines from two systems: LUMI (European supercomputer) and Karolina (Czech supercomputer) using the visualization data in Episode 1.
a) Identify why baseline power differs between accelerated (ACN) and compute (CN) nodes
b) Estimate the cost difference of running identical jobs on ACN vs CN nodes
c) Propose an optimization strategy based on baseline analysis
RAPL Counter Management: Design a monitoring daemon that:
a) Reads RAPL counters every 10 seconds
b) Detects and handles counter wraparound (32-bit overflow)
c) Computes instantaneous power from energy deltas
d) Logs results to a database for post-execution analysis
Pseudocode is sufficient.
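Hint: a minimal sketch of such a sampling loop, assuming SQLite as the database and a caller-supplied `read_counter` function (both are illustrative choices, not requirements of the exercise). The demo uses a simulated counter and clock so that it runs without RAPL access. Note that the "instantaneous" power computed here is really the average over one sampling period.

```python
import sqlite3
import time

COUNTER_MODULUS = 1 << 32  # 32-bit counter, as stated in the exercise


def run_monitor(read_counter, joules_per_unit, db, samples, period_s=10.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll read_counter() and log one power sample per period into SQLite."""
    db.execute("CREATE TABLE IF NOT EXISTS power_log "
               "(t_s REAL, energy_j REAL, power_w REAL)")
    prev_raw, prev_t = read_counter(), clock()
    for _ in range(samples):
        sleep(period_s)
        raw, now = read_counter(), clock()
        # Modular difference absorbs a single wraparound between two reads;
        # more than one wrap per period cannot be detected this way.
        energy_j = ((raw - prev_raw) % COUNTER_MODULUS) * joules_per_unit
        power_w = energy_j / (now - prev_t)
        db.execute("INSERT INTO power_log VALUES (?, ?, ?)",
                   (now, energy_j, power_w))
        db.commit()
        prev_raw, prev_t = raw, now


# Demo with hypothetical values: the first interval crosses a wraparound.
raw_values = iter([0xFFFFFF00, 0x00000100, 0x00000300])
ticks = iter([0.0, 10.0, 20.0])
conn = sqlite3.connect(":memory:")
run_monitor(lambda: next(raw_values), joules_per_unit=0.5, db=conn, samples=2,
            sleep=lambda s: None, clock=lambda: next(ticks))
rows = conn.execute(
    "SELECT energy_j, power_w FROM power_log ORDER BY t_s").fetchall()
```

Passing `sleep` and `clock` as parameters is a design choice that makes the loop testable; a real daemon would keep the defaults and add signal handling and error recovery.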
System-Specific Baseline Model: Given the following empirical data for a compute node:
| Workload Type | Measured Power (CPU+GPU) [W] | Total Node Power [W] | Unmeasured [W] |
|---|---|---|---|
| Idle | 20 | 80 | 60 |
| Synthetic (50% load) | 100 | 220 | 120 |
| Synthetic (100% load) | 180 | 400 | 220 |
| Real HPC app (50%) | 95 | 210 | 115 |
| Real HPC app (100%) | 175 | 390 | 215 |
a) Is the unmeasured power linear or non-linear with measured power?
b) Fit a model: P_unmeasured = a + b × P_measured
c) Predict unmeasured power for measured=120W
d) What sources could cause non-linearity?
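Hint: the fit in (b) can be done with an ordinary least-squares line. The sketch below uses `np.polyfit` on made-up sample points, not the table values, and also returns R² as one rough way to judge linearity for (a). The function name `fit_baseline` is hypothetical.

```python
import numpy as np


def fit_baseline(measured_w, unmeasured_w):
    """Least-squares fit of P_unmeasured = a + b * P_measured.

    Returns (a, b, r2), where r2 is the coefficient of determination.
    """
    x = np.asarray(measured_w, dtype=float)
    y = np.asarray(unmeasured_w, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the slope first
    residuals = y - (a + b * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return float(a), float(b), r2


# Hypothetical, perfectly linear sample data: unmeasured = 25 + 0.5 * measured.
a, b, r2 = fit_baseline([10.0, 50.0, 90.0], [30.0, 50.0, 70.0])
predicted = a + b * 70.0  # prediction for a measured power of 70 W
```

An R² clearly below 1 on real data would suggest looking at the residuals per workload type before trusting a single linear model.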
Coding questions
Generate a 1D NumPy array of 1 million random floats. Compute the square root of each element using:
a) a Python for loop
b) NumPy’s vectorized np.sqrt
Load a CSV file of weather data (e.g., temperature, humidity, wind).
a) filter rows where temperature > 30°C
b) compute the average humidity for each month using groupby
Create a random 100×100 matrix A and a vector b.
a) use scipy.linalg.solve to solve the system $Ax = b$
b) verify the solution by checking the residual norm
Simulate a DataFrame with missing values in numerical columns.
a) fill missing values with the column mean (using NumPy)
b) compute basic statistics before and after imputation
Generate noisy data for a quadratic function $y = ax^2 + bx + c$
a) use scipy.optimize.curve_fit to fit the data and recover the original parameters
b) plot the original vs fitted curve