HPC HW POWER MONITORING: POWER MONITORING SYSTEMS

HIGH DEFINITION ENERGY EFFICIENCY MONITORING (HDEEM)

Overview

HDEEM is an out-of-band power monitoring technology developed by Bull (now Atos) and integrated into their production HPC systems. It provides comprehensive, high-frequency power measurements across all major hardware components without imposing computational overhead on user applications.

Deployment Context

HDEEM is available on:

  • Bull Sequana - Atos’s modern HPC system family

  • Bullx B7xx series - Previous generation systems

In both families, HDEEM is integrated directly into the motherboard firmware (BMC - Baseboard Management Controller).

Power Measurement Domains

HDEEM captures power at multiple granularity levels:

Blade-Level Measurement (1 kHz sampling):

  • Entire blade power consumption

  • Captures total power across all components

  • Suitable for system-level accounting and facility management

Voltage Regulator (VR) Monitoring (100 Hz sampling):

  • CPU - Processor power rails

  • DRAMs - Memory subsystem power

  • NIC* - Network interface power (where available)

  • VAUX* - Auxiliary power rails (where available)

The VR channels trade sampling rate for component-level detail at 100 Hz, while the 1 kHz blade measurement captures transient behavior of the node as a whole.
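To make the two sampling rates concrete, the sketch below integrates equally spaced power samples into energy with the trapezoidal rule. It is an illustration only: `energy_from_samples` and the sample trace are hypothetical names and values, not part of the HDEEM library, and the real tools deliver samples in their own format.

```python
def energy_from_samples(power_watts, sample_rate_hz):
    """Integrate equally spaced power samples (W) into energy (J), trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0
    dt = 1.0 / sample_rate_hz  # seconds between consecutive samples
    total = 0.0
    for p0, p1 in zip(power_watts, power_watts[1:]):
        total += 0.5 * (p0 + p1) * dt
    return total


# Hypothetical blade trace: 1001 samples at 1 kHz span exactly one second.
blade_trace = [300.0] * 1001
blade_energy_j = energy_from_samples(blade_trace, 1000)
```

The same function applies to the 100 Hz VR channels by passing `sample_rate_hz=100`.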

Accuracy and Reliability

Measurement Uncertainty: ±2%

  • This is exceptional accuracy for production systems

  • Enables reliable energy accounting and charge-back models

  • Suitable for research requiring high-precision measurements

Software Interface

HDEEM provides both C library APIs and command-line utilities for data collection:

Command-line Tools:

  • startHdeem - Initiate power measurement collection

  • stopHdeem - End measurement and store data

  • checkHdeem - Verify measurement status

  • printHdeem - Export collected data in human-readable format

  • clearHdeem - Reset measurement buffers

Integration Model:

  • Out-of-band collection (no application overhead)

  • Post-execution data retrieval

  • Scriptable for automated monitoring workflows

../../_images/11-RAPL-domains.png
../../_images/rapl_register.png

INTEL RUNNING AVERAGE POWER LIMIT (RAPL)

Architecture Overview

RAPL is Intel’s in-band power monitoring and power capping mechanism built into modern x86 processors. It provides per-domain power measurement and enforcement capabilities at the hardware level, enabling both measurement and active power management.

Access Interface

Linux Interface:

/sys/devices/virtual/powercap/intel-rapl/intel-rapl:X/intel-rapl:X:Y

This sysfs interface allows user-space tools to:

  • Read energy consumption counters

  • Query power limits

  • Set new power caps (with appropriate privileges)

  • Monitor domain-specific measurements
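A minimal sketch of reading these counters from user space, assuming the standard powercap attribute files `name` and `energy_uj` (energy in microjoules) inside each zone directory. The helper names are illustrative, and on many systems reading `energy_uj` requires elevated privileges.

```python
from pathlib import Path

POWERCAP_ROOT = Path("/sys/devices/virtual/powercap/intel-rapl")


def read_zone(zone_dir):
    """Return (zone name, energy in joules) for one powercap zone directory."""
    zone = Path(zone_dir)
    name = (zone / "name").read_text().strip()
    energy_uj = int((zone / "energy_uj").read_text())  # microjoules
    return name, energy_uj / 1e6


def read_all_zones(root=POWERCAP_ROOT):
    """Read every top-level zone and its direct sub-zones under root."""
    results = {}
    for zone in sorted(Path(root).glob("intel-rapl:*")):
        for directory in [zone, *sorted(zone.glob("intel-rapl:*"))]:
            if (directory / "energy_uj").is_file():
                results[directory.name] = read_zone(directory)
    return results
```

Calling `read_all_zones()` returns a dictionary keyed by directory name (for example `intel-rapl:0` and `intel-rapl:0:0`). The counter is cumulative, so a measurement needs two reads and a subtraction.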

Power Domains

RAPL defines multiple independent power control domains, each with distinct characteristics:

Package Domain

  • Encompasses entire CPU package: cores + uncore components (caches, memory controller, interconnects)

  • Dual Time Window Architecture:

    • Short window: 1.2 × TDP, ~milliseconds resolution (captures transient peaks)

    • Long window: 1 × TDP, ~second resolution (enforces sustainable power budget)

  • Use case: Preventing thermal runaway and managing peak power draw
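The dual-window idea can be sketched in software as two running averages checked against different limits. This is only a model of the concept: the hardware enforces the limits internally, and the window lengths used here (10 ms and 1 s) are assumed values, since the real windows are programmed through the power-limit register.

```python
def window_average(samples_w, sample_period_s, window_s):
    """Mean power over the most recent window_s seconds of samples."""
    if not samples_w:
        raise ValueError("at least one sample is required")
    count = max(1, round(window_s / sample_period_s))
    recent = samples_w[-count:]
    return sum(recent) / len(recent)


def exceeds_limits(samples_w, sample_period_s, tdp_w,
                   short_window_s=0.01, long_window_s=1.0):
    """Return (short_exceeded, long_exceeded) for the two running-average limits."""
    short_avg = window_average(samples_w, sample_period_s, short_window_s)
    long_avg = window_average(samples_w, sample_period_s, long_window_s)
    return short_avg > 1.2 * tdp_w, long_avg > 1.0 * tdp_w
```

A brief burst can violate the short-window limit while the long-window average stays within budget, and a sustained load slightly above TDP does the opposite.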

DRAM Domain

  • Controls memory subsystem power consumption

  • Availability: Server architectures only (not available on client CPUs)

  • Single time window - Simpler control model than Package domain

  • Default State: Disabled by default; must be explicitly enabled

  • Use Cases:

    • Memory-intensive workload characterization

    • P-State scaling coordination

    • Data center power budgeting

PP0 / Core Domain

  • Restricts power limits to CPU cores only (excluding uncore)

  • Single time window - Fixed control window

  • Modern Availability: Not available on newer server architectures (implementation dropped)

  • Use case: Disaggregating core vs. uncore power contributions

PP1 / Graphics Domain

  • Controls integrated GPU power (iGPU on client processors)

  • Server Availability: Not applicable - no integrated graphics on server CPUs

  • Single time window

  • Use case: Laptop/desktop power optimization

PSys / Platform Domain (Skylake and newer)

  • Controls entire System-on-Chip power

  • Dual time window (short + long, like Package domain)

  • Architecture Requirement: Skylake and newer Intel architectures

  • Vendor Support: Requires explicit firmware support; not universally available

  • Use case: System-wide power budgeting across all domains

Domain Relationships

These domains form a hierarchy: $$P_{\text{Package}} = P_{\text{PP0}} + P_{\text{PP1}} + P_{\text{Uncore}}$$

For older CPUs with distinct domains, the relationship enables component-level analysis.

INTEL RAPL: MSR REGISTERS AND ENERGY ACCOUNTING

Model-Specific Registers (MSRs)

RAPL measurement and control are implemented through dedicated CPU model-specific registers (MSRs) that require privileged access. Applications access these registers through the msr kernel module on Linux or specialized libraries.

Power Limit Configuration

MSR_PKG_POWER_LIMIT (0x610) - Package Power Cap Settings

  • Defines both short-term and long-term power limits for the CPU package

  • Structure contains two independent limit windows

  • Controls hardware power throttling mechanisms

  • Enables dynamic power budget enforcement

Unit Conversion MSRs

Energy measurements from RAPL counters are stored in raw hardware units that must be converted to physical units using calibration factors:

MSR_RAPL_POWER_UNIT (0x606) - Unit Conversion Factors

Contains conversion multipliers for:

  • Power units - Converts raw counts to Watts

  • Energy units - Converts raw counts to Joules

  • Time units - Converts time window encodings to seconds

These are system-specific constants determined at CPU design time and stored in read-only register fields.
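As a sketch of how the unit fields are decoded, the function below assumes the field layout documented in the Intel Software Developer's Manual: power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16, each giving a unit of 1/2^n. The function name and the example raw value are illustrative and were not read from hardware.

```python
def decode_rapl_units(raw):
    """Decode MSR_RAPL_POWER_UNIT into (watts, joules, seconds) per raw count."""
    power_exp = raw & 0xF            # bits 3:0
    energy_exp = (raw >> 8) & 0x1F   # bits 12:8
    time_exp = (raw >> 16) & 0xF     # bits 19:16
    return (1.0 / (1 << power_exp),
            1.0 / (1 << energy_exp),
            1.0 / (1 << time_exp))


# Example raw value: decodes to 0.125 W, 2**-14 J and 2**-10 s per count.
power_unit_w, energy_unit_j, time_unit_s = decode_rapl_units(0x000A0E03)
```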

Energy Status Registers

Each domain maintains an energy accumulator that wraps around periodically. Reading the register reports total energy consumed since last wrap-around or CPU reset.

Domain Energy Counters:

  • MSR_PKG_ENERGY_STATUS (0x611) - CPU Package energy (cores + uncore)

  • MSR_DRAM_ENERGY_STATUS (0x619) - Memory subsystem energy

  • MSR_PP0_ENERGY_STATUS (0x639) - Core-only energy (excludes uncore)

  • MSR_PP1_ENERGY_STATUS (0x641) - Graphics domain energy

  • MSR_PLATFORM_ENERGY_COUNTER (0x64D) - Entire SoC energy (Skylake+)

Energy Accounting Method

Energy consumption is computed by:

  1. Baseline Reading: Read energy counter at time $t_0$ $$E_0 = \text{read}(\text{MSR_PKG_ENERGY_STATUS})$$

  2. Post-Execution Reading: Read counter at time $t_1$ $$E_1 = \text{read}(\text{MSR_PKG_ENERGY_STATUS})$$

  3. Raw Energy Difference: $$\Delta E_{\text{raw}} = E_1 - E_0 \text{ (with wraparound handling)}$$

  4. Convert to Joules: $$E_{\text{Joules}} = \Delta E_{\text{raw}} \times \text{unit_factor}(\text{MSR_RAPL_POWER_UNIT})$$
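The four steps above can be sketched as a single function. It assumes a 32-bit counter and at most one wraparound between the two readings; `energy_joules` is an illustrative name, and the raw values would come from the energy status register via whatever MSR access method is in use.

```python
def energy_joules(e0_raw, e1_raw, energy_unit_j, counter_bits=32):
    """Energy between two raw counter readings, tolerating one wraparound."""
    modulus = 1 << counter_bits
    # Modulo arithmetic maps a negative difference (counter wrapped) back into range.
    delta_raw = (e1_raw - e0_raw) % modulus
    return delta_raw * energy_unit_j


# Hypothetical readings that straddle a wraparound, with a 2**-14 J energy unit.
wrapped = energy_joules(0xFFFFFFF0, 0x00000010, 2 ** -14)
```

If more than one wraparound can occur between reads, the difference is ambiguous, which is why the polling interval matters.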

Practical Considerations

Counter Wraparound:

  • Energy counters are typically 32-bit fields and wrap around

  • Wraparound interval typically 60-100 seconds depending on power draw

  • Monitoring software must poll frequently enough to detect wraparound

Accuracy Limitations:

  • RAPL estimates energy based on performance counters and power models

  • Typical accuracy: ±5-10% compared to external power meters

  • Estimates vary with workload characteristics and CPU frequency scaling

Reference: Haidar et al: “Investigating power capping toward energy-efficient scientific applications” - Foundational work on RAPL behavior and characterization

../../_images/12-MSR-energy.png
../../_images/12-MSR-power-limit.png
../../_images/12-power-limit.png

AMD RAPL

Compatibility and Design

AMD implements a compatible but simplified variant of Intel’s RAPL interface. The key differences reflect AMD’s architectural choices and market positioning:

Compatibility Layer:

  • Same sysfs interface as Intel RAPL for software compatibility

  • Allows existing monitoring tools to work across vendors with minimal modification

  • Reduces software development and testing burden for HPC centers

Operational Scope

Energy Reporting Only:

  • AMD RAPL is read-only for energy consumption measurement

  • Does NOT support power capping - no enforcement mechanism

  • Fundamental architectural difference from Intel’s dual measurement+control design

This reflects AMD’s philosophy of letting the operating system and BIOS handle power management rather than exposing hardware capping mechanisms.

Power Domains

Package Domain (PKG)

  • Encompasses all in-socket components:

    • CPU cores (all cores combined)

    • IO die containing I/O controllers and interconnects

    • Other socket-level components

  • Monolithic measurement - no separation between core and uncore like older Intel systems

  • Reflects AMD Zen architecture with integrated IO die design

Core Domain

  • Per-core granularity - Each CPU core can be measured independently

  • Modern Zen architecture provides core-level power instrumentation

  • More detailed than Intel’s package-level approach for core analysis

  • Use case: Identifying power imbalance across cores, vectorization efficiency

Architectural Implications

AMD’s integrated IO die design eliminates the traditional core/uncore split:

  • Can’t separately measure memory controller power (integrated into IO die)

  • Can’t separately measure interconnect power

  • Simplifies hardware design but reduces measurement granularity

Software Support

Access through the same Linux sysfs interface as Intel, enabling:

  • Tool portability across AMD and Intel systems

  • Unified monitoring scripts

  • Vendor-neutral application code

Reference: [1]: https://github.com/amd/amd_energy/issues/1 - AMD Energy driver implementation details

../../_images/13-MSR-AMD.png

FUJITSU A64FX

Architecture Context

The Fujitsu A64FX is a custom ARM-based processor designed specifically for HPC, featuring:

  • ARM SVE (Scalable Vector Extension) for floating-point performance

  • High memory bandwidth optimized for scientific computing

  • Purpose-built power monitoring infrastructure

This processor powers Fugaku, one of the world’s leading supercomputers.

Power Measurement Architecture

Unlike x86 systems dominated by Intel/AMD, the A64FX implements a highly granular, cycle-accurate power monitoring system tailored to scientific workload analysis.

Power Domains

Core Domain

  • Individual processor core power consumption

  • Per-core instrumentation enables load balancing analysis

  • Identifies vectorization efficiency by core behavior

Memory Domain - Core Memory Group (CMG)

  • Local memory subsystem per core group

  • Captures memory hierarchy power (hierarchical memory design)

  • Reflects Fujitsu’s multi-level memory architecture

  • Important for data-intensive workload profiling

L2 Cache Domain (LLC)

  • Level 2 cache (Last-Level Cache in the core complex)

  • Separate power measurement for cache hierarchy

  • Enables cache-focused optimization studies

  • Critical for bandwidth-bound application analysis

Sampling Characteristics

Cycle-Accurate Measurement:

  • Sampling Frequency: Every cycle of the domain

  • Essentially continuous measurement with zero delay

  • Unprecedented temporal resolution compared to x86 systems

  • Enables detailed transient power behavior analysis

Implications for Research:

  • Can correlate power spikes with exact instruction sequences

  • Identify power efficiency at instruction level

  • Characterize vectorization overhead and efficiency

  • Fine-grained workload power signature analysis

Use Cases

  1. Vectorization Analysis - Understand SVE instruction efficiency

  2. Memory Hierarchy Optimization - Optimize access patterns for memory power

  3. Power Model Development - Create detailed power models for HPC workloads

  4. Dynamic Power Management - Real-time frequency/voltage scaling decisions

NVIDIA GPU POWER MONITORING

NVIDIA Management Library (NVML)

NVML is NVIDIA’s primary API for GPU monitoring and management, providing programmatic access to GPU power metrics. It works across all NVIDIA GPU architectures (consumer and data center).

Power Measurement Functions

Total Energy Consumption:

nvmlDeviceGetTotalEnergyConsumption(device)
  • Returns cumulative energy consumed by the GPU since the driver was last reloaded

  • Measured in millijoules (mJ)

  • Reported through the API as a 64-bit unsigned value, so wraparound is not a practical concern

  • Use case: Post-execution energy accounting for batch jobs
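For batch-job accounting, two readings of the cumulative energy counter give both the job energy and its average power. The sketch below is plain arithmetic on values in millijoules; it does not call NVML itself, and the function name and numbers are illustrative.

```python
def average_power_watts(energy_start_mj, energy_end_mj, elapsed_s):
    """Average power (W) between two cumulative energy readings in millijoules."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    energy_j = (energy_end_mj - energy_start_mj) / 1000.0
    return energy_j / elapsed_s


# Hypothetical readings taken 10 s apart: 3000 J consumed in between.
job_power_w = average_power_watts(1_000_000, 4_000_000, 10.0)
```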

Instantaneous Power Usage:

nvmlDeviceGetPowerUsage(device)
  • Current GPU power draw in milliwatts (mW)

  • Sampled at typical ~100ms intervals

  • Direct reading from GPU power management unit

  • Use case: Real-time power monitoring and dynamic adaptation

Multi-Field Power Query:

nvmlDeviceGetFieldValues(device, [GPU_POWER, MEMORY_POWER, ...])
  • Efficient bulk reading of multiple power domains simultaneously

  • Reduces API overhead compared to individual calls

  • Returns vector of power values in one operation

Power Domains

GPU_POWER

  • Compute unit power (shader cores, tensor cores)

  • Represents arithmetic compute resource power

MEMORY_POWER (sub-domain)

  • HBM (High Bandwidth Memory) or GDDR memory subsystem

  • Includes memory controllers and interconnects

  • Distinct measurement from compute power

MODULE_POWER (Grace Hopper, Grace Blackwell)

  • CPU+GPU co-processor module power

  • Captures heterogeneous compute power

  • New domain for Grace architecture

Command-Line Interface

Quick Power Check:

nvidia-smi --query-gpu=power.draw --format=csv

Outputs:

  • Current power consumption per GPU

  • Format: CSV with a header line followed by one row per GPU

  • Useful for shell scripts and monitoring loops

Example Output:

power.draw [W]
150.00 W
142.00 W
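For monitoring scripts, output in this shape can be turned into numbers with a few lines of parsing. The sketch assumes a single header line followed by one value with a unit suffix per GPU, as in the example; `parse_power_draw` is an illustrative helper, and non-numeric entries are mapped to `None` rather than guessed.

```python
def parse_power_draw(csv_text):
    """Parse power.draw CSV output into a list of watts, one entry per GPU."""
    lines = [line.strip() for line in csv_text.splitlines() if line.strip()]
    values = []
    for line in lines[1:]:  # the first non-empty line is the CSV header
        number = line.split()[0]  # "150.00 W" -> "150.00"
        try:
            values.append(float(number))
        except ValueError:
            values.append(None)  # entry was not a number
    return values


sample_output = "power.draw [W]\n150.00 W\n142.00 W\n"
watts_per_gpu = parse_power_draw(sample_output)
```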

Architectural Notes

  • NVML works across all GPU types: discrete, data center, consumer

  • Power measurements are estimates based on hardware models

  • Typical accuracy: ±5-10% (similar to RAPL)

  • Sampling resolution: ~100ms typical interval

  • No energy counter wraparound with modern NVIDIA drivers

Limitations

  • Cannot access per-core or per-SM (Streaming Multiprocessor) power

  • Limited to GPU power; system interconnect power not measured

  • Requires NVIDIA driver with NVML support

  • GPU-only monitoring; does not include host CPU power

AMD GPU POWER MONITORING

AMD System Management Interface (AMD SMI)

AMD provides AMD SMI (formerly AMD ROCm SMI) as their GPU monitoring and management API, mirroring NVIDIA’s NVML approach but with AMD-specific implementations.

API Functions

Total Energy Consumption:

amdsmi_get_energy_count(device)
  • Cumulative energy consumed by GPU

  • Energy value in joules (J)

  • Counter behavior similar to NVIDIA’s energy counter

  • Use case: Batch job energy accounting

Current Power Usage:

amdsmi_get_power_info(device)
  • Instantaneous power draw

  • Returns power value and timestamp

  • More detailed than simple instant power reading

  • Includes power limit information

Access Control

Group Membership Requirements: To access GPU power metrics, users must be members of:

  • video group - Access to GPU hardware

  • render group - Access to rendering/compute capabilities

This security model:

  • Prevents unprivileged users from querying GPU power (privacy)

  • Allows HPC centers to control who can monitor hardware

  • Differs from NVIDIA’s approach which may allow broader access

Command-Line Interface

Instant Power Query:

amd-smi metric --power

Outputs current power consumption across all AMD GPUs accessible to user.

Typical Output:

GPU[0]: 120 W
GPU[1]: 135 W

Comparison with NVIDIA

| Aspect | AMD SMI | NVIDIA NVML |
|---|---|---|
| API | amdsmi_* functions | nvml* functions |
| Energy | amdsmi_get_energy_count | nvmlDeviceGetTotalEnergyConsumption |
| Power | amdsmi_get_power_info | nvmlDeviceGetPowerUsage |
| Access Control | video, render groups | Depends on driver/OS |
| Multi-domain | Limited | MEMORY_POWER domain |
| Granularity | GPU-level | GPU + Memory domains |

Implementation Notes

  • AMD SMI maintains API compatibility where possible with NVIDIA’s NVML

  • Power measurements use GPU firmware counters and power models

  • Sampling resolution: ~millisecond range (hardware-dependent)

  • Supports AMD RDNA and CDNA GPU architectures

  • Limited domain separation compared to NVIDIA’s multi-domain support

NVIDIA GRACE CPU POWER MONITORING

Architecture Context

NVIDIA GRACE is a high-performance ARM-based CPU designed for HPC and data centers. Unlike NVIDIA’s GPU-focused NVML library, GRACE CPU power monitoring uses Linux’s standard HWMON (Hardware Monitoring) interface, reflecting its role as a CPU rather than a co-processor.

HWMON Interface

Linux HWMON provides standardized hardware monitoring through sysfs:

Power Measurement Files:

/sys/class/hwmon/hwmon*/device/power1_average
/sys/class/hwmon/hwmon*/device/power1_average_interval
  • Generic, vendor-agnostic interface

  • Works across different CPU types and manufacturers

  • Text-based file I/O for easy integration

  • Kernel-level access (typically requires root or special permissions)

Measurement Characteristics

Sampling Window:

  • Interval: 50-1000 milliseconds (configurable via power1_average_interval)

  • Longer than RAPL’s counter update period of roughly 1 ms

  • Trade-off: Lower overhead but coarser temporal resolution

Reported Value:

  • Average power over the measurement window

  • NOT energy accumulation (no counter)

  • Each read gives mean power during interval: $P_{\text{avg}} = \frac{\text{Energy}}{\text{Interval}}$

Calculation for Energy: To compute energy from average power readings: $$E = P_{\text{avg}} \times \Delta t$$

Where $\Delta t$ is the measurement window duration.
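A minimal sketch of this calculation, assuming the usual hwmon sysfs units (power in microwatts, averaging interval in milliseconds) and readings taken back to back, once per averaging window. The function name and sample values are illustrative.

```python
def energy_from_average_power(readings_uw, interval_ms):
    """Sum E = P_avg * dt over consecutive, non-overlapping averaging windows."""
    dt_s = interval_ms / 1000.0
    return sum(p_uw / 1e6 * dt_s for p_uw in readings_uw)


# Hypothetical: two 50 ms windows averaging 200 W and 250 W.
energy_j = energy_from_average_power([200_000_000, 250_000_000], 50)
```

If reads are spaced further apart than the averaging interval, the energy consumed between windows is not observed, so the sum underestimates the true energy.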

Power Domains

GRACE partitions CPU power into distinct measurement domains:

Grace Domain (Total)

  • Entire CPU package power

  • Encompasses all sub-domains

  • Reference point for other measurements

CPU Domain

  • Compute cores and core-local resources

  • Represents active computation power

SysIO Domain

  • System IO controllers and interconnects

  • Memory controllers

  • Off-CPU infrastructure power

DRAM Domain

  • Memory subsystem power

  • Main memory and cache hierarchy

Domain Relationships

The domains maintain a hierarchical accounting relationship: $$P_{\text{Grace}} = P_{\text{CPU}} + P_{\text{SysIO}} + P_{\text{DRAM}}$$

This decomposition enables:

  • Identifying whether power consumption is compute-bound or memory-bound

  • Understanding infrastructure overhead

  • Optimizing specific subsystems

Practical Implications

Measurement Overhead:

  • HWMON is lower overhead than RAPL on some systems

  • 50-1000ms window trades temporal resolution for simplicity

  • Suitable for longer-running jobs where millisecond precision unnecessary

Integration:

  • No special library required (plain sysfs reading)

  • Shell scripts can directly monitor: cat /sys/class/hwmon/.../power1_average

  • Enables easy integration into existing monitoring infrastructure

../../_images/17-hwmon-error.png
../../_images/18-hwmon-grace.png

NVIDIA GRACE HOPPER / GRACE BLACKWELL

Heterogeneous Architecture

NVIDIA’s Grace Hopper (and upcoming Grace Blackwell) represent a paradigm shift: CPU+GPU co-processors integrated on a single module. This creates unprecedented challenges and opportunities for power monitoring.

Unified Module Architecture

Module-Level Design:

  • Single coherent computing system combining CPU and GPU

  • Shared memory hierarchy and fast interconnects

  • Requires unified power monitoring across heterogeneous domains

  • Enables genuine heterogeneous computing (not discrete GPU+CPU)

Power Domains

Grace Hopper power monitoring exposes multiple layers of domain decomposition:

Module Domain (Top-level)

  • Total power consumption of entire Grace Hopper co-processor

  • Includes all CPU, GPU, and interconnect power

  • Reference point for system-level accounting

Grace CPU Domain

  • CPU cores + SysIO + DRAM combined

  • Represents CPU-side power consumption

  • Derived from Grace CPU’s three-component model

CPU Domain

  • CPU compute cores only (cores themselves)

  • Excludes system infrastructure

  • Fine-grained CPU analysis

SysIO Domain

  • CPU-side system controllers and interconnects

  • Includes chipset and interconnect logic

GPU Domain

  • Hopper GPU accelerator power (Module - Grace)

  • Derived by subtraction from total module power

  • Represents heterogeneous accelerator consumption

Domain Hierarchy and Accounting

$$P_{\text{Module}} = P_{\text{Grace}} + P_{\text{GPU}}$$

$$P_{\text{Grace}} = P_{\text{CPU}} + P_{\text{SysIO}} + P_{\text{DRAM}}$$
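The two equations can be applied directly to derive the domains that are not read out on their own. This sketch assumes that module, Grace, CPU and SysIO power are available as simultaneous readings in watts; the function name and the numbers are illustrative.

```python
def derive_domains(module_w, grace_w, cpu_w, sysio_w):
    """Derive GPU and DRAM power by subtraction from the domain hierarchy."""
    gpu_w = module_w - grace_w           # P_GPU = P_Module - P_Grace
    dram_w = grace_w - cpu_w - sysio_w   # P_DRAM = P_Grace - P_CPU - P_SysIO
    return {"GPU": gpu_w, "DRAM": dram_w}


# Hypothetical simultaneous readings in watts.
derived = derive_domains(module_w=700.0, grace_w=250.0, cpu_w=150.0, sysio_w=40.0)
```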

Measurement Access

Dual Interface Support:

  1. HWMON (Hardware Monitoring)

    • Linux standard interface via sysfs

    • Available for all domains

    • Lower-level, kernel-integrated access

  2. NVML (NVIDIA Management Library)

    • NVIDIA’s high-level API

    • More refined data structures

    • Better integration with NVIDIA ecosystem tools

    • Same functions as discrete GPUs but with CPU awareness

Use Cases

Workload Characterization:

  • Identify if bottleneck is CPU or GPU: compare $P_{\text{CPU}}$ vs $P_{\text{GPU}}$

  • Understand heterogeneous load balance

  • Optimize task distribution across CPU and GPU

Power Budgeting:

  • Allocate power budgets to CPU and GPU independently

  • Prevent one component from monopolizing power budget

  • Enable dynamic load balancing under power constraints

System Efficiency Analysis:

  • Identify infrastructure overhead (SysIO power)

  • Optimize interconnect usage

  • Understand memory subsystem power contribution

Challenges

Attribution Complexity:

  • GPU domain often derived (subtraction) rather than directly measured

  • Potential accumulation of measurement error

  • More difficult to achieve high accuracy in heterogeneous systems
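The error accumulation can be quantified with standard propagation of uncertainty. The sketch assumes the module and Grace readings have independent errors, so their uncertainties add in quadrature; the 2% figures and power values are made up for illustration.

```python
import math


def derived_gpu_uncertainty(sigma_module_w, sigma_grace_w):
    """Uncertainty of P_GPU = P_Module - P_Grace for independent errors."""
    return math.hypot(sigma_module_w, sigma_grace_w)


def relative_uncertainty(value_w, sigma_w):
    """Uncertainty expressed as a fraction of the value."""
    return sigma_w / value_w


# Hypothetical readings: 700 W module and 250 W Grace, each with 2% uncertainty.
sigma_gpu = derived_gpu_uncertainty(0.02 * 700.0, 0.02 * 250.0)
gpu_relative = relative_uncertainty(700.0 - 250.0, sigma_gpu)
```

With these numbers the derived GPU value carries a relative uncertainty of about 3.3%, larger than the 2% of either input.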

Interface Consistency:

  • HWMON and NVML may report slightly different values

  • Need careful validation and understanding of differences

  • Important for reproducible research

../../_images/19-hwmon-gracehopper.png

POWER MONITORING IN PRACTICE: LUMI SUPERCOMPUTER

Real-world power monitoring implementation requires balancing theoretical capabilities with practical constraints. The LUMI supercomputer (hosted in Finland) provides an instructive case study of how these monitoring systems are deployed in production HPC environments.

Integrated Monitoring Stack

LUMI combines multiple power monitoring technologies:

  • RAPL - CPU-side power measurement

  • GPU vendor library (AMD SMI / ROCm SMI on LUMI’s AMD Instinct GPUs; NVML on NVIDIA-based systems) - GPU/accelerator power

  • HDEEM or vendor systems - Node-level baseline

  • PDU monitoring - Facility-level accounting

Real System Challenges

The following visualization shows actual power baseline data from LUMI:

../../_images/24-LUMI-baseline.png

Key observations from production data:

  • Variability in idle power baseline

  • Complex relationship between components

  • Need for system-specific calibration

POWER BASELINE

The Measurement Challenge

As discussed in detail in the introduction, power monitoring systems have inherent measurement gaps:

High Frequency Energy Measurement of Some Components (e.g., RAPL, NVML)

  • Provides detailed measurements of CPU and GPU power

  • Missing energy consumption of the remaining parts (memory controllers, interconnects, infrastructure)

Low Frequency Power Monitoring of the Whole Node (e.g., HDEEM, PDU)

  • Captures total node power

  • Unreliable energy measurement for short and medium length regions (high relative error)

Node Power Baseline Estimation

To estimate power consumption of non-monitored on-node components, researchers use empirical calibration:

Load the node with a uniform, reproducible workload (e.g., synthetic benchmark like LINPACK or STREAM), then develop an energy model that accounts for unmeasured components:

../../_images/25-LUMI-baseline-eq.png

Model Development Equation: $$P_{\text{node}} = P_{\text{measured}} + P_{\text{baseline}} + P_{\text{overhead}}$$

Where:

  • $P_{\text{measured}}$ = Component power from RAPL/NVML

  • $P_{\text{baseline}}$ = Idle system overhead

  • $P_{\text{overhead}}$ = Unmeasured component contribution
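Under this model, the unmeasured part can be estimated from paired samples taken during the calibration workload: node power from a whole-node meter and component power from RAPL/NVML. The sketch below averages the residual, which estimates $P_{\text{baseline}} + P_{\text{overhead}}$ together. It cannot separate the two terms, and the function name and sample values are illustrative.

```python
def estimate_unmeasured_power(node_w, measured_w):
    """Mean of P_node - P_measured, an estimate of P_baseline + P_overhead."""
    if not node_w or len(node_w) != len(measured_w):
        raise ValueError("need two non-empty sample lists of equal length")
    residuals = [n - m for n, m in zip(node_w, measured_w)]
    return sum(residuals) / len(residuals)


# Hypothetical paired samples in watts from one calibration run.
unmeasured_w = estimate_unmeasured_power([500.0, 520.0, 510.0],
                                         [400.0, 415.0, 415.0])
```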

System-Specific Calibration

Critical Principle: Power baseline is system-specific and must be evaluated for each system individually because:

  1. Hardware Variation - Different processors, memory types, interconnects

  2. Firmware Differences - BIOS settings, power management policies

  3. Environmental Factors - Cooling efficiency, ambient temperature

  4. Workload Sensitivity - Unmonitored components respond differently to different workloads

Practical Implementation

Once baseline is established:

  • Use it consistently across all experiments on that system

  • Document baseline methodology for reproducibility

  • Re-evaluate if hardware changes are made

  • Account for seasonal variations in data center conditions

KAROLINA SUPERCOMPUTER: POWER BASELINE CASE STUDY

Real systems demonstrate the complexity and system-specificity of baseline determination. The Czech supercomputer Karolina provides a detailed real-world example with distinct node types requiring separate baseline analysis:

Node Types and Power Profiles

Karolina’s heterogeneous architecture includes different node configurations:

ACN - Accelerated Compute Nodes

  • Include GPU accelerators (typically NVIDIA)

  • Higher peak power consumption

  • Complex power relationships between CPU and GPU

CN - Compute Nodes

  • CPU-only nodes

  • Simpler power behavior

  • Different baseline characteristics

Baseline Determination Visualization

../../_images/26-baseline-Karolina-ACN.png
../../_images/26-baseline-Karolina-CN.png

Key Insights from Real Data

These visualizations demonstrate:

  1. Different Baselines per Node Type

    • ACN and CN nodes show distinctly different baseline power profiles

    • Cannot use single system-wide baseline

    • Requires node-type-aware monitoring

  2. Non-Linear Power Relationships

    • Power doesn’t scale linearly with load

    • Overhead varies with workload type

    • Multiple calibration points needed

  3. Infrastructure Overhead

    • Significant portion of idle power is infrastructure (cooling, power delivery)

    • Changes with system age and environmental conditions

    • Must be factored into charge-back models

Practical Implications for Karolina Users

  • Use ACN baseline for accelerated jobs

  • Use CN baseline for CPU-only jobs

  • Monitor outliers (jobs with unusually high power)

  • Re-calibrate seasonally or after hardware changes

  • Understand baseline uncertainty when reporting energy metrics

This real-world example illustrates why power monitoring in production HPC systems requires careful, ongoing calibration and validation beyond theoretical models.