IPMCTL-START-DIAGNOSTIC(1) ipmctl IPMCTL-START-DIAGNOSTIC(1)

NAME

ipmctl-start-diagnostic - Starts a diagnostic test

SYNOPSIS

ipmctl start [OPTIONS] -diagnostic [TARGETS]

DESCRIPTION

Starts a diagnostic test.

OPTIONS

-h, -help

Displays help for the command.

-ddrt

Used to specify DDRT as the desired transport protocol for the current invocation of ipmctl.

-smbus

Used to specify SMBUS as the desired transport protocol for the current invocation of ipmctl.


Note

The -ddrt and -smbus options are mutually exclusive and may not be used together.

-lpmb

Used to specify large transport payload size for the current invocation of ipmctl.

-spmb

Used to specify small transport payload size for the current invocation of ipmctl.


Note

The -lpmb and -spmb options are mutually exclusive and may not be used together.

-o (text|nvmxml), -output (text|nvmxml)

Changes the output format. One of: "text" (default) or "nvmxml".
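
The two mutual-exclusion notes above can be checked before ipmctl is invoked. The following is a minimal sketch of such a pre-flight check. check_exclusive_opts is a hypothetical helper name and is not part of ipmctl; it only looks at the option names documented in this section.

```shell
# Hypothetical pre-flight check; not part of ipmctl itself.
# Returns 0 if the option list is acceptable, 1 if it mixes exclusive options.
check_exclusive_opts() {
    ddrt=0 smbus=0 lpmb=0 spmb=0
    for opt in "$@"; do
        case $opt in
            -ddrt)  ddrt=1 ;;
            -smbus) smbus=1 ;;
            -lpmb)  lpmb=1 ;;
            -spmb)  spmb=1 ;;
        esac
    done
    if [ "$ddrt" -eq 1 ] && [ "$smbus" -eq 1 ]; then
        echo "error: -ddrt and -smbus may not be used together" >&2
        return 1
    fi
    if [ "$lpmb" -eq 1 ] && [ "$spmb" -eq 1 ]; then
        echo "error: -lpmb and -spmb may not be used together" >&2
        return 1
    fi
    return 0
}

# Possible use (only runs ipmctl if the check passes):
#   check_exclusive_opts -ddrt && ipmctl start -ddrt -diagnostic Quick
```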

TARGETS

-diagnostic [Quick|Config|Security|FW]

Start a specific test by supplying its name. All tests are run by default. One of:

•"Quick" - This test verifies that the PMem module host mailbox is accessible and that basic health indicators can be read and are currently reporting acceptable values.

•"Config" - This test verifies that the BIOS platform configuration matches the installed hardware and that the platform configuration conforms to best known practices.

•"Security" - This test verifies that all PMem modules have a consistent security state. It is a best practice to enable security on all PMem modules rather than just some.

•"FW" - This test verifies that all PMem modules of a given model have consistent FW installed and that other FW-modifiable attributes are set in accordance with best practices.
Note that the test has no means of verifying that the installed FW is the optimal version for a given PMem module model, only that it has been applied consistently across the system.

-dimm [DimmIDS]

Starts a diagnostic test on specific PMem modules by optionally supplying one or more comma-separated PMem module identifiers. The default is to start the specified tests on all manageable PMem modules. Only valid for the Quick diagnostic test.
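
The target rules above (one optional test name, and -dimm only together with Quick) can be sketched as a small command builder. diag_cmd is a hypothetical helper, not an ipmctl feature. It prints the command line rather than running it, and it assumes test names are spelled exactly as listed above.

```shell
# Hypothetical helper: prints an ipmctl command line instead of running it.
# $1 = test name (empty or omitted for all tests)
# $2 = optional comma-separated PMem module identifiers
diag_cmd() {
    test_name=${1:-}
    dimms=${2:-}
    case $test_name in
        ""|Quick|Config|Security|FW) ;;
        *) echo "error: unknown test '$test_name'" >&2; return 1 ;;
    esac
    # The -dimm target is documented as valid only for the Quick test.
    if [ -n "$dimms" ] && [ "$test_name" != "Quick" ]; then
        echo "error: -dimm is only valid for the Quick test" >&2
        return 1
    fi
    cmd="ipmctl start -diagnostic"
    if [ -n "$test_name" ]; then
        cmd="$cmd $test_name"
    fi
    if [ -n "$dimms" ]; then
        cmd="$cmd -dimm $dimms"
    fi
    printf '%s\n' "$cmd"
}
```

For example, diag_cmd Quick 0x0001 prints the second command shown under EXAMPLES, while diag_cmd Config 0x0001 is rejected.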

EXAMPLES

Starts all diagnostics.

ipmctl start -diagnostic

Starts the quick check diagnostic on PMem module 0x0001.

ipmctl start -diagnostic Quick -dimm 0x0001
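
As a further illustration, the four tests can be started one at a time with the nvmxml output format, keeping each result in its own file. This is only a sketch: run_each_diag, the IPMCTL override variable, and the file naming are assumptions made for this example, and the exit status of ipmctl is not interpreted.

```shell
# Hypothetical wrapper. IPMCTL can be overridden (for example with "echo")
# for a dry run on a machine without PMem modules.
# $1 = directory that receives one output file per test
run_each_diag() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    for t in Quick Config Security FW; do
        # Options precede -diagnostic, as in the SYNOPSIS.
        "${IPMCTL:-ipmctl}" start -o nvmxml -diagnostic "$t" > "$out_dir/$t.xml"
    done
}
```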

LIMITATIONS

If a PMem module is unmanageable, the Quick test reports the reason, while the Config, Security, and FW tests skip it.

RETURN DATA

Each diagnostic generates one or more log messages. A successful test generates a single log message per PMem module indicating that no errors were found. A failed test might generate multiple log messages, each highlighting a specific error with all the relevant details. Each log contains the following information:

Test

The test name along with the overall execution result. One of:

•"Quick"

•"Config"

•"Security"

•"FW"

State

The collective result state for each test. One of:

•"Ok"

•"Warning"

•"Failed"

•"Aborted"

Message

The message indicates the status of the test. One of:

•"Ok"

•"Failed"

SubTestName

The subtest name for the given test.

Test Name    Valid SubTest Names
Quick        Manageability, Boot status, Health
Config       PMem module specs, Duplicate PMem module, System Capability, Namespace LSA, PCD
Security     Encryption status, Inconsistency
FW           FW Consistency, Viral Policy, Threshold check, System Time

State

The severity of the error for each sub-test displayed with SubTestName. One of:

•"Ok"

•"Warning"

•"Failed"

•"Aborted"
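
A script that consumes these State values may want to collapse several of them into one overall result. The sketch below shows one way to do that. The helper names and the severity ordering (Ok, Warning, Aborted, Failed) are assumptions made for illustration and are not defined by ipmctl.

```shell
# Rank a State value. The ordering is an assumption, not defined by ipmctl.
state_rank() {
    case $1 in
        Ok)      echo 0 ;;
        Warning) echo 1 ;;
        Aborted) echo 2 ;;
        Failed)  echo 3 ;;
        *)       echo 4 ;;   # unknown values rank highest so they are noticed
    esac
}

# Print the highest-ranked State among the arguments ("Ok" if none given).
worst_state() {
    worst=Ok
    for s in "$@"; do
        if [ "$(state_rank "$s")" -gt "$(state_rank "$worst")" ]; then
            worst=$s
        fi
    done
    echo "$worst"
}
```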

Events are generated as a result of invoking the Start Diagnostics command in order to analyze the Intel® Optane™ PMem module for potential issues.

Diagnostic events may fall into the following categories:

•Quick health diagnostic test event

•Platform configuration diagnostic test event

•Security diagnostic test event

•Firmware consistency and settings diagnostic test event

Each event includes the following pieces of information:

•The severity of the event that occurred. One of:

•Informational (Info)

•Warning (Warning)

•Error (Failed)

•Aborted (Aborted)

•A unique ID of the item (PMem module UUID, DimmID, NamespaceID, RegionID, etc.) the event refers to.

•A detailed description of the event in English.

The following sections list each of the possible events grouped by category of the event.

Quick Health Check Events

The quick health check diagnostic verifies that the Intel® Optane™ PMem module’s host mailboxes are accessible and that basic health indicators can be read and are currently reporting acceptable values.

Table 1. Quick Health Check Events

Code Severity Message
    Arguments
500 Info The quick health check succeeded.
501 Warning The quick health check detected that PMem module [1] is not manageable because subsystem vendor ID [2] is not supported. UID: [3]
    Arguments: 1. PMem module Handle; 2. Subsystem Vendor ID; 3. PMem module UID
502 Warning The quick health check detected that PMem module [1] is not manageable because subsystem device ID [2] is not supported. UID: [3]
    Arguments: 1. PMem module Handle; 2. Subsystem Device ID; 3. PMem module UID
503 Warning The quick health check detected that PMem module [1] is not manageable because firmware API version [2] is not supported. UID: [3]
    Arguments: 1. PMem module Handle; 2. FW API version; 3. PMem module UID
504 Warning The quick health check detected that PMem module [1] is reporting a bad health state [2]. UID: [3]
    Arguments: 1. PMem module Handle; 2. Actual Health State; 3. PMem module UID
505 Warning The quick health check detected that PMem module [1] is reporting a media temperature of [2] C which is above the alarm threshold [3] C. UID: [4]
    Arguments: 1. PMem module Handle; 2. Actual Media Temperature; 3. Media Temperature Threshold; 4. PMem module UID
506 Warning The quick health check detected that PMem module [1] is reporting percentage remaining at [2]% which is less than the alarm threshold [3]%. UID: [4]
    Arguments: 1. PMem module Handle; 2. Actual Percentage Remaining; 3. Percentage Remaining Threshold; 4. PMem module UID
507 Warning The quick health check detected that PMem module [1] is reporting reboot required. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
511 Warning The quick health check detected that PMem module [1] is reporting a controller temperature of [2] C which is above the alarm threshold [3] C. UID: [4]
    Arguments: 1. PMem module Handle; 2. Actual Controller Temperature; 3. Controller Temperature Threshold; 4. PMem module UID
513 Error The quick health check detected that the boot status register of PMem module [1] is not readable. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
514 Error The quick health check detected that the firmware on PMem module [1] is reporting that the media is not ready. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
515 Error The quick health check detected that the firmware on PMem module [1] is reporting an error in the media. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
519 Error The quick health check detected that PMem module [1] failed to initialize BIOS POST testing. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
520 Error The quick health check detected that the firmware on PMem module [1] has not initialized successfully. The last known Major:Minor Checkpoint is [2]. UID: [3]
    Arguments: 1. PMem module Handle; 2. Major checkpoint : Minor checkpoint in Boot Status Register; 3. PMem module UID
523 Error The quick health check detected that PMem module [1] is reporting a viral state. The PMem module is now read-only. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
529 Warning The quick health check detected that PMem module [1] is reporting that it has no package spares available. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
530 Info The quick health check detected that the firmware on PMem module [1] experienced an unsafe shutdown before its latest restart. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
533 Error The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is not ready. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
534 Error The quick health check detected that the firmware on PMem module [1] is reporting that the media is disabled. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
535 Error The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is disabled. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
536 Error The quick health check detected that the firmware on PMem module [1] failed to load successfully. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
538 Error PMem module [1] is reporting that the DDRT IO Init is not complete. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
539 Error PMem module [1] is reporting that the mailbox interface is not ready. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
540 Error An internal error caused the quick health check to abort on PMem module [1]. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
541 Error The quick health check detected that PMem module [1] is busy. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
542 Error The quick health check detected that the platform FW did not map a region to SPA on PMem module [1]. ACPI NFIT NVPMem module State Flags Error Bit 6 Set. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
543 Error The quick health check detected that PMem module [1] DDRT Training is not complete/failed. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
544 Error PMem module [1] is reporting that the DDRT IO Init is not started. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
545 Error The quick health check detected that the ROM on PMem module [1] has failed to complete initialization, last known Major:Minor Checkpoint is [2].
    Arguments: 1. PMem module Handle; 2. Major checkpoint : Minor checkpoint in Boot Status Register; 3. PMem module UID
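
To make the relationship between the Message and Arguments columns concrete, the sketch below fills the numbered placeholders of a message with argument values in order. ipmctl performs this substitution itself when it reports an event; fill_message is a hypothetical helper written only to illustrate how the table is read, and the sample values in the usage comment are made up.

```shell
# Hypothetical illustration: replace [1], [2], ... in a message template
# with the remaining arguments, in order. Only the first occurrence of
# each placeholder is replaced.
fill_message() {
    msg=$1
    shift
    i=1
    for arg in "$@"; do
        placeholder="[$i]"
        case $msg in
            *"$placeholder"*)
                # Quoting the placeholder makes the brackets literal.
                head=${msg%%"$placeholder"*}
                tail=${msg#*"$placeholder"}
                msg=$head$arg$tail
                ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$msg"
}

# Example with made-up values for event 541:
#   fill_message 'The quick health check detected that PMem module [1] is busy. UID: [2]' 0x0001 EXAMPLE-UID
```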

Platform Configuration Check Events

This diagnostic test group verifies that the BIOS platform configuration matches the installed hardware and the platform configuration conforms to best known practices.

Table 2. Platform Configuration Check Events

Code Severity Message
    Arguments
600 Info The platform configuration check succeeded.
601 Info The platform configuration check detected that there are no manageable PMem modules.
606 Info The platform configuration check detected that PMem module [1] is not configured. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
608 Error The platform configuration check detected [1] PMem modules installed on the platform with the same serial number [2].
    Arguments: 1. Number of PMem modules with duplicate serial numbers; 2. The duplicate serial number
609 Info The platform configuration check detected that PMem module [1] has a goal configuration that has not yet been applied. A system reboot is required for the new configuration to take effect. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
618 Error The platform configuration check detected that a PMem module with physical ID [1] is present in the system but failed to initialize. UID: [2]
    Arguments: 1. PMem module handle in the SMBIOS table; 2. PMem module UID
621 Error The platform configuration check detected PCD contains invalid data on PMem module [1]. UID: [2]
    Arguments: 1. PMem module Handle; 2. PMem module UID
622 Error The platform configuration check was unable to retrieve the namespace information.
623 Warning The platform configuration check detected that the BIOS settings do not currently allow memory provisioning from this software.
624 Error The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of errors in the goal data. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
    Arguments: 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status
625 Error The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because the system has insufficient resources. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
    Arguments: 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status
626 Error The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of a firmware error. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
    Arguments: 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status
627 Error The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] for an unknown reason. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6].
    Arguments: 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status
628 Error The platform configuration check detected that interleave set [1] is broken because the PMem modules were moved [2].
    Arguments: 1. Interleave set index ID; 2. List of moved PMem modules
629 Error The platform configuration check detected that the platform does not support ADR and therefore data integrity is not guaranteed on the PMem modules.
630 Error An internal error caused the platform configuration check to abort.
631 Error The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is missing from location (Socket-Die-iMC-Channel-Slot) [3].
    Arguments: 1. Interleave set index ID; 2. PMem module UID; 3. Location ID
632 Error The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is misplaced. It is currently in location (Socket-Die-iMC-Channel-Slot) [3] and should be moved to (Socket-Die-iMC-Channel-Slot) [4].
    Arguments: 1. Interleave set index ID; 2. PMem module UID; 3. Location ID; 4. Location ID
633 Error The platform configuration check detected that the BIOS could not fully map memory on PMem module [1] because of an error in current configuration. The detailed status is CCUR table status: [2] [3].
    Arguments: 1. PMem module Handle; 2. Current Configuration Status; 3. Text error code corresponding to the status code

Security Check Events

The security check diagnostic test group verifies that all Intel® Optane™ PMem modules have a consistent security state.

Table 3. Security Check Events

Code Severity Message
    Arguments
800 Info The security check succeeded.
801 Info The security check detected that there are no manageable PMem modules.
802 Warning The security check detected that security settings are inconsistent [1].
    Arguments: 1. A comma separated list of the number of PMem modules in each security state
804 Info The security check detected that security is not supported on all PMem modules.
805 Error An internal error caused the security check to abort.

Firmware Consistency and Settings Check Events

This test group verifies that all PMem modules of a given subsystem device ID have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.

Table 4. Firmware Consistency and Settings Check Events

Code Severity Message
    Arguments
900 Info The firmware consistency and settings check succeeded.
901 Info The firmware consistency and settings check detected that there are no manageable PMem modules.
902 Warning The firmware consistency and settings check detected that firmware version on PMem modules [1] with subsystem device ID [2] is non-optimal, preferred version is [3].
    Arguments: 1. Comma separated list of PMem module UIDs; 2. Subsystem device ID; 3. Preferred firmware version
903 Warning The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical media temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4]
    Arguments: 1. PMem module Handle; 2. Current media temperature threshold; 3. Fatal media temperature threshold; 4. PMem module UID
904 Warning The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical controller temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4]
    Arguments: 1. PMem module Handle; 2. Current controller temperature threshold; 3. Fatal controller temperature threshold; 4. PMem module UID
905 Warning The firmware consistency and settings check detected that PMem module [1] is reporting a percentage remaining of [2]% which is below the recommended threshold [3]%. UID: [4]
    Arguments: 1. PMem module Handle; 2. Current percentage remaining threshold; 3. Recommended percentage remaining threshold; 4. PMem module UID
906 Warning The firmware consistency and settings check detected that PMem modules have inconsistent viral policy settings.
910 Error An internal error caused the firmware consistency and settings check to abort.
911 Warning The firmware consistency and settings check detected that PMem modules have inconsistent first fast refresh settings.
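
The event codes in Tables 1 to 4 fall into one block of one hundred per diagnostic test. A script that processes events could use that to recover the test name from a code, as in the sketch below. event_category is a hypothetical helper, and treating every code in a block as belonging to that test is an assumption drawn from the codes listed in the tables, not a documented guarantee.

```shell
# Hypothetical helper: map an event code to the diagnostic test it belongs to.
# Assumes 5xx = Quick, 6xx = Config, 8xx = Security, 9xx = FW, based on the
# codes listed in Tables 1 to 4.
event_category() {
    case $1 in
        5[0-9][0-9]) echo "Quick" ;;
        6[0-9][0-9]) echo "Config" ;;
        8[0-9][0-9]) echo "Security" ;;
        9[0-9][0-9]) echo "FW" ;;
        *) echo "Unknown"; return 1 ;;
    esac
}
```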

2022-09-26 ipmctl