Server Products

Decoding System event log Correctable ECC logging

idata
Employee

Hello,

A system event was generated on my server, which uses an Intel® Server Board based on the Intel® Xeon® processor E5-2600.

Event information is as given below:

EvM Revision : 04

Sensor Type : Memory

Event Type : Sensor-specific Discrete

Event Direction : Assertion Event

Event Data : a5ff07

Description : Correctable ECC logging limit reached

I have gone through the document given here:

System Event Log Troubleshooting Guides for Intel® Server Boards: https://www.intel.com/content/www/us/en/support/articles/000006888/server-products.html

As per the "Correctable and uncorrectable ECC error sensor typical characteristics table" given in the above document, Event Data 3 would decode as

Event Data 3:

[7:5] – Socket ID: 0-3 = CPU1-4

[4:3] – Channel: 0-3 = Channel A, B, C, D for CPU1; Channel E, F, G, H for CPU2; Channel J, K, L, M for CPU3; Channel N, P, R, T for CPU4

[2:0] – DIMM: 0-2 = DIMM 1-3 on Channel

In my case, Event Data 3 is 07, which is 0000 0111 in binary. Could you please help me understand the DIMM location based on the above data?
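The bit layout quoted above can be applied mechanically. The following is a minimal Python sketch, not an Intel tool: the function names are hypothetical, and it assumes the Event Data string `a5ff07` is three hex bytes in the order Event Data 1, 2, 3. It covers only the values listed in the quoted table and returns `None` for anything outside them.

```python
# Channel letters per socket ID, copied from the table quoted above.
CHANNEL_LETTERS = {0: "ABCD", 1: "EFGH", 2: "JKLM", 3: "NPRT"}


def split_event_data(hex_string):
    """Split a 3-byte event data string such as 'a5ff07' into three integers.

    Assumption: the bytes appear in the order Event Data 1, 2, 3.
    """
    raw = bytes.fromhex(hex_string)
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes of event data")
    return raw[0], raw[1], raw[2]


def decode_event_data3(ed3):
    """Extract the socket, channel and DIMM fields from Event Data 3."""
    socket_id = (ed3 >> 5) & 0b111   # bits [7:5]
    channel_id = (ed3 >> 3) & 0b11   # bits [4:3]
    dimm_field = ed3 & 0b111         # bits [2:0]

    letters = CHANNEL_LETTERS.get(socket_id)  # None if socket ID is not 0-3
    return {
        "socket_id": socket_id,
        "cpu": f"CPU{socket_id + 1}" if letters else None,
        "channel": letters[channel_id] if letters else None,
        "dimm_field": dimm_field,
        # The table only documents DIMM values 0-2; anything else stays None.
        "dimm": f"DIMM {dimm_field + 1}" if dimm_field <= 2 else None,
    }


ed1, ed2, ed3 = split_event_data("a5ff07")
print(decode_event_data3(ed3))
```

For `07` the fields come out as socket ID 0 (CPU1), channel 0 (Channel A) and a DIMM field of 7. The DIMM field is outside the documented 0-2 range, so the table alone does not name a slot.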

Thank You.

idata
Employee

Hello chanchan,

Thank you for contacting Intel Customer Support.

Unfortunately, the information provided is not enough to locate the DIMM in question accurately. It can, however, be identified with the System Information Retrieval Utility: https://www.intel.com/content/www/us/en/support/server-products/000023940.html. It would also be very helpful if you could provide the specific board model, so that we have a better idea of the setup.

For further details and instructions, please refer to How to do Basic Diagnostics when Having Correctable or Uncorrectable ECC Memory-Related Errors: https://www.intel.com/content/www/us/en/support/articles/000024007/server-products.html

Kenneth R.
Intel Customer Support
idata
Employee

Hi Kenneth,

Thank you for the quick reply.

I only have the System Event Log with me; I no longer have access to the system, so I cannot run sysinfo as you suggested.

The System Event Log Troubleshooting Guides for Intel® Server Boards (https://www.intel.com/content/www/us/en/support/articles/000006888/server-products.html) clearly states: "In both Correctable and Uncorrectable ECC errors, the error can be narrowed down to particular DIMM(s) and the table below shows DIMM identification."

However, the document does not mention the case where Event Data 3 bits [2:0] are 111 (i.e. a decimal value of 7). What would this refer to? Please help.

Thanks

idata
Employee

Hello chanchan,

You are correct about what the document states. However, the table applies only to the specific bit values it lists, and those do not include the value 7.

For next steps and troubleshooting, you can refer to Table 73 of the same document and to the article mentioned earlier. According to the available documentation and previously reviewed cases, the BMC-generated Event Log should show the specific DIMM in question. This log can be obtained through sysinfo. Another option, depending on your OS, is to run diagnostic commands to check the DIMM status.
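As one illustration of an OS-level check, the sketch below reads the corrected and uncorrected error counters that the Linux EDAC subsystem exposes through sysfs. This is an assumption-laden example rather than an Intel-documented procedure: it assumes a Linux system with an EDAC memory-controller driver loaded, and the function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return corrected/uncorrected error counts per memory controller.

    Assumption: Linux with an EDAC driver loaded, which exposes
    mc0, mc1, ... directories containing ce_count and ue_count files.
    """
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc_dir / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

A non-zero `ce_count` on a controller points at correctable errors on the DIMMs behind it. Mapping a controller to a physical slot still depends on the board's documentation.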

 

 

Kenneth R
Intel Customer Support
idata
Employee

Hello chanchan,

This is a follow-up to our last communication. Have you had a chance to review the information provided? Is further assistance needed, or is it OK to set this thread as closed?

I'll stay tuned for your comments.

Ken