MCE Analysis Help

idata

Any ideas where I can get some help analyzing the MCEs below? I'm running an "unqualified" OS (CentOS) that my OEM vendor doesn't support, so I don't have the support pack tools that hook into the OS for analysis. They suggested I "ask Intel" to identify which part of the subsystem may be having the problem. The OEM vendor suggests this may not be strictly a hardware error, despite what the MCE says, and might actually be an interop problem between the OS and the hardware. These are IA64 systems, and I'm seeing the errors occur regularly on multiple machines.

Thanks in advance,

-Rob

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 12
CPU 0 BANK 8
MISC 14a6688000011080 ADDR 8e41d65c0
TIME 1340190061 Wed Jun 20 11:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 0
CPU 1 BANK 8
MISC 4702108000016000
TIME 1340154061 Wed Jun 20 01:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR
Transaction: Memory scrubbing error
STATUS 88000040000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 1
CPUID Vendor Intel Family 6 Model 44

HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 31
CPU 0 BANK 8
MISC d847010400011287 ADDR 87bc2aac0
TIME 1340215261 Wed Jun 20 18:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
Adolfo_S_Intel2

Please let me know which kernel version you are using.

Also let me know the processor model and, if possible, the system configuration itself (hardware components).

Please bear in mind that Intel desktop motherboards do not support Linux operating systems, so this type of issue should be checked on Linux forums.

idata

Adolfo,

Thanks for your response. I'm getting the exact kernel version of the CentOS build now and will reply with it shortly. The system is an HP ProLiant DL360cG7 with the Itanium IA64 (Westmere) processor, so it is not a desktop motherboard. Red Hat Linux is a supported OS on this box, and CentOS is essentially an open-source version of it, but not one that HP officially supports, which is why I'm posting here. I'll have more information on the exact configuration of the box shortly.

I did find the following document on IA64 MCE codes, and I am attempting to understand it now: http://www.intel.com/Assets/ja_JP/PDF/manual/253668.pdf
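
As a rough aid to reading that manual against the logs above, here is a minimal Python sketch that unpacks the two STATUS values by hand. It assumes the architectural IA32_MCi_STATUS layout described in the Intel SDM machine-check chapter (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 to 57, a corrected-error count in bits 52:38, and the MCA error code in bits 15:0) and the memory-controller compound code pattern 000F 0000 1MMM CCCC. The function names `decode_mci_status` and `decode_memory_controller` are made up for illustration and are not part of mcelog or any Intel tool.

```python
# Sketch only: field positions are taken from the architectural
# IA32_MCi_STATUS layout; helper names are hypothetical.

STATUS_FLAGS = {
    "VAL": 63,    # register contents valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}

# MMM field of the memory-controller compound error code
MEMORY_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    # Bits 52:38 hold a corrected-error count only when MCG_CAP bit 10
    # (CMCI_P) is set; MCGCAP 1c09 in the logs has that bit set.
    fields["corrected_count"] = (status >> 38) & 0x7FFF
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_code"] = status & 0xFFFF
    return fields


def decode_memory_controller(mca_code: int):
    """Decode 000F 0000 1MMM CCCC; return None for any other code family."""
    if (mca_code & 0xEF80) != 0x0080:  # ignore F (bit 12), MMM and CCCC
        return None
    mmm = (mca_code >> 4) & 0x7
    channel = mca_code & 0xF
    return {
        "transaction": MEMORY_TRANSACTIONS.get(mmm, "reserved"),
        "channel": "unspecified" if channel == 0xF else channel,
    }


if __name__ == "__main__":
    for raw in (0x8C0000400001009F, 0x88000040000200CF):
        fields = decode_mci_status(raw)
        print(hex(raw), fields, decode_memory_controller(fields["mca_code"]))
```

Under those assumptions, both values decode with UC clear (a corrected rather than uncorrected error), a corrected-error count of 1, and an unspecified channel, which agrees with the "Memory read error" and "Memory scrubbing error" lines mcelog printed.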

Regards,

-Rob
