Analyzing Semiconductor Failures – From Evidence to Root Cause

The culminating moment of triumph for any failure analysis project is when a defect is captured in all its glory – that instant where the noisy tangle of data and observations is crystallized into a coherent analysis by the addition of one crowning piece of evidence. While it would seem that the final photograph, showcasing the defect that lies at the root of a failure, would draw a failure analysis project to a close, there is often still work left to do; in many cases, analyzing semiconductor failures requires an even deeper examination of the defect, to determine its most likely origin.

When an analyst has finally uncovered the defect, there are still further questions to answer. Was the defect caused by an outside stimulus (e.g. mechanical or electrical overstress), or was it a pre-existing problem introduced during the manufacturing process? Was the failing device simply an unfortunate victim of statistics, struck by a random process anomaly, or could it be indicative of a more pervasive issue? Without properly identifying the source of the defect, determining what corrective actions (if any) are necessary to prevent recurrence is nearly impossible. As such, the analyst must not take the defect at face value, but must consider many other data points to determine the root cause of failure.

One of the first things to consider when analyzing a semiconductor failure is the history of a device. In some cases, a device’s history will immediately identify the root cause of failure – for example, the root cause of a device failing after being subjected to ESD testing is, shockingly, almost always ESD. Failure analysis in these cases is often performed to determine which structures on the device were affected, so that improvements to a design can be made if necessary. In other cases, the history of a device may be used to determine which failure mechanisms can safely be excluded (or at least de-emphasized). For example, if a device has been operating in the field for several years before its failure, chances are very good that a processing defect was not the root cause of the device’s untimely demise. Consider a plot showing the probability of a device failing with respect to its operating time; generally speaking, this plot will follow a “bathtub curve”, with higher failure rates at the very beginning and very end of a device’s operating lifespan and a relatively low failure rate between the two lifetime extremes. Early life failures are generally the result of processing issues; conversely, end-of-life failures often result from the wear and tear a device has been subjected to throughout the course of its life. If a device survives in the field for several years, it is extremely unlikely that a process defect was responsible for its failure, since most devices with such a defect would have failed far sooner; graphically speaking, process defects tend to fall at the left end of the bathtub curve.
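To make the shape of the bathtub curve concrete, here is a minimal sketch that models the failure rate as the sum of three competing contributions: a falling early-life rate, a constant random rate, and a rising wear-out rate. The use of Weibull hazard functions is a common modeling choice rather than anything prescribed here, and the function names and every parameter value are assumptions chosen purely for illustration; they do not describe any real device or process.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure rate) at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Illustrative bathtub curve built from three competing failure rates.

    All parameter values below are hypothetical, chosen only to produce
    the characteristic shape.
    """
    # Shape < 1: rate falls with time (early-life, process-related failures).
    early_life = weibull_hazard(t_years, shape=0.5, scale=100.0)
    # Constant rate: random failures during useful life.
    random_rate = 0.005
    # Shape > 1: rate rises with time (wear-out at end of life).
    wear_out = weibull_hazard(t_years, shape=5.0, scale=20.0)
    return early_life + random_rate + wear_out


if __name__ == "__main__":
    # Sample the curve early in life, in mid-life, and near end of life.
    for t in (0.1, 5.0, 25.0):
        print(f"t = {t:5.1f} years  hazard = {bathtub_hazard(t):.4f}")
```

With these assumed parameters, the rate is high shortly after the start of operation, drops to a minimum in the middle years, and climbs again as wear-out takes over, which is the pattern the history-based reasoning above relies on.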

The history of a device is not the only thing that must be taken into account when analyzing a semiconductor failure; the location of a defect on the device can also offer clues about the nature of the device’s failure. Damage consistent with electrical overstress can be interpreted much differently depending on where it falls on the die; blown ESD protection diodes may imply typical electrical overstress (e.g. a high voltage transient on an input pin) while damage in the device core, with no noticeable effect on the protection diodes, may be indicative of an inherent weakness in a device’s process. Defects occurring in high-field areas, like at the edges of diffusions or between nodes with high potential differences, are interpreted differently than those in areas without the added stress of a high e-field (a defect within a metal trace, for example).

While many defects require in-depth analysis to determine their root cause, other defects may speak for themselves. Excessive metal causing a short at the lower layers of a die can only be a processing error; a charred and blackened logic IC with a hole blown clear through the plastic encapsulant, on the other hand, is very unlikely to result from a processing defect, unless that processing defect was the inclusion of a low-order explosive instead of a silicon circuit – an improbable occurrence, indeed. While these examples are obviously extreme, there are many types of defects – fused bond wires, scratches on the die, and so on – that can be immediately correlated with a failure mechanism.

While it’s certainly true that the moment when an analyst can capture a defect with the perfect image may be the most exciting in a failure analysis project, the effort does not end there; in order to properly identify any corrective actions that must be taken, the defect must be correlated to its root cause. The value of a good failure analysis lab is the experience that enables accurate correlation of defects and causes – the ability to synthesize all the evidence gathered into a coherent theory about the life and death of a device.
