Reliability management

Ö Hallberg, Hallberg Independent Research; http://hir.nu

Since 1966 I have been working with electronic standards, quality and reliability management from components to systems within the telecommunication area. It involved problem solving, vendor assessments, product qualification and ISO certification, both to ISO 9000 and later ISO 14000 for environmental management certification within and outside the organization.

To keep up with colleagues worldwide, I gave presentations at conferences and published in scientific journals over the years, both to spread knowledge from my own experience and to collect good ideas from others. On this page I give a short summary of some of those publications. If you are interested in the details, you may also download scanned copies for further reading.

I will take the papers one by one in chronological order, starting in 1975. I hope you will enjoy the ride, which ends in 2000. Since then, all my published papers have dealt with human reliability, i.e. public health and how it may be influenced by the environment surrounding us all. But that is another story.

1975

At a SINTOM seminar in Copenhagen in April 1975, I presented a study of an electronic component, a relay driver, that we were able to destroy within 15 minutes although we kept the currents within the specified limits. The bond wires were made of aluminum, and I was curious to know whether they were heated by their own resistance at high currents that were still within limits. Calculations indicated that this was the case, and a wire that is heated also becomes longer. If the current is pulsed, the wire therefore starts flexing. It turned out that it was possible to break those wires after some 30000 cycles of current pulses, which could not be accepted for telecommunication use. The process was actually filmed in an electron microscope, and the manufacturer had to redesign the component with double wires to avoid this risk. PDF-copy, 619 kB

1976

In October 1976 I gave a presentation on reliability aspects of complex microcircuits (at that time...). We had been testing lots of new microcircuits (logic, memories etc.) to be used in the new AXE telecommunication system that was being developed at the time. The presentation covered both theoretical and practical aspects of reliability, and several examples of both early failures and wear-out failures were given. The paper is in Swedish, but the main points can be followed, and certainly the photographic examples. The download is over 11 MB. PDF-copy (11 MB!)

1977

This year I submitted my first manuscript to a scientific journal. I had been working with an HP computer, using HPL and HP-basic, to model field failure rates for the case where the produced units contain subpopulations of weak parts that are expected to fail quite soon and hopefully be replaced by parts of better quality. The model worked fine, and the data actually indicated that those 'early failures' were likely to occur at quite low temperatures as well, and that the temperature dependence of such parts was probably not very strong. Later on this was also noticed by other researchers. This basic piece of work formed the platform for all my later model development, where the log-normal function has a predominant place, although the Weibull, exponential and normal distributions were also used from time to time. PDF-copy 208 kB.
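The idea of a weak subpopulation mixed into a stronger main population can be illustrated with a small sketch. It is not the original HPL/HP-basic program; the function names, the use of two log-normal life distributions and all numbers are assumptions chosen only to show how such a mixture gives a field failure rate that is high early and falls once the weak parts have failed.

```python
import math


def lognormal_cdf(t, median, sigma):
    """Fraction failed by time t for a log-normal life distribution."""
    if t <= 0:
        return 0.0
    z = math.log(t / median) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_cdf(t, weak_fraction, weak, main):
    """Fraction failed by time t for a population with a weak subpopulation.

    weak and main are (median, sigma) pairs; all values are hypothetical.
    """
    return (weak_fraction * lognormal_cdf(t, *weak)
            + (1.0 - weak_fraction) * lognormal_cdf(t, *main))


def average_failure_rate(t1, t2, weak_fraction, weak, main):
    """Average failure rate between t1 and t2 among units surviving at t1."""
    f1 = mixture_cdf(t1, weak_fraction, weak, main)
    f2 = mixture_cdf(t2, weak_fraction, weak, main)
    return (f2 - f1) / ((1.0 - f1) * (t2 - t1))


if __name__ == "__main__":
    # Assumed example: 2 % weak parts with a median life of 1000 h,
    # and a main population with a median life of 1e7 h.
    weak, main = (1.0e3, 1.5), (1.0e7, 1.5)
    for start in (100, 1_000, 10_000, 100_000):
        rate = average_failure_rate(start, 2 * start, 0.02, weak, main)
        print(f"{start:>7} h: {rate:.3e} failures per unit-hour")
```

With parameters like these, the failure rate in the first hundreds of hours is dominated by the weak parts and is orders of magnitude higher than the rate seen after they have been weeded out.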

1979

In the 70's, plastic encapsulated microcircuits were not approved for telecommunication use in our company. But there was pressure from marketing, suppliers and designers to use those potentially cheaper products, even though they often failed in moisture tests. We performed tests at different temperature/humidity conditions and tried to figure out how long they would last at normal telecom ambients. It turned out that the plastic compound had to be very clean and must not contain chlorine, for example. Otherwise the aluminum pattern would corrode, especially on the open, non-glassivated areas where the gold bond wires were attached. The two metals could also form a brittle compound called purple plague if used at higher temperatures. At this conference I presented a simple model for estimating the acceleration factor, relative to normal ambient, for different temperature-humidity conditions. PDF-copy 350 kB.

1981

In the beginning of the 80's we were busy testing the new memories needed for the new AXE system being developed for telephone exchanges. We started with the 1k memory (1103 by Intel) containing 1024 bits, but quite soon we had to test and qualify the 4k memories. One very important characteristic of a memory is that the information is stored as a charge in a small cell under a thin oxide layer. If there is any form of oxide breakdown in that cell, the charge and the information are lost and anything can happen. So we had to run life tests, on many parts, at elevated temperature and highest nominal voltage for thousands of hours to check whether the quality was good enough. We ended up with hundreds of tested parts that had not failed but could not be used in production either. So we decided to really kill those devices to see how much voltage stress they could stand before the oxide broke down. We designed a new type of step-stress testing (SST) where the product was subjected to an increased voltage for a short pulse, then function tested again at normal voltage, and so on. The experiment went very well: memory types that had performed well in the 8000 h life test also behaved well in this 20-minute test, while types that were bad in the life test also behaved badly in the SST.

I submitted this to the International Reliability Physics Symposium 1981 and it was accepted at once. At that time I was the only Swede who had presented anything at this prestigious conference, and I was very proud of that. Whether there has been anyone since then, I don't know. PDF-copy 246 kB.

1986

From 1981 to 1987 I worked as quality manager in a component manufacturing company where we developed and produced integrated circuits for the telecom market. One speciality was the manufacturing of ICs for high voltages, like the 48 V used on telephone lines. We sometimes ran into a sophisticated problem caused by charge spreading on top of the protective glassivation of the chip. Electrons could slowly propagate over the chip and, at some critical places, act like a parasitic gate, causing leakage currents to begin to flow. This could take a long time, months or even years, before the critical charge had been collected. We studied this phenomenon very carefully, and in one of the Master's theses I supervised we investigated the possibility of using it as an on-chip moisture sensor. The student did a very good job (years later she became production manager), and the work was submitted to and accepted for presentation at an RADC/NBS workshop in Washington. PDF-copy 336 kB.

1987

This year I presented a paper on how it may be possible to extract the inherent reliability function from field data collected from growing populations. It was given at a SINTOM conference held in Visby, Sweden. PC computing power was very low at that time, and the examples I presented took a long time to calculate. What took 20 minutes then takes less than 1 second today! Instead of having the computer find the function that best fits the data, I presented a cut-and-try routine. Apart from analysing component reliability, I also gave an example from traffic safety where I could estimate future numbers of deaths. This work was the real base for the more professional applications that were developed in later years. PDF-copy 308 kB.
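The difficulty with a growing population is that units in the field have different ages, so the observed returns are a sum over all shipment batches. The sketch below shows one way this could be set up; it is not the routine from the paper. The log-normal life distribution, the function names and the small grid search that stands in for manual cut and try are all assumptions made for illustration.

```python
import math


def lognormal_cdf(t, median, sigma):
    """Fraction failed by age t for a log-normal life distribution."""
    if t <= 0:
        return 0.0
    z = math.log(t / median) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_cumulative_failures(shipments, median, sigma):
    """Expected cumulative failures at the end of each period.

    shipments[i] is the number of units put into service at the start of
    period i, so at the end of period k that batch has age k + 1 - i.
    """
    totals = []
    for now in range(1, len(shipments) + 1):
        total = 0.0
        for start, units in enumerate(shipments[:now]):
            total += units * lognormal_cdf(now - start, median, sigma)
        totals.append(total)
    return totals


def fit_by_cut_and_try(shipments, observed, medians, sigmas):
    """Try every (median, sigma) candidate and keep the best least-squares fit.

    An automated stand-in for manual trial and error; the candidate lists
    are supplied by the user.
    """
    best = None
    for median in medians:
        for sigma in sigmas:
            predicted = expected_cumulative_failures(shipments, median, sigma)
            error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
            if best is None or error < best[0]:
                best = (error, median, sigma)
    return best[1], best[2]
```

Given the shipment history and the observed cumulative returns, the candidate that reproduces the returns best is taken as the estimate of the inherent life distribution, which can then be used to project future failures.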

1991

Stew Peck is a legendary person in the history of IRPS. In 1986 he published a paper on acceleration factors for temperature/humidity tests, which made me contact him. Over the years we both collected lots of information on the subject, and finally we together published an updated version of his earlier model with somewhat fine-tuned parameters based on all the data we had collected. PDF-copy 449 kB.
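A temperature/humidity acceleration factor of the kind discussed here is usually written as a power law in relative humidity multiplied by an Arrhenius term in temperature. The sketch below assumes that form. The function name and the default values of the humidity exponent and the activation energy are assumptions for illustration, not figures taken from this page; use the parameters from the paper itself for real work.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def peck_acceleration_factor(rh_test, temp_test_c, rh_use, temp_use_c,
                             n=3.0, ea_ev=0.9):
    """Acceleration of a humidity test relative to use conditions.

    Assumed model: time to failure ~ RH**(-n) * exp(Ea / (k * T)).
    rh_* are relative humidities in percent, temp_*_c are in Celsius.
    n and ea_ev are illustrative defaults, not values from the source page.
    """
    t_test = temp_test_c + 273.15
    t_use = temp_use_c + 273.15
    humidity_term = (rh_test / rh_use) ** n
    temperature_term = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_test))
    return humidity_term * temperature_term


if __name__ == "__main__":
    # Example: an 85 C / 85 % RH test compared with 40 C / 50 % RH in use.
    factor = peck_acceleration_factor(85.0, 85.0, 50.0, 40.0)
    print(f"One test hour corresponds to about {factor:.0f} use hours")
```

Multiplying the test duration by this factor gives the equivalent time at use conditions, under the assumption that the same failure mechanism is active in both environments.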

1992

I was invited to present my study with Stew Peck (1991) at a European reliability conference, ESREF, in 1992. After that I also became a member of the steering committee and followed the conference closely for a number of years. PDF-copy 1.5 MB.

1994

At this time I had been working with hardware reliability management for some years. We worked with board and system testing of AXE hardware and also followed the field performance through our sales and field return databases all over the world. It turned out that this was of general interest as well, so the working group decided to submit a report on our methods to a quality and reliability journal. PDF-copy 656 kB.

1995

I was invited by ESREF to give a review presentation on Facts and fictions about the reliability of electronics. The presentation covered field failure analysis by different methods, reliability prediction, constant or non-constant failure rate by time, effect of burn-in, effect of complexity, hermetic parts vs. plastic parts, temperature dependency and finally reliability engineering in the future. PDF-copy 1068 kB.

1996

At a reliability conference held in London, The Reliability Challenge, I gave a presentation on the subject: Qualification of Components and Equipment in a new era. The presentation addressed a lot of questions related to the subject and gave a review of the development in standards over several decades. Several important conclusions are listed at the end to give guidance towards a failure-free era. PDF-copy 59 kB.

1997

In this Master's thesis work, the student Mattias Nilsson analysed production test data and field return data. A new model for electronic reliability prediction was proposed. It used statistics from board and system tests performed at the production plant, together with the number of conductive layers on the board. The work was presented in France at the ESREF conference. PDF-copy 27 kB.

In another Master's thesis, Patric Oscarsson developed a fully working and sophisticated Excel application for the analysis of field reliability data, and he presented the work at the ESREL conference in Lisbon in 1997. This application later became the base for all my research in different disciplines. In this presentation we also analysed traffic deaths in Sweden and used the result to project future numbers of deaths. Since the method seems to give projections that have fitted real data very well up to 2007, the Swedish Road Authority has also become interested, and it was presented to the management at their headquarters in August 2008. The application took 1.5 years to bring to its current state, and the full version is not for sale. DOC-copy 64 kB.

1998

The concept of failure free electronics caught the interest of the London seminar Reliability Challenge also in 1998. The presentation is again a review of methods and results and it shows that there is really a possibility to produce even complex products having very low failure rates. That is if a number of basic rules are taken into consideration. Failure rate is not an inherent constant but instead something you actually could live without, if you are lucky. PDF-copy 164 kB.

1999

A time dependent field return model for telecommunication hardware, Hallberg Ö and Löfberg J.

There appear to be two reasons for reliability improvement over time: one is field screening of marginal parts, and the other is product and process improvements over the production time. By taking the time trends into account, improved accuracy in reliability prediction can be obtained, especially for the initial few years. Reliability prediction can be based on test yield information as an alternative to the component part count method. This work was presented at the ASME conference in Maui, 1999. PDF-copy 819 kB.

Later the same year we were invited to repeat this presentation at an IEEE/AST conference in Boston. Which I did, of course. Never say no to giving an invited paper.

2000

The question of how much testing is necessary was addressed at an IEEE/AST conference in Denver, Colorado, in 2000. John Larsson presented a paper that we co-authored. A general test philosophy is outlined to support a profit-driven product improvement process. The test philosophy is applied to volume production of telecommunication hardware but may be applied in any production environment. A simple algorithm is given to identify the failure level below which sample testing may become economically more justified than 100% screening. Finally, a procedure to compress and store information on batch-related parametric distributions is proposed to support analysis and corrective actions following field returns. PDF-copy 40 kB.
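The trade-off between sample testing and 100% screening can be illustrated with a simple break-even calculation. This is not the algorithm from the paper, only a generic sketch: it assumes a fixed test cost per unit, a fixed cost for each defective unit that escapes to the field, and a test that catches every defect. The function names and cost figures are hypothetical.

```python
def break_even_failure_level(test_cost_per_unit, cost_per_escaped_failure):
    """Fraction defective at which 100% screening costs as much as it saves.

    Screening every unit costs test_cost_per_unit per unit and saves
    failure_level * cost_per_escaped_failure per unit, so the two are
    equal when failure_level = test_cost / escape_cost.
    """
    if test_cost_per_unit < 0 or cost_per_escaped_failure <= 0:
        raise ValueError("costs must be non-negative and escape cost positive")
    return min(1.0, test_cost_per_unit / cost_per_escaped_failure)


def recommend_strategy(failure_level, test_cost_per_unit,
                       cost_per_escaped_failure):
    """Pick a test strategy from the observed fraction defective."""
    threshold = break_even_failure_level(test_cost_per_unit,
                                         cost_per_escaped_failure)
    if failure_level > threshold:
        return "100% screening"
    return "sample testing"


if __name__ == "__main__":
    # Assumed costs: 2 per tested unit, 400 per failure reaching the field.
    for level in (0.001, 0.005, 0.02):
        print(level, recommend_strategy(level, 2.0, 400.0))
```

Below the break-even level the screening cost exceeds the expected saving, so sampling to monitor the process becomes the more economical choice under these assumptions.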