Debugging Temperature Related Issues??

Status
Not open for further replies.

stolzie

New Member
Hi Guys,

I am after some advice and suggestions of what can be done. Here is the story:

We have a circuit that has been in production for a number of years, and we are seeing both older and newer units fail at various temperatures (mainly on the hot side). I wouldn't be too concerned if a particular batch had failed, but failures in units from several different production runs are a bit worrying.

There are five ICs in this circuit, populating an area of 72mm x 72mm (~3" x 3"). Two are rated 0 to 70C, one is rated -40 to 85C and the rest are rated -40 to 125C. There are also two crystals, serving two of the ICs.

The suspect units are failing from about 37C to 60C, but the majority are at 50C.

In order to test our boards we are heating the operating board with a heat gun at certain ranges to maintain a constant temperature over the board, as we can't afford an oven for testing. The temperature is measured with the probe that came with the multimeter, placed directly over a particular IC while heating.

However, as all the ICs are close together, we are not sure how to work out which chip or crystal is failing. We have hit a few with freeze spray and the board has come back, but in the process I believe we have also cooled down the other chips. We are led to believe it is a certain chip but are not 100% sure.

We have thought about boxing in the ICs so we can heat particular sections of the board and work it out.

Is anybody able to share their experiences with this kind of failure and how you managed to track down the problem?

Thanks in advance for your input.

Cheers

Stolzie
 
hi,
One crude method I have used is to touch the suspect IC/component with the tip of a soldering iron for a couple of seconds.
Use the extension tube on the freezer spray and hold a piece of cardboard as a shield to prevent cooling of the other ICs.
 
Are these units failing when installed or just on test? A heat gun will not give even heating and may be the actual problem.

Can you tell us the number of the suspect chip(s)?

Mike.
 
Hi Mike,

The units are failing out in the field; unfortunately the install location is not local to us. We are trying to replicate the problem here at the workbench so we can work out what needs to be repaired or replaced.

The suspect chip that we are looking at is the RTL8019AS full-duplex Ethernet controller.

Thanks

Stolzie
 
You are a victim of the idiots that plagued me for 33 years in the industry. Here is wisdom: an IC does not know, nor care, what the ambient temperature is, yet IC maker idiots spec products with ambient limits. 0 to 70 and -40 to 85 are both ambient limits (labeled TA on the data sheet). -40 to +125 is a JUNCTION temperature range limit (called TJ). The junction temp is the only thing that affects an IC.

The suspect units are failing from about 37C to 60C, but the majority are at 50C.

I'll wager that's the measured ambient. Any clue what the junction temps are?

All that said, you must calculate TJ to know if it exceeds about 130C, and if it does, that shortens the life of the ICs. Calculating it requires the skills of an endangered species called an analog power engineer. I am one such, but my eyes are dim and my opinions no longer found worthy these days... I'm sure there is a good SPICE model somewhere that will run a marvelous simulation for you and let you adjust the parameters until it gives you the answer you want.

In reality: thermal design is critically difficult. An IC's temp is influenced by the PCB design, placement of nearby heat generating devices, and airflow or lack of as well as the characteristics of the package the device under test is in.
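
For anyone wanting a first-order number before digging deeper, the usual estimate is TJ = TA + P x thetaJA. Below is a minimal sketch of that arithmetic. The power and thetaJA figures in the example are made-up placeholders, not RTL8019AS data sheet values, and thetaJA itself shifts with PCB copper and airflow for the reasons given above, so treat the result as a rough guide only.

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """First-order junction temperature estimate: TJ = TA + P * thetaJA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_ambient_c(tj_max_c, power_w, theta_ja_c_per_w):
    """Highest ambient that keeps the junction at or below tj_max_c."""
    return tj_max_c - power_w * theta_ja_c_per_w


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from any data sheet).
    power_w = 0.5          # assumed dissipation in watts
    theta_ja = 60.0        # assumed junction-to-ambient resistance, C/W
    print(junction_temp_c(50.0, power_w, theta_ja))
    print(max_ambient_c(125.0, power_w, theta_ja))
```

Plugging in the real dissipation and the package thetaJA from the data sheet (if it is published) gives a quick sanity check on whether 50C ambient is anywhere near the junction limit.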

It's honestly too bad that people like me don't seem to be needed anymore. Fixing problems like this (or designing them out in the prototype stage) was what I did best.

good luck
 
HCFC 141-B boils at 32 C under atmospheric pressure.
It can be found at automotive parts stores as a flush for air conditioning systems.
The advantage to this is that a liquid can be controlled better than a spray.
There are others that might work, like acetone.
 
Randomly hitting it hot and cold is going to introduce an effect called thermal shock which can crack the die away from the lead frame or even crack the plastic package itself. If you are getting random but frequent failures on units running long and hot, that is an unlikely failure mode.

I might suggest investing in a decap and failure analysis (F/A) on some of the blown devices. Don't immediately rule out that the IC vendors could be shipping bad parts, because they do that frequently (I should know, I spent 20 years fixing what our bad parts did to customers). Most field failures are not bad parts, but many are.
 
Thanks for the insight bountyhunter! The temperatures measured are all ambient. We haven't ruled out bad parts but it does seem strange. Will also have a look at the junction temperatures of the ICs.

Cheers

Stolzie
 
Temperature

Doesn't sound like the parts are failing or you would know which was bad. It sounds more like something drifts with temperature. When you mention crystals it sounds like frequency is important. Maybe you could expand on the failure mode and post a schematic?
 
Incredibly, I couldn't find where the RTL8019AS spec sheet lists power consumption, but I suspect such a complex chip may use a significant amount of power, which would cause self-heating of the device above ambient. If you notice, in section 7.2 the temperature is stated as Tc (case temperature), not Ta (ambient temperature). This implies some sort of heat sink or thermal connection to maintain the chip at that temperature.

You might check how hot the chips are running above the ambient temperature. They may well be exceeding their case rating at an ambient of 50-60C. An IR temperature measurement gun can tell you the temperature of each chip.
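
Once you have IR readings taken on the bench, a rough way to use them is to work out each chip's rise above the bench ambient and add that rise to the ambient seen in the field. Here is a minimal sketch of that bookkeeping. It assumes the rise above ambient stays roughly constant as ambient changes, and the chip names, readings and limits in the example are hypothetical.

```python
def projected_case_temps(readings_c, bench_ambient_c, field_ambient_c):
    """Project bench case temperatures to a different ambient.

    Assumes each chip's rise above ambient is the same at both ambients.
    """
    return {
        chip: field_ambient_c + (temp_c - bench_ambient_c)
        for chip, temp_c in readings_c.items()
    }


def over_limit(projected_c, limits_c):
    """Return the chips whose projected case temperature exceeds the limit."""
    return sorted(
        chip for chip, temp_c in projected_c.items() if temp_c > limits_c[chip]
    )


if __name__ == "__main__":
    # Hypothetical readings and limits, for illustration only.
    readings = {"U1": 45.0, "U2": 30.0}
    limits = {"U1": 70.0, "U2": 70.0}
    projected = projected_case_temps(readings, bench_ambient_c=25.0,
                                     field_ambient_c=55.0)
    print(projected)
    print(over_limit(projected, limits))
```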
 
You could have a design with a marginal timing issue. The problem may only creep up as the temp does. The problem may be more frequent depending on IC lot numbers. I would suspect timing problem and start reviewing all set up and hold times as well as clock signal quality.
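
A setup and hold review boils down to simple slack arithmetic. The sketch below shows it for a plain synchronous path between two devices; an asynchronous bus interface would need the strobe timings from both data sheets instead. All parameter names and the example numbers are assumptions for illustration. The point is that worst-case delays generally grow with temperature, so a slack that is barely positive at room temperature can go negative when hot.

```python
def setup_slack_ns(period_ns, t_co_max_ns, t_prop_max_ns, t_setup_ns):
    """Setup slack: time left in the clock period after the slowest data path."""
    return period_ns - (t_co_max_ns + t_prop_max_ns + t_setup_ns)


def hold_slack_ns(t_co_min_ns, t_prop_min_ns, t_hold_ns):
    """Hold slack: how long data stays valid beyond the receiver's hold time."""
    return (t_co_min_ns + t_prop_min_ns) - t_hold_ns


if __name__ == "__main__":
    # Hypothetical timing numbers in nanoseconds, not from any data sheet.
    print(setup_slack_ns(period_ns=50, t_co_max_ns=20,
                         t_prop_max_ns=5, t_setup_ns=10))
    print(hold_slack_ns(t_co_min_ns=2, t_prop_min_ns=1, t_hold_ns=5))
```

A negative result from either function marks a path that violates its timing under the numbers supplied.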
 
You're in for some serious troubleshooting. I've had luck in the past using voltage and/or frequency margining to show up timing issues. Try running the supplies at min. and max. If that doesn't get it to fail, you can try running the frequency up a bit. Either way, I think you're in for a timing problem, not a part problem.
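
To keep margining runs organised, it can help to write out the test corners up front and log pass/fail against each one. This is a minimal sketch that builds such a list from a nominal supply, a tolerance and a few clock step-ups. The 5 V, 5% and 20 MHz figures in the example are placeholder assumptions; use the limits from your own data sheets.

```python
from itertools import product


def margin_corners(nominal_v, v_tol, nominal_hz, f_steps_pct):
    """List every (supply voltage, clock frequency) corner to test.

    v_tol is a fraction (0.05 means +/-5%); f_steps_pct are percentage
    increases applied to the nominal clock.
    """
    voltages = [
        round(nominal_v * (1 - v_tol), 3),
        nominal_v,
        round(nominal_v * (1 + v_tol), 3),
    ]
    freqs = [nominal_hz * (1 + pct / 100) for pct in f_steps_pct]
    return list(product(voltages, freqs))


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for volts, hertz in margin_corners(5.0, 0.05, 20e6, [0, 2, 5]):
        print(f"{volts:.2f} V at {hertz / 1e6:.2f} MHz")
```

Repeating the same matrix at room temperature and at the failing temperature shows whether heat is eating into a margin that was thin to begin with.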
 
Thanks everyone for your suggestions and input. We have purchased an IR temperature gun and will be using it for testing today to try to get some accurate temperature measurements.

We haven't ruled out frequency drift either, and will be modifying some firmware and setting up pin toggles for frequency measurements.

Thanks again, will keep you guys posted.
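
For the pin-toggle measurements, drift is easiest to compare when expressed in parts per million against the expected toggle rate. Here is a minimal sketch of that conversion plus a helper that picks out the worst reading from a temperature sweep. The sample temperatures and frequencies are invented for illustration.

```python
def ppm_error(measured_hz, expected_hz):
    """Frequency error in parts per million relative to the expected value."""
    return (measured_hz - expected_hz) / expected_hz * 1e6


def worst_drift(samples, expected_hz):
    """Return (temperature_c, ppm) for the sample with the largest |ppm| error.

    samples is a list of (temperature_c, measured_hz) pairs.
    """
    errors = [(temp_c, ppm_error(hz, expected_hz)) for temp_c, hz in samples]
    return max(errors, key=lambda item: abs(item[1]))


if __name__ == "__main__":
    # Invented readings of a nominal 1 kHz toggle, for illustration only.
    sweep = [(25, 1000.0), (50, 1000.05), (60, 999.9)]
    print(worst_drift(sweep, expected_hz=1000.0))
```

The result can then be compared with the crystal's rated tolerance over temperature.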

Stolzie
 
You need to verify setup and hold times of signals like reads and writes to name just a few.
 