What are possible reasons for NVM data corruption during embedded controller shutdown?

Status
Not open for further replies.

jani12

Member
Consider a typical autonomous embedded Steering Control Module. This controller is mounted in a commercial truck and assists in keeping the vehicle in its lane.

During the transition to sleep mode, real camera calibration data is written from RAM to NVM, and the data written to NVM ends up corrupted. What are possible reasons for this corruption?

  1. Brown out condition
  2. Reset
  3. Power loss
  4. Any other reason that causes NVM corruption while the controller is transitioning to sleep mode.


How can I find out whether the NVM data is corrupted? After the controller wakes up, does it calculate a checksum over the real calibration values in the NVM and, if the calculated checksum doesn't match the checksum stored in NVM, conclude that the NVM is corrupted?
 
That is one way.

It can be a good idea to write a checksum or serial number, or both, at the start and end of the data that is being written. If the checksums or serial numbers don't match, there has been corruption.
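A minimal host-side sketch of that idea, in Python for illustration only (real firmware would more likely be C). The record layout, the field sizes, and the names `pack_record` and `unpack_record` are assumptions, not anything from a particular ECU. The same sequence number is written before and after the payload, and a CRC-32 covers the header and payload, so a write that is interrupted part-way leaves either mismatched sequence numbers or a bad CRC.

```python
import struct
import zlib

HEADER = struct.Struct("<I")    # sequence number written first
TRAILER = struct.Struct("<II")  # same sequence number again, then CRC-32


def pack_record(seq: int, payload: bytes) -> bytes:
    """Build the byte image that would be written to NVM."""
    head = HEADER.pack(seq)
    crc = zlib.crc32(head + payload)
    return head + payload + TRAILER.pack(seq, crc)


def unpack_record(blob: bytes):
    """Return (seq, payload), or None if the record looks corrupt."""
    if len(blob) < HEADER.size + TRAILER.size:
        return None
    (seq_start,) = HEADER.unpack_from(blob, 0)
    seq_end, crc = TRAILER.unpack_from(blob, len(blob) - TRAILER.size)
    payload = blob[HEADER.size:len(blob) - TRAILER.size]
    if seq_start != seq_end:
        return None  # write stopped before the trailer was updated
    if zlib.crc32(blob[:HEADER.size] + payload) != crc:
        return None  # bits changed somewhere in header or payload
    return seq_start, payload
```

On wake-up the module would call `unpack_record` on what it reads back and fall back to defaults or an older copy if the result is `None`.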

Depending on your NVM specifications, writing to NVM each time the module shuts down may be too often and the NVM could wear out before the vehicle. I would suggest only writing data if it has changed significantly.
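To put rough numbers on the wear-out concern, here is a small sketch. The endurance figure and the write rates are made-up assumptions for illustration; the real values come from the NVM datasheet and the vehicle's duty cycle, and wear levelling is not modelled.

```python
def years_until_wearout(endurance_cycles: int, writes_per_day: float) -> float:
    """Years of service before the rated write endurance is used up."""
    if writes_per_day <= 0:
        return float("inf")  # never written, never worn out by writes
    return endurance_cycles / writes_per_day / 365.0


# Assumed part rated for 100,000 write cycles per cell.
normal_use = years_until_wearout(100_000, 20)     # 20 shutdown writes a day
retriggering = years_until_wearout(100_000, 200)  # repeated sleep-wake cycles
```

With these assumed figures the first case lasts about 13.7 years and the second about 1.4 years, which is why the number of writes per day matters more than it first appears.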

Vehicle systems can wake up just after they have gone to sleep. That can result in multiple sleep-wake cycles, and it can be a problem if the NVM is being written to just as the module wakes up.
 
Most vehicle modules do not write short-term calibration data to NVM; they just keep it in RAM.
That's why you can clear stray settings from many vehicles by disconnecting the battery for a while!

The basic calibration data, and possibly separate long-term verified calibration adjustments, are stored so they can be restored after a power failure. But only save calibration changes once they have been, e.g., consistent and stable for several days, and are different enough from the existing data that the update is actually needed.
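One way to express that "stable and different enough" rule is sketched below. The class name `CommitGate`, the window length, and both thresholds are hypothetical; a real module would tune them per signal and would probably work on days of data rather than three samples.

```python
from collections import deque


class CommitGate:
    """Decide whether a live calibration value is worth saving to NVM."""

    def __init__(self, window: int, stable_band: float, min_change: float):
        self.samples = deque(maxlen=window)  # most recent observations
        self.stable_band = stable_band       # max spread allowed in the window
        self.min_change = min_change         # min difference from stored value

    def update(self, value: float, stored: float) -> bool:
        """Add one observation; return True if a save is justified."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        if max(self.samples) - min(self.samples) > self.stable_band:
            return False  # still drifting, not stable
        mean = sum(self.samples) / len(self.samples)
        return abs(mean - stored) >= self.min_change
```

A caller would feed it one value per observation period and only schedule an NVM write when `update` returns `True`.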

Any stored data should be saved with a CRC or checksum that can detect bit corruption, whether in NVM or RAM.
You could store two or three duplicate copies of the short-term calibration data block for the "live" settings, if the unit has enough RAM, and compare them against each other during the wake-up sequence.
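A sketch of how the wake-up comparison might look, assuming each copy carries its own CRC-32 in the last four bytes. The helper names `with_crc`, `crc_ok` and `select_copy` are invented for this example. Copies that fail their CRC are discarded first, then the most common surviving payload wins.

```python
import zlib
from collections import Counter


def with_crc(data: bytes) -> bytes:
    """Append a little-endian CRC-32 to a data block."""
    return data + zlib.crc32(data).to_bytes(4, "little")


def crc_ok(block: bytes) -> bool:
    """Check the trailing CRC-32 of a stored block."""
    if len(block) < 4:
        return False
    return zlib.crc32(block[:-4]) == int.from_bytes(block[-4:], "little")


def select_copy(copies):
    """Return the payload backed by the most CRC-valid copies, or None."""
    valid = [c[:-4] for c in copies if crc_ok(c)]
    if not valid:
        return None  # every copy is damaged; fall back to defaults
    payload, _count = Counter(valid).most_common(1)[0]
    return payload
```

Redundancy like this only helps when the copies fail independently. If one event damages all of them at once, for example a brown-out during a write, comparing copies will not recover the data.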

Saving to NVM for the sake of it is likely to cause serious problems in the long term, as Tesla discovered...
 

NVM corruption was a MASSIVE problem in later CRT TV sets, and one of the most common failures.

As far as I'm aware, no manufacturer ever cured the problem. Sharp even tried recording three copies of the data in the NVM, with a 'voting system' that selected whichever two were the same. It made no difference; they were still just as unreliable as everyone else's.
 
Any idea why?

Mike.
No, no manufacturer was ever able to sort it out. Presumably that's why writing to the internal EEPROM in a PIC is such a convoluted affair: it's an attempt to avoid corruption. I often wondered if it was related to flash-over inside the CRT.

The standard 24Lxx EEPROMs have a write-protect pin, but none ever seemed to use it. I don't know whether it would have helped or not.

With TVs it was common practice to build yourself an EEPROM programmer. When a new set first came out, you read the EEPROM out of it and added it to your library of files, ready for when you needed it.

The OP mentioned 'default' settings in their other thread. Many TV sets kept the default settings in the code of the micro-controller itself, so if you fitted a brand new blank EEPROM, the set detected this as soon as it powered up and transferred the original defaults to it. Other manufacturers sold you new EEPROMs ready loaded with the default settings (obviously make and exact model specific), or you got a blank EEPROM and wrote it using the defaults you had stored previously, as above.
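The blank-detect-and-restore behaviour can be sketched roughly like this. The default values, the 8-byte layout (4 data bytes plus a CRC-32) and the names are all hypothetical, and the CRC check is an addition for the example; the sets described above are only known to have detected a blank device.

```python
import zlib

ERASED = 0xFF  # typical value of an erased EEPROM byte
FIRMWARE_DEFAULTS = bytes([0x20, 0x40, 0x80, 0x10])  # hypothetical defaults


def is_blank(image) -> bool:
    """True if every byte still reads as erased."""
    return all(b == ERASED for b in image)


def load_settings(eeprom: bytearray) -> bytes:
    """Return usable settings, restoring defaults into the EEPROM if needed."""
    n = len(FIRMWARE_DEFAULTS)
    payload = bytes(eeprom[:n])
    stored_crc = int.from_bytes(eeprom[n:n + 4], "little")
    # Blank is tested explicitly instead of relying on the checksum alone.
    if is_blank(eeprom) or zlib.crc32(payload) != stored_crc:
        eeprom[:n] = FIRMWARE_DEFAULTS
        eeprom[n:n + 4] = zlib.crc32(FIRMWARE_DEFAULTS).to_bytes(4, "little")
        return FIRMWARE_DEFAULTS
    return payload
```

Here `eeprom` is just a `bytearray` standing in for the device contents; on real hardware the reads and writes would go over I2C.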

One extra thing, in case you didn't know: Samsung fitted seriously sub-standard electrolytics in their LCD TVs for a good few years. They were obviously well aware of it, as many failed while still under warranty, and in fact with pretty well all models from that era all spares were exhausted long before the 12-month warranty had even run out. Anyway, one of the effects of the faulty capacitors was to trash the EEPROM, so as well as replacing the useless capacitors you had to either replace or reprogram the EEPROM, which was a surface-mount device. However, it was actually pretty easy (unless the EEPROM needed replacing) because it was a model with the defaults in the processor. All you had to do was short the data and clock lines together with a small screwdriver and turn the set on. This fooled the set into thinking it had a blank EEPROM, and when you removed the screwdriver it uploaded the defaults.
 
I've seen a few ECUs that avoid the problem by not writing anything to NVR if power is lost.

One method is where the ECU is permanently powered and contains an SBC, or System Basis Chip. The SBC usually contains the communications bus transceiver (CAN or LIN) and the voltage regulator for the processor.

When the car shuts down, communications on the bus cease. The processor continues to monitor the bus for a few seconds, so that a wake-up message can be distinguished from a late message from one of the other modules. After those few seconds, the processor writes anything needed to NVR. There is no particular hurry to write to NVR, as the processor can take power from the car battery for as long as it needs to.

After writing to NVR the processor tells the SBC to turn off the voltage regulator, which stops the processor.

When communications restart, any dominant level on the bus will turn on the regulator and the processor is powered up and starts operating.
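The sequence described above can be modelled as a small state machine. This is only a simulation sketch: the 3-second quiet time, the class name `Ecu`, and the way the SBC is represented (just a state change) are assumptions, and `nvm_writes` counts writes rather than performing them.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()    # bus traffic seen recently
    BUS_QUIET = auto()  # bus silent, waiting to be sure it stays silent
    OFF = auto()        # regulator switched off by the SBC

QUIET_TIME_S = 3.0  # assumed "few seconds" of bus silence


class Ecu:
    def __init__(self):
        self.state = State.RUNNING
        self.quiet_since = None
        self.nvm_writes = 0

    def tick(self, now: float, bus_active: bool) -> None:
        if self.state is State.OFF:
            if bus_active:
                # SBC detects a dominant level and turns the regulator on.
                self.state = State.RUNNING
            return
        if bus_active:
            # Late message or wake-up: abandon the shutdown, write nothing.
            self.state = State.RUNNING
            self.quiet_since = None
            return
        if self.quiet_since is None:
            self.quiet_since = now
            self.state = State.BUS_QUIET
        elif now - self.quiet_since >= QUIET_TIME_S:
            self.nvm_writes += 1    # write while still on battery power
            self.state = State.OFF  # then ask the SBC to cut the regulator
            self.quiet_since = None
```

The point of the design is that the write happens only after the bus has stayed quiet and while supply is still guaranteed, so a late message restarts the wait instead of racing with an NVM write.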

The SBC can keep its quiescent current low because it is very simple, and it does not need to understand the messages on the communication bus. It only needs to be able to detect a dominant level. Some SBCs have just a single-bit latch to remember whether the processor has instructed it to shut down or not. The voltage regulator doesn't need to run when quiescent current is important.
 