Code glitch causing reset

Status
Not open for further replies.

riccardo

Member
Hello,

My code has developed a weird glitch that I've been unable to fix properly. The symptom is that the code runs to a point, then the micro resets and the whole thing goes round in a loop. This is very typical of what happens when data is written to an array beyond its size, but I cannot find any such array issue. I've narrowed down the code areas where changing something makes the glitch go away; however, I cannot make sense of why, so I suspect I am missing something.

The area of code is a switch(), and I've found two ways to stop the reset happening....
1. Comment out the switch code
2. Change the value of one of the cases from 6 to 66. Yes really! The cases are all #define numbers 1,2,3,4,5,6,7,8,9. I've tried replacing the defines with the numbers directly and it makes no difference. I can't understand why changing the case 6 to another value would stop the issue.

I've also noticed something odd when commenting out the section of code. If I select all the code (Microchip Studio) and press slash, it comments out all the code except for one curly brace. I can't see why it singles out that one, but it picks the same one regardless of how much of the code around it is selected.

P.S. I can comment out one of the switch cases and the bug is gone. However, the case is not even executed as the value is that of another case. I don't understand at all how that can happen!

I've attached a GIF showing what happens when commenting out that code. Maybe I've been staring at it too long and can't see something obvious. Any suggestions?

C++:
void PWM_4808::ModeSwitch() {
    // Set outputs according to mode
    switch (modeNow) {
    // FAULT: Everything stops, fault info on screen
        case MODE_FAULT:
            StopOutputs();    // OFF
            break;    // BREAK MODE_FAULT

    // NORMAL: Continuous Mode - Toggle On/Off with adjustable time
        case MODE_NORMAL:
            if (pulseLenCont) {
                StartOutputs(DISMODE_DC);            // Just on at full power
            } else {
                if (isRunningP == 0) {                // If just starting
                    timeStampOut = millis();            // Save current time
                    timeStampOutP = timeStampOut;        // Save start time
                }
                SinglePulseOut(pulseLen*10);        // Do a single pulse in ms (isRunning will set to zero when the pulse is ended)
            }
            isRunningP = 1;                        // Set the runningP flag
            break; // BREAK MODE_NORMAL

    // AUTO: Ping coil to detect load. User set detection current. (hysteresis in settings)
        case MODE_AUTO:
            timeStampOut = millis();            // Save current time
            if (isRunningP == 0) {                // If just starting
                timeStampOutP = timeStampOut;    // Save start time
                PingCoil(100);                    // Activate coil and read current after XXms
            } // END IF isRunningP == 0

            if (ampsVal < holdCurrent) {
                if (timeStampOut - timeStampOutP >= 1000) {  // If time between pings has passed (rollover-safe, assuming unsigned timestamps)
                    timeStampOutP = timeStampOut;
                    PingCoil(100);                                // Activate coil and read current after 100ms
                } // END IF time passed
                if (ampsVal < holdCurrent-hysteresis) isLatched = 0;    // If below target and hysteresis. Unlatch
            } // END IF ampsVal < holdCurrent

            if (ampsVal >= holdCurrent) isLatched = 1;    // If current is above hold value, Set Latch

            if (isLatched) {
                StartOutputs(DISMODE_DC);            // On at full power
            } else {
                StopOutputs();                        // OFF
            } // END IF isLatched
            isRunningP = 1;
            break; // BREAK MODE_AUTO

    // THERMO: Regulate temp using thermocouple input. TODO: Adjustable heat time
        case MODE_THERMO:
            // NB: aux ADC reading needs to be calibrated for Celsius
            if (thermocoupleVal < targetTemp) {
                StartOutputs(DISMODE_DC);            // On at full power
            } else {
                StopOutputs();                        // OFF
            }
            isRunningP = 1;
            break; // BREAK MODE_THERMO

    // AUX: Regulate temp using 0-5V AUX input TODO: Adjustable heat time
        case MODE_AUX:
            // NB: aux ADC reading needs to be calibrated for Celsius
            if (auxVal < targetTemp) {                // TODO: auxVal needs a setting to calibrate for temperature
                StartOutputs(DISMODE_DC);            // On at full power
            } else {
                StopOutputs();                        // OFF
            }
            isRunningP = 1;
            break; // BREAK MODE_AUX

    // PWM_RES: Toggle On/Off, Adjustable duty, fixed frequency, selectable Duty or current mode
        case MODE_PWM_RES: // TODO: Change this to CC mode?
            if (isRunningP == 0) {                // If just starting
                StartOutputs(DISMODE_PWM);
            }
            TCApwm(modFrq, opDuty);                // Set/Update PWM Output
            isRunningP = 1;                        // Set the runningP flag
            break; // BREAK MODE_PWM_RES

    // PWM_DUAL: Resonance disabled, Selectable A/B PWM, Adjustable duty & frequency
        case MODE_PWM_DUAL:
            // TODO: Disable input capture and select MOSFET
            if (isRunningP == 0) {                // If just starting
                StartOutputs(DISMODE_PWM);
            }
            TCApwm(modFrq, opDuty);                // Set/Update PWM Output
            isRunningP = 1;
            break; // BREAK MODE_PWM_DUAL

    // SERIAL: Just shows serial comms
        case MODE_SERIAL:
            break; // BREAK MODE_SERIAL

    // DIAG: Toggle On/Off, Adjustable duty & frequency
        case MODE_DIAG:
            break; // BREAK MODE_DIAG

    // SETUP:
        case MODE_SETUP:
            break; // BREAK MODE_SETUP

    } // END SWITCH MODE_NOW
}
 

Attachments

  • codeGlitch.gif
    1.8 MB · Views: 345
I don't see anything from a quick look...

You could try adding a "default:" statement in the end of the switch with a breakpoint and see if that catches anything unexpected in the switch parameter?
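A minimal sketch of that idea, written so it also runs on a PC: the `default:` branch records any value that matches no case, which gives you something to put a breakpoint on or print later. The mode numbers and the `lastBadMode` / `badModeCount` variables are hypothetical stand-ins, not taken from the original code.

```cpp
#include <cstdint>

// Hypothetical mode numbers, standing in for the poster's #defines.
enum Mode : uint8_t { MODE_FAULT = 1, MODE_NORMAL = 2, MODE_AUTO = 3 };

// Hypothetical diagnostics: remember the last value that matched no case.
static uint8_t  lastBadMode  = 0;
static uint16_t badModeCount = 0;

// Returns true when modeNow matched a known case, false otherwise.
bool ModeSwitch(uint8_t modeNow) {
    switch (modeNow) {
        case MODE_FAULT:
        case MODE_NORMAL:
        case MODE_AUTO:
            return true;   // real per-mode work would go here
        default:
            lastBadMode = modeNow;  // breakpoint or LED toggle goes here
            ++badModeCount;
            return false;
    }
}
```

If `badModeCount` ever becomes non-zero on the target, the switch variable itself is being corrupted somewhere upstream.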

And compiler bugs are always possible. Try turning optimisation off & see if that changes anything?
 
Could be the solenoid back-EMF. The code that de-energizes the coil is where the big voltage spike happens.
 
Thank you for your replies.

Adding "default" doesn't help, unfortunately. Weirdly, it is code that is not being executed that, when commented out, allows it to work. I think it must be some sort of memory overwriting issue, and probably not directly to do with the code I've posted.

I'm currently testing the code without the other electronics attached, so voltage/power issues are not the problem.

I've tried without compiler optimisation, but the problem persists.

I found this link..
I created the "ISR(BADISR_vect)" and in there included a digitalWrite command to switch on an LED (in a while(1) loop). It does indeed switch on that LED and then stops there. However, if I try to make the LED flash using a couple of for loops in the same place, it just continues to reset in the same way.
Also, if I change from Release to Debug compile, it just keeps resetting regardless of the BADISR content.
 
Wild guess: you don't have a watchdog timer firing in hardware and
causing a reset, do you?

Regards, Dana.
 
No WDT or similar setups.
I've added the following at the end of my setup section

Serial2.println (RSTCTRL.RSTFR,BIN);
RSTCTRL.RSTFR=0;

The output I get is "100001" which according to "12.5.1 Reset Flag Register" in the datasheet is both power reset and UPDI reset.

I'm not sure that's right though as it continues to loop and reset with the same flags even with nothing connected to the UPDI pin
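To make the flag value easier to read, here is a small decoding sketch that runs on a PC. The bit positions are my assumption from the ATmega4808 datasheet's RSTFR description (PORF in bit 0 up to UPDIRF in bit 5), so please check them against section 12.5.1. `DecodeResetFlags` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>

// Bit positions ASSUMED from the ATmega4808 RSTCTRL.RSTFR description;
// verify against the datasheet before relying on them.
struct ResetFlag { uint8_t mask; const char* name; };

static const ResetFlag kResetFlags[] = {
    {0x01, "PORF"},    // power-on reset
    {0x02, "BORF"},    // brown-out reset
    {0x04, "EXTRF"},   // external reset pin
    {0x08, "WDRF"},    // watchdog reset
    {0x10, "SWRF"},    // software reset
    {0x20, "UPDIRF"},  // UPDI reset
};

// Returns the names of all set flags, space separated ("none" if zero).
std::string DecodeResetFlags(uint8_t rstfr) {
    std::string out;
    for (const ResetFlag& f : kResetFlags) {
        if (rstfr & f.mask) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out.empty() ? std::string("none") : out;
}
```

With the value from the post, `DecodeResetFlags(0x21)` gives "PORF UPDIRF". One thing worth checking: as far as I recall, the flags in this register are cleared by writing a 1 to them rather than a 0, so `RSTCTRL.RSTFR=0;` may leave them set. That would explain seeing the same flags on every pass; writing the register's own value back to it would be the usual way to clear it, if that reading of the datasheet is right.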
 
One other possibility: inadequate bypassing causing a (possibly code-sensitive) dip on
Vdd or on the reset pin. Use a DSO set for single-shot, with a negative-going trigger level about 1 to 2 V below Vdd, and see if
you get a transient on a pin.

Or run the DSO at infinite persistence, look at the Vdd pin or reset pin, and see what worst-case
level occurs.


Regards, Dana.
 
It's possible that the issue might not be directly related to the switch() statement, but rather some memory corruption issue that the switch() statement happens to trigger.

If you suspect that an array might be involved, you could try adding boundary checks to all array accesses to make sure they don't write outside of the array's bounds. This can be time-consuming, but it's a good way to rule out array-related issues. Alternatively, you could try running your code through a tool such as Valgrind or AddressSanitizer, which can detect a variety of memory errors.
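As a rough sketch of what such a boundary check could look like, the hypothetical `CheckedArray` wrapper below drops and counts out-of-range writes instead of letting them land on neighbouring memory. It is an illustration only and not part of the original project.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size buffer wrapper: out-of-range writes are dropped
// and counted instead of silently corrupting neighbouring memory.
template <typename T, std::size_t N>
class CheckedArray {
public:
    // Writes value at index; returns false (and writes nothing) if out of range.
    bool set(std::size_t index, const T& value) {
        if (index >= N) { ++violations_; return false; }
        data_[index] = value;
        return true;
    }
    // Reads index, or returns fallback if index is out of range.
    T get(std::size_t index, const T& fallback = T()) const {
        return index < N ? data_[index] : fallback;
    }
    // Number of rejected writes so far.
    std::size_t violations() const { return violations_; }
private:
    T data_[N] = {};
    std::size_t violations_ = 0;
};
```

Swapping a suspect raw array for something like `CheckedArray<uint8_t, 16>` and printing `violations()` over serial now and then would show whether any index ever runs past the end.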

Regarding the weird behavior you're seeing when commenting out code, it's possible that you have an unbalanced or malformed block of code that's causing the IDE to get confused about which curly braces to remove. This can be especially tricky to spot if you have a lot of nested blocks of code. You could try manually reformatting your code to ensure that all blocks are properly balanced, or using a code formatter to automatically do it for you.

As for the switch case that you mentioned, where changing the value of one of the cases from 6 to 66 stops the issue, it's possible that there's some memory corruption happening that's affecting the switch() statement's behavior. By changing the value of the case, you might be changing the location in memory where the switch() statement is stored, which could be resolving the issue.
 
Thank you all for your help. I think I may have found the issue, though I'm not sure how to resolve it properly yet..

I definitely agree that it will be a memory corruption issue, though I am unsure of the source/solution. At the start of my program I have a "static const unsigned char Logo_bits[] U8X8_PROGMEM =...." which loads a simple image for the display. I've used this for a while in several projects with no issue. Perhaps my program has grown in size and complexity to the point that this is now significant.

If I comment out the line that displays the image, all the bugs go away (though again, it may be coincidence like commenting out other random parts of the code). I've noticed that when I compile the program, the program size is: 27,728 bytes (used 56% of a 49,152 byte maximum), with ram at 35%. If I comment out the part that loads the PROGMEM, the compiled program size does not change. I was expecting it to be 1kB smaller (128x64 monochrome).
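For reference, the arithmetic behind those numbers can be checked with a few lines. This assumes the logo is stored at 1 bit per pixel with each row padded to whole bytes (XBM style), which is my guess at how `Logo_bits` is laid out; the function names are hypothetical.

```cpp
#include <cstddef>

// Bytes needed for a 1-bit-per-pixel bitmap with rows padded to whole bytes.
// The XBM-style layout is an ASSUMPTION about how Logo_bits is stored.
constexpr std::size_t MonoBitmapBytes(std::size_t width, std::size_t height) {
    return ((width + 7) / 8) * height;
}

// Whole-number percentage of used/total, rounded down.
constexpr std::size_t PercentUsed(std::size_t used, std::size_t total) {
    return used * 100 / total;
}

static_assert(MonoBitmapBytes(128, 64) == 1024, "128x64 mono is 1 KiB");
static_assert(PercentUsed(27728, 49152) == 56, "matches the reported 56%");
```

So a 128x64 image is indeed 1,024 bytes, and 27,728 of 49,152 bytes is the 56% the IDE reports.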

I wonder if the compiler is not properly allocating the FLASH and things end up overwritten.
 
OK, so I am pretty sure it is something to do with the compiler, optimisation or something along those lines, but I have no idea how to debug something like that. I work using Microchip Studio with Visual Micro.

I use a cloud service to back up my projects so that I can also work on them at home, for example. Last night I compiled some code at home and the reset thing started happening again (it was previously working OK at work). I turned off compiler optimisation, and the problem was resolved. However, back at work today, the code will not even compile and gives me ".......text is not within region text" "ld.exe: region text overflowed by 270180 bytes"

If I turn optimisation back on, it compiles, and is also not crashing. It's so random and frustrating, as I just don't know how to track down the issue.
 
Sounds like a "code moving" bug. Do you have any code that writes to the program area? Sometimes code gets written over by other stuff, and if it moves then it's no longer a problem. I'm assuming, of course, a Harvard architecture.

Do you mention anywhere which processor you're using?

Mike.
 
If your compiler has stack and heap size settings that might be a cause for
flaky behavior. I have had to adjust these for some designs for ARM.
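
One common way to see how close the stack gets to the rest of RAM is "stack painting": fill the free region with a known pattern at startup, then later count how much of the pattern survives. The sketch below simulates that on a plain array so it can run on a PC. On the real micro the region would be the gap between the end of static data and the stack, which is an assumption about the memory layout, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint8_t kPaint = 0xAA;  // arbitrary fill pattern

// Fill a memory region with the paint pattern (done once at startup).
void PaintRegion(uint8_t* region, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) region[i] = kPaint;
}

// Count bytes still holding the pattern, scanning up from the low address.
// With a stack that grows downward from the top of the region, this is
// the headroom that was never touched.
std::size_t UntouchedBytes(const uint8_t* region, std::size_t size) {
    std::size_t n = 0;
    while (n < size && region[n] == kPaint) ++n;
    return n;
}
```

If the untouched count ever drops to zero on the target, the stack has very likely run into other data, which would produce exactly this kind of "change anything and it moves" behaviour.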

Also, does your processor support traps for illegal memory accesses? Might be worth investigating
that as well.


Regards, Dana.
 