Well, I think I found a solution. Check it out!
Long story short: rather than shrinking everything before the computation to guarantee an 8-bit result, I'm keeping the LSB data, carrying it through the additions, and only reducing it right before storing the final term.
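To make the difference concrete, here is a minimal C sketch. It assumes that "shrinking everything before the computation" means shifting each Q7 term down before the additions; `sum_shrink_first` and `sum_shrink_last` are hypothetical names. Shrinking first truncates three times, once per term, while adding first and shrinking once keeps the low bits until the very end.

```c
#include <stdint.h>

/* Hypothetical helper: old approach, assumed to shift every Q7 term
 * down to its 8-bit result BEFORE adding (LSBs lost on each term). */
int32_t sum_shrink_first(int32_t charlie, int32_t alpha, int32_t bravo)
{
    return (charlie >> 8) + (alpha >> 8) + (bravo >> 8);
}

/* Hypothetical helper: new approach, add at full precision and
 * shift down once, right before storing the result. */
int32_t sum_shrink_last(int32_t charlie, int32_t alpha, int32_t bravo)
{
    return (charlie + alpha + bravo) >> 8;
}
```

For example, with X[k+n] = 1 and two inputs of 1 multiplied by a twiddle of 127 (Charlie = 128, Alpha = Bravo = 127), shrinking first returns 0 + 0 + 0 = 0, while shrinking last returns 382 >> 8 = 1.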
Rough pseudo-code of the "heart" of each FHT butterfly. Let's pretend that, within this small block, the 8-bit ceiling doesn't apply:
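;----------------------------------------------------------------------------------------------
int32_t Alpha   = (int32_t)X[k+n+shift] * Wa_term[Wpoint];     // fits in 16 bits, never outside -16256..16256
int32_t Bravo   = (int32_t)X[butterfly-k+n] * Wb_term[Wpoint]; // ''
int32_t Charlie = (int32_t)X[k+n] * 128;                       // X[k+n] << 7 (multiplying avoids left-shifting a negative value, which is undefined in C)
// Worst case, Charlie + Alpha + Bravo needs about 17 bits, so accumulate in 32 bits
X[k+n]       = (int8_t)((Charlie + Alpha + Bravo) >> 8);       // stored value should fit in 8 bits (saturate if it might not)
X[k+n+shift] = (int8_t)((Charlie - Alpha - Bravo) >> 8);       // ''
;---------------------------------------------------------------------------------------------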
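A self-contained version of that butterfly heart might look like the sketch below. It assumes 8-bit signed samples, Q7 twiddles in [-127, 127], and an arithmetic right shift for negative values (implementation-defined in C, but what GCC and Clang do); `fht_core` and the saturating helper `sat8` are hypothetical additions, not part of the original code. The clamp is there because a worst-case sum such as 127·128 + 2·(127·90) = 39116 exceeds the 16-bit range and, even after the >> 8, the 8-bit range.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value into the int8_t range. */
static int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical butterfly core:
 *   x0 = X[k+n], x1 = X[k+n+shift], x2 = X[butterfly-k+n]
 *   wa = Wa_term[Wpoint], wb = Wb_term[Wpoint] (Q7, assumed in [-127, 127])
 * Writes the new X[k+n] to *out_sum and the new X[k+n+shift] to *out_diff. */
void fht_core(int8_t x0, int8_t x1, int8_t x2, int8_t wa, int8_t wb,
              int8_t *out_sum, int8_t *out_diff)
{
    int32_t alpha   = (int32_t)x1 * wa;   /* |alpha| <= 16256 */
    int32_t bravo   = (int32_t)x2 * wb;   /* |bravo| <= 16256 */
    int32_t charlie = (int32_t)x0 * 128;  /* x0 << 7, same Q7 scale */

    /* Keep all LSBs through the additions; shift down only once at the end.
     * The >> 8 relies on an arithmetic shift for negative sums. */
    *out_sum  = sat8((charlie + alpha + bravo) >> 8);
    *out_diff = sat8((charlie - alpha - bravo) >> 8);
}
```

With x0 = x1 = x2 = 127 and both twiddles at 90, the sum (39116 >> 8 = 152) saturates to 127, and the difference comes out as (16256 - 22860) >> 8 = -26. Whether saturating is preferable to a larger per-stage shift depends on the input levels you expect.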
(I'm sure there are faster/cleaner ways to do this than my "nonsense" above. If you know of one, post it here! I'd love to see it!)
Anyways, aside from my "nonsense", here's what that small change gives me:
[Plot: Fsin = 15 Hz, Fsamp = 140 Hz, N = 256, rectangular window]
[Plot: Fsin = 62 Hz, Fsamp = 140 Hz, N = 256, rectangular window]
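When reading those plots, it may help to know where each tone should land. An N-point FHT has a bin spacing of Fs/N, here 140/256 ≈ 0.547 Hz, so a tone at frequency f falls at bin f·N/Fs. The helper below (`tone_bin` is a hypothetical name) is just that formula:

```c
/* Hypothetical helper: fractional bin index of a tone at f_sin (Hz)
 * for an n-point transform sampled at f_samp (Hz). */
double tone_bin(double f_sin, double f_samp, int n)
{
    return f_sin * n / f_samp;
}
```

With these settings, 15 Hz lands at bin ≈ 27.43 and 62 Hz at bin ≈ 113.37 (Nyquist, 70 Hz, is bin 128). Neither is an integer, so with a rectangular window some energy is expected to leak into the neighbouring bins around each peak.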
There's still some small "junk" near DC, but its peak doesn't seem to move when I change the input frequency, and its magnitude adds to that of a tone I inject right at that frequency.
Perfect! Heh, probably not. However, I think it's good enough for me to go back and modify my MCU's code. Pity, I know my execution time is going up, but heck, let's see by how much!
Any questions or comments, just let me know!
-EF