Hash Function for PIC

Status
Not open for further replies.

pikstart

Member
Hi!

I am looking for something like or exactly that. A hash function.

There are many (about 3000) groups of data (3 bytes long) that will be stored in EEPROM and would like to search very fast. If serially search a PIC with 8MHz clock would do about 8 seconds. Since the system is going to operate a parking bar, this time is too long.

So, a hash function may help store and retrieve the group of data almost immediatly.

Does anyone have any infos about it?

Regards,
Ioannis
 
Just a thought but what about increasing your frequency. If you operate it from 20mhz it should be around 3 secs?
 
What kind of data is this? Is it random data?
Is the data sorted?
Is it dynamically generating this data?

The trick to a hash is to know what kind of data you're getting and writing the hash such that collisions are reduced.

Perhaps the data is sorted and you can binary search through it?

(which reminds me, I had a ~30 entry static lookup table to search through, I just sequentially searched through it because it was small, and required less RAM to implement sequential search...)
 
Thanks for the replies.

I need this to work really fast. So even if the clock is increased the delay is significant. There are also other task of the MCU to do in the meantime so 3 sec are optimistic estimation.

About the data. They are the worst case scenario. Not all stored, but will be stored in time. Also they are random. Doing sorting everytime a new key is stored, is for now a bad option. So a binary search is my last resort.

The data are comming from RFiD keys through a Wiegand reader into my PIC mcu. Result is 3 random bytes to process.

Thanks for looking into it.

Ioannis
 
Due to the limited capacity of the PIC (do you have bigint and bigmodulus libraries on hand? And some statistics/math under your belt to deal with some pseudo-random number theory?) I'd just use a few bits from the IDs directly as a hash "function" and use that to index my hash table array. Sure there will be collisions, but with the limited capacity of the Microchip PIC it'll be a lot easier to do it that way, and generally still be fast depending on your table size and if you choose the best bits to be the index. You will have to over-allocate a big chunk of memory, likely dynamic memory allocation for entries will not be possible (since your table is 3000*3, you will probably need to be using around 32KB memory for the table; a 16K memory may work but really tight - more collisions.)

Since PICs have limited memory addressing, an off-the-shelf hash table routines are probably not available...
 
Why does your search take so long? With an 8MHz (2Meg instructions) clock and 3000 records you get 600 instruction cycles per record per second. To read compare and reject a record should be easily doable in 200 cycles and so you should be able to search all 3000 records in around 300mS. Are you writing this in Windoze?

I am currently doing a project on a 16F886 running at 20MHz. It transfers full screen images from program memory to a graphic LCD (128*64) and can do this at 60 frames per second. Even with waiting for the GLCD it is transferring data at around 40KBytes/Second.

Mike.
 
Thanks for the replies.

Well, I am more a hardware guy than software. I use a good compiler for PIC's the Melabs PBP 2.40. It is very fast and very close to pure assembly code, at least as I write programs.

@Pommie: I cannot follow your calculations about the "...600 instruction cycles per record per second..." But can say that at worst case I 'll have to do 3 iterations per record and my valid record be the last one, so total 3x3000=9000 if-thens.

I test it and produced valid result after 8 something seconds.

If there exist a Hash function the search virtually will be wiped out. PIC will calculate the address from the 3 bytes record and go directly at the correct address. Zero time!

The other option is do sort and search.

Ioannis
 
Thanks for the replies.

Well, I am more a hardware guy than software. I use a good compiler for PIC's the Melabs PBP 2.40. It is very fast and very close to pure assembly code, at least as I write programs.

@Pommie: I cannot follow your calculations about the "...600 instruction cycles per record per second..." But can say that at worst case I 'll have to do 3 iterations per record and my valid record be the last one, so total 3x3000=9000 if-thens.

I test it and produced valid result after 8 something seconds.

If there exist a Hash function the search virtually will be wiped out. PIC will calculate the address from the 3 bytes record and go directly at the correct address. Zero time!

The other option is do sort and binary search.

Ioannis
 
By using BASIC you're seriously crippling the speed, have you considered using an index?.
 
As an alternative to saving the data on eeprom (I think you must be using an external eeprom), you could possibly use the program memory for your table.

If the expected number of data writes is below 100K during the chip's lifetime then by selecting an 18F range chip such as 18F2550 you could have up to 32K of program memory available.
With a maximum clock speed of 48Mhz, using the TBLRD commands you should be able to access the table data very quickly and without the page boundary problems that you get in the 16F chips.
 
I thought I'd check my reasoning and have a go at writing code.

If you can use inline assembler then here's a routine that will check a random record (record num in RecordNum) against a record in record0 - 3 and will return 1 if they match. It takes 45 instruction cycles, worst case. Even with a 4MHz clock it would only take around 130mS to check all 3000 records. This is for 16F series.

Code:
TestRecord	bsf	STATUS,RP1	;#2
		bcf	STATUS,C	;clear carry for shift
		rlf	RecordNum+1,W	;record address = RecordNum*2
		movwf	EEADRH
		rlf	RecordNum,W
		movwf	EEADR
		movlw	low DataAddress	;add on base address of data
		addwf	EEADR,F
		btfsc	STATUS,C
		incf	EEADRH,F
		movlw	high DataAddress
		addwf	EEADRH,F
		bsf	STATUS,RP0	;#3
		bsf	EECON1,EEPGD	;Program memory
		bsf	EECON1,RD	;Read
		nop
		nop
		bcf	STATUS,RP0	;#2		
		movfw	EEDAT
		xorwf	Record1,W	;check first byte of record
		btfss	STATUS,Z
		goto	NoMatch		;exit Z=0
		movfw	EEDATH
		xorwf	Record2,W	;check second byte
		andlw	0x0f		;but only lower nibble
		btfss	STATUS,Z
		goto	NoMatch		;exit Z=0
		incf	EEADR,F		;move to other half of record
		btfsc	STATUS,Z	;crossed page boundary
		incf	EEADRH,F	;yes
		bsf	STATUS,RP0	;#3
		bsf	EECON1,RD
		nop
		nop
		bcf	STATUS,RP0	;#2		
		movfw	EEDAT
		xorwf	Record3,W	;check third byte
		btfss	STATUS,Z
		goto	NoMatch		;exit Z=0
		swapf	Record2,W	;get unchecked nibble
		xorwf	EEDATH,W	;check against memory
		andlw	0x0f		;ignore upper nibble
; here zero flag will be set as the records match
		movlw	1
		bcf	STATUS,RP1	;#0
		return
NoMatch		movlw	0
		bcf	STATUS,RP1	;#0
		return
		
DataAddress
		dw	0x412,0x356	;record = 0x123456
		dw	0x745,0x689	;record = 0x456789

If you go this route then writing a put record function would be pretty simple.

Mike.
 
@ Nigel: What do you mean by index? What exactly is your idea? I suppose with the Hash function I am trying to do some index too.

@ Pommie: Yes, Ican embed assembly in the program. I will give it a try. Thanks for the code.

@ picasm: I have to check how this can be controlled through my environment. This also may be combined with Pommie's piece of code.

Thanks a lot to all.

Ioannis
 
pikstart said:
@ Nigel: What do you mean by index? What exactly is your idea? I suppose with the Hash function I am trying to do some index too.

It probably wouldn't make much difference, as your records are so so short, (an index is just a sorted list of pointers to the data) the answer really is to store the records in order. I'm not very confident a hash function would help you very much?.
[/QUOTE]
 
Allrighty here's pseudocode implementation of the hash table you want, using a 32K memory. Note: the math might be screwy, make sure you do NOT overflow unintentionally. 32K gives a max of 10922 entries in this table. Also I'm not doing any bounds checking so if you follow this verbatim you likely will get buffer overflows or endless loop hangs as there's no way for me to test it. Also I refuse to write PIC assembly, I had enough BTSFC and table lookups to last me a lifetime when writing dsmdata code for the 16F628A. Die PIC die. Long Live AVR. Just kidding

This *should* give you O(1) on the best case, and O(n) on the worst, just like any other hash implementation. The hash function you _will_ have to change to fit the nature of _your_ data. Since I don't know what your data is, I wrote a VERY poor pseudo-random number generator that should work but may not be optimal for YOUR data. The Search, Insert, and Delete functions are exactly what they sound like. No EEprom destroying sort is needed or done. You will have to write your own EEprom code, especially if you need to reprogram data in blocks.

-------------
define maxindex 10000 // not using every entry, just for illustrative purposes
define invalidvalue -1
define unusedvalue 0

array byte numbers[maxindex][3]=unusedvalue; // Clearing to 0, an invalid entry so we can tell if an entry has been used or not.

longint Function Hashindex(byte number[2:0]) { // simple hash function
// this is a very crappy hash.
return (number[2]*5231+number[1]*191+number[0]) mod maxindex )
}

longint Function InsertHash(byte number[2:0]) { // insert number into hash
longint firstentry=Hashindex(number);
longint ptr=firstentry;
do {
if numbers[ptr] = unusedvalue { numbers[ptr]=number; return ptr }
// we had a collision here, so find another place to put it.
ptr=(ptr+1) mod maxindex
} while (ptr != firstentry)
return invalidval; // return invalid if we didn't find any space to put it
}

longint Function SearchHash(byte number[2:0]) { // returns index of location if successful, else returns invalidval.
longint firstentry=Hashindex(number);
longint ptr=firstentry;
do {
if numbers[ptr] = number { return ptr } // found it!
if numbers[ptr] = unusedvalue { return invalidvalue }; // oops, found a space, means we didn't put it in yet
// at this point we had some collision so search.
ptr=(ptr+1) mod maxindex
} while (ptr != firstentry)
return invalidval; // we searched everything, and failed to find it
}

longint Function DeleteHash(byte number[2:0]) {
longint elocation=SearchHash(number);
if (elocation == invalidval) {
return invalidval // deleting an invalid entry ... what the heck?
} else {
numbers[elocation]=unusedvalue; // found it, clear it.
}
return elocation // this is bad practice, but you fix my code, my code sucks.
}


NEW NOTE: I just discovered a bug in this pseudocode. Delete does NOT work as expected, it will cause problems. Spot the problem and fix it... (hint: dynamic memory would be much nicer here, traversing link lists is better than being stuck with an array here.)
 
Last edited:
Oh man, I really thank you for you trouble. I'll check it and see how it goes.

My data are just 3 random bytes read from a Wiegand RFiD reader. If I could write them with a programmer then things might be easy.

Regards,
Ioannis
 
Oh man, I really thank you for you trouble. I'll check it and see how it goes.

My data are just 3 random bytes read from a Wiegand RFiD reader. If I could write them with a programmer then things might be easy.

Regards,
Ioannis
 
Status
Not open for further replies.
Cookies are required to use this site. You must accept them to continue using the site. Learn more…