Sunday, October 18, 2009

Musings

Since resources and space are so limited on the PIC, one always has to think of clever ways to squeeze more out of very little. For instance, on the PIC 16F84, memory, or working RAM, is limited to 68 bytes. 3 (for 2 s at 20 MHz) to 6 (for 1 year at 20 MHz) of those bytes were used up in the previous code simply to implement a delay. We needed 6 variables counting to 256, because (2*6+1)*256*256*256... 6 times = 13*256^6 is what's able to execute the needed amount of instructions for 1 year, and 256 counts is the most we can represent in the PIC with a single memory location.
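Here's a quick C sketch to double-check that register-count arithmetic on a PC; the 20 MHz crystal and 4-clocks-per-instruction figures are the ones used throughout this post, the rest is plain multiplication:

#include <stdio.h>

/* Sanity check: how many delay registers does 1 year at 20 MHz
   need, if n registers can burn about (2n+1)*256^n cycles? */
int main(void) {
    unsigned long long year = 365ULL*24*60*60*(20000000/4); /* instruction cycles in 1 year */
    unsigned long long cap = 1;
    int n;
    for (n = 1; n <= 6; n++) {
        cap *= 256;                               /* 256^n */
        printf("%d registers: max %llu cycles%s\n",
               n, (2ULL*n + 1)*cap,
               (2ULL*n + 1)*cap >= year ? "  <-- enough for 1 year" : "");
    }
    printf("needed: %llu cycles\n", year);
    return 0;
}

Running it shows 5 registers fall short and 6 are enough, which is where the 6 variables came from.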
I was thinking perhaps there was a way to get away with fewer variables to make big numbers. After all, if I'm given two bytes, d1 and d2, each able to count to 256, my options are 256+256, or 256*256, or.. or.. how about 256^256.. now that's a huge number. Even 256^8 is huge. So is there a way to get away with, say, d1 storing 255, and d2 storing 8? What's the mechanics under the hood; can we generate the huge number of operations needed for a delay with just these two registers? In C one would just write a routine,
int ipow(int x, int y) { int i, result = 1; for (i = 0; i != y; i++) result = result * x; return result; }

Well, here is where C programmers, or high level programmers in general, think abstractly, forget about the underlying architecture, and expect things to just work out. In practice, first of all, this function overflows pretty fast, because a power such as 10000^10000 is not going to fit in an int result variable. The very reason C has datatypes is to enable efficient use of memory, and to make the programmer keep in mind the limits of the architecture at hand. VB does not require type declarations, and anything undeclared ends up as a 16 byte Variant type. Note that even the biggest integer type in C, long long int, is only 8 bytes. Still, 16 bytes is not enough for everything, because it can overflow too. Abstraction makes it possible to write code independent of the architecture and the hardware it runs on, but eventually it does come down to limits; PC hardware is not unlimited, at least not like it seemed back in the days when IBM decided that 1 MB should be enough memory for everyone. That decision has been laughed at ever since. Note that in the picdelay.c program below, I made a similar decision, that 1 year should be enough for anyone, in view of the architecture.

In the end that 1 MB decision turned out to be very wrong, and since then we have learned our lessons well, and proper new design tries to eliminate built-in limits. For instance, in the old 8086 DOS PC, working memory went up to 640 KB, and above that sat the video memory and other things, up to 1 MB. When practical memory sizes grew to 8 MB with the 80286, the computer had to keep running circles around that video memory: load contiguous blocks of memory from above 1 MB into the 0-640 KB block, work with it until it ran out of stuff to do, then move everything back above 1 MB and load a new work dose from the 1 MB-8 MB region into the 0-640 KB region. In fact, some loaded device drivers, such as CD-ROM, mouse or joystick drivers, required a few KB of memory each, and you ended up with 600 KB or 586 KB or even less "conventional" memory, which was not enough for a lot of games, which needed at least, say, 604 KB contiguous to function. You had the option of not running the CD-ROM device driver, which could save some memory, or there were memory optimizers like QEMM that sought out contiguous empty space between 640 KB and 1 MB that was not used by any hardware, and loaded the CD-ROM drivers there. They would stay resident, but this way you could squeeze out 629 KB (!) of working RAM, AND have a gazillion device drivers still loaded "high." Had DOS been designed without this fixed-location limitation, say with the video memory start address stored in a pointer value somewhere, things might have been easier to adapt as technology advanced. But once you had to adapt to the status quo, and stay compatible, it was hard to change things. One option, of course, would be to require each computer to come with 8 MB, and declare in a newer design spec that video memory now lives between 7.64 and 8.00 MB. But any old program written assuming the 0-1 MB design would have to be rewritten. That would be a mess.
Win32 has a flat memory model, and any program written is given a sandbox where it thinks it has access to all the memory on the computer, and Windows does the translation of what the program's memory location 0, or its 1 MB location, corresponds to in the real computer, which might be the 12.5 MB to 13.5 MB region. This way you don't have to rewrite Windows programs to adapt to new limits.
So back to 256^256 - can you implement code with only 2 registers and do this many iterations, find a cleverer way than having to use 6 registers? Would this be more elegant, maybe use even fewer instructions? Well, the answer is no. I actually tried. What happens is that you need a way to represent and store the intermediate numbers. To raise to a power, you need to multiply, and to multiply, you need to add. On the PIC you have to add manually to multiply, and then use this multiply to raise to a power, manually. Things such as 3^3 = 27 or 4^4 = 256 fit into a register, but anything bigger doesn't. You could try to consume the so-far stored count before you do the next step - before you come around and multiply again, how about doing those iterations now, one by one, instead of doing them later? This does not work, because the very essence of raising to a power is something like 256*(current product of the previous 256*256*...) = 256*123456789012 = 123456789012 + 123456789012 + ... 256 times. Should you use up any of these intermediate values, you won't have the huge numbers to add up on the next turn of multiplying by 256. Basically + and - are one operation step away from * multiplication, but ^ raising to a power is two steps away; there is no quick and dirty method to decrement in between so you don't get an overflow while raising to a power. So you can't raise to a power unless you're willing to store huge intermediate values. You can make your own custom type, and dedicate 16 or more bytes to storing the values in it, but that negates the whole purpose of trying to save registers in the first place - using 16 registers to save 7, and making the code very complicated too, similar to old 32-bit Windows on 16-bit PC architecture such as a 286: low byte, high byte, what a mess that was. Linux was started later, and by design it does not run on anything less than a 386, which is natively 32-bit, simply to avoid the low byte/high byte mess in its internal calculations of memory addressing.
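To see the storage problem concretely, here is a little C sketch (ordinary PC C, standing in for what the PIC would have to do, since it has no multiply instruction): multiplication built out of repeated addition, with only a single byte to hold the running product:

#include <stdint.h>
#include <stdio.h>

/* Multiplication by repeated addition, the way you'd have to do it
   on a PIC. The running product is the problem: it has to be stored
   somewhere, and it outgrows 8 bits almost immediately. */
static uint8_t mul8(uint8_t a, uint8_t b) {
    uint8_t product = 0;          /* only 8 bits of storage */
    while (b--)
        product += a;             /* silently wraps past 255 */
    return product;
}

int main(void) {
    /* 15*15 = 225 still fits, but 16*16 = 256 already wraps to 0,
       so 256^256 via repeated multiply is hopeless with byte-sized
       intermediates. */
    printf("15*15 = %u\n", mul8(15, 15));   /* 225, correct  */
    printf("16*16 = %u\n", mul8(16, 16));   /* 0, overflowed */
    return 0;
}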
Basically the delay code below, using as many registers as needed and going the 256*256*256*256... way to generate large numbers, is the most elegant and efficient way I can think of. In its mechanics, the intermediate values that need to be stored are never greater than 256, and they can be decremented along the way, while the multiplication happens - you don't need to first generate the product and then count down one at a time from a huge number; you can take one factor, decrement it all the way, then continue with the multiplication, and the final count of how many instructions were ultimately executed is not affected. Using + is not as efficient as * for creating large numbers, and ^ is unimplementable without storing large intermediate values greater than 256, so it's not more elegant. Unless you can think of something I didn't think of.
By the way, while surfing around for answers on this topic, I came across the factoradic number system, which is a mixed radix number system (it's not base 2 like binary, base 10 like decimal, or base 16 like hex; each digit has its own base, and each place value is the factorial up to that point). Because the factorial grows roughly like N^N, huge numbers can be stored with few digits past a certain threshold. The factoradic number system is basically the next step beyond arabic numeral notation - tally marks are +-fast, the arabic numeral system develops * to its finesse with a fixed radix, and the factoradic gives a ^-fast notation. I'm not smart enough a mathematician to really analyze this, but there might be ways to implement factoradic in hardware that could give more efficient computation, at least when the number of digits stored is huge, probably well beyond 64 bits. Instead of the highest number stored being around, say, 2^64, one might be able to get closer to a 64^64-magnitude number, with accurate individual-step granularity, squished into 64 bits.
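For the curious, here is a minimal C sketch of factoradic conversion (my own illustration, not anything PIC-specific): the digits come out by dividing out 2, then 3, then 4, and so on:

#include <stdio.h>

/* Convert n to factoradic digits, least significant first; the
   always-zero 0! digit is skipped. E.g. 463 decimal comes out as
   1 0 1 4 3, i.e. 3*5! + 4*4! + 1*3! + 0*2! + 1*1! = 463. */
int main(void) {
    unsigned long long n = 463;
    unsigned long long radix = 2;
    printf("%llu = (least significant digit first) ", n);
    while (n > 0) {
        printf("%llu ", n % radix);
        n /= radix;
        radix++;
    }
    printf("(factoradic)\n");
    return 0;
}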

Made my own offline picdelay calculator

I made my offline version of www.golovchenko.org/cgi-bin/delay that should work for up to a 1 year delay on a PIC, which should be enough for ordinary people. If you don't like it, you can change the source! There is a discrepancy on the 31536000 seconds output, and I contacted the original author. It mostly works the same, especially on small numbers, but on the one year delay his page gets d1 as 0x4D, while I get 0x4E. Also, for the final cycles he needs 8+4, I need 6+4. When I do the hand calculations with his numbers, the number of inner cycles is 157679999999977 instead of the 157679999999988 that his page states.



/* picdelay.c
usage: gcc picdelay.c; ./a.out
Generates source code for delay up to 1 year on <20MHz PIC ucontrollers
Mimics output of www.golovchenko.org/cgi-bin/delay

License: Public Domain by me,
the Anonymous Author@kolomp.blogspot.com
No contact info, so I don't get bothered with spam.
Hope you enjoy it.
*/


#include <stdio.h>

int main(void) {

    long double mhz, delay_s, act_delay_s;
    unsigned long long int totalcycles, ncycles, dummy;
    unsigned char d[8], i, nvars;
    // d[8]: 256^8 > 365*24*60*60 secs * 20 MHz/4 cycles, the max this program handles;
    // needs to be changed for longer delays/faster chips

    printf("Delay seconds: "); scanf("%Lf", &delay_s);
    printf("Clock MHz: ");     scanf("%Lf", &mhz);

    totalcycles = 1e6/4*mhz*delay_s;
    act_delay_s = (long double)totalcycles/mhz*4/1e6;

    ncycles = totalcycles-4-1;  // account for the "call" and the last goto instructions

    // find how many 256-counting registers are needed;
    // the loop repeats 2*nvars+1 instruction lines per iteration
    nvars = 0; dummy = 1;
    while (dummy < ncycles) {
        nvars++;
        dummy = dummy*256*(2*nvars+1)/(2*nvars-1);
    }
    //printf("dummy=%llu\n", dummy);

    // extract the digits, most significant register first
    for (i = nvars; i > 0; i--) {
        dummy = dummy/256;
        d[i] = ncycles/dummy;
        ncycles = ncycles % dummy;
        //if (i > 1)
        //    printf("d%d = 0x%X\n", i, d[i]+1);
        //else
        //    printf("d%d = 0x%X\n", i, d[i]);
    }
    // ncycles is the leftover; e.g. on a 7-instruction loop the remainder can be 0..6;
    // add the 1 that the last goto misses
    ncycles++;
    //printf("ncycles=%llu", ncycles);

    //printf(" Delay seconds: %Lf seconds\n", delay_s);
    //printf(" Clock frequency: %Lf MHz\n", mhz);
    printf("; Actual delay = %.16Lf seconds %llu cycles\n", act_delay_s, totalcycles);
    printf("; Error = %.16Lf %%\n", (1-act_delay_s/delay_s)*100);
    printf("\n");
    printf(" cblock\n");
    for (i = 1; i < nvars+1; i++)
        printf(" d%d\n", i);
    printf(" endc\n");
    printf("\n");
    printf("Delay\n");
    printf(" ;%llu cycles\n", totalcycles-4-ncycles);
    printf(" movlw 0x%X\n", d[1]);
    printf(" movwf d1\n");
    for (i = 2; i < nvars+1; i++) {
        printf(" movlw 0x%X\n", d[i]+1);
        printf(" movwf d%d\n", i);
    }
    printf("Delay_0\n");
    for (i = 1; i < nvars; i++) {
        printf(" decfsz d%d, f\n", i);
        printf(" goto $+2\n");
    }
    printf(" decfsz d%d, f\n", nvars);
    printf(" goto Delay_0\n");
    printf("\n");
    printf(" ;%llu cycles\n", ncycles);
    for (; ncycles > 1; ncycles = ncycles-2)
        printf(" goto $+1\n");
    if (ncycles > 0)
        printf(" nop\n");
    printf("\n");
    printf(" ;4 cycles (including call)\n");
    printf(" retlw 0;\n");

    return 0;
}
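For what it's worth, compiling this with gcc picdelay.c and answering 2 for the delay seconds and 20 for the clock MHz should reproduce the 0x5A / 0xCD / 0x16 constants of the Delay2s routine in the post below, with only the final padding cycles arranged a little differently.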

Thursday, October 15, 2009

Accurate PIC Delays

Time to learn PIC's all over again.
This entry is written in a very rough, jotting-down-notes draft style, and I apologize for that.

I have PIC-PG4D from Olimex/Spark Fun Electronics. Spark Fun no longer calls it PG4D, and it's not listed under the PIC programmer section, but they still sell it for $27.95 under the sku: DEV-00001 in development tools http://www.sparkfun.com/commerce/product_info.php?products_id=1, called Development Board with Onboard Programmer. I think this is by far the best value to get started with PICs even today, compared to USB programmers.

It's a serial JDM-type programmer, and because it's JDM-alike, many generic PIC programming software tools work with it. My favorite programmer software is the command line "picprog" listed at http://hyvatti.iki.fi/~jaakko/pic/picprog.html. I use any old text editor to edit the assembler code, and gpasm to compile the hex file that can be burned onto the chip via the programmer.

I used to have a parallel port programmer, but the parallel port has disappeared from today's laptops, and they only come with USB ports anymore. USB to parallel port converters don't work for PIC programming. Seriously, just don't bother. There is someone who sells custom USB2LPT parallel converters, but even he says that with each IN instruction the USB frame must be waited on for 125 us, and this can increase PIC programming time 100x. So if programming a chip took under a second via a conventional parallel port programmer, it may take over a minute to accomplish the same thing via one of these converters, if they work at all. Just don't bother. If you have a PCMCIA slot, or a desktop computer with a free PCI slot, each of those accepts regular fast parallel port adapters; but if your only option is USB, then a parallel port programmer is out of the question, and you can only use USB or USB/serial adapters.
USB adapters are becoming the norm, but for starting out and really learning the guts of what's going on, the serial port is still the best option. Unfortunately most USB/serial converters don't provide the classical -12V/+12V, sufficient voltage to program a chip; they most likely go 0 to 5 V, or something intermediate between 5 V and 12 V, such as 8 V, which still works well enough for most serial port communication functions. Luckily I bought a Belkin F5U216 USB dock station for $20 or so back in 2004 at Best Buy. It's a really clumsy device, with a useless VGA passthrough cable that adds to the cable mess you have to drag around with the laptop, but it does have its own separate power adaptor, and it seems to supply enough voltage to program a PIC. It's an FTDI chip based device; with another, less clumsy dock station that's PL2303 based, I had no luck as far as programming voltage goes. Your luck may vary with these USB to serial adapters, and you may have to go through quite a few until you find one that's able to program chips via USB/serial JDM adapters. But ultimately it may be worth the effort. The alternative of buying a separate USB programmer, and then using a regular low voltage USB/serial adapter to talk to the chip, may be cheaper in the end, but it requires moving the chip from socket to socket, and the pins may get bent and broken off. Another option, of course, is to supply the programming voltage directly, and use a transistor to switch it based on the low voltage coming from the serial port. But this you have to make yourself; you can't buy a kit that's ready made and guaranteed to work. I wonder why Olimex/kitsrus and the rest don't sell serial programmers that either work just off the serial port, or have an option to connect a high voltage source in case the serial port doesn't provide it. After all, internally, the USB based programmers also use a USB/serial converter. Having just a serial programmer, and a separate USB/serial adapter, frees up the USB/serial adapter for other uses too. The picprog author says that USB to serial adapters will work slowly, because the serial control lines need to be toggled, and each of those operations takes milliseconds, making a full programming of a chip take up to an hour. But my Belkin F5U216 USB/serial dock station programs a PIC16F628A in 3 to 5 seconds.

Once you have a programmer, you can solder many fun circuits you can find all over the web. The classic chip is the PIC16F84A, which is what most classic tutorials and circuits are about. It's a very good midpoint of the spectrum to dive in at, to start out learning, but eventually you'd move on to the newer and cheaper PICs: either the lower performance 10F/12F series, or the higher performance 18F series. Actually the cheaper 16F628 is equivalent to the 16F84 if the comparator registers are turned off, and it's the recommended choice.

Microchip has all the datasheets you need, and they are extremely well documented. Currently, Allied Electronics, with a minimum order amount of $30, sells PIC's very cheaply: search for PIC, then limit the search to I/P so you don't get surface mount SOICs. The PIC10F200 is $0.46, PIC12F683 $1.18, PIC16F54 $0.55, PIC16F628A $1.73, PIC16F88 $2.60, and they no longer sell the PIC16F84A, even though last week they still did. Out of the above bunch the PIC16F628A is the recommended one to start with, and should be supported by most programmers and software you can find around the net. The PIC16F88 is the candy/king of the bunch, and still sufficiently F84-like to get started with.

The best starting point for absolute beginners, and programmers who've never seen assembler programming in their life, is http://www.mstracey.btinternet.co.uk/pictutorial/picmain.htm. It's PIC16F84 based, but it's directly applicable to the 628A, so don't worry.


One of the benefits of PIC programming is accurate timing on the microsecond scale. While in the past one could use an IBM PC with MSDOS to directly control external devices through the parallel port, these days most modern multitasking operating systems no longer allow accurate timing, and delays/hiccups of CPU availability on the order of 250 milliseconds or more should be expected. If your application needs to log something once an hour or so, that is more than sufficient, but if you need exact timing, such as when talking to a DS18B20 thermometer or an HD44780 LCD, direct PC control is almost out of reach. I remember in 2006 I was asked to create a piece of software slowly ramping up the voltage on a power supply, from 0 to 300 V, to coat some electrocoated panels. With the timers provided by the Windows API, the fastest delay time was 20 ms inside Windows NT, with unpredictable occasional hiccups of over 200 ms. When the ramp time is 15 seconds, a half-second hiccup near 80 V along the ramp may or may not significantly affect the reproducibility of the test. Running on top of a nonrealtime OS, where the OS may capriciously decide to churn the harddrive, dump some memory cache, or attempt a network connect timeout in the middle of what you're doing, is an iffy situation. This nonrealtime preemptive issue keeps the computer from direct automation and control of things such as a nuclear power plant or a submarine, and direct control and accurate timing are handed off instead to dedicated chips, such as a sound chip, or a serial UART, etc. Hence the need for the PIC, and for learning how to program it. If you want to control motors, chips, any kind of devices, and you want to make good scientific measurements, you can use a PIC either as a standalone computer with no harddrive, keyboard or display, or connected via a serial port to a PC, where the PIC is your accurate realtime buffer between the moody, unpredictable, mysterious computer OS and the real world.

I don't even know what programs are running on Windows anymore, since many of them can be hidden even from the task manager. At the Linux command prompt, ps ax or top lists most running tasks, and it feels more secure; even if Linux is generally under very heavy sabotage, at least there are no hidden spy features built directly into it, because the sourcecode is available for inspection - unless gcc inserts something, but that sourcecode is available for inspection too. Also, on a network connected Windows computer MS has direct access, and can piss in your cereal any time: just when you want to make a real world measurement, execute a remote procedure call on you. If you shut down the RPC service on NT, a countdown messagebox starts and automatically reboots the computer. That's a big no-no: RPC has to run at all times. If you don't like it, what else you gonna do? Go to a competitor? Good luck finding one. Running an isolated Windows session may not be possible in a few years. This is also where the PIC's are a refuge, since they are meant to function standalone: no keyboard/monitor/harddisk/memory, just a nanowatt battery, and possibly a 3.5 mm earphone plug serial connection, an RF/serial connection, or maybe a set of LED's or even an LCD. Oh what freedom it is. Slow, can't do much with it, but at least no bullshit, because what you can do, you can rely on. And redundancy is cheap.
Imagine making your own garage openers, temperature monitors, or even home security systems. Automating your world a la Jetsons style. The PIC's make it affordable. As long as there is roughly equivalent competition, such as from the Arduino AVR microcontrollers, or even the Intel 8051, prices, affordability, and proper, customer focused market behavior should happen naturally. If any one player gets too successful and leaves the others behind, or forces them completely out of business, the market could turn into a monopolistic nightmare.

So here I am trying to learn PIC's again. Every time I try, Da Man uproots me and gets me out on the street, without a roof over my head. I'll never learn.

As you start out learning PIC's, the very first thing you'll do is flash an LED. Set up the input/output ports, and switch the bit values on them on and off. Of course a PIC running at 20 MHz has an instruction execution time of 0.2 microseconds, and that's too fast for the human eye to see, so you have to learn delays.
Since you're working at a very low level, only byte values up to 255 are available, and any numbers such as 10 million have to be expressed and manipulated via such small values. Subtracting 1 from 0 rolls under to 255 in a byte, and adding 1 to 255 rolls over to 0, with the carry/borrow flag bits set. It seems like such a bother compared to modern C compilers, but what you get in exchange is knowing exactly what happens on the CPU, no mystery about it. The possibility of a virus infection brought to you by the compiler is nil, because you can disassemble and examine each and every CPU instruction, and understand exactly what it does. Secure computing. It's only possible on a very small scale, with very low complexity systems, but that's where everything starts, before scaling up.

Talking about being a computer security expert without understanding assembly is ridiculous. I never had the chance to learn assembly before; I'm pretty fluent in BASIC, Pascal and C, and from there of course in most similar high level languages like javascript/java/C++/python, but assembler has always been a mystery to me. There is always a chance that the C compiler is rigged, and unless you're able, at least in theory, to personally examine the compiler output, you can never be sure about your computer's security. Though the price is dear, another benefit of assembler programming is that you're able to use hardware directly without trespassing on and violating someone else's copyright and intellectual property. You can write your own OS or your own compiler, if you wish. It used to be that computer scientists all knew the inner workings of computers, and they were all able to program in assembler. They usually didn't, because of the benefits and ease that high level programming tools provided, but they understood what these tools ultimately did, and were able to create high performance, quality software when necessary, dropping back to assembler and working directly with the hardware. That was true computer science. Today's programmers coming from diploma degree mills are slaves to the tools they are given. If these tools suck in performance, anything they create with them sucks. They don't know how to create better tools. They are told they are forbidden to even try to understand how a tool works, because that would involve reverse engineering. Eventually only an elite, a select few special circle of people, will be allowed to program with high performance, directly on the hardware, and everyone else will be mandated to use the high level, expensive, remotely monitored tools, because otherwise it would be a trespass on fully locked down and perpetual patent rights. Only an elite select few will have access to the patent rights, especially when patents are set to never expire, or are renewable ad infinitum. It's called a competitive advantage, in the name of self interest. Only special people will be allowed to write high performance code; everyone else will have to live and run on top of artificially sabotaged and held back tools. We can even see that today - java and dotnet seem very much like that. Hence Vista sucks. DOS, with a parallel port, and Quake, used to rock. It blew the minds of its users with its hotrod speed.
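Just to make that byte rollover concrete, here's a minimal C equivalent (running on the PC; on the PIC you'd watch the carry/borrow flag in STATUS instead of using printf):

#include <stdint.h>
#include <stdio.h>

/* Byte rollover, the behavior the delay loops below rely on: */
int main(void) {
    uint8_t b = 0;
    b = b - 1;                     /* 0 - 1 rolls under to 255 */
    printf("0 - 1   -> %u\n", b);
    b = 255;
    b = b + 1;                     /* 255 + 1 rolls over to 0 */
    printf("255 + 1 -> %u\n", b);
    return 0;
}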
In today's world, being mandated to use developer tools forced onto users by monopolies feels like a forced religious conversion, a violation of the First Amendment of the Constitution. Moreover, the backdoors put into network connected computers, and the constant remote watchful eye of the proprietary system holder making sure no wrong clicks and no intellectual property violations are happening, intruding into users' private homes through the network wires, seem like a violation of the Fourth Amendment. All the while, viruses and hackers have no problem trespassing either, and in fact they are used to further scare and intimidate end users into blindly obeying a centralized high command. Nazi style. These trespasses on individual freedoms are happening out of self interest, as a power grab and control, by those doing them. End users are no longer in charge of their own lives, or their own destinies, at least as far as computing destinies are concerned. What can you do in view of all this? Well, you can use Linux, but that too is so bloated anymore, and under such heavy sabotage (compare Knoppix 3.4 to 6.0; I wonder if Klaus is still alive, and it's not just an imposter releasing newer versions acting like it's him), that it is no safe haven either.

PIC's are a safe haven in the sense that they are so small there is no room to even run an OS, or do anything really complicated to deceive the user. It's just you against the bare CPU, and you can still get some very neat, exciting and useful things accomplished with it. Artificial intelligence is impossible on them, in view of the very limited resources and speeds available, at least compared to today's supercomputer simulators. The bang per buck - the risk of artificial intelligence development vs. the benefit of usefulness they provide - is very high. And security is a given, and redundancy is cheap. In fact microcontrollers could safely run and automate nuclear power plants, space stations, cars, etc., with generally available, small learning curve skills across the whole population. After another world war, or nuclear holocaust, the remnants of technology such as complex computers would be unusable by Joe Schmoe, but a microcontroller could be usable, and rebuilding the world could be accelerated. If a space station fails, and 3 astronauts are stuck on it, they are simply unable to take care of things for themselves, and fix them, unless, of course, everything is easy to fix, everything is running on top of things they understand, such as microcontrollers. I come from a chemical manufacturing/science background, where proper measurement, and time, are very important, if nothing else then for safety reasons. PLCs and ladder logic fulfill these functions today, but cost wise PLC's and PIC's are in different ballparks, and tenfold redundancy is similarly cost prohibitive with PLC's, unlike with PIC's. Automation can eliminate a tremendous amount of backbreaking work, and make the world a more efficient, easier and safer place to live in. Microcontrollers seem like a Godsend in this regard. If one can only learn them. I'm not sure I'm smart enough, in a sense, to learn everything, to "take charge of my destiny," and learn how to fix the things that can be fixed in my life around me, or improve the things that can be improved, but at least I can have a go at it, I can try. But I wandered off a bit from the topic.. back to delays.

One of the most beautiful PIC instructions is "nop", no operation. You never encounter it in any high level programming language, where things are obfuscated and uncertain, but wherever you see it, it's a comforting sign that things are running under full accountability of time. After all, besides his own programmer's time and pay, the main resources a computer scientist has to budget are the execution time of the software and its memory consumption. These three things have to be held in balance: programmer time, execution time, memory consumption. These days everything is focused on programmer time, with grave sacrifices in execution and memory. And this would all be well, since even today it is the programmer time that's the main expense. However, as programmed devices become ubiquitous, and energy consumption important, the nanowatt power PIC's with low memory resources will be more than adequate for many functions. Such as Roombas. But I'm drifting off topic again. Back to delays..

Listing of ledblink.asm:
;Tutorial 1.2 - Nigel Goodwin 2002 - initial template
;modified by me, author at kolomp.blogspot.net

        LIST      p=16F628        ;tell assembler what chip we are using
        include   "p16f628.inc"   ;include the defaults for the chip
;       processor p16f628
;       __config  0x3D09          ;sets the configuration settings (oscillator type etc.)

        cblock    0x20            ;start of general purpose registers
        d1                        ;used in delay routine
        d2                        ;used in delay routine
        d3                        ;used in delay routine
        endc

        org       0x0000          ;org sets the origin, 0x0000 for the 16F628,
                                  ;this is where the program starts running
        movlw     0x07
        movwf     CMCON           ;turn comparators off (make it like a 16F84)

        bsf       STATUS, RP0     ;select bank 1
        movlw     b'00000000'     ;set PortB all outputs
        movwf     TRISB
        movwf     TRISA           ;set PortA all outputs
        bcf       STATUS, RP0     ;select bank 0

Loop
        movlw     0xff
        movwf     PORTA           ;set all bits on
        movwf     PORTB
        nop                       ;the nop's make up the time taken by the goto
        nop
        call      Delay2s         ;this waits for a while!

        movlw     0x00
        movwf     PORTA
        movwf     PORTB           ;set all bits off
        call      Delay50ms
        goto      Loop            ;go back and do it again

; Delay = 0.05 seconds
; Clock frequency = 20 MHz
; Actual delay = 0.05 seconds = 250000 cycles
; Error = 0 %

Delay50ms                         ;249993 cycles
        movlw     0x4E
        movwf     d1
        movlw     0xC4
        movwf     d2
Dly50ms_0
        decfsz    d1, f
        goto      $+2
        decfsz    d2, f
        goto      Dly50ms_0
                                  ;3 cycles
        goto      $+1
        nop
                                  ;4 cycles (including call)
        retlw     0x00

; Delay = 2 seconds
; Clock frequency = 20 MHz
; Actual delay = 2 seconds = 10,000,000 cycles
; Error = 0 %

Delay2s                           ;9999995 cycles
        movlw     0x5A            ;90
        movwf     d1
        movlw     0xCD            ;205
        movwf     d2
        movlw     0x16            ;22
        movwf     d3              ;6 cycles so far

Dly2s_0 decfsz    d1, f           ;first trip to zero: 90*7=630 cycles, the 90th activates d2 down to 204
        goto      $+2             ;subsequent trips to zero take 7*256 (from the 0-1=255 rollover, i.e. 256-1)
        decfsz    d2, f           ;first trip to zero, activated by d1->0, goes to 204; total 204*256*7+90*7
        goto      $+2             ;subsequent trips to zero total 256*256*7
        decfsz    d3, f           ;first trip to zero is the only one, activated by d2->0
        goto      Dly2s_0         ;total 21*256*256*7+204*256*7+90*7 = 9999990-1 cycles
                                  ;the -1 comes from the final decfsz d3 skipping a goto

        nop                       ;1 cycle
        retlw     0x00            ;4 cycles (including call)

        end


I hope you just scrolled through that and are continuing to read here. I basically went and followed the tutorial at http://www.winpicprog.co.uk/pic_tutorial.htm and modified it to work for the PG4D I have. I had to change the config bits from 0x3D18 to 0x3D09 (by using Kcalc's scientific feature to convert between hex and binary, together with the 16F628's datasheet PDF) to switch to the external 20 MHz oscillator, as opposed to the internal oscillator the tutorial uses. The delay routines I used to scratch my head at, until I found the source code generator at http://www.golovchenko.org/cgi-bin/delay

The above sourcecode when compiled with
gpasm ledblink.asm
at a command prompt, outputs a ledblink.hex file that looks like this:
:020000040000FA
:1000000007309F0083160030860085008312FF3082
:1000100085008600000000001D200030850086005D
:10002000122007284E30A000C430A100A00B1928D0
:10003000A10B16281B28000000345A30A000CD3038
:10004000A1001630A200A00B2628A10B2828A20B85
:060050002328000000342B
:00000001FF
This hex file, containing machine instructions only, with the sourcecode comments stripped, can be directly burned onto a PIC, or disassembled with gpdasm, but the disassembled output looks pretty haywire - nothing like the original sourcecode, full of comments and the original author's choice of variable names.
To burn the hex file onto the PIC: my Belkin F5U shows up as /dev/ttyUSB0, so, after flipping the switch on the DEV-00001 and disconnecting the power source, I burn it with the command
picprog --burn --device=pic16f628 -i ./ledblink.hex --jdm --pic-serial-port=/dev/ttyUSB0
The very bottom subroutine, Delay2s, is the one I tried to understand. The neat thing on a PIC is that you can count time by simply counting the lines of instructions. To reiterate, each instruction in a PIC takes 4 clock cycles. A 4 MHz crystal gives a 1 us instruction time, and a 20 MHz crystal, the one that comes with the DEV-00001 from Spark Fun, a 0.2 us instruction time. For a 2 second delay, we need 10 million instructions executed before proceeding to turn the LED back on.
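That arithmetic, spelled out as a trivial C check (nothing PIC-specific here):

#include <stdio.h>

/* instruction time = 4 clock cycles / crystal frequency */
int main(void) {
    double f_hz  = 20e6;             /* 20 MHz crystal */
    double t_ins = 4.0 / f_hz;       /* 0.2 us per instruction */
    double count = 2.0 / t_ins;      /* instructions in 2 seconds */
    printf("%g us per instruction, %.0f instructions for 2 s\n",
           t_ins * 1e6, count);      /* prints 0.2 and 10000000 */
    return 0;
}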

At the Delay2s label, which serves as a marker for a future call or goto instruction to land on, we start following the execution of instructions. movlw means move literal value to W. W is the working register, and the PIC is almost comically simplistic in having a single working register. Have I mentioned that PIC's also follow the Harvard architecture, where code and data spaces are separate, as opposed to von Neumann, where they are shared? This feature makes it even more secure, since buffer overflows of data don't turn into instructions. Though there are ways to circumvent such things and make data act as if it were an instruction, at least it's a first line of defense, an extra safety barrier.
Once the W register is filled with the hexadecimal number 0x5A (which, using Kcalc, turns out to be 90 decimal, or 1011010 binary), the next step is movwf d1, meaning move contents of the W working register to file d1. d1 was set up at the beginning of the source code to mean memory location 0x20 (d2 is 0x21, and d3 is 0x22), locations you can find in the datasheet of the PIC16F628A, each of these d# registers being able to hold a byte, a value from 0 to 255. Since we're trying to iterate 10 million times, 256 is not enough, 256*256=65536 is not enough either, but 256*256*256=16,777,216 is enough to represent 10,000,000, so we need 3 bytes of memory. The three values of 90, 205 and 22 were obtained from http://www.golovchenko.org/cgi-bin/delay

and the verification of how they work is explained in the comments below.
Once the 3 initial values are set up in the registers, we proceed. The next instruction, decfsz, meaning decrement file and skip if zero, is a branching, conditional instruction, similar to if..then constructs in high level languages. It takes 1 instruction time (4 clock cycles) to execute, except when the condition turns out to be true, when it takes 2 instruction times (8 clock cycles). This is to compensate for not executing the skipped next line, so that total instruction time can still be counted by simply counting lines of code and multiplying by the instruction time factors, independent of the conditions being true or false.
The block of code from Dly2s_0 contains 7 instruction cycles for each iteration. From the datasheet instruction listing we can see that gotos take two instruction cycles to complete, and it's beautifully written with goto $+2, jumping ahead 2 lines in the execution. So when the code starts out, d1 is 90, decfsz brings it to 89, proceeding to the next instruction goto $+2 gives 1+2 instruction times so far, then another goto, and a final goto gives 1+2+2+2=7 instruction times, before decfsz brings d1 to 88. The process repeats itself until d1 ends up at 0. At that point the following goto is skipped, and instead d2 is decremented from 205 to 204, and proceeding along, counting the instructions, we see that the total still stays at 7 when we arrive back at Dly2s_0. At this time register d1 contains 0, and subtracting 1 from it rolls it under to 255, as if the contents were 256, and the borrow flag is set. So d1 contained 90 only during the first countdown; for subsequent countdowns it counts from 255 to 0. So while d1 is not 0, we keep repeating the prior steps, 1+2+2+2 instructions, d2 staying at 204, while d1 goes 255, 254, 253, ... 3, 2, 1, 0, and at this point the skip-if-zero becomes true again, and now the decfsz d2 instruction is executed, bringing d2 to 203. This process repeats itself 204 times, with a total number of instruction cycles passed of 204*256*7+90*7 by the time d2 becomes 0. Now decfsz d3 is executed, and similarly, by the time d3 becomes 0 we have executed 21*256*256*7+204*256*7+90*7 = 9999990-1 instruction cycles. The -1 comes from the final decfsz d3 skipping a goto: previously this pair took 1 cycle to decfsz and 2 cycles to goto, 3 cycles total, but now decfsz with a true condition becomes a 2 cycle instruction and there is no goto, so we lost a cycle, 3 vs. 2. To compensate for that, a single nop is executed next.

There is an equation one can come up with to calculate those values, but one may forget exactly how it goes when a quick delay routine is needed. Therefore the easiest procedure is to multiply 256 by itself as many times as needed, counting how many register variables you will use, then, starting backwards, find a trial number such that 21(trial number)*256*256*7(=2*number of variables+1) gets you just under your target value, then repeat with a new trial number, 204(trial)*256*7, to still keep you under it, and apply the final tweak, with a doublecheck like we just did. After a few rounds of practice all this becomes a breeze, assuming you've had the patience to go through it step by step, as in the sketch below.

Patience is pretty much the most important thing for math. If you only saw what kind of lengthy things Leonhard Euler did when coming up with some of his formulas, such as, if I remember right, Sum(1/x^2)=pi^2/6. http://www.physicsforums.com/showthread.php?t=80591 Lots and lots of patience and superaccurate, methodical hand calculations. Most people throw in the towel long before he would. That's what made him different from the rest of the world. Edison said genius is 10% inspiration and 90% perspiration. Tesla used to ridicule him for that - how he would go ahead and diligently start inspecting each straw right away when looking for a needle in a haystack, instead of first trying to figure out lazy ways to eliminate half the haystack by some logical reasoning, if possible. But nevertheless the statement stands.
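Here is that trial-number procedure as a small C sketch - my own distillation of the steps above, hardcoded for the 2 s / 20 MHz, 3-variable case, where the place values are (2*nvars+1)*256^k with 7 instruction lines per loop:

#include <stdio.h>

/* Greedy digit extraction: each trial number is just an integer
   division by the place value. Reproduces the 21 / 204 / 90 of the
   Delay2s routine. */
int main(void) {
    unsigned long long target = 9999990;       /* loop cycles to burn */
    unsigned long long place = 7ULL*256*256;   /* d3's place value */
    int k;
    for (k = 3; k >= 1; k--) {
        printf("d%d trial number: %llu\n", k, target / place);
        target = target % place;
        place = place / 256;
    }
    printf("left over: %llu cycles to pad\n", target); /* 0 here */
    return 0;
}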

There are many ways to do delays, as listed at http://www.piclist.com/techref/microchip/delay/general.htm , some of them relying on a "call" nested deep within another "call", which uses up stack space. Stack space is at a premium, especially on the PIC10 and PIC12 series.

PS: it seems something is not right with the __config 0x3D09, and when I just uncomment that line, the LED flashes right.

Wednesday, October 14, 2009

Quote

"In theory, there is no difference between theory and practice. In practice, there is." - Yogi Berra