I was thinking perhaps there was a way to get away with fewer variables to make big numbers. After all, if I'm given two bytes, d1 and d2, each able to store 256 values, my options are 256+256, or 256*256, or.. or.. how about 256^256.. now that's a huge number. Even 256^8 is huge. So is there a way to get away with, say, d1 storing 255 and d2 storing 8? What are the mechanics under the hood - can we generate the huge number of operations needed for a delay with just these two registers? In C one would just write a routine:
int ipow(int x, int y) { int i, result = 1; for (i = 0; i != y; i++) result = result * x; return result; }
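As an aside, even this C version hits a wall quickly. Here's a quick sanity check, assuming a 32-bit int - it foreshadows the PIC register problem below:

#include <stdio.h>

int ipow(int x, int y) { int i, result = 1; for (i = 0; i != y; i++) result = result * x; return result; }

int main(void)
{
    printf("4^4   = %d\n", ipow(4, 4));    /* 256, no problem */
    printf("256^3 = %d\n", ipow(256, 3));  /* 16777216, still fits in 32 bits */
    printf("256^8 = %d\n", ipow(256, 8));  /* overflows a 32-bit int: formally undefined, garbage in practice */
    return 0;
}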
Win32 has a flat memory model: any program is given a sandbox where it thinks it has access to all the memory on the computer, and Windows translates what the program sees as, say, memory location 0 or the 1 MB mark into real physical addresses, which might actually be the 12.5 MB to 13.5 MB region. This way you don't have to rewrite Windows programs to adapt to new memory limits.
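If you want the idea in code, here is a toy sketch of the translation - a single hypothetical base offset, where real Win32 uses per-page tables; phys_base and translate are names I made up, and the 12.5 MB base is just the made-up number from above:

#include <stdint.h>
#include <stdio.h>

/* Toy model: one base offset per process. The program sees address 0,
   the hardware sees phys_base + 0. Real Windows does this per page. */
static const uint32_t phys_base = 12800u * 1024;   /* 12.5 MB in bytes */

uint32_t translate(uint32_t virt)
{
    return phys_base + virt;
}

int main(void)
{
    printf("virtual 0x%08X -> physical 0x%08X\n", 0u, translate(0u));
    printf("virtual 0x%08X -> physical 0x%08X\n", 0x100000u, translate(0x100000u));  /* the 1 MB mark */
    return 0;
}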
So back to 256^256 - can you implement code with only 2 registers and do this many iterations, find a cleverer way than having to use 6 registers? Would this be more elegant, maybe use even fewer instructions? Well, the answer is no. I actually tried. What happens is that you need a way to represent and store the intermediate numbers. In order to raise to a power, you need to multiply, and in order to multiply, you need to add. On the PIC you have to add manually to multiply, and then use this multiplication to raise to a power, manually. Things such as 3^3=27 or 4^4=256 fit into a register, but anything bigger doesn't.

You could try to consume the stored-so-far value before you take the next step: before you come around and multiply again, do those iterations now, one by one, instead of doing them later. This does not work, because of the very essence of raising to a power: 256*(the current product of the previous 256*256*..) means something like 256*123456789012 = 123456789012 + 123456789012 + .. 256 times over. Should you use up any of these intermediate values, you won't have the huge number to add up on the next round of multiplying by 256. Basically, + and - are one operation step away from multiplication *, but raising to a power ^ is two steps away, and there is no quick and dirty way to decrement in between so that you don't overflow while raising to a power. So you can't raise to a power unless you're willing to store huge intermediate values.

You can make your own custom type and dedicate 16 or more bytes to storing the values in it, but that negates the whole purpose of trying to save registers in the first place - using 16 registers to save 7, and making the code very complicated too, similar to old 32-bit Windows code on a 16-bit PC architecture such as a 286: low byte, high byte, what a mess that was. Linux started out later, and by design it does not run on anything less than a 386, which is natively 32-bit, simply to avoid dealing with the low-byte/high-byte mess in its internal memory-addressing calculations.
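To make that concrete, here's a little C sketch that models a PIC register as a uint8_t and builds ^ out of * out of +, just as described; mul8 and pow8 are names I made up for the illustration, and the silent wraparound stands in for register overflow:

#include <stdint.h>
#include <stdio.h>

uint8_t mul8(uint8_t a, uint8_t b)
{
    uint8_t acc = 0;
    while (a--)             /* multiply by repeated addition: add b, a times */
        acc += b;           /* silently wraps past 255, like a real 8-bit register */
    return acc;
}

uint8_t pow8(uint8_t x, uint8_t y)
{
    uint8_t acc = 1;
    while (y--)             /* power by repeated multiplication: multiply by x, y times */
        acc = mul8(acc, x);
    return acc;
}

int main(void)
{
    printf("3^3 = %d\n", pow8(3, 3));   /* 27 - fits in a byte */
    printf("4^4 = %d\n", pow8(4, 4));   /* prints 0: 256 overflowed the byte */
    return 0;
}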
Basically the below delay code, using as many registers as needed and going the 256*256*256*256... way to generate large numbers, is the most elegant and efficient way I can think of. Mechanically, the intermediate values that need to be stored never exceed 256, and they can be decremented along the way while the multiplication happens: you don't have to first generate the full product and then count down one at a time from a huge number; you can take one factor, decrement it all the way, then continue with the multiplication, and the final count of how many instructions ultimately execute is not affected. Using registers with + is not as efficient as * at creating large numbers, and ^ is unimplementable without storing large intermediate values greater than 256, so it's not more elegant. Unless you can think of something I didn't think of.
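For reference, the shape of that delay in C is just nested byte-counter countdowns - a sketch of the structure, not the actual PIC listing (each C loop stands in for a decfsz/goto pair), with a hypothetical ticks counter added to show the total:

#include <stdint.h>
#include <stdio.h>

volatile uint32_t ticks;   /* volatile so the compiler can't optimize the loops away */

void delay(void)
{
    uint8_t d1 = 0, d2 = 0, d3 = 0;   /* starting at 0 gives a full 256 steps per counter */
    do {
        do {
            do {
                ticks++;              /* one innermost step; on a PIC, real time passes here */
            } while (--d1);           /* 0 wraps to 255, so each counter counts 256 times */
        } while (--d2);
    } while (--d3);
}

int main(void)
{
    delay();
    printf("%u\n", (unsigned)ticks);  /* 16777216 = 256*256*256, no intermediate value ever above 255 */
    return 0;
}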
By the way, while surfing around for answers on this topic, I came across the factoradic number system, which is a mixed-radix number system (it's not base 2 like binary, or base 10 like decimal, or base 16 like hex; each digit position has its own weight, the factorial of that position). Because factorial grows roughly like N^N, huge numbers can be stored with few digits past a certain threshold. The factoradic number system is basically the next step beyond Arabic numeral notation: tally marks are +-fast, the Arabic numeral system develops * to its finesse with a fixed radix, and factoradic gives a ^-fast notation. I'm not smart enough a mathematician to really analyze this, but there might be ways to implement factoradic in hardware that could give more efficient computation, at least when the number of digits stored is huge, probably well beyond 64 bits. Instead of the highest number stored being around, say, 2^64, one might be able to get closer to a 64^64-magnitude number, with accurate individual-step granularity, squished into 64 bits.
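To show the mechanics, here's a small C sketch of the digit arithmetic - to_factoradic and from_factoradic are names I made up, and 463 is the standard worked example (463 decimal = 3:4:1:0:1:0 factoradic, since 3*5! + 4*4! + 1*3! + 0*2! + 1*1! + 0*0! = 463):

#include <stdint.h>
#include <stdio.h>

/* Digit i (least significant first) has place value i! and ranges 0..i,
   so digit 0 is always 0. */
int to_factoradic(uint64_t n, int digits[], int max_digits)
{
    int count = 0;
    for (uint64_t radix = 1; count < max_digits; radix++) {
        digits[count++] = (int)(n % radix);   /* remainder is the digit */
        n /= radix;                           /* divide by 1, 2, 3, ... in turn */
        if (n == 0)
            break;
    }
    return count;                             /* digits written, least significant first */
}

uint64_t from_factoradic(const int digits[], int count)
{
    uint64_t value = 0, fact = 1;             /* fact tracks i! as i grows */
    for (int i = 0; i < count; i++) {
        if (i > 0)
            fact *= (uint64_t)i;
        value += (uint64_t)digits[i] * fact;
    }
    return value;
}

int main(void)
{
    int d[21], n = to_factoradic(463, d, 21);
    printf("463 in factoradic (most significant first): ");
    for (int i = n - 1; i >= 0; i--)
        printf("%d", d[i]);                   /* prints 341010 */
    printf("\nround trip: %llu\n", (unsigned long long)from_factoradic(d, n));
    return 0;
}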