Assault Andy Administrator
I make other people create vaporware

Registered 29/07/2002
Points 5685

16th February, 2011 at 07:37:33 -

Well, you don't really have lines of code in C/C++, since newlines are stripped during compilation anyway, aren't they? I think you can just put all the code on one line, so it's not really a good measure of how long your code is. Efficiency, on the other hand, is another kettle of fish!

Andy/UrbanMonk, as an embedded systems programmer I can always toggle an IO pin at the end of every iteration of an algorithm and measure the frequency of that. If the IDE toolset is good enough you can use a profiler and figure out the number of CPU cycles one iteration takes purely from simulation. We can also do in-circuit emulation and set a hardware timer running...
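In C it's only a couple of lines. A rough sketch, where FAKE_GPIO_ODR and PIN_MASK are made-up stand-ins for whatever your part's datasheet calls the output register and pin bit:

```c
#include <stdint.h>

/* Hypothetical memory-mapped GPIO output register -- the real address and
   bit layout come from your part's datasheet; this stand-in just compiles. */
static volatile uint32_t FAKE_GPIO_ODR = 0;
#define PIN_MASK (1u << 5)

static void process_sample(void) { /* the algorithm being measured */ }

/* Toggle the pin once per iteration; a scope on the pin then shows a
   square wave at half the loop frequency. Returns the register state so
   the sketch can be sanity-checked without hardware. */
uint32_t run_loop(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        process_sample();
        FAKE_GPIO_ODR ^= PIN_MASK;   /* f_pin = f_loop / 2 */
    }
    return FAKE_GPIO_ODR;
}
```

Since one toggle happens per iteration, the pin completes a full cycle every two iterations, hence the divide-by-two when you read the frequency off the scope.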

There must be PC-side profilers, surely? I have yet to find or need one, mind.

On a side note, I absolutely detest the "let's put everything on one line" style of C, and the if ( MYBOOL ) { braces-on-the-same-line style... Many C purists call that Java style. In fact, since 2005 MS has adopted the Allman style.

I am a great supporter of the Allman style, and so are MS and the developers of IRRLICHT.

I think we ought to care most about efficiency. If a primality test were included in a standard library you would just make one call to a function. You can write an algorithm that completes in polynomial time instead of exponential time like this brute-force method.
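For the record, even without a library call the brute force can be cut down enormously: you only need to test divisors up to sqrt(n), because any factorisation n = a*b has one factor at or below sqrt(n). A quick sketch (still exponential in the *bit length* of n, mind - the genuinely fast tests are probabilistic ones like Miller-Rabin):

```c
#include <stdbool.h>

/* Naive brute force: tries every divisor below n -- O(n) divisions. */
bool is_prime_naive(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; d < n; ++d)
        if (n % d == 0) return false;
    return true;
}

/* Trial division up to sqrt(n): if n = a*b then min(a,b) <= sqrt(n),
   so O(sqrt(n)) divisions suffice. */
bool is_prime_sqrt(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; (unsigned long long)d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}
```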

As the engineer I must ask why you are interested in primes? The only place I could think of being interested in primes outside of academia would be in the brute-forcing of RSA et al - to crack it you effectively have to recover the prime factors of a very, very large number, which currently would take an implausibly long time to do.

Quite nice knowing that the only thing stopping people breaking pretty much all the encryption algorithms used online is the fact it would take a bloody long time to factor a number. That said, quantum computers running Shor's algorithm are a long time off...

Matt, do you know how to measure the cycles (or time) an iteration takes in a PC app? Is there a profiler in typical PC IDEs? (I presume there is.) Or do you just figure it out the formal way? (I can remember working out the efficiencies of binary searches, bubble sorts etc. as an undergrad. So boring.)

The only profiling tools I have ever used record execution time, though supposedly cycle-level profilers exist (I believe MS Visual Studio can do it). I think Valgrind was suggested to me once, for profiling and catching memory leaks. We often just talk about how many "steps" the algorithm performs for an input of size n, describing complexity in big O notation, like O( n^2 ). You could compute the number of cycles exactly by debugging the assembly code and getting out that big fat processor manual that tells you how many cycles each operation needs on your particular processor. You mustn't forget that moving memory around can actually be quite expensive as well, but this isn't really considered. Other measures I have seen (when implementing certain Fourier transforms) are how many multiplications and how many additions are made for an input of size n, on the assumption that multiplications are far more expensive than additions.
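For what it's worth, you can get a crude execution-time measurement from the standard library alone - clock() from time.h - which is usually enough to compare two algorithms on the same input. A sketch, where the work function is just a made-up load to time:

```c
#include <time.h>

/* Hypothetical workload: sum of squares, just something to measure. */
static unsigned long long work(unsigned n) {
    unsigned long long s = 0;
    for (unsigned i = 0; i < n; ++i)
        s += (unsigned long long)i * i;
    return s;
}

/* Coarse timing with the standard library alone: CPU time, not cycles,
   but enough to rank two algorithms on the same input. */
static double time_work(unsigned n) {
    clock_t t0 = clock();
    volatile unsigned long long r = work(n);  /* volatile stops the call
                                                 being optimised away */
    (void)r;
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

clock() measures processor time at fairly low resolution, so for short runs you want the input large enough that the measurement dwarfs the timer granularity.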

Primality tests do have a practical application. If you have a function with a good probability of generating a prime number, but no certainty, a fast algorithm for testing the primality of the candidates is essential. Prime numbers are generated in this way in certain encryption schemes (such as PGP) that require large primes. Clearly, for large inputs, the suggested brute-force algorithm for testing primality is going to perform very poorly.
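The fast probabilistic tests boil down to modular exponentiation, which takes O(log n) multiplications via repeated squaring. A sketch of the idea using the Fermat test - note this is not what real systems ship: they use Miller-Rabin with many rounds, since some composites (Carmichael numbers) fool the Fermat test for every base:

```c
#include <stdint.h>
#include <stdbool.h>

/* Modular exponentiation by repeated squaring: O(log e) multiplications.
   Keep n below 2^32 so the intermediate products fit in 64 bits. */
static uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    b %= m;
    while (e > 0) {
        if (e & 1) r = (r * b) % m;
        b = (b * b) % m;
        e >>= 1;
    }
    return r;
}

/* Fermat test: every prime p satisfies a^(p-1) == 1 (mod p) for a not
   divisible by p. Failing the congruence proves n composite; passing it
   for a few bases only makes primality likely. */
bool probably_prime(uint64_t n) {
    if (n < 11) return n == 2 || n == 3 || n == 5 || n == 7;
    const uint64_t bases[] = {2, 3, 5, 7};
    for (int i = 0; i < 4; ++i)
        if (pow_mod(bases[i], n - 1, n) != 1)
            return false;   /* definitely composite */
    return true;            /* probably prime */
}
```

The point is the cost: each call is a handful of O(log n)-multiplication exponentiations, versus the O(sqrt n) divisions of trial division, which is why it stays usable at cryptographic sizes.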

Yeah, I do know about primes in encryption, as I stated above, but that is about the only practical place I know of their use - I am not a computer scientist by any means...

In the embedded systems world (which is my career), simulators naturally tell you the number of cycles, as the formal complexity judgements you mention are quite frankly pointless: doing everything in fixed point and avoiding division or square roots (like the plague) is the main concern. Also, if you have written in C or whatever, you still have easy, instant access to the machine code/assembly.

On some CPUs I use, multiplication and addition take two cycles (with a hardware multiplier), division typically around 256, loading from memory five, and branching six. So wherever possible we multiply by the inverse of what we want to divide by (assuming the divisor is a constant), or if the division/multiplication is by a power of two we always use shifts, as those are usually one cycle.
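To make that concrete, here's roughly what it looks like in C. The reciprocal constant 6554 = round(2^16 / 10) is my own example; the usable input range depends on the constant, so on a real job you verify it against the worst-case input:

```c
#include <stdint.h>

/* Division/multiplication by a power of two replaced with a shift
   (unsigned only -- signed values need rounding fix-ups). */
uint32_t div8(uint32_t x)  { return x >> 3; }   /* x / 8  */
uint32_t mul16(uint32_t x) { return x << 4; }   /* x * 16 */

/* Fixed-point multiply-by-inverse: to divide by the constant 10,
   multiply by round(2^16 / 10) = 6554 and shift back down.
   One multiply + one shift instead of a ~256-cycle divide. */
uint32_t div10_approx(uint32_t x) {
    return (x * 6554u) >> 16;
}
```

With this particular constant the result matches x / 10 exactly for x up to a bit over 16,000; past that the rounding error in 6554/65536 creeps above one unit, so you pick the shift width to suit your input range.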

I really need to do some more PC-side development, as what I do is really an entirely different ball game.

Computer scientists are usually more interested in complexity. I am not overly familiar with embedded systems programming, but I am certain that complexity in some sense will have a bearing on the algorithms you select, though the difficulty in this is minor. The majority of your time is clearly going to be focussed on optimizing theoretically efficient (in the complexity sense) algorithms for the target device. I am currently working on my final-year dissertation, implementing pitch-tracking algorithms on a mobile phone (Android), so I am quite interested in the sort of optimizations you are used to. Avoiding floating-point numbers and reducing multiplications, for example, will help to produce an efficient result, and I think profiling would be incredibly useful for comparing algorithm efficiency.

When it comes to PC development, I have rarely seen anyone interested in more than the complexity of their algorithm, because instruction sets are so broad and programmers are happy to be a bit inefficient when there is so much computing power available and it saves a lot of time. We can test and identify inefficient code by timing it. There are sometimes arguments like "interfaces aren't as efficient as abstract classes in Java", for instance, but the differences are minimal on a PC and in practice people choose what's nice to work with.

To a certain degree we rely on the compilers to be clever as well and optimize some of the code for us where they can. Virtual-machine-based languages benefit from modifications at runtime: the code can actually change over time. Compilers aren't always sure which parts of a program are most significant, but virtual machines can analyse this at runtime and optimize those parts, recompiling byte code to machine code.

Well, in the DSP/embedded systems world (I also do a lot of HDL - Hardware Description Language - designs, but that is a different kettle of fish) everything is real time: after sampling an analogue-to-digital converter, our filter, FFT or whatever must finish with enough time to spare for the other system tasks before the next ADC sample arrives. So profiling is part of everyday DSP/ES life, and I do not know of any toolsets that lack this feature, with many of them giving the option of using real hardware or a simulator (apart from some ropey GNU "alternatives"). In real hardware, breakpoints typically stop the program counter advancing, so you can mooch around and look at any registers/memory you fancy.

But also, the code we develop typically results in machine code driving hardware peripherals directly, so we do not have to worry about abstraction layers or scheduling imposed by an OS (or this is how I presume an OS works).

It really does depend on the CPU architecture, but on some of the CPUs I use there are many things which can improve performance, especially with DSPs.
For example, loading from an array into a temporary variable before a loop, and reading the next value at the end of each iteration, improves performance: on a pipelined DSP the data is loaded while the loop condition is evaluated, cutting down the number of cycles utilised (i.e. evaluating the loop and then loading would incur NOPs whilst waiting for the data to turn up).
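In C that preload pattern looks something like this - same arithmetic, just with the load for the next iteration issued before the loop overhead runs:

```c
/* Straightforward loop: each iteration loads a[i] and immediately uses it,
   so a pipelined DSP stalls waiting for the load to complete. */
long sum_plain(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Preload pattern: fetch the NEXT element at the end of each iteration,
   so the load overlaps the loop-bound evaluation and branch. */
long sum_preload(const int *a, int n) {
    if (n == 0) return 0;
    long s = 0;
    int v = a[0];                 /* prime the pipeline */
    for (int i = 1; i < n; ++i) {
        s += v;
        v = a[i];                 /* load while the loop overhead runs */
    }
    return s + v;                 /* fold in the final element */
}
```

Whether the compiler does this for you depends heavily on the toolchain; on small DSP compilers you often end up writing it by hand.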

Loading from external memory in double/quad word widths (typically 32/64 bits) whilst doing your arithmetic at 32/16/8-bit precision is another trick. You can do a few calculations for only one read/write penalty.
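A sketch of that wide-load trick - pulling one 32-bit word and unpacking four 8-bit values from it, rather than paying the bus penalty four times:

```c
#include <stdint.h>
#include <string.h>

/* Sum n bytes (n a multiple of 4) using one 32-bit load per four bytes
   instead of four separate byte loads. The shifts unpack the four bytes
   from the word; the total is the same whatever the endianness. */
uint32_t sum_bytes_wide(const uint8_t *p, int n) {
    uint32_t s = 0;
    for (int i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, 4);                  /* one word-wide read */
        s += (w & 0xFFu) + ((w >> 8) & 0xFFu)
           + ((w >> 16) & 0xFFu) + (w >> 24);
    }
    return s;
}
```

On a real DSP you would read through an aligned word pointer rather than memcpy; the memcpy just keeps this sketch portable and strict-aliasing safe, and compilers turn it into a single load anyway.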

There is a running joke in the DSP world: seeing a NOP in the assembly is admitting that it is Not Optimised Properly.

I guess you are experimenting with Kalman filters, particle filters etc.? I did some work on Kalman filters a few years ago during my master's degree.