Right, I've spent the last hour reading the papers I pointed out above. I'm sorry about the length of this post, but it might be of interest to some of you out there (who can't be bothered to read the papers).
These are my findings:
Useful keywords (for those who might like to google a bit):
Dynamic Voltage Scaling (DVS)
Voltage Scheduler
This field appears to have been running for ages, but I'd not even thought about it until I chanced upon this thread a few days ago. In fact, the main aims of the field concern mobile processors which allow scaling (which I presume the XScale does; I think the SA1110 does). Overclocking simply increases the speed range for us, so there's no real difference. The aims and goals are identical: save power, get the job done within the required time, remain interactive, etc.
Basically there are two parts to DVS - predicting what the workload will be, and working out what to set the speed to. Workload = (total_cycles - idle_cycles)/total_cycles at maximum speed; the goal is to drive the idle time to zero so that the workload is always 100% and the speed is as low as possible.
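To make the arithmetic concrete, here's a minimal Python sketch of that workload formula (the function and variable names are mine, not from the papers):

```python
def workload(total_cycles, idle_cycles):
    # Fraction of the interval spent doing useful work at maximum speed.
    return (total_cycles - idle_cycles) / total_cycles

# e.g. an interval of 1,000,000 cycles with 400,000 of them idle means
# only 60% of the interval was real work, so in principle the CPU could
# have run at 60% of maximum speed with no idle time left over.
print(workload(1_000_000, 400_000))  # -> 0.6
```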
Needless to say there are lots of methods for both. I note that most/all of these algorithms don't take IO into account. Should IO occur, they effectively recalculate their deadline times, making them shorter, but they don't seem to use any sort of profiling of the IO history (as they do with the idle time, for example) to take this into account.
Prediction:
PAST - assume that the next workload will be the same as the last.
PEAK - predicts that a rising run rate (run cycles/idle cycles) will fall symmetrically, and that a falling run rate will continue to fall. They use hysteresis in the speed-setting part to ensure that the task can actually keep running.
AGED/AVGn - average past workloads, giving higher weighting to the more recent ones (lots of variations: various moving averages, different weighting schemes, different numbers of samples, etc.)
FUTURE - look into the task queue and try to predict what these tasks will need based on past experience of similar ones.
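For illustration, here's a sketch of one AVGn-style predictor, using an exponentially weighted moving average (the decay constant is an arbitrary choice of mine; the papers describe many variants):

```python
def aged_average(history, decay=0.5):
    # AVGn-style predictor: an exponentially weighted moving average of
    # past workloads, so recent intervals count for more than old ones.
    # decay=0.5 is my arbitrary pick, not a value from the papers.
    prediction = history[0]
    for w in history[1:]:
        prediction = decay * prediction + (1 - decay) * w
    return prediction

# A rising workload history pulls the prediction up, but older
# samples damp it, so the prediction lags the latest value.
print(aged_average([0.2, 0.4, 0.8]))
```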
Speed-setting:
Weiser-style - If the predicted workload is high (above 70%), increase the speed by 20% of the maximum speed. If it is low (below 50%), fall back to the Chan-style technique below.
Chan-style - Set the speed for the upcoming interval just high enough to complete the predicted work. In other words, multiply the maximum speed by the predicted workload to get the new speed.
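A rough Python sketch of both speed-setting policies as I've described them (the 206MHz maximum is the SA1100 figure; the 70%/50% thresholds are from the descriptions above, and everything else is my own framing):

```python
MAX_SPEED_MHZ = 206  # SA1100 top speed (ignoring overclocking)

def chan_speed(predicted_workload):
    # Chan-style: just enough speed to finish the predicted work.
    return predicted_workload * MAX_SPEED_MHZ

def weiser_speed(current_speed, predicted_workload):
    # Weiser-style as described above: jump by 20% of max when busy,
    # fall back to Chan-style when clearly underloaded.
    if predicted_workload > 0.7:
        return min(current_speed + 0.2 * MAX_SPEED_MHZ, MAX_SPEED_MHZ)
    if predicted_workload < 0.5:
        return chan_speed(predicted_workload)
    return current_speed  # in between: leave the speed alone

print(chan_speed(0.5))         # half the work -> half the speed, 103MHz
print(weiser_speed(100, 0.9))  # busy -> step up by 20% of max
```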
My comments and questions:
I note that the SA1100 can scale its clock from 59MHz -> 206MHz (ignoring overclocking) in 10 steps, to perform power saving. Is this behaviour implemented at all on the 5000D/5500 (different processor, I know, but pretty much the same, I assume)? Do the overclocking tools allow this (i.e. is it performed by the same method)?
One method I saw ignored niced processes completely for the calculations, which makes sense - if a process has been niced, it presumably doesn't have a time limit for its execution.
One of the papers had an interesting idea of having 3 layers of software doing the speed changing:
Level 1 waited for user input and could override the lower levels to ensure that the user experience remained satisfactory. I guess there are issues here with response times, etc., but basically I think the goal is to jump straight to 100% when user interaction is detected, then drop back to the normal scheme once it has stopped.
Level 2 was an idea (unimplemented) for communication with applications, so that apps could tell the system what kind of deadlines/execution times they required (reducing the need for the prediction bits above). I also think an IO profile could be provided here. Needless to say this would be extra effort for the programmer, but I'm sure it could be automated to a degree. Just how useful it would be is open to debate, but it might help.
Level 3 is what I've been describing above: predicting future needs and scaling the CPU speed appropriately.
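Purely as my own interpretation of that layering (the names, signature, and priority order are all guesses on my part, not from the paper), the three levels might compose like this:

```python
def choose_speed(user_active, app_request_mhz, predicted_mhz, max_mhz=206):
    # Higher layers override lower ones:
    #   Level 1: recent user input -> full speed for responsiveness.
    #   Level 2: an app-supplied speed/deadline hint, if one exists.
    #   Level 3: the interval-based prediction described above.
    if user_active:                      # Level 1 override
        return max_mhz
    if app_request_mhz is not None:      # Level 2 hint from the app
        return min(app_request_mhz, max_mhz)
    return min(predicted_mhz, max_mhz)   # Level 3 fallback

print(choose_speed(True, None, 59))    # user typing -> flat out
print(choose_speed(False, 100, 59))    # app asked for 100MHz
print(choose_speed(False, None, 59))   # idle-ish -> predictor's choice
```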
A fair few of these papers were about the iPAQ (which, IMO, is similar enough to the Z to be worth comparing). From one paper/presentation (the same one I got the percentages from in a previous post):
With all the unused chips turned off and the frontlight off, an iPAQ running Linux consumes 1.9 times as much power as the same machine running WinCE. Not ideal.
When the LCD is on, most other chips are off, and the CPU is idle:
- Linux iPAQ consumes:
  470mW with the Linux kernel (2.4.x, I think)
  280mW with SDRAM power-down mode
  238mW with SDRAM power-down mode and a 30Hz screen refresh rate
  172mW with SDRAM power-down mode and the CPU at 56MHz (the lowest possible)
- WinCE iPAQ consumes:
  248mW
Notes:
- Consumes 460mW more power if the frontlight is on
- Linux can get as low as 98mW if it also turns off the LCD
I thought it was interesting anyway. Now to start downloading the kernel source and have a go.
Si