by kublikhan » Wed 22 Jul 2015, 02:22:28
Furthermore pstarr, I am not sure you really appreciate just how much faster today's processors are compared to a processor from 10 years ago. Just because two processors have the same clock speed does NOT mean they are the same speed. A modern CPU at 2.6 GHz is much faster than a 10-year-old CPU at 2.6 GHz.
Q: Why, for example, would a 2.66 GHz dual-core Core i5 be faster than a 2.66 GHz Core 2 Duo, which is also dual-core?
A1: The processor requires fewer instruction cycles to execute the same instructions. This can be for a large number of reasons:
1. Large caches mean less time wasted waiting for memory.
2. More execution units mean less time waiting to start operating on an instruction.
3. Better branch prediction means less time wasted speculatively executing instructions that never actually need to be executed.
4. Execution unit improvements mean less time waiting for instructions to complete.
5. Shorter pipelines mean pipelines fill up faster.
And so on.
A2: The definitive reference is the Intel 64 and IA-32 Architectures Software Developer's Manuals. Some general differences listed there, going from the Core to the Nehalem/Sandy Bridge microarchitectures, are:
* improved branch prediction, quicker recovery from misprediction
* HyperThreading Technology
* integrated memory controller, new cache hierarchy
* faster floating-point exception handling (Sandy Bridge only)
* LEA bandwidth improvement (Sandy Bridge only)
* AVX instruction extensions (Sandy Bridge only)
A3: Designing a processor to deliver high performance is far more than just increasing the clock rate. There are numerous other ways to increase performance, enabled through Moore's law and instrumental to the design of modern processors.
* Pipelines have become longer over the years, enabling higher clock rates. However, among other things, longer pipelines increase the penalty for an incorrect branch prediction, so a pipeline can't be too long. In trying to reach very high clock speeds, the Pentium 4 processor used very long pipelines, up to 31 stages in Prescott. To limit the performance cost, the processor would speculatively execute instructions even if they might fail, replaying them until they succeeded. This led to very high power consumption and reduced the performance gained from hyper-threading. Newer processors no longer use pipelines this long, especially since clock rate scaling has reached a wall; Haswell uses a pipeline of 14 to 19 stages, and lower-power architectures use shorter pipelines (Intel Atom Silvermont has 12 to 14 stages).
* The accuracy of branch prediction has improved with more advanced architectures, reducing the frequency of pipeline flushes caused by misprediction and allowing more instructions to be executed concurrently. Considering the length of pipelines in today's processors, this is critical to maintaining high performance.
* With increasing transistor budgets, larger and more effective caches can be embedded in the processor, reducing stalls due to memory access. Memory accesses can require more than 200 cycles to complete on modern systems, so it is important to reduce the need to access main memory as much as possible.
* Newer processors are better able to take advantage of ILP through more advanced superscalar execution logic and "wider" designs that allow more instructions to be decoded and executed concurrently. Haswell, for example, can issue micro-ops to as many as eight execution ports per cycle. Increasing transistor budgets allow more functional units such as integer ALUs to be included in the processor core. Key data structures used in out-of-order and superscalar execution, such as the reservation station, reorder buffer, and register file, are expanded in newer designs, which allows the processor to search a wider window of instructions to exploit their ILP. This is a major driving force behind performance increases in today's processors.
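ILP is something source code can expose or hide. Summing through a single accumulator forms one long dependency chain, while splitting the sum across independent accumulators lets a wide out-of-order core keep several adders busy at once. A sketch under those assumptions (the four-way split is an arbitrary illustrative choice):

```c
#include <stddef.h>

/* One chain: every add depends on the previous sum, so the
   loop can retire at best one add per add-latency, no matter
   how many execution units the core has. */
double sum_single(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains: the out-of-order core can issue
   the four adds concurrently, hiding the add latency and
   making use of its extra functional units. */
double sum_four(const double *a, size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover tail elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

On a superscalar core the second version can run several times faster on large arrays, even though both perform the same number of additions (floating-point results may differ slightly because the summation order changes).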
* More complex instructions are included in newer processors, and an increasing number of applications use these instructions to enhance performance. Improvements in compiler technology enable more effective use of these instructions.
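The AVX extensions mentioned above are a good example of compilers exploiting newer instructions: a plain loop can be auto-vectorized so one instruction processes several elements. A minimal sketch (the flags shown are for GCC/Clang; the function is my own illustration):

```c
#include <stddef.h>

/* A simple saxpy-style loop: y[i] += a * x[i]. Compiled with
   e.g. "gcc -O3 -mavx2", the compiler emits AVX instructions
   that operate on 8 floats per instruction; compiled for an
   older instruction set, the same source processes one
   element per instruction. */
void saxpy(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Inspecting the assembly (`gcc -O3 -mavx2 -S`) shows packed `vmulps`/`vaddps` operations, which is how the same clock rate yields several times the throughput on a chip that supports the extension.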
* In addition to the above, greater integration of parts previously external to the CPU such as the northbridge, memory controller, and PCIe lanes reduce I/O and memory latency. This increases throughput by reducing stalls caused by delays in accessing data from other devices.
(Source: "Why are newer generations of processors faster at the same clock speed?")
The oil barrel is half-full.