32-bit and Beyond - Microprocessor Power for Every Application
40 Years of Microprocessors:
The range of powerful processors for special purposes is hard to survey. On the one hand, configurable architectures are replacing conventional board-level design in low-volume products; on the other, the well-known silicon vendors keep devising ever more refined variations of their standard architectures, so that many former wishes come true. A processor that keeps six stepper motors in check simultaneously and is Bluetooth-capable on top of that? Why not: automotive suppliers in particular are very receptive to such ideas, because this industry promises volumes in the millions.
Some of the most powerful SoCs (systems on chip) built into cars of the upper mid-range and above are based on the ARM Cortex-A9 and Cortex-A15, and soon also on the high-performance ARMv8 Cortex-A5x and Cortex-A7x cores! Perhaps Intel would have done better to concentrate on further developing the XScale architecture, the successor to the StrongARM design it inherited from Digital? Nevertheless, Intel has sent horses into the race as well: the Skylake-era generation of Intel Atom packs plenty of punch without needing a large fan.
Evolution in Architectures
Today's embedded processor choices include:
It is a technical commonplace, a cliché, but still true: silicon technology follows Moore's Law, roughly doubling the number of transistors (or functionality, or clock rates, or capabilities) every 18 months to two years. And we are seeing the benefits of this relentless march up the silicon curve.
As modern low-power silicon offers higher and higher clock rates, processors need more on-chip memory to keep the cores fed. Many are moving toward large on-chip L2 caches to localize processing and to minimize off-chip memory access delays.
In the 21st century, chip-level multiprocessing became a reality. This was when SOCs moved from being a way to integrate a processor with its peripherals on one piece of silicon to taking on the characteristics of true systems. Multiple processors on an FPGA became a working reality, one that designers could count on for delivering a large amount of processing power within a realistic silicon budget.
SOC multiprocessing ranges from paired processors, such as a RISC paired with a microcontroller, to full-scale MP architectures with multiple RISC processors. In addition, a new class of MP processing has emerged: multiple processors arranged in sequential processing order or in processing arrays. This latter class represents the deployment of specialized math, vector, graphic, or media processors, which collectively can deliver a very high level of performance at modest clock rates. Now the software needs to become capable of feeding processor arrays with enough tasks to turn the gain in silicon capability into a real advantage in application processing speed.
Taking advantage of today's plentiful silicon, vendors are packing multiple processors on a single die to minimize design chip counts and costs.
Clocks vs. Execution Units
There's a new variation on an age-old tradeoff: clock rates vs. execution units. Many designers are working from the idea that maybe we don't have to go faster if we can have lots of parallel execution units. We can then run the execution units at slower clock rates and get GHz-level performance without straining the silicon. It's a variation of the "wider rather than faster" design theme. If you think about it, that's precisely what superscalar RISC, VLIW, and SIMD are all about: essentially deploying more execution units in parallel.
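The "wider rather than faster" arithmetic can be sketched in a few lines. This is only an idealized model, assuming every execution unit retires one operation per cycle; the function name and the clock figures are illustrative and not taken from any particular processor.

```python
def peak_ops_per_second(clock_hz: int, units: int) -> int:
    """Idealized peak rate: every unit completes one operation per cycle."""
    return clock_hz * units


# One unit at 1 GHz versus four units at 250 MHz (illustrative numbers).
narrow_fast = peak_ops_per_second(1_000_000_000, 1)
wide_slow = peak_ops_per_second(250_000_000, 4)

print(narrow_fast, wide_slow, narrow_fast == wide_slow)
```

Real designs rarely reach this peak, because the parallel units have to be kept busy with independent work.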
Sounds good, but most superscalar RISCs, VLIWs, or SIMDs can't get that many execution units chugging away in parallel. For example, a 4-way superscalar RISC will run 4 execution units in parallel. At best, an 8-way VLIW like TI's C6x has 8 units executing in parallel. SIMDs do a bit better, especially for 8-bit operations: a 128-bit SIMD like Motorola's PowerPC G4 does 16 operations in parallel. But if you need 16-bit accuracy, it only does 8 operations in parallel.
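The SIMD figures above follow from dividing the register width by the element width. A minimal sketch of that arithmetic, with a hypothetical helper name chosen for illustration:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one SIMD instruction can process at once."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# A 128-bit vector register split into 8-, 16-, and 32-bit elements.
for element_bits in (8, 16, 32):
    print(element_bits, "bit elements:", simd_lanes(128, element_bits), "lanes")
```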
However, there's another way to get more parallel processing power to deliver massive amounts of execution MIPS at relatively low clock rates. New architecture designers have done this by basically upping the number of parallel execution units that can be deployed in tandem. Today's emerging parallel designs are all over the place architecturally, but basically all get their top-level performance by ganging multiple parallel execution units for massive parallelism.
Several dynamically reconfigurable MP designs pair an on-chip ARC RISC host with a 32-bit reconfigurable processing fabric. The fabric is configurable through FPGA-like programmable local and layer interconnects and datapath cells. Examples of such architectures can be found at Stretch, Altera, Atmel, and Xilinx, with more companies to come.
Through the looking glass:
Today's processor design techniques include RISC, superscalar, VLIW, and SIMD. Each of these techniques enables designers to get more out of their silicon, whether by squeezing down cycle logic, by executing instructions in parallel, or by multiplying the number of operations a single instruction can execute. The trick is to get more done in the same amount of clock time.
RISC: In classic RISCs, the trick was to squeeze down the register-to-ALU-to-register cycle for higher execution speeds. The way to make it faster was to simplify the logic: simplify the instruction set, use simple, fixed addressing modes, use a Load/Store architecture (operate only on registers), use pipelining to stage execution sequentially (enabling the next instruction to start before the current one has finished), and use fixed-length instruction words. These design techniques enabled RISCs to run faster than the older CISC (complex instruction set computer) processors.
Superscalar: The next step to up RISC performance was adding superscalar execution. Superscalar designs can issue more than one RISC instruction per cycle, using multiple execution units to execute multiple instructions in parallel. For example, many RISCs can issue and execute an integer and a floating-point instruction in parallel. But superscalar design techniques ran into some natural limits: the more instructions you issue, the more intermediate state you have to hold in case something goes wrong, such as having to take a branch, which negates the instructions that follow it in sequence. Superscalar has settled out into implementations that can issue 2, 3, or 4 instructions in parallel.
VLIW: Some new design techniques have evolved from RISC, among them VLIW and SIMD. VLIW (very long instruction word) implementations are a relatively successful attempt to bypass the problems of superscalar RISC. VLIW is very much like superscalar RISC; both techniques issue a number of RISC instructions at once. The difference is that superscalar RISC does it dynamically in hardware, deciding which instructions to issue and handling intermediate scheduling problems. VLIW lets the compiler handle the scheduling, with the hardware receiving and issuing a block of RISC instructions.
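To make the compiler's role concrete, here is a toy sketch of static scheduling: a greedy, in-order packer that groups instructions into fixed-width bundles and starts a new bundle whenever a register dependence appears. The `Instr` type, the register names, and the conservative conflict rule are all assumptions made for illustration; production VLIW compilers also reorder instructions and model unit types and latencies.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def pack_bundles(program: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack instructions, in program order, into VLIW bundles."""
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in program:
        written = {i.dest for i in current}
        read = {s for i in current for s in i.srcs}
        conflict = (
            instr.dest in written                      # write-after-write
            or instr.dest in read                      # write-after-read
            or any(s in written for s in instr.srcs)   # read-after-write
        )
        # Close the bundle when it is full or the new instruction conflicts.
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current = []
        current.append(instr)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),  # depends on both results above
    Instr("add", "r8", ("r2", "r5")),
]
bundles = pack_bundles(program, width=4)
for number, bundle in enumerate(bundles):
    print(number, [i.op for i in bundle])
```

In this example the first two instructions are independent and share a bundle, while the `sub` must wait for their results and opens a second bundle. All of that is decided before the program runs, which is the point of the VLIW approach.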
SIMD: It turns out that SIMD (single instruction, multiple data) has been around a long time. It means that a single instruction controls the operation on multiple data elements. For example, an ADD instruction causes n units to do an add. SIMD has proved to be a very powerful mechanism, especially for 8-, 16-, and 32-bit DSP and graphics operations done on large register words. SIMD was a natural extension for floating-point units in RISC and the x86 PC processors. Originally pioneered by Sun for its SPARC and picked up by Intel for its Pentium, SIMD enables one instruction to be applied to multiple fields in a floating-point register word. For a 64-bit word, that can be 8 8-bit adds, 4 16-bit adds, or 2 32-bit adds, delivering an 8x, 4x, or 2x speedup. SIMD has now been extended to other architectures and designs: Motorola's PowerPC G4 combines a 128-bit vector engine co-processor with a G3 PPC core. The latest SIMD designs are moving to a separate 128-bit vector unit instead of the earlier 64-bit floating-point execution units.
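The following sketch emulates in software what a packed add does to a 64-bit register word: each field is added independently, and a carry out of one field never spills into its neighbor. The function name and the wrap-around (non-saturating) behavior are assumptions for illustration; real SIMD hardware does all lanes in a single instruction and often offers saturating variants as well.

```python
def packed_add(a: int, b: int, lane_bits: int, word_bits: int = 64) -> int:
    """Add two packed words lane by lane, wrapping within each lane."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, word_bits, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Discard any carry so it cannot leak into the next lane.
        result |= (lane_sum & lane_mask) << shift
    return result


a = 0x0102030405060708
b = 0x10101010101010FF

print(hex(packed_add(a, b, 8)))            # eight independent 8-bit adds
print(hex((a + b) & 0xFFFFFFFFFFFFFFFF))   # ordinary 64-bit add, for contrast
```

The two results differ only because the lowest byte overflows: the packed add wraps it within its own lane, while the ordinary add carries into the next byte.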
(by techonline2000, revised and updated by Bernhard Kockoth embeddedexpert.com 2008)
Embedded Expert 2017 - All brands, trademarks, and trade names are the property of their respective owners.