A closer look at microprocessors that have shaped the digital world

Christopher U. Ngene; Manish Kumar Mishra

Christopher U. Ngene1 Student member IEEE, Manish Kumar Mishra2

1Computer Engineering Faculty, Kharkov National University of Radioelectronics, Kharkov, Ukraine

unchris.ua@gmail.com

2Department of Computer Engineering University of Maiduguri, Nigeria; [email protected]

Abstract - Anyone who has followed developments in the microprocessor world will attest that things have changed dramatically since the introduction of the first widely acclaimed microprocessor, the Intel 4004, in 1971. Which changes to these processors have actually improved our lot, especially in how we perceive the world around us and how productive we are at work? In this study we investigate different general purpose processors with a view to helping consumers and enthusiasts alike determine which of the myriad processors is most appropriate for their tasks and which choice makes the most economic sense. We show in relative detail that processor speed is not the only determinant of processor performance; the architecture is the most significant factor. The study also reveals that new technology is not the only thing that makes a processor "new": marketing considerations have often been the driving force.

1. Introduction

Since the introduction of the first commercial integrated circuit in 1961 and of the first microprocessor in 1971, the semiconductor industry has experienced healthy growth. Anyone involved in electronic design or electronic design automation (EDA), or in the marketing or analysis of electronic devices, knows that things become ever more complex as the years go by, and microprocessors are no exception. Today's microprocessors are more complex than the central processing units of the mainframes of the 1960s and 1970s. More importantly, they outperform those mainframe CPUs and are far cheaper. Architectures have also evolved to the extent that we no longer talk about CISC or RISC but about a combination of both. Other architectures and techniques that enhance performance, such as EPIC (Explicitly Parallel Instruction Computing), which is based on VLIW, together with pipelining, superscalar execution and hyper-threading, have boosted performance to levels that were once unimaginable.

Manuscript received November 3, 2009.

Christopher U. Ngene is with the Kharkov National University of Radioelectronics, Computer Engineering Faculty, Lenin Prosp., 14, Kharkov, 61166, Ukraine (corresponding author, phone: +38057-7021326), unchris. [email protected].

Manish Kumar Mishra is with Department of Computer Engineering University of Maiduguri, Nigeria; [email protected]

The fate of both Intel and Microsoft changed dramatically in 1981 when IBM introduced the IBM PC, which was based on a 4.77 MHz Intel 8088 16-bit processor running the Microsoft Disk Operating System (MS-DOS) 1.0. Was there any one chip that propelled Intel into the Fortune 500? Intel says there was: the 8088 [6]. After that fateful decision, computers became affordable and found their way into our homes. The focus of this paper is on Intel and AMD processors. Even so, we discuss only a selected number of processors that have made their mark in the industry, because these vendors have produced far more processors than the limited space at our disposal can accommodate.

Available publications on microprocessors have focused mainly on historical development, reviews of specific processors and benchmarking [1][2][18]. We agree that these are necessary, but many users of these processors may not fully follow the technical jargon employed in some of these publications. This paper therefore takes a wider perspective to give readers a holistic view, and gathers the vital information about processors in one place. We give a brief historical background of microprocessors and show how Moore's law has held up as a result of innovations in chip fabrication that enable smaller and smaller feature sizes. The rest of this paper is organised as follows. Section 2 presents historical background on the development of transistors, integrated circuits and subsequently microprocessors, and looks at the basic processor architectures. Section 3 presents the different types of processors and the trends in microprocessor features. Special purpose processors (microcontrollers, graphics processors and digital signal processors) are discussed there, but a detailed description of these processors is left for future work. Section 4 presents methods of chip fabrication, foundries, process yield and process technologies, and evaluates the relationships between die size, lithography and transistor count per die.

2. Background information

Before we proceed, a little background will help us appreciate where we are now by showing where we came from. The development of computer systems is closely tied to processors and subsequently

R&I, 2009, №4

microprocessors. The processor (Central Processing Unit, CPU), in conjunction with the memory, is the brain of the computer. Data processing (arithmetic and logic operations) takes place in the CPU. Early computers, which were mainly mainframes, had very large CPUs, implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs, usually just one. Implementing the CPU on a single die makes it smaller overall, which means faster switching because of physical factors such as decreased gate parasitic capacitance.

Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers", since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a device that executes software (computer programs), the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.

A CPU deals with discrete states and thus employs switching elements to change state. Before the invention of the transistor, electromechanical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Relays suffer from contact bounce and vacuum tubes from heat, and both switch slowly by later standards; for these reasons they are considered unreliable. Tube computers such as EDVAC were generally faster than electromechanical computers such as the Harvard Mark I but less reliable: EDVAC tended to average eight hours between failures, whereas relay computers like the Harvard Mark I failed rarely.

2.1 Evolution and Direction of Development

Let us start by examining the different switching elements and subsequent technologies that characterise the generations of processors. Vacuum tubes and electromechanical relays were used in the first generation processors. Another important drawback of these early processors was that programming was done in machine language.

The invention of the transistor by three Bell Laboratories scientists, J. Bardeen, W. H. Brattain and W. Shockley, launched the second generation of processors. Transistors revolutionised electronics in general and computers in particular. They were much smaller than vacuum tubes, consumed less energy, switched faster and were more reliable. Programming of the CPU was done in assembly (symbolic) languages, followed by high level languages such as FORTRAN and COBOL. The trend towards standardization generally began in the era of discrete transistor CPUs. With this improvement, more complex and reliable CPUs were built on one or several printed circuit boards containing individual components.

Even after transistors were deployed as switching elements, CPUs were still large and occupied several circuit boards, so reducing the size of components became a primary preoccupation of engineers and scientists. In 1959, Jack Kilby and Robert Noyce independently invented a means of fabricating multiple transistors on a single slab of semiconductor material. This method of manufacturing many transistors in a compact space is known as the Integrated Circuit (IC): a complete electronic circuit on a small chip of silicon. Beginning in 1965, ICs began to replace discrete transistors in CPUs.

2.2 Scale of Integration

Dimensions on an IC are measured in micrometers, with one micrometer (1 µm) being one millionth of a meter. As a reference point, a human hair is roughly 100 µm in diameter. Each year, researchers and engineers find new ways to reduce these feature sizes and pack more transistors into the same silicon area. There are different levels of integration, defined by the number of digital components placed on a single chip. The early ICs contained only one type of building block (logic gates such as AND gates) and only tens of transistors. CPUs based on this sort of IC are known as Small Scale Integration (SSI) devices. Building an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs.

Fig. 1 Scale of Integration (levels SSI, MSI, LSI and VLSI plotted against transistors per chip, on a logarithmic axis from 1 to 1,000,000)

With advances in microelectronic technology, more and more transistors were placed on a single chip, which reduced the number of ICs required for a complete CPU. Other levels of integration include Medium Scale Integration (MSI), Large Scale Integration (LSI) and Very Large Scale Integration (VLSI). Each step up in the scale of integration raised transistor counts, to hundreds, tens of thousands and tens of millions respectively (see Fig. 1). Lithography (a method of "tiny writing") is the technology that has made it possible to squeeze 4 billion transistors onto a chip no larger than a postage stamp [1].
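The levels of integration can be illustrated with a small classifier. This is only a sketch: the text gives orders of magnitude rather than exact limits, so the boundaries in `SCALE_BOUNDS` and the function name `integration_scale` are assumptions chosen to match "tens", "hundreds" and "tens of thousands" of transistors.

```python
# Upper bounds (exclusive) for each level of integration.
# These cut-off values are illustrative assumptions, not figures from the paper.
SCALE_BOUNDS = [
    (100, "SSI"),      # tens of transistors
    (1_000, "MSI"),    # hundreds of transistors
    (100_000, "LSI"),  # thousands to tens of thousands of transistors
]


def integration_scale(transistors: int) -> str:
    """Return the assumed level of integration for a transistor count."""
    if transistors < 1:
        raise ValueError("transistor count must be at least 1")
    for upper_bound, name in SCALE_BOUNDS:
        if transistors < upper_bound:
            return name
    return "VLSI"  # everything above the last bound


if __name__ == "__main__":
    # Transistor counts taken from Table 1 (4004, 8086 and 80486 DX).
    for count in (2_300, 29_000, 1_200_000):
        print(count, integration_scale(count))
```

Under these assumed boundaries the 4004 and the 8086 fall in the LSI range, while the 80486 DX is a VLSI device.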

The introduction of microprocessors marked the beginning of the fourth generation of CPUs. A microprocessor (MP) is a complete CPU on a single chip of silicon. This is actually an extension of the third generation CPU technology. Whereas the early CPUs were designed for specific purposes (applications), the MP is a general purpose CPU on a single chip. The first commercially available MP was the Intel 4004, produced in 1971. It contained 2300 PMOS transistors [4].

Moore's Law: As the number of transistors on a single chip kept increasing, Gordon Moore, then Fairchild Semiconductor's director of Research & Development and later a co-founder of Intel, observed in his 1965 paper "Cramming More Components onto Integrated Circuits" that the density of elements in ICs was doubling roughly every year, and predicted that the trend would continue for the next ten years. With certain amendments (the doubling period was later revised to about two years and is often quoted as 18 months), this came to be known as Moore's Law. See section 3 for more details. By 1971, when the first MP was produced, Moore's law had proved to be an accurate predictor of the growth in the number of transistors on a single chip. Lithography is the reason why Moore's Law endures after 44 years. The 32nm SRAM test chip, first demonstrated in September 2007, is a testament to the health not only of the 32nm process but also of Moore's law. More information on photolithography can be found in section 4 of this paper.
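The doubling rule can be written as a simple exponential projection. The sketch below starts from the 2300 transistors of the Intel 4004 in 1971; the function name `projected_transistors` and the default doubling period of two years are assumptions made for illustration, since the period quoted for Moore's Law varies.

```python
def projected_transistors(year: float,
                          base_year: int = 1971,
                          base_count: int = 2300,
                          doubling_years: float = 2.0) -> float:
    """Project a transistor count assuming one doubling every `doubling_years`.

    The defaults anchor the curve at the Intel 4004 (2300 transistors, 1971).
    The two-year doubling period is an assumed parameter, not a measured value.
    """
    if year < base_year:
        raise ValueError("year must not precede the base year")
    doublings = (year - base_year) / doubling_years
    return base_count * 2 ** doublings


if __name__ == "__main__":
    for y in (1971, 1981, 1991, 2001, 2009):
        print(y, round(projected_transistors(y)))
```

Changing `doubling_years` to 1.5 shows how sensitive the projection is to the assumed period, which is one reason the law is usually stated only approximately.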

2.3. CISC and RISC Architecture

CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) are the dominant processor architecture paradigms. Computers of the two types are differentiated by the nature of the instruction sets interpreted by their CPUs. Both have advantages and drawbacks, which are detailed below. The IBM 360 system, created in 1964, was probably the first modern processor system; it introduced the idea of computer architecture into computer science and adopted micro-coded control. Micro-coded control facilitated the use of complex instruction sets and provided flexibility, and thus CISC appeared. CISC was primarily motivated by a desire to reduce the "semantic gap" between the machine language of the processor and the high-level languages in which people were programming. The theory was that such a processor would have to execute fewer instructions and would therefore perform better [7]. To improve performance, CISC systems try to reduce the number of instructions a program must execute. To do this, they have large instruction sets that cover a broad range of tasks. A single instruction, when decoded in the CPU, may be translated by microcode into several operations that the processor performs. As a consequence, instructions are of variable length and often require more than one clock cycle to complete. However, according to the 80/20 rule, 20% of the available instructions are likely to be used 80% of the time, with some instructions used only very rarely. Some of these instructions are very complex, so creating them in silicon is an arduous task; the processor designer therefore uses microcode instead of a hardwired control unit [8]. CISC processors are characterised by a small number of registers, a large instruction set (some instructions simple and some complex), variable length instructions, instructions that generally take more than one clock cycle to execute, microcode control and so on. Processors that employ the CISC architecture include the DEC VAX, the Motorola 68K (680x0) family and the x86 families of processors up to the Pentium MMX and Pentium III.

In the 1970s, John Cocke at IBM's T. J. Watson Research Center provided the fundamental concepts of RISC. The idea came from the IBM 801 minicomputer, which was intended as a fast controller in a very large telephone switching system. This machine had many of the traits a later RISC chip would have: few instructions, fixed-size instructions in a fixed format, execution in a single processor cycle and a load/store architecture. These ideas were further refined and articulated by a group at the University of California, Berkeley, led by David Patterson, who coined the term "RISC" [9]. They realized that RISC promised higher performance, lower cost and faster design time. RISC systems seek to improve performance by reducing the number of clock cycles required to perform tasks. They have small sets of simplified instructions, doing away with microcode altogether in most cases. While this means that tasks require more instructions, the instructions are all of the same length and usually require only one clock cycle to complete. Because of this, RISC systems are well suited to overlapping the execution of instructions, a technique called pipelining. RISC processors are characterised by a large number of registers, fixed length instructions, pipelining, instructions that generally take about one clock cycle to execute, hardwired control, complexity pushed to the compiler and so on. In theory a RISC chip has to execute more instructions to complete a given task, but it executes each of them so quickly that it ends up being faster than an equivalent CISC chip. This is because each instruction does only a simple task, so the design can concentrate on doing these tasks at very high speed. The early RISC processors were RISC I and RISC II from the University of California at Berkeley and MIPS from Stanford University. Other typical RISC systems include the HP PA-RISC, IBM RT-PC, IBM RS6000, Intel's i860 and i960, the MIPS R2000 (and its successors), Motorola's 88K, the Motorola/IBM PowerPC and Sun's SPARC.
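The trade-off between instruction count and cycles per instruction can be made concrete with the usual execution-time relation: time = instruction count × cycles per instruction / clock frequency. The sketch below applies it to two hypothetical processors running the same task at the same clock rate. All of the numbers are invented for illustration and do not describe any real chip.

```python
def execution_time(instruction_count: float,
                   cycles_per_instruction: float,
                   clock_hz: float) -> float:
    """Return run time in seconds: instructions x CPI / clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return instruction_count * cycles_per_instruction / clock_hz


if __name__ == "__main__":
    clock = 100e6  # both hypothetical chips run at 100 MHz

    # Hypothetical CISC chip: fewer instructions, several cycles each.
    cisc = execution_time(1_000_000, 4.0, clock)

    # Hypothetical RISC chip: 50% more instructions, close to one cycle each.
    risc = execution_time(1_500_000, 1.2, clock)

    print(f"CISC: {cisc * 1e3:.1f} ms, RISC: {risc * 1e3:.1f} ms")
```

With these assumed figures the RISC design finishes first even though it executes more instructions, and both designs run at the same clock speed, which also shows why clock speed alone does not determine performance.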

Currently, the difference between RISC and CISC chips is getting smaller and smaller, and the two architectures are becoming more and more alike. Many of today's RISC chips support just as many instructions as yesterday's CISC chips. The PowerPC 601, for example, supports more instructions than the Pentium, yet the 601 is considered a RISC chip while the Pentium is definitely CISC. The difference between RISC and CISC is therefore no longer one of instruction sets, but of the whole chip architecture and system, and the designations are no longer meaningful in the original sense. What counts in the real world is how fast a chip can execute the instructions it is given and how well it runs existing software [10]. The biggest threat to CISC and RISC might not be each other, but a newer technology called EPIC (Explicitly Parallel Instruction Computing). As the word "parallel" suggests, EPIC can execute many instructions in parallel with one another. EPIC was developed by Intel together with Hewlett-Packard and is in a way a combination of both CISC and RISC. In theory this will allow Windows-based as well as UNIX-based applications to be processed by the same CPU.

3. Types of MP

The discussion of MPs would not be complete without looking at the different types and the direction of development. As designers found more and more applications for MPs, they pressured manufacturers to develop devices with architectures and features optimized for certain types of tasks. In response to these needs, MPs have evolved in two major directions during the last 15 years: special purpose and general purpose processors [4].

3.1 Special Purpose Processors

These processors are designed to perform specific functions, such as controlling a robot arm or a production process. Some of them are optimized to handle graphics and multimedia. The processor used in the PlayStation is a good example of a processor optimized to handle streaming graphic information.

Embedded Controllers (Microcontrollers): Microcontrollers are frequently used in automatically controlled products and devices, such as telephones, clocks, automobile engine control systems, office and domestic machines, appliances, power tools and toys. In contrast to general-purpose CPUs, microcontrollers may not implement an external address or data bus, because they integrate RAM and non-volatile memory (EEPROM) on the same chip as the CPU. Because they need fewer pins, the chip can be placed in a much smaller, cheaper package. Microcontrollers emphasise high integration, low power consumption, self-sufficiency and cost-effectiveness. They often operate at very low speeds compared to modern microprocessors, but this is adequate for typical applications [18].

An embedded system may have minimal requirements for memory and program length. Input and output devices may be discrete switches, relays or solenoids. An embedded controller may lack any human-readable interface devices at all. For example, embedded systems usually don't have keyboards, screens, disks, printers or the other recognizable I/O devices of a personal computer. The number of microcontrollers in use today is far greater than the number of general purpose processors. For instance, a typical home may have two desktop systems and a laptop, which implies only 3 general purpose processors. The same home may have about 4 handsets, a refrigerator, a deep freezer, a microwave oven, a TV set, a DVD player and two toys. This translates to about 10 embedded processors, far more than the number of general purpose processors in the same home. It is important to note that our desktop PCs also contain microcontrollers that handle input/output operations. The major characteristic of embedded processors is low power consumption. Consequently, low power processors have also found their way into cell phones and GPS receivers. Examples of such processors are low power versions of the Z-80 and the 80386. The top microcontroller manufacturers include Texas Instruments, Atmel, National Semiconductor, Silicon Laboratories, NXP Semiconductors, NEC and Microchip.

Graphic Processors: A Graphics Processing Unit (GPU) is a processor, typically attached to a graphics card, that is dedicated to floating point calculations and similar operations. It is a specialized processor that offloads 3D graphics rendering from the general purpose microprocessor. It is used in embedded systems, mobile phones, personal computers, workstations and game consoles. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. In a personal computer, a GPU can be present on a video card or on the motherboard. More than 90% of new desktop and notebook computers have integrated GPUs, which are usually far less powerful than those on a video card [18]. CPUs compute, while GPUs let you experience. Both are important, but most computers today are shipped with an underpowered GPU. Whether you are editing photos, watching videos, playing a game or just using the latest operating systems, a powerful GPU helps a desktop or notebook PC run smoothly with high quality visuals. As the processing power of GPUs has increased, so has their demand for electrical power. High performance GPUs often consume more energy than current CPUs.

Many companies have produced GPUs under a number of brand names. In 2008, Intel, NVIDIA and AMD/ATI were the market share leaders, with 49.4%, 27.8% and 20.6% market share respectively. However, those numbers count Intel's very low-cost, less powerful integrated graphics solutions as GPUs. Leaving those out, NVIDIA and AMD control nearly 100% of the market. VIA Technologies/S3 Graphics and Matrox also produce GPUs [18]. GPUs can be dedicated or integrated. A dedicated graphics card has RAM that is reserved for the card's own use. Integrated (or shared) graphics solutions are graphics processors that use a portion of the computer's system RAM rather than dedicated graphics memory. Computers with integrated graphics account for 90% of all PC shipments. These solutions are cheaper to implement than dedicated graphics solutions, but are less capable. Historically, integrated solutions were often considered unfit to play 3D games or run graphically intensive programs such as Adobe Flash. Entertainment applications are what drive CPU performance now, and games, sound and graphics processing all need floating point performance. NVIDIA is a leader in the design and manufacture of graphics processors.

Digital Signal Processors (DSP): Digital Signal Processing is carried out by mathematical operations. In comparison, word processing and similar programs merely rearrange stored data. This means that computers designed for business and other general applications are not optimized for algorithms such as digital filtering and Fourier analysis.

Digital Signal Processors are microprocessors specifically designed to handle Digital Signal Processing tasks. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly on a set of data. Signals are converted from analog to digital, manipulated digitally and then converted back to analog form. The last forty years have shown that computers are extremely capable in two broad areas: (1) data manipulation, such as word processing and database management, and (2) mathematical calculation, used in science, engineering and Digital Signal Processing. All microprocessors can perform both tasks; however, it is difficult (expensive) to make a device that is optimized for both. Since a DSP is optimised for high speed arithmetic operations, its instruction set is much smaller than that of a desktop microprocessor, perhaps no more than 80 instructions. This means that the DSP needs only a slimmed-down instruction-decode unit and fewer internal execution units. Moreover, the execution units that are present are geared toward high-performance arithmetic operations. A DSP uses a Harvard architecture (maintaining physically separate memory spaces for data and instructions) so that the chip's fetching and execution of program code does not interfere with its data processing operations. General purpose processors use the Von Neumann architecture, in which data and instructions are stored in the same memory, though in different locations. Digital signal processors have far fewer transistors than a CPU, so they consume less power, which makes them ideal for battery-powered products. Their simplicity also makes them inexpensive to manufacture, so they are well suited to cost-sensitive applications. Texas Instruments is the top supplier of chips for cellular handsets, as well as the number one producer of digital signal processors and analog semiconductors [11].
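To give a feel for the arithmetic a DSP is optimised for, the sketch below implements a direct-form FIR (finite impulse response) digital filter, one common form of the digital filtering mentioned above. Each output sample is a sum of products, which is the multiply-accumulate pattern that DSP hardware typically executes very quickly. The function name `fir_filter` and the two-tap averaging coefficients are illustrative assumptions, and a real DSP would run this in fixed-point or dedicated hardware rather than in Python.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[float],
               coefficients: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        accumulator = 0.0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                # One multiply-accumulate (MAC) step per filter tap.
                accumulator += h * samples[n - k]
        output.append(accumulator)
    return output


if __name__ == "__main__":
    # Two-tap moving average, an assumed example filter.
    print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))
```

A filter with N taps costs N multiply-accumulate operations per output sample, which is why the number of operations grows quickly with signal rate and filter length.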

3.2 General Purpose Processors

These are the processors that power laptops, desktops, workstations and servers. They are called general purpose because they accomplish a wide variety of tasks, including those that specialized processors are designed for. They are characterized by a large number of registers and addressable memory locations. The tables below (Table 1 to Table 5) [12] show the major characteristics of these processors and the various improvements made over the years by designers and manufacturers. Many companies manufacture microprocessors, but for the purposes of this paper we concentrate on processors manufactured by Intel and Advanced Micro Devices (AMD), with a brief mention of Motorola. There is no doubt that Intel and AMD have dominated the microprocessor industry and have engaged in stiff competition over who will produce the best performing processors. We shall look shortly at how they have fared over the years. The third column of the tables shows the size of the general purpose registers (GPR) and floating point registers (FPR or FP unit). The GPR width usually determines the number of bits the ALU can handle at a time. Thus when we speak of an 8-bit or 32-bit processor we mean that the integer unit (ALU) and the GPRs are 8 or 32 bits wide. The floating point unit (FPU) is also known as a math processor or coprocessor. It has its own instruction set, separate from that of the main processor, and it is optimized for arithmetic on very small and very large numbers, that is, real numbers.
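The register and address bus widths listed in the tables translate directly into numeric ranges. The sketch below computes the largest unsigned integer a register of a given width can hold and the amount of memory an address bus of a given width can reach, which reproduces table entries such as "16/20 - 1MB" and "64/36 - 64GB". The helper names are hypothetical, and the calculation assumes byte-addressable memory.

```python
def max_unsigned(register_bits: int) -> int:
    """Largest unsigned integer that fits in a register of this width."""
    return (1 << register_bits) - 1


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with this many address lines (byte addressing assumed)."""
    return 1 << address_bits


def human_size(byte_count: int) -> str:
    """Format an exact multiple of 1024 using B, KB, MB, GB or TB."""
    units = ["B", "KB", "MB", "GB", "TB"]
    index = 0
    while (byte_count >= 1024 and byte_count % 1024 == 0
           and index < len(units) - 1):
        byte_count //= 1024
        index += 1
    return f"{byte_count}{units[index]}"


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits}-bit register: 0 to {max_unsigned(bits)}")
    for bits in (16, 20, 24, 32, 36, 40):
        print(f"{bits}-bit address bus: {human_size(addressable_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why moving from a 32-bit to a 36-bit address bus raises the limit from 4GB to 64GB.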

Tables 1 through 5 are organized according to the various Intel processor families to aid readability. The first column of each table gives the type of processor and the year it was first introduced. We have considered only selected processors for a given year, because so many types were produced each year and some of the new ones are not actually better than their predecessors. As marketing rather than technical considerations become ever more important, the once-clear distinctions between different products become blurred. Consequently we have chosen those processors that actually showcased new technology and can be said to be distinct from their older relatives. The naming conventions these companies use can mislead buyers, who tend to assume that newer processors are better than older ones without considering their own needs. For instance, Intel's original Pentium III was essentially a Pentium II with SSE (Streaming SIMD Extensions). Sometimes the trick is in the number (the clock speed). The Celeron 266 actually ran at 266MHz, but it performed like a Pentium MMX at 200MHz. AMD's response to this was to revive the old and rather unpopular PR rating. The AMD Athlon XP family was introduced at 1333, 1400, 1466 and 1533MHz, but sold as the 1500+, 1600+, 1700+ and 1800+ [19]. In some cases the manufacturers deliberately re-marked their chips with lower speeds even though the chips could operate at higher speeds. This was done for marketing purposes only, since the lower speed parts were sold at lower prices. When buyers got wind of this, over-clocking of processors was born.

Table 1

Selected CPUs: x486 and Earlier Processors

Processor and date introduced | Clock speed, internal/external FSB (MHz) | Register width (bits) | External data bus/address bus width (bits) | Transistor count per die | On-die caches | Die size/technology (mm²/µm)

4004 - Nov 1971 | 0.1 / 0.1 | 4 | 4/640 Bytes | 2,300 | None | 24/10 PMOS

8080 - Apr 1974 | 2 / 2 | 8 | 8/8 - 256B | 6,000 | None | 20/6 µm NMOS

Z-80 - July 1976 | 2.5-12 / 2.5-12 | 8 | 8/16 - 64KB | 6,000 | None |

8085 - Mar 1976 | 5 / 5 | 8 | 8/16 - 64KB | 6,500 | None | /3 NMOS

8086 - June 1978 | 10, 8, 4.77 / same | 16 | 16/20 - 1MB | 29,000 | None | 33/3.2 µm

MC68000 - Sep 1979 | 50 | 16 | /24 - 16MB | 68,000 | |

80286 - Feb 1982 | 12, 10, 6 / same | 16 | 32/24 - 16MB | 134,000 | None | 1.5 µm

80386 DX - Oct 1985 | 16, 20, 25, 33 / same | 32 | 32/32 - 4GB | 275,000 | None | 1.0 - 1.5

80486 DX - Apr '89; 80486DX - Mar '94 | 25, 3, 50 / 20-50; 75 - 100 | 32, 80 FPU | 32/32 - 4GB | 1,200,000; 1,600,000 | L1 - 8KB unified | /1, 0.8 (50 MHz); 345/0.6

Table 2

Pentium Processor Family

Processor and date introduced | Clock speed, internal/external FSB (MHz) | Register bus width (bits) | External data bus/address bus width (bits) | Transistor count per die (millions) | On-die caches | Die size/process tech. (mm²/nm)

Pentium Processor family

Pentium - Mar 1993 - Mar- 60- 200/50-66 32, 80FPU 64/32 - 4GB 3.1 L1-2x8KB 294/800

94 (Desktop) 3.3 148/600, 350

Pentium Pro- Nov 1995 - 150-200/60-66 32, 80FPU 64/32 - 4GB 5.5 L1- 2x8KB 202/600,

Jan-96 (Desktop) L2- 256,512KB 196/350

Pentium with MMX-Oct-96 166 -300/60 - 66 32, 80FPU, 64/32 - 4GB 4.5 L1-16KB 140/350, 250

- Sep-97 (Desktop/Notebook) 64MMX

Pentium Extreme 3,2GHz/800MHz 32 64/36 - 64GB 230 L2 - 2 MB 203/90

EditionApr-05 - Jan-06 3.2, 3.46, 3.73

(840,955,965) /1.066GHz 64/36 - 64GB 376 L2 - 4 MB 162/65

Pentium Dual Core -Jan-07 Mobile T2130, T2060,T2080 Jul-07 1.6, 1.73/ 533 32 64/36 - 64GB 176 L2 - 1MB 90.3/65

Desktop E2140, E2160 1.6, 1.8/800MHz Same Same Same Same

Pentium II -Jan 1997 -Aug- 233-300/66 32, 80FPU, 64/36 - 64GB 7.5 L1- 2x16KB 202/350

98 (Desktop) 64MMX L2-256KB or

(Server) 333,350, 450 /66, 512KB

100 same same L2 - 512KB 104/250

Pentium II Xeon -Jun-98 400-450/100 32 64/36 7.5 L1- 512KB /250

(Servers) L2- 2MB

Pentium III Mobile Celeron 266,333, 366-466/ 32 64/36 18.9 L2 - 128KB /250

Jan-99 - Sep-02 450-850/100

(Mobile) 1.6-2.5GHz /133, 400 L2 - 128KB 104/180

L2 - 256KB /130

Pentium III Xeon Oct-99- 0.6 -1GHz/ 100, 133 32 64/36-64GB 28 L1- 256, 512 104/180

May-00 500-550 L1- 2MB

(High End) 9.5 same /250

Pentium III with Speedstep 0.750 - 1GHz 32 64/36 28 L2 - 256KB /180

tech. Jan-00 /100MHz Advanced xfer

(Mobile PC) cache

Pentium III Mobile Jul-01 - 1 - 1.33GHz 32 64/36 44 L2 - 512KB /130

Sep-02 /133MHz Same with

(Mobile PC) advanced xfer

Server Jan-02 1.4GHz/133 same cache same

Pentium III Celeron Mar-04 0.9 - 1.7Ghz/400 64/36-64GB 140 L2- 512K, /130, 90

- Mar-06 (Notebook) 1.46- 2GHz /400 151 1MB L2- 1MB /65

Pentium M

Pentium M Mar-03 (Mini-Notebook) 0.9 - 1.7/400 32 /32 - 4GB 77M L2 - 1MB 83/0.13

Pentium M May-04- Jan-05 (Full size Mobile PC) 1 -2.13/400, 533 32 /32 - 4GB 144M L2 - 2MB 87/90nm

Pentium 4

Pentium 4 Nov-00 1.3 - 2G /400 32 42 L2 - 256KB 217/180

Aug-01 - Mar-02 2 - 2.8 /400-533 32 64/32 - 4GB Same 55 L2 - 512KB 131/130

With HTT Nov-02 3.06 - 3.40 /533 -800 55 L2 - 512KB 131/130

Feb-Jun-04 2.8 - 3.8/800 64 125 L2 - 1MB 112/90

Feb-05 (Desktop/Gaming) 3 - 3.6/800, 1.066 64 same 169 L2 - 2MB 135/90

Pentium 4 Jun-02 - Sep-03 (Appied comp/Mobile) 1.7 - 3.2G /400, 533 32 55 L2 - 512KB 131/130

Pentium 4 with HTT 518, 532(Mobile)Jun-04 -Sep-04 2.8- 3.46 G /533 32 125 L2 - 1MB 112/90

Pentium 4 Extreme Edition 3.2 - 3.46/800, 1.066 64 64/36-64GB 178 L2 - 512KB

htt Nov-03Nov-04(Gaming) 178 L3 - 2MB /130

Xeon Processors

Xeon MP Sep-08 2.13 - 2.66 /1.066 64 64/40 - 1TB 1.9B L2 - 8MB 503/45


R&I, 2009, №4

E7420, 30, 40 (Server) Same same 12MB 16M 503/45 same

Pentium D

Quad Core Xeon Jan-08 2.50 - 2.83 /1333 64/36- 64GB 456 L2 - 6 MB 164/45

X3320 (Server) 820 12MB 214/45

Pentium D May-05 2.8 -3.2/800 64 64/36-64GB 230M L2 -1MB 206/90nm

(Desktop PC)

Pentium D Jan-06(Desktop) 2.8-3.733/800, 1.066 64 64/36-64GB 376M L2 - 2MB 280/65nm

The second column of these tables gives the processor clock speed and the front side bus (FSB) speed. For some processors, such as the Core i7 family, the second column gives the processor speed and the QuickPath Interconnect (QPI) rate. Early processors had very low computational power. The first processor was never used in a personal computer; it was designed for use in a calculator. Most of these early processors would pass for microcontrollers today. Interestingly, the Z-80 was the most popular processor of its time. It was developed by a group of ex-Intel engineers who set out to improve on the 8080 while maintaining compatibility with it. Zilog pushed it to faster and faster clock speeds over the years. It was more than a match for the clumsy first-generation 16-bit CPUs like the Intel 8086 and can still do useful work. Incredibly, versions of the 30-year-old 8-bit Z-80 are still in production.

Early microprocessors did not incorporate floating-point units (FPUs); real-number arithmetic was done by high-level or low-level routines written for the main processor. This slowed processing and explains why these processors performed poorly in graphics applications. The 8086 can be paired with the 8087 math coprocessor, which works in parallel with the main processor.

The Intel 386DX was the high-performance chip of choice for a long time. Intel's DX-33 was never a volume seller because of the AMD386DX-33's entry into the market. The AMD part was a mixture of AMD's own design work and the Intel technologies covered under the existing ten-year cross-license agreement. In 1976 and 1982 Intel signed technology-sharing contracts with AMD, which allowed AMD to use Intel intellectual property to make 80286 processors. When AMD wanted to make the 386 as well, Intel objected and the legal battle began. AMD declared that it would use the parts in its possession to make the 386. In late 1991 AMD won the case and the AMD386DX-33 was born. This also marked the beginning of the clone wars, which helped to bring down the cost of this class and subsequent classes of processors. It was the buyers who benefitted most from this situation. AMD made and sold a moderate number of its own 386DX-33, but soon moved on to that all-time great chip, the 386DX-40 [19].

The 32-bit 80486 uses the same processor core as the 386, but this time the 80387 coprocessor was integrated on the same die as the main processor. Integrating the math processor on the chip allows it to execute math instructions about three times as fast as the 386/387 combination [4]. The 486 brought a number of mainframe techniques into the x86 world for the first time: internal cache, rudimentary branch prediction, an integrated FPU, and a five-stage pipeline. The 386 does incorporate a pipelining technique: instructions are loaded one after another into the RAM end of the pipeline and the CPU simply takes them from the other end as needed. All modern CPUs are heavily pipelined.
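The benefit of a pipeline can be illustrated with a small idealized model. The sketch below is not taken from any processor manual; it assumes a pipeline with no stalls or branch penalties, and the function names are our own. It compares the cycle count of executing instructions one at a time with the cycle count of an ideal pipeline, using the five stages mentioned for the 486.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Every instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: n_stages cycles to fill, then one result per cycle."""
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts (idealized model)."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    STAGES = 5  # the five-stage pipeline mentioned for the 486
    for n in (1, 10, 1000):
        print(n, cycles_unpipelined(n, STAGES), cycles_pipelined(n, STAGES),
              round(pipeline_speedup(n, STAGES), 2))
```

For long instruction streams the speedup in this model approaches the number of stages, which is why deeper pipelines are attractive as long as stalls stay rare.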

After the 486 processor, Intel introduced the next generation of the IA-32 family with the Pentium processor in 1993. It has internal data paths of 128 and 256 bits to handle multiple 32-bit values simultaneously. It incorporates an advanced programmable interrupt controller (APIC) and multimedia capability in the form of MMX technology. The number of transistors per die varies tremendously, depending especially on the amount of cache on the die; on-die cache also speeds up instruction execution and thus improves performance. The main new feature in the fifth-generation P5 Pentium processors was the superscalar architecture, in which two instruction execution units could execute instructions simultaneously in parallel. The P6 family of processors comprises the Pentium Pro, Pentium II (P2), P2 Xeon, Celeron, Pentium III (P3) and P3 Xeon. The P6 architecture upgrades the superscalar architecture of the P5 processors by adding more instruction execution units and by breaking instructions down into special micro-ops. This is where CISC instructions are broken down into simpler, RISC-like commands. The original P6 processor includes 256KB, 512KB, or 1MB of full-core-speed L2 cache. The P2 is a P6 with 512KB of L2 cache running at half core speed. The P6 Celeron has no L2 cache, but the Celeron-A has 128KB of on-die full-core-speed L2 cache. Unlike the P2's, the Xeon's L2 cache runs at full speed and is available in sizes from 512KB to 8MB. The first P3 has 512KB of half-core-speed L2 cache, and subsequent ones run their L2 cache at full core speed.

Pentium 4. The release of the Pentium 4 marked Intel's first all-new x86 design since the Pentium Pro. It is the successor of the P6 family, with an entirely new architecture. The Pentium 4 is a line of single-core mainstream desktop and laptop central processing units (CPUs) introduced on November 20, 2000. The last Pentium 4s were shipped on August 8, 2008. They had the 7th-generation microarchitecture, called NetBurst, which includes features such as Hyper Pipelined Technology, the Rapid Execution Engine and the execution trace cache, all firsts in this microarchitecture, together with high-speed internal data paths. The L1 execution trace cache stores decoded micro-operations, so that when an instruction is executed again the CPU can take the decoded micro-ops directly from the trace cache instead of fetching and decoding the instruction once more, thereby saving a considerable amount of time. Moreover, the micro-ops are cached along their predicted path of execution, which means that when the CPU fetches instructions from the cache they are already present in the correct order of execution. Astonishingly, the Pentium 4 did not improve on the old P6 design in either of the two usual key performance measures: integer processing speed and floating-point performance. This surprised the computing community. At 1.5GHz, the Pentium 4 was not only inferior to the Athlons, it could not beat the Pentium III. The Pentium 4 design sacrificed orthodox performance in order to gain two things: clock speed and Streaming SIMD Extension (SSE) performance. While it did quite a lot less per clock tick than an Athlon or a Pentium III, it ticked over faster: 1.5GHz on introduction, 2GHz inside the year, and 3 to 4GHz within the next year or so after that. The Pentium 4's SSE unit was exceedingly fast, easily faster than the equivalent SIMD units (be they MMX, SSE or 3DNow!) in any of the AMD chips or in the Pentium IIs and IIIs.
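The trade-off between work per clock tick and clock speed can be made concrete with a simple throughput estimate: instructions per second are roughly instructions per clock (IPC) multiplied by clock frequency. The sketch below uses hypothetical IPC values chosen only for illustration; they are not measurements of any Pentium 4, Pentium III or Athlon part, and the function names are our own.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Throughput = instructions per clock * clocks per second."""
    return ipc * clock_ghz * 1e9


def breakeven_clock_ghz(ipc_low: float, ipc_high: float,
                        clock_high_ipc_ghz: float) -> float:
    """Clock speed the low-IPC design needs to match the high-IPC design."""
    return ipc_high * clock_high_ipc_ghz / ipc_low


if __name__ == "__main__":
    # Hypothetical IPC values, for illustration only.
    deep_pipeline = instructions_per_second(ipc=1.0, clock_ghz=1.5)
    short_pipeline = instructions_per_second(ipc=1.6, clock_ghz=1.0)
    print(deep_pipeline < short_pipeline)  # higher clock, yet lower throughput
    print(breakeven_clock_ghz(ipc_low=1.0, ipc_high=1.6, clock_high_ipc_ghz=1.0))
```

Under these assumed numbers the higher-clocked design only wins once its clock exceeds the break-even value, which is the sense in which clock speed alone does not determine performance.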

Pentium M. Designed by Intel's center in Israel, the Pentium M was a truly excellent little chip. Based on the old Pentium III but heavily revised, it was designed as a low-power notebook part [22]. The Pentium M clearly outperformed the Pentium 4 and gave the Athlon XP a very good run for its money.

The Pentium D brand refers to dual-core desktop microprocessors, first introduced in 2005. The Pentium D is a multi-core processor consisting of two Pentium 4 cores, each sitting on its own die. It features the first desktop dual-core design, with two complete processor cores, each running at the same speed, in one physical package. There were many Pentium D revisions, and the maximum clock speed reached is 3.7 GHz for the Extreme Edition. It is a desktop processor with cores in the 8xx and 9xx series.

Itanium Processors - The first 64-bit processor to use the IA-64 architecture was the Itanium, code-named Merced, delivered in 2001. The Itanium architecture provides 128 64-bit general-purpose registers, 128 82-bit floating-point registers and 64 1-bit predicate registers. The Intel Architecture 64-bit (IA-64) is a unique combination of innovative features such as explicit parallelism (instruction-level parallelism in which the compiler decides which instructions to execute in parallel), predication and speculation. The architecture is designed to be highly scalable to meet the ever-increasing performance requirements of the various server and workstation market segments. The IA-64 architecture features a revolutionary 64-bit instruction set architecture (ISA) which applies a new processor architecture technology called EPIC (Explicitly Parallel Instruction Computing). This architecture is compatible with the IA-32 instruction set.

Table 3

Intel Itanium Processors

Processors and Date introduced Clock Speed Internal/External Bus - FSB (MHz) Register Width (Bits) External Data Bus/ Address Bus Width (Bits) Transistor Count per Die (Millions) On-Die Caches Die Size/Process Tech. (mm2/µm)

Itanium

Itanium May-01 733, 800/266 64/44-16TB 25M L1 - 2x16K /0.18

(Enterprise) L2 - 256KB L3 - 2, 4MB off die

Itanium 2 Jul-02 - Apr-04 0.9 -1 G/400 64 64/40-1TB 220M L2 - 1,5, 421/180

Nov-04 1.3 - 1.6G/400 410M 3MB 374/130

(Servers) 1.5, 1.6G/ 1.5- 6MB 374/130

3, 4, 6, 9MB

592M 432/130

Itanium 2 Jul-05 Mid-06 1.6/667M 64/40 - 1TB 1.72B L2 - 6-9MB 596/90nm

(Enterprise) 1.6/400 same same 12MB same

Dual Core Itanium 2 Jul-06 1.4 -1.6/400 64/40 - 1TB 1.72B L3 - 16MB 596/90nm

The birth of EPIC came when Hewlett Packard and Intel announced that they were forming an alliance to jointly develop a new 64-bit architecture using existing Very Long Instruction Word (VLIW) technology as a starting point. At that point the 64-bit x86 processor that Intel was developing under the code name P7 was quietly dropped in favour of the particular flavour of VLIW that HP researchers had been working on for about five years. Itanium was one of the biggest technological flops in the history of computing. Its sales were disastrously below expectations: only a few thousand systems using the original Merced Itanium processor were sold, owing to relatively poor performance, high cost and limited software availability. Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later.

Core Processors - These use the Core architecture, a multi-core processor microarchitecture that replaced Intel's NetBurst microarchitecture, which had been in use across desktop, mobile and server platforms since its release in the Pentium 4 range of processors. The extreme power consumption of NetBurst-based processors, and the resulting inability to increase clock speed effectively, was the primary reason Intel abandoned the NetBurst architecture. The Intel Core microarchitecture was designed by the Intel Israel (IDC) team that previously designed the Pentium M mobile processor. Yonah was the first multicore chip made by Intel using the new 65 nanometer process. AMD, Intel's closest rival, was almost a year behind in adopting this technology. Yonah was also Intel's first multicore processor to feature both cores on a single die. Although Yonah (Core Duo and Core Solo) carries the new Core logo, it does not actually use the new Core microarchitecture (see Table 4). When manufacturing the Core Duo processors, Intel encountered defects that rendered one of the two cores inoperable. Since the other core was perfectly functional, Intel decided to disable the defective core and call the processor the Core Solo. Like the Core Duo, the Core Solo is used exclusively in mobile computers and is not a 64-bit processor. When you begin looking at the Core processors, keep in mind that the clock speeds are down a bit from the Pentium line. This is because the Core technology focuses on improving performance per clock cycle rather than on improving performance by raising the clock rate. The first chip to carry the new Core architecture was Woodcrest, the first eighth-generation server and workstation chip, which was released on June 19, 2006. After Woodcrest came Conroe, the first eighth-generation desktop chip, released on July 23, 2006, and finally Merom, the first eighth-generation mobile chip, released in August 2006. Conroe and Merom carry the new Core 2 Duo logo.

Like Intel, AMD has several processors in its dual-core line-up. However, before AMD made the move to dual-core processor technology, it focused on integrating 64-bit technology into the Athlon XP processors. In doing so, AMD was able to take the transition to dual-core processors into account earlier in the design process.

Table 4

Selected Intel Core Processors

Processors and Date introduced Clock Speed Internal/External Bus - FSB (MHz) Register Width (Bits) External Data Bus/ Address Bus Width (Bits) Transistor Count per Die On-Die Caches Die Size/Process Tech. (mm2/µm)

Core Processors

Core Solo U1400/U1500 Mar-06 1.2-1.3/533 32 64/31 - 2G 151M L2 - 2MB 90/65nm

(Mobile PC) U1300 (Mini & Thin) 1.06/533 same 152M same same

Core Solo T1300/T1400 1.66 - 1.83/667 32 64/31 - 2G 152M L2 - 2MB 90/65nm

Aug-96 Core Duo Feb-06 T2.... 1.6 - 2/533 32 64/31 - 2GB 151M L2 - 2MB 90/65nm

(Mobile) Mar-06 1.06-1.20/533 64/31 - 2GB 151M L2 - 2MB 90/65nm

(Mini, Thin) Core i7 Nov-08 2.66 - 2.93/4.8GT/S* 64 64/36-64GB 731M L3- 8MB 263/45nm

Quad Core 956 Extreme (Desktop) 3.20/ 64/36-64GB 731M L3- 8MB 263/45nm

Core 2 Processor

Core 2 Solo Jan-06 U2100.. . 1.2, 1.06/533 64 64/36-64GB 291 L2 - 1MB 143/65nm

(Mobile PC) Core 2 Duo Jul-06 64

U7 1.2, 1.6/533 64/36-64GB 167M L2 - 2MB 143/65

L7 1.33 - /667, 800 64/36-64GB 291M L2 - 4MB 143/65

T5... 1.66-1.83/667 64/36-64GB 291M L2 - 2MB 143/65

T7... Jul-07 (Mobile PC) Core 2 Duo 2- 2.6/667, 800 64 64/36-64GB 291M L2 - 4MB 143/65

E6... Jul-06 1.8-2.66/1.066 64/36-64GB 167M, 291M L2 - 2, 4MB 111/65


E4..Apr-07 (Desktop) 1.8-2.2/800 64/36-64GB 167M L2 - 2MB 143/65

Core 2 Quad Q6... Jan-07 (Desktop) 2.4-/1066,1333 64 64/36-64GB 582M L2 - 8MB 286/65

Core 2 Extreme 3GHz/1333 64 64/36-64GB 820M L2 - 12MB 214/45

QX9... Nov-07 Core 2 Quad Q9.. Jan-08 (Desktop) 2.5, 2.83/1333 64 64/36-64GB 820M L2 - 6, 12MB 214/45

Core 2 Duo P9....Dec-08 (Mobile PC) 2.53, 2.66/1066 64 64/36-64GB 410M L2 - 6MB 107/45

*Intel QuickPath Interconnect, which replaces the FSB in Core i7

Core 2 Duo is designed for both desktops and laptops. Its model number system uses the 6000 series for desktops and the 5000 and 7000 series for laptops. The higher the model number, the higher the clock speed. The processor that powers the computer on which this paper was prepared is an Intel Core 2 Duo T5450. A 2 MB Level 2 cache is on die, bringing the total number of transistors on the die to 291 million. Each of the two cores runs at 1.66 GHz with an FSB speed of 667MHz. The process technology is 65nm, with a die size of 143mm2 and a maximum TDP of 35W.

Intel Core i7 is a family of Intel desktop x86-64 processors, the first released using the Intel Nehalem microarchitecture and the successor to the Intel Core 2 family. It uses the QuickPath Interconnect, which replaces the FSB, and it is a quad-core processor with an on-die memory controller, so the memory is connected directly to the processor. This means that the Northbridge chipset is not required. It has the following caches: a 32 KB L1 instruction cache and a 32 KB L1 data cache per core; a 256 KB L2 cache (combined instruction and data) per core; and an 8 MB "inclusive" L3 cache (combined instruction and data) shared by all cores. It is a single-die device: all four cores, the memory controller and all cache are on a single die. It supports 8 simultaneous threads with hyper-threading. All Core i7 processors are used in desktop PCs.
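The cache and thread figures quoted above can be added up with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the constants are the values stated in the text, the names are our own, and two hardware threads per core is assumed from the statement that four cores provide eight threads.

```python
CORES = 4
THREADS_PER_CORE = 2          # assumed: 8 threads over 4 cores with hyper-threading
L1_INSTRUCTION_KB = 32        # per core
L1_DATA_KB = 32               # per core
L2_KB = 256                   # per core
L3_SHARED_KB = 8 * 1024       # shared by all cores


def per_core_cache_kb() -> int:
    """Private cache (L1 instruction + L1 data + L2) owned by one core."""
    return L1_INSTRUCTION_KB + L1_DATA_KB + L2_KB


def total_on_die_cache_kb() -> int:
    """All private caches plus the shared L3."""
    return CORES * per_core_cache_kb() + L3_SHARED_KB


def hardware_threads() -> int:
    """Number of simultaneous threads the package exposes."""
    return CORES * THREADS_PER_CORE


if __name__ == "__main__":
    print(per_core_cache_kb(), total_on_die_cache_kb(), hardware_threads())
```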

Intel's Atom Processor: The migration of processor manufacturers from the megahertz race to low-power processors is best illustrated by Intel's release of the Atom processor. As Intel's smallest and lowest-power processor, the Intel® Atom™ processor is a single-core processor that enables the latest Mobile Internet Devices (MIDs) and two other new categories of Internet devices: netbooks (Internet-centric mobile computing devices) and nettops (basic Internet-centric desktop PCs). Table 5 shows that the processor is manufactured using a 45nm process, has a core speed of 1.6 GHz and is capable of addressing 4 GB of physical memory. The processor remains software compatible with previous 32-bit Intel® architecture and complementary silicon. The Intel Atom processor N270, at 1.6 GHz core speed with a 533 MHz front-side bus (FSB), has a thermal design power (TDP) of 2.5 watts [13].
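The link between address bus width and addressable physical memory, which appears throughout the tables (for example 32 bits for 4 GB, 36 bits for 64 GB, 40 bits for 1 TB), is simply a power of two. The following sketch assumes byte-addressable memory; the helper names are our own.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address bus width."""
    return 2 ** address_bits


def human_readable(n_bytes: int) -> str:
    """Render an exact power-of-1024 multiple as e.g. '64 GB'."""
    value, unit = n_bytes, 0
    while value >= 1024 and value % 1024 == 0 and unit < len(UNITS) - 1:
        value //= 1024
        unit += 1
    return f"{value} {UNITS[unit]}"


if __name__ == "__main__":
    for bits in (31, 32, 36, 40, 44):
        print(bits, human_readable(addressable_bytes(bits)))
```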

Table 5

Atom Processor

Processors and Date introduced Clock Speed Internal/External Bus - FSB (MHz) Register Width (Bits) External Data Bus/ Address Bus Width (Bits) Transistor Count per Die On-Die Caches Die Size/Process Tech. (mm2/µm)

Atom Processor

Atom 230- single core Q2’08 (Nettop) 1600/533 64 47 L1-32KB Instr 24 Data L2- 512KB 26/45

Atom 330 -2 cores Q3’08 (Nettop) 1600/533 64 64/32-4GB 47 L1-32KB Instr 24 Data L2- 2x512KB 26/45

Atom Jun-08 N270 (Netbook) 1600/533 32 64/32-4GB 47M L1 - 32KB Instr and 24KB WBD L2 - 512KB 26/45nm

Atom Z5xx Apr-08 (MIDs) 800 -1860/400, 533 32 64/32-4GB 47M L2 - 512KB 26/45nm

Westmere Processor family: This 32nm chip will debut in the last quarter of 2009. Intel had already upgraded its fabs in the United States to meet the rollout plan. The new 32-nm chips, developed under the code name Westmere, offer increased performance without an increase in the thermal envelope. Mobile and desktop processor production will begin in the fourth quarter of 2009, with an unspecified rollout date to follow. The 32-nm chips will feature two processing cores and four instructional threads, with integrated graphics. Chips for mainstream desktops are being developed under the code name Clarkdale, while the processors for thin and light notebooks are code-named Arrandale.

Feature Trends

Microprocessor feature trends are summarised in Table 6. These include clock rate trends, the increasing number of transistors per die, etc. over the last 16 years. The exponentially rising clock rate indicates several changes in testing over the next 10 years. Transistor feature sizes on a VLSI chip reduce by roughly 10.5% per year, resulting in a transistor density increase of roughly 22.1% every year. An almost equal increase is provided by wafer and chip size increases and by circuit design and process innovations [14]. This can be seen in Figure 2, which shows a nearly 44% increase in transistors on microprocessor chips every year, approximately doubling every two years as stated by Moore's law.
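The growth figures quoted above can be checked with compound-growth arithmetic. The sketch below is our own illustration, not part of [14]: it shows that 44% annual growth compounds to slightly more than a doubling in two years, and that the 22.1% density figure is reproduced when the 10.5% linear change is treated as a scale factor of 1.105 per year and squared for area (that reading is an assumption on our part).

```python
import math


def compound_growth(annual_growth: float, years: float) -> float:
    """Overall growth factor after compounding for the given number of years."""
    return (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for a quantity to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


def density_gain_from_linear_scale(linear_factor: float) -> float:
    """Area density scales with the square of the linear scale factor."""
    return linear_factor ** 2


if __name__ == "__main__":
    print(round(compound_growth(0.44, 2), 4))
    print(round(doubling_time_years(0.44), 2))
    print(round(density_gain_from_linear_scale(1.105), 3))
```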

Table 6

Processor Chips

Year                          1997-2001    2003-2006    2007-2012
Feature size, µm              0.25-0.15    0.13-0.10    0.07-0.022
Millions of transistors/cm2   4-10         18-39        84-180
Number of wiring layers       6-7          7-8          8-9
Die size, mm2                 50-385       60-520       70-750
Pin count                     100-900      160-1475     260-2690
Clock rate, MHz               200-730      530-1100     840-1830
Voltage, V                    1.2-2.5      0.9-1.5      0.5-0.9
Power, W                      1.2-1.6      2-96         2.8-109

The current 45nm process will be followed by 32nm, going by Intel's assertion in September 2007 that it would ramp up performance and energy efficiency in its microprocessors by using a 32-nanometer process technology starting in 2009. During a keynote at the Intel Developer Forum in San Francisco, Intel president and CEO Paul Otellini showed a 300mm wafer built using the 32-nm manufacturing technology. The chip will house more than 1.9 billion transistors, and its increased performance will enable “true to life entertainment and real-life graphics capabilities,” Otellini said in his keynote. The chips will be an upgrade over processors built using the 45-nm process. It is important to note that earlier in 2007 a group of chipmakers led by IBM agreed to collaborate further to jointly develop 32-nm semiconductor production technology. Other companies in the collaboration include Freescale Semiconductor, Chartered Semiconductor Manufacturing, Infineon Technologies, and Samsung Electronics. However, Intel is the first company to demonstrate working 32nm processors [15].

[Figure: Transistor Count per Die]

Fig. 2. Transistor Count Trends

Naming and Numbering Convention. For a very long time core speed was the accepted benchmark for comparing processors and their performance, but processor technology has evolved far beyond the speed of the core and the multiplier alone. After it became evident that performance could no longer be measured by CPU speed alone, Intel moved to a new strategy: processor numbers. Intel opined that the sum of all the features of a processor is far greater than the speed rating alone. Prior to this new strategy, processors were named according to their speed, for example, Pentium 4 with HTT 3.40 GHz. Older processors will continue to use the old naming convention.

Intel processor numbers are based on a variety of features that may include the processor's underlying architecture, cache, FSB, clock speed, power and other Intel technologies. A processor number represents a broad set of features that can influence the overall computing experience but is not a measurement of performance. Once you decide on a specific processor brand and type, compare processor numbers to verify that the processor includes the features you are looking for. Intel's processor number system is used with the following brands: Core, Pentium, Celeron, Atom, Xeon and Itanium processors [15]. A higher number within a processor class or family generally indicates more features, including cache, clock speed, front side bus, Intel® QuickPath Interconnect, new instructions, or other Intel technologies. In 2004, Intel introduced processor numbers for desktop and mobile systems with the goal of allowing customers to quickly differentiate among comparable processors and to consider more than one processor feature during the selection process. A higher processor number may also have more of one feature and less of another. Let us now look at the numbering scheme for each family, which is shown in Table 7.

Chipset and motherboard numbers are aligned to the appropriate processor and chipset respectively for ease of matching. Chipset numbers are indicated by a one-letter suffix. For example, in the name Intel® 5000X, the X identifies this as a chipset and 5000 identifies it as a chipset for the Intel Xeon Processor 5000 series. Again, the letter suffix does not have any inherent meaning; for instance, “A” is not necessarily more powerful than “B.” Board numbers are based on the chipset number and are indicated by a two-letter suffix following the chipset suffix. For example, Intel® Server Board S5000PSL is the server board for the volume segment (Star Lake) family based on the Intel 5000P chipset [20]. Generally, Intel processor names consist of two parts, the brand name and the processor number. An example is shown below.

Processor name: Intel® Core™ 2 Quad processor Q9550S

Brand: Intel® Core™ 2 Quad processor

Number: Q9550S

Core processor: Processor numbers for the Intel® Core™2 processor family brands consist of an alpha prefix followed by a four-digit numerical sequence. The alpha prefix indicates whether the processor is for a desktop, server or mobile (laptop) system, and also the maximum range of TDP for a given processor. The following alpha prefixes are used to indicate processors for mobile systems: T, P, L, U.

Table 7

Numbering scheme

Alpha prefix   Description

Mobile
T    Highly energy efficient processors with TDP 30-39W
P    Highly energy efficient processors with TDP 20-29W
L    Highly energy efficient processors with TDP 12-19W
U    Ultra high energy efficient processors with TDP less than or equal to 11.9W
S    Small form-factor processors with 22x22 BGA package
Core Solo: T1xxx, U1xxx
Core Duo: T2xxx, L2xxx and U2
Core 2 Duo: T5xxx, T7xxx, P, L, U and S

Desktop (E, X, Q, and QX)
E    Desktop energy efficient dual-core processors with TDP greater than or equal to 55W (E6xxx)
X    Desktop or mobile dual-core extreme performance processors
Q    Quad-core high performance processors
QX   Quad-core extreme performance processors
Core i7: high performing processors for extreme gamers and enthusiasts (920, 940, 965; Core i7-965)
Pentium Dual Core: TDP greater than or equal to 65W (E2xxx, E5xxx, E6xxx and E7xxx)

Intel Server Processors (Xeon and Itanium)
X    High performance
E    Mainstream (rack optimized)
L    Power optimised

Please note that the alpha prefixes X and QX are also used for mobile dual-core and quad-core extreme performance processors respectively. In addition to the alpha prefix and four-digit numerical sequence, processor numbers in the Core 2 Quad family may carry an “S” suffix, which denotes processors with a lower TDP. For server processors the series number indicates the configuration: Itanium Processor 9000 series, multi-processor and dual-processor; Xeon Processor 7000 series, multi-processor; Xeon Processor 5000 series, dual-processor; Xeon Processor 3000 series, single-processor.
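The prefix, four-digit number and optional suffix structure described above lends itself to a simple parser. The sketch below is our own illustration and not an Intel tool: the regular expression, the `ProcessorNumber` type and the TDP lookup table are assumptions built from the description in this section, and the sketch deliberately does not cover three-digit numbers such as those of the Core i7.

```python
import re
from typing import NamedTuple, Optional


class ProcessorNumber(NamedTuple):
    prefix: str   # alpha prefix, e.g. "Q", "QX", "T"
    digits: str   # four-digit numerical sequence
    suffix: str   # optional one-letter suffix, e.g. "S" (lower TDP)


# Mobile prefixes and their TDP ranges as listed in this section.
MOBILE_TDP_RANGE = {
    "T": "30-39W",
    "P": "20-29W",
    "L": "12-19W",
    "U": "<= 11.9W",
}

# "QX" is tried before the single-letter alternative so it is not split.
_PATTERN = re.compile(r"^(QX|[A-Z])(\d{4})([A-Z]?)$")


def parse_processor_number(text: str) -> Optional[ProcessorNumber]:
    """Split a processor number into prefix, digits and suffix, or return None."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        return None
    return ProcessorNumber(*match.groups())


if __name__ == "__main__":
    for number in ("Q9550S", "T5450", "QX9650", "920"):
        print(number, parse_processor_number(number))
```

A lookup such as `MOBILE_TDP_RANGE.get(parsed.prefix)` then gives the TDP class of a mobile part, while returning `None` for prefixes the table does not cover.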

Like Intel, AMD uses processor numbers to differentiate its processor families. Athlon 64 X2 is the brand name for AMD's line of 64-bit, dual-core processors for desktop computers. The Athlon 64 X2 model number system starts with 3800+ and moves up to 5600+. The higher the model number, the higher the clock speed. Athlon 64 FX is a version of the Athlon 64 processor aimed at gamers and digital media creation professionals. It has a higher clock speed than the Athlon 64 X2, as well as a host of other features specifically aimed at enhancing graphics, video, and animation. Athlon 64 FX is available in dual- and quad-core versions: Athlon 64 FX-60 is the dual-core version; Athlon 64 FX-70, Athlon 64 FX-72, and Athlon 64 FX-74 are the quad-core versions. Turion 64 X2 is AMD's 64-bit, dual-core, low-power processor designed for laptops. The model numbers of the current crop of Turion 64 X2 processors consist of two letters followed by two digits: they start with TL-50 and move up to TL-60.

Processors (sometimes in conjunction with chipsets and/or server board components) may also contain other Intel technologies and capabilities that affect performance and may be reflected in incremental processor numbers. These include features such as Hyper-Threading Technology, 64-bit technology, Intel® Virtualization Technology, Intel® I/O Acceleration Technology, Intel® Active Management Technology, etc. [20].

Performance Leadership between Intel and AMD Processors. Both AMD and Intel manufacture microprocessors based on the x86 architecture. The war between these two organizations has probably been the most prolonged war in the computer world. The corporate rivalry dates back to 1969, with the setting up of the Advanced Micro Devices Corporation just one year after the establishment of Intel Corporation. In January 1995 the two organizations settled their litigation, but the processor wars continued. This paper does not aim at an elaborate comparison between these two giants of the microprocessor world. However, it is important to mention a few of the differences between their processors, especially their earlier ones. So far as cost-no-object performance goes, Intel dominated the mainstream CPU market from the time it started it all in 1971 until around the end of the 20th century. Until the Athlon arrived to change the landscape, only four times had other chip makers succeeded in making a faster mainstream CPU than Intel's best. Zilog produced its immortal Z-80 in 1976 and dominated the market with it until the rise of the 16-bit 8086/8088 twins in the early '80s. From that time on, Intel had an incredible unbroken run of 18 straight years at the top until it stumbled with its lack-lustre Pentium Pro.

When it comes to overall system power consumption, performance-to-cost ratio, 3D gaming, MP3 and video encoding, and graphics, AMD is the winner when its processors are compared with Intel's earlier processors, i.e. those before the debut of the Core 2 and quad-core processors. AMD processors were also cheaper than Intel's earlier processors. Experiments have also shown that a machine running an Intel Core 2 Duo processor consumed at least 7W more power than one running an AMD Sempron. When it comes to cooling and productivity, computers based on Intel's Core 2 Duo and quad-core processors definitely have superior cooling features and better heat sinks than the AMD machines. Moreover, the Core 2 Duo processors could reach a speed of 3.2 GHz with proper cooling. Office productivity and multitasking (word processing, spreadsheets, Internet browsing speed), an important concern for every computer user, will not be the same for 32-bit and 64-bit processors: a 64-bit processor will generally outperform 32-bit processors across platforms. We can say that the situation between the processors of these two organisations is fairly balanced.

IV. Fabrication of Microprocessor

An IC is made from layers of doped silicon, polysilicon, metal and silicon dioxide, built on top of one another on a thin silicon wafer. Some of these layers form transistors, and others form planes of connection wires. The basic step in IC fabrication is to construct a layer with a customized pattern, a process known as lithography. Today's IC device technology typically consists of 10 to 15 layers, and thus the lithography process has to be repeated 10 to 15 times during the fabrication of an IC. The trend in wafers has moved from 200mm (eight-inch) diameter to larger 300mm (12-inch) diameter wafers. This has dramatically increased the surface area over the smaller 200mm design and boosts chip production to about 675 chips per wafer. Intel and other manufacturers are already using 300mm wafer production.
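The gain from moving to 300mm wafers can be estimated from geometry. The sketch below compares wafer areas and uses a commonly quoted approximation for gross dies per wafer (wafer area divided by die area, minus an edge-loss term); the approximation and the function names are not from this paper, and the 143mm2 die size is simply the Core 2 Duo figure quoted earlier, used here as an illustrative input.

```python
import math


def wafer_area_mm2(diameter_mm: float) -> float:
    """Area of a circular wafer."""
    return math.pi * (diameter_mm / 2.0) ** 2


def gross_dies_per_wafer(diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate die count: area ratio minus dies lost at the wafer edge.

    Ignores scribe lines, edge exclusion and defects, so it is an upper-end
    estimate rather than a production figure.
    """
    area_term = wafer_area_mm2(diameter_mm) / die_area_mm2
    edge_loss = math.pi * diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(area_term - edge_loss)


if __name__ == "__main__":
    print(wafer_area_mm2(300) / wafer_area_mm2(200))  # area ratio of 2.25
    for diameter in (200, 300):
        print(diameter, gross_dies_per_wafer(diameter, die_area_mm2=143.0))
```

Because the edge loss grows only linearly with diameter while the area grows quadratically, the die count in this model rises by somewhat more than the 2.25 area ratio.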

4.1. Measuring units

In the tables of selected microprocessors discussed above we have mentioned the process technology and die size. It is useful to review the size scale of the chips being discussed to help us appreciate how small these chips are. One micrometer is one millionth of a meter and one nanometer is one billionth of a meter. The thickness of a human hair and the size of bacteria are at the micrometer scale, viruses are at the nanometer scale, and atoms are measured in picometers [16]. The smallest processes in production up to the third quarter of 2009 are 45nm processes, and some of these processes actually have features smaller than 45nm.
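Since the tables quote process technology sometimes in micrometers (for example 0.18) and sometimes in nanometers (for example 45), a small conversion helper makes the figures comparable. This is a minimal sketch with names of our own choosing; the rounding assumes that process nodes are whole numbers of nanometers.

```python
NM_PER_UM = 1_000              # nanometers in one micrometer
NM_PER_M = 1_000_000_000       # nanometers in one meter


def um_to_nm(micrometers: float) -> int:
    """Convert a process node given in micrometers to whole nanometers."""
    return round(micrometers * NM_PER_UM)


def nm_to_m(nanometers: float) -> float:
    """Express a length given in nanometers in meters."""
    return nanometers / NM_PER_M


if __name__ == "__main__":
    for node_um in (0.25, 0.18, 0.13):
        print(node_um, "um =", um_to_nm(node_um), "nm")
    print(45, "nm =", nm_to_m(45), "m")
```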

4.2. Lithography

Optical lithography is the most important technology employed in chip making and is a pillar that will be very difficult to topple. It uses radiation with about half the wavelength of purple light. Future chip shrinking will depend on the future of lithography. In both logic and memory chips, each of the vast profusion of transistors acts like a switch that allows electrons to flow through the device. A metal-oxide semiconductor field-effect transistor (MOSFET), which is used in virtually all modern chips, has three main parts: a source, a drain and a gate. A voltage applied to the gate lets electrons flow from source to drain. Physically, the gate sits between the source and the drain. The industry sold microprocessors based on how fast the chips could process instructions, and that rate was pretty much directly related to how small the gate width was [1]. This war for higher processor speed drove the shrinking, with the result that the gate width became smaller than the half-pitch. The pitch is the distance between the metal plates at the source and the drain. Because microprocessor speed was largely determined by the dimensions of the gate, by 2000 the gate had become the smallest feature produced in the semiconductor industry. For logic devices the gate length became the smallest feature, but for memory the half-pitch remained the smallest feature [1]. For memories speed is also a key parameter, but there was no similar war between memory manufacturers seeking to drive up the clock frequency.

[Chart: transistor count, lithography (nm) and die size (mm2) comparison; logarithmic vertical axis from 10 to 10,000,000,000]

Fig. 3. Technology Trends

Instead, memory manufacturers concentrated on reducing the size of each memory cell on their chips so that they could squeeze ever more bits into less and less real estate. It is important to note at this juncture that the shrink rate of memories is higher than that of logic. It is also likely that memories will embrace extreme ultraviolet (EUV) lithography, the new technology expected to replace optical lithography [17]. EUV also uses optics, since it relies on electromagnetic radiation, but at a wavelength about one-fifteenth of that used in optical lithography. Other next-generation lithography techniques are electron beam and imprint lithography. For more detailed information on lithography process technology, see [1] and [16]. The impact of process technology on the overall transistor count per die is shown in figure 3. It is evident that the smaller the process technology (lithography), the larger the number of transistors per die; consequently, speed increases and TDP reduces. The increased transistor count per die has also led to an increase in die size. Moore’s law is holding up because of the reduced feature sizes made possible by optical process technology. Feature sizes will continue to decrease as the semiconductor roadmap moves to the 22nm process technology as the year 2011 approaches.
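The link between feature size and transistor count can be made concrete with a rough calculation. The sketch below assumes ideal scaling, in which the area occupied by a transistor shrinks with the square of the feature size. This is a simplifying assumption of ours and not a figure from the cited sources, since real processes do not scale every dimension equally. The function name `ideal_density_gain` is hypothetical.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Return the ideal gain in transistors per unit area when the
    feature size shrinks from old_nm to new_nm.

    Assumption: every linear dimension scales with the feature size,
    so the area per transistor scales with its square.
    """
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    # The 45 nm and 22 nm nodes are the ones mentioned in the text.
    gain = ideal_density_gain(45, 22)
    print(f"45 nm -> 22 nm: about {gain:.1f}x transistors per unit area")
```

Under this assumption, moving from a 45nm to a 22nm process would allow roughly four times as many transistors in the same die area, which is the kind of jump that keeps Moore’s law on track.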

4.3 Yield, Chip testing and grading

As each 8 inch or 12 inch wafer rolls off the production line, it undergoes special feature tests and is then split up into its individual chips. Ideally, the chips would all be exactly identical, but in practice there are tiny variations between them. Process variations, such as impurities in wafer material and chemicals, dust particles on masks or in the projection system, mask misalignment, incorrect temperature control, etc., can produce defects on wafers. The term defect generally refers to a physical imperfection in the processed wafer [2]. The stuck-at fault model is the most commonly used model in VLSI testing and fault tolerance schemes. In this model, a physical defect manifests itself as a signal consistently having a certain value (either zero or one) independent of the input [3]. These defects affect the process yield. The process yield of a manufacturing process is defined as the fraction (or percentage) of acceptable parts among all parts that are fabricated, that is, the proportion of good chips among all chips on a wafer. See figure 4.

Fig. 4. Defect modelling for yield estimation
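The stuck-at fault model described above can be illustrated with a small simulation. The sketch below uses a hypothetical three-input circuit, y = (a AND b) OR c, chosen only for illustration; the net names and the helper functions `circuit` and `detecting_vectors` are our own assumptions and do not come from [2] or [3]. It forces one net to a fixed value and lists the input vectors for which the faulty circuit disagrees with the fault-free one.

```python
from itertools import product


def circuit(a: int, b: int, c: int, stuck=None) -> int:
    """Evaluate y = (a AND b) OR c.

    `stuck` is None for a fault-free circuit, or a (net, value) pair
    that forces the named net ("a", "b", "c", "n1" or "y") to 0 or 1.
    """
    def force(net: str, value: int) -> int:
        # A stuck-at fault overrides whatever value the net would carry.
        if stuck is not None and stuck[0] == net:
            return stuck[1]
        return value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)      # internal net: output of the AND gate
    return force("y", n1 | c)    # primary output: output of the OR gate


def detecting_vectors(stuck):
    """Return every input vector whose output differs from the good circuit."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, stuck=stuck)]


if __name__ == "__main__":
    for fault in [("n1", 0), ("n1", 1), ("c", 1)]:
        print(f"{fault[0]} stuck-at-{fault[1]}: detected by {detecting_vectors(fault)}")
```

A vector that exposes a fault is a test for it. For example, the internal net stuck at zero is only revealed by the input (1, 1, 0), because every other input either does not drive that net to one or hides it behind the OR gate.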

Typical defects are broken conductors, missing contacts, bridging between conductors, missing transistors, incorrect doping levels, and many other phenomena that can cause the circuit to fail. Some defects are observable through the optical or electron microscope. Others are not visible and can only be detected by electrical tests. To estimate the VLSI yield, defects are modelled as random phenomena. For more details refer to [2].
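To show what modelling defects as random phenomena can look like in practice, the sketch below pairs the yield definition given above with one common simplification, the Poisson model. Using it here is our assumption and is not necessarily the model developed in [2]. In the Poisson model, defects are assumed to fall independently and uniformly over the wafer, so a die of area A with average defect density D is defect-free with probability exp(-A·D). The function names and all numbers are hypothetical.

```python
import math


def process_yield(good_chips: int, total_chips: int) -> float:
    """Yield as defined in the text: acceptable parts / all fabricated parts."""
    if total_chips <= 0 or not 0 <= good_chips <= total_chips:
        raise ValueError("need 0 <= good_chips <= total_chips and total_chips > 0")
    return good_chips / total_chips


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Estimated yield under the Poisson assumption.

    Defects are assumed to land independently and uniformly on the wafer,
    so the probability that a die has zero defects is exp(-A * D).
    """
    if die_area_cm2 < 0 or defects_per_cm2 < 0:
        raise ValueError("area and defect density must be non-negative")
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any manufacturer.
    print(f"measured yield: {process_yield(180, 200):.0%}")
    for density in (1.0, 0.5, 0.1):
        estimate = poisson_yield(1.0, density)
        print(f"D = {density} defects/cm^2 -> estimated yield {estimate:.0%}")
```

The model also makes the cost of a large die visible: for a fixed defect density, doubling the die area squares the estimated yield, so big dies suffer disproportionately from the same process imperfections.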

Yields well under 50 percent are common when a new chip starts production; however, by the end of a given chip’s life, the yields are normally in the 90 percent range. Most chip manufacturers guard their yield figures and are very secretive about them, because knowledge of yield problems can give their competitors an edge. After each die (chip) on the wafer has been tested, the bad ones are marked. All the dies are then cut from the wafer, and each is retested, packaged and retested again. The packaging process is also referred to as bonding, because the die is placed into a chip housing where a special machine bonds fine gold wires between the die and the pins on the chip. The package is the container for the chip die, and it essentially seals it from the environment. The testing after packaging (at different pressures, temperatures and speeds) is meant to detect the point at which the chip will stop working. At this point, the maximum successful speed is noted and the final chips are sorted into bins with those that tested at a similar speed. For example, the Pentium III 750, 866, and 1000 are all exactly the same chip made using the same process. They were sorted at the end of the manufacturing cycle by speed. The paradox is that Intel often sells a lot more of the lower-priced 933 and 866MHz chips, so it will just dip into the bin of 1000MHz processors, label them as 933 or 866MHz chips and sell them that way. People began discovering that many of the lower-rated chips would actually run at speeds much higher than they were rated for, and the business of overclocking was born. Overclocking describes the operation of a chip at a speed higher than it was rated for [21]. With the birth of overclocking, unscrupulous vendors, who reap where they did not sow, started remarking slower chips and reselling them as if they were faster. Because most Intel and AMD processors are produced with a generous safety margin (that is, they will normally run well past their rated speed), the remarked chips would seem to work fine in most cases. Of course, in many cases they would not work fine, and the system would end up crashing or locking up periodically [21]. Intel and AMD have stopped this trend (remarking fraud) by building overclock protection, in the form of a multiplier lock, into most of their newer chips.
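The sorting step described above, often called speed binning, can be sketched as follows. This is only an illustration under our own assumptions: the grade list is loosely based on the Pentium III ratings mentioned in the text, the test results are made up, and `assign_bin` and `sort_into_bins` are hypothetical helpers rather than any manufacturer's actual procedure.

```python
import bisect

# Hypothetical speed grades in MHz, loosely based on the Pentium III
# ratings mentioned in the text.
SPEED_GRADES_MHZ = [750, 866, 933, 1000]


def assign_bin(max_stable_mhz: float, grades=SPEED_GRADES_MHZ):
    """Return the highest grade the chip passed, or None if it fails them all."""
    # bisect_right counts how many grades are <= the measured speed.
    index = bisect.bisect_right(grades, max_stable_mhz)
    return grades[index - 1] if index > 0 else None


def sort_into_bins(measurements, grades=SPEED_GRADES_MHZ):
    """Group chip ids by assigned grade; rejected chips go under None."""
    bins = {grade: [] for grade in grades}
    bins[None] = []
    for chip_id, max_stable_mhz in measurements.items():
        bins[assign_bin(max_stable_mhz, grades)].append(chip_id)
    return bins


if __name__ == "__main__":
    # Made-up test results: chip id -> highest speed at which it ran reliably.
    tested = {"die-01": 1040, "die-02": 905, "die-03": 990, "die-04": 700}
    for grade, chips in sort_into_bins(tested).items():
        print(grade, chips)
```

In this sketch a chip that ran reliably at 990MHz is still graded as a 933MHz part, because it must be rounded down to a grade it actually passed. That gap between tested and labelled speed is the headroom overclockers exploit.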

4.4 Chip foundries

Having discussed microprocessors up to this point, it becomes necessary to know where these chips are manufactured, that is, where the foundries are located. Chip foundries fabricate anyone’s chips on a contract basis. The question is, “Where are the foundries (fabs) located?” Most of the foundries are located in the Far East: Taiwan, South Korea, the Philippines, China and Singapore. Taiwan, for instance, is home to several chip foundries, the two biggest being Taiwan Semiconductor Manufacturing Co. and United Microelectronics Corp., both headquartered in Hsinchu [1]. Taiwan is also home to 6 of the 10 major DRAM manufacturers: Inotera, Nanya, Powerchip, ProMos, Rexchip and Winbond [5]. Some semiconductor companies no longer do any of their own fabrication. Others that used to make their own chips have become fab-lite: they retain some facilities to develop the initial technology but send their volume business to foundries.

This is done purely for economic reasons, as it allows these companies to do rapid development under their own control and avoid the cost of massive fabs for volume production. Building a new fab is quite expensive and can cost about $5 billion. This explains why more and more American and European companies take this approach [1]. Integrated device makers like Intel, Samsung, Toshiba and IBM design and build all their own chips. AMD, Freescale and Texas Instruments have recently turned to the foundries. IBM Microelectronics in East Fishkill, N.Y., has become a dominant foundry for game processors. Fujitsu also maintains a 65nm fab in Japan. After Intel and Samsung, Texas Instruments is the third largest manufacturer of semiconductors worldwide.

Conclusion

This paper has examined most of the processors by Intel and AMD that have made their mark in the last three decades. Some of them did not live up to expectations, as vendors continually played tricks on consumers, especially with the first releases of a new line of processors. In most cases new processors have been found to perform worse than, or at the same level as, their older cousins. Rushing to purchase a new processor may therefore not be the right thing to do. But for how long should one continue to wait for the second release? We should try to endure if our current processor is still serving us well. We have shown that Moore’s law will continue to hold as long as new process technologies reduce the feature sizes on the die. The use of the EUV process will certainly reduce feature sizes further. We have seen that the better the process technology, the more transistors we can squeeze onto the die, the faster the processor, the lower the thermal design power and, ultimately, the cheaper the processors become.

References

[1] Bill Arnold, “Shrinking Possibilities”, IEEE Spectrum, pp. 23~25, April 2009.

[2] Michael L. Bushnell and Vishwani D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, ISBN 0-306-47040-3, Kluwer Academic Publishers, 2002.

[3] Fred A. Bower, Sule Ozev, and Daniel J. Sorin, “Autonomic Microprocessor Execution via Self-Repairing Arrays”, IEEE Transactions on Dependable and Secure Computing, Vol. 2, No. 4, pp. 297~310, Oct-Dec 2005.

[4] Douglas V. Hall, Microprocessors and Interfacing Revised 2nd ed., ISBN: 0-07-060167-4, New Delhi, Tata McGraw-Hill, 2007.

[5] Yu-Tzu and Samuel K. Moore, “Taiwan’s Troubled DRAM Plan”, IEEE Spectrum, pp. 9~10, June 2009.

[6] Brian R. Santo, “25 Microchips That Shook the World”, IEEE Spectrum May 2009.

[7] Keith Diefendorff, “History of the PowerPC Architecture”, Communication of the ACM 37, 6, pp.28~33, June 1994.

[8] Indiana University Information technology Services Knowledge Base, “What are CISC and RISC technologies, and how do they compare?”,

[9] Patterson, D.A. and Ditzel, D.R., “The case for the reduced instruction set computer”, Computer Architecture News 8:6, pp. 25-33, Oct. 15, 1980.

[10] Jeff Prosise, “RISC vs. CISC: The Real Story - What makes the PowerPC a RISC processor and the Pentium a CISC?”, http://www.zdnet.com/pcmag/pctech/content/14/18/tu1418.001.html.

[11] “Texas Instruments Datasheets and Cross Reference”, http://www.supplyframe.com/datasheet-pdf/manufacturer/texas+instruments?id=2477

[12] Intel Corporation, “Microprocessor Quick Reference Guide”, http://www.intel.com.

[13] Intel, “Embedded and Communications”, Product Brief, intel.com/go/embedded.

[14] Harriott L. R., “A New Role for E-Beam: Electron Projection,” IEEE Spectrum, vol. 36, no. 7, pp. 41-45, July 1999.

[15] Intel Corporation, “White Paper Introduction to Intel’s 32nm Process Technology”, www.intel.com.

[16] Scotten W. Jones, Introduction of Integrated Circuit Technology, 4th ed, IC Knowledge LLC.

[17] Chris A. Mack, “Seeing Double”, IEEE Spectrum, pp. 123, November 2008.

[18] Wikipedia

[19] Red Hill Technology, “The Red Hill CPU Guide”, http://www.redhill.net.au/index.html.

[20] Intel, “New numbering system for Intel Server Processors”, www.intel.com.

[21] Scott Mueller and Mark Edward Soper, “Microprocessor Types and Specifications”, InformIT Network, file://J:\MacmillanComputerPublishing\chapters\JW003.html, 3/22/01.

[22] Intel Pentium M power data, 1.6 GHz, 0.13µm technology, 1 MB L2 cache

[23] http://www.intel.com/design/intarch/pentiumm/pentiumm.htm

R&I, 2009, №4
