Processor Architecture

The processor, also called the central processing unit (CPU), interprets and carries out the basic instructions that operate a computer. The processor significantly impacts overall computing power and manages most of a computer’s operations. On larger computers, such as mainframes and supercomputers, the various functions performed by the processor extend over many separate chips and often multiple circuit boards. On a personal computer, all functions of the processor usually are on a single chip. Some computer and chip manufacturers use the term microprocessor to refer to a personal computer processor chip.

Processor

Principles of CPU Architecture

The underlying principles of all computer processors are the same, regardless of brand, age, software, or broadband set-up. Fundamentally, they all take signals in the form of binary (0s and 1s), manipulate them according to a set of instructions, and produce output in the form of binary. The voltage on the line at the time a signal is sent determines whether the signal is a 0 or a 1. On a 3.3-volt system, an application of 3.3 volts means that it’s a 1, while an application of 0 volts means it’s a 0.

Processors work by reacting to an input of 0s and 1s in specific ways and then returning an output based on the decision. The decision itself happens in a circuit called a logic gate, each of which requires at least one transistor, with the inputs and outputs arranged differently for different operations. The fact that today’s processors contain millions of transistors offers a clue as to how complex the logic system is. The processor’s logic gates work together to make decisions using Boolean logic, which is based on the algebraic system established by mathematician George Boole.
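The behaviour of logic gates can be sketched in software. The example below is only an illustration, not a description of any real processor's circuitry: the function names (and_gate, or_gate, not_gate, xor_gate, half_adder) are chosen here for the sketch, and each bit is modelled as the integer 0 or 1. It shows how two simple gates combine into a half adder, a circuit that adds two one-bit inputs.

```python
def and_gate(a, b):
    """Output 1 only when both inputs are 1."""
    return a & b


def or_gate(a, b):
    """Output 1 when at least one input is 1."""
    return a | b


def not_gate(a):
    """Invert a single bit."""
    return 1 - a


def xor_gate(a, b):
    """Output 1 when exactly one input is 1."""
    return a ^ b


def half_adder(a, b):
    """Add two one-bit inputs and return (sum_bit, carry_bit)."""
    return xor_gate(a, b), and_gate(a, b)


# Print the truth table of the half adder.
for a in (0, 1):
    for b in (0, 1):
        sum_bit, carry_bit = half_adder(a, b)
        print(f"{a} + {b} -> sum={sum_bit} carry={carry_bit}")
```

Adding 1 and 1 gives a sum bit of 0 and a carry bit of 1, which is binary 10 (decimal 2). Larger arithmetic circuits are built by chaining gates in the same way.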

Registers

A processor contains small, high-speed storage locations, called registers, that temporarily hold data and instructions. Registers are part of the processor, not part of memory or a permanent storage device. Processors have many different types of registers, each with a specific storage function. Register functions include storing the location from where an instruction was fetched, storing an instruction while the control unit decodes it, storing the data while the ALU computes it, and storing the results of a calculation.

To summarise, registers are locations where data or control information is temporarily stored. A register is like a drawer in which you keep your files and papers. The CPU is made up of two main parts: the arithmetic logic unit and the control unit.

The Control Unit (CU)

The control unit is the component of the processor that directs and coordinates most of the operations in the computer. The control unit has a role much like a traffic light: it interprets each instruction issued by a program and then initiates the appropriate action to carry out the instruction. Types of internal components that the control unit directs include the arithmetic/logic unit, registers, and buses, each discussed later in this chapter.

Functions of the Control Unit (CU)

The control unit co-ordinates the input and output devices of a computer system. It fetches the code of all of the instructions in the microprograms. Historically, the control unit was defined as one distinct part of the 1946 reference model of the Von Neumann architecture. In modern computer designs, the control unit is typically an internal part of the CPU, with its overall role and operation unchanged.

The outputs of the control unit control the activity of the rest of the device. A control unit can be thought of as a finite state machine. The control unit is the circuitry that controls the flow of data through the processor and coordinates the activities of the other units within it. In a way, it is the “brain within the brain”, as it controls what happens inside the processor, which in turn controls the rest of the PC. The same kinds of control units that are developed within CPUs are also present in GPUs. The modern information age would not be possible without complex control unit designs.

The functions performed by the control unit vary greatly with the internal architecture of the CPU, since the control unit implements this architecture. On a regular processor that executes x86 instructions natively, the control unit performs the tasks of fetching, decoding, managing execution, and then storing results.

Arithmetic Logic Unit (ALU)

The arithmetic logic unit (ALU), another component of the processor, performs arithmetic, comparison, and other operations.

Arithmetic operations include basic calculations such as addition, subtraction, multiplication, and division. Comparison operations involve comparing one data item with another to determine whether the first item is greater than, equal to, or less than the other item. Depending on the results of the comparison, different actions may occur. For example, to determine if an employee should receive overtime pay, software instructs the ALU to compare the number of hours an employee worked during the week with the regular time hours allowed (e.g., 40 hours). If the hours worked are greater than 40, software instructs the ALU to perform calculations that compute the overtime wage.
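The overtime example can be written as a short program in which one comparison decides which arithmetic follows. This is a sketch under stated assumptions: the function name weekly_pay and the overtime multiplier of 1.5 are illustrative choices, not values given in the text. Only the 40-hour limit comes from the example above.

```python
REGULAR_HOURS = 40  # regular-time limit taken from the example in the text


def weekly_pay(hours_worked, hourly_rate, overtime_multiplier=1.5):
    """Return weekly pay; the 1.5 overtime multiplier is an assumed value."""
    # Comparison operation: is hours_worked greater than the regular limit?
    if hours_worked > REGULAR_HOURS:
        # Arithmetic operations: subtraction, multiplication, addition.
        overtime_hours = hours_worked - REGULAR_HOURS
        regular_pay = REGULAR_HOURS * hourly_rate
        overtime_pay = overtime_hours * hourly_rate * overtime_multiplier
        return regular_pay + overtime_pay
    # No overtime: a single multiplication is enough.
    return hours_worked * hourly_rate


print(weekly_pay(38, 20.0))
print(weekly_pay(45, 20.0))
```

On a real processor, the comparison and each arithmetic step would be separate machine instructions carried out by the ALU.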

Machine Cycle

For every instruction, a processor repeats a set of four basic operations, which comprise a machine cycle.

  • Step 1: Fetching – Fetching is the process of obtaining a program instruction or data item from memory.
  • Step 2: Decoding – The term decoding refers to the process of translating the instruction into signals the computer can execute.
  • Step 3: Executing – Executing is the process of carrying out the commands.
  • Step 4: Storing (if necessary) – Storing, in this context, means writing the result to memory (not to a storage medium).
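The four steps above can be sketched as a loop over a toy program. Everything in this example is hypothetical and simplified: the instruction format (opcode, destination, source 1, source 2), the two opcodes ADD and SUB, and the use of a Python dictionary as memory are assumptions made for illustration, not the design of a real instruction set.

```python
def run(program, memory):
    """Run a toy program; each instruction is (opcode, dest, src1, src2)."""
    program_counter = 0  # register holding the location of the next instruction
    while program_counter < len(program):
        # Step 1: fetch the instruction from the program.
        instruction = program[program_counter]
        program_counter += 1

        # Step 2: decode the instruction into its parts.
        opcode, dest, src1, src2 = instruction

        # Step 3: execute the operation.
        if opcode == "ADD":
            result = memory[src1] + memory[src2]
        elif opcode == "SUB":
            result = memory[src1] - memory[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")

        # Step 4: store the result in memory.
        memory[dest] = result
    return memory


memory = {"a": 7, "b": 5, "sum": 0, "diff": 0}
program = [("ADD", "sum", "a", "b"), ("SUB", "diff", "a", "b")]
run(program, memory)
print(memory)
```

The loop finishes all four steps for one instruction before it fetches the next, so it models a processor that handles one instruction at a time.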

Machine Cycle Diagram

In some computers, the processor fetches, decodes, executes, and stores only one instruction at a time.  In these computers, the processor waits until an instruction completes all four stages of the machine cycle (fetch, decode, execute, and store) before beginning work on the next instruction.

Most of today’s personal computers support a concept called pipelining.  With pipelining, the processor begins fetching a second instruction before it completes the machine cycle for the first instruction.  Processors that use pipelining are faster because they do not have to wait for one instruction to complete the machine cycle before fetching the next.  Think of a pipeline as an assembly line.  By the time the first instruction is in the last stage of the machine cycle, three other instructions could have been fetched and started through the machine cycle.

Machine Cycle Pipeline

Most modern computers support pipelining. With pipelining, the processor begins fetching a second instruction before the first instruction completes the machine cycle.
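The benefit of pipelining can be estimated with simple arithmetic. The sketch below assumes an idealised pipeline: each of the four stages takes exactly one clock cycle and the pipeline never stalls. Real processors do not meet these assumptions exactly, and the function names are chosen here for illustration.

```python
STAGES = 4  # fetch, decode, execute, store


def cycles_without_pipelining(instruction_count):
    """Each instruction finishes all four stages before the next one starts."""
    return instruction_count * STAGES


def cycles_with_pipelining(instruction_count):
    """The first instruction takes four cycles; each later one finishes one cycle after the previous."""
    if instruction_count == 0:
        return 0
    return STAGES + (instruction_count - 1)


for count in (1, 4, 100):
    print(count, cycles_without_pipelining(count), cycles_with_pipelining(count))
```

Under these assumptions, 100 instructions need 400 cycles without pipelining and 103 cycles with it, which is why a full pipeline approaches one completed instruction per clock cycle.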

The System Clock

The processor relies on a small quartz crystal circuit called the system clock to control the timing of all computer operations.  Just as your heart beats at a regular rate to keep your body functioning, the system clock generates regular electronic pulses, or ticks, that set the operating pace of components of the system unit.

Each tick equates to a clock cycle. In the past, processors used one or more clock cycles to execute each instruction. Processors today often are superscalar, which means they can execute more than one instruction per clock cycle.

The pace of the system clock, called the clock speed, is measured by the number of ticks per second. Current personal computer processors have clock speeds in the gigahertz range. Giga is a prefix that stands for billion, and a hertz is one cycle per second. Thus, one gigahertz (GHz) equals one billion ticks of the system clock per second. A computer that operates at 3 GHz has 3 billion (giga) clock cycles in one second (hertz).
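The relationship between clock speed and the length of one clock cycle can be checked with a few lines of arithmetic. The function names below are chosen for this sketch.

```python
GIGA = 1_000_000_000  # giga means one billion


def cycles_per_second(clock_speed_ghz):
    """Number of clock cycles in one second at the given clock speed."""
    return clock_speed_ghz * GIGA


def cycle_time_ns(clock_speed_ghz):
    """Length of one clock cycle in nanoseconds (billionths of a second)."""
    return 1 / clock_speed_ghz


print(cycles_per_second(3))
print(round(cycle_time_ns(3), 3))
```

At 3 GHz, each clock cycle lasts about a third of a nanosecond.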

The faster the clock speed, the more instructions the processor can execute per second. The speed of the system clock has no effect on devices such as a printer or disk drive. The speed of the system clock is just one factor that influences a computer’s performance. Other factors include the type of processor chip, amount of cache, memory access time, bus width, and bus clock speed.

In computers, sequence is everything. The system clock synchronizes the tasks in a computer, such as loading data before manipulating it. The system clock is a circuit that emits a continuous stream of precise high and low pulses that are all exactly the same length. One clock cycle is the time that passes from the start of one high pulse until the start of the next. If several events are supposed to happen in one clock cycle, the cycle is subdivided by inserting a circuit with a known delay in it, thus providing more highs and more lows.

Every modern PC has multiple system clocks. Each of these vibrates at a specific frequency, normally measured in MHz (megahertz, or millions of cycles per second). A clock “tick” is the smallest unit of time in which processing happens, and is sometimes called a cycle; some types of work can be done in one cycle while others require many. The ticking of these clocks is what drives the various circuits in the PC, and the faster they tick, the more performance you get from your machine (other things being equal).

The original computers had a unified system clock; a single clock (running at a very low speed like 8 MHz) drove the processor, memory (there was no cache back then) and I/O bus. As computers advanced and some parts gained in speed more than others, the need for multiple clocks arose. A typical modern computer has either four or five different clocks, running at different (but related) speeds. When the "system clock" is referred to generically, it normally refers to the speed of the memory bus running on the motherboard (and not usually that of the processor).

The various clocks in the modern computer are created using a single clock generator circuit (on the motherboard) to generate the "main" system clock, and then various clock multiplier or divider circuits to create the other signals. The entire system is tied to the speed of the system clock. This is why increasing the system clock speed is usually more important than increasing the raw processor speed; the processor spends a great deal of time waiting on information from much slower devices, especially the system buses. While a faster processor will have greater performance, this increase in speed will not lead to nearly as much performance improvement if the processor is spending a great deal of time sitting idle waiting for other, slower parts of the system.
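The way multiplier and divider circuits derive other clocks from one main clock can be sketched numerically. All of the figures below (a 100 MHz main clock, a multiplier of 30 for the processor, a divider of 4 for an I/O bus) are hypothetical values chosen for illustration, as is the function name derive_clock_mhz.

```python
def derive_clock_mhz(base_mhz, multiplier=1, divider=1):
    """Return the frequency produced by multiplying or dividing a base clock."""
    return base_mhz * multiplier / divider


BASE_MHZ = 100  # hypothetical main system clock

processor_mhz = derive_clock_mhz(BASE_MHZ, multiplier=30)  # hypothetical multiplier
io_bus_mhz = derive_clock_mhz(BASE_MHZ, divider=4)         # hypothetical divider

print(processor_mhz)
print(io_bus_mhz)
```

Because every derived clock is a fixed ratio of the main clock, changing the main clock changes all of them at once, which is the sense in which the entire system is tied to the speed of the system clock.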

Cache

Most of today’s computers improve processing times with cache (pronounced cash).  Two types of cache are memory cache and disk cache.

Memory cache helps speed the processes of the computer because it stores frequently used instructions and data.  Most personal computers today have two types of memory cache: L1 cache and L2 cache.  Some also have L3 Cache.

  • L1 Cache – L1 cache is built directly in the processor chip. L1 cache usually has a very small capacity, ranging from 8 KB to 128 KB. The more common sizes for personal computers are 32 KB or 64 KB.
  • L2 Cache – L2 cache is slightly slower than L1 cache but has a much larger capacity, ranging from 64 KB to 16 MB. When discussing cache, most users are referring to L2 cache. Current processors include advanced transfer cache (ATC), a type of L2 cache built directly on the processor chip. Processors that use ATC perform at much faster rates than those that do not use it.

Personal computers today typically have from 512 KB to 8 MB of advanced transfer cache. Servers and workstations have from 8 MB to 16 MB of advanced transfer cache.

  • L3 Cache – L3 cache is a cache on the motherboard that is separate from the processor chip. L3 cache exists only on computers that use L2 advanced transfer cache (ATC). Personal computers often have up to 8 MB of L3 cache; servers and workstations have from 8 MB to 24 MB of L3 cache.

Cache speeds up processing time because it stores frequently used instructions and data.  When the processor needs an instruction or data, it searches memory in this order: L1 cache, then L2 cache, then L3 cache (if it exists), then RAM – with a greater delay in processing for each level of memory it must search.

Cache Memory

If the instruction or data is not found in memory, then the processor must search a slower-speed storage medium such as a hard disk, CD, or DVD.
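The search order described above can be modelled as a walk through a list of levels, fastest first. This is a simplified sketch: the addresses, the stored values, and the function name find are hypothetical, and a real cache is managed by hardware rather than by dictionary lookups.

```python
def find(address, levels):
    """Search each level in order; return (level_name, value, levels_searched)."""
    for searched, (name, contents) in enumerate(levels, start=1):
        if address in contents:
            return name, contents[address], searched
    raise KeyError(f"address {address:#x} not found at any level")


# Hypothetical contents: each slower level holds everything above it, plus more.
levels = [
    ("L1 cache", {0x10: "add"}),
    ("L2 cache", {0x10: "add", 0x20: "load"}),
    ("L3 cache", {0x10: "add", 0x20: "load", 0x30: "store"}),
    ("RAM", {0x10: "add", 0x20: "load", 0x30: "store", 0x40: "jump"}),
    ("hard disk", {0x10: "add", 0x20: "load", 0x30: "store", 0x40: "jump", 0x50: "halt"}),
]

for address in (0x10, 0x30, 0x50):
    name, value, searched = find(address, levels)
    print(f"{address:#x}: found {value!r} in {name} after searching {searched} level(s)")
```

The number of levels searched stands in for the growing delay: an item found in L1 cache costs one lookup, while an item that is only on disk costs five.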

Windows Vista users can increase the size of cache through Windows ReadyBoost, which can allocate up to 4 GB of removable flash memory devices as additional cache.  Examples of removable flash memory include USB flash drives, CompactFlash cards, and SD (Secure Digital) cards.  Removable flash memory is discussed in more depth later in this chapter and the book.

Parallel Processing

Parallel processing is a method that uses multiple processors simultaneously to execute a single program or task.

<control processor image>

Parallel processing divides a problem into portions so that multiple processors work on their assigned portion of a problem at the same time. In this diagram, one processor, called the control processor, is managing the operations of four other processors.

Parallel processing requires special software that recognises how to divide the problem and then bring the results back together again.
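The divide-and-combine pattern can be sketched with a pool of four workers coordinated by the main program, which plays the part of the control processor. This is an illustration of the pattern only: it uses a thread pool from Python's standard library as a stand-in for separate processors, and in standard CPython, threads do not actually run this kind of calculation simultaneously. The function names split and parallel_sum are chosen for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into at most `parts` portions of nearly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, workers=4):
    """Sum data by giving each worker one portion, then combining the results."""
    portions = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its assigned portion.
        partial_sums = list(pool.map(sum, portions))
    # The coordinating program brings the partial results back together.
    return sum(partial_sums)


print(parallel_sum(list(range(1, 101))))
```

Summation divides cleanly because the portions do not depend on each other. Problems whose portions need each other's intermediate results are harder to divide, which is why the software must recognise how to split the work.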

Some personal computers implement parallel processing with dual-core processors or multi-core processors. Others have two or more separate processor chips, respectively called dual processor or multiprocessor computers.

Massively parallel processing is large-scale parallel processing for applications such as artificial intelligence and weather forecasting. Some applications draw on the idle time of home users’ personal computers to achieve parallel processing.