I can still remember the first time I touched a computer. I
remember being amazed by the things it could do (even in the days of good
old DOS). After that wonder died down, I began wondering how this thing could
do all of these amazing tasks and work in such a small box. That was when I
was about eight years old. From that moment on, I knew that I wanted to become
a Computer Engineer and learn as much as I could about these absolutely amazing
devices. Well, I have now graduated from college and am as passionate about them
as ever. In this section I am going to attempt to share some of the knowledge
I have learned in my research and schooling. Most of the papers presented in
this section were written during my graduate Computer Architecture course at the
Georgia Institute of Technology during the fall of 2002. It is my sincere hope
that someone who wants to know more about the internals of computers will
stumble into this section and learn something they didn’t know before. Of
course, what I have provided here are only summaries of papers and other small
tidbits that can only set you on the path, but you have to start somewhere!
If you learn anything from the information contained
here, I’d like to hear from you. If you are a student and are using any of my
work for less than honorable purposes, I’d still like to hear from you. I
do not condone cheating; please do not use anything you find on my site for
dishonorable conduct.
E-Mail me at baudburn [at] baudburn [dot] com. Thanks.
Formal Verification
Microprocessors are just cool. They are the brain of any
computer system and this is the reason they get so much attention. While
working at National Instruments, I was fortunate enough to be able to take a
course in advanced computer architecture. This course was very good and we had a
very cool project that required us to implement a 5-stage dual issue pipeline
processor in a VHDL-like description language for formal verification known as
abstract HDL (absHDL). For those of you who do not know, formal verification is
a process by which you can mathematically prove the correctness of a
microprocessor implementation through the use of large Boolean equations. In a
nutshell, you create a very simple non-pipelined processor implementation with
all the desired functionality of your final processor. This processor is very
easy to look at and decide if it works properly. Once this stage is finished you
begin extending it to the pipelined version. This becomes difficult as you run
head-on into pipelining hazards like data dependencies. In the end, formal
verification proves the pipelined version's correctness by proving that a single
step of the non-pipelined (ISA) version of the processor is in all cases
exactly equal to one *or more* steps of
the pipelined processor. If it can be proven that this assertion holds true from
any given starting state, the pipelined implementation is formally
verified and will perform correctly. There are many benefits to this approach,
far too numerous to even begin to explain here. If I have the chance, at some
point in the future, I will write up a section on formal verification. For now
just trust me, it is a very good thing when compared with the alternatives.
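To give a feel for the idea, here is a minimal Python sketch (not absHDL, and not one of the processors posted below). It compares a non-pipelined reference model against a toy 3-stage pipeline by brute force over every short program and every starting register state, instead of using Boolean equations, so it is bounded checking rather than a real proof. The two-register ISA, the `forwarding` switch, and all of the function names are assumptions invented for this illustration.

```python
from itertools import product

MOD = 4      # tiny value domain so every case can be enumerated
NREGS = 2    # hypothetical two-register machine

def alu(op, a, b):
    """Shared ALU used by both models."""
    return (a + b) % MOD if op == "add" else (a - b) % MOD

def spec_run(program, regs):
    """Non-pipelined (ISA) model: one instruction completes per step."""
    regs = list(regs)
    for op, rd, rs1, rs2 in program:
        regs[rd] = alu(op, regs[rs1], regs[rs2])
    return tuple(regs)

def pipeline_run(program, regs, forwarding=True):
    """Toy 3-stage pipeline: decode, execute, write back."""
    regs = list(regs)
    ex = None   # (op, rd, val_a, val_b) latched by decode
    wb = None   # (rd, value) latched by execute
    pc = 0
    while pc < len(program) or ex is not None or wb is not None:
        # Write back: the oldest in-flight instruction updates the registers.
        if wb is not None:
            regs[wb[0]] = wb[1]
        # Execute: compute the result of the instruction decoded last cycle.
        new_wb = (ex[1], alu(ex[0], ex[2], ex[3])) if ex is not None else None
        # Decode: read operands for the next instruction in program order.
        new_ex = None
        if pc < len(program):
            op, rd, rs1, rs2 = program[pc]

            def read(r):
                # Forward the result still in flight to avoid a stale read.
                if forwarding and new_wb is not None and new_wb[0] == r:
                    return new_wb[1]
                return regs[r]

            new_ex = (op, rd, read(rs1), read(rs2))
            pc += 1
        ex, wb = new_ex, new_wb
    return tuple(regs)

def find_mismatch(max_len=3, forwarding=True):
    """Return (program, start_regs) where the models disagree, or None."""
    instrs = [(op, rd, a, b)
              for op in ("add", "sub")
              for rd, a, b in product(range(NREGS), repeat=3)]
    for n in range(max_len + 1):
        for program in product(instrs, repeat=n):
            for start in product(range(MOD), repeat=NREGS):
                if spec_run(program, start) != pipeline_run(program, start, forwarding):
                    return program, start
    return None

if __name__ == "__main__":
    print("with forwarding:   ", find_mismatch(forwarding=True))
    print("without forwarding:", find_mismatch(forwarding=False))
```

With forwarding enabled, `find_mismatch()` should come back empty-handed. With `forwarding=False` it returns a program and starting state where a data hazard makes the pipeline disagree with the reference model, which is exactly the kind of bug the formal approach is meant to rule out for all programs, not just short ones.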
So without further ado, here are the processor implementations in absHDL.
Revision 1.0: The objective for the first revision of this processor was to create a 5 Stage
pipelined processor that implements register-register ALU instructions with 2
register ids, register-immediate ALU instructions with 2 register ids, store
instructions, load instructions, jump instructions, and branch instructions.
The stages utilized in this pipeline are instruction fetch, instruction decode,
execute, memory ops, and write back.
For those who are interested, I will also post the intermediary steps taken to
get to revision 1 of this processor:
5 Stage Pipeline Rev 0.1
5 Stage Pipeline Rev 0.2
5 Stage Pipeline Rev 0.3
5 Stage Pipeline Rev 0.4
5 Stage Pipeline Rev 0.5
Revision 2.0: This revision builds on Rev 1 and adds support for ALU exceptions, return from ALU
exceptions, and branch prediction. This step is much more difficult because
the pipe steering logic becomes much more complex, as does the logic for
correcting pipeline mistakes like branch mispredictions.
As with Rev 1, here are the intermediary steps that I took
to go from rev 1 to rev 2.
5 Stage Pipeline Rev 1.1
5 Stage Pipeline Rev 1.2
Revision 3.0: The
third step is not really a revision so much as a rewrite. In this phase I took
the 1.0 implementation and revised its issue logic to be a dual issue pipeline.
The first pipeline in this dual issue configuration will handle ALU and memory
instructions while the second pipeline will handle ALU, jump, and branch
instructions. Restrictions similar to these are sometimes used to save hardware
real estate, but at the price of adding structural hazards. We will now have
to route instructions into their appropriate pipeline based on their
instruction type. Structural hazards will arise if the next instruction
requires hardware from a certain pipeline, but that pipeline is busy. To handle
this we will have to stall the pipeline by inserting bubbles.
5 Stage Dual Issue Pipeline 1.0
As with the other steps, here are the intermediary steps taken:
5 Stage Dual Issue Pipeline Rev 0.1
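The routing rule described for Revision 3.0 can be sketched in a few lines of Python. This is only an illustration, not the absHDL implementation: it assumes in-order issue of at most two instructions per cycle, ignores data and control hazards entirely, and the names `PIPE_CAPS` and `issue_schedule` are made up for this example.

```python
# Which instruction kinds each pipeline accepts (taken from the text above).
PIPE_CAPS = (
    {"alu", "mem"},             # pipeline 0: ALU and memory instructions
    {"alu", "jump", "branch"},  # pipeline 1: ALU, jump, and branch instructions
)
BUBBLE = "bubble"

def issue_schedule(kinds):
    """Return one (pipe0, pipe1) tuple per cycle for a list of instruction kinds."""
    for kind in kinds:
        if not any(kind in caps for caps in PIPE_CAPS):
            raise ValueError(f"no pipeline accepts {kind!r}")
    schedule = []
    i = 0
    while i < len(kinds):
        slots = [BUBBLE, BUBBLE]
        first = kinds[i]
        second = kinds[i + 1] if i + 1 < len(kinds) else None
        paired = False
        if second is not None:
            # Try both assignments of the next two instructions to the pipes.
            for a, b in ((0, 1), (1, 0)):
                if first in PIPE_CAPS[a] and second in PIPE_CAPS[b]:
                    slots[a], slots[b] = first, second
                    paired = True
                    break
        if paired:
            i += 2
        else:
            # Structural hazard (or end of program): issue one instruction
            # and leave a bubble in the other pipeline.
            pipe = 0 if first in PIPE_CAPS[0] else 1
            slots[pipe] = first
            i += 1
        schedule.append(tuple(slots))
    return schedule

if __name__ == "__main__":
    for cycle, slots in enumerate(issue_schedule(
            ["alu", "mem", "mem", "mem", "branch", "alu"])):
        print(cycle, slots)
```

Two back-to-back memory instructions are the classic structural hazard here: both need pipeline 0, so the second one waits a cycle and pipeline 1 carries a bubble.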
This section will serve as a place to post papers and other
things that I have written about computer architecture. The majority of the
files added here are short summaries of papers that I have read. Each
summary attempts to condense the major points of a specific topic into a
concise work that conveys the key points at a glance (or
skim).
Itanium Vs Crusoe |
This paper was written to compare and contrast the Intel Itanium and the Transmeta Crusoe. These two approaches are fundamentally very different and were designed with different goals in mind. The paper is written as if it were a report for a company manager. |
DIVA Vs Slipstream |
This paper was written to discuss the DIVA and Slipstream architectures. These alternative architectures use the power of two processors to achieve greater speed than a single processor could by itself. Keep in mind that this paper does *NOT* discuss SMP or other common multiprocessor solutions. It instead considers a single physical processor that has two processor cores built in and the possibilities associated. This is an interesting read! |
One Billion Transistors |
This paper examines the future ahead of us. It deals with the technology that will be available to us when we can fit one billion transistors on a single chip! |
Trace Cache |
Trace caching is a very cool idea. In a nutshell, a trace cache works just like a traditional cache, but stores instructions in the order they are actually executed (logically contiguous) instead of the order in which they sit in memory (physically contiguous). This is cool stuff that is beginning to be utilized in processors like the Intel Pentium 4. |
Branch Prediction |
When I first learned about processors, I remember wondering how a branch predictor works. Imagine a device that can guess with more than 95% accuracy the direction you are going to take before you get there! It sounds impossible at first, but it is actually quite simple. Read on to find out. |
Value Prediction vs Instruction Reuse |
This is certainly an interesting idea. This paper compares two technologies aimed at accelerating the rate at which a processor can finish a segment of code. Value prediction speculatively guesses at the result of instructions based on previous results, allowing the processor to compute ahead of itself. Instruction reuse recognizes instruction traces that have been previously calculated and removes the trace from the instruction stream, since the results have already been calculated. Very cool stuff!! |
POWER4 |
There are many different processors in the world, many more than the AMD/Intel processors that everyone knows about. This paper examines some of the features of the IBM POWER4 Architecture. |
Intelligent Cache Memory |
This one probably isn’t as interesting as some other papers here. This paper talks a bit about cache systems and how they can be used to maximize performance of multimedia focused processors (among others). |
Sure, you use it every single day. Sure, it has cool icons for every major holiday and many that you have never even heard of. But the real question is: Do you know how it works? This paper talks about some of Google’s functionality. It considers how Google crawls, and what mechanisms are in place to ensure that Google remains a very popular web searching utility. |
Future Microarchitectural Problems |
Everyone wants to know what the future will be like. This paper discusses what challenges we can foresee in the future of microarchitecture. An interesting read if you want to know where we are going. |
10 Arguments for RISC |
RISC Vs. CISC, the battle has been waged on and on for as long as I can remember. Here are 10 reasons RISC is a good architectural design choice. |
IA-64 Performance |
Intel has spent a whole load of money developing the IA-64 instruction set. This is a list of things that could inhibit the performance of a processor with this instruction set. |
10 Innovations in the Intel Pentium 4 |
Intel’s Pentium 4 processor is certainly innovative. On some fronts it actually seems to perform worse than its Pentium III buddy. If you are willing to overlook this, check out these 10 innovations that are in every P4 processor. |
MLP vs. ILP |
Memory Level Parallelism (MLP) is an interesting idea. Basically, the argument is that MLP attempts to parallelize misses to cache rather than parallelize instructions (ILP). Here are 7 reasons that MLP, if further explored, could prove to be better than the much more common ILP. |
Considerations for the Intel Itanium 2 Data Cache Design |
If you know anything about processors, you know that cache is important. The more cache you have, the fewer cycles the processor has to waste doing NOPs while fetching data. However, there is more to caches than just size. This short paper talks about the Itanium 2, and how they chose to implement the Data Cache. |
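As a small taste of the Branch Prediction entry in the list above, here is a rough Python sketch of one classic scheme, a table of 2-bit saturating counters indexed by the low bits of the branch address. It is not taken from the paper itself; the class name, table size, starting counter value, and the loop trace are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        # Every counter starts at 1 ("weakly not taken"); an arbitrary choice.
        self.counters = [1] * (1 << table_bits)

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        """Nudge the counter toward the actual outcome, saturating at 0 and 3."""
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs in a non-empty trace predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then falls through, repeated 10 times.
    loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
    print(f"accuracy: {accuracy(TwoBitPredictor(), loop_trace):.2f}")
```

The second bit is what keeps a single loop exit from flipping the prediction: after one not-taken outcome the counter only drops from "strongly taken" to "weakly taken", so the next pass through the loop is still predicted correctly from its first iteration.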