ECE 428 Programmable ASIC Design

## FPGA Microprocessor

Part 1

Haibo Wang ECE Department Southern Illinois University Carbondale, IL 62901

#### Microprocessor vs. ASIC

□ A given function can be implemented either on a general-purpose microprocessor or by designing an ASIC.

| Microprocessor                                                  | ASIC                                                                                                  |
|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| 1. General purpose                                              | 1. Application specific                                                                               |
| 2. Flexible and easy to update by loading new software          | 2. Less flexible and almost impossible to update<br>unless the ASIC contains programmable circuits |
| 3. Typically, the performance is not optimized for a given task | 3. Normally, optimal performance                                                                      |

#### How Does A Microprocessor Work?



- □ First, an instruction is fetched into the microprocessor
- □ Second, the instruction is decoded for execution
- □ Finally, the instruction is executed.

#### Advantages of FPGA Microprocessors

- □ The architecture of an FPGA microprocessor can be easily modified to achieve optimal performance for a given application.
- □ Example: calculate $Y = A \cdot X^2 + B \cdot X + C$
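As a rough illustration of why the datapath matters for this example, the polynomial can be rewritten as $(A \cdot X + B) \cdot X + C$, which needs two multiplications instead of three. The datapath shown on the original slide is not reproduced in this text, so the sketch below is only an assumption about the kind of restructuring a tailored multiply-add unit could exploit; the function names are hypothetical.

```python
def poly_direct(a, b, c, x):
    """Evaluate Y = A*X^2 + B*X + C term by term (3 multiplies, 2 adds)."""
    return a * x * x + b * x + c


def poly_horner(a, b, c, x):
    """Evaluate the same polynomial as (A*X + B)*X + C (2 multiplies, 2 adds)."""
    return (a * x + b) * x + c
```

Both forms give the same value; the second maps onto two passes through a single multiply-add stage.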



#### Introduction to GNOME Microprocessor



## Instructions of GNOME Microprocessor

| Mnemonic | Operation                       | Description                                                                                                                                                                              |
|----------|---------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| load Rd  | ACC ← Rd                        | Load the accumulator with the contents of the memory location whose address is d.                                                                                                        |
| loadi d  | ACC ← d                         | Load the accumulator with the immediate data d                                                                                                                                           |
| store Rd | Rd ← ACC                        | Store the value in the accumulator into RAM location Rd.                                                                                                                                 |
| add Rd   | ACC ← ACC+Rd+C                  | Add the content of RAM location Rd and the C flag to the Accumulator.                                                                                                                    |
| addi d   | ACC ← ACC+d+C                   | Add the value d and the C flag to the accumulator.                                                                                                                                       |
| xor Rd   | $ACC \leftarrow ACC \oplus Rd$  | EXCLUSIVE-OR the content of RAM location Rd with the accumulator.                                                                                                                        |
| test Rd  | Z ← ACC • Rd                    | AND the contents of RAM location Rd with the accumulator,<br>but do not store the results in the accumulator. Instead, set<br>the Z flag if the result is 0, otherwise clear the Z flag. |

## Instructions of GNOME Microprocessor

| Mnemonic     | Operation   | Description                                                                                                                          |
|--------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------|
| clear_c      | C ← 0       | Clear the C flag to zero.                                                                                                            |
| set_c        | C ← 1       | Set the C flag to one.                                                                                                               |
| skipc        | PC ← PC+1+C | If C=1, skip the next instruction by incrementing the program counter by two instead of one. Otherwise, execute the next instruction. |
| skipz        | PC ← PC+1+Z | If Z=1, skip the next instruction by incrementing the program counter by two instead of one. Otherwise, execute the next instruction. |
| jump a       | PC ← a      | Jump to program address a and continue execution from there. a is an address in the range [0, 127].                                   |
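The two instruction tables can be summarized as a small behavioral model. This is a minimal sketch, not the actual GNOME hardware: it assumes a 4-bit accumulator, 16 RAM locations of 4 bits each, and that `add`/`addi` write their carry-out back into C (the tables do not state this explicitly). The `step` function and the layout of the state dictionary are hypothetical.

```python
def step(state, instr):
    """Apply one GNOME instruction and return the updated state.

    `state` is a dict with keys acc, c, z, pc and ram (a list of 16 nibbles).
    `instr` is a (mnemonic, operand) tuple; operand is None when unused.
    """
    op, d = instr
    s = dict(state, ram=list(state["ram"]))  # work on a copy
    next_pc = s["pc"] + 1
    if op == "load":
        s["acc"] = s["ram"][d]
    elif op == "loadi":
        s["acc"] = d & 0xF
    elif op == "store":
        s["ram"][d] = s["acc"]
    elif op in ("add", "addi"):
        operand = s["ram"][d] if op == "add" else d & 0xF
        total = s["acc"] + operand + s["c"]
        s["acc"] = total & 0xF
        s["c"] = total >> 4  # assumption: carry-out is written back to C
    elif op == "xor":
        s["acc"] ^= s["ram"][d]
    elif op == "test":
        s["z"] = 1 if (s["acc"] & s["ram"][d]) == 0 else 0
    elif op == "clear_c":
        s["c"] = 0
    elif op == "set_c":
        s["c"] = 1
    elif op == "skipc":
        next_pc += s["c"]
    elif op == "skipz":
        next_pc += s["z"]
    elif op == "jump":
        next_pc = d
    else:
        raise ValueError(f"unknown mnemonic: {op}")
    s["pc"] = next_pc & 0x7F  # 7-bit program address
    return s
```

For example, starting from a cleared state, `loadi 9` followed by `addi 8` leaves 1 in the accumulator with C set, so a following `skipc` advances the program counter by two.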

#### Instruction Encoding

#### □ Instruction Format

- Fixed length instructions (8 bits)
- Each instruction has zero or one operand



## Instruction Encoding

| Mnemonic | Encoding                                 | Comment                                                      |
|----------|------------------------------------------|--------------------------------------------------------------|
| load Rd  | $0\ 1\ 0\ 0\ d_{3}\ d_{2}\ d_{1}\ d_{0}$ | $d_3 d_2 d_1 d_0$ is the address of RAM location             |
| loadi d  | $0\ 0\ 0\ 1\ d_{3}\ d_{2}\ d_{1}\ d_{0}$ | $d_3 d_2 d_1 d_0$ is the immediate data                      |
| store Rd | $0\ 0\ 1\ 1\ d_3\ d_2\ d_1\ d_0$         | $d_3 d_2 d_1 d_0$ is the address of RAM location             |
| add Rd   | $0\ 1\ 0\ 1\ d_3\ d_2\ d_1\ d_0$         | $d_3 d_2 d_1 d_0$ is the address of RAM location             |
| addi d   | $0\ 0\ 1\ 0\ d_3\ d_2\ d_1\ d_0$         | $d_3 d_2 d_1 d_0$ is the immediate data                      |
| xor Rd   | $0\ 1\ 1\ 0\ d_3\ d_2\ d_1\ d_0$         | $d_3 d_2 d_1 d_0$ is the address of RAM location             |
| test Rd  | $0\ 1\ 1\ 1\ d_{3}\ d_{2}\ d_{1}\ d_{0}$ | $d_3 d_2 d_1 d_0$ is the address of RAM location             |
| clear_c  | 00000000                                 |                                                              |
| set_c    | 00000001                                 |                                                              |
| skipc    | 00000010                                 |                                                              |
| skipz    | 00000011                                 |                                                              |
| jump a   | $1 a_6 a_5 a_4 a_3 a_2 a_1 a_0$          | $a_6 a_5 a_4 a_3 a_2 a_1 a_0$ is the new instruction address |
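The encoding table translates directly into a tiny assembler routine. The following is a sketch built only from the bit patterns listed above; the names `OPCODES`, `NO_OPERAND` and `encode` are hypothetical.

```python
# 4-bit opcodes for instructions that carry a 4-bit operand d3..d0
OPCODES = {
    "load": 0b0100, "loadi": 0b0001, "store": 0b0011, "add": 0b0101,
    "addi": 0b0010, "xor": 0b0110, "test": 0b0111,
}
# Complete 8-bit codes for instructions without an operand
NO_OPERAND = {"clear_c": 0x00, "set_c": 0x01, "skipc": 0x02, "skipz": 0x03}


def encode(mnemonic, operand=None):
    """Return the 8-bit machine code for one GNOME instruction."""
    m = mnemonic.lower()
    if m in NO_OPERAND:
        return NO_OPERAND[m]
    if operand is None:
        raise ValueError(f"{m} needs an operand")
    if m == "jump":
        if not 0 <= operand <= 127:
            raise ValueError("jump address must be in [0, 127]")
        return 0x80 | operand  # 1 a6 a5 a4 a3 a2 a1 a0
    if m in OPCODES:
        if not 0 <= operand <= 15:
            raise ValueError("operand must fit in 4 bits")
        return (OPCODES[m] << 4) | operand
    raise ValueError(f"unknown mnemonic: {mnemonic}")
```

For instance, `encode("loadi", 8)` yields `0x18` and `encode("add", 2)` yields `0x52`.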

### Operation of GNOME Microprocessor



#### **GNOME** Block Diagram



#### Address Generation

- □ If inc\_pc = 1, the adder output goes to the input of the register (for the normal instruction address update and for the skipz and skipc instructions).
- □ If jump\_pc = 1, curr\_ir goes to the input of the register (for the jump instruction).
- □ If jump\_pc = 0 and inc\_pc = 0, the output of the register is fed back to its input, so the address is held.
- □ It is prohibited to have jump\_pc = 1 and inc\_pc = 1 at the same time.
- □ After reset, curr\_pc = 0. Thus, the microprocessor fetches the first instruction from address 0000000.
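The selection rules above can be sketched as a next-address function. This is a behavioral model under stated assumptions: the `skip` argument is a hypothetical stand-in for whatever the adder circuit uses to add the extra 1 on a taken skipc/skipz, only the low 7 bits of curr\_ir are taken as the jump address, and the address is assumed to wrap at 128.

```python
def next_pc(curr_pc, curr_ir, inc_pc, jump_pc, skip=0):
    """Return the value loaded into the PC register on the next clock edge."""
    if inc_pc and jump_pc:
        raise ValueError("inc_pc and jump_pc must not both be 1")
    if inc_pc:
        # Adder output: PC + 1, or PC + 2 when a skip is taken
        return (curr_pc + 1 + skip) & 0x7F
    if jump_pc:
        # Jump: the address field of the current instruction
        return curr_ir & 0x7F
    # Neither signal asserted: the register holds its value
    return curr_pc
```

With `curr_ir = 0x8A` (jump to address 10) and `jump_pc = 1`, the function returns 10; with both control signals low it returns `curr_pc` unchanged.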



#### Instruction Latch & Data Bus Circuits

- For a write operation (store rd), the write buffer is on and the data on curr\_acc are written into memory. At all other times, the write buffer is off (high-impedance output).
- To fetch an instruction, both muxes pass the data on the data bus to the instruction register.
- To fetch data from the data RAM, only the least significant mux passes the data on the data bus to the instruction register.
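The latch and write-buffer behavior can be modeled roughly as follows. This sketch assumes each mux covers one 4-bit half of the 8-bit instruction register and that a half whose mux does not pass the bus keeps its current value; neither detail is stated in the text. The function names and the `fetch_instr`/`fetch_data` arguments are hypothetical.

```python
def latch_ir(curr_ir, data_bus, fetch_instr, fetch_data):
    """Return the next 8-bit instruction register value."""
    if fetch_instr:
        # Both muxes pass the bus: the whole instruction is loaded
        return data_bus & 0xFF
    if fetch_data:
        # Only the least significant mux passes the bus: the opcode half
        # is kept and the low nibble receives the RAM data
        return (curr_ir & 0xF0) | (data_bus & 0x0F)
    return curr_ir


def write_buffer(curr_acc, write):
    """Model the tri-state write buffer: drive curr_acc, or None for high-Z."""
    return (curr_acc & 0xF) if write else None
```

For example, after fetching `add R2` (`0x52`), a data fetch that returns 9 leaves `0x59` in the register: the opcode is preserved and the operand field now holds the data.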



#### Control & Decoding Unit





Example: how to generate the control signal write

- The write signal should be driven high to enable the write buffer during the execution state of *store rd* instructions.

write = exe\_state AND store\_instruction

- 1. Signal exe\_state is high when GNOME is in an execution state
- 2. Signal store\_instruction is high if the current instruction is *store rd*
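The same decoding can be written as a short behavioral sketch. It assumes store\_instruction is derived by comparing the upper four bits of the current instruction with the *store rd* opcode 0011 from the encoding table; the function names are hypothetical.

```python
STORE_OPCODE = 0b0011  # upper nibble of "store Rd" in the encoding table


def store_instruction(curr_ir):
    """Return 1 if the current 8-bit instruction is store Rd, else 0."""
    return int((curr_ir >> 4) == STORE_OPCODE)


def write_signal(exe_state, curr_ir):
    """write = exe_state AND store_instruction."""
    return int(bool(exe_state)) & store_instruction(curr_ir)
```

So `write_signal(1, 0x30)` is 1, while the same instruction outside an execution state, or a `load` (`0x40`) in an execution state, gives 0.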

## **Execution Unit**

#### □ Accumulator-Based Architecture



- For operations with two operands, one operand is the accumulator.
- The execution result is stored in the accumulator.

#### □ ALU operations

| Name  | Operation                        | Encoding |
|-------|----------------------------------|----------|
| PASS  | ACC ← B                          | 001      |
| ADD   | ACC ← ACC + B + Cin              | 010      |
| XOR   | ACC $\leftarrow$ ACC $\oplus$ B  | 011      |
| AND   | Zero flag = $ACC \bullet B$      | 100      |
| SET_C | Set carry flag                   | 101      |
| CLR_C | Clear carry flag                 | 110      |
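The operation table can be expressed as a small behavioral model of the ALU. This is a sketch under assumptions: operands are 4 bits wide, ADD writes its carry-out to the carry flag, and operations leave any flag they do not mention unchanged. The `alu` function and its argument order are hypothetical.

```python
# 3-bit operation encodings from the table above
PASS, ADD, XOR, AND, SET_C, CLR_C = 0b001, 0b010, 0b011, 0b100, 0b101, 0b110


def alu(op, acc, b, c, z):
    """Return (acc, c, z) after applying one ALU operation to 4-bit values."""
    if op == PASS:
        acc = b
    elif op == ADD:
        total = acc + b + c
        acc, c = total & 0xF, total >> 4  # assumption: carry-out updates C
    elif op == XOR:
        acc ^= b
    elif op == AND:
        z = int((acc & b) == 0)  # result is discarded; only Z changes
    elif op == SET_C:
        c = 1
    elif op == CLR_C:
        c = 0
    else:
        raise ValueError(f"unknown ALU operation: {op:03b}")
    return acc & 0xF, c, z
```

For example, `alu(ADD, 8, 9, 0, 0)` returns `(1, 1, 0)`: the 4-bit sum wraps to 1 and the carry flag is set.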

#### ALU Circuit



| alu_op[1] | alu_op[0] | Operation |
|-----------|-----------|-----------|
| 0         | 1         | PASS      |
| 1         | 1         | XOR       |
| 1         | 0         | ADD*      |

\* Extra circuits are needed for carry generation

#### Complete ALU Circuit




#### Programming GNOME Processor

#### Example: writing a program to calculate 48H + 29H

| Instruction | Operand | Machine code |
|-------------|---------|--------------|
| loadi       | 8       | 18H          |
| store       | R0      | 30H          |
| loadi       | 4       | 14H          |
| store       | R1      | 31H          |
| loadi       | 9       | 19H          |
| store       | R2      | 32H          |
| loadi       | 2       | 12H          |
| store       | R3      | 33H          |
| clear_c     |         | 00H          |
| load        | R0      | 40H          |
| add         | R2      | 52H          |
| store       | R4      | 34H          |
| load        | R1      | 41H          |
| add         | R3      | 53H          |
| store       | R5      | 35H          |
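To check the program, its machine code can be run on a minimal software model of the fetch, decode and execute cycle. The sketch below assumes a 4-bit accumulator, 16 RAM nibbles, carry-out written to C by add/addi, and that execution simply stops when the program counter runs past the last instruction (real hardware would keep fetching). The `run` function and the `max_steps` guard are hypothetical.

```python
PROGRAM = [0x18, 0x30, 0x14, 0x31, 0x19, 0x32, 0x12, 0x33,
           0x00, 0x40, 0x52, 0x34, 0x41, 0x53, 0x35]


def run(program, max_steps=1000):
    """Execute GNOME machine code and return the final RAM contents."""
    ram = [0] * 16
    acc = c = z = pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ir = program[pc]                  # fetch
        pc += 1
        opcode, d = ir >> 4, ir & 0xF     # decode
        if ir & 0x80:                     # jump a
            pc = ir & 0x7F
        elif opcode == 0b0001:            # loadi d
            acc = d
        elif opcode in (0b0010, 0b0101):  # addi d / add Rd
            operand = d if opcode == 0b0010 else ram[d]
            total = acc + operand + c
            acc, c = total & 0xF, total >> 4
        elif opcode == 0b0011:            # store Rd
            ram[d] = acc
        elif opcode == 0b0100:            # load Rd
            acc = ram[d]
        elif opcode == 0b0110:            # xor Rd
            acc ^= ram[d]
        elif opcode == 0b0111:            # test Rd
            z = int((acc & ram[d]) == 0)
        elif ir == 0x00:                  # clear_c
            c = 0
        elif ir == 0x01:                  # set_c
            c = 1
        elif ir == 0x02:                  # skipc
            pc += c
        elif ir == 0x03:                  # skipz
            pc += z
        else:
            raise ValueError(f"undefined instruction: {ir:02X}H")
    return ram
```

Running `run(PROGRAM)` leaves the low digit of the sum in R4 and the high digit in R5: 8 + 9 produces 1 with a carry, and 4 + 2 plus that carry produces 7, giving 71H = 48H + 29H.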

#### Put Everything Together

