Introduction to FPGA and Verilog
ECE3400

An FPGA is an integrated circuit that can be configured after manufacture to become the hardware you describe in a hardware description language (HDL). The FPGA on the DE0-nano that you are using can be a 32-bit CPU, a VGA controller, or a camera controller. Or all three. You will specify the hardware in a high-level HDL called Verilog. You might use an FPGA when you need highly parallel computing or very high I/O bandwidth.

You are going to need an FPGA because of the high-speed timing demands and bandwidth required to read and analyse an image from a camera while handling a communications connection to an Arduino. The FPGA has to:
-- Connect to an OV7670 camera so that you can read images into FPGA memory.
-- Analyse camera images in FPGA memory for one of several shapes and colors. For debugging, you may want to display the image and analysis on a VGA monitor.
-- Communicate the shape analysis results to the control Arduino.
-- For more detail, see https://cei-lab.github.io/ece3400-2018/lab4.html

FPGA structure

Your FPGA has:
-- 22,320 logic elements (LEs)
-- 594 Kbits of embedded memory, organized as 66 M9k blocks
-- 66 dedicated, very fast 18x18 hardware multipliers
-- A variety of clock sources. The most important are the PLLs, which derive new clock frequencies from the 50 MHz input clock.

Logic element
Each logic element (LE) operates in one of two modes: normal or arithmetic.
22k LEs are enough to build a few 32-bit processors on one chip.
Verilog statements are compiled into LE settings: they set up the lookup table (LUT), set the multiplexers (MUXs), and route the output to other LEs and to memory.
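As a rough illustration of how one Verilog statement becomes one LE, here is a minimal sketch, assuming the Cyclone4 LE has one 4-input LUT feeding one register. The module name `one_le_example` is hypothetical. A Boolean function of at most four inputs fits the LUT, and the clocked assignment uses the LE's register, so the fitter can usually pack this into a single LE (the final placement is Quartus's decision).

```verilog
// Hypothetical example: one 4-input Boolean function plus one register.
// Assumes a 4-input LUT + 1 register per LE, so this is a candidate
// for a single LE after synthesis.
module one_le_example (
    input  wire clk,
    input  wire a, b, c, d,
    output reg  q
);
    always @(posedge clk)
        q <= (a & b) | (c ^ d);   // LUT computes the function, register stores it
endmodule
```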

Logic Array Blocks (LABs) are groups of LEs organized for fast communication.
Each LAB consists of the following features:
-- 16 LEs: convenient for 32-bit arithmetic, with fast LAB carry chains and two arithmetic bits per LE
-- LE carry chains and register chains
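A plain `+` is normally all you need to use the carry chains. The sketch below is a hypothetical registered 32-bit adder (`add32` is an illustrative name); Quartus typically maps the addition onto LEs in arithmetic mode and routes the carry along the dedicated chain rather than through general routing. The output is 33 bits wide so the carry out is kept.

```verilog
// Hypothetical example: registered 32-bit adder.
// The '+' is expected to map onto arithmetic-mode LEs and the carry chain.
module add32 (
    input  wire        clk,
    input  wire [31:0] a, b,
    output reg  [32:0] sum      // 33 bits: bit 32 is the carry out
);
    always @(posedge clk)
        sum <= a + b;           // 33-bit context, so the carry out is not lost
endmodule
```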

Memory
Using LEs as memory:
-- Fast to set up
-- Asynchronous reads and a global clear are possible, which you cannot do with M9k blocks.
-- Inefficient use of FPGA resources (relative to M9k blocks).
Each LE has only one register, so even if you build nothing else, LE memory holds at most 22,320 bits (under 3 Kbytes), and less in practice because the addressing logic also uses LEs.
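Here is a minimal sketch of LE memory, assuming a small register array is what you want (the module name `le_mem` and its parameters are hypothetical). It shows the two features M9k blocks lack: the read is purely combinational (no clock edge between address and data), and every word can be cleared at once, here on a single clock edge.

```verilog
// Hypothetical example: a small memory built from LE registers.
module le_mem #(
    parameter WIDTH = 8,
    parameter DEPTH = 16,
    parameter AW    = 4               // address width: 2**AW >= DEPTH
) (
    input  wire             clk,
    input  wire             clear,    // clears every word on the next clock edge
    input  wire             we,
    input  wire [AW-1:0]    waddr, raddr,
    input  wire [WIDTH-1:0] d,
    output wire [WIDTH-1:0] q
);
    reg [WIDTH-1:0] mem [0:DEPTH-1];
    integer i;

    always @(posedge clk) begin
        if (clear) begin
            for (i = 0; i < DEPTH; i = i + 1)
                mem[i] <= {WIDTH{1'b0}};   // all words cleared in one cycle
        end else if (we) begin
            mem[waddr] <= d;
        end
    end

    // Asynchronous read: q follows raddr without waiting for a clock edge.
    assign q = mem[raddr];
endmodule
```

Each bit of `mem` costs one LE register plus its share of the write-decode and read-multiplexer logic, which is why this style only makes sense for small memories.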
M9k block memory:
-- Single-port RAM, dual-port RAM, FIFO, shift register, ROM.
-- Synchronous access only: reads and writes must be registered on a clock edge.
-- Configurable as 1, 2, 4, 8, 9, 16, 18, 32, or 36-bit words, for a total of 9 Kbits per block.
-- Verilog can specify memories spanning multiple M9k blocks (see below).
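As a quick consistency check on the numbers above, assuming all of the 594 Kbits sit in M9k blocks of 9 Kbits (9 x 1024 bits) each:

```latex
\frac{594\ \text{Kbits}}{9\ \text{Kbits per block}} = 66\ \text{blocks},
\qquad
66 \times 9 \times 1024 = 608{,}256\ \text{bits}
```

The full 9 Kbits are only usable in the 9, 18, and 36-bit widths, where the extra bit per byte is available. The other widths use 8,192 bits per block, so an 8-bit-wide memory gets 1,024 words per M9k block.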

Multipliers
-- Each can be used as one 18x18 or two 9x9 multipliers; Verilog can chain them for wider operands.
-- Fast, single-cycle operation at up to several hundred MHz for 18x18.
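A minimal sketch of using one embedded multiplier, assuming signed 18-bit operands (the module name `mult18` is hypothetical). When both operands are declared `signed` and fit in 18 bits, Quartus normally maps the `*` onto a single embedded multiplier instead of building it from LEs. The full product needs 36 bits.

```verilog
// Hypothetical example: registered signed 18x18 multiply.
module mult18 (
    input  wire               clk,
    input  wire signed [17:0] a, b,
    output reg  signed [35:0] p
);
    always @(posedge clk)
        p <= a * b;   // both operands signed, so the 36-bit product is signed
endmodule
```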

Your DE0-nano board also has:
-- 32 MB SDRAM. If you intend to use this, you MUST first learn to use the Qsys bus tool.
-- 8 LEDs
-- Two pushbuttons
-- An ADI ADXL345 high-resolution 3-axis accelerometer
-- An ADC
-- A 50 MHz clock oscillator, whose frequency can be changed inside the FPGA with a PLL

 


Using Verilog to describe hardware
The abundance and variety of hardware on the FPGA demands design automation tools. You will use Verilog to organize your design at the register-transfer level (RTL). You will specify registers, wires, memories, arithmetic, and all other logic in Verilog.

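Verilog looks a lot like C, but it is fundamentally different: it describes hardware, so every always block and continuous assignment runs concurrently, instead of as one program executing one statement after another.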
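One small example of the difference, as a sketch with a hypothetical module name (`swap_regs`): in C, `x = y; y = x;` copies `y` into both variables. In a clocked Verilog block with nonblocking assignments, both right-hand sides are read at the clock edge before either register changes, so the two registers really swap on every clock.

```verilog
// Hypothetical example: two registers that swap contents every clock.
module swap_regs (
    input  wire       clk,
    input  wire       load,
    input  wire [7:0] x_init, y_init,
    output reg  [7:0] x, y
);
    always @(posedge clk) begin
        if (load) begin
            x <= x_init;
            y <= y_init;
        end else begin
            x <= y;   // uses y's value from before this edge
            y <= x;   // uses x's value from before this edge
        end
    end
endmodule
```

In hardware this is just two registers whose inputs are cross-connected; there is no "first" and "second" statement.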

Verilog logic templates

Verilog Reference

Verilog Style
-- Nonblocking Assignments in Verilog Synthesis, Coding Styles That Kill!
-- Avoid inferred latches
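A common way to get an inferred latch is a combinational always block that does not assign its output on every path. A minimal sketch of the problem and one usual fix (a default assignment at the top of the block); both module names are hypothetical.

```verilog
// Latch-prone: when en == 0, y is not assigned, so y must remember
// its old value, and synthesis infers a latch.
module latchy (input wire en, input wire [3:0] d, output reg [3:0] y);
    always @(*)
        if (en) y = d;
endmodule

// Fixed: the default assignment covers every path, so this is pure
// combinational logic (one AND gate per bit).
module no_latch (input wire en, input wire [3:0] d, output reg [3:0] y);
    always @(*) begin
        y = 4'b0000;      // default value on every path
        if (en) y = d;
    end
endmodule
```

A `case` statement without a `default` branch causes the same problem when not every case assigns the output.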

Verilog simulation/testbench
-- Local ModelSim example
-- Testbench Primer
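For orientation, here is a minimal self-checking testbench sketch for a 2:1 multiplexer written with the ternary operator. The names `mux2` and `tb_mux2` are hypothetical. The testbench has no ports, drives the inputs with `#` delays, compares the output against the expected value with `!==` (so an `x` output also counts as a failure), and prints a summary.

```verilog
// Hypothetical 2:1 multiplexer under test.
module mux2 (input wire a, b, sel, output wire c);
    assign c = sel ? a : b;
endmodule

// Minimal self-checking testbench (simulation only, not synthesizable).
module tb_mux2;
    reg  a, b, sel;
    wire c;
    integer errors = 0;
    mux2 dut (.a(a), .b(b), .sel(sel), .c(c));

    initial begin
        a = 1'b0; b = 1'b1; sel = 1'b1; #10;
        if (c !== a) begin errors = errors + 1; $display("sel=1: c=%b", c); end
        sel = 1'b0; #10;
        if (c !== b) begin errors = errors + 1; $display("sel=0: c=%b", c); end
        if (errors == 0) $display("tb_mux2: all tests passed");
        $finish;
    end
endmodule
```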

Advanced Synthesis cookbook


Verilog Examples

Making a multiplexer (combinatorial; notice the blocking assignment)
-- combinatorial block

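reg c;   // assigned in an always block, so declared as reg; still combinational
always @(a or b or sel)   // in Verilog-2001, always @(*) builds this list for you
   if (sel == 1'b1)
      c = a;
   else
      c = b;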

-- Ternary operator
wire c = sel ? a : b; 

Making a state machine (clocked; notice the nonblocking assignment)
-- direct digital synthesis (see also the full code)

module DDS (clock, reset, increment, phase, sine_out);
input clock, reset;
input [31:0] increment;   // phase step added every clock
input [7:0]  phase;       // unused in this excerpt
output wire signed [15:0] sine_out;
reg [31:0] accumulator;

always @(posedge clock) begin
    if (reset) accumulator <= 0;
    // increment phase accumulator
    else accumulator <= accumulator + increment;
end

// link the accumulator to the sine lookup table:
// the top 8 bits of the accumulator address the 256-entry ROM
sync_rom sineTable(clock, accumulator[31:24], sine_out);

endmodule

-- and the ROM table (see page 12-28 in the HDL style guide)
module sync_rom (clock, address, sine);
input clock;
input [7:0] address;
output [15:0] sine;
reg signed [15:0] sine;
always @(posedge clock)
begin
    // entry = 16384*sin(2*pi*address/256), truncated to an integer
    case (address)
        8'h00: sine <= 16'h0000;
        8'h01: sine <= 16'h0192;
        8'h02: sine <= 16'h0323;
        8'h03: sine <= 16'h04b5;
        8'h04: sine <= 16'h0645;
        ...
        8'hfe: sine <= 16'hfcdd;
        8'hff: sine <= 16'hfe6e;
    endcase
end
endmodule
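The output frequency of the DDS module follows from the accumulator arithmetic. As a sketch, assuming the 32-bit accumulator is clocked at f_clk: it wraps around once every 2^32/increment clocks, and each wrap sweeps accumulator[31:24] through all 256 ROM entries, which is one sine period. So

```latex
f_{\text{out}} = \frac{\text{increment} \cdot f_{\text{clk}}}{2^{32}}
\qquad\Longleftrightarrow\qquad
\text{increment} = \frac{f_{\text{out}} \cdot 2^{32}}{f_{\text{clk}}}

% Example, assuming the DDS runs directly on the 50 MHz board clock, target 1 kHz:
\text{increment} = \frac{10^{3} \cdot 2^{32}}{50 \times 10^{6}} \approx 85{,}899.35
\;\rightarrow\; 85{,}899
```

Rounding to 85,899 gives about 999.996 Hz. The frequency step size is f_clk/2^32, roughly 0.0116 Hz at 50 MHz. The registered ROM adds one clock of delay but does not change the frequency.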

Making dual-port memory (clocked), e.g. a VGA frame buffer.
This module, with one write port and one read port on a single clock, infers M9k memory in Quartus Prime when targeting Cyclone4. See also RAM synthesis attributes, RAM-style determination, and page 12-15 of the HDL style guide (Chapter 12, search for: Recommended HDL Coding Styles).
If the memory is too large for one M9k block, Quartus links several blocks together.

module single_clk_ram( 
    output reg [7:0] q,
    input [7:0] d,
    input [6:0] write_address, read_address,
    input we, clk
);
    reg [7:0] mem [127:0];
    always @ (posedge clk) begin
        if (we)
            mem[write_address] <= d;
        q <= mem[read_address]; // registered read: if read_address == write_address, q gets the old data, not d, in this clock cycle
    end
endmodule 
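As a sketch of how this pattern becomes a frame buffer, here is a hypothetical 80 x 60 pixel, 8-bit-per-pixel version (the module name `frame_buffer` and the resolution are illustrative assumptions, not the lab's requirements). Pixels are stored row by row, so the address is y * WIDTH + x. The read stays registered, as in `single_clk_ram`, so it remains eligible for M9k inference. 80 * 60 = 4,800 words, which needs a 13-bit address.

```verilog
// Hypothetical frame buffer: 80 x 60 pixels, 8 bits per pixel, row-major.
// The x/y port widths below are sized for this resolution only.
module frame_buffer (
    input  wire       clk,
    input  wire       we,
    input  wire [6:0] wx, rx,    // x coordinate, 0..79
    input  wire [5:0] wy, ry,    // y coordinate, 0..59
    input  wire [7:0] d,
    output reg  [7:0] q
);
    localparam WIDTH  = 80;
    localparam HEIGHT = 60;

    reg [7:0] mem [0:WIDTH*HEIGHT-1];

    // Row-major address: y * WIDTH + x (max 59*80 + 79 = 4799)
    wire [12:0] waddr = wy * WIDTH + wx;
    wire [12:0] raddr = ry * WIDTH + rx;

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= d;
        q <= mem[raddr];        // read data appears one clock after raddr
    end
endmodule
```

A VGA controller reading from this buffer has to request each pixel one clock before it is displayed, to cover the registered read.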

Some resources from ece5760
(ece5760 uses Cyclone5, but most of the material also applies to Cyclone4).

Verilog Summary

Quartus Compile Process and SignalTap

Using ModelSim

Student projects from ece5760