Open Access
Scott, Jesse
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
Committee Members:
  • Vijaykrishnan Narayanan, Thesis Advisor
Keywords:
  • Weavesort
  • HDL
  • systolic
  • Batcher
  • insertion
  • median
The two-dimensional spatial median filter is a core algorithm for impulse noise removal in digital image processing and computer vision. While the literature presents several analyses of median filters optimized for a standard 3x3 pixel neighborhood, the 5x5 neighborhood, which is useful for imagery whose noise does not conform to the classic “salt and pepper” pattern, has received little analysis. Research on hardware implementations of median filters has focused primarily on low latency and high throughput. A system is in development that uses intensified visible to near-infrared sensors and requires a 5x5 median filter to handle intensifier noise. Because the system is a battery-powered unit, optimal power usage is a critical requirement in addition to low latency and high throughput. However, optimal power usage for median filtering has received little attention in the literature. This research investigates five selected hardware implementations of a 5x5 median filter and compares them on the basis of power dissipation. The latency, maximum clock rates, and resource utilization of some of these implementations are also analyzed. The designs considered are a radix sort-based elimination algorithm, a systolic sorting array, a Batcher sorting network, and two insertion sorting networks. The two insertion sort networks have nearly identical sorting cores, but one uses a fundamentally different wrapper and is referred to as the row insertion sort network. Also included in the analysis is a commercial off-the-shelf (COTS) intellectual property (IP) core. The five custom filters were designed to be fully pipelined, accepting an input and generating a median output value on every pixel clock pulse with a constant latency. All five custom designs are integrated with wrapper functions that provide buffering to handle the kernel-based behavior of two-dimensional filtering.
Initially, a merge sort implementation was also considered, but it was eliminated almost immediately because its power and resource usage were an order of magnitude greater than those of all other designs under review.
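For readers unfamiliar with the operation being implemented, the following is a minimal software reference model of a 5x5 median filter. It is only a sketch of the filter's behavior and not any of the hardware designs studied in the thesis. The function name `median_filter_5x5`, the list-of-lists image format, and the choice to skip border pixels are assumptions made for illustration.

```python
def median_filter_5x5(image):
    """Return the 5x5 median for every pixel whose window fits in the image.

    `image` is a list of equal-length rows of integer pixel values.
    Border pixels (two rows/columns on each side) are skipped, which is an
    assumption of this sketch, so the output is 4 rows and 4 columns smaller.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(2, rows - 2):
        out_row = []
        for c in range(2, cols - 2):
            # Gather the 25 pixels of the 5x5 neighborhood centered at (r, c).
            window = [image[r + dr][c + dc]
                      for dr in range(-2, 3)
                      for dc in range(-2, 3)]
            window.sort()
            # The median of 25 values is the 13th smallest (index 12).
            out_row.append(window[12])
        out.append(out_row)
    return out


# Hypothetical 7x7 frame: uniform background with a single impulse.
frame = [[10] * 7 for _ in range(7)]
frame[3][3] = 255
filtered = median_filter_5x5(frame)
```

In this example every 5x5 window contains at most one impulse value, so each output pixel in `filtered` equals the background value 10. A hardware design has to produce the same result for every pixel clock, which is why the choice of sorting structure affects power and resource usage.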