PROGRAM ALLOCATION AND IMPLEMENTATION OF CACHE IN A DATAFLOW ENVIRONMENT

Open Access
Author:
Lim, Joford T.
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
October 15, 2001
Committee Members:
  • Ali R Hurson, Committee Chair
  • Mary Jane Irwin, Committee Member
  • John Metzner, Committee Member
  • Ali Borhan, Committee Member
  • Octavia I Camps, Committee Member
Keywords:
  • dataflow
  • program allocation
  • loop scheduling
  • computer architecture
  • cache
Abstract:
The success of multithreaded systems depends on how quickly context switching between threads can be achieved. Fast context switching is possible only if threads are resident in fast but small memories (such as instruction buffers, caches, and registers). This, however, limits the number of active threads and thus the amount of latency that can be tolerated. The generality of dataflow scheduling makes it difficult to fetch and execute a sequence of logically related sets of threads through the processor pipeline, thereby removing any opportunity to use registers across thread boundaries. Relegating the responsibilities of scheduling and storage management to the compiler alleviates this problem to some extent. In conventional architectures, the reduction in memory latencies is achieved by providing (explicit) programmable registers and (implicit) high-speed caches. Amalgamating the idea of caches or register-caches within the dataflow framework can result in a higher exploitation of parallelism and better hardware utilization. This thesis investigates the suitability of cache memory in a dataflow paradigm. We present two heuristic schemes that allow the detection, exploitation, and enhancement of temporal and spatial localities in a dataflow graph (dataflow program). Dataflow graphs are partitioned into subgraphs while preserving localities, and subgraphs are distributed among the processors in order to reduce cache misses and communication cost. The Staggered Distribution scheme addresses the detection and allocation of dynamic parallelism in a program. The Vertically Layered (VL) allocation scheme, on the other hand, can effectively detect spatial localities in a dataflow graph by assigning serially connected nodes to a partition (a vertical layer).
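To make the VL idea concrete, here is a minimal Python sketch of one way such a partitioner could work. It is not the dissertation's algorithm; the greedy longest-path heuristic, the function name, and the graph representation are all illustrative assumptions. The sketch repeatedly peels the longest remaining chain of serially connected nodes off a dataflow DAG and makes it a vertical layer, so that nodes likely to exhibit spatial locality end up in the same partition.

```python
from collections import defaultdict

def vertically_layered_partition(nodes, edges):
    """Illustrative sketch (not the thesis algorithm): partition a
    dataflow DAG into vertical layers, where each layer is a chain of
    serially connected nodes. Greedy heuristic: repeatedly peel off
    the longest path among the nodes not yet assigned to a layer."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    remaining = set(nodes)
    layers = []
    while remaining:
        memo = {}  # longest remaining path starting at each node

        def longest_from(n):
            if n in memo:
                return memo[n]
            best_ext = []
            for m in succ[n]:
                if m in remaining:
                    cand = longest_from(m)
                    if len(cand) > len(best_ext):
                        best_ext = cand
            memo[n] = [n] + best_ext
            return memo[n]

        # the longest unassigned chain becomes the next vertical layer
        layer = max((longest_from(n) for n in remaining), key=len)
        layers.append(layer)
        remaining -= set(layer)
    return layers

# Example: 1->2->3 and 1->4->5; the longest serial chain forms layer 0,
# the leftover chain forms layer 1.
print(vertically_layered_partition([1, 2, 3, 4, 5],
                                   [(1, 2), (2, 3), (1, 4), (4, 5)]))
```

Each layer can then be mapped to one processor, so consecutive instructions of a serial chain hit the same cache.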
As part of this thesis, we intend to incorporate and utilize these two schemes for detecting and enhancing localities in a dataflow graph, which is essential in optimizing the effectiveness of cache in multithreaded dataflow architectures. In addition, since the Staggered scheme produces an unbalanced load among processors, an extension to the Staggered scheme that produces a more balanced distribution of iterations among processors will also be developed.
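One way to picture a load-balanced staggered distribution is the following Python sketch. The function name, the fixed chunk size, and the round-robin wrap-around are assumptions for illustration only, not the extension developed in the thesis: chunks of loop iterations are handed to successive processors (so each processor's work is staggered after its predecessor's), and the assignment wraps around so the iteration counts come out nearly equal.

```python
def staggered_balanced(num_iters, num_procs, chunk=1):
    """Illustrative sketch (not the thesis scheme): distribute loop
    iterations in staggered chunks, wrapping round-robin across
    processors so that total load per processor stays nearly equal."""
    assignment = [[] for _ in range(num_procs)]
    it = 0   # next unassigned iteration
    p = 0    # processor receiving the next chunk
    while it < num_iters:
        hi = min(it + chunk, num_iters)
        assignment[p].extend(range(it, hi))  # hand one chunk to processor p
        it = hi
        p = (p + 1) % num_procs              # stagger onto the next processor
    return assignment

# Example: 10 iterations, 3 processors, chunks of 2.
print(staggered_balanced(10, 3, 2))
```

With 10 iterations on 3 processors and chunks of 2, no processor receives more than one chunk beyond any other, unlike a purely staggered assignment in which early processors can accumulate most of the work.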