Open Access
Pezeshk, Aria
Graduate Program:
Electrical Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
May 25, 2011
Committee Members:
  • Richard Laurence Tutwiler, Dissertation Advisor
  • Richard Laurence Tutwiler, Committee Chair
  • William Kenneth Jenkins, Committee Chair
  • William Evan Higgins, Committee Member
  • Robert Collins, Committee Member
Keywords:
  • image processing
  • document recognition
  • mathematical morphology
  • text recognition
Topographic maps are one of the best and most abundant sources of geographic information. In most countries these maps are prepared by dedicated national organizations and are therefore available for most or even the entirety of a country's landmass. The information content of topographic maps is shown either in the form of graphics (e.g. contour lines, roads, buildings, and bodies of water and vegetation), or text (e.g. street labels, place names, and elevation data). While most new maps are generated using computer programs, the majority of existing topographic maps are only available as printed copies. The printing process combines the graphical and textual features into a single 2D layer and forms a complex mixture of heavily intersecting features in which the individual information layers are no longer readily accessible. Development of automatic feature extraction algorithms for map understanding systems has therefore been of interest for a long time. Map interpretation systems can be utilized in a variety of applications. The extracted road and contour lines can be used in Geographic Information Systems (GIS) to update an existing map. Since topographic maps are accurately geo-referenced, these layers can also be used in conflation with satellite imagery to obtain a single integrated map. Additionally, the extracted text can be used to classify or search for maps that contain the name of a particular street, area, or city in a database of map images, or to add such information to satellite imagery. Furthermore, the elevation data can be used in conjunction with the extracted contour lines to create 3D digital elevation models. In this dissertation we focus on the development of a system for automatic extraction of various graphical features and recognition of the text content of scanned topographic maps.
The input to this system is the image of a scanned topographic map, and the output consists of each of the various graphical features extracted as separate layers and the recognized text. This system has been designed such that heuristics are not used, user interaction is limited to a supervisory capacity, and the need for prior knowledge about the map images is minimal. Furthermore, we have tested our system extensively on the difficult class of United States Geological Survey (USGS) topographic maps, which contain dense and regularly overlapping features and use the same color for linear features, text, and buildings. The challenges encountered in the separation of the text and graphics in maps can be divided into two categories. The first group of challenges results from the scanning process, and consists of artifacts caused by blurring, aliasing, and mixing of colors of adjacent pixels. Topographic maps mainly consist of very thin linear features such as contour lines and road lines that are easily affected by the blending of colors across the pixels of intersecting features and erosion of the edges. Hence, unlike in other printed documents, the impact of the scanning artifacts cannot be overlooked here. The main challenge in separating the graphical and textual elements, however, emanates from the heavy intersection and overlapping of features, which increase the likelihood of classification errors. Graphical features are in general more tolerant of extraction errors. On the other hand, the text content of maps is far more sensitive to defects, since any quality degradation has a negative impact on the recognition rate of the extracted text. Misclassification of text as graphics results in partial or complete loss of character segments, while graphics segments that are not properly detected produce hard-to-recognize and/or conjoined characters.
In addition to the difficulties associated with the large amount of noise and defects, recognition of the text content of maps is further complicated because many of the extracted words have arbitrary orientation and/or are curvilinear, and thus need to be properly processed before being sent for recognition. Our proposed map understanding system has been designed to overcome each of the aforementioned challenges. This system consists of two main stages: text and graphics separation, followed by text recognition. The contour lines are first extracted using a novel false color technique that aims to decrease the intra-class variance of the colors of contour pixels. The main contribution of our text/graphics separation unit, however, is our linear feature extraction algorithm, which uses a new line representation method based on directional morphological filtering to extract features with arbitrary orientation and curvature such as roads and boundary lines, even when they are intersecting with the text. Once the linear features are removed, we use a series of algorithms to remove the remaining non-character objects, group the characters into their respective strings, and reorient the text to the horizontal direction. Commercial Optical Character Recognition (OCR) systems are primarily designed for noise-free office documents, and their performance deteriorates significantly in the presence of less-than-ideal conditions. We have therefore developed a custom multi-font segmentation-free OCR that combines the outputs of two sets of Hidden Markov Models (HMMs) with bigram and character width probabilities to recognize the text. A specially designed defect model that closely mimics the artifacts encountered in text extracted from maps is used to artificially generate the training sets required for each character.
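As a rough illustration of the general idea behind directional morphological filtering (not the line representation method developed in the dissertation, whose details are not given here), the sketch below opens a binary image with short line-shaped structuring elements at several orientations and keeps the union of the results. Long thin strokes survive at least one orientation, while compact blobs that cannot contain a full line segment are removed. The function names, the choice of four angles, and the segment length are all assumptions made for this example.

```python
import math

def line_offsets(length, angle_deg):
    """(row, col) offsets of a centred digital line segment; `length` should be odd."""
    half = length // 2
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    scale = max(abs(dx), abs(dy))  # step one pixel along the dominant axis
    # Image rows grow downward, so the row offset is the negated y component.
    return [(-round(k * dy / scale), round(k * dx / scale))
            for k in range(-half, half + 1)]

def erode(img, offsets):
    """Keep a pixel only if the whole segment centred on it lies on foreground."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols and img[r + dr][c + dc]
                for dr, dc in offsets))
    return out

def dilate(img, offsets):
    """Stamp the segment onto the output at every foreground pixel."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for dr, dc in offsets:
                    if 0 <= r + dr < rows and 0 <= c + dc < cols:
                        out[r + dr][c + dc] = 1
    return out

def directional_opening(img, length=5, angles=(0, 45, 90, 135)):
    """Union of morphological openings with line segments at each angle."""
    rows, cols = len(img), len(img[0])
    result = [[0] * cols for _ in range(rows)]
    for angle in angles:
        offsets = line_offsets(length, angle)
        opened = dilate(erode(img, offsets), offsets)
        for r in range(rows):
            for c in range(cols):
                result[r][c] |= opened[r][c]
    return result
```

A finer set of angles and longer segments would be needed to follow curved roads on real scans, and strokes of text that are themselves long and straight would also pass this simple filter.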
Another novel aspect of the recognition engine is a preprocessing algorithm that uses RANSAC (random sample consensus) to automatically eliminate some of the artifacts attached to the characters, and/or to properly normalize every extracted word image in order to improve the recognition rate.
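The abstract does not say which model RANSAC is fitted to, so the following is only a generic sketch of the technique under an assumed use case: fitting a baseline through the bottom points of characters in a roughly horizontal word image, so that points belonging to attached artifacts fall out as outliers. The function names, the tolerance, and the iteration count are hypothetical choices for the example.

```python
import math
import random

def least_squares_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points are vertically aligned; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ransac_line(points, n_iters=200, tol=1.5, seed=0):
    """Fit a line robustly: sample point pairs, keep the largest consensus set."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples in the form a*x + b*y + c = 0.
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # the two samples coincide
        c = -(a * x1 + b * y1)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Refine the estimate using every point in the consensus set.
    slope, intercept = least_squares_line(best_inliers)
    return slope, intercept, best_inliers
```

Under this assumption, `math.degrees(math.atan(slope))` gives a residual skew angle that could be used to normalize the word image, and the points rejected as outliers mark candidate artifacts.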