geology Q&As on videos/ articles. Week 3 article (pdf) and questions on (word doc) attached Week 5 article (pdf) and questions (word doc) attached We

geology
Q&As on videos/ articles.
Week 3 article (pdf) and questions on (word doc) attached
Week 5 article (pdf) and questions (word doc) attached
Week 4 video link https://youtu.be/CTw4cfZyTMw and questions (word doc) are attached
Week 5 video link https://youtu.be/1zP9cMU6AvE and questions (word doc) are attached
PLEASE PAY ATTENTION TO THE TITLES OF THE ATTACHMENTS!!!

ENV151L Week 3 Journal Article Response Page 1 of 1

Don't use plagiarized sources. Get Your Custom Assignment on
geology Q&As on videos/ articles. Week 3 article (pdf) and questions on (word doc) attached Week 5 article (pdf) and questions (word doc) attached We
From as Little as $13/Page

Your Name:

ENV151L

Week 3 Reflection

Instructions: Read the Week 3 Article posted on Blackboard.

Write your responses below in the green boxes.

NOTE: The green boxes will expand when you type to the end of them.

1. Purpose of the Research. What was the article about? This is a brief responseno more than 5 sentences total

2. Methods. Which methods were used in the article?

3. What are the major finding of the article?

4. Explain in no more than 3 sentences how the information in the article points to the designing work of a Creator.

Now Submit this completed Document to Blackboard to be graded. ENV151L Week 4 Video Response Page 1 of 1

Your Name:

ENV151L

Week 4 Reflection

Instructions: Read the Week 4 Video posted on Blackboard.

Write your responses below in the green boxes.

NOTE: The green boxes will expand when you type to the end of them.

1. Purpose of the Research. What was the video about? This is a brief responseno more than 5 sentences total

2. Methods. Which methods were used in the video?

3. What are the major finding of the video?

4. Explain in no more than 3 sentences how the information in the video points to the designing work of a Creator.

Now Submit this completed Document to Blackboard to be graded. ENV151L Week 5 Journal Article Response Page 1 of 1

Your Name:

ENV151L

Week 5 Reflection

Instructions: Read the Week 5 Article posted on Blackboard.

Write your responses below in the green boxes.

NOTE: The green boxes will expand when you type to the end of them.

1. Purpose of the Research. What was the article about? This is a brief responseno more than 5 sentences total

2. Methods. Which methods were used in the article?

3. What are the major finding of the article?

4. Explain in no more than 3 sentences how the information in the article points to the designing work of a Creator.

Now Submit this completed Document to Blackboard to be graded. ENV151L Week 6 Video Response Page 1 of 1

Your Name:

ENV151L

Week 6 Reflection

Instructions: Read the Week 6 Video posted on Blackboard.

Write your responses below in the green boxes.

NOTE: The green boxes will expand when you type to the end of them.

1. Purpose of the Research. What was the video about? This is a brief responseno more than 5 sentences total

2. Methods. Which methods were used in the video?

3. What are the major finding of the video?

4. Explain in no more than 3 sentences how the information in the video points to the designing work of a Creator.

Now Submit this completed Document to Blackboard to be graded. 400 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 10, NO. 2, FEBRUARY 2017

The Reconnection of Contour Lines from Scanned
Color Images of Topographical Maps Based

on GPU Implementation
Jianfeng Song, Panfeng Wang, Qiguang Miao, Ruyi Liu, and Bormin Huang

AbstractThis paper presents a method for the reconnection of
contour lines from scanned color images of topographical maps
based on graphics processing unit (GPU) implementation. The ex-
traction of contour lines, which are shown with brown color on
USGS maps, is a difficult process due to aliasing and false colors
induced by the scanning process and due to closely spaced and
intersecting/overlapping features inherent to the map. First, an
effective method is presented for contour line reconnection from
scanned topographical maps based on CPU. This method consid-
ers both the distance and direction between the two broken points
of the contour lines. It gets better performance and has high con-
nection rate, but the time complexity of the algorithm is nonlinear
with the increasing size of topographical map. Second, the advan-
tage of the massively parallel computing capability of GPU with
the compute unified device architecture is taken to improve the
algorithm. Finally, a better performance has been achieved based
on the open source computer vision library. The experimental re-
sults show that the GPU implementation with loop-based patterns
achieves a speedup of 1360 and the identical result compared
with the implementation on CPU.

Index TermsCompute unified device architecture (CUDA),
contour lines, graphics processing unit (GPU), open source
computer vision (OpenCV).

I. INTRODUCTION

G RAPHICS processing unit (GPU) is designed for displayon the computer graphics rendering. It has been widely
used in scientific computing and image processing. It is
especially suitable for the computation of parallel data. The
popular distributed GPU computing model has promoted the

Manuscript received January 12, 2016; revised May 23, 2016; accepted June
9, 2016. Date of publication June 29, 2016; date of current version January 23,
2017. The work was supported in part by the National Natural Science Foun-
dations of China under Grant 61472302, Grant 61272280, Grant 41271447,
and Grant U1404620, in part by the Program for New Century Excellent Tal-
ents in University under Grant NCET-12-0919, in part by the Fundamental
Research Funds for the Central Universities under Grant K5051203020, Grant
K5051303018, Grant JB150313, and Grant BDY081422, in part by the Natural
Science Foundation of Shaanxi Province under Grant 2014JM8310 and Grant
2010JM8027, in part by the Creative Project of the Science and Technology
State of Xian under Grant CXY1441(1), and in part by the State Key Lab-
oratory of Geo-information Engineering under Grant SKLGIE2014-M-4-4-4.
(Corresponding author: Qiguang Miao.)

The authors are with the School of Computer and Technology, Xidian Univer-
sity, Xian 710071, China (e-mail: [emailprotected]; [emailprotected]
163.com; [emailprotected]; [emailprotected]; [emailprotected]
gmail.com).

Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2016.2580903

development of the top supercomputers [1]. Only one thread
can run on one core of CPU at a moment. GPU uses the
lightweight thread that is managed by the hardware which can
easily switch with low cost. In the hardware limit, the more
the compute unified device architecture (CUDA) threads, the
better computing performance you will get. GPU has hundreds
of parallel processing units and memory bandwidth which are
several times than a computer. GPU has far exceeded CPU
in terms of computing capacity. Parallel computing of GPU
has become the key technology of accelerating the numerical
simulation of science and engineering application.

A contour line of a function of two variables is a curve along
which the function has a constant value. Contour lines are the
main graphical element to characterize three-dimensional (3-D)
terrain on 2-D maps. Color image segmentation is a general
method used to separate those desirable features (contour lines)
from undesirable features (other information on the maps, like
stream, character, etc.). But in the pretreatment of topographical
map color segmentation [2][7] including background removal
[8], [9] and denoising, the situation such as color confusion and
overlapping can cause contour gaps. The contour gaps are due
to three causes. First, scanned color topographical map infor-
mation is not complete and contour line segments are lost with
a great randomness. Second, contour line segments are covered
by other color information like black text and kilometer grid.
Third, there are some supplementary contour lines on the to-
pographical map which are drawn with short dashed lines. The
existing contour line connection algorithms belong to the fol-
lowing categories. Xueliang et al. [10] use a maximum clique
graph searching method to reconnect contour lines. The algo-
rithm is a graph theoretic searching method and the error rate of
this method is relatively low. The method has automatic error
correction capability, but it needs artificial participation and is
time consuming. Ping [11] uses a Feynman code to reconnect
contour lines. This method calculates the code of extended line
based on pixels after the broken points of the contour lines.
The intersection of two pixels extension part on the broken
line achieves the connection. This method performs better on
scanned color images of topographical maps that have thicker
features which spaced far apart from each other. However, their
algorithm is ineffective since the error rate may rise in a situation
that contour lines are intensive. Yi [12] develops a method to
reconnect contour lines based on prior knowledge. This method
takes different ways to search and match broken points of the

1939-1404 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

SONG et al.: THE RECONNECTION OF CONTOUR LINES FROM SCANNED COLOR IMAGES OF TOPOGRAPHICAL MAPS 401

Fig. 1. Preprocessing results of contour line reconnection. (a) Resulting image
with background of topographic map removed. (b) Segmented and denoised
resulting image. (c) Resulting image with binarization. (d) Thinned and pruned
resulting image.

contour lines according to the specific circumstances such as
gulch, escarpment, and text label. It avoids the blindness of con-
ventional methods, but it is very difficult to integrate all kinds
of algorithms for connecting those broken lines. Another new
approach combined with the minimum distance and direction
difference is proposed by Li [13]. This method considers both
the distance and direction between two broken points of the con-
tour lines. It gets better performance and has high connection
rate, but the computation cost is high and could not meet the
requirements of practical applications. Our prior method could
not solve the performance requirements especially time perfor-
mance in the practical engineering. We improve the algorithm
combined with the minimum distance and direction difference
and solve the time performance of reconnection of contour lines
with NVIDIA GPU. Whats more, the GPU implementation can
be used in practical engineering application. The rest of the paper
is organized as follows. Section II describes the algorithm design
of contour connection. Section III explains the GPUs implemen-
tation of contour connection. Section IV concludes the paper.

II. ALGORITHM DESIGN OF CONTOUR CONNECTION

Color image segmentation is a general method used to sep-
arate those desirable features (contour lines) from undesirable
features (other information on the maps, like stream, character,
etc.). The background of topographical map influences the ex-
traction process because the color of contour lines is changed in
contrast to their background. First, we remove the background of
topographical map with linear feature separation method [14].
Second, we use GustafsonKessel clustering algorithm to ex-
tract contour lines. Third, the preprocessing in extraction of
contour lines is a significant step after extracting the brown con-
tour lines from scanned topographical map. The preprocessing
of the brown contour lines requires several steps which includes
denoising [15][17], binarization [18][24], thinning [25] and
pruning. A simple example which illustrates the preprocessing
of contour line reconnection are in Fig. 1.

Fig. 8 shows eight neighborhoods pixels (P1 , P2 , P3 , P4 ,
P5 , P6 , P7 , P8 ) of central pixel P0 .

Set the foreground pixels as 0 and the background pixels as 1.

Fig. 2. Eight neighborhoods regions of central pixel P0 .

Fig. 3. Break contours in a thinned image.

P0 is a break point in a thinned image, if (1){
P0 = 0
P1 + P2 + P3 + P4 + P5 + P6 + P7 + P8 = 6.

(1)

P0 is a contour intersection in a thinned image, if (2){
P0 = 0
P1 + P2 + P3 + P4 + P5 + P6 + P7 + P8 < 6. (2) The input of contour connection method is a thinned image. First, all break points in the map are found. Then, search for the other break point for each break point which has similar direction and the shortest distance. Finally, connection function is used to connect these two break points. In the topographical maps, most of the pairs of break points are in the opposite or parabola direction. As shown in Fig. 3, color lines are break contours. Red break contours are in parabola direction and blue break contours are in the opposite direction. The position of the break point can be obtained by searching the thinned image. The match condition between the two break points is as follows. A. Direction Condition Fig. 4 shows eight directions of the central break points which coordinates are (i, j). The opposite direction includes d0 and d4 , d2 , and d6 . The parabola directions include d0 and d2 , d2 and d4 , d4 and d6 , d6 and d0 . Any break points direction is obtained by backtracking, and the break point is added to the break point set when its direction is appropriate to the central break point. B. Distance Condition and Angle Condition Distance condition means the limitation of Euclidean distance between two break points. The distance is calculated as follows: Ds (xP a , yP a , xP b , yP b ) = (xP a xP b )2 + (yP a yP b )2 (3) 402 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 10, NO. 2, FEBRUARY 2017 Fig. 4. Eight directions of the central break point. where Pa , Pb are two different point, xP a is the x-coordinate of Pa , xP b is the x-coordinate of Pb , yP a is y-coordinate of Pa , yP b is y-coordinate of Pb .The distance condition is Ds < 1 , where 1 is a smaller constant. Angle condition means the limitation of curve tendency of the contour lines. The angle of two different points is defined as: (xP a , yP a , xP b , yP b ) = arctan( yP a yP b xP a xP b ) (4) where Pa , Pb are two different point, xP a is the x-coordinate of Pa , xP b is the x-coordinate of Pb , yP a is the y-coordinate of Pa , and yP b is the y-coordinate of Pb . The angle of contour line is (Pa , L) = average((Pa , Pi ))i = a + 1, a + 2, . . .a + k (5) where Pa is a break point of the contour line L, Pi is a point on the contour line, and there are ten points between Pi and Pi+ 1 . The value k of the factor can be determined empirically. If there are two contour lines L0 and L1 , then P0 is the break point of L0 and P1 is the break point of L1 . We calculate (P0 , L0 ), (P1 , L1 ), and (P0 , P1 ) where P1 is in the P0 s search window region. The angle condition is as follows: max{|(P0 , L0 ) (P0 , P1 )| , |(P1 , L1 ) (P0 , P1 )|}< 2 (6) where 2 is a smaller constant. In this paper, we combine the distance condition and an- gle condition and we finally calculate the function value (P0 , P1 , L0 , L1 ) as follows: (P0 , P1 , L0 , L1 ) = max{|(P0 , L0 ) (P0 , P1 )|, |(P1 , L1 ) (P0 , P1 )|} 1 + Ds (P0 , P1 ) 2 (7) where 1 is angle weight and 2 is distance weight, the two parameters can be determined empirically. (P1 , L1 ) = arg min{(P0 , Pi , L0 , Li )}. (8) Fig. 5. C code to determine an intersection point. Fig. 6. Process of contour connection. P1 is considered to be the optimal match to P0 and the two break points belong to the same contour line. The contour lines L0 ,L1 should be connected. C. Crossing Line Condition Contour lines never intersect each other and they form closed loops in topographical maps. The reconnection process will be canceled if there is an intersection point in the new line which the two break points constitute. Fig. 5 shows the C code which determines whether there is an intersection point. According to the above-mentioned analysis, the process of contour line reconnection can be expressed as the following steps and the process of contour connection in our study is depicted in Fig. 6. Step1: Set the initial value hstart, the window threshold hmax, the step size hadd, the angle weight 1 , and the distance weight 2 . Set the candidate points set S empty and track the curve to the first break point P0 . Step 2: In the region which has the size of h h and centered around P0 , search the other break points P1 , P2 , P3 , ..., Pn and add them to candidate point set S. Step 3: Compare the direction of P0 and every other points Pi in S. Delete the break points which are not in the opposite or SONG et al.: THE RECONNECTION OF CONTOUR LINES FROM SCANNED COLOR IMAGES OF TOPOGRAPHICAL MAPS 403 Fig. 7. CUDA threads hierarchy. parabola direction relative to P0 . Meanwhile, delete the break points which are in the same line with P0 . Step 4: Delete those break points which dissatisfy the distance and angle constraints from S. If S is empty, then jump to step 7. Step 5: Find the minimum of function (P0 , Pi ) and get the break point Pbest, and then connect P0 and Pbest. Step 6: If there is no cross point in this region, then the connection is done. Otherwise, cancel this connection and jump to step 7. Step 7: If h < hmax, then let h = h+hadd and jump to step 2. Otherwise, the iteration is over. The efficiency of the program is very low and it takes more than 1 h to reconnect 6568 8009 thinned image. The algorithm combined with the minimum distance and direction difference includes these characteristics. First, the image data are parallel. Parallel data are the property of application program, and calcu- lation operations can be safely performed according to certain data structures at the same time. Second, the problem can be partitioned into subproblems. We can use multithreaded pro- gram implementation to improve the performance. Third, the algorithm is one of the local connection methods and the loop operation of the whole image can be replaced by CUDA thread and block indices. III. GPU IMPLEMENTATION OF CONTOUR CONNECTION A. CUDA C Programming Method CUDA is a general solution of complete set of parallel com- puting launched by NVIDIA and designed for parallel comput- ing hardware and software architecture [26]. CUDA is widely used in matrix numerical computing, video and image process- ing, machine learning, etc. GPU could be regarded as an equip- ment which can carry out a large number of parallel threads when CUDA application is compiled [27]. Tens of thousands Fig. 8. Schematic visualization of a multiprocessor. of threads can be used to accelerate the CUDA application that can make the program execution performance for exponential improvement. CUDA C programming language is a simple ex- tension of the standard C programming language and it is very convenient for developers to access memory data with pointer operation in the kernel function. GPU applications can be devel- oped in a grammar that is quite close to the standard C program- ming language. The general process for CUDA C programming is accomplished by multistep process. Initially, the parallel part in the algorithm including the parallel data and process should be separated from the whole algorithm, and then the host mem- ory is copied to the device memory before the kernel function is called. A kernel function should point out thread blocks and the number of threads per block. Finally, the device memory data should be copied back to the host memory and GPU computing result is obtained. The CUDA code consists of three computa- tional phases: transmission of data into the global memory of the GPU, execution of the CUDA kernel, and transmission of results from the GPU into the memory of CPU. The CUDA thread hierarchy is showed in Fig. 7. CUDA takes a bottom-up point of view of parallelism in which a thread is an atomic unit of parallelism. Threads are organized into a three-level hierar- chy. The highest level is a grid, which consists of thread blocks. A grid is a 3-D array of thread blocks. Thread blocks implement coarse-grained scalable data parallelism and they are executed 404 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 10, NO. 2, FEBRUARY 2017 independently, which allows them to be scheduled in any order across any number of cores. This allows the CUDA code to scale with the number of processors. A kernel can be executed by multiple equally shaped thread blocks, so that the total num- ber of threads is equal to the number of threads per block times the number of blocks. The number of thread blocks in a grid is usually dictated by the size of the data being processed or the number of processors in the system, which it can greatly exceed. B. GPU Implementation Based on Divide-and-Conquer Patterns The massive parallelism on the GPU hardware is achieved by a plurality of the same stream multiprocessors. Each stream mul- tiprocessor independently schedules its internal resources and execution units to deal with assigned thread. NVIDIA GeForce GTX 750 GPU has four stream multiprocessors. Fig. 8 presents a schematic visualization of a multiprocessor. Each multipro- cessor has 32 CUDA cores and it executes in parallel with the other multiprocessors. All the 32 cores in a group execute in a single instruction multiple data fashion with all cores in the same multiprocessor executing the same instruction at the same time. There are eight special function units (SFU) inside each multiprocessor for single-precision floating-point transcenden- tal functions. Each multiprocessor has eight load (LD)/store (ST) units for memory access. Each multiprocessor also has 32768 32-bit registers which can be accessed in a single clock cycle. GTX 750 has 1 GB of global memory, which has a higher bandwidth than the DRAM memory in the CPUs. In massively parallel computing, divide and conquer is the most common and most effective strategy. Programmer should partition the prob- lem into coarse subproblems that can be solved independently in parallel by blocks of threads, and each subproblem into finer pieces that can be solved cooperatively in parallel by all threads within the block. By giving each thread a unique global index in the ith and jth domain using thread and block indices. The result of this thread mapping is that multiple threads are carrying out the same computation on different image data in parallel. We use open source computer vision (OpenCV) code library to read image data from local disk and assign the appropriate size on the GPU device memory through CUDA memory manage- ment functions. Image data are copied from the host memory to the device memory and then kernel function is called to reconnect the contour line. The method and the constant pa- rameters to search and match break points remain the same as CPU method. After all the threads exit normally, CUDA mem- ory management functions should be called to copy the result image back to the host memory. The number of threads per block is 64. The size of the grid is determined according to the size of the image and the block size. For fair comparison, we ran both our GPU code and CPU code on our HP Z600 Work- station (Intel(R) Xeon(R) CPU E5520 @2.27 GHz) equipped with NVIDIA GTX 750 GPU where thread block size = 64, image block = 120 120, hmax = 50, hadd = 5, 1 = 0.35, 2 = 0.65 and no I/O transfer. As seen from the speedup results in Table I, our implementation on a GPU is faster than the other implementation because we make full use of parallel progra- mming approach. We achieves a speedup of 251.2 without TABLE I RUNNING TIME AND SPEEDUP OF CUDA C IMPLEMENTATION OF THE RECONNECTION OF CONTOUR LINES AS COMPARED TO ITS CPU COUNTERPART RUNNING ON ONE CPU CORE CPU runtime (s) GPU runtime (s) Speedup 4020 16 251.2 Fig. 9. Image A (1293 705). Fig. 10. Image B (1904 973). any program and algorithm optimization. We use connection rate to measure the accuracy of the algorithm results. The con- nection rate is defined as w = F/G (9) where F is the number of connected break points and G is the number of the total break points. Figs. 9 to 12 show the examples of image in different size. Fig. 13 demonstrates the connection rate of different images. As seen from Fig. 13, in general, the larger image block, the better connection results are obtained. The main reason is the break points between different blocks cannot be connected in parallel process. But there is an exception when the image block size is 90 90 and the image size is 1293 705 which is named Image A. It is the kilometer grid in the image that causes a large number of break point and the image block boundary is just near the kilometer grid which leads to unconnected break points. Another problem is that we cannot set a fixed block size to obtain the best result in different image. In a word, the result based on divide and conquer in parallel programming cannot always be the same as the CPU in serial programming. SONG et al.: THE RECONNECTION OF CONTOUR LINES FROM SCANNED COLOR IMAGES OF TOPOGRAPHICAL MAPS 405 Fig. 11. Image C (5000 3810). Fig. 12. Image D (6568 8009). Fig. 13. Connection rate of different images in different size. C. Further Improvements With Loop-Based Patterns If the number of thread that we schedule is equal to the number of pixels of in the image, computing performance is improved because more threads can make up the reading latency of a sin- gle thread in general. Loop-based patterns are used to expand the loop in the program. In order to improve the image process- ing result, GPU is used to search and record the matched break points. The cross judgments and reconnection process are exe- cuted on CPU. We separate the image writing operations from the kernel function. The loops for spatial grid points (i, j) are Fig. 14. Contrast of connection rate between divide-and-conquer pattern and loop-based patterns. Fig. 15. Contrast of speedup between divide-and-conquer pattern and loop-based patterns. replaced by index computations using thread and block indices: i = blockIdx.x blockDim.x + threadIdx.x j = blockIdx.y blockDim.y + threadIdx.y dim3 dimGrid((h + T I 1)/T I, (w + T I 1)/T I) dim3 dimBlock(T I, T I) where h and w are the height and width of the image, the T I is the dimension of thread block.One thread deals with one image pixel processing and part of the computing tasks are as- signed to the CPU. Fig. 15 gives the contrast of connection rate between divide-and-conquer pattern and loop-based pat- tern. The image block is 120 120. Loop-based patterns are useful. The result image is the same as the one on CPU pro- gramming. Large number of threads in parallel improve the computing performance as Fig. 14 shows. As Fig. 15 shows, the acceleration of the GPU is not very obvious when the image size is small. GPU makes great acceleration for larger images and achieves higher computation performance with loop-based pattern. CPU takes about 67 min, but the improved algorithm using GPU reduces the computation time to 13 s and achieves a speedup of 309.2. 406 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 10, NO. 2, FEBRUARY 2017 Fig. 16. Speedup of the CUDA C codes versus block size. D. Impact of Block Size on GPU-Based Contour Connection Efficiency of the CUDA C program can be further improved by adjusting the thread block size. The block size is a multiple of 32 threads (i.e., a warp), the computing performance is usually better than the neighboring block sizes since threads of 32 are grouped together and the multiprocessor sends the same instruc- tion to all the threads in a warp in order to make execution more efficient, which is one of the merits of GPU architecture. But it may cause excessive memory requirements which make the low efficiency of stream processors when the block size is big. The effect of the thread block size on the computing performance was evaluated by varying the block size whilst keeping the reg- isters per threads at 43. Fig. 16 shows the speedup of the CUDA C codes versus the block size. It was found that the block size of 64 could produce the best performance. This is what we used in previous sections. E. Impact of Registers/Thread on GPU-Based Contour Connection If we define possible variables as registers, the computation performance is greatly improved because global memory ac- cess for data transfer takes much more time than register access. There are thousands of registers in each multiprocessor stream. The kernel function calculates the number of registers per thread when the CUDA C program is compiled. If the number of reg- isters per thread is too large, the number of thread blocks that can be scheduled in each multiprocessor stream is limited. If the number of threads is too small, the GPU hardware resource can- not be fully used and the performance decreases. Fig. 17 shows that the best computation performance occurs at 43 registers per thread. F. Data Transfer Between CPU and GPU This experiment supports the general findings that data trans- fer takes up a lot of time and apparently is an important factor for the speedup. When I/O transfer is considered, the use of overlapped CUDA kernel execution with asynchronous mem- ory access for data transfer can be applied. Each NVidia Tesla Fig. 17. Speedup versus registers per thread. Fig. 18. Overlapping memory transfer with host-to-device memory transfer,
GPU kernel execution, and device-to-host memory transfer.

TABLE II
RUNTIME AND SPEEDUP FOR DATA TRANSFER BETWEEN CPU AND GPU

Total runtime Host to
device

Kernel
execution

Device to
host

CPU 4020 000 ms
GPU (with nonoverlapping) 11 764 ms 16 ms 11 716 ms 32 ms
GPU (with overlapping) 11 706 ms
Speedup (with nonoverlapping) 341.7
Speedup (with overlapping) 343.4

K40 has one copy engine and one kernel engine, allowing over-
lapped data transfer and kernel execution. We choose NVidia
Tesla K40 with 2880 cores as the GPU equipment. A diagram
explaining the computation timeline of the contour line recon-
nection of GPU implementation is depicted in Fig. 18. As com-
mands pipeline, streams, illustrated in different colors, execute
commands in a manner of first-in-first-out order. As a result, the
same stream would arrange the overlapping between CPU-to-
GPU and GPU-to-CPU memory transfers and kernel execution
on GPUs with two copy engines. Table II lists the GPU running
time using thread block size = 64, registers per thread = 43,
image size = 6568 8009, hmax = 50 , hadd = 5 , 1 = 0.35,

SONG et al.: THE RECONNECTION OF CONTOUR LINES FROM SCANNED COLOR IMAGES OF TOPOGRAPHICAL MAPS 407

TABLE III
EXPERIMENTS UNDER DIFFERENT GPU DEVICES

GPU devices CUDA cores Speedup

NVIDIA GEFORCE GTX 750 512 343.4
NVIDIA GEFORCE GTX 650Ti 768 510.8
NVIDIA GEFORCE GTX 760 1152 816.5
NVIDIA Tesla K40 2880 1360.9

Fig. 19. Resulting image of contour lines reconnection.

2 = 0.65. When overlapping access is taken into account,
the results of computing performance are listed at the sec-
ond column of Table II, where it shows that using overlapping
access can achieve better performance than nonoverlapping ac-
cess. Different GPU devices obtain different computation per-
formances and Table III lists the experiment results where thread
block size = 64, registers per thread = 43, image size = 6568
8009, hmax = 50, hadd = 5, 1 = 0.35, 2 = 0.65. As Table III
shows, we have achieved a speedup of 1360.9 and the same
connection rate compared with the implementation on CPU.
Fig. 19 shows the result of contour line reconnection in which
pink lines are reconnected lines.

IV. CONCLUSION

Contour lines are the main graphical element to characterize
3-D terrain on 2-D maps. But in the pretreatment of topographi-
cal map color segmentation including background removal and
denoising, the situation such as color confusion and overlapping
can cause contour gaps. Contour lines are always overlapped or
intersected with other information on the topographical maps. It
is difficult to clean up all other unexpected information while re-
taining the accurate contour data due to closely spaced and over-
lapped features inherent to the map. In this paper, we present a
method for the reconnection of contour lines from scanned color
images of topographical maps based on GPU implementation.
Th