Data Driven Pages and Python's Multiprocessing

Utilizing Data Driven Pages in arcpy is a great way to export a map series quickly. But what if it isn't fast enough?

Spokane County Assessor's Department maintains a land ownership map series of roughly 3,600 pages. Updating these maps takes far too long.

Platform                   Time to produce
ArcMap (w/ ArcObjects)     15+ hours
arcpy.mapping              9-12 hours
??                         ??

import arcpy

mxd = arcpy.mapping.MapDocument("C:\\Map.mxd")

#Create DDP object
ddp = mxd.dataDrivenPages

#Export a PDF for each page in the MXD (DDP page IDs start at 1)
for pageNum in range(1, ddp.pageCount + 1):

    #set current page
    ddp.currentPageID = pageNum

    #export! (pageNum is an int, so convert it before building the file name)
    ddp.exportToPDF("C:\\" + str(pageNum) + ".pdf", "CURRENT", "PDF_SINGLE_FILE")

del mxd

Basic DDP operation

Future note: in ArcGIS Pro, Data Driven Pages is now called Map Series, and arcpy.mapping is replaced by arcpy.mp.

With multiprocessing, we can speed this up!

Platform                   Time to produce
ArcMap (GUI)               15 hours
arcpy.mapping              9-12 hours
multiprocessing.Pool(11)   2 hours

What is Multiprocessing?

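multiprocessing is a package in Python's standard library that supports spawning processes. Each worker is a separate process with its own Python interpreter, so work such as map export can run on several CPU cores at once instead of taking turns under the Global Interpreter Lock.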
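A minimal, self-contained sketch of the idea (no arcpy involved): a pool of worker processes runs the same function on different inputs in parallel. The function names count_primes and run_in_pool are hypothetical stand-ins for real work. The sketch calls close() and join() explicitly rather than using "with Pool(...)", because the context-manager form needs Python 3.3+, while ArcMap's arcpy.mapping runs on Python 2.7.

```python
import multiprocessing


def count_primes(limit):
    # Deliberately CPU-bound stand-in for real work: count primes below `limit`
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(limits, workers=4):
    pool = multiprocessing.Pool(workers)
    try:
        # Each limit is sent to a worker process; results come back in input order
        return pool.map(count_primes, limits)
    finally:
        # No more tasks: let the workers exit, then wait for them
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard matters on Windows, where each worker re-imports this module
    print(run_in_pool([10000, 20000, 30000, 40000]))
```

The `if __name__ == "__main__":` guard is the same one the DDP script below relies on: without it, each spawned worker would try to start a pool of its own.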

A home construction analogy:

A job site has blueprints, a crew, and a boss.

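import multiprocessing
import arcpy

# Map document path (assumed to be the same MXD as in the basic example).
# Workers receive the path, not a MapDocument object, and each opens its own copy.
MXD = "C:\\Map.mxd"


def exportMaps(job):
    # Unpack one unit of work: the MXD path and the page IDs to export
    mxdPath, pageList = job
    # Data Driven Pages export
    mxd = arcpy.mapping.MapDocument(mxdPath)
    ddp = mxd.dataDrivenPages
    for pageNum in pageList:
        ddp.currentPageID = pageNum
        ddp.exportToPDF("C:\\" + str(pageNum) + ".pdf", "CURRENT", "PDF_SINGLE_FILE")
    del mxd


# Ten units of work (note: range() stops before its end value)
mapRanges = (
    [MXD, list(range(1, 355))],
    [MXD, list(range(355, 710))],
    [MXD, list(range(710, 1065))],
    [MXD, list(range(1065, 1420))],
    [MXD, list(range(1420, 1775))],
    [MXD, list(range(1775, 2130))],
    [MXD, list(range(2130, 2485))],
    [MXD, list(range(2485, 2840))],
    [MXD, list(range(2840, 3195))],
    [MXD, list(range(3195, 3587))]
)


def createmaps_handler():
    # Start 10 worker processes and hand each one unit of work
    p = multiprocessing.Pool(10)
    p.map(exportMaps, mapRanges)
    # No more work: let the workers exit, then wait for them
    p.close()
    p.join()


#Initiate script by running handler function
if __name__ == '__main__':
    createmaps_handler()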

Boss:
createmaps_handler gathers the crew (a pool of 10 worker processes) and assigns the work.

Crew:
mapRanges splits the work into roughly equal units. Each unit is fed to an individual worker process.

Blueprints:
exportMaps holds the work itself: exporting the maps for one unit.

Calling createmaps_handler spins up the processes.

BAM!!
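The hard-coded mapRanges can also be generated. A minimal sketch, assuming page IDs run from 1 to the page count; build_map_ranges is a hypothetical helper, and in a live script the count could come from ddp.pageCount:

```python
def build_map_ranges(mxd_path, page_count, n_chunks):
    # Split page IDs 1..page_count into n_chunks contiguous, near-equal lists
    size, extra = divmod(page_count, n_chunks)
    ranges = []
    start = 1
    for i in range(n_chunks):
        # The first `extra` chunks take one additional page each
        stop = start + size + (1 if i < extra else 0)
        ranges.append([mxd_path, list(range(start, stop))])
        start = stop
    return ranges
```

Usage would look like `mapRanges = build_map_ranges(MXD, ddp.pageCount, 10)`. Every page is assigned exactly once and no chunk differs from another by more than one page, although, as the timings below show, an equal page count is not the same as an equal workload.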

Multiprocessing works best when the work is spread equally

Initial tests were disappointing because the most time-consuming maps were not distributed equally among the workers: a few chunks took far longer than the rest.
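One simple way to avoid handing a cluster of slow pages to a single worker, assuming the slow pages sit next to each other in the page order, is to deal page IDs out round-robin instead of in contiguous blocks. This is a hypothetical helper, not necessarily how the equal spread below was produced:

```python
def interleave_pages(mxd_path, page_count, n_workers):
    # Deal page IDs out like cards: worker i gets pages
    # i+1, i+1+n_workers, i+1+2*n_workers, ...
    pages = list(range(1, page_count + 1))
    return [[mxd_path, pages[i::n_workers]] for i in range(n_workers)]
```

Each worker's pages are no longer adjacent, which does not matter when every page is exported to its own PDF, and any contiguous block of slow pages ends up split across all workers.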

Maps          Duration (h:mm)
1-355         1:31
356-710       1:34
711-1065      1:35
1066-1420     1:36
1421-1775     2:05
1776-2130     3:33
2131-2485     2:33
2486-2840     1:36
2841-3195     1:25
3196-3587     1:21
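Why the skew hurts: Pool.map returns only after every chunk has finished, so the slowest worker sets the time for the whole run. A small sketch of that arithmetic on the initial-test durations, assuming they are hours:minutes:

```python
def to_minutes(hmm):
    # Convert an "h:mm" string (assumed format) to whole minutes
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def wall_clock(durations):
    # The run ends when the slowest chunk ends
    return max(durations, key=to_minutes)


def balanced_estimate(durations):
    # Per-worker time if the same total work were split perfectly evenly
    mean = sum(to_minutes(d) for d in durations) // len(durations)
    return "%d:%02d" % divmod(mean, 60)


initial = ["1:31", "1:34", "1:35", "1:36", "2:05",
           "3:33", "2:33", "1:36", "1:25", "1:21"]
print(wall_clock(initial))         # 3:33
print(balanced_estimate(initial))  # 1:52
```

Most workers sat idle for well over an hour while the 3:33 chunk finished; spreading the same work evenly would bring each worker down to roughly 1:52.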

            

Urban Maps and Rural Maps: unequal spread vs. equal spread

Maps          Unequal spread (h:mm)   Equal spread (h:mm)
1-355         1:31                    1:52
356-710       1:34                    1:52
711-1065      1:35                    1:57
1066-1420     1:36                    2:08
1421-1775     2:05                    1:57
1776-2130     3:33                    2:02
2131-2485     2:33                    1:54
2486-2840     1:39                    1:58
2841-3195     1:25                    1:52
3196-3587     1:21                    2:01

With an equal spread, the slowest worker finishes in 2:08 instead of 3:33, so the whole run takes about two hours.
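If a cost estimate for each page is available ahead of time (for example, per-page timings from a previous run; the costs dictionary below is hypothetical), a greedy "longest processing time first" assignment is a standard way to balance the load. This is a swapped-in technique, not the method behind the table above, and it is a heuristic rather than a guaranteed optimum:

```python
import heapq


def balance_pages(costs, n_workers):
    # costs: {page_id: estimated cost}; returns one list of page IDs per worker
    buckets = [[] for _ in range(n_workers)]
    # Min-heap of (current load, worker index): the least-loaded worker is on top
    heap = [(0, i) for i in range(n_workers)]
    # Place the most expensive pages first, each on the least-loaded worker
    for page, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(page)
        heapq.heappush(heap, (load + cost, i))
    return buckets
```

To feed the pool, wrap each bucket the same way as the hard-coded ranges: `mapRanges = [[MXD, pages] for pages in balance_pages(costs, 10)]`.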

Thanks!

Phil Larkin

pslarkin@spokanecounty.org

https://slides.com/psl/multi_arcpy/

multiprocessing and arcpy

By psl


Using Python's multiprocessing module can help speed up arcpy's data driven pages operations.
