xargs is an awesome tool for running parallel processes

xargs is a powerful but simple command for parallel processing. You should use it more.

#!/bin/bash

# Call dmv command on each district
# in Manhattan, processing 3 districts
# at a time

# -I{} already runs one command per input line, so no -n1 is needed
seq 65 76 | xargs -P3 -I{} dmv -a {}

# Call the dmv command on selected districts
# in the Bronx, running 3 at a time
# echo -n: with -d' ', a trailing newline would become part of the last item
echo -n "77 83 85 86 87" | xargs -P3 -I{} -d' ' dmv -a {}
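
One detail worth knowing when combining -d with echo, shown here as a small sketch that assumes GNU xargs (the -d option is not available in BSD/macOS xargs). With a custom delimiter, the newline that echo appends is no longer treated as a separator, so it ends up inside the last item. The printf call is only a stand-in that makes each item visible.

```shell
#!/bin/bash
# Assumes GNU xargs. printf '[%s]' is a stand-in that brackets each item it receives.

# Plain echo: the trailing newline becomes part of the last item ("83\n").
with_newline=$(echo "77 83" | xargs -d ' ' -I{} printf '[%s]' {})

# echo -n: no trailing newline, so every item is clean.
clean=$(echo -n "77 83" | xargs -d ' ' -I{} printf '[%s]' {})

echo "$with_newline"   # the closing bracket lands on its own line
echo "$clean"          # prints [77][83]
```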

I used to do this kind of thing with loops and job control, which works, but xargs is much more compact and less error-prone. I used to think of xargs mainly as a program for taking a multi-line text file and appending its contents to a command as arguments, but it is actually very useful with the -P flag, which runs several commands in parallel. Moreover, modern computers are very good at running multiple processes at once, and a job run one step at a time leaves most of the cores idle: processor clock speeds have barely increased in many years, so most performance gains now come from additional cores.
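
To make the comparison concrete, here is a rough sketch of the two approaches side by side. The WORK snippet is a hypothetical stand-in for the real per-district command, and the batching loop is just one simple way to write the job-control version, not necessarily how the original scripts looked.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-district command.
WORK='sleep 0.1; echo "district $1 done"'

# Loop plus job control: start jobs in the background and
# pause after every third one until the whole batch finishes.
loop_version() {
  count=0
  for d in $(seq 65 76); do
    sh -c "$WORK" _ "$d" &
    count=$((count + 1))
    if [ $((count % 3)) -eq 0 ]; then
      wait
    fi
  done
  wait
}

# xargs: -P3 keeps up to three commands running,
# starting the next one as soon as any of them finishes.
xargs_version() {
  seq 65 76 | xargs -P3 -I{} sh -c "$WORK" _ {}
}

# Completion order can differ between runs, so sort before comparing.
loop_out=$(loop_version | sort)
xargs_out=$(xargs_version | sort)

echo "$xargs_out"
```

The batching loop stalls on the slowest job in each group of three, while xargs refills a free slot immediately, which is part of why the one-liner tends to be both shorter and faster.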

I also use xargs a lot now with wget2 when downloading LIDAR data, again with the parallel flag, since you can usually download several files at once more quickly than one at a time. I used to do that with multiple loops and job control, but that is a poor way to do it: the added complexity invites race conditions and other problems.

echo -n "https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN580400.img https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN595400.img https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN610400.img https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN580385.img https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN595385.img https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN610385.img" | xargs -I {} -d ' '  -P 3 wget2 -c {}
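
For longer lists, one option is to keep the URLs in a file, one per line, and let xargs read from it. This is a minimal sketch: the temporary file is an assumption for illustration, and echo is placed in front of wget2 so the commands are only printed rather than run. Removing echo would start the real downloads.

```shell
#!/bin/bash
# Hypothetical URL list file, one URL per line (three of the tiles listed above).
url_file=$(mktemp)
cat > "$url_file" <<'EOF'
https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN580400.img
https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN595400.img
https://gisdata.ny.gov/elevation/DEM/USGS_Schoharie2014/18TWN610400.img
EOF

# -n 1 passes one URL per command; -P 3 runs up to three at once.
# Dry run: echo prints each wget2 command instead of executing it.
planned=$(xargs -n 1 -P 3 echo wget2 -c < "$url_file" | sort)

echo "$planned"
rm -f "$url_file"
```

Because the default xargs splitting treats newlines as separators, this form needs neither -d nor -I.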

As a bonus, here’s the R script I use to generate the command above for downloading those LIDAR tiles.

library(tidyverse)
library(sf)
library(terra)
library(arcpullr)
library(mapedit)

rm(list = ls())
shape_to_download <-
  mapedit::drawFeatures() %>%
  .[1] %>% st_transform(4326)

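lidar.url <- tibble()

# Try index layers 2 through 14 in turn; once one returns tiles
# for the drawn shape, the remaining layers are skipped
for (i in seq(2, 14)) {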
  if (nrow(lidar.url) == 0)
    lidar.url <- get_layer_by_poly(
      str_c(
        'https://elevation.its.ny.gov/arcgis/rest/services/Dem_Indexes/MapServer/',
        i
      ),
      shape_to_download,
      sp_rel = 'intersects'
    )
}

lidar.url <- lidar.url %>% st_drop_geometry()

# echo -n avoids a trailing newline, which -d ' ' would otherwise
# attach to the last URL
str_c('echo -n "',
      paste(lidar.url$DIRECT_DL, collapse = ' '),
      '" | xargs -I {} -d \' \'  -P 3 wget2 -c {}') -> out

cat(out)
clipr::write_clip(out)
