HTCondor week 2015 summary

Justas Balcas (Vilnius University)

2015-06-24

Outline

  • Introduction
  • What’s cooking in HTCondor
    • Docker
    • condor and dagman
    • python bindings
    • Data caching
  • Other
    • Testing the limits of condor
    • HTCondor monitoring
  • Hands-on tutorial

HT stands for High Throughput

Throughput: the quantity of work done by
an electronic computer in a given period of
time (Dictionary.com)

HTCondor: a flexible batch queuing system

  • Very configurable, adaptable
  • Supports strong security methods
  • Interoperates with many types of computing grids
  • Manages both dedicated machines and non-dedicated machines (for cycle scavenging)
  • Fault-tolerant: can survive crashes, network outages, and any single point of failure

  • Each job needs resources when it runs:
    • Up to 2.5 GBytes of RAM
    • Uses 20 MBytes of input
    • Requires 2 – 500 hours of computing time
    • Produces up to 27 GBytes of output

Definitions

Job

The HTCondor representation of a piece of work. Like a Unix process, it can be an element of a workflow.

ClassAd

HTCondor’s internal data representation

Machine or Resource

Computers that can do the processing

More definitions

Matchmaking

Associating a job with a machine resource

Central Manager

Central repository for the whole pool; does matchmaking

Submit Host

The computer from which jobs are submitted to HTCondor

Execute Host

The computer that runs a job
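
To make the matchmaking definition concrete, here is a toy sketch in Python. It is not HTCondor's real algorithm or API: jobs and machines are plain dicts standing in for ClassAds, and the attribute names and the "first free machine that fits" rule are assumptions chosen for illustration only.

```python
# Toy matchmaking sketch: NOT HTCondor's real algorithm or API.
# Jobs and machines are plain dicts standing in for ClassAds.

def matches(job, machine):
    """Return True if the machine satisfies the job's (hypothetical) requirements."""
    return (machine["Memory"] >= job["RequestMemory"]
            and machine["Cpus"] >= job["RequestCpus"]
            and machine["OpSys"] == job["OpSys"])


def matchmake(jobs, machines):
    """Pair each job with the first free machine that satisfies it."""
    free = list(machines)
    pairs = []
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                pairs.append((job["Id"], machine["Name"]))
                free.remove(machine)  # a matched machine is no longer free
                break
    return pairs


machines = [
    {"Name": "slot1@node-a", "Memory": 2048, "Cpus": 1, "OpSys": "LINUX"},
    {"Name": "slot1@node-b", "Memory": 8192, "Cpus": 4, "OpSys": "LINUX"},
]
jobs = [
    {"Id": "1.0", "RequestMemory": 2560, "RequestCpus": 1, "OpSys": "LINUX"},
    {"Id": "1.1", "RequestMemory": 1024, "RequestCpus": 1, "OpSys": "LINUX"},
    {"Id": "1.2", "RequestMemory": 1024, "RequestCpus": 1, "OpSys": "LINUX"},
]
result = matchmake(jobs, machines)
print(result)
```

The third job stays unmatched because no free machine is left, which is the situation where a real job would simply wait idle in the queue.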

Getting HTCondor

  • Available as a free download from http://research.cs.wisc.edu/htcondor
  • Download HTCondor for your operating system
    • Available for many modern Unix platforms (including Linux and Apple’s OS X)
    • Windows, many versions
  • Repositories
    • YUM: RHEL 4, 5, and 6 ($ yum install condor.x86_64)
    • APT: Debian 6 and 7 ($ apt-get install condor)

What’s cooking in HTCondor

  • Docker
  • condor and dagman
  • python bindings
  • Data caching

Some HTCondor v8.3 Enhancements

  • Scalability and stability
  • Goal: 200k slots in one pool, 10 schedds managing 400k jobs
  • Resolved developer tickets: 240 bug fix issues (v8.2.x tickets), 234 enhancement issues (v8.3 tickets)
  • Docker Job Universe
  • Tool improvements, especially condor_submit
  • IPv6 mixed mode
  • Encrypted Job Execute Directory
  • Periodic application-layer checkpoint support in Vanilla Universe
  • Submit requirements
  • New packaging

Docker

Docker manages Linux containers.
Containers give Linux processes a private:

  • Root file system
  • Process space
  • NATed network
  • UID space

Docker

This is an “ubuntu” container

Processes in other containers on this machine can NOT see what’s going on in this “ubuntu” container

This is my host OS, running Fedora

Installation of docker universe

  • Need condor 8.3.6+
  • Need docker (maybe from EPEL) ($ yum install docker-io)
    • Docker is moving fast: docker 1.6+, ideally
      • odd bugs with older Docker versions!
  • The condor user needs to be in the docker group!
    • $ usermod -a -G docker condor
    • $ service docker start

Docker universe

# Execute node configuration: path to the docker binary
DOCKER = /usr/bin/docker

# Submit file
universe = docker
executable = /bin/my_executable
arguments = arg1
docker_image = deb7_and_HEP_stack
transfer_input_files = some_input
output = out
error = err
log = log
queue

Condor_submit

The way it works

Universe = Vanilla
Executable = cook
Output = meal$(Process).out
Args = -i pasta
Queue
Args = -i chicken
Queue

Condor_submit new way 8.3.5/8.4

Universe = Vanilla
Executable = cook
Output = meal$(Process).out
Args = -i $(Item)
Queue Item in (pasta, chicken)

Condor_submit new way 8.3.5/8.4

Queue <N> <var> in (<item-list>)
Queue <N> <var> matching (<glob-list>)
Queue <N> <vars> from <filename>
Queue <N> <vars> from <script>
Queue <N> <vars> from (
<multiline-list>
)

  • Iterate <items>, creating <N> jobs for each item
  • In/from/matching keywords control how we get <items>
  • This is not the full syntax description.
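
The iteration rule above can be sketched in Python. This only models how `Queue <N> <var> in (<item-list>)` expands into jobs; it does not call HTCondor. The function name `expand_queue` is hypothetical, and the numbering of `Process`, `Step` and `ItemIndex` is stated here as an assumption about how the substitution variables advance.

```python
def expand_queue(n, items):
    """Yield one dict of substitution variables per job, modelling
    `Queue n Item in (items)`: n jobs are created for each item."""
    process = 0
    for item_index, item in enumerate(items):
        for step in range(n):
            yield {
                "Item": item,             # the loop variable named in the Queue line
                "ItemIndex": item_index,  # position of the item in the list
                "Step": step,             # 0..n-1 within one item
                "Process": process,       # increases across all jobs in the cluster
            }
            process += 1


jobs = list(expand_queue(2, ["pasta", "chicken"]))
for job in jobs:
    # Mirrors "Args = -i $(Item)" and "Output = meal$(Process).out"
    print("Args = -i {Item}   Output = meal{Process}.out".format(**job))
```

With `n = 1` this reduces to the two-job cook example: one job per item, each writing its own output file.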

Condor_submit new way 8.3.5/8.4

  • Improvement: Collector will not fork for queries to small tables
    • Load Collector with 100k machine ads
    • Before change: ~4.5 queries/second
    • After change: ~24.4 queries/second
  • Improvement: Schedd condor_q quantum adjusted (to 100ms)
    • Load schedd with 100k job ads, 40Hz job throughput
    • Before change: ~135 seconds per condor_q
    • After change: ~22 seconds per condor_q

DAGMan changes

  • PRE/POST script retry after delay (DEFER option)
  • DAGMan handles submit file “foreach” syntax
  • Configuration:
    • Maxpre, maxpost default to 20 (was 0)
    • Maxidle defaults to 1000 (was 0)
    • Fixed DAGMan entries in param table
  • Node status file:
    • Format is now ClassAds
    • More info (retry number, procs queued and held for each node)
    • Fixed bug: final DAG status not always recorded correctly
    • ALWAYS-UPDATE option
    • Now works on Windows

New condor_q command line arguments

  • -limit <num>
    • Show at most <num> records
  • -totals
    • Show only totals
  • -dag <dag-id>
    • Show all jobs in the dag
  • -autocluster -long
    • Group and count jobs that have same requirements
    • ...perfect for provisioning systems
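
As a rough illustration of what "group and count jobs that have the same requirements" means, the sketch below groups plain dicts by a tuple of attributes. It is not how HTCondor computes autoclusters: the choice of significant attributes and the function name `autocluster_counts` are assumptions made for this example.

```python
from collections import Counter


def autocluster_counts(jobs, significant=("RequestCpus", "RequestMemory", "Requirements")):
    """Count jobs that share the same values for the significant attributes.

    The attribute list is an assumption for this sketch; HTCondor decides
    for itself which attributes are significant.
    """
    return Counter(tuple(job.get(attr) for attr in significant) for job in jobs)


jobs = [
    {"Owner": "alice", "RequestCpus": 1, "RequestMemory": 2048, "Requirements": 'OpSys == "LINUX"'},
    {"Owner": "bob", "RequestCpus": 1, "RequestMemory": 2048, "Requirements": 'OpSys == "LINUX"'},
    {"Owner": "alice", "RequestCpus": 4, "RequestMemory": 8192, "Requirements": 'OpSys == "LINUX"'},
]
counts = autocluster_counts(jobs)
for key, count in counts.items():
    print(count, key)
```

A provisioning system only needs one entry per group and its count, not every job ad, which is why this view is convenient.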

Python bindings. Why Python?

  • Plausible to do “on the side”: Clear, straightforward bridge to C++
  • HTCondor doesn’t have a “library”, so SWIG isn’t useful
  • All could be done in C++; no python in python bindings
  • Anecdotally, one of the most popular sysadmin and integrator scripting languages
  • ... because Brian wanted to

 

 

 

 

If you can do it with the command line tools, you should be able to do it with python.

Python bindings. Where can we help?

  • The only thing better than feedback is patches!
    • Places Brian would love help:
      • (Better) python3 support
      • Add more unit tests
      • Get unit tests run inside the HTCondor tests
      • Better/more examples in the documentation

Data Caching (HTCache)

Other

  • Testing the limits of condor
  • HTCondor monitoring

Shooting for the sky: Testing the limits of condor

HTCondor monitoring

To try it out, you can just parse the output of “condor_q” into the desired format.
Then, simply use netcat to send it to the Graphite server

#!/bin/bash
metric="htcondor.running"
# Count running jobs; matching " R " avoids counting lines that merely contain an R
value=$(condor_q | grep -c " R ")
timestamp=$(date +%s)
echo "$metric $value $timestamp" | nc \
graphite.yourdomain.edu 2003

HTCondor monitoring

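import time

import classad, htcondor

# Query the collector for all slot (startd) ads, fetching only the listed attributes
coll = htcondor.Collector("htcondor.domain.edu")
slotState = coll.query(htcondor.AdTypes.Startd, "true",
            ['Name','JobId','State', 'RemoteOwner','COLLECTOR_HOST_STRING'])
timestamp = int(time.time())
slot_claimed = 0
for slot in slotState:
    if slot['State'] == "Claimed":
        slot_claimed += 1
# Graphite plaintext format: "<metric> <value> <timestamp>"
print("condor.claimed " + str(slot_claimed) + " " + str(timestamp))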
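
The counting and formatting steps can be tried without a pool or a Graphite server. The sketch below uses plain dicts in place of slot ads and builds the `<metric> <value> <timestamp>` line that the scripts above send. The helper names are hypothetical, and `send_to_graphite` is defined but not called, since the host name is site-specific.

```python
import socket


def graphite_line(metric, value, timestamp):
    """Format one metric in the plaintext form '<metric> <value> <timestamp>'."""
    return "{} {} {}\n".format(metric, value, int(timestamp))


def count_claimed(slots):
    """Count slot ads whose State attribute is 'Claimed'."""
    return sum(1 for slot in slots if slot.get("State") == "Claimed")


def send_to_graphite(line, host, port=2003):
    """Send one line over TCP, as the netcat example does. Not called below."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Stand-in data; in practice these would come from the collector query
slots = [
    {"Name": "slot1@a", "State": "Claimed"},
    {"Name": "slot2@a", "State": "Unclaimed"},
    {"Name": "slot1@b", "State": "Claimed"},
]
line = graphite_line("condor.claimed", count_claimed(slots), 1435104000)
print(line, end="")
```

Keeping the formatting separate from the network call makes it easy to check the metric line before pointing the script at a real server.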

HTCondor monitoring

A Python script polls the history logs periodically for new entries and publishes them to a Redis channel.

ClassAds are published to a channel on the Redis server and read by Logstash.

Because the ClassAds take up a lot of space in Elasticsearch, and because ES only works on data in memory, data goes into a new index each month.
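
One way to picture the monthly index scheme is a function that maps a job timestamp to an index name. This is only a sketch: the prefix `htcondor-history`, the `YYYY.MM` suffix and the use of UTC are assumptions, not details taken from the setup described here.

```python
from datetime import datetime, timezone


def monthly_index(epoch_seconds, prefix="htcondor-history"):
    """Return a per-month index name for a Unix timestamp (UTC).

    The prefix and the YYYY.MM suffix are hypothetical naming choices.
    """
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return "{}-{:%Y.%m}".format(prefix, when)


# 1435104000 is 2015-06-24 00:00:00 UTC
print(monthly_index(1435104000))
```

Because each month lands in its own index, older months can be closed or dropped without touching the data that is still being queried.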

HTCondor monitoring

A Python script is run every minute by a cron job and collects ClassAds for all jobs.

Hands-on tutorial

Is it easy to lie?

It is easy to lie

Default values:
NUM_CPUS = $(DETECTED_CPUS)
MEMORY = $(DETECTED_MEMORY)
$ condor_config_val -dump | grep DETECTED*
DETECTED_CORES = 8
DETECTED_CPUS = 8
DETECTED_MEMORY = 15960
DETECTED_PHYSICAL_CPUS = 4
$ condor_config_val -dump | grep NUM_CPUS*
NUM_CPUS = 8
$ condor_status -totals
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     8     0       0         8       0          0        0

               Total     8     0       0         8       0          0        0

It is easy to lie

Increase the number of CPUs and the memory
NUM_CPUS = 32
MEMORY = $(DETECTED_MEMORY)*32
Default values:
NUM_CPUS = $(DETECTED_CPUS)
MEMORY = $(DETECTED_MEMORY)
$ condor_config_val -dump | grep DETECTED*
DETECTED_CORES = 8
DETECTED_CPUS = 8
DETECTED_MEMORY = 15960
DETECTED_PHYSICAL_CPUS = 4
$ condor_config_val -dump | grep NUM_CPUS*
NUM_CPUS = 32
$ condor_status -totals
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    32     0       0        32       0          0        0

               Total    32     0       0        32       0          0        0

What about GPUs?

$ condor_status -long slot1@jbalcas | grep -i gpus
TotalGPUs = 2
DetectedGPUs = 2
AssignedGPUs = "CUDA0"
MachineResources = "Cpus Memory Disk Swap GPUs"
GPUs = 1
TotalSlotGPUs = 1
$ condor_config_val -dump gpus
ENVIRONMENT_FOR_AssignedGPUs = GPU_NAME GPU_ID=/CUDA//
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = 10000
MACHINE_RESOURCE_GPUs = CUDA0, CUDA1

Requirements: if your graphics card is from NVIDIA and it is listed at http://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.

Suspendable and not suspendable slots

MyNonSuspendableSlotIsIdle = \
     (NonSuspendableSlotState =!= "Claimed" && \
	  NonSuspendableSlotState =!= "Preempting")

#NonSuspendable slots are always willing to start jobs.
#Suspendable slots are only willing to start if the NonSuspendable slot is idle
START = \
	IsSuspendableSlot!=True && IsSuspendableJob=!=True || \
	IsSuspendableSlot && IsSuspendableJob==True && $(MyNonSuspendableSlotIsIdle)

# Suspend the suspendable slot if the other slot is busy.
SUSPEND = \
	IsSuspendableSlot && $(MyNonSuspendableSlotIsIdle)!=True
CONTINUE = ($(SUSPEND)) != True
