HTCondor

HTCondor
Developer(s)	University of Wisconsin–Madison
Stable release	23.0.10 LTS / May 9, 2024
Preview release	23.7.2 / May 16, 2024
Repository	github.com/htcondor/htcondor
Written in	C++, Python, Perl
Operating system	Microsoft Windows, Mac OS X, Linux, FreeBSD
Type	High-Throughput Computing
License	Apache License 2.0
Website	htcondor.org

HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks.^[1] It can manage workload on a dedicated cluster of computers, farm out work to idle desktop computers (so-called cycle scavenging), or combine dedicated resources (rack-mounted clusters) and non-dedicated desktop machines into a single computing environment. HTCondor runs on Linux, Unix, Mac OS X, FreeBSD, and Microsoft Windows.

HTCondor is developed by the HTCondor team at the University of Wisconsin–Madison and is freely available. It is open-source software licensed under the Apache License 2.0.^[2]

While HTCondor makes use of unused computing time, leaving computers turned on for use with HTCondor will increase energy consumption and associated costs. Starting from version 7.1.1, HTCondor can hibernate and wake machines based on user-specified policies, a feature previously available only via third-party software.

History

The development of HTCondor started in 1988.

HTCondor was formerly known as Condor; the name was changed in October 2012 to resolve a trademark lawsuit.^[3]

HTCondor was the scheduler software used to distribute jobs for the first draft assembly of the Human Genome.

Example of use

The NASA Advanced Supercomputing facility (NAS) HTCondor pool consists of approximately 350 SGI and Sun workstations purchased and used for software development, visualization, email, document preparation, and other tasks. Each workstation runs a daemon that watches user I/O and CPU load. When a workstation has been idle for two hours, a job from the batch queue is assigned to the workstation and will run until the daemon detects a keystroke, mouse motion, or high non-HTCondor CPU usage. At that point, the job will be removed from the workstation and placed back on the batch queue.
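The NAS policy above boils down to two rules: start a job after two quiet hours, and evict it as soon as the owner returns. The Python below is only a model of that decision logic, not HTCondor code; HTCondor itself expresses such rules as ClassAd expressions in its configuration. The names `WorkstationState` and `next_action`, and the CPU-load cutoff `HIGH_CPU_LOAD`, are hypothetical; the source does not say what counts as "high" CPU usage.

```python
from dataclasses import dataclass

# Two hours of idleness, as in the NAS description.
IDLE_THRESHOLD_SECONDS = 2 * 60 * 60
# Hypothetical cutoff for "high" non-HTCondor CPU load (the source gives no number).
HIGH_CPU_LOAD = 0.3


@dataclass
class WorkstationState:
    seconds_since_input: int  # time since the last keystroke or mouse motion
    owner_cpu_load: float     # CPU load not caused by HTCondor jobs
    running_job: bool         # whether a batch job is currently assigned


def next_action(state: WorkstationState) -> str:
    """Return 'start', 'evict', or 'none' for one polling interval."""
    idle_long_enough = state.seconds_since_input >= IDLE_THRESHOLD_SECONDS
    owner_busy = state.owner_cpu_load > HIGH_CPU_LOAD
    if state.running_job:
        # A job only starts after two idle hours, so any keystroke or mouse
        # motion resets the counter below the threshold. That, or high owner
        # CPU load, sends the job back to the batch queue.
        if not idle_long_enough or owner_busy:
            return "evict"
        return "none"
    # No job yet: assign one only after two quiet hours with low owner load.
    if idle_long_enough and not owner_busy:
        return "start"
    return "none"


print(next_action(WorkstationState(3 * 3600, 0.05, running_job=False)))  # start
print(next_action(WorkstationState(5, 0.05, running_job=True)))          # evict
```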

Features

HTCondor can run both sequential and parallel jobs. Sequential jobs can run in several different "universes". The "vanilla" universe can run most "batch-ready" programs. In the "standard" universe, the target application is re-linked with the HTCondor I/O library, which provides remote job I/O and job checkpointing; this universe was removed in HTCondor 9.0. The "local" universe runs jobs directly on the submit host.
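Jobs are described to HTCondor in a submit description file, where the universe is chosen with the `universe` command. A minimal sketch of generating such a file from Python follows. The helper `render_submit_description` is hypothetical; the submit commands (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard submit-file syntax, and the sketch deliberately supports only the vanilla and local universes.

```python
def render_submit_description(universe: str, executable: str,
                              arguments: str = "", count: int = 1) -> str:
    """Build the text of a minimal HTCondor submit description file."""
    if universe not in {"vanilla", "local"}:
        raise ValueError(f"universe not supported by this sketch: {universe}")
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments  = {arguments}")
    lines += [
        # $(Process) expands to 0, 1, 2, ... for each queued job.
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


print(render_submit_description("vanilla", "/bin/echo", "hello", count=3))
```

The resulting text would typically be saved to a file (for example `job.sub`) and passed to `condor_submit`.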

For parallel jobs, HTCondor supports the Message Passing Interface (MPI) standard and Parallel Virtual Machine (PVM), in addition to its own Master Worker (MW) library for extremely parallel tasks (Goux, et al. 2000).
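The master-worker pattern behind MW can be sketched in a few lines: a master hands out independent tasks, workers compute them, and the master collects results as they finish. This is a generic illustration using Python's standard thread pool, not the MW library's API; under HTCondor the workers would be remote jobs rather than local threads. The function names `work` and `master` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(task: int) -> tuple[int, int]:
    # Stand-in for an expensive, independent computation.
    return task, task * task


def master(tasks, max_workers: int = 4) -> dict[int, int]:
    """Distribute tasks to a pool of workers and gather results as they complete."""
    results: dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(work, t) for t in tasks]
        for fut in as_completed(futures):
            task, value = fut.result()
            results[task] = value
    return results


print(sorted(master(range(10)).items()))
```

Results arrive in completion order, so the master keys them by task rather than relying on submission order.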

HTCondor-G allows HTCondor jobs to use resources not under its direct control. It is mostly used to submit jobs to grid and cloud resources, such as pre-WS and WS Globus, NorduGrid ARC, UNICORE, and Amazon Elastic Compute Cloud. It can also submit jobs to other batch systems, such as Torque/PBS and LSF. Support for Sun Grid Engine was being developed as part of the EGEE project.[citation needed]
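Jobs handled by HTCondor-G use the "grid" universe, with a `grid_resource` line naming the remote system. The sketch below, assuming the `batch` grid type with `pbs` or `lsf` as the target (as described in the HTCondor manual), renders such a submit description; the helper `render_grid_submit` is hypothetical, and real deployments may need further site-specific submit commands.

```python
def render_grid_submit(batch_system: str, executable: str) -> str:
    """Minimal grid-universe submit description routing a job to a batch system."""
    # Assumption: the "batch" grid type names the target system (pbs, lsf, ...)
    # after the word "batch" in grid_resource; only two are accepted here.
    if batch_system not in {"pbs", "lsf"}:
        raise ValueError(f"batch system not covered by this sketch: {batch_system}")
    return "\n".join([
        "universe      = grid",
        f"grid_resource = batch {batch_system}",
        f"executable    = {executable}",
        "log           = grid.log",
        "queue",
    ]) + "\n"


print(render_grid_submit("pbs", "/bin/hostname"))
```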

HTCondor supports the DRMAA job API, which allows DRMAA-compliant clients to submit and monitor HTCondor jobs. The SAGA C++ Reference Implementation provides an HTCondor plug-in (adaptor), which makes HTCondor job submission and monitoring available through SAGA's Python and C++ APIs.

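Other HTCondor features include DAGMan (Directed Acyclic Graph Manager), which lets users describe dependencies between jobs as a directed acyclic graph and submits each job once the jobs it depends on have completed.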
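A DAGMan input file declares jobs with `JOB` lines and dependencies with `PARENT ... CHILD ...` lines. As a rough illustration of the dependency rule (not DAGMan's actual implementation), the sketch below parses that subset of the syntax and derives one valid run order using Kahn's topological sort. The file names `a.sub` through `d.sub` and the helpers `parse_dag` and `run_order` are hypothetical.

```python
from collections import deque

DAG_TEXT = """\
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text: str) -> tuple[list[str], dict[str, set[str]]]:
    """Extract job names and parent -> children edges from JOB/PARENT lines."""
    jobs: list[str] = []
    children: dict[str, set[str]] = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB":
            jobs.append(words[1])
            children.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for parent in words[1:split]:
                children.setdefault(parent, set()).update(words[split + 1:])
    return jobs, children


def run_order(jobs: list[str], children: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm: a job becomes ready once all of its parents are done."""
    pending = {job: 0 for job in jobs}  # number of unfinished parents per job
    for kids in children.values():
        for kid in kids:
            pending[kid] += 1
    ready = deque(job for job in jobs if pending[job] == 0)
    order: list[str] = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for kid in sorted(children.get(job, ())):
            pending[kid] -= 1
            if pending[kid] == 0:
                ready.append(kid)
    if len(order) != len(jobs):
        raise ValueError("dependency cycle: input is not a DAG")
    return order


print(run_order(*parse_dag(DAG_TEXT)))  # ['A', 'B', 'C', 'D']
```

The sequential list is only one valid ordering; a workflow manager can run B and C at the same time because neither depends on the other.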

References

^ Thain, Douglas; Tannenbaum, Todd; Livny, Miron (2005). "Distributed Computing in Practice: the Condor Experience" (PDF). Concurrency and Computation: Practice and Experience. 17 (2–4): 323–356. CiteSeerX 10.1.1.6.3035. doi:10.1002/cpe.938. S2CID 15450656.
^ "HTCondor - License Information". research.cs.wisc.edu.
^ Tannenbaum, Todd. "'Condor' name changing to 'HTCondor'". Retrieved 11 March 2013.

External links

Official website