Binary repository manager


A binary repository manager is a software tool designed to optimize the download and storage of binary files used and produced in software development. It centralizes the management of all the binary artifacts generated and used by the organization to overcome the complexity arising from the diversity of binary artifact types, their position in the overall workflow and the dependencies between them.

A binary repository is a software repository for packages, artifacts and their corresponding metadata. It can be used to store binary files produced by an organization itself, such as product releases and nightly product builds, or third-party binaries, which must be treated differently for both technical and legal reasons.

Introduction

Software development can be a complex process[1][2] involving many developers, or teams of developers, who work on shared code bases, access the same build tools, download and use a shared set of binary resources, and deploy components into the same software product. To manage the source files used in software development, organizations typically use revision control. These source files are eventually built into the binary artifacts (also known as “binaries”) which constitute the components of a software product. In addition, to provide their functionality and feature set, software products may use many third-party artifacts downloaded from free open source repositories or purchased from commercial sources.[3] Consequently, a software product may comprise tens, hundreds or even thousands of individual binary artifacts, which must be managed in order to maintain a coherent and functional software product efficiently. Managing these binary artifacts is the job of a binary repository manager, which can be thought of as being to binaries what revision control is to source files.

Universal package manager

Like the rest of the software and technology industry, binary repository managers continue to change and grow. They are beginning to position themselves as universal package managers.[4] These package managers aim to standardize the way enterprises treat all package types used in the software development process. They give users the ability to apply security and compliance metrics across all artifact types. Universal package managers have been referred to as being at the center of a DevOps toolchain.[5]

Notable universal package managers include:[6]

  • Apache Archiva
  • CloudRepo
  • Cloudsmith Package
  • JFrog Artifactory
  • Inedo ProGet
  • Packagecloud[7]
  • Sonatype Nexus

Relationship to continuous integration

As part of the development lifecycle, source code is continuously built into binary artifacts using continuous integration (CI). The CI server may interact with a binary repository manager much as a developer would, by getting artifacts from the repositories and pushing builds to them. Tight integration with CI servers enables the storage of important metadata such as:

  • Which user triggered the build (whether manually or by committing to revision control)
  • Which modules were built
  • Which sources were used (commit id, revision, branch)
  • Dependencies used
  • Environment variables
  • Packages installed
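
As an illustration only, the metadata listed above could be captured in a single build record that the CI server publishes alongside the artifacts. The sketch below assumes a hypothetical `BuildInfo` structure; its field names and the sample values are invented for this example and are not taken from any particular product.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BuildInfo:
    """Hypothetical build record; field names are illustrative assumptions."""
    triggered_by: str             # user who triggered the build
    trigger: str                  # "manual" or "commit"
    modules: list[str]            # modules that were built
    commit_id: str                # sources used: commit id
    branch: str                   # sources used: branch
    dependencies: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    packages_installed: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize the record so it can be stored next to the artifacts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


build = BuildInfo(
    triggered_by="alice",
    trigger="commit",
    modules=["core", "web"],
    commit_id="a1b2c3d",
    branch="main",
    dependencies=["logging-lib:1.4.2"],
    environment={"JAVA_HOME": "/opt/jdk"},
    packages_installed=["openssl"],
)
print(build.to_json())
```

Storing such a record with each build is one way to trace a binary back to the user, sources and environment that produced it.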

Artifacts and packages

Artifacts and packages are different things. An artifact is simply an output or collection of files (e.g. JAR, WAR, DLL, RPM), one of which may contain metadata (e.g. a POM file). A package, by contrast, is a single archive file in a well-defined format (e.g. NuGet) that contains files appropriate for the package type (e.g. DLL, PDB).[8] Many artifacts result from builds, but other types are crucial as well. Packages are essentially one of two things: a library or an application.[9]

Compared to source files, binary artifacts are often larger by orders of magnitude and are rarely deleted or overwritten (except in cases such as snapshots or nightly builds). They are usually accompanied by extensive metadata such as ID, package name, version and license.
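
The overwrite behavior described above can be sketched as a deployment rule: release versions are written once, while snapshot versions may be replaced. This is a minimal sketch under stated assumptions. The `deploy` function, the in-memory `store` and the `OverwriteError` exception are hypothetical, and snapshots are recognized by the `-SNAPSHOT` version suffix, a convention borrowed from Maven.

```python
class OverwriteError(Exception):
    """Raised when a deploy would replace an existing release artifact."""


def is_snapshot(version: str) -> bool:
    # Assumption: snapshot builds carry a "-SNAPSHOT" suffix (Maven convention).
    return version.endswith("-SNAPSHOT")


def deploy(store: dict[tuple[str, str], bytes], name: str, version: str, content: bytes) -> None:
    """Store an artifact, refusing to overwrite an existing release version."""
    key = (name, version)
    if key in store and not is_snapshot(version):
        raise OverwriteError(f"{name} {version} already exists and is not a snapshot")
    store[key] = content


store: dict[tuple[str, str], bytes] = {}
deploy(store, "core-lib", "1.0.0", b"first release")
deploy(store, "core-lib", "1.1.0-SNAPSHOT", b"nightly build 1")
deploy(store, "core-lib", "1.1.0-SNAPSHOT", b"nightly build 2")  # allowed: snapshot

try:
    deploy(store, "core-lib", "1.0.0", b"tampered release")
except OverwriteError as err:
    print(err)
```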

Metadata

Metadata describes a binary artifact, is stored and specified separately from the artifact itself, and can have several additional uses. The following table shows some common metadata types and their uses:

Metadata type             Used for
------------------------  ---------------------------------------------------------------------
Versions available        Upgrading and downgrading automatically
Dependencies              Specify other artifacts that the current artifact depends on
Downstream dependencies   Specify other artifacts that depend on the current artifact
License                   Legal compliance
Build date and time       Traceability
Documentation             Provide offline availability for contextual documentation in IDEs
Approval information      Traceability
Metrics                   Code coverage, compliance to rules, test results
User-created metadata     Custom reports and processes
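
The two dependency rows in the table are mirror images of each other: if each artifact records what it depends on, the downstream dependencies can be derived by inverting that mapping. The following sketch shows one way this might be done. The function `downstream_index` and the artifact names are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def downstream_index(dependencies: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert an artifact -> dependencies mapping into dependency -> dependents."""
    downstream: dict[str, set[str]] = defaultdict(set)
    for artifact, deps in dependencies.items():
        for dep in deps:
            downstream[dep].add(artifact)
    # Sort for stable output; artifacts nothing depends on get no entry.
    return {name: sorted(users) for name, users in downstream.items()}


deps = {
    "web-app": ["core-lib", "logging-lib"],
    "batch-job": ["core-lib"],
    "core-lib": ["logging-lib"],
    "logging-lib": [],
}
index = downstream_index(deps)
print(index["core-lib"])      # ['batch-job', 'web-app']
print(index["logging-lib"])   # ['core-lib', 'web-app']
```

An index of this kind answers questions such as which products are affected when a flaw is found in a shared library.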

Key features of repository managers

Key factors and features when considering the adoption of a package manager include:[10]

  • Caching - A cache stores local copies of packages. This improves performance over slow internet connections by letting users pull from the local repository instead of an external one, and keeps frequently used packages available during outages of external repositories.
  • Retention policies - Repository managers can be configured to support an organization's purging policies and keep disk usage reasonable. Local repositories for third-party artifacts may also be purged once no release has used them for a specified time.
  • License filtering - Third-party artifacts may be subject to approval processes because of licensing and legal issues. Package managers can restrict deployment to approved artifacts only.
  • High availability - Since a binary repository manager maintains all the development dependencies, access to these artifacts must be maintained at all times. Any downtime of the binary repository manager can halt development, with significant consequences for the organization. A high-availability instance lets an enterprise overcome the risk associated with downtime through automatic failover. This is achieved by having a redundant set of repository managers work against the same database and file storage, maintaining enterprise-wide stability and performance.
  • User restrictions - Repository managers can be integrated with other organizational systems such as LDAP or single sign-on servers to simplify and centralize user management. This gives an enterprise granular control over who has access to vital software components.
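
The caching, license filtering and retention behaviors listed above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not a description of how any real repository manager is implemented: the `CachingRepository` class, the `fetch_remote` callback (which is assumed to return the artifact bytes and a license identifier) and the sample licenses are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class CachedArtifact:
    content: bytes
    license: str
    last_used: datetime


class CachingRepository:
    """Toy pull-through cache; not modeled on any particular product."""

    def __init__(self, fetch_remote: Callable[[str], tuple[bytes, str]],
                 approved_licenses: set[str]) -> None:
        self.fetch_remote = fetch_remote            # hypothetical upstream fetcher
        self.approved_licenses = approved_licenses  # license allow-list
        self.cache: dict[str, CachedArtifact] = {}

    def get(self, name: str, now: datetime) -> bytes:
        entry = self.cache.get(name)
        if entry is None:
            # Cache miss: contact the external repository once.
            content, license_id = self.fetch_remote(name)
            if license_id not in self.approved_licenses:
                raise PermissionError(f"{name}: license {license_id!r} is not approved")
            entry = CachedArtifact(content, license_id, now)
            self.cache[name] = entry
        entry.last_used = now  # record usage for the retention policy
        return entry.content

    def purge_unused(self, now: datetime, max_idle: timedelta) -> list[str]:
        # Retention policy: drop artifacts not requested within max_idle.
        stale = [n for n, e in self.cache.items() if now - e.last_used > max_idle]
        for n in stale:
            del self.cache[n]
        return stale


calls: list[str] = []


def fake_remote(name: str) -> tuple[bytes, str]:
    calls.append(name)
    return b"binary:" + name.encode(), "GPL-3.0" if name == "gpl-lib" else "MIT"


repo = CachingRepository(fake_remote, approved_licenses={"MIT", "Apache-2.0"})
t0 = datetime(2024, 1, 1)
repo.get("json-lib", t0)
repo.get("json-lib", t0 + timedelta(days=1))  # served from the local cache
print(calls)  # ['json-lib']
print(repo.purge_unused(t0 + timedelta(days=100), timedelta(days=90)))  # ['json-lib']
```

In this model the second request never reaches the external repository, which is also why a cached package stays available while that repository is down.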

References

  1. Biggert, Johnny. "Sustainable Software Development, Part 2: Managing Complexity". Developers Dilemma. Retrieved 11 January 2015.
  2. "Managing Complexity". The Economist. Retrieved 11 January 2015.
  3. "Eighth Annual Future of Open Source Survey Finds OSS Powering New Technologies, Reaching New People, and Creating New Economics". blackducksoftware.com. Retrieved 25 February 2015.
  4. Waters, John K. (8 September 2015). "JFrog Releases 'Universal' Artifact Repository". ADT Mag. Application Development Trends Magazine.
  5. Decoster, Xavier (18 August 2013). "An Overview of the NuGet Ecosystem". CodeProject.com.
  6. Hanselman, Scott (13 April 2015). "How to host your own NuGet Server and Package Feed". Hanselman.com.
  7. Canals, Armando (31 March 2018). "Publishing npm Packages Using CircleCI 2.0 - CircleCI". circleci.com.
  8. Tucker, Chris (15 March 2007). "Optimal Package Install/Uninstall Manager" (PDF). UC San Diego: 1. Retrieved 14 September 2011.
  9. "Linux repository classification schemes". braintickle.blogspot.com. Retrieved 1 March 2008.
  10. Bridgewater, Adrian (1 November 2015). "How to find real DevOps, look for binary artifact repository control". ComputerWeekly.com.