Binary repository manager

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Rhws (talk | contribs) at 15:20, 19 January 2015 (→‎What is a Binary Repository). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A Binary Repository Manager is a software tool designed to optimize the download and storage of binary files used and produced in software development. It centralizes the management of all the binary artifacts generated and used by an organization, addressing the complexity that arises from the diversity of binary artifact types, their position in the overall workflow, and the dependencies between them.

Introduction

Software development can be an extremely complex process[1] [2] involving many developers, or teams of developers working on shared code bases, accessing the same build tools, downloading and using a shared set of binary resources, and deploying components into the same software product. To manage the source files used in software development, organizations will typically use a Version Control System. The many source files used in software development are eventually built into the binary artifacts (also known as “binaries”) which constitute the components of a software product. In addition, in order to provide their functionality and feature set, software products may use many 3rd party artifacts downloaded from free open source repositories or purchased from commercial sources. Consequently, a software product may comprise tens, hundreds and even thousands of individual binary artifacts which must be managed in order to efficiently maintain a coherent and functional software product. This function of managing the binary artifacts is done by a Binary Repository Manager. A Binary Repository Manager can be thought of as being to binaries what a Version Control System is to source files.

What is a Binary Repository

A Binary Repository is a software repository for binary artifacts and their corresponding metadata. It can be used to store binaries produced by an organization itself, such as releases and nightly builds, or third-party binaries, which must be treated differently for both technical and legal reasons.

Artifacts

An artifact is the output of any step in the development process. Many artifacts result from builds but other types are crucial as well. Examples of common binary artifact types include:

  • ZIP or tarball files
  • RPM or DEB packages (Linux)
  • JAR, WAR, and EAR packages (Java)
  • Gems (Ruby)
  • DLLs (Windows)
  • Docker Image layers
  • Python packages

Compared to source files, binary artifacts are often larger by orders of magnitude. They are rarely deleted or overwritten (except in cases such as snapshots or nightly builds), and they are usually accompanied by extensive metadata such as ID, package name, version, and license.

Metadata

Metadata describes a binary artifact, is stored and specified separately from the artifact itself, and can have several additional uses. The following table shows some common metadata types and their uses:

Metadata type             Used for
Versions available        Upgrading and downgrading automatically
Dependencies              Specify other artifacts that the current artifact depends on
Downstream dependencies   Specify other artifacts that depend on the current artifact
License                   Legal compliance
Build date and time       Traceability
Documentation             Provide offline availability for contextual documentation in IDEs
Approval information      Traceability
Metrics                   Code coverage, compliance to rules, test results
User-created metadata     Custom reports and processes
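
As a rough illustration, the metadata kept alongside an artifact could be modeled as a small record. This is only a sketch: the class, its field names, and the sample artifacts are hypothetical and do not reflect the schema of any particular repository manager. It also shows how downstream dependencies can be derived from the dependency lists of other artifacts.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactMetadata:
    """Hypothetical metadata record, stored separately from the binary itself."""
    name: str
    version: str
    license: str
    dependencies: list[str] = field(default_factory=list)     # artifacts this one needs
    properties: dict[str, str] = field(default_factory=dict)  # user-created metadata


def downstream_of(target: str, catalog: list[ArtifactMetadata]) -> list[str]:
    """Return the names of artifacts in the catalog that depend on `target`."""
    return [meta.name for meta in catalog if target in meta.dependencies]


# Made-up sample data for illustration only.
catalog = [
    ArtifactMetadata("core-lib", "1.2.0", "Apache-2.0"),
    ArtifactMetadata("web-app", "3.0.1", "Proprietary", dependencies=["core-lib"]),
]
print(downstream_of("core-lib", catalog))
```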

What is a Binary Repository Manager[3]

In day-to-day usage, the term “Binary Repository” is frequently used to refer to a “Binary Repository Manager”. However, as the terms suggest, one manages and the other is managed. A Binary Repository Manager fulfills several functions for each step in the software development lifecycle.

Managing Multiple Repositories

Organizations usually need to use several binary repositories in the development process. Repositories are structured according to several considerations such as organizational (project, department and access privileges), target environment (staging or production), artifact type (jar, rpm, gem etc.), artifact state (integration, release etc.) and more. A Binary Repository Manager assigns and manages the permissions that determine which entities in the organization can access each binary repository.

Storing Local Binaries

Local repositories are physical, locally-managed repositories into which an organization can deploy binaries. Typically, these are used to deploy internal and external releases as well as development builds, but they can also be used to store binaries that are not widely available on public repositories, such as third-party commercial components that the organization has purchased. Using local repositories, all internal resources can be made available across the organization from a single access point at one common URL.

Proxying and Caching External Binaries

It is common for organizations to need third party artifacts hosted in an external repository. This can introduce an element of risk since an organization cannot control access to an external resource. Moreover, network latency and bandwidth can directly affect development speed, especially if the binaries used are very large. A team of developers can have progress severely hampered if its members need to download the latest build of several dependencies several times a day, where each download may take several minutes.

To overcome the risks of using an external repository, a Binary Repository Manager maintains a caching proxy to the external resource, known as a Remote Repository. By using remote repositories, the Binary Repository Manager reduces the organization’s dependence on the external repository. Because external binaries are stored in a local cache, they can be served rapidly to other machines on the same network after the initial request, whether to developers or directly to Continuous Integration servers.

There are two ways to proxy external repositories: on-demand, or mirrored.

In an on-demand proxy, requests to the remote repositories happen only the first time a developer requests an artifact that is not yet cached in the proxy. Any further requests from the other developers will use the copy in the proxy repository. Therefore, the disk space and bandwidth requirements are usually low, since only artifacts that are actually used are cached.
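
The on-demand behavior can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any real repository manager is implemented: `OnDemandProxy`, `fake_remote`, and the artifact path are hypothetical, and the "remote" is a plain function standing in for a network download.

```python
from typing import Callable


class OnDemandProxy:
    """Fetches an artifact from the remote only on the first request, then caches it."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # stand-in for a download from the external repository
        self._cache: dict[str, bytes] = {}
        self.remote_requests = 0           # counts how often the remote was actually contacted

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            self.remote_requests += 1
            self._cache[path] = self._fetch_remote(path)
        return self._cache[path]


def fake_remote(path: str) -> bytes:
    """Hypothetical remote repository that returns placeholder content."""
    return f"contents of {path}".encode()


proxy = OnDemandProxy(fake_remote)
first = proxy.get("org/example/lib/1.0/lib-1.0.jar")   # goes to the remote
second = proxy.get("org/example/lib/1.0/lib-1.0.jar")  # served from the local cache
```

Only the artifacts that are actually requested end up in `_cache`, which is why the disk space and bandwidth cost of this approach stays low.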

On-demand proxy

In a mirrored repository, all changes in the original repository are automatically synchronized to the mirror, so even the first request for an artifact is resolved from the closer mirror. This is also known as repository replication. Because the mirror holds everything in the original repository, disk space and bandwidth requirements are usually higher than for an on-demand proxy.

Mirror proxy

Grouping Repositories

When multiple teams and build tools in an organization use the same Binary Repository Manager, the number of repositories being used can multiply very quickly since each team may have a different set of permissions or usage patterns, and may use different 3rd party resources. This can make repository setup very complex with a potentially long list of repositories to configure. To shorten the list of repositories that developers or build tools need to be familiar with, the administrator may define Virtual Repositories. A virtual repository simplifies development by encapsulating any number of local and remote repositories, and representing them as a unified repository accessed from a single URL. The Binary Repository Manager internally optimizes how binaries are uploaded to, or downloaded from the local and remote repositories underlying the virtual repository that the developer is exposed to.
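
A virtual repository can be pictured as an ordered list of underlying repositories behind one lookup function. The sketch below assumes a simple first-match resolution order. Real products apply their own, more elaborate rules, and all names here (`VirtualRepository`, `libs-local`, `remote-cache`) are hypothetical.

```python
from typing import Optional


class VirtualRepository:
    """Presents several repositories as one, searching them in order."""

    def __init__(self, repositories: list[tuple[str, dict[str, bytes]]]) -> None:
        # Each entry is (repository name, mapping of artifact path to content).
        self._repositories = repositories

    def resolve(self, path: str) -> Optional[tuple[str, bytes]]:
        """Return (repository name, content) from the first repository holding `path`."""
        for name, contents in self._repositories:
            if path in contents:
                return name, contents[path]
        return None


local = {"com/acme/app/1.0/app-1.0.jar": b"internal build"}
remote_cache = {"org/example/lib/2.1/lib-2.1.jar": b"third-party library"}

# Local repositories are listed first here, so internal builds win over cached copies.
virtual = VirtualRepository([("libs-local", local), ("remote-cache", remote_cache)])
hit = virtual.resolve("org/example/lib/2.1/lib-2.1.jar")
```

A developer or build tool only needs to know about `virtual`. Which underlying repository served the artifact is an internal detail.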

Supporting Distributed Development Teams

When teams that access repositories are in different geographical locations, a set of Binary Repository Managers can be set up hierarchically to manage the repositories of all the distributed teams. A local repository manager is set up in each location, and a central “Master Repository Manager” is proxied by these local managers. Each local manager mirrors and caches the master’s content, which keeps the repositories’ contents synchronized across locations while requests are served locally.

Hierarchy of Binary Repository Managers used to replicate repositories for geographically distributed development teams

Artifact Promotion

A Binary Repository Manager can enforce an organization’s development workflow by setting different permissions for each repository to only allow authorized users to promote artifacts from one repository to the next one in the workflow. For example, a release candidate must go through integration testing and QA before being made available to other teams. Using the Binary Repository Manager, only authorized members of the QA team can promote the release candidate to the releases repository once it has passed the QA process. Then, the production systems can be configured to pull artifacts only from the releases repository.
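
The promotion rule in the example above can be sketched as a permission check followed by a move between repositories. This is an illustration only: the `promote` function, the group names, and the choice to move rather than copy the artifact are assumptions, not the behavior of a specific product.

```python
def promote(artifact: str, source: set[str], target: set[str],
            user_groups: set[str], allowed_groups: set[str]) -> None:
    """Move `artifact` from `source` to `target` if the user belongs to an allowed group."""
    if not user_groups & allowed_groups:
        raise PermissionError(f"not authorized to promote {artifact}")
    if artifact not in source:
        raise KeyError(artifact)
    source.remove(artifact)
    target.add(artifact)


# Hypothetical repositories, modeled as sets of artifact names.
staging = {"app-1.0-rc1.war"}
releases: set[str] = set()

# A QA team member promotes the release candidate after it has passed QA.
promote("app-1.0-rc1.war", staging, releases,
        user_groups={"qa-team"}, allowed_groups={"qa-team"})
```

Production systems that pull only from `releases` then never see an artifact that skipped the QA step.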

Security and Maintenance

Authentication and Authorization

Project source code is controlled by giving users access permissions as deemed necessary. In the same way, access to the resulting project binaries may also be controlled, and this can all be done by the Binary Repository Manager. Moreover, Repository Managers can be integrated with other organization systems such as LDAP or Single Sign-on servers to simplify and centralize user management.

Purging Policies

Binary Repository Managers can be configured to support an organization’s purging policies to ensure reasonable disk space usage. For example, Continuous Integration servers may generate several snapshots of an artifact per day. Using the Binary Repository Manager, an organization can configure purging based on the number of snapshots or on disk-space usage. Proxied repositories for third-party artifacts may also be purged after not being used by any release for a specified time. For example, artifacts used during a proof-of-concept may cease to be used once a product progresses to production.
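
A purging policy based on the number of snapshots might look like the following sketch. The function name, the snapshot names, and the integer timestamps are made up for illustration. Actual products expose such policies through their own configuration.

```python
def snapshots_to_purge(snapshots: dict[str, int], keep: int) -> list[str]:
    """Given snapshot name -> build timestamp, return the names to delete, oldest first.

    The newest `keep` snapshots are retained.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(snapshots, key=lambda name: snapshots[name])  # oldest first
    return ordered[:max(len(ordered) - keep, 0)]


# Three snapshots of the same artifact with hypothetical build timestamps.
nightly = {
    "app-1.1-SNAPSHOT-a": 1000,
    "app-1.1-SNAPSHOT-b": 3000,
    "app-1.1-SNAPSHOT-c": 2000,
}
print(snapshots_to_purge(nightly, keep=2))
```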

Managing Third Party Artifacts

Third-party artifacts may be subject to approval processes due to licensing and legal issues. A Binary Repository Manager can be used to restrict usage of an artifact to development until the artifact is approved, and only then authorize publication to production repositories. Moreover, some Binary Repository Managers (e.g. Artifactory) also include automatic license discovery and management and integration with license management software (e.g. BlackDuck Code Center).

High Availability

Since a Binary Repository Manager maintains all the development dependencies, it is a central and usually mission-critical component in the organizational infrastructure. Any downtime of the Binary Repository Manager can halt development, with significant consequences for the organization. To reduce this risk, a Binary Repository Manager can be installed in a High Availability configuration. This is achieved by having a redundant set of Repository Managers work against the same database and file storage. Through built-in synchronization processes, each Repository Manager mirrors the others, so that no individual Repository Manager is a single point of failure.

Binary Repository Managers and Continuous Integration

As part of the development lifecycle, source code is continuously being built into binary artifacts using Continuous Integration Servers. These servers may interact with a Binary Repository Manager much like a developer would by getting artifacts from the repositories and pushing builds there. Tight integration with CI servers enables the storage of important metadata such as:

  • Which user triggered the build (whether manually or by committing to a VCS)
  • Which modules were built
  • Which sources were used (commit id, revision, branch)
  • Dependencies used
  • Environment variables
  • Packages installed

This information can be used later for artifact scans or reports, audit and security checks and build traceability.
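
The build metadata listed above could be captured in a record such as the one below and queried later for traceability. This is a sketch with hypothetical field names and sample values, not the build-info format of any particular CI server or repository manager.

```python
from dataclasses import dataclass, field


@dataclass
class BuildInfo:
    """Hypothetical record of what a CI server reports about one build."""
    triggered_by: str
    modules: list[str]
    commit_id: str
    branch: str
    dependencies: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)


def builds_using(dependency: str, builds: list[BuildInfo]) -> list[str]:
    """Return the commit ids of builds that used `dependency`, e.g. for a security audit."""
    return [build.commit_id for build in builds if dependency in build.dependencies]


# Made-up build history for illustration only.
history = [
    BuildInfo("alice", ["core"], "a1b2c3", "main", dependencies=["lib-2.1.jar"]),
    BuildInfo("ci-bot", ["core", "web"], "d4e5f6", "main", dependencies=["lib-2.2.jar"]),
]
print(builds_using("lib-2.1.jar", history))
```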

References

  1. ^ Biggert, Johnny. "SUSTAINABLE SOFTWARE DEVELOPMENT, PART 2: MANAGING COMPLEXITY". Developers Dilemma. Johnny Biggert. Retrieved 11 January 2015.
  2. ^ "Managing Complexity". The Economist. The Economist. Retrieved 11 January 2015.
  3. ^ "What is a Binary Repository Manager". www.jfrog.com. JFrog Ltd. Retrieved 15 January 2015.