User:GJSissons/sandbox

From Wikipedia, the free encyclopedia
Nextflow
Original author(s): Paolo Di Tommaso
Developer(s): Seqera Labs, Centre for Genomic Regulation
Initial release: April 9, 2013
Stable release: v22.04.3 / May 18, 2022
Preview release: v22.05.0-edge / May 25, 2022
Repository: https://github.com/nextflow-io/nextflow
Written in: Groovy, Java
Operating system: Linux, macOS, WSL
Type: Scientific workflow system, Dataflow programming, Big data
License: Apache License 2.0
Website: nextflow.io


Nextflow is an open-source scientific workflow system based on the dataflow programming model. It was originally developed at the Centre for Genomic Regulation and released as an open-source project on GitHub in July of 2013.[1] The software is now actively maintained at Seqera Labs by the original authors.

Nextflow enables scalable and reproducible scientific workflows and provides implicit data parallelism and fault tolerance. It allows pipelines written in the most common scripting languages to be adapted.[2] Conceptually, Nextflow is similar to other workflow management systems used in the life sciences, such as Galaxy, CWL, Cromwell Workflow Engine, Apache Airflow, and Snakemake. In bioinformatics, the terms pipeline and workflow are often used interchangeably to describe a multi-step analytic process.[3]

Nextflow is a general-purpose workflow system, but it is most widely used in the life sciences industry for various applications including genomic analysis, imaging, and machine learning.

Overview

Nextflow provides a reactive workflow framework and a domain-specific language (DSL) that simplifies the development of data-intensive computational pipelines.[4] The DSL is designed to be easy to learn. Nextflow’s scripting language is an extension of the Apache Groovy programming language, which, in turn, is based on the Java language.[4]

In Nextflow, pipelines are constructed by logically connecting a set of processes, each with a defined set of inputs and outputs. Rather than running in a prescribed sequence, each process runs when data arrives on its input channels. Nextflow is said to be reactive because processes can execute in parallel as soon as their inputs (typically the outputs of another process) become available.

While Nextflow’s DSL is used to express workflow logic, developers are free to code workflow steps using their scripting language of choice. This allows existing scripts and workflows developed using other frameworks to be easily adapted to Nextflow. Supported scripting languages include Bash, csh, ksh, Python, Ruby, and R. Any scripting language that uses the standard Unix shebang declaration (for example, #!/bin/bash) is supported in Nextflow.[2] Users can optionally mix multiple languages in the same Nextflow script. Developers can author and maintain Nextflow scripts using their preferred editor or integrated development environment (IDE). A sample Nextflow script is shown below:

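// DSL 1 syntax: channels are wired to processes with `from` and `into`.
// Channel that emits four greeting values
greetings = Channel.from("Hello", "Ciao", "Hola", "Bonjour")

process hello_world {
    input:
    val x from greetings    // the process runs once per value received

    output:
    file "${x}.txt" into greetings_txts    // each output file is sent to a new channel

    script:
    """
    echo "${x} World!" > ${x}.txt
    """
}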
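
The sample above uses the original DSL syntax. As a rough sketch, the same pipeline could be written in DSL 2 (introduced in July 2020, see History) as follows. The second process, to_upper, is a hypothetical addition that is not part of the original example; it is included only to illustrate the reactive model, since each to_upper task can start as soon as hello_world emits a file.

```groovy
nextflow.enable.dsl = 2

// Writes one greeting file per input value
process hello_world {
    input:
    val x

    output:
    path "${x}.txt"

    script:
    """
    echo "${x} World!" > ${x}.txt
    """
}

// Hypothetical downstream step: upper-cases the contents of each file
process to_upper {
    input:
    path txt

    output:
    stdout

    script:
    """
    tr '[:lower:]' '[:upper:]' < ${txt}
    """
}

workflow {
    greetings = Channel.of("Hello", "Ciao", "Hola", "Bonjour")
    hello_world(greetings)           // one task per greeting
    to_upper(hello_world.out)        // runs as each file becomes available
    to_upper.out.view()              // print each result; order is not guaranteed
}
```

In DSL 2, processes no longer name their channels with from and into; the workflow block connects them explicitly by passing the output of one process as the input of the next.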

The Nextflow Execution Model

In Nextflow, the execution of process steps is abstracted from the execution of the pipeline itself. This architectural approach enables Nextflow to scale to arbitrarily large computing environments and exploit parallelism to accelerate pipeline execution. Executors are Nextflow components that determine where pipeline processes are run and that supervise process execution.[5]

The functional logic of a pipeline is independent of the underlying processing platform. Executor definitions are maintained in a separate Nextflow configuration file independent of the Nextflow workflow.[6] This means that users can change where a Nextflow pipeline executes without needing to make any changes to the pipeline logic itself. Users can elect to run Nextflow workflows using any of the following executors:[5]

  • Local – the default executor, where execution occurs on the computer where the pipeline is launched
  • HPC workload managers – Slurm, SGE, LSF, Moab, PBS Pro, PBS/Torque, HTCondor, NQSII, OAR
  • Kubernetes – local or cloud-based Kubernetes implementations
  • Cloud batch services – AWS Batch, Azure Batch
  • Other environments – Apache Ignite, Google Life Sciences
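
As an illustration of how executor selection stays outside the pipeline script, a configuration file might define one profile per environment. This is a minimal sketch rather than a complete setup: the profile names, queue names, region, and bucket are hypothetical, and a real AWS Batch deployment needs further settings, such as a container image for each process.

```groovy
// nextflow.config (sketch; queue names, region and bucket are hypothetical)
profiles {
    // Run every process on the machine that launches the pipeline
    standard {
        process.executor = 'local'
    }

    // Submit each process as a job to a Slurm cluster
    cluster {
        process.executor = 'slurm'
        process.queue    = 'long'
    }

    // Submit each process as a job to AWS Batch
    cloud {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'
        aws.region       = 'eu-west-1'
        workDir          = 's3://my-bucket/work'
    }
}
```

Assuming the pipeline script is saved as main.nf, the same workflow could then be launched with nextflow run main.nf -profile cluster or -profile cloud without editing the script.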

Optionally, Nextflow workflows can be configured to spread execution across multiple computing platforms. Individual workflow steps are often executed in containers for portability across computing environments. Supported container frameworks include Docker, Singularity, Charliecloud, Podman, and Shifter.[7]
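
A configuration along the following lines is one way these two options could be combined. It is a sketch under stated assumptions: the image names are placeholders, hello_world refers to the process in the sample script above, and the cluster nodes are assumed to have Docker available.

```groovy
// nextflow.config (sketch; image names are placeholders)
docker.enabled = true                 // run process scripts inside Docker containers

process {
    container = 'ubuntu:22.04'        // default image for all processes

    // Override settings for a single process by name
    withName: 'hello_world' {
        executor  = 'slurm'           // this step runs on the cluster
        container = 'debian:bullseye-slim'
    }
}
```

Processes without an override keep the default executor and image, so only the selected step is moved to a different platform.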

Deployment Options

The execution model described above allows Nextflow to be deployed across diverse computing environments. Nextflow pipelines run on Linux or macOS. Because Nextflow decouples workflow execution from individual process steps, execution can occur on local compute nodes, on-premises or cloud-resident clusters, or public and private clouds. There are multiple cloud deployment options, including running on provisioned compute instances, leveraging a cloud provider’s batch service, or leveraging cloud-based Kubernetes environments such as GKE, EKS, or AKS.[5]

Cloud Support

Nextflow pipelines can run on any cloud platform; extended support is provided for Amazon Web Services, Azure, and Google Cloud. On AWS, Nextflow provides support for AWS security credentials, AWS IAM policies, extended support for Amazon S3, and support for AWS Batch.[8] On Azure, Nextflow provides similar extensions for Azure Blob Storage, Azure File Shares, and Azure Batch.[9] On Google Cloud, Nextflow provides support for Google Cloud Life Sciences.[10]

Key Influences on the Nextflow Design

According to Nextflow’s principal developer, Paolo Di Tommaso, the design of Nextflow was influenced significantly by several factors and events:[11]

  • Solomon Hykes’s talk at dotScale in 2013 had a significant impact on the design of Nextflow.[12] It became clear that workflow steps and the applications they rely on would increasingly be encapsulated in containers for portability and ease of deployment. As a result, Nextflow was designed to pull containers from registries automatically and to run them across all supported execution environments.
  • Concurrent with the design of Nextflow, DevOps techniques were beginning to revolutionize software design.[13] Nextflow’s authors realized that developers would want to collaborate on pipelines using source-code management systems (SCMs). As a result, Nextflow evolved to have native support for SCMs, including GitHub, GitLab, and others.[14]
  • The functional dataflow model also had a fundamental impact on the design of Nextflow. Rather than viewing a workflow as a series of discrete steps that occur in a specific order, its steps were better conceptualized as black boxes with inputs and outputs. In a dataflow model, execution of a step is triggered only when an input becomes available.[15]

History

  • Nextflow was first released on GitHub in March of 2013 under a GPLv3 open source license[16]
  • A parallel community called nf-core was started in 2017 by Phil Ewels to develop and share curated pipelines written in Nextflow[11]
  • In April of 2017, Nextflow was featured in Nature Biotechnology[17]
  • Seqera Labs was launched in July of 2018 as a spin-off from the Centre for Genomic Regulation in Spain[11]
  • In October 2018, the project license for Nextflow was changed to Apache 2.0[18]
  • Seqera Labs attracted initial seed funding for Nextflow in February of 2019[11]
  • Nextflow Tower, a commercial product that makes Nextflow workflows easier to deploy and manage for commercial users, was launched by Seqera Labs in September of 2019[19]
  • Nextflow DSL 2 was introduced in July of 2020, providing support for subworkflows and other features[20]
  • In July of 2020, Seqera Labs was awarded an EOSS grant from the Chan Zuckerberg Initiative[21]
  • By 2020, downloads of Nextflow had grown to approximately 55,000 per month[11]
  • On November 5, 2021, Nextflow’s role in the global battle against COVID-19 was recognized in Bio-IT World[22]

References

  1. "Release Version 0.3.0 · nextflow-io/nextflow". GitHub. Retrieved 31 May 2022.
  2. "Nextflow Documentation - Processes". docs.nextflow.io. Retrieved 6 June 2022.
  3. Somak, Roy. "Next-Generation Sequencing Bioinformatics Pipelines". aaac.org. Retrieved 6 June 2022.
  4. "Nextflow Documentation - Domain Specific Language (DSL) 2". docs.nextflow.io. Retrieved 6 June 2022.
  5. "Nextflow Documentation - Executors". docs.nextflow.io. Retrieved 6 June 2022.
  6. "Nextflow Documentation - Configuration". docs.nextflow.io. Retrieved 6 June 2022.
  7. "Nextflow Documentation - Containers". docs.nextflow.io. Retrieved 7 June 2022.
  8. "Nextflow Documentation - Amazon Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  9. "Nextflow Documentation - Azure Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  10. "Nextflow Documentation - Google Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  11. Di Tommaso, Paolo (14 October 2021). "The story of Nextflow: Building a modern pipeline orchestrator". eLifeSciences.org. Retrieved 6 June 2022.
  12. Hykes, Solomon (7 June 2013). "Dot Scale 2013 - Why we built Docker". YouTube. Retrieved 6 June 2022.
  13. Courtemanche, Meredith; Mell, Emily; Gillis, Alexander. "What is DevOps: The Ultimate Guide". TechTarget. Retrieved 6 June 2022.
  14. "Nextflow Documentation - Pipeline Sharing". docs.nextflow.io. Retrieved 6 June 2022.
  15. "Nextflow Documentation - Channels". docs.nextflow.io. Retrieved 6 June 2022.
  16. "Nextflow version 0.3.0 GitHub Repo". GitHub. 11 July 2013. Retrieved 7 June 2022.
  17. Di Tommaso, Paolo; Chatzou, Maria; Floden, Evan; Prieto Barja, Pablo; Palumbo, Emilio; Notredame, Cedric (11 April 2017). "Nextflow enables reproducible computational workflows". Nature Biotechnology. Retrieved 7 June 2022.
  18. Di Tommaso, Paolo (24 October 2018). "Goodbye zero, Hello Apache!". nextflow.io/blog. Retrieved 7 June 2022.
  19. Di Tommaso, Paolo (8 October 2019). "Introducing Nextflow Tower - Seamless monitoring of data analysis workflows from anywhere". seqera.io. Retrieved 7 June 2022.
  20. Di Tommaso, Paolo (24 July 2020). "Nextflow DSL 2 is here!". nextflow.io/blog. Retrieved 7 June 2022.
  21. "Seqera awarded EOSS grant from the Chan Zuckerberg Initiative". Seqera Labs. 27 July 2020. Retrieved 7 June 2022.
  22. Floden, Evan (5 November 2021). "Genetic Sequencing Will Enable Us To Win The Global Battle Against COVID-19".