Bioinformatics Pipeline Development with Nextflow
How to manage your own data analysis pipelines using workflow management systems

Streamline your research through the development of reproducible analysis pipelines

In a nutshell

  • Learn the fundamental best practices of bioinformatic pipeline development
  • Understand how workflow management systems can accelerate your research
  • Use state-of-the-art, open source software to make complex analyses routine
  • Build and run your own custom analysis pipelines using Nextflow!

When?
May 3-6, 2021
9 am - 5 pm (CEST)

Where?
Online

The purpose of the workshop is to introduce the concepts of bioinformatic pipeline development in the context of the open source Workflow Management System (WMS) Nextflow. The participants will be trained in the scripting, configuration and execution of example analysis pipelines based on current industry best practices, and learn how to share them with other users. Finally, the participants will apply everything they have learned by implementing their own analysis pipelines from the ground up.

By the end of the workshop all attendees will be able to build their own scalable, reproducible bioinformatic pipelines which can be run locally, on high-performance computing clusters or even in the cloud. The course layout has been adapted to the needs of beginners in the field of computational biology and allows scientists with little or no background in software development to get a first hands-on experience in this new and fast-evolving area of expertise. This instructor-led live online workshop has been newly designed for an engaging, interactive online learning experience.

Get trained by experts

Our trainers have a proven record of academic and/or industrial experience in NGS data analysis, because up-to-date expert knowledge is needed to answer your questions and to know what is important in the field.

Open source NGS tools

We only use open source tools that are free to use for academia and industry.

Learn effectively with well-curated materials

For an optimal learning experience we carefully prepare our learning materials and example data.

This workshop has been adapted to the needs of beginners in the field of (biological) data analysis and comprises these three course modules:

  1. Introduction to pipeline development and workflow management systems:
    An overview of bioinformatic pipeline development in the context of workflow management systems such as Nextflow and Snakemake. Important consideration is given to understanding and addressing the needs of other pipeline users with regard to various types of computational infrastructure.
  2. Nextflow for biological data analysis:
    Get hands-on with Nextflow. Understand processes and channels, the scripting language and syntax, execution abstraction and relevant configuration options. This module covers essential knowledge for the practical implementation of any new project in bioinformatic pipeline development.
  3. Build your own analysis pipeline:
    This module will be entirely hands-on, beginning with the planning and outlining of a custom bioinformatics pipeline and ending with the opportunity to start building and implementing the pipeline from the ground up, with guidance from our in-house experts. Participants can choose from a selection of relevant examples from the field of NGS data analysis.

Detailed Course Program


Introduction to pipeline development and workflow management systems

  • Introduction and overview: Why build bioinformatic analysis pipelines at all?
  • Workflow Management Systems: What’s out there and how should I decide what to use? How do I think like an end-user?
  • Where to find example pipelines, how to run them, and get a feel for what output to expect. Get familiar with the Linux command line.
  • Considerations for different types of underlying computational infrastructure.
  • Should my pipeline run locally, on an HPC cluster or in the cloud? How do I make my work scalable?
  • Setting up environmental dependencies and software containers. How do I make my work reproducible?
  • Industry best practices and optimising your work environment for software development.

Nextflow for biological data analysis

  • Understanding the concepts of dataflow: processes and channels, input and output
  • Running a pipeline with Nextflow: work directory layout and process execution
  • Language basics: Nextflow scripting and syntax
  • Configuration options: parameters, scopes and profiles
  • Execution abstraction: integrating with resource management software
  • Workflow introspection: runtime metadata and handling errors
  • Sharing your pipeline with online code repositories
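
The dataflow idea behind the first topics in this module can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Nextflow code and not part of the course materials: the names `process`, `run_pipeline`, `trim` and `gc_content` are hypothetical, queues stand in for channels, and threads stand in for processes that start working as soon as input arrives.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a channel


def process(func, inbox, outbox):
    """Apply func to every item arriving on inbox and emit results to outbox."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the end-of-stream signal downstream
            return
        outbox.put(func(item))


def run_pipeline(samples, steps):
    """Connect the steps with channels (queues) and collect the final outputs."""
    channels = [queue.Queue() for _ in range(len(steps) + 1)]
    workers = [
        threading.Thread(target=process, args=(step, channels[i], channels[i + 1]))
        for i, step in enumerate(steps)
    ]
    for worker in workers:
        worker.start()
    for sample in samples:  # feed the first channel
        channels[0].put(sample)
    channels[0].put(DONE)
    results = []
    while True:  # drain the last channel
        item = channels[-1].get()
        if item is DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


def trim(read):
    """Toy step: strip trailing ambiguous bases (N) from a read."""
    return read.rstrip("N")


def gc_content(read):
    """Toy step: return the read together with its GC fraction."""
    return read, sum(base in "GC" for base in read) / len(read)


if __name__ == "__main__":
    print(run_pipeline(["ACGTNN", "GGCCN"], [trim, gc_content]))
```

Each step only knows its input and output channel, so steps can be rearranged or replaced without touching the others. A workflow management system builds on the same separation and adds what this sketch leaves out, such as scheduling on different infrastructure, software containers and resuming after errors.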

Build your own analysis pipeline

  • How to outline and approach a new project in pipeline development
  • Getting started: building your pipeline from the ground up
  • Write processes, define the workflow, add dependencies, run and test your pipeline!

Speakers

Adam Nunn (ecSeq Bioinformatics GmbH & University of Leipzig)
is a PhD student at the Marie Skłodowska-Curie Innovative Training Network 'Epidiverse'. He developed several bioinformatics pipelines using Nextflow for this European network.

Dr. Mario Fasold (ecSeq Bioinformatics GmbH)
Mario has worked on the analysis of microarray data since 2007 and has developed several bioinformatics tools such as the Bioconductor package AffyRNADegradation and the Larpack program package. Since 2011 he has specialized in the field of NGS data analysis and has helped analyse sequencing data of several large consortium projects.

Requirements

The target audience is biologists or data analysts with little or no experience in developing computational pipelines for data analysis. A superficial understanding of molecular biology (DNA, RNA, gene expression, PCR, ...) is assumed, as examples will be given in the context of this field.

Some familiarity with a command line interface (e.g. Linux, Mac OS X) and a minimal understanding of object-oriented programming (with e.g. Python or Java) is recommended but not required.

A current desktop computer / laptop with an up-to-date browser (Firefox or Chrome) is required.

  •   Printed course materials
  •   High-performance cloud computer (accessed via browser)
  •   Downloadable Live-Linux system with pre-installed tools for seamless continuation / repetition after the course

  •   Hands-on use of workflow management tools to see where the stumbling blocks are
  •   Our assistants can help you and give you feedback on the spot
  •   No previous installation of software necessary
  •   Continue practicing on your own using our Live-Linux system and the printed manuscript


Attendance

Location: Online
Language: English
Available Seats: 30 (first-come, first-served)

Registration Fee: 989 EUR (excluding VAT)

Key dates

Opening Date of Registration: January 5, 2021
Closing Date of Registration: May 1, 2021
Workshop: May 3-6, 2021 from 9 am to 5 pm (CEST, UTC+2)

Find out what time it is at your location: Time Difference

"Excellently structured and polished workshop! The material was challenging but the way it was presented made it easy to follow and fun to engage with. The tutors made the learning environment really friendly and safe, and I got much more out of this workshop than I initially expected. Thank you!" Antonina Karakostova, University of Copenhagen, DK

"It is a very condensed course, covering really practical tricks for beginners on coding in Linux, and basically all the primary steps dealing with NGS data. Really worths the contribution of the fee and the time spent :)" Tianhao Zhao, University of Groningen, Netherlands

"This was an excellent course and I highly recommend it for anyone beginning to learn NGS data analysis. All of the topics were explained very well and it was the right amount of information for my first time taking a course on NGS data and using the Linux command line." Lynsey W., USA



When you register for this workshop you are agreeing with our Workshop Terms and Conditions. Please read them before you register.


Answer

What you need:

  • A computer with one of the following operating systems: Windows 7 or later (incl. Windows 10), Mac OS X 10.13 or later.
  • One of the following web browsers: Edge 42 or later, Chrome 65 or later, Firefox 48 or later.
  • A microphone and loudspeakers/headphones.

  The course cannot be run on phones, tablets and similar handheld devices.

Answer

We will start every morning at 9 am sharp and work together until 5 pm. There will be regular short breaks and a longer break at lunchtime.

As this is a live broadcast, you cannot pause the course and continue later. The individual exercises build on each other, so you should not miss any part in between.

Answer

No, you do not have to install any software to follow the course. You will get access to a high-performance computer in the cloud, which you can easily log into using an in-browser console. All necessary programs are already installed on this computer, so we can start right away.

Answer

Of course we'll help you. If you get stuck or run into a problem during the course, you can contact the assistants in the virtual room via chat. They can discuss your issue in a separate room/chat. They can also dial into your in-browser console and see exactly what you see. This way they can help you directly and without delay.

Note: The assistants can only see your in-browser terminal window and nothing else, and they do not have any access to your computer.

Answer

A few days before the course we will send you the manuscript by mail. After the course, you will receive a download link for our Live-Linux system with pre-installed NGS tools. When you start this with a virtualization tool (such as the free VirtualBox), a live version of Linux will be running on your computer where you can repeat and/or practice all the tasks of the course.

Any Questions? Please feel free to contact our events team.

ecSeq Bioinformatics GmbH
Sternwartenstr. 29
D-04103 Leipzig
Germany
Email: events@ecSeq.com