KNIME is a well-established data analysis framework which supports the generation of workflows for data analysis. In this tutorial, we describe how to use SeqAn applications in KNIME.
Installing the SeqAn NGS Toolbox in KNIME is straightforward.
Download the latest KNIME release from the KNIME website download page. You might be asked for registration but that is optional.
In the KNIME window, click on the menu
Help > Install new Software.
In the opening dialog choose
In the opening dialog fill in the following information:
Trusted Community Contributions (3.1)
If you are still using an older KNIME version and do not want to update to the latest version, you can find the corresponding update site location on the community-contributions page of the KNIME website.
After you press OK, KNIME will show you the contents of the added Update Site, including the SeqAn nodes.
Select the SeqAn NGS Toolbox and click Next.
Follow the instructions.
After the installation is done, KNIME will prompt you to restart. Click OK and KNIME will restart with the newly installed SeqAn nodes available under the Community Nodes category. The installation also includes GenericKnimeNodes, which provide file input/output nodes that are useful when working with SeqAn nodes in KNIME.
Now you can drag and drop the installed SeqAn nodes to make your desired workflow together with the other KNIME nodes.
In this example we will use a read mapper (yara) to map short reads against a reference genome. Then we will use SnpStore to call variants and store them as gff files. We will also do error correction of the Illumina reads before we map them to the reference, which makes it easier to identify SNPs clearly.
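Because the example maps paired reads, it can be worth checking that the two read files really pair up before building the workflow. The following Python sketch compares read names across two FASTQ files. It assumes plain four-line FASTQ records without blank lines and the common convention of a trailing `/1` or `/2` mate suffix. The function names are hypothetical, and the checks yara itself performs may differ.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name of every record in a four-line FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 != 0:
                continue  # only the first line of each record is a header
            if not line.startswith("@"):
                raise ValueError(f"{path}: line {index + 1} is not a FASTQ header")
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Assumption: mates are marked with a trailing /1 or /2 suffix.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def mates_are_paired(left_path, right_path):
    """Return True if both files hold the same read names in the same order."""
    sentinel = object()  # marks the shorter file when the lengths differ
    pairs = zip_longest(read_names(left_path), read_names(right_path),
                        fillvalue=sentinel)
    return all(left == right for left, right in pairs)
```

Calling `mates_are_paired` with the paths of the two read files returns `False` as soon as a name mismatch or a missing mate is found.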
- Download this zipped example data and extract it somewhere appropriate. It contains three files. The file NC_008253_1K.fa is a small toy reference genome. Files sim_reads_r.fq are short sequencing paired reads. For each read in one file its mate is contained in the other file.
- On the left side of the opened KNIME window, under KNIME Explorer, right-click on LOCAL (Local Workspace) and choose the menu item New KNIME Workflow. You will be presented with a dialog to enter the name and location of the workflow to be created. Give your workflow an appropriate name, perhaps something like ‘Variant Calling Workflow’, and click Finish.
- Drag and drop the nodes shown in the picture below from the Node Repository panel on the bottom left side of the KNIME window, then arrange and connect them as shown. You can also rename a node from nodeXX to a meaningful name like INPUT: Reference. The node name is the text below the node. The node type, which is displayed above the node, cannot be edited.
- Now it’s time to configure our nodes. To configure a node, double-click on it and a configuration dialog will pop up. Let us configure the nodes in our workflow one by one.
NC_008253_1K.fa under Selected file field.
add and select both
- Run the workflow. Right-click on the File Viewer (OUTPUT: SNP's) node at the right end of our configured workflow and choose Execute from the menu. As the preceding nodes execute, they change their indicator color from yellow to green. When the last node finishes executing, do the same to execute the File Viewer (
- See the results. You can take a look at the results (SNPs/indels) by right-clicking on the corresponding File Viewer node and choosing View: (data view) from the menu.
Congratulations, you have just created a working KNIME workflow using SeqAn nodes!
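If you prefer to inspect the resulting gff file outside of KNIME, a short script can summarize it. The sketch below is a minimal example that assumes the standard tab-separated, nine-column GFF layout, in which the first column is the sequence name and the third column is the feature type. What SnpStore writes into each column is not described in this tutorial, so treat the grouping as an assumption. The function name is hypothetical.

```python
from collections import Counter


def summarize_gff(path):
    """Count records per (sequence name, feature type) in a GFF file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment/directive lines
            columns = line.split("\t")
            if len(columns) < 9:
                continue  # not a standard nine-column GFF record
            seqid, feature_type = columns[0], columns[2]
            counts[(seqid, feature_type)] += 1
    return counts
```

Passing the path of the exported gff file to `summarize_gff` returns a `Counter` that you can print or sort to get a quick overview of the called variants.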
The git repository https://github.com/seqan/knime_seqan_workflows has several workflows ready to run. Each workflow is contained in its own directory, together with example data and a README file, which makes it easier to download and execute the workflow. You can either clone the repository or download individual workflows and execute them with the data provided or with your own data.
With the steps described above you will be able to set up your own workflows in KNIME. If you want to contribute a workflow to the SeqAn community you are encouraged to do so. You can do it as follows:
- Simply clone the workflow git repository into your own GitHub repository and add a new folder
- In KNIME, export your workflow without the data files as a .zip file into that folder.
- Provide a README, a screenshot and some example input data as well.
To get a clearer idea, take a look at the existing workflow folders.
After everything is ready, add…commit and push the new folder to your GitHub repository and make a GitHub pull request to the original workflow repository (https://github.com/seqan/knime_seqan_workflows) and - voila - it will be shared with the community.
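Before opening the pull request, you may want to double-check that the new folder contains the pieces listed above. The following Python sketch is one possible check. The function name, the file-name patterns (a file starting with "readme", common image suffixes for the screenshot) and the flat folder layout are assumptions made for illustration, not rules of the workflow repository. The example input data is not checked, because its file type depends on the workflow.

```python
from pathlib import Path

# Assumption: the screenshot is stored in one of these common image formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def missing_items(folder):
    """Return a list describing what a workflow folder still lacks."""
    files = [path for path in Path(folder).iterdir() if path.is_file()]
    missing = []
    if not any(path.suffix.lower() == ".zip" for path in files):
        missing.append("exported workflow (.zip)")
    if not any(path.name.lower().startswith("readme") for path in files):
        missing.append("README")
    if not any(path.suffix.lower() in IMAGE_SUFFIXES for path in files):
        missing.append("screenshot")
    return missing
```

An empty list means that all three items were found in the folder.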