PGMiner: Complete Proteogenomics Workflow, from Data Acquisition to Result Visualization
Keywords: Proteogenomics, KNIME
In parallel with the development of nucleotide sequencing, an equally important interest arose in describing
sequences in terms of their function. Following the advent of next-generation sequencing, the current
bottleneck is the annotation of the available genomic sequences. While sequencing the transcriptome allows
expressed nucleotide sequences to be determined, it is limited because it may not be possible to sequence the
transcriptome under all possible conditions. Proteomics, currently based on mass spectrometry, can perform
sequencing at the protein level and thereby complement transcriptomics studies. Moreover, some information, such as
post-translational modification events, can only be determined at the proteomics level. Therefore, it is essential
to combine proteomics and genomics. For that purpose, a number of proteogenomics data analysis pipelines have been described.
Here we describe a novel proteogenomics workflow that covers everything from data acquisition to result
visualization in the Konstanz Information Miner (KNIME), a state-of-the-art workflow management and data analytics platform.
This new workflow, entitled PGMiner, not only includes all data analysis steps but is also highly customizable, whereas
customizing most existing pipelines is rather cumbersome. Moreover, no burdensome installation process is required,
making PGMiner the most user-friendly tool available.
The current version of PGMiner groups its nodes into four main categories. The nodes related to each category are listed below.
Detailed descriptions of the nodes and their configuration options are available in the Node Description panel of the KNIME platform.
- File operation: 3-frame or 6-frame translation of a list of databases
- Database equalization: Creation of equalized databases
- DatabaseSearchResultMerger: Merges the results of one search algorithm run on the same spectra file against multiple databases into a single result file
- FDR: False discovery rate calculation based on the target-decoy approach
- ConsensusDBSearchResult: Majority vote calculation
- WuManber: Exact peptide sequence search
- EnzymeChecker: Checks whether the mapped peptide locations conform to the enzymatic cleavage rule
- Annotation mapping: Maps the genomic locations of peptides to the organism's GFF annotation file
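To make two of the steps listed above more concrete, the following Python sketch shows a six-frame translation and a running target-decoy FDR estimate. It is not PGMiner's implementation. The function names, the use of the standard genetic code, the 'X' placeholder for unknown codons, and the simple decoys/targets estimator are all assumptions made for this illustration; other FDR estimators exist and the node may use a different one.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the translation node may support other genetic code tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return {frame: protein} for forward frames 1..3 and reverse frames -1..-3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


def target_decoy_fdr(psms):
    """Estimate a running FDR for peptide-spectrum matches.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Returns (score, is_decoy, fdr) tuples sorted by descending score, with
    fdr = decoys / targets among all matches scoring at least that well.
    """
    results = []
    targets = decoys = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        results.append((score, is_decoy, fdr))
    return results
```

Restricting the output of six_frame_translation to the keys 1, 2 and 3 gives the 3-frame variant. A score cut-off for a desired FDR level can be chosen by scanning the list returned by target_decoy_fdr for the lowest score whose estimate is still below that level.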
The current release of PGMiner runs on KNIME 3.0.1 Full Analytics installed on Linux and Windows.
Java version 1.8 is required.
If the PGMiner update site is used, nothing needs to be installed other than Java 1.8 and KNIME 3.0.1 Full Analytics.
PGMiner supports the latest versions of the following MS search engines. These tools are bundled with PGMiner, so no separate installation is required.
The PGMiner database search engine runners require configuration files, which can be found in the Downloads section.
The parameter names correspond to the definitions used by each algorithm. For detailed information about the settings, please see the respective algorithm's web page.
In addition, the makeblastdb tool of NCBI BLAST is available.
PGMiner search engine runners currently support only the .mgf format (Mascot Generic Format). Therefore, MS/MS data in any other format must
be converted to .mgf using the OpenMS FileConverter, which is already available in KNIME 3.0.1 Full Analytics.
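For readers unfamiliar with the format, the sketch below reads the basic structure of an MGF file in Python: each spectrum sits between BEGIN IONS and END IONS, starts with KEY=VALUE parameter lines such as TITLE, PEPMASS and CHARGE, and continues with peak lines of m/z and intensity. The function name read_mgf is hypothetical, and the reader is a simplified illustration (it ignores everything outside the spectrum blocks), not the parser used by the PGMiner runners.

```python
def read_mgf(lines):
    """Yield one dict per spectrum: {'params': {...}, 'peaks': [(mz, intensity), ...]}.

    lines: any iterable of text lines, e.g. an open file handle.
    Lines outside BEGIN IONS / END IONS blocks are ignored in this sketch.
    """
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z, then intensity (0.0 if the column is missing)
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                spectrum["peaks"].append((float(fields[0]), intensity))
```

A quick sanity check after conversion is to count the spectra, for example with sum(1 for _ in read_mgf(handle)) on an open file handle, and compare the count with the number of MS/MS scans in the original file.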
We recommend running PGMiner on a computer with at least four cores and at least 20 GB of RAM.