PGMiner: Complete Proteogenomics Workflow, from Data Acquisition to Result Visualization

Keywords: Proteogenomics, KNIME

Interest in describing sequences in terms of function has grown in parallel with the development of nucleotide sequencing. Since the advent of next-generation sequencing, the bottleneck has been the annotation of the available genomic sequences. Sequencing the transcriptome reveals expressed nucleotide sequences, but it is limited because expressed sequences cannot be captured under all possible conditions. Proteomics, currently based on mass spectrometry, sequences at the protein level and thereby complements transcriptomics studies. Moreover, some information, such as post-translational modification events, can only be determined at the proteomics level. It is therefore essential to combine proteomics and genomics, and a number of proteogenomics data analysis pipelines have been described for that purpose. Here we describe a novel proteogenomics workflow that covers everything from data acquisition to result visualization in the Konstanz Information Miner (KNIME), a state-of-the-art workflow management and data analytics platform. This workflow, named PGMiner, not only includes all data analysis steps but is also highly customizable, whereas customizing most existing pipelines is rather cumbersome. Moreover, no burdensome installation process is required, which makes PGMiner particularly user-friendly.


The current version of PGMiner groups its nodes into four main categories, listed below with the nodes that belong to each. Detailed descriptions of the nodes and their configuration are available in the Node Description panel of the KNIME platform.
  • Data retrieval
    • PeptideAtlas
  • Database processing
    • File operation: 3-frame or 6-frame translation of a list of databases
    • Database equalization: Creation of equalized databases
  • Peptide identification
    • MSGF+
    • OMSSA
    • XTandem
    • DatabaseSearchResultMerger: Merges into one result file the output of an algorithm that searched the same spectra file against multiple databases
    • FDR: False discovery rate calculation based on target-decoy approach
    • ConsensusDBSearchResult: Majority vote calculation
  • Peptide mapping
    • WuManber exact peptide sequence search
    • EnzymeChecker: Checks whether mapped peptide locations conform to the enzymatic cleavage rule
    • Annotation mapping: Maps peptide genomic locations to the annotation GFF file of the organism
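
As an illustration of the 3-frame and 6-frame translation listed under Database processing, the following is a minimal Python sketch, not PGMiner's own implementation. It assumes the standard genetic code, treats 3-frame translation as the three forward-strand frames only, and writes codons containing ambiguous bases as X. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Under the assumption above, a 3-frame translation would keep only the frames labelled with "+".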

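The target-decoy approach behind the FDR node can be sketched as follows. The exact formula PGMiner uses is not stated here. This sketch assumes the common estimate of decoy hits divided by target hits among the peptide-spectrum matches at or above a score threshold, with higher scores being better. All names are hypothetical.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among matches scoring at or above the threshold.

    psms is a list of (score, is_decoy) pairs; higher score means better.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    if targets == 0:
        # No target hits: call it 1.0 if only decoys passed, else 0.0.
        return 1.0 if decoys else 0.0
    return decoys / targets


def score_cutoff(psms, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is within
    max_fdr, or None if no threshold qualifies."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Assumed toy data: (search score, hit came from the decoy database)
    example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    print(target_decoy_fdr(example, 8))
    print(score_cutoff(example, max_fdr=0.3))
```

Matches above the returned cutoff would then be kept for the downstream consensus and mapping steps.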

    The current release of PGMiner runs on KNIME 3.0.1 Full Analytics, installed on Linux or Windows.
    The Java version must be 1.8.
    If the PGMiner update site is used, nothing needs to be installed other than Java 1.8 and KNIME 3.0.1 Full Analytics.
    PGMiner supports the latest versions of the MS search engines listed above (MSGF+, OMSSA and XTandem). These tools ship with PGMiner, so no separate installation is required.


    The PGMiner database search engine runners require configuration files, which can be found under the Downloads section.
    The parameter names correspond to the definitions used by each algorithm. For details on the settings, please see the algorithms' web pages.
    In addition, the makeblastdb tool of NCBI BLAST is available.
    The PGMiner search engine runners currently support only the .mgf (Mascot Generic Format) format. MS/MS data in other formats must therefore be converted to .mgf using the OpenMS FileConverter, which is already available in KNIME 3.0.1 Full Analytics.
    We recommend running PGMiner on a computer with at least four cores and at least 20 GB of RAM.
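
Because the search engine runners accept only MGF input, it can help to inspect a converted file before starting a search. The following is a minimal Python sketch of an MGF reader, independent of PGMiner and OpenMS. It assumes the usual layout of BEGIN IONS / END IONS blocks, with KEY=VALUE header lines followed by whitespace-separated m/z and intensity values. The function name is hypothetical.

```python
def read_mgf(lines):
    """Parse MGF text lines into a list of spectra.

    Each spectrum is a dict with 'params' (header fields as strings)
    and 'peaks' (a list of (m/z, intensity) float pairs).
    """
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append(
                        (float(fields[0]), float(fields[1]))
                    )
    return spectra


if __name__ == "__main__":
    # Assumed toy spectrum, not taken from the PGMiner sample data.
    example = """BEGIN IONS
TITLE=scan 1
PEPMASS=500.25
CHARGE=2+
100.1 200.0
150.2 50.5
END IONS
"""
    for spectrum in read_mgf(example.splitlines()):
        print(spectrum["params"].get("TITLE"), len(spectrum["peaks"]))
```

For a file on disk, the same function can be called on an open file handle, for example `read_mgf(open(path))`, where `path` points to the converted .mgf file.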


    Watch our video tutorial
    Download the Ubuntu VirtualBox image to get PGMiner pre-installed on KNIME
    Download configuration files
    Note: A ready-to-use workflow and sample data are available in the VirtualBox image. The PG session on the virtual machine is password-protected; the password is 1.
    The VirtualBox image will be updated when a new version of PGMiner is released.

    Please contact us if you encounter a problem. Support will be provided as soon as possible.