<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE knimeNode PUBLIC "-//UNIKN//DTD KNIME Node 2.0//EN" "http://www.knime.org/Node.dtd">
<knimeNode icon="generic_node.png" type="Manipulator">
    <name>TopPerc</name>
    
    <shortDescription>
        Prepares input for Percolator and reintegrates its results.
    </shortDescription>
    
    <fullDescription>
        <intro><p>Prepares input for Percolator and reintegrates its results.</p>
		<p>
			<a href="http://ftp.mi.fu-berlin.de/OpenMS/release-documentation/html/UTILS_TopPerc.html">Web Documentation for TopPerc</a>
		</p>

        </intro>
        
		<option name="version">Version of the tool that generated this parameters file.</option>
		<option name="enzyme">Type of enzyme: no_enzyme, elastase, pepsin, proteinasek, thermolysin, chymotrypsin, lys-n, lys-c, arg-c, asp-n, glu-c, trypsin.</option>
		<option name="e">Read XML input format (pin) from standard input.</option>
		<option name="Z">Include decoys (PSMs, peptides and/or proteins) in the xml-output. Only available if -X is used.</option>
		<option name="p">Cpos, penalty for mistakes made on positive examples. Set by cross validation if not specified.</option>
		<option name="n">Cneg, penalty for mistakes made on negative examples. Set by cross validation if not specified.</option>
		<option name="F">False discovery rate threshold to define positive examples in training. Set by cross validation if 0. Default is 0.01.</option>
		<option name="t">False discovery rate threshold for evaluating best cross validation result and the reported end result. Default is 0.01.</option>
		<option name="i">Maximal number of iterations</option>
		<option name="x">Quicker execution by reduced internal cross-validation.</option>
		<option name="f">Fraction of the negative data set to be used as the training set when only one negative set is provided; the remaining examples are used as the test set. Default is 0.6.</option>
		<option name="V">Name of the most informative feature. It can be negated to indicate that a lower value is better.</option>
		<option name="v">Set verbosity of output: 0=no processing info, 5=all. Default is 2.</option>
		<option name="u">Use unit normalization [0-1] instead of standard deviation normalization</option>
		<option name="R">Measure performance on test set each iteration</option>
		<option name="O">Override error check and do not fall back on default score vector in case of suspect score vector</option>
		<option name="S">Seed of the random number generator. Default is 1.</option>
		<option name="K">Retention time features calculated as in Klammer et al.</option>
		<option name="D">Include description of correct features</option>
		<option name="U">Do not remove redundant peptides; keep all PSMs and exclude peptide-level probabilities.</option>
		<option name="s">Skip validation of the input file against the XML schema.</option>
		<option name="A">Output protein-level probabilities.</option>
		<option name="a">Probability with which a present protein emits an associated peptide (to be used jointly with the -A option). Set by grid search if not specified.</option>
		<option name="b">Probability of the creation of a peptide from noise (to be used jointly with the -A option). Set by grid search if not specified.</option>
		<option name="G">Prior probability that a protein is present in the sample (to be used with the -A option). Set by grid search if not specified.</option>
		<option name="g">Treat ties as if they were one protein (only valid if option -A is active).</option>
		<option name="I">Use the pi_0 value when calculating empirical q-values (no effect if option -Q is active; only valid if option -A is active).</option>
		<option name="q">Output empirical q-values and p-values from target-decoy analysis (only valid if option -A is active).</option>
		<option name="N">Deactivates the grouping of proteins with similar connectivity. For example, if proteins P1 and P2 are matched by the same peptides, P1 and P2 will not be grouped as one protein (only valid if option -A is active).</option>
		<option name="E">The protein graph will not be separated into subgraphs (only valid if option -A is active).</option>
		<option name="C">Do not prune peptides with a very low score (~0.0). When such a peptide matches two proteins, pruning it duplicates the peptide to generate two new protein groups (only valid if option -A is active).</option>
		<option name="d">Depth of the grid search used to estimate the alpha, beta and gamma parameters for Fido: 0, 1 or 2, from low depth to high depth (less computational time). Only valid if option -A is active. Default is 0.</option>
		<option name="P">Text pattern that identifies the decoy proteins and/or PSMs. Set this if the label that identifies the decoys in the database is not the default (default: random). Only valid if option -A is active.</option>
		<option name="T">Reduce the tree of proteins (removing low-scoring proteins) in order to estimate alpha, beta and gamma faster (only valid if option -A is active).</option>
		<option name="Y">Use target-decoy competition to compute peptide probabilities (recommended when using -A).</option>
		<option name="H">Q-value threshold used in the computation of the MSE and ROC AUC score in the grid search (recommended: 0.05 for normal-sized data sets and 0.1 for large data sets). Only valid if option -A is active.</option>
		<option name="fido-truncation">Proteins with a very low score (&lt; 0.001) will be truncated, that is, assigned a probability of 0.0 (only valid if option -A is active).</option>
		<option name="Q">Use protein-group-level inference: each cluster of proteins is either present or not, so when grouping proteins, all possible combinations for each group are discarded (only valid if option -A is active and -N is inactive).</option>
		<option name="log">Name of log file (created only when specified)</option>
		<option name="debug">Sets the debug level</option>
		<option name="threads">Sets the number of threads allowed to be used by the TOPP tool</option>
		<option name="no_progress">Disables progress logging to command line</option>
		<option name="force">Overrides tool-specific checks.</option>
		<option name="test">Enables the test mode (needed for internal use only)</option>

    </fullDescription>
    
    <ports>
		<inPort index="0" name="percolator_executable []">Path to the percolator binary []</inPort>
		<inPort index="1" name="in_target [mzid]">Input target file [mzid]</inPort>
		<inPort index="2" name="in_decoy [mzid]">Input decoy file [mzid]</inPort>
		<inPort index="3" name="k []">Input file given in the deprecated pin-xml format generated by e.g. sqt2pin with the -k option [,opt.]</inPort>
		<inPort index="4" name="W []">Read initial weights from the given file [,opt.]</inPort>
		<outPort index="0" name="out []">Output file []</outPort>
		<outPort index="1" name="X [Inactive]">Path to file in XML output format (pout). Default is pout.tab [Inactive]</outPort>
		<outPort index="2" name="J [Inactive]">Output the computed features to the given file in tab-delimited format. A file with the given name will be created [Inactive]</outPort>
		<outPort index="3" name="w [Inactive]">Output final weights to the given file [Inactive]</outPort>
		<outPort index="4" name="B [Inactive]">Output tab delimited results for decoys into a file [Inactive]</outPort>
    </ports>
    <views>
        <view index="0" name="TopPerc Std Output">The text sent to standard out during the execution of TopPerc.</view>
        <view index="1" name="TopPerc Error Output">The text sent to standard error during the execution of TopPerc. (If it appears in gray, it is the output of a previously failed run, preserved for troubleshooting.)</view>
    </views>    
</knimeNode>
