software

Random Jungle

 

Information on Random Jungle 
 

"The novel software package RJ implements all features of the reference implementation randomForest such as various tuning parameters, prediction of new datasets using previously grown forests, sample proximities and imputation. Commonly used measures are implemented, such as Gini importance, permutation importance and conditional importance measures. RJ additionally implements the variable backward elimination. When multiple CPU are available, RJ is able to perform RF on multiple CPUs simultaneously using multithreading and Message Passing Interface (MPI) parallelization."

-- Schwarz, D. F., König, I. R., Ziegler, A. (2010): On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics, 26(14), 1752-1758.

 

News - September 7th, 2011  

 

RJ VERSION 1.3.0 

FIX(ES):

- Prediction of regression random forests

- Naming of regression random forest FILEPREFIXNAME.importance outcome

- Set default targetpartionsize to 5 in case of regression mode (treetype = 3)

 

Random Jungle Versions   

 

Linux 32 Bit Version (Build 1.2.365) 

 

Linux 32 Bit MPI Version (Build 1.2.365) 

 

Linux 64 Bit Version (Build 1.3.0) 

 

Linux 64 Bit MPI Version (Build 1.3.0) 

CentOS 64 Bit Version (In progress)

 

Example data 

Please insert the example data in the folder /demo/input

  

Windows 32 Bit Version (Build 1.2.365)

 

Example data 

Please insert the example data in the folder /demo/input

  

Not supported yet

  

Support and help

 

 

Manual for all builds

 

Schwarz, D. F., König, I. R., Ziegler, A. (2010): On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics, 26(14), 1752-1758.

 

For help, visit our Google Group "randomjungle" and post your question.

 

Support by Jochen Kruppa (jochen.kruppa@imbs-luebeck.de)

 

EECI

EECI (effect estimates confidence intervals) is an Excel tool for estimating confidence intervals for a number of epidemiological effect measures. Download: EECI.xlsx

The program is based on the following publication: Ziegler, A. and König, I. R. (2010): A Statistical Approach to Genetic Epidemiology: Concepts and Applications. Second edition. Wiley-VCH: Weinheim.

Microsoft Office 2007 is required for using this tool. Only the bold numbers can be modified by the user.

abi2link

abi2link is designed to create linkage files from ABI genotype and phenotype files. Please see the example directory for a detailed file description.

Usage: ./abi2link ARGUMENTS

Currently known arguments:

 

--map <haldane|kosambi>      locus mapping function
--ped <file>                 pedigree file
--chr <file>                 chromosome description file
--trait <file>               trait file (optional)
--estimate <all|founder>     estimate allele frequencies from all individuals or from founders only (optional, default: all)
--prefix <name>              output file prefix (optional, default: abi2link)
-v, --version                print version information and exit
-h, --help                   print this text and exit
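
The two values accepted by --map refer to the standard Haldane and Kosambi map functions, which convert a genetic distance into a recombination fraction. The following Python sketch only illustrates these textbook formulas for a distance d given in Morgan; the function names are hypothetical, and how abi2link applies the chosen function internally is not documented here.

```python
import math


def haldane(d_morgan):
    """Recombination fraction under Haldane's map function (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))


def kosambi(d_morgan):
    """Recombination fraction under Kosambi's map function (partial interference)."""
    return 0.5 * math.tanh(2.0 * d_morgan)


if __name__ == "__main__":
    # Compare both functions for a few distances given in centiMorgan.
    for cm in (1, 10, 50):
        d = cm / 100.0
        print(f"{cm:>3} cM  haldane={haldane(d):.4f}  kosambi={kosambi(d):.4f}")
```

For short distances both functions return nearly the same recombination fraction; they diverge as the distance between loci grows.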

Copyright: Andreas Ziegler

Contact: ziegler@imbs.uni-luebeck.de

abi2link

abi2link - User Documentation

 

minsage

minsage (minimal sample size for genotypes) is designed to calculate the minimal number of genotypes that must be sampled to ensure that all alleles with a specified frequency at one locus are detected with a given confidence.

The program is based on the following publication:
Gregorius, H.-G. (1980) The probability of losing an allele when diploid genotypes are sampled. Biometrics, 36, 643-652.

minsage is started by typing "minsage".

Within the program, you are prompted to specify the following parameters:

  • allele frequency: minimum allele frequency a that is to be detected
  • confidence: confidence for detecting the allele
  • uniformly distributed alleles or biallelic markers: The allele can be set to be the less frequent allele of a biallelic marker. Otherwise, if neither the number of alleles nor the genotypic frequencies are known, alleles can be set to be uniformly distributed.

The output gives the minimal sample size N of genotypes needed to detect alleles of frequency a with the specified confidence. Results are given both for the case where Hardy-Weinberg equilibrium can be assumed and for the case where it cannot.
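
To give a feeling for the order of magnitude of N, the following Python sketch covers a simplified case: a single allele of frequency a that should be seen at least once. It is an illustration under stated assumptions and not the algorithm of minsage, which follows Gregorius (1980) and considers all alleles of a locus jointly, so its results may differ. Assumed here: under Hardy-Weinberg equilibrium a sampled genotype misses the allele with probability (1 - a)^2; without it, the worst case is that the allele occurs only in homozygotes, so a genotype misses it with probability 1 - a. The function name is hypothetical.

```python
import math


def min_genotypes_single_allele(a, confidence, hwe=True):
    """Smallest N such that one allele of frequency a is observed at least
    once among N sampled diploid genotypes with the given confidence.

    Simplified single-allele sketch, not the minsage algorithm.
    """
    if not (0.0 < a < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("a and confidence must lie strictly between 0 and 1")
    # Probability that one sampled genotype does not carry the allele.
    miss_one = (1.0 - a) ** 2 if hwe else (1.0 - a)
    # Solve miss_one ** N <= 1 - confidence for the smallest integer N.
    return math.ceil(math.log(1.0 - confidence) / math.log(miss_one))


if __name__ == "__main__":
    print(min_genotypes_single_allele(0.05, 0.95, hwe=True))
    print(min_genotypes_single_allele(0.05, 0.95, hwe=False))
```

Dropping the Hardy-Weinberg assumption roughly doubles the required sample size in this simplified setting, which is why both cases are worth reporting.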

minsage

minsage - User Documentation

Copyright: Andreas Ziegler
Contact: Inke.Koenig@imbs.uni-luebeck.de

GroupSeq

GroupSeq is designed to calculate sequential boundaries in R, with extended functionality compared with the FORTRAN program by Reboussin et al. (2000, Controlled Clinical Trials, 21: 190-207).

It is available from CRAN under http://cran.r-project.org/web/packages/GroupSeq/index.html.

Contact: Inke.Koenig@imbs.uni-luebeck.de

 

silcLOD

silcLOD (significance levels and critical LODs) is designed to calculate nominal significance levels and critical LOD scores depending on the length of the investigated region, number of chromosomes, and the cross-over rate. The global significance level as well as the precision of the calculation have to be specified.

The program is based on the following publication:
Lander, E., Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genetics, 11, 241-247.

silcLOD is started by typing "silcLOD".

Within the program, you are prompted to specify the following parameters:

  • Length of genomic region in Morgan: Length of the investigated region. The default is given by 33 (length of the human genome).
  • Number of chromosomes: Number of investigated chromosomes. The default is given by 23 (total number of human chromosomes).
  • Cross over rate: Total crossing over rate between the genotypes. For different mapping methods, the values for humans are given below (according to Lander and Kruglyak, 1995, Table 1). The default is given by 2.


    Mapping method                                  Cross over rate
    Lod score analysis                              1
    Allele sharing in sibs and half-sibs            2
    Allele sharing in grandparent-grandchildren     1
    Allele sharing in uncle-nephew                  5/2
    Allele sharing in first cousin                  8/3
    Allele sharing in first cousin, once removed    20/7
    Allele sharing in second cousin                 16/5
  • Global significance level: Desired global significance level for the investigation. The default is given by 0.05.
  • Precision: The precision sets the maximally allowed difference between the specified and the calculated global significance level. The default is given by 0.00000001.

At any stage, entering "?" gives help for specifying the parameters. The output can be saved or presented on screen only. The results give the nominal alpha for a single marker using an infinitely dense marker map, as well as the critical LOD scores for single markers using an infinitely dense marker map or maps assuming distances of 10cM, 5cM, 2cM, or 1cM.
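
The infinitely dense map case can be sketched in Python as follows. This is a minimal illustration and not the silcLOD source code. It assumes the approximation from Lander and Kruglyak (1995), mu(T) = [C + 2 * rho * G * T] * alpha(T), where C is the number of chromosomes, G the length in Morgan, rho the cross over rate, T = 2 * ln(10) * LOD, and alpha(T) the one-sided normal tail probability at sqrt(T). It further assumes that mu(T) is equated directly with the global significance level and that the threshold is found by bisection. Function names and the search interval are hypothetical, and finite marker maps (10cM, 5cM, 2cM, 1cM) are not covered.

```python
import math


def pointwise_alpha(t):
    """One-sided normal tail probability for a chi-square-scale threshold t = Z**2."""
    return 0.5 * math.erfc(math.sqrt(t / 2.0))


def genome_wide_rate(t, length_morgan=33.0, n_chrom=23, rho=2.0):
    """Approximate expected number of regions exceeding t on an infinitely dense map."""
    return (n_chrom + 2.0 * rho * length_morgan * t) * pointwise_alpha(t)


def critical_values(alpha=0.05, precision=1e-8, **region):
    """Return (nominal alpha, critical LOD) for the requested global level.

    Bisection stops once the calculated global level differs from the
    specified one by at most `precision`.
    """
    lo, hi = 0.0, 200.0  # assumed bracket: rate > alpha at lo, rate < alpha at hi
    mid = 0.5 * (lo + hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        rate = genome_wide_rate(mid, **region)
        if abs(rate - alpha) <= precision:
            break
        if rate > alpha:
            lo = mid  # threshold too lenient, move up
        else:
            hi = mid  # threshold too strict, move down
    return pointwise_alpha(mid), mid / (2.0 * math.log(10.0))


if __name__ == "__main__":
    for label, rho in (("Lod score analysis", 1.0), ("Sibs and half-sibs", 2.0)):
        nominal, lod = critical_values(rho=rho)
        print(f"{label}: nominal alpha={nominal:.2e}, critical LOD={lod:.2f}")
```

With the default length of 33 Morgan and 23 chromosomes, the sketch gives critical LOD scores close to the values of about 3.3 and 3.6 commonly quoted from Lander and Kruglyak (1995) for LOD score analysis and sib-pair allele sharing.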


silcLOD

silcLOD - User Documentation

 

Copyright: Andreas Ziegler
Contact: Inke.Koenig@imbs.uni-luebeck.de

GEESIZE

GEESIZE version 3.1 is designed to compute the minimum sample size in studies with correlated response data based on generalized estimating equations (GEE). Such correlated response data arise, e.g., in repeated measurement designs, family studies, or studies involving paired organs, such as ophthalmological studies.

GEESIZE is a SAS macro using SAS IML and has to be called from within a SAS program. Thus, the SAS IML module has to be licensed.

The program is based on the following publications:
Rochon, J. (1998) Application of GEE procedures for sample size calculations in repeated measures. Stat Med, 17, 1643-1658.

Dahmen, G., Rochon, J., König, I. R., Ziegler, A. (2004) Sample size calculations for controlled clinical trials using generalized estimating equations (GEE). Methods Inf Med, 43(5), 451-6.

The user might also be interested in:

Dahmen, G., Ziegler, A. (2004) Generalized estimating equations in controlled clinical trials: Hypotheses testing. Biom J, 46, 214-232.

Dahmen, G., Ziegler, A. (2006) Independence Estimating Equations for Controlled Clinical Trials with Small Sample Sizes. Methods Inf Med, 45, 430-4.

The documentation file explains how to use the macro.

The output comprises the minimal sample size required in each treatment group under the predefined parameter setting. A detailed definition of the output can be found in the documentation file.

View macro

GEESIZE SAS Macro

GEESIZE 3.1 - User Documentation

Examples

 

Copyright: Prof. Dr. Andreas Ziegler

Contact: ziegler@imbs.uni-luebeck.de

power.HE

power.HE is designed to calculate sample size and power for the Haseman-Elston method in linkage analyses for a quantitative trait.

power_HE_v1-2.r
