





"The novel software package RJ implements all features of the reference implementation randomForest such as various tuning parameters, prediction of new datasets using previously grown forests, sample proximities and imputation. Commonly used measures are implemented, such as Gini importance, permutation importance and conditional importance measures. RJ additionally implements the variable backward elimination. When multiple CPU are available, RJ is able to perform RF on multiple CPUs simultaneously using multithreading and Message Passing Interface (MPI) parallelization."
-- Schwarz, D. (2010): On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics 26 (14): 1752-1758
RJ VERSION 1.3.0
FIX(ES):
- Prediction of regression random forests
- Naming of regression random forest FILEPREFIXNAME.importance outcome
- Set default targetpartionsize to 5 in case of regression mode (treetype = 3)
CentOS 64-bit version (in progress)
Please insert the example data in the folder /demo/input
Not supported yet
For help, visit our randomjungle group and post your question.
Support by Jochen Kruppa (jochen.kruppa@imbs-luebeck.de)
EECI (effect estimates confidence intervals) is an Excel tool for estimating confidence intervals for a number of epidemiological effect measures. Download: EECI.xlsx
The program is based on the following publication: Ziegler, A. and König, I. R. (2010): A Statistical Approach to Genetic Epidemiology: Concepts and Applications. Second edition. Wiley-VCH: Weinheim.
Microsoft Office 2007 is required for using this tool. Only the bold numbers can be modified by the user.
abi2link creates linkage files from ABI genotype and phenotype files. Please see the example directory for a detailed file description.
Currently known arguments:
| --map <haldane|kosambi> | locus mapping function |
| --ped <file> | pedigree file |
| --chr <file> | chromosome description file |
| --trait <file> | trait file (optional) |
| --estimate <all|founder> | estimate allele frequencies from all individuals or from founders only (optional, default: all) |
| --prefix <name> | output file prefix (optional, default: abi2link) |
| -v, --version | print version information and exit |
| -h, --help | print this text and exit |
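As an illustration of how the arguments above combine, the following Python sketch only assembles a command line and does not run anything. It assumes that the executable is called `abi2link` and that `--map`, `--ped` and `--chr` are required because they are not marked optional; the helper name `build_abi2link_command` and all file names are hypothetical.

```python
from typing import List, Optional


def build_abi2link_command(
    map_function: str,
    ped_file: str,
    chr_file: str,
    trait_file: Optional[str] = None,
    estimate: Optional[str] = None,
    prefix: Optional[str] = None,
) -> List[str]:
    """Assemble an abi2link argument list from the documented options."""
    if map_function not in ("haldane", "kosambi"):
        raise ValueError("map_function must be 'haldane' or 'kosambi'")
    if estimate is not None and estimate not in ("all", "founder"):
        raise ValueError("estimate must be 'all' or 'founder'")

    # Executable name is an assumption; adjust to the local installation.
    cmd = ["abi2link", "--map", map_function, "--ped", ped_file, "--chr", chr_file]

    # Optional arguments are appended only when given, so the program's
    # own defaults (estimate: all, prefix: abi2link) stay in effect otherwise.
    if trait_file is not None:
        cmd += ["--trait", trait_file]
    if estimate is not None:
        cmd += ["--estimate", estimate]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    print(" ".join(build_abi2link_command("kosambi", "family.ped", "chr1.txt",
                                          estimate="founder", prefix="run1")))
```

The resulting list can be passed to `subprocess.run` once the actual executable name and input files are known.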
Copyright: Andreas Ziegler
Contact: ziegler@imbs.uni-luebeck.de
minsage (minimal sample size for genotypes) is designed to calculate the minimal sample size of genotypes required to ensure that all alleles with a specified frequency at one locus are detected with a given confidence.
The program is based on the following publication:
Gregorius, H.-G. (1980) The probability of losing an allele when diploid genotypes are sampled. Biometrics, 36, 643-652.
minsage is started by typing "minsage".
Within the program, you are prompted to specify the following parameters:
The output gives the minimal sample size N of genotypes needed to detect alleles of frequency a with the specified confidence. Results are reported for two cases: Hardy-Weinberg equilibrium can be assumed, and Hardy-Weinberg equilibrium cannot be assumed.
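To give a feeling for the two cases, here is a deliberately simplified Python sketch for a single allele. It is not the algorithm of minsage or the full result of Gregorius (1980), which deals with all alleles at a locus simultaneously. Under Hardy-Weinberg equilibrium a sampled genotype misses an allele of frequency a with probability (1 - a)^2; without that assumption the sketch uses the worst case in which all copies of the allele occur in homozygotes, so that a genotype misses it with probability 1 - a. The function name `min_genotypes` is hypothetical.

```python
import math


def min_genotypes(allele_freq: float, confidence: float, hwe: bool) -> int:
    """Smallest N such that one allele of the given frequency is observed
    at least once among N diploid genotypes with the given confidence."""
    if not 0.0 < allele_freq < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("allele_freq and confidence must lie strictly between 0 and 1")

    # Probability that one sampled genotype does NOT carry the allele.
    if hwe:
        # Hardy-Weinberg equilibrium: the two gene copies are independent.
        p_miss = (1.0 - allele_freq) ** 2
    else:
        # Worst case without HWE (simplifying assumption): all copies sit in
        # homozygotes, so only a fraction allele_freq of genotypes carries it.
        p_miss = 1.0 - allele_freq

    # Require p_miss ** N <= 1 - confidence and solve for the smallest integer N.
    n = math.ceil(math.log(1.0 - confidence) / math.log(p_miss))
    return max(n, 1)


if __name__ == "__main__":
    for hwe in (True, False):
        print("HWE assumed:", hwe, "-> N =", min_genotypes(0.05, 0.95, hwe))
```

In this simplified setting, dropping the equilibrium assumption roughly doubles the required number of genotypes.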
Copyright: Andreas Ziegler
Contact: Inke.Koenig@imbs.uni-luebeck.de
GroupSeq is designed to calculate sequential boundaries in R, with extended functionalities compared with the FORTRAN program by Reboussin et al. (2000, Controlled Clinical Trials, 21: 190-207).
It is available from CRAN under http://cran.r-project.org/web/packages/GroupSeq/index.html.
Contact: Inke.Koenig@imbs.uni-luebeck.de
silcLOD (significance levels and critical LODs) is designed to calculate nominal significance levels and critical LOD scores depending on the length of the investigated region, number of chromosomes, and the cross-over rate. The global significance level as well as the precision of the calculation have to be specified.
The program is based on the following publication:
Lander, E., Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genetics, 11, 241-247.
silcLOD is started by typing "silcLOD".
Within the program, you are prompted to specify the following parameters:
| Mapping method | Cross-over rate |
| --- | --- |
| Lod score analysis | 1 |
| Allele sharing in sibs and half-sibs | 2 |
| Allele sharing in grandparent-grandchildren | 1 |
| Allele sharing in uncle-nephew | 5/2 |
| Allele sharing in first cousin | 8/3 |
| Allele sharing in first cousin, once removed | 20/7 |
| Allele sharing in second cousin | 16/5 |
At any stage, entering "?" gives help on specifying the parameters. The output can be saved or only displayed on screen. The results give the nominal alpha for a single marker assuming an infinitely dense marker map, as well as the critical LOD scores for single markers assuming an infinitely dense marker map or maps with marker distances of 10 cM, 5 cM, 2 cM, or 1 cM.
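The dense-map case can be sketched in Python from the approximation in Lander and Kruglyak (1995), in which the expected number of regions exceeding a threshold T is mu(T) = [C + 2 * rho * G * T] * alpha(T), with C chromosomes, genome length G in Morgans, cross-over rate rho, T = 2 ln(10) * LOD, and alpha(T) the one-sided pointwise significance level. This is not the silcLOD source code: the function names are hypothetical, mu(T) is set equal to the global significance level (a small-alpha approximation), and the sparser 10 cM to 1 cM maps are not covered.

```python
import math


def pointwise_alpha(lod: float) -> float:
    """One-sided nominal significance level of a LOD score at a single point."""
    t = 2.0 * math.log(10.0) * lod          # chi-square scale, T = Z**2
    return 0.5 * math.erfc(math.sqrt(t / 2.0))  # P(Z > sqrt(T)) for standard normal Z


def genome_wide_rate(lod: float, n_chrom: int, genome_morgans: float,
                     crossover_rate: float) -> float:
    """Expected number of regions exceeding the LOD threshold (dense map)."""
    t = 2.0 * math.log(10.0) * lod
    return (n_chrom + 2.0 * crossover_rate * genome_morgans * t) * pointwise_alpha(lod)


def critical_lod(global_alpha: float, n_chrom: int, genome_morgans: float,
                 crossover_rate: float, tol: float = 1e-6) -> float:
    """Bisection for the LOD at which the genome-wide rate equals global_alpha."""
    lo, hi = 1.0, 20.0  # the rate decreases monotonically over this range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if genome_wide_rate(mid, n_chrom, genome_morgans, crossover_rate) > global_alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative human genome values: 23 chromosomes, about 33 Morgans.
    lod = critical_lod(0.05, 23, 33.0, 1.0)
    print("critical LOD:", round(lod, 2), "nominal alpha:", pointwise_alpha(lod))
```

The cross-over rate passed as `crossover_rate` corresponds to the values in the table above, for example 1 for lod score analysis or 2 for allele sharing in sibs.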
Copyright: Andreas Ziegler
Contact: Inke.Koenig@imbs.uni-luebeck.de
GEESIZE version 3.1 is designed to compute the minimum sample size in studies with correlated response data based on generalized estimating equations (GEE). Such correlated response data arise, e.g., in repeated measurement designs, family studies or studies involving paired organs, such as ophthalmological studies.
GEESIZE is a SAS macro using SAS IML, which has to be used within a SAS program. Thus, the SAS IML module has to be licensed.
The program is based on the following publications:
Rochon, J. (1998) Application of GEE procedures for sample size calculations in repeated measures. Stat Med, 17, 1643-1658
Dahmen, G., Rochon, J., König, I. R., Ziegler, A. (2004) Sample size calculations for controlled clinical trials using generalized estimating equations (GEE). Methods Inf Med, 43(5), 451-6
The user might also be interested in:
Dahmen, G., Ziegler, A. (2004) Generalized estimating equations in controlled clinical trials: Hypotheses testing. Biom J, 46, 214-232
Dahmen, G., Ziegler, A. (2006) Independence Estimating Equations for Controlled Clinical Trials with Small Sample Sizes. Methods Inf Med, 45, 430-4
The documentation file explains how to use the macro.
The output comprises the minimal sample size required in each treatment group under the predefined parameter setting. A detailed definition of the output can be found in the documentation file.
GEESIZE 3.1 - User Documentation
Copyright: Prof. Dr. Andreas Ziegler
Contact: ziegler@imbs.uni-luebeck.de
power.HE is designed to calculate sample size and power for the Haseman-Elston method in linkage analyses for a quantitative trait.

