Commit 9e81940a authored by LANORE Vincent

Merge branch 'master' into get_fixed_seeds

parents 7a6deec3 0972a129
# Convergence
Convergence is a software package for the genome-wide detection of molecular signatures of a given phenotypic convergence.
The C source code of the software is provided in the directory 'src'. The directory 'data' contains a small dataset. The directory 'results' holds an example of the results obtained. Each directory contains a 'README.txt' file.
The directory 'data' contains:
- file 'character.csv' contains the table giving the character "echolocation" for all extant species of the dataset,
- file 'phylogeny.nwk' contains the phylogeny,
- directory 'alignments' contains a dozen alignments from the dataset,
- file 'GO_audition.txt' contains the GO terms related to audition,
- file 'GO_background.txt' contains the list of genes used as background for the enrichment tests,
- file 'GO_description.txt' associates each GO identifier with a short description,
- file 'GO_table.csv' associates each gene identifier with the GO identifiers that annotate it,
- file 'GO_table_audition.csv' associates each gene identifier with 'audition' if it is annotated with at least one GO term related to audition.
The first three items are from the dataset of
Thomas, G. W. and Hahn, M. W. (2015). Determining the null model for detecting adaptive convergence from genomic data: a case study using echolocating mammals. Molecular biology and evolution, 32(5), 1232–1236,
and are provided courtesy of the authors.
>Elephant
TSTEVFYKAGSKFVNGVAIHSMDFDDTWHPATHVLTALSEALPSGLDLLLAFSVGIEVQG
RLMHFSKEANDIPKRFHLGSAAAASKYLGLSEALAIAVSHAGAPIANAATQTKPLHIGNA
ARHGIEAAFLAMLGLQGNKQILDMEAGFGAFYTWDVAFKRFPADAAASVRKHLVLRIPDV
QYVNRPEARHSFQYVACAMLLDGGRRKVELEYPPTLYCEVSVTLKRSSTFYGHWRNPLSQ
KVEGLIRIVEKLEDIEDCSVLTTLLKG
>Alpaca
TNTEVFHKTQSKFVNGVAIHSMDFDDTWYPATHVLMALSEALPSGLDLLLAFNVGIEVQG
RLMHFSKEANDIPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPMANAATQTKPLHIGNA
ARHGIEAVFLAMLGLQGNKQVLDMETGFGAFYTWDVAFKVFPADAAASVRRHLVLRIPDV
QYVNRPEARHSFQYVACATLLDGARGKVELEHPRTLYCEVSVVLKRCDTFYGHWRKPLSQ
KVERLIELVENLEVVEDCSVLTALLKE
>Megabat
TSTEVFHKATSKFVNGVAIHSMDFDDTWHPATHVLMALSEALPSGLDLLLAFNVGIEVQG
RLMNFSKEANNIPKRFHLGSAAAASKILGLSEALAIAVSHAGAPMANAATQTKPLHIGNA
ARHGMEAAFLAMLGLQGNKQVLDMQAGFGAFYTWDVAFKRFPADAAASVRKHLVLRIPDV
QYVNRPEARHSFQYVACAMLLDGNRGKVETEHPQTLYCEISVTLKRSDTFYGHWRKPLSQ
KVERFIETVENLEDLEDCSLLATLLKG
>Human
TTTEVFHIASSKFVNGVAIHSMDFDDTWHPATHVLTALAEALPSGLDLLLAFNVGIEVQG
RLLHFAKEANDMPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPMANAATQTKPLHIGNA
AKHGIEAAFLAMLGLQGNKQVLDLEAGFGAFYSWDVAFKRFPADAAASVRKHLVLRIPNV
QYVNRPEARHSFQYVACAMLLDGGRSKVELEYPPILYCEISVTLKRSDTFYGHWRKPLSQ
EVESLIKIVKNLEDLEDCSVLTTLLKG
>Microbat
TSTEVFHKANSKFVNGVAIHSMDFDDTWYPATHVLMALSEALPSGLDLLLAFNVGIEVQG
RLMNFSKDANDIPKRFHLGSAAAASKILGLSKALAIAVSHAGAPMANAATQTKPLHMGNA
ARHGIEAVFLAMLGLQGNKQVLDTQTGFGAFYTWDVAFKSFPADAAASVRKHLVLRIPDV
QYVNRPEARHSFQYVACAMLLDGGRCKVELEHPQTLYCEISVTLKRADTFYGHWRKPLSQ
KVERLIKIVDNLEGLEDCSVLTSLLKG
>Mouse
TGTEVFHKVTSKFVNGVAVHSMDFDDTWHPATHVLTALSEALPSGLDLLLAFNVGIEVQG
RLMHFSKEAKDIPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPIANAATQTKPLHIGNA
AKHGMEATFLAMLGLQGNKQILDLGSGFGAFYIWDVAFKSFPADAAAAVRKHLVLRIPDV
QYVNRPEARHSFQYVACASLLDGSRKKVKLEHPPTLYCEISITLKRSDTFYGHWRKPLSQ
EVESLITVVEKLEDLEDCSVLTRLLKG
>Dolphin
TSTEVFQKARSKFVNGVAVHSMDFDDTWYPATHVLMALSEALPSGLDLLLAFNVGIEVQG
RLMHFSKEAKDIPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPMANAATQTKPLHVGNA
ARHGLEAAFLAMLGLQGNKQVLDMESGFGAFYTWDVAFKLFPADAAASVRKQLVLRIPDV
QYVNRPEARHSFQYVACAMLLDGARGKVELEHPQTLYCEISVALKRSDTFYGHWRKPLSQ
KVERLIEIVENLEDLEDCSVLTALLKE
>Marmoset
TSTEVFHKASSKFVNGVAIHSMDFDDTWHPATHVLTALAEALPSGLDLLLAFNVGIEVQG
QLLHFAKEANDIPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPMANAATQTKPLHIGNA
AKHGIEAAFLAMLGLQGNKQILDLEAGLGAFYSWDVAFKRFPADAAASVRKHLVLRIPKV
QYVNRPEARHSFQYVACAMLLDGGRNKVELEYPPTLYCEISVTLKRSDTFYGHWRKPLSQ
EVESLIKIVKNLEDLEDCSVLTTLLKG
>Cow
TSTEVFHKTSSKFVNGVAIHSMDFDDTWHPATHVLMALSEALPSGLDLLLAFNVGIEVQG
RLLRFSKEAYDIPKRFHLGSAAAASKFLGLSEALAIAVSHAGAPMANAATQTKPLHVGNA
ARHGLEAAFLAMLGLQGNKRVLDLETGFGAFYTWDVAFKRFPADTAASMRRHLVLRIPDV
QYVNRPEARHSFQYVACAMLLDGVRGKVELEHPRTLYCEMSVALKRSDTFYGHWRKPLSQ
KVERLIEIVENLEDLEDCSVLTTLLKE
Elephant 0
Alpaca 0
Megabat 0
Human 0
Microbat 1
Mouse 0
Dolphin 1
Marmoset 0
Cow 0
((((Microbat:0.152206675,Megabat:0.118463325)13:0.02192845312,((Dolphin:0.07100760714,Cow:0.1197643929)15:0.01822872917,Alpaca:0.1198712708)14:0.03523542188)12:0.02628360938,((Marmoset:0.0667873125,Human:0.0427596875)17:0.08444453125,Mouse:0.3446139688)16:0.008101671875)11:0.005608828125,Elephant:0.1680970625)10;
/*
'msd' detects molecular signatures of phenotypic convergence /
'enr' computes GO terms enrichments of a list of genes /
Copyright (C) 2017 Gilles DIDIER
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "AATree.h"
#define SIZE_BUFFER 100
static int skewAATree(TypeAATree *t, int n);
static int splitAATree(TypeAATree *t, int n);
static int searchAATreeRec(TypeAATree *t, double val, int n);
static int searchLowerAATreeRec(TypeAATree *t, double val, int last, int n);
static int searchUpperAATreeRec(TypeAATree *t, double val, int last, int n);
static int insertAATreeRec(TypeAATree *t, double val, int n);
static void setTableIndexRec(double *tab, TypeAATree *t, int *cur, int n);
TypeAATree *newAATree(int initSize) {
TypeAATree *t = (TypeAATree*)malloc(sizeof(TypeAATree));
if(t == NULL)
return NULL;
if(initSize <= 0)
initSize = SIZE_BUFFER;
t->buffer = initSize;
t->node = (TypeAANode*) malloc((t->buffer+1)*sizeof(TypeAANode));
if(t->node == NULL) {
free(t);
return NULL;
}
t->node[0].lv = 0;
t->node[0].l = -1;
t->node[0].r = -1;
t->node[0].index = NO_NODE_AATREE;
t->node++;
t->nullnode = -1;
t->root = -1;
t->deleted = NO_NODE_AATREE;
t->size = 0;
return t;
}
void freeAATree(TypeAATree *t) {
assert(t && t->node);
t->node--;
free((void*) t->node);
free((void*) t);
}
int skewAATree(TypeAATree *t, int n) {
assert(t && t->node && n<t->size);
if(t->node[n].lv != t->node[t->node[n].l].lv)
return n;
int left = t->node[n].l;
t->node[n].l = t->node[left].r;
t->node[left].r = n;
return left;
}
int splitAATree(TypeAATree *t, int n) {
assert(t && t->node && n<t->size);
if(t->node[t->node[t->node[n].r].r].lv != t->node[n].lv)
return n;
int right = t->node[n].r;
t->node[n].r = t->node[right].l;
t->node[right].l = n;
t->node[right].lv++;
return right;
}
int searchAATreeRec(TypeAATree *t, double val, int n) {
if(n == t->nullnode)
return NO_NODE_AATREE;
else {
if(t->node[n].val > val)
return searchAATreeRec(t, val, t->node[n].l);
else {
if(t->node[n].val < val)
return searchAATreeRec(t, val, t->node[n].r);
else
return n;
}
}
}
int searchAATree(TypeAATree *t, double val) {
return searchAATreeRec(t, val, t->root);
}
int searchLowerAATreeRec(TypeAATree *t, double val, int last, int n) {
if(n == t->nullnode)
return last;
else {
if(t->node[n].val > val)
return searchLowerAATreeRec(t, val, last, t->node[n].l);
else {
if(t->node[n].val < val)
return searchLowerAATreeRec(t, val, n, t->node[n].r);
else
return n;
}
}
}
int searchLowerAATree(TypeAATree *t, double val) {
return searchLowerAATreeRec(t, val, NO_NODE_AATREE, t->root);
}
int searchUpperAATreeRec(TypeAATree *t, double val, int last, int n) {
if(n == t->nullnode)
return last;
else {
if(t->node[n].val > val)
return searchUpperAATreeRec(t, val, n, t->node[n].l);
else {
if(t->node[n].val < val)
return searchUpperAATreeRec(t, val, last, t->node[n].r);
else
return n;
}
}
}
int searchUpperAATree(TypeAATree *t, double val) {
return searchUpperAATreeRec(t, val, NO_NODE_AATREE, t->root);
}
int insertAATreeRec(TypeAATree *t, const double val, int n) {
if(n == t->nullnode) {
if(t->size >= t->buffer) {
/* keep one extra slot for the sentinel stored at index -1, as in newAATree */
t->buffer += SIZE_BUFFER;
t->node--;
t->node = (TypeAANode*) realloc((void*)t->node, (t->buffer+1)*sizeof(TypeAANode));
t->node++;
}
t->node[t->size].val = val;
t->node[t->size].lv = 1;
t->node[t->size].index = NO_NODE_AATREE;
t->node[t->size].l = t->nullnode;
t->node[t->size].r = t->nullnode;
t->size++;
return t->size-1;
}
if(val < t->node[n].val)
t->node[n].l = insertAATreeRec(t, val, t->node[n].l);
else {
if(val > t->node[n].val)
t->node[n].r = insertAATreeRec(t, val, t->node[n].r);
else
return n;
}
n = skewAATree(t, n);
n = splitAATree(t, n);
return n;
}
void insertAATree(TypeAATree *t, const double val) {
t->root = insertAATreeRec(t, val, t->root);
}
void fprintAATree(FILE *f, TypeAATree *t) {
int i;
fprintf(f, "size %d\troot %d\n", t->size, t->root);
for(i=0; i<t->size; i++)
fprintf(f, "node %d/%.2le l %d r %d (%d)\n", i, t->node[i].val, t->node[i].l, t->node[i].r, t->node[i].index);
}
void setTableIndexRec(double *tab, TypeAATree *t, int *index, int n) {
if(n == t->nullnode)
return;
setTableIndexRec(tab, t, index, t->node[n].r);
t->node[n].index = *index;
tab[(*index)++] = t->node[n].val;
setTableIndexRec(tab, t, index, t->node[n].l);
}
void setTableIndex(double *tab, TypeAATree *t) {
int index = 0;
setTableIndexRec(tab, t, &index, t->root);
}
/*
int removeAATreeRec(TypeAATree *t, const double val, int n) {
if(n == t->nullnode)
return n;
t->last = n;
if(val < t->node[n].val)
t->node[n].l = removeAATreeRec(t, val, t->node[n].l);
else {
t->deleted = n;
t->node[n].r = removeAATreeRec(t, val, t->node[n].r);
}
if(n == t->last && t->deleted != t->nullnode && val == t->node[t->deleted]val) {
t->node[t->deleted].val = t->node[n].val;
t->deleted = t->nullnode;
n = t->node[n].r;
free(t->last->val);
free(t->last);
t->num_entries--;
} else if(t->node[n].l->lv < t->node[n].lv-1 || t->node[n].r->lv < t->node[n].lv-1) {
t->node[n].lv--;
if(t->node[n].r->lv > t->node[n].lv) t->node[n].r->lv = t->node[n].lv;
n = aa_skew(n);
t->node[n].r = aa_skew(t->node[n].r);
t->node[n].r->r = aa_skew(t->node[n].r->r);
n = aa_split(n);
t->node[n].r = aa_split(t->node[n].r);
}
return n;
}
void removeAATree(TypeAATree *t, const double val) {
t->root = removeAATreeRec(t, val, t->root);
}
static void aa_foreach_(const aanode *nn, const aanode *n, AAForeach cb, void *arg) {
if(n == nn) return;
(*cb)(n, arg);
aa_foreach_(nn, t->node[n].l, cb, arg);
aa_foreach_(nn, t->node[n].r, cb, arg);
}
void aa_foreach(const aat *t, AAForeach callback, void *arg) {
aa_foreach_(t->nullnode, t->root, callback, arg);
}
static void aa_map_(const aanode *nn, aanode *n, AAMap cb) {
if(n == nn) return;
void *tmp = (*cb)(n);
free(t->node[n].val);
t->node[n].val = tmp;
aa_map_(nn, t->node[n].l, cb);
aa_map_(nn, t->node[n].r, cb);
}
void aa_map(const aat *t, AAMap callback) {
aa_map_(t->nullnode, t->root, callback);
}
static aanode* aa_first_(aanode *nn, aanode *n) {
if(n == nn) return nn;
else if(t->node[n].l == nn) return n;
else return aa_first_(nn, t->node[n].l);
}
aanode* aa_first(const aat *t) {
return aa_first_(t->nullnode, t->root);
}
static aanode* aa_last_(aanode *nn, aanode *n) {
if(n == nn) return nn;
else if(t->node[n].r == nn) return n;
else return aa_last_(nn, t->node[n].r);
}
aanode* aa_last(const aat *t) {
return aa_last_(t->nullnode, t->root);
}
*/
/*
'msd' detects molecular signatures of phenotypic convergence /
'enr' computes GO terms enrichments of a list of genes /
Copyright (C) 2017 Gilles DIDIER
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef AATreeF
#define AATreeF
#include <stdio.h>
#include <limits.h>
#define NO_NODE_AATREE INT_MAX
typedef struct AA_NODE {
double val;
int lv, l, r, index;
} TypeAANode;
typedef struct {
int root, nullnode, deleted, last, size, buffer;
TypeAANode *node;
} TypeAATree;
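// create a new empty AA tree with room for initSize nodes (a default size is used if initSize <= 0); return NULL on allocation failure
TypeAATree *newAATree(int initSize);
// delete the AA tree (release all resources)
void freeAATree(TypeAATree *t);
// search for val in the tree; return the index of its node, or NO_NODE_AATREE if absent
int searchAATree(TypeAATree *t, double val);
// return the index of the node with the greatest value <= val, or NO_NODE_AATREE if there is none
int searchLowerAATree(TypeAATree *t, double val);
// return the index of the node with the smallest value >= val, or NO_NODE_AATREE if there is none
int searchUpperAATree(TypeAATree *t, double val);
// insert val into the tree (no effect if val is already present)
void insertAATree(TypeAATree *t, const double val);
// print the size, the root and all the nodes of the tree to f
void fprintAATree(FILE *f, TypeAATree *t);
// store the values in decreasing order in tab (which must have room for t->size entries) and record the rank of each node in its 'index' field
void setTableIndex(double *tab, TypeAATree *t);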
#endif
# list of executable files to produce
MSD = msd
ENR = enr
# .o files necessary to build the executables
OBJ_MSD = AATree.o EvolutionModel.o EvolutionModelProt.o EvolutionModelProtStored.o Utils.o Tree.o NLOpt.o Alignment.o ConvergenceExpectation.o ColumnLikelihood.o DensityUtils.o AlignmentLikelihood.o MolSigDetect.o Character.o
OBJ_ENR = StatAnnotation.o Annotation.o Utils.o Enrichment.o
########### MODIFY ONLY TO CHANGE OPTIONS ############
# compiler and its options
CC = gcc
CFLAGS = -I/usr/local/include -I/usr/include -Wall -Wno-char-subscripts -D_POSIX_SOURCE -std=c99 -Wall -pedantic -O3
# linker and its options
LD = $(CC)
LDFLAGS = -rdynamic -lm -lnlopt -lgsl -lgslcblas
LDFLAGS_TH = -rdynamic -lm -lnlopt -lgsl -lgslcblas -D_REENTRANT -L/usr/local/lib -L/usr/lib/ -lpthread
############ LIST OF EXECUTABLE TARGETS (MODIFY ONLY TO ADD AN EXECUTABLE) ##############
all: Makefile.d $(MSD) $(ENR)
# build the 'msd' executable
$(MSD): $(OBJ_MSD)
	$(LD) $^ -o $@ $(LDFLAGS_TH)
# build the 'enr' executable
$(ENR): $(OBJ_ENR)
	$(LD) $^ -o $@ $(LDFLAGS)
############ DO NOT MODIFY ANYTHING BELOW THIS LINE ##############
# create .o from .c
.c.o:
	$(CC) $(CFLAGS) -c $<
# remove non essential files
clean:
	$(RM) *.o *~ *.log Makefile.d
# clean everything but sources
distclean: clean
	$(RM) $(MSD) $(ENR)
# dependencies
Makefile.d:
	$(CC) -MM $(CFLAGS) *.c > Makefile.d
# only real files can be non phony targets
.PHONY: all clean distclean debug release
-include Makefile.d
/*
'msd' detects molecular signatures of phenotypic convergence /
'enr' computes GO terms enrichments of a list of genes /
Copyright (C) 2017 Gilles DIDIER
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef AlignmentF
#define AlignmentF
#include <stdlib.h>
#include <stdio.h>
#define EMPTY 255
#define EMPTY_CAR '-'
#define WHATEVER_CAR_DNA 'N'
#define WHATEVER_CAR_PROT 'X'
#define TYPE_ALIGNMENT_PROTEIN 'p'
#define TYPE_ALIGNMENT_DNA 'd'
#define TYPE_ALIGNMENT_RNA 'r'
#define DNA "ACGT" /*"ACGT" + "YRMKWSBDHVN"*/
#define RNA "ACGUYRMKWSBDHVN" /*"ACGU" + "YRMKWSBDHVN"*/
#define PRO "ARDNCEQGHILKMFPSTWYV"
typedef int TypePosition;
typedef int TypeSymbol;
typedef int TypeNumber;
typedef enum TAF {
fasta=0,
clustal,
msf,
markx,
srs,
unknown
} TypeAlignmentFile;
typedef struct ALIGNMENT {
TypeNumber number;
TypePosition size;
TypeSymbol **sequence, empty, cardinal, whatever;
char **name, *table;
} TypeAlignment;
TypeAlignment *readAlignment(FILE *f, char typeAlphabet);
TypeAlignment *readAlignmentFasta(FILE *f, char *table, int canInc);
TypeAlignment *readAlignmentMsf(FILE *f, char *table, int canInc);
TypeAlignment *readAlignmentClustal(FILE *f, char *table, int canInc);
void printAlignmentFasta(FILE *f, TypeAlignment *al, int sizeLine);
void printAlignmentMsf(FILE *f, TypeAlignment *al, int sizeLine);
void printAlignmentSrs(FILE *f, TypeAlignment *al, int sizeLine);
void printAlignmentMarkX(FILE *f, TypeAlignment *al, int sizeLine);
void printAlignmentTex(FILE *f, TypeAlignment *al, int sizeLine);
void printAlignmentTexBis(FILE *f, TypeAlignment *al, int sizeLine);
void printHeadPair(FILE *f, TypeAlignment *al);
void printHeadMulti(FILE *f, TypeAlignment *al);
void freeAlignment(TypeAlignment *alignment);
void purgeAlignment(TypeAlignment *al);
#endif
/*
'msd' detects molecular signatures of phenotypic convergence /
'enr' computes GO terms enrichments of a list of genes /
Copyright (C) 2017 Gilles DIDIER
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <stdlib.h>
#include <math.h>
#include <nlopt.h>
#include "AlignmentLikelihood.h"
#include "ColumnLikelihood.h"
#include "EvolutionModelProt.h"
#include "EvolutionModelProtStored.h"
#include "NLOpt.h"
typedef struct OPT_DATA {