Commit 3e25eb7a authored by LANORE Vincent

Started working on generate_pairs.py script to generate sorted profile pairs.

parent 172cee79
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Copyright or Copr. Centre National de la Recherche Scientifique (CNRS) (2018)
# Contributors:
# - Vincent Lanore <vincent.lanore@gmail.com>
# This software is a computer program whose purpose is to provide a set of scripts for pre and post processing of data for
# convergence detection programs.
# This software is governed by the CeCILL-C license under French law and abiding by the rules of distribution of free software.
# You can use, modify and/ or redistribute the software under the terms of the CeCILL-C license as circulated by CEA, CNRS and
# INRIA at the following URL "http://www.cecill.info".
# As a counterpart to the access to the source code and rights to copy, modify and redistribute granted by the license, users
# are provided only with a limited warranty and the software's author, the holder of the economic rights, and the successive
# licensors have only limited liability.
# In this respect, the user's attention is drawn to the risks associated with loading, using, modifying and/or developing or
# reproducing the software by the user in light of its specific status of free software, that may mean that it is complicated
# to manipulate, and that also therefore means that it is reserved for developers and experienced professionals having in-depth
# computer knowledge. Users are therefore encouraged to load and test the software's suitability as regards their requirements
# in conditions enabling the security of their systems and/or data to be ensured and, more generally, to use and operate it in
# the same conditions as regards security.
# The fact that you are presently reading this means that you have had knowledge of the CeCILL-C license and that you accept
# its terms.
"""Script that reads amino-acid profiles from a file, draws random pairs among them,
and outputs said pairs sorted into distance classes.

Usage:
  generate_pairs.py [options...] -o <output-file> <profiles-file>

Positional arguments:
  profiles-file                 the file to read profiles from

Options:
  -h, --help                    show this help message and exit
  -o, --output-file <filename>
                                output file"""
from diffsel_script_utils import *
#===================================================================================================
STEP("Parsing command line arguments")
from docopt import docopt
args = docopt(__doc__)
profiles_file = args["<profiles-file>"]
MESSAGE("Profile file is " + param(profiles_file))
out = args["--output-file"]
MESSAGE("Output file is " + param(out))
#===================================================================================================
STEP("Reading profiles from file")
import pandas as pd
MESSAGE("Reading tsv file")
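# Read the headerless table (fields split on a single space), then transpose it
# so that the rows of the file become the columns of the DataFrame.
profiles = pd.read_csv(profiles_file, sep=" ", header=None).transpose()
# Debug output: dimensions and contents of the transposed table.
print(profiles.shape)
print(profiles)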
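
The script in this commit stops once the profiles are loaded. The step announced in the docstring (drawing random pairs and sorting them into distance classes) could be sketched roughly as follows. Everything here is an assumption made for illustration and is not taken from the commit: the function name `draw_pairs_by_distance`, its parameters, the one-profile-per-row layout, the Euclidean distance, and the equal-width distance classes.

```python
import numpy as np
import pandas as pd


def draw_pairs_by_distance(profiles, n_pairs, n_classes, seed=0):
    """Draw random pairs of profiles and sort them into distance classes.

    Hypothetical helper. Assumes `profiles` holds one profile per row,
    uses Euclidean distance, and splits the observed distance range into
    `n_classes` equal-width bins. Requires at least two profiles.
    """
    rng = np.random.default_rng(seed)
    values = profiles.to_numpy(dtype=float)
    n = values.shape[0]

    # Index of the first member of each pair, drawn uniformly at random.
    first = rng.integers(0, n, size=n_pairs)
    # Adding an offset in [1, n-1] modulo n guarantees second != first.
    second = (first + rng.integers(1, n, size=n_pairs)) % n

    # Euclidean distance between the two profiles of each pair.
    distance = np.linalg.norm(values[first] - values[second], axis=1)

    pairs = pd.DataFrame({"first": first, "second": second, "distance": distance})
    # labels=False makes pd.cut return integer bin codes 0 .. n_classes-1.
    pairs["distance_class"] = pd.cut(pairs["distance"], bins=n_classes, labels=False)
    return pairs.sort_values(["distance_class", "distance"]).reset_index(drop=True)
```

With the `profiles` DataFrame read above, a call such as `draw_pairs_by_distance(profiles, n_pairs=1000, n_classes=10)` would return one row per pair, ordered by class and then by distance. The result could then be written to the output file, for example with `to_csv(out, sep="\t", index=False)`.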