Bioinformatics Stronghold 1 (Num 7-13)

Problem 7 GC (0.5 pt)

  • A dictionary with accession:sequence key:value pairs is useful. Make sure you remove any newlines from each sequence, or use Biopython to parse the FASTA file.
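
The dictionary idea above could be sketched as follows. This is only one possible approach, not a required one: the function name read_fasta and the example records are made up for illustration, and the function accepts any iterable of lines, so an open file handle works too.

```python
def read_fasta(handle):
    """Build {accession: sequence} from an iterable of FASTA lines."""
    seqs = {}
    name = None
    for line in handle:
        line = line.strip()           # removes the trailing newline
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            name = line[1:]           # header line: start a new record
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line        # sequence line: append to current record
    return seqs

# Hypothetical two-record input; with a real file you could pass the
# handle instead, e.g. inside "with open(path) as fh:".
example = [">Rosalind_0001\n", "ACGT\n", "GGCC\n", ">Rosalind_0002\n", "ATAT\n"]
records = read_fasta(example)
print(records)
```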

The next three are easier than the rest. Get help if you are struggling with these:

  • Problem 8 RNA (0.5 pt)

  • Problem 9 PROT (0.5 pt)

  • Problem 10 SUBS (0.5 pt)

Problem 11 GRPH (1 pt)

  • Consider using the combinations function from the itertools module. For example:
import itertools
# datadict is your accession:sequence dictionary
# combinations yields every unordered pair of keys from the dict, each pair once
for k1, k2 in itertools.combinations(datadict, 2):
    # k1 is one key from datadict
    # k2 is another key from datadict
    # overlap graph edges are directed, so test both k1 -> k2 and k2 -> k1
    pass  # do something with them

Problem 12 LCSM (1 pt)

  • Consider writing two functions. The first checks whether a given substring appears in every member of a list of strings and returns True/False. The second pulls out substrings from the first string in the list and sends the longest candidates to your check function. Keep track of your current longest substring as you go.
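
A minimal sketch of the two-function idea, under some assumptions: the names in_all and longest_common_substring are hypothetical, and the search order (grow each candidate from the current best length and stop as soon as one fails) is just one way to avoid testing every substring. It is not tuned for speed.

```python
def in_all(sub, strings):
    """Return True if sub occurs in every string of the list."""
    return all(sub in s for s in strings)

def longest_common_substring(strings):
    """Return one longest substring shared by all strings ('' if none)."""
    first = strings[0]
    best = ""
    for start in range(len(first)):
        # Only try candidates that would beat the current best.
        for end in range(start + len(best) + 1, len(first) + 1):
            candidate = first[start:end]
            if in_all(candidate, strings):
                best = candidate      # new longest so far
            else:
                break                 # longer candidates from this start fail too
    return best

# Small made-up input; several answers of the same length may exist,
# and this sketch returns only one of them.
result = longest_common_substring(["GATTACA", "TAGACCA", "ATACA"])
print(result)
```

The inner loop can stop at the first failure because any longer candidate from the same start position contains the failed one as a prefix.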

Problem 13 MPRT (1 pt)

  • Check out the re module and the regular expressions page in the Python for Biologists tutorial. The urllib.request module may be useful for retrieving records, or try parsing SwissProt records with Biopython.
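
One detail of the re module that tends to matter here is overlapping matches. The sketch below assumes the N-glycosylation motif from the Rosalind problem statement, N{P}[ST]{P}, written as the regular expression N[^P][ST][^P]; the function name motif_positions and the test sequence are made up, and fetching the protein records is not shown.

```python
import re

# The lookahead (?=...) matches without consuming characters, so a motif
# that starts inside a previous match is still reported.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")

def motif_positions(seq):
    """Return 1-based start positions of every motif occurrence."""
    return [m.start() + 1 for m in MOTIF.finditer(seq)]

print(motif_positions("NNTSY"))  # [1, 2]
```

Without the lookahead, the same search would report only position 1, because the first match would consume the characters where the second one starts.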

On the due date, you may be asked in class how your code works, and your comments will help you explain. Inability to explain how your code works will result in a reduced grade.

Optional challenges to help you prepare for your projects

  • Connect to compbio.cs.luc.edu and test running your scripts from the command line
    • Check out the Unix scp command
  • Avoid hardcoding your input/output file paths in your scripts. Try passing file paths into your script via the command line.
    • Check out the Python modules sys and argparse
    • See example problem and solution on the compbio server: /homes/hwheeler/python_examples/DNA.py
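
As a rough illustration of passing file paths on the command line, here is a short argparse sketch. It is not the DNA.py example from the server; the argument names infile and --outfile, the default output name, and the file names in the demonstration call are all assumptions chosen for this example.

```python
import argparse

def parse_args(argv=None):
    """Parse file paths from the command line instead of hardcoding them."""
    parser = argparse.ArgumentParser(description="Process a sequence file.")
    parser.add_argument("infile", help="path to the input file")
    parser.add_argument("-o", "--outfile", default="output.txt",
                        help="path to the output file (default: output.txt)")
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    return parser.parse_args(argv)

# Demonstration with an explicit argument list; in a real script you
# would call parse_args() with no arguments.
args = parse_args(["rosalind_dna.txt", "-o", "counts.txt"])
print(args.infile, args.outfile)
```

From the shell, the equivalent call would look like: python myscript.py rosalind_dna.txt -o counts.txt (myscript.py is a placeholder name). For a script with a single argument, sys.argv[1] gives the same value with less setup, but argparse adds help text and error messages for free.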