Each problem will mention some outside software that could solve it under the Problem heading. However, you should use Biopython to write your own script to solve each problem. Be sure to use modules mentioned in the Programming Shortcut sections at the bottom of the page to make everything easier.
24 INI: Install Biopython or use the compbio.cs.luc.edu server. Once installed, the solution is easy (see Programming Shortcut). The easiest way to install is from the command line. Try this first:
conda install -c anaconda biopython
If that doesn’t work, make sure your computer can find the correct conda. Because I have both python2 and python3 installed, I had to first find the path to python2 from the terminal (Mac/Linux):
which python2
And then:
[path output from above]/anaconda/bin/conda install -c anaconda biopython
In the Python shell, try:
>>> import Bio
If there is no error, you’re set! If Bio can’t be found, make sure the IDE you’re using includes the paths to anaconda. Try this from the IDE shell:
>>> import sys
>>> sys.path
If anaconda paths are missing:
>>> sys.path.append('/path/to/anaconda/')
If that doesn’t work, see http://biopython.org/wiki/Download for other ways to install.
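The checks above can be rolled into one small script. This is only a sketch using the standard library; the helper name `biopython_available` is made up for illustration. It reports which interpreter is running and whether that interpreter can see the `Bio` package.

```python
import importlib.util
import sys


def biopython_available():
    """Return True if the interpreter running this script can find the Bio package."""
    return importlib.util.find_spec("Bio") is not None


if __name__ == "__main__":
    # sys.executable shows which Python is actually running, which helps when
    # python2, python3, and anaconda are all installed side by side.
    print("Interpreter:", sys.executable)
    if biopython_available():
        print("Biopython found")
    else:
        print("Biopython not found; the search path is:")
        for entry in sys.path:
            print("  ", entry)
```

If the interpreter printed here is not the one conda installed into, that mismatch is the likely reason `import Bio` fails.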
25 GBK: OMG, automated retrieval from NCBI? Awesome!
Again, see the Programming Shortcut. Use the Bio.Entrez and Bio.SeqIO modules. Entrez requires your email address, a precaution against excessive usage. See the Biopython tutorial for details. You can always try your search on the NCBI Nucleotide database to figure out how to structure your query.
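As a rough sketch of how such a query might be put together: the helper names `build_term` and `count_nucleotide_records` are hypothetical, and the genus and dates are example values only. The query assumes the `[Organism]` and `[PDAT]` (publication date) field tags; check the resulting string on the NCBI Nucleotide site before trusting it.

```python
def build_term(genus, start_date, end_date):
    """Build an Entrez query: organism plus a publication-date range (YYYY/MM/DD)."""
    return f'"{genus}"[Organism] AND ("{start_date}"[PDAT] : "{end_date}"[PDAT])'


def count_nucleotide_records(term, email):
    """Ask NCBI how many Nucleotide records match term (needs network access)."""
    # Imported here so build_term can be tried without Biopython or a connection.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=term)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


if __name__ == "__main__":
    # Example values only; substitute the genus and dates from your dataset.
    print(build_term("Anthoxanthum", "2003/07/25", "2005/12/27"))
```

Pasting the printed term into the NCBI Nucleotide search box is a quick way to confirm that the count from `count_nucleotide_records` matches what the website reports.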
26 TFSQ: Review the FASTQ format under How to Handle Quality. Don’t use any of the links described under the Problem heading. Instead, write your own one-liner using the Bio.SeqIO module (see Programming Shortcut).
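One way the one-liner could look is a single call to `SeqIO.convert`, which accepts either file names or open handles. The wrapper function `fastq_to_fasta` below is a made-up name, and it works on text in memory only to keep the sketch self-contained; with real files you would pass the input and output file names directly.

```python
from io import StringIO


def fastq_to_fasta(fastq_text):
    """Convert FASTQ-formatted text to FASTA-formatted text."""
    # Imported here so the sketch can be loaded even where Biopython is missing.
    from Bio import SeqIO

    out = StringIO()
    # The actual one-liner: input handle, input format, output handle, output format.
    SeqIO.convert(StringIO(fastq_text), "fastq", out, "fasta")
    return out.getvalue()
```

The `"fastq"` format name assumes standard Sanger-style quality encoding; Biopython also has `"fastq-solexa"` and `"fastq-illumina"` for older encodings.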
27 PHRE: Although you would probably use FastQC software in your research, the goal here is to get some more practice with Biopython, so use the Programming Shortcut.
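A sketch of how this might go, assuming the task is to count reads whose average quality falls below a threshold (check the problem statement for the exact rule). The function names are made up; the one real Biopython piece is `record.letter_annotations["phred_quality"]`, which holds a read's phred scores as a list of integers.

```python
def phred_scores(fastq_path):
    """Return one list of phred scores per read in a FASTQ file."""
    # Imported here so the counting logic below can be tried without Biopython.
    from Bio import SeqIO

    return [
        record.letter_annotations["phred_quality"]
        for record in SeqIO.parse(fastq_path, "fastq")
    ]


def count_low_quality(reads, threshold):
    """Count reads whose mean phred score is strictly below threshold."""
    return sum(1 for scores in reads if sum(scores) / len(scores) < threshold)


if __name__ == "__main__":
    # Toy scores instead of a real file: the means are 30, 15, and 20.
    print(count_low_quality([[30, 30], [10, 20], [40, 0]], 25))
```

With a real dataset, `count_low_quality(phred_scores("reads.fastq"), threshold)` would do the same thing, where `reads.fastq` is a placeholder file name.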
28 FILT: Use the same Programming Shortcut from PHRE (number 27) for obtaining phred scores. Be sure your conditional statements include reads at or above (>=) the thresholds q and p.
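The at-or-above conditions might be written as below. This sketch assumes q is a quality threshold and p is the percentage of bases in a read that must reach it; the function name `passes_filter` is made up, and `scores` is one read's list of phred scores (for example from `record.letter_annotations["phred_quality"]`).

```python
def passes_filter(scores, q, p):
    """Return True if at least p percent of the bases have quality >= q."""
    good = sum(1 for score in scores if score >= q)
    # Compare as integers (good / len >= p / 100) to avoid rounding surprises.
    return good * 100 >= p * len(scores)


if __name__ == "__main__":
    reads = [[20, 20, 20, 10], [20, 20, 10, 10]]
    kept = [scores for scores in reads if passes_filter(scores, q=20, p=75)]
    print(len(kept))
```

Note that a read with exactly 75% of its bases at exactly quality 20 is kept, which is the boundary case the >= is there for.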
29 BPHR: Get the phred scores like you’ve done previously. Consider making a list of lists or a matrix using the numpy module.
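Here is a sketch of the list-of-lists route, assuming the goal is to count base positions whose average quality across all reads is below a threshold and that every read has the same length. The name `count_low_positions` is made up. The trick is that `zip(*reads)` turns rows (reads) into columns (positions), much like transposing a numpy matrix.

```python
def count_low_positions(reads, threshold):
    """Count positions whose mean phred score across reads is below threshold.

    reads is a list of per-read score lists, all assumed to be the same length.
    """
    # zip(*reads) yields one tuple per base position, holding that position's
    # score from every read.
    return sum(
        1 for column in zip(*reads) if sum(column) / len(column) < threshold
    )


if __name__ == "__main__":
    reads = [
        [30, 10, 20],
        [30, 20, 20],
        [30, 0, 40],
    ]
    # Column means are 30, 10, and about 26.7.
    print(count_low_positions(reads, 25))
```

With numpy the same idea would be a column-wise mean over a 2-D array, which may be more convenient for large datasets.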
30 BFIL: Likely the most challenging problem. Again, get the phred scores like you’ve done previously. Consider building a list of indices of where to trim the left end of sequences and a list of indices of where to trim the right end of sequences. Replace lines 2 and 4 with the trimmed sequences for each record, while keeping lines 1 and 3 stored somewhere in order to print out the trimmed fastq file.
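The index idea can be sketched as follows, assuming bases are trimmed from each end while their quality is below a threshold q and that low-quality bases in the middle of a read are left alone (check the problem statement). The names `trim_bounds` and `trim_record` are made up; `lines` is the four lines of one FASTQ record and `scores` is that read's list of phred scores.

```python
def trim_bounds(scores, q):
    """Return (left, right) so that scores[left:right] drops low-quality ends."""
    left = 0
    while left < len(scores) and scores[left] < q:
        left += 1
    right = len(scores)
    while right > left and scores[right - 1] < q:
        right -= 1
    return left, right


def trim_record(lines, scores, q):
    """Trim lines 2 and 4 of a FASTQ record; lines 1 and 3 pass through unchanged."""
    header, sequence, separator, quality = lines
    left, right = trim_bounds(scores, q)
    return [header, sequence[left:right], separator, quality[left:right]]


if __name__ == "__main__":
    record = ["@read1", "ACGTAC", "+", "ABCDEF"]  # toy quality string
    scores = [10, 25, 30, 5, 22, 8]               # toy scores for illustration
    print("\n".join(trim_record(record, scores, q=20)))
```

If you are working with Biopython records rather than raw lines, slicing a record as `record[left:right]` keeps the sequence and its quality scores together, which may save you from handling the four lines yourself.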