Overview
Teaching: 20 min
Exercises: 10 min
Questions
How can I practice the new skills I've learned?
Objectives
Apply Python skills to solve more extensive challenges.
In the previous five sections, you’ve been introduced to a great many concepts. We’ve covered multiple elements of Python’s syntax and some valuable tools in the Standard Library. We’ve dabbled in powerful libraries for handling and analysing data. We learned how to create high-quality visualisations from that data and those analyses. And we equipped ourselves with the knowledge and skills needed to write programs that are easier to use, maintain and extend.
However, as you probably found when you started taking your first steps with Python, the useful knowledge and good practices you've worked hard to pick up over the previous sessions will only find a place in your long-term memory if you use them. In this final section, we present you with some coding challenges: opportunities to apply what you've learned here.
Unlike the exercises you were set in the previous sections, most of these challenges have been developed by others, and we’re simply linking out to resources that we know and can recommend. If you know of other similar collections of programming challenges that you think would be a good way for you to practice the skills you’ve learned in this course, feel free to use those and please also tell us about them by creating a new Issue on the source repository of this lesson.
Advent of Code
Advent of Code is a collection of coding challenges, published two per day (complete the first to unlock the second) throughout the first 25 days of December. The challenges are language-agnostic, i.e. they can be solved in whatever programming language you want to use. You must sign up (for free) to access the puzzle inputs; you can sign in with an account from GitHub and several other platforms. Once you've done so, you'll have access to 50 puzzles for each year from 2015 to 2019.
The puzzles vary considerably in difficulty. To get an idea of how hard a challenge is relative to the others for a given year, you can check out the Stats page for that year. (Overall participation tails off as the month goes on, but the peaks and troughs in completion can give a rough idea of where the easy and difficult challenges are.)
If you’d like a recommendation for where to start, the following three are some of our favourites:
Rosalind
For a challenge geared more towards those learning Python for computational biology and bioinformatics, we recommend Rosalind. Named in honour of Rosalind Franklin, the site provides a collection of 285 problems, all working with biological data.
Once again, you must sign up for a free account before you can access the problems. As with Advent of Code, Rosalind doesn’t look at your code: it only cares about whether you got the correct answer. However, the authors do recommend that you post your solution in the comments section that exists for each challenge, to get feedback and compare/discuss solutions with other users.
In our experience, the learning curve is quite steep, but the first few challenges are well within reach of anyone who's completed this course. Rosalind shows how many users have attempted and completed each problem, which gives you a good estimate of how challenging each task will be. It's a great way to improve your programming skills while beginning to develop an understanding of good algorithm design and of the methods underpinning many key tools and analyses in bioinformatics.
Debugging/Code Improvement Challenge
As another option, the challenge below is intended to test your understanding of the code style, Python syntax, and user interface design concepts introduced earlier in this material. Best tackled in pairs or small groups, it guides you through the (often sadly familiar) process of adapting someone else’s (or perhaps your own) poorly-written, poorly-documented code.
This script is intended to count nucleotides in DNA sequences stored in FASTA format. Before you look at the sequence files we will test it on, open the script in your favourite editor and discuss ways in which it could be improved. Things to think about might include:
- How easy is it to understand what the script does?
- How robust is the script?
- Does it follow good coding standards?
- Does it do what it is supposed to?
- What problems can you foresee if the script were shared with others or applied to a different sequence file?
Now run the script on example_sequences1.fasta. Do you notice any more improvements that could be made?
What about if you run the script on
Make a copy of the script (or start from scratch if you prefer!) and improve the code to make it
- portable between different computers/operating systems
- easy to maintain/adapt
- do what it is supposed to do!
(Note: You may be aware that Biopython and other libraries include functions and classes designed to work with sequence objects. It would be against the spirit of the exercise to use those libraries here.)
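As one possible starting point for the "portable" and "easy to maintain" goals, the sketch below separates FASTA parsing from counting and takes the file path as an argument instead of hard-coding it. It is a minimal sketch, not a reference solution, and it assumes a standard FASTA file: each record starts with a header line beginning with `>`, and its sequence may be wrapped over several lines. The function names `read_fasta`, `count_bases` and `report` are our own, hypothetical choices and do not come from the original script.

```python
from collections import Counter
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Assumes each record starts with a header line beginning with '>'
    and that its sequence may be wrapped over several lines. Blank
    lines, and any lines before the first header, are ignored.
    """
    header = None
    chunks = []
    # Text mode with the default newline handling accepts '\n', '\r\n'
    # and '\r' line endings; an explicit encoding avoids relying on
    # the platform's default. Both help portability between systems.
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip()
                chunks = []
            elif header is not None:
                # Normalise case so 'a' and 'A' are counted together.
                chunks.append(line.upper())
    # Emit the last record, which has no following header to trigger it.
    if header is not None:
        yield header, "".join(chunks)


def count_bases(sequence, alphabet="ACGT"):
    """Return a dict mapping each symbol in alphabet to its count."""
    counts = Counter(sequence)
    return {base: counts[base] for base in alphabet}


def report(path):
    """Print the header and base counts for each record in path."""
    for header, sequence in read_fasta(path):
        counts = count_bases(sequence)
        print(header)
        print(" ".join(f"{base}: {n}" for base, n in counts.items()))
```

Saved as, for example, `count_nucleotides.py` (a made-up name), the script could call `report` for each file named on the command line, e.g. from an `if __name__ == "__main__":` block that loops over `sys.argv[1:]`, so that no file path is tied to one particular computer.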
If you have time, try to extend the script so that, given a file of protein sequences instead, it produces counts of the different amino acids. You can use the file protein_sequences.fasta to test your script. You may also want to know that a DNA sequence can look confusingly like a protein sequence, thanks to IUPAC ambiguity codes.
Expected Output:
sequence_1
A: 14 C: 21 G: 9 T: 15
0.5084745762711864 # (you may prefer to improve the formatting of this output...)
sequence_2
A: 14 C: 10 G: 10 T: 8
0.47619047619047616
sequence_3
A: 20 C: 15 G: 7 T: 11
0.41509433962264153
sequence_4
A: 20 C: 19 G: 12 T: 9
0.5166666666666667
sequence_5
A: 30 C: 8 G: 5 T: 10
0.24528301886792453
Expected Output:
sequence_1
A: 14 C: 21 G: 9 T: 15
0.5084745762711864
sequence_2
A: 13 C: 13 G: 13 T: 0
0.6666666666666667
sequence_3
A: 15 C: 12 G: 8 T: 9
0.45454545454545453 # (if ignoring ambiguous codes in sequence length)
# (you may also want to include the ambiguous nucleotide codes:)
D: 1 H: 1 K: 1 M: 1
0.4166666666666667 # (if including ambiguous codes in sequence length)
sequence_4
A: 48 C: 45 G: 37 T: 50
0.45555555555555555
sequence_5
A: 14 C: 14 G: 12 T: 13
0.49056603773584906
Expected Output:
sp|P05480|SRC_MOUSE Neuronal proto-oncogene tyrosine-protein kinase Src OS=Mus musculus GN=Src PE=1 SV=4
A: 40 C: 9 D: 22 E: 42 F: 21 G: 42 H: 9 I: 16 K: 32 L: 49 M: 11 N: 19 P: 33 Q: 23 R: 34 S: 38 T: 36 V: 33 W: 9 Y: 23
# (note lack of any additional output after residue counts for each sequence)
sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
A: 42 C: 8 D: 26 E: 21 F: 27 G: 37 H: 18 I: 22 K: 23 L: 60 M: 11 N: 19 P: 37 Q: 21 R: 24 S: 44 T: 32 V: 32 W: 13 Y: 19
sp|P12931|SRC_HUMAN Proto-oncogene tyrosine-protein kinase Src OS=Homo sapiens GN=SRC PE=1 SV=3
A: 44 C: 9 D: 21 E: 42 F: 21 G: 43 H: 9 I: 16 K: 31 L: 49 M: 11 N: 18 P: 32 Q: 23 R: 32 S: 37 T: 36 V: 30 W: 9 Y: 23
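For the amino-acid extension, one option is to make the alphabet an explicit parameter chosen by the user, rather than trying to guess from the letters whether a sequence is DNA or protein, which IUPAC ambiguity codes make unreliable. The sketch below assumes the 20 standard one-letter amino-acid codes, listed in the same alphabetical order as the expected protein output; the names `count_residues`, `format_counts`, `AMINO_ACIDS` and `NUCLEOTIDES` are hypothetical. Symbols outside the chosen alphabet are returned separately so they are reported rather than silently dropped.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# The four unambiguous DNA bases.
NUCLEOTIDES = "ACGT"


def count_residues(sequence, alphabet):
    """Count each symbol of alphabet in sequence.

    Returns (expected, unexpected): expected maps every symbol in
    alphabet to its count (including zeros); unexpected maps any other
    symbols found (e.g. ambiguity codes) to their counts.
    """
    counts = Counter(sequence.upper())
    expected = {symbol: counts[symbol] for symbol in alphabet}
    unexpected = {
        symbol: n for symbol, n in sorted(counts.items()) if symbol not in alphabet
    }
    return expected, unexpected


def format_counts(counts):
    """Format counts as 'X: n' pairs separated by spaces."""
    return " ".join(f"{symbol}: {n}" for symbol, n in counts.items())
```

With this design the same counting code serves both tasks: `count_residues(sequence, NUCLEOTIDES)` for DNA and `count_residues(sequence, AMINO_ACIDS)` for protein, with the choice made by whoever runs the script (for instance through a command-line option, which is not part of this sketch).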
Thank you for following this lesson. We hope you’ve found this course helpful and interesting, and learned plenty of new things to apply in your Python programming every day. If you have thoughts on how we could improve these materials, or additional content that could be included here, please give us your feedback. Your instructors have probably shared a link with you to a post-workshop survey, but you can also give us your comments and suggestions by filing an Issue on the source repository.
Key Points
There are many coding challenges to be found online, which can be used to exercise your Python skills.