This lesson is in the early stages of development (Alpha version)

Coding Challenges

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How can I practise the new skills I’ve learned?

Objectives
  • Apply your Python skills to solve more extensive challenges.

In the previous five sections, you’ve been introduced to a great many concepts. We’ve covered multiple elements of Python’s syntax and some valuable tools in the Standard Library. We’ve dabbled in powerful libraries for handling and analysing data. We learned how to create high-quality visualisations from that data and those analyses. And we equipped ourselves with the knowledge and skills needed to write programs that are easier to use, maintain and extend.

However, as you probably found when you took your first steps with Python, the knowledge and good practices that you’ve worked hard to pick up over the previous sessions will only find a place in your long-term memory if you use them. In this final section, we present you with some coding challenges: opportunities to apply what you’ve learned here.

Unlike the exercises you were set in the previous sections, most of these challenges have been developed by others, and we’re simply linking out to resources that we know and can recommend. If you know of other collections of programming challenges that would be a good way to practise the skills you’ve learned in this course, feel free to use those, and please tell us about them by creating a new Issue on the source repository of this lesson.

Advent of Code

Advent of Code is a collection of coding challenges, published two-per-day (complete the first to unlock the second) throughout the first 25 days of December. The challenges are language-agnostic, i.e. they can be solved with whatever programming language you want to use. You must sign up for free to access the puzzle inputs (you can sign in with an account for GitHub and several other platforms) but, once you’ve done so, you’ll have access to 50 puzzles each year from 2019, 2018, 2017, 2016, and 2015.

The puzzles vary considerably in difficulty. To get an idea of how hard a challenge is relative to the others for a given year, you can check out the Stats page for that year. (The overall participation tails off as the month goes on, but the peaks and troughs in completion can give a rough idea of where the easy and difficult challenges are.)

If you’d like a recommendation for where to start, the following three are some of our favourites:

Rosalind

For a challenge geared more towards those learning Python for computational biology and bioinformatics, we recommend Rosalind. Named in honour of Rosalind Franklin, the site provides a collection of 285 problems, all working with biological data.

Once again, you must sign up for a free account before you can access the problems. As with Advent of Code, Rosalind doesn’t look at your code: it only cares about whether you got the correct answer. However, the authors do recommend that you post your solution in the comments section that exists for each challenge, to get feedback and compare/discuss solutions with other users.

In our experience, the learning curve is quite steep, but the first few challenges are well within reach of anyone who’s completed this course. Rosalind provides an overview of how many users have attempted and completed each problem, which gives you a good estimate of how challenging each task will be. It’s a great way to improve your programming skills while beginning to develop an understanding of good algorithm design and of the methods underpinning many key tools and analyses in bioinformatics.

Debugging/Code Improvement Challenge

As another option, the challenge below is intended to test your understanding of the code style, Python syntax, and user interface design concepts introduced earlier in this material. Best tackled in pairs or small groups, it guides you through the (often sadly familiar) process of adapting someone else’s (or perhaps your own) poorly-written, poorly-documented code.

The Horror

This script is intended to count nucleotides in DNA sequences stored in FASTA format. Before you look at the sequence files we will test it on, open the script in your favourite editor and discuss ways in which it could be improved. Things to think about might include:

  • How easy is it to understand what the script does?
  • How robust is the script?
  • Does it follow good coding standards?
  • Does it do what it is supposed to?
  • What problems can you foresee, if the script were to be shared with others or applied to a different sequence file?

Now run the script on example_sequences1.fasta. Do you notice any more improvements that could be made?

What about if you run the script on example_sequences2.fasta?

Make a copy of the script (or start from scratch if you prefer!) and improve the code to make it

  • robust
  • portable between different computers/operating systems
  • shareable
  • easy to maintain/adapt
  • do what it is supposed to do!

(Note: You may be aware that Biopython and other libraries include functions and classes designed to work with sequence objects. It would be against the spirit of the exercise to use those libraries here.)
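As one possible starting point for the "portable" and "shareable" goals, the sketch below reads the input file path from the command line rather than hard-coding it. It uses only the Standard Library (argparse and pathlib). The names parse_arguments and fasta_file are our own, chosen for illustration, and it deliberately leaves the sequence parsing and counting to you.

```python
import argparse
from pathlib import Path


def parse_arguments(argv=None):
    """Read the input file path from the command line instead of hard-coding it.

    `parse_arguments` and `fasta_file` are hypothetical names for this sketch.
    """
    parser = argparse.ArgumentParser(
        description="Count residues in sequences stored in a FASTA file."
    )
    # type=Path turns the argument into a path object that behaves the same
    # way on Windows, macOS and Linux
    parser.add_argument("fasta_file", type=Path, help="path to the input FASTA file")
    return parser.parse_args(argv)


# Passing a list simulates what a user would type after the script name;
# calling parse_arguments() with no argument reads the real command line.
args = parse_arguments(["example_sequences1.fasta"])
print(f"Input file: {args.fasta_file}")
print(f"File exists: {args.fasta_file.is_file()}")
```

Checking is_file() before opening the file lets the script fail with a clear message instead of a traceback when the path is wrong.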

If you have time, try to further adapt the script to expand its functionality such that, given a file of protein sequences instead, it will produce counts of the different amino acids. You can use the file protein_sequences.fasta to test your script. You may also want to know that a DNA sequence can look confusingly like a protein sequence, thanks to IUPAC ambiguity codes.
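To illustrate why the ambiguity codes make this tricky, here is a minimal sketch of one heuristic you might try: treat a sequence as nucleotide data only if every character is an IUPAC nucleotide code. The function and constant names are our own, and the heuristic is an assumption for illustration, not the expected solution.

```python
# IUPAC nucleotide codes: the four DNA bases, U for RNA,
# and the ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N)
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def looks_like_nucleotides(sequence):
    """Return True if every character is a valid IUPAC nucleotide code."""
    return set(sequence.upper()) <= IUPAC_NUCLEOTIDES


print(looks_like_nucleotides("ACGTDHKM"))  # True
print(looks_like_nucleotides("MKVLAAGE"))  # False: L and E are not nucleotide codes
print(looks_like_nucleotides("MKVGAAG"))   # True, although it could equally be a peptide
```

The last call shows the limitation: a short protein sequence can consist entirely of letters that are also valid nucleotide codes, so you may prefer to let the user state the sequence type explicitly.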

Expected Output - example_sequences1.fasta

sequence_1
A: 14
C: 21
G: 9
T: 15
0.5084745762711864 # (you may prefer to improve the formatting of this output...)

sequence_2
A: 14
C: 10
G: 10
T: 8
0.47619047619047616

sequence_3
A: 20
C: 15
G: 7
T: 11
0.41509433962264153

sequence_4
A: 20
C: 19
G: 12
T: 9
0.5166666666666667

sequence_5
A: 30
C: 8
G: 5
T: 10
0.24528301886792453
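The unlabelled number after each set of counts appears to be the GC fraction, (G + C) / sequence length; this is our inference from the values above (for sequence_1, 30 / 59). Assuming that is right, the sketch below shows one way the output could be labelled and rounded. The function name gc_fraction is our own.

```python
def gc_fraction(counts):
    """Return the fraction of G and C among all counted nucleotides."""
    return (counts["G"] + counts["C"]) / sum(counts.values())


# Counts for sequence_1 in example_sequences1.fasta, taken from the output above
sequence_1_counts = {"A": 14, "C": 21, "G": 9, "T": 15}

fraction = gc_fraction(sequence_1_counts)
print(f"GC content: {fraction:.1%}")  # GC content: 50.8%
```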

Expected Output - example_sequences2.fasta

sequence_1
A: 14
C: 21
G: 9
T: 15
0.5084745762711864

sequence_2
A: 13
C: 13
G: 13
T: 0
0.6666666666666667

sequence_3
A: 15
C: 12
G: 8
T: 9
0.45454545454545453 # (if ignoring ambiguous codes in sequence length)
# (you may also want to include the ambiguous nucleotide codes:)
D: 1
H: 1
K: 1
M: 1
0.4166666666666667 # (if including ambiguous codes in sequence length)

sequence_4
A: 48
C: 45
G: 37
T: 50
0.45555555555555555

sequence_5
A: 14
C: 14
G: 12
T: 13
0.49056603773584906
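The two values given for sequence_3 above differ only in the denominator. The short worked example below reproduces both from the listed counts, on the assumption that the number reported is (G + C) / sequence length; the variable names are our own.

```python
# Counts for sequence_3 in example_sequences2.fasta, including ambiguity codes
counts = {"A": 15, "C": 12, "G": 8, "T": 9, "D": 1, "H": 1, "K": 1, "M": 1}

gc = counts["G"] + counts["C"]                             # 20
unambiguous_length = sum(counts[base] for base in "ACGT")  # 44
full_length = sum(counts.values())                         # 48

gc_ignoring_ambiguous = gc / unambiguous_length
gc_including_ambiguous = gc / full_length

print(f"{gc_ignoring_ambiguous:.4f}")   # 0.4545
print(f"{gc_including_ambiguous:.4f}")  # 0.4167
```

Whichever convention you choose, it is worth stating it in the script's output or documentation so that users know how ambiguous positions were treated.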

Expected Output - protein_sequences.fasta

sp|P05480|SRC_MOUSE Neuronal proto-oncogene tyrosine-protein kinase Src OS=Mus musculus GN=Src PE=1 SV=4
A: 40
C: 9
D: 22
E: 42
F: 21
G: 42
H: 9
I: 16
K: 32
L: 49
M: 11
N: 19
P: 33
Q: 23
R: 34
S: 38
T: 36
V: 33
W: 9
Y: 23 # (note lack of any additional output after residue counts for each sequence)

sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
A: 42
C: 8
D: 26
E: 21
F: 27
G: 37
H: 18
I: 22
K: 23
L: 60
M: 11
N: 19
P: 37
Q: 21
R: 24
S: 44
T: 32
V: 32
W: 13
Y: 19

sp|P12931|SRC_HUMAN Proto-oncogene tyrosine-protein kinase Src OS=Homo sapiens GN=SRC PE=1 SV=3
A: 44
C: 9
D: 21
E: 42
F: 21
G: 43
H: 9
I: 16
K: 31
L: 49
M: 11
N: 18
P: 32
Q: 23
R: 32
S: 37
T: 36
V: 30
W: 9
Y: 23

Conclusion

Thank you for following this lesson. We hope you’ve found this course helpful and interesting, and learned plenty of new things to apply in your Python programming every day. If you have thoughts on how we could improve these materials, or additional content that could be included here, please give us your feedback. Your instructors have probably shared a link with you to a post-workshop survey, but you can also give us your comments and suggestions by filing an Issue on the source repository.

Key Points

  • There are many coding challenges to be found online, which can be used to exercise your Python skills.