Introduction to Python For Bioinformatics

These pages contain the course materials for the SANBI introductory course on Python for bioinformatics. While you might find these materials useful even if you are not a student of bioinformatics, the examples chosen will emphasise the use of the Python programming language in bioinformatics.

The course will use two environments. We will start with the Jupyter Lab environment, a web-based environment for Python coding that uses “notebooks” for data analysis. While you can install Jupyter Lab on your own computer, we will provide a Jupyter Lab environment for you so that all you need to get started is a web browser. This environment is available at https://jupyterhub.sanbi.ac.za/jupyterhub. You need a login to use this environment; logins will be provided during the course.

We have chosen the Jupyter environment because it is a powerful and user-friendly environment for exploratory data processing and also because a similar environment is provided by the Ilifu research computing environment. The Ilifu environment is much larger than the environment we are using for teaching and we recommend that for your own research you use either your own computer or, if you have access, the Ilifu environment.

The second environment introduced during the course will be Python scripts on your own computer. Writing code in scripts allows us to create modules of code that can be re-used, either by ourselves or by others. These scripts will be your first steps into software engineering, a discipline that deals with the design, development, testing and maintenance of software applications. As scientific research involves the use of such applications, understanding how to discipline ourselves to build them, starting with simple scripts, is an essential skill in the modern scientist’s toolbox.
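To give a flavour of what a re-usable script can look like, here is a minimal sketch. The file name `gc_content.py`, the function `gc_content` and the example sequence are all hypothetical and chosen only for illustration; they are not part of the course materials.

```python
"""gc_content.py: a small, hypothetical example of a re-usable script."""


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


def main():
    # Made-up example sequence, used only to demonstrate the function.
    example = "ATGCGCTA"
    print(f"GC content of {example}: {gc_content(example):.2f}")


# This block runs only when the file is executed as a script,
# not when it is imported as a module by other code.
if __name__ == "__main__":
    main()
```

Saved as a file, this could be run directly as a script, while another script could re-use the function with `from gc_content import gc_content`.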

Course contents

The course content is largely based on Justin Bois’s Introduction to Programming in the Biological Sciences Bootcamp, with adaptations for our teaching environment and goals at SANBI.

Once we start getting into practical Python, we will use the output of a fictional tool, FINDHOM, to guide our learning. FINDHOM finds homology between sequences. In our scenario, the lab has been doing surveillance at several sites, collecting sandflies and extracting DNA from the captured flies. After DNA is extracted, part of the coi gene is amplified using PCR and then sequenced. FINDHOM is run on these DNA sequences to find matches between the samples and a database of known Phlebotomus and Sergentomyia coi gene sequences. More will be described in the course about the output generated by this fictional tool. For now, it is simply worth noting that Python is an excellent language for reading text (like that output by most bioinformatics tools) and summarising the important information contained within it.
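As a rough illustration of what "reading text and summarising it" can mean, the sketch below counts matches per genus. FINDHOM's real output format is described later in the course, so the tab-separated layout used here (sample ID, best-matching species, percent identity) is purely an assumption, and the sample rows are made up for the example.

```python
from collections import Counter

# Made-up example text in an ASSUMED layout:
# sample ID <tab> best-matching species <tab> percent identity.
# This is not the actual FINDHOM output format.
EXAMPLE_OUTPUT = """\
sample_01\tPhlebotomus papatasi\t98.7
sample_02\tSergentomyia minuta\t95.2
sample_03\tPhlebotomus papatasi\t99.1
"""


def count_matches_by_genus(text):
    """Count how many samples matched each genus."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            # Skip blank lines.
            continue
        sample_id, species, identity = line.split("\t")
        # The genus is the first word of the species name.
        genus = species.split()[0]
        counts[genus] += 1
    return counts


genus_counts = count_matches_by_genus(EXAMPLE_OUTPUT)
for genus, n in genus_counts.most_common():
    print(f"{genus}: {n}")
```

A `Counter` is used here because it keeps the tallying logic to a single line; a plain dictionary would work just as well.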
