Exercise 3: String Methods and Dictionaries


  1. A line in a file has this content: QR57613 1.3 Serpentes Pythonidae. Using Python, how could you count the number of words in the line?

  2. Another line has this content: QR57613\t1.3\tSerpentes\tPythonidae\tPython regius. You are told that it comes from a tab-separated values (TSV) file. How many fields (separated by tabs) are there in the line?

  3. You have the following output from the FINDHOM tool:

    findhom_result = """#FINDHOM v 1.2:
    Search results:
    Query\tMatch fraction\tScore\tSubject
    SMPL001\t0.7\t12331\tAQ10213 Phlebotomus perniciosus
    SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
    SMPL004\t0.8\t13123\tRD178237 Sergentomyia dubia
    SMPL007\t0.6\t10610\tBQ187981 Phlebotomus papatasi"""

    How would you split the string in findhom_result into multiple lines?

  4. The .startswith() method of a string tests whether the string begins with a given prefix. For example, mystring.startswith('Name') returns True if mystring starts with 'Name'. How many of the lines in findhom_result start with SMPL?

  5. The FINDHOM output consists of a prelude, then a header line, then multiple lines of tab-separated data. Given the findhom_result data, write Python code to count the number of result lines in findhom_result. Do not make any assumptions about the sample naming, i.e. do not assume that each result line starts with SMPL or similar.

  6. Using the data in findhom_result, write a function process_findhom that reads in a string like findhom_result and creates and returns a query_to_subject dictionary that associates the value in the Query field with the value in the Subject field (accession and species name). Here is an example of process_findhom being called and its result:

    process_findhom(findhom_result)
    {'SMPL001': 'AQ10213 Phlebotomus perniciosus', 'SMPL003': 'BZ102363 Phlebotomus papatasi', 'SMPL004': 'RD178237 Sergentomyia dubia', 'SMPL007': 'BQ187981 Phlebotomus papatasi'}

  7. In process_findhom, what would happen if two result lines had the same query identifier (e.g. if SMPL003 appeared twice)? In a real-world analysis, can you think of how you would want this situation handled?
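A minimal sketch of one possible answer to exercises 1 and 2, assuming each line is already held in a Python string (the variable names line and tsv_line are illustrative, not part of the exercise):

```python
# Exercise 1: count the words in a whitespace-separated line
line = "QR57613 1.3 Serpentes Pythonidae"
# str.split() with no argument splits on runs of whitespace
words = line.split()
word_count = len(words)
print(word_count)  # 4

# Exercise 2: count the fields in a tab-separated line
tsv_line = "QR57613\t1.3\tSerpentes\tPythonidae\tPython regius"
# Splitting on the tab character keeps "Python regius" together as one field
fields = tsv_line.split("\t")
field_count = len(fields)
print(field_count)  # 5
```

Note that tsv_line.split() with no argument would return 6 items, because it also splits "Python regius" at the space; splitting on "\t" respects the TSV field boundaries.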
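For exercises 3 to 5, a sketch of one approach follows. The helper name count_result_lines is hypothetical, and it assumes that the header is the first line containing a tab and that no prelude line contains one; it makes no assumption about how samples are named.

```python
findhom_result = """#FINDHOM v 1.2:
Search results:
Query\tMatch fraction\tScore\tSubject
SMPL001\t0.7\t12331\tAQ10213 Phlebotomus perniciosus
SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
SMPL004\t0.8\t13123\tRD178237 Sergentomyia dubia
SMPL007\t0.6\t10610\tBQ187981 Phlebotomus papatasi"""

# Exercise 3: split the multi-line string into a list of lines
lines = findhom_result.splitlines()

# Exercise 4: count the lines that begin with "SMPL"
smpl_count = 0
for line in lines:
    if line.startswith("SMPL"):
        smpl_count += 1
print(smpl_count)  # 4


def count_result_lines(text):
    """Exercise 5: count data lines after the header.

    Assumption: the header is the first line containing a tab,
    and the prelude lines contain no tabs.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if "\t" in line:
            # Every non-blank line after the header is treated as a result
            return sum(1 for data_line in lines[index + 1:] if data_line.strip())
    return 0  # no header found


result_line_count = count_result_lines(findhom_result)
print(result_line_count)  # 4
```

Detecting the header by structure rather than by its text keeps the count independent of the sample names, at the cost of relying on the prelude being tab-free.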
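One possible implementation of process_findhom for exercise 6, offered as a sketch rather than the expected answer. It assumes that the header is the first line containing a tab and that it includes columns named Query and Subject; the columns are looked up by name, so the code does not depend on their order.

```python
findhom_result = """#FINDHOM v 1.2:
Search results:
Query\tMatch fraction\tScore\tSubject
SMPL001\t0.7\t12331\tAQ10213 Phlebotomus perniciosus
SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
SMPL004\t0.8\t13123\tRD178237 Sergentomyia dubia
SMPL007\t0.6\t10610\tBQ187981 Phlebotomus papatasi"""


def process_findhom(findhom_text):
    """Return a dict mapping each Query value to its Subject value.

    Sketch assumptions: the header is the first line containing a tab,
    it has columns named Query and Subject, and every non-blank line
    after it is a tab-separated result line.
    """
    query_to_subject = {}
    query_col = subject_col = None
    for line in findhom_text.splitlines():
        if query_col is None:
            if "\t" in line:
                # Header found: locate the columns by name, not position
                header = line.split("\t")
                query_col = header.index("Query")
                subject_col = header.index("Subject")
            continue  # skip prelude lines and the header itself
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        query_to_subject[fields[query_col]] = fields[subject_col]
    return query_to_subject


print(process_findhom(findhom_result))
```

Because the result is a plain dictionary, a repeated Query value would overwrite the earlier entry, which is the situation exercise 7 asks about.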
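To explore exercise 7, the sketch below uses a hypothetical input (invented for illustration, not real FINDHOM output) in which SMPL003 appears on two result lines. A plain dictionary silently keeps only the last subject seen. Two alternatives are shown: collecting every subject per query with collections.defaultdict, or keeping only the highest-scoring hit, which assumes that a higher Score means a better match.

```python
from collections import defaultdict

# Hypothetical FINDHOM-style output (invented for illustration) in which
# SMPL003 appears on two result lines with different subjects
duplicated_result = """#FINDHOM v 1.2:
Search results:
Query\tMatch fraction\tScore\tSubject
SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
SMPL003\t0.3\t3021\tAQ10213 Phlebotomus perniciosus"""

# Skip the two prelude lines and the header (fixed offset, for brevity only)
rows = [line.split("\t") for line in duplicated_result.splitlines()[3:]]

# 1. A plain dict: a repeated key overwrites the earlier value
last_wins = {}
for query, _fraction, _score, subject in rows:
    last_wins[query] = subject
print(last_wins)  # {'SMPL003': 'AQ10213 Phlebotomus perniciosus'}

# 2. Keep every subject reported for each query
all_hits = defaultdict(list)
for query, _fraction, _score, subject in rows:
    all_hits[query].append(subject)
print(dict(all_hits))

# 3. Keep only the highest-scoring hit (assumes higher Score is better)
best_hit = {}
for query, _fraction, score, subject in rows:
    if query not in best_hit or float(score) > best_hit[query][0]:
        best_hit[query] = (float(score), subject)
best_subject = {query: subject for query, (_score, subject) in best_hit.items()}
print(best_subject)  # {'SMPL003': 'BZ102363 Phlebotomus papatasi'}
```

Which policy is appropriate depends on the analysis; flagging or reporting duplicates may be safer than discarding them silently.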