A line in a file has this content:
QR57613 1.3 Serpentes Pythonidae

Using Python, how could you count the number of words in the line?

Another line has this content:
QR57613\t1.3\tSerpentes\tPythonidae\tPython regius

You are told that it comes from a tab-separated values (TSV) file. How many fields (separated by tabs) are there in the line?

You have the following output from the FINDHOM tool:
findhom_result = """#FINDHOM v 1.2:
Search results:
Query\tMatch fraction\tScore\tSubject
SMPL001\t0.7\t12331\tAQ10213 Phlebotomus perniciosus
SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
SMPL004\t0.8\t13123\tRD178237 Sergentomyia dubia
SMPL007\t0.6\t10610\tBQ187981 Phlebotomus papatasi"""

How would you split the string in
findhom_result into multiple lines?

The
.startswith() method of a string can be used to test whether the string starts with a given prefix. E.g. mystring.startswith('Name') tests if mystring starts with 'Name'. How many of the lines from findhom_result start with SMPL?

The FINDHOM output consists of a prelude, then a header line and multiple lines of tab-separated data. Given the
findhom_result data, write Python code to count the number of result lines in findhom_result. Do not make any assumptions about the sample naming, i.e. do not assume that each result line starts with SMPL or similar.

Using the data in
findhom_result, write a function process_findhom that reads in a string like that from findhom_result and creates a query_to_subject dictionary which associates the value in the Query field with the species name in the Subject field. Here is an example of process_findhom being called and its result:

process_findhom(findhom_result)

{'SMPL001': 'AQ10213 Phlebotomus perniciosus', 'SMPL003': 'BZ102363 Phlebotomus papatasi', 'SMPL004': 'RD178237 Sergentomyia dubia', 'SMPL007': 'BQ187981 Phlebotomus papatasi'}

In
process_findhom, what would happen if two queries had the same identifier (e.g. if SMPL003 appeared twice)? In a real-world example, can you think of how you would want this situation dealt with?
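
One possible way to work through the exercises above is sketched below. It is a sketch under stated assumptions rather than a reference solution. The line breaks in the sample string are an assumed reconstruction of the FINDHOM output. The helper names find_header_index and count_result_lines are hypothetical. The header is assumed to be the first line that contains a tab. Raising ValueError on a repeated Query is only one of several reasonable policies.

```python
# Sample data copied from the exercises; the line breaks inside the
# FINDHOM string are an assumption about the original layout.
words_line = "QR57613 1.3 Serpentes Pythonidae"
tsv_line = "QR57613\t1.3\tSerpentes\tPythonidae\tPython regius"
findhom_result = """#FINDHOM v 1.2:
Search results:
Query\tMatch fraction\tScore\tSubject
SMPL001\t0.7\t12331\tAQ10213 Phlebotomus perniciosus
SMPL003\t0.5\t6032\tBZ102363 Phlebotomus papatasi
SMPL004\t0.8\t13123\tRD178237 Sergentomyia dubia
SMPL007\t0.6\t10610\tBQ187981 Phlebotomus papatasi"""

# split() with no argument splits on any run of whitespace (words).
word_count = len(words_line.split())
# Splitting on tabs only keeps "Python regius" together as one field.
field_count = len(tsv_line.split("\t"))

# splitlines() breaks a multi-line string into a list of lines.
lines = findhom_result.splitlines()
smpl_count = sum(1 for line in lines if line.startswith("SMPL"))


def find_header_index(lines):
    """Return the index of the header line.

    Assumption: the header is the first line containing a tab, since
    the prelude lines in the sample contain none.
    """
    for index, line in enumerate(lines):
        if "\t" in line:
            return index
    raise ValueError("no tab-separated header line found")


def count_result_lines(text):
    """Count non-empty lines after the header, ignoring sample names."""
    lines = text.splitlines()
    header_index = find_header_index(lines)
    return sum(1 for line in lines[header_index + 1:] if line.strip())


def process_findhom(text):
    """Map each Query value to its Subject value.

    Raises ValueError on a repeated Query (one possible policy).
    """
    lines = text.splitlines()
    header_index = find_header_index(lines)
    header = lines[header_index].split("\t")
    query_col = header.index("Query")
    subject_col = header.index("Subject")
    query_to_subject = {}
    for line in lines[header_index + 1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        query = fields[query_col]
        if query in query_to_subject:
            raise ValueError(f"duplicate query identifier: {query}")
        query_to_subject[query] = fields[subject_col]
    return query_to_subject


query_to_subject = process_findhom(findhom_result)
print(word_count, field_count, smpl_count, count_result_lines(findhom_result))
print(query_to_subject)
```

Without the duplicate check, a plain dictionary assignment would silently keep only the last Subject seen for a repeated Query. Other options include keeping the first match, or collecting all subjects for a query in a list.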