CS50 Week 6: Problem Set, DNA
Programming/CS50 2023. 7. 21. 15:07
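This post walks through my solution to DNA, the Problem Set assignment for Week 6 of Harvard's CS50.
The assignment is to identify a person from the results of a DNA analysis.
The solution uses Python's pandas library to handle the CSV data.
Task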
$ python dna.py databases/large.csv sequences/5.txt
Lavender
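To make the task concrete: the program has to find, for each STR (short tandem repeat) in the database header, the longest run of back-to-back repeats in the sequence file. The sketch below is only an illustration of that idea using a regular expression. The helper name `longest_run` and the sample strings are made up for this example and are not part of the CS50 distribution code. A lookahead is used so that a run starting inside an earlier partial match is still counted.

```python
import re


def longest_run(sequence, str_unit):
    """Return the longest number of consecutive repeats of str_unit in sequence."""
    # (?=(...)) is a zero-width lookahead, so a run is measured at every
    # starting position, including positions that overlap an earlier match.
    pattern = re.compile(f"(?=((?:{re.escape(str_unit)})+))")
    runs = (len(m.group(1)) // len(str_unit) for m in pattern.finditer(sequence))
    # default=0 covers the case where the STR never appears.
    return max(runs, default=0)


# Made-up sequence: AGAT repeats three times in a row, then once more later.
print(longest_run("AGATAGATAGATTTAGAT", "AGAT"))  # 3
print(longest_run("CCCC", "AGAT"))                # 0
```

The person whose row in the CSV has exactly these counts for every STR is the one the program should print.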
Code
I had used pandas before in a class at school, so I used it again here.
import sys

import pandas as pd


def main():

    # TODO: Check for command-line usage
    if len(sys.argv) != 3:
        print("Usage: python dna.py data.csv sequence.txt")
        sys.exit(1)

    # TODO: Read database file into a variable
    # Use the "name" column as the index so every remaining column is an STR count
    DNA_DB = pd.read_csv(sys.argv[1], index_col="name")

    # TODO: Read DNA sequence file into a variable
    with open(sys.argv[2], 'r') as file:
        DNA_sequence = file.read()

    # TODO: Find longest match of each STR in DNA sequence
    subsequences = DNA_DB.columns
    profile = [longest_match(DNA_sequence, subsequence) for subsequence in subsequences]

    # TODO: Check database for matching profiles
    # Keep only the rows whose STR counts equal the computed profile
    match = DNA_DB.loc[DNA_DB.apply(lambda row: row.tolist() == profile, axis=1)].index
    if len(match) == 0:
        print("No match")
    else:
        print(match[0])

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
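The matching step above compares each row to the profile with `apply`. As a possible alternative, the same check can be written as a whole-DataFrame comparison. This is only a sketch under assumptions: the helper name `find_match` is hypothetical, and the three-person table is illustrative data in the same shape as the CS50 CSV files (a `name` column followed by one column per STR).

```python
import io

import pandas as pd

# Illustrative database: a "name" column plus one count column per STR.
CSV_TEXT = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
"""


def find_match(db, profile):
    """Return the first name whose STR counts all equal profile, or None."""
    # Comparing a DataFrame with a list of the same length as its columns
    # checks each column against the corresponding list element.
    mask = (db == profile).all(axis=1)
    names = db.index[mask]
    return names[0] if len(names) else None


db = pd.read_csv(io.StringIO(CSV_TEXT), index_col="name")
print(find_match(db, [4, 1, 5]))  # Bob
print(find_match(db, [9, 9, 9]))  # None
```

Because `index_col="name"` moves the names into the index, every remaining column is numeric, which is what lets the comparison run on the whole table at once.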