1 year ago
#340678
nunya
Faster algorithm for lexicographic comparison of DNA strings
I'm trying to find a faster way to do the following:
Given a list of DNA strings x = ([s1, s2, s3, s4...]) (where the strings can only consist of the letters 'A', 'T', 'C', and 'G') and a list of index pairs y = ([[i, j], [i, j], [i, j]....]) find a function that returns whether or not x[i] is lexicographically less than x[j] for each [i, j] index pair. That is, find an algorithm that does the same thing as:
def compare(x, y):
return [x[i]<x[j] for i, j in y]
but with better time complexity than the example provided. Moreover, we should assume that if len(x[i]) < len(x[j]), then x[i]<x[j] evaluates to True.
Here's an example of what the algorithm takes in as input and what it provides as output:
>>> compare(x=["B", "BB", "BC"], y=[(0, 1), (1, 2), (2, 0)])
[True, True, False]
'B' < 'BB' : True, because len('B') < len('BB')
'BB' < 'BC' : True because 'BB' comes before 'BC' in lexicographic order (the first position in both strings are the same, thus we need to compare second positions of both strings)
'BC' < 'B' : False because len('BC') > len('B')
I've already tried utilizing a tester function as follows:
def tester_2(x, y):
return [True if len(x[i])<len(x[j]) else False if len(x[i])>len(x[j]) else x[i]<x[j] for i, j in y]
but for whatever reason, it runs slower than the reference example that we need to be faster than.
Any help would be greatly appreciated!
python
algorithm
time-complexity
dna-sequence
lexicographic
0 Answers
Your Answer