Faster algorithm for lexicographic comparison of DNA strings

1 year ago

#340678

nunya

I'm trying to find a faster way to do the following:

Given a list of DNA strings x = ([s1, s2, s3, s4...]) (where the strings can only consist of the letters 'A', 'T', 'C', and 'G') and a list of index pairs y = ([[i, j], [i, j], [i, j]....]) find a function that returns whether or not x[i] is lexicographically less than x[j] for each [i, j] index pair. That is, find an algorithm that does the same thing as:

def compare(x, y):
   return [x[i]<x[j] for i, j in y]

but with better time complexity than the example provided. Moreover, we should assume that if len(x[i]) < len(x[j]), then x[i]<x[j] evaluates to True.

Here's an example of what the algorithm takes in as input and what it provides as output:

>>> compare(x=["B", "BB", "BC"], y=[(0, 1), (1, 2), (2, 0)])
[True, True, False]

'B' < 'BB' : True, because len('B') < len('BB')

'BB' < 'BC' : True because 'BB' comes before 'BC' in lexicographic order (the first position in both strings are the same, thus we need to compare second positions of both strings)

'BC' < 'B' : False because len('BC') > len('B')

I've already tried utilizing a tester function as follows:

def tester_2(x, y):
    return [True if len(x[i])<len(x[j]) else False if len(x[i])>len(x[j]) else x[i]<x[j] for i, j in y]

but for whatever reason, it runs slower than the reference example that we need to be faster than.

Any help would be greatly appreciated!

python

algorithm

time-complexity

dna-sequence

lexicographic

0 Answers

Your Answer

Posts

Questions

Blogs

Jobs