
[ROSALIND] Finding a Motif Shared by DNA Sequences

by HelloRabbit 2023. 5. 25.
Problem description

A gene is a region of DNA that can be translated into a protein. A motif is a small unit of DNA associated with the function of a gene. Because motif sequences are well conserved, they can be used to check how similar the DNA of different species is.

The goal of this problem is to find the longest motif shared by several DNA sequences (the longer the shared motif, the more similar the shared function is likely to be!).

Problem (Finding a Shared Motif)

Given at most 100 DNA sequences (each <= 1 kbp) in FASTA format, print the longest substring shared by all of the sequences.

Example

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Expected output

AC
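
The expected output can be checked by brute force on the sample. The sketch below is not part of the original solution: `all_substrings` and `longest_shared` are hypothetical helper names, and the code simply intersects the substring sets of the three sample sequences. It shows that the sample has three shared motifs of length 2; the Rosalind problem accepts any one of the longest ones, so `AC` is a valid answer.

```python
def all_substrings(seq):
    """Return the set of every non-empty substring of seq."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def longest_shared(seqs):
    """Return the sorted list of longest substrings common to all seqs."""
    common = set.intersection(*(all_substrings(s) for s in seqs))
    if not common:
        return []
    best = max(len(s) for s in common)
    return sorted(s for s in common if len(s) == best)


# The three sequences from the sample FASTA input above.
sample = ["GATTACA", "TAGACCA", "ATACA"]
print(longest_shared(sample))  # ['AC', 'CA', 'TA']
```

Enumerating every substring is only practical for tiny inputs like this one; it is meant as a sanity check, not as a way to solve the full-size problem.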

 

Solution

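# Return every substring of length k (k-mer) of seq, sorted in descending order.
# The caller pops from the end, so candidates are tried in ascending order.
def find_kmers(seq, k):
    return sorted([seq[i:i+k] for i in range(len(seq)-k+1)], reverse=True)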

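def find_shared_motif(dnas):
    if not dnas:
        return ""
    # A shared motif must occur in the shortest sequence,
    # so candidates are taken from it only.
    dnas = sorted(dnas, key=len)
    shortest = dnas[0]

    # Try the longest length first: the first hit is a longest shared motif.
    for k in range(len(shortest), 0, -1):
        motifs = find_kmers(shortest, k)
        while motifs:
            motif = motifs.pop()
            if all(motif in dna for dna in dnas):
                return motif

    # Not even a single base is shared by every sequence.
    return ""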

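# Read the FASTA file: a header line starts with ">",
# and a sequence may span several lines.
dnas = []
with open("rosalind_lcsm.txt", "r") as f:
    seq = ''
    for line in f:
        if line.startswith(">"):
            if seq:
                dnas.append(seq)
            seq = ''
        else:
            seq += line.strip()
    # The last record has no header after it, so append it here.
    if seq:
        dnas.append(seq)

print(find_shared_motif(dnas))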
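
For larger inputs, the motif length can be found by binary search instead of lowering k one step at a time. This works because if a motif of length k is shared by all sequences, every shorter piece of it is shared as well. This is a different technique from the solution above, given only as a minimal sketch; `shared_kmers` and `find_shared_motif_bisect` are hypothetical names.

```python
def shared_kmers(dnas, k):
    """Return the set of k-mers that occur in every sequence of dnas."""
    common = None
    for dna in dnas:
        kmers = {dna[i:i + k] for i in range(len(dna) - k + 1)}
        common = kmers if common is None else common & kmers
        if not common:
            return set()  # nothing can be shared any more, stop early
    return common or set()


def find_shared_motif_bisect(dnas):
    """Binary-search the largest k for which some k-mer is shared by all."""
    if not dnas:
        return ""
    # Invariant: a shared motif of length lo exists (lo = 0 is the empty string).
    lo, hi = 0, min(len(d) for d in dnas)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = shared_kmers(dnas, mid)
        if found:
            best = min(found)  # alphabetically first, for a deterministic result
            lo = mid
        else:
            hi = mid - 1
    return best


print(find_shared_motif_bisect(["GATTACA", "TAGACCA", "ATACA"]))  # AC
```

Each length is tested with set intersections rather than repeated substring searches, and only about log2(1000), roughly 10, lengths need to be tested for sequences of up to 1 kbp.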

 

 

 
