
Finding Pattern Frequency with k-mers

by HelloRabbit 2023. 6. 14.

Goal

1. What is a k-mer?
2. Finding the frequency of a sequence pattern with k-mers

 

What is a k-mer?

The term k-mer comes up often in bioinformatics. Simply put, a k-mer is a sequence of length k.

 

For example, a 3-mer is a 3 bp DNA sequence made up of three bases, such as "ATA", "ATT", "GCT", or "AGT".

 

The pattern frequency problem with k-mers

The location on the genome where DNA replication begins is called the origin of replication, or ori. The ori sequence of the bacterium Vibrio cholerae is shown below.

atcaatgatcaacgtaagcttctaagcatgatcaaggtgctcacacagtttatccacaacctgagtggatgacatcaagatag
gtcgttgtatctccttcctctcgtactctcatgaccacggaaagatgatcaagagaggatgatttcttggccatatcgcaatgaa
tacttgtgacttgtgcttccaattgacatcttcagcgccatattgcgctggccaaggtgacggagcgggattacgaaagcatg
atcatggctgttgttctgtttatcttgttttgactgagacttgttaggatagacggtttttcatcactgactagccaaagccttactc
tgcctgacatcgaccgtaaattgataatgaatttacatgcttccgcgacgatttacctcttgatcatcgatccgattgaagatctt
caattgttaattctcttgcctcgactcatagccatgatgagctcttgatcatgtttccttaaccctctattttttacggaagaatgat
caagctgctgctcttgatcatcgtttc

 

Suppose that "TGATCA" is the sequence within the ori above where a protein binds and plays an important role, and we want to know how many times "TGATCA" occurs. To implement this in code we can use the idea of k-mers: generate the k-mers with a sliding window and compare each one against the pattern.

 

For example, a 3-mer sliding window is 3 bp wide and moves along the sequence one position at a time, as shown below, so that every 3-mer in the sequence gets checked. (The figure below is an example of searching for the 3-mer "ATA".)

With a sliding window like the one above, the start positions of the 3-mers run from 0 to 11. In other words, we check every 3-mer whose start position is at most the total length minus 3 (which implies that the example sequence is 14 bp long).

In general, if the pattern we are looking for has length k, we check every k-mer whose 0-based start position runs from 0 up to and including the total length minus k.
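The window positions described above can be sketched as a small generator. This is only an illustration: the name `kmer_windows` and the 14 bp example sequence are assumptions made for this sketch, not the sequence from the original figure.

```python
def kmer_windows(text, k):
    """Yield (start, k-mer) for every window of length k in text."""
    # The last valid start is len(text) - k, so there are len(text) - k + 1 windows.
    for start in range(len(text) - k + 1):
        yield start, text[start:start + k]


# Hypothetical 14 bp example sequence (not the one from the original figure).
example = "ATACGATATAGCAT"
windows = list(kmer_windows(example, 3))

print(len(windows))  # 12 windows, with start positions 0 through 11
print([start for start, kmer in windows if kmer == "ATA"])  # [0, 5, 7]
```

The matches at positions 5 and 7 overlap, which is why the window advances one base at a time rather than jumping past each hit.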

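def PatternCount(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    # Compare case-insensitively (the ori sequence above is lowercase).
    text, pattern = text.upper(), pattern.upper()

    # Slide a window of len(pattern) over start positions 0..len(text)-len(pattern).
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            count += 1

    return count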
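As a usage sketch, the function can be applied to the Vibrio cholerae ori sequence from this post. The snippet repeats the same counting logic in compact form so that it runs on its own; the variable name `ori` is an assumption for illustration, and the sequence lines are joined into a single string because some occurrences span the line breaks.

```python
def PatternCount(text, pattern):
    # Same sliding-window logic as above, repeated so this snippet is self-contained.
    text, pattern = text.upper(), pattern.upper()
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


# ori sequence from this post, joined into one string.
ori = (
    "atcaatgatcaacgtaagcttctaagcatgatcaaggtgctcacacagtttatccacaacctgagtggatgacatcaagatag"
    "gtcgttgtatctccttcctctcgtactctcatgaccacggaaagatgatcaagagaggatgatttcttggccatatcgcaatgaa"
    "tacttgtgacttgtgcttccaattgacatcttcagcgccatattgcgctggccaaggtgacggagcgggattacgaaagcatg"
    "atcatggctgttgttctgtttatcttgttttgactgagacttgttaggatagacggtttttcatcactgactagccaaagccttactc"
    "tgcctgacatcgaccgtaaattgataatgaatttacatgcttccgcgacgatttacctcttgatcatcgatccgattgaagatctt"
    "caattgttaattctcttgcctcgactcatagccatgatgagctcttgatcatgtttccttaaccctctattttttacggaagaatgat"
    "caagctgctgctcttgatcatcgtttc"
)

print(PatternCount(ori, "TGATCA"))  # 8

# Overlapping matches are counted, unlike str.count, which skips past each hit.
print(PatternCount("ATATA", "ATA"), "ATATA".count("ATA"))  # 2 1
```

The second line of output shows why a plain `text.count(pattern)` is not a drop-in replacement here: it counts only non-overlapping occurrences.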

 

 

References

https://stepik.org/lesson/23143/step/7?unit=6783 

 

 

 
