
[ROSALIND] Finding a Protein Motif

by HelloRabbit 2023. 5. 23.

Problem description

Proteins are made up of functional units called protein domains. Each domain is associated with a known function, and because a protein typically performs several roles, it contains one or more domains. Proteins that share the same domain are grouped into a protein family (also called a gene family).

Within a domain, a smaller unit called a motif determines the domain's function. Motifs are well conserved over evolution, so similar motifs can be identified across different species.

Protein sequences are discovered by laboratories around the world, and the resulting protein data accumulates online in UniProt. UniProt provides a protein's exact sequence, function, domain structure, post-translational modifications, and other information. With it, researchers can run protein similarity searches, perform taxonomic analysis, and look up the cited literature.

Problem (Finding a Protein Motif)

Protein motifs are commonly written with the following shorthand:

[XY] = either amino acid X or Y
{X} = any amino acid except X

The N-glycosylation motif is written as:

N{P}[ST]{P}

Given several UniProt IDs, print every position at which the N-glycosylation motif occurs in each protein's sequence. Positions are 1-based, and a protein with no occurrence is skipped.
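The shorthand maps almost directly onto regular expression syntax, which is what the solution relies on when it searches for N[^P][ST][^P]. Here is a minimal sketch of that translation; the helper name motif_to_regex is made up for illustration, and it only handles the two notations listed above.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate the motif shorthand into a regular expression string.

    {X} (any amino acid except X) becomes the negated class [^X];
    [XY] is already valid regex syntax and passes through unchanged.
    """
    return motif.replace("{", "[^").replace("}", "]")

pattern = motif_to_regex("N{P}[ST]{P}")
print(pattern)                               # N[^P][ST][^P]
print(bool(re.fullmatch(pattern, "NGSA")))   # True
print(bool(re.fullmatch(pattern, "NPSA")))   # False: second residue is P
```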

 

Example

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Expected output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
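The output for P07204_TRBM_HUMAN lists the adjacent positions 115 and 116, so motif occurrences can overlap. A search that consumes each match before continuing would miss the second one. The sketch below shows the difference on a made-up toy sequence (not a real protein), using a zero-width lookahead as one possible way to keep overlapping hits.

```python
import re

PATTERN = "N[^P][ST][^P]"
seq = "NNTSY"  # made-up toy sequence, not a real protein

# Consuming matches: the scan resumes after the end of each match.
consuming = [m.start() + 1 for m in re.finditer(PATTERN, seq)]

# Zero-width lookahead: every start position is tested, so overlaps are kept.
overlapping = [m.start() + 1 for m in re.finditer(f"(?={PATTERN})", seq)]

print(consuming)    # [1]
print(overlapping)  # [1, 2]
```

The + 1 converts Python's 0-based match index into the 1-based position the problem expects.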

 

Solution

from urllib.request import urlopen
import re

# N-glycosylation motif = N{P}[ST]{P}
# The lookahead (?=...) matches without consuming, so overlapping motifs are found.
MOTIF = re.compile(r'(?=N[^P][ST][^P])')

with open("rosalind_mprt.txt", "r") as f:
    ids = [line.strip() for line in f if line.strip()]

for full_id in ids:
    # Only the accession is used in the URL, e.g. "P07204_TRBM_HUMAN" -> "P07204"
    uniprot_id = full_id.split('_')[0]
    url = f"http://www.uniprot.org/uniprot/{uniprot_id}.fasta"
    with urlopen(url) as response:
        lines = response.read().decode('utf-8', 'ignore').splitlines()
    # Drop the FASTA header (first line) and join the sequence lines
    seq = ''.join(lines[1:])

    # Report 1-based start positions of every motif occurrence
    positions = [m.start() + 1 for m in MOTIF.finditer(seq)]

    # Proteins without the motif are skipped
    if positions:
        print(full_id)
        print(' '.join(map(str, positions)))
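The solution treats the first line of each response as the FASTA header and joins the remaining lines into one sequence. To check that step without a network request, the parsing can be pulled into a small helper. This is only a sketch: the function name fasta_to_sequence and the record below are made up for illustration, and it assumes a single-record FASTA text.

```python
def fasta_to_sequence(fasta_text: str) -> str:
    """Return the sequence of a single-record FASTA string as one line."""
    lines = fasta_text.strip().splitlines()
    # Lines starting with '>' are headers; everything else is sequence.
    return "".join(line.strip() for line in lines if not line.startswith(">"))

# Made-up record, used only to exercise the helper.
example = ">sp|X00000|DEMO_TEST made-up record\nMKNNTSY\nLLNGSA\n"
print(fasta_to_sequence(example))  # MKNNTSYLLNGSA
```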

 

 

 
