
[ROSALIND] Open Reading Frames (6-frame translation)

by HelloRabbit 2023. 5. 30.

Problem description

An mRNA sequence complementary to the DNA is made, and the mRNA is then translated into protein. Translation does not always use the entire mRNA sequence. It can begin at any start codon (AUG) in the mRNA and it ends at the first stop codon (UAG, UAA, UGA) that follows in the same frame. Because codons are read three bases at a time, reading can begin at the first, second, or third base of the mRNA. Reading the sequence in these three ways is called three-frame translation.
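To make the three reading frames concrete, here is a minimal sketch that splits a sequence into codons for each starting offset. The helper name `codons_in_frame` and the 10-base sequence are made up for illustration and are not part of the original solution.

```python
def codons_in_frame(rna, offset):
    """Split rna into complete codons, starting at offset (0, 1, or 2)."""
    # len(rna) - 2 is the last index at which a full 3-base codon can start
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


rna = "AUGGCCUAAG"  # made-up example sequence
for offset in range(3):
    print(offset, codons_in_frame(rna, offset))
# 0 ['AUG', 'GCC', 'UAA']
# 1 ['UGG', 'CCU', 'AAG']
# 2 ['GGC', 'CUA']
```

Only frame 0 contains both a start codon (AUG) and a stop codon (UAA) here, so only that frame would yield a protein.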

However, DNA also has a complementary opposite strand, so once that strand is taken into account there are six reading frames in total in which translation can begin. A region that can be translated in this way is called an open reading frame (ORF). This problem asks us to implement six-frame translation in code.
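The three extra frames come from the reverse complement of the DNA. A rough sketch of how the six frames could be enumerated follows; `reverse_complement` and `six_frames` are illustrative names, and the sketch assumes the input contains only the bases A, C, G, and T.

```python
# Map each DNA base to its pairing partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return the six reading frames: offsets 0-2 on each strand."""
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            frames.append(strand[offset:])
    return frames


print(reverse_complement("AGCCATGTAG"))  # CTACATGGCT
print(len(six_frames("AGCCATGTAG")))     # 6
```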

DNA -> RNA -> Protein


Problem (Open Reading Frames)

Given a DNA sequence of at most 1 kbp in FASTA format, print every possible protein sequence that can be translated from it.
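Since the input arrives in FASTA format (a header line starting with ">" followed by one or more sequence lines), a small parser is the first step. The following is a minimal sketch; the function name `read_fasta` is hypothetical, and it assumes the text is well formed.

```python
def read_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # A sequence may be wrapped over several lines; join them
            records[header] += line
    return records


sample = ">Rosalind_1\nACGT\nTTGA\n"
print(read_fasta(sample))  # {'Rosalind_1': 'ACGTTTGA'}
```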

Example

>Rosalind_99
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG

Expected output

MLLGSFRLIPKETLIQVAGSSPCNLS
M
MGMTPRLGLESLLE
MTPRLGLESLLE

 

Solution

# Given an RNA sequence, return its reverse complement.
# Lowercase placeholders keep an already-swapped base from being swapped back.
def rev_complement_mrna(s):
    return s[::-1].upper().replace('A', 'u').replace('U', 'a').replace('G', 'c').replace('C', 'g').upper()

codon = {}
# Build a dictionary that maps each codon to its amino acid
with open("aa_codon.txt", "r") as f:
    for line in f.readlines():
        aa = line.split()
        for i in range(0, len(aa), 2):
            codon[aa[i]] = aa[i+1]

# Translate an RNA sequence into proteins, considering all six reading frames
def six_frame_translation(rna):
    rev_rna = rev_complement_mrna(rna)
    proteins = set()    # a set is used to drop duplicate protein sequences

    # Translate a sequence into proteins, considering its three reading frames only
    def three_frame_translation(rna):
        for i in [0,1,2]:
            protein = ''
            # len(rna)-2 so that the last complete codon is not skipped
            for j in range(i, len(rna)-2, 3):
                aa = rna[j:j+3]
                
                # If a protein is already being built, keep extending it
                if protein:
                    if codon[aa] == 'Stop':
                        proteins.add(protein)
                        protein = ''
                    else:
                        protein += codon[aa]
                
                # No protein in progress and the codon is AUG: start translating
                elif protein == '' and aa == 'AUG':
                    protein += codon[aa]
    
    three_frame_translation(rna)
    three_frame_translation(rev_rna)
    
    frags = set()
    for protein in proteins:
        temp = protein.split('M')
        # An internal M is a later start codon in the same frame,
        # so the suffix beginning at each internal M is also a protein
        if len(temp) > 2:
            for i in range(2, len(temp)):
                # Join with 'M' so that any further internal M is kept
                frags.add('M' + 'M'.join(temp[i:]))

    # Return every protein sequence found in proteins and frags
    return proteins | frags

# Load the example input
with open('rosalind_orf.txt', 'r') as f:
    seq = ''
    for line in f.readlines():
        if not line.startswith(">"):
            seq += line.strip()

    rna = seq.upper().replace('T', 'U')

    proteins = six_frame_translation(rna)
    for protein in proteins:
        print(protein)
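For comparison, here is a sketch of a different way to reach the same result: start a translation at every ATG directly instead of splitting finished proteins on M afterwards. It is not the code above. It works on DNA codons rather than RNA, builds the standard genetic code in place of reading aa_codon.txt, and assumes the input contains only A, C, G, and T. The name `candidate_proteins` is made up for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def candidate_proteins(dna):
    """Return the set of proteins from every ORF on both strands."""
    dna = dna.upper()
    strands = (dna, dna.translate(COMPLEMENT)[::-1])
    found = set()
    for strand in strands:
        for start in range(len(strand) - 2):
            if strand[start:start + 3] != "ATG":
                continue
            protein = []
            # Read codon by codon from this ATG until a stop codon
            for i in range(start, len(strand) - 2, 3):
                aa = CODON_TABLE[strand[i:i + 3]]
                if aa == "*":
                    found.add("".join(protein))
                    break
                protein.append(aa)
            # If no stop codon was reached, the candidate is discarded
    return found


sample = ("AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTT"
          "GGAATAAGCCTGAATGATCCGAGTAGCATCTCAG")
for protein in sorted(candidate_proteins(sample)):
    print(protein)
```

On the example sequence this prints the same four proteins as the expected output. Scanning from every ATG costs more work per sequence, but for inputs of at most 1 kbp that is negligible, and it avoids the separate step for internal start codons.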

Attachment: aa_codon.txt

 

 

 
