
[ROSALIND] GC Content

by HelloRabbit 2023. 5. 8.

Problem (try it yourself)

GC content is the proportion of a DNA sequence's bases that are 'G' or 'C'.
Given at most 10 DNA sequences, find the name of the sequence with the highest GC content, along with its GC content (%).

Example

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Expected output

Rosalind_0808
60.919540

 

Solution

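from Bio import SeqIO

max_gc = 0.0
max_gc_id = ""
# SeqIO.parse yields one record per '>' header and joins wrapped sequence lines
for record in SeqIO.parse("rosalind_gc.txt", "fasta"):
    seq_id, seq = record.id, record.seq.upper()   # seq_id avoids shadowing the built-in id()
    # Percentage of bases that are G or C
    gc_content = (seq.count("G") + seq.count("C")) / len(seq) * 100      # GC(seq)
    if gc_content > max_gc:
        max_gc_id = seq_id
        max_gc = gc_content

print(max_gc_id)
print(f"{max_gc:.6f}")   # six decimal places, matching the expected output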

ํŒŒ์ด์ฌ์—์„œ๋Š” SeqIO ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด Fasta ํŒŒ์ผ์„ ์‰ฝ๊ฒŒ ์ฝ์–ด๋“ค์ผ ์ˆ˜ ์žˆ๋‹ค.

As the example above shows, a FASTA file has the following format.

>sequence_name_1
nucleotide sequence of sequence_name_1
>sequence_name_2
nucleotide sequence of sequence_name_2
>sequence_name_3
nucleotide sequence of sequence_name_3

Each record starts with a '>' followed by the name of the sequence. The sequence itself follows on the next line and, as in the example above, may wrap over several lines until the next '>'.
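
To make the format concrete, here is a minimal sketch of a FASTA reader in plain Python, without Biopython. The function name `parse_fasta` is hypothetical (it is not part of Biopython), and the sketch assumes well-formed input in which every sequence is preceded by a '>' header line with a non-empty name.

```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its full sequence.

    Hypothetical helper for illustration; assumes well-formed FASTA text.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first word after '>' as the name
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header
            records[name].append(line)
    # Join the pieces so a sequence wrapped over several lines becomes one string
    return {name: "".join(parts) for name, parts in records.items()}


# First record of the example input, wrapped over two lines
sample = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
"""
records = parse_fasta(sample)
print(records["Rosalind_6404"])
```

Collecting the lines in a list and joining them once at the end is what lets a wrapped sequence come back as a single string, which is roughly the convenience SeqIO provides in the solution above.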

 

GC content is the number of G and C bases divided by the total length of the DNA sequence.
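
This definition fits in a few lines of Python. The sketch below uses a hypothetical helper named `gc_content` and applies it to the Rosalind_0808 sequence from the example. It assumes a non-empty sequence made up of A, C, G, and T only.

```python
def gc_content(seq):
    """Return the GC content of a DNA string as a percentage.

    Hypothetical helper for illustration; assumes seq is non-empty.
    """
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq) * 100


# Rosalind_0808 from the example, with its two lines joined
rosalind_0808 = (
    "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC"
    "TGGGAACCTGCGGGCAGTAGGTGGAAT"
)
print(f"{gc_content(rosalind_0808):.6f}")  # 60.919540
```

The sequence has 87 bases, 53 of which are G or C, so the result is 53 / 87 * 100, which matches the expected output.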

 

 

 
