
🧬 Biology/Bio Coding Problems (20)

[ROSALIND] Finding a Motif Shared by DNA Sequences. Problem description: A gene is a region of DNA that can be translated into protein. A motif is a small unit of DNA related to a gene's function, and because motif sequences are well conserved, they can be used to check DNA similarity between different species. This problem asks for the longest possible motif shared among several DNA sequences (the longer the motif, the more similar the shared function can be said to be!). Problem (Finding a Shared Motif): Up to 100 DNA sequences ( Rosalind_1 GATTACA >Rosalind_2 TAGACCA >Rosalind_3 ATACA Expected result: AC Solution: def find_kmers(seq, k): return sorted([seq[i:i+k] f.. 2023. 5. 25.
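The excerpt's solution is cut off, so the following is only a minimal sketch of one way to find the longest shared motif, not the post's own code. It assumes a brute-force search over substrings of the shortest sequence; the name `shared_motifs` is hypothetical.

```python
def shared_motifs(seqs):
    """Return every longest substring common to all sequences, sorted."""
    # Any shared motif must fit inside the shortest sequence.
    shortest = min(seqs, key=len)
    # Try the longest candidate length first and stop at the first hit.
    for k in range(len(shortest), 0, -1):
        candidates = {shortest[i:i + k] for i in range(len(shortest) - k + 1)}
        common = sorted(m for m in candidates if all(m in s for s in seqs))
        if common:
            return common
    return []


print(shared_motifs(["GATTACA", "TAGACCA", "ATACA"]))  # ['AC', 'CA', 'TA']
```

For the sample input there are three equally long answers; the first one in sorted order is the "AC" shown as the expected result.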
[ROSALIND] Finding a Protein Motif. Problem description: Proteins are made up of functional units called protein domains. Each domain has one known function, and because proteins generally play several roles, they have one or more domains. Proteins that share the same domain are grouped into what is called a gene/protein family. A motif is a smaller unit that defines the function of a protein domain. Motifs are also well conserved from an evolutionary standpoint, so similar motifs can be identified across different species. Protein sequences are discovered in laboratories around the world, and online the protein data accumulates in UniProt. There, a protein's specific sequence, function, domain structure, post-translational modifications (po.. 2023. 5. 23.
Regular Expression. Goal: 1. What is a regular expression? 2. Learn how to use regular expressions in Python. What is a regular expression? A regular expression is a useful tool for finding patterns in text. For example, if you want to print only the sequences that contain "AUG" out of several RNA sequences, a regular expression makes this very simple. Regular expressions in Python: Python uses a library called re. Methods of the re module (method, function, example): findall("pattern", string) returns every match of the pattern as a list. import re rna = "AUGCCAUGCUGA" first_start = re.search("AUG", rna) print(first_start) # search("pattern", string) returns the part that matches the pattern as an objec.. 2023. 5. 23.
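As a small illustration of the two `re` functions named in this excerpt, the sketch below runs both on the excerpt's RNA string. The list `rnas` and the variable names other than `rna` and `first_start` are made up for the example.

```python
import re

rna = "AUGCCAUGCUGA"

# findall returns every non-overlapping match as a list of strings.
all_starts = re.findall("AUG", rna)
print(all_starts)  # ['AUG', 'AUG']

# search returns a match object for the first match (or None if there is none).
first_start = re.search("AUG", rna)
print(first_start.span())  # (0, 3)

# Keep only the sequences that contain "AUG" (hypothetical input list).
rnas = ["AUGCCAUGCUGA", "CCGUUA", "GGAUGC"]
with_start = [s for s in rnas if re.search("AUG", s)]
print(with_start)  # ['AUGCCAUGCUGA', 'GGAUGC']
```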
Fetching FASTA Sequences from a URL. This post is protected. 2023. 5. 22.
[ROSALIND] Translating RNA into a Protein Sequence. Problem (try it): Given an RNA sequence, translate it into a protein sequence. Because the RNA sequence always starts with AUG and ends with a stop codon, there is no need to consider the three reading frames. Example: AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA Expected result: MAMAPRTEINSTRING Solution: codon = {} with open("aa_codon.txt", "r") as f: for line in f.readlines(): aa = line.split() for i in range(0, len(aa), 2): codon[aa[i]] = aa[i+1] def translation(rna): protein = '' for i in range(0, len(rna), .. 2023. 5. 14.
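The excerpt's `translation` function is truncated and reads its codon table from `aa_codon.txt`, whose contents are not shown. The sketch below therefore assumes a different setup: it builds the standard genetic code in memory and stops at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position
# (third base varies fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
# MAMAPRTEINSTRING
```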
[ROSALIND] Counting DNA Mutations. Problem (try it): The Hamming distance is the number of positions at which two strings of equal length differ. Given two DNA sequences, compute their Hamming distance. Example: GAGCCTACTAACGGGAT CATCGTAATGACGGCCT Expected result: 7 Solution: seq1, seq2 = '', '' with open("rosalind_hamm.txt", "r") as file: seq1 = file.readline().strip() seq2 = file.readline().strip() print(sum(seq1[i] != seq2[i] for i in range(len(seq1)))) 2023. 5. 9.
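The same count can be written as a function that takes the two sequences directly instead of reading `rosalind_hamm.txt`. This is a sketch of an equivalent variant, with the length check added as an assumption about how unequal inputs should be handled.

```python
def hamming_distance(seq1, seq2):
    """Count the positions at which two equal-length strings differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    # zip pairs up the characters position by position.
    return sum(a != b for a, b in zip(seq1, seq2))


print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
```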
[ROSALIND] GC Content. Problem (try it): GC content is the proportion of a whole DNA sequence made up of 'G' and 'C'. Given up to 10 DNA sequences, find the name and the GC content (%) of the sequence with the highest GC content. Example: >Rosalind_6404 CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC TCCCACTAATAATTCTGAGG >Rosalind_5959 CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT ATATCCATTTGTCAGCAGACACGC >Rosalind_0808 CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC TGGG.. 2023. 5. 8.
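The post's solution is not part of this excerpt, so the following is only a sketch of one possible approach: parse the FASTA text into named records, then pick the record with the highest GC percentage. All function names and the toy input are hypothetical; the truncated sample above is not reproduced.

```python
def parse_fasta(text):
    """Map each FASTA header (without '>') to its joined sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def gc_percent(seq):
    """Percentage of bases in seq that are G or C (seq must be non-empty)."""
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def highest_gc(text):
    """Return (name, GC percentage) of the record with the highest GC content."""
    records = parse_fasta(text)
    best = max(records, key=lambda n: gc_percent(records[n]))
    return best, gc_percent(records[best])


# Toy input with made-up names, only to show the call.
toy = ">seq_a\nATGC\nGGCC\n>seq_b\nATAT\n"
print(highest_gc(toy))  # ('seq_a', 75.0)
```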
[ROSALIND] Fibonacci Numbers. Problem: The pattern of the Fibonacci numbers 0,1,1,2,3,5,8,13,21,34,… is very simple. Each number is the sum of the two numbers before it. Return the n-th number, counting from 0. Example: 6 Expected result: 8 Solution: def fibonacci(n): a,b = 0,1 for i in range(2, n+1): a,b = b,a+b print(b) Since we know that the first and second numbers are 0 and 1, we can simply keep adding in a for loop. 2023. 5. 7.
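The excerpt's function prints `b`, which yields 1 when n is 0. A slightly different sketch, assuming the sequence is indexed from 0 as in the example, returns the value instead and covers that case as well.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with fibonacci(0) == 0."""
    a, b = 0, 1
    # After i steps, a holds the i-th number and b the (i+1)-th.
    for _ in range(n):
        a, b = b, a + b
    return a


print(fibonacci(6))  # 8
```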
[ROSALIND] Complementary DNA Strand. Problem (try it): Given one strand of a DNA sequence, return the complementary strand that pairs with it. Example: AAAACCCGGT Expected result: ACCGGGTTTT *** DNA and RNA sequences are always written in the 5' to 3' direction, so the opposite strand, 3' - TTTTGGGCCA - 5', has to be reversed into 5' - ACCGGGTTTT - 3' to get the correct answer. Solution: def rev_complement(dna): print(dna[::-1].upper().replace('A', 't').replace('T', 'a').replace('G', 'c').replace('C', 'g').upper()) 2023. 5. 6.
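The chained `replace` calls work because the intermediate lowercase letters keep already-swapped bases from being swapped back. An alternative sketch, not the post's code, uses a translation table so each base is mapped in a single pass; `COMPLEMENT` and `reverse_complement` are hypothetical names.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    # Complement every base, then reverse to restore 5' to 3' order.
    return dna.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```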