InterPro :
https://www.ebi.ac.uk/interpro/api
Data types: protein, structure, taxonomy, proteome, set (Pfam ...)
URL structure.
/api/[data type]/
Mixing endpoints
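A minimal sketch of how the URL structure and endpoint mixing fit together, as far as I understand the API: each block is a data type optionally followed by a source database and an accession, and blocks are chained in one path. The `entry` data type, the database names `pfam` and `uniprot`, the accession `P04637`, and the helper name `build_url` are my own assumptions for illustration, not something taken from the course material.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/interpro/api"


def build_url(*blocks):
    """Join endpoint blocks into one InterPro API URL.

    Each block is a tuple such as ("protein",) or
    ("protein", "uniprot", "P04637"): data type, then optionally
    a source database and an accession (assumed layout).
    """
    parts = []
    for block in blocks:
        # Percent-encode every path part so odd characters cannot break the URL.
        parts.extend(quote(str(part), safe="") for part in block)
    return BASE_URL + "/" + "/".join(parts) + "/"


# A single endpoint: /api/[data type]/
single = build_url(("protein",))

# Mixing endpoints: Pfam entries filtered by one UniProt protein (assumed names).
mixed = build_url(("entry", "pfam"), ("protein", "uniprot", "P04637"))
```

The function only builds the string; the result can be fetched with any HTTP client and is expected to come back as JSON.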
There is no recording of the Sequence classification session, so this is my own summary based on the course material:
protein_biology_sequence_classification_2022.pdf - Google Drive
1. Pfam database
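The Pfam database is a collection of protein families, each built from an MSA and an HMM. These are used to classify proteins at the family / domain level.
The E-value is reported by sequence search tools such as PSI-BLAST and HMMER. It is the number of hits expected by chance with a score at least as good as the one observed, so the smaller the E-value, the more significant the similarity to the query protein.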
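To make the E-value a bit more concrete, here is a small sketch of the two standard relations as I understand them: the BLAST-style form E = m * n * 2^(-S') for a bit score S', and the HMMER-style form E = P * Z for a per-target P-value and Z targets. The function names are made up for illustration, and real tools use effective lengths and their own corrections, so these numbers are only rough guides.

```python
import math


def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2^(-S'), where m is the query length and n is
    the total database length (raw lengths here, an approximation).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def evalue_from_pvalue(pvalue, n_targets):
    """E-value as the per-target P-value times the number of targets."""
    return pvalue * n_targets


# The same bit score is less significant in a larger database.
small_db = evalue_from_bit_score(40, query_len=300, db_len=1_000_000)
large_db = evalue_from_bit_score(40, query_len=300, db_len=100_000_000)
```

This is why an E-value always has to be read together with the size of the database that was searched.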
HMMER programs:
• phmmer - single protein sequence against protein sequence database
• hmmscan - single protein sequence against profile HMM library (Pfam, CATH-Gene3D, PIRSF, Superfamily and TIGRFAMs)
• hmmsearch - either multiple sequence alignment or profile HMM against protein sequence database
• jackhmmer - iterative searches. Initiated with a single sequence, a profile HMM or a multiple sequence alignment against a target sequence database
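As a rough sketch, this is how the four programs could be called from the command-line HMMER package. The positional argument order differs between programs, which is easy to get wrong. The helper `hmmer_command` and the file names are hypothetical. Also, as far as I know, the command-line `hmmsearch` takes a profile HMM file, so an alignment would first have to be turned into one with `hmmbuild`.

```python
# Positional arguments expected by each program, in order.
HMMER_PROGRAMS = {
    "phmmer": ("query sequence file", "target sequence database"),
    "hmmscan": ("profile HMM database", "query sequence file"),
    "hmmsearch": ("profile HMM file", "target sequence database"),
    "jackhmmer": ("query sequence file", "target sequence database"),
}


def hmmer_command(program, first, second, evalue=None, tblout=None):
    """Build an argument list for one HMMER program.

    first/second are the two positional files listed in HMMER_PROGRAMS.
    evalue sets the -E reporting threshold; tblout writes a per-target
    hit table to the given path.
    """
    if program not in HMMER_PROGRAMS:
        raise ValueError(f"unknown HMMER program: {program}")
    cmd = [program]
    if evalue is not None:
        cmd += ["-E", str(evalue)]
    if tblout is not None:
        cmd += ["--tblout", tblout]
    cmd += [first, second]
    return cmd


# Scan one query (hypothetical file names) against a local Pfam HMM library.
scan = hmmer_command("hmmscan", "Pfam-A.hmm", "query.fasta",
                     evalue=1e-5, tblout="hits.tbl")
```

The returned list can be passed to `subprocess.run(scan, check=True)` on a machine where HMMER is installed and the HMM library has been prepared with `hmmpress`.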
Pfam data types | Clans in Pfam (image from EMBL course material) |
Tasks you can do with Pfam:
• Protein or DNA sequence search against models
• Browse families/clans/proteomes
• Retrieve text annotation about a given family/entry
• Search by keywords
• See structural information of a family (experimentally determined from PDB or structure predictions from AlphaFold/trRosetta)
• Taxonomy distribution
• View multiple sequence alignment of a family/clan
**** As of 2022.10.29, Pfam no longer provides its own service: it has moved to InterPro.
2. Sequence classification with InterPro
InterPro ?
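Given a protein alignment, a signature built from several motifs shared across multiple proteins is called a fingerprint, one built from a single motif is a pattern, and methods that model the whole alignment are called profiles or HMMs.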
Pattern | Fingerprint | Profile / HMM | Profile |
InterPro entry | From EMBL course material | From EMBL course material |
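To see what a single-motif pattern looks like in practice, here is a small sketch that turns a PROSITE-style pattern string into a Python regular expression. The PROSITE syntax is my own addition and is not described in the course material. The example pattern is a zinc-finger-style motif used only for illustration, the toy sequence is made up, and the converter handles just the basic elements (residues, `x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors).

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"^(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern into a regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = "^" if pattern.startswith("<") else ""
    suffix = "$" if pattern.endswith(">") else ""
    pattern = pattern.lstrip("<").rstrip(">")
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = match.group("core")
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        lo, hi = match.group("lo"), match.group("hi")
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(core)
    return prefix + "".join(pieces) + suffix


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
hit = re.search(regex, "MKCAACAAALAAAAAAAAHAAAHGG")  # made-up toy sequence
```

A fingerprint would correspond to several such motifs required together, while a profile or HMM scores every column of the alignment instead of matching exactly.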
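When you enter a protein sequence, InterPro returns several kinds of results, aggregating information from multiple sources into one view.
It also maps GO terms, and this mapping is reportedly done manually.
For pathway-level interpretation, papers usually annotate and interpret pathways selectively and by hand... if this could be automated (with a reasonable level of reliability) it would make a good tool. (One may already exist.)
InterPro's accuracy is said to improve because it overlaps and aggregates information from 13 databases.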
• Summarize the purpose of Pfam and InterPro in protein classification and how they may be useful in your research.
• Pfam...
• Can be used to analyse novel protein sequences and predict their domain architecture and structure
• Is useful to find distant homologues and get evolutionary relationships
• Is part of the InterPro member database consortium
• InterPro...
• Combines multiple protein models from multiple resources into a single searchable resource
• Can be used to identify domains and sites within novel sequences, and to define protein families
• Is used in the large scale annotation of proteins, proteomes and metagenomes
• Is heavily curated, errors are identified and corrected regularly