Drug target prioritization by perturbed gene expression and network information | Scientific Reports (nature.com)
2015, Scientific Reports
Schroeder group
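A drug binds its target protein and, through interactions with downstream effectors, ultimately perturbs the transcriptome of the cancer cell. These perturbations carry information about their source, such as the drug's target.
In this paper, the authors investigate whether such perturbations, combined with a protein interaction network, can reveal drug targets and key pathways. They systematically analyze about 500 drugs from CMap.
First, they show that the expression of drug target genes is not strongly affected by drug perturbation, and further that expression changes after drug treatment are not sufficient to identify or distinguish drug targets.
However, when drug target candidates are ranked by network topological measures, the known targets rank highly. The paper introduces local radiality, a measure that combines the perturbed genes with functional interaction information from the network.
The new method should select cancer-specific pathways and targets better, and should help pick targets with fewer side effects.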
Q’s
- If a drug does not alter the expression of its target, but if it does alter the expression of other genes, then what is the relation of the target to these genes?
- Can drug targets be identified from network information and expression alterations induced by a drug?
- Does a global or a local network feature give higher target prediction accuracy?
- Does the target prediction performance depend on the definition of a target protein?
microarray.
- The accessibility between deregulated genes and a drug target can be obtained by computing shortest paths in the network.
- → A deregulated gene may interact directly with the target, or be one of its close neighbors.
- Topological proximity: computed from the expression data.
- Proteins with a higher chance of being a target rank at the top of the sorted list (according to the closeness score).
- Proteins predicted in the 1st percentile of the ranked list are suggested as potential drug targets.
- Pipeline: take expression data from CMap and PPI data from STRING, model all genes (excluding those without expression data and the like), compute closeness, select potential targets (top 1%), then validate the ranking with STITCH data.
- 97% of drug targets do not show significant expression changes due to drug perturbations.
- Deregulated genes are closer to known targets than any other proteins in the network.
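The equation for local radiality (LR) is missing from these notes. A reconstruction that is consistent with the symbol definitions that follow (an average shortest-path length from a node to the deregulated genes) would be the one below; treat it as an assumption to check against the paper rather than a verbatim copy.

```latex
LR(n) = \frac{\sum_{dg \in DG} \left| sp(n, dg, G) \right|}{|DG|}
```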
Here, the function |sp| calculates the length of a shortest path that connects the deregulated gene dg and the node n in G;
|DG| indicates the total number of deregulated genes.
The LR utilizes both drug perturbation data (i.e., deregulated genes) and topological information (i.e., shortest path distance).
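As a minimal sketch of how such a score could be computed and used for ranking, the snippet below assumes LR is the mean unweighted shortest-path length from a node to the deregulated genes, and that a smaller value means a better candidate. The toy graph, the gene names, the tie-breaking rule, and the handling of unreachable nodes are all illustrative assumptions, not details taken from the paper.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def local_radiality(graph, deregulated):
    """Mean shortest-path length from each node to the deregulated genes.

    Nodes that cannot reach every deregulated gene are skipped (an assumption;
    the notes do not say how disconnected nodes are handled).
    """
    # One BFS per deregulated gene; the graph is undirected, so d(dg, n) == d(n, dg).
    per_gene = [bfs_distances(graph, dg) for dg in deregulated]
    scores = {}
    for node in graph:
        if all(node in dist for dist in per_gene):
            scores[node] = sum(dist[node] for dist in per_gene) / len(per_gene)
    return scores


def top_percentile(scores, percent=1.0):
    """Return the closest `percent` of nodes (at least one), smallest score first."""
    ranked = sorted(scores, key=lambda n: (scores[n], n))
    cutoff = max(1, int(len(ranked) * percent / 100))
    return ranked[:cutoff]


# Toy undirected network with hypothetical node names (not from the paper).
toy_graph = {
    "T": {"A", "B", "C"},
    "A": {"T", "X"},
    "B": {"T", "Y"},
    "C": {"T"},
    "X": {"A"},
    "Y": {"B"},
}
toy_deregulated = ["A", "B", "C"]
toy_scores = local_radiality(toy_graph, toy_deregulated)
toy_candidates = top_percentile(toy_scores, percent=1.0)
```

In this toy network the node `T` sits one step from every deregulated gene, so it receives the smallest score and is the only candidate returned. On a real network the candidate list would then be compared against known targets, as in the STITCH validation step described in the notes.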
- The best predictors—LR, radiality, stress, and symmetric kernel diffusion—use the network topology for the calculation of a node score.
- The network-based predictors performed well.
- Radiality : level of reachability of a node via the shortest paths to all other nodes (i.e., the closer to the rest of all nodes, the easier it is to reach).
- Stress calculates the frequency of a node to appear in all possible pairwise shortest paths of the network.
- Symmetric kernel diffusion : random walk-based method
- Targets predicted by LR generally have low radiality (shortest-path based) scores: LR predicts local interactions well (it explains the interactions of nearby targets well).
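For comparison with LR, global radiality can be sketched as follows. This uses the commonly cited definition (for each node, the average of diameter + 1 - distance over all other nodes) on a connected, undirected, unweighted graph; whether the paper uses exactly this normalization is an assumption, and the three-node path graph is purely illustrative.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def radiality(graph):
    """Radiality of every node in a connected, undirected, unweighted graph."""
    all_dist = {node: bfs_distances(graph, node) for node in graph}
    # Diameter: the longest shortest path found in the graph.
    diameter = max(max(dist.values()) for dist in all_dist.values())
    n = len(graph)
    return {
        node: sum(diameter + 1 - d for other, d in dist.items() if other != node) / (n - 1)
        for node, dist in all_dist.items()
    }


# Path graph a - b - c: the middle node is the easiest to reach from.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_radiality = radiality(path_graph)
```

Unlike LR, this score depends only on the network, not on which genes a drug deregulates, which is one way to read the global versus local distinction raised in the questions above.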
Target classification in this paper
- Physical targets (PT) are collected from 15 different drugs, proteins, and compound databases (see the Drug Targets Section).
- Functional targets (FT, FT1) are obtained from the STITCH Database13.
- FT1 is a subset of the FT targets, it only considers the most confident target for each drug
- Although the prediction of a drug target is crucial, the generation of the expected phenotype is also important for drug treatment experiments. The pathway databases can help to formalize the expected phenotype, but incomplete databases limit the investigation of effects on the pathway level.
- Topological mapping based on shortest paths (SP).
- Each color marks a sub-network that moves as a unit for a given drug.
- Design new drug experiments.
- LR shows the proximity of the deregulated genes and also provides information about the affected molecular pathways (from the paths that are traversed).
- The paper uses Pioglitazone, a type 2 diabetes drug, as an example.
- PPARG agonist
- CTGF is known as a target.
- In the networks built from prostate cancer tissue expression, many rank PPARG and CTGF near the top: drug repositioning.
- CTGF is overexpressed in many tumors, but its expression goes down under Pioglitazone treatment. → The angiogenesis pathway is not stimulated.
In conclusion, the integration of gene expression data into biological networks improves the prioritization of known drug targets. The shortest path-based approach, LR, uncovers affected pathways due to a drug perturbation.
Interaction network :
- STRING 에서 가져옴.
Drug target :
- The known human targets of the drugs in the CMap database were extracted in several steps.
- FT
- First, each drug was mapped to its corresponding PubChem identifier based on a drug name comparison.
- Known human targets of these drugs were extracted from the STITCH database (version 3.1)13.
- In the STITCH database, drug-target interaction data are collected from different data sources, which provide information about metabolic pathways, crystal structures, binding experiments and drug target relationships.
- Afterwards, for every drug-target interaction, the likelihood of all different sources of this interaction was combined to achieve an overall confidence score.
- Drug-target interactions were extracted with PubChem identifiers from STITCH.
- Finally, human targets with a confidence score of 800 or above were selected as drug targets.
- FT1
- After the mapping and filtering steps, 551 drugs with known targets were left in the CMap data set.
- Due to the high number of targets for some drugs, the most likely target of each drug was chosen with a text mining approach.
- The likelihood of being the best drug target is calculated based on the pairwise occurrence frequency of a target and a tissue name in PubMed abstracts (i.e., the more frequent, the more probable target it is).
- The most confident target shows a literature-based correlation with a specific tissue. Thus, the method selects tissue-specific targets.
- PT (physical targets)
- is an in-house database aggregating more than 15 different drug, protein, and compound databases32.
- The PT set includes physical interactions from Protein Data Bank33, Therapeutic Targets Database34, and BindingDB Database35.
- The coverage of PT is much lower than of FT because of its focus on physical binding only.
- FT1 is the smallest target set in terms of unique targets and drug-target interactions (Supplementary Table 2).
- Although PT contains only known physical binding partners of queried drugs, it was successfully applied in previous drug repositioning studies32,36.
- The biological network (STRING) contains at least 90% of known targets that are provided by any target data set.
- The known target overlap between three drug-target data sets shows that 87 targets are indicated by all of the data sets (Supplementary Fig. 1). PT covers only 155 unique physical drug targets; 2224 targets are only provided by FT, and it has the highest coverage in terms of known targets.
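The FT filtering step described above (keep human targets with a combined confidence score of 800 or above) could be sketched as below. The `(drug, protein, score)` record layout and the example rows are hypothetical; this is not the STITCH file format and the scores are invented for illustration.

```python
def select_functional_targets(interactions, min_score=800):
    """Group targets by drug, keeping interactions at or above min_score."""
    targets = {}
    for drug, protein, score in interactions:
        if score >= min_score:
            targets.setdefault(drug, set()).add(protein)
    return targets


# Hypothetical (drug, protein, combined score) records.
toy_interactions = [
    ("drug_1", "protein_a", 950),
    ("drug_1", "protein_b", 640),
    ("drug_2", "protein_c", 800),
]
toy_ft = select_functional_targets(toy_interactions)
```

The threshold is inclusive, matching the "800 or above" wording, so `protein_c` is kept while `protein_b` is dropped.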
- Construction of Sub-network of Selected Targets and Deregulated Genes.
- The sub-network of drug targets and deregulated genes shows that the individual modules are composed of drug targets and deregulated genes.
- The aim is to extract the paths that pass through a target as well as the affected deregulated genes in the STRING network.
- Topological mapping of perturbation data in the biological network reveals the shortest paths between deregulated genes and known targets.
- To choose four drug examples given in the “A Sub-network of Selected Targets and Deregulated Genes” section, we applied the following selection scheme: The shortest paths network (SP-net) is extracted for each drug target and its deregulated genes.
- The SP-net is composed of all possible shortest paths that connect all deregulated genes and target t.
- LR(t) is the local radiality of a target t with the deregulated genes dg. It is calculated for each SP-net.
- If LR(t) < 3, and the distance of SP-net_t to other SP-nets is larger than 3, then SP-net_t is selected as an example for target-deregulated genes sub-network. Each selected SP-net_t is uploaded to the Cytoscape tool to visualize the sub-network40.
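One way to extract an SP-net as described above is to keep every edge that lies on at least one shortest path between the target and a deregulated gene. The sketch below does this with two breadth-first searches per gene on an undirected, unweighted graph; the toy network and node names are hypothetical, and the paper's own implementation may differ.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def shortest_path_network(graph, target, deregulated):
    """Edges on at least one shortest path between target and a deregulated gene."""
    from_target = bfs_distances(graph, target)
    edges = set()
    for dg in deregulated:
        if dg not in from_target:
            continue  # an unreachable gene contributes no path
        from_dg = bfs_distances(graph, dg)
        total = from_target[dg]
        for u in graph:
            for v in graph[u]:
                # Edge u-v is on a shortest path if going target -> u -> v -> dg
                # costs exactly the shortest target-to-dg distance.
                if u in from_target and v in from_dg:
                    if from_target[u] + 1 + from_dg[v] == total:
                        edges.add(frozenset((u, v)))
    return edges


# Toy network: two equally short routes from T to G, plus two side branches.
toy_graph = {
    "T": {"A", "B", "Z"},
    "A": {"T", "G"},
    "B": {"T", "G"},
    "G": {"A", "B", "H"},
    "H": {"G"},
    "Z": {"T"},
}
toy_sp_net = shortest_path_network(toy_graph, "T", ["G"])
```

Here both routes `T-A-G` and `T-B-G` are kept, while the side branches to `Z` and `H` are left out. The resulting edge set could then be exported for visualization in a tool such as Cytoscape.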