Paper review/Systems biology

Drug target prioritization by perturbed gene expression and network information

Cho et al. 2022. 5. 16.

Drug target prioritization by perturbed gene expression and network information | Scientific Reports (nature.com)

2015, Scientific Reports

Schroeder group

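A drug binds to its target protein and, through interactions with downstream effectors, ultimately perturbs the transcriptome of cancer cells. These perturbations carry information about their source, such as the drug's target.

In this paper, the authors investigated whether these perturbations, together with a protein interaction network, can reveal drug targets and key pathways. They performed a systematic analysis of more than 500 drugs in CMap.

First, they showed that the expression of drug target genes is not strongly affected by drug perturbation, and furthermore that expression changes after drug treatment are not sufficient to discriminate drug targets.

However, when candidate targets were ranked by network topological measures (various measures from network theory), known targets ranked highly. The paper introduces local radiality, a measure that combines the perturbed genes with the functional interaction information in the network.

The new method could better select cancer-specific pathways and targets, and help choose targets with fewer side effects.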

Q’s

  1. If a drug does not alter the expression of its target, but if it does alter the expression of other genes, then what is the relation of the target to these genes?
  2. Can drug targets be identified from network information and expression alterations induced by a drug?
  3. Does a global or a local network feature give higher target prediction accuracy?
  4. Does the target prediction performance depend on the definition of a target protein?

Expression data: microarray.

  • The accessibility between deregulated genes and drug targets can be measured by computing shortest paths in the network.
  • → i.e., a deregulated gene may interact directly with the target, or be a close relative (neighbor) of it.
  • Topological proximity: computed starting from the expression data (i.e., from the deregulated genes).
  • Proteins with a higher chance of being a target rank at the top of the sorted list (ordered by closeness score).
  • Proteins predicted in the 1st percentile of the ranked list are suggested as potential drug targets.
  • Expression data are taken from CMap and PPI data from STRING. All genes are modeled (excluding those without expression data, etc.), closeness is computed to pick potential targets (top 1%), and the ranking is validated with STITCH data.
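The ranking and top-1% selection described above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it takes a precomputed score per protein (treated as a distance, so lower means closer to the deregulated genes), and the helper names `prioritize`, `top_percentile`, and `target_ranks` as well as the toy scores are hypothetical.

```python
import math


def prioritize(scores, lower_is_better=True):
    """Sort proteins so that the most likely targets come first.

    scores maps protein -> score. With lower_is_better=True the score is
    treated like a distance (e.g. mean shortest-path length to the
    deregulated genes), so smaller values rank higher. Ties are broken by
    name to keep the order deterministic.
    """
    sign = 1 if lower_is_better else -1
    return sorted(scores, key=lambda p: (sign * scores[p], p))


def top_percentile(ranked, percent=1.0):
    """Return the first `percent`% of a ranked list (always at least one protein)."""
    k = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:k]


def target_ranks(ranked, known_targets):
    """1-based rank of every known target that appears in the ranked list."""
    position = {p: i + 1 for i, p in enumerate(ranked)}
    return {t: position[t] for t in known_targets if t in position}


# Hypothetical toy scores: lower = closer to the deregulated genes.
scores = {"P1": 1.2, "P2": 2.5, "P3": 1.8, "P4": 3.0}
ranked = prioritize(scores)
print(ranked)                              # ['P1', 'P3', 'P2', 'P4']
print(top_percentile(ranked, 1.0))         # ['P1']
print(target_ranks(ranked, {"P3", "X"}))   # {'P3': 2}
```

Reporting the rank of each known (e.g. STITCH) target, rather than only a hit or miss at the 1% cutoff, keeps the validation informative when few targets fall inside the cutoff.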

  • 97% of drug targets do not show significant expression changes due to drug perturbations.
  • Deregulated genes are closer to known targets than any other proteins in the network.

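The local radiality (LR) of a node n in the network G is the mean shortest-path distance from n to the deregulated genes:

$$LR(n) = \frac{1}{|DG|} \sum_{dg \in DG} |sp(dg, n)|$$

Here, the function |sp| calculates the length of a shortest path that connects the deregulated gene dg and the node n in G;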

|DG| indicates the total number of deregulated genes.

The LR utilizes both drug perturbation data (i.e., deregulated genes) and topological information (i.e., shortest path distance).
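Local radiality as defined here (the mean shortest-path length |sp(dg, n)| from a node n to the |DG| deregulated genes) can be computed with one breadth-first search per deregulated gene. A minimal sketch, assuming an unweighted, undirected network given as an edge list; the gene names and helper functions are hypothetical, and giving `math.inf` to nodes that cannot reach every deregulated gene is an assumption, not something stated in the paper.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def local_radiality(graph, deregulated):
    """LR(n) = sum_{dg in DG} |sp(dg, n)| / |DG| for every node n (lower = closer).

    Assumption: nodes that cannot reach every deregulated gene get math.inf.
    """
    deregulated = list(deregulated)
    if not deregulated:
        raise ValueError("need at least one deregulated gene")
    missing = [dg for dg in deregulated if dg not in graph]
    if missing:
        raise ValueError(f"deregulated genes not in network: {missing}")
    totals = {n: 0 for n in graph}
    reached = {n: 0 for n in graph}
    # One BFS per deregulated gene; distances are symmetric in an undirected graph.
    for dg in deregulated:
        for n, d in bfs_distances(graph, dg).items():
            totals[n] += d
            reached[n] += 1
    size = len(deregulated)
    return {n: totals[n] / size if reached[n] == size else math.inf for n in graph}


# Hypothetical toy network: target T sits between three deregulated genes.
edges = [("DG1", "T"), ("T", "DG2"), ("T", "DG3"), ("DG1", "H"), ("H", "Z")]
lr = local_radiality(build_graph(edges), ["DG1", "DG2", "DG3"])
print(min(lr, key=lr.get), lr["T"])   # T 1.0
```

Starting the searches from the deregulated genes (rather than from every node) costs one traversal per deregulated gene, which is cheaper than all-pairs shortest paths when |DG| is small relative to the network.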

  • The best predictors (LR, radiality, stress, and symmetric kernel diffusion) all use the network topology to calculate a node score.
  • Network-based predictors performed well.
  • Radiality: the level of reachability of a node via shortest paths to all other nodes (i.e., the closer a node is to all the others, the easier it is to reach).
  • Stress: how frequently a node appears in all possible pairwise shortest paths of the network.
  • Symmetric kernel diffusion: a random walk-based method.
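The two shortest-path measures in the list, radiality and stress, can be sketched with plain breadth-first search. This assumes an unweighted, connected, undirected network; radiality uses the common textbook definition RAD(n) = Σ_{m≠n} (Δ + 1 − d(n, m)) / (N − 1) with Δ the network diameter, and stress counts shortest paths over unordered node pairs. These are standard definitions, not necessarily the exact implementation used in the paper, and the helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_counts(graph, source):
    """Distances and number of shortest paths from source (unweighted BFS)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma


def radiality(graph):
    """RAD(n) = sum_{m != n} (diameter + 1 - d(n, m)) / (N - 1).

    Assumes a connected graph with at least two nodes.
    """
    dist = {n: bfs_counts(graph, n)[0] for n in graph}
    diameter = max(max(d.values()) for d in dist.values())
    size = len(graph)
    return {
        n: sum(diameter + 1 - dist[n][m] for m in graph if m != n) / (size - 1)
        for n in graph
    }


def stress(graph):
    """Count shortest paths passing through each node, over unordered pairs {s, t}."""
    info = {n: bfs_counts(graph, n) for n in graph}
    result = {v: 0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are disconnected
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v is on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t);
            # the number of such paths through v is sigma(s, v) * sigma(v, t).
            if dist_s[v] + dist_v[t] == dist_s[t]:
                result[v] += sigma_s[v] * sigma_v[t]
    return result


# Hypothetical star network: hub C connects three leaves.
star = build_graph([("C", "L1"), ("C", "L2"), ("C", "L3")])
print(radiality(star)["C"], stress(star)["C"])   # 2.0 3
```

Both functions run one BFS per node, which suits toy networks; a STRING-scale network would call for an optimized graph library.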

 

  • Targets predicted by LR generally had low radiality (shortest-path-based) scores: LR captures local interactions well, i.e., it explains the interactions around nearby targets.

Target classification in this paper

  • Physical targets (PT) are collected from 15 different drug, protein, and compound databases (see the Drug Targets Section).
  • Functional targets (FT, FT1) are obtained from the STITCH Database13.
  • FT1 is a subset of FT that only considers the most confident target for each drug.

  • Although the prediction of a drug target is crucial, the generation of the expected phenotype is also important for drug treatment experiments. The pathway databases can help to formalize the expected phenotype, but incomplete databases limit the investigation of effects on the pathway level.
  • Topological mapping based on shortest paths (SP).
  • Each color marks a subnetwork that responds as a single unit to a given drug.
  • Such subnetworks can help design new drug experiments.

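  • Besides showing how close the deregulated genes are, LR also provides information about the affected molecular pathways (through the paths it traverses).
  • The paper illustrates this with pioglitazone, a drug for type 2 diabetes.
  • PPARG agonist.
  • CTGF is a known target.
  • Many network-based predictors rank PPARG and CTGF near the top when using expression from prostate cancer tissue, which points to drug repositioning.
  • CTGF is overexpressed in many tumors, but its expression decreases under pioglitazone treatment → the angiogenesis pathway is not stimulated.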

In conclusion, the integration of gene expression data into biological networks improves the prioritization of known drug targets. The shortest path-based approach, LR, uncovers pathways affected by a drug perturbation.


Interaction network:

  • Retrieved from STRING.

Drug targets:

  • The known human targets of the drugs in the CMap database were extracted in several steps.
    • FT
      • First, each drug was mapped to its corresponding PubChem identifier based on a drug name comparison.
      • Known human targets of these drugs were extracted from the STITCH database (version 3.1)13.
      • In the STITCH database, drug-target interaction data are collected from different data sources, which provide information about metabolic pathways, crystal structures, binding experiments and drug-target relationships.
      • Afterwards, for every drug-target interaction, the likelihoods from all sources reporting this interaction were combined into an overall confidence score.
      • Drug-target interactions were extracted with PubChem identifiers from STITCH.
      • Finally, human targets with a confidence score of 800 or above were selected as drug targets.
    • FT1
      • After the mapping and filtering steps, 551 drugs with known targets were left in the CMap data set.
      • Because some drugs have many targets, the most likely target of each drug was chosen with a text mining approach.
      • The likelihood of being the best drug target is calculated from the pairwise co-occurrence frequency of a target and a tissue name in PubMed abstracts (i.e., the more frequent, the more probable the target).
      • The most confident target shows a literature-based correlation with a specific tissue. Thus, the method selects tissue-specific targets.
    • PT (physical targets)
      • An in-house database aggregating more than 15 different drug, protein, and compound databases32.
      • The PT set includes physical interactions from Protein Data Bank33, Therapeutic Targets Database34, and BindingDB Database35.
      • The coverage of PT is much lower than that of FT because it focuses on physical binding only.
    • FT1 is the smallest target set in terms of unique targets and drug-target interactions (Supplementary Table 2).
    • Although PT contains only known physical binding partners of the queried drugs, it was successfully applied in previous drug repositioning studies32,36.
    • The biological network (STRING) contains at least 90% of the known targets provided by any target data set.
    • The known target overlap between the three drug-target data sets shows that 87 targets are indicated by all of them (Supplementary Fig. 1). PT covers only 155 unique physical drug targets, while 2224 targets are provided only by FT, which has the highest coverage in terms of known targets.
  • Construction of Sub-network of Selected Targets and Deregulated Genes.
    • The sub-network of drug targets and deregulated genes shows that the individual modules are composed of drug targets and deregulated genes.
    • The aim is to extract the paths that pass through a target as well as the affected deregulated genes in the STRING network.
    • Topological mapping of perturbation data in the biological network reveals the shortest paths between deregulated genes and known targets.
    • To choose the four drug examples given in the "A Sub-network of Selected Targets and Deregulated Genes" section, the authors applied the following selection scheme: the shortest paths network (SP-net) is extracted for each drug target and its deregulated genes.
    • The SP-net is composed of all possible shortest paths that connect all deregulated genes and target t.
    • LR(t) is the local radiality of a target t with respect to the deregulated genes dg. It is calculated for each SP-net.
    • If LR(t) < 3 and the distance of SP-net_t to other SP-nets is larger than 3, then SP-net_t is selected as an example of a target-deregulated gene sub-network. Each selected SP-net_t is uploaded to the Cytoscape tool to visualize the sub-network40.
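The SP-net selection rule (LR(t) < 3 and a distance greater than 3 to every other SP-net) can be sketched as below. Assumptions: an unweighted, undirected network; a node belongs to SP-net_t when it lies on some shortest path between t and one of its deregulated genes; and the distance between two SP-nets is taken to be the shortest-path distance between their closest pair of nodes, which the text does not define precisely. All gene names and helper functions are hypothetical.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def sp_net(graph, target, deregulated):
    """Nodes lying on any shortest path between `target` and a deregulated gene."""
    from_t = bfs_distances(graph, target)
    nodes = {target}
    for dg in deregulated:
        if dg not in from_t:
            continue  # assumption: unreachable genes are skipped
        from_dg = bfs_distances(graph, dg)
        for v, d in from_t.items():
            if d + from_dg[v] == from_t[dg]:
                nodes.add(v)
    return nodes


def lr_of_target(graph, target, deregulated):
    """LR(t): mean shortest-path length from t to its deregulated genes."""
    from_t = bfs_distances(graph, target)
    if not deregulated or any(dg not in from_t for dg in deregulated):
        return math.inf  # assumption: no or unreachable genes make LR infinite
    return sum(from_t[dg] for dg in deregulated) / len(deregulated)


def net_distance(graph, net_a, net_b):
    """Assumed SP-net distance: shortest path between the closest pair of nodes."""
    best = math.inf
    for u in net_a:
        dist = bfs_distances(graph, u)
        best = min([best] + [dist[v] for v in net_b if v in dist])
    return best


def select_examples(graph, target_dgs, max_lr=3, min_gap=3):
    """Keep targets with LR(t) < max_lr whose SP-net is > min_gap from all others."""
    nets = {t: sp_net(graph, t, dgs) for t, dgs in target_dgs.items()}
    selected = []
    for t, dgs in target_dgs.items():
        if lr_of_target(graph, t, dgs) >= max_lr:
            continue
        if all(net_distance(graph, nets[t], nets[o]) > min_gap for o in nets if o != t):
            selected.append(t)
    return selected


# Hypothetical network: T1 and T3 share G1, while T2 sits 5+ steps away.
edges = [("T1", "G1"), ("T1", "G2"), ("G2", "B1"), ("B1", "B2"), ("B2", "B3"),
         ("B3", "B4"), ("B4", "G3"), ("G3", "T2"), ("G3", "G4"), ("T3", "G1")]
graph = build_graph(edges)
targets = {"T1": ["G1", "G2"], "T2": ["G4"], "T3": ["G1"]}
print(select_examples(graph, targets))   # ['T2']
```

Running one BFS per SP-net node in `net_distance` is fine for a handful of example drugs; for many SP-nets, a single multi-source BFS started from all nodes of one SP-net would give the same minimum distance more cheaply.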

 
