
Multilayer graph attention neural networks for accurate mapping of drug-target interaction

Multi-layer DTI network

Given a drug set \(D=\{d_i\}^{n_d}_{i=1}\) and a target set \(T=\{t_j\}^{n_t}_{j=1}\), the similarity between drugs (targets) can be assessed from different perspectives, represented by a series of matrices \(\{A^{D,k}\}^{m_d}_{k=1}\) (\(\{A^{T,l}\}^{m_t}_{l=1}\)), where \(A^{D,k}\in \mathbb {R}^{n_d\times n_d}\) (\(A^{T,l}\in \mathbb {R}^{n_t\times n_t}\)) and \(m_d\) (\(m_t\)) is the number of similarity types for drugs (targets). Let the binary matrix \(A^Y\in \{0, 1\}^{n_d\times n_t}\) indicate interactions between the drugs in D and the targets in T, where \(A^Y_{ij} = 1\) denotes that \(d_i\) and \(t_j\) interact with each other and \(A^Y_{ij} = 0\) otherwise. A multi-layer DTI network \(G^M=(V^M,E^M)\) for D and T, as shown in Fig. 1, consists of \(\{A^{D,k}\}^{m_d}_{k=1}\), \(\{A^{T,l}\}^{m_t}_{l=1}\), and \(A^Y\), and its adjacency matrix is represented as follows:

$$\begin{aligned} A^M=\left( \begin{array}{cccccc} A^{D,1}& I& I& \cdots & A^Y& A^Y \\ I& A^{D,2}& I& \cdots & A^Y& A^Y \\ I& I& A^{D,3}& \cdots & A^Y& A^Y \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ A^Y& A^Y& A^Y& \cdots & A^{T,m_t-1}& I\\ A^Y& A^Y& A^Y& \cdots & I& A^{T,m_t} \end{array} \right) . \end{aligned}$$

(1)

where \(|V^M| = N = n_d\times m_d+n_t\times m_t\).
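As a concrete illustration, the block structure of Eq. (1) can be assembled with NumPy. This is a minimal sketch, not the authors' implementation; it assumes nodes are ordered layer by layer (all drug layers first, then all target layers), and that \(A^Y\) enters the target-layer rows transposed, which is required for the blocks to be dimensionally consistent. The function name is hypothetical.

```python
import numpy as np

def build_multilayer_adjacency(drug_sims, target_sims, a_y):
    """Assemble the multilayer adjacency A^M of Eq. (1).

    drug_sims:   list of m_d arrays, each (n_d, n_d) -- the A^{D,k}
    target_sims: list of m_t arrays, each (n_t, n_t) -- the A^{T,l}
    a_y:         (n_d, n_t) binary interaction matrix A^Y
    Returns an (N, N) array with N = n_d*m_d + n_t*m_t.
    """
    n_d, n_t = a_y.shape
    m_d, m_t = len(drug_sims), len(target_sims)
    blocks = []
    # Drug-layer rows: A^{D,k} on the diagonal, I between drug layers, A^Y to targets.
    for k in range(m_d):
        row = [drug_sims[k] if k == k2 else np.eye(n_d) for k2 in range(m_d)]
        row += [a_y] * m_t
        blocks.append(row)
    # Target-layer rows: A^Y transposed (assumption, for dimensional consistency),
    # I between target layers, A^{T,l} on the diagonal.
    for l in range(m_t):
        row = [a_y.T] * m_d
        row += [target_sims[l] if l == l2 else np.eye(n_t) for l2 in range(m_t)]
        blocks.append(row)
    return np.block(blocks)
```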

Fig. 1

The figure shows a multiplex drug-target interaction (DTI) network that integrates multi-level information from drugs (D) and targets (T) across multiple layers. Different levels of drug associations are shown on the left (labeled \(A^{D,1},A^{D,2},A^{D,3}\)), representing various relationships between drugs \(D_1, D_2, D_3\), and \(D_4\). On the right, target associations are shown at similar levels (\(A^{T,1}\) and \(A^{T,2}\)) for targets \(T_1, T_2\), and \(T_3\). The central part of the diagram shows the interactions between drugs and targets, using a multi-layer network structure to capture the complex interplay between different levels of information. This multilayer approach enables more comprehensive DTI prediction by considering both intralayer and interlayer interactions.

Multilayer graph attention neural network

We propose a model called Multi-Layer Graph Attention Neural Network (MLGANN) for DTI prediction. In the multi-layer DTI network, in addition to the interactions between drugs and targets, there is also information about the relationships among various properties of the drugs and targets themselves. Therefore, MLGANN is designed to capture both the interaction information between drugs and targets and the multi-source information within drugs and targets.

Multilayer neighbor aggregation

Let \(X\in \mathbb{R}^{N\times f}\) represent the initial features of the nodes in the multilayer DTI network, where f denotes the dimension of the embedding space. We use graph neural networks to learn drug and target embeddings in the multilayer DTI network. Specifically, we use Graph Convolutional Networks (GCN) in our model because they are both simple and effective. The embeddings are refined by applying P GCN layers across the multi-layer DTI network:

$$\begin{aligned} X^{(p)} = \sigma \left( \hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{- \frac{1}{2}}X^{(p-1)}W^{(p)}\right), \quad p=1,2,\ldots ,P, \end{aligned}$$

(2)

where \(X^{(0)} =X\), \(\hat{A}=A^M+I^M\), \(A^M\) is the adjacency matrix of the multi-layer DTI network, \(I^M\) is an identity matrix of the same size as \(A^M\), \(\hat{D}\) is a diagonal matrix with \(\hat{D}_{ii} = \sum \nolimits _{j}\hat{A}_{ij}\), \(W^{(p)}\in \mathbb {R}^{f\times f}\) is a trainable weight matrix, and \(\sigma\) is the ReLU nonlinear activation function.
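One propagation step of Eq. (2) can be sketched in NumPy as follows. This is an illustrative sketch only (the function name is hypothetical, and a real implementation would use a sparse matrix and a deep-learning framework for trainable weights):

```python
import numpy as np

def gcn_layer(x, a_m, w):
    """One GCN step of Eq. (2): ReLU(D^{-1/2} A_hat D^{-1/2} X W).

    x:   (N, f) node features X^{(p-1)}
    a_m: (N, N) multilayer adjacency A^M
    w:   (f, f) weight matrix W^{(p)}
    """
    a_hat = a_m + np.eye(a_m.shape[0])            # add self-loops: A_hat = A^M + I^M
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1)) # diagonal of D^{-1/2}
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, a_norm @ x @ w)        # ReLU activation
```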

For a node \(v\in G^M\) (representing either a drug or a target), Eq. (2) updates the embedding of this node as follows:

$$\begin{aligned} x_v^{(p)} = \sigma \left( W^{(p)}\sum \limits _{u}\dfrac{1}{\alpha _{vu}}x_u^{(p-1)}\right) ,\quad u\in \{v\}\cup N_v\cup C_v \end{aligned}$$

(3)

where \(\alpha _{vu}\) is the normalized weight, \(N_v\) is the set of neighbors of node v in \(G^M\), and \(C_v\) is the set of nodes corresponding to the same drug/target as node v in the other layers. Therefore, MLGANN aggregates not only the neighbors of node v in \(G^M\) (as GCN does), but also the nodes corresponding to the same drug/target in different layers of \(G^M\). This allows information to propagate across the layers of \(G^M\). By using information from different layers of \(G^M\), MLGANN can learn better representations for each node, especially for nodes with limited interactions in a given layer. This is the main difference between MLGANN and existing network embedding methods.
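Under the same layer-by-layer node ordering assumed above (drug layer k occupies indices \([k\,n_d, (k{+}1)\,n_d)\)), the cross-layer set \(C_v\) for a drug node has a simple index form. This helper is a hypothetical illustration, not part of the original method's code:

```python
def same_entity_nodes(i, k, n_d, m_d):
    """Indices of C_v: the copies of drug i in the other drug layers.

    Assumes nodes are ordered layer by layer, so drug i in layer k
    has global index k*n_d + i.
    """
    v = k * n_d + i
    return [k2 * n_d + i for k2 in range(m_d) if k2 * n_d + i != v]
```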

Multilayer attention aggregation

We concatenate the representations learned by all P GCN layers to obtain the final node embedding:

$$\begin{aligned} \begin{aligned} z_i^{D,k} = \left[ z_i^{D,k(0)},z_i^{D,k(1)},\cdots ,z_i^{D,k(P)}\right] \\ z_j^{T,l} = \left[ z_j^{T,l(0)},z_j^{T,l(1)},\cdots ,z_j^{T,l(P)}\right] \end{aligned} \end{aligned}$$

(4)

where \(z_i^{D,k}\) denotes the final embedding of the ith drug in the kth layer of \(G^M\), \(z_i^{D,k(p)}\) denotes the embedding of the ith drug in the kth layer of \(G^M\) produced by the pth GCN layer, \(z_j^{T,l}\) denotes the final embedding of the jth target in the lth layer of \(G^M\), and \(z_j^{T,l(p)}\) denotes the embedding of the jth target in the lth layer of \(G^M\) produced by the pth GCN layer.
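The concatenation in Eq. (4) amounts to stacking the per-GCN-layer outputs along the feature axis; a minimal sketch (function name hypothetical):

```python
import numpy as np

def concat_layer_embeddings(per_layer_outputs):
    """Final node embeddings of Eq. (4): concatenate X^(0), ..., X^(P).

    per_layer_outputs: list of P+1 arrays, each (N, f)
    Returns an (N, (P+1)*f) array.
    """
    return np.concatenate(per_layer_outputs, axis=1)
```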

To obtain the final representations of drugs and targets, we develop a self-attention mechanism that aggregates the representation vectors of drugs and targets across the different layers of \(G^M\) for DTI prediction. The computation is as follows:

$$\begin{aligned} \begin{aligned} e_i^{D,k} = q^D \cdot LeakyReLU\left( W^Dz^{D,k}_i\right) ,\quad e_j^{T,l} = q^T \cdot LeakyReLU\left( W^Tz^{T,l}_j\right) \\ \alpha ^k_i = \frac{e_i^{D,k}}{\sum \nolimits ^{m_d}_{k'=1}e_i^{D,k'}},\quad z^D_i = \sum \limits ^{m_d}_{k=1}\alpha ^k_i z^{D,k}_i,\quad \beta ^l_j = \frac{e_j^{T,l}}{\sum \nolimits ^{m_t}_{l'=1}e_j^{T,l'}},\quad z^T_j = \sum \limits ^{m_t}_{l=1}\beta ^l_j z^{T,l}_j, \end{aligned} \end{aligned}$$

(5)

where \(z^D_i\in \mathbb{R}^{f'}\) and \(z^T_j\in \mathbb{R}^{f'}\) are the final representations of drugs and targets, \(W^D\in \mathbb{R}^{f'\times f'}\) and \(W^T\in \mathbb{R}^{f'\times f'}\) are trainable parameter matrices, and \(q^D\in \mathbb{R}^{f'}\) and \(q^T\in \mathbb{R}^{f'}\) are trainable vectors.
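The attention aggregation of Eq. (5) for a single drug (or target) can be sketched as below. This is a NumPy illustration under the source's own normalization (plain sum, not softmax); the function name is hypothetical, and `w`/`q` stand in for the trainable \(W^D\)/\(q^D\) (or \(W^T\)/\(q^T\)):

```python
import numpy as np

def attention_aggregate(z_layers, w, q, negative_slope=0.01):
    """Aggregate one entity's per-layer embeddings as in Eq. (5).

    z_layers: (m, f') array -- one embedding per layer (m = m_d or m_t)
    w:        (f', f') trainable matrix (W^D or W^T)
    q:        (f',)   trainable vector (q^D or q^T)
    """
    h = z_layers @ w.T                          # W z for every layer
    h = np.where(h > 0, h, negative_slope * h)  # LeakyReLU
    e = h @ q                                   # attention scores e^k
    alpha = e / e.sum()                         # normalized weights (Eq. 5)
    return alpha @ z_layers                     # weighted sum over layers
```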

DTI prediction

Let \(G^Y\) be the DTI network derived from the adjacency matrix \(A^Y\). For an edge \(d_it_j\) in \(G^Y\), let \(z^D_i\) and \(z^T_j\) be the final representation vectors of drug \(d_i\) and target \(t_j\), respectively. We sample a non-existent edge \(d_ut_v\) in \(G^Y\), where \(z^D_u\) and \(z^T_v\) are the final representation vectors of drug \(d_u\) and target \(t_v\), respectively. We regard the drug-target pair \(d_it_j\) as a positive sample and \(d_ut_v\) as a negative sample. Therefore, we design the loss function based on cross entropy as follows:

$$\begin{aligned} \mathcal{L}=-\log \left( \sigma \left( \langle z^D_i, z^T_j \rangle \right) \right) -\log \left( \sigma \left( -\langle z^D_u, z^T_v \rangle \right) \right) \end{aligned}$$

(6)

where \(\sigma\) is the sigmoid activation function and \(\langle \cdot ,\cdot \rangle\) denotes the inner product in Euclidean space.
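For one positive pair and one negative pair, the loss of Eq. (6) can be written directly; a minimal NumPy sketch (function name hypothetical):

```python
import numpy as np

def dti_loss(z_d_i, z_t_j, z_d_u, z_t_v):
    """Cross-entropy loss of Eq. (6) for a positive pair (d_i, t_j)
    and a sampled negative pair (d_u, t_v)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = sigmoid(np.dot(z_d_i, z_t_j))   # predicted score of the observed edge
    neg = sigmoid(-np.dot(z_d_u, z_t_v))  # sign flipped for the negative sample
    return -np.log(pos) - np.log(neg)
```

With orthogonal embeddings both inner products are 0, so each term is \(-\log \sigma (0) = \log 2\) and the loss is \(2\log 2\).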