Konstantinos Skoularikis

Machine Learning Platform Engineer

PathUTRNet: Predicting Signaling Pathways from miRNA with AI

• MSc Thesis Research

Summary

PathUTRNet was developed as part of my MSc thesis at Queen Mary University of London. It is a deep learning pipeline that combines CNNs, RNNs, and autoencoders to identify and interpret the relationships between miRNAs and signaling pathways.

The system accomplishes three interconnected tasks through two sequential deep learning models:

  • Binding Site Identification: The first model determines whether a binding site exists between a miRNA and a non-coding region (binary classification).
  • Pathway Prediction: When a binding site exists, the second model predicts the signal transduction pathway (multi-class classification over 150 classes).
  • UTR Classification: The model also determines the untranslated region in which the binding occurs (3'UTR or 5'UTR; binary classification).

Both models combine CNN and RNN layers, as this hybrid architecture significantly outperforms simpler CNN-only or RNN-only counterparts.
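The two-stage flow described above can be sketched in plain Python. This is only an illustrative outline, not the thesis code: the `Prediction` class, the `predict` function, the callable model interfaces, and the 0.5 decision threshold are all hypothetical, and the sketch assumes that the second model returns both the pathway probabilities and the probability that the region is a 5'UTR.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interfaces: each callable stands in for a trained network.
BindingModel = Callable[[str, str], float]  # returns P(binding site exists)
# Returns (probabilities over pathway classes, P(region is 5'UTR)).
PathwayUtrModel = Callable[[str, str], Tuple[Sequence[float], float]]


@dataclass
class Prediction:
    binding: int                         # 1 if a binding site is predicted, else 0
    pathway_index: Optional[int] = None  # index of the predicted pathway class
    utr_type: Optional[str] = None       # "3'UTR" or "5'UTR"


def predict(
    mirna: str,
    target: str,
    binding_model: BindingModel,
    pathway_utr_model: PathwayUtrModel,
    threshold: float = 0.5,  # assumed decision threshold
) -> Prediction:
    """Run the binding model first; call the second model only on a positive."""
    if binding_model(mirna, target) < threshold:
        return Prediction(binding=0)

    pathway_probs, p_five_prime = pathway_utr_model(mirna, target)
    # Pick the pathway class with the highest probability (argmax).
    pathway_index = max(range(len(pathway_probs)), key=pathway_probs.__getitem__)
    utr_type = "5'UTR" if p_five_prime >= threshold else "3'UTR"
    return Prediction(binding=1, pathway_index=pathway_index, utr_type=utr_type)
```

Gating the second model on the first keeps the pathway classifier from being asked about pairs that are predicted not to bind at all.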

Data Sources

  • Signal transduction pathways data acquired from Reactome
  • Gene symbols related to pathways obtained through NCBI web-based tools
  • DIANA TarBase v.8 indexed to retrieve positive and negative gene-miRNA pairs
  • Mature miRNA transcripts, 3'UTR, and 5'UTR sequences acquired via BioMart and Bioconductor

Additional information about data acquisition and preprocessing can be found in the research paper.

Sample Inference Results

Input

miRNA sequence (mmu-miR-194-5p):

TGTAACAGCAACTCCATGTGGA

Target sequence (5'UTR):

TCCTGCGCAGTTCTCCGCCGCAGCCTCAGCGGGCAAGCGCCGGGGCTGCTCTCAATCTCCTGGCTGCGAGGAGGCAGCCCCGGCGAGCTGTCGTGCGCCCCGTCCAGAGTTACTGAGTGCGGGGCACAGCGTAACTGACAGCGCGTCTGCTCACAGTTCCCGTCGCCTGGACTTAGCTTTCCAACCCCGGCTTCTCGTGGGCATCATGTCAAGAGCCGTCGCCGCTGCAACCGCCGCCGCCACCCGGGGAAGAGCCGCAGCCTCGGCAGCCGCGCGCGCAGGAGGGCAATAAACCGAATCACTCCGGGCTCAAAGTGGCAGGGGACCGTCGCGGTGCTCTCTGTTCCGGCGGGACTCCTGCCATGTGCTGAGCCATGCCCCTGGCCGCGCCCGCGGGCCGCGT
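Sequences like the two above have to be turned into numbers before a CNN or RNN can consume them. The page does not say which encoding PathUTRNet uses; one-hot encoding is a common choice for nucleotide sequences, so the sketch below assumes it. The `one_hot_encode` function, the `ACGT` alphabet, and the fixed length of 30 are hypothetical choices made for illustration.

```python
from typing import List

ALPHABET = "ACGT"  # assumed alphabet; the sample sequences use T rather than U


def one_hot_encode(sequence: str, length: int) -> List[List[int]]:
    """Return a `length` x 4 one-hot matrix, truncating or zero-padding."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in sequence.upper()[:length]:
        row = [0] * len(ALPHABET)
        if base in index:  # unknown symbols (for example N) stay all-zero
            row[index[base]] = 1
        matrix.append(row)
    # Zero-pad short sequences so every input has the same shape.
    padding = length - len(matrix)
    matrix.extend([0] * len(ALPHABET) for _ in range(padding))
    return matrix


# Usage: encode the sample miRNA (22 nucleotides) into a 30 x 4 matrix.
mirna_matrix = one_hot_encode("TGTAACAGCAACTCCATGTGGA", length=30)
```

A fixed length is needed because batched network inputs must share one shape; the much longer target sequence would be encoded the same way with a larger `length`.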

Output

  • Binding Label: 1 (Binding site confirmed)
  • Pathway Predicted: HS-GAG biosynthesis ✓ (matches ground truth)
  • UTR Type: 5'UTR ✓ (matches ground truth)

Scientific Publication

Research paper authored in collaboration with Professor Rob Krams.

📄 View Research Paper

Note: Paper content is updated periodically before final publication.

Technology Stack

Python, TensorFlow, Keras, Plotly, Pandas, Scikit-Learn, NumPy

Code & Resources

💻 View on GitHub