Machine-Learning-in-Bioinformatics-using-WEKA

Welcome to My Bioinformatics Project

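This GitHub Page showcases two bioinformatics projects: protein feature extraction in R and breast cancer classification with Weka.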

👉 Explore the full project below!

Protein Feature Extraction & Breast Cancer ML Analysis

This repository contains two main components related to bioinformatics and machine learning:

  1. Protein Feature Extraction using R
  2. Breast Cancer Classification using Machine Learning (Weka)

📁 Files Included


🔬 1. Protein Feature Extraction (R Script)

The R script performs the following:

Packages used: protr, Peptides, and foreign.


🧠 2. Machine Learning Assignment (Weka)

The assignment explores classification models applied to the Breast Cancer Wisconsin dataset. The goal is to predict tumor recurrence based on morphological features.

🧪 Models Evaluated:

IBk (k=1), Naïve Bayes, and SMO (SVM), each run in Weka.

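IBk with k=1 labels each test instance with the class of its single nearest training instance. The sketch below is a simplified, hypothetical re-implementation for intuition only, not Weka's code: it assumes numeric attributes, rescales each attribute to [0, 1] using the training data (Weka's default Euclidean distance also normalizes attributes), and breaks distance ties by taking the first closest training instance. The class name `OneNN` and the toy data are made up.

```python
"""Simplified 1-nearest-neighbour sketch in the spirit of Weka's IBk (k=1)."""
import math


class OneNN:
    """Hypothetical helper: numeric attributes only, min-max scaling, Euclidean distance."""

    def fit(self, X: list[list[float]], y: list[str]) -> "OneNN":
        # Remember each attribute's training range for scaling.
        self.lo = [min(col) for col in zip(*X)]
        self.hi = [max(col) for col in zip(*X)]
        self.X = [self._scale(row) for row in X]
        self.y = list(y)
        return self

    def _scale(self, row: list[float]) -> list[float]:
        # Map each attribute to [0, 1] using the training range; constant attributes become 0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, self.lo, self.hi)
        ]

    def predict_one(self, row: list[float]) -> str:
        q = self._scale(row)
        # min() returns the first index with the smallest distance, so ties go to the earliest instance.
        best = min(range(len(self.X)), key=lambda i: math.dist(q, self.X[i]))
        return self.y[best]

    def predict(self, X: list[list[float]]) -> list[str]:
        return [self.predict_one(row) for row in X]


if __name__ == "__main__":
    # Hypothetical toy data: two numeric attributes, two classes.
    X_train = [[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 40.0]]
    y_train = ["no-recurrence", "no-recurrence", "recurrence", "recurrence"]
    model = OneNN().fit(X_train, y_train)
    print(model.predict([[1.5, 190.0], [8.5, 30.0]]))  # ['no-recurrence', 'recurrence']
```

Scaling matters for nearest-neighbour methods: without it, an attribute measured in large units would dominate the distance.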
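🔍 Key Findings:

IBk (k=1) reached the highest accuracy (72.37%), while Naïve Bayes scored best on F1-score (0.708), ROC AUC (0.701), and PRC AUC (0.741). SMO had the lowest value on every metric, and its ROC AUC of 0.509 is close to the 0.5 expected from random ranking.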

| Model | Accuracy | Recall | F1-Score | ROC AUC | PRC AUC |
|-------|----------|--------|----------|---------|---------|
| IBk (k=1) | 72.37% | 0.724 | 0.697 | 0.628 | 0.686 |
| Naïve Bayes | 71.67% | 0.717 | 0.708 | 0.701 | 0.741 |
| SMO (SVM) | 69.58% | 0.696 | 0.600 | 0.509 | 0.586 |
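Weka's classifier output lists per-class precision, recall, and F-measure plus a "Weighted Avg." row that weights each class by its number of instances. Assuming the Recall and F1-Score columns in the table are those weighted averages, it is expected that Recall matches Accuracy: weighting each class's recall by its share of instances adds up the correct predictions across all classes, which is exactly accuracy. The sketch below computes these quantities from a confusion matrix, plus ROC AUC as the probability that a randomly chosen positive instance is scored above a randomly chosen negative one. The function names, the confusion matrix, and the scores are hypothetical, not the assignment's results.

```python
"""Weighted metrics and ROC AUC sketch; all numbers in the example are hypothetical."""


def weighted_metrics(cm: list[list[int]]) -> dict[str, float]:
    """Accuracy plus class-frequency-weighted precision, recall, and F1.

    cm[i][j] counts instances of actual class i predicted as class j.
    Undefined ratios (0/0) are treated as 0 in this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    result = {"accuracy": sum(cm[c][c] for c in range(n)) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                    # row sum: instances truly in class c
        predicted = sum(row[c] for row in cm)  # column sum: instances predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = actual / total                # each class weighted by its frequency
        result["precision"] += weight * precision
        result["recall"] += weight * recall
        result["f1"] += weight * f1
    return result


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """P(random positive scores above random negative); ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    cm = [[50, 10],   # hypothetical: actual class 0
          [15, 25]]   # hypothetical: actual class 1
    m = weighted_metrics(cm)
    print(m["accuracy"], m["recall"])  # both 0.75, up to floating-point rounding
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 0.75
```

Read this way, an ROC AUC near 0.5 means the model's scores barely separate recurrence from non-recurrence cases, even when accuracy looks reasonable.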


🧬 Biological Relevance


✅ How to Use

For the R script:

  1. Install the required R packages: protr, Peptides, and foreign.
  2. Update file paths in the script if needed.
  3. Run the script to generate .csv and .arff files for your sequences.
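The exact descriptors the R script computes are not listed here. As an illustration only, the sketch below assumes amino acid composition (AAC), the fraction of each of the 20 standard residues in a sequence, which is roughly what protr's `extractAAC()` returns, and writes the features both as CSV and as a minimal ARFF file that Weka can open. It is a Python stand-in, not the project's R code; the function names `aac`, `to_csv`, `to_arff` and the sequences are hypothetical.

```python
"""Sketch: amino acid composition (AAC) features exported to CSV and ARFF."""
import csv
import io

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence (the values sum to 1)."""
    seq = "".join(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    if not seq:
        raise ValueError("sequence has no standard amino acids")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def to_csv(rows: dict[str, list[float]]) -> str:
    """One row per sequence: id followed by the 20 composition values."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", *AMINO_ACIDS])
    for name, feats in rows.items():
        writer.writerow([name, *(f"{x:.4f}" for x in feats)])
    return buf.getvalue()


def to_arff(rows: dict[str, list[float]], relation: str = "protein_aac") -> str:
    """Minimal ARFF: a relation name, one numeric attribute per residue, then the data rows."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute AAC_{aa} numeric" for aa in AMINO_ACIDS]
    lines += ["", "@data"]
    for feats in rows.values():
        lines.append(",".join(f"{x:.4f}" for x in feats))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    seqs = {"seq1": "MKTAYIAKQR", "seq2": "GAVLIPFMW"}  # hypothetical sequences
    features = {name: aac(s) for name, s in seqs.items()}
    print(to_csv(features))
    print(to_arff(features))
```

A real dataset for classification would also need a nominal class attribute in the ARFF header; the sketch leaves it out because the labels are not described here.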

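For the Weka assignment:

  1. Open the Weka GUI and launch the Explorer.
  2. On the Preprocess tab, load each .arff file (Open file...).
  3. On the “Classify” tab, choose a classifier (IBk, NaiveBayes, or SMO).
  4. Under Test options, select Cross-validation with 10 folds and click Start.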
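Weka's Explorer uses 10-fold cross-validation by default, stratifies the folds when the class attribute is nominal, and reports statistics pooled over all ten test folds. The sketch below illustrates that scheme in plain Python under stated assumptions: its fold assignment (shuffle each class, then deal instances round-robin across folds) is a hypothetical simplification that will not reproduce Weka's exact splits, and the plugged-in classifier is a majority-class baseline in the spirit of Weka's ZeroR, chosen only to keep the example short. All function names are hypothetical.

```python
"""Stratified k-fold cross-validation sketch (not Weka's implementation)."""
import random
from collections import Counter, defaultdict
from typing import Callable, Sequence

# fit_predict(train_X, train_y, test_X) -> predicted labels for test_X
FitPredict = Callable[[list, list, list], list]


def stratified_folds(y: Sequence[str], k: int = 10, seed: int = 1) -> list[list[int]]:
    """Split row indices into k folds whose class proportions stay close to the full data."""
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds: list[list[int]] = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:
            folds[slot % k].append(i)  # deal instances round-robin across folds
            slot += 1
    return folds


def cross_validate(X: Sequence, y: Sequence[str], fit_predict: FitPredict,
                   k: int = 10, seed: int = 1) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool accuracy over all folds."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def majority_baseline(train_X: list, train_y: list, test_X: list) -> list:
    """ZeroR-style baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


if __name__ == "__main__":
    # Hypothetical toy data: 70 "no-recurrence" and 30 "recurrence" rows.
    y = ["no-recurrence"] * 70 + ["recurrence"] * 30
    X = [[float(i)] for i in range(len(y))]
    print(cross_validate(X, y, majority_baseline))  # 0.7
```

Any of the compared models could be swapped in as `fit_predict`. The baseline is a useful reference point: a classifier that cannot beat it has learned little beyond the class distribution.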

📌 Author

Jyothi Swaroop C
Machine Learning in Bioinformatics – Assignment & R-based Feature Extraction


📬 Contact

Feel free to reach out if you have any questions or want to collaborate!