This GitHub Page showcases an R-based protein feature extraction script and a Weka-based machine learning assignment.
👉 Explore the full project below!
This repository contains two main components related to bioinformatics and machine learning:

- `protein_feature_extraction.R`: an R script that extracts multiple features from protein sequences with the `protr` and `Peptides` packages and exports them with `foreign`. Outputs are saved as `.csv` and `.arff` files for use in machine learning.
- `JYOTHI_SWAROOP_ML_ASSIGNMENT.pdf`: a detailed machine learning assignment that analyzes the Breast Cancer Wisconsin dataset in Weka.

The R script performs the following:
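- Validates the input sequences with `protcheck`.
- Computes descriptors with `extractDC` (dipeptide composition), `extractTC` (tripeptide composition), `extractCTDC`, `extractCTDD` and `extractCTDT` (CTD composition, distribution and transition), `extractAPAAC` (amphiphilic pseudo amino acid composition), and `extractAAC` (amino acid composition).
- Saves the results in `.csv` format.
- Writes `.arff` files for ML tools like Weka.

Packages used:

- `protr`
- `Peptides`
- `foreign`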
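The steps above can be sketched in R as follows. This is a minimal illustration that assumes protr's default descriptor parameters. It is not the repository's actual script. The toy sequences, the helper `extract_features()` and the output file names are all hypothetical, and the `Peptides` descriptors are left out for brevity.

```r
# Sketch only: sequences, helper name and output paths are hypothetical.
# Install once if needed: install.packages(c("protr", "Peptides", "foreign"))
library(protr)    # protein descriptors and protcheck()
library(foreign)  # write.arff() for Weka

# Hypothetical input. In practice this could come from protr::readFASTA().
# APAAC with the default lambda = 30 needs sequences longer than 30 residues.
seqs <- c(
  prot1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ",
  prot2 = "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLK",
  bad   = "MKXBZ"  # non-standard residues (X, B, Z), so protcheck() rejects it
)

# Keep only sequences made of the 20 standard amino acids.
seqs <- seqs[vapply(seqs, protcheck, logical(1))]

# Hypothetical helper: concatenate every descriptor group for one sequence.
extract_features <- function(s) {
  c(extractAAC(s),    # amino acid composition: 20 values
    extractDC(s),     # dipeptide composition: 400 values
    extractTC(s),     # tripeptide composition: 8000 values
    extractCTDC(s),   # CTD composition: 21 values
    extractCTDT(s),   # CTD transition: 21 values
    extractCTDD(s),   # CTD distribution: 105 values
    extractAPAAC(s))  # amphiphilic PseAAC: 20 + 2 * lambda values
}

# One row per sequence. Row names come from the names of `seqs`.
features <- as.data.frame(do.call(rbind, lapply(seqs, extract_features)))

# Write both formats. These are temporary paths; use real file names in practice.
csv_path  <- file.path(tempdir(), "protein_features.csv")
arff_path <- file.path(tempdir(), "protein_features.arff")
write.csv(features, csv_path)
write.arff(features, arff_path, relation = "protein_features")
```

With these defaults, `extractTC` alone contributes 8,000 columns, which makes the combined table very wide. Exporting the descriptor groups to separate `.arff` files may make experiments in Weka easier to manage.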
The assignment compares three classifiers in Weka (IBk, Naïve Bayes and SMO) on the Breast Cancer Wisconsin dataset. The goal is to predict tumor recurrence from morphological features.
| Model | Accuracy | Recall | F1-Score | ROC AUC | PRC AUC |
|-------|----------|--------|----------|---------|---------|
| IBk (k=1) | 72.37% | 0.724 | 0.697 | 0.628 | 0.686 |
| Naïve Bayes | 71.67% | 0.717 | 0.708 | 0.701 | 0.741 |
| SMO (SVM) | 69.58% | 0.696 | 0.600 | 0.509 | 0.586 |
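In every row of the table, Recall matches Accuracy (for example, 0.724 vs 72.37%). That is expected if the table reports Weka's class-weighted average recall. Weighting each class's recall `TP_c / n_c` by its share of instances `n_c / N` leaves `sum(TP_c) / N`, which is simply the accuracy. The minimal base-R sketch below shows this. It uses a hypothetical 2×2 confusion matrix, not the assignment's data, and the class labels are illustrative.

```r
# Minimal sketch: class-weighted metrics from a confusion matrix.
# The counts below are hypothetical, NOT the assignment's results.
cm <- matrix(c(50, 10,
               20, 20),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("no_recurrence", "recurrence"),
                             predicted = c("no_recurrence", "recurrence")))

weighted_metrics <- function(cm) {
  tp        <- diag(cm)          # correctly classified instances per class
  support   <- rowSums(cm)       # actual instances per class
  predicted <- colSums(cm)       # instances predicted as each class
  recall    <- tp / support
  precision <- tp / predicted    # assumes every class is predicted at least once
  f1        <- 2 * precision * recall / (precision + recall)
  w         <- support / sum(cm) # weight each class by its share of instances
  c(accuracy  = sum(tp) / sum(cm),
    recall    = sum(w * recall),
    precision = sum(w * precision),
    f1        = sum(w * f1))
}

round(weighted_metrics(cm), 3)
# accuracy 0.700, recall 0.700, precision 0.695, f1 0.690
```

Weighted precision and F1, unlike weighted recall, do not reduce to accuracy. This explains why the F1-Score column can differ noticeably from Accuracy, as it does for SMO.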
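To use the script:

1. Install the R packages `protr`, `Peptides` and `foreign`.
2. Run `protein_feature_extraction.R` to generate `.csv` and `.arff` files for your sequences.
3. Load the `.arff` files into Weka.

Jyothi Swaroop C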
Machine Learning in Bioinformatics – Assignment & R-based Feature Extraction
Feel free to reach out if you have any questions or want to collaborate!