zeper-eng/Learning

Learning

(Hand-rolled Machine Learning and Statistical Analyses)

The library is thin at the moment, but the goal is to build a robust collection of statistical methods typically used in clinical research. The project serves a dual purpose. Throughout undergrad my training was very applied (comp-bio/structural-bio research plus informatics and modelling coursework), with algorithmic implementations coming mainly from common packages like scikit-learn and SciPy. As I continue to grow into a data scientist, I am looking to build experience in numerical computing. My two main goals at the moment are to:

  1. Practice implementing algorithms from scratch using little more than scientific computing libraries for vectorization and dataframe-oriented frameworks like Pandas.

  2. Write out the derivations for the various models I have been using day-to-day and build on them to expand my theoretical statistical foundations.

Originally I was going to learn some TypeScript and implemented a simple univariate regression in it, but I decided to focus more on the math than on the GUI. I also got busy working on this:

MDSA Tools Repo

We are entering the review cycle now, so check it out! (Pre-print up soon.)

The original TypeScript univariate implementations are still present but have been temporarily sidelined due to the focus on developing a Python backend and a JavaScript/TypeScript frontend for visualization and data loading.

Implementations so far and where the code lives

Outlined below are the different models I have implemented so far.

Implementations

Foundations and GLMs

Logistic Regression

OLS Regression

Poisson Regression
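As a rough illustration of what "hand-rolled" can look like for one of these GLMs, here is a minimal sketch of Poisson regression with a log link, fitted by Newton-Raphson using only NumPy. It is not the code from this repo: the function name `fit_poisson`, the simulated data, and the coefficient values are all made up for the example.

```python
import numpy as np

def fit_poisson(X, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for Poisson regression with a log link (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)               # fitted means under the log link
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # Fisher information, X' diag(mu) X
        step = np.linalg.solve(info, score) # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:      # stop once the update is negligible
            break
    return beta

# Simulated data with hypothetical coefficients, only for demonstration
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])   # intercept column plus one covariate
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y)
print(beta_hat)
```

At the fitted coefficients the score `X.T @ (y - mu)` should be numerically zero, which is a handy sanity check when connecting the likelihood derivation to the code.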

Mixed models

Random Effects Meta Regression (random effects)

Code

Core Algorithms

Legacy + Frontend

There are examples at the bottom of each Python module, inside the "main" block, that can be run to test out each implementation if anyone perusing is curious. There are, of course, unlisted dependencies (i.e. numpy/scipy/scikit-learn), but this repo is not really meant for public use at the moment, so for now I leave it up to the user to pip/conda install their way to success.

Future directions

Front end

I also plan on adding some kind of TypeScript front-end GUI/chart displayer, mostly to get some practice using JavaScript/TypeScript.

A univariate regression is actually already implemented in TypeScript. I wrote it before realizing that there weren't many good vectorized math packages for TypeScript on Node.js (aside from TensorFlow, which came with its own suite of problems), and the language is really not meant for that anyway, but it has given me a solid foundation.

Perhaps I will also add some SQLite for database operations. Although we would just be moving CSVs around inside folders, it would be a proof of concept.

Algorithms

At the moment I plan on continuing to implement mostly GLMs to grow my clinical-research-relevant skill set, but I will also try out things like k-means and SVD-PCA, as well as other generalizations.

In the near future:

  • k-means
  • PCA
  • negative binomial regression
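For a sense of what the planned SVD-PCA might involve, here is a minimal sketch using only NumPy. It is an assumption about the approach, not code from this repo: the function name `pca_svd`, its return values, and the random test data are hypothetical.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the column-centered data matrix (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # principal axes, one per row
    scores = Xc @ components.T                     # data projected onto the axes
    # Squared singular values map to sample variances along each axis
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Random correlated data, only for demonstration
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing

scores, components, explained_variance = pca_svd(X, n_components=2)
print(explained_variance)
```

The explained variances should match the largest eigenvalues of the sample covariance matrix, which gives a quick way to check the SVD route against the eigendecomposition route.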

About

Post-Grad Learning Continues: Repo focused on connecting the math to implementation (matrix notation, likelihoods, gradients) without relying on high-level libraries.
