(Hand-rolled Machine Learning and Statistical Analyses)
While the library is thin at the moment, the goal is to build a robust collection of statistical methods typically used in clinical research. The project serves a dual purpose. Throughout undergrad my training was very applied (comp-bio/structural bio research plus informatics and modelling coursework), with algorithmic implementations coming mainly from common packages like scikit-learn, scipy, etc. As I continue to grow into a data scientist, I am looking to build my experience in numerical computing. My two main goals at the moment are to:
- Practice implementing algorithms from scratch using not much more than scientific computing libraries for vectorization and dataframe-oriented frameworks like Pandas.
- Write out the derivations for the various models I have been using day-to-day and build on them to expand my theoretical statistical foundations.
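As a rough illustration of the "from scratch, but vectorized" idea, here is a minimal sketch of a univariate least-squares fit using only NumPy. The function name `fit_univariate_ols` and the toy data are made up for this example and are not taken from the modules in this repo.

```python
import numpy as np


def fit_univariate_ols(x, y):
    """Closed-form least squares for the model y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Centering x lets the slope be written as cov(x, y) / var(x).
    x_centered = x - x.mean()
    slope = np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered)

    # The fitted line always passes through the point (mean(x), mean(y)).
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


# Toy, noise-free data generated from y = 1 + 2x (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
intercept, slope = fit_univariate_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The dot products replace explicit Python loops over the observations, which is the kind of vectorization the first goal refers to.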
Originally I was going to try to learn some TypeScript and implemented a simple univariate regression in it, but I decided to focus more on the math than on the GUI and also got busy working on this:
we are entering the review cycle now so check it out! (pre-print up soon)
The original TypeScript univariate implementations are still present but have been temporarily sidelined due to the focus on developing a Python backend and a JavaScript/TypeScript frontend for visualization and data loading.
Outlined below are the different models I have implemented so far.
There are examples at the bottom of each Python module, inside the "main" block, that can be run to test out each implementation if anyone perusing is curious. There are, of course, unlisted dependencies (i.e. numpy/scipy/scikit-learn), but this is not really meant for public use at the moment, so for now I leave it up to the user to pip/conda install their way to success.
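For anyone unfamiliar with the pattern, the layout looks roughly like the sketch below: the model code sits at the top of the module and a small runnable example sits under the `__main__` guard. The names `fit_intercept_only` and `demo` are hypothetical placeholders, not functions from this repo, and the example sticks to the standard library so it runs without extra installs.

```python
"""Hypothetical module layout: model code on top, runnable example at the bottom."""
from statistics import fmean


def fit_intercept_only(y):
    """Least-squares fit of a constant model; the estimate is the sample mean."""
    return fmean(y)


def demo():
    """Small example that runs when the module is executed directly."""
    y = [2.0, 4.0, 6.0]
    estimate = fit_intercept_only(y)
    print(f"intercept-only estimate: {estimate:.3f}")
    return estimate


if __name__ == "__main__":
    # Runs under `python some_module.py`, but not when the module is imported.
    demo()
```

Keeping the example behind the guard means the module can still be imported elsewhere without triggering the demo.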
I also plan on adding some kind of TypeScript front-end GUI/chart display, mostly to get some practice using JavaScript/TypeScript.
A univariate regression is actually already implemented in TypeScript. I wrote it before realizing that there weren't many good vectorized math packages for TypeScript on Node.js (aside from TensorFlow, which came with its own suite of problems), and the language is really not meant for that anyway, but it has given me a solid foundation thus far.
Perhaps I will also add some SQLite for database operations. Although we would just be moving CSVs around inside folders, it would be a proof of concept.
At the moment I plan on continuing to implement mostly GLMs to grow my clinical-research-relevant skill set, but I will also try out things like k-means and SVD-PCA as well as other generalizations.
In the near future:
- k-means
- PCA
- negative binomial regression