For those who are interested in my song lists, try this
Statistics
Statistics is fun!
Here are a consumer analysis article co-written by one of my previous GSIs, Johnny, and a 2018 FIFA World Cup article.
These show how statistics can be applied to real-world problems! ^^
KL Divergence in Korean by Skywalk blog: Link
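As a quick companion to the KL divergence link, here is a minimal sketch of the discrete KL divergence in plain Python. The function name `kl_divergence` and the two example distributions are my own illustrative assumptions, not taken from the linked post.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Illustrative distributions (assumed values).
p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)  # about 0.511
kl_qp = kl_divergence(q, p)  # about 0.368
```

The two directions give different values, which is a reminder that KL divergence is not symmetric and so is not a true distance.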
Bioinformatics
Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science) by Irene Sui Lan Zeng and Thomas Lumley: Paper
Dimension reduction techniques for the integrative analysis of multi-omics data by Chen Meng, et. al.: Paper
Chapter 11: Genome-Wide Association Studies by William S. Bush and Jason H. Moore: Paper
Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets by Ricard Argelaguet et.al.:Paper
DIABLO: an integrative approach for identifying key molecular drivers from multi-omic assays by Amrit Singh et.al.:Paper
Why do we use mice for many experiments? link
About Electronic Health Care Records. link
Fine mapping -> From genome-wide associations to candidate causal variants by statistical fine-mapping by Daniel J. Schaid et.al.: Paper
Make improvements in Electronic health records (EHR) -> High-fidelity phenotyping: richness and freedom from bias by George Hripcsak and David J Albers: Paper
Multi-scale inference of genetic trait architecture using biologically annotated neural networks by Pinar Demetci et.al.: Paper
Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis by Alexander Rakhlin et.al.: Paper
A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits by Mingxuan Cai et.al.: Paper
Inferring multimodal latent topics from electronic health records by Yue Li et.al.: Paper
Going to Bat(s) for Studies of Disease Tolerance by Judith N Mandl et.al.: Paper ~~ COVID 19 related???
Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases: Paper ~~ Power of single cell analysis
Joint probabilistic modeling of single-cell multi-omic data with totalVI by Adam Gayoso et.al.: Paper ~~ joint analysis of CITE seq data
Applications of machine learning in drug discovery and development by Jessica Vamathevan et.al.: Paper ~ Introduce different methodologies for each step in drug discovery
Modeling polypharmacy side effects with graph convolutional networks by Marinka Zitnik et.al.: Paper ~ side effect prediction using GCN networks
Comprehensive Integration of Single-Cell Data by Tim Stuart et.al.: Paper ~ Seurat: mapping/anchoring in single cell data using MNN+CCA and scoring with SNN
scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data by Nelson Johansen et.al.: Paper ~ bi-directional mapping with unsupervised deep learning. Reported to outperform other mapping/integration methods such as Seurat, scVI, scanorama, MNN, scmap, MINT, scMerge.
On Usage of Autoencoders and Siamese Networks for Online Handwritten Signature Verification by Kian Ahrabian et.al.: Paper ~ Comparing latent spaces of autoencoders using Siamese networks for online signature verification
Long non-coding RNA by 한남식: paper
Interpretable factor models of single-cell RNA-seq via variational autoencoders by Valentine Svensson et.al.: Paper ~ Linearly-decoded VAE implemented in scVI package
A benchmark of batch-effect correction methods for single-cell RNA sequencing data by Hoa Thi Nhu Tran et.al.: Paper ~ benchmark tests of a list of scRNA-seq batch correction methods
scCobra: Contrastive cell embedding learning with domain adaptation for single-cell data integration by Bowen Zhao et.al.: Paper ~ Batch correction effect with Bowen and Jun
For Project on single cell deconvolution (2023 Spring ~ )
<single-cell spatial modeling of cell identity in spot-resolution: Region segmentation>
SPICEMIX enables integrative single-cell spatial modeling of cell identity by Benjamin Chidester et.al.: Paper
SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network by Jian Hu et.al.:Paper
Spatial transcriptomics at subspot resolution with BayesSpace by Edward Zhao et.al.:Paper
Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data by Qian Zhu et.al.:Paper ~ Hidden Markov random field.
Spatial reconstruction of single-cell gene expression data by Rahul Satija et.al.:Paper ~ Seurat.
stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues by Duy Pham et.al.:Paper
<Deconvolve spots into cells in spatial data: Cell type deconvolution>
Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography by Alma Andersson et.al.:Paper ~ stereoscope
Robust decomposition of cell type mixtures in spatial transcriptomics by Dylan M. Cable et.al.:Paper ~ RCTD
SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes by Marc Elosua-Bayes et.al.:Paper
Spatially informed cell-type deconvolution for spatial transcriptomics by Ying Ma and Xiang Zhou:Paper ~ CARD
Cell2location maps fine-grained cell types in spatial transcriptomics by Vitalii Kleshchevnikov et.al.:Paper
SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references by Meichen Dong et.al.:Paper
SpatialDWLS: accurate deconvolution of spatial transcriptomic data by Rui Dong and Guo-Cheng Yuan: Paper
Giotto: a toolbox for integrative analysis and visualization of spatial expression data by Ruben Dries et.al.:Paper
DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence by Qianqian Song and Jing Su:Paper
Bulk tissue cell type deconvolution with multi-subject single-cell expression reference by Xuran Wang et.al.:Paper ~ MuSiC
<Summary of key problems in analysis and methods of spatial transcriptomics data> Deciphering tissue structure and function using spatial transcriptomics by Benjamin L. Walker et.al.:Paper
For Project on AI Drug Design (2021 Fall)
Reconstructing SARS-CoV-2 response signaling and regulatory networks by Jun Ding et.al.: Paper ~ using network and pathway with integration of previously identified datasets+analysis, let’s make drug design work!
iDREM: Interactive visualization of dynamic regulatory networks by Jun Ding et.al.: Paper ~ integrating time series proteomics, epigenomics, and scRNA-seq with interactive visualization leads to higher model accuracy.
Cellar: Interactive single cell data annotation tool by Euxhen Hasanaj et.al.: Paper ~ explains the web server UI of a single cell data annotation tool, from preprocessing all the way to analysis:
- 1. Preprocessing - i.e. based on dispersion and max expression value
- 2. Dimensionality reduction - i.e. linear (PCA, truncated SVD) & non-linear (diffusion map, UMAP, multidimensional scaling, isomap)
- 3. Clustering - i.e. unsupervised (Leiden, k-means, k-medoids, spectral clustering, agglomerative clustering) & semi-supervised (seeded k-means, constrained k-means)
- 4. Cell type assignment - i.e. co-expression based cell coloring
- 5. Cell type assignment using label transfer - i.e. scanpy ingest, singleR
AI drug design video:
- video 1 - Quantitative structure-activity relationship.
- video 2 - AI driven drug discovery using relationships between proteins, drugs, diseases and metabolites with network prediction like what Mummichog and Piumet do.
- video 3 - use AI to design drugs faster and cheaper by learning disease targets better.
- video 4 - Drug discovery using ChemGAN, a note from Siraj: speeds up the conventional 8-12 years needed to discover one drug by trying and choosing the molecules that are most likely to have the desired properties. Explains the conventional drug discovery process, shows the history of ML in it (i.e. RNN, CNN, GAN), and gives a real demo in Python using TensorFlow.
Probabilistic graphical model:
- video 1 - shows a good example of directed acyclic causal networks with a demo using the pgmpy Python lib: Bayesian network (Markov, “since it is explained by your immediate neighbor”), and thus every edge is a conditional probability (i.e. the conditional probability distribution when it is Markov -> P(node value | parent node values)).
- video 2 - encodes the joint distribution by conditional independence in directed acyclic Bayesian networks (approaches for inference: 1. enumeration 2. variable elimination - elimination order can change computational cost 3. belief propagation).
- video 3 - undirected graphical models: Markov random fields (edge: potential function between variables); a Bayesian network can be converted to an MRF (the parametrization is NOT unique), but this loses the marginal independence of parents.
- towards data science blog - Definition of PGM, directed graphical model (Bayesian network)+3 rules of d-separation (1. if there is no unidirectional path between them 2. another set of nodes block any unidirectional path between them 3. a node which has two or more parents or its descendants break d-separation of their parents), undirected graphical model (Markov random field - may contain cycles unlike BN thus it can describe a different set of dependency relationships than BN; yet, it is not a superset of DGM since some relationships such as causal can only be described by DGM. Undirected edge represents joint probabilities of cliques. Pairwise+local+global Markov properties), conversion of BN and MRF (not easy as independence relationships are diff. Moralization to convert BN to MRF. Triangulation to convert MRF to BN), and inference (variable elimination - exact inference algorithm but it can be computationally intractable for large BN)+parameter estimation in PGM.
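To make the Bayesian network notes above concrete, here is a minimal sketch of inference by enumeration (the simplest of the three inference approaches listed) on a tiny rain/sprinkler/wet-grass network. It uses plain Python rather than pgmpy, and the network structure, the probability values, and all names (`joint`, `posterior_rain_given_wet`) are illustrative assumptions, not taken from the videos or the blog post.

```python
from itertools import product

# Conditional probability tables. All numbers are illustrative assumptions.
P_RAIN = {True: 0.2, False: 0.8}             # P(R)
P_SPRINKLER_ON = {True: 0.01, False: 0.4}    # P(S=True | R)
P_WET = {                                    # P(W=True | S, R), keyed by (S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability factorized along the DAG: P(R) * P(S | R) * P(W | S, R)."""
    p_s = P_SPRINKLER_ON[rain] if sprinkler else 1.0 - P_SPRINKLER_ON[rain]
    p_w = P_WET[(sprinkler, rain)] if wet else 1.0 - P_WET[(sprinkler, rain)]
    return P_RAIN[rain] * p_s * p_w


def posterior_rain_given_wet():
    """P(R=True | W=True), summing the joint over the hidden sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(posterior_rain_given_wet(), 4))  # 0.3577
```

Enumeration touches every assignment of the hidden variables, so its cost grows exponentially with network size. That is the motivation for variable elimination and belief propagation.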
For Project on MetaboAnalyst - Metab (2021 Summer)
Predicting Network Activity from High Throughput Metabolomics by Shuzhao Li et.al.: Paper ~ mummichog algorithm (using metabolic pathways and networks to predict functional activity without a priori identification of metabolites using biological activity) introduction with a few applications
MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis by Jasmine Chong et.al.: Paper ~ update 1. R-command tracking+display and companion metaboAnalystR package (transparency, reproducibility, flexibility), 2. MS peaks to pathways to predict pathway activity from untargeted MS using mummichog, 3. biomarker meta-analysis thru combination of multiple metabolomics data, and 4. network explorer for integrative analysis of metabolomics, metagenomics, transcriptomics 5. knowledgebase updates (compound database, pathway libraries, metabolite sets)
MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights by Zhiqiang Pang et.al.: Paper ~ update 1. LC-MS spectra processing module (automated parameter optimization and resumable analysis) with interactive plots, 2. functional analysis module (select any peak groups of interest and evaluate enrichment of potential functions), and 3. functional meta-analysis module (combine multiple global metabolomics datasets) by pathway level integration (different samples) or pooling peaks (same samples)
Revealing disease-associated pathways by network integration of untargeted metabolomics by Leila Pirhaji et.al: Paper
MetaboAnalystR 2.0: From Raw Spectra to Biological Insights by Jasmine Chong et.al.: Paper ~ update 1. raw spectral processing and 2. mummichog algorithm. Show two case studies
MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics by Zhiqiang Pang et.al.: Paper ~ update 1. efficient parameter optimization, 2. automated batch effect correction, and 3. more accurate pathway activity prediction with retention time + updated pathway libraries => faster computationally and more biologically meaningful interpretations
Machine Learning
Nice Intro-ANN, CNN, and RNN: link
Love this BLOG-MachineCurve by Chris: link
link1 -> MLP keras and tensorflow example
link2 -> CNN (Convolutional layers, pooling layers, fully connected layers)
link3 -> CNN keras and tensorflow example
towardsdatascience: link -> Here, specifically, it talks about batch size vs. epoch vs. number of batches (iterations)
link1 link2 -> math behind Deep Learning step by step
link3 -> RNN
link4 -> CNN
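For the batch size vs. epoch vs. iterations distinction mentioned under the towardsdatascience entry above, a small arithmetic sketch may help. The dataset size, batch size, and epoch count are made-up numbers, and `iterations_per_epoch` is a hypothetical helper, not a library function.

```python
import math


def iterations_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches (weight updates) needed to see every sample once."""
    if drop_last:
        return n_samples // batch_size       # discard the final partial batch
    return math.ceil(n_samples / batch_size)  # keep the final partial batch


# Assumed example values.
n_samples, batch_size, epochs = 2000, 64, 10

per_epoch = iterations_per_epoch(n_samples, batch_size)  # 32: 31 full batches + 1 batch of 16
total_iterations = per_epoch * epochs                    # 320 updates over the whole run
```

So one epoch is one full pass over the data, the batch size is how many samples go into each update, and the number of iterations is how many updates that takes.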
freeCodeCamp: link -> Here, specifically, it talks about the relationship between math and neural networks with an example
AnalyticsVidhya: link -> Here, specifically, it talks about the activation functions for the layers.
link1 -> RL
Stanford Deep Learning CS 230: link -> CNN and RNN (Super nice cheatsheets!!!)
Word2Vec by The Coding Train: link
Reinforcement Learning by deepsense.ai blog: link
Good summary of Reinforcement Learning by SmartLab AI: link
Self driving cab using RL by learndatasci: link
Pytorch Intro in Korean: link - original link
Good explanation of logistic regression for classification problems by Christoph Molnar: link
Hyperparameters optimization for Deep Learning and Early Stopping criteria by flydhub blog: link => conclusion for Hyp. opt.: Bayes sequential model based optimization is the BEST for DL
Variational Autoencoder explanation by Arxiv Insights: link
GPU CUDA:
- video 1 - NVIDIA CUDA intro
- video 2 - Deep learning on the GPU through the CUDA software API, with a PyTorch example of parallel computing
- video 3 - GPU programming with CUDA in C++
Graph convolutional network: link1
link2
Transposed convolution: link1
link2
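As a companion to the transposed convolution links, here is a minimal 1D sketch in plain Python, assuming no padding and a single channel. The function name `conv_transpose_1d` is my own and is not taken from the linked posts or from any library.

```python
def conv_transpose_1d(x, kernel, stride=1):
    """1D transposed convolution: each input value stamps a scaled copy of the
    kernel into the output, and overlapping stamps are summed."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    return out


# Stride 2 with a kernel of ones upsamples by repeating each value.
print(conv_transpose_1d([1, 2, 3], [1, 1], stride=2))  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

# Stride 1: neighbouring stamps overlap and add up.
print(conv_transpose_1d([1, 2, 3], [1, 2], stride=1))  # [1.0, 4.0, 7.0, 6.0]
```

The output is longer than the input, which is why this operation shows up in decoders and generators that need to upsample.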
VAE in pytorch: link1
link2
link3
link4
link5
link6
link7
link8
link9
link10 -> Loss, beta VAE, VQ VAE, TD VAE
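Since the last VAE link covers the loss and beta VAE, here is a minimal sketch of that objective in plain Python (no PyTorch). It assumes a diagonal Gaussian encoder, a standard normal prior, and a squared-error reconstruction term; the function names are my own, and real implementations often use a different reconstruction likelihood.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Reconstruction error plus beta-weighted KL term; beta=1 is the plain VAE."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))  # squared error (assumption)
    return recon + beta * gaussian_kl(mu, log_var)


# The KL term vanishes when the encoder output equals the prior.
print(abs(gaussian_kl([0.0, 0.0], [0.0, 0.0])))  # 0.0

# Perfect reconstruction, one latent dimension with mean 1 and unit variance.
print(beta_vae_loss([1.0, 2.0], [1.0, 2.0], [1.0], [0.0], beta=4.0))  # 2.0
```

Raising beta above 1 penalizes the KL term more heavily, trading reconstruction quality for a latent code that stays closer to the prior.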
Basic Linux: Manual
BERT: in Korean