THEORY BEHIND RSEM
Published on January 29, 2017. In this article, I will walk through and try to explain a 2009 paper, RNA-Seq gene expression estimation with read mapping uncertainty, by Bo Li, Victor Ruotti, Ron M. Stewart, James A. Thomson, and Colin N. Dewey. I will also occasionally refer to a 2011 paper by Bo Li and Colin N. Dewey, RSEM: accurate transcript quantification from RNA-Seq

GROUP DATA BY MONTH IN R
Published on February 22, 2017. I often analyze time series data in R: things like daily expenses or webserver statistics. And just as often I want to aggregate the data by month to see longer-term patterns.

ASYMMETRY OF COSTS IN (T-)SNE
This is the formulation taken in t-SNE and the one I explore here. In the original SNE paper, the probabilities were asymmetric, and the cost function was a sum of KL divergences of pairs of conditional distributions.

NEW PATTERNS IN TASTY
Published on January 8, 2018. When I wrote tasty in 2013, I borrowed the pattern language and its implementation from test-framework. I wasn't fond of that pattern language, but it did the job most of the time, and the task of coming up with a better alternative was daunting.

UNDERSTANDING ASYMMETRIC NUMERAL SYSTEMS
Published on August 20, 2017; updated on August 22, 2017. Apparently, Google is trying to patent (an application of) Asymmetric Numeral Systems, so I spent some time today learning what it is. At its essence lies a simple and beautiful idea. ANS is a lossless compression algorithm.

CAN YOU CHEAT THE BRIER SCORE?
Published on August 9, 2018; updated on March 2, 2021. The Brier score is a common way of judging probabilistic forecasts. If you have several people or teams each giving probabilities to various events, you can judge how well they are doing by comparing their Brier scores: lower scores correspond to more accurate predictions.

EXPLAINED VARIANCE IN PCA

RNA-SEQ NORMALIZATION EXPLAINED
Published on November 28, 2016. RNA-Seq (short for RNA sequencing) is a type of experiment that lets us measure gene expression. The sequencing step produces a large number (tens of millions) of cDNA fragment sequences called reads. Every read represents a part of some RNA molecule in the sample. Then we assign ("map") every read to one of

LOGIT() AND LOGISTIC() FUNCTIONS IN R
In statistics, the standard functions logit() and logistic() are defined as follows: $\operatorname{logit}(p) = \log \frac{p}{1 - p}$ and $\operatorname{logistic}(x) = \frac{1}{1 + \exp(-x)}$. Given the ubiquity of these functions, it may be puzzling and frustrating for an R user that there are no pre-defined functions logit() and logistic() in R. Some CRAN packages define this function
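To make the logit() and logistic() entry above concrete, here is a minimal sketch of the two standard definitions. It is written in Python rather than R purely for illustration; the function names mirror the ones in the text, and the overflow-avoiding branch is my own addition, not something taken from the article.

```python
import math


def logit(p):
    """Log-odds of a probability p, which must lie strictly between 0 and 1."""
    return math.log(p / (1 - p))


def logistic(x):
    """Inverse of logit: maps any real number x into the interval (0, 1)."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    # For negative x, use the algebraically equal form exp(x) / (1 + exp(x))
    # so that exp() is never called with a large positive argument.
    e = math.exp(x)
    return e / (1 + e)


if __name__ == "__main__":
    p = 0.3
    # The two functions undo each other, up to floating-point error.
    print(logistic(logit(p)))
```

The two functions are inverses of each other, so logistic(logit(p)) recovers p up to rounding.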

WORD VS. INT
Published on June 5, 2017; updated on March 2, 2018. When dealing with bounded non-negative integer quantities in Haskell, should we use Word or Int to represent them? Some argue that we should use Word because then we automatically know that our quantities are non-negative: "because there's many things that shouldn't be negative by semantic. there's no such as -5 coins in your

ALGEBRA OF PERMUTATIONS IN R
A permutation is a mathematical function that maps numbers from the set {1, 2, …, n} to other numbers from the same set. An example of a permutation for n = 5 can be written in two-line notation, which denotes a function s such that s(1) = 2, s(2) = 5 and so on. In R, we can represent a permutation by its second row, and assume that the first row
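The "second row" representation of a permutation described above can be sketched as follows. This is a Python illustration, not the article's R code. The helper names compose and inverse are hypothetical, and only s(1) = 2 and s(2) = 5 come from the text; the remaining three values of s are assumed for the sake of the example.

```python
def compose(s, t):
    """Return the permutation x -> s(t(x)); both arguments are 1-based image lists."""
    return [s[t_x - 1] for t_x in t]


def inverse(s):
    """Return the permutation that undoes s."""
    inv = [0] * len(s)
    for x, s_x in enumerate(s, start=1):
        inv[s_x - 1] = x  # s maps x to s_x, so the inverse maps s_x back to x
    return inv


# Second row of the two-line notation. Only s(1) = 2 and s(2) = 5 are from the
# text; the last three entries are an assumption made for this example.
s = [2, 5, 4, 3, 1]

if __name__ == "__main__":
    print(inverse(s))
    print(compose(s, inverse(s)))  # the identity permutation
```

Storing only the second row works because the first row is always 1, 2, …, n, so the position in the list plays its role.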

RANK VS. ORDER IN R
Published on March 19, 2016; updated on October 30, 2017. A lot of people (myself included) get confused when they are first confronted with rank and order functions in R. Not only do the descriptions of these functions sound related (they both have to do with how a vector's elements are arranged when the vector is sorted), but their return values may seem identical at

CARTESIAN CLOSED COMIC #4: SCHRÖDINGER'S CAT
Published on July 21, 2009

PREDICTING A COIN TOSS
Published on June 14, 2016. I flip a coin and it comes up heads. What is the probability it will come up heads the next time I flip it?

CARTESIAN CLOSED COMIC #28: MOTTO
Published on June 19, 2015
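Returning to the rank vs. order entry above, the distinction can be sketched with two small functions. These are Python stand-ins that I wrote to mimic the behaviour of R's order and rank on a numeric vector without ties; they are not R's implementations, and the example vectors are my own.

```python
def order(x):
    """1-based indices that would sort x in ascending order."""
    return [i + 1 for i in sorted(range(len(x)), key=lambda i: x[i])]


def rank(x):
    """1-based position of each element of x in the sorted vector (assumes no ties)."""
    ranks = [0] * len(x)
    for position, index in enumerate(order(x), start=1):
        ranks[index - 1] = position
    return ranks


if __name__ == "__main__":
    for x in ([10, 30, 20], [30, 10, 20]):
        print(x, "order:", order(x), "rank:", rank(x))
```

For [10, 30, 20] both functions return the same vector, while for [30, 10, 20] they differ. In the absence of ties, rank is the inverse permutation of order, which is why the two coincide only for some inputs.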
SOMETIMES ALL YOU NEED IS TO CUT ADAPTERS
In any case, simply trimming the adapter sequence and removing all reads shorter than 40 bp fixed pretty much everything: FastQC summary box after removing adapter sequences. The only remaining warning simply tells us that not all sequences are of the same length, as we would expect after trimming adapters: This module will raise a warning if

INCREASE THE OPEN FILES LIMIT ON LINUX
Published on March 26, 2017. Each process on Linux has several limits associated with it, such as the maximum number of files it can open simultaneously. You can find out your current open files limit by running

    ulimit -Sn # soft limit; can be raised up to the hard limit
    ulimit -Hn # hard limit
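The open files limit described above can also be inspected from inside a running process. The sketch below uses Python's standard resource module (Unix only). The helper names open_files_limits and raise_soft_limit are hypothetical, and this is an illustration of the soft/hard distinction rather than the procedure from the article.

```python
import resource


def open_files_limits():
    """Return the (soft, hard) open files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit to at most `target`, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = open_files_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # nothing to raise
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print(open_files_limits())
```

An unprivileged process can move its soft limit up to the hard limit in this way, but it cannot raise the hard limit itself.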

MY PODCAST EDITING WORKFLOW IN REAPER
Here is my podcast editing check-list for REAPER, which I'll elaborate on below:

1. Create the project; import and align the tracks.
2. Normalize each track to 0 dB and adjust their gain.
3. Apply dynamic split/noise gate to each track.
4. Edit at item boundaries.
5. Re-listen and make final edits.

CAN PROBABILITY BE GREATER THAN ONE?
The (frequentist) definition of probability is the limit of the ratio between the number of successes, $N_s$, and the number of trials, $N$: $p = \lim_{N \to \infty} \frac{N_s}{N}$. If the number of successes is allowed to be higher than the number of trials, then nothing prevents $p$ from exceeding 1. Consider, for instance, the sleeping beauty problem:

Hi. I am a Ukrainian software developer and bioinformatician. I write articles, produce podcasts (The bioinformatics chat and Compositional), and occasionally maintain open source packages on GitHub. The preferred way to reach me is by email: roma@ro-che.info (OpenPGP key).

