CHECK IF PYTORCH IS USING THE GPU
01 Feb 2020. I find this is always the first thing I want to run when setting up a deep learning environment, whether on a desktop machine or on AWS. These commands simply load PyTorch and check that PyTorch can use the GPU.
MAKE NEW COLUMNS USING FUNCTIONS
Create one column as a function of two columns.
    # Create a function that takes two inputs, pre and post
    def pre_post_difference(pre, post):
        # returns the difference between post and pre
        return post - pre
    # Create a variable that is the output of the function
    df = pre_post_difference(df,df['postTestScore
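The column-from-function snippet above is cut off in this capture, so here is a minimal sketch of the same idea. The sample scores and the `score_change` column name are assumptions for illustration; only `pre_post_difference`, `preTestScore`, and `postTestScore` come from the text.

```python
import pandas as pd

# Hypothetical scores; only the two score column names come from the text above.
df = pd.DataFrame({
    "preTestScore": [4, 24, 31],
    "postTestScore": [25, 94, 57],
})

def pre_post_difference(pre, post):
    """Return the difference between post and pre."""
    return post - pre

# 'score_change' is an assumed column name, not one confirmed by the source.
df["score_change"] = pre_post_difference(df["preTestScore"], df["postTestScore"])
```

Because pandas Series subtract element-wise, the function needs no explicit loop; passing two columns returns a new Series aligned on the same index.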
CROSS VALIDATION PIPELINE
20 Dec 2017. The code below does a lot in only a few lines. To help explain things, here are the steps the code performs: split the raw data into three folds; select one fold for testing and two for training; preprocess the data by scaling the training features; train a support vector classifier on the training data.
HANDLING IMBALANCED CLASSES WITH DOWNSAMPLING
20 Dec 2017. In downsampling, we randomly sample without replacement from the majority class (i.e. the class with more observations) to create a new subset of observations equal in size to the minority class.
HANDLING MISSING VALUES IN TIME SERIES
Interpolate missing values, but only up to one value.
    # Interpolate missing values
    df.interpolate(limit=1, limit_direction='forward')
                Sales
    2010-01-31  1.0
    2010-02-28  2.0
    2010-03-31  3.0
RENAME COLUMN HEADERS IN PANDAS
Replace the header value with the first row's values.
    # Create a new variable called 'header' from the first row of the dataset
    header = df.iloc[0]
    0    first_name
    1    last_name
    2    age
    3    preTestScore
    Name: 0, dtype: object
    # Replace the dataframe with a new one which does not contain the first row
    df = df[1:]
    # Rename the dataframe's column values
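The header-replacement steps above can be sketched end to end as follows. The sample rows are hypothetical, and the final assignment to `df.columns` is one common way to finish the rename; the source snippet is truncated before that step, so treat it as an assumption.

```python
import pandas as pd

# Hypothetical raw data whose real header ended up in row 0.
raw = pd.DataFrame([
    ["first_name", "last_name", "age", "preTestScore"],
    ["Jason", "Miller", 42, 4],
    ["Molly", "Jacobson", 52, 24],
])

header = raw.iloc[0]                  # first row, as a Series
df = raw[1:].reset_index(drop=True)   # everything except the first row
df.columns = header.tolist()          # assumed final step: use the row as column labels
```

Resetting the index is optional; it simply makes the remaining rows start at 0 again.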
MACHINE LEARNING TUTORIALS
I am the Director of Machine Learning at the Wikimedia Foundation. I have spent over a decade applying statistical learning, artificial intelligence, and software engineering to
MONITOR A WEBSITE FOR CHANGES WITH PYTHON
In this snippet, we create a continuous loop that, at set times, scrapes a website, checks whether it contains some text, and if so, emails me. Specifically, I used this script to find when Venture Beat had published an article about my company. It should be noted that there are more efficient ways of
USING LIST COMPREHENSIONS WITH PANDAS
                name   reports  year  next_year
    Cochice     Jason  4        2012  2013
    Pima        Molly  24       2012  2013
    Santa Cruz
FEATURE SELECTION USING RANDOM FOREST
The process of identifying only the most relevant features is called “feature selection.” Random forests are often used for feature selection in a data science workflow. The reason is that the tree-based strategies used by random forests naturally rank features by how well they improve the purity of the node. This mean decrease in impurity over
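To make the impurity-based ranking described above concrete, here is a hedged sketch using scikit-learn. The Iris dataset, the 100-tree forest, and the `"mean"` threshold are illustrative assumptions, not choices confirmed by the original note.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Fit a forest; feature_importances_ holds the normalized mean decrease in impurity.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_

# Keep only features whose importance is at least the mean importance (assumed cutoff).
selector = SelectFromModel(forest, threshold="mean", prefit=True)
X_selected = selector.transform(X)
```

The threshold is a judgment call: a stricter cutoff keeps fewer features, and the right value depends on the downstream model.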
HIERARCHICAL DATA IN PANDAS
       regiment    company  name    preTestScore  postTestScore
    0  Nighthawks  1st      Miller  4             25
    1  Nighthawks
HYPERPARAMETER TUNING USING GRID SEARCH
Create hyperparameter search space.
    # Create regularization penalty space
    penalty =
    # Create regularization hyperparameter space
    C = np.logspace(0, 4, 10)
    # Create hyperparameter options
    hyperparameters = dict(C=C, penalty=penalty)
SVC PARAMETERS WHEN USING RBF KERNEL
C - The Penalty Parameter. Now we will repeat the process for C: we will use the same classifier and the same data, and hold gamma constant. The only thing we will change is C, the penalty for misclassification. C = 1. With C = 1, the classifier is clearly tolerant of misclassified data points. There are many red points in the blue region and blue points in the red region.
PLOT THE VALIDATION CURVE
    # Create range of values for parameter
    param_range = np.arange(1, 250, 2)
    # Calculate accuracy on training and test set using range of parameter values
    train_scores, test_scores = validation_curve(RandomForestClassifier(), X, y, param_name="n_estimators", param_range=param_range, cv=3, scoring="accuracy", n_jobs=-1)
PREPROCESSING DATA FOR NEURAL NETWORKS
20 Dec 2017. Typically, a neural network’s parameters are initialized (i.e. created) as small random numbers. Neural networks often behave poorly when the feature values are much larger than the parameter values. Furthermore, since an observation’s feature values are combined as they pass through
CREATE DUMMY COLUMNS
Notice that the ALTER_EGO column does not appear because it is a string. We’d need to join that column back in to regain it.
    -- Select all columns from SUPERHEROES
    SELECT * FROM SUPERHEROES
    -- Create three columns (Maine, Arizona, and California) with a 1 if a row is a member
    -- of that category and a zero otherwise.
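As a hedged illustration of the grid-search snippet earlier in this block, the sketch below feeds a `hyperparameters` dictionary of the same shape into scikit-learn's `GridSearchCV`. The penalty list is missing from the capture, so `["l1", "l2"]` is an assumption, as are the simulated binary dataset, the logistic regression model, and the `liblinear` solver (chosen only because it supports both penalties).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Simulated binary classification data (assumption; the source dataset is not shown).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

penalty = ["l1", "l2"]          # assumed values; elided in the original snippet
C = np.logspace(0, 4, 10)       # regularization strengths from 1 to 10,000
hyperparameters = dict(C=C, penalty=penalty)

# Exhaustively try every (C, penalty) pair with 5-fold cross-validation.
model = LogisticRegression(solver="liblinear", max_iter=1000)
search = GridSearchCV(model, hyperparameters, cv=5)
search.fit(X, y)
best = search.best_params_
```

With 10 values of `C` and 2 penalties, the search fits 20 candidate models per fold, so the grid size is worth keeping in mind on larger datasets.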
CONVERT COLUMNS INTO ROWS
UNPIVOT converts a table’s columns into rows.
SWAP TWO TABLES
28 Jan 2019. SWAP will swap all contents, metadata, and access permissions between two tables. This is useful because it lets you build a new version of a table over time and then, when you are ready, swap out the old version for the new version.
K-NEAREST NEIGHBORS CLASSIFICATION
array() According to this result, the model predicted that the observation was loss with a ~67% probability and win with a ~33% probability. Because the observation had a greater probability of being loss, it predicted that class for the observation.
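The class probabilities discussed in the k-nearest neighbors note above come from the share of neighbors in each class. A minimal sketch, assuming a made-up one-feature dataset and `k = 3` (neither is taken from the original note):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical one-feature training data with two classes.
X = np.array([[1.0], [2.0], [3.0], [7.0], [9.0], [10.0]])
y = np.array(["loss", "loss", "loss", "win", "win", "win"])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# The three nearest neighbors of 5.0 are 3.0 (loss), 7.0 (win), and 2.0 (loss).
new_obs = np.array([[5.0]])
proba = clf.predict_proba(new_obs)     # columns follow clf.classes_: ['loss', 'win']
prediction = clf.predict(new_obs)
```

Two of the three neighbors are `loss`, so the predicted probabilities are about 0.67 and 0.33 and the predicted class is `loss`.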
APPLY OPERATIONS TO GROUPS IN PANDAS
“This grouped variable is now a GroupBy object. It has not actually computed anything yet except for some intermediate data about the group key df. The idea is that this object has all of the information needed to then apply some operation to each of the groups.”
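The lazy behavior described in the quotation above can be seen in a small sketch. The regiment names and scores are hypothetical sample data.

```python
import pandas as pd

# Hypothetical data: two regiments with two test scores each.
df = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "preTestScore": [4, 24, 31, 2],
})

# Creating the GroupBy object only records how rows map to groups.
grouped = df["preTestScore"].groupby(df["regiment"])

# The per-group computation happens when an operation is applied.
means = grouped.mean()
```

The same `grouped` object can be reused for other operations such as `grouped.sum()` or `grouped.describe()` without regrouping.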
K-FOLD CROSS-VALIDATING NEURAL NETWORKS
20 Dec 2017. If we have a small dataset, k-fold cross-validation can help maximize our ability to evaluate the neural network’s performance. This is possible in Keras because we can “wrap” any neural network such that it can use the evaluation features available in scikit
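The note above wraps a Keras network so that scikit-learn can evaluate it. The sketch below shows only the evaluation side, and it swaps in scikit-learn's own `MLPClassifier` as a stand-in network so that it runs without Keras. The simulated data, layer size, and fold count are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Small simulated dataset (assumption), where k-fold evaluation is most useful.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Stand-in neural network; a wrapped Keras model would take this place.
network = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)

# Train and score the network once per fold; returns one accuracy per fold.
scores = cross_val_score(network, X, y, cv=3)
```

Averaging `scores` gives a less noisy performance estimate than a single train/test split, at the cost of training the network k times.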
COLOR PALETTES IN SEABORN
Create a color palette and set it as the current color palette.
JOIN AND MERGE PANDAS DATAFRAME
Merge with outer join. “Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.” - source
    pd.merge(df_a, df_b, on='subject_id', how='outer')
    subject_id  first_name_x
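A minimal sketch of the outer merge quoted above. The two small dataframes and their names are hypothetical; only `subject_id` and the `pd.merge(..., how='outer')` call come from the text.

```python
import pandas as pd

# Hypothetical tables that share subject_id values "2" and "3" only.
df_a = pd.DataFrame({"subject_id": ["1", "2", "3"],
                     "first_name": ["Alex", "Amy", "Allen"]})
df_b = pd.DataFrame({"subject_id": ["2", "3", "4"],
                     "first_name": ["Billy", "Brian", "Bran"]})

# Outer join: keep every subject_id from either table, filling gaps with NaN.
merged = pd.merge(df_a, df_b, on="subject_id", how="outer")
```

Because both inputs have a `first_name` column, pandas disambiguates them with the default `_x` and `_y` suffixes, which is where `first_name_x` in the output above comes from.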
STOP GIT FROM ASKING FOR PASSWORD EVERY PUSH AND PULL FROM
If you cloned your GitHub repository using HTTPS, every time you push or pull a repository from GitHub, Git will prompt you for your GitHub username and password. This becomes particularly frustrating if you use multi-factor authentication, because you cannot use your regular password but must instead use a generated token.
Chris Albon
NOTES ON USING DATA SCIENCE & MACHINE LEARNING TO FIGHT FOR SOMETHING THAT MATTERS
I am a data scientist with a decade of experience applying statistical learning, artificial intelligence, and software engineering to political, social, and humanitarian efforts, from election monitoring to disaster relief. I lead the data science team at Devoted Health, helping fix America's health care system. Learning machine learning? Check out my Machine Learning Flashcards or my book, Machine Learning With Python Cookbook.
MACHINE LEARNING
BASICS
* Loading Features From Dictionaries
* Loading scikit-learn's Boston Housing Dataset
* Loading scikit-learn's Digits Dataset
* Loading scikit-learn's Iris Dataset
* Make Simulated Data For Classification
* Make Simulated Data For Clustering
* Make Simulated Data For Regression
* Perceptron In Scikit
* Saving Machine Learning Models
VECTORS, MATRICES, AND ARRAYS
* Transpose A Vector Or Matrix
* Selecting Elements In An Array
* Reshape An Array
* Invert A Matrix
* Getting The Diagonal Of A Matrix
* Flatten A Matrix
* Find The Rank Of A Matrix
* Find The Maximum And Minimum
* Describe An Array
* Create A Vector
* Create A Sparse Matrix
* Create A Matrix
* Converting A Dictionary Into A Matrix
* Calculate The Trace Of A Matrix
* Calculate The Determinant Of A Matrix
* Calculate The Average, Variance, And Standard Deviation
* Calculate Dot Product Of Two Vectors
* Apply Operations To Elements
* Adding And Subtracting Matrices
PREPROCESSING STRUCTURED DATA
* Convert Pandas Categorical Data For Scikit-Learn
* Delete Observations With Missing Values
* Deleting Missing Values
* Detecting Outliers
* Discretize Features
* Encoding Ordinal Categorical Features
* Handling Imbalanced Classes With Downsampling
* Handling Imbalanced Classes With Upsampling
* Handling Outliers
* Impute Missing Values With Means
* Imputing Missing Class Labels
* Imputing Missing Class Labels Using k-Nearest Neighbors
* Normalizing Observations
* One-Hot Encode Features With Multiple Labels
* One-Hot Encode Nominal Categorical Features
* Preprocessing Categorical Features
* Preprocessing Iris Data
* Rescale A Feature
* Standardize A Feature
PREPROCESSING IMAGES
* Binarize Images
* Blurring Images
* Cropping Images
* Detect Edges
* Enhance Contrast Of Color Image
* Enhance Contrast Of Greyscale Image
* Harris Corner Detector
* Installing OpenCV
* Isolate Colors
* Load Images
* Remove Backgrounds
* Save Images
* Sharpen Images
* Shi-Tomasi Corner Detector
* Using Mean Color As A Feature
PREPROCESSING TEXT
* Bag Of Words
* Parse HTML
* Remove Punctuation
* Remove Stop Words
* Replace Characters
* Stemming Words
* Strip Whitespace
* Tag Parts Of Speech
* Term Frequency Inverse Document Frequency
* Tokenize Text
PREPROCESSING DATES AND TIMES
* Break Up Dates And Times Into Multiple Features
* Calculate Difference Between Dates And Times
* Convert Strings To Dates
* Convert pandas Columns Time Zone
* Encode Days Of The Week
* Handling Missing Values In Time Series
* Handling Time Zones
* Lag A Time Feature
* Rolling Time Window
* Select Date And Time Ranges
FEATURE ENGINEERING
* Dimensionality Reduction On Sparse Feature Matrix
* Dimensionality Reduction With Kernel PCA
* Dimensionality Reduction With PCA
* Feature Extraction With PCA
* Group Observations Using K-Means Clustering
* Selecting The Best Number Of Components For LDA
* Selecting The Best Number Of Components For TSVD
* Using Linear Discriminant Analysis For Dimensionality Reduction
FEATURE SELECTION
* ANOVA F-value For Feature Selection
* Chi-Squared For Feature Selection
* Drop Highly Correlated Features
* Recursive Feature Elimination
* Variance Thresholding Binary Features
* Variance Thresholding For Feature Selection
MODEL EVALUATION
* Accuracy
* Create Baseline Classification Model
* Create Baseline Regression Model
* Cross Validation Pipeline
* Cross Validation With Parameter Tuning Using Grid Search
* Cross-Validation
* Custom Performance Metric
* F1 Score
* Generate Text Reports On Performance
* Nested Cross Validation
* Plot The Learning Curve
* Plot The Receiver Operating Characteristic Curve
* Plot The Validation Curve
* Precision
* Recall
* Split Data Into Training And Test Sets
MODEL SELECTION
* Find Best Preprocessing Steps During Model Selection
* Hyperparameter Tuning Using Grid Search
* Hyperparameter Tuning Using Random Search
* Model Selection Using Grid Search
* Pipelines With Parameter Optimization
LINEAR REGRESSION
* Adding Interaction Terms
* Create Interaction Features
* Effect Of Alpha On Lasso Regression
* Lasso Regression
* Linear Regression
* Linear Regression Using Scikit-Learn
* Ridge Regression
* Selecting The Best Alpha Value In Ridge Regression
LOGISTIC REGRESSION
* Fast C Hyperparameter Tuning
* Handling Imbalanced Classes In Logistic Regression
* Logistic Regression
* Logistic Regression On Very Large Data
* Logistic Regression With L1 Regularization
* One Vs. Rest Logistic Regression
TREES AND FORESTS
* Adaboost Classifier
* Decision Tree Classifier
* Decision Tree Regression
* Feature Importance
* Feature Selection Using Random Forest
* Handle Imbalanced Classes In Random Forest
* Random Forest Classifier
* Random Forest Classifier Example
* Random Forest Regression
* Select Important Features In Random Forest
* Titanic Competition With Random Forest
* Visualize A Decision Tree
NEAREST NEIGHBORS
* Identifying Best Value Of k
* K-Nearest Neighbors Classification
* Radius-Based Nearest Neighbor Classifier
SUPPORT VECTOR MACHINES
* Calibrate Predicted Probabilities In SVC
* Find Nearest Neighbors
* Find Support Vectors
* Imbalanced Classes In SVM
* Plot The Support Vector Classifiers Hyperplane
* SVC Parameters When Using RBF Kernel
* Support Vector Classifier
NAIVE BAYES
* Bernoulli Naive Bayes Classifier
* Calibrate Predicted Probabilities
* Gaussian Naive Bayes Classifier
* Multinomial Logistic Regression
* Multinomial Naive Bayes Classifier
* Naive Bayes Classifier From Scratch
CLUSTERING
* Agglomerative Clustering
* DBSCAN Clustering
* Evaluating Clustering
* Meanshift Clustering
* Mini-Batch k-Means Clustering
* k-Means Clustering
DEEP LEARNING
SETUP
* Prevent Ubuntu 18.06 And Nvidia Drivers From Updating
KERAS
* Adding Dropout
* Convolutional Neural Network
* Feedforward Neural Network For Binary Classification
* Feedforward Neural Network For Multiclass Classification
* Feedforward Neural Networks For Regression
* LSTM Recurrent Neural Network
* Neural Network Early Stopping
* Neural Network Weight Regularization
* Preprocessing Data For Neural Networks
* Save Model Training Progress
* Tuning Neural Network Hyperparameters
* Visualize Loss History
* Visualize Neural Network Architecture
* Visualize Performance History
* k-Fold Cross-Validating Neural Networks
PYTORCH
* Check If PyTorch Is Using The GPU
PYTHON
BASICS
* Add Padding Around String
* All Combinations For A List Of Objects
* Apply Operations Over Items In A List
* Applying Functions To List Items
* Arithmetic Basics
* Assignment Operators
* Basic Operations With NumPy Array
* Breaking Up String Variables
* Brute Force D20 Roll Simulator
* Cartesian Product
* Chain Together Lists
* Cleaning Text
* Compare Two Dictionaries
* Concurrent Processing
* Continue And Break Loops
* Convert HTML Characters To Strings
* Converting Strings To Datetime
* Create A New File Then Write To It
* Create A Temporary File
* Data Structure Basics
* Date And Time Basics
* Dictionary Basics
* Display JSON
* Display Scientific Notation As Floats
* Exiting A Loop
* Find The Max Value In A Dictionary
* Flatten Lists Of Lists
* For Loop
* Formatting Numbers
* Function Annotation Examples
* Function Basics
* Functions Vs. Generators
* Generating Random Numbers With NumPy
* Generator Expressions
* Hard Wrapping Text
* How To Use Default Dicts
* If Else On Any Or All Elements
* Indexing And Slicing NumPy Arrays
* Indexing And Slicing NumPy Arrays
* Iterate An Ifelse Over A List
* Iterate Over Multiple Lists Simultaneously
* Iterating Over Dictionary Keys
* Lambda Functions
* List All Files Of Certain Type In A Directory
* Logical Operations
* Looping Over Two Lists
* Mathematical Operations
* Mocking Functions
* Nested For Loops Using List Comprehension
* Nesting Lists
* Numpy Array Basics
* Parallel Processing
* Partial Function Applications
* Priority Queues
* Queues And Stacks
* Recursive Functions
* Scheduling Jobs In The Future
* Select Random Element From A List
* Selecting Items In A List With Filters
* Set The Color Of A Matplotlib Plot
* Sort A List Of Names By Last Name
* Sort A List Of Strings By Length
* Store API Credentials For Open Source Projects
* String Formatting
* String Indexing
* String Operations
* Swapping Variable Values
* Try, Except, and Finally
* Unpacking A Tuple
* Unpacking Function Arguments
* Use Command Line Arguments In A Function
* Using Named Tuples To Store Data
* any(), all(), max(), min(), sum()
* if and if else
* repr vs. str
* while Statement
DATA WRANGLING
* Apply Functions By Group In Pandas
* Apply Operations To Groups In Pandas
* Applying Operations Over pandas Dataframes
* Assign A New Column To A Pandas DataFrame
* Break A List Into N-Sized Chunks
* Breaking Up A String Into Columns Using Regex In pandas
* Columns Shared By Two Data Frames
* Construct A Dictionary From Multiple Lists
* Convert A CSV Into Python Code To Recreate It
* Convert A Categorical Variable Into Dummy Variables
* Convert A Categorical Variable Into Dummy Variables
* Convert A String Categorical Variable To A Numeric Variable
* Convert A Variable To A Time Variable In pandas
* Count Values In Pandas Dataframe
* Create A Pipeline In Pandas
* Create A pandas Column With A For Loop
* Create Counts Of Items
* Create a Column Based on a Conditional in pandas
* Creating Lists From Dictionary Keys And Values
* Crosstabs In pandas
* Delete Duplicates In pandas
* Descriptive Statistics For pandas Dataframe
* Dropping Rows And Columns In pandas Dataframe
* Enumerate A List
* Expand Cells Containing Lists Into Their Own Variables In Pandas
* Filter pandas Dataframes
* Find Largest Value In A Dataframe Column
* Find Unique Values In Pandas Dataframes
* Geocoding And Reverse Geocoding
* Geolocate A City And Country
* Geolocate A City Or Country
* Group A Time Series With pandas
* Group Data By Time
* Group Pandas Data By Hour Of The Day
* Grouping Rows In pandas
* Hierarchical Data In pandas
* Join And Merge Pandas Dataframe
* List Unique Values In A pandas Column
* Load A JSON File Into Pandas
* Load An Excel File Into Pandas
* Load Excel Spreadsheet As pandas Dataframe
* Loading A CSV Into pandas
* Long To Wide Format
* Lower Case Column Names In Pandas Dataframe
* Make New Columns Using Functions
* Map External Values To Dataframe Values in pandas
* Missing Data In pandas Dataframes
* Moving Averages In pandas
* Normalize A Column In pandas
* Pivot Tables In pandas
* Quickly Change A Column Of Strings In Pandas
* Random Sampling Dataframe
* Ranking Rows Of Pandas Dataframes
* Regular Expression Basics
* Regular Expression By Example
* Reindexing pandas Series And Dataframes
* Rename Column Headers In pandas
* Rename Multiple pandas Dataframe Column Names
* Replacing Values In pandas
* Saving A pandas Dataframe As A CSV
* Search A pandas Column For A Value
* Select Rows When Columns Contain Certain Values
* Select Rows With A Certain Value
* Select Rows With Multiple Filters
* Selecting pandas DataFrame Rows Based On Conditions
* Simple Example Dataframes In pandas
* Sorting Rows In pandas Dataframes
* Split Lat/Long Coordinate Variables Into Separate Variables
* Streaming Data Pipeline
* String Munging In Dataframe
* Using List Comprehensions With pandas
* Using Seaborn To Visualize A pandas Dataframe
* pandas Data Structures
* pandas Time Series Basics
DATA VISUALIZATION
* Back To Back Bar Plot In MatPlotLib
* Bar Plot In MatPlotLib
* Color Palettes in Seaborn
* Creating A Time Series Plot With Seaborn And pandas
* Creating Scatterplots With Seaborn
* Group Bar Plot In MatPlotLib
* Histograms In MatPlotLib
* Making A Matplotlib Scatterplot From A Pandas Dataframe
* Matplotlib, A Simple Example
* Pie Chart In MatPlotLib
* Scatterplot In MatPlotLib
* Stacked Percentage Bar Plot In MatPlotLib
WEB SCRAPING
* Beautiful Soup Basic HTML Scraping
* Drilling Down With Beautiful Soup
* Monitor A Website For Changes With Python
TESTING
* Simple Unit Test
* Test Code Speed
* Test For A Specific Exception
* Test If Output Is Close To A Value
* Testable Documentation
LOGGING
* Basic Logging
OTHER
* Generate Tweets Using Markov Chains
* Mine Twitter's Stream For Hashtags Or Words
* Simple Clustering With SciPy
* What Is The Probability An Economy Class Seat Is An Aisle Seat?
STATISTICS
BASICS
* Trimmed Mean
FREQUENTIST
* Bessels Correction * Demonstrate The Central Limit Theorem * Pearsons Correlation Coefficient * Probability Mass Functions * Spearmans Rank Correlation* T-Tests
* Variance And Standard DeviationSCALA
* Break A Sequence Into Groups* Change Data Type
* Chunk Sequence In Equal Sized Groups * Compare Two Floats* Create A Range
* Extract Substrings Using Regex* Filter A Sequence
* Find Largest Key Or Value In A Map * Flatten Sequence Of Sequences* For Loop A Map
* For Looping
* Format Numbers As Currency
* If Else
* Increment And Decrement Numbers
* Insert Variables Into Strings
* Iterate Over A Map
* Loop A Collection
* Make Numbers Pretty
* Mapping A Function To A Collection
* Matching Conditions
* Mutable Maps
* N Dimension Arrays
* Partial Functions
* Random Integer Between Two Values
* Replacing Parts Of Strings
* Search A Map
* Search Strings
* Search Strings Using Regex
* Set Operations On Sequences
* Sorting Sequences
* Split Strings
* Try, Catch, Finally
* Variables And Values
* Zip Together Two Lists
REGULAR EXPRESSIONS
* Match A Symbol
* Match A Unicode Character
* Match A Word
* Match Any Character
* Match Any Of A List Of Characters
* Match Any Of A Series Of Options
* Match Any Of A Series Of Words
* Match Dates
* Match Email Addresses
* Match Exact Text
* Match Integers Of Any Length
* Match Text Between HTML Tags
* Match Times
* Match URLs
* Match US Phone Numbers
* Match US and UK Spellings
* Match Words With A Certain Ending
* Match ZIP Codes
SNOWFLAKE
BASICS
* Convert Columns Into Rows
* Convert Data Types
* Create Dummy Columns
* Get Rows Meeting Multiple Conditions
* Give Column An Alias
* Give Table An Alias
* Query A Table
* Return First Few Rows
* Return N Rows After Skipping First K Rows
* Sample Random Rows From A Table
* Sort Rows By A Column's Values
ROWS
MERGE AND JOIN
* Left Join
TABLES
* Add Column
* Add Comment To Column
* Create Or Replace Table
* Create Table
* Create Table From Query
* Create Table If It Doesn't Already Exist
* Create Table With Column And Table Comments
* Create Temporary Table
* Delete A Table
* Describe A Table
* Drop Column
* Query DESCRIBE TABLE like a table
* Rename Column
* Rename Table
* Swap Two Tables
* Undelete A Table
* View A Column's Comments
* View A Table's Comments
TEXT
NUMERIC
* Get Absolute Values
DATES
DATABASES
OTHER
POSTGRESQL
BASICS
* Apply Operation To Column
* Compare Values To Subquery
* Copy Rows From One Table To Another
* Count Rows
* Count Unique Values
* Create Column Index
* Create PostgreSQL Database With Python
* Create Subquery
* Create View
* Delete View
* Examine A Query
* Group Rows
* Group Rows With Conditions
* If Else
* List Index Columns
* List Tables In Database
* Rename Columns In Views
* Replace Missing Values
* Retrieve Only A Few Rows
* Retrieve Random Subset Of Rows
* Retrieve Row
* Retrieve Rows Based On Condition
* Retrieve Rows Based On Multiple Conditions
* Retrieve Subset Of Columns
* Retrieving Missing Values
* Save Queries As Variables
* Select Highest Value In Each Group
* Select Values Between Two Values
* Sort Rows
* Sort Rows In Groups
* Test If Rows Exist In Subquery
* Use Column Aliases With Where Clause
* Value Matches Element Of A List
* View Unique Values
ADD, DELETE, CHANGE ROWS
* Add Column
* Change Values
* Create Column Aliases
* Create Column Conditional On Another Column
* Create Column Of Values
* Create Primary Key
* Delete All Rows
* Delete Duplicates
* Delete Primary Key
* Delete Rows
* Delete Rows That Don't Exist In Another Table
* Export To CSV
* Import CSV
* Insert Rows
* Update Rows Based On Another Table
MERGING AND JOINING
* All Unique Values In Two Tables
* Cartesian Product Of Tables
* Concatenate Multiple Tables
* Find Values In Both Tables
* Find Values In One Table And Not Another
* Inner Join Tables
* Join Multiple Tables
* Left Join Tables
* Outer Join Tables
* Right Join Tables
* Self Join Table
* Stack Tables
TABLES
* Copy Table Structure
* Create Table
* Create Table With Default Values
* Create Table With UUIDs
* Create Temporary Table
* Delete Table
* Delete Table With Views
* Duplicate Table
* List Columns In Table
* Show Column Information
* View Size Of Table
TEXT
* Concatenate Values
* Extract Characters From Strings
* Lower And Upper Case
* Partial String Match
NUMERIC
* Calculate Max, Min, Or Average Of Column
* Calculate Running Total
* Calculate Sum Of Column
* Convert Floats To Integers
* Mathematical Operations On Columns
DATES
* Adding Or Subtracting Time
* Calculate Time Duration
MATHEMATICS
* argmin and argmax
AWS
* Run Project Jupyter Notebooks On Amazon EC2
* Create Bucket
* List Buckets
COMPUTER SCIENCE
ALGORITHMS
* Big-O Notation
* Binary Search
* Bubble Sort
* Insertion Sort
* Selection Sort
LINUX COMMAND LINE
BASICS
* Archive And Unarchive Files
* Change Permissions
* Changing Directories
* Check Current Date And Time
* Copy Files And Directories
* Create Command
* Create Directory
* Create File
* Create Sequential List Of Files And Directories
* Create Symbolic Links
* Delete Files And Directories
* Delete Files And Directories In Current Directory
* Exit Terminal Session
* Get Help With A Command
* Get Information On A File
* List Available Commands
* List The Contents Of A Directory
* Move Files And Directories
* Multiple Commands On One Line
* Ping Website
* Rename File
* See Disk Drive Space
* See Free Memory
* See Who Is Logged Into A System
* Select Files Based On Filename
* Synchronize Files And Directories
* Track Route Of Network Traffic
* View A File's Type
* View A Text File's Contents
* View Current Working Directory
* View First And Last Parts Of Files
* Zip And Unzip Directories
* Zip And Unzip Files
INPUTS AND OUTPUTS
* Append Error To File
* Append File Contents To Another File
* Append Output To File
* Append Outputs And Errors To File
* Chain Multiple Commands
* Concatenate Multiple Files
* Save Output To File In Middle Of Command Chain
* Silence Errors
* Sort Rows
* Write Errors To File
* Write Output To File
SEARCH
* Find Directories
* Find Files
* Find Files Based On Multiple Conditions
* Find Files By Filename
* Find Files By Size
* Find Program's Location
* Find Symbolic Links
* Search Contents Of All Files Of Certain Type
* Search Filenames
* Search The Contents Of A File
TEXT
* Add Columns To Text
* Adding Line Numbers
* Comparing Text Files
* Count Unique Rows
* Extract Text
* Find And Replace
* Join And Sort Text
* Join Columns
* Quickly View File Contents
FLOW CONTROL
* For Loops
* If Else For Integers
* If Else For Strings
* If Else With Multiple Conditions
PROCESSES
* List Processes
* Monitor Processes
GIT AND GITHUB
* GitHub Cheatsheet
* Stop Git From Asking For Password Every Push And Pull From GitHub
DATA SCIENCE MANAGEMENT
* What I Said To My Team During COVID-19
ARTICLES
* What I Learned Tracking My Time At Techstars
* Health System Destruction During The Mozambican Civil War
* Health System Reconstruction In Post-War Kosovo
* The Problem Of Rebel Mobilization
* The Structure Of Health Systems
Copyright © Chris Albon, 2020. All 624 notes and articles are available on GitHub.