CLEANING UP CURRENCY DATA WITH PANDAS
In this example, the data is a mixture of currency-labeled and non-currency-labeled values. For a small example like this, you might want to clean it up at the source file.

USING PANDAS TO CREATE AN EXCEL DIFF
Introduction. As part of my continued exploration of pandas, I am going to walk through a real-world example of how to use pandas to automate a process that could be very difficult to do in Excel. My business problem is that I have two Excel files that are structured similarly but have different data, and I would like to easily understand what has changed between the two files.

OVERVIEW OF PANDAS DATA TYPES
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 10 columns):
    Customer Number    5 non-null float64
    Customer Name      5 non-null object
    2016               5 non-null object
    2017               5 non-null object
    Percent Growth     5 non-null object
    Jan Units          5 non-null object
    Month              5 non-null int64
    Day                5 non-null int64
    Year               5 non-null int64
    Active             5 non-null object
    dtypes: float64(1), int64(3

USING WSL TO BUILD A PYTHON DEVELOPMENT ENVIRONMENT ON
Introduction. In 2016, Microsoft launched Windows Subsystem for Linux (WSL), which brought robust Unix functionality to Windows. In May 2019, Microsoft announced the release of WSL 2, which includes an updated architecture that improved many aspects of WSL, especially file system performance. I have been following WSL for a while but now that WSL 2 is nearing general release, I

BINNING DATA WITH PANDAS QCUT AND CUT
Binning. One of the most common instances of binning is done behind the scenes for you when creating a histogram. The histogram below of customer sales data shows how a continuous set of sales numbers can be divided into discrete bins (for example: $60,000 - $70,000) and then used to group and count account instances.

COMBINE MULTIPLE EXCEL WORKSHEETS INTO A SINGLE PANDAS
Introduction. One of the most commonly used pandas functions is read_excel. This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command.
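To make the binning teaser above a little more concrete, here is a minimal sketch of the two functions named in that title. The sales figures, bin edges and labels are made up for illustration; only `pd.cut` and `pd.qcut` themselves come from pandas.

```python
import pandas as pd

# Hypothetical customer sales totals (illustrative values only)
sales = pd.Series([12_000, 35_000, 61_000, 64_000, 69_000, 88_000, 120_000, 250_000])

# pd.cut: you choose the bin edges, so bins can hold very different counts
edges = [0, 50_000, 100_000, 300_000]
by_value = pd.cut(sales, bins=edges, labels=["small", "medium", "large"])

# pd.qcut: pandas chooses the edges so each bin holds roughly the same count
by_quantile = pd.qcut(sales, q=4, labels=["q1", "q2", "q3", "q4"])

# Count accounts per bin, keeping the bins in their natural order
cut_counts = by_value.value_counts().sort_index()
qcut_counts = by_quantile.value_counts().sort_index()

print(cut_counts)
print(qcut_counts)
```

The design choice is mostly about who defines the edges: use `cut` when the business already has meaningful thresholds, and `qcut` when you want evenly populated groups.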
CREATING POWERPOINT PRESENTATIONS WITH PYTHON
Creating tables in PowerPoint is a good news / bad news story. The good news is that there is an API to create one. The bad news is that you can’t easily convert a pandas DataFrame to a table using the built-in API. However, we are very fortunate that someone has already done all the hard work for us and created PandasToPowerPoint. This excellent piece of code takes a DataFrame and converts

EFFICIENTLY CLEANING TEXT WITH PANDAS
Cleaning attempt #2. Another approach that is very performant and flexible is to use np.select to run multiple matches and apply a specified value upon match. There are several good resources that I used to learn how to use np.select. This article from Dataquest is a good overview. I also found this presentation from Nathan Cheever very interesting and informative.

PANDAS DATAFRAME VISUALIZATION TOOLS
Data Analysis Applications. The second category of GUI applications is full-fledged applications, typically using a web back-end like Flask or a separate application based on Qt. These applications vary in complexity and capability from simple table views and plotting capabilities to robust statistical analysis.

EXPLORING AN ALTERNATIVE TO JUPYTER NOTEBOOKS FOR PYTHON
What is the problem? Jupyter notebooks are an extremely powerful and effective tool for analyzing data. When I approach a new problem, I will typically create a Jupyter notebook and start investigating the data and developing reports or visualizations to answer my business questions.
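As a rough illustration of the np.select approach mentioned in the text-cleaning teaser above, the sketch below maps messy store names to a clean label. The column names, sample values and matching rules are assumptions made for this example, not the ones used in the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical messy store names
df = pd.DataFrame(
    {"store": ["Walmart Inc", "WAL-MART", "Target Corp", "target", "Corner Shop"]}
)

# Normalize case once so the matches below stay simple
name = df["store"].str.lower()

# Each condition is a boolean array; the first condition that matches wins
conditions = [
    name.str.contains("wal", regex=False).to_numpy(dtype=bool),
    name.str.contains("target", regex=False).to_numpy(dtype=bool),
]
choices = ["Walmart", "Target"]

# Rows matching no condition fall back to the default value
df["store_clean"] = np.select(conditions, choices, default="Other")

print(df)
```

Because every condition is evaluated as a whole-column operation, this tends to scale better than applying a Python function row by row.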
PRACTICAL BUSINESS PYTHON
The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML. However, there can be some challenges in cleaning and formatting the data before analyzing it.

CREATING PDF REPORTS WITH PANDAS, JINJA AND WEASYPRINT
Introduction. Pandas is excellent at manipulating large amounts of data and summarizing it in multiple text and visual representations. Without much effort, pandas supports output to CSV, Excel, HTML, JSON and more. Where things get more difficult is if you want

SIDETABLE - CREATE SIMPLE SUMMARY TABLES IN PANDAS
sidetable. At its core, sidetable is a super-charged version of pandas value_counts with a little bit of crosstab mixed in. For instance, let’s look at some data on School Improvement Grants so we can see how sidetable can help us explore a new data set and figure out approaches for more complex analysis. The only external dependency is pandas version >= 1.0.

GENERATING EXCEL REPORTS FROM A PANDAS PIVOT TABLE
Introduction. The previous pivot table article described how to use the pandas pivot_table function to combine and present data in an easy-to-view manner. This concept is probably familiar to anyone who has used pivot tables in Excel. However, pandas has the capability to easily take a cross section of the data and manipulate it.
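The pivot table teaser above mentions taking a cross section of pivoted data. A minimal sketch of that idea follows, assuming a hypothetical sales table with `Manager`, `Rep` and `Price` columns (the names and numbers are invented for illustration).

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame(
    {
        "Manager": ["Debra", "Debra", "Debra", "Fred", "Fred"],
        "Rep": ["Craig", "Craig", "Daniel", "Cedric", "Wendy"],
        "Price": [100, 50, 30, 70, 20],
    }
)

# Summarize total price per manager and rep (a two-level row index)
table = pd.pivot_table(df, index=["Manager", "Rep"], values="Price", aggfunc="sum")

# xs() pulls out one manager's slice of the pivot table
debra = table.xs("Debra", level="Manager")

print(table)
print(debra)
```

Each slice is an ordinary DataFrame, so it could in principle be written to its own worksheet with `to_excel`.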
READING POORLY STRUCTURED EXCEL FILES WITH PANDAS
The key concept to keep in mind is that the function will parse each column by name and must return True or False for each column. Those columns that evaluate to True will be included. Another approach to using a callable is to include a lambda expression. Here is an example where we want to include only a defined list of columns.

TIPS FOR SELECTING COLUMNS IN A DATAFRAME
Hmmm. That obviously doesn’t work but seems like it would be useful for selecting ranges as well as individual columns. Fortunately there is a numpy object that can help us out. The r_ object will “Translate slice objects to concatenation along the first axis.” It might not make much sense from the documentation but it does exactly what we need.

AUTOMATING WINDOWS APPLICATIONS USING COM
Pywin32 is basically a very thin Python wrapper that allows us to interact with COM objects and automate Windows applications with Python. The power of this approach is that you can pretty much do anything that a Microsoft application can do through Python. The downside is that you have to run this on a Windows system with Microsoft Office
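For the column-selection teaser above, here is a small sketch of how numpy's `r_` object can combine individual positions and ranges into one index array for `iloc`. The DataFrame and its single-letter column names are placeholders invented for this example.

```python
import numpy as np
import pandas as pd

# Placeholder DataFrame with ten columns named a..j
df = pd.DataFrame(np.arange(20).reshape(2, 10), columns=list("abcdefghij"))

# np.r_ concatenates scalars and slices into a single integer array
cols = np.r_[0, 2:5, 8]

# iloc accepts that array to pick column positions 0, 2, 3, 4 and 8
subset = df.iloc[:, cols]

print(cols)
print(subset)
```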
GUIDE TO ENCODING CATEGORICAL VALUES IN PYTHON
The Data Set. For this article, I was able to find a good dataset at the UCI Machine Learning Repository. This particular Automobile Data Set includes a good mix of categorical values as well as continuous values and serves as a useful example that is relatively easy to understand. Since domain understanding is an important aspect when deciding how to encode various categorical values - this

COMPREHENSIVE GUIDE TO GROUPING AND AGGREGATING WITH
The most common built-in aggregation functions are basic math functions including sum, mean, median, minimum, maximum, standard deviation, variance, mean absolute deviation and product. We can apply all these functions to the fare while grouping by the embark_town: This is all relatively straightforward math.
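The grouping teaser above lists the common built-in aggregations applied to `fare` grouped by `embark_town`. A minimal sketch with invented fare values is shown below. Mean absolute deviation is left out because, as far as I know, the `mad` method was removed in pandas 2.0.

```python
import pandas as pd

# Invented fares for two embarkation towns
df = pd.DataFrame(
    {
        "embark_town": ["Southampton", "Southampton", "Southampton", "Cherbourg", "Cherbourg"],
        "fare": [10.0, 20.0, 30.0, 40.0, 60.0],
    }
)

# Apply several built-in aggregations to fare in one pass
agg_funcs = ["sum", "mean", "median", "min", "max", "std", "var", "prod"]
summary = df.groupby("embark_town")["fare"].agg(agg_funcs)

print(summary)
```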
TIPS FOR SELECTING COLUMNS IN A DATAFRAME
Introduction. This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with using iloc, you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.

COMBINE MULTIPLE EXCEL WORKSHEETS INTO A SINGLE PANDAS
This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command. For those of you who want the TLDR, here is the command:
    df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True)
Read on for an explanation of when to use this and
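The one-line command above works because `read_excel` with `sheet_name=None` returns a dictionary mapping each sheet name to a DataFrame, and `pd.concat` accepts such a mapping. The sketch below imitates that dictionary with two hand-built frames so it runs without an Excel file; the sheet names and values are made up.

```python
import pandas as pd

# Stand-in for what pd.read_excel(path, sheet_name=None) would return:
# a dict of {sheet name: DataFrame}
sheets = {
    "January": pd.DataFrame({"account": [101, 102], "total": [250.0, 75.5]}),
    "February": pd.DataFrame({"account": [101, 103], "total": [300.0, 42.0]}),
}

# Stack every sheet into one frame; ignore_index=True renumbers the rows 0..n-1
df = pd.concat(sheets, ignore_index=True)

print(df)
```

This only gives a sensible result when the worksheets share the same column layout.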
AUTOMATING WINDOWS APPLICATIONS USING COM
pywin32. The pywin32 package has been around for a very long time. In fact, the book that covers this topic was published in 2000 by Mark Hammond and Andy Robinson. Despite being 18 years old (which makes me feel really old :), the underlying technology and concepts still work today.
EXPLORING AN ALTERNATIVE TO JUPYTER NOTEBOOKS FOR PYTHON
VS Code manages this with a combination of code cells and the Python Interactive Window. As of early 2020, VS Code included support for running Jupyter notebooks natively in VS Code. The entire process is very similar to running the notebook in your browser. If you are not familiar, here is a screenshot of a demo notebook in VS Code.

MONTE CARLO SIMULATION WITH PYTHON
The real “magic” of the Monte Carlo simulation is that if we run a simulation many times, we start to develop a picture of the likely distribution of results. In Excel, you would need VBA or another plugin to run multiple iterations. In Python, we can use a for loop to run as many simulations as we’d like.

IMPROVING PANDAS EXCEL OUTPUT
First, we resize the sheet by adjusting the zoom.
    worksheet.set_zoom(90)
Some of our biggest improvements come through formatting the columns to make the data more readable. add_format is very useful for improving your standard output. Here are two examples of formatting numbers:

POPULATING MS WORD TEMPLATES WITH PYTHON
Background. The package that makes all of this possible is fittingly called docx-mailmerge. It is a mature package that can parse the MS Word docx file, find the merge fields and populate them with whatever values you need. The package also supports some helper functions for populating tables and generating single files with multiple page breaks.
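As a sketch of the for-loop idea in the Monte Carlo teaser above, the example below repeats a simple commission calculation many times and collects the totals. The number of reps, the target tiers, the flat 3% rate and the normal distribution parameters are all assumptions chosen for illustration, not figures from the original article.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

num_simulations = 1000  # how many times to repeat the experiment
num_reps = 50           # hypothetical number of sales reps
commission_rate = 0.03  # hypothetical flat commission rate

totals = []
for _ in range(num_simulations):
    # Each rep hits some percentage of target, centered on 100%
    pct_to_target = rng.normal(loc=1.0, scale=0.1, size=num_reps)
    # Each rep is assigned one of a few hypothetical target tiers
    sales_target = rng.choice([75_000, 100_000, 200_000], size=num_reps)
    commission = pct_to_target * sales_target * commission_rate
    totals.append(commission.sum())

results = np.array(totals)

# The spread of results is the picture of likely outcomes
print(results.mean(), results.std(), results.min(), results.max())
```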
FINDING NATURAL BREAKS IN DATA WITH THE FISHER-JENKS
Without knowing the actual details of the algorithm, you would have known that 20, 50 and 75 are all pretty close to each other. Then, there is a big gap between 75 and 950, so that would be a “natural break” that you would utilize to bucket the rest of your accounts.

PANDAS CROSSTAB EXPLAINED
Introduction. Pandas offers several options for grouping and summarizing data but this variety of options can be a blessing and a curse. These approaches are all powerful data analysis tools but it can be confusing to know whether to use a groupby, pivot_table or crosstab to build a summary table. Since I have previously covered pivot_tables, this article will discuss the pandas crosstab

COMMON EXCEL TASKS DEMONSTRATED IN PANDAS
Introduction. The purpose of this article is to show some common Excel tasks and how you would execute similar tasks in pandas. Some of the examples are somewhat trivial but I think it is important to show the simple as well as the more complex functions you can find elsewhere.
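To illustrate the crosstab teaser above, here is a minimal sketch that counts how often two categorical values occur together. The `make` and `body_style` columns and their values are invented for this example.

```python
import pandas as pd

# Invented car records with two categorical columns
df = pd.DataFrame(
    {
        "make": ["toyota", "toyota", "honda", "toyota", "honda"],
        "body_style": ["sedan", "sedan", "sedan", "hatchback", "hatchback"],
    }
)

# Rows are makes, columns are body styles, cells are occurrence counts
table = pd.crosstab(df["make"], df["body_style"])

print(table)
```

Compared with groupby or pivot_table, crosstab needs no value column for a plain frequency count, which is often why it is the shortest option for this kind of summary.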
CREATING A WATERFALL CHART IN PYTHON
Creating the Chart. Execute the standard imports and make sure IPython will display matplotlib plots. Set up the data we want to waterfall chart and load it into a dataframe. The data needs to start with your starting value but you leave out the final total. We will calculate it.
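The waterfall teaser above says the data starts with the opening value and leaves out the final total, which is then calculated. One way to sketch that calculation is shown below; the transaction names and amounts are made up, and the plotting step is omitted so the example only needs pandas.

```python
import pandas as pd

# Made-up transactions: the first row is the starting value, no total row
trans = pd.DataFrame(
    {"amount": [350_000, -30_000, -7_500, -25_000, 95_000, -7_000]},
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
)

# Each bar floats on top of the running total of everything before it
blank = trans["amount"].cumsum().shift(1).fillna(0)

# The final total that the data deliberately left out
total = trans["amount"].sum()

print(blank)
print(total)
```

The `blank` series could then be supplied as the `bottom` offsets of a matplotlib bar chart so that each bar starts where the previous one ended.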
INTRODUCTION TO MARKET BASKET ANALYSIS IN PYTHON
A useful (but somewhat overlooked) technique is called association analysis, which attempts to find common patterns of items in large data sets. One specific application is often called market basket analysis. The most commonly cited example of market basket analysis is the so-called “beer and diapers” case.

CREATING PDF REPORTS WITH PANDAS, JINJA AND WEASYPRINT
The PDF creation portion is relatively simple as well. We need to do some imports and pass a string to the PDF generator.
    from weasyprint import HTML
    HTML(string=html_out).write_pdf("report.pdf")
This command creates a PDF report that looks something like this: Ugh. It’s cool that it’s a PDF but it is ugly.
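For the market basket teaser above, the standard association measures (support, confidence and lift) can be computed by hand on a tiny one-hot table. This is a plain-pandas sketch with five invented baskets, not the library-based approach the original article may use.

```python
import pandas as pd

# Five invented baskets, one-hot encoded: True means the item was bought
baskets = pd.DataFrame(
    {
        "beer":    [True, True, False, False, True],
        "diapers": [True, True, False, True, False],
        "milk":    [False, True, True, True, False],
    }
)

# Support: share of baskets containing the item(s)
support_beer = baskets["beer"].mean()
support_diapers = baskets["diapers"].mean()
support_both = (baskets["beer"] & baskets["diapers"]).mean()

# Confidence of beer -> diapers: how often diapers appear given beer
confidence = support_both / support_beer

# Lift above 1 suggests the items appear together more than chance alone
lift = confidence / support_diapers

print(support_both, confidence, lift)
```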
EXCEL “FILTER AND EDIT”
There is a special bonus of $250 plus a 4.5% commission for all shoe sales > $1000 in a single transaction. In order to do this in Excel, using the Filter and edit approach: Add a commission column with 2%. Add a bonus column of $0. Filter on shirts and change the value to 2.5%. Clear the filter.

COMBINING DATA FROM MULTIPLE EXCEL FILES
The Problem. Before I get into the examples, here is a simple diagram showing the challenges with the common process used in businesses all over the world to consolidate data from multiple Excel files, clean it up and perform some analysis.
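The “Filter and edit” steps described above translate fairly directly into boolean indexing with `.loc`. The sketch below assumes hypothetical `category` and `ext_price` columns and a handful of invented transactions; the rates and the bonus come from the teaser text.

```python
import pandas as pd

# Invented transactions; column names are assumptions for this example
df = pd.DataFrame(
    {
        "category": ["Shirt", "Shoes", "Belt", "Shoes"],
        "ext_price": [300.0, 1500.0, 100.0, 400.0],
    }
)

# Step 1: default commission of 2% and a bonus of $0 for every row
df["commission"] = 0.02
df["bonus"] = 0.0

# Step 2: "filter" on shirts and change the commission to 2.5%
df.loc[df["category"] == "Shirt", "commission"] = 0.025

# Step 3: shoe sales over $1000 get 4.5% commission plus a $250 bonus
big_shoes = (df["category"] == "Shoes") & (df["ext_price"] > 1000)
df.loc[big_shoes, "commission"] = 0.045
df.loc[big_shoes, "bonus"] = 250.0

print(df)
```

There is no "clear the filter" step here because each `.loc` assignment only touches the rows its mask selects.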
The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML.However, there can be some challenges in cleaning and formatting the data before analyzing it. EFFICIENTLY CLEANING TEXT WITH PANDAS Cleaning attempt #2. Another approach that is very performant and flexible is to use np.select to run multiple matches and apply a specified value upon match.. There are several good resources that I used to learn how to use np.select.This article from Dataquest is a good overview. I also found this presentation from Nathan Cheever very interesting and information. MONTE CARLO SIMULATION WITH PYTHON The real “magic” of the Monte Carlo simulation is that if we run a simulation many times, we start to develop a picture of the likely distribution of results. In Excel, you would need VBA or another plugin to run multiple iterations. In python, we can use a for loop to run as many simulations as we’d like. CREATING POWERPOINT PRESENTATIONS WITH PYTHON Creating tables in PowerPoint is a good news / bad news story. The good news is that there is an API to create one. The bad news is that you can’t easily convert a pandas DataFrame to a table using the built in API.However, we are very fortunate that someone has already done all the hard work for us and created PandasToPowerPoint.. This excellent piece of code takes a DataFrame and converts READING HTML TABLES WITH PANDAS Introduction. The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML.However, there can be some challenges in cleaning and formatting the data before analyzing it. 
INTRODUCTION TO MARKET BASKET ANALYSIS IN PYTHON A useful (but somewhat overlooked) technique is called association analysis which attempts to find common patterns of items in large data sets. One specific application is often called market basket analysis. The most commonly cited example of market basket analysis is the so-called “beer and diapers” case. EXCEL “FILTER AND EDIT” There is a special bonus of $250 plus a 4.5% commission for all shoe sales > $1000 in a single transaction. In order to do this in Excel, using the Filter and edit approach: Add a commission column with 2%. Add a bonus column of $0. Filter on shirts and change the vale to 2.5%. Clear the filter. GUIDE TO ENCODING CATEGORICAL VALUES IN PYTHON The Data Set. For this article, I was able to find a good dataset at the UCI Machine Learning Repository.This particular Automobile Data Set includes a good mix of categorical values as well as continuous values and serves as a useful example that is relatively easy to understand. Since domain understanding is an important aspect when deciding how to encode various categorical values - this USING PANDAS TO CREATE AN EXCEL DIFF Introduction. As part of my continued exploration of pandas, I am going to walk through a real world example of how to use pandas to automate a process that could be very difficult to do in Excel.My business problem is that I have two Excel files that are structured similarly but have different data and I would like to easily understand what has changed between the two files. COMBINE MULTIPLE EXCEL WORKSHEETS INTO A SINGLE PANDAS This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command. 
For those of you that want the TLDR, here is the command: df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True) Read on for an explanation of when to use this and AUTOMATING WINDOWS APPLICATIONS USING COM Pywin32 is basically a very thin wrapper of python that allows us to interact with COM objects and automate Windows applications with python. The power of this approach is that you can pretty much do anything that a Microsoft Application can do through python. The downside is that you have to run this on a Windows system withMicrosoft Office
PANDAS DATAFRAME VISUALIZATION TOOLS Data Analysis Applications. The second category of GUI applications are full-fledged applications typically using a web back-end like Flask or a separate application based on Qt. These applications vary in complexity and capability from simple table views and plotting capabilities to robust statistical analysis. READING HTML TABLES WITH PANDAS Introduction. The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML.However, there can be some challenges in cleaning and formatting the data before analyzing it. OVERVIEW OF PANDAS DATA TYPES RangeIndex: 5 entries, 0 to 4 Data columns (total 10 columns): Customer Number 5 non-null float64 Customer Name 5 non-null object 2016 5 non-null object 2017 5 non-null object Percent Growth 5 non-null object Jan Units 5 non-null object Month 5 non-null int64 Day 5 non-null int64 Year 5 non-null int64 Active 5 non-null object dtypes: float64(1), int64(3 CREATING A WATERFALL CHART IN PYTHON Creating the Chart. Execute the standard imports and make sure IPython will display matplot plots. Setup the data we want to waterfall chart and load it into a dataframe. The data needs to start with your starting value but you leave out the final total. We will calculateit.
CREATING PDF REPORTS WITH PANDAS, JINJA AND WEASYPRINT The PDF creation portion is relatively simple as well. We need to do some imports and pass a string to the PDF generator. from weasyprint import HTML HTML(string=html_out).write_pdf("report.pdf") This command creates a PDF report that looks something like this: Ugh. It’s cool that it’s a PDF but it is ugly. COMBINE MULTIPLE EXCEL WORKSHEETS INTO A SINGLE PANDAS This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command. For those of you that want the TLDR, here is the command: df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True) Read on for an explanation of when to use this and AUTOMATING WINDOWS APPLICATIONS USING COM pywin32. The pywin32 package has been around for a very long time. In fact, the book that covers this topic was published in 2000 by Mark Hammond and Andy Robinson. Despite being 18 years old (which make me feel really old :), the underlying technology and concepts still worktoday.
OVERVIEW OF PANDAS DATA TYPES RangeIndex: 5 entries, 0 to 4 Data columns (total 10 columns): Customer Number 5 non-null float64 Customer Name 5 non-null object 2016 5 non-null object 2017 5 non-null object Percent Growth 5 non-null object Jan Units 5 non-null object Month 5 non-null int64 Day 5 non-null int64 Year 5 non-null int64 Active 5 non-null object dtypes: float64(1), int64(3 USING PYTHON’S PATHLIB MODULE Using python's pathlib module. Interesting. In my opinion this is much easier to mentally parse. It’s a similar thought process to the os.path method of joining the current working directory (using Path.cwd()) with the various subdirectories and file locations.It is much easier to follow because of the clever overriding of the / to build up a path in a more natural manner than chaining many COMPREHENSIVE GUIDE TO GROUPING AND AGGREGATING WITH The most common built in aggregation functions are basic math functions including sum, mean, median, minimum, maximum, standard deviation, variance, mean absolute deviation and product. We can apply all these functions to the fare while grouping by the embark_town : This is all relatively straightforward math. IMPROVING PANDAS EXCEL OUTPUT First, we resize the sheet by adjusting the zoom. worksheet.set_zoom(90) Some of our biggest improvements come through formatting the columns to make the data more readable. add_format is very useful for improving your standard output. Here are two examples of formatting numbers: PANDAS CROSSTAB EXPLAINED Introduction. Pandas offers several options for grouping and summarizing data but this variety of options can be a blessing and a curse. These approaches are all powerful data analysis tools but it can be confusing to know whether to use a groupby, pivot_table or crosstab to build a summary table. Since I have previously covered pivot_tables, this article will discuss the pandas crosstab TIPS FOR SELECTING COLUMNS IN A DATAFRAME Introduction. 
This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with using iloc you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code. COMBINING DATA FROM MULTIPLE EXCEL FILES The Problem. Before, I get into the examples, here is a simple diagram showing the challenges with the common process used in businesses all over the world to consolidate data from multiple Excel files, clean it up and perform some analysis. USING PANDAS TO CREATE AN EXCEL DIFF Introduction. As part of my continued exploration of pandas, I am going to walk through a real world example of how to use pandas to automate a process that could be very difficult to do in Excel.My business problem is that I have two Excel files that are structured similarly but have different data and I would like to easily understand what has changed between the two files. UNDERSTANDING THE TRANSFORM FUNCTION IN PANDAS As described in the book, transform is an operation used in conjunction with groupby (which is one of the most useful operations in pandas). I suspect most pandas users likely have used aggregate , filter or apply with groupby to summarize data. However, transform is a little more difficult to understand - especially coming from anExcel world.
PRACTICAL BUSINESS PYTHON
Taking care of business, one python script at a time
Tue 16 February 2021
EFFICIENTLY CLEANING TEXT WITH PANDAS
Posted by Chris Moffitt in articles
It’s no secret that data cleaning is a large portion of the data analysis process. When using pandas, there are multiple techniques for cleaning text fields to prepare for further analysis. As data sets grow large, it is important to find efficient methods that perform in a reasonable time and are maintainable since text cleaning is a process that evolves over time. This article will show examples of cleaning text fields in a large data file and illustrate tips for how to efficiently clean unstructured text fields.
Read more...
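To give a rough sense of what vectorized text cleaning can look like, here is a minimal sketch. It is not the article's code: the `store` column, the sample values and the match rules are all hypothetical. It lowercases the text once and then uses `np.select` to assign a cleaned label based on the first matching condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column name and values are made up for illustration.
df = pd.DataFrame(
    {"store": ["Walmart #123", "WAL-MART supercenter", "Target store 9", "corner shop"]}
)

# Normalize case once so each match below is a plain substring test.
lowered = df["store"].str.lower()

# Each condition is a boolean Series; np.select takes the first matching choice per row.
conditions = [
    lowered.str.contains("wal", regex=False),
    lowered.str.contains("target", regex=False),
]
choices = ["Walmart", "Target"]

# Rows that match no condition fall back to the default label.
df["store_clean"] = np.select(conditions, choices, default="Other")
print(df)
```

Keeping the rules in two parallel lists makes it fairly easy to add or reorder matches as the cleaning logic evolves.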
-------------------------
Mon 18 January 2021
CASE STUDY: AUTOMATING EXCEL FILE CREATION AND DISTRIBUTION WITH PANDAS AND OUTLOOK
Posted by Chris Moffitt in articles
I enjoy hearing from readers that have used concepts from this blog to solve their own problems. It always amazes me when I see examples where only a few lines of python code can solve a real business problem and save organizations a lot of time and money. I am also impressed when people figure out how to do this with no formal training - just with some hard work and willingness to persevere through the learning curve.
Read more...
-------------------------
Mon 11 January 2021
PANDAS DATAFRAME VISUALIZATION TOOLS
Posted by Chris Moffitt in articles
I have talked quite a bit about how pandas is a great alternative to Excel for many tasks. One of Excel’s benefits is that it offers an intuitive and powerful graphical interface for viewing your data. In contrast, pandas + a Jupyter notebook offers a lot of programmatic power but limited abilities to graphically display and manipulate a DataFrame view.
There are several tools in the Python ecosystem that are designed to fill this gap. They range in complexity from simple JavaScript libraries to complex, full-featured data analysis engines. The one common denominator is that they all provide a way to view and selectively filter your data in a graphical format. From this point of commonality they diverge quite a bit in design and functionality. This article will review several of these options in order to give you an idea of the landscape and evaluate which ones might be useful for your analysis process.
Read more...
-------------------------
Mon 09 November 2020
COMPREHENSIVE GUIDE TO GROUPING AND AGGREGATING WITH PANDAS
Posted by Chris Moffitt in articles
One of the most basic analysis functions is grouping and aggregating data. In some cases, this level of analysis may be sufficient to answer business questions. In other instances, this activity might be the first step in a more complex data science analysis. In pandas, the groupby function can be combined with one or more aggregation functions to quickly and easily summarize data. The concept is deceptively simple and most new pandas users will grasp it quickly. However, they might be surprised at how useful complex aggregation functions can be for supporting sophisticated analysis. This article will quickly summarize the basic pandas aggregation functions and show examples of more complex custom aggregations. Whether you are a new or more experienced pandas user, I think you will learn a few things from this article.
Read more...
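As a small illustration of combining groupby with both built-in and custom aggregations, here is a sketch using named aggregation. The `region` and `sales` columns and their values are hypothetical and are not taken from the article.

```python
import pandas as pd

# Hypothetical sales data; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "sales": [100, 300, 50, 150, 100],
    }
)

# Named aggregation: each keyword becomes an output column,
# defined as a (source column, aggregation function) pair.
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
    # A custom aggregation: the spread between the largest and smallest sale.
    sales_range=("sales", lambda s: s.max() - s.min()),
)
print(summary)
```

Named aggregation keeps the output column names explicit, which tends to be easier to read than renaming columns after the fact.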
-------------------------
Mon 19 October 2020
READING POORLY STRUCTURED EXCEL FILES WITH PANDAS
Posted by Chris Moffitt in articles
With pandas it is easy to read Excel files and convert the data into a DataFrame. Unfortunately Excel files in the real world are often poorly constructed. In those cases where the data is scattered across the worksheet, you may need to customize the way you read the data. This article will discuss how to use pandas and openpyxl to read these types of Excel files and cleanly convert the data to a DataFrame suitable for further analysis.
Read more...
-------------------------
Mon 12 October 2020
CASE STUDY: PROCESSING HISTORICAL WEATHER PATTERN DATA
Posted by Chris Moffitt in articles
The main purpose of this blog is to show people how to use Python to solve real world problems. Over the years, I have been fortunate enough to hear from readers about how they have used tips and tricks from this site to solve their own problems. In this post, I am extremely delighted to present a real world case study. I hope it will give you some ideas about how you can apply these concepts to your own problems.
This example comes from Michael Biermann from Germany. He had the challenging task of trying to gather detailed historical weather data in order to do analysis on the relationship between air temperature and power consumption. This article will show how he used a pipeline of Python programs to automate the process of collecting, cleaning and processing gigabytes of weather data in order to perform his analysis.
Read more...
-------------------------
Mon 21 September 2020
PB PYTHON ARTICLE ROADMAP
Posted by Chris Moffitt in articles
September 17th is Practical Business Python’s anniversary. Last year, I reflected on 5 years of growth. This year, I wanted to take a step back and develop a roadmap to guide readers through the content on PB Python.
Read more...
-------------------------
Mon 14 September 2020
READING HTML TABLES WITH PANDAS
Posted by Chris Moffitt in articles
The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML. However, there can be some challenges in cleaning and formatting the data before analyzing it. In this article, I will discuss how to use pandas read_html() to read and clean several Wikipedia HTML tables so that you can use them for further numeric analysis.
Read more...
-------------------------
Mon 17 August 2020
TAKING ANOTHER LOOK AT PLOTLY
Posted by Chris Moffitt in articles
I’ve written quite a bit about visualization in python - partially because the landscape is always evolving. Plotly stands out as one of the tools that has undergone a significant amount of change since my first post in 2015. If you have not looked at using Plotly for python data visualization lately, you might want to take it for a spin. This article will discuss some of the most recent changes with Plotly, what the benefits are and why Plotly is worth considering for your data visualization needs.
Read more...
-------------------------
Tue 02 June 2020
SIDETABLE - CREATE SIMPLE SUMMARY TABLES IN PANDAS
Posted by Chris Moffitt in articles
Today I am happy to announce the release of a new pandas utility library called sidetable. This library makes it easy to build a frequency table and simple summary of missing values in a DataFrame. I have found it to be a useful tool when starting data exploration on a new data set and I hope others find it useful as well. This project is also an opportunity to illustrate how to use the new pandas API to register custom DataFrame accessors. This API allows you to build custom functions for working with pandas DataFrames and Series and could be really useful for building out your own library of custom pandas accessor functions.
Read more...
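To show roughly how a custom DataFrame accessor is registered, here is a minimal sketch built on `pd.api.extensions.register_dataframe_accessor`. It is not sidetable's implementation: the accessor name `demo`, the `DemoAccessor` class and its `missing` method are hypothetical names chosen for this example.

```python
import pandas as pd


# "demo" is a hypothetical accessor name; it becomes available as df.demo.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def missing(self):
        """Return the count and percentage of missing values per column."""
        counts = self._obj.isna().sum()
        return pd.DataFrame(
            {"missing": counts, "percent": counts / len(self._obj) * 100}
        )


# Made-up data with a few missing values in each column.
df = pd.DataFrame({"a": [1, None, 3, None], "b": ["x", "y", None, "z"]})
result = df.demo.missing()
print(result)
```

Once the class is registered, every DataFrame in the session exposes the `demo` namespace, so related helper functions can be grouped together without subclassing DataFrame.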
-------------------------
DISCLOSURE
We are a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for us to earn fees by linking to Amazon.com and affiliated sites.
-------------------------
Ⓒ 2014-2021 Practical Business Python • Site built using Pelican • Theme based on VoidyBootstrap by RKI