MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marco here.
MINING TWITTER DATA WITH PYTHON: PART 5
MY PYTHON CODE IS SLOW? TIPS FOR PROFILING
tl;dr Before you can optimise your slow code, you need to identify the bottlenecks: proper profiling will give you the right insights. This article discusses some profiling tools for Python.
Introduction
Python is a high-level programming language with an emphasis on readability. Some of its peculiarities, like dynamic typing or the (in)famous GIL, might come with trade-offs in terms
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON AND
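The profiling excerpt stops before naming any tool. As a minimal sketch, assuming only the standard library's `cProfile` and `pstats` modules (not necessarily the tools the post itself covers), this is how one might profile a single function and print its most expensive calls. `slow_sum` and `profile_report` are hypothetical names used for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical CPU-bound function to profile: sum of squares in a plain loop."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Write the statistics into a string buffer instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_report(slow_sum, 100_000)
    print(value)
    print(report)
```

To get a similar report for a whole script instead of one function, use `python -m cProfile -s cumulative script.py`.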
MINING TWITTER DATA WITH PYTHON (PART 6
Sentiment analysis is an interesting application of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the application of text analytics to the problems of identifying and extracting subjective material in text sources. This article continues the series on mining Twitter data with
GETTING STARTED WITH NEO4J AND PYTHON
SEARCHING PUBMED WITH PYTHON
Update 2021-01: minor update to reflect some changes in the PubMed API.
PubMed is a search engine accessing millions of biomedical citations. Users can freely search for biomedical references, and for some articles the full-text paper is also openly accessible. This post describes how you can programmatically search the PubMed database with Python, in order
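The PubMed excerpt breaks off before any code. One possible approach, sketched here and not necessarily what the post does, is to call NCBI's E-utilities `esearch` endpoint directly with the standard library and read the PubMed IDs from its JSON response. The function names are hypothetical. Real requests are subject to NCBI's usage policy, which includes rate limits and an optional API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities "esearch" endpoint: runs a search and returns matching IDs.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, retmax=20, sort="relevance"):
    """Build an esearch URL asking for up to `retmax` PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": retmax,
        "sort": sort,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def extract_ids(payload):
    """Return the list of PubMed IDs from a decoded esearch JSON response."""
    return payload.get("esearchresult", {}).get("idlist", [])


def search_pubmed(query, retmax=20):
    """Perform the HTTP request (needs network access) and return PubMed IDs."""
    with urlopen(build_esearch_url(query, retmax=retmax)) as response:
        return extract_ids(json.load(response))


if __name__ == "__main__":
    # Only builds and prints the URL; call search_pubmed() to hit the network.
    print(build_esearch_url("fever", retmax=5))
```

The IDs returned by `esearch` can then be passed to other E-utilities, such as `efetch`, to retrieve the article records themselves.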
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH
The starting point is to understand the specific use case that we’re trying to tackle; from there we have a set of choices. Depending on the scenario, we might want to choose one of the following: a simple match search, a match search with a minimum match ratio, a phrase-based match, or a phrase match with a slop for proximity search.
HOW TO QUERY ELASTICSEARCH WITH PYTHON
Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows you to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows you to interact with Elasticsearch from any programming language. This article provides an overview
TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING
Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function at the time of writing was a variation of TF-IDF (since Elasticsearch 5.0 the default similarity is BM25), relatively simple to understand and, thanks to some smart normalisations, also quite effective in practice.
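The phrase match post lists four query options. Below is a sketch of each one as an Elasticsearch Query DSL body built as a plain Python dict, assuming a text field named `title` (a hypothetical field name). The helper functions are illustrative and do not belong to any client library. These bodies are the JSON that gets sent to an index's `_search` endpoint over the REST interface.

```python
import json


def simple_match(field, text):
    """Plain full-text match: documents containing any of the analysed terms."""
    return {"query": {"match": {field: text}}}


def match_with_minimum(field, text, minimum="75%"):
    """Match that requires a minimum share of the query terms to be present."""
    return {"query": {"match": {field: {"query": text, "minimum_should_match": minimum}}}}


def phrase_match(field, text):
    """Phrase match: the terms must appear together and in the same order."""
    return {"query": {"match_phrase": {field: text}}}


def proximity_match(field, text, slop=2):
    """Phrase match with slop: tolerates up to `slop` positional moves between terms."""
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


if __name__ == "__main__":
    # Each body can be POSTed as JSON to /<index>/_search, e.g.
    # http://localhost:9200/my_index/_search (index name hypothetical).
    body = proximity_match("title", "python search", slop=2)
    print(json.dumps(body, indent=2))
```

Moving from top to bottom, the queries get stricter about term order and position. The slop variant sits between the strict phrase match and the bag-of-words match.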
PRESENTATIONS
This presentation is a practical introduction to topic modelling in Python, tackling the problem of analysing large data sets of text in order to identify topics of interest and related keywords.
What are they talking about? Mining topics in documents with topic modelling and Python (code demo)
September 2019 – workshop at PyCon UK, Cardiff
MASTERING SOCIAL MEDIA MINING WITH PYTHON
Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I've been working with Packt Publishing over the past few months, and in July the book was finalised and released. Links: ebook and paperback on Packt Publishing (the publisher) ebook and paperback
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIES)
This is the third part in a series of articles about data mining on Twitter. After collecting data and pre-processing some text, we are ready for some basic analysis. In this article, we’ll discuss the analysis of term frequencies to extract meaningful terms from our tweets.
MINING TWITTER DATA WITH PYTHON (PART 1
Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 million tweets per day,
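Part 3 covers the analysis of term frequencies in pre-processed tweets. Here is a minimal sketch of the counting step using `collections.Counter`. It assumes each tweet has already been tokenised into a list of strings and uses a deliberately tiny, hypothetical stop-word list; `most_common_terms` is likewise an illustrative name, not the post's code.

```python
import string
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller list
# (e.g. NLTK's stopwords corpus) plus Twitter-specific tokens such as "rt".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "rt", "via"}
STOP = STOP_WORDS | set(string.punctuation)


def most_common_terms(tokenised_tweets, n=10):
    """Count terms across tweets (lists of tokens), skipping stop words and punctuation."""
    counter = Counter()
    for tokens in tokenised_tweets:
        counter.update(t for t in tokens if t.lower() not in STOP)
    return counter.most_common(n)


if __name__ == "__main__":
    tweets = [
        ["RT", "python", "is", "great", "#python"],
        ["learning", "python", "and", "#python"],
        ["python", "!"],
    ]
    print(most_common_terms(tweets, n=2))  # [('python', 3), ('#python', 2)]
```

Restricting the count to tokens that start with `#` or `@` turns the same function into a hashtag or mention counter.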
MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE-PROCESSING)
This is the second part of a series of articles about data mining on Twitter. In the previous episode, we saw how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some text analysis.
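Part 2 introduces the pre-processing steps needed before text analysis. A typical first step is tokenisation that keeps Twitter-specific tokens (mentions, hashtags, URLs, emoticons) whole. Below is a regex-based sketch. The patterns are simplified and illustrative, so they will miss some emoticon and URL forms. The names `tokenise` and `preprocess` are hypothetical.

```python
import re

# Emoticons such as :) ;-) :D =P (a small, illustrative subset).
EMOTICON = r"[:=;][oO\-]?[D\)\]\(\[/\\OpP]"

# Alternatives are tried left to right, so more specific patterns come first.
TOKEN_PATTERNS = [
    EMOTICON,
    r"<[^>]+>",            # HTML tags
    r"@\w+",               # @-mentions
    r"#\w+",               # hashtags
    r"https?://\S+",       # URLs (simplified)
    r"\d+(?:[.,]\d+)*",    # numbers, e.g. 1,000 or 3.14
    r"\w+(?:['\-]\w+)*",   # words, including forms like don't and e-mail
    r"\S",                 # any other non-space character
]

TOKEN_RE = re.compile("|".join(TOKEN_PATTERNS))
EMOTICON_RE = re.compile(EMOTICON)


def tokenise(text):
    """Split a tweet into tokens, keeping mentions, hashtags, URLs and emoticons whole."""
    return TOKEN_RE.findall(text)


def preprocess(text, lowercase=False):
    """Tokenise and optionally lowercase every token except emoticons (so :D stays :D)."""
    tokens = tokenise(text)
    if lowercase:
        tokens = [t if EMOTICON_RE.fullmatch(t) else t.lower() for t in tokens]
    return tokens


if __name__ == "__main__":
    print(tokenise("RT @marcobonzanini: just an example! :D http://example.com #NLP"))
    # ['RT', '@marcobonzanini', ':', 'just', 'an', 'example', '!', ':D', 'http://example.com', '#NLP']
```

A general-purpose word tokeniser would typically split `@marcobonzanini` or `http://example.com` into several pieces, which is why tweet-specific patterns help here.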
BUILDING A SEARCH-AS-YOU-TYPE FEATURE WITH ELASTICSEARCH
Search-as-you-type is an interesting feature of modern search engines that gives users instant feedback on their search while they are still typing a query. In this tutorial, we discuss how to implement this feature in a custom search engine built with Elasticsearch and Python/Flask on the backend side, and AngularJS for
SENTIMENT ANALYSIS WITH PYTHON AND SCIKIT-LEARN
Sentiment Analysis is a field of study which analyses people’s opinions towards entities like products, typically expressed in written form, such as online reviews. In recent years, it’s been a hot topic in both academia and industry, partly thanks to the massive popularity of social media, which provides a constant source of textual data full of opinions to analyse.
MINING TWITTER DATA WITH PYTHON (AND JS)
Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisations for our data, in the form of interactive maps.
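For the geolocation post, a common intermediate step between raw tweets and an interactive map is a GeoJSON file, which JavaScript mapping libraries such as Leaflet can load. The sketch below assumes each tweet is a dict shaped like Twitter's v1.1 JSON, where `coordinates` is either `None` or a GeoJSON Point. The function name is hypothetical. GeoJSON stores positions as `[longitude, latitude]`, which is the reverse of the usual spoken order.

```python
import json


def tweets_to_geojson(tweets):
    """Build a GeoJSON FeatureCollection from tweets that carry coordinates.

    Assumes tweet["coordinates"] is None or a GeoJSON Point whose
    "coordinates" list is [longitude, latitude].
    """
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # skip tweets published without a location
        features.append({
            "type": "Feature",
            "geometry": point,
            "properties": {
                "text": tweet.get("text", ""),
                "created_at": tweet.get("created_at"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical sample tweets, for illustration only.
    sample = [
        {"text": "Hello from London", "created_at": None,
         "coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]}},
        {"text": "No location here", "coordinates": None},
    ]
    print(json.dumps(tweets_to_geojson(sample), indent=2))
```

Only a small fraction of tweets carry exact coordinates, so expect the resulting collection to be much smaller than the input.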
MINING TWITTER DATA WITH PYTHON (PART 4: RUGBY AND TERM CO
Last Saturday was the closing day of the Six Nations Championship, an annual international rugby competition. Before turning on the TV to watch Italy being trashed by Wales, I decided to use this event to collect some data from Twitter and perform some exploratory text analysis on something more interesting than the small list of
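Part 4 moves from single terms to term co-occurrences. Here is a minimal sketch of the counting step using `itertools.combinations`. It assumes tweets are already tokenised and stop words already removed. `co_occurrences` and `top_partners` are hypothetical names, not the post's code.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(tokenised_tweets):
    """Count how often each pair of distinct terms appears in the same tweet.

    Pairs are stored in alphabetical order, so ("rugby", "wales") and
    ("wales", "rugby") are counted together. Duplicate tokens are removed
    first, so each pair counts at most once per tweet.
    """
    counts = Counter()
    for tokens in tokenised_tweets:
        unique_terms = sorted(set(tokens))
        counts.update(combinations(unique_terms, 2))
    return counts


def top_partners(counts, term, n=5):
    """Return the terms that co-occur most often with `term`."""
    partners = Counter()
    for (a, b), count in counts.items():
        if a == term:
            partners[b] += count
        elif b == term:
            partners[a] += count
    return partners.most_common(n)


if __name__ == "__main__":
    tweets = [["italy", "wales", "rugby"], ["wales", "rugby"], ["italy", "rugby"]]
    counts = co_occurrences(tweets)
    print(counts.most_common(3))
    print(top_partners(counts, "rugby"))
```

Storing pairs in sorted order keeps the structure a flat counter instead of a full matrix. That trades some lookup convenience for simplicity.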
PYTHON TRAINING
Marco is a trainer and lecturer with more than 20 years of teaching experience. He specialises in the Python for Data Science software stack and offers a broad range of training curricula, from Python programming foundations to specialised Data Analytics and Machine Learning classes, through his company Bonzanini Consulting Ltd.
MINING TWITTER DATA WITH PYTHON: PART 5
A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python. Tutorial Table of Contents: Part 1: Collecting data; Part
BUILDING DATA PIPELINES WITH PYTHON AND LUIGI
As a data scientist, the emphasis of the day-to-day job is often more on the R&D side than on engineering. In the process of going from prototypes to production, though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to be re-engineered.
GETTING STARTED WITH APACHE SPARK AND PYTHON 3
Apache Spark is a cluster computing framework, currently one of the most actively developed in the open-source Big Data arena. It aims to be a general engine for large-scale data processing, supporting a number of platforms for cluster management (e.g. YARN or Mesos, as well as Spark native) and
MINING TWITTER DATA WITH PYTHON (AND JS) Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisation for our data, in the form ofinteractive maps.
MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marcohere.
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON ANDSEE MORE ONMARCOBONZANINI.COM
MINING TWITTER DATA WITH PYTHON: PART 5 MINING TWITTER DATA WITH PYTHON (PART 6 Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources.. This article continues the series on mining Twitter data with MINING TWITTER DATA WITH PYTHON (PART 1 MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE Mining Twitter Data with Python (Part 2: Text Pre-processing) This is the second part of a series of articles about data mining on Twitter. In the previous episode, we have seen how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some textanalysis.
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIESSEE MORE ONMARCOBONZANINI.COM
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH The starting point is to understand the specific use case that we’re trying to tackle, and from here we have a set of choices. Depending on the scenario, we might want to choose one between: a simple match search. a match search with a minimum match ratio. a phrase-based match. a phrase match with a slop for proximity search. HOW TO QUERY ELASTICSEARCH WITH PYTHON Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows to interact with Elasticsearch from any programming language. This article provides an overview TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING Tuning Relevance in Elasticsearch with Custom Boosting. Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective inpractice.
MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marcohere.
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON ANDSEE MORE ONMARCOBONZANINI.COM
MINING TWITTER DATA WITH PYTHON: PART 5 MINING TWITTER DATA WITH PYTHON (PART 6 Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources.. This article continues the series on mining Twitter data with MINING TWITTER DATA WITH PYTHON (PART 1 MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE Mining Twitter Data with Python (Part 2: Text Pre-processing) This is the second part of a series of articles about data mining on Twitter. In the previous episode, we have seen how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some textanalysis.
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIESSEE MORE ONMARCOBONZANINI.COM
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH The starting point is to understand the specific use case that we’re trying to tackle, and from here we have a set of choices. Depending on the scenario, we might want to choose one between: a simple match search. a match search with a minimum match ratio. a phrase-based match. a phrase match with a slop for proximity search. HOW TO QUERY ELASTICSEARCH WITH PYTHON Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows to interact with Elasticsearch from any programming language. This article provides an overview TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING Tuning Relevance in Elasticsearch with Custom Boosting. Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective inpractice.
ABOUT ME – MARCO BONZANINI About Me. I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd.PYTHON TRAINING
Marco is a trainer and lecturer with more than 20 years of teaching experience. He specialises in the Python for Data Science software stack and he offers a broad range of training curricula, spanning from Python programming foundations to specialised Data Analytics and Machine Learning classes, through his company Bonzanini ConsultingLtd.
MINING TWITTER DATA WITH PYTHON: PART 5 A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data, can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python. Tutorial Table of Contents: Part 1:Collecting dataPart
MASTERING SOCIAL MEDIA MINING WITH PYTHON Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I've been working with Packt Publishing over the past few months, and in July the book has been finalised and released. Links: ebook and paperback on Packt Publishing (the publisher) ebook and paperback SEARCHING PUBMED WITH PYTHON Update 2021-01: minor update to reflect some changes in the Pubmed API PubMed is a search engine accessing millions of biomedical citations. Users can freely search for biomedical references. For some articles, the access to the full text paper is also open. This post describes how you can programmatically search the PubMed database with Python,in order
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIES Mining Twitter Data with Python (Part 3: Term Frequencies) This is the third part in a series of articles about data mining on Twitter. After collecting data and pre-processing some text, we are ready for some basic analysis. In this article, we’ll discuss the analysis of term frequencies to extract meaningful terms from our tweets. SENTIMENT ANALYSIS WITH PYTHON AND SCIKIT-LEARN Sentiment Analysis is a field of study which analyses people’s opinions towards entities like products, typically expressed in written forms like on-line reviews. In recent years, it’s been a hot topic in both academia and industry, also thanks to the massive popularity of social media which provide a constant source of textual data full of opinions to analyse. BUILDING DATA PIPELINES WITH PYTHON AND LUIGI Building Data Pipelines with Python and Luigi. As a data scientist, the emphasis of the day-to-day job is often more on the R&D side rather than engineering. In the process of going from prototypes to production though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to bere-engineered.
MINING TWITTER DATA WITH PYTHON (AND JS) Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisation for our data, in the form ofinteractive maps.
GETTING STARTED WITH APACHE SPARK AND PYTHON 3 Getting Started with Apache Spark and Python 3. Apache Spark is a cluster computing framework, currently one of the most actively developed in the open-source Big Data arena. It aims at being a general engine for large-scale data processing, supporting a number of platforms for cluster management (e.g. YARN or Mesos as well as Sparknative) and
MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marcohere.
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON ANDSEE MORE ONMARCOBONZANINI.COM
MINING TWITTER DATA WITH PYTHON: PART 5 MINING TWITTER DATA WITH PYTHON (PART 6 Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources.. This article continues the series on mining Twitter data with MINING TWITTER DATA WITH PYTHON (PART 1 MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE Mining Twitter Data with Python (Part 2: Text Pre-processing) This is the second part of a series of articles about data mining on Twitter. In the previous episode, we have seen how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some textanalysis.
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIESSEE MORE ONMARCOBONZANINI.COM
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH The starting point is to understand the specific use case that we’re trying to tackle, and from here we have a set of choices. Depending on the scenario, we might want to choose one between: a simple match search. a match search with a minimum match ratio. a phrase-based match. a phrase match with a slop for proximity search. HOW TO QUERY ELASTICSEARCH WITH PYTHON Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows to interact with Elasticsearch from any programming language. This article provides an overview TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING Tuning Relevance in Elasticsearch with Custom Boosting. Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective inpractice.
MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marcohere.
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON ANDSEE MORE ONMARCOBONZANINI.COM
MINING TWITTER DATA WITH PYTHON: PART 5 MINING TWITTER DATA WITH PYTHON (PART 6 Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources.. This article continues the series on mining Twitter data with MINING TWITTER DATA WITH PYTHON (PART 1 MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE Mining Twitter Data with Python (Part 2: Text Pre-processing) This is the second part of a series of articles about data mining on Twitter. In the previous episode, we have seen how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some textanalysis.
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIES
ABOUT ME – MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd.
PYTHON TRAINING
Marco is a trainer and lecturer with more than 20 years of teaching experience. He specialises in the Python for Data Science software stack and he offers a broad range of training curricula, spanning from Python programming foundations to specialised Data Analytics and Machine Learning classes, through his company Bonzanini Consulting Ltd.
MINING TWITTER DATA WITH PYTHON: PART 5
A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python.
MASTERING SOCIAL MEDIA MINING WITH PYTHON
Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I've been working with Packt Publishing over the past few months, and in July the book was finalised and released. Links: ebook and paperback on Packt Publishing (the publisher); ebook and paperback
SEARCHING PUBMED WITH PYTHON
Update 2021-01: minor update to reflect some changes in the PubMed API. PubMed is a search engine accessing millions of biomedical citations. Users can freely search for biomedical references. For some articles, access to the full-text paper is also open. This post describes how you can programmatically search the PubMed database with Python, in order
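Programmatic PubMed searches go through NCBI's E-utilities, whose `esearch` endpoint returns the IDs of matching articles. The post's own code may rely on a wrapper library; this sketch builds the raw request URL with the standard library only, and the query string and `retmax` value are illustrative assumptions. Nothing is fetched unless you uncomment the last lines.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for searching an Entrez database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, retmax=20):
    """Return an esearch URL asking PubMed for up to `retmax` matching IDs."""
    params = {
        "db": "pubmed",        # which Entrez database to search
        "term": query,         # the search query (PubMed syntax allowed)
        "retmax": retmax,      # maximum number of IDs to return
        "retmode": "json",     # JSON instead of the default XML
        "sort": "relevance",   # order results by relevance
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url("fever", retmax=5)
# Fetching requires network access:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     id_list = json.load(resp)["esearchresult"]["idlist"]
```

The returned IDs can then be passed to the companion `efetch` endpoint to retrieve article details.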
MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIES
This is the third part in a series of articles about data mining on Twitter. After collecting data and pre-processing some text, we are ready for some basic analysis. In this article, we’ll discuss the analysis of term frequencies to extract meaningful terms from our tweets.
SENTIMENT ANALYSIS WITH PYTHON AND SCIKIT-LEARN
Sentiment Analysis is a field of study that analyses people’s opinions towards entities like products, typically expressed in written form, as in online reviews. In recent years, it’s been a hot topic in both academia and industry, thanks in part to the massive popularity of social media, which provide a constant source of textual data full of opinions to analyse.
BUILDING DATA PIPELINES WITH PYTHON AND LUIGI
For a data scientist, the emphasis of the day-to-day job is often more on the R&D side than on engineering. In the process of going from prototypes to production, though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to be re-engineered.
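The term-frequency analysis described in Part 3 boils down to counting tokens across all tweets while ignoring uninformative words. A minimal sketch with `collections.Counter` follows; the tweets are hypothetical, already-tokenised sample data, and the stop-word list is a tiny illustrative one (a real analysis would use a fuller list and also filter punctuation).

```python
from collections import Counter

# Tiny illustrative stop-word list, including Twitter-specific "rt" and "via".
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "rt", "via"}


def top_terms(tokenized_tweets, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent terms across all tweets, skipping stop words."""
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


tweets = [
    ["rt", "python", "is", "great"],
    ["learning", "python", "and", "data", "science"],
    ["data", "science", "with", "python"],
]
print(top_terms(tweets, n=3))
# [('python', 3), ('data', 2), ('science', 2)]
```

Without the stop-word filter, function words and retweet markers would dominate the ranking, which is why the filtering step matters more than the counting itself.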
MINING TWITTER DATA WITH PYTHON (AND JS)
Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisations for our data, in the form of interactive maps.
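A common bridge between Python and a JavaScript map is GeoJSON: the Python side writes a `FeatureCollection`, and a mapping library (Leaflet is one common choice) draws it. The sketch below assumes tweets shaped like Twitter's v1.1 JSON, where the `coordinates` field is already a GeoJSON Point in `[longitude, latitude]` order; the sample tweets are hypothetical.

```python
import json


def tweets_to_geojson(tweets):
    """Build a GeoJSON FeatureCollection from tweets that carry coordinates."""
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:  # most tweets have no location attached
            continue
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude], not the other way round.
            "geometry": {"type": "Point", "coordinates": point["coordinates"]},
            "properties": {
                "text": tweet.get("text", ""),
                "created_at": tweet.get("created_at"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample tweets: one geotagged, one without a location.
sample = [
    {"text": "Hello from London", "created_at": "Mon Jun 01 10:00:00 +0000 2015",
     "coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]}},
    {"text": "No location here", "coordinates": None},
]
geo_data = tweets_to_geojson(sample)
# Save it for the JavaScript side:
# with open("geo_data.json", "w") as f:
#     json.dump(geo_data, f)
print(json.dumps(geo_data, indent=2))
```

Skipping tweets without coordinates is deliberate: only a small fraction of tweets are geotagged, so the map will typically show far fewer points than the number of tweets collected.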
GETTING STARTED WITH APACHE SPARK AND PYTHON 3
Apache Spark is a cluster computing framework, currently one of the most actively developed in the open-source Big Data arena. It aims at being a general engine for large-scale data processing, supporting a number of platforms for cluster management (e.g. YARN or Mesos, as well as Spark native) and
MY PYTHON CODE IS SLOW? TIPS FOR PROFILING
tl;dr Before you can optimise your slow code, you need to identify the bottlenecks: proper profiling will give you the right insights. This article discusses some profiling tools for Python.
Introduction
Python is a high-level programming language with an emphasis on readability. Some of its peculiarities, like dynamic typing or the (in)famous GIL, might have some trade-offs in terms
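The standard library ships a deterministic profiler, `cProfile`, plus `pstats` for sorting and printing its results. This minimal sketch profiles two hypothetical functions that compute the same answer, one with a list-based membership test and one with a set, so the bottleneck should stand out in the report.

```python
import cProfile
import io
import pstats


def slow_lookup(items, queries):
    # Membership test on a list: a linear scan for every query.
    return sum(1 for q in queries if q in items)


def fast_lookup(items, queries):
    # Membership test on a set: average constant time per query.
    item_set = set(items)
    return sum(1 for q in queries if q in item_set)


def main():
    items = list(range(2_000))
    queries = list(range(0, 4_000, 2))  # 2,000 queries, half of them hits
    return slow_lookup(items, queries), fast_lookup(items, queries)


profiler = cProfile.Profile()
profiler.enable()
results = main()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

For a whole script, the same information is available from the command line with `python -m cProfile -s cumulative myscript.py`. Keep in mind that deterministic profiling adds overhead, so absolute timings are inflated; the relative ranking is what points you at the bottleneck.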
GETTING INTO DATA SCIENCE PRESENTATION AT HISAR CODING
Last week I had the opportunity to speak at the Hisar Coding Summit 2021, an event organised by students of the Hisar School of Istanbul. The remote format opened the doors to participants around the globe, but the audience was mainly high-school students with an interest in Data Science. The title of my presentation, Getting
GETTING STARTED WITH NEO4J AND PYTHON
This article is a brief introduction to Neo4j, one of the most popular graph databases, and its integration with Python.
Graph Databases
Graph databases are a family of NoSQL databases, based on the concept of modelling your data as a graph, i.e. a collection of nodes (representing entities) and edges (representing relationships). The motivation behind
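To make the nodes-and-edges model concrete without a running Neo4j server, here is a toy in-memory property graph in plain Python: labelled nodes, typed directed edges, and properties on both. The class and data are hypothetical illustrations, not Neo4j's API. In Neo4j, the lookup at the end would be a Cypher query along the lines of `MATCH (:Person {name: 'Alice'})-[:FOLLOWS]->(p:Person) RETURN p.name`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    label: str                       # e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int                      # node_id of the start node
    target: int                      # node_id of the end node
    rel_type: str                    # e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph: labelled nodes, typed directed edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, rel_type, **properties):
        self.edges.append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Names of nodes reached via outgoing edges (optionally of one type)."""
        return [
            self.nodes[e.target].properties.get("name")
            for e in self.edges
            if e.source == node_id and (rel_type is None or e.rel_type == rel_type)
        ]


g = PropertyGraph()
g.add_node(1, "Person", name="Alice")
g.add_node(2, "Person", name="Bob")
g.add_node(3, "Person", name="Carol")
g.add_edge(1, 2, "FOLLOWS", since=2015)
g.add_edge(1, 3, "KNOWS")
print(g.neighbours(1, "FOLLOWS"))  # ['Bob']
```

The linear scan over all edges is only for readability; a graph database stores relationships so that traversals from a node do not need to look at the whole graph.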
BUILDING A SEARCH-AS-YOU-TYPE FEATURE WITH ELASTICSEARCH
Search-as-you-type is an interesting feature of modern search engines that gives users instant feedback on their search while they are still typing a query. In this tutorial, we discuss how to implement this feature in a custom search engine built with Elasticsearch and Python/Flask on the backend side, and AngularJS for
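One common way to support search-as-you-type is to index the prefixes (edge n-grams) of every term, so a partially typed word can be looked up directly; Elasticsearch offers an `edge_ngram` tokenizer and token filter for this, as well as a completion suggester. The tutorial's own approach may differ. The plain-Python sketch below only illustrates the idea; the titles and the `min_gram`/`max_gram` values are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=2, max_gram=10):
    """Prefixes of `term` with lengths between min_gram and max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(titles, min_gram=2, max_gram=10):
    """Map every prefix of every term to the IDs of the titles containing it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for term in title.lower().split():
            for gram in edge_ngrams(term, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def suggest(index, titles, partial_query):
    """Titles where every typed word is a prefix of some term in the title."""
    words = partial_query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return [titles[i] for i in sorted(matches)]


titles = ["Python for data science", "Elasticsearch with Python", "Mining Twitter data"]
index = build_prefix_index(titles)
print(suggest(index, titles, "da pyt"))  # ['Python for data science']
```

The design trade-off is index size for query speed: all prefixes are computed at indexing time, so each keystroke becomes a cheap exact lookup. Typed fragments shorter than `min_gram` or longer than `max_gram` find nothing in this sketch.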
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH
The case of multi-term queries in Elasticsearch offers some room for discussion, because there are several options to consider depending on the specific use case we're dealing with. Multi-term queries are, in their most generic definition, queries with several terms. These terms could be completely unrelated, or they could be about the same topic, or
TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING
Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective in practice. Each use case is a different story, so sometimes the default ranking function
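One way to customise ranking without replacing the similarity function is a `function_score` query, which combines the text relevance score with a score computed from a document field. The sketch below builds such a query with `field_value_factor` and reproduces its arithmetic in Python. The fields `content` and `likes` are hypothetical, and this is not necessarily the boosting approach used in the article.

```python
import math


def popularity_boosted_query(text, field="likes", factor=1.0):
    """Wrap a match query so that more popular documents score higher."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"content": text}},
                "field_value_factor": {
                    "field": field,        # numeric field holding popularity
                    "factor": factor,      # multiplier applied to the field value
                    "modifier": "log2p",   # log10(2 + factor * value)
                    "missing": 0,          # value used when the field is absent
                },
                "boost_mode": "multiply",  # final = query score * function score
            }
        }
    }


def expected_score(query_score, value, factor=1.0):
    """Reproduce the arithmetic of the query above for a single document."""
    return query_score * math.log10(2 + factor * value)


print(expected_score(2.0, 8))   # log10(10) is 1, so the score is unchanged
print(expected_score(2.0, 98))  # log10(100) is 2, so the score doubles
```

The logarithmic modifier keeps very popular documents from drowning out text relevance. `log2p` is chosen over `log1p` on purpose: with `log1p`, a document with zero likes gets log10(1) = 0, and `multiply` would then zero out its score entirely.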
ABOUT ME – MARCO BONZANINI
I'm a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. Backed by a PhD in Information
SENTIMENT ANALYSIS WITH PYTHON AND SCIKIT-LEARN
Sentiment Analysis is a field of study which analyses people's opinions towards entities like products, typically expressed in written form, such as online reviews. In recent years, it's been a hot topic in both academia and industry, also thanks to the massive popularity of social media, which provide a constant source of textual data full of opinions to analyse.
BUILDING A SEARCH-AS-YOU-TYPE FEATURE WITH ELASTICSEARCH
Search-as-you-type is an interesting feature of modern search engines that gives users instant feedback related to their search while they are still typing a query. In this tutorial, we discuss how to implement this feature in a custom search engine built with Elasticsearch and Python/Flask on the backend side, and AngularJS for…
MINING TWITTER DATA WITH PYTHON (AND JS)
Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisations for our data, in the form of interactive maps.
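The geolocation excerpt above mentions that tweets can carry latitude and longitude coordinates. One common way to prepare such data for an interactive map is to convert it to GeoJSON, which JavaScript mapping libraries can typically load. The sketch below assumes the tweets are already parsed into Python dicts whose `coordinates` field is either `None` or a GeoJSON Point in `[longitude, latitude]` order, as in the Twitter API's JSON; the function name and sample data are hypothetical.

```python
import json


def tweets_to_geojson(tweets):
    """Build a GeoJSON FeatureCollection from tweets that carry point coordinates.

    Assumes each tweet is a dict whose 'coordinates' value is either None
    or a GeoJSON Point such as {"type": "Point", "coordinates": [lon, lat]}.
    """
    features = []
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if not coords:
            continue  # skip tweets published without a location
        features.append({
            "type": "Feature",
            "geometry": coords,  # already a GeoJSON Point: [longitude, latitude]
            "properties": {
                "text": tweet.get("text", ""),
                "created_at": tweet.get("created_at"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample tweets: one geotagged, one without a location
sample = [
    {"text": "Hello from London", "created_at": "Mon Jan 04 10:00:00 +0000 2021",
     "coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]}},
    {"text": "No location here", "created_at": "Mon Jan 04 11:00:00 +0000 2021",
     "coordinates": None},
]
geo = tweets_to_geojson(sample)
print(json.dumps(geo, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, which is an easy source of maps that show points in the wrong place.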
MARCO BONZANINI
I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marco here.
PRESENTATIONS
Some recent talks and tutorials I've delivered at various events:
What are they talking about? Mining topics in documents with topic modelling and Python (slides, code demo, video)
September 2019, talk at the London Python meetup
This presentation is a practical introduction to topic modelling in Python, tackling the problem of analysing large…
STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON AND…
MINING TWITTER DATA WITH PYTHON (PART 1
MINING TWITTER DATA WITH PYTHON (PART 6
Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources. This article continues the series on mining Twitter data with…
PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH
The case of multi-term queries in Elasticsearch offers some room for discussion, because there are several options to consider depending on the specific use case we're dealing with. Multi-term queries are, in their most generic definition, queries with several terms. These terms could be completely unrelated, or they could be about the same topic, or…
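To make the difference between a phrase match and a proximity search concrete, here is a toy positional index in plain Python. It is only an illustration, not how Elasticsearch implements `match_phrase` with `slop`: in this sketch, slop is simplified to the total number of extra positions allowed between consecutive query terms, and the terms must stay in order (Elasticsearch's slop also tolerates some reordering). All function names and documents are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for whitespace-tokenised documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def _fits(positions, i, prev, budget):
    """True if terms i.. can follow position `prev`, in order,
    skipping at most `budget` positions in total (tries every option)."""
    if i == len(positions):
        return True
    return any(
        _fits(positions, i + 1, p, budget - (p - prev - 1))
        for p in positions[i]
        if prev < p <= prev + 1 + budget
    )


def proximity_match(index, query, slop=0):
    """Return ids of documents containing the query terms in order.

    slop=0 means an exact phrase; a larger slop allows intervening words.
    """
    terms = query.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Only documents containing every query term are candidates
    candidates = set.intersection(*(set(index[t]) for t in terms))
    matches = set()
    for doc_id in candidates:
        positions = [index[t][doc_id] for t in terms]
        if any(_fits(positions, 1, start, slop) for start in positions[0]):
            matches.add(doc_id)
    return matches


docs = ["quick brown fox", "quick red fox", "brown quick fox",
        "the quick and very brown fox"]
idx = build_positional_index(docs)
print(proximity_match(idx, "quick brown"))          # exact phrase only
print(proximity_match(idx, "quick brown", slop=2))  # up to two intervening words
```

A plain multi-term query would match every document containing both words; the positional check is what distinguishes "brown quick" and "quick and very brown" from the exact phrase.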
TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING
Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective in practice. Each use case is a different story, so sometimes the default ranking function…
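The excerpt above refers to TF-IDF as the basis of the default ranking function. The sketch below scores documents with textbook TF-IDF (raw term count times `log(N / df)`), not Lucene's exact formula, which adds further normalisations. The `boosts` parameter is a hypothetical illustration of query-term boosting: it simply multiplies a term's contribution, which is the intuition behind giving some terms more weight than others.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs, boosts=None):
    """Score every document against the query with textbook TF-IDF.

    tf  = raw count of the term in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    boosts: hypothetical {term: weight} mapping multiplying a term's contribution.
    """
    boosts = boosts or {}
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: each term counted at most once per document
    df = Counter(term for tokens in tokenised for term in set(tokens))
    query_terms = query.lower().split()
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term absent from the whole collection: no contribution
            idf = math.log(n_docs / df[term])
            score += boosts.get(term, 1.0) * tf[term] * idf
        scores.append(score)
    return scores


docs = ["python search engine", "elasticsearch search engine", "python data science"]
print(tf_idf_scores("python search", docs))
print(tf_idf_scores("python search", docs, boosts={"python": 3.0}))
```

Without boosting, the second and third documents tie because each matches one query term with the same IDF; boosting "python" breaks the tie in favour of the third.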
BOOKS & ARTICLES
This page contains references to my publications.
Book: Mastering Social Media Mining with Python
In 2016 I wrote a book on data mining for social media using Python. You can find out more about this book here: ebook and paperback on Packt Publishing, ebook and paperback on Amazon.com and Amazon UK, companion code for the book on…
PYTHON TRAINING
Marco is a trainer and lecturer with more than 20 years of teaching experience. He specialises in the Python for Data Science software stack and offers a broad range of training curricula, spanning from Python programming foundations to specialised Data Analytics and Machine Learning classes, through his company Bonzanini Consulting Ltd.
JANUARY 2021
2 posts published by Marco during January 2021.
Feature Scaling, also known as Data Normalisation, is a data preprocessing technique used in Machine Learning to normalise the range of predictor variables (i.e. independent variables, or features).
SEARCHING PUBMED WITH PYTHON
Update 2021-01: minor update to reflect some changes in the PubMed API.
PubMed is a search engine accessing millions of biomedical citations. Users can freely search for biomedical references. For some articles, access to the full-text paper is also open. This post describes how you can programmatically search the PubMed database with Python, in order…
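The feature-scaling definition in the January 2021 excerpt above can be illustrated with two common variants: min-max normalisation, which maps values linearly onto a target range such as [0, 1], and standardisation, which rescales to zero mean and unit standard deviation via z = (x - mean) / std. The pure-Python sketch below is for illustration only; the function names and the sample `ages` data are hypothetical, and in practice scikit-learn's `MinMaxScaler` and `StandardScaler` cover these cases.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale numbers linearly so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range to stretch; map everything to new_min
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def standardise(values):
    """Rescale to zero mean and unit (population) standard deviation: z = (x - mean) / std."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [18, 30, 45, 60]  # hypothetical feature values
print(min_max_scale(ages))
print(standardise(ages))
```

Min-max scaling keeps values in a fixed range but is sensitive to outliers, since a single extreme value stretches the whole range; standardisation does not bound the output but is less distorted by one extreme point.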
MINING TWITTER DATA WITH PYTHON: PART 5
A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python. Tutorial Table of Contents: Part 1: Collecting data; Part…
HOW TO QUERY ELASTICSEARCH WITH PYTHON
Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows you to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows you to interact with Elasticsearch from any programming language. This article provides an overview…
MARCO BONZANINI
PYTHON, DATA SCIENCE, TEXT ANALYTICS
API · Big Data · Books · Data Mining · NLP · Python · Text Analytics · Text Mining
MASTERING SOCIAL MEDIA MINING WITH PYTHON
August 2, 2016 (updated April 22, 2017) · Marco · 22 Comments
Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I’ve been working with Packt Publishing over the past few months, and in July the book was finalised and released. Links: ebook and paperback on Packt Publishing (the publisher), ebook and paperback…
Conferences · Python
PYDATA LONDON 2018
May 3, 2018 · Marco · 1 Comment
Last weekend (April 27-29) we ran PyData London 2018, the fifth edition of our annual conference (we also have a monthly meet-up, which currently has 7,200+ members). The event is entirely run by volunteers, with the purpose of bringing the community together and raising money for NumFOCUS, the charity that provides financial support to open-source scientific…
data science · Machine Learning · Python
VIDEO COURSE: PRACTICAL PYTHON DATA SCIENCE TECHNIQUES
September 13, 2017 · Marco · Leave a comment
I’m happy to announce the recent release of my second video course, Practical Python Data Science Techniques, published with Packt Publishing. Links: video course on Packt Publishing (the publisher), companion code for the course (on my GitHub). This video course follows my first introductory course (Data Analysis with Python) and provides the audience with recipe-like…
Uncategorized
VIDEO COURSE: DATA ANALYSIS WITH PYTHON
May 3, 2017 · Marco · 6 Comments
I’m happy to announce the release of my first video course, Data Analysis with Python, published with Packt Publishing. Links: video course on Packt Publishing (the publisher), companion code for the course on my GitHub. With 2 hours 26 minutes of content segmented into short video sessions, this course aims at introducing the audience to…
Conferences · Python
PYCON ITALY 2017 WRITE-UP
April 11, 2017 · Marco · Leave a comment
Last week I travelled to Florence, where I attended PyCon Otto, the 8th edition of the Italian Python Conference. As expected, it was yet another great experience with the Italian Python community and many international guests. This year the very first day, Thursday, was beginners’ day, with introductory workshops run by volunteer mentors. Thanks to…
Conferences · NLP · Python
PYCON UK 2016 WRITE-UP
September 25, 2016 · Marco · Leave a comment
Last week I had a long weekend at PyCon UK 2016 in Cardiff, and it was a fantastic experience! Great talks, great friends/colleagues and lots of ideas. On Monday 19th, the last day of the conference, my friend Miguel and I ran a tutorial/workshop on Natural Language Processing in Python (the GitHub repo…
Conferences · London · Meet-ups · Python
PYDATA LONDON 2016 WRITE-UP
May 11, 2016 · Marco · Leave a comment
Last weekend I was at the PyData London conference for three Pythonic days. Firstly, thanks to the organisers, volunteers, speakers, sponsors and everyone who has contributed in one way or another to make the event a great success. This year I had the opportunity to contribute as a member of the review committee, which means I…
RECENT POSTS
* PyData London 2018 (May 3, 2018)
* Video Course: Practical Python Data Science Techniques (September 13, 2017)
* Video Course: Data Analysis with Python (May 3, 2017)
* PyCon Italy 2017 write-up (April 11, 2017)
* PyCon UK 2016 write-up (September 25, 2016)
* Mastering Social Media Mining with Python (August 2, 2016)
* PyData London 2016 write-up (May 11, 2016)
* PyCon Italia / PyData Italy 2016 Write-Up (April 19, 2016)
* Retrocomputing and Python: import turtle (December 29, 2015)
* Adding Slack Notifications to a Luigi Pipeline in Python (November 21, 2015)
ARCHIVES
* May 2018 (1)
* September 2017 (1)
* May 2017 (1)
* April 2017 (1)
* September 2016 (1)
* August 2016 (1)
* May 2016 (1)
* April 2016 (1)
* December 2015 (1)
* November 2015 (1)
* October 2015 (1)
* September 2015 (2)
* August 2015 (3)
* July 2015 (2)
* June 2015 (3)
* May 2015 (2)
* April 2015 (3)
* March 2015 (4)
* February 2015 (4)
* January 2015 (4)
CATEGORIES
* API (2)
* Best Practices (1)
* Big Data (4)
* Books (1)
* Conferences (5)
* Data Mining (8)
* data science (1)
* Data Visualisation (2)
* Elasticsearch (6)
* Engineering (2)
* Functional Programming (1)
* Graph Databases (1)
* Javascript (3)
* London (3)
* Machine Learning (1)
* Maps (1)
* Meet-ups (3)
* MongoDB (1)
* Neo4j (1)
* NLP (9)
* NoSQL (1)
* Python (32)
* Relevance (2)
* Retrocomputing (1)
* Search (7)
* Sentiment Analysis (2)
* Spark (1)
* Text Analytics (3)
* Text Mining (2)
* Text Summarisation (1)
* Uncategorized (1)
The content of this blog by Marco Bonzanini is licensed under a Creative Commons Attribution 4.0 International License.