
MARCO BONZANINI

I’m a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. More about Marco here.

MARCOBONZANINI.COM

MINING TWITTER DATA WITH PYTHON (PART 6

Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources. This article continues the series on mining Twitter data with…

SEARCHING PUBMED WITH PYTHON

Update 2021-01: minor update to reflect some changes in the PubMed API. PubMed is a search engine accessing millions of biomedical citations. Users can freely search for biomedical references. For some articles, the access to the full text paper is also open. This post describes how you can programmatically search the PubMed database with Python, in order…

PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH

The starting point is to understand the specific use case that we’re trying to tackle, and from here we have a set of choices. Depending on the scenario, we might want to choose one of:

* a simple match search
* a match search with a minimum match ratio
* a phrase-based match
* a phrase match with a slop for proximity search

HOW TO QUERY ELASTICSEARCH WITH PYTHON

Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It's a great tool that allows you to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows you to interact with Elasticsearch from any programming language. This article provides an overview…

TUNING RELEVANCE IN ELASTICSEARCH WITH CUSTOM BOOSTING

Elasticsearch offers different options out of the box in terms of ranking function (similarity function, in Lucene terminology). The default ranking function is a variation of TF-IDF, relatively simple to understand and, thanks to some smart normalisations, also quite effective in practice. Each use case is a different story so sometimes the default ranking function…


PRESENTATIONS

What are they talking about? Mining topics in documents with topic modelling and Python (code demo)

September 2019 – workshop at PyCon UK, Cardiff

This presentation is a practical introduction to topic modelling in Python, tackling the problem of analysing large data sets of text, in order to identify topics of interest and related keywords.

MASTERING SOCIAL MEDIA MINING WITH PYTHON

Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I've been working with Packt Publishing over the past few months, and in July the book was finalised and released. Links: ebook and paperback on Packt Publishing (the publisher); ebook and paperback…

MINING TWITTER DATA WITH PYTHON (PART 3: TERM FREQUENCIES)

This is the third part in a series of articles about data mining on Twitter. After collecting data and pre-processing some text, we are ready for some basic analysis. In this article, we’ll discuss the analysis of term frequencies to extract meaningful terms from our tweets.

MINING TWITTER DATA WITH PYTHON (PART 1

Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 million tweets per day,…

MINING TWITTER DATA WITH PYTHON (PART 2: TEXT PRE-PROCESSING)

This is the second part of a series of articles about data mining on Twitter. In the previous episode, we saw how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some text analysis.

BUILDING A SEARCH-AS-YOU-TYPE FEATURE WITH ELASTICSEARCH

Search-as-you-type is an interesting feature of modern search engines, which gives users instant feedback on their search while they are still typing a query. In this tutorial, we discuss how to implement this feature in a custom search engine built with Elasticsearch and Python/Flask on the backend side, and AngularJS for…

SENTIMENT ANALYSIS WITH PYTHON AND SCIKIT-LEARN

Sentiment Analysis is a field of study which analyses people’s opinions towards entities like products, typically expressed in written forms like on-line reviews. In recent years, it’s been a hot topic in both academia and industry, also thanks to the massive popularity of social media which provide a constant source of textual data full of opinions to analyse.

MINING TWITTER DATA WITH PYTHON (AND JS)

Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisations for our data, in the form of interactive maps.


STEMMING, LEMMATISATION AND POS-TAGGING WITH PYTHON AND


MINING TWITTER DATA WITH PYTHON (PART 4: RUGBY AND TERM CO

Last Saturday was the closing day of the Six Nations Championship, an annual international rugby competition. Before turning on the TV to watch Italy being thrashed by Wales, I decided to use this event to collect some data from Twitter and perform some exploratory text analysis on something more interesting than the small list of…

PYTHON TRAINING

Marco is a trainer and lecturer with more than 20 years of teaching experience. He specialises in the Python for Data Science software stack and he offers a broad range of training curricula, spanning from Python programming foundations to specialised Data Analytics and Machine Learning classes, through his company Bonzanini Consulting Ltd.

MINING TWITTER DATA WITH PYTHON: PART 5

A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python. Tutorial Table of Contents: Part 1: Collecting data Part…

BUILDING DATA PIPELINES WITH PYTHON AND LUIGI

As a data scientist, the emphasis of the day-to-day job is often more on the R&D side rather than engineering. In the process of going from prototypes to production though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to be re-engineered.

GETTING STARTED WITH APACHE SPARK AND PYTHON 3

Apache Spark is a cluster computing framework, currently one of the most actively developed in the open-source Big Data arena. It aims at being a general engine for large-scale data processing, supporting a number of platforms for cluster management (e.g. YARN or Mesos as well as Spark native) and…


MY PYTHON CODE IS SLOW? TIPS FOR PROFILING

tl;dr: Before you can optimise your slow code, you need to identify the bottlenecks: proper profiling will give you the right insights. This article discusses some profiling tools for Python. Introduction: Python is a high-level programming language with an emphasis on readability. Some of its peculiarities, like the dynamic typing, or the (in)famous GIL, might have some trade-offs in terms…


GETTING INTO DATA SCIENCE PRESENTATION AT HISAR CODING

Last week I had the opportunity to speak at the Hisar Coding Summit 2021, an event organised by students of the Hisar School of Istanbul. The remote format opened the doors for participants around the globe, but the audience was mainly high-school students with an interest in Data Science. The title of my presentation, Getting…

GETTING STARTED WITH NEO4J AND PYTHON

This article is a brief introduction to Neo4j, one of the most popular graph databases, and its integration with Python. Graph Databases: Graph databases are a family of NoSQL databases, based on the concept of modelling your data as a graph, i.e. a collection of nodes (representing entities) and edges (representing relationships). The motivation behind…


PHRASE MATCH AND PROXIMITY SEARCH IN ELASTICSEARCH

The case of multi-term queries in Elasticsearch offers some room for discussion, because there are several options to consider depending on the specific use case we're dealing with. Multi-term queries are, in their most generic definition, queries with several terms. These terms could be completely unrelated, or they could be about the same topic, or…


ABOUT ME – MARCO BONZANINI

I'm a Data Science consultant, corporate trainer and author based in London, UK. I specialise in the Python for Data Science (PyData) software stack. With 20 years of experience in the tech industry, I provide consulting, coaching and training services in the data science space through my company Bonzanini Consulting Ltd. Backed by a PhD in Information…


PRESENTATIONS

Some recent talks and tutorials I've delivered at various events:

What are they talking about? Mining topics in documents with topic modelling and Python (slides - code demo - video)

September 2019 - talk at London Python meetup

This presentation is a practical introduction to topic modelling in Python, tackling the problem of analysing large…


BOOKS & ARTICLES

This page contains references to my publications. Book: Mastering Social Media Mining with Python. In 2016 I wrote a book on data mining for social media using Python. You can find out more about this book here: ebook and paperback on Packt Publishing; ebook and paperback on Amazon.com and Amazon UK; companion code for the book on…


JANUARY 2021

2 posts published by Marco during January 2021. Feature Scaling, also known as Data Normalisation, is a data preprocessing technique used in Machine Learning to normalise the range of predictor variables (i.e. independent variables, or features).


MARCO BONZANINI

PYTHON, DATA SCIENCE, TEXT ANALYTICS


API · Big Data · Books · Data Mining · NLP · Python · Text Analytics · Text Mining

MASTERING SOCIAL MEDIA MINING WITH PYTHON

August 2, 2016 (updated April 22, 2017)

Marco

22 Comments

Great news, my book on data mining for social media is finally out! The title is Mastering Social Media Mining with Python. I’ve been working with Packt Publishing over the past few months, and in July the book was finalised and released. Links: ebook and paperback on Packt Publishing (the publisher); ebook and paperback…

Conferences · Python

PYDATA LONDON 2018

May 3, 2018

Marco

1 Comment

Last weekend (April 27-29) we ran PyData London 2018, the fifth edition of our annual conference (we also have a monthly meet-up, with currently 7,200+ members). The event is entirely run by volunteers, with the purpose of bringing the community together and raising money for NumFOCUS, the charity that provides financial support to open-source scientific…

Data Science · Machine Learning · Python

VIDEO COURSE: PRACTICAL PYTHON DATA SCIENCE TECHNIQUES

September 13, 2017

Marco

I’m happy to announce the recent release of my second video course, Practical Python Data Science Techniques, published with Packt Publishing. Links: video course on Packt Publishing (the publisher); companion code for the course (on my GitHub). This video course follows my first introductory course (Data Analysis with Python) and provides the audience with recipe-like…

Uncategorized

VIDEO COURSE: DATA ANALYSIS WITH PYTHON

May 3, 2017

Marco

6 Comments

I’m happy to announce the release of my first video course, Data Analysis with Python, published with Packt Publishing. Links: video course on Packt Publishing (the publisher); companion code for the course on my GitHub. With 2 hours 26 minutes of content segmented into short video sessions, this course aims at introducing the audience to…

Conferences · Python

PYCON ITALY 2017 WRITE-UP April 11, 2017April 11, 2017

Marco

Last week I travelled to Florence, where I attended PyCon Otto, the 8th edition of the Italian Python Conference. As expected, it’s been yet another great experience with the Italian Python community and many international guests. This year the very first day, Thursday, was beginners’ day, with introductory workshops run by volunteer mentors. Thanks to… Continue reading PyCon Italy 2017 write-up

Conferences · NLP · Python

PYCON UK 2016 WRITE-UP

September 25, 2016

Marco

Last week I had a long weekend at PyCon UK 2016 in Cardiff, and it’s been a fantastic experience! Great talks, great friends/colleagues and lots of ideas. On Monday 19th, the last day of the conference, my friend Miguel and I ran a tutorial/workshop on Natural Language Processing in Python (the GitHub repo… Continue reading PyCon UK 2016 write-up

Conferences · London · Meet-ups · Python

PYDATA LONDON 2016 WRITE-UP

May 11, 2016

Marco

Last weekend I was at the PyData London conference for three Pythonic days. Firstly, thanks to the organisers, volunteers, speakers, sponsors and everyone who has contributed in one way or another to make the event a great success. This year I had the opportunity to contribute as a member of the review committee, which means I… Continue reading PyData London 2016 write-up


* PyData London 2018

May 3, 2018

* Video Course: Practical Python Data Science Techniques

September 13, 2017

* Video Course: Data Analysis with Python

May 3, 2017

* PyCon Italy 2017 write-up

April 11, 2017

* PyCon UK 2016 write-up

September 25, 2016

* Mastering Social Media Mining with Python

August 2, 2016

* PyData London 2016 write-up

May 11, 2016

* PyCon Italia / PyData Italy 2016 Write-Up

April 19, 2016

* Retrocomputing and Python: import turtle

December 29, 2015

* Adding Slack Notifications to a Luigi Pipeline in Python

November 21, 2015

* May 2018 (1)

* September 2017 (1)

* May 2017 (1)

* April 2017 (1)

* September 2016 (1)

* August 2016 (1)

* May 2016 (1)

* April 2016 (1)

* December 2015 (1)

* November 2015 (1)

* October 2015 (1)

* September 2015 (2)

* August 2015 (3)

* July 2015 (2)

* June 2015 (3)

* May 2015 (2)

* April 2015 (3)

* March 2015 (4)

* February 2015 (4)

* January 2015 (4)

* API (2)

* Best Practices (1)

* Big Data (4)

* Books (1)

* Conferences (5)

* Data Mining (8)

* data science (1)

* Data Visualisation (2)

* Elasticsearch (6)

* Engineering (2)

* Functional Programming (1)

* Graph Databases (1)

* Javascript (3)

* London (3)

* Machine Learning (1)

* Maps (1)

* Meet-ups (3)

* MongoDB (1)

* Neo4j (1)

* NLP (9)

* NoSQL (1)

* Python (32)

* Relevance (2)

* Retrocomputing (1)

* Search (7)

* Sentiment Analysis (2)

* Spark (1)

* Text Analytics (3)

* Text Mining (2)

* Text Summarisation (1)

* Uncategorized (1)

The content of this blog by Marco Bonzanini is licensed under a Creative Commons Attribution 4.0 International License
