TECH BLOG
Mark Litwintschik. I have 15 years of consulting & hands-on build experience with clients in the UK, USA, Sweden, Ireland & Germany. Past clients include Bank of America Merrill Lynch, Blackberry, Bloomberg, British Telecom, Ford, Google, ITV, LeoVegas, News UK, Pizza Hut, Royal Bank of Scotland, Royal Mail, T-Mobile, TransferWise, Williams Formula 1 & UBS.

1.1 BILLION TAXI RIDES USING OMNISCIDB AND A MACBOOK PRO
Many believe that for near-instant analytics on billions of records you'd need dedicated Linux clusters, several GPUs or proprietary Cloud offerings. Some of my fastest benchmarks were run on such environments. But in 2020, an off-the-shelf MacBook Pro using OmniSciDB (formerly MapD) can

PYTHON & BIG DATA: AIRFLOW & JUPYTER NOTEBOOK WITH HADOOP
By default Presto's Web UI, Spark's Web UI and Airflow's Web UI all use TCP port 8080. If you launch Presto after Spark then Presto will fail to start. If you start Spark after Presto then Presto will launch on 8080 and the Spark Master Server will take 8081 and keep trying higher ports until it

SUMMARY OF THE 1.1 BILLION TAXI RIDES BENCHMARKS
This table lists the fastest query times (measured in seconds) seen in each of my benchmarks broken down by software and hardware setup. The dataset used has 1.1 billion records, 51 columns and is 500 GB in size when in uncompressed CSV format. Instructions on producing the dataset can be found
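The port clash described in the Airflow & Jupyter Notebook excerpt above can be checked before launching anything. The following is a minimal sketch, not Spark's or Presto's actual logic: it mimics the "keep trying higher ports" behaviour by attempting to bind each port in turn. The function name first_free_port and its attempts limit are hypothetical, chosen only for illustration.

```python
import socket


def first_free_port(start=8080, attempts=16, host="127.0.0.1"):
    """Return the first TCP port at or above `start` that can be bound.

    Hypothetical helper: it only imitates the fallback behaviour the
    excerpt describes and is not taken from any of the tools mentioned.
    """
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Something already holds this port; try the next one up.
                continue
            # The socket closes on leaving the with block, freeing the port.
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print("First free port from 8080:", first_free_port())
```

Running this before starting a second service shows which port it would have to fall back to. The result is only a snapshot, since another process can claim the port between the check and the launch.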

SYSTEMS MONITORING: TOP VS HTOP VS GLANCES
The software does a good job of keeping you within the application. If you want to inspect the files a process is using, select the process and type l; if you want to run the process through strace, type s while running htop as a privileged user. Below will install and run htop on Ubuntu 16.04.2 LTS.

CONVERT CSVS TO ORC FASTER
I'll then use Presto to convert the CSV data into ZStandard-compressed ORC files.

$ presto-cli \
    --schema default \
    --catalog hive

The following took 37 mins and 44 seconds.

INSERT INTO trips_orc_zstd_presto
SELECT * FROM trips_csv;

The above generated 79 GB

COLLECTING ALL IPV4 WHOIS RECORDS IN PYTHON
UPDATE: Since writing this blog post I've had other developers get in touch with ideas for improving the script described in this post. I've created a repo on GitHub where pull requests can be submitted.

I recently published a blog post on finding the fastest way to look up the country mapping of any given IP address. Within a day I found some interesting insights made by gigarray on /r/Python:

TIGHTENING DJANGO ADMIN LOGINS
Django admin has a form where you can log in with an account and see the various admin screens you have access to. As with any login screen it's good to detect multiple failures and restrict the access of any offending user's IP address. Pointing fail2ban at the nginx logs should be enough but not in this case.

IP ADDRESS LOOKUPS USING PYTHON
The curious case of 24.24.24.24. The CSV database I downloaded from MaxMind and the binary one I installed via the libgeoip-dev package had differences between them. One of the test IP addresses I used when I started building these scripts was 24.24.24.24. According to whois 24.24.24.24 the IP address is mapped to a network in Herndon, VA, USA and sits in the net range 24.24.0.0 -

IS HADOOP DEAD?
EMR allows you to launch Hadoop clusters with a large variety of software installed with a couple of clicks. It can run on spot instances which cut hardware costs by ~80% and can store data on S3 which was, and still is, cheap and has 99.999999999% durability. Suddenly, the need for 25 contractors on a project was gone.
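
To put the 99.999999999% durability figure from the Is Hadoop Dead? excerpt into perspective, here is a small arithmetic sketch. It assumes the figure can be read as the chance that any single object survives a given year and that losses are independent; both are simplifying assumptions, and the function name expected_annual_losses is hypothetical.

```python
from fractions import Fraction


def expected_annual_losses(object_count, durability_percent="99.999999999"):
    """Expected number of objects lost per year.

    Assumes the durability figure is a per-object, per-year survival
    probability and that losses are independent of each other.
    """
    # Fraction parses the decimal string exactly, avoiding float rounding.
    loss_probability = 1 - Fraction(durability_percent) / 100
    return object_count * loss_probability


if __name__ == "__main__":
    losses = expected_annual_losses(10_000_000)
    print("Expected losses per year:", float(losses))
    print("Average years per lost object:", int(1 / losses))
```

Under those assumptions, storing ten million objects works out to an expected loss of one object roughly every 10,000 years.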

RECOMMENDATION ENGINE BUILT USING SPARK AND PYTHON
The code used in this blog post can be found on GitHub. Apache Spark is a data processing framework that supports building projects in Python and comes with MLlib, a distributed machine learning framework. I was excited at the possibilities this software offered when I first read a guide to creating a movie recommendation engine. I was able to find some code snippets and helpful gists but I

LOAD BALANCING DJANGO
The whole playbook from this blog post can be seen in this gist. Ansible is a Python-based tool for automating application deployment and infrastructure setups. It's often compared with Capistrano, Chef, Puppet and Fabric. These comparisons don't always compare apples to apples as each tool has its own distinctive capabilities and philosophy as to how best automate deployments and

A MINIMALIST GUIDE TO SQLITE
SQLite is a self-contained, serverless SQL database. Dr. Richard Hipp, the creator of SQLite, first released the software on the 17th of August, 2000. Since then it has gone on to be the second most deployed piece of software in the world. It's used in systems as important as the Airbus A350 so it comes as no

HADOOP 3 SINGLE-NODE INSTALL GUIDE
Hadoop 3 was released in December 2017. It's a major release with a number of interesting new features. It's early days but I've found so far in my testing it hasn't broken too many of the features or processes I commonly use day to day in my 2.x installations. Again, early days but I'm happy so far with what

WORKING WITH THE HADOOP DISTRIBUTED FILE SYSTEM
The Hadoop Distributed File System (HDFS) allows you to both federate storage across many computers as well as distribute files in a redundant manner across a cluster. HDFS is a key component of many storage clusters that possess more than a petabyte of capacity.

FASTER FILE DISTRIBUTION WITH HDFS AND S3
The above took 27 minutes and 40 seconds. I wasn't expecting this client to be almost twice as slow as the HDFS CLI. S3 provides consistent performance when I've run other tools multiple times so I suspect either the code behind the put functionality could be optimised or there might be a more appropriate endpoint for copying multi-gigabyte files onto HDFS.

MARK LITWINTSCHIK
I hold both a Canadian and a British passport. My CV profile.
1.1 BILLION TAXI RIDES USING OMNISCIDB AND A MACBOOK PRO
I investigate how fast OmniSciDB can query 1.1 billion taxi journeys using a 16" MacBook Pro.
-------------------------
PYTHON WEB SCRAPING WITH VIRTUAL PRIVATE NETWORKS
Proxy Python and curl web requests through WireGuard and OpenSSH.
-------------------------
FAST IPV4 TO HOST LOOKUPS
I compare PostgreSQL and ClickHouse performance characteristics while performing IPv4 to hostname lookups.
-------------------------
FASTER ZIP DECOMPRESSION
I compare the decompression times of various DEFLATE implementations.
-------------------------
FASTER CLICKHOUSE IMPORTS
I compare import times of various formats into ClickHouse.
-------------------------
YOUTUBE'S DATABASE "PROCELLA"
I analyse material recently published on Google's "Procella" query processing engine which powers YouTube.
-------------------------
IS HADOOP DEAD?
I analyse and debate arguments surrounding the "demise" of Hadoop.
-------------------------
MINIMALIST GUIDE TO LOSSLESS COMPRESSION
I look at various aspects of lossless compression.
-------------------------
FASTER FILE DISTRIBUTION WITH HDFS AND S3
I look for faster ways of transferring files between HDFS and AWS S3.
-------------------------
A MINIMALIST GUIDE TO FLUME
I take a look at Apache Flume and walk through an example using it to connect Kafka to HDFS.
-------------------------
A MINIMALIST GUIDE TO FOUNDATIONDB
I take a short look at FoundationDB and walk through a leaderboard example using Python.
-------------------------
"ARCHITECTING MODERN DATA PLATFORMS" BOOK REVIEW
I review the Hadoop-focused book "Architecting Modern Data Platforms".
-------------------------
1.1 BILLION TAXI RIDES: 108-CORE CLICKHOUSE CLUSTER
I investigate how fast ClickHouse 18.16.1 can query 1.1 billion taxi journeys on a 3-node, 108-core AWS EC2 cluster.
-------------------------
CONVERT CSVS TO ORC FASTER
I compare the ORC file construction times of Spark 2.4.0, Hive 2.3.4 and Presto 0.214.
-------------------------
1.1 BILLION TAXI RIDES: SPARK 2.4.0 VERSUS PRESTO 0.214
I investigate how fast Spark and Presto can query 1.1 Billion Taxi Journeys using a 21-node EMR cluster.
-------------------------
WORKING WITH THE HADOOP DISTRIBUTED FILE SYSTEM
I explore several HDFS interfaces and compare them to the JVM-based Apache Hadoop HDFS CLI.
-------------------------
SYSTEMS MONITORING: TOP VS HTOP VS GLANCES
An examination and comparison of top, Htop and Glances; three tools for performing ad-hoc monitoring of systems and application performance.
-------------------------
WORKING WITH DATA FEEDS
This tutorial covers converting Wikipedia's XML dump of its English-language site into CSV, JSON, AVRO and ORC file formats as well as analysing the data using ClickHouse.
-------------------------
A MINIMALIST GUIDE TO MICROSOFT SQL SERVER 2017 ON UBUNTU LINUX
This tutorial covers importing CSV data into SQL Server 2017, automating data pipeline tasks via Apache Airflow and visualising data using Pandas and Jupyter Notebooks.
-------------------------
1.1 BILLION TAXI RIDES WITH SQLITE, PARQUET & HDFS
I investigate how fast SQLite can query 1.1 billion taxi journeys from Parquet files off of HDFS.
-------------------------
CUSTOMISING AIRFLOW: BEYOND BOILERPLATE SETTINGS
I walk through setting up Apache Airflow to use Dask.distributed, PostgreSQL, logging to AWS S3 as well as create User accounts and Plugins.
-------------------------
USING SQL TO QUERY KAFKA, MONGODB, MYSQL, POSTGRESQL AND REDIS WITH PRESTO
A guide to connecting to five different data stores using Presto.
-------------------------
PYTHON & BIG DATA: AIRFLOW & JUPYTER NOTEBOOK WITH HADOOP 3, SPARK & PRESTO
A guide to running Airflow and Jupyter Notebook with Hadoop 3, Spark & Presto.
-------------------------
1.1 BILLION TAXI RIDES: EC2 VERSUS EMR
I investigate how fast Spark and Presto can query 1.1 Billion Taxi Journeys using an i3.8xlarge EC2 instance with 1.7 TB of NVMe storage versus a 21-node EMR cluster.
-------------------------
HADOOP 3 SINGLE-NODE INSTALL GUIDE
A simple Hadoop 3 installation guide for Ubuntu 16 that includes Hive, Spark and Presto.
-------------------------
1.1 BILLION TAXI RIDES WITH BRYTLYTDB 2.1 & A 5-NODE IBM MINSKY CLUSTER
I investigate how fast BrytlytDB 2.1 can query 1.1 billion taxi journeys using five IBM Minsky servers with 20 Nvidia P100 GPUs.
-------------------------
1.1 BILLION TAXI RIDES WITH BRYTLYTDB 2.0 & 2 GPU-POWERED P2.16XLARGE EC2 INSTANCES
I investigate how fast BrytlytDB 2.0 can query 1.1 billion taxi journeys using two p2.16xlarge AWS EC2 instances.
-------------------------
A MINIMALIST GUIDE TO SQLITE
This tutorial covers importing CSV data into SQLite 3, manipulating data via Python and visualising data using Pandas and Jupyter Notebooks.
-------------------------
1.1 BILLION TAXI RIDES WITH SPARK 2.2 & 3 RASPBERRY PI 3 MODEL BS
I investigate how fast Spark 2.2 can query 1.1 billion taxi journeys using a cluster of three Raspberry Pis.
-------------------------
1.1 BILLION TAXI RIDES WITH BRYTLYTDB & 2 GPU-POWERED P2.16XLARGE EC2 INSTANCES
I investigate how fast BrytlytDB can query 1.1 billion taxi journeys using two p2.16xlarge AWS EC2 instances.
-------------------------
COMPILING MAPD'S SOURCE CODE
In this tutorial I walk through building MapD from source on an Ubuntu 16.04.2 machine.
-------------------------
1.1 BILLION TAXI RIDES WITH MAPD 3.0 & 2 GPU-POWERED P2.8XLARGE EC2 INSTANCES
I investigate how fast MapD 3.0 can query 1.1 billion taxi journeys using two p2.8xlarge AWS EC2 instances.
-------------------------
DETECTING BOTS IN APACHE & NGINX LOGS
I explore the task of bot detection in web traffic logs.
-------------------------
DOOM BOTS IN TENSORFLOW
I walk through using TensorFlow to train AI Bots to play Doom, a classic first-person shooter.
-------------------------
ANALYSING PETABYTES OF WEBSITES
I demonstrate how to extract analytical data from petabytes worth of websites collected by Common Crawl.
-------------------------
A REVIEW OF "DESIGNING DATA-INTENSIVE APPLICATIONS"
I review an early release of Martin Kleppmann's book "Designing Data-Intensive Applications".
-------------------------
1.1 BILLION TAXI RIDES ON CLICKHOUSE & AN INTEL CORE I5
I investigate how fast ClickHouse can query 1.1 billion taxi journeys on an Intel Core i5 4670K.
-------------------------
1.1 BILLION TAXI RIDES ON VERTICA & AN INTEL CORE I5
I investigate how fast Vertica Community Edition 8.0.1 can query 1.1 billion taxi journeys on an Intel Core i5 4670K.
-------------------------
1.1 BILLION TAXI RIDES ON AWS EMR 5.3.0 & SPARK 2.1.0
I investigate how fast an 11-node Spark 2.1.0 cluster can query over a billion records.
-------------------------
1.1 BILLION TAXI RIDES ON KDB+/Q & 4 XEON PHI CPUS
I investigate how fast kdb+/q can query 1.1 billion taxi journeys on 4 Intel Xeon Phi 7210 CPUs.
-------------------------
1.1 BILLION TAXI RIDES ON AMAZON ATHENA
I investigate how fast Amazon Athena can query 1.1 billion taxi journeys.
-------------------------
ALENKA: A GPU-DRIVEN, OPEN SOURCE DATABASE
I walk through installing, loading in data and querying Alenka.
-------------------------
1.1 BILLION TAXI RIDES WITH MAPD & 8 NVIDIA PASCAL TITAN XS
I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Pascal-based Titan X cards.
-------------------------
TENSORFLOW ON A GTX 1080
I walk through setting up TensorFlow, a Deep Learning Framework, on Ubuntu 16 with an Nvidia GTX 1080 and use it to build "Deep Fizzbuzz".
-------------------------
BUILDING A DATA PIPELINE WITH AIRFLOW
I walk through setting up a data pipeline for currency exchange rates using Airflow, PostgreSQL and Redis.
-------------------------
1.1 BILLION TAXI RIDES WITH MAPD & AWS EC2
I investigate how fast MapD can query 1.1 billion taxi journeys using 4 g2.8xlarge EC2 instances.
-------------------------
1.1 BILLION TAXI RIDES WITH MAPD & 4 NVIDIA TITAN XS
I investigate how fast MapD can query 1.1 billion taxi journeys using 4 Nvidia Titan X cards.
-------------------------
1.1 BILLION TAXI RIDES WITH MAPD & 8 NVIDIA TESLA K80S
I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Tesla K80 GPU cards.
-------------------------
1.2 BILLION TAXI RIDES ON AWS RDS RUNNING POSTGRESQL
I investigate how fast a series of graphs generated using R can be created across 4 different types of AWS RDS instances.
-------------------------
1.1 BILLION TAXI RIDES ON A LARGE REDSHIFT CLUSTER
I investigate how fast a 6-node ds2.8xlarge Redshift Cluster can query over a billion records.
-------------------------
ALL 1.1 BILLION TAXI RIDES ON REDSHIFT
I investigate how fast a single Redshift ds2.xlarge instance can query over a billion records.
-------------------------
ALL 1.1 BILLION TAXI RIDES IN ELASTICSEARCH
I look at ways of fitting every column of the 1.1 billion taxi rides into Elasticsearch on a single, 850 GB SSD.
-------------------------
50-NODE PRESTO CLUSTER ON GOOGLE CLOUD'S DATAPROC
I investigate how fast a 50-node Dataproc cluster queries the metadata of 1.1 billion taxi trips.
-------------------------
PERFORMANCE IMPACT OF FILE SIZES ON PRESTO QUERY TIMES
I investigate the performance impact of ORC file sizes on Presto query times using Google Cloud's Dataproc service.
-------------------------
FASTER IPV4 WHOIS CRAWLING
I examine the performance and reliability increases from using Redis across a 51-node IPv4 WHOIS crawling cluster.
-------------------------
33X FASTER QUERIES ON GOOGLE CLOUD'S DATAPROC
I look at speeding up Presto queries on 1.1 billion records run on a 10-node Dataproc cluster.
-------------------------
MASS IP ADDRESS WHOIS COLLECTION WITH DJANGO & KAFKA
I investigate how fast a cluster of EC2 instances can collect WHOIS records of IPv4 addresses.
-------------------------
A BILLION TAXI RIDES: AWS S3 VERSUS HDFS
I investigate the speed differences between S3 and HDFS when querying over a billion records using Presto on AWS EMR.
-------------------------
A BILLION TAXI RIDES ON GOOGLE'S DATAPROC RUNNING PRESTO
I investigate how fast a small Dataproc cluster can query over a billion records using Presto.
-------------------------
50-NODE PRESTO CLUSTER ON AMAZON EMR
I investigate how fast a 50-node AWS EMR cluster can query over a billion records using Presto.
-------------------------
A BILLION TAXI RIDES ON GOOGLE'S BIGQUERY
I investigate how fast BigQuery can query the metadata of 1.1 billion NYC taxi journeys.
-------------------------
BULK IP ADDRESS WHOIS COLLECTION WITH PYTHON AND HADOOP
I investigate how fast a 40-node Hadoop cluster on AWS EMR can collect WHOIS records of IPv4 addresses.
-------------------------
A BILLION TAXI RIDES IN POSTGRESQL
I look at query speeds on 1.1 billion records on a single PostgreSQL installation running on an SSD.
-------------------------
A BILLION TAXI RIDES IN ELASTICSEARCH
I investigate how fast a single instance of Elasticsearch can query over a billion records.
-------------------------
A BILLION TAXI RIDES ON AMAZON EMR RUNNING SPARK
I investigate how fast a small AWS EMR cluster can query over a billion records using Spark.
-------------------------
A BILLION TAXI RIDES ON AMAZON EMR RUNNING PRESTO
I investigate how fast a small AWS EMR cluster can query over a billion records using Presto.
-------------------------
KAFKA PRODUCER LATENCY WITH LARGE TOPIC COUNTS
I look at the relationship between topic counts and producer latency with Kafka.
-------------------------
A BILLION TAXI RIDES IN HIVE & PRESTO
Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into ORC-formatted, columnar-based files on HDFS and query them using Hive & Presto.
-------------------------
A BILLION TAXI RIDES IN REDSHIFT
Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into a columnar-based Data Warehouse.
-------------------------
PRESTO, PARQUET & AIRPAL
Using Airpal to execute queries on Parquet-formatted data via Presto.
-------------------------
A MILLION SONGS ON AWS REDSHIFT
Parallel imports of CSV data from AWS S3 into Redshift.
-------------------------
HADOOP UP AND RUNNING
I explore three ways to get Hadoop installed and running.
-------------------------
FASTER TESTING WITH RAM DRIVES
Reduce the I/O overhead of running tests in Django.
-------------------------
POPULAR AIRLINE PASSENGER ROUTES
Scraping 29K Wikipedia pages to find the most popular commercial airline passenger routes.
-------------------------
RECOMMENDATION ENGINE BUILT USING SPARK AND PYTHON
An end-to-end guide to building a film recommendation engine.
-------------------------
TIGHTENING DJANGO ADMIN LOGINS
A strategy for blocking dictionary attacks and restricting access to a white list of IP addresses.
-------------------------
LINTING UK POSTCODES
Parsing and linting UK postcodes is rife with edge cases.
-------------------------
PASSWORDS IN DJANGO
A review of Django auth's password storage format and password storage upgrading capabilities.
-------------------------
FASTER PYTHON
Six tips for speeding up Python code.
-------------------------
CRUSHING, CACHING AND CDN DEPLOYMENT IN DJANGO
A strategy for crushing, caching and deploying front-end-optimised Django sites.
-------------------------
BETTER PYTHON PACKAGE MANAGEMENT
Python's most popular package management tool is pip. I explore some tools to increase its functionality.
-------------------------
LOAD BALANCING DJANGO
Set up a load-balanced, two-node Django cluster with a minimal Ansible footprint.
-------------------------
FASTER DJANGO TESTING
Run Django tests concurrently with pytest-xdist.
-------------------------
DJANGO EXCEPTION ARCHAEOLOGY
How to capture, monitor and analyse exceptions raised from a Django project.
-------------------------
PYTHON'S KILLER APPS FOR BLOGGING: PELICAN AND S3CMD
I look into the steps of creating a blog using Pelican and hosting it with low-cost CDN services from Amazon with the help of S3cmd.
-------------------------
COLLECTING ALL IPV4 WHOIS RECORDS IN PYTHON
An exploratory effort to see how hard it is to collect all IPv4's WHOIS records.
-------------------------
FORMER PHP DEVELOPER
I stopped coding in PHP in 2011; here are the thoughts that led me to that decision.
-------------------------
FILE UPLOADS TO AMAZON S3 IN DJANGO
How to upload files to Amazon S3 from a form in Django as well as (very important) how to test the upload process.
-------------------------
IP ADDRESS LOOKUPS USING PYTHON
A comparison of four methods used to find the country of an IP address.
-------------------------
DJANGO SPEAKING JSON
django-jsonview offers a method decorator which will cause all responses (including exceptions) to return in API-friendly JSON format.
-------------------------
QUERYING ELASTICSEARCH FROM GOOGLE APP ENGINE
GAE strips HTTP body payloads if sent via HTTP GET. Elasticsearch accepts request bodies sent via HTTP GET. Re-writing the HTTP verb fixes the communication problem.

Copyright © 2014 - 2019 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.