I was on GitHub looking at something else (as always) when I saw someone I follow forking an R repository called Polite. Dmytry’s markdown explaining the pillars of politeness is well written and absolutely applicable to the wider corners of scraping that a lot of analysts do using tools like Alteryx.
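As a rough illustration of what asking permission can look like in practice, here is a minimal Python sketch (not the Polite package itself, which is R) that reads a robots.txt file before fetching anything. The robots.txt content, the `my-analyst-bot` user agent, the `example.com` URLs and the `check_permission` helper are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def check_permission(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, crawl_delay) for a URL according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


# A page outside the disallowed path, and one inside it (both URLs are placeholders).
allowed, delay = check_permission(
    ROBOTS_TXT, "my-analyst-bot", "https://example.com/data/page1"
)
blocked, _ = check_permission(
    ROBOTS_TXT, "my-analyst-bot", "https://example.com/private/report"
)
```

Against a real site you would load the live file with `set_url()` and `read()` instead of the inline string, and wait for the crawl delay between requests.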
Web scraping can range from wildly hard (purposefully or otherwise) to just a notch below an open API for extracting data. However, there are often some hints you can use to check whether a particular site will be one of the easy ones or one of the hard ones.
On Wednesday my esteemed colleague @liluns and I went to the AWS Summit in London, which is several thousand people crammed into one end of the ExCeL Centre in the Docklands.
There’s a lot of data where what a human sees as one observation (e.g. one survey) isn’t ideally stored as one observation for the query language of a database system.
From just getting it out the door to ensuring the doors are all custom, let’s take a look at the different ways to generate them.
You’re not really meant to say this when you are (or have been) a data analyst/scientist/whatever, but I have limited patience/tolerance for the reformatting and cleaning of data.