I was on GitHub looking at something else (as always) when I saw someone I follow forking an R repository called Polite. Dmytry’s markdown explaining the pillars of politeness is well written and absolutely applicable to the wider world of scraping that a lot of analysts do with tools like Alteryx.
Recently, I took on the mantle of teaching data ethics and security to colleagues who join my company. The training runs over a whole day, but I still feel there’s so much I can’t cover. Making some tea this morning, I realised that one of the really important topics is one that comes up under the radar at work a lot: the ethics of overtime.
Web scraping can range from wildly hard (purposefully or otherwise) to being just a notch below an open API for extracting data. That said, there are often hints you can use to check whether a particular site will be one of the easy ones or one of the hard ones.
There’s a lot of data where one observation to a human (e.g. one survey response) shouldn’t ideally be one observation to the query language of a database system.
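To make that idea concrete, here is a minimal sketch (my own illustration, not code from the post) of reshaping a hypothetical survey stored as one wide row per respondent into long form, with one row per answer. The field names `respondent_id` and `q1` to `q3`, and the answers themselves, are made up for the example.

```python
from collections import Counter

# Hypothetical survey data in "wide" form: one record per respondent,
# which is how a human tends to think of a single filled-in survey.
wide = [
    {"respondent_id": 1, "q1": "Agree", "q2": "Disagree", "q3": "Neutral"},
    {"respondent_id": 2, "q1": "Neutral", "q2": "Agree", "q3": "Agree"},
]


def to_long(rows, id_field="respondent_id"):
    """Turn one-record-per-survey rows into one-record-per-answer rows."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the ID is repeated on every answer, not melted
            long_rows.append(
                {id_field: row[id_field], "question": key, "answer": value}
            )
    return long_rows


long_form = to_long(wide)

# In long form, "how many people agreed with each question?" is one
# filter-and-count, rather than a check across every question column.
agree_counts = Counter(r["question"] for r in long_form if r["answer"] == "Agree")

for record in long_form:
    print(record)
print(agree_counts)
```

The long layout is roughly what a database query language such as SQL expects when you want to `GROUP BY` question or answer; adding a new question adds rows rather than a new column.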
You’re not really meant to say this when you are (or have been) a data analyst/scientist/whatever, but I have limited patience for reformatting and cleaning data.
The story of The Warming Stripes needs little in the way of explicit direction, which is also why it is so adaptable to odd media (like ties and earrings).
I haven’t felt compelled to write on this blog in some time, partly for lack of time and partly out of having …
Yesterday in The Hague, Radovan Karadzic was convicted and sentenced to 40 years in prison for his role in the Bosnian war. What some people may not know is that statistical analysis of migration movements and killings was one of the ways external observers demonstrated that there was a systematic campaign of ethnic cleansing in Serbia.