By Rebecca Radcliff in the Guardian
Source: UNESCO Institute for Statistics (UIS). Data from 2011 is used for Canada, South Africa, Malaysia, Algeria, Barbados, Botswana, Brazil, Cameroon, Chad, Democratic Republic of Congo, Gambia, Greece, Iceland, Israel, Liechtenstein, Mali, Mozambique, Myanmar, Nepal, Oman, Uganda, Uzbekistan and Yemen. 2010 figures are used for Egypt, China and Morocco. All other statistics are taken from 2012. Where countries are grey, no data is available.
The statistics refer to students who have crossed a national border to study, or are enrolled in a distance learning programme abroad. These students are not residents or citizens of the country where they study. Part-time and full-time students, both undergraduate and postgraduate, are included.
Students on short-term, for-credit study and exchange programmes that last less than a full academic year are not included.
Read the full story Here
By Adam Crymble in StatsLife
Family history giant Ancestry.com claims to have digitised 12.7 billion records that document an element of millions if not billions of individual lives. Maybe a marriage, or a birth, or an arrest, or a discharge from the military. These are what historians call ‘life events’. Unless you’re related to one of these people, or the person happens to be a notable figure, chances are you wouldn’t care about them all that much. You don’t have to feel bad – our collective descendants won’t care about us either.
For data historians, though, the individual lives can be aggregated with others to give us a view of whole populations that lived in the past. Through these billions of records we can look for emerging patterns that change over time, as societies evolve, cities grow or shrink, and the age structure gets older or younger, as the case may be. These individual lives, seen through documentary fragments in libraries and archives, and digitised to sell to family historians via subscriptions, offer us a chance to see the big picture of history like never before. This is what Katy Börner calls the ‘macroscopic’ view, which lets us ‘observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend’.
Read More Here
By Mahendra Mahey in BL Digital Scholarship Blog
The second annual British Library Labs Symposium on Monday 3rd November, 2014, opened with Professor Tim Hitchcock giving a keynote speech focusing on ‘Big and small data in the humanities’.
The video is available on YouTube.
Read More Here
By Greg Millar in Wired
Read More Here
By Limor Peer in the Institution for Social and Policy Studies blog
Who is responsible for the quality of data deposited in repositories? And what is quality data, anyway?
These questions were on my mind as I was preparing to present a poster at the Open Repositories 2013 conference in Charlottetown, PEI earlier this month. The annual conference brings the digital repositories community together with stakeholders, such as researchers, librarians, publishers and others to address issues pertaining to “the entire lifecycle of information.” The conference theme this year, “Use, Reuse, Reproduce,” could not have been more relevant to the ISPS Data Archive. Two plenary sessions bookended the conference, both discussing the credibility crisis in science. In the opening session, Victoria Stodden set the stage with her talk about the central role of algorithms and code in the reproducibility and credibility of science. In the closing session, Jean-Claude Guédon made a compelling case that open repositories are vital to restoring quality in science.
My poster, titled, “The Repository as Data (Re) User: Hand Curating for Replication,” illustrated the various data quality checks we undertake at the ISPS Data Archive. The ISPS Data Archive is a small archive, for a small and specialized community of researchers, containing mostly small data. We made a key decision early on to make it a “replication archive,” by which we mean a repository that holds data and code for the purpose of being used to replicate and verify published results.
The poster presents ISPS Data Archive’s answer to the questions of who is responsible for the quality of data and what that means: We think that repositories do have a responsibility to examine the data and code we receive for deposit before making the files public, and that this data review involves verifying and replicating the original research outputs. In practice, this means running the code against the data to validate published results. These steps in effect expand the role of the repository and more closely integrate it into the research process, with implications for resources, expertise, and relationships, which I will explain here.
First, a word about what data repositories usually do, the special obligations reproducibility imposes, and who is fulfilling them now. This ties in with a discussion of data quality, data review, and the role of repositories.
Read More Here
Matt Daniels examines the vocabulary of hip hop artists in this fascinating article:
Last month, I wrote about the fun and the pitfalls of viral maps, a feature that included 88 super-simple maps of my own creation. As a follow-up, I’m writing up short items on some of those maps, walking through how I created them and how they succumb to (and hopefully overcome) the shortfalls of viral cartography.
One of the most interesting data sets for aspiring mapmakers is the Census Bureau’s American Community Survey. Among other things, that survey includes a detailed look at the languages spoken in American homes. All the maps below are based on the responses to this survey. For instance, Mandarin, Cantonese, and other Chinese dialects are separated as different responses in the data and were treated as different languages when constructing these maps. If those languages had been grouped together, the marking of many states would change. In addition, Hawaiian is listed as a Pacific Island language, so following the ACS classifications, it was not included in the Native American languages map. The spelling of each language is based on the language of the ACS.
Read the rest here