The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals

Sue Weddell (University of Otago Library, Dunedin, New Zealand)

Library Management

ISSN: 0143-5124

Article publication date: 10 August 2015

Citation

Sue Weddell (2015), "The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals", Library Management, Vol. 36 No. 6/7, pp. 547-549. https://doi.org/10.1108/LM-06-2015-0043

Publisher: Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited


The following extract from Amy Affelt's November 2012 blog post, Big Data=Big Opportunity (http://web.freepint.com/go/blog/69516), provides the background that forms the basis of this new title.

“Harvard Business Review listed ‘data scientist’ as the ‘sexiest job of the 21st century’. Librarians and information professionals are data scientists, as well as experts in finding, evaluating and transforming data and information into insightful deliverables that enable strategic decision making. If we gain a basic understanding of how Big Data can be used to solve problems in the industries in which we work, we can gain a place at the table when Big Data initiatives are unveiled in our organisations”.

Three years on, “Big Data” initiatives are springing up in a number of organisations, and it has suddenly become a major theme discussed across many forums: the retail and marketing sectors want to use the rich data that comes from social media and other sources for competitive advantage; academics and researchers realise the potential in storing and making raw data available but struggle with the practicalities, such as data management plans and the storage of large amounts of data; and the fallout from WikiLeaks continues to affect national security, privacy and surveillance. The author’s strong belief that there are opportunities to be had for those librarians and information professionals who want to move out of their comfort zone and push their personal boundaries has resulted in the publication of this new title.

Thomas Davenport, who along with Larry Prusak introduced us all to knowledge management a couple of decades ago, writes in the foreword (p. xvi):

[…] “there is little doubt that if information professionals want to play a significant role in the Big Data era, they will have to expand their roles and skills. They will have to become comfortable with the ideas presented in The Accidental Data Scientist and other sources on Big Data and analytics. They may have to get their hands dirty by actually doing some data structuring, filtering, cleaning or analysis. Those who take these steps can play valuable roles on teams that work with Big Data. They may even become data scientists – either accidentally, or with intent”.

The aim of this book is to take the reader on a journey, beginning in the introduction with a checklist designed to help readers decide whether or not the journey is one they want to take. The author also points out that the profession has largely survived “the disruptions of the ubiquity of internet searching, and late starts in working in knowledge management, records and digital asset management” and that the “same will be true of our roles in the Big Data explosion” (p. 9).

Chapter 1 sets the scene by providing essential background on the whole concept of Big Data, exploring definitions and the huge increase in data available via social media, and concluding with the five V’s of Big Data. The Gartner Group views “Big Data” as having three characteristics: volume (sheer amount of data collected), velocity (speed at which the data are being created, transferred, delivered and collected) and variety (huge variation in shape, format and type). The author expands these with two more for librarians and information professionals: verification (the process involved in analysing data sources and retrieval systems to determine data quality) and value (deriving true value is challenging, expensive and risky).

It is important for librarians and information professionals to understand the technical side of working with Big Data, and Chapter 2 provides an overview of the process by which raw data is transformed by computer programmers and of the terminology used. A very useful Big Data glossary of terms is provided, as well as suggestions for keeping up with new developments in the field.

The twenty-first century librarian’s skill set is the theme of Chapter 3, which covers such areas as the on-going debate about the use of the word “librarian” and the SLA Alignment Project. The project concluded that the three critical contributions that librarians and information professionals make to stakeholders are that they have specialist knowledge regarding appropriate information resources; can provide assurance that information is current and comes from a reputable source; and can exhibit best practice in the most efficient use of these resources.

Chapter 4, “Dipping a toe in the water”, sets out to take the mystery out of dealing with Big Data. Using different examples, the author discusses how many companies use customer-generated data, in particular one well-publicised incident involving Target Corporation that became associated with the accusation of spying on customers. The most valuable information in this chapter is, however, that provided on data visualisation tools and predictive algorithms, including Google Fusion Tables, Infogr.am, Text is Beautiful, Statwing, Tableau Public and BigML.

Big Data applications and initiatives by industry is the theme of Chapter 5, and each of the following sectors is covered with examples: healthcare, transportation, entertainment, legal, law enforcement, atmospheric science and politics. This chapter leads nicely to the next, entitled “Big Data projects for info pros”, which begins by looking at examples of some Big Data projects and briefly touches on patterns vs predictions in Big Data analysis. There is a useful but very brief mention of the types of roles available to librarians and information professionals (curator, data cleanser and archive manager), and the real-life example of a data librarian is an excellent addition. Overall, however, this chapter is not very well structured and seems to lose its way with the addition of comments on embedded librarianship, among other things, which, while interesting, are not really necessary and disrupt the flow.

In Chapter 7, the author introduces a number of scenarios which follow a step-by-step brainstorming process for which she has developed a template. Called the Big Data Communications Framework, it consists of six steps which should be taken when considering the use of data within a research project:

  1. understand the problem;

  2. determine impact measurements (“the data”);

  3. discover available data (“the sources”);

  4. decide which data is more valuable;

  5. formulate hypotheses (“the hypothesis”); and

  6. communicate the impact of results.

Each of these steps is expanded upon, and then various scenarios using the framework are presented. Of all the chapters in this book, this one is, in my opinion, the most valuable because it provides a framework that can be used by anyone who provides research support, whether in a special library or an academic library.

The final chapter takes a look at career opportunities and includes a table listing Big Data jobs as well as an interview with a Big Data LIS Professor about the type of courses on offer; skills needed to work with Big Data; the opportunities for jobs; and advice for those interested in working in the field. It should be noted, though, that this chapter focuses mainly on the situation in North America, which unfortunately limits its usefulness.

In academic libraries in particular, e-research and all that this encompasses is now a major focus: data management plans, the digital humanities, data visualisation, Big Data, and high-performance computing and data storage now form part of what we do. As a manager of subject librarians grappling with supporting academics not only in their teaching and learning role but also in their research, I had high hopes for this book, which I envisaged could not only be used as a reference tool but also provide direction for those who might be keen to develop their career along this path. Unfortunately, those hopes have not been realised. One reason, perhaps, is that it has been written for a North American audience, but I think the real problem is that the book does not flow in a logical sequence, even though it starts off well in the first two chapters; some rearrangement of chapters might have improved this. Because of this, the original aim of the book, which is to provide useful background information to librarians who may wish to embrace the world of data management and analysis, is to some degree lost.

One thing we should always remember as librarians is, as the author notes in her conclusion and I concur, that “Librarians need to continually remind others that no matter what the buzzword of the moment is we always bring our traditional skills and capabilities to every research situation” (p. 213).
