The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals

John Azzolini (Clifford Chance US LLP, New York, New York, USA)

The Electronic Library

ISSN: 0264-0473

Article publication date: 2 November 2015

Citation

John Azzolini (2015), "The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals", The Electronic Library, Vol. 33 No. 6, pp. 1196-1198. https://doi.org/10.1108/EL-07-2015-0126

Publisher: Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited


Like cloud computing and knowledge management, “big data” is a buzzword seemingly disengaged from everyday understanding. It is bandied about in mainstream news stories, a growing number of books are devoted to its implications and industry reports highlight it as a trend too important to ignore. However, it is seldom pulled down to earth and given conceptual clarification for use by educated laypeople. Perhaps this is because of its most daunting intimation: immense networked data sets and complex algorithmic processes, expanding at an unrelenting rate, intelligible only to data scientists with orthodox credentials.

To clear away misconceptions and offer practical insights into the limits and possibilities of big data, Amy Affelt has written this book. The author strives to facilitate librarians’ swift entry into this area by underscoring noteworthy developments, introducing its distinct terms and industry players and giving a broad sample of what she calls “situation-based research problems” where big data is being fruitfully applied.

The first chapter covers the varied understandings and characteristics of big data, as it is portrayed by both popular surveys and business insiders. Some commentators see it as the data itself, such as raw social media content or the personal information collected by government agencies. Others define it as the sources that generate the data or the technology that allows the data to be collected and analyzed. In addition to laying out the conceptual basics, the author touches on the costs of big data (the requisite storage platforms, staff and office space) and its attendant copyright ambiguities.

The leading platform providers and frameworks are outlined in Chapter 2. These include Hadoop, MapReduce, Splunk and Emerald Logic. Several real-world applications of these technologies give the reader an idea of how institutions are using them to significant effect. A glossary and a good selection of current awareness websites and blogs are also provided.

Chapter 3 calls attention to the qualities that make a librarian an excellent fit for big data roles. Among these are the provision of current and authoritative knowledge in advancing the goals of their organization and the use of best practices in maintaining the delivery of this business-relevant information. Of course, librarians must continually demonstrate and market this often taken-for-granted value. Becoming integral members of big data project teams would make this value more apparent to management.

Chapter 4 addresses the fear one might have that big data is something so new and unrelated to more traditional types of information that it is beyond the grasp of librarians. User-friendly data visualization models and predictive algorithms such as Google Fusion Tables, Infogr.am and Statwing are outlined to show how approachable big data tools can be.

The dedicated mining and analyzing of big data by private industry and public agencies are underscored in the next chapter. The range of domains in which these innovative steps are being taken is remarkable: healthcare, transportation, entertainment, law enforcement, atmospheric science and politics. Chapter 6 drives home the opportunities to be had at an individual level by reiterating the unique abilities information professionals possess, with context awareness and validity checking being some of the most estimable.

To support her conviction that evidence-based practices in information management are essential, Affelt advances her own Big Data Communications Framework in Chapter 7. This is a six-step process of proficient data usage that consists of disciplined problem understanding, impact measurement, source discovery, value assessment, hypothesis formation and the communication of impact. She then gives several reference scenarios where this framework could be used by librarians to boost their big data credibility.

The final chapter gauges the big data job market, advising librarians on what skills in their repertoire they should highlight to impress hiring managers seeking qualified team members. It briefly reviews hiring trends and illustrates the diversity of existing opportunities by providing a table of representative data scientist positions and their responsibilities.

The Accidental Data Scientist’s intended readers are librarians who lack the sophisticated computer backgrounds of programmers but possess the analytical and creative skills upon which big data comprehension could be built. The aim is to convince these non-technical professionals and the knowledge-intensive organizations employing them that they are ideal candidates to take on larger roles in big data projects. In the service of such an objective, the book sometimes takes on the tone of promotional material, with librarianship being marketed by a vested professional. However, this boosterism seems like a natural approach for a committed guide hoping to encourage the hesitant to set out on a new and potentially rewarding career path. Clearly expressed and backed by evident learning, its guidance is certainly worth following.
