Data, data everywhere but…
Information, known to scientists as ‘data’, is increasing in amount and accessibility, but making sure it is high quality and useful isn’t easy, say Anja Gassner, Ric Coe, Jason Donovan, Tor-Gunnar Vågen, Constance Neely and Eike Luedeling
More data is available now than ever before, from online sources such as social media to repositories such as Dataverse at Harvard University (which contains data sets from the World Agroforestry Centre) and those held by the United Nations Conference on Trade and Development, the Food and Agriculture Organization and the World Bank. We have entered the era of Big Data, which is the term used to describe a ‘collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data-processing applications’ (Wikipedia).
The issue of data management and analysis in this new era of abundant information was examined critically by researchers Anja Gassner, Ric Coe, Jason Donovan, Tor-Gunnar Vågen, Constance Neely and Eike Luedeling in a keynote address at the annual Science Week, 9–13 September 2013, at the World Agroforestry Centre’s headquarters in Nairobi, Kenya.
According to the research team, the last few years have seen rapid developments in technology, computational platforms and resources, and laboratory analytical methods that make it possible to generate and access large data sets and then apply ‘intelligent data analysis’. Consequently, scientists can do a lot with the data available, for example, mapping the health of entire ecosystems; tracking deforestation in the Amazon; and using multiple data sets to model possible scenarios that can be successfully matched against ‘real world’ situations.
However, these analytical tools are technically complex and powerful, so much so that they risk becoming ends in themselves, with people analysing data they know nothing about. Every data set has its limitations and the analyst has to understand those. There is also the problem of ‘data for data’s sake’, that is, collecting a lot of data that isn’t of high quality or potentially useful, creating a ‘garbage in = garbage out’ effect. But it was also pointed out that ‘garbage in’ might result in ‘compost out’: in other words, finding that the available data is ‘rubbish’ might prompt a greater effort to change things. The coming of Big Data has shifted much of statistical analysis from testing hypotheses and quantifying uncertainty to discovering patterns. But there is always a danger of spotting patterns where there are none, something the human brain is rather good at.
Can more data simply be too much data? The human brain generally isn’t good at making decisions based on excessive information: too much data disables decision-making. Humans tend to base their decisions on a myriad of inputs and internal and external drivers rather than on a rational decision framework. Rationalizing tends to trump rationale. This is particularly important given that the results of agricultural data analyses produced by researchers are often aimed at decision-makers, whether they be farmers deciding what to plant or government officers preparing spatial land-use plans for entire landscapes. The way information is presented to these decision-makers is critical. Information for decision-making must be readily understood and must enhance awareness, allowing decision-makers to gain insights into complex issues, query the data and better understand the risks, uncertainties and trade-offs associated with solutions.
While reducing the amount of information might be necessary for clear thinking, over-simplification can be misleading. For example, a headline such as ‘Maize grown with Gliricidia can result in increased yields of up to 50%’ focuses on the success of one farmer in a trial involving 52, ten of whom had negative results while the rest improved yields by 10–20%. In this case, closer examination revealed that the sensational claim in the headline, whilst true, could mislead decision-makers into thinking that the particular agroforestry system would consistently produce such yields in every situation, something that couldn’t be supported by the evidence.
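The gap between the headline figure and the typical result can be made concrete with a few lines of Python. The sketch below is illustrative only: the individual yield changes are invented to match the counts given in the example (one farmer at +50%, ten with negative results, 41 between +10% and +20%), and the size of the losses (-5%) is an assumption, since the example does not state it. The name `summarize` is likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical yield changes (%) for the 52 farmers in the example.
# Only the group sizes come from the text; every individual value
# below is an illustrative assumption, not trial data.
best_case = [50.0]                                # the single 'headline' farmer
negative = [-5.0] * 10                            # assumed size of the losses
moderate = [10.0 + (i % 11) for i in range(41)]   # spread across 10..20
yield_changes = best_case + negative + moderate


def summarize(changes):
    """Return a few statistics that tell different stories about the same data."""
    return {
        "n": len(changes),
        "max": max(changes),          # what an 'up to' headline reports
        "median": median(changes),    # what a typical farmer experienced
        "mean": round(mean(changes), 1),
        "share_negative": round(sum(c < 0 for c in changes) / len(changes), 2),
    }


summary = summarize(yield_changes)
print(summary)
```

Under these assumed values, the maximum is the only statistic that resembles the headline; the median and mean both sit within the 10–20% band, and roughly one farmer in five ends up worse off.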
In keeping with the changes in the amount, management and analysis of data, the team noted that the role of scientists working in development, such as those at the World Agroforestry Centre, is changing. Previously, funding contracts and performance evaluations focussed strongly on achieving publication in high-impact academic journals. Fortunately, there has been a rediscovery that data, methods and ideas, the real currency of research and scientific knowledge, are important instruments for accelerating the impact of research for development. Now, World Agroforestry Centre scientists are increasingly using their skills, and their strategic advantage of being well integrated with farming communities, to create well-designed data sets for others to use.
The CGIAR—the global research partnership for a food-secure future of which the World Agroforestry Centre is a member—is moving to ‘open-access’ data, meaning that CGIAR data sets will be made publicly available in repositories that are compliant with the Open Archives Initiative. The World Agroforestry Centre launched its new online data portal in 2011.
While doing so undoubtedly adds to the sum total of intellectual public goods, the move will put more pressure on the gatherers, compilers, analysts and keepers of the data to ensure that their methods are rigorous and their results appropriately targeted at the people who want them.
Edited by Robert Finlayson
The work of the World Agroforestry Centre is closely linked to the CGIAR Research Program on Forests, Trees and Agroforestry