Open Entomology: Tips and Tools for Better Reproducibility in Your Research
By Jake Wittman
Reproducibility is a hot topic in today’s scientific world. Chances are, you’ve come across mentions in news outlets or social media sites of the “reproducibility crisis” in the medical and social sciences. These reproducibility issues have led to a movement to make science more open, especially with respect to how we handle our data, carry out our analyses, produce our results, and report our findings. By being more transparent about how we have carried out our work, the hope is that we will make our work more reproducible.
Many tools are available to make our work more reproducible, and I outline several in more detail in my paper, “A Guide and Toolbox to Replicability and Open Science in Entomology,” published in May in the open-access Journal of Insect Science. The article is part of a special “Open Entomology” group of papers published in the journal. I cowrote it with my advisor, Brian Aukema, Ph.D., of the University of Minnesota because there does not seem to be much open science communication targeted at the entomology community.
Open science practices and tools exist to make it easier for other people to pick up our work and see how we did it, which has the side effect of being beneficial to us individually! There is a common adage uttered in many statistics courses that captures this sentiment: “Your most important collaborator is you 6 months from now, and past you doesn’t answer emails.” I can’t tell you how many times, at the start of my graduate work, I had to spend a few hours reacquainting myself with old data or analyses. If I had been aware of the open science movement and all the tools and practices available to me, I could have saved myself many headaches. Below are a few ways you can save yourself a headache while simultaneously making your work more open and reproducible.
Tips for Open Science
Curate your data. Make sure you document from the start how you will collect all your data and how you will record your data, and be consistent. Make clear the units you are using when measuring a variable. Data points that may be especially problematic are recordings of no observation, especially when you’re reviewing your data three months after it was collected. If you’ve ever encountered a blank cell in a spreadsheet and had to determine if the cell is blank because there was no data to collect or someone just made a mistake, then you’ll understand the importance of deciding to write “NA” when there is no data to collect. Documenting these decisions clearly and keeping that documentation close to where the data is stored (e.g., another tab in your spreadsheet) will make your life easier and make it clearer should anyone else want to look at your data.
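As a rough illustration of the “NA” convention, here is a minimal Python sketch that audits one column of a data sheet and separates deliberate “NA” entries from unexplained blank cells. The column names (trap_id, date, moth_count), the sample rows, and the audit_cells function are all hypothetical, made up for this example rather than taken from any real dataset.

```python
import csv
import io

# Hypothetical field sheet (invented for illustration): "NA" means there was
# no data to collect, while an empty cell is an unexplained gap to follow up.
RAW = """trap_id,date,moth_count
T1,2019-06-01,4
T2,2019-06-01,NA
T3,2019-06-01,
"""

def audit_cells(text, column):
    """Sort each row's value in `column` into recorded, NA, or blank."""
    report = {"recorded": [], "na": [], "blank": []}
    for row in csv.DictReader(io.StringIO(text)):
        value = row[column].strip()
        if value == "":
            report["blank"].append(row["trap_id"])      # ambiguous: mistake or no data?
        elif value == "NA":
            report["na"].append(row["trap_id"])         # documented "no data to collect"
        else:
            report["recorded"].append(row["trap_id"])   # an actual measurement
    return report

report = audit_cells(RAW, "moth_count")
print(report)
```

Running a check like this right after data entry, while you still remember the field day, makes the blank cells much easier to resolve than they will be three months later.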
Use a tool like OpenRefine or a programming language like R to document your data processing steps. Documenting your data processing will save you time if you collect more data and need to repeat the same steps, and it makes it simple for others to see what you did. For my master’s degree, I studied gypsy moth caterpillar movement. Much of the data I collected for one of my studies consisted of locations of gypsy moth caterpillars moving in an open environment. To analyze my data, I had to derive variables like distance and direction traveled from those locations. I spent the week following my first field season calculating these variables in my spreadsheet. By the time my second field season ended, I was much more comfortable in R and was able to write some code to process this data in less than an hour. Not only did it save me almost an entire week of work, but, by commenting my code thoroughly, I created detailed documentation of all the steps I took to derive those variables.
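To give a sense of what such a script can look like, here is a minimal sketch of deriving distance and direction traveled from a sequence of locations. It is written in Python rather than R and is not the code from my project. It assumes locations are planar (x, y) coordinates in meters, with x pointing east and y pointing north; the step_metrics function and the sample track are hypothetical.

```python
import math

def step_metrics(points):
    """Return (distance, bearing_degrees) for each consecutive pair of points.

    Assumes planar (x, y) coordinates in meters, x pointing east and y north.
    Bearing is measured clockwise from north, in the range [0, 360).
    A zero-length step gets a bearing of 0 by convention of atan2(0, 0).
    """
    steps = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        distance = math.hypot(dx, dy)                      # straight-line step length
        bearing = math.degrees(math.atan2(dx, dy)) % 360   # compass-style direction
        steps.append((distance, bearing))
    return steps

# Made-up track of three recorded positions for one caterpillar.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 1.0)]
steps = step_metrics(track)
for distance, bearing in steps:
    print(f"moved {distance:.2f} m at bearing {bearing:.1f} degrees")
```

Because the calculation lives in a commented function instead of spreadsheet formulas, rerunning it on a second field season is a matter of pointing it at the new locations.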
Make your supporting documents like data and code files available online in a repository after publication. This may not save you any headaches immediately, but it will certainly come in handy if anyone ever requests your data or the code for your project. Uploading your supporting files to a repository like Zenodo is a safeguard against losing or misplacing those files in the future. Some repository platforms even let you have private repository space, so you can upload your files as you go to protect against computer crashes or theft. It is becoming more common for documents like data and analysis files to be required in a repository at the time of submission, which makes our research more widely available and more open. As a bonus, these repositories will assign your research products a digital object identifier (DOI), allowing them to be cited if others use them!
Pick up one or two new things at a time; don’t try to do it all. I know that, as researchers, we’re all busy people. It can be hard for us to find the time to learn something new that isn’t directly related to our work, especially if we already know one way to do a task (like making a graph in Excel versus making a graph in R). With each project, I try to make it a priority to add one new tool or process to my workflow that will increase the openness and reproducibility of my research.
Open Entomology in Action
Open science is likely to become the norm moving forward, as granting agencies begin to push for more stringent reproducibility standards. It will be important for current graduate students to develop their open science toolbox as part of their training. It is also likely that we will see open research increasingly highlighted through special issues like the Open Entomology collection in the Journal of Insect Science. You can see examples of some of the tips I’ve highlighted in the articles in this issue.
- In their article “Modelling the Putative Ancient Distribution of the Coastal Rock Pool Mosquito Aedes togoi,” Peach and Matthews use species distribution models with current and past climate data to identify suitable habitat for an Aedes mosquito, which they use to speculate whether it is indigenous or invasive to North America.
- In another article, “Probing Behavior of Diaphorina citri (Hemiptera: Liviidae) on Valencia Orange Influenced by Sex, Color, and Size,” Ebert and Rogers describe experiments they performed to elucidate the roles sex, color, and size play in the feeding behavior of an invasive psyllid that vectors a disease of citrus trees.
- Lastly, in “Mating disruption of Chilo suppressalis from sex pheromone of another pyralid rice pest Cnaphalocrocis medinalis,” Liang and colleagues describe how the sex pheromone of one pest moth in rice inhibits the catch of another pest moth when the two sex pheromones are used together in traps.
All these articles were published according to open science guidelines, so you’ll be able to look through their supporting research documents yourself. Look and maybe you’ll find inspiration for a project of your own, learn something new about their analysis techniques, or come up with a great idea for a collaboration.
Jake Wittman is a Ph.D. student in the Department of Entomology at the University of Minnesota. Email: firstname.lastname@example.org.