How Systematic Entomology Will Thrive in the Age of Artificial Intelligence
By Jiri Hulcr, Ph.D., Andrew J. Johnson, Ph.D., and G. Christopher Marais
In a recent New York Times profile, Mauricio Diazgranados, the new director of the New York Botanical Garden, shared a message that many scientists grapple with: “We just cannot keep doing the science as we are used to doing. Can we keep going into the field, bringing in and describing new species while the whole world is tearing apart and burning?”
Does Mauricio’s message resonate with entomologists? Our own field, systematic entomology, often struggles with this question: Should we focus on providing solutions to contemporary issues, or should we keep documenting the world’s biodiversity? By spending our careers documenting insect species and their relationships instead of solving the world’s immediate problems, are we potentially threatening the survival of our field?
We believe that history shows the answer: Documenting biodiversity is here to stay as one of the foundations of biological sciences. What we need is to keep evolving how we do it. Our survival will be a result of our ability to retool.
Systematists Have Repeatedly Evolved
A few decades ago, traditional systematic entomology was derided as stamp collecting, an outdated field soon to be replaced by what were then the new approaches: cladistics, phylogenetics, and studies of evolutionary processes. As it turns out, systematics as a whole survived and is thriving, while cladistics not so much.
A decade or two later, molecular biology took the field by storm. Taxonomy was once again predicted to go extinct, and taxonomists were deemed a dying breed. Yet, as we filled GenBank with millions of DNA sequences, we found ourselves unable to place them in a meaningful context, or even label them with names. Instead of going extinct, experts familiar with organisms became a hot commodity, and new programs at the National Science Foundation increased funding for documenting biodiversity.
Systematics has survived and thrived because each new generation of systematists has embraced the new tools that time has brought.
How Systematics Can Become “Machine Intelligible”
What’s hot now? Artificial intelligence (AI). Language models and image recognition may be the next disruption in biodiversity documentation. When iNaturalist is in everyone’s pocket, it is time to assess what systematic entomology needs to do to keep itself relevant, well-funded, and thriving.
Our lab has toyed around with machine learning, which has resulted in two outcomes: a prototype AI bark beetle classifier, and a profound realization of the importance of people with in-depth familiarity with the organism. Once again, as we are adapting our field to the use of new tools, we need the humans who crawl through bushes, sift leaf litter, and spend time peering at museum drawers to be the arbiters of what is biological truth versus what is an artifact of an algorithm. And, once again, taxonomists may become increasingly important—but only to the extent that we can cooperate with the machine.
Here is what it means for systematists to be “machine intelligible.”
1. Our outputs need to embrace the machines’ hunger for data. Even the good old “stamp collecting” entomology, as in specimen accumulation, has become valuable again. The key will be turning the specimens into data and making those available. So, to our fellow systematists: Please publish your images, and label them lavishly. Publish your morphological descriptions with ample details. Publish field observations and host associations. The machines keep harvesting our data off the web; make sure yours are there. If your collection is not online, it is not helping the common cause. If we feed the models enough morphological terms, one day you will be able to identify your bug by talking about it to your computer.
2. We humans should be more disciplined about our vocabularies. This does not mean that we need to write like robots. While the recent era of relational databases required strict consistency in format and spelling, the new era of natural language models, fuzzy matching, and graph databases does not strictly require that. What’s more important is the volume and repetition of statements that are accurate. Statements do not necessarily need to be unified, or even grammatically correct; rather, they have to be factually accurate. From now on, we need to be much more careful to distinguish what we know for sure, and what is a hypothesis. If we are not sure, it is our responsibility to state the doubt.
The taxon we study, bark and ambrosia beetles, is a good example. Thousands of publications report trees being killed by these beetles. So, if you ask ChatGPT whether, for example, beetles in the genus Ips kill trees, it will report with high confidence that they do. But that is not true for the great majority of Ips species, including nearly all in the United States. This response is a result of an over-emphasis in published work on one European tree-killing pest, while at the same time we collectors and systematists consistently fail to report when all the other Ips beetles are just secondary colonizers of dead trees.
One more word on language: Perhaps you are already using Darwin Core format for your data; great. But now let’s think about it a bit differently. Don’t think about language rules as restraints. Instead, think about the need to tell everything that you know, even if it is repeated and boring. Learn about ontologies and try to adopt one in your work.
3. We must keep collecting! Even as AI models grow increasingly intelligent, in the end it will be you, your specimens, and your knowledge of them required to steer the machines in the murky waters of truth and knowledge. Artificial intelligence routinely distorts human beliefs about the world simply through sampling biases, and it doesn’t know it. With the global homogenization of biota on one hand and rampant extinctions on the other, generating real data, not simulations, has become more important than ever. Here in Florida, the state Department of Agriculture’s Division of Plant Industry, an agency very much immersed in applied science, is developing a regional taxonomy hub staffed with human taxonomists, not machines. Why? Because their biggest problem is recognizing and documenting new invasive pests that nobody has ever seen before.
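For readers who want to see what the advice in point 2 might look like in practice, here is a minimal Python sketch of a single specimen record that uses Darwin Core term names and states its doubts explicitly. The term names (occurrenceID, scientificName, identificationRemarks, associatedTaxa, occurrenceRemarks, and so on) are real Darwin Core terms. Every value in the record is invented for illustration, and the helper name records_to_csv is hypothetical. This is one possible way to do it, not a prescribed workflow.

```python
import csv
import io

# Keys are real Darwin Core term names.
# All values are invented for illustration; this is not a real specimen.
record = {
    "occurrenceID": "example-0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Ips",
    "taxonRank": "genus",
    "identifiedBy": "Example Identifier",
    # State what is known and what is only a hypothesis.
    "identificationRemarks": "Identified to genus only; species not determined.",
    "associatedTaxa": "host: Pinus",
    # Report the unremarkable observation too: the tree was already dead.
    "occurrenceRemarks": (
        "Collected under bark of a tree that was already dead; "
        "no evidence that the beetles killed it."
    ),
}


def records_to_csv(records):
    """Serialize a list of Darwin Core-style dicts to CSV text.

    Hypothetical helper: the column order follows the first record's keys.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv([record])
print(csv_text)
```

The point of the sketch is the remarks fields: a record that says "secondary colonizer of a dead tree" in plain words is exactly the kind of boring, accurate statement that is currently missing from what the models read.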
The Future of Describing the Natural World
Ahead lie some exciting times for systematics. What comes after the AI machines have been trained for systematic work? What comes after the point where machines are more accurate than people at predicting the identity of organisms? What happens when the processes to train them have also been automated? How will taxonomy survive? Right now we are still doing most of the describing, sequencing, and photographing of the biological world, but soon robots may be better at these and take up much of the grunt work. What role will trained taxonomists fill then? Where will the frontier of the field lie, the tasks at which we can still “outcreate” the machines? We do not know yet.
We do know that taxonomists are primed to fill a translational role between machines and people, both other scientists and the public. We may need more training in interpreting the complexities of the world in digestible ways. In other words, we may be needed to bridge the gap between machines, people, and nature itself.
Jiri Hulcr, Ph.D., is an associate professor with the School of Forest, Fisheries, and Geomatics Sciences and the Department of Entomology and Nematology at the University of Florida and principal investigator at the UF Forest Entomology Lab. Email: email@example.com. Andrew J. Johnson, Ph.D. (firstname.lastname@example.org) is an assistant research scientist and G. Christopher Marais (email@example.com) is a master’s student and graduate assistant in the lab.