1000FARMS - Video of Webinar 02 - Violet Lasdun

Click on the link below to view the webinar by Violet Lasdun from London School of Economics on 4 February 2025, presenting her talk Engaging Rural Youth in Data-Driven Plant Breeding

Presenter Bio: Violet is a PhD student at the London School of Economics, working with the Alliance on a project to develop and scale AI technologies for plant breeding. Her research explores how breeding programs can leverage new technologies for highly decentralized data collection, providing insights into genotype-by-environment interactions in diverse production environments, and capturing detailed farmer preferences. These datasets support the development of target-adapted crop varieties that perform well in farmer fields and meet nuanced production and consumption objectives.

For more info contact Violet at: v.lasdun@cgiar.org

Summary: In this presentation, she will share her experience implementing a tricot trial in Tanzania, emphasizing the human networks and connections that are essential for the successful adoption of new technologies and methods.

Question Answer
How was the feedback experience? What were the developments of the meeting? I think it went pretty well. We organized meetings within each village, so there’d be 12 farmers and the enumerator who led the meeting. We provided the analyzed data from the tricot trial, so the enumerator had prepared the winner from the tricot ranking for each trait that we measured, as well as the overall winner, both within the village and for the overall trial. So you could see locally and globally which variety was preferred and which had the highest measured yield, so that was something to compare. And then we ran a PVS (participatory varietal selection) exercise, because we wanted to conduct a more typical exercise where people come up with traits and then rank them. That also worked well, because usually people just run those PVS exercises on one demonstration plot, which limits the number of participants. In this case, in each village we were able to get that data from lots of people. And there was lively discussion. We provided snacks and sodas, and had everyone talk about their experience and what they had learned. And we talked a lot about methods for seed saving and coming together as a community to save the seeds, because sometimes people complained that there wasn’t enough seed for planting for seed multiplication to really be worthwhile. But if people in a village who had the same varieties came together, combined their seed, and planted a village plot together, then the seed multiplication ended up making a lot more sense. So some of those ideas came out of those feedback sessions as well. And it was just a nice opportunity for some closure and closing the loop on the participation.
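The village-level and trial-level winners mentioned above can be illustrated with a small tally. This is only a sketch under stated assumptions: the records, field names, and variety labels are hypothetical, and the scoring is a simple "best minus worst" count rather than the analysis actually used in the trial (published tricot analyses commonly fit a Plackett-Luce ranking model instead).

```python
from collections import defaultdict

# Hypothetical tricot records: each farmer names the best and worst
# of their three varieties for a given trait.
records = [
    {"village": "A", "trait": "yield", "best": "V1", "worst": "V3"},
    {"village": "A", "trait": "yield", "best": "V1", "worst": "V2"},
    {"village": "A", "trait": "yield", "best": "V2", "worst": "V3"},
    {"village": "B", "trait": "yield", "best": "V2", "worst": "V1"},
    {"village": "B", "trait": "yield", "best": "V2", "worst": "V3"},
    {"village": "B", "trait": "yield", "best": "V3", "worst": "V1"},
]

def net_scores(rows):
    """+1 each time a variety is ranked best, -1 each time it is ranked worst."""
    scores = defaultdict(int)
    for r in rows:
        scores[r["best"]] += 1
        scores[r["worst"]] -= 1
    return dict(scores)

def winner(rows):
    """Variety with the highest net score; ties are broken alphabetically."""
    scores = net_scores(rows)
    return min(scores, key=lambda v: (-scores[v], v))

def winners_by_village(rows, trait):
    """Local winner for one trait in each village."""
    by_village = defaultdict(list)
    for r in rows:
        if r["trait"] == trait:
            by_village[r["village"]].append(r)
    return {village: winner(rs) for village, rs in by_village.items()}

def overall_winner(rows, trait):
    """Winner for one trait across the whole trial."""
    return winner([r for r in rows if r["trait"] == trait])

print(winners_by_village(records, "yield"))
print(overall_winner(records, "yield"))
```

In this toy data the two villages prefer different varieties, which is the kind of local-versus-global comparison the feedback meetings presented.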
Could you please explain CV Phenotyping and LLMs? CV or computer vision phenotyping is basically a type of artificial intelligence where you feed a lot of images into a computer program, and the computer is able to extract data from them. So one example is pod count. If you take a lot of images of plants with pods on them, you can train a computer to recognize where the pods are in each image, basically by feeding it millions of other images that have pods in them, where people have labeled what’s a pod and what’s not a pod, so the computer learns to pick up that pattern. So then you run a variety trial, collect images of the plants that are tagged to a specific genotype and plot, and the model makes a pod count estimate instead of having a person out in that field counting those pods by hand. The pilot study has been on common bean, but it will expand to other crops as well.
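To make the last step concrete, here is a minimal sketch of turning per-image detections into a pod count estimate per genotype and plot. It assumes a trained detector has already produced a confidence score for each candidate pod; the data layout, the 0.5 threshold, and the function names are illustrative assumptions, not the project's actual pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical detector output: one entry per image, tagged with genotype
# and plot, holding the confidence score of each candidate pod.
detections = [
    {"plot": "P1", "genotype": "G1", "scores": [0.9, 0.8, 0.6, 0.4]},
    {"plot": "P1", "genotype": "G1", "scores": [0.95, 0.7]},
    {"plot": "P2", "genotype": "G2", "scores": [0.6, 0.3, 0.2, 0.85]},
]

def pods_in_image(scores, threshold=0.5):
    """Count detections at or above the (assumed) confidence threshold."""
    return sum(1 for s in scores if s >= threshold)

def mean_pods_per_plot(images, threshold=0.5):
    """Average the per-image pod counts for each (genotype, plot) pair."""
    counts = defaultdict(list)
    for img in images:
        key = (img["genotype"], img["plot"])
        counts[key].append(pods_in_image(img["scores"], threshold))
    return {key: mean(c) for key, c in counts.items()}

print(mean_pods_per_plot(detections))
```

Averaging over several images per plot is one simple way to smooth out images where pods are hidden or missed; other aggregation choices are equally plausible.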
LLM stands for large language model. An LLM is a type of AI, like ChatGPT and the chatbots that you may have heard about in recent years. Basically, they’re tools that have processed so much data that they’re almost able to think like a human. And they have tons and tons of applications. But one of the things that we’re using them for is to listen to interviews instead of a human listening to them, and to pull out the key insights from the interview. You might have done this process where you sit there listening to the interview, and you’re trying to fill out an Excel sheet with all the important information, so that you can actually process that data. This is very time consuming.
What is the role of the farmers as you hire enumerators for planting the trials and collecting the data? So the enumerators plant the tricot plots just to ensure that they’re comparable. You want to have the same number of plots and plants. You want to have more or less even spacing, especially for our images, which need to be captured in a certain way. But apart from that we don’t give any other management advice. So the farmer is free to manage the plot in the same way they would manage their other beans. This lets us get data on genotype-by-environment-by-management interactions. Also, having different farm sites ensures that we have a lot of diversity in the environments that we’re seeing. And we collect all this environmental data and GPS data, which we also link to climate satellite data. So we get a lot of different observations of the genotype in all the environments that we’re actually breeding for.
Can you explain more about the Ndizi tool? It’s kind of what I was just describing with the LLMs. Ndizi is just the name of the project for now. It’s going to be a chatbot, essentially, for collecting interview data at scale and then processing that data without having to do this kind of human transcription and interview coding. And the vision is for it to have many applications. But the first one we’ve been doing is a user profile and product profile. It starts with a semi-structured interview in the context of a tricot trial. It asks farmers: what did you like about variety A? What did you dislike about variety A? And then you have a free conversation around that while recording it. Then that interview gets fed into the LLM, where we ask the model questions like you would ask ChatGPT. Basically, we give a prompt to the LLM to pull out the traits that the farmer talked about, and the extent to which they were enthusiastic about them. It can describe the varieties along each of the traits mentioned by the farmers, and scale the relative importance of traits to different farmers. We also take in a lot of socioeconomic data. For example, if a lot of women at a certain income level are talking more about consumption traits or risk reduction traits, then we have that data. And we can say this demographic cares more about these types of traits, because that’s what they actually mention when they’re asked to describe the varieties, rather than being given a complete set of traits and having to rank every variety along each one.
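The two steps described above (prompting a model to extract traits, then relating trait mentions to socioeconomic groups) might be sketched roughly as follows. Everything here is an illustrative assumption: the prompt wording, the field names, the group labels, and the 1 to 3 enthusiasm scale are invented for the example, and no real model is called. The extracted rows stand in for whatever structured output the LLM would return.

```python
from collections import Counter, defaultdict

# Hypothetical prompt; the wording used in the Ndizi project is not public here.
PROMPT_TEMPLATE = (
    "Below is a transcript of a farmer describing bean varieties.\n"
    "List each trait the farmer mentions, the variety it refers to, and how "
    "enthusiastic the farmer is about it (1 = mild, 3 = strong).\n\n"
    "Transcript:\n{transcript}"
)

def build_prompt(transcript):
    """Fill the interview transcript into the extraction prompt."""
    return PROMPT_TEMPLATE.format(transcript=transcript)

# Hypothetical rows as they might look after extraction and after joining
# each interview to the farmer's socioeconomic group.
extracted = [
    {"group": "women_low_income", "trait": "cooking time",
     "category": "consumption", "enthusiasm": 3},
    {"group": "women_low_income", "trait": "taste",
     "category": "consumption", "enthusiasm": 2},
    {"group": "women_low_income", "trait": "yield",
     "category": "production", "enthusiasm": 1},
    {"group": "men_high_income", "trait": "yield",
     "category": "production", "enthusiasm": 3},
    {"group": "men_high_income", "trait": "taste",
     "category": "consumption", "enthusiasm": 1},
]

def category_share_by_group(rows):
    """Enthusiasm-weighted share of each trait category within each group."""
    weights = defaultdict(Counter)
    for r in rows:
        weights[r["group"]][r["category"]] += r["enthusiasm"]
    shares = {}
    for group, counter in weights.items():
        total = sum(counter.values())
        shares[group] = {cat: w / total for cat, w in counter.items()}
    return shares

print(category_share_by_group(extracted))
```

Weighting by enthusiasm rather than counting raw mentions is one possible design choice; it reflects the idea that traits farmers bring up unprompted and feel strongly about carry more information.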
What are your thoughts on increasing sample size per enumerator for getting more data in? Are there any sample size limitations that need to be considered? You can definitely include more. The limitation for the Artemis data is that every data collection needs to happen a certain amount of time after planting. If you’re collecting pictures for stand count, you need to take that picture of the plot after all the plants have germinated but before the canopies have closed, or the image can’t be processed by the computer vision model. So what that means is that timing is complicated, and we’re working on building a system for this. The limitation is more the number of farms per day than the number of farms per enumerator.
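The timing constraint above can be sketched as a small scheduling problem: each farm has a window of valid visit days after planting, and an enumerator can only reach so many farms per day. The window of 14 to 28 days after planting, the farm names, and the greedy assignment are assumptions made for illustration; the source does not give the actual window or describe the system being built.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed stand-count window in days after planting (illustrative values only):
# after germination is complete, before the canopy closes.
WINDOW_START = 14
WINDOW_END = 28

def visit_window(planting_date, start=WINDOW_START, end=WINDOW_END):
    """First and last valid day for photographing a plot."""
    return (planting_date + timedelta(days=start),
            planting_date + timedelta(days=end))

def schedule(plantings, max_farms_per_day):
    """Greedy plan: earliest-planted farms first, each on the first day in its
    window that still has capacity. Farms that cannot be placed map to None."""
    load = defaultdict(int)
    plan = {}
    for farm, planted in sorted(plantings.items(), key=lambda kv: kv[1]):
        first, last = visit_window(planted)
        day = first
        while day <= last and load[day] >= max_farms_per_day:
            day += timedelta(days=1)
        if day > last:
            plan[farm] = None
        else:
            load[day] += 1
            plan[farm] = day
    return plan

# Hypothetical planting dates for three farms served by one enumerator.
plantings = {
    "F1": date(2025, 3, 1),
    "F2": date(2025, 3, 1),
    "F3": date(2025, 3, 1),
}
print(schedule(plantings, max_farms_per_day=2))
```

With a cap of two farms per day, the third farm spills over to the next day, which shows why farms per day, rather than farms per enumerator overall, is the binding limit.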
How does the AI tool you’re using deal with local languages? How do you handle the language barrier with local farmers? It’s only for Swahili right now. And it would need to be retrained for every local language so it’s not ideal.
Did you encounter any challenge or challenges in selecting your farmers? And how did you solve that challenge? I’m doing my PhD in economics, and everyone is strict about random selection. To write an economics paper you have to make sure that it’s a random sample that’s representative, and I do think the spirit of that is really important, as I discussed in this presentation. But at the end of the day, if you just randomly select people, sometimes people don’t actually understand what they’re signing up for, and then they end up dropping out. So about 15% of the initially selected farmers ended up dropping out once they realized how much of a commitment it was going to be. But then we were able to find other people to take their place really easily. In the future, I would use a less strict method, maybe opening it to everyone and talking more with the farmers before signing them up.