MADLIBS: A Novel Multilingual Data Augmentation Algorithm for Low-Resource Neural Machine Translation
Zeyneb N. Kaya created a natural language processing algorithm, similar to ChatGPT, that could help preserve endangered languages. She found a way to enhance the small training datasets of lesser-known languages by creating accurate translation pairs of words to generate grammatically correct sentences.
View PosterZeyneb N. Kaya, 17, of Saratoga, improved resources for machine learning models to help preserve endangered languages for her Regeneron Science Talent Search computer science project. Today’s natural language processing (NLP) algorithms, like ChatGPT, work well for languages like English because it has a lot of easily accessible text to learn from. For languages with less accessible text, Zeyneb wrote her own algorithm, called MADLIBS, that generates additional text, which, in turn, allows subsequent NLP algorithms to work better.
The key insight was to get more mileage out of the limited existing resources of each language dataset by generating appropriate translation pairs to make grammatically correct sentences and create high-quality translations.

Among the endangered languages she investigated was the Māori language from New Zealand. She hopes one day her MADLIBS software will be an open-source resource for people around the world.

Zeyneb attends Saratoga High School where she is president of the linguistics club and events coordinator of the Chinese club. The daughter of Latife Genc Kaya and Ahmet Kaya, Zeyneb hopes to pursue a career that combines linguistics and computer science.

Beyond the Project
Zeyneb is a “huge geek for languages.” She speaks English and Turkish, is learning Mandarin, and can also read text written in Korean and Arabic script!
FUN FACTS: Zeyneb developed a robust A.I. model and mobile app for farmers and home gardeners to identify, manage, and track plant diseases. Her app has been downloaded thousands of times.
