Facebook open sources Casual Conversations, a data set with paid people who provided their age and gender, to help researchers evaluate the fairness of AI models (Kyle Wiggers/VentureBeat)

Facebook says it has open-sourced Casual Conversations, a data set of more than 45,000 videos in which roughly 3,000 paid participants speak casually on camera. The participants provided their own age and gender rather than having those labels inferred by annotators or by a model. Facebook intends the data to help researchers evaluate the fairness of AI models, in particular computer vision and audio models, across age, gender, apparent skin tone, and lighting conditions. In its announcement, the company describes how the data set was built and discusses its potential uses.

What is Facebook's Casual Conversations data set?

Casual Conversations is a collection of short videos of people talking in unscripted settings. Each participant was paid, agreed to take part, and provided their own age and gender. Trained annotators additionally labeled each participant's apparent skin tone on the Fitzpatrick scale and marked videos recorded in low ambient lighting. A data set of this kind gives researchers and developers a way to measure how a model performs across demographic groups and to better understand how it may be biased.

Why does the data set include self-provided age and gender to help researchers evaluate the fairness of AI models?

Fairness evaluations depend on trustworthy group labels. When age or gender is guessed by an annotator or predicted by a model, the labels can carry the same biases the evaluation is meant to detect. Because the participants in Casual Conversations provided these attributes themselves, researchers can break a model's results down by group with more confidence. Facebook says the data set is available to the research community under its own data use terms, which researchers should read before using it. The company’s head of research said the goal was to “ensure that researchers and developers are provided with accurate, unbiased, and high-quality data sets.” Researchers who use this data are expected to include the appropriate disclaimers in their publications.

How to Evaluate the Fairness of AI Models?

An important aspect of developing fair AI models is that fairness needs to be considered from the outset. For example, suppose an AI algorithm is trained to classify people based on a photograph. In that case, fairness may mean ensuring that it is about equally accurate for every group of people, for instance across genders, age ranges, and skin tones. This matters for ethical reasons and also for the reliability of any larger system built on the model. When building an AI model, it’s important to evaluate how fair the results it returns are. There are two key areas of evaluation: 1) Are there any biases inherent in the data used to train the AI, such as some groups being underrepresented? 2) How are the results validated, and are they broken down by group rather than reported only as one overall average?
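The two questions above can be sketched in a few lines of Python. This is a minimal illustration, not Facebook's evaluation method: the record fields (`gender`, `age_group`, `predicted`, `actual`), the helper names, and the sample values are all hypothetical and made up for the example. It counts how many examples each group contributes (question 1) and compares accuracy per group (question 2).

```python
from collections import Counter, defaultdict


def group_counts(records, group_key):
    """Count how many records fall into each group (a simple representation check)."""
    return Counter(rec[group_key] for rec in records)


def accuracy_by_group(records, group_key):
    """Return {group: accuracy}, where accuracy is correct predictions / total."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["predicted"] == rec["actual"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical model outputs; these values are invented for illustration only.
records = [
    {"gender": "female", "age_group": "18-30", "predicted": 1, "actual": 1},
    {"gender": "female", "age_group": "31-45", "predicted": 0, "actual": 1},
    {"gender": "male", "age_group": "18-30", "predicted": 1, "actual": 1},
    {"gender": "male", "age_group": "31-45", "predicted": 0, "actual": 0},
]

counts = group_counts(records, "gender")
by_gender = accuracy_by_group(records, "gender")
gap = accuracy_gap(by_gender)

print(counts)     # examples per group
print(by_gender)  # accuracy per group
print(gap)        # spread between the best and worst group
```

A large gap between groups, or a group with very few examples, is a signal to look more closely; a single overall accuracy number would hide both.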


In conclusion, Casual Conversations consists of videos of paid participants who agreed to take part and who provided their own age and gender, so those labels do not depend on a third party's guess. Trained annotators added labels for apparent skin tone and for lighting conditions. Facebook says the data set will enable researchers to check whether computer vision and audio models perform consistently across these groups, and to find where a model falls short before it is deployed.


1. What is Facebook doing?

Facebook is releasing the data set to the research community so that others can evaluate the fairness of their AI models.

2. Why is Facebook doing this?

Facebook is releasing the data set because the research community has asked for more open data sets that include people’s demographics.

3. How did Facebook get the data?

Facebook paid participants to take part in video recordings. The participants agreed to the use of their recordings and provided their own age and gender.

4. How can I use the data set?

You can use the data set to evaluate how your models perform across age, gender, skin tone, and lighting groups.
