We introduce the ExFunTube dataset, which consists of 10,136 funny short-form YouTube videos. Each video is annotated with the start and end timestamps of its funny moments and a corresponding explanation for each moment. Below is an example from ExFunTube that contains three funny moments!


About Dataset

What's the difference?

Previous humorous video datasets were collected from limited domains, such as sitcoms or speeches. In speeches, there is a single speaker, so visual cues are restricted to facial expressions and gestures. In sitcoms, fixed characters follow a predefined script on a constructed set, so visual cues are restricted there as well. For this reason, we collect multimodally funny short-form videos from YouTube!

How do we collect videos?

To verify that a video's humor is multimodal, we devise a video filtering pipeline. It is inspired by other datasets that were shown to be multimodal by comparing task performance with and without visual cues. Accordingly, we convert each video into text, prompt GPT-3.5 to explain why the video is funny, and compare the two explanations produced with and without visual information. We keep a video only when the two results differ significantly. With this pipeline, we can gather multimodally funny videos!
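
The selection step can be sketched as follows. This is only an illustration, not the pipeline's actual implementation: it assumes the two GPT-3.5 explanations (with and without visual information) are already available as strings, and it uses token-level Jaccard similarity with a hypothetical threshold of 0.5 as a stand-in for the real comparison, which is not specified here.

```python
import re


def token_set(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_multimodally_funny(with_visual: str, without_visual: str,
                          threshold: float = 0.5) -> bool:
    """Keep a video when the two explanations differ enough.

    The similarity measure and the threshold are assumptions made
    for illustration only.
    """
    return jaccard(with_visual, without_visual) < threshold


# Hypothetical explanations for a single video.
with_visual = "The man slips on the ice while confidently saying he never falls."
without_visual = "The man says he never falls, which sounds overly confident."
keep = is_multimodally_funny(with_visual, without_visual)
```

A lower similarity means the visual information changed the explanation more, which is the signal the filtering step looks for.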

Why is our dataset important?

Short-form funny videos are gaining popularity on social networks. AI models that understand them could therefore provide empathetic responses or recommend funny videos based on users’ sense of humor. Furthermore, the videos in our dataset are annotated with funny moments and corresponding explanations, which can be used to help models understand humor or to evaluate a model’s understanding of humor in depth!



Funny Moments

There are 11,166 funny moments annotated. Out of 10,136 videos, 9,222 contain one funny moment, 798 contain two, and 116 contain three.
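
As a quick consistency check, the per-video counts above add up to the stated totals. The variable names below are illustrative only.

```python
# Videos containing one, two, or three funny moments (figures from the text).
videos_by_moment_count = {1: 9_222, 2: 798, 3: 116}

total_videos = sum(videos_by_moment_count.values())
total_moments = sum(k * n for k, n in videos_by_moment_count.items())

print(total_videos)   # 10136
print(total_moments)  # 11166
```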


Each moment is annotated with start and end timestamps and a corresponding explanation. There are thus 11,166 explanations, averaging 44.3 words each.


Thanks to GPT-3.5's in-context learning, we can effectively classify videos into 20 humor categories using the annotated explanations. The results are shown below.


We provide a JSON file of our dataset, consisting of YouTube URLs, timestamps of funny moments, and corresponding explanations. Click the "Download Dataset" button below to download it. For more information about our project, please refer to ExFunTube.
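
A minimal sketch of reading such a file is shown below. The schema is an assumption: the key names `url`, `moments`, `start`, `end`, and `explanation` are hypothetical and chosen only to mirror the contents described above, so check the downloaded file for its actual structure. The sample is an inline string so the snippet runs on its own.

```python
import json

# Hypothetical sample mirroring the described contents; real key names may differ.
sample = """
[
  {
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "moments": [
      {"start": 2.0, "end": 7.5,
       "explanation": "A hypothetical explanation of the first funny moment."},
      {"start": 12.0, "end": 15.0,
       "explanation": "Another hypothetical explanation."}
    ]
  }
]
"""

videos = json.loads(sample)


def iter_moments(videos):
    """Yield (url, start, end, explanation) for every annotated moment."""
    for video in videos:
        for moment in video["moments"]:
            yield (video["url"], moment["start"], moment["end"],
                   moment["explanation"])


moments = list(iter_moments(videos))

# Average explanation length in whitespace-separated words.
avg_words = sum(len(expl.split()) for *_, expl in moments) / len(moments)
```

For the real file, replace `json.loads(sample)` with `json.load` on the opened download and adjust the keys to match.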

      title={Can Language Models Laugh at YouTube Short-form Videos?},
      author={Dayoon Ko and Sangho Lee and Gunhee Kim},
      booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},