RedCaps: web-curated image-text data created by the people, for the people

Large datasets of image-text pairs from the web are used for transfer learning applications in computer vision. However, they need complex filtering techniques to deal with noisy web data.

Image credit: pxhere.com, CC0 Public Domain

A recent study on arXiv.org investigates how to obtain high-quality image-text data from the web without complex data filtering.

The researchers suggest using Reddit for collecting image-text pairs. Images and their captions are collected from topic-specific subreddits. One of the advantages of the dataset is its linguistic diversity: captions from Reddit are generally more natural and varied than HTML alt-text. Subreddits also provide additional image labels and group related content. That allows researchers to steer dataset contents without labeling individual instances.
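The idea of steering dataset contents through subreddit choice can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Post` record, the `select_pairs` helper, and the subreddit names in `CURATED_SUBREDDITS` are hypothetical placeholders, and no Reddit API is called. It only shows how an allowlist of subreddits can act both as a filter and as a source of coarse labels.

```python
from dataclasses import dataclass

# Hypothetical allowlist for illustration; the actual curated list
# used for RedCaps is not given in this article.
CURATED_SUBREDDITS = {"birds", "food", "cats"}


@dataclass(frozen=True)
class Post:
    """A hypothetical, already-downloaded Reddit post with one image."""
    subreddit: str
    image_url: str
    caption: str


def select_pairs(posts, allowed=CURATED_SUBREDDITS):
    """Keep posts from allowed subreddits that have a non-empty caption.

    The lowercased subreddit name is attached as a coarse label, so no
    individual image has to be labeled by hand.
    """
    pairs = []
    for post in posts:
        name = post.subreddit.lower()
        caption = post.caption.strip()
        if name in allowed and caption:
            pairs.append(
                {"image_url": post.image_url, "caption": caption, "label": name}
            )
    return pairs


if __name__ == "__main__":
    sample = [
        Post("Birds", "https://example.com/1.jpg", " a heron at dawn "),
        Post("news", "https://example.com/2.jpg", "some headline"),
    ]
    for pair in select_pairs(sample):
        print(pair)
```

Changing the allowlist changes the composition of the resulting dataset, which is the kind of control the article describes.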

The proposed dataset is useful for learning visual representations that transfer to downstream tasks like image classification or object detection.

Large datasets of paired images and text have become increasingly popular for learning generic representations for vision and vision-and-language tasks. Such datasets have been built by querying search engines or collecting HTML alt-text; since web data is noisy, they require complex filtering pipelines to maintain quality. We explore alternate data sources to collect high quality data with minimal filtering. We introduce RedCaps, a large-scale dataset of 12M image-text pairs collected from Reddit. Images and captions from Reddit depict and describe a wide variety of objects and scenes. We collect data from a manually curated set of subreddits, which give coarse image labels and allow us to steer the dataset composition without labeling individual instances. We show that captioning models trained on RedCaps produce rich and varied captions preferred by humans, and learn visual representations that transfer to many downstream tasks.

Research paper: Desai, K., Kaul, G., Aysola, Z., and Johnson, J., “RedCaps: web-curated image-text data created by the people, for the people”, 2021. Link to the article: https://arxiv.org/abs/2111.11431

Link to the project website: https://redcaps.xyz/

