Annotated Flickr dataset for identification of professional photographers

Abstract

We collected data and computed statistics for a sample of Flickr users who uploaded photos to the platform in December 2021, together with their photos, yielding a final dataset of 27,516 users and 2,647,928 photos. Using each user's total number of uploaded photos and the number uploaded in December, we selected a representative sample of users whose activity was not overly concentrated in December and obtained data from those who specified their occupation. In addition to the data collected directly from Flickr, we enriched the dataset with new features derived from automated analysis of the photos and their comments. One of the most valuable features of this data collection is that each photo has three Image Quality Assessment scores covering aesthetic and technical aspects, computed with Convolutional Neural Networks trained on human-labeled data. Furthermore, we added labels indicating whether each user is a professional photographer, so the data are specifically prepared for supervised training.
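
The selection step described above, which keeps users whose uploads were not overly concentrated in December, can be illustrated with a minimal sketch. The abstract does not state the actual criterion, so the record fields, the function names, and the 0.5 cutoff below are all hypothetical and serve only to show the idea of filtering on the December share of a user's uploads.

```python
from dataclasses import dataclass


@dataclass
class UserCounts:
    """Hypothetical per-user record; field names are illustrative only."""
    user_id: str
    total_photos: int      # all photos the user has uploaded
    december_photos: int   # photos uploaded in December 2021


def december_share(user: UserCounts) -> float:
    """Fraction of the user's uploads that fall in December."""
    if user.total_photos <= 0:
        # No usable total: treat activity as fully concentrated.
        return 1.0
    return user.december_photos / user.total_photos


def select_users(users, max_share=0.5):
    """Keep users whose December share is at most max_share.

    max_share=0.5 is an assumed cutoff, not a value from the dataset.
    """
    return [u for u in users if december_share(u) <= max_share]


users = [
    UserCounts("a", total_photos=1000, december_photos=40),
    UserCounts("b", total_photos=30, december_photos=30),
    UserCounts("c", total_photos=200, december_photos=100),
]
selected = select_users(users)
print([u.user_id for u in selected])
```

In this toy input, user "b" uploaded everything in December and is filtered out, while "a" and "c" are kept under the assumed cutoff.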

Sofia Strukova
PhD Student

My research interests include data science, machine learning, computational social science and learning analytics.