{"id":375,"date":"2017-05-01T10:56:05","date_gmt":"2017-05-01T08:56:05","guid":{"rendered":"http:\/\/aireligion.org\/?p=375"},"modified":"2017-05-01T10:57:35","modified_gmt":"2017-05-01T08:57:35","slug":"40000-tinder-selfies-scraped-to-make-a-facial-dataset-for-ai-experiments","status":"publish","type":"post","link":"https:\/\/aireligion.org\/?p=375","title":{"rendered":"40,000 Tinder selfies scraped to make a facial dataset for AI experiments"},"content":{"rendered":"<p><img loading=\"lazy\" class=\"alignleft\" src=\"http:\/\/1vze7o2h8a2b2tyahl3i0t68.wpengine.netdna-cdn.com\/wp-content\/uploads\/2016\/07\/rs_560x415-140917143530-1024.Tinder-Logo.ms_.091714_copy.jpg\" width=\"169\" height=\"125\" \/><\/p>\n<p>Someone scraped 40,000 Tinder selfies to make a facial dataset for AI experiments. Tinder users have many motives for uploading their likeness to\u00a0the dating app. But contributing a\u00a0facial biometric to a downloadable data set for training convolutional neural networks probably wasn\u2019t\u00a0top of their list when they signed up to swipe.<\/p>\n<p>A user of Kaggle, a platform for machine learning and data science competitions which\u00a0was <a href=\"https:\/\/techcrunch.com\/2017\/03\/07\/google-is-acquiring-data-science-community-kaggle\/\" target=\"_blank\" rel=\"noopener noreferrer\">recently acquired by Google<\/a>, has uploaded a facial data set he says was\u00a0created by exploiting Tinder\u2019s API to scrape 40,000 profile photos from Bay Area users of the dating app \u2014 20,000 apiece from profiles of each gender.<\/p>\n<p><!--more--><\/p>\n<p>The data set, called\u00a0<a href=\"https:\/\/www.kaggle.com\/scolianni\/people-of-tinder\" target=\"_blank\" rel=\"noopener noreferrer\">People of Tinder<\/a>,\u00a0consists of six downloadable zip files, with four containing around 10,000 profile photos each and two files with sample sets of around 500 images per gender.<\/p>\n<p>Some users have had multiple photos scraped from their profiles, so 
there are likely far fewer than 40,000 Tinder users represented here.<\/p>\n<p>The creator of the data set, Stuart Colianni, has released it under\u00a0a\u00a0<a href=\"https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\" target=\"_blank\" rel=\"noopener noreferrer\">CC0: Public Domain License<\/a>\u00a0and also uploaded his\u00a0scraper script\u00a0to\u00a0<a href=\"https:\/\/github.com\/scoliann\/TinderFaceScraper\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub.<\/a><\/p>\n<p>He\u00a0describes it as a \u201csimple script to scrape Tinder profile photos for the purpose of creating a facial dataset,\u201d saying\u00a0his inspiration for creating\u00a0the scraper was disappointment working with other facial data sets. He also describes\u00a0Tinder as offering \u201cnear unlimited access to create a facial data set\u201d and says\u00a0scraping the app\u00a0offers \u201can extremely efficient way to collect such data.\u201d<\/p>\n<p>\u201cI have often been disappointed,\u201d he writes of other facial data sets. \u201cThe datasets tend to be extremely strict in their structure, and are usually too small. Tinder gives you access to thousands of people within miles of you. Why not leverage Tinder to build a better, larger facial dataset?\u201d<\/p>\n<p>Why not \u2014 except, perhaps, the privacy of thousands of individuals whose facial biometrics you\u2019re dumping online in a mass\u00a0repository for public repurposing, entirely without their say-so.<\/p>\n<p>Glancing through a few of the images from one of the downloadable files, they certainly look like the sort of quasi-intimate photos\u00a0people use for profiles on Tinder (or indeed, for\u00a0other online social apps) \u2014 with a mix of selfies, friend group shots and random stuff like photos of cute animals or memes. 
It\u2019s by no means\u00a0a flawless data set if it\u2019s just faces you\u2019re looking for.<\/p>\n<p>Reverse image searching several of the photos mostly drew\u00a0blanks for exact matches online, so it appears that many of the photos have\u00a0not been uploaded to\u00a0the\u00a0open web \u2014 though I was able to identify one profile image via this method: a student at\u00a0San Jose State University, who had used the same image for another social\u00a0profile.<\/p>\n<p>She confirmed to TechCrunch she had joined Tinder \u201cbriefly a while back,\u201d and said she doesn\u2019t really use it anymore. Asked if she was happy about her data being repurposed to feed an AI model, she told us: \u201cI don\u2019t like the idea of people using my pictures for some sad \u2018researches.\u2019 \u201d She preferred\u00a0not to be identified for\u00a0this article.<\/p>\n<div><\/div>\n<p>Colianni writes that he plans to use the data set with the Inception model in Google\u2019s TensorFlow (for training image classifiers) to try to create a convolutional neural network capable of distinguishing between men and women. 
(I just hope he strips out all the pet shots first or he\u2019ll find this\u00a0task an uphill struggle.)<\/p>\n<p>The data set, which was uploaded to Kaggle three days ago (minus the sample files), has been downloaded more than 300 times at this point \u2014 and there\u2019s obviously no way to know what additional uses it is being put to.<\/p>\n<p>Developers have done all sorts of weird, wacky and creepy things playing around with\u00a0Tinder\u2019s (ostensibly) private API over the years, including <a href=\"http:\/\/valleywag.gawker.com\/techies-are-hacking-tinder-in-a-desperate-attempt-to-ge-1621177524\" target=\"_blank\" rel=\"noopener noreferrer\">hacking it to automatically like every potential date<\/a>\u00a0to save on\u00a0thumb-swipes; offering a paid look-up service for people to\u00a0<a href=\"https:\/\/qz.com\/656947\/the-troubling-way-that-anyone-can-spy-on-any-tinder-user\/\" target=\"_blank\" rel=\"noopener noreferrer\">check up on whether a person they know is using Tinder<\/a>; and even building a catfishing system to snare horny bros and make them\u00a0<a href=\"http:\/\/www.theverge.com\/2015\/3\/25\/8277743\/tinder-hack-bros-swiping-bros\" target=\"_blank\" rel=\"noopener noreferrer\">unwittingly flirt with each other<\/a>.<\/p>\n<p>So you could argue that anyone creating a profile on\u00a0Tinder should be prepared for their data\u00a0to leak\u00a0outside the community\u2019s porous walls in various ways \u2014 be it as a single screenshot, or via one of the aforementioned API hacks.<\/p>\n<p>But the mass harvesting of thousands of Tinder profile photos to act as\u00a0fodder for\u00a0AI models does <em>feel<\/em> like another line is being crossed. 
In the <a href=\"https:\/\/techcrunch.com\/2016\/07\/09\/we-need-to-talk-about-ai-and-access-to-publicly-funded-data-sets\/\" target=\"_blank\" rel=\"noopener noreferrer\">scramble for big\u00a0data sets to fuel\u00a0AI utility<\/a>, clearly very little is sacred.<\/p>\n<p>It\u2019s also worth noting that in\u00a0agreeing to the company\u2019s\u00a0<a href=\"https:\/\/www.gotinder.com\/terms\" target=\"_blank\" rel=\"noopener noreferrer\">T&amp;Cs<\/a>\u00a0Tinder users grant it\u00a0a \u201cworldwide, transferable, sub-licensable, royalty-free, right and license to host, store, use, copy, display, reproduce, adapt, edit, publish, modify and distribute\u201d their content \u2014 though it\u2019s less clear whether that would apply in this case where a\u00a0third-party developer is scraping Tinder data and releasing it\u00a0under a public domain license.<\/p>\n<p>At the time of writing Tinder had not responded to a request for comment on this\u00a0use of its API. But since\u00a0Tinder makes its rights to your content transferable, it\u2019s entirely possible even this large-scale\u00a0repurposing of the data falls within the scope of its\u00a0T&amp;Cs, assuming it sanctioned\u00a0Colianni\u2019s use of\u00a0its API.<\/p>\n<p>A Tinder spokesperson has now provided the following statement:<\/p>\n<blockquote><p>We take the security and privacy of our users seriously and have tools and systems in place to uphold the integrity of our platform. 
It\u2019s important to note that Tinder is free and used in more than 190 countries, and the images that we serve are profile images, which are available to anyone swiping on the app.\u00a0We are always working to improve the Tinder experience and continue to implement measures\u00a0against the automated use of our API, which includes steps to deter and\u00a0prevent\u00a0scraping.<\/p>\n<p>This person has violated our\u00a0<a href=\"https:\/\/www.gotinder.com\/terms\" target=\"_blank\" rel=\"noopener noreferrer\">terms of service<\/a>\u00a0(Sec. 11) and we are taking appropriate action and investigating further.<\/p><\/blockquote>\n<p><a href=\"https:\/\/techcrunch.com\/2017\/04\/28\/someone-scraped-40000-tinder-selfies-to-make-a-facial-dataset-for-ai-experiments\/\">techcrunch<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Someone scraped 40,000 Tinder selfies to make a facial dataset for AI experiments. Tinder users have many motives for uploading their likeness to\u00a0the dating app. But contributing a\u00a0facial biometric to a downloadable data set for training convolutional neural networks probably wasn\u2019t\u00a0top of their list when they signed up to swipe. 
A user of Kaggle, a &hellip; <a href=\"https:\/\/aireligion.org\/?p=375\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">40,000 Tinder selfies scraped to make a facial dataset for AI experiments<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[9,2],"tags":[],"_links":{"self":[{"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/posts\/375"}],"collection":[{"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aireligion.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=375"}],"version-history":[{"count":2,"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/posts\/375\/revisions"}],"predecessor-version":[{"id":377,"href":"https:\/\/aireligion.org\/index.php?rest_route=\/wp\/v2\/posts\/375\/revisions\/377"}],"wp:attachment":[{"href":"https:\/\/aireligion.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aireligion.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aireligion.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}