It’s time we taught computers what we think of BMW drivers
Nationwide polls like the American Community Survey (ACS) reach out to millions of people and can cost over $1bn each year. A form is sent to each selected household to collect information on its “demographic, social, economic, and housing characteristics.” It can be tedious to fill out, and the numerous follow-up phone calls are annoying to receive.
But what if there was a way to get the information without having to deal with mountains of paperwork or bothering people at all? A paper on arXiv shows there may be a better alternative, thanks to machine learning. It involves taking snapshots, literally, of citizens’ lives and automatically deducing their backgrounds.
Working under Professor Fei-Fei Li – a leading educator in computer vision who created ImageNet, a widely used image classification dataset – a team of researchers from Stanford, Rice, the University of Michigan, and Baylor College of Medicine set to work collating 50 million images taken from Google Street View.
First, an object recognition algorithm was applied to pick out vehicles in the snaps taken across 200 US cities. In those photographs, 22 million distinct rides were detected, and convolutional neural networks, a type of deep learning algorithm, were used to work out the make, model, body type and age of each vehicle.
Next, the information was sorted by geographical region, allowing researchers to count the number of vehicles of each make and model in each location. Extra data on the average price, fuel efficiency and overall car density was also included. The dataset was split – around one-fifth was used for training, and the rest was for test purposes.
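The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the region labels, the sample detections, and the helper names `count_by_region` and `split_regions` are all made up, and splitting by region is an assumption, since the article only says roughly one-fifth of the data was used for training.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: (region, make, model). Values are made up.
detections = [
    ("zip_A", "Honda", "Civic"),
    ("zip_A", "Honda", "Civic"),
    ("zip_A", "Ford", "F-150"),
    ("zip_B", "Ford", "F-150"),
    ("zip_B", "Toyota", "Camry"),
]

def count_by_region(rows):
    """Return {region: Counter({(make, model): number of vehicles})}."""
    counts = defaultdict(Counter)
    for region, make, model in rows:
        counts[region][(make, model)] += 1
    return counts

def split_regions(regions, train_fraction=0.2):
    """Put roughly train_fraction of the regions (at least one) in the training set."""
    ordered = sorted(regions)
    n_train = max(1, round(len(ordered) * train_fraction))
    return ordered[:n_train], ordered[n_train:]

counts = count_by_region(detections)
train_regions, test_regions = split_regions(counts.keys())
print(counts["zip_A"].most_common(1), train_regions, test_regions)
```

Each region's counter is the kind of per-location tally that a model could then be trained on.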
Using the US Census and Presidential Election voting results, the researchers trained the model to roughly match the vehicles seen in a particular region to the race, education levels, estimated income and voting preferences for people living in that area.
They found strong links between the types of cars and socioeconomic trends. People of Asian descent were more likely to drive Asian cars – particularly Hondas and Toyotas. Cars manufactured by Chrysler, Buick and Oldsmobile were more likely to be seen in African American neighborhoods, while pickup trucks, Volkswagens and Aston Martins were more common in mostly Caucasian neighborhoods.
Interestingly, just counting the types of cars seen during a 15-minute drive through a city is all it takes to build up a good picture of its political preferences. If sedans outnumber pickup trucks, the city is likely to vote Democratic (an 88 per cent chance); otherwise, it is likely to vote Republican (82 per cent).
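One way to read the sedan-versus-pickup finding is as a simple majority rule over the vehicles counted on the drive. The sketch below assumes that reading; the function name `predict_city_vote` and the tallies are hypothetical, and the rule says nothing about how confident any single call should be.

```python
def predict_city_vote(sedans: int, pickups: int) -> str:
    """Majority rule: more sedans than pickups means Democratic, otherwise Republican."""
    if sedans < 0 or pickups < 0:
        raise ValueError("vehicle counts cannot be negative")
    return "Democratic" if sedans > pickups else "Republican"

# Hypothetical tallies from two 15-minute drives
print(predict_city_vote(sedans=40, pickups=25))
print(predict_city_vote(sedans=12, pickups=30))
```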
Machines are better at detecting socioeconomic changes
The testing stage reveals the method to be fairly accurate, if you’re being generous. There was a strong correlation between the demographic estimates and actual ACS data across 165 cities – particularly for race and voter preference. For example, the model correctly guessed that Seattle, Washington is 69 per cent Caucasian and that the African American population is mostly concentrated in a few Southern cities. Cities in the North Eastern states are mostly inhabited by highly educated people, and people living in the South have the lowest incomes.
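For a feel of what a "strong correlation" between estimates and survey figures means, here is a small sketch using Pearson's r. That choice of measure is an assumption, as the article does not name one, and the city values below are made up purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)

# Made-up shares of one demographic group in four hypothetical cities
estimated = [0.70, 0.35, 0.52, 0.80]   # what a model might guess
surveyed = [0.66, 0.40, 0.50, 0.78]    # what a survey might report
r = pearson_r(estimated, surveyed)
print(f"r = {r:.2f}")
```

A value close to 1 means the model's guesses rise and fall with the survey figures, even if the individual numbers are a little off.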
A closer inspection of the results reveals a high accuracy for political leanings at the precinct level. Milwaukee, Wisconsin, is a strong Democratic city with 311 precincts, and the model correctly classified the political status of 264 of them – an accuracy of 85 per cent. The estimates were even better for Gilbert, Arizona, a Republican city, where the model got 58 of the 60 precincts right – a whopping 97 per cent accuracy.
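The precinct figures are easy to check. The snippet below simply divides correct calls by total precincts, using the counts quoted above; the helper name `accuracy_pct` is ours.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Share of correctly classified precincts, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

milwaukee = accuracy_pct(264, 311)
gilbert = accuracy_pct(58, 60)
print(f"Milwaukee: {milwaukee:.1f}%  Gilbert: {gilbert:.1f}%")
```

Rounded to whole numbers, these come out at the 85 and 97 per cent reported.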
Using algorithms and pictures of cars only pieces together a rough picture of the local demographics and is less accurate than completing surveys. You might even say the software simply tells you the bleeding obvious. But machines are much quicker than humans, so the AI brains have that going for them.
It took only two weeks for the computer to crunch through the 50 million images. If a human spent 10 seconds per image, it would take them more than 15 years to complete the same task. Survey results go stale when data is collected infrequently, and the new method offers a quicker way to detect and compare social changes.
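The 15-year figure can be checked on the back of an envelope. The sketch below assumes the human works around the clock with no breaks and that a year has 365 days; neither assumption comes from the paper.

```python
IMAGES = 50_000_000
SECONDS_PER_IMAGE = 10
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes non-stop work, 365-day years
MACHINE_SECONDS = 2 * 7 * 24 * 60 * 60  # the two weeks quoted for the computer

total_seconds = IMAGES * SECONDS_PER_IMAGE
human_years = total_seconds / SECONDS_PER_YEAR
speedup = total_seconds / MACHINE_SECONDS

print(f"Human: {human_years:.1f} years; machine is roughly {speedup:.0f}x faster")
```

A human keeping office hours would of course take several times longer still.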
One way to improve the results would be to incorporate other types of images such as satellite images or pictures from social networks, the paper said.
Machine learning is making algorithms more powerful and data collection easier. Being able to extrapolate such a wide range of personal information from public resources is impressive. Although there are benefits – such as helping politicians with policy making or even detecting socioeconomic trends like recessions – there are also privacy concerns.
“It is clear that public data should not be used to compromise reasonable privacy expectations of individual citizens, and this will be a central concern moving forward,” the paper, dated February 22, stated. ®