About the Turkish citizenship database breach: trending names (part IV)

[continued from part III]

One final aggregate pattern in the data we can sanity-check is the distribution of names. While the US Census Bureau publishes statistics on popular names by year and even crunches the data for trends, there is no comparable official source of statistics for Turkey. We can look at patterns in this data-set as a proxy for that.

Last names

There are over 332,000 distinct last names in the data. Here are the most popular ones in order of decreasing frequency:

Popular last names (all years)

This data is relatively stable over time and consistent with expectations. The most popular surnames in 1960 were:

Yilmaz, Kaya, Demir, Sahin, Celik,
Yildiz, Yildirim, Ozturk, Aydin, Ozdemir

Fast forward thirty years to 1990, the last complete year in this data-set, and the picture changes only slightly:

Yilmaz, Demir, Kaya, Celik, Yildiz,
Sahin, Yildirim, Aydin, Ozturk, Ozdemir

There is only minor reshuffling of the original group. Demir has moved up a spot, Sahin has dropped a couple, and Aydin and Ozturk have swapped places. Remarkably, all 10 names are identical between the individual years 1960/1990 and the entire data-set going back to the 19th century.
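As a quick check on that reshuffling, here is a minimal sketch that compares the two top-10 lists quoted above. The function name rank_shifts is just an illustrative helper, not something from the original analysis.

```python
# Top-10 surname lists exactly as quoted in the text above.
surnames_1960 = ["Yilmaz", "Kaya", "Demir", "Sahin", "Celik",
                 "Yildiz", "Yildirim", "Ozturk", "Aydin", "Ozdemir"]
surnames_1990 = ["Yilmaz", "Demir", "Kaya", "Celik", "Yildiz",
                 "Sahin", "Yildirim", "Aydin", "Ozturk", "Ozdemir"]

def rank_shifts(before, after):
    """Map each name in `before` to (old rank - new rank).

    Positive means the name moved up, negative means it moved down,
    None means it is absent from `after`.
    """
    new_rank = {name: i for i, name in enumerate(after, start=1)}
    return {
        name: (i - new_rank[name]) if name in new_rank else None
        for i, name in enumerate(before, start=1)
    }

shifts = rank_shifts(surnames_1960, surnames_1990)
for name, delta in shifts.items():
    print(f"{name:10s} {delta:+d}")
```

Every name gets a numeric shift, which confirms that the same ten surnames appear in both years and only their order differs.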

First names

By contrast, the distribution of first names is more concentrated at the top but also changes more over time, with new trendy names appearing and existing ones falling out of favor. The top 10 most popular first names account for a much larger fraction of the total than the top 10 last names do, yet overall diversity is, surprisingly, also greater. With nearly 628,000 unique first names, that concentration at the top must be accompanied by a long tail of relatively uncommon names.
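To make "concentration" and "diversity" concrete, here is a rough sketch of how both could be measured from a flat list of names. The helper top_k_share and the toy counts are made up for illustration; they are not figures from the data-set.

```python
from collections import Counter

def top_k_share(names, k=10):
    """Return (share of records covered by the k most common names,
    number of distinct names)."""
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0
    top = sum(count for _, count in counts.most_common(k))
    return top / total, len(counts)

# Toy sample with invented counts, only to show the shape of the output.
sample = ["Elif"] * 5 + ["Zeynep"] * 3 + ["Merve", "Esra"]
share, distinct = top_k_share(sample, k=2)
print(share, distinct)
```

A high share together with a high distinct count is the combination described above: a few dominant names plus a long tail.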

The names they are a-changing

For women born in 1950 the most popular names were:


By 1990 the list has changed (more than half the entries are new) and has experienced a dramatic flattening effect:


The pronounced clustering around a handful of choices has weakened. Twice as many people were born in 1990 as forty years earlier, and still the number of newborns with popular names has decreased in absolute terms. Collectively they account for a smaller percentage of births, suggesting increasing diversity. We can visualize these trends by taking all names appearing in both lists and charting over time the number of people born in a given year with that name:


Some trends stand out:

  • Between 1950 and 1965, popular names are still holding their own and continuing to hit new highs in absolute numbers. (But overall population is also growing at an increasing pace; the next section considers adjusted numbers relative to overall births in that year.)
  • That trend plateaus in the 1970s and reverses sharply after 1980. Several of the names in the top 10 for 1950 start declining in absolute numbers even as more people are born each year.
  • Popular names come out of nowhere and take off quickly. The 1960s witness the rise of Özlem and Esra, the 1970s introduce Tuğba and the 1980s show a hockey-stick pattern for Merve. These names were hardly on the radar in the 1950s, registering in the single digits most years and in some cases exactly zero.
  • There are also names walking a middle ground, bucking both trends, such as Elif and Zeynep. They hold steady through the first two decades, then inch higher in the 70s and 80s.
  • The mysterious drop for certain years affecting overall numbers is also reflected here. 1951, 1957, 1961, 1967, 1975 and 1982 feature declines across the board for all names. Interestingly there is no similar correction observed for 1988.
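The "declines across the board" in the last bullet can be detected mechanically. Below is a minimal sketch, assuming per-name yearly counts are available as nested dictionaries. The function name and the toy numbers are hypothetical.

```python
def across_the_board_drops(series_by_name):
    """Return the years in which every name's count fell versus the prior year.

    series_by_name: {name: {year: count}}. Only years that, together with
    the preceding year, are present for all names are considered.
    """
    if not series_by_name:
        return []
    common_years = set.intersection(*(set(s) for s in series_by_name.values()))
    drops = []
    for year in sorted(common_years):
        if year - 1 not in common_years:
            continue
        if all(s[year] < s[year - 1] for s in series_by_name.values()):
            drops.append(year)
    return drops

# Invented counts, purely to exercise the function.
toy = {
    "Elif":   {1950: 100, 1951: 80, 1952: 110},
    "Zeynep": {1950: 90,  1951: 70, 1952: 85},
}
print(across_the_board_drops(toy))
```

Requiring every name to fall in the same year is a deliberately strict test; a real analysis might settle for "most names" instead.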

Viewing popularity as a percentage of people born that year removes the artifacts caused by those anomalies in the data, revealing a steady erosion in the incidence of popular names. The overarching trend is towards greater diversity and less concentration in a handful of popular options.
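The normalization itself is simple: divide each name's count for a year by all births recorded for that year. Here is a sketch under the assumption that records can be reduced to (birth year, first name) pairs. The record layout and the function yearly_share are illustrative, not the format of the actual data.

```python
from collections import Counter

def yearly_share(records, names):
    """records: iterable of (birth_year, first_name) pairs.

    Returns {name: {year: fraction of that year's births with that name}}.
    """
    wanted = set(names)
    births = Counter()   # all births per year
    named = Counter()    # births per (name, year) for the names of interest
    for year, name in records:
        births[year] += 1
        if name in wanted:
            named[(name, year)] += 1
    return {
        name: {year: named[(name, year)] / births[year] for year in sorted(births)}
        for name in names
    }

# Invented records: the second year has half as many rows overall.
records = [(1950, "Elif"), (1950, "Zeynep"), (1950, "Elif"), (1950, "Esra"),
           (1951, "Elif"), (1951, "Merve")]
shares = yearly_share(records, ["Elif", "Merve"])
print(shares)
```

In the toy data the absolute count for Elif halves from one year to the next, but her share of births stays the same, which is how a year with missing records stops showing up as a dip.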


Paging class of 1988

Similar trends apply to names for men. Here are the most popular names in 1950 in descending order:

Mehmet, Mustafa, Ahmet, Ali, Huseyin,
Hasan, Ismail, Ibrahim, Osman, Halil

In 1990:

Mehmet, Mustafa, Ahmet, Murat, Ali,
Gokhan, Ibrahim, Huseyin, Emre, Ugur

The first three spots are identical but Huseyin has dropped to #8, while Hasan, Ismail, Osman and Halil are no longer on the list and Emre, Ugur and Gokhan make an appearance. Murat came out of left field to claim #4. While the top of the ranking has not changed appreciably, the trends are more pronounced when visualized over time:
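The turnover between the two lists above can be tallied directly. This is a small sketch using only the names quoted in the text; the helper turnover is a made-up name.

```python
# Top-10 first names for men, as quoted in the text above.
men_1950 = ["Mehmet", "Mustafa", "Ahmet", "Ali", "Huseyin",
            "Hasan", "Ismail", "Ibrahim", "Osman", "Halil"]
men_1990 = ["Mehmet", "Mustafa", "Ahmet", "Murat", "Ali",
            "Gokhan", "Ibrahim", "Huseyin", "Emre", "Ugur"]

def turnover(before, after):
    """Return (names that left the list, names that entered it),
    each in its original ranking order."""
    dropped = [name for name in before if name not in after]
    added = [name for name in after if name not in before]
    return dropped, added

dropped, added = turnover(men_1950, men_1990)
print("dropped:", dropped)
print("added:  ", added)
```

Four of the ten entries turn over, with the newcomers landing at #4, #6, #9 and #10.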


There is that significant dip for 1988 again, only this time it is very pronounced. This is one of the sharp differences from the graphs for women: while this graph also has drops around 1951, 1957, 1961, 1967, 1975 and 1982, it is unique in having a far more pronounced across-the-board decline in 1988. That difference may point towards one explanation: military service. With some exceptions, Turkey requires all men to serve in the armed forces. Those born in 1988 would have reached their first year of eligibility in 2009, coinciding with the timing of the local elections with which these records are believed to be associated. During their compulsory service, these men would not be eligible to vote. Removing them from voter rolls could explain that anomaly for 1988. Looking at percentages instead of absolute numbers smooths out the anomaly:


Again the overall trend is towards greater diversity and less concentration among a handful of popular names. Most of the lines are sloping downward. Murat had an unusual burst of popularity through the 1960s but peaked in the following decade. Even names that came into vogue more recently in the 70s and early 80s are starting to plateau.


