How common is your name?

Fun with datasets and Shiny in R.

I am part of a Mayfield Lab group project that will use the US Social Security Baby Names dataset, which I did not even know existed. So what do I do? Get very distracted by an awesome, large dataset. It contains the first names of male and female infants at birth according to US Social Security from 1880 to 2015. Names with fewer than 5 occurrences in a year are excluded to protect those with unique names.

There have been 95,025 distinct names over the past 136 years, and 9,026 of them have been used for both males and females. Click each image for a larger pdf.
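Counts like these can be reproduced with a few lines of base R. The sketch below is not my original script; it uses a tiny made-up data frame, and the column names (`year`, `name`, `sex`, `count`) are assumptions about how the dataset was loaded.

```r
# Toy stand-in for the SSA data. Values are made up for illustration,
# and the column names are an assumption, not the dataset's own headers.
babynames <- data.frame(
  year  = c(1880L, 1880L, 1880L, 1881L, 1881L, 1881L),
  name  = c("Mary", "John", "Leslie", "Mary", "Leslie", "John"),
  sex   = c("F", "M", "M", "F", "F", "M"),
  count = c(700L, 900L, 80L, 690L, 10L, 870L),
  stringsAsFactors = FALSE
)

# Number of distinct names across all years and both sexes.
n_distinct_names <- length(unique(babynames$name))

# Names that appear at least once for each sex.
female_names <- unique(babynames$name[babynames$sex == "F"])
male_names   <- unique(babynames$name[babynames$sex == "M"])
shared_names <- intersect(female_names, male_names)
n_shared     <- length(shared_names)
```

On the full dataset, `n_distinct_names` and `n_shared` would correspond to the two numbers quoted above.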

Fig 1. Total number of male (blue) and female (green) births over time. Crude birth rate for 1908 to 2015 (red) is also represented on the figure (right axis).

It appears that although there have been more births in the US over time, the number of births adjusted for the number of women between 15 and 44 years old decreases into the late 20th Century.

Fig 2. Proportion of births for each sex over time. Male (blue) births and female (green) births over time. Horizontal line is at 0.50 or equal male and female births.

I was surprised to see such a drastic shift to nearly 75% of births being female in the late 19th and early 20th century. I am still trying to figure out the cause (whether a real event or an artifact of poor data). In 1902 the US Congress established the Bureau of Census, which included registration of births, and in 1946 the function of collecting population vital statistics was moved to the US Public Health Service. This increase in regulation may be the cause of the equalizing shown in the figure in the 1920s. Previously, parents may have chosen not to register the birth (although that still does not explain a heavier proportion of female births).
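For anyone who wants to check the proportions in Fig 2, here is a minimal sketch of the calculation in base R. The data frame is a made-up toy and the column names are assumptions; the same two lines would apply to the full dataset.

```r
# Toy data with made-up counts; column names are assumed.
babynames <- data.frame(
  year  = c(1900L, 1900L, 1900L, 1950L, 1950L),
  name  = c("Mary", "Anna", "John", "Mary", "John"),
  sex   = c("F", "F", "M", "F", "M"),
  count = c(200L, 100L, 100L, 150L, 150L),
  stringsAsFactors = FALSE
)

# Total recorded births per year and sex.
totals <- aggregate(count ~ year + sex, data = babynames, FUN = sum)

# Share of each sex within its year: divide by the yearly total.
totals$prop <- totals$count / ave(totals$count, totals$year, FUN = sum)
```

A value of `prop` far from 0.50 for one sex says that the recorded names are unbalanced in that year; on its own it cannot say whether the births themselves were.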

Fig 3. Top 10 male names over time. For each year, I took the top ten names for male births.

Fig 4. Top 10 female names. I plotted the top ten female names based on count for each year.
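Picking the top names per year, as in Figs 3 and 4, could be done along these lines. This is a sketch with a hypothetical helper, `top_n_per_year`, and toy data with assumed column names; it is shown with `n = 2` so the result is easy to read.

```r
# Toy data for one sex; counts are made up and column names are assumed.
girls <- data.frame(
  year  = c(2000L, 2000L, 2000L, 2001L, 2001L, 2001L),
  name  = c("A", "B", "C", "A", "B", "C"),
  count = c(50L, 40L, 30L, 20L, 60L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n most frequent names within each year.
top_n_per_year <- function(data, n = 10) {
  # Sort by year, then by descending count within the year.
  ranked <- data[order(data$year, -data$count), ]
  # Position of each row within its year after sorting.
  rank_in_year <- ave(ranked$count, ranked$year, FUN = seq_along)
  ranked[rank_in_year <= n, ]
}

top2 <- top_n_per_year(girls, n = 2)
```

With `n = 10` and the data filtered to one sex, the result is the set of names plotted in each figure.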

Fig 5. Proportion of common and rare names over time for each sex. Common names (those that each contained 10-1% of the total births of that year) are in yellow-green (bottom left of each panel). Purple indicates names that had 1-0.1% of the population. Orange indicates names shared with 0.1-0.01% of the population of that year and green indicates names that have less than 0.01% of the births of the year but more than 5 counts. Rare names (those with only 5 entries per year) are in pink. Female births are in the left panel and males in the right.

According to Figs 3-5 it appears that uncommon names are becoming more prevalent. Fewer infants share first names in the US now than before the 1920s.
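The frequency classes in Fig 5 can be sketched with `cut()`. The helper name `classify_name` is hypothetical, and two details are assumptions on top of the caption: that shares are taken within one year and sex, and that each class includes its upper bound.

```r
# Hypothetical helper: assign a name to a Fig 5 class from its count and
# the total births for that year and sex.
classify_name <- function(count, total) {
  share <- count / total
  bin <- as.character(cut(
    share,
    breaks = c(0, 1e-4, 1e-3, 1e-2, 1e-1),
    labels = c("<0.01%", "0.01-0.1%", "0.1-1%", "1-10%")
  ))
  # Names at the reporting floor of 5 births get their own class.
  bin[count == 5] <- "rare (5 births)"
  bin
}

# Made-up counts for five names in a year with 100,000 births.
classes <- classify_name(c(5000, 500, 50, 8, 5), total = 100000)
```

Summing the counts in each class and dividing by the yearly total gives the stacked proportions shown in the figure. Names above 10% of births are left as `NA` here because the caption does not define a class for them.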

What about your name?

You have survived the luridly colored figures. Now test out the Shiny app that I made in R**. Type in a name and see the plot of the number of births over time (you can choose Male or Female births, or select Both to show both sexes on the graph). Then you can hover on the graph to find a specific year and search for that year in the output table to find the exact number of births with that name.
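For readers curious about how an app like this is put together, here is a stripped-down sketch. It is not the code behind my app: the data is a made-up toy, the column names and the helper `births_for_name` are assumptions, and the hover feature is left out. The app itself only starts in an interactive session with the shiny package installed.

```r
# Toy data with made-up counts; column names are assumed.
babynames <- data.frame(
  year  = c(1900L, 1900L, 1950L, 1950L, 1950L),
  name  = c("Leslie", "Mary", "Leslie", "Leslie", "Mary"),
  sex   = c("M", "F", "F", "M", "F"),
  count = c(100L, 500L, 80L, 120L, 400L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: births per year for one name.
# sex is "M", "F", or "Both"; name matching ignores case.
births_for_name <- function(data, name, sex = "Both") {
  keep <- tolower(data$name) == tolower(name)
  if (sex != "Both") keep <- keep & data$sex == sex
  out <- data[keep, c("year", "sex", "count")]
  out[order(out$year, out$sex), ]
}

if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  ui <- fluidPage(
    textInput("name", "Name", value = "Leslie"),
    radioButtons("sex", "Births",
                 choices = c("Male" = "M", "Female" = "F", "Both" = "Both"),
                 selected = "Both"),
    plotOutput("trend"),
    tableOutput("counts")
  )

  server <- function(input, output, session) {
    # Re-filter whenever the name or the sex selection changes.
    hits <- reactive(births_for_name(babynames, input$name, input$sex))

    output$trend <- renderPlot({
      d <- hits()
      req(nrow(d) > 0)  # draw nothing if the name is not in the data
      plot(d$year, d$count, pch = 19,
           col = ifelse(d$sex == "M", "blue", "green"),
           xlab = "Year", ylab = "Births")
    })

    output$counts <- renderTable(hits())
  }

  runApp(shinyApp(ui, server))
}
```

Keeping the filtering in a plain function means it can be checked at the console, for example `births_for_name(babynames, "leslie", "F")`, without starting the app.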

Find Your Name App*** #

** This is my first Shiny App. Let me know if you have any major issues...

*** The app will take a long time to initially load because it is a large dataset (over 90,000 names for 2 sexes over 136 years - 185,689 entries). It also may need to be reloaded from the server and may crash. See ** comment above. :)

# 07.07.2016 - The app has been crashing quite a bit and is probably not usable currently. I need to figure out how to make the large dataset easier to query.
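One idea I have not tried on the real app yet: build a lookup table keyed by name once, save it, and have the app read that instead of filtering the whole table on every request. The sketch below uses toy data, assumed column names, and a hypothetical `lookup` helper. Whether query speed is actually what makes the app crash is a guess on my part.

```r
# Toy data with made-up counts; column names are assumed.
babynames <- data.frame(
  year  = c(1900L, 1900L, 1950L, 1950L, 1950L),
  name  = c("Leslie", "Mary", "Leslie", "Leslie", "Mary"),
  sex   = c("M", "F", "F", "M", "F"),
  count = c(100L, 500L, 80L, 120L, 400L),
  stringsAsFactors = FALSE
)

# One-time preprocessing: one small data frame per lower-cased name.
by_name <- split(babynames[, c("year", "sex", "count")],
                 tolower(babynames$name))

# Save the index so the app can load it at start-up.
index_path <- tempfile(fileext = ".rds")
saveRDS(by_name, index_path)
index <- readRDS(index_path)

# Hypothetical helper: fetch one name, or an empty table if it is absent.
lookup <- function(index, name) {
  hit <- index[[tolower(name)]]
  if (is.null(hit)) {
    data.frame(year = integer(0), sex = character(0), count = integer(0))
  } else {
    hit
  }
}

leslie <- lookup(index, "Leslie")
```

In a real deployment `index_path` would be a file shipped with the app rather than a temporary file.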

###################

Check out these sites that used the same database, R and Shiny but had more luck than I did.

SSA Baby Names Visualization with R and Shiny

Popularity of Baby Names Since 1880