Merge commit '2af495d056616cc0f757a055114b56df2e0d5d84' as 'projects/bad-nlp/name-database'

2023-03-20 18:03:18 -06:00
669 changed files with 423076 additions and 0 deletions


@@ -0,0 +1,15 @@
# 1990 US Census First Name and Surname Database
[From http://www.census.gov/genealogy/names/names_files.html](http://www.census.gov/genealogy/names/names_files.html)
## Files
- dist.all.last: contains a list of roughly 90,000 surnames recorded in the 1990 census
- dist.female.first: contains a list of roughly 4,300 female given names
- dist.male.first: contains a list of roughly 1,200 male given names
## Record Fields
- Name
- Frequency as a percentage
- Cumulative percentage
- Rank
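
A minimal sketch of reading one record with the four fields listed above. It assumes each line holds the fields in that order, separated by whitespace; the `NameRecord` type, the `parse_record` helper, and the sample line are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class NameRecord(NamedTuple):
    """One row of a dist.* file (hypothetical helper type)."""
    name: str
    freq_pct: float   # frequency as a percentage
    cum_pct: float    # cumulative percentage
    rank: int


def parse_record(line: str) -> NameRecord:
    # Assumption: exactly four whitespace-separated columns per line.
    name, freq, cum, rank = line.split()
    return NameRecord(name, float(freq), float(cum), int(rank))


# Illustrative line; check the actual files before relying on this layout.
sample = "SMITH          1.006  1.006      1"
record = parse_record(sample)
print(record.name, record.freq_pct, record.cum_pct, record.rank)
```

A whole file could be read the same way by calling `parse_record` on each non-empty line.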

File diffs suppressed because they are too large (3 files)


@@ -0,0 +1,19 @@
# 2000 US Census Surname Database
[From http://www.census.gov/genealogy/www/freqnames2k.html](http://www.census.gov/genealogy/www/freqnames2k.html)
## Files
- app\_c.csv: contains a list of surnames recorded at least 100 times in the 2000 census
## Record Fields
- name: the surname
- rank: the rank of the surname by frequency in the census (1 is the most common)
- count: how many people counted in the census had the surname
- prop100k: the number of people with the surname per 100,000 people
- cum_prop100k: the cumulative proportion per 100,000 people, summing this surname and every higher-ranked surname
- pctwhite: percentage of people with this surname who are White
- pctblack: percentage of people with this surname who are Black
- pctapi: percentage of people with this surname who are Asian or Pacific Islander
- pctaian: percentage of people with this surname who are American Indian or Alaska Native
- pct2prace: percentage of people with this surname who are of more than one race
- pcthispanic: percentage of people with this surname who are Hispanic
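
The relationship between `count`, `prop100k`, and `cum_prop100k` can be sketched as follows. The counts and the total population below are made-up values chosen for easy arithmetic, and `per_100k` is a hypothetical helper; this only illustrates the definitions above, not how the Census Bureau computed the published figures.

```python
from itertools import accumulate


def per_100k(count: int, total_population: int) -> float:
    """People with a surname per 100,000 people counted."""
    return count * 100_000 / total_population


# Hypothetical counts, already sorted from rank 1 downwards.
counts = [5000, 3000, 2000]
total_population = 1_000_000  # hypothetical census total

props = [per_100k(c, total_population) for c in counts]
# Running sum: each entry covers the surname and all higher-ranked ones.
cum_props = list(accumulate(props))

for rank, (count, prop, cum) in enumerate(zip(counts, props, cum_props), start=1):
    print(rank, count, prop, cum)
```

Because the file only lists surnames recorded at least 100 times, the last `cum_prop100k` value would be expected to stay below 100,000.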

File diff suppressed because it is too large