US Congressional Apportionment Calculator

Following the release of apportionment population data in April 2021, each state was allocated a particular number of Congressional representatives. Apportionment populations are slightly different from the standard census population counts, as they consist of “the resident population of the 50 states including overseas federal employees (military and civilian) and their dependents living with them.”

Interestingly enough, there are multiple different ways that seats could be apportioned based on population data, each with various tradeoffs in terms of privileging larger or smaller states. Since 1941, apportionment values have been calculated using the Huntington-Hill method. Generally speaking, the Huntington-Hill method calculates a priority number for each combination of state and seat number (for example, Delaware having a 2nd Congressional District and Texas having a 38th Congressional District). These priority numbers are then ranked and, after accounting for the minimum number of seats required for each state, the remaining seats are awarded in rank order.
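To make the priority-number idea concrete, here is a minimal Python sketch. It is not the Apps Script function behind the spreadsheet: the function name, its parameters, and the sample populations are all made up for illustration, and it assumes every state starts with at least one seat and that there is no maximum per state.

```python
import heapq
from math import sqrt


def huntington_hill(populations, total_seats=435, min_seats=1):
    """Award seats one at a time to the state with the highest priority number."""
    if min_seats < 1:
        raise ValueError("this sketch assumes every state starts with at least one seat")
    seats = {state: min_seats for state in populations}
    remaining = total_seats - min_seats * len(populations)
    if remaining < 0:
        raise ValueError("not enough seats to give every state its minimum")

    # The priority of a state's next seat, given n current seats, is
    # population / sqrt(n * (n + 1)). heapq is a min-heap, so store negatives.
    heap = [(-pop / sqrt(min_seats * (min_seats + 1)), state)
            for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(remaining):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return seats


# Hypothetical three-state example, not real apportionment populations.
print(huntington_hill({"A": 1000, "B": 500, "C": 250}, total_seats=7))
```

With these made-up populations and 7 total seats, state A ends up with 4 seats, B with 2, and C with 1.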

Because the Huntington-Hill method is somewhat tricky to implement, and calculators employing it appear hard to find, I created a Congressional Apportionment Calculator in Google Sheets that allows you to enter population values for every state and then calculate how many Congressional seats would be allocated to each state based on those figures.

The spreadsheet is set up so that cells in blue are editable by any viewer of the document.

Screenshot of the apportionment calculator

The calculator also includes fields to include DC as a state, and to change the minimum and maximum number of seats per state, as well as the total number of seats. The second tab in the spreadsheet includes two maps, which plot the total number of seats for each state under the modified populations, as well as the difference in seats per state compared with the seats that were actually allocated.

Using the calculator, you can see things like:
– How apportionment would have changed if New York’s apportionment population were just 89 people larger
– How apportionment would change if DC were counted as a state
– How apportionment, and a state’s percentage of all seats, would change if Congress were expanded

Please note:
– Unfortunately, Google Sheets does not support including Washington, DC among its state maps.
– Changing the minimum seats per state to 0 may result in wacky outputs, particularly because the definition of the method begins with each state being allocated one seat.

Technically speaking, the spreadsheet uses a function I wrote in Google Apps Script, which was a good chance to practice some JavaScript. With some additional time, I could try to add an option for Puerto Rico and link the results of this analysis with those discussed in an earlier post, Counties and Cities with the Most Influence on US Federal Elections.

Is Nevada Really The Most Mountainous State?

Which State Has the Most Mountains over 10k Feet?

I was recently asked which state has the most mountains over 10,000 feet. I guessed Colorado, but was surprised to hear that it is allegedly Nevada.

If you Google this, you can find some evidence for it, in particular this Opinion piece from the Great Falls Tribune, which states:

If measured by the state with the number of named mountain ranges, then the distinction goes to Nevada, according to a member of Blurtit.com, an online question and answer community. 

“Nevada has more than 300 named mountain ranges, all running north-south as part of the Great Basin complex. Elevations range from 2,000 to 3,000 feet. The state has the most number of peaks above 10,000 feet. “

Great Falls Tribune edit board

While I don’t doubt that Nevada has a large number of mountain ranges, I found it hard to believe that Nevada had the most peaks above 10,000 feet, particularly because California contains the bulk of the Sierra Nevada range, and Colorado is known for its many 14,000 foot peaks and has significant areas of land above 10,000 feet in elevation.

Curious to find the answer, I looked at topographic data and mountain summit data to dig into it further. From that, here are my findings: Colorado, not Nevada, has the most mountains over 10,000 feet tall.

Running the Numbers

The first source I looked at was summit data from the Environmental Systems Research Institute. It provided data on mountain peaks, including height in meters and the state and county in which each peak is located. Filtering down the dataset to mountains over 3048 meters (10k feet), here’s how many peaks there are in each state:

State         Number of 10,000-foot mountains
Colorado      1,593
California    506
Wyoming       474
Utah          265
Montana       230
New Mexico    169
Idaho         119
Nevada        117
Alaska        108
Hawaii        38
Arizona       25
Washington    13
Oregon        5
Source: Environmental Systems Research Institute (Redlands, Calif.)
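For anyone who wants to repeat the filtering step, here is a rough Python sketch of the idea. The field names (`state`, `elev_m`) and the sample rows are made up for illustration and are not the actual columns or values in the summit dataset.

```python
from collections import Counter

FEET_PER_METER = 3.28084
THRESHOLD_M = 10_000 / FEET_PER_METER  # about 3,048 meters


def count_high_peaks(peaks, threshold_m=THRESHOLD_M):
    """Count peaks taller than threshold_m per state, most peaks first."""
    counts = Counter(p["state"] for p in peaks if p["elev_m"] > threshold_m)
    return counts.most_common()


# Made-up rows for illustration only; not taken from the real dataset.
sample_peaks = [
    {"name": "Peak A", "state": "CO", "elev_m": 4300.0},
    {"name": "Peak B", "state": "CO", "elev_m": 3100.0},
    {"name": "Peak C", "state": "NV", "elev_m": 3500.0},
    {"name": "Peak D", "state": "NV", "elev_m": 2900.0},
]
print(count_high_peaks(sample_peaks))
```

Here only three of the four sample peaks clear the threshold, so the result lists two for CO and one for NV.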

The number for Nevada from this data closely matches that of this list on Peakbagger.com, which lists 134 peaks above 10,000 feet.

The Colorado number appears harder to verify, but if anything appears vastly undercounted relative to the numbers listed in this article. That link mentions 48 14k footers, 804 13k footers, 1,062 12k footers, 716 11k footers, and 527 10k footers (with at least 300′ clear prominence).

Perhaps the quote above listing Nevada on top is incorporating some sort of prominence metric, but it seems hard to justify putting Nevada on top when there are 7 other states with more 10k footers.

Visualizing

This question gave me a good opportunity to play around with raster data: I used elevation data from the USGS and mapped it alongside state boundary polygons and the summit point data mentioned above. As you can see from the map, Colorado clearly has the most mountains over 10k feet, which are marked in blue.

Below is a map without the mountains.

Visualizing Basketball’s Plus / Minus Statistic

If you’ve ever looked at a box score for a basketball game, one interesting statistic that is recorded is the plus / minus. Plus / minus denotes a team’s net points while the player is on the court.

As an example, if a player enters the game with the score tied, and leaves with his or her team leading by 5, the player would have a plus / minus of +5. If that player re-entered the game when his or her team was up by 10 and subbed out with the team down by 2, his or her net plus / minus would be -7 (+5 + (-12) ). One small caveat is that minor adjustments are made so players don’t “receive” or “lose” points from being subbed in or out part way through free throw attempts.
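The arithmetic in that example can be written as a tiny Python function. This is only a sketch: the function name and the stint format are my own, and it ignores the free throw adjustment mentioned above.

```python
def plus_minus(stints):
    """Sum the change in team margin over each stint a player is on the court.

    Each stint is (margin_at_entry, margin_at_exit), where margin is the
    player's team score minus the opponent's score.
    """
    return sum(exit_margin - entry_margin for entry_margin, exit_margin in stints)


# The example from the text: enters tied and leaves up 5, then
# re-enters up 10 and leaves down 2.
example = [(0, 5), (10, -2)]
print(plus_minus(example))  # -7
```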

Below you can see examples of the plus / minus from Game 1 of the Eastern Conference semifinals between the Philadelphia 76ers and the Boston Celtics, which the 76ers won 119 – 115.

The plus / minus data is pretty interesting, especially when combined with data about how many minutes each player played. For Philadelphia, in the 42 minutes (out of 48 total for the game) that Tobias Harris played, the 76ers had a net even margin with the Boston Celtics. On the other hand, in the 36 minutes that Tyrese Maxey played, the 76ers outscored the Celtics by 12 points. Lastly, in the 25 minutes De’Anthony Melton played, the 76ers were outscored by 8 points.

Clearly, there were some minutes with Maxey playing and Harris on the bench in which the 76ers did very well, and a good chunk of minutes with Melton on the floor in which the 76ers played quite poorly, including stretches when Harris and Maxey were also in the game.

The Celtics’ box score is arguably more interesting, with Malcolm Brogdon logging a +14 in his 34 minutes and Al Horford logging a -17 in his 30 minutes. In a game the Celtics lost by only 4 points, that’s a large differential.

To better show how these discrepancies arise, I put together a visualization of plus / minus data for the entire game, using play-by-play and score data from ESPN and Python with the matplotlib library for the analysis.

With the visualization, you can better see how all the plus / minus extremes for each team played out. For example, Maxey picked up points relative to Harris from subbing out a bit earlier in the 1st quarter and playing in the late second quarter. Comparing across teams also allows you to guess some of the matchups at play, with Maxey appearing to take advantage of the time Brogdon sat around halftime.

Here’s Game 4 of the Warriors – Lakers as a comparison:


If you’re interested in visualizing other games or seeing the code to generate the graphics, let me know!

Counties and Cities with the Most Influence on Our Federal Elections

Note: The analysis discussed in this article can be played with interactively here.

Most Populous American Cities in 2020

According to the 2020 Census, the largest cities in the United States were as follows:

  1. New York, NY – 8.80 million people
  2. Los Angeles, CA – 3.90 million people
  3. Chicago, IL – 2.75 million people
  4. Houston, TX – 2.30 million people
  5. Phoenix, AZ – 1.60 million people
  6. Philadelphia, PA – 1.60 million people
  7. San Antonio, TX – 1.43 million people
  8. San Diego, CA – 1.39 million people
  9. Dallas, TX – 1.30 million people
  10. San Jose, CA – 1.01 million people

Something that is perhaps counterintuitive is that even though these are the ten largest cities in the country, they do not necessarily have the most influence on our nation’s federal elections.

Influence on Federal Elections

The question of influence is essentially this: if you could be a party boss in a particular city, which city would generally get you the farthest nationally?

The answer is not simply the largest city, because “influence” depends upon what state a city is in, what percentage of that state’s population is in the city, and how the state fared in congressional reapportionment in terms of people per representative. In a perfectly democratic system, “influence” would be equal to a city’s percent of nationwide population, but our system is not intended to be perfectly democratic.

Calculating Influence

Calculating “influence” is done by equally weighting a state’s share of seats across Senatorial, Congressional and Presidential elections. As an example, a state with 2 Senators (out of 100 total), 3 Congressional representatives (out of 435 total) and 5 electoral college electors (out of 538 total) would have an influence of 1.2%. This should make sense as the state has 2% of Senators, .7% of Congressional Representatives, and .9% of electoral college electors.

(1/3)((2/100) + (3/435) + (5/538)) = ~1.2%

To calculate this for a city, you take the city’s percentage of the entire state’s population and re-run the calculation using fractional representatives. Intuitively, this says that a city that constitutes 75% of a state’s population “has” 1.5 Senators, whereas a city that is only 10% of a state’s population “has” .2 Senators.
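As a quick sketch, the calculation above looks like this in Python. The function name and parameters are mine, not taken from the linked spreadsheet. Multiplying the state’s influence by the city’s population share is the same thing as re-running the formula with fractional representatives.

```python
def influence(senators, representatives, electors, population_share=1.0):
    """Equal-weighted share of Senate, House, and Electoral College seats.

    population_share is the fraction of the state's population living in
    the city or county (1.0 for the state as a whole).
    """
    state_influence = (senators / 100 + representatives / 435 + electors / 538) / 3
    return state_influence * population_share


# The example state from the text: 2 Senators, 3 representatives, 5 electors.
print(f"{influence(2, 3, 5):.1%}")
# A hypothetical city with 75% of that state's population.
print(f"{influence(2, 3, 5, population_share=0.75):.2%}")
```

The first line prints the roughly 1.2% figure from the worked example.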

This method is admittedly imperfect, as a city that constituted 75% of a state’s population could probably elect any 2 Senators it wanted to if it voted as a bloc, and redistricting could in theory keep a city with two representatives from being the majority voting bloc for any representative, but we’re going with it 🙂

Most Influential American Cities in 2020

When you re-run the numbers on cities (in this case 2020 Census Places), the numbers turn out quite differently.

  1. New York, NY – 1.91% influence (8.80 million people)
  2. Los Angeles, CA – .79% influence (3.90 million people)
  3. Chicago, IL – .67% influence (2.75 million people)
  4. Houston, TX – .48% influence (2.30 million people)
  5. Phoenix, AZ – .46% influence (1.60 million people)
  6. Philadelphia, PA – .39% influence (1.60 million people)
  7. Anchorage Municipality, AK – .37% influence (291k people)
  8. Albuquerque, NM – .32% influence (565k people)
  9. Omaha, NE – .32% influence (486k people)
  10. San Antonio, TX – .30% influence (1.43 million people)

While the top 6 cities stay the same, Anchorage, Albuquerque and Omaha make the list, primarily due to their relatively large share of the population in AK, NM, and NE, as you can see in the image below.

If you’d like to play around with the numbers and sort by the various columns: check out the spreadsheet here.

DC – Taxation without Representation?

DC is particularly interesting, in that its influence of .19% isn’t all that much lower than its nationwide population percentage and there are similarly sized cities roughly as influential.

In fact, voters in DC are actually more “influential” per capita than voters in 11 states. This is because DC having 3 electoral college electors as a city of only 689,545 people (.56% of electors despite being only .21% of the total population) is disproportionate enough to make up for the lack of Senators or a voting Congressional representative. Although DC is still under the average ratio of influence to nationwide population pct (which is 1), there are 11 states with an even lower ratio.

Most Populous Counties in 2020

Per the 2020 Census, the most populous counties were as follows:

  1. Los Angeles County, CA – 10.01 million people
  2. Cook County, IL – 5.28 million people
  3. Harris County, TX – 4.73 million people
  4. Maricopa County, AZ – 4.42 million people
  5. San Diego County, CA – 3.30 million people
  6. Orange County, CA – 3.19 million people
  7. Kings County, NY – 2.74 million people
  8. Miami-Dade County, FL – 2.70 million people
  9. Dallas County, TX – 2.61 million people
  10. Riverside County, CA – 2.42 million people

Most Influential Counties in 2020

Crunching the “influence” numbers gives you:

  1. Los Angeles County, CA – 2.03% influence (10.01 million people)
  2. Cook County, IL – 1.30% (5.28 million people)
  3. Maricopa County, AZ – 1.26% (4.42 million people)
  4. Harris County, TX – .98% (4.73 million people)
  5. Clark County, NV – .98% (2.27 million people)
  6. Honolulu County, HI – .75% (1.02 million people)
  7. San Diego County, CA – .67% (3.30 million people)
  8. Orange County, CA – .64% (3.19 million people)
  9. Providence County, RI – .64% (660k people)
  10. King County, WA – .64% (2.27 million people)

You can see the top 15 here.

While the method is definitely imperfect, it does flag the importance of New Castle County to American politics, where Joe Biden began his political career many years ago.

Imperfectly Democratic

In the Senate, the 643,077 people living in Vermont, per the 2020 Census, elect and are represented by 2 Senators, whereas in California, any group of 643,077 people will only comprise a small fraction of the 39,538,223 people who elect and are represented by the state’s 2 Senators.

The disparity is somewhat smaller for Congressional races. In terms of apportionment population, the lowest ratio of people to representative is in Montana, where there are 542,704 people for each of its 2 House Representatives. The highest ratio is in Delaware, where there are 990,837 people for Delaware’s lone representative. The apportionment population used to determine the number of representatives in a state is slightly different from a state’s total population, as it includes a count of U.S. military and federal civilian employees (and their dependents) living overseas, allocated to their home state.

For the Presidency, states vote for a number of electors equal to their number of Senators plus their number of representatives. DC, which elects neither a voting member of the House nor a Senator, votes for 3 electors.

Notes

To be perhaps more accurate, this analysis could incorporate a city’s percentage of voting eligible or participating voter population and the subtle nuances for electoral college electors in NE and ME.

Data from the 2020 Census via Redistricting Data Hub

Seedigami – Women’s March Madness Seed Matchup Analysis

The analysis and charts in this post were inspired by Scorigami, a website which tracks whether a certain final score has occurred in an NFL game. As an example, there has been one game that has ended with a score of 72-41 and no games that have ended with a score of 21-5. This piece is a companion to an analysis of the Men’s Tournament, which you can find here.

In this post, I analyzed all the different matchups between different seeds at the NCAA tournament. All data in the analyses below is sourced from the Kaggle “March Machine Learning Mania 2023” datasets and incorporates results from 1998 to 2022. This doesn’t quite capture every year of the 64-team tournament era, which began in ’94.

Anything under the “N/A” line is an upset. I was surprised to see that there has never been a 3-14 or 2-15 upset in the tournament.

Seedigami – Men’s March Madness Seed Matchup Analysis

The analysis and charts in this post were inspired by Scorigami, a website which tracks whether a certain final score has occurred in an NFL game. As an example, there has been one game that has ended with a score of 72-41 and no games that have ended with a score of 21-5.

In this post, I analyzed all the different matchups between different seeds at the NCAA tournament. All data in the analyses below is sourced from the Kaggle “March Machine Learning Mania 2023” datasets and incorporates results from 1985 to 2022.
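The core of a Seedigami chart is just a tally of games by winning seed and losing seed. Here is a minimal Python sketch of that tally. The field names (`w_seed`, `l_seed`) and the sample games are made up for illustration and are not the column names or results in the Kaggle files.

```python
from collections import Counter


def seed_matchup_counts(games):
    """Count games by (winning seed, losing seed)."""
    return Counter((g["w_seed"], g["l_seed"]) for g in games)


def is_upset(w_seed, l_seed):
    """Here an upset means the numerically higher (worse) seed won."""
    return w_seed > l_seed


# Made-up games for illustration only.
sample_games = [
    {"w_seed": 1, "l_seed": 16},
    {"w_seed": 1, "l_seed": 16},
    {"w_seed": 12, "l_seed": 5},
]
counts = seed_matchup_counts(sample_games)
upsets = {pair: n for pair, n in counts.items() if is_upset(*pair)}
print(counts)
print(upsets)
```

Any (winning seed, losing seed) pair that never appears in the tally is a matchup result that has not happened yet.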

Following the completion of the 2023 tournament, the charts will need to be updated, as, among others, this tournament has seen:
– The first 6 seed vs. 15 seed game (Creighton vs. Princeton)
– The first victory of a 6 seed over a 15 seed
– The first victory of a 9 seed over a 3 seed (FAU over Kansas State)

The one fewer 7 – 10 game is due to a forfeit.

Anything under the “N/A” line is an upset.

8 seeds have been particularly successful against lower seeds.

State Adjacency Portmanteaus

A border portmanteau is a region or town near a mutual border that combines the names of two, or occasionally three, adjacent states. The most famous example is probably “Texarkana” which is a combination of Texas, Arkansas and Louisiana. There is a Texarkana, TX and a Texarkana, AR. Having seen “Pen Mar, MD” on the map, I was curious as to which state borders have a border portmanteau.

I generated the border geometries using a slightly modified version of the adjacency code described here. The data came from the “Border portmanteaus” section of the List of geographic portmanteaus Wikipedia article. Some border portmanteaus no longer exist (ex. Nosodak, ND) or have no current population (Oklarado, CO) but are included anyway.

Mapping and Analyzing 16 Years of Data on Top HS Basketball Recruits

For this post, I analyzed ESPN100 men’s HS basketball prospect ratings from 2007 to 2022. For a given year, the dataset looks something like this:

Example of 2022 ESPN100 dataset

For this analysis, I cleaned the dataset and used API Ninja to Geocode hometown information. I ended up analyzing:
– Basic player information like name, height, and weight
– Top high schools for producing ESPN100 players and the colleges players most frequently attend
– Maps of where ESPN100 players come from each year and over time
– Maps of where particular colleges’ ESPN100 players come from
– An analysis of ESPN100 players forgoing college basketball

Notes and some additional information on processing are included at the bottom of the post. In general, I did not edit the dataset and primarily filled in gaps where they existed. The original dataset was accessed manually through the ESPN100 website.

Basic Prospect Data (Names, Height, Weight)

Names

The most common names for ESPN100 recruits from 2007 to 2022 are Jordan and Jalen, tied at 21 recruits each. The top ten names are:
  1. Jordan – 21 recruits
  2. Jalen – 21 recruits
  3. Brandon – 19 recruits
  4. Isaiah – 16 recruits
  5. Josh – 15 recruits
  6. Chris – 15 recruits
  7. Anthony – 14 recruits
  8. Justin – 14 recruits
  9. Tyler – 14 recruits
  10. James – 13 recruits
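A tally like this is easy to reproduce with Python’s `collections.Counter`. The sketch below assumes the first name is simply the first word of the player name field, which is my assumption rather than a documented step, and the short sample list is for illustration only.

```python
from collections import Counter


def top_first_names(full_names, n=10):
    """Return the n most common first names as (name, count) pairs."""
    first_names = (name.split()[0] for name in full_names if name.strip())
    return Counter(first_names).most_common(n)


# A short illustrative list; "Jalen Example" is a made-up name.
sample_names = ["Jalen Green", "Jalen Example", "Brandon Jennings"]
print(top_first_names(sample_names))
```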

Height

The tallest player in the dataset is Mamadou Ndiaye, who is listed at 7’5″. Ndiaye was the #74 ranked player in 2013 and went on to play basketball at UC Irvine.

The shortest player in the dataset is Erving Walker, who is listed at 5’6″. Walker was the #75 ranked player in 2008 and went on to play basketball at Florida.

View the full distribution below:

Histogram of player heights from ESPN100 from 2007 to 2022

Weight

The heaviest player in the dataset is Sim Bhullar, who is listed at 7’4″ and 330 lbs. Bhullar was the #82 ranked player in 2011 and went on to play basketball at New Mexico State before becoming the first player of Indian descent to play in the NBA.

There are 8 different players tied for the lightest player in the dataset at 150 pounds.

View the full distribution below:

Histogram of player weights from ESPN100 from 2007 to 2022

Top High Schools and Colleges

The graphic below shows the high schools or prep schools that had the most ESPN100 prospects.

High schools that produced the most ESPN100 prospects

In terms of colleges signing the highest number of ESPN100 recruits, Kentucky and Duke are a clear tier above the rest.

Colleges that recruited the most ESPN100 prospects

Recruits by Hometown Every Year and in Total

The two graphics depict where ESPN100 recruits came from in a given year. Some possible trends include a lot fewer players from the Bay Area over time and more players from the Twin Cities and Seattle.

Mapping where top basketball players came from on ESPN100 in total across 2007 to 2022

Where Colleges Get Their Recruits

The following section shows maps for the 76 schools that signed more than 5 ESPN100 recruits from 2007 to 2022. The slideshow is loaded in descending order, with the schools that signed the most recruits at the beginning. For each school, the map plots the hometown of every recruit they signed, with a catchall “Overseas” category.

In many cases, one could come pretty close to guessing the school just by the location of its top recruits.

Players Forgoing College Basketball

During this time period, players were not allowed to enter the NBA directly out of high school. Nonetheless, 26 players on the list did not end up signing with a college to play basketball. The number of such players peaked at 8 in 2020, which coincided with the inaugural season of the NBA G League Ignite.

Graphic depicting ESPN100 prospects

Notable prospects to forgo college basketball (2007 – 2022):
  • Jaden Hardy: #2 in ’21
  • Jalen Green: #1 in ’20
  • Jonathan Kuminga: #4 in ’20
  • LaMelo Ball: #21 in ’19
  • Anfernee Simons: #9 in ’18
  • Mitchell Robinson: #11 in ’17
  • Emmanuel Mudiay: #5 in ’14
  • Brandon Jennings: #1 in ’08
  • Terrelle Pryor: #39 in ’08
Note: Terrelle Pryor appears to be the only player from 2007 to 2022 to forgo basketball entirely. He went on to play football at Ohio State.

Data Processing and Cleaning

For each class year, the ESPN100 data provides the following fields:
– Rank
– Player Name
– Position
– Hometown (including High School name and City / State) [2]
– Height
– Weight
– Stars
– Grade (0 to 100 recruit grade determined by ESPN)
– College (School where they signed or committed)

A few notes about the dataset:
– For the “Hometown” field, the City / State information is not always the City / State of the High School, and appears to be the actual City / State where the player is from. At least for this analysis, I am more interested in a player’s hometown than in the location of the school where they played their basketball season. See note 2 below for more details.
– The dataset includes international recruits.
– In the original ESPN100 dataset, not every player has complete “College” data. In most cases the name of the college where they signed a letter of intent is listed, but in some cases, the field just lists where a player committed to play or simply provides a list of schools.
– In certain cases, a single college or destination is not provided. In those instances I looked into the player’s career and generally marked where they next played basketball.

Notes

1 For whatever reason, ESPN100 does not always list exactly 100 recruits, and in some years, fewer recruits are listed.

2 Many elite HS basketball recruits attend high schools or prep schools to play basketball. In general, it seems like the ESPN100 list provides the name of the high school or prep school but the city and state of the player’s hometown, and not the location of the school. For example, Brandon Jennings is listed as “Los Angeles, CA Oak Hill Academy” in the dataset, even though Oak Hill Academy is in Mouth of Wilson, VA. I deferred to ESPN and did not significantly edit these values.

Five Maps to Better Understand the Last 6 Presidential Elections (2000 – 2020)

The 5 maps above show data relating to how the counties of the lower 48 states (and DC) have voted across the last six presidential elections. I decided not to include Alaska and Hawaii data, primarily because Alaska has had a number of county boundary changes since 2000. If I had wanted to include Alaska and Hawaii, I could have followed the instructions in this post.

The bulk of the election data comes from MEDSL, with additional 2020 data from VEST used to fill in some of the gaps in MEDSL’s file. For each county, I retrieved vote totals for the following six Presidential elections:
– 2000: Bush vs. Gore
– 2004: Bush vs. Kerry
– 2008: McCain vs. Obama
– 2012: Romney vs. Obama
– 2016: Trump vs. Clinton
– 2020: Trump vs. Biden

The first four maps group counties into categories I found interesting:
– Counties that supported a candidate of the same party in every election
– Counties that began the 2000s voting Democratic and have voted Republican since (only flipping once)
– Counties that began the 2000s voting Republican and have voted Democratic since (only flipping once)
– Counties that supported Republicans in every election, except for Obama at least once
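For readers who want to reproduce the grouping, here is a rough Python sketch of how a county’s six results could be sorted into these categories. It is my own reconstruction rather than the code used for the maps: each county is assumed to be represented as a six-character string of winning parties such as "RRDDRD", and the “until” labels are modeled on the pattern names in the counts further down.

```python
ELECTION_YEARS = (2000, 2004, 2008, 2012, 2016, 2020)


def classify(winners):
    """Classify a county from its six winning parties, e.g. "RRDDRD"."""
    if len(winners) != len(ELECTION_YEARS) or set(winners) - {"R", "D"}:
        raise ValueError("expected six characters, each 'R' or 'D'")
    if winners == "RRRRRR":
        return "Republican"
    if winners == "DDDDDD":
        return "Democratic"
    # Number of times the winning party changed between consecutive elections.
    flips = sum(a != b for a, b in zip(winners, winners[1:]))
    if flips == 1:
        other = "R" if winners[0] == "D" else "D"
        flip_year = ELECTION_YEARS[winners.index(other)]
        party = "Democratic" if winners[0] == "D" else "Republican"
        return f"{party} until {flip_year}"
    # Republican in every election except one or both Obama elections.
    if winners[:2] == "RR" and winners[4:] == "RR" and "D" in winners[2:4]:
        return "Rep. except Obama at least once"
    return "Other"


print(classify("DDDDRR"))
print(classify("RRDDRR"))
```

Anything that flips more than once and is not an Obama-only exception falls through to "Other", which corresponds to the counties left out of the first four maps.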

The four categories above cover 2,954 of the 3,108 counties in the contiguous United States plus DC. Information on the counties not included is shown below.

The number of counties with a specific Presidential voting pattern not included in those 4 maps is included below. Counties included above contain a strikethrough. Broomfield, CO is excluded from the data as it did not exist in 2000. For simplicity, I did not try to modify the 2000 election results for the counties that previously contained the area of Broomfield.

'Republican': 2064,
 'Democratic': 343,
 "Rep. except Obama '08": 120,
 'Democratic until 2016': 90,
 'Democratic until 2004': 88,
 'Republican until 2004': 68,
 'Rep. except Obama twice': 49,
 'Democratic until 2008': 39,
 'Republican until 2020': 31,
 'Rep: Bush (second), Romney, 2 x Trump': 26,
 'Republican except for Kerry, 2 x Obama ': 23,
 'Republican until 2016': 22,
 'Rep: Bush (second), 2 x Trump': 20,
 'Democratic until 2012': 19,
 'Republican except for Obama (first), Biden': 13,
 'Democratic except Trump first term': 12,
 'Republican except for Obama (first), Clinton, Biden': 10,
 'Democratic except Bush second term': 10,
 'Democratic until 2020': 10,
 'Republican except for 2 x Obama, Biden': 9,
 'Republican until 2012': 6,
 'Rep: Bush (second), McCain, 2 x Trump': 6,
 'Republican except Kerry, Obama (first)': 5,
 "Rep. except Obama '12": 5,
 'Republican except for Kerry, 2 x Obama, Biden': 4,
 'Republican except for 2 x Obama, Clinton': 3,
 'Rep Bush (second), Trump (second)': 3,
 'Republican except for Obama (second), Biden': 3,
 'Republican except for Kerry': 3,
 'County Boundaries Changed': 1,
 'Democratic except Romney first term': 1,
 'Rep Bush (second), Trump (first)': 1,
 'Rep: McCain, 2 x Trump': 1

The fifth map looks at how many times each county had majority support for the winning Presidential candidate in these 6 elections. Interestingly, no county voted for the losing candidate in every Presidential election from 2000-2020 and 9 counties have supported the winning candidate in every one. Broomfield County is not included here, so there are only 3,107 counties.

0 winners: 0 counties
1 winner: 40 counties
2 winners: 121 counties
3 winners: 2543 counties
4 winners: 256 counties
5 winners: 138 counties
6 winners: 9 counties
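The count behind the fifth map is simple to express in Python. This sketch reuses the assumption that each county is a six-character string of winning parties. The national winners are Bush, Bush, Obama, Obama, Trump and Biden, taken from the election list above.

```python
NATIONAL_WINNERS = "RRDDRD"  # 2000, 2004, 2008, 2012, 2016, 2020


def winners_supported(county_winners):
    """Count the elections in which the county backed the national winner."""
    if len(county_winners) != len(NATIONAL_WINNERS):
        raise ValueError("expected one result per election")
    return sum(c == n for c, n in zip(county_winners, NATIONAL_WINNERS))


print(winners_supported("RRRRRR"))  # 3
print(winners_supported("DDDDDD"))  # 3
```

Counties that voted for the same party every time land on exactly 3 winners, which is why that bucket is so large: the 2,064 always-Republican and 343 always-Democratic counties account for 2,407 of its 2,543 counties.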

Note: This post contains updates to the maps I had previously shared in these two posts (2000 – 2016 and 2000 – 2020).