How to Analyze Uncertainty? Interesting salary trends in Stack Overflow’s Annual Survey 2020

Photo by olia danilevich from Pexels

When you try to work out the average salary for each developer type, you run into the following problem. The dataset has 23 developer categories, but this is a multiple-choice question, so respondents can select all that apply. This means that one respondent can list many developer types. There were 8269 distinct answers (the 23 single categories plus 8246 combinations of them). So how can you tell which salary belongs to which developer type?
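To see where such counts come from, the answers have to be split on their separator. A minimal sketch in plain Python, assuming the developer type answers are semicolon-separated strings, as in the `DevType` column of the public 2020 file; the sample answers are invented. In pandas, `Series.str.split(";")` followed by `explode` does the same splitting.

```python
from collections import Counter

# Invented sample answers; real answers are assumed to be semicolon-separated strings.
answers = [
    "Developer, back-end",
    "Developer, back-end;Developer, front-end;Developer, full-stack",
    "Data scientist or machine learning specialist",
    "Developer, back-end;Developer, front-end;Developer, full-stack",
    "Designer;Developer, front-end",
]


def split_types(answer: str) -> list[str]:
    """Split one multiple-choice answer into its individual developer types."""
    return [part.strip() for part in answer.split(";") if part.strip()]


# Distinct answer strings: single types and combinations alike.
distinct_answers = set(answers)

# How often each individual category is mentioned once answers are split.
categories = Counter(t for a in answers for t in split_types(a))

# Distinct answers that combine two or more developer types.
combinations = {a for a in distinct_answers if len(split_types(a)) > 1}

print(len(distinct_answers), len(categories), len(combinations))  # → 4 5 2
```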

Stack Overflow runs an annual Developer Survey of its developer community. In 2020, with nearly 65,000 responses from over 180 countries, the survey examines all aspects of the developer experience, from technologies to education and salary levels.

In this article I explore the 2020 Stack Overflow public survey. My study questions focus on salary by developer type and by education. The main target group is developers who describe themselves as professionals.
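Selecting that group is a simple filter. The sketch below assumes the `MainBranch` column and its answer text "I am a developer by profession" from the 2020 schema (check both against the file you download), and it also drops rows without a salary, since every question here is about salary; that second step is my assumption. The rows are invented, and the result is named after the `salary_pro` frame mentioned later in the text.

```python
# Assumed answer text in the 2020 MainBranch column; verify against the schema.
PROFESSIONAL = "I am a developer by profession"

# Invented rows; None stands for a missing salary.
rows = [
    {"MainBranch": "I am a developer by profession", "ConvertedComp": 54000.0},
    {"MainBranch": "I am a student who is learning to code", "ConvertedComp": None},
    {"MainBranch": "I am a developer by profession", "ConvertedComp": None},
    {"MainBranch": "I code primarily as a hobby", "ConvertedComp": 12000.0},
]


def professionals_with_salary(rows):
    """Keep professional developers who reported a converted salary."""
    return [
        r for r in rows
        if r["MainBranch"] == PROFESSIONAL and r["ConvertedComp"] is not None
    ]


salary_pro = professionals_with_salary(rows)
print(len(salary_pro))  # → 1
```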

During the exploration phase I noticed some interesting patterns in how respondents answered the salary question. This led me to examine the salary question more deeply. You can find the whole analysis on my GitHub.

There will be three study questions:

1. What is the median salary for each developer type?
2. How does education compare to median salary?
3. How does education compare to average salary?

Part One: What is the median salary for each developer type?

This was a very interesting question, with two main observations: salary skewness and developer type combinations. Firstly, the salary distribution is highly right-skewed. Salaries range from 0 to 2 million USD. Different countries have different salary levels, and respondents have answered in different ways. For example, 313 respondents report annual incomes below 1000 USD and 1503 below 5000 USD. At the other end, 1570 respondents report annual incomes between 260k and 2m USD, and 640 between 900k and 2m USD. Secondly, respondents gave many combinations of developer types: there were 8269 distinct developer type answers.
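The counts above are simple threshold filters. The sketch below reproduces that kind of summary on invented toy numbers (not survey data) and adds two quick skew checks: a mean far above the median, and a positive moment coefficient of skewness. The thresholds mirror those in the text; the function names are my own.

```python
import statistics


def describe_salaries(salaries):
    """Count salaries in the ranges quoted above and report mean and median."""
    return {
        "below_1k": sum(s < 1_000 for s in salaries),
        "below_5k": sum(s < 5_000 for s in salaries),
        "260k_to_2m": sum(260_000 <= s <= 2_000_000 for s in salaries),
        "900k_to_2m": sum(900_000 <= s <= 2_000_000 for s in salaries),
        "mean": statistics.fmean(salaries),
        "median": statistics.median(salaries),
    }


def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5; positive means right-skewed."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean((v - mu) ** 2 for v in values)
    m3 = statistics.fmean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5


# Invented annual USD salaries with one extreme value, not survey data.
toy = [0, 800, 4_500, 30_000, 52_000, 61_000, 75_000, 120_000, 300_000, 2_000_000]
print(describe_salaries(toy))
print(moment_skewness(toy) > 0)  # → True
```

On this toy list the mean (264,330) is almost five times the median (56,500), which is the same pattern that makes the average and median charts later in this article disagree.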

Salary is measured by the variable ConvertedComp: each respondent’s compensation converted to an annual USD salary using the exchange rate on 2020-02-19, assuming 12 working months and 50 working weeks.
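The conversion itself is simple arithmetic. A sketch, assuming the raw answer consists of an amount, a payment frequency and a currency; the frequency labels ("Yearly", "Monthly", "Weekly") and the exchange-rate argument are illustrative assumptions, not Stack Overflow’s actual code or rates.

```python
# Periods per year implied by the stated assumption (12 months, 50 weeks).
# The frequency labels are assumed for illustration.
PERIODS_PER_YEAR = {"Yearly": 1, "Monthly": 12, "Weekly": 50}


def to_annual_usd(amount, frequency, usd_per_unit):
    """Annualise a reported compensation and convert it to USD.

    frequency: one of the (assumed) labels in PERIODS_PER_YEAR.
    usd_per_unit: hypothetical USD value of one unit of local currency on the
    reference date (the survey used rates from 2020-02-19).
    """
    return amount * PERIODS_PER_YEAR[frequency] * usd_per_unit


print(to_annual_usd(4_000, "Monthly", 1.0))  # → 48000.0
print(to_annual_usd(1_000, "Weekly", 0.5))   # → 25000.0
```

The 50-week assumption matters: a weekly wage annualised over 52 weeks would come out 4% higher.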

The first chart above is similar to the salary chart on the Stack Overflow website.

From the charts above we can draw some conclusions.

Firstly, salaries for individual developer types are very hard to identify because respondents can give multiple answers. So how can you tell which salary belongs to which developer type? For example, one respondent with an annual income of 2 million USD gave this answer for developer type:

Data scientist or machine learning specialist; Designer; Developer, back-end; Developer, desktop or enterprise applications; Developer, front-end; Developer, full-stack; Developer, mobile; Engineer, data; Engineer, site reliability; Engineering manager.
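Splitting that single answer shows the scale of the problem. In the sketch below, if every listed type is credited with the full salary (which is what splitting and exploding the rows amounts to), this one respondent adds 2 million USD to ten different developer types. An even split across the listed types is shown for contrast; it is one possible alternative, not the method used in this article.

```python
# The answer quoted above, in the raw semicolon-separated form (assumed).
answer = (
    "Data scientist or machine learning specialist;Designer;Developer, back-end;"
    "Developer, desktop or enterprise applications;Developer, front-end;"
    "Developer, full-stack;Developer, mobile;Engineer, data;"
    "Engineer, site reliability;Engineering manager"
)
salary = 2_000_000

types = answer.split(";")

# Option A ("explode"): the full salary counts once for every listed type.
exploded = {t: salary for t in types}

# Option B (illustrative alternative): split the salary evenly across types,
# so the respondent contributes one salary in total.
fractional = {t: salary / len(types) for t in types}

print(len(types))                # → 10
print(sum(exploded.values()))    # → 20000000
print(sum(fractional.values()))  # → 2000000.0
```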

Secondly, median salaries depend on the boundaries we set. Without any lower or upper boundaries we get different charts, as we have seen above. For example, in the professionals’ salary data there are 651 Senior executives/VPs, and only 24 of them chose that type alone, without other developer types. There are also 224 Marketing or sales professionals, and all of them chose it in combination with other developer types. These two types also appear in the highest salary group, as the salary group charts show. If we set an upper bound on the upper whisker (the salary range chart, max 200k), we get a different Top 5.
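Both effects can be reproduced on a toy example. The sketch below (invented rows, not survey data; the 200k cap mirrors the salary range chart) counts how often each type was chosen alone versus in total, and recomputes medians with and without an upper bound. The ranking changes, and a type can drop out entirely once high salaries are excluded.

```python
import statistics
from collections import defaultdict

# Invented rows: (semicolon-separated developer types, annual salary in USD).
rows = [
    ("Senior executive/VP", 250_000),
    ("Senior executive/VP;Developer, back-end", 900_000),
    ("Marketing or sales professional;Developer, full-stack", 70_000),
    ("Developer, back-end", 60_000),
    ("Developer, back-end;Developer, front-end", 1_500_000),
    ("Developer, front-end", 45_000),
]


def type_stats(rows, upper_bound=None):
    """Per developer type: respondents choosing it alone, in total, and median salary.

    Rows above upper_bound (if given) are dropped before counting.
    """
    salaries = defaultdict(list)
    alone = defaultdict(int)
    for answer, salary in rows:
        if upper_bound is not None and salary > upper_bound:
            continue
        types = answer.split(";")
        for t in types:
            salaries[t].append(salary)
            if len(types) == 1:
                alone[t] += 1
    return {
        t: {"total": len(s), "alone": alone[t], "median": statistics.median(s)}
        for t, s in salaries.items()
    }


uncapped = type_stats(rows)
capped = type_stats(rows, upper_bound=200_000)
print(uncapped)
print(capped)  # Senior executive/VP is gone: all its salaries exceed 200k
```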

Thirdly, salary levels differ between countries, reflecting differences in their gross domestic product. To handle this, we could normalize salaries, e.g. with a logarithmic transformation and/or by relating them to each country’s GDP. That is a separate analysis.
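Both normalizations are one-liners. A minimal sketch, assuming a GDP per capita figure is available for each country from an external source (the survey does not include one); relating salary to GDP per capita is my reading of "relate this to countries’ GDP", and the numbers below are hypothetical.

```python
import math


def log_salary(salary):
    """log(1 + salary): compresses the long right tail and keeps a salary of 0 defined."""
    return math.log1p(salary)


def salary_to_gdp_ratio(salary, gdp_per_capita):
    """Salary as a multiple of the country's GDP per capita (external data, assumed)."""
    return salary / gdp_per_capita


# A roughly 1000x spread in raw salaries becomes a gap of about 6.9 on the log scale.
print(log_salary(2_000_000) - log_salary(1_999))

# Hypothetical countries: both salaries are twice the local GDP per capita.
print(salary_to_gdp_ratio(120_000, 60_000), salary_to_gdp_ratio(20_000, 10_000))  # → 2.0 2.0
```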

Part Two: How does education compare to median salary?

To explore median salaries by education, I used three salary data frames in a Jupyter notebook: salary without an upper bound (max 2m USD), salary range with an upper bound (max 200k USD), and the very high salary group (263k to 2m USD). A detailed analysis, including respondent counts, is on my GitHub.
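The three subsets and their Top 5 rankings can be sketched as follows. Plain lists of (education, salary) tuples stand in for the notebook’s data frames; `salary_pro` and `salary_range` are the names used later in the text, while `salary_very_high`, the helper functions and the rows are hypothetical.

```python
import statistics
from collections import defaultdict

# Invented (education, annual USD salary) rows; plain lists stand in for data frames.
rows = [
    ("Master's degree", 58_000),
    ("Master's degree", 150_000),
    ("Bachelor's degree", 54_000),
    ("Bachelor's degree", 900_000),
    ("Bachelor's degree", 40_000),
    ("No formal education", 1_000_000),
    ("Other doctoral degree", 75_000),
]


def split_frames(rows):
    """Build the three subsets from the text: no bound, <= 200k, and 263k to 2m."""
    salary_pro = list(rows)
    salary_range = [r for r in rows if r[1] <= 200_000]
    salary_very_high = [r for r in rows if 263_000 <= r[1] <= 2_000_000]  # hypothetical name
    return salary_pro, salary_range, salary_very_high


def top5_by(rows, stat=statistics.median):
    """Rank education levels by a salary statistic as (level, respondents, value)."""
    groups = defaultdict(list)
    for education, salary in rows:
        groups[education].append(salary)
    ranked = sorted(
        ((ed, len(s), stat(s)) for ed, s in groups.items()),
        key=lambda item: item[2],
        reverse=True,
    )
    return ranked[:5]


salary_pro, salary_range, salary_very_high = split_frames(rows)
for name, frame in [("no bound", salary_pro), ("<= 200k", salary_range),
                    ("263k to 2m", salary_very_high)]:
    print(name, top5_by(frame))
```

Passing `stat=statistics.mean` gives the average-salary ranking of Part Three instead. Keeping the respondent count next to each value makes it easy to spot groups with only a handful of people.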

In the first chart above, with no upper bound, the Top 5 is: Other doctoral degree (resp. 761, median 75k), Master’s degree (resp. 7749, median 58k), Associate degree (resp. 1058, median 54k), Bachelor’s degree (resp. 15610, median 54k) and Some college/university (resp. 3707, median 49k).

In the second chart, with an upper bound of 200k, the Top 5 keeps the same order but with lower medians: Other doctoral degree (resp. 681, median 70k), Master’s degree (resp. 7235, median 54k), Associate degree (resp. 1009, median 52k), Bachelor’s degree (resp. 14585, median 49k) and Some college/university (resp. 3487, median 45k).

In the third chart, the highest salary group, the Top 5 follows a different order: No formal education (resp. 7, median 1m), Associate degree (resp. 41, median 1m), Bachelor’s degree (resp. 804, median 813k), Primary/elementary school (resp. 9, median 771k) and Secondary school (resp. 48, median 713k).

So we have three different charts. If your education is Other doctoral degree, you have a good chance of being a well-paid employee. The data also suggests that you can achieve a well-paid job without any higher or formal education, if we can rely on it, although these groups are small in the highest salary group (7 to 48 respondents).

Part Three: How does education compare to average salary?

To explore average salaries by education, I used the same three salary data frames: salary_pro without an upper bound (max 2m USD), salary_range with an upper bound (max 200k USD), and the very high salary group (263k to 2m USD).
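Why do the average charts below differ so much from the median charts? With right-skewed data, a few very high salaries pull the mean up while barely moving the median. A toy illustration with invented numbers (not survey data):

```python
import statistics

# Invented salaries for one education group: four typical values and one outlier.
group = [40_000, 50_000, 60_000, 70_000, 2_000_000]
print(statistics.median(group), statistics.fmean(group))  # → 60000 444000.0

# The 200k cap of salary_range removes the outlier; mean and median now agree.
capped = [s for s in group if s <= 200_000]
print(statistics.median(capped), statistics.fmean(capped))  # → 55000.0 55000.0
```

One respondent is enough to multiply this group’s average by more than seven, which is why a small group such as Primary/elementary school can rank high on averages.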

In the first chart above, with no upper bound, the Top 5 is:

Other doctoral degree (resp. 761, avg 125k), Primary/elementary school (resp. 132, avg 120k), Bachelor’s degree (resp. 15610, avg 107k), Associate degree (resp. 1058, avg 106k) and Master’s degree (resp. 7749, avg 102k).

In the second chart, with an upper bound of 200k, the order changes:

Other doctoral degree (resp. 681, avg 79k), Master’s degree (resp. 7235, avg 62k), Associate degree (resp. 1009, avg 60k), Bachelor’s degree (resp. 14585, avg 59k) and No formal education (resp. 129, avg 58k).

In the third chart, the highest salary group, the order changes again:

Associate degree (resp. 41, avg 1.2m), Primary/elementary school (resp. 9, avg 1m), Bachelor’s degree (resp. 804, avg 943k), Some college/university (resp. 168, avg 859k) and No formal education (resp. 7, avg 854k).

So again we have three different charts. If your education is Other doctoral degree, you have a good chance of being a well-paid employee. Curiously, Primary/elementary school took second place in two of the charts above. A Bachelor’s degree appears in the Top 5 of all three charts. And again, the data suggests that you can achieve a well-paid job without any higher or formal education, if we can rely on this data.

Conclusions

Survey questions can be quite tricky to analyze. The more time you spend with the data, the deeper you dive. This dataset offers many ways to refine the analysis.

Salaries for individual developer types are very hard to identify because of the multiple-choice answers, so it is hard to tell which salary belongs to which developer type. Both median and average salaries depend on the boundaries we set: without lower or upper boundaries we get different charts, as we have seen above. Salary levels also differ between countries, reflecting their gross domestic product. Handling that calls for other methods, which is another story.

When comparing education to salary, we saw the same effect of boundaries. For both median and average salaries, respondents with an Other doctoral degree have a good chance of being well paid. Curiously, Primary/elementary school took second place in two of the average salary charts. It also appears that you can achieve a well-paid job without any higher or formal education, if we can rely on this data.

In this study, we had four different median salary charts, three different education by median salary charts, and three different education by average salary charts. Which one is to be believed?

The main lessons from a data analysis perspective are:

1) How should we deal with highly right-skewed data? Where should we set lower and upper bounds? And how sure can we be that respondents answered truthfully?
2) Understanding salary levels in different countries. A reliable analysis of worldwide developer salaries has to take each country’s GDP into account and relate it to that country’s annual salaries.
3) Understanding the salary/developer type problem: which salary belongs to which developer type when a respondent’s answer is a combination (as 73% of answers are).
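For the first point, one data-driven alternative to a fixed cap is the box-plot whisker itself: Tukey’s rule treats values above Q3 + 1.5 * IQR as outliers. A sketch on invented salaries; this is not the rule behind the 200k bound in this article, and on data this skewed it may be more informative applied to log salaries or within each country.

```python
import statistics


def tukey_upper_fence(values, k=1.5):
    """Upper box-plot whisker limit Q3 + k * IQR (Tukey's rule)."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


# Invented annual USD salaries, not survey data.
salaries = [30_000, 40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 80_000, 95_000, 2_000_000]
fence = tukey_upper_fence(salaries)
outliers = [s for s in salaries if s > fence]
print(fence, outliers)  # → 124375.0 [2000000]
```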

Overall, Stack Overflow’s public survey is good material for data scientists to examine and to practice much-needed skills.

Data Expert, Innovator & Co-Founder of Digitalents Helsinki, Innovative Leader, PSM