Data Analytics: A Beginner’s Guide

This article explains the differences between data analysis, data analytics, and data science. It covers how they differ in the programming languages required, the level of statistical knowledge expected, and the involvement of computer science.

Data Analytics, Data Science, and Data Analysis overlap heavily in what they cover and what they mean. All three are about data, and people often use the terms interchangeably. Still, there are nuanced differences, which I briefly explain below.

What is Data Analysis?

Data analysis can be considered the most generic of the three terms: literally, analyzing data! As long as your job involves data, you need to analyze it. It is therefore unsurprising that Data Analysis has been around for a very long time. The following Google Trends chart shows the overall trend since 2004, with Data Analysis as the yellow line (Google Trends only provides data from 2004 onward). Interestingly, interest in Data Analysis has stayed fairly steady even as Data Science and Data Analytics have grown in popularity. Thus, we can consider Data Analysis an evergreen term.

Because the term is so generic, Data Analysis can mean many different things. You can analyze data with linear regression models, but you could equally use machine learning models.

By the way, I found Wes McKinney’s book Python for Data Analysis very useful, and there is a free version online. The book focuses on the Python programming environment, frequently used packages (NumPy and pandas), data manipulation, and data visualization. It does not, however, cover how to use Python for statistical analysis. I think it makes sense for his book to concentrate on data manipulation in Python, since we all need to clean data before conducting any analysis, and a single book covering both data manipulation and statistical methods in Python would be a lot to take in. Thus, I believe that book and the tutorials on this website together make a good combination for learning data analytics.
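Because data cleaning comes before any analysis, here is a minimal sketch of what that step can look like. It uses only Python’s standard library (csv and statistics) rather than pandas, which the book itself teaches, and the region/sales data is hypothetical. Labels are normalized, rows with a missing value are dropped, and only then are group means computed.

```python
import csv
import io
import statistics

# Hypothetical raw export: inconsistent casing, stray whitespace, one missing value.
RAW_CSV = """region,sales
 North ,120
south,95
NORTH,130
South,
east,80
"""

def clean_rows(text):
    """Return (region, sales) tuples with normalized labels and numeric sales.

    Rows whose sales field is blank are dropped rather than imputed.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        region = row["region"].strip().title()  # " North " and "NORTH" both become "North"
        sales = row["sales"].strip()
        if not sales:  # skip rows with a missing sales value
            continue
        cleaned.append((region, float(sales)))
    return cleaned

rows = clean_rows(RAW_CSV)

# Only after cleaning does a simple analysis make sense: mean sales per region.
by_region = {}
for region, sales in rows:
    by_region.setdefault(region, []).append(sales)
means = {region: statistics.mean(values) for region, values in by_region.items()}
print(means)
```

Without the cleaning step, " North " and "NORTH" would be counted as two different regions, and the blank sales value would make the float conversion fail.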

What is Data Analytics?

Data Analytics typically emphasizes getting insights from data. The following quote from Northeastern University’s website nicely summarizes the nature and core of data analytics.

Data analysts utilize data to draw meaningful insights and solve problems. They analyze well-defined sets of data using an arsenal of different tools to answer tangible business needs: e.g. why sales dropped in a certain quarter, why a marketing campaign fared better in certain regions, how internal attrition affects revenue, etc. Data analysts have a range of fields and titles, including (but not limited to) database analyst, business analyst, market research analyst, sales analyst, financial analyst, marketing analyst, advertising analyst, customer success analyst, operations analyst, pricing analyst, and international strategy analyst. (Source: Northeastern University’s website)

Interestingly, many business schools offer Master’s degrees in Business Analytics; for example, Carnegie Mellon University’s Tepper School of Business offers an online Master’s degree in Business Analytics. Some schools offer the degree in an interdisciplinary format, in which (1) the program faculty come from different schools, including business, statistics, and computer science, and (2) the program has several tracks. A good example is Georgia Tech’s MS in Analytics, which includes the Analytical Tools track, the Business Analytics track, and the Computational Data Analytics track.

It is worth noting that the tracks overlap heavily in their courses. For instance, according to the program’s official website, all tracks share Data Analytics in Business, Data and Visual Analytics, an operations course, and two statistics courses. Knowledge of data visualization and statistics is foundational to data analytics. In addition, knowledge of machine learning and computational statistics is required if you want to do more computational data analytics.

Tracks offered by Georgia Tech. Source: Georgia Tech MS Analytics Website
Curriculum Requirement. Source: Georgia Tech MS Analytics Website
Courses offered in the MS in Analytics. Source: Georgia Tech MS Analytics Website

What is Data Science?

While Data Analytics emphasizes getting insights from data, Data Science focuses more on analysis methods (e.g., machine learning) and data characteristics (e.g., big data). At the graduate level, Data Science is typically offered as an interdisciplinary program spanning computer science and statistics. (Some programs also involve engineering and business departments.) To see how such curricula are structured, visit the program pages for the MS in Data Science at Columbia and the University of Washington. From a curriculum perspective, then, data analytics and data science overlap substantially, and in some cases they are almost identical. Moreover, the end goal of data science is also to get insights from data, so in that sense data science resembles data analytics (although, again, data science emphasizes particular methods and data characteristics).

Source: The author

You might wonder why data science needs to involve computer science. Part of the reason is the emphasis data science places on machine learning, which originated in computer science: the term machine learning was coined in 1959 by Arthur Samuel, who was working at IBM in the field of computer gaming and artificial intelligence (Source: Wikipedia). Further, data scientists typically work with big data, which is usually stored in databases. Consequently, you are expected to learn SQL as well, which is typically taught in the computer science department rather than the statistics department. Thus, again, data science is more about techniques and methods.
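To give a feel for the SQL that data scientists are expected to know, the sketch below uses Python’s built-in sqlite3 module with an in-memory database. The orders table, its columns, and the 90.0 threshold are hypothetical stand-ins for a real company database. The point is that filtering (WHERE) and aggregation (GROUP BY) happen inside the database, so only a small summary is pulled into Python.

```python
import sqlite3

# In-memory database standing in for a company database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 95.0), ("North", 130.0), ("East", 80.0)],
)

# Filter, group, and aggregate in SQL; the "?" placeholder is bound safely by sqlite3.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= ?
    GROUP BY region
    ORDER BY total DESC
"""
summary = conn.execute(query, (90.0,)).fetchall()
conn.close()
print(summary)
```

With real big data, the same kind of query would run against a database server rather than SQLite, but the SQL itself looks much the same.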

When do you need machine learning algorithms? If you are not dealing with big data involving many variables, you do not need them. For instance, if you are working with survey data (with, say, five variables or constructs), linear regression and logistic regression are enough for the analysis. Thus, if you are working with marketing survey data or running a psychology experiment involving a few independent variables, you probably do not need machine learning algorithms. As a side note, I have always found one of Elon Musk’s comments about machine learning amusing:

I discourage (the) use of machine learning because it is really difficult. Unless you have to use machine learning, don’t do it. (haha) It is usually a red flag when somebody says that we want to use machine learning to solve this tech. I am like, that sounds like b’sh’t. (Source: a presentation by Elon Musk)
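For the survey-data case described above, a plain regression is often all you need. The sketch below fits an ordinary least squares line with a single predictor on hypothetical 1-to-7 survey ratings, using the closed-form slope and intercept. It is only the simplest case. With several predictors, or a yes/no outcome that calls for logistic regression, you would typically use a package such as statsmodels or scikit-learn instead.

```python
from statistics import fmean

# Hypothetical survey responses on 1-7 scales:
# x = "I remember the ad", y = stated purchase intention.
ad_recall = [1, 2, 3, 4, 5, 6, 7]
purchase_intent = [2, 2, 3, 5, 4, 6, 7]

def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

intercept, slope = ols_fit(ad_recall, purchase_intent)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, "
      f"R^2={r_squared(ad_recall, purchase_intent, intercept, slope):.2f}")
```

For these made-up responses it prints intercept=0.71, slope=0.86, R^2=0.90. The model is interpretable (each extra point of ad recall goes with about 0.86 more points of purchase intention), and no machine learning is involved.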

What is the difference between data analytics and data science?

The following table, the main conclusion of this write-up, summarizes the connections and differences between data analytics and data science.

             Data Analytics                            Data Science
Emphasis     Getting insights from data                Analysis methods and big data
Disciplines  Mainly combining business and statistics  Mainly combining computer science and statistics
Programming  Python and R (as well as SAS, SPSS)       Mainly Python, as well as R; SQL is expected
Statistics   High requirement                          Moderately high requirement
  • First, both involve analyzing data. Data Analytics focuses on getting insights from data, whereas Data Science emphasizes methods and data characteristics.
  • Second, Data Analytics combines statistics and real-world applications, with some support from computer programming. In contrast, computer science, especially machine learning algorithms, is a key part of Data Science, combined with statistics and real-world applications. (See the visual illustration above.)
  • Third, to be a data analyst or data scientist, you are expected to learn both statistics and programming. As for specific languages, Python and R are popular among both data analysts and data scientists, although Python is more common in the Data Science domain. Further, data scientists are expected to know machine learning, but that does not mean data analysts never need it.

Final Comment

The Python for Data Analytics tutorial on this site provides foundational knowledge for both data analysts and data scientists. I hope you find this write-up and the tutorials provided on this site useful.

Disclaimer: The author and this website have no relationship with any of the schools mentioned here. The author uses them only to explain the connections and differences between the concepts. This article is for information only. If you are looking for advice about graduate school, please consult a professional advisor.