Data Science and Machine Learning Dictionary (A, B, C)

If you are new to the data science profession, office tech talk and jargon can be hard to follow. This article covers the most commonly used terms to help you on your journey as a data science professional.

## Data Science and Machine Learning Dictionary

# A

**Algorithm** – a set of instructions for solving a task.

**Anaconda** – an open-source distribution for data science professionals that simplifies package management and deployment. It is primarily used with Python and R.

**Apache Spark** – an open-source big data computing framework and set of libraries for large-scale data processing, covering both batch and near-real-time (streaming) workloads.

**Artificial Intelligence** – AI is the ability of a computer or a computer-controlled robot to perform tasks that usually require human intelligence.

*You are reading the article, Data Science and Machine Learning Dictionary – Part 1. Browse Part 2 of the article.*

# B

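**Backpropagation** – sometimes abbreviated as “backprop”. It is the messenger that tells a Neural Network how wrong a prediction was: the error is propagated backward through the layers to work out how much each weight contributed to it, so the weights can be adjusted.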

**Bagging** – short for bootstrap aggregating, it is a technique that combines the predictions of multiple models, each trained on a random subset of the data sampled with replacement.
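
The idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the "model" here is a toy that always predicts the mean of its training sample, and the function name `bagged_prediction`, the data, and the number of models are all hypothetical.

```python
import random
from statistics import mean

def bagged_prediction(data, n_models=25, seed=0):
    """Average the predictions of toy models trained on bootstrap samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    predictions = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) points with replacement.
        sample = rng.choices(data, k=len(data))
        # Toy "model": always predicts the mean of its training sample.
        predictions.append(mean(sample))
    # Aggregate by averaging the individual model predictions.
    return mean(predictions)

data = [12.0, 15.0, 11.0, 14.0, 13.0, 40.0]  # hypothetical measurements
prediction = bagged_prediction(data)
print(prediction)
```

In practice the toy model would be replaced by a real learner such as a decision tree, but the resample-train-aggregate loop stays the same.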

**Bayes’ Theorem** – It is a mathematical formula used to determine conditional probability.
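
As a rough illustration, the formula P(A | B) = P(B | A) · P(A) / P(B) can be written out as below. The function name and the numbers (a test for a condition with 1% prevalence) are made up for this sketch.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of observing B (the evidence).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the probability of having the condition after a positive result is only about 16%, because the condition is rare to begin with.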

**Bayesian Networks** – A type of probabilistic graphical model that represents the known conditional dependencies between random variables.

**Bias** – It is a systematic error due to incorrect assumptions in a Machine Learning (ML) process.

**Big Data** – Data that is impractical to store and process with traditional tools because of its great variety, huge volume, and high velocity.

**Binary** – In the context of an outcome, a variable that has only two unique values, such as “True” and “False”.

**Binomial Distribution** – It is the probability distribution of the number of successes in a fixed number of independent trials, each with the same probability of success.
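
A minimal sketch of the calculation, assuming independent trials with the same success probability. The function name `binomial_pmf` and the coin-flip numbers are chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    # comb(n, k) counts the ways to place k successes among n trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 fair coin flips.
probability = binomial_pmf(3, 10, 0.5)
print(probability)  # 0.1171875
```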

**Boosting** – It is a sequential process in which each new model learns from and corrects the errors of the previous models.

# C

**Categorical variables** – These are the variables that have discrete qualitative values, such as religion or race.

**Chi-square test** – It is a statistical method used to test whether observed results differ significantly from expected results.
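
The test statistic itself is simple to compute: sum (observed − expected)² / expected over all categories. The sketch below uses hypothetical counts from 120 rolls of a die; turning the statistic into a p-value needs the chi-square distribution, which is left out here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]  # hypothetical counts per die face
expected = [20] * 6                  # a fair die: 120 rolls / 6 faces

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 2.9
```

A larger statistic means the observed counts sit further from what was expected.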

**Classification** – It is the task of predicting which category an object belongs to based on its distinct features.

**Clustering** – It is an unsupervised learning process of dividing data into groups of similar items, called clusters.

**Computer Vision** – It is a field of Artificial Intelligence (AI) that allows computers to see, identify, and process images and videos in a way similar to human vision.

**Confidence Interval** – It is a range of values, calculated from sample data, that is likely to contain a population parameter (such as the per cent of the population that falls under a particular category) at a stated confidence level.
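
For a proportion, one common approach is the normal approximation, sketched below. This is one of several possible methods and assumes a reasonably large sample. The function name and the survey numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    # z-score that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 respondents fall in the category.
low, high = proportion_confidence_interval(120, 400)
print(round(low, 3), round(high, 3))
```

The sample proportion is 30%, and the interval gives a plausible range for the true population proportion around it.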

**Confusion Matrix** – It is a table used to express the performance of a classification model by comparing its predicted classes with the actual classes.
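
For a two-class problem the table has four cells: true positives, true negatives, false positives, and false negatives. A minimal sketch with made-up labels (the function name and data are assumptions for illustration):

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model output

matrix = confusion_matrix(actual, predicted)
print(dict(matrix))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```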

**Continuous Variables** – These variables can take any value within a range, such as time, speed and distance.

**Convex Function** – It is a function for which the line segment between any two points on its graph lies on or above the graph.
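
Written as an inequality, the line-segment condition says that a function f is convex when:

$$
f\bigl(t x + (1 - t) y\bigr) \le t\, f(x) + (1 - t)\, f(y) \quad \text{for all } x, y \text{ in the domain and all } t \in [0, 1].
$$

For example, with f(x) = x², x = 0, y = 2 and t = 0.5, the left side is f(1) = 1 and the right side is 0.5 · 0 + 0.5 · 4 = 2, so the inequality holds.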

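**Correlation** – It is a measure of the strength and direction of the linear relationship between two variables, calculated as their covariance divided by the product of their standard deviations.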
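
The most widely used version is the Pearson correlation coefficient, which ranges from -1 to 1. A small sketch, with a hypothetical function name and data:

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Covariance of xs and ys divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in covariance and variance cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # moves perfectly with hours
falling = [10, 8, 6, 4, 2]  # moves perfectly against hours

r_up = pearson_correlation(hours, rising)
r_down = pearson_correlation(hours, falling)
print(r_up, r_down)
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none.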

**Cost function** – It is a function used to define and measure the model’s error, which training aims to minimize.
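
Mean squared error (MSE) is one common choice of cost function for regression. It is shown here only as an example, with made-up values:

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.0]     # hypothetical true values
predicted = [2.0, 5.0, 4.0]  # hypothetical model predictions

mse = mean_squared_error(actual, predicted)
print(round(mse, 4))  # 1.6667
```

The lower the value, the closer the predictions are to the true values.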

**Cross-Entropy** – It is a measure of the difference between two probability distributions for a set of events.
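
For discrete distributions p (the true one) and q (the predicted one), cross-entropy is −Σ p(x) · log q(x). A minimal sketch using natural logarithms; the function name and probabilities are illustrative assumptions:

```python
from math import log

def cross_entropy(p, q):
    """Cross-entropy between true distribution p and predicted q (in nats)."""
    # Terms where p is zero contribute nothing, so they are skipped.
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [1.0, 0.0, 0.0]  # the first class is the correct one
predicted = [0.7, 0.2, 0.1]  # hypothetical model probabilities

loss = cross_entropy(true_dist, predicted)
print(round(loss, 4))  # 0.3567
```

The closer the predicted probability of the correct class is to 1, the closer the loss gets to 0.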

**Cross-Validation** – It is a statistical technique used to evaluate and compare machine learning algorithms by dividing data into two segments: one used to train a model and the other used to validate it.
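
A popular variant is k-fold cross-validation, where the data is split into k parts and each part takes a turn as the validation segment. The sketch below only generates the index splits and does not train any model. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

folds = list(k_fold_indices(10, 3))
for training, validation in folds:
    print("train:", training, "validate:", validation)
```

In practice the data is usually shuffled before splitting, and the model's score is averaged over all folds.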

*Hope you liked reading the article, Data Science and Machine Learning Dictionary – Part 1. Share your thoughts in the comments section below.*
