In the previous articles of the Data Science & ML Dictionary, we covered the terminology from A to P. In this article, we cover the terminology from “R” to “Z”.

Part 1: Data Science & ML Dictionary

Part 2: Data Science & ML Dictionary

Part 3: Data Science & ML Dictionary

## Data Science & ML Dictionary

# R

**R** – It is an open-source programming language and a software environment for statistical computing, machine learning, and data visualization.

**Random Forest** – It is an ensemble learning method for classification, regression, and other tasks that combines the predictions of many decision trees.

**Regression** – It is a technique used for investigating the relationship between independent variables and a dependent variable.
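
As a rough illustration, here is a minimal sketch of regression with one independent variable, fitted by ordinary least squares. The function name `fit_line` and the hours/scores data are made up for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y ≈ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up data: hours studied (independent) vs. exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)  # the data lie exactly on score = 2 * hours + 50
```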

**Regularization** – It is a technique used to reduce overfitting in statistical models, typically by penalizing model complexity.
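
One common form of regularization is an L2 (ridge) penalty. The sketch below assumes the simplest possible model, `y ≈ w * x` with a single weight and no intercept, so the effect of the penalty is easy to see. The helper `ridge_weight` and the data are hypothetical.

```python
def ridge_weight(xs, ys, lam):
    """Weight w that minimizes sum((y - w*x)^2) + lam * w^2 for the model y ≈ w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = ridge_weight(xs, ys, lam=0.0)   # no penalty: ordinary least squares
w_ridge = ridge_weight(xs, ys, lam=14.0)  # penalty shrinks the weight toward zero
print(w_plain, w_ridge)
```

A larger `lam` pulls the weight closer to zero, trading some fit on the sample data for a simpler model.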

**Reinforcement Learning** – It aims to train a model (an agent) to reach an optimum solution for a specific problem by making a sequence of decisions and learning from the rewards or penalties they produce.

**Ruby** – It is an open-source programming language primarily used for developing web apps.

# S

**Scikit-learn** – It is a library for Python programmers that contains tools for machine learning and statistical modelling, such as classification, clustering, regression, and dimensionality reduction.

**SQL** – It is an acronym for Structured Query Language and is used to manage databases by performing tasks such as updating, retrieving, and maintaining data.
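
To make the inserting, updating, and retrieving concrete, here is a small sketch that runs SQL statements through Python's built-in `sqlite3` module on an in-memory database. The `employees` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
cur = conn.cursor()

# Define a table and insert a few made-up rows.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Asha", 50000.0), ("Ben", 42000.0)],
)

# Update one row, then retrieve everything sorted by salary.
cur.execute("UPDATE employees SET salary = salary + 3000 WHERE name = ?", ("Ben",))
cur.execute("SELECT name, salary FROM employees ORDER BY salary DESC")
rows = cur.fetchall()
conn.close()

print(rows)
```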

**Standard Deviation** – It measures how widely the data are spread around the mean.

**Standard Error** – It measures how much sample means vary from one sample to another, i.e., the standard deviation of the sampling distribution of the mean.
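
The two terms above are closely related. As a quick sketch with a made-up sample, the standard error of the mean can be estimated as the sample standard deviation divided by the square root of the sample size:

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

sd = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(len(sample))   # estimated standard error of the mean

print(round(sd, 2), round(se, 2))
```

The standard error is smaller than the standard deviation, and it shrinks further as the sample grows.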

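**Stochastic Gradient Descent** – It is an optimization method whose goal is to minimize the cost function by incrementally changing the weights of the network, using the gradient computed on one sample (or a small batch) at a time.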
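
Here is a minimal sketch of the idea, assuming a tiny linear model `y ≈ w * x + b` instead of a full network and a squared-error cost. The function name, learning rate, and data are illustrative choices, not a recommended setup.

```python
import random

def sgd_linear(xs, ys, lr=0.02, epochs=500, seed=0):
    """Fit y ≈ w * x + b by updating the weights after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)              # visit the samples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y    # prediction error on this one sample
            # Gradient of error**2 with respect to w and b, scaled by the learning rate.
            w -= lr * 2 * error * x
            b -= lr * 2 * error
    return w, b

# Made-up, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = sgd_linear(xs, ys)
print(round(w, 3), round(b, 3))
```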

**Supervised Learning** – It is the type of learning in which an algorithm is trained on a labelled dataset, learning to map the sample inputs to their known outputs.

**Support Vector Machine** – It is a supervised learning model which creates a line or a hyperplane that divides the data into classes.

# T

**T-Distribution** – It is a probability distribution that describes the standardized distances of sample means from the population mean. It resembles the normal distribution but has heavier tails, especially for small samples.

**T-Value** – It is the ratio of the difference between group means to the variation within the groups, where a big T-Value means distinct groups and a small T-Value means similar groups.

**TensorFlow** – It is an open-source software library for deep learning applications that simplifies building large-scale neural networks with many layers using data flow graphs.

**Tokenization** – It is the process of splitting a text string into units called tokens and is a part of NLP (Natural Language Processing).
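
A very simple tokenizer can be sketched with a regular expression. This one assumes that tokens are either runs of word characters or single punctuation marks; real NLP tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("Tokenization splits text into tokens.")
print(tokens)
```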

**Transfer Learning** – It is a machine learning method where the knowledge a model gains on one task is reused as the foundation for another task.

**True Positive** – When you predicted positive, and it is actually positive.

**True Negative** – When you predicted negative, and it is actually negative.
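
Counting these outcomes is straightforward once you have actual and predicted labels. The sketch below uses made-up binary labels (1 = positive, 0 = negative) and also tallies the two kinds of wrong predictions, false positives and false negatives, for completeness.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1   # predicted positive, actually positive
        elif p == 0 and a == 0:
            counts["TN"] += 1   # predicted negative, actually negative
        elif p == 1 and a == 0:
            counts["FP"] += 1   # predicted positive, actually negative
        else:
            counts["FN"] += 1   # predicted negative, actually positive
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```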

**T-test** – It is a test used to compare two groups by checking whether the difference between their means is statistically significant.
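
As a sketch, the T-Value for two independent samples can be computed by hand. The version below assumes Welch's form of the statistic, which does not require equal variances; the two groups are made-up measurements. Turning the T-Value into a p-value needs the t-distribution, which is left out here.

```python
import math
import statistics

def welch_t_value(group_a, group_b):
    """T-Value: difference in sample means divided by its estimated standard error."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_err

group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.9, 6.1, 6.0, 6.2, 5.8]

t_value = welch_t_value(group_a, group_b)
print(round(t_value, 2))  # large in magnitude, so the groups look distinct
```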

**Type I error** – It is the decision to reject the null hypothesis when it is actually true (a false positive).

**Type II error** – It is the decision to retain the null hypothesis when it is actually false (a false negative).

# U

**Underfitting** – It is a modelling error in which a model is too simple to capture the patterns in the sample data, so it performs poorly on the sample set and fails to generalize to fresh data.

**Unsupervised Learning** – The process where an ML model learns from unlabelled data, inferring hidden structures such as groups or patterns without being given the correct outputs.

# V

**Variance** – It measures the spread of a given set of numbers as the average squared deviation from their mean.
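
A short sketch of the calculation, using the population form (dividing by the number of values) and checking it against Python's `statistics.pvariance`. The data are made up.

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean (population variance)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

data = [1.0, 2.0, 3.0, 4.0, 5.0]

var_manual = population_variance(data)
var_stdlib = statistics.pvariance(data)
print(var_manual, var_stdlib)
```

For a sample rather than a whole population, the usual estimate divides by `n - 1` instead, which is what `statistics.variance` does.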

**Vectors** – They are ordered lists of numbers used to represent the numeric characteristics of an object, known as features, in a mathematical form.
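
For instance, a house could be described by a feature vector, and two houses compared by the distance between their vectors. The feature order and values below are hypothetical.

```python
import math

# Hypothetical feature order: [bedrooms, area in square metres, age in years]
house_a = [3.0, 120.0, 15.0]
house_b = [3.0, 116.0, 12.0]

# Euclidean distance between the two feature vectors.
distance = math.dist(house_a, house_b)
print(distance)
```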

# X

**XGBoost** – It is an open-source library that provides a regularizing gradient boosting framework for programming languages such as C++, Java, Python, and R.

# Z

**Z-test** – It is a statistical test used to determine whether two population means are different, typically when the population variances are known and the sample size is large.
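
Below is a minimal sketch of a two-sample Z-test, assuming the population standard deviations are known. The means, standard deviations, and sample sizes are made-up numbers, and `two_sample_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_a, mean_b, sigma_a, sigma_b, n_a, n_b):
    """Return the z statistic and two-sided p-value for the difference in means."""
    std_err = math.sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)
    z = (mean_a - mean_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_sample_z_test(
    mean_a=52.0, mean_b=50.0, sigma_a=5.0, sigma_b=5.0, n_a=50, n_b=50
)
print(round(z, 2), round(p_value, 4))
```

A small p-value (commonly below 0.05) suggests the two population means are different.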


