In the previous articles of Data Science & ML Dictionary, we’ve shared the terminology starting from A to P. In this article, we’re going to provide the terminology starting from “R” to “Z”.

Part 1: Data Science & ML Dictionary

Part 2: Data Science & ML Dictionary

Part 3: Data Science & ML Dictionary

## Data Science & ML Dictionary

# R

**R** – It is an open-source programming language and a software environment for statistical computing, machine learning, and data visualization.

**Random Forest** – It is an ensemble learning method for classification, regression, and other tasks that trains many decision trees and combines their outputs, typically by majority vote for classification or by averaging for regression.

**Regression** – It is a technique used for investigating the relationship between one or more independent variables and a dependent variable.

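**Regularization** – It is a technique used to reduce overfitting in statistical models, typically by adding a penalty on model complexity (for example, on the size of the weights) to the loss function.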
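To make the idea of a penalty concrete, here is a minimal sketch, assuming a one-feature linear model without an intercept and an L2 (ridge) penalty. The function name `fit_slope`, the `penalty` parameter, and the toy data are illustrative assumptions, not part of any library.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Slope w for the model y ~ w * x (no intercept) with an L2 penalty.

    Minimizes sum((y - w*x)^2) + penalty * w^2, which has the closed form
    w = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy data (hypothetical) that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

plain = fit_slope(xs, ys)                # no penalty: recovers the slope 2.0
ridge = fit_slope(xs, ys, penalty=14.0)  # penalty shrinks the slope toward 0
```

With this toy data the unpenalized slope is 2.0 and the penalized slope is 1.0, which shows how a larger penalty pulls the weight toward zero and makes the model less flexible.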

**Reinforcement Learning** – It aims to train a model (an agent) to find an optimal solution to a specific problem through a sequence of actions and/or decisions, using the rewards or penalties it receives as feedback.

**Ruby** – It is an open-source programming language primarily used for developing web apps.

# S

**Scikit-learn** – It is a library for Python programmers that contains tools for machine learning and statistical modelling, such as classification, clustering, regression, and dimensionality reduction.

**SQL** – It is an acronym for Structured Query Language and is used to manage databases by performing tasks such as updating, retrieving, and maintaining data.

**Standard Deviation** – It tells you how much the data vary around the mean; it is the square root of the variance.

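**Standard Error** – It tells you how much a sample statistic, most commonly the sample mean, varies from sample to sample. For the mean, it is the standard deviation divided by the square root of the sample size.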
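The difference between the two measures can be sketched with Python's standard `statistics` module. The data values below are made up for illustration, and the standard error shown is the usual estimate for the sample mean.

```python
import math
import statistics

# Hypothetical sample of eight measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)             # 5
population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = sample_sd / math.sqrt(len(data))
```

The standard deviation describes the spread of the individual values, while the standard error describes how precisely the sample mean estimates the population mean, so it shrinks as the sample grows.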

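**Stochastic Gradient Descent** – It is an optimization algorithm that minimizes the cost function by incrementally updating the weights of the network, using the gradient computed on a single example or a small batch at each step.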
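As a rough sketch of the update rule, the following fits a straight line with one-example-at-a-time updates. It assumes a linear model `y = w * x + b` and a squared-error cost; the function name `sgd_fit`, the learning rate, the number of epochs, and the data are all illustrative choices.

```python
import random


def sgd_fit(points, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = sgd_fit(points)
```

After training, `w` and `b` should end up close to 2 and 1. Each update uses only one example, which is what distinguishes the stochastic variant from batch gradient descent.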

**Supervised Learning** – It is the type of learning in which an algorithm learns from a labelled dataset, where each sample is paired with the correct output.

**Support Vector Machine** – It is a supervised learning model that creates a line or a hyperplane that divides the data into classes with the widest possible margin.

# T

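**T-Distribution** – It is a probability distribution that describes the standardized distances of sample means to the population mean when the population standard deviation is estimated from the sample. It resembles the normal distribution but has heavier tails, and it approaches the normal distribution as the sample size grows.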

**T-Value** – It is the ratio of the difference between group means to the variability within the groups, where a big T-Value suggests distinct groups and a small T-Value suggests similar groups.

**TensorFlow** – It is an open-source software library for deep learning applications that makes it easier to build large-scale neural networks with many layers using data flow graphs.

**Tokenization** – It is the process of splitting a text string into units called tokens and is a part of NLP (Natural Language Processing).
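A minimal sketch of word-level tokenization, assuming a simple rule: runs of word characters are tokens, and each punctuation mark is its own token. Real NLP libraries use more elaborate rules; the function name `tokenize` and the regular expression are illustrative.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


tokens = tokenize("Tokenization splits text into units, called tokens.")
# ['tokenization', 'splits', 'text', 'into', 'units', ',', 'called', 'tokens', '.']
```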

**Transfer Learning** – It is a machine learning method where the knowledge a model obtains from one task is reused as a foundation for another task.

**True Positive** – When the model predicts positive, and the actual value is positive.

**True Negative** – When the model predicts negative, and the actual value is negative.
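The two definitions can be checked by counting outcomes over a set of predictions. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); it also counts false positives and false negatives, the two remaining outcomes, for completeness. The function name and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1  # predicted positive, actually positive
        elif p == 0 and a == 0:
            counts["TN"] += 1  # predicted negative, actually negative
        elif p == 1 and a == 0:
            counts["FP"] += 1  # predicted positive, actually negative
        else:
            counts["FN"] += 1  # predicted negative, actually positive
    return counts


actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```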

**T-test** – It is a statistical test used to compare the means of two groups and to determine whether the difference between them is statistically significant.
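As an illustration, the test statistic for two independent groups can be computed by hand. This sketch uses Welch's version of the t-test, which does not assume equal variances; that choice, the function name `welch_t`, and the sample values are assumptions made for the example.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic: difference in means over its standard error."""
    var_a = statistics.variance(a) / len(a)  # sample variance / n for group a
    var_b = statistics.variance(b) / len(b)  # sample variance / n for group b
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)


# Hypothetical measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_value = welch_t(group_a, group_b)  # -2.0 for these values
```

A t-value far from zero suggests the group means differ. Turning it into a p-value also requires the degrees of freedom and the t-distribution, which this sketch leaves out.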

**Type I error** – It is the incorrect decision to reject a null hypothesis that is actually true (a false positive).

**Type II error** – It is the incorrect decision to retain a null hypothesis that is actually false (a false negative).

# U

**Underfitting** – It is a modelling error in which a model is too simple to capture the patterns in the sample data, so it performs poorly on the sample set and does not generalize to fresh data.

**Unsupervised Learning** – It is the process where an ML model learns from unlabelled data, inferring hidden structures such as clusters or patterns without being given the correct outputs.

# V

**Variance** – It is used to measure the spread of a given set of numbers, calculated as the average squared deviation from their mean.

**Vectors** – They are used to represent numeric characteristics known as features in a mathematical form.

# X

**XGBoost** – It is an open-source library that provides a regularizing gradient boosting framework for programming languages such as C++, Java, Python, R, etc.

# Z

**Z-test** – It is a statistical test used to determine whether two population means are different when the population variances are known and the sample size is large.
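A minimal sketch of a two-sample z-test, assuming the population standard deviations are known and a two-sided alternative. The function name `two_sample_z` and all input numbers are hypothetical; `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sigma_a, sigma_b, n_a, n_b):
    """Return the z statistic and two-sided p-value for two sample means."""
    standard_error = math.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Probability of a value at least this extreme under the null hypothesis.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics for two groups of 50 observations each.
z, p_value = two_sample_z(52.0, 50.0, 5.0, 5.0, 50, 50)
```

For these inputs the z statistic is 2.0 and the p-value is roughly 0.046, so the difference would be judged significant at the common 0.05 level.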
