In the previous articles of the Data Science & ML Dictionary, we covered the terminology from “A” to “P”. In this article, we cover the terminology from “R” to “Z”.
Data Science & ML Dictionary
R – It is an open-source programming language and a software environment for statistical computing, machine learning, and data visualization.
Random Forest – It is an ensemble learning method for classification, regression, and other tasks that builds many decision trees and combines their predictions.
Regression – It is a technique used for investigating the relationship between independent variables and a dependent variable.
Regularization – It is a technique used to reduce overfitting in statistical models, typically by adding a penalty on model complexity.
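To make the idea of a penalty concrete, here is a minimal sketch, assuming a one-feature linear model with no intercept and an L2 (ridge) penalty. The function name `ridge_slope` and the data are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope w that minimizes sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

unpenalized = ridge_slope(xs, ys, lam=0.0)   # ordinary least-squares slope
penalized = ridge_slope(xs, ys, lam=10.0)    # slope pulled toward zero by the penalty
```

A larger `lam` shrinks the slope further, trading some fit on the sample data for a simpler model.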
Reinforcement Learning – It aims to train an agent to find an optimal solution to a specific problem through a sequence of actions and decisions, guided by rewards and penalties.
Ruby – It is an open-source programming language primarily used for developing web apps.
Scikit-learn – It is a library for Python programmers that contains tools for machine learning and statistical modelling, such as classification, clustering, regression, and dimensionality reduction.
SQL – It is an acronym for Structured Query Language and is used to manage databases by performing tasks such as updating, retrieving, and maintaining data.
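Standard Deviation – It tells you how much the data varies around the mean.
Standard Error – It tells you how much a sample statistic, usually the sample mean, varies from sample to sample. For the mean, it is the standard deviation divided by the square root of the sample size.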
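The difference between the two measures is easiest to see on a small sample. The sketch below uses Python's built-in `statistics` module; the sample values are invented for illustration.

```python
import math
import statistics

sample = [4.0, 8.0, 6.0, 5.0, 7.0]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(len(sample))     # standard error of the mean
```

The standard deviation describes the spread of the individual values, while the standard error describes how precisely the sample mean estimates the population mean, so it shrinks as the sample grows.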
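Stochastic Gradient Descent – It is an optimization method whose goal is to minimize the cost function by incrementally changing the weights of the network, using the gradient computed on one randomly chosen example (or a small batch) at a time.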
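As a rough illustration, the sketch below applies stochastic gradient descent to a one-feature linear model rather than a neural network. The function name `sgd_linear_fit`, the learning rate, and the noise-free data are assumptions chosen to keep the example small.

```python
import random

def sgd_linear_fit(points, lr=0.01, epochs=500, seed=0):
    """Fit y = w*x + b by updating the weights on one example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)               # the "stochastic" part: random example order
        for x, y in data:
            error = (w * x + b) - y     # prediction error on a single example
            w -= lr * 2 * error * x     # gradient of error**2 with respect to w
            b -= lr * 2 * error         # gradient of error**2 with respect to b
    return w, b

# Noise-free data generated from y = 3x + 1.
points = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = sgd_linear_fit(points)
```

With these settings the fitted `w` and `b` should end up close to 3 and 1. On noisy data the estimates would keep fluctuating around the best fit unless the learning rate is reduced over time.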
Supervised Learning – It is the type of learning in which an algorithm learns from a labelled dataset, where each sample is paired with the correct output.
Support Vector Machine – It is a supervised learning model that finds a line or hyperplane dividing the data into classes with the widest possible margin.
T-Distribution – It is a probability distribution that describes the standardized distances of sample means to the population mean. It resembles the normal distribution but has heavier tails, and it is used when the sample is small and the population standard deviation is unknown.
T-Value – It is the ratio of the difference between group means to the variation within the groups, where a big T-Value means distinct groups and a small T-Value means similar groups.
TensorFlow – It is an open-source software library for deep learning applications that makes it easier to build large-scale neural networks with many layers using data flow graphs.
Tokenization – It is the process of splitting a text string into units called tokens, and is a part of NLP (Natural Language Processing).
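A very simple tokenizer can be sketched with a regular expression. This is only one of many possible schemes (production NLP libraries usually use more elaborate rules or subword units), and the function name `tokenize` is made up for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("NLP models read tokens, not raw text.")
```

Here each word and each punctuation mark becomes its own token, and whitespace is discarded.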
Transfer Learning – It is a machine learning method where the knowledge a model gains on one task is reused as a foundation for another task.
True Positive – When you predicted positive, and it is positive.
True Negative – When you predicted negative, and it is negative.
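Both counts can be read directly off a list of actual labels and a list of predictions. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the helper name `count_true_outcomes` and the data are illustrative.

```python
def count_true_outcomes(actual, predicted):
    """Return (true positives, true negatives) for binary labels 1 and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, tn

actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, tn = count_true_outcomes(actual, predicted)
```

The remaining two pairs are the mistakes: one positive predicted as negative and one negative predicted as positive.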
T-test – It is a statistical test used to compare two groups by checking whether the difference between their sample means is large enough to indicate different population means.
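The statistic behind the test can be computed by hand. The sketch below uses Welch's variant, which does not assume equal variances in the two groups. The function name `welch_t` and the sample values are assumptions for illustration, and turning the statistic into a p-value (which needs the t-distribution) is left out.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic for two independent samples without assuming equal variances."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    se = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / se

group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.1, 4.4, 3.9, 4.6, 4.0]
t_value = welch_t(group_a, group_b)
```

The further the statistic is from zero, the stronger the evidence that the two groups have different means.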
Type I error – It is the decision to reject a null hypothesis that is actually true (a false positive).
Type II error – It is the decision to retain a null hypothesis that is actually false (a false negative).
Underfitting – It is a modelling error in which a model is too simple to capture the patterns in the sample data, so it performs poorly on the sample set and does not generalize to fresh data.
Unsupervised Learning – It is the process where an ML model learns from unlabelled data, inferring hidden structures such as clusters without being given correct outputs.
Variance – It measures the spread of a given set of numbers as the average squared deviation from their mean.
Vectors – They are used to represent numeric characteristics known as features in a mathematical form.
XGBoost – It is an open-source library that provides a regularizing gradient boosting framework for programming languages such as C++, Java, Python, R, etc.
Z-test – It is a statistical test used to determine whether two population means are different, typically when the population variances are known and the samples are large.
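A two-sample version of the test can be sketched with the standard library's `NormalDist`. The function name `two_sample_z_test` and all the numbers below are hypothetical, and the population standard deviations are assumed to be known.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_a, mean_b, sigma_a, sigma_b, n_a, n_b):
    """Return (z, two-sided p-value) given known population standard deviations."""
    se = math.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    z = (mean_a - mean_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_sample_z_test(
    mean_a=52.0, mean_b=50.0, sigma_a=5.0, sigma_b=5.0, n_a=100, n_b=100
)
```

A small p-value indicates that a difference this large would be unlikely if the two population means were equal.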