Glossary of 50 AI terms with examples and data:
AI (Artificial Intelligence)
Definition: Machines designed to mimic human intelligence.
Example: Self-driving cars.
Data: GPS coordinates, sensor data, road images.
Algorithm
Definition: A step-by-step procedure used for calculations or problem-solving.
Example: QuickSort algorithm for sorting data.
Data: Input list: [3, 1, 4, 1, 5], Output: [1, 1, 3, 4, 5].
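The QuickSort example can be sketched in Python as below. This is a minimal version that builds new lists at each step rather than sorting in place. It is simpler to read but uses extra memory. Choosing the middle element as the pivot is an assumption; production implementations usually partition in place.

```python
def quicksort(items):
    """Return a new sorted list using a simple (not in-place) QuickSort."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]             # middle element as pivot
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]    # keeps duplicates such as the two 1s
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```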
Artificial Neural Network (ANN)
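Definition: A model made of interconnected nodes ("neurons") arranged in layers. The connection weights are adjusted during training so that the model learns to recognize patterns in data.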
Example: Image recognition systems.
Data: Input: an image of a cat, Output: the label "cat".
Autonomous System
Definition: A system that operates without human intervention.
Example: A drone that can deliver packages.
Data: GPS coordinates, battery status, obstacle distance.
Agent
Definition: An entity that can observe and act within an environment.
Example: A robot vacuum that detects obstacles and cleans a room.
Data: Sensors detecting obstacles, movement commands.
Backpropagation
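Definition: A method that propagates the prediction error backward from the output layer to compute how much each weight in a neural network contributed to that error. These gradients are then used, for example by gradient descent, to update the weights and reduce the error.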
Example: Training a neural network for digit recognition.
Data: Input: Image of "3", Expected output: 3, Prediction: 5. Error is used to adjust the weights.
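A full digit-recognition network is too large to show here. The sketch below applies backpropagation to a single sigmoid neuron with a squared-error loss instead. The input, target, starting weight and bias, and learning rate are hypothetical values. The chain rule carries the output error back to the weight and bias, and a gradient-descent step then adjusts them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop(w, b, x, target):
    """Return (loss, dL/dw, dL/db) for one sigmoid neuron with loss 0.5 * (y - target)**2."""
    y = sigmoid(w * x + b)               # forward pass
    loss = 0.5 * (y - target) ** 2
    dloss_dy = y - target                # error at the output
    dy_dz = y * (1.0 - y)                # derivative of the sigmoid
    dloss_dz = dloss_dy * dy_dz          # chain rule: pass the error back through the sigmoid
    return loss, dloss_dz * x, dloss_dz  # dz/dw = x and dz/db = 1

w, b = 0.5, 0.0                          # hypothetical starting weight and bias
x, target = 2.0, 1.0                     # hypothetical input and expected output
learning_rate = 0.5
for step in range(3):
    loss, grad_w, grad_b = backprop(w, b, x, target)
    w -= learning_rate * grad_w          # update step (gradient descent)
    b -= learning_rate * grad_b
    print(f"step {step}: loss = {loss:.4f}")
```

A multi-layer network repeats the same chain-rule step layer by layer, from the output back to the input.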
Bias
Definition: A systematic error in the model due to incorrect assumptions or data.
Example: Gender bias in hiring algorithms.
Data: A dataset with more resumes from men than women.
BERT (Bidirectional Encoder Representations from Transformers)
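Definition: A transformer-based language model. It is pre-trained to use the context on both sides of each word and can then be fine-tuned for specific NLP tasks.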
Example: Sentiment analysis on tweets.
Data: Input: "I love this movie", Output: Positive sentiment.
Classification
Definition: Categorizing data into predefined labels.
Example: Email spam detection.
Data: Emails labeled as "spam" or "not spam."
Clustering
Definition: Grouping similar data points together without labels.
Example: Customer segmentation.
Data: Customer purchases:
Computer Vision
Definition: Teaching machines to interpret and process visual data.
Example: Face recognition.
Data: Image dataset labeled with names (Person A, Person B).
Convolutional Neural Network (CNN)
Definition: A deep neural network that uses convolutional layers to learn spatial features. It is used mainly for image classification.
Example: Recognizing objects in images.
Data: Image of a dog: Input: pixels, Output: "dog."
Confusion Matrix
Definition: A table used to evaluate classification performance.
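Example: A spam filter's results broken down into true positives (spam correctly flagged), false positives (non-spam flagged as spam), false negatives (spam that was missed), and true negatives (non-spam correctly let through).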
Data: True positives: 30, False positives: 5, False negatives: 2, True negatives: 50.
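The sketch below counts the four cells of a confusion matrix from true and predicted labels. It then derives accuracy, precision, and recall from the counts given above. The function name and the "spam" positive label are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1          # spam correctly flagged
            else:
                fp += 1          # non-spam flagged as spam
        elif actual == positive:
            fn += 1              # spam that was missed
        else:
            tn += 1              # non-spam correctly let through
    return tp, fp, fn, tn

print(confusion_counts(["spam", "ok", "spam", "ok"], ["spam", "spam", "ok", "ok"]))  # (1, 1, 1, 1)

# Metrics from the counts in the example above
tp, fp, fn, tn = 30, 5, 2, 50
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 80 / 87
precision = tp / (tp + fp)                   # 30 / 35
recall = tp / (tp + fn)                      # 30 / 32
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.920 precision=0.857 recall=0.938
```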
Deep Learning
Definition: A subset of machine learning that uses neural networks with many layers.
Example: Image classification using deep networks.
Data: Labeled images of animals: Input: Image, Output: Animal type.
Decision Tree
Definition: A model that splits data into branches based on feature values.
Example: Predicting loan approval based on income and credit score.
Data:
Data Mining
Definition: Discovering patterns in large datasets.
Example: Identifying customer buying patterns.
Data: Transaction data: Customer 1: $100, Customer 2: $200.
Dataset
Definition: A collection of data used to train or test machine learning models.
Example: Iris flower dataset for classification.
Data: Features: sepal length, sepal width, petal length, petal width; Output: flower species (setosa, versicolor, or virginica).
Dimensionality Reduction
Definition: Reducing the number of features in a dataset while retaining essential information.
Example: Using PCA to reduce features in a high-dimensional dataset.
Data: A 5-dimensional point (1, 2, 3, 4, 5) → represented by 2 values such as (2, 3). These numbers are illustrative, not an actual PCA output.
Dropout
Definition: A regularization technique to prevent overfitting by randomly dropping neurons during training.
Example: Used in deep neural networks for training.
Data: During each training step, a random subset of neurons (for example, 50%) is ignored. At inference time, all neurons are used.
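A minimal sketch of a common variant, "inverted" dropout, on a plain list of activations. During training, each activation is zeroed with probability `rate`. The survivors are scaled by 1 / (1 - rate), so the expected output matches inference, where nothing is dropped. The function signature is an assumption for illustration.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: during training, zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected output is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(activations)            # all neurons are used at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```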
Ensemble Learning
Definition: Combining multiple models to improve performance.
Example: Random Forest combining multiple decision trees.
Data: 100 decision trees each make a prediction, and the majority vote is taken.
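Majority voting, the aggregation step described above, can be sketched with Python's `collections.Counter`. The vote counts are hypothetical. Breaking ties in favor of the label seen first is an assumption; some Random Forest implementations instead average the trees' class probabilities.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction; on a tie, the label encountered first wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from 100 trees on one loan application
votes = ["approve"] * 62 + ["deny"] * 38
print(majority_vote(votes))  # approve
```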
Epoch
Definition: A full pass through the entire training dataset during training.
Example: Training a neural network on 1000 samples for 10 epochs.
Data: A dataset of 1000 samples.
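A training-loop skeleton shows how epochs and mini-batches relate. The example does not give a batch size, so 32 is an assumption. With 1000 samples, a batch size of 32 gives 32 batches per epoch (31 full batches plus one of 8), so 10 epochs mean 320 weight updates. The loop body only counts updates; a real loop would compute the loss and update the weights at that point.

```python
import random

def batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

dataset = list(range(1000))   # stand-in for 1000 training samples
batch_size = 32               # assumed value
epochs = 10

rng = random.Random(0)
updates = 0
for epoch in range(epochs):
    rng.shuffle(dataset)      # reshuffle so each epoch sees the samples in a new order
    for batch in batches(dataset, batch_size):
        updates += 1          # placeholder for: forward pass, loss, backprop, weight update
print(updates)  # 320
```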
Exploratory Data Analysis (EDA)
Definition: Analyzing datasets to summarize their main characteristics.
Example: Visualizing a dataset of customer ages.
Data: Customer ages: [25, 30, 35, 40, 45]. Visualize age distribution.
Feature
Definition: An individual variable used as input for machine learning models.
Example: "Age" and "Income" can be features for predicting loan approval.
Data: Age = 30, Income = 50000.
Feature Engineering
Definition: The process of transforming raw data into features that can be used for machine learning.
Example: Creating an "age group" feature from age data.
Data: Age = 25 → Age Group = "20-30."
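A minimal sketch of this transformation. It assumes fixed 10-year buckets whose lower bound is inclusive, so an age of 30 falls in "30-40". The bucket width and label format are assumptions.

```python
def age_group(age, width=10):
    """Map a numeric age to a bucket label such as "20-30" (lower bound inclusive)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = int(age // width) * width
    return f"{lower}-{lower + width}"

ages = [25, 34, 61]
print([age_group(a) for a in ages])  # ['20-30', '30-40', '60-70']
```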
F1 Score
Definition: A classification metric that balances precision and recall by taking their harmonic mean.
Example: A model with high precision and recall will have a high F1 score.
Data: Precision = 0.9, Recall = 0.8, F1 Score = 2 * (0.9 * 0.8) / (0.9 + 0.8) = 1.44 / 1.7 ≈ 0.85.
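The same calculation in Python, with a guard against division by zero when precision and recall are both 0. Evaluated exactly, the example gives about 0.847.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.9, 0.8), 3))  # 0.847
```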
Gradient Descent
Definition: An optimization algorithm used to minimize loss in machine learning.
Example: Used to train a neural network by updating weights.
Data: A neural network with weights [0.2, 0.5]; update weights using gradient descent.
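The update rule is w = w - learning_rate * gradient. The sketch below applies it to a two-weight linear model, prediction = w[0] * x + w[1], starting from the weights [0.2, 0.5] in the example. The training data (generated from y = 2x + 1), the mean-squared-error loss, the learning rate, and the number of steps are assumptions chosen for illustration.

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # hypothetical targets generated by y = 2x + 1

def gradient(w):
    """Gradient of the mean squared error of the prediction w[0] * x + w[1]."""
    n = len(xs)
    errors = [w[0] * x + w[1] - y for x, y in zip(xs, ys)]
    d_w0 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    d_w1 = 2.0 / n * sum(errors)
    return [d_w0, d_w1]

w = [0.2, 0.5]                       # starting weights from the example
learning_rate = 0.05
for _ in range(1000):
    g = gradient(w)
    w = [wi - learning_rate * gi for wi, gi in zip(w, g)]   # step against the gradient
print([round(wi, 3) for wi in w])    # [2.0, 1.0]
```

If the learning rate is too large, the updates overshoot and the weights diverge instead of settling near [2, 1].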
Generative Adversarial Network (GAN)
Definition: A deep learning framework consisting of two networks: a generator and a discriminator.
Example: Generating realistic images from random noise.
Data: Input: Random noise, Output: Fake image of a cat.
Gradient Boosting
Definition: A machine learning technique that builds an ensemble of weak learners in a sequential manner.
Example: XGBoost algorithm for predictive modeling.
Data: Training data with features like age, income for predicting credit approval.
Hyperparameter
Definition: Settings chosen before training that control how a machine learning model is trained, as opposed to parameters such as weights, which are learned from the data.
Example: Learning rate and batch size in a neural network.
Data: Learning rate = 0.01, Batch size = 32.
Hyperparameter Tuning
Definition: The process of finding the best hyperparameters for a model.
Example: Using grid search to find the best learning rate and number of trees for a gradient boosting model.
Data: Hyperparameters tested: learning rates of 0.1, 0.01, and 0.001.
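A grid search evaluates every combination of candidate values and keeps the best one. In the sketch below, `train_and_score` is a hypothetical stand-in for training a real model: it runs gradient descent on f(w) = w² and returns the final loss. The learning rates are the ones listed above. The second hyperparameter (number of steps, standing in for the number of trees) and its values are assumptions.

```python
from itertools import product

def train_and_score(learning_rate, steps):
    """Hypothetical stand-in for training a model: run gradient descent on
    f(w) = w**2 from w = 1 and return the final loss (lower is better)."""
    w = 1.0
    for _ in range(steps):
        w -= learning_rate * 2 * w   # the gradient of w**2 is 2w
    return w ** 2

grid = {"learning_rate": [0.1, 0.01, 0.001], "steps": [10, 50]}
results = {
    (lr, steps): train_and_score(lr, steps)
    for lr, steps in product(grid["learning_rate"], grid["steps"])
}
best = min(results, key=results.get)
print(best)  # (0.1, 50)
```

In practice, each combination would be scored on held-out validation data rather than on the training loss.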
Inference
Definition: The process of making predictions using a trained machine learning model.
Example: Using a trained model to predict if an email is spam.
Data: Input: Email text, Output: Spam or Not Spam.
Imbalanced Dataset
Definition: A dataset where certain classes or categories are underrepresented.
Example: Predicting a rare disease that only 5% of patients have.
Data: Label = "Cancer" for 5% of patients, "Healthy" for 95%.
K-Nearest Neighbors (KNN)
Definition: A machine learning algorithm that classifies data based on the majority class of its nearest neighbors.
Example: Classifying a point as "cat" or "dog" based on its nearest neighbors.
Data: Neighbors of point X are mostly labeled as "cat."
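A minimal k-nearest-neighbors classifier using Euclidean distance (`math.dist`, Python 3.8+) and a majority vote. The 2-D points, their labels, and k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(labeled_points, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

labeled_points = [((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.0), "cat"),
                  ((6.0, 6.0), "dog"), ((7.0, 5.5), "dog")]
print(knn_predict(labeled_points, (2.0, 2.0)))  # cat
```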
K-Means Clustering
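Definition: An algorithm that partitions data into K clusters. It repeatedly assigns each point to the nearest cluster center (centroid) and then moves each centroid to the mean of its assigned points.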
Example: Grouping customers into clusters based on purchasing behavior.
Data: Customer annual spending, grouped into three clusters: low, mid, and high spenders.
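K-Means alternates two steps: assign each point to the nearest centroid, then move each centroid to the mean of its points. The sketch below shows both steps on one-dimensional data. The spending figures and starting centroids are hypothetical. Real implementations usually pick starting centroids randomly or with k-means++ and work in more dimensions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    for _ in range(iterations):
        # assignment step: put each value in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # update step: move each centroid to the mean of its points (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

spending = [120, 150, 180, 900, 950, 1000, 4000, 4200, 4500]   # hypothetical annual spend ($)
centroids, clusters = kmeans_1d(spending, centroids=[100, 1000, 5000])
print(clusters)  # low, mid, and high spenders
```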
Logistic Regression
Definition: A model that uses the logistic (sigmoid) function to estimate the probability of a binary outcome. Despite its name, it is used for classification tasks.
Example: Predicting whether a customer will buy a product (Yes/No).
Data: Features: age, income, and past purchases.
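A minimal sketch of training logistic regression with stochastic gradient descent on the log loss. To keep the example short, it uses a single hypothetical feature (number of past purchases) instead of age, income, and past purchases. The data, learning rate, and epoch count are also assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Estimated probability that the label is 1 (the customer buys)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label   # d(log loss)/dz
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

X = [[0.0], [1.0], [2.0], [5.0], [6.0], [7.0]]   # hypothetical: number of past purchases
y = [0, 0, 0, 1, 1, 1]                            # 1 = bought the product (Yes)
weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [6.5]) > 0.5)  # True
```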
LSTM (Long Short-Term Memory)
Definition: A type of recurrent neural network designed to remember long-term dependencies.
Example: Predicting the next word in a sentence.
Data: Input: "I am going to", Output: "the" (predicted next word).
Model Overfitting
Definition: When a model fits the training data too closely, capturing noise instead of general patterns.
Example: A decision tree that perfectly classifies the training data but performs poorly on new data.
Data: High accuracy on training data, low accuracy on test data.
Machine Learning
Definition: A subset of AI where systems learn from data to improve their performance.
Example: Predicting house prices based on features like size and location.
Data: Features: size, bedrooms, location; Target: price.
Matrix Factorization
Definition: A technique that decomposes a large matrix, such as a table of user-by-item ratings, into the product of smaller, lower-rank matrices. It is often used in recommendation systems.
Example: Netflix using matrix factorization to recommend movies.
Data: A user-by-movie ratings matrix with many missing entries, which the model predicts.
Naive Bayes
Definition: A probabilistic classifier based on Bayes’ theorem. It makes the "naive" assumption that features are independent given the class.
Example: Classifying email as spam or not based on word frequency.
Data: Words in email: ["win", "free", "money"] → Predicted class = Spam.
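A minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. It works in log space to avoid numerical underflow. The four training emails are hypothetical; a real spam filter would learn from many labeled messages.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log P(class) + sum of log P(word | class), with add-one smoothing
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    (["win", "free", "money", "now"], "spam"),
    (["free", "prize", "win"], "spam"),
    (["meeting", "tomorrow", "agenda"], "not spam"),
    (["lunch", "money", "tomorrow"], "not spam"),
]
model = train_nb(training)
print(predict_nb(model, ["win", "free", "money"]))  # spam
```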
Neural Network
Definition: A computational model inspired by the human brain used for machine learning tasks.
Example: A neural network used for handwriting recognition.
Data: Image of handwritten "3", Output: Predicted label = "3."
Principal Component Analysis (PCA)
Definition: A technique for reducing the number of features in a dataset while preserving as much variance as possible.
Example: Reducing dimensions in a dataset with hundreds of features.
Data: Input data with 100 features → Reduced to 2 dimensions.
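For intuition, the sketch below finds the first principal component (the direction of greatest variance) of a small 2-D dataset. It uses power iteration on the covariance matrix and then projects each point onto that direction, reducing 2 features to 1. The data points are hypothetical. Power iteration is a simple stand-in for the SVD-based solvers that libraries typically use, and it assumes the top eigenvalue is clearly larger than the rest.

```python
import math

def first_principal_component(points, iterations=100):
    """Top principal component via power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d                                    # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]   # w = cov @ v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                    # renormalize to unit length
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]  # 1-D coordinates
    return v, projections

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]   # roughly y = 2x
v, projections = first_principal_component(points)
print([round(x, 3) for x in v])
```

Because the points lie close to the line y = 2x, the component comes out close to (1, 2) / √5, and the single projected coordinate keeps almost all of the variance.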
Precision
Definition: A metric giving the fraction of positive predictions that are actually correct.
Example: In spam classification, precision measures how many of the predicted spam emails are actually spam.
Data: True positives: 20, False positives: 5, Precision = 20 / (20 + 5) = 0.8.
Reinforcement Learning
Definition: A machine learning technique where an agent learns by interacting with its environment and receiving feedback.
Example: A robot learning to play chess through trial and error.
Data: State: Chessboard configuration, Action: Move piece, Reward: Win or lose.
Recurrent Neural Network (RNN)
Definition: A neural network designed for sequential data processing.
Example: Predicting the next word in a sentence.
Data: Input: "I am", Output: "going".
Random Forest
Definition: An ensemble learning method using many decision trees.
Example: Predicting customer churn by combining the predictions of many decision trees.
Data: Customer data with features like age, income, and churn status.
This is a comprehensive glossary of 50 common AI terms, with examples and data to provide better context for their applications.