Sunday, 29 December 2024

Sequential and Non-Sequential Data

Sequential and non-sequential data refer to different ways of organizing and interpreting data based on the order in which data points appear.

Sequential Data:

  • Definition: Sequential data is data whose meaning depends on the order of its elements. The arrangement of the data points is crucial for understanding and analyzing the data.
  • Examples:
    • Time Series Data: Stock prices, weather data, or sensor readings where each data point is associated with a specific point in time.
    • Text Data: Sentences and paragraphs where the meaning is derived from the order of words.
    • DNA Sequences: The order of nucleotides is essential for biological functions and interpretations.
  • Applications: Sequential data is critical in fields like natural language processing, finance (for predicting stock prices or economic indicators), and bioinformatics.

Non-Sequential Data:

  • Definition: Non-sequential data does not rely on a first-to-last order such as time or position in a sentence. Any structure that matters (for example, the connections in a graph or the 2D layout of pixels) is not a simple sequence.
  • Examples:
    • Tabular Data: Spreadsheets or databases where each row represents a different record, like customer information or sales data.
    • Graphs and Networks: Data represented in nodes and edges, where the connection, rather than order, is more important.
    • Images: Pixel data where spatial arrangement matters more than a sequential order.
  • Applications: Non-sequential data is used in customer relationship management (CRM), social network analysis, and image processing.

In practice, the distinction is important for choosing the right tools and methods for analysis. For sequential data, techniques like recurrent neural networks (RNNs) or long short-term memory (LSTM) networks are often used, while non-sequential data can be tackled with solutions like decision trees, clustering algorithms, or convolutional neural networks (CNNs) for images.
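One way to see the difference is to reorder the same values and check which summaries survive. The sketch below uses made-up sensor readings (the `readings` list and the `step_changes` helper are illustrative assumptions): the mean ignores order, while the step-to-step changes depend on it.

```python
from statistics import mean

# Hypothetical sensor readings, listed in time order.
readings = [10, 12, 15, 11, 18]
reordered = readings[::-1]  # same values, opposite order

def step_changes(values):
    """Differences between consecutive values; only meaningful if order is."""
    return [b - a for a, b in zip(values, values[1:])]

# An order-free summary is unchanged by reordering.
print(mean(readings) == mean(reordered))  # True

# An order-dependent summary changes.
print(step_changes(readings))   # [2, 3, -4, 7]
print(step_changes(reordered))  # [-7, 4, -3, -2]
```

If an analysis only needs summaries of the first kind, the data can usually be treated as non-sequential.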

Thursday, 19 December 2024

Retrieval-Augmented Generation (RAG)

RAG, or Retrieval-Augmented Generation, is a machine learning approach that combines retrieval and generation to enhance how AI models handle tasks, especially in areas like question answering or document summarization.

Imagine it like this:

  1. Retrieval: Think of it as looking things up. The AI first searches for relevant pieces of information (from a database, a set of documents, or even the internet) based on what you're asking. This step helps the AI base its response on real data instead of just "guessing".

  2. Generation: After gathering the relevant info, the AI then uses a language model (like ChatGPT) to craft a natural, coherent, and helpful response.

Why use RAG?

Language models that rely only on what they learned during training can sometimes "hallucinate," which means they confidently provide incorrect answers. By grounding their responses in real, retrieved information, RAG helps make the output more accurate and reliable.

Example:

You ask: "What are the key benefits of solar energy?"

  1. Retrieval: The AI fetches data from articles or documents that specifically talk about solar energy benefits.
  2. Generation: It summarizes the information and responds: "Solar energy is renewable, reduces electricity bills, and is environmentally friendly since it reduces carbon emissions."

It’s like combining a librarian’s ability to find the best sources with a writer’s ability to summarize and explain those sources.
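The two steps can be sketched in a few lines. This is a toy illustration, not a production pipeline: the `DOCUMENTS` list, the word-overlap scoring, and the prompt wording are all assumptions made for the example, and the final call to a language model is left out. Real systems typically retrieve with embeddings and a vector index.

```python
import re

# Hypothetical mini knowledge base; a real system would use a document store.
DOCUMENTS = [
    "Solar energy is renewable and reduces carbon emissions.",
    "Solar panels can reduce electricity bills over time.",
    "Wind turbines convert kinetic energy from wind into electricity.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Retrieval step: rank documents by words shared with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, passages):
    """Generation step input: the grounded prompt a language model would get."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What are the key benefits of solar energy?"
passages = retrieve(question, DOCUMENTS)
print(build_prompt(question, passages))
```

The two solar passages are retrieved and the wind passage is left out, so the model only sees context that is relevant to the question.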

Tuesday, 17 December 2024

Types of Artificial Neural Networks

Feature | Feedforward Neural Networks | Recurrent Neural Networks (RNNs) | Long Short-Term Memory (LSTM) Networks
Structure | Data flows in one direction, no cycles | Contains loops to allow cycles, facilitating sequence handling | Similar to RNNs but with additional gates and memory cells
Data Flow | Moves from input to output without feedback | Incorporates feedback by taking the current input and the previous state as input | Similar to RNNs, with gating mechanisms that control data flow
Memory | No memory, treats each input independently | Has memory, capable of retaining information from previous inputs | Enhanced memory with forget, input, and output gates to manage long-term dependencies
Use Cases | Image classification, simple regression | Time series prediction, natural language processing, speech recognition | Tasks with long-term dependencies, improved sequence learning
Complexity | Simpler to train and understand | Harder to train due to issues like vanishing/exploding gradients | More complex than standard RNNs but better at handling long sequences
Handling of Sequences | Not well-suited for sequential data | Designed to handle sequences and time-dependent data | Well suited to long sequences due to improved memory management
Architecture Variants | Typically standard architecture | Variants include GRU to address training challenges | A specific RNN variant designed to overcome shortcomings of basic RNNs
Limitations | Poorly suited to sequential data, lacks memory and context awareness | Prone to vanishing and exploding gradient problems, struggles with long-term dependencies | Higher computational cost and complexity, requires more data and resources for training
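The "Memory" row can be made concrete with single-unit toy networks. This is a minimal sketch with hand-picked, untrained weights (`w`, `w_x`, and `w_h` are assumptions chosen for illustration): the feedforward unit sees each input in isolation, while the recurrent unit carries a hidden state from one step to the next.

```python
import math

def feedforward(x, w=0.5, b=0.0):
    """One-unit feedforward layer: output depends only on the current input."""
    return math.tanh(w * x + b)

def rnn_final_state(inputs, w_x=0.5, w_h=0.8):
    """One-unit recurrent layer: each step mixes the input with the prior state."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
    return h

seq = [1.0, 0.0, -1.0]
rev = seq[::-1]

# Feedforward: the set of outputs is the same whatever the input order.
print(sorted(feedforward(x) for x in seq) == sorted(feedforward(x) for x in rev))  # True

# Recurrent: the final state changes when the same inputs arrive in reverse.
print(rnn_final_state(seq), rnn_final_state(rev))
```

An LSTM adds gates around this recurrent update so that the state can be kept or overwritten selectively; that is omitted here to keep the sketch short.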

Monday, 16 December 2024

Key Pandas Functions

Function Description Example
pd.DataFrame() Creates a DataFrame, a 2D labeled data structure, from a variety of inputs like dictionaries or arrays.
import pandas as pd
data = {'A': [1, 2], 'B': [3, 4]}
df = pd.DataFrame(data)
print(df)
                
pd.read_csv() Reads a CSV file and creates a DataFrame.
df = pd.read_csv('file.csv')
print(df.head())
                
df.head() Displays the first n rows of the DataFrame (default is 5).
print(df.head(3))
                
df.tail() Displays the last n rows of the DataFrame (default is 5).
print(df.tail(2))
                
df.info() Provides a summary of the DataFrame, including column data types and non-null values.
df.info()
                
df.describe() Generates descriptive statistics for numeric columns.
print(df.describe())
                
df.isnull() Returns a DataFrame indicating where null values are present.
print(df.isnull())
                
df.fillna() Replaces NaN values with a specified value.
df = df.fillna(0)
print(df)
                
df.dropna() Removes rows or columns with NaN values.
df = df.dropna()
print(df)
                
df.loc[] Accesses a group of rows and columns by labels or boolean array.
print(df.loc[0, 'A'])
                
df.iloc[] Accesses rows and columns by integer index positions.
print(df.iloc[0, 1])
                
df.sort_values() Sorts the DataFrame by a specified column or columns.
df = df.sort_values(by='A')
print(df)
                
df.groupby() Groups DataFrame rows based on a column and allows for aggregation.
grouped = df.groupby('A').sum()
print(grouped)
                
df.merge() Merges two DataFrames on a key or keys.
df1.merge(df2, on='key')
                
pd.concat() Concatenates DataFrames along a specified axis (a top-level pandas function, not a DataFrame method).
pd.concat([df1, df2], axis=0)
                
df.pivot() Creates a pivot table based on unique values of columns.
pivot = df.pivot(index='A', columns='B', values='C')
print(pivot)
                
df.apply() Applies a function along an axis (row-wise or column-wise).
df['A'] = df['A'].apply(lambda x: x*2)
print(df)
                
df.value_counts() Returns a Series with the count of unique values.
print(df['A'].value_counts())
                
df.corr() Computes the pairwise correlation of columns in the DataFrame.
print(df.corr())
                
pd.to_datetime() Converts a column or series to a datetime object.
df['date'] = pd.to_datetime(df['date'])
print(df)
                
df.plot() Generates plots for visualizing DataFrame or Series data (requires Matplotlib).
import matplotlib.pyplot as plt
df['A'].plot(kind='line')
plt.show()
                
df.to_csv() Writes a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
                
df.drop() Removes specified labels from rows or columns.
df = df.drop(columns=['A'])
print(df)
                
df.rename() Renames columns or indices.
df = df.rename(columns={'A': 'New_A'})
print(df)
                
df.duplicated() Checks for duplicate rows in the DataFrame.
print(df.duplicated())
                
df.nunique() Counts the number of unique values in each column.
print(df.nunique())
                
pd.cut() Bins continuous values into discrete intervals.
df['bins'] = pd.cut(df['A'], bins=3)
print(df)
                
pd.get_dummies() Converts categorical variables into dummy/indicator variables.
df = pd.get_dummies(df, columns=['Category'])
print(df)
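The merge and concat rows above use df1 and df2 without defining them. Here is a small self-contained sketch that fills that gap; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical tables sharing a 'key' column.
df1 = pd.DataFrame({'key': [1, 2, 3], 'A': [10, 20, 30]})
df2 = pd.DataFrame({'key': [2, 3, 4], 'B': ['x', 'y', 'z']})

# merge: an inner join on 'key' by default, so only keys 2 and 3 survive.
merged = df1.merge(df2, on='key')
print(merged)

# concat: stacks rows; columns missing from one frame are filled with NaN.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked.isnull().sum())

# groupby: total amount per region in a small sales table.
sales = pd.DataFrame({'region': ['N', 'S', 'N'], 'amount': [100, 50, 25]})
totals = sales.groupby('region')['amount'].sum()
print(totals)
```

Passing how='left', how='right', or how='outer' to merge keeps unmatched rows instead of dropping them.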
                

Tuesday, 3 December 2024

Algorithm Complexities

Sorting Algorithms

Algorithm | Time (Best) | Time (Average) | Time (Worst) | Space
Bubble Sort | O(n) | O(n²) | O(n²) | O(1)
Insertion Sort | O(n) | O(n²) | O(n²) | O(1)
Selection Sort | O(n²) | O(n²) | O(n²) | O(1)
Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n)
Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n)*
Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1)
Radix Sort | O(nk)** | O(nk) | O(nk) | O(n + k)
Counting Sort | O(n + k) | O(n + k) | O(n + k) | O(k)
Bucket Sort | O(n + k) | O(n + k) | O(n²) | O(n)

Searching Algorithms

Algorithm | Time (Best) | Time (Average) | Time (Worst) | Space
Linear Search | O(1) | O(n) | O(n) | O(1)
Binary Search | O(1) | O(log n) | O(log n) | O(1)
Jump Search | O(1) | O(√n) | O(√n) | O(1)
Interpolation Search | O(1) | O(log log n) | O(n) | O(1)
Exponential Search | O(1) | O(log n) | O(log n) | O(log n)

Tree Traversals

Traversal Type | Time Complexity | Space Complexity
Inorder | O(n) | O(h)*
Preorder | O(n) | O(h)
Postorder | O(n) | O(h)
Level Order | O(n) | O(w)**

Data Structures Operations

Data Structure | Search (Avg/Worst) | Insert (Avg/Worst) | Delete (Avg/Worst) | Space
Array | O(n)/O(n) | O(n)/O(n) | O(n)/O(n) | O(n)
Stack (Array-based) | O(n)/O(n) | O(1)/O(1) | O(1)/O(1) | O(n)
Queue (Array-based) | O(n)/O(n) | O(1)/O(1) | O(1)/O(1) | O(n)
Singly Linked List | O(n)/O(n) | O(1)/O(1) | O(1)/O(1) | O(n)
Doubly Linked List | O(n)/O(n) | O(1)/O(1) | O(1)/O(1) | O(n)
Binary Search Tree | O(log n)/O(n) | O(log n)/O(n) | O(log n)/O(n) | O(n)
AVL Tree | O(log n)/O(log n) | O(log n)/O(log n) | O(log n)/O(log n) | O(n)
Hash Table | O(1)/O(n) | O(1)/O(n) | O(1)/O(n) | O(n)

Search means search by value; reading an array element by index is O(1). Linked-list insert and delete are O(1) only once the position is known; finding that position is O(n).

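* In the tree traversal table, h is the height of the tree. For Quick Sort, O(log n) is the typical recursion-stack depth; an unbalanced worst case needs O(n).

** In the tree traversal table, w is the maximum width of the tree. For Radix Sort, k is the number of digits in the longest key. For Counting Sort, k is the range of key values, and for Bucket Sort it is the number of buckets.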

Monday, 2 December 2024

Data Structures and Algorithms (DSA) - Cheat sheet

DSA Cheat Sheet

Data Structures

Arrays

An array is an ordered collection of elements, each identified by an index. It's a simple data structure used to store a collection of data.

Example: Array: [10, 20, 30, 40, 50]

Accessing the third element (index 2): array[2] results in 30.

Index:  0   1   2   3   4
Array: [10, 20, 30, 40, 50]
        

Linked Lists

A linked list is a linear collection of nodes, where each node contains data and a pointer to the next node.

Example: A simple singly linked list: 10 -> 20 -> 30

Node:  [10|*] -> [20|*] -> [30|NULL]
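A minimal Python sketch of the same list follows. The class and method names are my own choices for illustration, not a standard API.

```python
class Node:
    """One element of the list: a value plus a pointer to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        """Walk to the tail and attach a new node: O(n) without a tail pointer."""
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next:
            cur = cur.next
        cur.next = node

    def to_list(self):
        """Follow the next pointers from the head and collect the values."""
        out, cur = [], self.head
        while cur:
            out.append(cur.data)
            cur = cur.next
        return out

lst = LinkedList()
for value in (10, 20, 30):
    lst.append(value)
print(" -> ".join(map(str, lst.to_list())))  # 10 -> 20 -> 30
```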
        

Stacks

A stack is a LIFO (Last-In-First-Out) data structure where the last element added is the first one to be removed.

Empty:       Push 10:     Push 20:     Pop (returns 20):
|      |     |      |     |  20  |     |      |
|      |     |  10  |     |  10  |     |  10  |
|______|     |______|     |______|     |______|
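In Python, a plain list already behaves like a stack if you only add and remove at the end. A short sketch:

```python
stack = []
stack.append(10)   # push 10
stack.append(20)   # push 20
top = stack.pop()  # pop removes the most recently pushed item
print(top, stack)  # 20 [10]
```

Both append and pop at the end of a list run in amortized O(1) time.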
        

Queues

A queue is a FIFO (First-In-First-Out) data structure where the first element added is the first one to be removed.

Enqueue 10:            Front -> | 10 |      <- Rear
Enqueue 20:            Front -> | 10 | 20 | <- Rear
Dequeue (returns 10):  Front -> | 20 |      <- Rear
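For queues, collections.deque from the standard library is the usual choice in Python, because removing from the front of a plain list is O(n). A short sketch:

```python
from collections import deque

queue = deque()
queue.append(10)           # enqueue 10
queue.append(20)           # enqueue 20
front = queue.popleft()    # dequeue removes the oldest item
print(front, list(queue))  # 10 [20]
```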
        

Trees

A tree is a hierarchical data structure with a root node and sub-nodes. Trees are non-linear and can have multiple levels.

Example: Binary Tree

    10
   /  \
  5    15
 / \   / \
2   7 12  20
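The tree above can be built and walked with a few short functions. This is a sketch: the TreeNode class is my own minimal definition, and the recursive versions use list concatenation for brevity, which costs more than the O(n) of a careful implementation.

```python
from collections import deque

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

# The example tree from the diagram.
root = TreeNode(10,
                TreeNode(5, TreeNode(2), TreeNode(7)),
                TreeNode(15, TreeNode(12), TreeNode(20)))

def inorder(node):
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

def preorder(node):
    """The node first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)

def level_order(node):
    """Visit level by level using a queue."""
    out, queue = [], deque([node] if node else [])
    while queue:
        cur = queue.popleft()
        out.append(cur.value)
        queue.extend(child for child in (cur.left, cur.right) if child)
    return out

print(inorder(root))      # [2, 5, 7, 10, 12, 15, 20]
print(preorder(root))     # [10, 5, 2, 7, 15, 12, 20]
print(level_order(root))  # [10, 5, 15, 2, 7, 12, 20]
```

The inorder result comes out sorted because this particular tree happens to satisfy the binary search tree property (smaller values on the left, larger on the right).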
        

Graphs

A graph is a non-linear data structure consisting of nodes (vertices) and edges that connect pairs of nodes.

Example: Directed Graph

(0) -> (1)
 |     ^
 v    /
(2)  /
    v
   (3)
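A common way to store a graph in code is an adjacency list: a dictionary mapping each node to the nodes it points to. The edges below are a hypothetical directed graph chosen for illustration, not a transcription of the diagram, and breadth-first search (BFS) is shown as one typical traversal.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}

def bfs(graph, start):
    """Visit nodes in order of distance from start, each node once."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, 0))  # [0, 1, 2, 3]
```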
        

Hash Tables

A hash table stores key-value pairs and uses a hash function to map keys to indices in an array.

Example: Hash table storing student IDs and names

{123: "Alice", 456: "Bob", 789: "Charlie"}
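Python's dict is already a hash table, but a tiny hand-rolled version shows the mechanism. This sketch assumes separate chaining (each array slot holds a list of key-value pairs) and a fixed number of buckets; the class and method names are illustrative.

```python
class HashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        """Hash function step: map a key to a bucket index."""
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key (or a collision): chain it

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

students = HashTable()
for sid, name in [(123, "Alice"), (456, "Bob"), (789, "Charlie")]:
    students.put(sid, name)
print(students.get(456))  # Bob
```

Lookups stay close to O(1) while the chains are short; a real implementation would also grow the bucket array as it fills up.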
        

Algorithms

Sorting

Bubble Sort

Repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order.

Sorting [5, 2, 9, 1]
Pass 1: [2, 5, 1, 9]
Pass 2: [2, 1, 5, 9]
Pass 3: [1, 2, 5, 9]
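Here is a short Python sketch that records the list after every pass, so it can be checked against the trace above. The function name and the early exit when a pass makes no swaps are my own choices.

```python
def bubble_sort_passes(items):
    """Bubble sort a copy of items and return the list state after each pass."""
    data = list(items)
    passes = []
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        passes.append(list(data))
        if not swapped:  # already sorted: stop early
            break
    return passes

for number, state in enumerate(bubble_sort_passes([5, 2, 9, 1]), start=1):
    print(f"Pass {number}: {state}")
```

The early exit is what gives bubble sort its O(n) best case on input that is already sorted.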
        
Selection Sort

Selects the smallest element from the unsorted sublist and swaps it with the leftmost unsorted element.

Sorting [5, 2, 9, 1]
Pass 1: [1, 2, 9, 5]
Pass 2: [1, 2, 9, 5]
Pass 3: [1, 2, 5, 9]
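The same passes can be reproduced with a short Python sketch (the function name is my own):

```python
def selection_sort_passes(items):
    """Selection sort a copy of items and return the state after each pass."""
    data = list(items)
    passes = []
    for i in range(len(data) - 1):
        # Index of the smallest element in the unsorted part data[i:].
        smallest = min(range(i, len(data)), key=data.__getitem__)
        data[i], data[smallest] = data[smallest], data[i]
        passes.append(list(data))
    return passes

for number, state in enumerate(selection_sort_passes([5, 2, 9, 1]), start=1):
    print(f"Pass {number}: {state}")
```

Pass 2 leaves the list unchanged because 2 is already the smallest unsorted element, but the scan to find that out still costs time, which is why the best case is O(n²).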
        
Insertion Sort

Builds the final sorted array one item at a time by picking elements and inserting them into their correct position.

Sorting [5, 2, 9, 1]
Pass 1: [2, 5, 9, 1]
Pass 2: [2, 5, 9, 1]
Pass 3: [1, 2, 5, 9]
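A Python sketch that produces the same passes (the function name is my own):

```python
def insertion_sort_passes(items):
    """Insertion sort a copy of items and return the state after each pass."""
    data = list(items)
    passes = []
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Shift larger elements one slot right to make room for current.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current
        passes.append(list(data))
    return passes

for number, state in enumerate(insertion_sort_passes([5, 2, 9, 1]), start=1):
    print(f"Pass {number}: {state}")
```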
        
Merge Sort

Divides the array into halves, sorts them, and merges them back together.

Split:        [5, 2] [9, 1]
Sort halves:  [2, 5] [1, 9]
Merge:        [1, 2, 5, 9]
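A compact top-down version in Python, as a sketch. It returns a new list instead of sorting in place, which matches the O(n) extra space usually quoted for merge sort.

```python
def merge_sort(items):
    """Return a sorted copy of items using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])

    # Merge step: repeatedly take the smaller front element.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```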
        
Quick Sort

Picks an element as pivot and partitions the array around the pivot.

Sorting [5, 2, 9, 1]
Pivot:     5
Partition: [2, 1] [5] [9]
Recurse:   [1, 2] [5] [9]
Result:    [1, 2, 5, 9]
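A readable, non-in-place sketch in Python. It assumes the first element is the pivot, which reproduces the partition shown above; in-place variants such as Lomuto or Hoare partitioning are what achieve the small extra space usually quoted for quick sort.

```python
def quick_sort(items):
    """Return a sorted copy of items, using the first element as the pivot."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

print(quick_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```

With a first-element pivot, input that is already sorted triggers the O(n²) worst case; picking a random pivot is a common way to make that unlikely.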
        
Heap Sort

Builds a heap from the input data and repeatedly extracts the maximum element to sort it.

Build Heap:   [9, 5, 2, 1]
Extract Max:  [5, 2, 1] -> [1, 2, 5, 9]
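A sketch of heap sort in Python using an array-based max-heap. The helper name sift_down is my own, and the heap this code builds may be laid out differently from the one shown above, since several valid max-heaps exist for the same values.

```python
def sift_down(data, root, end):
    """Restore the max-heap property for the subtree at root within data[:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and data[child + 1] > data[child]:
            child += 1  # pick the larger child
        if data[root] >= data[child]:
            return
        data[root], data[child] = data[child], data[root]
        root = child

def heap_sort(items):
    """Return a sorted copy of items."""
    data = list(items)
    n = len(data)
    # Build a max-heap bottom-up.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(data, root, n)
    # Repeatedly move the maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        sift_down(data, 0, end)
    return data

print(heap_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```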
        

Searching

Linear Search

Sequentially checks each element of the list until the desired element is found.

Searching for 9 in [5, 2, 9, 1]
Steps: Check each element from left to right until 9 is found.
        
Binary Search

Efficient algorithm for finding an item in a sorted array. It repeatedly divides the search interval in half.

Searching for 9 in [1, 2, 5, 9]
Initial: [1, 2, 5, 9]   -> Mid: 5
Search:  [9]            -> Mid: 9
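An iterative sketch in Python. Note that this version rounds the midpoint down, so the exact elements it probes can differ from a trace that rounds up; the result is the same.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1

print(binary_search([1, 2, 5, 9], 9))  # 3
print(binary_search([1, 2, 5, 9], 4))  # -1
```

The standard library's bisect module offers the same halving search for finding insertion points in sorted lists.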
        

Recursion

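Recursion is when a function calls itself, directly or indirectly, to solve a smaller instance of the same problem. Every recursive function needs a base case that stops the calls. Example: the factorial of a number.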

Base Case: 0! = 1
Recursive Case: n! = n * (n-1)!
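The two cases translate almost line for line into Python. The check for negative input is my own addition; without it the recursion would never reach the base case.

```python
def factorial(n):
    """Compute n! recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:                   # base case: 0! = 1
        return 1
    return n * factorial(n - 1)  # recursive case: n! = n * (n-1)!

print(factorial(5))  # 120
```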
        

Dynamic Programming

Solves problems by breaking them down into simpler subproblems and storing the results of these subproblems to avoid computing the same results multiple times.

Example: Fibonacci sequence using memoization

F(5) = F(4) + F(3)
F(4) = F(3) + F(2)
F(3) = F(2) + F(1)
F(2) = F(1) + F(0)
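In the expansion above, F(3) and F(2) each appear more than once. Memoization stores each result the first time it is computed. A minimal sketch using functools.lru_cache from the standard library as the cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci with memoization: each F(n) is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(5))   # 5
print(fib(50))  # 12586269025
```

Without the cache, the number of calls grows exponentially with n; with it, the work is linear in n.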
        

Greedy Algorithms

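Makes the locally best choice at each step in the hope of reaching the best overall solution. Example: the coin change problem, where the goal is to use the fewest coins for a given amount by always taking the largest coin that fits. This is only guaranteed to be optimal for some coin systems, such as the one below.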

Amount: 11
Coins:  [1, 5, 10]
Steps:  10 -> 1
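A sketch of the greedy strategy in Python (the function name is my own). The second call shows a coin set where greedy still produces an answer, but not the one with the fewest coins.

```python
def greedy_change(amount, coins):
    """Make change by always taking the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    if amount != 0:
        raise ValueError("amount cannot be made with these coins")
    return used

print(greedy_change(11, [1, 5, 10]))  # [10, 1]
print(greedy_change(6, [1, 3, 4]))    # [4, 1, 1], although [3, 3] uses fewer coins
```

When greedy is not guaranteed to be optimal, dynamic programming is the usual alternative for this problem.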
        

Backtracking

Builds a solution step by step and abandons a partial solution (backtracks) as soon as it fails to satisfy the constraints, then tries the next option. Example: Solving a maze or Sudoku.

Path: [Start -> 1 -> 2 -> 3]
Backtrack: [Start -> 1 -> 4]
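Here is a small maze solver as a sketch of the choose, explore, and backtrack pattern. The grid, the start and goal corners, and the order in which directions are tried are all assumptions made for the example.

```python
MAZE = [  # hypothetical grid: 0 = open, 1 = wall
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]

def solve(maze, row=0, col=0, path=None):
    """Return a list of (row, col) cells from the top-left to the bottom-right
    corner, or None if no path exists."""
    path = [] if path is None else path
    rows, cols = len(maze), len(maze[0])
    if not (0 <= row < rows and 0 <= col < cols):
        return None                      # outside the grid
    if maze[row][col] == 1 or (row, col) in path:
        return None                      # wall, or already on the current path
    path.append((row, col))              # choose
    if (row, col) == (rows - 1, cols - 1):
        return path
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        if solve(maze, row + dr, col + dc, path):  # explore
            return path
    path.pop()                           # backtrack: undo the choice
    return None

print(solve(MAZE))  # [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]
```

The path.pop() line is the backtracking step: when every direction from a cell fails, the cell is removed and the caller tries its next option.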
        

Time and Space Complexity

Big O notation describes how an algorithm's running time or memory use grows as the input size grows, usually as an upper bound on the worst case. Time complexity counts the work done, while space complexity is the amount of memory an algorithm uses relative to the input size.

Example:

  • Time Complexity of Bubble Sort: O(n²)
  • Space Complexity of Merge Sort: O(n)
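One way to get a feel for O(n²) is to count operations instead of measuring time. This sketch counts the comparisons a basic bubble sort (no early exit) makes on reverse-sorted input; the function name is my own.

```python
def count_bubble_comparisons(n):
    """Count comparisons made by a basic bubble sort on n reverse-sorted items."""
    data = list(range(n, 0, -1))  # worst case: every pair is out of order
    comparisons = 0
    for end in range(n - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return comparisons

for n in (100, 200, 400):
    print(n, count_bubble_comparisons(n))
# 100 4950
# 200 19900
# 400 79800
```

The count is n(n-1)/2, so doubling the input size roughly quadruples the work, which is the signature of quadratic growth.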

Important Tips

  • Practice Regularly: Keep coding to reinforce knowledge.
  • Understand Complexity: Analyze the time and space requirements.
  • Debugging Tools: Use tools such as print statements and debuggers.
  • Problem Breakdown: Divide complex problems into smaller, manageable parts.
  • Data Structure Choice: Choose structures that optimize performance.
  • Code Efficiency: Write clean and efficient code.
  • Collaboration: Work with peers to learn new techniques.