What is the difference between Manhattan Distance and Euclidean Distance?

Asked by ElizabethClarke in DevOps, asked on Jul 15, 2024

I am a data engineer working on a machine learning project that runs in an Azure DevOps pipeline. My team is developing a recommendation system that suggests products based on user preferences. We are considering two distance metrics for the recommendation algorithm: Manhattan distance and Euclidean distance. How should I compare these two metrics so that I can choose the right one for my project?

Answered by Diya tomar

In the context of DevOps, here is a detailed comparison of Manhattan distance and Euclidean distance:

Data characteristics

Manhattan distance sums the absolute differences between the coordinates of two points. It suits grid-like paths, where movement is restricted to the axes. Because each difference contributes linearly, it is less sensitive to outliers than Euclidean distance, and it is often a reasonable choice for high-dimensional data.

Euclidean distance, on the other hand, is the straight-line distance between two points in Euclidean space. Because each difference is squared, it is more sensitive to large differences and to outliers.
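
A small numeric sketch of the outlier point above, using only the Python standard library. The vectors are made up for illustration and are not taken from the project: both candidates differ from the reference by the same total amount, spread evenly in one case and concentrated in a single coordinate in the other.

```python
import math

# Hypothetical illustration data (not from the original question).
reference = [0.0, 0.0, 0.0, 0.0]
spread = [2.0, 2.0, 2.0, 2.0]     # four small differences
outlier = [8.0, 0.0, 0.0, 0.0]    # one large difference


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


manhattan_spread = manhattan(reference, spread)
manhattan_outlier = manhattan(reference, outlier)
euclidean_spread = euclidean(reference, spread)
euclidean_outlier = euclidean(reference, outlier)

print("Manhattan:", manhattan_spread, manhattan_outlier)
print("Euclidean:", euclidean_spread, euclidean_outlier)
```

Manhattan distance treats the two candidates as equally far away (8 and 8), while Euclidean distance places the candidate with the single large difference twice as far away (8 versus 4).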

Computational efficiency

Manhattan distance is generally a little cheaper to compute, especially for high-dimensional data, because it avoids the squaring and square root operations.

Euclidean distance is more computationally intensive because it squares the difference in each dimension and then takes the square root of the sum.
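
If the distances are only used to rank candidates, the square root can be skipped, because it does not change the ordering. The sketch below illustrates this with standard-library Python; the vectors and helper names are hypothetical.

```python
import math

# Hypothetical illustration data: one user vector and three candidate items.
user = [1.0, 2.0, 3.0]
items = [[1.0, 2.0, 5.0], [4.0, 6.0, 3.0], [1.0, 3.0, 3.0]]


def squared_euclidean(a, b):
    """Sum of squared differences; omits the final square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Full Euclidean distance, including the square root."""
    return math.sqrt(squared_euclidean(a, b))


def rank(distance):
    """Return item indices ordered from nearest to farthest."""
    return sorted(range(len(items)), key=lambda i: distance(user, items[i]))


rank_with_sqrt = rank(euclidean)
rank_without_sqrt = rank(squared_euclidean)
print(rank_with_sqrt, rank_without_sqrt)
```

Both calls produce the same ordering, so a nearest-neighbour lookup that only needs the ranking could work with squared distances and take the square root only when the actual distance value is required.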

Integration with Azure DevOps

You can set up a pipeline in Azure DevOps to automate the comparison of these metrics. The pipeline can run a script that preprocesses the data, applies each distance metric, and compares their performance.
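
Both metrics are sensitive to feature scale, so a preprocessing step often rescales each feature before distances are computed. A minimal sketch with NumPy follows; the feature matrix and the `standardize` helper are hypothetical, and z-score scaling is only one possible choice.

```python
import numpy as np


def standardize(features):
    """Scale each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Leave constant columns unscaled to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (features - mean) / std


# Hypothetical data: rows are products, columns are price and rating.
raw = np.array([
    [100.0, 4.5],
    [250.0, 3.0],
    [400.0, 4.0],
])
scaled = standardize(raw)
print(scaled)
```

Without a step like this, a feature with a large numeric range (price in this made-up data) would dominate either distance.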

Setting up the comparison in Azure DevOps

Here is a Python script that defines both metrics:

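import numpy as np

def manhattan_distance(vector1, vector2):
    # Sum of the absolute differences between coordinates
    return np.sum(np.abs(vector1 - vector2))

def euclidean_distance(vector1, vector2):
    # Square root of the sum of squared differences
    return np.sqrt(np.sum((vector1 - vector2) ** 2))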
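
As a quick sanity check, the two functions can be applied to a small pair of vectors whose distances are easy to verify by hand. The vectors here are made up for illustration: the coordinate differences are 3, 4, and 0, so the Manhattan distance is 3 + 4 + 0 = 7 and the Euclidean distance is sqrt(9 + 16 + 0) = 5.

```python
import numpy as np


def manhattan_distance(vector1, vector2):
    return np.sum(np.abs(vector1 - vector2))


def euclidean_distance(vector1, vector2):
    return np.sqrt(np.sum((vector1 - vector2) ** 2))


# Hypothetical vectors chosen so the results are easy to check by hand.
a = np.array([1, 2, 3])
b = np.array([4, 6, 3])

manhattan = manhattan_distance(a, b)
euclidean = euclidean_distance(a, b)
print(manhattan, euclidean)
```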

You can use YAML to configure an Azure DevOps pipeline that runs these distance calculations. Here is a simplified YAML configuration for a Python-based script:

trigger:
- main
pool:
  vmImage: 'ubuntu-latest'
steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.x'
    addToPath: true
- script: |
    pip install numpy
    python compare_distances.py
  displayName: 'Run Distance Comparison Script'

You can write a script, "compare_distances.py", that runs both distance metrics and compares their results:

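import numpy as np

# Distance functions from the script above, repeated so this file runs on its own
def manhattan_distance(vector1, vector2):
    return np.sum(np.abs(vector1 - vector2))

def euclidean_distance(vector1, vector2):
    return np.sqrt(np.sum((vector1 - vector2) ** 2))

# Sample user and product data
user_vector = np.array([1, 2, 3, 4, 5])
product_vectors = np.array([
    [2, 3, 4, 5, 6],
    [1, 1, 1, 1, 1],
    [5, 5, 5, 5, 5]
])
manhattan_results = []
euclidean_results = []
# Compute both distances between the user and every product
for product_vector in product_vectors:
    manhattan_results.append(manhattan_distance(user_vector, product_vector))
    euclidean_results.append(euclidean_distance(user_vector, product_vector))
print("Manhattan Distances:", manhattan_results)
print("Euclidean Distances:", euclidean_results)
# Evaluate performance (e.g., accuracy, runtime, etc.)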
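
One simple way to start the evaluation is to check whether the two metrics pick the same closest product. The sketch below reuses the sample vectors from the script and NumPy's `argmin`; treating the smallest distance as the best recommendation is an assumption about how the results would be used.

```python
import numpy as np

user_vector = np.array([1, 2, 3, 4, 5])
product_vectors = np.array([
    [2, 3, 4, 5, 6],
    [1, 1, 1, 1, 1],
    [5, 5, 5, 5, 5],
])

# Broadcasting subtracts the user vector from every product row at once.
differences = product_vectors - user_vector
manhattan = np.abs(differences).sum(axis=1)
euclidean = np.sqrt((differences ** 2).sum(axis=1))

# Index of the closest product under each metric (assumed: smaller is better).
best_by_manhattan = int(np.argmin(manhattan))
best_by_euclidean = int(np.argmin(euclidean))

print("Manhattan distances:", manhattan)
print("Euclidean distances:", euclidean)
print("Closest product (Manhattan):", best_by_manhattan)
print("Closest product (Euclidean):", best_by_euclidean)
```

For this sample both metrics pick the first product, and the other two products are tied under both metrics, so a larger and more varied dataset would be needed to see where the metrics disagree.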

Here is a Java-based approach for comparing both:

import java.util.Arrays;

public class DistanceComparison {
    // Function to calculate Manhattan distance
    public static double manhattanDistance(double[] vector1, double[] vector2) {
        double sum = 0;
        for (int i = 0; i < vector1.length; i++) {
            sum += Math.abs(vector1[i] - vector2[i]);
        }
        return sum;
    }

    // Function to calculate Euclidean distance
    public static double euclideanDistance(double[] vector1, double[] vector2) {
        double sum = 0;
        for (int i = 0; i < vector1.length; i++) {
            double diff = vector1[i] - vector2[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Function to evaluate and compare performance
    public static void evaluatePerformance(double[] userVector, double[][] productVectors,
                                           double[] manhattanResults, double[] euclideanResults) {
        // Example performance metrics
        double manhattanSum = Arrays.stream(manhattanResults).sum();
        double euclideanSum = Arrays.stream(euclideanResults).sum();
        System.out.println("Total Manhattan Distance: " + manhattanSum);
        System.out.println("Total Euclidean Distance: " + euclideanSum);
        // Further evaluation can be added here, e.g., runtime analysis, accuracy metrics, etc.
    }
}

