
# Java cosine similarity problem

By : user3848302
Date : October 26 2020, 11:51 AM
I'm not sure about your implementation, but the cosine similarity of two vectors is their normalized dot product: the dot product divided by the product of their norms.
The dot product of two vectors can be expressed as a · b = aᵀb. As a result, if the vectors have different numbers of elements, you can't take the dot product, so the cosine is not defined.
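A minimal, dependency-free sketch of that definition, cos(a, b) = (a · b) / (‖a‖ ‖b‖), assuming the vectors are plain Python lists (the name `cosine_similarity` is illustrative, not taken from the question). The length check reflects the point above: the dot product, and hence the cosine, is only defined for vectors with the same number of elements.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    if len(a) != len(b):
        # The dot product (and therefore the cosine) needs equal-length vectors.
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([2.0, 1.0], [4.0, 2.0]))  # same direction
```

SciPy's `spatial.distance.cosine` returns the cosine *distance*, 1 - cos(a, b), which is why the snippets further down compute `1 - spatial.distance.cosine(...)`.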

## Cosine Similarity in Java

By : user3255226
Date : March 29 2020, 07:55 AM
You want to compute the similarity between the given row and each row in the matrix. Hence, the inner product and the norms must be computed `getRowDimension()` times.
The initializations were in the wrong place: move them into the loop over all rows, and store each row's result before moving on to the next one.
code :
``````java
// Assumes D is a JAMA Jama.Matrix; the enclosing class also needs
// import java.util.ArrayList; import java.util.List;
// import static java.lang.Math.pow; import static java.lang.Math.sqrt;
private List<Double> cosineSimilarity(int rowIndex, Matrix D) {
    List<Double> similarRows = new ArrayList<>();

    for (int row = 0; row < D.getRowDimension(); row++) {
        // Reset the accumulators for every row that is compared
        double dotProduct = 0.0, firstNorm = 0.0, secondNorm = 0.0;
        for (int column = 0; column < D.getColumnDimension(); column++) {
            dotProduct += D.get(rowIndex, column) * D.get(row, column);
            firstNorm += pow(D.get(rowIndex, column), 2);
            secondNorm += pow(D.get(row, column), 2);
        }
        double cosineSimilarity = dotProduct / (sqrt(firstNorm) * sqrt(secondNorm));
        similarRows.add(cosineSimilarity);
    }
    return similarRows;
}
``````
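For checking the logic without JAMA, here is a stdlib-only Python sketch of the same per-row loop. It assumes the matrix is a list of equal-length lists of numbers with no all-zero rows; `row_similarities` is a hypothetical name. As in the Java version, the three accumulators are reset for every row, and one similarity is collected per row.

```python
import math


def row_similarities(matrix, row_index):
    """Cosine similarity between matrix[row_index] and every row of matrix."""
    target = matrix[row_index]
    similarities = []
    for row in matrix:
        # Accumulators are reset for every row, as in the Java loop above.
        dot_product = first_norm = second_norm = 0.0
        for x, y in zip(target, row):
            dot_product += x * y
            first_norm += x * x
            second_norm += y * y
        # Assumes no all-zero rows (that would divide by zero).
        similarities.append(dot_product / (math.sqrt(first_norm) * math.sqrt(second_norm)))
    return similarities


D = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
print(row_similarities(D, 0))
```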

## Choice between an adjusted cosine similarity vs regular cosine similarity

By : Greg King
Date : March 29 2020, 07:55 AM
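Why does the regular cosine similarity give a high positive value for such 'different' items? Because all the ratings are positive, the rating vectors point in similar directions, so the angle between them is small and its cosine is close to 1: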
code :
``````python
from scipy import spatial
import numpy as np
a = np.array([2.0,1.0])
b = np.array([5.0,3.0])
1 - spatial.distance.cosine(a,b)
#----------------------
# 0.99705448550158149
#----------------------
c = np.array([5.0,4.0])
1 - spatial.distance.cosine(c,b)
#----------------------
# 0.99099243041032326
#----------------------
``````
Subtracting the mean of the ratings before taking the cosine (the adjusted cosine) lets below-average and above-average ratings pull in opposite directions:
``````python
# Mean of all four ratings in a and b
mean_ab = np.concatenate([a, b]).mean()
# mean_ab : 2.75
# adjusted vectors : [-0.75, -1.75] , [2.25, 0.25]
1 - spatial.distance.cosine(a - mean_ab, b - mean_ab)
#----------------------
# ≈ -0.4930
#----------------------
mean_cb = np.concatenate([c, b]).mean()
# mean_cb : 4.25
# adjusted vectors : [0.75, -0.25] , [0.75, -1.25]
1 - spatial.distance.cosine(c - mean_cb, b - mean_cb)
#----------------------
# ≈ 0.7593
#----------------------
``````
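A stdlib-only sketch of this centred comparison, assuming (as the example above does) that a single mean, taken over all ratings of both vectors, is subtracted before the cosine is computed. `centered_cosine` is a hypothetical helper name. Classic item-based adjusted cosine subtracts each user's own mean instead, which is what the next answer does.

```python
import math


def centered_cosine(a, b):
    """Cosine similarity after subtracting one mean shared by a and b."""
    values = list(a) + list(b)
    mean = sum(values) / len(values)
    ca = [x - mean for x in a]
    cb = [y - mean for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    norm = math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb))
    return dot / norm


print(centered_cosine([2.0, 1.0], [5.0, 3.0]))  # negative: a is below the mean, b above
print(centered_cosine([5.0, 4.0], [5.0, 3.0]))  # positive: the centred ratings agree in sign
```

Both pairs score close to 1 with the plain cosine, but after centring the first pair comes out negative and the second positive, because only in the second pair do the centred ratings agree in sign.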

## Python, Cosine Similarity to Adjusted Cosine Similarity

By : SL3
Date : March 29 2020, 07:55 AM
Here's a NumPy-based solution to your problem.
First, store the rating data in an array:
code :
``````python
import numpy as np
# data: the question's pandas DataFrame, one rating column per fruit
fruits = np.asarray(['Apple', 'Orange', 'Pear', 'Grape', 'Melon'])
M = np.asarray(data.loc[:, fruits])
``````
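Next, subtract each user's (row's) mean rating and compute the cosine similarity between the item columns:
``````python
from scipy.spatial.distance import pdist, squareform

M_u = M.mean(axis=1)
item_mean_subtracted = M - M_u[:, None]
similarity_matrix = 1 - squareform(pdist(item_mean_subtracted.T, 'cosine'))
``````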
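To see what `pdist(..., 'cosine')` on the mean-subtracted columns computes, here is a plain-Python sketch (hypothetical name `adjusted_cosine_matrix`). Like the NumPy code above, it treats the 0 entries as ratings when computing each user's mean, subtracts that mean from the user's row, and then takes the cosine between item columns. It assumes no item column becomes all zeros after centring.

```python
import math


def adjusted_cosine_matrix(ratings):
    """Item-item adjusted cosine similarity.

    ratings[u][i] is user u's rating of item i (rows = users, columns = items).
    Each user's mean rating (zeros included) is subtracted from that user's row,
    then the cosine is taken between every pair of item columns.
    """
    centred = []
    for row in ratings:
        mean = sum(row) / len(row)
        centred.append([r - mean for r in row])

    n_items = len(ratings[0])
    columns = [[row[i] for row in centred] for i in range(n_items)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    return [
        [
            sum(x * y for x, y in zip(columns[i], columns[j])) / (norms[i] * norms[j])
            for j in range(n_items)
        ]
        for i in range(n_items)
    ]


# The ratings shown in the answer's IPython session (users x fruits)
M = [[0, 10, 0, 1, 0],
     [6, 0, 0, 0, 2],
     [1, 0, 20, 0, 1],
     [0, 3, 6, 0, 18],
     [3, 0, 2, 0, 0],
     [0, 2, 0, 5, 0]]

for row in adjusted_cosine_matrix(M):
    print(" ".join(f"{s:5.2f}" for s in row))
```

Rounded to two decimals, the printed rows should agree with the `similarity_matrix` output shown below for the same ratings.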
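Finally, rank the other items for each fruit:
``````python
# argsort is ascending; drop the last column (the item itself, similarity 1)
# and flip left-right so the most similar items come first
indices = np.fliplr(np.argsort(similarity_matrix, axis=1)[:,:-1])
result = np.hstack((fruits[:, None], fruits[indices]))
``````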
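The argsort/fliplr step can also be written as an explicit sort. This sketch (hypothetical `rank_neighbours`) leaves each item out by index rather than dropping the last argsort column, so it does not depend on the self-similarity being the largest value in the row. The matrix below is the rounded `similarity_matrix` from the output that follows.

```python
def rank_neighbours(names, similarity):
    """For each item, list the other items from most to least similar."""
    result = []
    for i, name in enumerate(names):
        # Skip the item itself explicitly instead of relying on sort position
        others = [j for j in range(len(names)) if j != i]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        result.append([name] + [names[j] for j in others])
    return result


fruits = ["Apple", "Orange", "Pear", "Grape", "Melon"]
S = [[1.0, 0.01, -0.41, 0.48, -0.44],
     [0.01, 1.0, -0.57, 0.37, -0.26],
     [-0.41, -0.57, 1.0, -0.56, -0.19],
     [0.48, 0.37, -0.56, 1.0, -0.51],
     [-0.44, -0.26, -0.19, -0.51, 1.0]]

for row in rank_neighbours(fruits, S):
    print(row)
```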
``````
In [49]: M
Out[49]:
array([[ 0, 10,  0,  1,  0],
       [ 6,  0,  0,  0,  2],
       [ 1,  0, 20,  0,  1],
       [ 0,  3,  6,  0, 18],
       [ 3,  0,  2,  0,  0],
       [ 0,  2,  0,  5,  0]])

In [50]: np.set_printoptions(precision=2)

In [51]: similarity_matrix
Out[51]:
array([[ 1.  ,  0.01, -0.41,  0.48, -0.44],
       [ 0.01,  1.  , -0.57,  0.37, -0.26],
       [-0.41, -0.57,  1.  , -0.56, -0.19],
       [ 0.48,  0.37, -0.56,  1.  , -0.51],
       [-0.44, -0.26, -0.19, -0.51,  1.  ]])

In [52]: result
Out[52]:
array([['Apple', 'Grape', 'Orange', 'Pear', 'Melon'],
       ['Orange', 'Grape', 'Apple', 'Melon', 'Pear'],
       ['Pear', 'Melon', 'Apple', 'Grape', 'Orange'],
       ['Grape', 'Apple', 'Orange', 'Melon', 'Pear'],
       ['Melon', 'Pear', 'Orange', 'Apple', 'Grape']],
      dtype='|S6')
``````

## Problem applying UDF cosine similarity to grouped ML vectors in Pyspark

By : user2437352
Date : March 29 2020, 07:55 AM
That's because Spark SQL doesn't support NumPy types. Convert the values to plain Python floats before returning them.
code :
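``````python
from itertools import combinations

from numpy import linalg as LA  # assumed: LA is numpy.linalg
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DoubleType

@F.udf(ArrayType(DoubleType()))
def dot_group(M):
    # Cosine similarity for every pair of vectors in the group
    combs = combinations(M, 2)
    return [
        # .tolist() turns the NumPy float64 into a Python float;
        # float(i.dot(j) / (LA.norm(i) * LA.norm(j))) works as well
        (i.dot(j) / (LA.norm(i) * LA.norm(j))).tolist()
        for i, j in combs
    ]
``````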
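Stripped of Spark, the body of the UDF is just a cosine over every pair from `itertools.combinations`. This plain-Python sketch (hypothetical `pairwise_cosines`, vectors as lists) calls `float()` explicitly to mirror the fix: whatever the arithmetic produces, the UDF should hand Spark plain Python floats for an `ArrayType(DoubleType())` column.

```python
import math
from itertools import combinations


def pairwise_cosines(vectors):
    """Cosine similarity for every unordered pair in a group of vectors."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return [
        # float() makes the returned element type explicit
        float(sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)))
        for u, v in combinations(vectors, 2)
    ]


group = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(pairwise_cosines(group))
```

With n vectors in a group, the result has n * (n - 1) / 2 entries, in the order `combinations` yields the pairs.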

## Search the similarity of 2 strings in java using part of word matching, not cosine similarity

By : user3480692
Date : March 29 2020, 07:55 AM
For each search string, split it into words using `haystack.split("\\s+")` (`\\s+` is the regex for "one or more whitespace characters").
Then, to obtain a score, you need two numbers: how many words matched, and how many words there are in total. Sort descending on the first and ascending on the second, which gives you the behaviour you seem to want.
code :
``````java
String[] needle = "super cold white snow".split("\\s+");
String[] haystack = "white image superdupercold".split("\\s+");
int matchedWords = 0, totalWords = haystack.length;
for (String n : needle) {
    boolean found = false;
    for (String hay : haystack) {
        // Part-of-word match: "super" matches "superdupercold"
        if (hay.contains(n)) {
            found = true;
            break;
        }
    }
    if (found) matchedWords++;
}
``````
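To sort on a single number, pack both counts into one `long`:
``````java
// Field: large enough that matchedWords outweighs any (Integer.MAX_VALUE - totalWords)
private static final long MULTIPLIER = 0x100000000L;

// Higher is better: more matched words first, then fewer total words
long score = MULTIPLIER * matchedWords + (Integer.MAX_VALUE - totalWords);
``````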
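Or keep both numbers in an object and sort with a comparator:
``````java
// @Value is Lombok's (lombok.Value): private final fields, getters, all-args constructor
@Value
class Result { String needle; int words, total; }  // words = matched words, total = all words

// list is a List<Result>; needs import java.util.Comparator
list.sort(
    Comparator.comparingInt(Result::getWords).reversed()
        .thenComparingInt(Result::getTotal));

list.stream().map(Result::getNeedle).forEach(System.out::println);
``````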
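A Python sketch of the same scoring and ordering; `match_counts` and `rank` are hypothetical names. Sorting on the tuple `(-matched, total)` does the job of both the comparator chain and the packed `long` score: more matched words first, then fewer words in total.

```python
import re


def match_counts(needle, haystack):
    """Return (matched, total): how many needle words occur inside some
    haystack word (part-of-word match), and how many words the haystack has."""
    needle_words = re.split(r"\s+", needle.strip())
    hay_words = re.split(r"\s+", haystack.strip())
    matched = sum(1 for n in needle_words if any(n in hay for hay in hay_words))
    return matched, len(hay_words)


def rank(needle, candidates):
    """Most matched words first; among ties, fewer total words first."""
    scored = [(c, *match_counts(needle, c)) for c in candidates]
    scored.sort(key=lambda t: (-t[1], t[2]))
    return [c for c, _, _ in scored]


candidates = ["white image superdupercold", "snow", "cold white snow on the ground"]
for c in rank("super cold white snow", candidates):
    print(c)
```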