Cosine Similarity in Java
By : user3255226
Date : March 29 2020, 07:55 AM
You want to compute the similarity between the given row and each row in the matrix, so the inner product and both norms must be computed getRowDimension() times. The initializations are in the wrong place: move them inside the loop over all rows. code :
private ArrayList<Double> cosineSimilarity(int rowIndex, Matrix D) {
    ArrayList<Double> similarRows = new ArrayList<>();
    for (int row = 0; row < D.getRowDimension(); row++) {
        // reset the accumulators for every row that is compared
        double dotProduct = 0.0, firstNorm = 0.0, secondNorm = 0.0;
        for (int column = 0; column < D.getColumnDimension(); column++) {
            dotProduct += D.get(rowIndex, column) * D.get(row, column);
            firstNorm += Math.pow(D.get(rowIndex, column), 2);
            secondNorm += Math.pow(D.get(row, column), 2);
        }
        double similarity = dotProduct / (Math.sqrt(firstNorm) * Math.sqrt(secondNorm));
        similarRows.add(row, similarity);
    }
    return similarRows;
}

Choice between an adjusted cosine similarity vs regular cosine similarity
By : Greg King
Date : March 29 2020, 07:55 AM
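Why would a regular cosine similarity result in a positive number for such 'different' items? Regular cosine similarity only measures the angle between the raw vectors, so two vectors with all-positive entries always get a positive value. Subtracting a mean first allows the result to become negative. code :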
from scipy import spatial
import numpy as np
a = np.array([2.0,1.0])
b = np.array([5.0,3.0])
1 - spatial.distance.cosine(a,b)
#
# 0.99705448550158149
#
c = np.array([5.0,4.0])
1 - spatial.distance.cosine(c,b)
#
# 0.99099243041032326
#
# note: the built-in sum(a, b) uses b as the start value, so the inner sum is b + 2.0 + 1.0 = [8.0, 6.0]
# and the offset below is 3.5 rather than the arithmetic mean of the four entries (2.75)
mean_ab = sum(sum(a,b)) / 4
# mean_ab : 3.5
# adjusted vectors : [-1.5, -2.5] , [1.5, -0.5]
1 - spatial.distance.cosine(a - mean_ab, b - mean_ab)
#
# -0.21693045781865616
#
mean_cb = sum(sum(c,b)) / 4
# mean_cb : 6.5
# adjusted vectors : [-1.5, -3.5] , [-1.5, -2.5]
1 - spatial.distance.cosine(c - mean_cb, b - mean_cb)
#
# 0.99083016804429891
#
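The effect of the shift can also be reproduced without SciPy. The following is a minimal sketch using only the standard library; the helper names `cosine` and `shifted_cosine` are made up for illustration, and the offsets 3.5 and 6.5 are simply the values printed in the answer above, not a recommendation for how the mean should be chosen.

```python
import math

def cosine(u, v):
    """Plain cosine similarity of two equal-length sequences."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def shifted_cosine(u, v, offset):
    """Cosine similarity after subtracting the same offset from every entry."""
    return cosine([x - offset for x in u], [y - offset for y in v])

a, b, c = [2.0, 1.0], [5.0, 3.0], [5.0, 4.0]

raw_ab = cosine(a, b)               # close to 1: all entries are positive
adj_ab = shifted_cosine(a, b, 3.5)  # negative once the offset is removed
adj_cb = shifted_cosine(c, b, 6.5)  # stays close to 1

print(raw_ab, adj_ab, adj_cb)
```

Only the pair (a, b) changes sign, because after the shift its vectors point in partly opposite directions, while c and b still point the same way.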

Python, Cosine Similarity to Adjusted Cosine Similarity
By : SL3
Date : March 29 2020, 07:55 AM
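Here's a NumPy based solution to your problem. First we store the rating data in an array, then subtract each row's mean, compute the pairwise cosine similarity between the item columns, and finally rank the items by similarity: code :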
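import numpy as np
from scipy.spatial.distance import pdist, squareform
# data is assumed to be the ratings DataFrame from the question, with one column per fruit
fruits = np.asarray(['Apple', 'Orange', 'Pear', 'Grape', 'Melon'])
M = np.asarray(data.loc[:, fruits])
# subtract each row's mean rating from that row
M_u = M.mean(axis=1)
item_mean_subtracted = M - M_u[:, None]
# pairwise cosine similarity between the item columns
similarity_matrix = 1 - squareform(pdist(item_mean_subtracted.T, 'cosine'))
# for each item, the other items from most to least similar (the item itself is dropped)
indices = np.fliplr(np.argsort(similarity_matrix, axis=1)[:,:-1])
result = np.hstack((fruits[:, None], fruits[indices]))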
In [49]: M
Out[49]:
array([[ 0, 10,  0,  1,  0],
       [ 6,  0,  0,  0,  2],
       [ 1,  0, 20,  0,  1],
       [ 0,  3,  6,  0, 18],
       [ 3,  0,  2,  0,  0],
       [ 0,  2,  0,  5,  0]])
In [50]: np.set_printoptions(precision=2)
In [51]: similarity_matrix
Out[51]:
array([[ 1.  ,  0.01, -0.41,  0.48, -0.44],
       [ 0.01,  1.  , -0.57,  0.37, -0.26],
       [-0.41, -0.57,  1.  , -0.56, -0.19],
       [ 0.48,  0.37, -0.56,  1.  , -0.51],
       [-0.44, -0.26, -0.19, -0.51,  1.  ]])
In [52]: result
Out[52]:
array([['Apple', 'Grape', 'Orange', 'Pear', 'Melon'],
       ['Orange', 'Grape', 'Apple', 'Melon', 'Pear'],
       ['Pear', 'Melon', 'Apple', 'Grape', 'Orange'],
       ['Grape', 'Apple', 'Orange', 'Melon', 'Pear'],
       ['Melon', 'Pear', 'Orange', 'Apple', 'Grape']],
      dtype='S6')
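As a cross-check that needs neither pandas nor SciPy, the sketch below recomputes the similarity matrix from the M printed above using only the standard library. The function name `adjusted_cosine_matrix` is made up for illustration, and the code assumes, as in the answer, that the mean is taken over each row of M.

```python
import math

# The rating matrix M as printed above (rows x fruits).
M = [
    [0, 10, 0, 1, 0],
    [6, 0, 0, 0, 2],
    [1, 0, 20, 0, 1],
    [0, 3, 6, 0, 18],
    [3, 0, 2, 0, 0],
    [0, 2, 0, 5, 0],
]

def adjusted_cosine_matrix(ratings):
    """Item-item cosine similarity after subtracting each row's mean."""
    centered = [[x - sum(row) / len(row) for x in row] for row in ratings]
    columns = list(zip(*centered))  # one tuple of centered ratings per item
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    return [
        [sum(x * y for x, y in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(columns, norms)]
        for ci, ni in zip(columns, norms)
    ]

sim = adjusted_cosine_matrix(M)
for row in sim:
    print([round(s, 2) for s in row])
```

The rounded values should agree with the similarity_matrix output above, including the negative entries such as Apple versus Pear.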

Problem applying UDF cosine similarity to grouped ML vectors in Pyspark
By : user2437352
Date : March 29 2020, 07:55 AM
That's because Spark SQL doesn't support NumPy types. You should convert the values to plain Python floats before returning them. code :
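# imports inferred from the aliases used in the snippet
from itertools import combinations
from numpy import linalg as LA
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DoubleType

@F.udf(ArrayType(DoubleType()))
def dot_group(M):
    combs = combinations(M, 2)
    return [
        # or float(i.dot(j) / (LA.norm(i) * LA.norm(j)))
        (i.dot(j) / (LA.norm(i) * LA.norm(j))).tolist()
        for i, j in combs
    ]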
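The conversion itself can be checked without Spark. This small sketch assumes only that NumPy is installed; the two vectors are made-up examples. It shows that the similarity computed this way is a NumPy scalar, and that both `.tolist()` and `float()` turn it into a built-in float.

```python
import numpy as np
from numpy import linalg as LA

# Made-up vectors, for illustration only.
i = np.array([1.0, 2.0, 3.0])
j = np.array([4.0, 5.0, 6.0])

similarity = i.dot(j) / (LA.norm(i) * LA.norm(j))

print(type(similarity))           # a NumPy scalar type, not the built-in float
print(type(similarity.tolist()))  # built-in float
print(type(float(similarity)))    # built-in float
```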

Search the similarity of 2 strings in java using part of word matching, not cosine similarity
By : user3480692
Date : March 29 2020, 07:55 AM
For each search string, split it into words using haystack.split("\\s+") (\\s+ is a regex that matches one or more whitespace characters, i.e. the words are separated by whitespace). Then, to obtain a 'score' you need two numbers: how many words matched, and how many words there are in total. Sort descending on the first and ascending on the second, which gets you the behaviour you seem to want. code :
String[] needle = "super cold white snow".split("\\s+");
String[] haystack = "white image superdupercold".split("\\s+");
int matchedWords = 0, totalWords = haystack.length;
for (String n : needle) {
    boolean found = false;
    for (String hay : haystack) {
        // substring match, so "super" also matches "superdupercold"
        if (hay.contains(n)) {
            found = true;
            break;
        }
    }
    if (found) matchedWords++;
}
// both numbers can be packed into a single sortable score (higher is better)
private static final long MULTIPLIER = 0x100000000L;
long score = MULTIPLIER * matchedWords + (Integer.MAX_VALUE - totalWords);
// alternatively, keep both numbers in a class and sort with a comparator
@Value // Lombok annotation that generates getNeedle(), getWords() and getTotal()
class Result { String needle; int words, total; }
list.sort(
    Comparator.comparing(Result::getWords).reversed()
        .thenComparing(Comparator.comparing(Result::getTotal)));
list.stream().map(Result::getNeedle).forEach(System.out::println);
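For comparison, here is the same ranking idea as a short Python sketch rather than Java. The helper names `match_counts` and `packed_score` and the candidate strings are made up for illustration. It sorts by matched words (descending) and total words (ascending), and checks that the packed single-number score gives the same order.

```python
def match_counts(needle, haystack):
    """Return (matched, total): needle words found inside any haystack word,
    and the number of words in the haystack."""
    needle_words = needle.split()
    hay_words = haystack.split()
    matched = sum(1 for n in needle_words if any(n in h for h in hay_words))
    return matched, len(hay_words)

def packed_score(matched, total):
    """Single sortable number: matched words dominate, fewer total words break ties."""
    return (1 << 32) * matched + ((1 << 31) - 1 - total)

needle = "super cold white snow"
candidates = [  # made-up candidate strings
    "a very long sentence about white things",
    "snow white castle",
    "white image superdupercold",
    "white snow",
]

# Most matched words first; among ties, fewer total words first.
ranked = sorted(
    candidates,
    key=lambda h: (-match_counts(needle, h)[0], match_counts(needle, h)[1]),
)
ranked_by_score = sorted(
    candidates,
    key=lambda h: packed_score(*match_counts(needle, h)),
    reverse=True,
)

for h in ranked:
    print(match_counts(needle, h), h)
```

Here "white snow" and "snow white castle" both match two words, so the shorter string ranks higher.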

