Numpy group by multiple vectors, get group indices
By : chengchao
Date : November 22 2020, 04:01 AM
After stacking the arrays a and b with np.stack, setting return_inverse=True in np.unique gives exactly the output you are looking for:
code :
a = np.array([1,2,1,1,1,2,3,1])
b = np.array([1,2,2,2,2,3,3,2])
_, inv = np.unique(np.stack([a,b]), axis=1, return_inverse=True)
print (inv)

array([0, 2, 1, 1, 1, 3, 4, 1], dtype=int64)
def group_np_sum(groupcols):
    # Combine the key arrays into one integer key: scale each subsequent
    # array by the cumulative product of the previous ranges so that
    # distinct key combinations map to distinct integers
    groupcols_max = np.cumprod([ar.max()+1 for ar in groupcols[:-1]])
    return np.unique( sum([groupcols[0]] +
                          [ ar*m for ar, m in zip(groupcols[1:],groupcols_max)]), 
                      return_inverse=True)[1]
a = np.array([1,2,1,1,1,2,3,1])
b = np.array([1,2,2,2,2,3,3,2])
print (group_np_sum([a,b]))
array([0, 2, 1, 1, 1, 3, 4, 1], dtype=int64)
Note that group_np2 (defined in the question) may assign different labels than group_np_sum, but the induced grouping is the same:
a = np.array([3,2,1,1,1,2,3,1])
b = np.array([1,2,2,2,2,3,3,2])
print(group_np2([a,b]))
print (group_np_sum([a,b]))
array([3, 1, 0, 0, 0, 2, 4, 0], dtype=int64)
array([0, 2, 1, 1, 1, 3, 4, 1], dtype=int64)
a = np.random.randint(1, 100, 30000)
b = np.random.randint(1, 100, 30000)
c = np.random.randint(1, 100, 30000)
groupcols = [a,b,c]

%timeit group_pd(groupcols)
#13.7 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit group_np2(groupcols)
#34.2 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit group_np_sum(groupcols)
#3.63 ms ± 562 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
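group_pd and group_np2 in the timings above are defined in the question rather than in this answer. As a rough reconstruction (an assumption, not the asker's exact code), a pandas-based group_pd could look like:

```python
import numpy as np
import pandas as pd

def group_pd(groupcols):
    # Label each distinct combination of key columns with an integer;
    # ngroup() numbers the groups in sorted key order
    df = pd.DataFrame({i: ar for i, ar in enumerate(groupcols)})
    return df.groupby(list(df.columns)).ngroup().to_numpy()

a = np.array([1, 2, 1, 1, 1, 2, 3, 1])
b = np.array([1, 2, 2, 2, 2, 3, 3, 2])
print(group_pd([a, b]))  # [0 2 1 1 1 3 4 1]
```

With sorted group numbering this reproduces the same labels as the np.unique approach on this input.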


Group by multiple columns, get group total count and specific column from last two rows in each group
By : user56946
Date : March 29 2020, 07:55 AM
For an SQL Server table with these columns, I would attempt this by using the following WITH clause:
code :
WITH RUL AS (
select
  UserId,
  Area,
  Action,
  ObjectId,
  RelatedUserLink as RelatedUserLink1,

  LAG(RelatedUserLink) OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created) as RelatedUserLink2,

  ROW_NUMBER() OVER (PARTITION BY UserId, Area, Action, ObjectId ORDER BY Created DESC) latest_to_earliest,

  MAX(Created) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Created,

  COUNT(*) OVER (PARTITION BY UserId, Area, Action, ObjectId) as Count

from
  Notification
where UserId = 10
)
select 
  UserId,
  Area,
  Action,
  ObjectId,
  RelatedUserLink1,
  RelatedUserLink2,
  Created,
  Count
from 
  RUL 
where 
  latest_to_earliest = 1;
Group by and aggregate problems for numpy arrays over word vectors
By : Asmita
Date : March 29 2020, 07:55 AM
There is a bug in your code: inside your lambda function you sum across the entire dataframe instead of just the group. This should fix things:
code :
movie_groupby = movie_data.groupby('movie_id').agg(lambda v: np.sum(v['textvec']))
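As a self-contained illustration of the idea (the movie_data frame below is made up, and apply on the single textvec column is used here as a hedged variation, since agg hands each column to the function separately):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: one word vector per row
movie_data = pd.DataFrame({
    'movie_id': [1, 1, 2],
    'textvec': [np.array([1.0, 0.0]),
                np.array([0.0, 1.0]),
                np.array([2.0, 2.0])],
})

# Sum the vectors elementwise within each movie_id group
summed = movie_data.groupby('movie_id')['textvec'].apply(
    lambda v: np.sum(list(v), axis=0))
print(summed)
```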
Means within each group with more than 1 column of group indices
By : user6240922
Date : March 29 2020, 07:55 AM
Another option: since you already have a matrix, you can use apply with MARGIN set to 2 to loop through the columns and pass each column to ave as the grouping variable. You can either explicitly specify the FUN parameter as mean or omit it, since mean is the default function used:
code :
apply(groupings, 2, ave, x = var)  # pass var as a named argument: it occupies
                                   # the first position of ave(), and otherwise
                                   # apply would pass each column into that
                                   # position instead, which you don't want

 #      [,1]      [,2]   [,3]
 #[1,] 0.630 0.5940000 0.5625
 #[2,] 0.625 0.5940000 0.5625
 #[3,] 0.470 0.5940000 0.5625
 #[4,] 0.630 0.7900000 0.6500
 #[5,] 0.470 0.4166667 0.5650
 #[6,] 0.625 0.5940000 0.5650
 #[7,] 0.470 0.4166667 0.5650
 #[8,] 0.625 0.7900000 0.5650
 #[9,] 0.630 0.5940000 0.5625
#[10,] 0.625 0.4166667 0.6400
library(dplyr)
mutate_all(as.data.frame(groupings), funs(ave(var, .)))

#      V1        V2     V3
#1  0.630 0.5940000 0.5625
#2  0.625 0.5940000 0.5625
#3  0.470 0.5940000 0.5625
#4  0.630 0.7900000 0.6500
#5  0.470 0.4166667 0.5650
#6  0.625 0.5940000 0.5650
#7  0.470 0.4166667 0.5650
#8  0.625 0.7900000 0.5650
#9  0.630 0.5940000 0.5625
#10 0.625 0.4166667 0.6400
What is the fastest way to map group names of numpy array to indices?
By : user3478615
Date : March 29 2020, 07:55 AM
Constant number of indices per group

Approach #1
We can perform dimensionality-reduction to reduce cubes to a 1D array. This is based on a mapping of the given cubes data onto an n-dim grid to compute the linear-index equivalents, discussed in detail here. Then, based on the uniqueness of those linear indices, we can segregate unique groups and their corresponding indices. Hence, following those strategies, we would have one solution, like so -
code :
N = 4 # number of indices per group
c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
sidx = c1D.argsort()
indices = sidx.reshape(-1,N)
unq_groups = cubes[indices[:,0]]

# If you need in a zipped dictionary format
out = dict(zip(map(tuple,unq_groups), indices))
# An alternative way to build the same kind of linear index for 3-column cubes
s1,s2 = cubes[:,:2].max(0)+1
s = np.r_[s2,1,s1*s2]
c1D = cubes.dot(s)
Approach #2

from scipy.spatial import cKDTree

idx = cKDTree(cubes).query(cubes, k=N)[1] # N = 4 as discussed earlier
I = idx[:,0].argsort().reshape(-1,N)[:,0]
unq_groups,indices = cubes[I],idx[I]
Generic case with a variable number of indices per group

c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)

sidx = c1D.argsort()
c1Ds = c1D[sidx]
split_idx = np.flatnonzero(np.r_[True,c1Ds[:-1]!=c1Ds[1:],True])
grps = cubes[sidx[split_idx[:-1]]]

indices = [sidx[i:j] for (i,j) in zip(split_idx[:-1],split_idx[1:])]
# If needed as dict o/p
out = dict(zip(map(tuple,grps), indices))
def numpy1(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)        
    sidx = c1D.argsort()
    c1Ds = c1D[sidx]
    mask = np.r_[True,c1Ds[:-1]!=c1Ds[1:],True]
    split_idx = np.flatnonzero(mask)
    indices = [sidx[i:j] for (i,j) in zip(split_idx[:-1],split_idx[1:])]
    out = dict(zip(c1Ds[mask[:-1]],indices))
    return out
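For a quick sanity check, numpy1 can be run on a tiny, made-up cubes array (the function is repeated here so the snippet is self-contained):

```python
import numpy as np

def numpy1(cubes):
    # Reduce each row to a single linear index, sort, and split at the
    # boundaries between distinct index values to recover the groups
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
    sidx = c1D.argsort()
    c1Ds = c1D[sidx]
    mask = np.r_[True, c1Ds[:-1] != c1Ds[1:], True]
    split_idx = np.flatnonzero(mask)
    indices = [sidx[i:j] for i, j in zip(split_idx[:-1], split_idx[1:])]
    return dict(zip(c1Ds[mask[:-1]], indices))

cubes = np.array([[0, 0, 0],
                  [1, 0, 0],
                  [0, 0, 0],
                  [0, 1, 0]])
out = numpy1(cubes)
# Rows 0 and 2 share the same coordinates, so they land in one group
print({k: list(v) for k, v in out.items()})
```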
from numba import njit

@njit
def _numba1(sidx, c1D):
    out = []
    n = len(sidx)
    start = 0
    grpID = []
    for i in range(1,n):
        if c1D[sidx[i]]!=c1D[sidx[i-1]]:
            out.append(sidx[start:i])
            grpID.append(c1D[sidx[start]])
            start = i
    out.append(sidx[start:])
    grpID.append(c1D[sidx[start]])
    return grpID,out

def numba1(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
    sidx = c1D.argsort()
    out = dict(zip(*_numba1(sidx, c1D)))
    return out
from numba import types
from numba.typed import Dict

int_array = types.int64[:]

@njit
def _numba2(sidx, c1D):
    n = len(sidx)
    start = 0
    outt = Dict.empty(
        key_type=types.int64,
        value_type=int_array,
    )
    for i in range(1,n):
        if c1D[sidx[i]]!=c1D[sidx[i-1]]:
            outt[c1D[sidx[start]]] = sidx[start:i]
            start = i
    outt[c1D[sidx[start]]] = sidx[start:]
    return outt

def numba2(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)    
    sidx = c1D.argsort()
    out = _numba2(sidx, c1D)
    return out
In [4]: cubes = np.load('cubes.npz')['array']

In [5]: %timeit numpy1(cubes)
   ...: %timeit numba1(cubes)
   ...: %timeit numba2(cubes)
2.38 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.13 s ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.8 s ± 5.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
import numexpr as ne

s0,s1 = cubes[:,0].max()+1,cubes[:,1].max()+1
d = {'s0':s0,'s1':s1,'c0':cubes[:,0],'c1':cubes[:,1],'c2':cubes[:,2]}
c1D = ne.evaluate('c0+c1*s0+c2*s0*s1',d)
Convert indices to vectors in Numpy
By : RUGIGANA SIMON PETER
Date : March 29 2020, 07:55 AM
A fairly common way to do this in NumPy is to compare the data with arange and cast the boolean array to integer type:
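The code block itself is missing from this answer; a minimal sketch of that arange-comparison idiom (variable names are illustrative) would be:

```python
import numpy as np

indices = np.array([0, 2, 1])
n = 3  # length of each output vector (assumed known)

# Comparing each index against the full range yields a boolean matrix
# with a single True per row; casting to int makes it a one-hot vector
vectors = (indices[:, None] == np.arange(n)).astype(int)
print(vectors)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```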
© bighow.org