Matrix addition in Scheme

By : djf
Date : October 18 2020, 11:12 AM
Here is a very short working implementation. map is good at removing a layer of explicit recursion when you can use it: the outer map pairs up the rows of the two matrices, and the inner map adds each pair of rows element by element.
code :
(define (matrix-add x y) (map (lambda (x y) (map + x y)) x y))

How to get user input as a matrix format to perform matrix addition in php? anybody having suggestions?

By : Hashem Haghbayan
Date : March 29 2020, 07:55 AM
Give the user two input boxes: the first for the number of rows and the second for the number of columns. Then generate a table from those values with JavaScript, placing a textbox in each cell and naming each input as an array indexed by row so PHP receives the values as a matrix. Say the user enters rows: 3, columns: 3; the generated markup would look like this:
code :
<table>
<tr>
    <td><input type="text" name="matrix[0][]" value=""/> </td>
    <td><input type="text" name="matrix[0][]" value=""/> </td>
    <td><input type="text" name="matrix[0][]" value=""/> </td>
</tr>
<tr>
    <td><input type="text" name="matrix[1][]" value=""/> </td>
    <td><input type="text" name="matrix[1][]" value=""/> </td>
    <td><input type="text" name="matrix[1][]" value=""/> </td>
</tr>
<tr>
    <td><input type="text" name="matrix[2][]" value=""/> </td>
    <td><input type="text" name="matrix[2][]" value=""/> </td>
    <td><input type="text" name="matrix[2][]" value=""/> </td>
</tr>
</table>
<?php
$matrixArr = $_POST['matrix']; // a two-dimensional array holding the submitted matrix values
?>
Which is best among hybrid CPU-GPU, GPU-only, and CPU-only for implementing large matrix addition or matrix multiplication?

By : Ivo Ivanov
Date : March 29 2020, 07:55 AM
The problem with hybrid CPU-GPU computations where you need the result back on the CPU is the latency between the two. If you expect to do some computation on the GPU and get the result back on the CPU, there can easily be several milliseconds of delay from starting the computation on the GPU to getting the results back on the CPU, so the amount of work done on the GPU should be significant, or you need a significant amount of work on the CPU between launching the GPU computation and reading the results back. Performing a 1000-element matrix addition is a tiny amount of work, so you would be better off performing the entire computation on the CPU. You also have the overhead of transferring the data back and forth between the CPU and GPU across the PCIe bus, so computations that require only a small amount of data to be moved between the two lean more towards a hybrid solution.
If you never need to read the result back from the GPU to the CPU, then you don't have the latency issue. For example, you could run an N-body simulation on the GPU and perform the visualization on the GPU as well, never needing the result on the CPU. But the moment you need the result of the simulation back on the CPU, you have to deal with the latency issue.
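As a rough illustration of the transfer and launch overhead (not part of the original answer), the sketch below times a 1000-element addition on the CPU against the same addition on the GPU, including the host-to-device and device-to-host copies. It assumes a CUDA-capable device and the CUDA runtime; the kernel, block size, and timing method are arbitrary choices for the sketch.
code :
#include <cstdio>
#include <vector>
#include <chrono>
#include <cuda_runtime.h>

// Trivial element-wise addition kernel: one thread per element.
__global__ void addKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1000;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), cCpu(n), cGpu(n);

    // Time the addition on the CPU.
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i) cCpu[i] = a[i] + b[i];
    auto t1 = std::chrono::high_resolution_clock::now();

    // Allocate device buffers first so CUDA context creation is not timed.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dC, n * sizeof(float));

    // Time the same addition on the GPU, including both PCIe transfers.
    auto t2 = std::chrono::high_resolution_clock::now();
    cudaMemcpy(dA, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    addKernel<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
    cudaMemcpy(cGpu.data(), dC, n * sizeof(float), cudaMemcpyDeviceToHost);
    auto t3 = std::chrono::high_resolution_clock::now();

    printf("CPU: %lld us, GPU including transfers: %lld us (cCpu[0]=%f, cGpu[0]=%f)\n",
           (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
           (long long)std::chrono::duration_cast<std::chrono::microseconds>(t3 - t2).count(),
           cCpu[0], cGpu[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
For a problem this small, the GPU time is dominated by the copies and the kernel launch, which is the latency argument made above.
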
Program crashes at: (1) matrix multiplication; and (2) failed matrix addition/subtraction

By : Colin Telfer
Date : March 29 2020, 07:55 AM
You posted a copy constructor and assignment operator. Your assignment operator has four major issues:
  • You should pass the parameter by const reference, not by value.
  • You should return a reference to the current object, not a brand new object.
  • If new throws an exception during assignment, you've corrupted your object by deleting its memory beforehand.
  • It is redundant: the same code appears in your copy constructor.
The copy-and-swap approach below addresses all of these at once.
code :
#include <algorithm>
#include <stdexcept>
//...
Matrix& Matrix::operator=(const Matrix& aMatrix)
{
    // Copy-and-swap: build the copy first, then swap it into *this.
    // If the copy constructor throws, *this is left untouched.
    Matrix temp(aMatrix);
    swap(*this, temp);
    return *this;
}

void Matrix::swap(Matrix& left, Matrix& right)
{
   // Swapping the members is cheap and cannot throw.
   std::swap(left.rows, right.rows);
   std::swap(left.cols, right.cols);
   std::swap(left.element, right.element);
}
Matrix& Matrix::operator+=(const Matrix &aMatrix)
{
    if(rows != aMatrix.rows || cols != aMatrix.cols)
       throw std::invalid_argument("matrix dimensions do not match");
    for(int i = 0; i < rows; i++)
    {
        for(int x = 0; x < cols; x++)
            element[i][x] += aMatrix.element[i][x];
    }
    return *this;
}

Matrix Matrix::operator+(const Matrix &aMatrix)
{
   // Implemented in terms of operator+=: copy, add in place, return the copy.
   Matrix temp(*this);
   temp += aMatrix;
   return temp;
}
cuda magma matrix-matrix addition kernel

By : Muhammad Yusran
Date : March 29 2020, 07:55 AM
The question: "I tried using a similar format to the magmablas_sgeadd_q kernel, however I am not getting proper outputs; moreover, every time I run it, I get a different output." There were two coding errors in the posted code, shown in the snippets below. The kernel offsets the dA and dB pointers for each thread but never offsets dC, so results are written to the wrong locations and the untouched parts of the output buffer keep whatever values were already there, which is why the output differs from run to run. The host loops that initialize (and later check) the matrices also use mismatched bounds and indexing.
code :
// Fix 1: in the kernel, offset dC the same way as dA and dB:
if ( ind < m ) {
    dA += ind + iby*ldda;
    dB += ind + iby*lddb;
    dC += ind + iby*lddb;  // add this line

// Fix 2: the host initialization/checking loops used mismatched bounds and
// indexing (i over m and j over n with index i*m+j, which runs past the end
// of the m*n allocation when m > n); the corrected program below loops i
// over n and j over m instead:
for (int i = 0; i < m; ++i)
{
    for (int j = 0; j < n ; j ++)
    h_A[i*m+j] = rand()/(float)RAND_MAX;
$ cat t1213.cu
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

#define BLK_X 2
#define BLK_Y 1

__global__ void matrixAdd2( const float *dA, const float *dB, float *dC, int m, int n)
{
int ldda = m;
int lddb = m;

int ind = blockIdx.x*BLK_X + threadIdx.x;
int iby = blockIdx.y*BLK_Y;
/* check if full block-column */
bool full = (iby + BLK_Y <= n);
/* do only rows inside matrix */
if ( ind < m ) {
    dA += ind + iby*ldda;
    dB += ind + iby*lddb;
    dC += ind + iby*lddb;
    if ( full )
    {
        // full block-column
        #pragma unroll
        for( int j=0; j < BLK_Y; ++j )
        {
            dC[j*lddb] = dA[j*ldda] + dB[j*lddb];
            printf("A is %f, B is %f, C is %f  \n",dA[j*ldda],dB[j*lddb],dC[j*lddb]);
        }
    }
    else
    {
        // partial block-column
        for( int j=0; j < BLK_Y && iby+j < n; ++j )
        {
            dC[j*lddb] = dA[j*ldda] + dB[j*lddb];
            printf("parital: A is %f, B is %f, C is %f  \n",dA[j*ldda],dB[j*lddb],dC[j*lddb]);
        }
    }
}
}



int main ( void )
{

int m = 4; // a - mxn matrix
int n = 2; // b - mxn matrix

size_t size =  m * n * sizeof(float);


printf("Matrix addition of %d rows and %d columns \n", m, n);

// allocate matrices on the host

float *h_A = (float *)malloc(size); // a- mxn matrix on the host
float *h_B = (float *)malloc(size); // b- mxn matrix on the host
float *h_C = (float *)malloc(size); // c- mxn matrix on the host


// Initialize the host input matrices
for (int i = 0; i < n; ++i)
{
    for (int j = 0; j < m ; j ++)
    {
        h_A[i*m+j] = rand()/(float)RAND_MAX;
        h_B[i*m+j] = rand()/(float)RAND_MAX;

    }
}

// Allocate the device input matrix A
float *d_A = NULL;
cudaError_t err = cudaMalloc((void **)&d_A, size); // d_a - mxn matrix a on the device

// Allocate the device input matrix B
float *d_B = NULL;
err = cudaMalloc((void **)&d_B, size);

// Allocate the device output matrix C
float *d_C = NULL;
err = cudaMalloc((void **)&d_C, size);

// Copy the host input matrices A and B from host memory to the device input matrices in device memory
printf("Copy input data from the host memory to the CUDA device\n");
err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

// defining number of threads and blocks
dim3 threads( BLK_X, BLK_Y );
dim3 grid((int)ceil(m/BLK_X),(int)ceil(n/BLK_Y) ); // integer division; assumes m and n are multiples of BLK_X and BLK_Y


// Launching kernel
matrixAdd2<<<grid, threads, 0>>>(d_A, d_B, d_C, m, n);

// Copy the device result matrix in device memory to the host result matrix in host memory.
printf("Copy output data from the CUDA device to the host memory\n");
err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

//print A matrix
printf("Matrix A");
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < m; j++)
   {
        printf(" %f", h_A[i*m+j]);

    }
    printf("\n");
}

// print B matrix if required
printf("Matrix B");
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < m; j++)
    {

        printf(" %f", h_B[i*m+j]);

    }
    printf("\n");
}
int flag = 0;
//Error checking
printf("Matrix C ");
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < m; j++)
   {
        printf("%f", h_C[i*m+j]);
        if(h_C[i*m+j] == h_A[i*m+j] + h_B[i*m+j] )
        {
            flag = flag + 1;
        }
    }
    printf("\n");
}

if(flag==m*n)
{
printf("Test PASSED\n");
}


// Free device global memory
err = cudaFree(d_A);

err = cudaFree(d_B);

err = cudaFree(d_C);

// Free host memory
free(h_A);
free(h_B);
free(h_C);


err = cudaDeviceReset();
printf("Done\n");
return 0;

}
$ nvcc -o t1213 t1213.cu
$ cuda-memcheck ./t1213
========= CUDA-MEMCHECK
Matrix addition of 4 rows and 2 columns
Copy input data from the host memory to the CUDA device
Copy output data from the CUDA device to the host memory
A is 0.277775, B is 0.553970, C is 0.831745
A is 0.477397, B is 0.628871, C is 1.106268
A is 0.364784, B is 0.513401, C is 0.878185
A is 0.952230, B is 0.916195, C is 1.868425
A is 0.911647, B is 0.197551, C is 1.109199
A is 0.335223, B is 0.768230, C is 1.103452
A is 0.840188, B is 0.394383, C is 1.234571
A is 0.783099, B is 0.798440, C is 1.581539
Matrix A 0.840188 0.783099 0.911647 0.335223
 0.277775 0.477397 0.364784 0.952230
Matrix B 0.394383 0.798440 0.197551 0.768230
 0.553970 0.628871 0.513401 0.916195
Matrix C 1.2345711.5815391.1091991.103452
0.8317451.1062680.8781851.868425
Test PASSED
Done
========= ERROR SUMMARY: 0 errors
$
Why Matrix Addition is slower than Matrix-Vector Multiplication in Eigen?

By : Borna Morovic
Date : March 29 2020, 07:55 AM
Short answer: you counted the arithmetic operations but neglected the memory accesses, and the addition case has to perform nearly twice as many costly loads. Details below.
First of all, the practical number of arithmetic operations is the same for both cases, because modern CPUs can execute one independent addition and one multiplication at the same time. A sequential multiply-add like x*y+z can even be fused into a single operation with the same cost as one addition or one multiplication. If your CPU supports FMA, this is what happens with -march=native, but I doubt FMA plays any role here.
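As a rough way to see the memory-traffic difference (not part of the original answer), the sketch below times a dense matrix addition against a matrix-vector product with the same matrix; it assumes Eigen is available, and the size and repetition count are arbitrary. Per pass, the addition loads two n-by-n matrices and writes one, while the matrix-vector product loads roughly one n-by-n matrix plus small vectors, which is the nearly 2x difference in loads mentioned above.
code :
#include <Eigen/Dense>
#include <chrono>
#include <cstdio>

int main()
{
    const int n = 2000, reps = 20;
    Eigen::MatrixXf A = Eigen::MatrixXf::Random(n, n);
    Eigen::MatrixXf B = Eigen::MatrixXf::Random(n, n);
    Eigen::MatrixXf C(n, n);
    Eigen::VectorXf v = Eigen::VectorXf::Random(n), y(n);

    auto t0 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < reps; ++r) C.noalias() = A + B;   // loads 2*n*n floats, writes n*n
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < reps; ++r) y.noalias() = A * v;   // loads about n*n floats
    auto t2 = std::chrono::high_resolution_clock::now();

    // Print the timings plus one element of each result so the work is not optimized away.
    printf("addition: %lld ms, matrix-vector product: %lld ms (C(0,0)=%f, y(0)=%f)\n",
           (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
           (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count(),
           C(0, 0), y(0));
    return 0;
}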