What is __m128d?

I really can't work out what kind of "keyword" __m128d is in C++.
Is it a typedef? In GCC, yes: the header defines it as a typedef for a 16-byte vector of double.
code :
// From gcc 7.3's emmintrin.h  (SSE2 extensions).  SSE1 stuff in xmmintrin.h

/* The Intel API is flexible enough that we must allow aliasing with other
   vector types, and their scalar components.  */
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
__m128d a2 = { -1.388539L, 0.0L };
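
To make the typedef concrete, here is a minimal sketch, assuming an x86 target with SSE2 and a compiler that ships <emmintrin.h>. The brace initializer above relies on GCC's vector extension; the _mm_setr_pd intrinsic is a spelling that does not depend on it. The helper names make_m128d, lane0 and lane1 are made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2: __m128d and the _pd intrinsics

// A __m128d holds two packed doubles: 16 bytes, 16-byte aligned.
static_assert(sizeof(__m128d) == 16, "two packed doubles");
static_assert(alignof(__m128d) == 16, "16-byte alignment");

// Build a vector; _mm_setr_pd takes the low lane first.
inline __m128d make_m128d(double lo, double hi)
{
    return _mm_setr_pd(lo, hi);
}

// Read the low lane.
inline double lane0(__m128d v)
{
    return _mm_cvtsd_f64(v);
}

// Read the high lane by moving it down into the low position first.
inline double lane1(__m128d v)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}
```

With these helpers, make_m128d(-1.388539, 0.0) builds the same value as the a2 initializer above.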

SSE: convert __m128 and __m128i into two __m128d

The intrinsics _mm_cvtepi32_pd and _mm_cvtps_pd convert the values to double. Each one converts only the two low elements of its source, so the upper half has to be shifted down and converted separately.
This should be the loop:
code :
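__m128i* base_addr = ...;
for( int i = 0; i < cnt; ++i )
{
    // Load four packed 32-bit integers (the address must be 16-byte aligned).
    __m128i epi32 = _mm_load_si128( base_addr + i );
    // Convert the two low integers to double.
    __m128d v0 = _mm_cvtepi32_pd( epi32 );
    // Shift right by 8 bytes so the two high integers become the low ones.
    epi32 = _mm_srli_si128( epi32, 8 );
    __m128d v1 = _mm_cvtepi32_pd( epi32 );
}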
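
As a self-contained sketch of the same idea, the functions below widen whole arrays, covering both the __m128i case and the __m128 case from the title. They are an illustration rather than the original poster's code: the function names are hypothetical, n is assumed to be a multiple of 4, and unaligned loads and stores are used so that no alignment assumption is needed. For the float case the upper half is brought down with _mm_movehl_ps.

```cpp
#include <emmintrin.h>
#include <cstddef>
#include <cstdint>

// Widen four 32-bit integers into two vectors of two doubles each.
inline void widen_epi32(__m128i v, __m128d& lo, __m128d& hi)
{
    lo = _mm_cvtepi32_pd(v);                     // elements 0 and 1
    hi = _mm_cvtepi32_pd(_mm_srli_si128(v, 8));  // elements 2 and 3
}

// Widen four floats into two vectors of two doubles each.
inline void widen_ps(__m128 v, __m128d& lo, __m128d& hi)
{
    lo = _mm_cvtps_pd(v);                        // elements 0 and 1
    hi = _mm_cvtps_pd(_mm_movehl_ps(v, v));      // elements 2 and 3
}

// Assumption: n is a multiple of 4.
inline void ints_to_doubles(const std::int32_t* in, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        __m128d lo, hi;
        widen_epi32(v, lo, hi);
        _mm_storeu_pd(out + i, lo);
        _mm_storeu_pd(out + i + 2, hi);
    }
}

// Assumption: n is a multiple of 4.
inline void floats_to_doubles(const float* in, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4) {
        __m128d lo, hi;
        widen_ps(_mm_loadu_ps(in + i), lo, hi);
        _mm_storeu_pd(out + i, lo);
        _mm_storeu_pd(out + i + 2, hi);
    }
}
```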

Are __m128, __m128d, __m256, etc. built-in types in C++?

All are correct. These types are extensions to C++, not built-in types (almost nothing built into C++ begins with underscores). Because they are extensions, the implementation is free to impose whatever restrictions on them it wants.
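
One practical consequence, sketched here under the assumption of an SSE2-capable x86 target: GCC and Clang define these types as native vectors and accept operator syntax such as a + b on them, while MSVC defines them differently and does not. Code that sticks to the intrinsics avoids depending on either choice. The function names below are illustrative only.

```cpp
#include <emmintrin.h>

// Computes k * x + y lane by lane using intrinsics only.
// Writing `k * x + y` directly would rely on a compiler-specific extension.
inline __m128d scale_and_add(__m128d x, __m128d y, double k)
{
    return _mm_add_pd(_mm_mul_pd(_mm_set1_pd(k), x), y);
}

// Copies both lanes out to an ordinary array (no alignment requirement).
inline void to_array(__m128d v, double (&out)[2])
{
    _mm_storeu_pd(out, v);
}
```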

Convert __m128d to double

The counterpart of load is store. So in your case I'd guess (double precision, aligned pointer, regular store):
code :
_mm_store_pd(A, res); //A = res;
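
A slightly fuller sketch of the store side, assuming SSE2 is available. _mm_store_pd needs a 16-byte aligned destination, so the temporary is declared with alignas(16); _mm_storel_pd and _mm_storeh_pd write a single lane each and have no alignment requirement. DoublePair, unpack_aligned and unpack_lanes are hypothetical names.

```cpp
#include <emmintrin.h>

struct DoublePair {
    double lo;
    double hi;
};

// Store both lanes at once into an aligned temporary.
inline DoublePair unpack_aligned(__m128d res)
{
    alignas(16) double A[2];  // _mm_store_pd requires 16-byte alignment
    _mm_store_pd(A, res);
    return DoublePair{A[0], A[1]};
}

// Store each lane separately into arbitrary double locations.
inline void unpack_lanes(__m128d res, double* lo, double* hi)
{
    _mm_storel_pd(lo, res);  // low lane
    _mm_storeh_pd(hi, res);  // high lane
}
```

If the destination might not be aligned, _mm_storeu_pd is the drop-in replacement for _mm_store_pd.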

Returning a __m128d from a MASM procedure to a C caller

For educational purposes, I wrote up a version of your function that uses intrinsics:
code :
#include <immintrin.h>

extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult)
{
    __m128d min = _mm_setzero_pd();
    __m128d max = _mm_setzero_pd();
    while (len--)
    {
        min = _mm_min_pd(min, *samples);
        max = _mm_max_pd(max, *samples);
        ++samples;  // advance to the next pair of doubles
    }
    // Largest magnitude per lane: max(max, -min).
    *pResult = _mm_max_pd(max, _mm_sub_pd(_mm_setzero_pd(), min));
}

This is the x64 listing the Microsoft compiler generates for it:
code :
; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.31101.0
include listing.inc


samples$ = 8
len$ = 16
pResult$ = 24
AbsMax PROC                     ; COMDAT
    xorps   xmm3, xmm3
    movaps  xmm2, xmm3
    movaps  xmm1, xmm3
    test    edx, edx
    je  SHORT $LN6@AbsMax
    npad   3
$LL2@AbsMax:
    minpd   xmm2, XMMWORD PTR [rcx]
    maxpd   xmm1, XMMWORD PTR [rcx]
    lea rcx, QWORD PTR [rcx+16]
    dec edx
    jne SHORT $LL2@AbsMax
$LN6@AbsMax:
    subpd   xmm3, xmm2
    maxpd   xmm1, xmm3
    movaps  XMMWORD PTR [r8], xmm1
    ret 0
AbsMax  ENDP
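
Since the question is about handing a __m128d back to the caller, here is a hedged variant that returns the vector by value instead of writing through pResult. On the x64 calling conventions I am aware of, a __m128d return value travels in XMM0, but that is worth confirming against your platform's ABI documentation. The names AbsMaxByValue and horizontal_max are hypothetical.

```cpp
#include <emmintrin.h>
#include <cstddef>

// Per-lane maximum absolute value, returned by value.
inline __m128d AbsMaxByValue(const __m128d* samples, std::size_t len)
{
    __m128d lo = _mm_setzero_pd();
    __m128d hi = _mm_setzero_pd();
    for (std::size_t i = 0; i < len; ++i) {
        lo = _mm_min_pd(lo, samples[i]);
        hi = _mm_max_pd(hi, samples[i]);
    }
    // max(hi, -lo) gives the largest magnitude seen in each lane.
    return _mm_max_pd(hi, _mm_sub_pd(_mm_setzero_pd(), lo));
}

// Reduce the two lanes to a single double.
inline double horizontal_max(__m128d v)
{
    return _mm_cvtsd_f64(_mm_max_sd(v, _mm_unpackhi_pd(v, v)));
}
```

A caller would write __m128d r = AbsMaxByValue(samples, len); and, if a single scalar is wanted, pass r to horizontal_max.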

Isn't __m128d aligned natively?

__m128d is a type that assumes / requires / guarantees (to the compiler) 16-byte alignment.
Casting a misaligned pointer to __m128d* and dereferencing it is undefined behaviour, and what you are seeing is the expected result. Use _mm_loadu_pd if your data might not be aligned. Preferably, align the data itself: alignas(16) double a[bufferSize]; ISO C++11 and later have portable syntax for aligning static and automatic storage (aligning dynamic storage is not as easy).
GCC's headers also define an unaligned variant of the type, which differs only in its __aligned__ (1) attribute:
code :
typedef double __m128d_u 
       __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
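
The two options can be put side by side in a short sketch, assuming an SSE2 target. sum_unaligned uses _mm_loadu_pd and accepts any double pointer; sum_aligned uses _mm_load_pd and is only valid when the caller guarantees 16-byte alignment, for example with an alignas(16) buffer. All function names here are hypothetical.

```cpp
#include <emmintrin.h>
#include <cstddef>
#include <cstdint>

inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Add the two lanes of v together.
inline double hsum(__m128d v)
{
    return _mm_cvtsd_f64(_mm_add_sd(v, _mm_unpackhi_pd(v, v)));
}

// Safe for any pointer: _mm_loadu_pd has no alignment requirement.
inline double sum_unaligned(const double* data, std::size_t n)
{
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(data + i));
    double total = hsum(acc);
    if (i < n)
        total += data[i];  // odd element left over
    return total;
}

// Precondition: data is 16-byte aligned (see is_aligned16).
inline double sum_aligned(const double* data, std::size_t n)
{
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_load_pd(data + i));
    double total = hsum(acc);
    if (i < n)
        total += data[i];
    return total;
}
```

Given alignas(16) double buf[6], passing buf to sum_aligned is fine, while buf + 1 sits 8 bytes off a 16-byte boundary and should only go to sum_unaligned.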
