Getting Started with SIMD Instruction Sets - APISpace


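SIMD (Single Instruction, Multiple Data) is a class of instruction sets in which one instruction operates on several data elements at once, with the elements packed side by side in a wide register.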

On x86 this family includes SSE, SSE2, SSE3, SSSE3, SSE4 and AVX, plus AMD's now-deprecated 3DNow!. This article uses AVX as the example and walks through an open-source MD5 AVX demo from GitHub.

A very brief overview of MD5

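MD5 repeatedly updates four 32-bit state words, A, B, C and D, by running rounds of the FF, GG, HH and II steps over each 64-byte block of input. Because every MD5 computation executes the same sequence of operations, SIMD instructions can advance several independent MD5 computations at once, one per lane. The initial values of the four words are: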

A: 0x67452301

B: 0xefcdab89

C: 0x98badcfe

D: 0x10325476

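The four auxiliary functions used by those steps are defined as follows, where ∧, ∨, ⊕ and ¬ are bitwise AND, OR, XOR and NOT on 32-bit words: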

$$F(X,Y,Z) = (X \land Y) \lor (\lnot X \land Z)$$

$$G(X,Y,Z) = (X \land Z) \lor (Y \land \lnot Z)$$

$$H(X,Y,Z) = X \oplus Y \oplus Z$$

$$I(X,Y,Z) = Y \oplus (X \lor \lnot Z)$$
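As a scalar reference, the four functions and one FF step can be written in plain C on a single 32-bit word. This is a minimal sketch following RFC 1321, not code from the demo; the names md5_f, md5_g, md5_h, md5_i, rotl32 and md5_ff are made up for illustration.

```c
#include <stdint.h>

/* Scalar versions of the four MD5 auxiliary functions (RFC 1321). */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }
static uint32_t md5_g(uint32_t x, uint32_t y, uint32_t z) { return (x & z) | (y & ~z); }
static uint32_t md5_h(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }
static uint32_t md5_i(uint32_t x, uint32_t y, uint32_t z) { return y ^ (x | ~z); }

/* Rotate a 32-bit word left by s bits; valid for 0 < s < 32. */
static uint32_t rotl32(uint32_t v, unsigned s) { return (v << s) | (v >> (32 - s)); }

/* One FF step: a = b + ((a + F(b,c,d) + m + k) <<< s),
   where m is a message word, k a round constant and s a shift amount. */
static uint32_t md5_ff(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                       uint32_t m, uint32_t k, unsigned s)
{
    return b + rotl32(a + md5_f(b, c, d) + m + k, s);
}
```

Each AVX lane in the demo computes exactly these expressions, just on eight words at once.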

Data types

```c
#include <immintrin.h>

typedef struct {
    __m256i state[4];           /* state (ABCD), eight lanes each */
    unsigned long int count[2]; /* number of bits, modulo 2^64 (lsb first) */
    unsigned char buffer1[64];  /* one 64-byte input buffer per lane */
    unsigned char buffer2[64];
    unsigned char buffer3[64];
    unsigned char buffer4[64];
    unsigned char buffer5[64];
    unsigned char buffer6[64];
    unsigned char buffer7[64];
    unsigned char buffer8[64];
} MD5_AVX_CTX;

/* .......... */

void md5_avx_init(MD5_AVX_CTX *context)
{
    context->count[0] = context->count[1] = 0;
    /* Load magic initialization constants, broadcast to all eight lanes. */
    context->state[0] = _mm256_set1_epi32(0x67452301);
    context->state[1] = _mm256_set1_epi32(0xefcdab89);
    context->state[2] = _mm256_set1_epi32(0x98badcfe);
    context->state[3] = _mm256_set1_epi32(0x10325476);
}
```

1. Overview of SIMD data types

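The SIMD data types are:

__m64: 64-bit packed integers (MMX).

__m128: 128-bit packed single-precision floats (SSE).

__m128d: 128-bit packed double-precision floats (SSE2).

__m128i: 128-bit packed integers (SSE2).

__m256: 256-bit packed single-precision floats (AVX).

__m256d: 256-bit packed double-precision floats (AVX).

__m256i: 256-bit packed integers (AVX).

Note: "packed integers" covers signed and unsigned 8-, 16-, 32- and 64-bit integers.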

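These types map onto registers as follows:

64-bit MMX registers (MM0~MM7): __m64.

128-bit SSE registers (XMM0~XMM15; only XMM0~XMM7 in 32-bit mode): __m128, __m128d, __m128i.

256-bit AVX registers (YMM0~YMM15; only YMM0~YMM7 in 32-bit mode): __m256, __m256d, __m256i.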
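To make these widths concrete, here is a small compile-time check. It assumes an x86 target and GCC or Clang, whose <immintrin.h> defines these types even without -mavx; lanes_per_m256i is a hypothetical helper.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Register widths of the vector types listed above, in bytes. */
_Static_assert(sizeof(__m64) == 8, "__m64 is 64 bits");
_Static_assert(sizeof(__m128i) == 16, "__m128i is 128 bits");
_Static_assert(sizeof(__m256i) == 32, "__m256i is 256 bits");
/* Eight 32-bit lanes per __m256i: one per parallel MD5 computation. */
_Static_assert(sizeof(__m256i) / sizeof(uint32_t) == 8, "eight 32-bit lanes");

/* Number of packed elements of elem_size bytes that fit in one __m256i. */
static size_t lanes_per_m256i(size_t elem_size)
{
    return sizeof(__m256i) / elem_size;
}
```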

Given these types, md5_avx_init stores each of A, B, C and D in a __m256i, that is, eight 32-bit lanes per state word.

_mm256_set1_epi32() broadcasts one 32-bit integer into all eight lanes of a __m256i.
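The broadcast can be checked by storing the vectors back to memory. This is a minimal sketch assuming GCC or Clang on an x86-64 CPU with AVX; md5_avx_init_lanes is a hypothetical helper that mirrors md5_avx_init but writes the lanes into a plain array instead of MD5_AVX_CTX.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: broadcast the four MD5 initial values as
   md5_avx_init does, then copy every lane out for inspection.
   target("avx") lets GCC/Clang compile this without a global -mavx;
   the CPU that runs it must still support AVX. */
__attribute__((target("avx")))
static void md5_avx_init_lanes(uint32_t lanes[4][8])
{
    const uint32_t iv[4] = {0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u};
    for (int w = 0; w < 4; w++) {
        /* Same 32-bit value in all eight lanes. */
        __m256i v = _mm256_set1_epi32((int)iv[w]);
        /* Unaligned store, so lanes[w] needs no 32-byte alignment. */
        _mm256_storeu_si256((__m256i *)lanes[w], v);
    }
}
```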

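After these calls every state word holds eight copies of its initial value, one per lane. The data to be hashed must likewise be laid out as eight separate inputs, one per lane, which is what avx_decode does: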

```c
/* Pack word i of each of the eight inputs into output[i], one input per lane.
   Bytes are assembled little-endian; input8 lands in lane 7, input1 in lane 0. */
static void avx_decode(__m256i *output,
                       unsigned char *input1, unsigned char *input2,
                       unsigned char *input3, unsigned char *input4,
                       unsigned char *input5, unsigned char *input6,
                       unsigned char *input7, unsigned char *input8,
                       unsigned int len)
{
    unsigned int i, j;

    for (i = 0, j = 0; j < len; i++, j += 4) {
        output[i] = _mm256_set_epi32(
            ((unsigned long int)input8[j]) | (((unsigned long int)input8[j+1]) << 8) |
                (((unsigned long int)input8[j+2]) << 16) | (((unsigned long int)input8[j+3]) << 24),
            ((unsigned long int)input7[j]) | (((unsigned long int)input7[j+1]) << 8) |
                (((unsigned long int)input7[j+2]) << 16) | (((unsigned long int)input7[j+3]) << 24),
            ((unsigned long int)input6[j]) | (((unsigned long int)input6[j+1]) << 8) |
                (((unsigned long int)input6[j+2]) << 16) | (((unsigned long int)input6[j+3]) << 24),
            ((unsigned long int)input5[j]) | (((unsigned long int)input5[j+1]) << 8) |
                (((unsigned long int)input5[j+2]) << 16) | (((unsigned long int)input5[j+3]) << 24),
            ((unsigned long int)input4[j]) | (((unsigned long int)input4[j+1]) << 8) |
                (((unsigned long int)input4[j+2]) << 16) | (((unsigned long int)input4[j+3]) << 24),
            ((unsigned long int)input3[j]) | (((unsigned long int)input3[j+1]) << 8) |
                (((unsigned long int)input3[j+2]) << 16) | (((unsigned long int)input3[j+3]) << 24),
            ((unsigned long int)input2[j]) | (((unsigned long int)input2[j+1]) << 8) |
                (((unsigned long int)input2[j+2]) << 16) | (((unsigned long int)input2[j+3]) << 24),
            ((unsigned long int)input1[j]) | (((unsigned long int)input1[j+1]) << 8) |
                (((unsigned long int)input1[j+2]) << 16) | (((unsigned long int)input1[j+3]) << 24));
    }
}
```

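Per the official documentation, _mm256_set_epi32 takes its arguments from the highest element (e7) down to the lowest (e0). That is why avx_decode passes input8 first: it lands in the top lane, and input1 lands in lane 0. Within each lane the four bytes are combined little-endian (byte j in bits 0-7, byte j+3 shifted left by 24 into bits 24-31), because MD5 reads its message words in little-endian order.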

```
Synopsis
__m256i _mm256_set_epi32 (int e7, int e6, int e5, int e4, int e3, int e2, int e1, int e0)
#include <immintrin.h>
Instruction: Sequence
CPUID Flags: AVX

Description
Set packed 32-bit integers in dst with the supplied values.

Operation
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
dst[159:128] := e4
dst[191:160] := e5
dst[223:192] := e6
dst[255:224] := e7
dst[MAX:256] := 0
```
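The lane order and byte order can be verified together with a short sketch, assuming GCC or Clang on an AVX-capable x86-64 CPU. le32 and pack_word_lanes are hypothetical helpers that repeat what one iteration of avx_decode does, with the eight inputs passed as an array instead of eight parameters.

```c
#include <immintrin.h>
#include <stdint.h>

/* Combine 4 bytes into a 32-bit word, least significant byte first
   (the byte order MD5 uses for message words). */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical demo: pack the word at byte offset j of eight inputs into
   one __m256i, then store it so each lane can be inspected. */
__attribute__((target("avx")))
static void pack_word_lanes(const unsigned char *in[8], unsigned j, uint32_t out[8])
{
    /* e7 comes first, so in[7] lands in lane 7 and in[0] in lane 0. */
    __m256i v = _mm256_set_epi32((int)le32(in[7] + j), (int)le32(in[6] + j),
                                 (int)le32(in[5] + j), (int)le32(in[4] + j),
                                 (int)le32(in[3] + j), (int)le32(in[2] + j),
                                 (int)le32(in[1] + j), (int)le32(in[0] + j));
    _mm256_storeu_si256((__m256i *)out, v);
}
```

After storing, out[i] holds the word taken from in[i], so lane numbers line up with input numbers minus one.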

SIMD operations

F takes three arguments, X, Y and Z, and is built entirely from simple bitwise operations:

$$F(X,Y,Z) = (X \land Y) \lor (\lnot X \land Z)$$

The demo implements it as a macro:

```c
#define AVX_F(x, y, z) _mm256_castps_si256(_mm256_or_ps( \
    _mm256_and_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(y)), \
    _mm256_andnot_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(z))))
```

Reading it from the inside out:

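_mm256_castsi256_ps reinterprets a __m256i as a __m256 without changing any bits; it generates no instruction. The cast is needed because AVX (the first version) only provides 256-bit AND, OR and ANDNOT for floating-point types; the integer forms _mm256_and_si256, _mm256_or_si256 and _mm256_andnot_si256 arrived with AVX2. _mm256_and_ps then ANDs the eight lanes of x and y.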

_mm256_andnot_ps(a, b) computes (~a) & b, so passing X first and Z second yields (~X) & Z.

_mm256_or_ps ORs the two results, and the outer _mm256_castps_si256 reinterprets the result back as a __m256i.
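The same cast-and-combine pattern extends to the other three functions. The sketch below is a hypothetical extension, not taken from the demo: AVX_G, AVX_H and AVX_I are names chosen to match AVX_F, and avx_fghi is a test wrapper. It assumes GCC or Clang on an AVX-capable CPU. Since AVX has no bitwise NOT, ~Z in I is computed by XOR with an all-ones vector.

```c
#include <immintrin.h>
#include <stdint.h>

#define AVX_F(x, y, z) _mm256_castps_si256(_mm256_or_ps( \
    _mm256_and_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(y)), \
    _mm256_andnot_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(z))))

/* G = (X & Z) | (Y & ~Z); _mm256_andnot_ps(z, y) computes ~z & y. */
#define AVX_G(x, y, z) _mm256_castps_si256(_mm256_or_ps( \
    _mm256_and_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(z)), \
    _mm256_andnot_ps(_mm256_castsi256_ps(z), _mm256_castsi256_ps(y))))

/* H = X ^ Y ^ Z */
#define AVX_H(x, y, z) _mm256_castps_si256(_mm256_xor_ps( \
    _mm256_xor_ps(_mm256_castsi256_ps(x), _mm256_castsi256_ps(y)), \
    _mm256_castsi256_ps(z)))

/* I = Y ^ (X | ~Z); ~Z is Z XOR all-ones. */
#define AVX_I(x, y, z) _mm256_castps_si256(_mm256_xor_ps( \
    _mm256_castsi256_ps(y), _mm256_or_ps(_mm256_castsi256_ps(x), \
    _mm256_xor_ps(_mm256_castsi256_ps(z), \
                  _mm256_castsi256_ps(_mm256_set1_epi32(-1))))))

/* Apply F (0), G (1), H (2) or I (3) to eight lanes loaded from arrays. */
__attribute__((target("avx")))
static void avx_fghi(int which, const uint32_t x[8], const uint32_t y[8],
                     const uint32_t z[8], uint32_t out[8])
{
    __m256i vx = _mm256_loadu_si256((const __m256i *)x);
    __m256i vy = _mm256_loadu_si256((const __m256i *)y);
    __m256i vz = _mm256_loadu_si256((const __m256i *)z);
    __m256i r;
    switch (which) {
    case 0:  r = AVX_F(vx, vy, vz); break;
    case 1:  r = AVX_G(vx, vy, vz); break;
    case 2:  r = AVX_H(vx, vy, vz); break;
    default: r = AVX_I(vx, vy, vz); break;
    }
    _mm256_storeu_si256((__m256i *)out, r);
}
```

Comparing each lane against the scalar formulas is a cheap way to catch swapped arguments to _mm256_andnot_ps.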

Conclusion

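In short, with 256-bit AVX registers one instruction can apply an arithmetic or bitwise operation to eight 32-bit lanes at once, so eight independent MD5 computations advance together instead of one at a time. Note that 256-bit integer arithmetic such as _mm256_add_epi32, which MD5 also needs, requires AVX2.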
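To see eight lanes advance in one step, here is one full FF step (the first step of MD5 round 1, with constant 0xd76aa478 and shift 7 from RFC 1321) applied to eight messages at once. This sketch assumes GCC or Clang on an AVX2-capable CPU, since it uses 256-bit integer adds; avx2_rotl32 and avx2_ff_step are hypothetical names, and the demo's own step code may be organized differently.

```c
#include <immintrin.h>
#include <stdint.h>

/* Rotate each 32-bit lane left by s bits (0 < s < 32). There is no 32-bit
   vector rotate in AVX2, so it is built from two shifts and an OR. */
__attribute__((target("avx2")))
static __m256i avx2_rotl32(__m256i v, int s)
{
    __m128i left = _mm_cvtsi32_si128(s), right = _mm_cvtsi32_si128(32 - s);
    return _mm256_or_si256(_mm256_sll_epi32(v, left), _mm256_srl_epi32(v, right));
}

/* Hypothetical sketch: a = b + ((a + F(b,c,d) + m + k) <<< s) on eight lanes. */
__attribute__((target("avx2")))
static void avx2_ff_step(uint32_t a[8], const uint32_t b[8], const uint32_t c[8],
                         const uint32_t d[8], const uint32_t m[8],
                         uint32_t k, int s)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vc = _mm256_loadu_si256((const __m256i *)c);
    __m256i vd = _mm256_loadu_si256((const __m256i *)d);
    __m256i vm = _mm256_loadu_si256((const __m256i *)m);
    /* F(b,c,d) with AVX2's integer bitwise ops: (b & c) | (~b & d). */
    __m256i f = _mm256_or_si256(_mm256_and_si256(vb, vc), _mm256_andnot_si256(vb, vd));
    /* a + F + m + k, all lanes wrapping modulo 2^32. */
    __m256i t = _mm256_add_epi32(_mm256_add_epi32(va, f),
                                 _mm256_add_epi32(vm, _mm256_set1_epi32((int)k)));
    va = _mm256_add_epi32(vb, avx2_rotl32(t, s));
    _mm256_storeu_si256((__m256i *)a, va);
}
```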

That's all for today. The next installment will give an overview of the ext4 file system, so stay tuned.

