Intel的cpu汇编bmi2指令集的pext指令和完美哈希函数

May 04, 2024 | 4 Minute Read

参考:

https://www.chessprogramming.org/BMI2
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=pext&ig_expand=5088
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html

pext 指令

是 “Parallel Bits Extract” 的意思, 作用就是根据你提供的一个mask掩码整数,如果mask的某一位k是1就这个 目标整数dst的第k位抽出来 作为新整数的一位。

intel提供的伪代码是下面这样的

Synopsis
unsigned __int64 _pext_u64 (unsigned __int64 a, unsigned __int64 mask)
#include <immintrin.h>
Instruction: pext r64, r64, r64
CPUID Flags: BMI2

Description
Extract bits from unsigned 64-bit integer a at the corresponding bit locations specified by mask to contiguous low bits in dst; the remaining upper bits in dst are set to zero.

Operation
tmp := a
dst := 0
m := 0
k := 0
DO WHILE m < 64
	IF mask[m] == 1
		dst[k] := tmp[m]
		k := k + 1
	FI
	m := m + 1
OD

象棋百科提供的C语言的代码

看起来没有intel的容易理解

U64 _pext_u64(U64 val, U64 mask) {
  U64 res = 0;
  for (U64 bb = 1; mask; bb += bb) {
    if ( val & mask & -mask )
      res |= bb;
    mask &= mask - 1;
  }
  return res;
} 

象棋百科提供的示意图

就是说pext运算后的解释,把s60 和 s7 s5 s2这些位抽出来组成的DEST就是指令返回的结果。

SRC1   ┌───┬───┬───┬───┬───┐    ┌───┬───┬───┬───┬───┬───┬───┬───┐
       │S63│S62│S61│S60│S59│....│ S7│ S6│ S5│ S4│ S3│ S2│ S1│ S0│ 
       └───┴───┴───┴───┴───┘    └───┴───┴───┴───┴───┴───┴───┴───┘

SRC2   ┌───┬───┬───┬───┬───┐    ┌───┬───┬───┬───┬───┬───┬───┬───┐
(mask) │ 0 │ 0 │ 0 │ 1 │ 0 │0...│ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │  (f.i. 4 bits set)
       └───┴───┴───┴───┴───┘    └───┴───┴───┴───┴───┴───┴───┴───┘

DEST   ┌───┬───┬───┬───┬───┐    ┌───┬───┬───┬───┬───┬───┬───┬───┐
       │ 0 │ 0 │ 0 │ 0 │ 0 │0...│ 0 │ 0 │ 0 │ 0 │S60│ S7│ S5│ S2│ 
       └───┴───┴───┴───┴───┘    └───┴───┴───┴───┴───┴───┴───┴───┘

pdep 叫做 “Parallel Bits Deposit”, 是pext的反操作吧

gcc 提供的build-in函数

使用 -mbmi2 编译选项就会生成这些指令吧。 这个bmi2 指令集在intel Haswell 架构以后的cpu应该都支持, linux系统里面cat /proc/cpuinfo 的flags 里面会带有 bmi2 字样

unsigned int _bzhi_u32 (unsigned int, unsigned int);
unsigned int _pdep_u32 (unsigned int, unsigned int);
unsigned int _pext_u32 (unsigned int, unsigned int);
unsigned long long _bzhi_u64 (unsigned long long, unsigned long long);
unsigned long long _pdep_u64 (unsigned long long, unsigned long long);
unsigned long long _pext_u64 (unsigned long long, unsigned long long);

应用

这个pext指令的应用,有一个用途是把大整数转为小整数的索引,比如用来实现哈希表的完美哈希函数。比如这个https://github.com/boost-ext/mph库的用法:

def hash[kv: array, unknown: typeof(kv[0][0])](key : any):
  # 0. find mask which uniquely identifies all keys [compile-time]
  mask = ~typeof(kv[0][0]) # 0b111111...

  for i in range(nbits(mask)):
    masked = []
    mask.unset(i)

    for k, v in kv:
      masked.append(k & mask)

    if not unique(masked):
      mask.set(i)

  assert unique(masked)
  assert mask != ~typeof(kv[0][0])

  lookup = array(typeof(kv[0]), 2^popcount(mask)) # static constexpr + alignment
  for k, v in kv:
    lookup[pext(k, mask)] = (k, v)

  # 1. lookup [run-time] / if key is a string convert to u32 or u64 first (memcpy)
    # word: 00101011
    # mask: 11100001
    #    &: 000____1
    # pext: ____0001 # intel/intrinsics-guide/index.html#text=pext
    def pext(a : uN, mask : uN):
      dst, m, k = ([], 0, 0)

      while m < nbits(a):
        if mask[m] == 1:
          dst.append(a[m])
          k += 1
        m += 1

      return uN(dst)

  k, v = lookup[pext(key, mask)]

  if k == key: # policies (conditional, branchless, ...)
    return v
  else:
    return unknown

在象棋引擎里面,主要用来实现 “bitboard位棋盘” 的 棋子攻击表的 完美哈希函数, “Fancy Magic Bitboards” “Fancy PEXT Bitboards ”,把 棋盘的掩码转换为小的哈希表索引。
可以参考Stockfish引擎的 init_magics函数 https://github.com/official-stockfish/Stockfish/blob/master/src/bitboard.cpp