Compile errors in a C++ program

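I am trying to compile a C++ program that was published on a reference site.
It is a matrix computation program, and I get the same six errors both when building an executable and when generating assembly code.
How should I change it so that it compiles?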

Commands I ran

$ g++ -o sample sample.cpp

$ g++ -S sample.cpp

Errors

sample.cpp:7:3: error: use of undeclared identifier 'float8'; did you mean
      'float'?
  float8 csum[regsA][regsB] = {{0.0}};
  ^~~~~~
  float
sample.cpp:12:7: error: use of undeclared identifier 'float8'; did you mean
      'float'?
      float8 bb = LoadFloat8(&B(p, bi * 8));
      ^~~~~~
      float
sample.cpp:12:31: error: use of undeclared identifier 'B'
      float8 bb = LoadFloat8(&B(p, bi * 8));
                              ^
sample.cpp:14:9: error: use of undeclared identifier 'float8'; did you mean
      'float'?
        float8 aa = BroadcastFloat8(A(ai, p));
        ^~~~~~
        float
sample.cpp:14:37: error: use of undeclared identifier 'A'
        float8 aa = BroadcastFloat8(A(ai, p));
                                    ^
sample.cpp:23:19: error: use of undeclared identifier 'C'
      AdduFloat8(&C(ai, bi * 8), csum[ai][bi]);
                  ^
6 errors generated.

Since the errors mentioned float8 and suggested float, I replaced float8 with float. After that change the output became the following.

sample.cpp:12:30: error: use of undeclared identifier 'B'
      float bb = LoadFloat8(&B(p, bi * 8));
                             ^
sample.cpp:14:36: error: use of undeclared identifier 'A'
        float aa = BroadcastFloat8(A(ai, p));
                                   ^
sample.cpp:23:19: error: use of undeclared identifier 'C'
      AdduFloat8(&C(ai, bi * 8), csum[ai][bi]);
                  ^
3 errors generated.
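
A note on that replacement: the compiler's "did you mean 'float'?" is only a spelling suggestion. Judging by the names LoadFloat8 and BroadcastFloat8, float8 is presumably a vector of eight floats that the reference site declares somewhere outside this excerpt, so replacing it with a scalar float would change what the code computes. Below is a minimal sketch of one way such a type could be declared, assuming the GCC/Clang vector_size extension. The typedef and the helper scaled_lane_sum are hypothetical and are not taken from the reference site.

```cpp
// Hypothetical declaration: 8 floats packed into one vector value,
// using the vector_size attribute supported by GCC and Clang.
typedef float float8 __attribute__((vector_size(8 * sizeof(float))));

// Illustrative helper (not from the original program): multiply eight
// consecutive floats at p by the scalar s, then add up the eight lanes.
float scaled_lane_sum(float s, const float *p) {
  float8 aa = {s, s, s, s, s, s, s, s};  // same value in every lane
  float8 bb = {};                        // zero-initialised vector
  for (int i = 0; i < 8; i++) {
    bb[i] = p[i];                        // copy 8 consecutive floats
  }
  float8 prod = aa * bb;                 // element-wise multiply on all lanes
  float sum = 0.0f;
  for (int i = 0; i < 8; i++) {
    sum += prod[i];
  }
  return sum;
}
```

With a declaration like this in scope, the first, second and fourth of the original six errors would presumably not appear, and the float8 lines could stay as they were written.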

Program (sample.cpp, after replacing float8 with float)

template <unsigned regsA, unsigned regsB>
void matmul_dot_inner(int k, const float *a, int lda, const float *b, int ldb,
                      float *c, int ldc) {
  float csum[regsA][regsB] = {{0.0}};
  for (int p = 0; p < k; p++) {

    // Perform the DOT product.
    for (int bi = 0; bi < regsB; bi++) {
      float bb = LoadFloat8(&B(p, bi * 8));
      for (int ai = 0; ai < regsA; ai++) {
        float aa = BroadcastFloat8(A(ai, p));
        csum[ai][bi] += aa * bb;
      }
    }
  }

  // Accumulate the results into C.
  for (int ai = 0; ai < regsA; ai++) {
    for (int bi = 0; bi < regsB; bi++) {
      AdduFloat8(&C(ai, bi * 8), csum[ai][bi]);
    }
  }
}
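
The remaining errors say that A, B and C are not declared anywhere in sample.cpp, and the same is true of float8, LoadFloat8, BroadcastFloat8 and AdduFloat8. The following is a minimal sketch of what the missing pieces might look like so that the function compiles. Everything outside matmul_dot_inner is an assumption made for illustration: the plain struct float8, the bodies of the three helpers, the row-major layout chosen for the A/B/C macros, and the demo_c function. The reference site's actual definitions may differ.

```cpp
// Hypothetical 8-lane float type written as a plain struct so that it
// compiles with any C++ compiler (no SIMD extension required).
struct float8 {
  float v[8];
};

inline float8 operator*(const float8 &x, const float8 &y) {
  float8 r;
  for (int i = 0; i < 8; i++) r.v[i] = x.v[i] * y.v[i];
  return r;
}

inline float8 &operator+=(float8 &x, const float8 &y) {
  for (int i = 0; i < 8; i++) x.v[i] += y.v[i];
  return x;
}

// Assumed helper semantics, inferred only from the names.
inline float8 LoadFloat8(const float *p) {       // read 8 consecutive floats
  float8 r;
  for (int i = 0; i < 8; i++) r.v[i] = p[i];
  return r;
}

inline float8 BroadcastFloat8(float s) {         // copy s into all 8 lanes
  float8 r;
  for (int i = 0; i < 8; i++) r.v[i] = s;
  return r;
}

inline void AdduFloat8(float *p, const float8 &x) {  // p[0..7] += x
  for (int i = 0; i < 8; i++) p[i] += x.v[i];
}

// Assumed accessors: row-major indexing with leading dimensions lda/ldb/ldc.
#define A(i, j) a[(i) * lda + (j)]
#define B(i, j) b[(i) * ldb + (j)]
#define C(i, j) c[(i) * ldc + (j)]

template <unsigned regsA, unsigned regsB>
void matmul_dot_inner(int k, const float *a, int lda, const float *b, int ldb,
                      float *c, int ldc) {
  float8 csum[regsA][regsB] = {};  // all accumulators start at zero
  for (int p = 0; p < k; p++) {
    // Perform the DOT product.
    for (unsigned bi = 0; bi < regsB; bi++) {
      float8 bb = LoadFloat8(&B(p, bi * 8));
      for (unsigned ai = 0; ai < regsA; ai++) {
        float8 aa = BroadcastFloat8(A(ai, p));
        csum[ai][bi] += aa * bb;
      }
    }
  }
  // Accumulate the results into C.
  for (unsigned ai = 0; ai < regsA; ai++) {
    for (unsigned bi = 0; bi < regsB; bi++) {
      AdduFloat8(&C(ai, bi * 8), csum[ai][bi]);
    }
  }
}

// Illustrative driver: multiply a 2x3 matrix by a 3x8 matrix and return C(row, col).
float demo_c(int row, int col) {
  const float matA[2 * 3] = {1, 2, 3, 4, 5, 6};
  float matB[3 * 8];
  for (int p = 0; p < 3; p++)
    for (int j = 0; j < 8; j++) matB[p * 8 + j] = float(p + 1);
  float matC[2 * 8] = {};
  matmul_dot_inner<2, 1>(3, matA, 3, matB, 8, matC, 8);
  return matC[row * 8 + col];  // row 0 is all 14, row 1 is all 32
}
```

If the listing above is the whole of sample.cpp, it also has no main function, so g++ -o sample sample.cpp would presumably still fail at link time even after the declarations are added. g++ -S sample.cpp does not need one.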

What works
I have confirmed that a C++ Hello world program compiles and runs.

$ g++ -o hello hello.cpp
$ ./hello 
Hello world.

#include <iostream>

using namespace std;

int main(){
  cout << "Hello world." << endl;
  return 0;
}