Pycudaでのマッチングプログラムの作成方法

Pycudaを用いた画像マッチングのプログラムを作成しようとしています。しかし、cuda用の記述方法が難しく、プログラミング初心者もあって、止まってしまっている状態です。
今の所、こちらのOpenCVチュートリアルの下部にある、テンプレートマッチング(主に、10行目の処理を)をPycudaで記述したいと考えています。ご教授お願いします。

1  import cv2
2  import numpy as np
3  from matplotlib import pyplot as plt
4
5  img_rgb = cv2.imread('mario.png')
6  img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
7  template = cv2.imread('mario_coin.png',0)
8  w, h = template.shape[::-1]
9
10 res = cv2.matchTemplate(img_gray,template,cv2.TM_CCOEFF_NORMED)
11
12 threshold = 0.8
13 loc = np.where( res >= threshold)
14
15 for pt in zip(*loc[::-1]):
16     cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2)
17
18 cv2.imwrite('res.png',img_rgb)