下記参考URLをもとに、word2vecを動かしてみたいと思いました。
以下train.py,similars.pyのファイル、作業手順は全てこのサイトからの転用です。

作業手順

mecabで青空文庫のファイルを分かち書きしたのち、以下のファイルで学習させました。
生成したmodelはdata22.modelとして保存。

train.py

-*- coding: utf-8 -*-

from gensim.models import word2vec
import logging
import sys

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', 
level=logging.INFO)

sentences = word2vec.LineSentence(sys.argv[1])
model = word2vec.Word2Vec(sentences,
                          sg=1,
                          size=100,
                          min_count=1,
                          window=10,
                          hs=1,
                          negative=0)
model.save(sys.argv[2])

pythonでtrain.pyを実行。結果のmodelにdata22.modelと名前をつけて保存。

$ python train.py data22.txt data22.model
2017-04-08 01:49:31,381 : INFO : collecting all words and their counts
2017-04-08 01:49:31,382 : INFO : PROGRESS: at sentence #0, processed 0 
words, keeping 0 word types
2017-04-08 01:49:31,389 : INFO : collected 1684 word types from a 
corpus of 9554 raw words and 228 sentences
2017-04-08 01:49:31,389 : INFO : Loading a fresh vocabulary
2017-04-08 01:49:31,395 : INFO : min_count=1 retains 1684 unique words 
(100% of original 1684, drops 0)
2017-04-08 01:49:31,395 : INFO : min_count=1 leaves 9554 word corpus (100% of original 9554, drops 0)
2017-04-08 01:49:31,405 : INFO : deleting the raw counts dictionary of 1684 items
2017-04-08 01:49:31,406 : INFO : sample=0.001 downsamples 45 most-common words
2017-04-08 01:49:31,407 : INFO : downsampling leaves estimated 5687 word corpus (59.5% of prior 9554)
2017-04-08 01:49:31,407 : INFO : estimated required memory for 1684 words and 100 dimensions: 2526000 bytes
2017-04-08 01:49:31,410 : INFO : constructing a huffman tree from 1684 words
2017-04-08 01:49:31,496 : INFO : built huffman tree with maximum node depth 13
2017-04-08 01:49:31,496 : INFO : resetting layer weights
2017-04-08 01:49:31,544 : INFO : training model with 3 workers on 1684 vocabulary and 100 features, using sg=1 hs=1 sample=0.001 negative=0 window=10
2017-04-08 01:49:31,544 : INFO : expecting 228 sentences, matching count from corpus used for vocabulary survey
2017-04-08 01:49:31,708 : INFO : worker thread finished; awaiting finish of 2 more threads
2017-04-08 01:49:31,766 : INFO : worker thread finished; awaiting finish of 1 more threads
2017-04-08 01:49:31,767 : INFO : worker thread finished; awaiting finish of 0 more threads
2017-04-08 01:49:31,767 : INFO : training on 47770 raw words (28489 effective words) took 0.2s, 128642 effective words/s
2017-04-08 01:49:31,767 : WARNING : under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
2017-04-08 01:49:31,767 : INFO : saving Word2Vec object under data22.model, separately None
2017-04-08 01:49:31,767 : INFO : not storing attribute syn0norm
2017-04-08 01:49:31,767 : INFO : not storing attribute cum_table
2017-04-08 01:49:31,870 : INFO : saved data22.model

指定した単語と類似度の高い単語をリストアップするスクリプトsimilars.pyを用意。

similars.py

# -*- coding: utf-8 -*-

from gensim.models import word2vec
import sys

model   = word2vec.Word2Vec.load(sys.argv[1])
results = model.most_similar(positive=sys.argv[2], topn=10)

for result in results:
    print(result[0], '\t', result[1])

先ほど作成したmodelファイルでsimilars.pyを「本」という単語を引数にして実行。すると以下のエラーが出てしまいます。引数に指定した「本」という単語が認識されていないようですが、原因がわかりません。

$ python similars.py data22.model 本
Traceback (most recent call last):
  File "similars.py", line 7, in <module>
    results = model.most_similar(positive=sys.argv[2], topn=10)
  File "/usr/local/lib/python2.7/site-
packages/gensim/models/word2vec.py", line 1285, in most_similar
    return self.wv.most_similar(positive, negative, topn, 
restrict_vocab, indexer)
  File "/usr/local/lib/python2.7/site-
packages/gensim/models/keyedvectors.py", line 97, in most_similar
    **raise KeyError("word '%s' not in vocabulary" % word)**
**KeyError: "word '\xe6\x9c\xac' not in vocabulary"**

どなたか、解決のヒントをいただければ幸いです。よろしくお願いします。