NAME
    Search::Fulltext::Tokenizer::Ngram - Character n-gram tokenizer for
    Search::Fulltext

VERSION
    version 0.01

SYNOPSIS
        use utf8;
        use Search::Fulltext;
        use Search::Fulltext::Tokenizer::Bigram;

        # The documents are the lines of the "Humpty Dumpty" rhyme in Japanese.
        my $searcher = Search::Fulltext->new(
            docs => [
                'ハンプティ・ダンプティ 塀の上',
                'ハンプティ・ダンプティ 落っこちた',
                '王様の馬みんなと 王様の家来みんなでも',
                'ハンプティを元に 戻せなかった',
            ],
            tokenizer => q/perl 'Search::Fulltext::Tokenizer::Bigram::get_tokenizer'/,
        );
        my $hit_document_ids = $searcher->search('ハンプティ');  # [0, 1, 3]

DESCRIPTION
    This module provides character N-gram tokenizers for Search::Fulltext.
    By default, {1,2,3}-gram tokenizers are available.

CREATING AN N-GRAM TOKENIZER (N > 3)
    To use an N-gram tokenizer with N > 3, create one by inheriting from
    "Search::Fulltext::Tokenizer::Ngram":

        package My::Tokenizer::42gram;
        use parent qw/Search::Fulltext::Tokenizer::Ngram/;

        my $iterator_generator = __PACKAGE__->new(42);

        sub get_tokenizer {
            sub { $iterator_generator->create_token_iterator(@_) };
        }

SEE ALSO
    Search::Fulltext::Tokenizer::Unigram

    Search::Fulltext::Tokenizer::Bigram

    Search::Fulltext::Tokenizer::Trigram

AUTHOR
    Koichi SATOH <sekia@cpan.org>

COPYRIGHT AND LICENSE
    This software is Copyright (c) 2014 by Koichi SATOH.

    This is free software, licensed under:

      The MIT (X11) License
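
ILLUSTRATION OF CHARACTER N-GRAMS
    The DESCRIPTION mentions character N-grams without showing what one looks like. The sketch below is a stand-alone illustration of the general idea and does not use this module. The helper char_ngrams is a hypothetical name introduced here, not part of Search::Fulltext::Tokenizer::Ngram, and the module's real tokenizers may treat whitespace, punctuation, or short inputs differently.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal Japanese characters

# Hypothetical helper (not part of this distribution): returns every run
# of $n consecutive characters in $text, sliding one character at a time.
sub char_ngrams {
    my ($text, $n) = @_;
    my $last_start = length($text) - $n;    # negative when $text is shorter than $n
    return map { substr($text, $_, $n) } 0 .. $last_start;
}

binmode STDOUT, ':encoding(UTF-8)';
my @bigrams = char_ngrams('ハンプティ', 2);
print join(' ', @bigrams), "\n";    # prints: ハン ンプ プテ ティ
```

    Splitting by character position rather than by whitespace is what makes this style of tokenizer suitable for text such as Japanese, where words are not separated by spaces.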