An object that knows how to perform hyphenation based on the TeX hyphenation algorithm with pattern files. Each object is constructed with a specific language's hyphenation patterns.
Clears the per-instance hyphenation and visualization caches.
# File lib/text/hyphen.rb, line 173
def clear_cache!
  @cache.clear
  @vcache.clear
end
Returns an array of character positions where a word can be hyphenated.
hyp.hyphenate('representation') #=> [3, 5, 8, 10]
Because hyphenation can be expensive, if the word has been hyphenated previously, it will be returned from a per-instance cache.
# File lib/text/hyphen.rb, line 106
def hyphenate(word)
  word = word.downcase
  $stderr.puts "Hyphenating #{word}" if DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @language.exceptions[word]
  return @cache[word] = make_result_list(res) if res

  letters = word.scan(@language.scan_re)
  $stderr.puts letters.inspect if DEBUG
  word_size = letters.size

  result = [0] * (word_size + 1)
  right_stop = word_size - @right

  updater = Proc.new do |hash, str, pos|
    if hash.has_key?(str)
      $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
      hash[str].scan(@language.scan_re).each_with_index do |cc, ii|
        cc = cc.to_i
        result[ii + pos] = cc if cc > result[ii + pos]
      end
      $stderr.print ": #{result.inspect}\n" if DEBUG
    end
  end

  # Walk the word
  (0..right_stop).each do |pos|
    rest_length = word_size - pos
    (1..rest_length).each do |length|
      substr = letters[pos, length].join('')
      updater[@language.hyphen, substr, pos]
      updater[@language.start, substr, pos] if pos.zero?
      updater[@language.stop, substr, pos] if (length == rest_length)
    end
  end

  updater[@language.both, word, 0] if @language.both[word]

  (0..@left).each { |i| result[i] = 0 }
  ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
  @cache[word] = make_result_list(result)
end
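The core of the walk above can be illustrated with a much smaller, self-contained sketch. The pattern table here is invented for illustration and is not taken from any real TeX pattern file; the method name `toy_hyphenation_points` is likewise hypothetical. The sketch leaves out the exception list, the word start/stop patterns, the left/right margins, and the cache. It also assumes the usual convention of the TeX algorithm that an odd score marks an allowed break, since `make_result_list` is not shown in this listing.

```ruby
# Toy pattern table, invented for illustration (NOT real TeX patterns).
# Each key is a letter sequence; each value holds one priority per gap
# around those letters: before the first letter, between letters, and
# after the last letter.
TOY_PATTERNS = {
  'ph'  => [1, 0, 0],    # allow a break before "ph"
  'he'  => [1, 0, 0],    # allow a break before "he" ...
  'hen' => [2, 0, 0, 0]  # ... unless "hen" follows: 2 outranks 1
}.freeze

def toy_hyphenation_points(word, patterns = TOY_PATTERNS)
  letters = word.downcase.chars
  scores = [0] * (letters.size + 1)

  # Try every substring of the word against the pattern table and keep
  # the highest priority seen for each gap.
  (0...letters.size).each do |pos|
    (1..letters.size - pos).each do |length|
      priorities = patterns[letters[pos, length].join]
      next unless priorities

      priorities.each_with_index do |priority, offset|
        scores[pos + offset] = priority if priority > scores[pos + offset]
      end
    end
  end

  # Assumption: odd scores permit a break, even scores forbid it.
  # Gaps before the first and after the last letter are never breaks.
  (1...letters.size).select { |i| scores[i].odd? }
end

toy_hyphenation_points('hyphen') #=> [2]
```

In this toy run the pattern for "he" proposes a break at position 3, but the higher even priority from "hen" overrides it, so only position 2 survives.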
This function will hyphenate a word so that the first point is at most size characters.
NOTE: if hyphen is set to a string of more than one character, it will still be counted as one character (since it represents a hyphen).
# File lib/text/hyphen.rb, line 184
def hyphenate_to(word, size, hyphen = '-')
  point = hyphenate(word).delete_if { |e| e >= size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + hyphen, word[point .. -1]]
  end
end
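To see the splitting rule in isolation, the following sketch applies the same selection logic to a fixed list of hyphenation points instead of calling hyphenate. The method name `split_to_width` is hypothetical, and the points are the ones shown for 'representation' in the hyphenate example.

```ruby
# Stand-alone sketch of the splitting rule: pick the largest hyphenation
# point strictly below +size+, then cut the word there.
def split_to_width(word, points, size, hyphen = '-')
  point = points.select { |p| p < size }.max
  return [nil, word] if point.nil?

  [word[0...point] + hyphen, word[point..-1]]
end

# Hyphenation points for 'representation', as listed for hyphenate.
points = [3, 5, 8, 10]

split_to_width('representation', points, 6) #=> ["repre-", "sentation"]
split_to_width('representation', points, 3) #=> [nil, "representation"]
```

When no point fits, the first element is nil and the word is returned whole, so a caller can tell that the word has to move to the next line.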
Returns a string describing the structure of the patterns for the language of this hyphenation object.
# File lib/text/hyphen.rb, line 195
def stats
  _b = @language.both.size
  _s = @language.start.size
  _e = @language.stop.size
  _h = @language.hyphen.size
  _x = @language.exceptions.size
  _T = _b + _s + _e + _h + _x

  s = <<-EOS
The language '%s' contains %d total hyphenation patterns.
    % 6d patterns are word start patterns.
    % 6d patterns are word stop patterns.
    % 6d patterns are word start/stop patterns.
    % 6d patterns are normal patterns.
    % 6d patterns are exceptions.
  EOS
  s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
end
Returns a visualization of the hyphenation points.
hyp.visualize('representation') #=> rep-re-sen-ta-tion
Any string can be set instead of the default hyphen:
hyp.visualize('example', '­') #=> exam­ple
Because hyphenation can be expensive, if the word has been visualised previously, it will be returned from a per-instance cache.
# File lib/text/hyphen.rb, line 159
def visualise(word, hyphen = '-')
  return @vcache[word] if @vcache.has_key?(word)
  w = word.dup
  s = hyphen.size
  hyphenate(w).each_with_index do |pos, n|
    # Insert the hyphen string at the reported position plus the offset of
    # the hyphen strings already inserted.
    w[pos.to_i + (n * s), 0] = hyphen unless pos.zero?
  end
  @vcache[word] = w
end
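The offset arithmetic is easier to follow on a fixed list of points. The sketch below is a stand-alone version of the insertion loop with a hypothetical name, `insert_hyphens`; it takes the hyphenation points as an argument rather than computing them, and it has no cache.

```ruby
# Stand-alone sketch of the insertion loop. Each earlier insertion shifts
# the remaining text right by hyphen.size characters, so the n-th point
# is offset by n * hyphen.size.
def insert_hyphens(word, points, hyphen = '-')
  out = word.dup
  points.each_with_index do |pos, n|
    out[pos + n * hyphen.size, 0] = hyphen unless pos.zero?
  end
  out
end

# Hyphenation points for 'representation', as listed for hyphenate.
points = [3, 5, 8, 10]

insert_hyphens('representation', points)       #=> "rep-re-sen-ta-tion"
insert_hyphens('representation', points, '~~') #=> "rep~~re~~sen~~ta~~tion"
```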
Creates a hyphenation object with the options requested. The options available are:
language: The language to perform hyphenation with. See language and iso_language.
left: The minimum number of characters to the left of a hyphenation point. See left.
right: The minimum number of characters to the right of a hyphenation point. See right.
The options can be provided either as hashed parameters or set as methods in an initialization block. The following initializations are all equivalent:
hyp = Text::Hyphen.new(:language => 'en_us')
hyp = Text::Hyphen.new(language: 'en_us') # under Ruby 1.9
hyp = Text::Hyphen.new { |h| h.language = 'en_us' }
# File lib/text/hyphen.rb, line 75
def initialize(options = {}) # :yields self:
  @iso_language = options[:language]
  @left = options[:left]
  @right = options[:right]
  @language = nil

  @cache = {}
  @vcache = {}

  @hyphen = {}
  @begin_hyphen = {}
  @end_hyphen = {}
  @both_hyphen = {}
  @exception = {}

  @first_load = true
  yield self if block_given?
  @first_load = false

  load_language

  @left ||= DEFAULT_MIN_LEFT
  @right ||= DEFAULT_MIN_RIGHT
end
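The "options hash or block" pattern used by this constructor can be tried without the library. The class below, `OptionsDemo`, is a hypothetical stand-in that keeps only the option handling; it loads no language data, and the default values of 2 are an assumption made for the sketch, not values confirmed by this listing.

```ruby
# Hypothetical stand-in showing only the option handling: values may come
# from the hash, from the block, or fall back to defaults.
class OptionsDemo
  DEFAULT_MIN_LEFT  = 2 # assumed value, for illustration only
  DEFAULT_MIN_RIGHT = 2 # assumed value, for illustration only

  attr_accessor :language, :left, :right

  def initialize(options = {})
    @language = options[:language]
    @left     = options[:left]
    @right    = options[:right]

    # The block runs after the hash is read, so it can override it.
    yield self if block_given?

    # Defaults apply only to options that are still unset.
    @left  ||= DEFAULT_MIN_LEFT
    @right ||= DEFAULT_MIN_RIGHT
  end
end

a = OptionsDemo.new(:language => 'en_us')
b = OptionsDemo.new { |h| h.language = 'en_us' }
c = OptionsDemo.new(:language => 'en_us', :left => 3) { |h| h.right = 4 }

a.language == b.language #=> true
[c.left, c.right]        #=> [3, 4]
```

Applying the defaults last, with `||=`, is what lets both the hash form and the block form leave an option unset and still end up with a usable value.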