class Text::Hyphen

An object that knows how to perform hyphenation based on the TeX hyphenation algorithm with pattern files. Each object is constructed with a specific language’s hyphenation patterns.
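The core of the TeX pattern approach (Liang's method) fits in a few lines. The standalone sketch below is only an illustration and is not this class's implementation. The helper names `parse_pattern` and `liang_points` are hypothetical, and the nine patterns are a small hand-picked set that happens to be enough for the word "hyphenation". They are not a real language file. Digits in a pattern are weights for the gaps between letters. The highest weight seen at each gap wins, and an odd final weight marks a permitted break.

```ruby
# Minimal sketch of Liang-style (TeX) pattern hyphenation.
# Hypothetical helpers; toy pattern set, not a real TeX pattern file.

# Split a pattern such as "hy3ph" into its letters ("hyph") and the
# weight of each inter-letter gap ([0, 0, 3, 0, 0]).
def parse_pattern(pat)
  letters = pat.delete("0-9")
  weights = [0] * (letters.size + 1)
  gap = 0
  pat.each_char do |ch|
    if ch.match?(/\d/)
      weights[gap] = ch.to_i   # digit applies to the gap before the next letter
    else
      gap += 1
    end
  end
  [letters, weights]
end

# Returns break positions k (a break before word[k]) that have an odd weight,
# keeping at least `left` letters before and `right` letters after each break.
def liang_points(word, patterns, left: 2, right: 2)
  padded  = ".#{word.downcase}."        # '.' marks the word boundaries, as in TeX
  weights = [0] * (padded.size + 1)
  table   = patterns.map { |p| parse_pattern(p) }.to_h
  (0...padded.size).each do |start|
    (1..padded.size - start).each do |len|
      w = table[padded[start, len]]
      next unless w
      w.each_with_index do |v, i|
        weights[start + i] = v if v > weights[start + i]   # keep the maximum
      end
    end
  end
  # The leading '.' shifts indices by one: the gap before word[k] is weights[k + 1].
  (left..word.size - right).select { |k| weights[k + 1].odd? }
end

PATTERNS = %w[hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n]
p liang_points("hyphenation", PATTERNS)   # => [2, 6], i.e. "hy-phen-ation"
```

Taking the maximum at each gap is what lets an even digit from a longer pattern (here `hena4`) override an odd digit from a shorter one (`1tio`). The exact boundary rule used for `left` and `right` in this sketch may differ from the class's own handling.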

Public Instance Methods

clear_cache!()

Clears the per-instance hyphenation and visualization caches.

# File lib/text/hyphen.rb, line 173
def clear_cache!
  @cache.clear
  @vcache.clear
end
hyphenate(word)

Returns an array of character positions where a word can be hyphenated.

hyp.hyphenate('representation') #=> [3, 5, 8, 10]

Because hyphenation can be expensive, the result for each word is stored in a per-instance cache, and hyphenating the same word again returns the cached array. Words are downcased first, so the cache lookup is case-insensitive.
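The per-instance caching described above follows a common memoization pattern. The sketch below uses a hypothetical `CachedHyphenator` class, and its "expensive" computation is a placeholder, not real pattern matching. It shows a case-insensitive cache hit and a `clear_cache!` reset. Freezing the cached array is an extra design choice made here, not something `Text::Hyphen` is documented to do. It stops a caller from accidentally mutating a shared cached result.

```ruby
# Minimal sketch of a per-instance result cache (hypothetical class).
class CachedHyphenator
  attr_reader :computations

  def initialize
    @cache = {}
    @computations = 0   # counts cache misses, for demonstration
  end

  def hyphenate(word)
    word = word.downcase                     # normalise so lookups ignore case
    return @cache[word] if @cache.key?(word) # cache hit: same object every time
    @computations += 1
    # Placeholder computation: pretend every third gap is a break point.
    points = (1...word.size).select { |i| (i % 3).zero? }
    @cache[word] = points.freeze             # frozen so callers cannot corrupt it
  end

  def clear_cache!
    @cache.clear
  end
end

h = CachedHyphenator.new
h.hyphenate("Representation")
h.hyphenate("representation")   # served from the cache
p h.computations                # => 1
```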

# File lib/text/hyphen.rb, line 106
def hyphenate(word)
  word = word.downcase
  $stderr.puts "Hyphenating #{word}" if DEBUG
  return @cache[word] if @cache.has_key?(word)
  res = @language.exceptions[word]
  return @cache[word] = make_result_list(res) if res

  letters = word.scan(@language.scan_re)
  $stderr.puts letters.inspect if DEBUG
  word_size = letters.size

  result = [0] * (word_size + 1)
  right_stop = word_size - @right

  updater = Proc.new do |hash, str, pos|
    if hash.has_key?(str)
      $stderr.print "#{pos}: #{str}: #{hash[str]}" if DEBUG
      hash[str].scan(@language.scan_re).each_with_index do |cc, ii|
        cc = cc.to_i
        result[ii + pos] = cc if cc > result[ii + pos]
      end
      $stderr.print ": #{result.inspect}\n" if DEBUG
    end
  end

  # Walk the word
  (0..right_stop).each do |pos|
    rest_length = word_size - pos
    (1..rest_length).each do |length|
      substr = letters[pos, length].join('')
      updater[@language.hyphen, substr, pos]
      updater[@language.start, substr, pos] if pos.zero?
      updater[@language.stop, substr, pos] if (length == rest_length)
    end
  end

  updater[@language.both, word, 0] if @language.both[word]

  (0..@left).each { |i| result[i] = 0 }
  ((-1 - @right)..(-1)).each { |i| result[i] = 0 }
  @cache[word] = make_result_list(result)
end
hyphenate_to(word, size, hyphen = '-')

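Hyphenates a word so that the first part, including the hyphen, is at most size characters. Returns a two-element array holding the first part (with the hyphen appended) and the remainder of the word. If no hyphenation point fits, it returns [nil, word].

NOTE: if hyphen is set to a longer string, it is still counted as one character (since it represents a hyphen).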
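The splitting rule can be tried without the gem. The sketch below uses a hypothetical helper, `split_at_most`, and takes a precomputed list of break points, here the `hyphenate('representation')` example values. It picks the largest point below `size`, which leaves room for the one-character hyphen. It filters with `select`, so the caller's array is never modified.

```ruby
# Sketch of the hyphenate_to splitting rule (hypothetical helper).
# `points` is assumed to be the sorted break positions for `word`.
def split_at_most(word, points, size, hyphen = '-')
  point = points.select { |pt| pt < size }.max   # non-destructive filter
  return [nil, word] if point.nil?               # nothing fits
  [word[0...point] + hyphen, word[point..-1]]
end

points = [3, 5, 8, 10]   # from the hyphenate('representation') example
p split_at_most('representation', points, 9)   # => ["represen-", "tation"]
p split_at_most('representation', points, 3)   # => [nil, "representation"]
```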

# File lib/text/hyphen.rb, line 184
def hyphenate_to(word, size, hyphen = '-')
  # Filter non-destructively: hyphenate may return the cached array itself.
  point = hyphenate(word).select { |e| e < size }.max
  if point.nil?
    [nil, word]
  else
    [word[0 ... point] + hyphen, word[point .. -1]]
  end
end
stats()

Returns a string describing the structure of the patterns for the language of this hyphenation object.

# File lib/text/hyphen.rb, line 195
def stats
  _b = @language.both.size
  _s = @language.start.size
  _e = @language.stop.size
  _h = @language.hyphen.size
  _x = @language.exceptions.size
  _T = _b + _s + _e + _h + _x

  s = <<-EOS

The language '%s' contains %d total hyphenation patterns.
    % 6d patterns are word start patterns.
    % 6d patterns are word stop patterns.
    % 6d patterns are word start/stop patterns.
    % 6d patterns are normal patterns.
    % 6d patterns are exceptions.

EOS
  s % [ @iso_language, _T, _s, _e, _b, _h, _x ]
end
visualise(word, hyphen = '-')

Returns a visualization of the hyphenation points.

hyp.visualize('representation') #=> rep-re-sen-ta-tion

Any string can be used in place of the default hyphen:

hyp.visualize('example', '&shy;') #=> exam&shy;ple

Because hyphenation can be expensive, visualised results are kept in a per-instance cache, and repeated calls return the cached string.
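The insertion step can be sketched on its own with a hypothetical `visualise_points` helper that takes precomputed break points. Each earlier insertion shifts later positions right by the length of the hyphen string. That is why the n-th insertion goes at `pos + n * hyphen.size`.

```ruby
# Sketch of inserting a hyphen string at each break point (hypothetical helper).
def visualise_points(word, points, hyphen = '-')
  out = word.dup
  points.each_with_index do |pos, n|
    # n hyphen strings are already in `out`, shifting this position right.
    out[pos + n * hyphen.size, 0] = hyphen
  end
  out
end

puts visualise_points('representation', [3, 5, 8, 10])   # => rep-re-sen-ta-tion
puts visualise_points('example', [4], '&shy;')            # => exam&shy;ple
```

If results like these are cached, the cache key should include both the word and the hyphen string. A key on the word alone would return a `-`-joined string for a later `&shy;` request.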

# File lib/text/hyphen.rb, line 159
def visualise(word, hyphen = '-')
  key = [word, hyphen]   # cache per hyphen string, not just per word
  return @vcache[key] if @vcache.has_key?(key)
  w = word.dup
  s = hyphen.size
  hyphenate(w).each_with_index do |pos, n|
    # Insert the hyphen string at the reported position plus the offset
    # introduced by the n hyphen strings already inserted.
    w[pos.to_i + (n * s), 0] = hyphen unless pos.zero?
  end
  @vcache[key] = w
end
Also aliased as: visualize
visualize(word, hyphen = '-')
Alias for: visualise

Public Class Methods

new(options = {}) { |self| ... }

Creates a hyphenation object with the options requested. The options available are:

language

The language to perform hyphenation with. See language and iso_language.

left

The minimum number of characters to the left of a hyphenation point. See left.

right

The minimum number of characters to the right of a hyphenation point. See right.

The options can be passed either as a hash or through setter methods in an initialization block. The following initializations are all equivalent:

hyp = Text::Hyphen.new(:language => 'en_us')
hyp = Text::Hyphen.new(language: 'en_us') # Ruby 1.9+ hash syntax
hyp = Text::Hyphen.new { |h| h.language = 'en_us' }
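The "options hash or configuration block" constructor works because defaults are applied only after the block runs. Anything set in the hash or the block therefore takes precedence. The sketch below shows this with a hypothetical `HyphenOptions` class. The default values of 2 are assumptions for illustration, and the real `DEFAULT_MIN_LEFT` and `DEFAULT_MIN_RIGHT` are not shown on this page.

```ruby
# Sketch of the hash-or-block initialization pattern (hypothetical class).
class HyphenOptions
  DEFAULT_MIN_LEFT  = 2   # assumed value, for illustration only
  DEFAULT_MIN_RIGHT = 2   # assumed value, for illustration only

  attr_accessor :language, :left, :right

  def initialize(options = {})
    @language = options[:language]
    @left     = options[:left]
    @right    = options[:right]
    yield self if block_given?       # block may override hash values
    @left  ||= DEFAULT_MIN_LEFT      # defaults fill only what is still unset
    @right ||= DEFAULT_MIN_RIGHT
  end
end

a = HyphenOptions.new(language: 'en_us')
b = HyphenOptions.new { |h| h.language = 'en_us' }
p a.language == b.language   # => true
p a.left                     # => 2
```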
# File lib/text/hyphen.rb, line 75
def initialize(options = {}) # :yields self:
  @iso_language = options[:language]
  @left         = options[:left]
  @right        = options[:right]
  @language     = nil

  @cache        = {}
  @vcache       = {}

  @hyphen       = {}
  @begin_hyphen = {}
  @end_hyphen   = {}
  @both_hyphen  = {}
  @exception    = {}

  @first_load = true
  yield self if block_given?
  @first_load = false

  load_language

  @left  ||= DEFAULT_MIN_LEFT
  @right ||= DEFAULT_MIN_RIGHT
end