Changeset d2ba33c for util.c


Timestamp: Aug 16, 2017, 12:53:41 PM (7 years ago)
Author: Jason Gross <jasongross9@gmail.com>
Branches: master
Children: 5dee79a
Parents: 47225c9
git-author: Jason Gross <jgross@mit.edu> (01/01/14 20:59:51)
git-committer: Jason Gross <jasongross9@gmail.com> (08/16/17 12:53:41)
Message:
Use g_utf8_casefold and g_utf8_normalize

We define a convenience function compat_casefold in util.c for reuse in
filters.
File: 1 edited

  • util.c

    --- util.c (rcba6b9c)
    +++ util.c (rd2ba33c)
    @@ -643,4 +643,35 @@
     }
     
    +CALLER_OWN char *owl_util_compat_casefold(const char *str)
    +{
    +  /*
    +   * Quoting Anders Kaseorg at https://github.com/barnowl/barnowl/pull/54#issuecomment-31452543:
    +   *
    +   * The Unicode specification calls this compatibility caseless matching, and
    +   * the correct transformation actually has five calls:
    +   * NFKC(toCasefold(NFKD(toCasefold(NFD(string))))) Zephyr’s current
    +   * implementation incorrectly omits the innermost NFD, but that difference
    +   * only matters for characters including U+0345 ◌ͅ COMBINING GREEK
    +   * YPOGEGRAMMENI. I think we should just write the correct version and get
    +   * Zephyr fixed.
    +   *
    +   * Neither of these operations should be called toNFKC_Casefold, because that
    +   * has slightly different behavior regarding Default_Ignorable_Code_Point. I
    +   * propose compat_casefold. And I guess if Jabber wants it too, we should
    +   * move it to util.c.
    +   */
    +  char *tmp0 = g_utf8_normalize(str, -1, G_NORMALIZE_NFD);
    +  char *tmp1 = g_utf8_casefold(tmp0, -1);
    +  char *tmp2 = g_utf8_normalize(tmp1, -1, G_NORMALIZE_NFKD);
    +  char *tmp3 = g_utf8_casefold(tmp2, -1);
    +  char *out = g_utf8_normalize(tmp3, -1, G_NORMALIZE_NFKC);
    +  g_free(tmp0);
    +  g_free(tmp1);
    +  g_free(tmp2);
    +  g_free(tmp3);
    +
    +  return out;
    +}
    +
     /* This is based on _extract() and _isCJ() from perl's Text::WrapI18N */
     int owl_util_can_break_after(gunichar c)