Changeset cc27237 for util.c


Timestamp:
Jan 7, 2014, 6:02:25 PM
Author:
Jason Gross <jgross@mit.edu>
Children:
611236e
Parents:
4b9c3b9
git-author:
Jason Gross <jgross@mit.edu> (01/01/14 20:59:51)
git-committer:
Jason Gross <jgross@mit.edu> (01/07/14 18:02:25)
Message:
Use g_utf8_casefold and g_utf8_normalize

We define a convenience function compat_casefold in util.c for reuse in
filters.
File:
1 edited

  • util.c

    r7b89e8c rcc27237
    @@ -640,4 +640,35 @@
     }

    +CALLER_OWN char *owl_util_compat_casefold(const char *str)
    +{
    +  /*
    +   * Quoting Anders Kaseorg at https://github.com/barnowl/barnowl/pull/54#issuecomment-31452543:
    +   *
    +   * The Unicode specification calls this compatibility caseless matching, and
    +   * the correct transformation actually has five calls:
    +   * NFKC(toCasefold(NFKD(toCasefold(NFD(string))))) Zephyr’s current
    +   * implementation incorrectly omits the innermost NFD, but that difference
    +   * only matters for characters including U+0345 ◌ͅ COMBINING GREEK
    +   * YPOGEGRAMMENI. I think we should just write the correct version and get
    +   * Zephyr fixed.
    +   *
    +   * Neither of these operations should be called toNFKC_Casefold, because that
    +   * has slightly different behavior regarding Default_Ignorable_Code_Point. I
    +   * propose compat_casefold. And I guess if Jabber wants it too, we should
    +   * move it to util.c.
    +   */
    +  char *tmp0 = g_utf8_normalize(str, -1, G_NORMALIZE_NFD);
    +  char *tmp1 = g_utf8_casefold(tmp0, -1);
    +  char *tmp2 = g_utf8_normalize(tmp1, -1, G_NORMALIZE_NFKD);
    +  char *tmp3 = g_utf8_casefold(tmp2, -1);
    +  char *out = g_utf8_normalize(tmp3, -1, G_NORMALIZE_NFKC);
    +  g_free(tmp0);
    +  g_free(tmp1);
    +  g_free(tmp2);
    +  g_free(tmp3);
    +
    +  return out;
    +}
    +
     /* This is based on _extract() and _isCJ() from perl's Text::WrapI18N */
     int owl_util_can_break_after(gunichar c)