teachqert.blogg.se - Chinese word counter

#Chinese word counter code#

Specifically, the ICU regex engine contains the UREGEX_UWORD modifier option, which can be turned on dynamically via the normal (?w:...) syntax. This modifier performs the following action: it controls the behavior of \b in a pattern. If set, word boundaries are found according to the definitions of word found in Unicode UAX 29, Text Boundaries. By default, word boundaries are identified by means of a simple classification of characters as either "word" or "non-word", which approximates traditional regular expression behavior.
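The effect of the simple "word"/"non-word" classification can be seen outside ICU as well. As a rough illustration only, the sketch below uses Python's built-in re module (not ICU, and not Perl), where \b follows the same kind of simple classification: every Han character counts as a word character, so an unspaced run of Chinese text comes back as a single "word". The helper name simple_words and the sample string are made up for this example.

```python
import re


def simple_words(text):
    """Return the runs of word characters found between plain word boundaries."""
    # Han characters are all "word" characters here, so no boundary
    # is ever found inside an unspaced run of Chinese text.
    return re.findall(r"\b\w+\b", text)


if __name__ == "__main__":
    # The first five characters come back as one token, not as separate words.
    print(simple_words("我喜欢学习 中文, ok"))
```

This is why a boundary definition based on UAX 29, or a dedicated segmenter, is needed before individual Chinese words can be counted.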

I may be able to offer some insight, but it's hard to tell if my answer will be "helpful". First, I only speak and read English, so I obviously do not speak or read Chinese. I do happen to be the author of RegexKitLite, which is an Objective-C wrapper around the ICU regex engine. This is obviously not Perl. :) Despite this, the ICU regex engine happens to have a feature that sounds remarkably like what you're trying to do.

#!/usr/bin/perl -w
foreach (split)
\t times.\n"

The strangest part is that my console displayed all the individual Chinese words correctly without any problem.

The error message is: Use of uninitialized value $valid in concatenation (.) or string at word_ line 21, line 21. It seems to me that the problem is the file format. The "total thing" is 125, which is the number of strings (125 lines).


I tried the following Perl code to count the Chinese words in a file. It seems to work, but it does not give the right result.
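The original script is not fully preserved here, so the following is only a sketch of the general approach, written in Python rather than Perl: decode the input explicitly as UTF-8, split each line on whitespace, and tally the tokens. It assumes the file is UTF-8 encoded and that the words in it are already separated by whitespace; the function names count_tokens and count_tokens_in_file, and the path argument, are hypothetical.

```python
from collections import Counter


def count_tokens(lines):
    """Tally whitespace-separated tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace
        # and yields nothing for blank lines.
        counts.update(line.split())
    return counts


def count_tokens_in_file(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and count its tokens."""
    with open(path, encoding=encoding) as handle:
        return count_tokens(handle)


if __name__ == "__main__":
    sample = ["中文 英文 中文", "英文"]
    for token, n in count_tokens(sample).most_common():
        print(f"{token}\t{n} times.")
```

Stating the encoding explicitly matters because a file read with the wrong encoding can still look correct on some consoles while the program is actually counting byte sequences rather than characters.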