

Rough equality of strings in DDL and DML


A comparison operation called "rough equality", designated "", can be applied to strings. It returns a natural number equal to the deviation of the two strings, or "null" if the two strings are not rough-equal.

The following differences can exist between words: a superfluous letter, a missed letter, and an other letter (i.e. one letter is replaced by another). The differences "upper-case letter" versus "lower-case letter" and "abbreviation" versus "word in lower-case letters" are treated as a "missed letter" difference (that is, as the presence or absence of one of two unprintable prefix symbols, one creating a capital letter and the other an abbreviation). Each difference costs four points (because irrelevance takes values in the range from zero to three). Of all possible sets of differences, the one with the minimal sum of points is chosen; this minimal sum is called the incongruity. The difference in position of two words relative to the base line (following the ideas of Unicode2 for coding Unicode) is called the irrelevance. If positions are represented as sequences of control signs, irrelevance is calculated as follows: the leading signs that are identical in both sequences are discarded, and the length of the longer remaining tail is the irrelevance. The sum of incongruity and irrelevance is called the deviation of two words. Two words are not rough-equal if more than two "other letter" differences are detected.
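The word rules above can be sketched in Python. This is only an illustration under stated assumptions, not the author's implementation: the names `expand_case`, `incongruity`, `irrelevance` and `word_deviation` are hypothetical, the two unprintable prefix symbols are modelled by the placeholder characters `\x01` and `\x02`, positions are passed as plain strings of control signs, and when several minimal sets of differences exist the one with the fewest "other letter" differences is assumed to be chosen.

```python
from typing import Optional, Sequence

LETTER_POINTS = 4      # from the text: every letter difference costs four points
MAX_OTHER_LETTER = 2   # from the text: more than two "other letter" => not rough-equal
CAP, ABBR = "\x01", "\x02"  # hypothetical stand-ins for the two unprintable prefix symbols


def expand_case(word: str) -> str:
    """Rewrite a word as lower-case letters plus an optional prefix symbol (assumed model).

    Capitals in the middle of a mixed-case word are left untouched.
    """
    if len(word) > 1 and word.isupper():
        return ABBR + word.lower()
    if word[:1].isupper():
        return CAP + word.lower()
    return word


def incongruity(a: str, b: str) -> Optional[int]:
    """Minimal sum of points over letter differences, or None if not rough-equal."""
    a, b = expand_case(a), expand_case(b)
    # best[i][j] = (points, other-letter count) for aligning a[:i] with b[:j];
    # tuples compare by points first, then by the other-letter count.
    best = [[(0, 0)] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        best[i][0] = (i * LETTER_POINTS, 0)
    for j in range(1, len(b) + 1):
        best[0][j] = (j * LETTER_POINTS, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pts, subs = best[i - 1][j - 1]
            if a[i - 1] != b[j - 1]:          # "other letter"
                pts, subs = pts + LETTER_POINTS, subs + 1
            up, left = best[i - 1][j], best[i][j - 1]
            best[i][j] = min(
                (pts, subs),
                (up[0] + LETTER_POINTS, up[1]),      # letter present only in a
                (left[0] + LETTER_POINTS, left[1]),  # letter present only in b
            )
    points, subs = best[-1][-1]
    return None if subs > MAX_OTHER_LETTER else points


def irrelevance(pos_a: Sequence[str], pos_b: Sequence[str]) -> int:
    """Drop the identical leading control signs; return the longer remaining length."""
    common = 0
    for x, y in zip(pos_a, pos_b):
        if x != y:
            break
        common += 1
    return max(len(pos_a) - common, len(pos_b) - common)


def word_deviation(a: str, b: str,
                   pos_a: Sequence[str] = "", pos_b: Sequence[str] = "") -> Optional[int]:
    """Incongruity plus irrelevance, or None when the words are not rough-equal."""
    inc = incongruity(a, b)
    return None if inc is None else inc + irrelevance(pos_a, pos_b)
```

For instance, `word_deviation("equation", "equations")` counts one missed letter, while `word_deviation("cat", "dog")` needs three "other letter" differences and therefore yields `None`.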

The following differences can exist between sentences: a superfluous word, a missed word, a permutation of two words (a permutation is impossible if the words are separated by a punctuation mark), an other word (i.e. one word is replaced by another), and a convolution (of a phrase into an abbreviation formed from the initial letters of each word of the phrase). Each difference costs sixteen points (for a convolution, sixteen points per letter of the abbreviation). An attempt is made to decompose an "other word" difference into a set of differences between the pair of words, in order to bring the number of points below sixteen (if the two words are not rough-equal, an "other word" difference of sixteen points is counted). A blank between words can be removed or replaced by a hyphen; each of these transformations costs one point. Of all possible sets of differences between the sentences, the one with the minimal sum of points is chosen. This sum is called the deviation of two sentences. Two sentences are not rough-equal if more than two "missed word" differences are detected.
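A reduced version of the sentence rules can be sketched as a word-level alignment with an adjacent-word permutation step. The sketch below is deliberately incomplete and rests on assumptions: it splits on whitespace only, so punctuation, convolution, the blank and hyphen rule, and the limit of two "missed word" differences are all left out, and the "other word" cost is approximated by a plain letter-level edit distance capped at sixteen points. The function names are hypothetical.

```python
WORD_POINTS = 16    # from the text: every word difference costs sixteen points
LETTER_POINTS = 4   # from the text: every letter difference costs four points


def letter_points(a: str, b: str) -> int:
    """Plain Levenshtein distance times four (simplified stand-in for word deviation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] * LETTER_POINTS


def other_word_points(a: str, b: str) -> int:
    """Try to explain "other word" by letter differences; never charge more than sixteen."""
    return min(WORD_POINTS, letter_points(a, b))


def sentence_deviation(s1: str, s2: str) -> int:
    """Minimal sum of points over word differences (reduced rule set, see lead-in)."""
    a, b = s1.split(), s2.split()
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * WORD_POINTS
    for j in range(1, m + 1):
        d[0][j] = j * WORD_POINTS
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + WORD_POINTS,                  # word present only in s1
                d[i][j - 1] + WORD_POINTS,                  # word present only in s2
                d[i - 1][j - 1] + other_word_points(a[i - 1], b[j - 1]),
            )
            # permutation of two adjacent words
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + WORD_POINTS)
    return d[n][m]
```

With this sketch, swapping the two words of "algebraic equations" costs one permutation, and dropping the final "s" of "equations" costs one letter difference instead of a full "other word".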

If at least one of the two strings contains more than one word, the deviation of the two strings is calculated as the deviation of two sentences, except that the period character is treated as a punctuation mark (i.e. it forbids permutations of words). If both strings consist of one word, the deviation of the two strings is calculated as the deviation of two words. A period at the end of a string is ignored in every case. If one of the compared strings is "null", the deviation of the two strings equals sixteen.
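The choice between the word rules and the sentence rules can be written as a small dispatcher. This is a sketch under assumptions: `string_deviation` and `strip_final_period` are hypothetical names, the word and sentence scorers are passed in as callables rather than defined here, "null" is modelled as Python `None`, and the case where both strings are "null" (which the text does not specify) is treated the same as one "null".

```python
from typing import Callable, Optional

NULL_POINTS = 16  # from the text: comparing with "null" gives a deviation of sixteen

Scorer = Callable[[str, str], Optional[int]]


def strip_final_period(s: str) -> str:
    """A period at the end of a string is never taken into account."""
    s = s.strip()
    return s[:-1].rstrip() if s.endswith(".") else s


def string_deviation(s1: Optional[str], s2: Optional[str],
                     word_dev: Scorer, sentence_dev: Scorer) -> Optional[int]:
    """Route the comparison to the word rules or to the sentence rules."""
    if s1 is None or s2 is None:
        return NULL_POINTS                 # assumption: also used when both are null
    s1, s2 = strip_final_period(s1), strip_final_period(s2)
    if len(s1.split()) <= 1 and len(s2.split()) <= 1:
        return word_dev(s1, s2)            # both strings consist of one word
    return sentence_dev(s1, s2)            # at least one string has several words
```

Passing the scorers as parameters keeps the dispatcher independent of how word and sentence deviations are actually computed.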

Records returned by a query are sorted and retrieved in order of increasing deviation (i.e. the record with the least deviation comes first).

select  ...  where a"algebraic equations"
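Outside a database, the effect of such a query can be imitated in Python: records whose field is not rough-equal to the pattern are dropped, and the rest are ordered by increasing deviation. This is a minimal sketch, not the DML semantics themselves; `rough_select` and its parameters are hypothetical names, and the deviation function is supplied by the caller.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

R = TypeVar("R")


def rough_select(records: Iterable[R],
                 field: Callable[[R], Optional[str]],
                 pattern: str,
                 deviation: Callable[[Optional[str], str], Optional[int]]) -> List[R]:
    """Keep records whose field is rough-equal to pattern, least deviation first."""
    scored: List[Tuple[int, R]] = []
    for record in records:
        d = deviation(field(record), pattern)
        if d is not None:                    # "null" means not rough-equal: skip the record
            scored.append((d, record))
    scored.sort(key=lambda pair: pair[0])    # stable: ties keep their original order
    return [record for _, record in scored]
```

A call such as `rough_select(rows, lambda r: r["a"], "algebraic equations", my_deviation)` would then play the role of the `where` clause above, with `my_deviation` standing for whatever deviation function is in use.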


Dmitry Turin


