RosettaCodeData/Task/Natural-sorting/00DESCRIPTION

{{Sorting Algorithm}}
Natural sorting is the sorting of text that does more than rely on the
order of individual character codes, in order to make finding
individual strings easier for a ''human'' reader.
There is no "one true way" to do this, but for the purpose of this task 'natural' orderings might include:
:1. Ignore leading, trailing and multiple adjacent spaces.
:2. Make all whitespace characters equivalent.
:3. Sort without regard to case.
:4. Sort numeric portions of strings in numeric order. That is, split the string into fields on numeric boundaries, then sort on each field, with the leftmost fields being the most significant, and numeric fields of integers treated as numbers.
:: foo9.txt before foo10.txt
:: As well as ... x9y99 before x9y100, before x10y0
:: ... (for any number of groups of integers in a string).
:5. Title sorts: without regard to a leading, very common, word such
:: as 'The' in "The thirty-nine steps".
:6. Sort letters without regard to accents.
:7. Sort ligatures as separate letters.
:8. Replacements:
:: Sort German scharfes S (ß) as ss
:: Sort ſ, LATIN SMALL LETTER LONG S as s
:: Sort ʒ, LATIN SMALL LETTER EZH as s
:: ...
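The first four features can be combined into a single sort key. The following Python sketch is one possible approach, not a reference solution: the function name <code>natural_key</code> is hypothetical, and it assumes that "numeric portions" means runs of the ASCII digits 0-9 with no sign or decimal point.

```python
import re


def natural_key(text):
    """Hypothetical sort key covering features 1 to 4 of the list above."""
    # Features 1 and 2: str.split() with no argument splits on any run of
    # whitespace (space, tab, newline, form feed, ...) and discards leading
    # and trailing whitespace, so re-joining with one space normalises all three.
    normalised = " ".join(text.split())
    # Feature 3: casefold() gives a caseless form of the string.
    normalised = normalised.casefold()
    # Feature 4: splitting on a captured digit run yields alternating fields:
    # even indices hold text (possibly empty), odd indices hold digit runs.
    fields = re.split(r"([0-9]+)", normalised)
    # Convert only the digit runs, so text is always compared with text and
    # numbers with numbers, field by field from the left.
    return [int(field) if index % 2 else field
            for index, field in enumerate(fields)]


if __name__ == "__main__":
    names = ["x10y0", "x9y100", "x9y99", "foo10.txt", "foo9.txt"]
    print(sorted(names, key=natural_key))
```

Because Python compares lists element by element from the left, the leftmost field decides the order first, which matches the x9y99, x9y100, x10y0 example above.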
;Task Description
* '''Implement the first four''' of the eight given features in a natural sorting routine/function/method...
* Test each feature implemented separately with an ordered list of test strings from the 'Sample inputs' section below, and make sure your naturally sorted output is in the same order as other language outputs such as Python.
* Print and display your output.
* '''For extra credit''' implement more than the first four.
Note: It is not necessary to have individual control of which features are active in the natural sorting routine at any time.
;Sample input:
<pre>
# Ignoring leading spaces
Text strings:
['ignore leading spaces: 2-2', ' ignore leading spaces: 2-1', '  ignore leading spaces: 2+0', '   ignore leading spaces: 2+1']
# Ignoring multiple adjacent spaces (m.a.s)
Text strings:
['ignore m.a.s spaces: 2-2', 'ignore m.a.s  spaces: 2-1', 'ignore m.a.s   spaces: 2+0', 'ignore m.a.s    spaces: 2+1']
# Equivalent whitespace characters
Text strings:
['Equiv. spaces: 3-3', 'Equiv.\rspaces: 3-2', 'Equiv.\x0cspaces: 3-1', 'Equiv.\x0bspaces: 3+0', 'Equiv.\nspaces: 3+1', 'Equiv.\tspaces: 3+2']
# Case Independent sort
Text strings:
['cASE INDEPENENT: 3-2', 'caSE INDEPENENT: 3-1', 'casE INDEPENENT: 3+0', 'case INDEPENENT: 3+1']
# Numeric fields as numerics
Text strings:
['foo100bar99baz0.txt', 'foo100bar10baz0.txt', 'foo1000bar99baz10.txt', 'foo1000bar99baz9.txt']
# Title sorts
Text strings:
['The Wind in the Willows', 'The 40th step more', 'The 39 steps', 'Wanda']
# Equivalent accented characters (and case)
Text strings:
[u'Equiv. \xfd accents: 2-2', u'Equiv. \xdd accents: 2-1', u'Equiv. y accents: 2+0', u'Equiv. Y accents: 2+1']
# Separated ligatures
Text strings:
[u'\u0132 ligatured ij', 'no ligature']
# Character replacements
Text strings:
[u'Start with an \u0292: 2-2', u'Start with an \u017f: 2-1', u'Start with an \xdf: 2+0', u'Start with an s: 2+1']</pre>
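For the extra-credit features (5 to 8), one option is to normalise each string before comparing. The Python sketch below is illustrative only. The names <code>extended_key</code>, <code>LEADING_WORDS</code> and <code>REPLACEMENTS</code> are hypothetical, the list of "very common" leading words is an assumption limited to 'the', and the entries for æ and œ are additions not named in the task. It relies on Unicode compatibility decomposition (NFKD) to split ligatures such as Ĳ and to separate accents from their base letters.

```python
import unicodedata

# Assumption: only 'the' is treated as a very common leading word.
LEADING_WORDS = ("the",)

# Feature 8: explicit replacements for characters that NFKD does not decompose.
# The ae/oe entries are an assumed extension, not part of the task text.
REPLACEMENTS = str.maketrans({
    "\u00df": "ss",  # ß, German scharfes S
    "\u017f": "s",   # ſ, LATIN SMALL LETTER LONG S
    "\u0292": "s",   # ʒ, LATIN SMALL LETTER EZH
    "\u00e6": "ae",  # æ
    "\u0153": "oe",  # œ
})


def extended_key(text):
    """Hypothetical normaliser for features 5 to 8; returns a plain string."""
    # Lower-case first so the replacement table only needs lower-case keys.
    s = text.lower().translate(REPLACEMENTS)
    # Feature 7: NFKD expands compatibility ligatures (for example Ĳ to "ij").
    # Feature 6: NFKD also splits accented letters into base letter plus
    # combining mark; dropping the combining marks removes the accents.
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Feature 5: drop one leading common word, but never the whole title.
    words = s.split()
    if len(words) > 1 and words[0] in LEADING_WORDS:
        words = words[1:]
    return " ".join(words)


if __name__ == "__main__":
    titles = ["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]
    print(sorted(titles, key=extended_key))
```

This key does not handle numeric fields. In a full solution its result would still need to be split into text and integer fields as described in feature 4.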