The most important file available here contains a list of 750 000
Norwegian words. Each word is marked with a number indicating the
commonness of that word. Compound words are hyphenated at their
compound points. Some words are marked as belonging to a specific
class: mathematics, oil, conservative language, 'samnorsk', etc.
Words marked with a star are allowed in Nynorsk.
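Because compound points are marked with hyphens, one entry carries both the spelling and the compound structure. Below is a minimal sketch of pulling these apart. It assumes a hypothetical line layout of the hyphenated word, its commonness number and an optional `*`. The real file layout and the class marks are not reproduced here, and words that contain a genuine hyphen would need special treatment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    word: str                       # spelling with the compound hyphens removed
    commonness: int                 # commonness number from the list
    nynorsk: bool                   # True if the line carries the star mark
    compound_points: List[int] = field(default_factory=list)


def parse_entry(line: str) -> Entry:
    """Parse one line of the word list.

    ASSUMED (hypothetical) layout: "<hyphenated word> <commonness> [*]".
    Class marks and the real file format are not handled here.
    """
    parts = line.split()
    hyphenated, commonness = parts[0], int(parts[1])
    nynorsk = "*" in parts[2:]

    letters: List[str] = []
    points: List[int] = []
    for ch in hyphenated:
        if ch == "-":
            # A compound point sits after len(letters) letters of the joined word.
            points.append(len(letters))
        else:
            letters.append(ch)
    return Entry("".join(letters), commonness, nynorsk, points)
```

With this layout, `parse_entry("matematikk-lærer 3 *")` yields the spelling `matematikklærer` with a compound point after 10 letters.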
This file can be used for several things:
Making Ispell dictionaries of different sizes, choosing which words
to include in a sensible way.
Making Norwegian dictionaries for word processors that don't
have one, again with a sensible subset of words.
Making new and better hyphenation patterns for TeX.
Text-recognition (OCR) programs.
Encourage the e-TeX team to implement multi-level hyphenation in
Encourage people to use frequency information when they write
programs that suggest replacements for misspelled words.
Routines for the first three items on the above list are included in
the Makefiles. The last three were too hard to implement in Make.
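Choosing a sensible subset can be as simple as ranking by commonness. The sketch below keeps the `max_words` most common words. It assumes that entries are already parsed into (word, commonness) pairs and that a higher number means a more common word, which is an assumption and not something stated above. `pick_dictionary` is a hypothetical helper, not one of the Makefile rules.

```python
from typing import Iterable, List, Tuple


def pick_dictionary(entries: Iterable[Tuple[str, int]], max_words: int) -> List[str]:
    """Keep the max_words most common words.

    ASSUMPTION: a higher commonness number means a more common word.
    Ties are broken alphabetically so the result is reproducible.
    """
    ranked = sorted(entries, key=lambda e: (-e[1], e[0]))
    chosen = [word for word, _ in ranked[:max_words]]
    return sorted(chosen)  # alphabetical output for a tidy word list
```

Changing `max_words` is what gives dictionaries of different sizes from the same list.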
For the ispell-related stuff, you need the ispell program, which you can
get from the ispell home page. You can also find dictionaries for a lot of
languages there.
You also need the version of the look program in util-linux-2.9.
Older versions have a bug which shows up when searching dictionaries
with non-English characters. Ispell uses look to complete words
(ispell-complete-word). If you don't plan to have a Norwegian words
file for lookup, you don't need to worry about the look program.
If you want to use ispell from Emacs, I recommend upgrading to the
latest version of ispell.el.
This version supports Norwegian, and it has become cleaner to include
local dictionary definitions. It is almost identical to the version
included in Emacs-20.4. There is also an add-on to ispell.el,
flyspell.el, written by Manuel Serrano, which offers better
`on-the-fly' spell-checking. An old version is included in Emacs-20.3,
but you will want the new version, which has important speed improvements.
If you want to make your own hyphenation patterns for TeX (you
probably don't), you need a version of the patgen program with greater
capacities than standard versions; that is, you have to compile patgen
with a different patgen.ch. See the patterns/Makefile for more
information. Almost every TeX distribution contains the patgen
program. I recommend teTeX.
If you want both kinds of hyphenation in the same TeX format, you
probably need to recompile TeX due to capacity problems. Again, this
is easy with teTeX.
How to make ispell and Emacs work with these dictionaries.
words.norsk.sq This file contains the Norwegian words and
the indication of their commonness compressed with the sq program.
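As far as I know, sq (distributed with ispell) shrinks a sorted word list by front coding. Each word stores only how many leading characters it shares with the previous word, plus the remaining tail. The sketch below shows that idea in Python. It does not reproduce sq's actual byte format, so use sq/unsq on the real file.

```python
from typing import List, Tuple


def front_encode(words: List[str]) -> List[Tuple[int, str]]:
    """Encode each word as (shared prefix length with previous word, remaining tail)."""
    encoded = []
    prev = ""
    for w in words:
        n = 0
        limit = min(len(prev), len(w))
        while n < limit and prev[n] == w[n]:
            n += 1
        encoded.append((n, w[n:]))
        prev = w
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Invert front_encode by rebuilding each word from its predecessor."""
    words = []
    prev = ""
    for n, tail in encoded:
        w = prev[:n] + tail
        words.append(w)
        prev = w
    return words
```

Sorted input matters. Neighbouring words such as `bilde` and `bildet` share long prefixes, which is where the savings come from.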
norsk.aff.in A template for the affix file for the
Norwegian language. This file is made for a version of ispell with 64
maskbits that understands HTML. Most pre-made versions of ispell support
only 32 maskbits and don't understand HTML. Either apply the patch and
recompile ispell, or delete the HTML-related parts.
Ispell-3.1.20.no.patch A patch for ispell-3.1 that adds
the amsmath and breqn environments to the skip-list and fixes a bug in
buildhash. It also makes ispell HTML-aware and tries to fix `the
backslash bug'. In addition, it makes ispell suggest "- as a compound
word mark when it sees an unknown compound word, but only in TeX mode
and only if the dictionary is named norsk. This is an ugly hack that
works for me. It also implements the -r flag, which is like the -a flag,
but the suggestions are printed even if the word is found in the dictionary.
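In -a (pipe) mode, ispell answers each word with one line:

- `*` for a correct word.
- `+ ROOT` when the word is found via affix removal.
- `-` for a compound.
- `& WORD COUNT OFFSET: suggestions` or `? WORD 0 OFFSET: guesses` for misses with suggestions.
- `# WORD OFFSET` when there are none.

A blank line ends the reply for each input line, and the first output line is an `@(#)` version banner. Below is a sketch of a parser for those lines. The exact output of the patched -r flag is not described here, so the sketch does not treat it specially. `parse_ispell_line` and `Reply` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reply:
    kind: str                         # "ok", "root", "compound", "miss", "guess" or "none"
    word: Optional[str] = None        # root (for "+") or the checked word (for "&", "?", "#")
    offset: Optional[int] = None      # character offset of the word in the input line
    suggestions: List[str] = field(default_factory=list)


def parse_ispell_line(line: str) -> Optional[Reply]:
    """Parse one line of `ispell -a` output; returns None for banner and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("@"):
        return None  # blank line ends a reply; "@(#) ..." is the version banner
    tag = line[0]
    if tag == "*":
        return Reply("ok")
    if tag == "+":
        return Reply("root", word=line[1:].strip())
    if tag == "-":
        return Reply("compound")
    if tag in "&?":
        head, _, tail = line.partition(":")
        _, word, _count, offset = head.split()
        suggestions = [s.strip() for s in tail.split(",") if s.strip()]
        return Reply("miss" if tag == "&" else "guess", word, int(offset), suggestions)
    if tag == "#":
        _, word, offset = line.split()
        return Reply("none", word, int(offset))
    raise ValueError(f"unexpected ispell -a line: {line!r}")
```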
norsk.single.tex This is a set of hyphenation patterns
for TeX that works well on non-compound words. It is used when making
the new hyphenation patterns for TeX. This file is basically made
from nohyph3.tex, a hyphenation file I released in May 1998, but a lot of
errors have been removed by comparing its behaviour on the single words
with that of nohyph.tex (standard in teTeX), nohyph2.tex, and the
unreleased hyphenation patterns by Simen Gaure used at the Department of
Mathematics, University of Oslo. I have tried to follow the rules given
in nohyph.tex,
at least where I find it reasonable. Bear in mind that there is no
authoritative source for hyphenation in Norwegian. Please get in
touch if you want to help improve Norwegian hyphenation.
addition to Babel-3.6 for LaTeX that makes the character " active and
offers you many `different' hyphen signs. You can write o"ppussing in
LaTeX to get the correct hyphenation opp-pussing! This functionality will
appear in Babel-3.7 for Norwegian. Danish and Swedish have had it for
Searches a file or standard input for words that perhaps should be
written as one word, like `matematikk lærer'.
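The idea behind this search can be sketched as follows: join each pair of adjacent words and look the result up in the word list. This sketch also requires the join to fall on one of the word's marked compound points, which cuts down false alarms. It assumes the word list has already been loaded into a dict from hyphen-free spelling to compound-point positions. `split_compound_candidates` is a hypothetical helper, not the script shipped here.

```python
import re
from typing import Dict, List, Set, Tuple


def split_compound_candidates(text: str, compounds: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Return adjacent word pairs that look like a compound written as two words.

    `compounds` maps a hyphen-free spelling to its compound-point positions;
    ASSUMED to be built from the word list beforehand.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for first, second in zip(tokens, tokens[1:]):
        points = compounds.get(first + second)
        # Only report the pair if the split falls exactly on a marked compound point.
        if points is not None and len(first) in points:
            hits.append((first, second))
    return hits
```

For example, with `{"matematikklærer": {10}}` the text "Han er matematikk lærer." reports the pair (`matematikk`, `lærer`).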
Searches a file or standard input for words that the Norwegian
hyphenation patterns from this distribution might not hyphenate
properly. Incorrect hyphenation of a word that is not printed is
considered a bug in the patterns. There is only a finite number of them.
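To judge whether patterns hyphenate a word properly, one has to apply them the way TeX does, with Liang's algorithm:

1. Every pattern that matches a substring of the word (padded with `.` at both ends) contributes digit values between letters.
2. The maximum value wins at each position.
3. Odd values allow a break.

The sketch below does that and reports compound points from the word list that the patterns miss. The toy patterns in the test are made up for illustration. The `left_min` and `right_min` defaults of 2 are hypothetical stand-ins for TeX's \lefthyphenmin and \righthyphenmin, whose actual values depend on the format.

```python
import re
from typing import Dict, List


def parse_patterns(patterns: List[str]) -> Dict[str, List[int]]:
    """Turn TeX patterns such as 'l1d' into {'ld': [0, 1, 0]}.

    values[k] is the digit standing before letter k of the pattern
    (values[-1] is the digit after the last letter).
    """
    table: Dict[str, List[int]] = {}
    for pattern in patterns:
        letters = re.sub(r"[0-9]", "", pattern)
        values = [0] * (len(letters) + 1)
        k = 0
        for ch in pattern:
            if ch in "0123456789":
                values[k] = int(ch)
            else:
                k += 1
        table[letters] = values
    return table


def hyphenate(word: str, table: Dict[str, List[int]], left_min: int = 2, right_min: int = 2) -> List[int]:
    """Return break positions p, meaning a break between word[p-1] and word[p]."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)  # scores[i]: value just before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for k, v in enumerate(values):
                    scores[start + k] = max(scores[start + k], v)
    # word[p] is padded[p + 1], so the break before word[p] is scored at scores[p + 1].
    return [p for p in range(left_min, len(word) - right_min + 1) if scores[p + 1] % 2 == 1]


def missed_compound_points(hyphenated: str, table: Dict[str, List[int]]) -> List[int]:
    """Compound points marked in a word-list entry that the patterns do not produce."""
    letters: List[str] = []
    expected: List[int] = []
    for ch in hyphenated:
        if ch == "-":
            expected.append(len(letters))
        else:
            letters.append(ch)
    found = set(hyphenate("".join(letters), table))
    return [p for p in expected if p not in found]
```

Real pattern files are usually looked up through a trie. The nested loop here only keeps the sketch short.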
Makefile This file contains rules for making
dictionaries for ispell and lists of the most common words for dumb
word processors. There is also a Makefile in the patterns directory
for making hyphenation patterns.
nohyphbc.tex, nohyphb.tex These
are the hyphenation patterns for TeX. The file nohyphbc.tex hyphenates
only at compound points. The file nohyphb.tex also hyphenates each
component of a word, but avoids hyphenating 'near' compound points. I think
'bar-nepsykologen' looks really bad. Too bad TeX doesn't support
multi-level hyphenation yet.
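The rule behind nohyphb.tex can be illustrated on explicit break positions: keep every compound point, and drop ordinary syllable breaks that come too close to one. In the patterns this behaviour is built in. The post-processing function below and its `min_gap` threshold of 3 letters are only a hypothetical illustration of what 'near' could mean.

```python
from typing import Iterable, List


def filter_near_compounds(breaks: Iterable[int], compound_points: Iterable[int], min_gap: int = 3) -> List[int]:
    """Keep all compound points; drop other breaks closer than min_gap letters to one.

    min_gap is a HYPOTHETICAL threshold chosen for illustration.
    """
    compounds = set(compound_points)
    kept = set(compounds)
    for b in breaks:
        if all(abs(b - c) >= min_gap for c in compounds):
            kept.add(b)
    return sorted(kept)
```

Take the syllable breaks of barnepsykologen (bar-ne-psy-ko-lo-gen, at positions 3, 5, 8, 10 and 12) and the compound point after barne (5). `filter_near_compounds([3, 5, 8, 10, 12], [5])` returns `[5, 8, 10, 12]`, so the unfortunate break after bar is gone.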
The naming of the files follows the paradigm in Babel: if a replacement
for a file foo.bar is offered, it is named foob.bar, where the b
stands for big.
These new patterns easily outperform those available before, mostly
because of better hyphenation of compound words. For reference, I have
made lists of about 2000 compound-word errors made by previous
patterns: err.nokyph and
The size of the patterns is open to debate. The patterns are copied
into each format file, thus occupying some disk space. They also
limit the number of languages one can load hyphenation patterns for on
most TeX systems. But size considerations have become less important in
recent years, so I prefer to focus on getting things right, not small.
It is also possible to recompile teTeX so that there is more room
for hyphenation patterns, but then the patterns take up more memory.
There is surely a lot of unnecessary structure within the hyphenation
patterns, but it is very time-consuming to remove. The file
patterns/Makefile can be configured to make smaller sets of patterns,
taking only the most common words into consideration. Everyone is
invited to play.
COPYING The GNU general public license.
There have been a lot of changes since version 1.1a. The quality has
improved a lot, and the structure of the distribution is completely
new. Therefore I chose not to make the previous versions available
from this site.
Here is a rough summary of the changes:
New distribution format
Support for Nynorsk
Commonness indicator for each word from Bokmål
Words are hyphenated at their compound points
A lot of common words added, especially compound words
Makefile completely rewritten. It is possible to configure the
size of the dictionary for ispell without breaking the munching.
Makefile to make hyphenation patterns for TeX added
The pregenerated TeX patterns are included in the distribution.
Controlled compound word support added. This includes affix file
Some uncommon and misspelled words removed
Affix file updated for html. This will only work if you use the patch.
Remove/mark uncommon words that are close to common words. If you
type 're' you probably meant 'er', even if 're' is a valid word.
There are too many words with commonness 0. Split this group in
Some words in the basic category belong in special categories.
When making a small dictionary with all words from mathematics, many
such words are missing, since they are in the basic category. They
should be moved.
Make ispell sort the suggested replacements for misspelled words
by commonness of the suggested words. One (easy) way to do this is to
make an external file containing the most common words, and make
ispell look into that file each time it has more than one suggestion.
Or the file could be read into memory. (I don't think frequency
information is representable within the root/affix structure, since
one flag can represent multiple words.) This would slow ispell down a
little bit, but only when it makes suggestions. If you would like to
help with this, please get in touch.
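One way to find candidates for the item about uncommon words close to common ones is to list each rare word that is one edit away from a common word, the way 're' is from 'er'. An edit here means an insertion, a deletion, a substitution or a swap of two neighbouring letters. The thresholds `rare_max` and `common_min`, the helper names, and the assumption that higher numbers mean more common words are all hypothetical. Comparing every rare word with every common word is quadratic, so a real run over 750 000 words would need some indexing, for instance by word length.

```python
from typing import Dict, List, Tuple


def one_edit_apart(a: str, b: str) -> bool:
    """True if b can be made from a by one insertion, deletion, substitution or adjacent swap."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # one substitution
        # one swap of two neighbouring letters
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    if len(a) > len(b):
        a, b = b, a  # make a the shorter word
    # b is one letter longer: deleting one of its letters must give a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def suspicious_words(commonness: Dict[str, int], rare_max: int, common_min: int) -> List[Tuple[str, str]]:
    """Pair each rare word with the common words one edit away from it."""
    common = [w for w, c in commonness.items() if c >= common_min]
    rare = [w for w, c in commonness.items() if c <= rare_max]
    return sorted((r, w) for r in rare for w in common if one_edit_apart(r, w))
```

The resulting pairs are only candidates for removal or marking. A person still has to decide whether the rare word deserves its place.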
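The external-file variant of sorting suggestions by commonness can be sketched outside ispell: load a table of common words with their commonness and re-sort ispell's suggestions by it. The file layout (one `word number` pair per line) and the helper names are assumptions. Unknown words go last, and because Python's sort is stable, ties keep ispell's original order.

```python
from typing import Dict, Iterable, List


def load_commonness(lines: Iterable[str]) -> Dict[str, int]:
    """Read 'word number' lines (ASSUMED layout) into a lookup table."""
    table: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = int(parts[1])
    return table


def rank_suggestions(suggestions: List[str], commonness: Dict[str, int]) -> List[str]:
    """Most common suggestions first; words missing from the table go last."""
    # Unknown words get -1, so their key (1) sorts after every known word (key <= 0).
    return sorted(suggestions, key=lambda w: -commonness.get(w, -1))
```

For the 're' example above, `rank_suggestions(["re", "ser", "er"], table)` puts `er` first when `er` has the highest commonness in the table.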
Comments, suggestions and bug reports to firstname.lastname@example.org. If you have
or want to make a correct dictionary from some field of knowledge, I
would like to include it in the next release. See the README file for some
suggestions about how to get started. All you need is a large amount
of Norwegian text from the field in question and some time to organize