How To Set Up A Light-Weight On-line Thesaurus For Vim Pt.II

09:00 Sat, 25 Feb 2012

Vim has support for a built-in thesaurus. However, it consumes memory and its auto-complete selection has issues. In Part I, I showed how to set up an on-line thesaurus. Here is how to build syntax rules that will colour the output.


This is the second of two posts about a lightweight way to implement a thesaurus. In Part I, I described how to set up a script that provides access to an on-line thesaurus. In this part, I describe how to write a set of simple syntax rules to provide colour and highlighting for the output.

Here is a screenshot of the finished syntax rules (using dummy data):

highlighting screenshot

If you don't like these colours, you can change the rules to use whatever colours you prefer.

How Vim does it

Vim does syntax highlighting in two parts:

  • A set of rules, each of which defines a word (or group of words), and
  • A colour description for each rule.

Set of rules

The set of rules is in a file named for the filetype in either $VIMRUNTIME/syntax or your ~/.vim/syntax. Each rule defines a name for that rule and a list or a regular expression to match words that should be captured by that rule.

An example is

syntax keyword cStatement goto break return continue asm
syntax keyword cLabel case default
syntax match cTodo /\/\* *TODO/
syntax region cBlock start="{" end="}"

The first and second lines are the simplest case of keyword, which is a list of words. The third line is the next simplest case of a regex pattern match, in this case matching "TODO" within a C-style comment leading with "/*". The fourth line is like a broader match in that it specifies a region defined by a start pattern and an end pattern.

Notice each rule has a name that is a unique identifier: cStatement, cLabel, cTodo and cBlock.

In fact, it can be much more complicated than that. Rules can be contained within rules (for example, nested if-thens), rules can apply only if another rule has triggered (a '(' as part of a condition, but not as part of a comment), rules can apply to entire regions, and so on.

However, our purpose is simple: to provide slight colour hints to a plain text list. So we can ignore all that, to which we say, "Thank goodness." Check out sh.vim or even simple old c.vim to see why.

Colour description

The colour description is usually in the colorscheme you are using. For each rule name, it provides a colour scheme to use, along the lines of

hi Comment term=bold ctermfg=8 guifg=#7C7C7C

which translates as: for the rule name "Comment", on a simple terminal use bold; on a colour terminal set the foreground to colour number 8; in the GUI version of Vim set the foreground to the hex colour #7C7C7C.

You can think of the process as going like this: when Vim displays a file, each token or word is checked against the set of rules and assigned a rule name, and is then colourised according to the highlighting for that rule name.

You can see a highlighting description with :hi Comment, which shows the current highlighting for the name "Comment". Highlighting is global, so it does not depend on the filetype of the current buffer. To see all highlighting, use :hi by itself.
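
To find out which rule name Vim has assigned to the text under the cursor, something like the following can be typed on the command line. It is a small sketch using the built-in functions synID(), synIDtrans() and synIDattr(); the second command follows any link to the name that finally supplies the colours.

```vim
" name of the syntax rule under the cursor
echo synIDattr(synID(line('.'), col('.'), 1), 'name')
" name it finally resolves to, after following any links
echo synIDattr(synIDtrans(synID(line('.'), col('.'), 1)), 'name')
```

If the first command prints nothing, no rule has matched the text at that position.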

From this you can see that the rules and the highlighting scheme are tightly coupled. Each rule has a name, and that name should be in the colorscheme [1]. Vim controls the tight coupling by providing a set of standard names which, if we were writing rules for a new programming language, we should use. Those names are things like "Comment", "Keyword", "Statement", and so on.

However, we are creating our own set of rules for this specific instance of text, so we can do what we like. Which we shall.
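
For comparison, a syntax file for a programming language would normally hand its rule names over to the standard names, so that any colorscheme works with it. A sketch for an imaginary language (the rule names myLangKeyword and myLangComment are made up for illustration):

```vim
" hypothetical rules for an imaginary language
syntax keyword myLangKeyword if else while
syntax match myLangComment /#.*$/

" 'default' means: only apply the link if nothing has already
" set highlighting for this name
highlight default link myLangKeyword Keyword
highlight default link myLangComment Comment
```

The thesaurus rules in this post use plain hi link instead, which is fine for a personal syntax file.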

Syntax rules

The data

Here is the (made up) sample text we will be working with:

Main Entry: which pronunciation [hwich, wich] [IMG] Show IPA/wItS,
Definition: what
Synonyms: and that, that, whatever, whichever in order that, in that, so, so that
Notes: in current usage, that refers to persons or things and which is used chiefly for things. The standard rule says that one uses that only to introduce a restrictive or
Antonyms: none

Main Entry: this
Definition: the one
Synonyms: that, the aforementioned one, the one in question, the thing indicated, this one, this person

Typically, you get a set of entries comprising Main Entry, Definition, Synonyms, Notes and Antonyms. Not every entry is present; Antonyms and Notes are often absent.

As well, if the word is common, you often get repeating sets of entries for very similar words, as in the example where the entry for "this" is also shown. (This isn't real data, by the way; "this" doesn't normally appear with "which".)

Set up some rules

How to define some rules? Some things come to mind immediately.
  • Entry names distinguished (highlighted differently) from entry contents, i.e. "Main Entry:" should be different from "which".
  • The main entry word should be bolded or similar to make it stand out.
  • The main word melds too easily into "pronunciation..", which makes it hard to pick out. Diminish everything to the right of the main word.
  • Each entry should be distinguished so you can scan them easily. For example, "Main Entry" should stand out, "Definition" less so, "Synonyms" more so.

Here is an example rule set that roughly implements the first, second and fourth items.

" Include the colon as part of the word
setlocal iskeyword+=:

" Rules
" a keyword
syntax keyword thesSynonyms Synonyms:
" this entry name has a space, so needs a regex
syntax match thesMainEntry /Main Entry: */
" this entry should include the line to the end
syntax region thesDefinition start=/Definition: / end=/$/

" Highlighting
" link the highlighting to the defined
" name "Keyword" in the colorscheme
hi link thesMainEntry Keyword
hi link thesSynonyms Statement
hi link thesDefinition Todo
" specify the colours directly
hi thesMainEntry term=bold
      \ ctermfg=White cterm=bold guifg=White gui=bold
" or, keep the current colour, but bold it
hi thesMainEntry term=bold cterm=bold gui=bold

which gives this:

syntax example

You can see that the keyword "Synonyms:" has matched a rule set, and the highlighting for the rule "Statement" in my colorscheme (likely different to your colorscheme) has been applied.

The regex pattern for "Main Entry: " has matched a rule set and the highlighting for the rule set has been applied.

Similarly for "Definition", the rule set has matched to the end of the line, and the highlighting for the rule "Todo" in my colorscheme has been applied.

The real deal


Enough examples. Here is the actual syntax file to produce the highlighting in the screenshot at the top of the page.

" Vi syntax file
" Language: text dump from online thesaurus
" Maintainer: Nick Coleman
" Last Change: 2012 Feb 18
" Remark: for the online thesaurus script by Nick Coleman

if exists("b:current_syntax")
  finish
endif

" Setup
" syntax clear " only useful for testing
syntax case match
setlocal iskeyword+=:

" Entry name rules
syntax match thesMainEntry /Main Entry: */ contained
syntax region thesDefinition start=/Definition: / end=/$/
syntax keyword thesNotes Notes: contained
syntax keyword thesSynonyms Synonyms:
syntax region thesAntonyms start=/Antonyms:/ end=/$/

" give the pronunciation region a special name
syntax region thesPronunciation start=/pronunciation \[/ end=/$/ contained

" Entry contents rules
syntax region thesMainWord start=/Main Entry:/ end=/$/ contains=CONTAINED keepend
syntax region thesNotesEntry start=/Notes:/ end=/^ *$/ contains=thesNotes,thesAntonyms keepend

" Highlighting

hi link thesMainEntry Keyword
hi thesMainWord term=bold cterm=bold gui=bold
hi link thesDefinition String
hi link thesNotes Number
hi link thesNotesEntry Number
hi link thesSynonyms Statement
hi link thesAntonyms Todo
hi link thesPronunciation Comment

let b:current_syntax = "thesaurus"

Some rules have "contains" or "contained". "contains" allows a rule within a rule, the classic example being Todo appearing within a comment, where you want a different colour to make Todo stand out. "contained" marks a rule that is only recognised inside another rule. "keepend" is part of that: it makes the end pattern of the outer rule also end any rule inside it, so an inner rule cannot carry the outer one past its own end. See :h usr_44, section 44.5, for more.
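
A stripped-down sketch of the same idea, modelled on the example in that section of the user manual. The rule names demoLine and demoAside are made up for illustration:

```vim
" hypothetical rule names, for illustration only
" an inner region, recognised only inside another rule
syntax region demoAside start=/%/ end=/$/ contained
" the outer region allows demoAside inside it; keepend makes the outer
" end-of-line match close the inner region too, so the outer region
" does not spill on to the next line
syntax region demoLine start=/#/ end=/$/ contains=demoAside keepend

hi link demoLine PreProc
hi link demoAside Comment
```

This is the same situation as thesMainWord and thesPronunciation in the syntax file: both regions end at the end of the line, and without keepend the inner one would use up the line end that the outer one is waiting for.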


Recall from Part I that the script sets the thesaurus' buffer to filetype thesaurus. Put the above syntax file in $HOME/.vim/syntax/thesaurus.vim and the buffer will pick up the syntax rules automatically.

Windows users can put it in C:\Program Files\Vim\vimfiles\syntax\thesaurus.vim or the equivalent if using Vista or Windows 7. If you don't have administrator privileges, find where Vim thinks $HOME is by (within Vim) trying :echo $HOME or :version, and put the file in $HOME\vimfiles\syntax\thesaurus.vim.

Trying it out

You probably want to use your own highlighting. A tip: to easily see the effect of your changes, reload the highlighting for the data buffer with :setlocal filetype=thesaurus.
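
If you are changing the rules repeatedly, a buffer-local mapping can save some typing. This is only a sketch: the choice of <F5> is arbitrary, and it assumes you type it while in the thesaurus buffer.

```vim
" re-apply the thesaurus syntax rules to the current buffer
" (the key is an arbitrary choice)
nnoremap <buffer> <F5> :setlocal filetype=thesaurus<CR>
```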

To see the colours that your colorscheme uses for a particular rule, use :hi <rule>, as in :hi Statement. To see all colours, use :hi by itself.

[1] I said the rule should be in the colorscheme. In fact, it is not an error if there is no colour highlighting for that rule, it simply gets ignored. And you can put highlighting descriptions anywhere, such as your .vimrc. For example, I quite like the colorscheme I use, except for the Search colours which I override with a separate description in my .vimrc like this:

hi Search ctermfg=white ctermbg=darkblue
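
One thing to watch: an override like this is lost if a colorscheme is loaded after it. A common way round that, sketched here with a made-up group name (myOverrides), is to re-apply it from a ColorScheme autocommand placed before the :colorscheme line in your .vimrc.

```vim
" myOverrides is a hypothetical group name
augroup myOverrides
  autocmd!
  " re-apply the override after any colorscheme is loaded
  autocmd ColorScheme * hi Search ctermfg=white ctermbg=darkblue
augroup END
```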
