How To Set Up A Light-Weight On-line Thesaurus For Vim

08:02 Fri, 17 Feb 2012

Vim has built-in support for a thesaurus. However, it consumes a lot of memory, which you may not want for a feature you do not use much, and its auto-complete selection has issues. Here is how to set up an on-line thesaurus query that is lightweight.

Summary

This is the first post of two (second here) about a lightweight way to implement a thesaurus. It is great for what I need, which is the occasional use of a thesaurus for writing text such as this article. Once it is set up, you can forget about it and just use K whenever you want to look up a word.

A nice bonus or synergy of using an online source is that the website also returns a definition for the word, so it functions as a simple dictionary as well.

The second post (here) will deal with how to use Vim's built in syntax rule sets to provide highlighting and nice colours.

There are two or three simple steps:

  1. a vim script that passes the cursor word to an external shell script.
  2. a shell script that looks up the word using an online thesaurus, then parses the output to remove unnecessary cruft.
  3. an optional syntax file to provide highlighting and colours.

Here is what it looks like with a quick-n-dirty syntax rule set:

[Screenshot: Vim with thesaurus window, showing sample output for "quick"]


I use Vim for programming and it is great for that. I use it when I need to edit files on remote servers that I have ssh'ed into. I also use Vim for writing text because it is a great text editor. I am writing this in Vim, for example.

Often, one of the things you want when writing text is a thesaurus, which is the topic for today.

Why on-line

Vim comes with support for a thesaurus, but I've never really liked it, for two reasons.

You download a thesaurus (the Moby one is common), point Vim to it and you are good to go. However, a thesaurus can consume a lot of memory. The Moby file is 24 MB, although to be fair Vim doesn't use anything like that. If you are using Vim remotely, you may not want to use any extra memory. For example, I have a remote VPS that has only 128 MB of total memory, so every megabyte is critical. This is especially true if you use the thesaurus only rarely; the trade-off is not worth it.

The bigger issue for me is that Vim does not handle a long list of alternative words very well. It is fine with a short list where you use it just like auto-complete, but it seems to get confused with a long list. Unfortunately, Moby often serves up a long list, and you frequently find you have inadvertently changed your perfectly good word to something completely different. You then have to Undo and try again.

In fact, I got so frustrated with the Vim/Moby combination I ended up unsetting the thesaurus feature in Vim after just a couple of months.

The good news is that there is an alternative, and that is to use one of the many on-line thesauruses.

I have set things up so that a script calls a command-line browser to query the online website for a word and dump the output. The script then parses the output to remove extra cruft such as sidebars, headers, footers and white space, and puts the result into a scratch buffer in Vim itself for me to browse through and perhaps copy a word from.

Vim setup

First, set up Vim to call a script, which I will describe further down, and display the results in a scratch buffer. (I shamelessly pulled the concept straight from the ReadMan script, which displays a Unix man page for the keyword under the cursor.) To use it, simply press K with the cursor on the word for which you want to see alternatives. You can do all the normal things in the buffer like search, jump to a line, copy a word (yw), and so on. To close it, just hit the q key.

Copy and paste the following into your .vimrc or wherever you prefer. I have it in a common.vim file in my ftplugin directory, where my various filetype scripts can source it if they want.

fun! ReadThesaurus()
    " Assign current word under cursor to a script variable
    let s:thes_word = expand('<cword>')
    " Open a new window, keep the alternate so this doesn't clobber it.
    keepalt split thes_
    " Show cursor word in status line
    exe "setlocal statusline=" . s:thes_word
    " Set buffer options for scratch buffer
    " (the continuation line needs a space after the backslash, otherwise
    " 'nospell' and 'buftype' run together)
    setlocal noswapfile nobuflisted nowrap nospell
                \ buftype=nofile bufhidden=hide
    " Delete existing content
    1,$d
    " Run the thesaurus script
    exe ":0r !$HOME/bin/thesaurus " . s:thes_word
    " Goto first line
    1
    " Set file type to 'thesaurus'
    set filetype=thesaurus
    " Map q to quit without confirm (non-recursive, normal mode only)
    nnoremap <buffer> q :q<CR>
endfun
" Map the K key to the ReadThesaurus function
noremap <buffer> K :call ReadThesaurus()<CR><CR>

The script is fully commented for you to follow what is going on.

Notice I specified the location of the shell script that Vim is calling to be $HOME/bin/thesaurus. Change this to wherever you are going to put the shell script.

I call the scratch buffer "thes_" so that I have a name for it which means I can easily re-use the buffer again. Otherwise Vim would create a new buffer every time and go through buffer numbers like crazy. In the unlikely event you have an actual file called "thes_", change the script to use some other wacky name. I could have used the Vim function tempname() to generate a unique buffer name, but "thes_" is meaningful and easy to remember if you want to unhide the buffer later.

I set the status line to show the word I am looking up. Sometimes the website will return a different word, a near synonym, so the status line reminds me exactly which word I am looking up. An example is looking up the word "the" which returns the definition and synonyms for "histrionical". No, I don't know why.

I include a line to set the filetype to "thesaurus". Its purpose is to allow me to set up anything special for that filetype later on. A possible use would be to create a syntax file to do special highlighting or colouring. That will be the topic for the next post.

The mapping for K has a second <CR>. It eats up the "Press ENTER or type command to continue" prompt. There are other ways to do this, but they can mess up the display or have the side effect of applying globally instead of just this buffer.

By the way, in case it isn't obvious, the reason I split the functionality into two parts (part Vim script, part shell script) instead of doing it all in Vim is that Vim's scripting language, an awkward beast at the best of times, would make it too hard. Best to use the good parts of each and combine them.

Thesaurus script

Up to now, everything has been operating system agnostic. The script below is written for unix-like operating systems, including OS X, because it uses some unix utilities like links and sed. (The others it uses such as basename and readlink aren't absolutely necessary, just good style.) I have written a paragraph or two below on how to get those tools for Windows.
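Before going further, it may be worth checking that those utilities are actually installed. The following is only a minimal sketch: the function name missing_tools is made up for this example, and the list of tools is simply the ones named in the paragraph above.

```shell
#!/bin/sh
# Sketch: report which of the tools the thesaurus script relies on are missing.
# missing_tools is a hypothetical helper name, not part of the original script.

missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the tool can be found on PATH
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools links sed basename readlink)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```

If anything is reported missing, install it with your system's package manager before trying the script.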

Originally I wrote a quick-n-dirty one-liner, but I decided the script might be useful outside Vim as well, so I tidied it up to be a useful script that you can call from the command line.

This is the shell script that Vim calls. Put it in the location you specified in the Vim script above. It is straightforward, apart from the sed call, and there are enough comments so you can see what is happening.

#!/bin/sh

# This searches an online thesaurus, cuts as much cruft out as possible,
# and displays the definitions.

#URL='http://www.merriam-webster.com/thesaurus/'
URL='http://thesaurus.com/browse/'

# Display help, and quit.
# (Plain "name() {" is used because the "function" keyword is not
# understood by every /bin/sh.)
usage() {
    cat << EOF_HELP
Usage: $(basename "$0") [-r][-h] word
Display the parsed results of an online thesaurus search for <word>

    -r Raw results; no filtering.
    -h Display this help.

EOF_HELP
    exit
}

# Check for a parameter. Get the output from the website
get_thes() {
    [ -z "$1" ] && usage
    "$(readlink -e "$(which links)")" -dump "${URL}$1"
}

# Check for a -h parameter or absence of parameter.
if [ "$1" = "-h" ] || [ -z "$1" ] ; then
    usage
fi

# If -r just get the raw output, otherwise pass it through sed.
if [ "$1" = "-r" ] ; then
    shift
    get_thes "$1"
else
    # Sed is doing many things:
    # Print out "no results found for " and quit.
    # Print out only the lines I'm interested in, which are:
    #   No results found for
    #   Main Entry:
    #   Definition:
    #   and all from Synonyms: to Antonyms:
    # Put a new line after Antonyms.
    get_thes "$1" | sed -n -e '/No results found for/ {' \
        -e ' s|^[ \t]*\(No results .*\)|\1|p ; q} ' \
        -e '{/Main Entry:/p ' \
        -e '/Definition:/p ' \
        -e '/Synonyms:/,/\(Antonyms:\)\|\(^$\)/p} ' \
        -e '/Antonyms:/ a \ '
fi

I use thesaurus.com because I found it has a wider coverage than merriam-webster.com. If you want to use a different website, you will need to write your own sed script to parse it.

I separated the sed actions into discrete chunks above so you can get a clearer picture of what sed is doing, but they appear as one line in the actual script. That line is below, for you to cut-n-paste.

    get_thes $1 | sed -n -e '/No results found for/ { s|^[ \t]*\(No results .*\)|\1|p ; q} ;{/Main Entry:/p ; /Definition:/p ; /Synonyms:/,/\(Antonyms:\)\|\(^$\)/p} ; /Antonyms:/ a \  '
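If you want to experiment with the filtering without hitting the website every time, you can feed sed some canned text. The sketch below is a simplified variant of the filter, not the exact command above: it sticks to POSIX sed features, so it drops the "stop at a blank line" alternative and the extra line after Antonyms. The sample text is invented for illustration and is not real thesaurus.com output, and filter_thes is a made-up name.

```shell
#!/bin/sh
# Sketch: a simplified, POSIX-only variant of the filter, run on canned text.
# filter_thes and the sample text are hypothetical, for illustration only.

filter_thes() {
    sed -n \
        -e '/No results found for/{' \
        -e 's/^[[:space:]]*//p' \
        -e 'q' \
        -e '}' \
        -e '/Main Entry:/p' \
        -e '/Definition:/p' \
        -e '/Synonyms:/,/Antonyms:/p'
}

sample='   Site header and menu
   Main Entry: quick
   Part of Speech: adjective
   Definition: fast
   Synonyms: brisk, rapid,
   speedy, swift
   Antonyms: slow
   Site footer'

# Only Main Entry, Definition and the Synonyms..Antonyms block survive.
printf '%s\n' "$sample" | filter_thes
```

With this sample, the header, footer and "Part of Speech" lines are dropped and the other five lines are kept. The same approach should help when writing a filter for a different website: save one page of raw output (the -r option gives you that) and adjust the patterns against it.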

Install

I prefer that this script is only available in buffers where I am writing text. That way, I can keep the K mapping for unix man pages in buffers where I am writing code.

The way to do that is to put the Vim script in its own file that is called only by text-like filetypes. Scripts that are unique to filetypes are put in ~/.vim/ftplugin or ~/.vim/after/ftplugin. I created files named text.vim, xml.vim and blog.vim that are in the ftplugin directory. (XML is treated as text because most of my XML is text data.) I put the thesaurus script in a file called common.vim and I source common.vim from within all the above ftplugin scripts.

If ftplugin (and perhaps autocmd) are a mystery, see :h ftplugin and :h autocmd.
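The arrangement described above can be sketched as a small shell script. This is only one way to do it, under a few assumptions: common.vim (holding ReadThesaurus and the K mapping) already sits in the ftplugin directory, the function name add_thesaurus_hook is made up, and I use Vim's :runtime command here where a plain :source with the full path would do just as well.

```shell
#!/bin/sh
# Sketch: make text-like filetypes pull in a shared common.vim.
# Assumes common.vim lives in <vimdir>/ftplugin; add_thesaurus_hook is a
# hypothetical helper name.

add_thesaurus_hook() {
    vimdir=$1
    mkdir -p "$vimdir/ftplugin"
    for ft in text xml blog; do
        plugin="$vimdir/ftplugin/$ft.vim"
        # Append the line only once, so re-running is harmless.
        grep -qs 'ftplugin/common.vim' "$plugin" ||
            echo 'runtime ftplugin/common.vim' >> "$plugin"
    done
}

# Try it on a throwaway directory first; pass "$HOME/.vim" for real use.
demo=$(mktemp -d)
add_thesaurus_hook "$demo"
add_thesaurus_hook "$demo"      # second run adds nothing
cat "$demo/ftplugin/text.vim"
```

Existing content in text.vim, xml.vim or blog.vim is left alone, because the line is appended rather than written over the file.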

With that, I am done. To use, move the cursor in Normal mode onto the word of interest and press K.

Bugs

I noticed once that the script loses permissions to a temporary file that Vim uses internally. It seems to only happen if a remote ssh session in screen is unexpectedly terminated, and not always then. Closing and re-opening Vim (yes, a pain) will fix it.

Additional Info for Windows Users

Summary

In summary, Windows users have a little extra to do, but not much.
  1. Install sed and links. It takes only a few seconds each to download and install. The default install is fine.
  2. Create a batch file to get and parse the thesaurus data. The batch file will be an abbreviated form of the shell script above, tailored for Windows.
  3. Point vim to the location of the batch file.

Install Links & Sed

Windows users will need to install the links and sed commands and perhaps a shell if you don't want to use a DOS batch file (which is below). I did a bit of a search and found a Windows build of links here, and a good set of unix tools for Windows here (direct link for sed is here), both of which are fine.

Create DOS Batch File

You don't need the full-blown shell script above. The simple DOS batch file below will do. It runs links and then pipes the output to sed. You might need to change the paths to the folder(s) where you installed links and sed if you did not use the defaults when you installed them.

@"c:\program files\links\links.exe" -dump http://thesaurus.com/browse/%1 | "c:\program files\gnuwin32\bin\sed.exe" -n -e "/No results found for/ { s|^[ \t]*\(No results .*\)|\1|p ; q} ;{/Main Entry:/p ; /Definition:/p ; /Synonyms:/,/\(Antonyms:\)\|\(^$\)/p} ; /Antonyms:/ a \  "

There are a couple of differences to the unix script. Sed needs double quotes surrounding its commands instead of single quotes. The paths to the executables need double quotes because of the spaces in the path. I put a '@' in front of everything to prevent DOS from echoing back the entire command. Finally, I put %1 instead of $1 for the DOS way to pass in the parameter.

It is probably worth opening a cmd.exe window and testing the batch file. Assuming you called the batch file "thesaurus.bat", run thesaurus.bat loose and you should get a listing back after a few seconds with all the synonyms of "loose".

Point Vim

Now that you have a batch file that works, put it in a folder somewhere. The vim script needs one small change to point to that location.

In Windows, $HOME expands to "C:\Documents and Settings\{user}". However, it doesn't expand in a script if it is contained within quotes. So the vim script should be changed from

    " Run the thesaurus script
    :exe ":0r !$HOME/bin/thesaurus " . s:thes_word

to

    " Run the thesaurus script
    :exe ':0r !"' . $HOME . '\thesaurus.bat" ' . s:thes_word

assuming again that your DOS batch file is called "thesaurus.bat". This puts $HOME outside the quotes and Vim will expand it to "C:\Documents and Settings\{user}\thesaurus.bat".

If you put the batch file in a sub folder, add that folder name in front of '\thesaurus.bat' like this: '\my_folder\thesaurus.bat'.

If it does not work, the problem is almost certainly in how you specified the location of your batch file in the vim script, because you have already tested the batch file on its own.

