Custom processing

This feature, which like the API is not for those without a tame programmer to help, is found under Adjust Settings | Advanced.

The point of it…

I cannot know which criteria you have in processing your texts, other than the criteria already set up (the choice of texts, of search-word, etc.). You might need to do some specialised checks or alterations of data before it enters the WordSmith formats. For example, you might need to lemmatise a word according to the special requirements of your language.

This function makes that possible. If, for example, you have chosen to filter concordances, then as Concord processes your text files it will call your .dll every time it finds a match for your search-word. It tells your .dll what it has found and gives it a chance to alter the result or to tell Concord to ignore that hit.

How to do it…

Choose your .dll file (it can have any filename you like) and check one or more of the options in the Advanced page. WordSmith calls standard functions in your .dll, so you need to know their names and formats. It is up to you to write your own .dll program which can do the job you want. This can be written in any programming language (C++, Java, Pascal, etc.) that can build a Windows .dll exporting such functions.

An example for lemmatising a word in WordList

The following DLL is supplied with your installation, compiled & ready to run.

Your .dll needs to contain a function with the following specification:

function WordlistChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;

The language_identifier is a number corresponding to the language you're working with. See Microsoft's list of Locale ID (LCID) values.
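Because the identifier is a Windows LCID, a .dll can check the language before deciding what to do with a word. The following is only a sketch: the helper name IsEnglishLCID is hypothetical (it is not part of WordSmith), and it relies on the Windows convention that the low 10 bits of an LCID hold the primary language, which is 9 for English.

```pascal
program LcidDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper, not part of WordSmith: reports whether an LCID
// belongs to any variety of English.
function IsEnglishLCID(language_identifier : Cardinal) : Boolean;
const
  LANG_ENGLISH      = $09;    // Windows primary language ID for English
  PRIMARY_LANG_MASK = $3FF;   // the low 10 bits of an LCID
begin
  Result := (language_identifier and PRIMARY_LANG_MASK) = LANG_ENGLISH;
end;

begin
  Writeln(IsEnglishLCID(1033));   // 1033 = $0409, English (United States)
  Writeln(IsEnglishLCID(2057));   // 2057 = $0809, English (United Kingdom)
  Writeln(IsEnglishLCID(1036));   // 1036 = $040C, French (France)
end.
```

The first two calls report TRUE and the third FALSE, so a test like this could be used inside your .dll to apply English-specific rules only to English texts.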

The "original" (sent by WordSmith) is a PCHAR (7- or 8-bit) when is_Unicode is false, or a PWIDECHAR (16-bit Unicode) when it is true. The result which your .dll supplies can point to

a) nil (if you simply do not want the original word in your list)

b) the same PCHAR/PWIDECHAR if it is not to be changed at all

c) a replacement form
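The three outcomes can be sketched in one short function. This is an illustration under stated assumptions, not the supplied .dll: the library name ThreeOutcomesSketch, the helper Matches and the words THE, WENT and GO are arbitrary choices, and only the 8-bit case is handled to keep it short.

```pascal
library ThreeOutcomesSketch;

uses
  Windows;

function WordlistChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;

  // Hypothetical helper: True when the incoming word equals s, ignoring case
  function Matches(s : PAnsiChar) : Boolean;
  begin
    Result := CompareStringA(language_identifier, NORM_IGNORECASE,
      PAnsiChar(original), -1, s, -1) = 2;   // 2 means the strings are equal
  end;

begin
  Result := original;              // (b) by default the word is left unchanged
  if is_Unicode then Exit;         // in this sketch Unicode input passes through untouched
  if Matches('THE') then
    Result := nil                  // (a) drop the word from the list
  else if Matches('WENT') then
    Result := PAnsiChar('GO');     // (c) supply a replacement form
end;

exports
  WordlistChangeWord;

begin
end.
```

The replacement in case (c) is a constant, so the pointer stays valid after the function returns. A replacement built at run time would need to live in memory that outlasts the call.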

Here's an example where the source text was

Today is Easter Day.

[Screenshot: custom_processingEASTER]

Source code

The source code for the .dll in Delphi is this:

library WS5WordSmithCustomDLL;

uses
  Windows, SysUtils;

{
  This example uses a very straightforward Windows routine for comparing
  strings, CompareStringA and CompareStringW which are in a Windows .dll.
  The function does a case-insensitive comparison because
  NORM_IGNORECASE (=1) is used. If it was replaced by 0, the comparison
  would be case-sensitive.
  In this example, EASTER gets changed to CHRISTMAS.
}

function WordlistChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
begin
  Result := original;
  { CompareString returns 2 when the two strings are equal,
    so subtracting 2 gives 0 for a match }
  if is_Unicode then begin
    if CompareStringW(
      language_identifier,
      NORM_IGNORECASE,
      PWideChar(original), -1,
      PWideChar(widestring('EASTER')), -1) - 2 = 0
    then
      Result := pwidechar(widestring('CHRISTMAS'));
  end else begin
    if CompareStringA(
      language_identifier,
      NORM_IGNORECASE,
      PAnsiChar(original), -1,
      PAnsiChar('EASTER'), -1) - 2 = 0
    then
      Result := pAnsichar('CHRISTMAS');
  end;
end;

function ConcordChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
begin
  Result := WordlistChangeWord(original, language_identifier, is_Unicode);
end;

function KeyWordsChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
begin
  Result := WordlistChangeWord(original, language_identifier, is_Unicode);
end;

{
  This routine exports each concordance line together with
    - the filename it was found in
    - a number stating how many bytes into the source text file the entry was found
    - its hit position in that text file counted in characters (not bytes) and
    - the length of the hit-word
      (so if the search was on HAPP* and the hit was HAPPINESS this would be 9)
  This information is saved in Unicode appended to your results_filename
}

function HandleConcordanceLine
  (source_line : pointer;
   hit_pos_in_characters,
   hit_length : integer;
   byte_position_in_file,
   language_id : DWORD;
   is_Unicode : WordBool;
   source_text_filename,
   results_filename : pwidechar) : pointer; stdcall;

  { tab-separated extra fields, 8-bit version }
  function extrasA : ansistring;
  begin
    Result := #9+ ansistring(widestring(pwidechar(source_text_filename)))+
      #9+ ansistring(IntToStr(byte_position_in_file))+
      #9+ ansistring(IntToStr(hit_pos_in_characters))+
      #9+ ansistring(IntToStr(hit_length));
  end;

  { tab-separated extra fields, Unicode version }
  function extrasW : widestring;
  begin
    Result := #9+ widestring(pwidechar(source_text_filename))+
      #9+ IntToStr(byte_position_in_file)+
      #9+ IntToStr(hit_pos_in_characters)+
      #9+ IntToStr(hit_length);
  end;

const
  bm : widechar = widechar($FEFF);   { Unicode byte-order mark }
var
  f : File of widechar;
  output_string : widestring;
begin
  Result := source_line;
  if length(results_filename)>0 then
  try
    AssignFile(f,results_filename);
    if FileExists(results_filename) then begin
      { append to the existing results file }
      Reset(f);
      Seek(f, FileSize(f));
    end else begin
      { new file: start it with the byte-order mark }
      Rewrite(f);
      Write(f, bm);
    end;
    if is_Unicode then
      output_string := pwidechar(source_line)+extrasW
    else
      output_string := pAnsichar(source_line)+widestring(extrasA);
    if length(output_string) > 0 then
      BlockWrite(f, output_string[1], length(output_string));
    CloseFile(f);
  except
    { any file error is ignored so that Concord can carry on }
  end;
end;

exports
  ConcordChangeWord,
  KeyWordsChangeWord,
  WordlistChangeWord,
  HandleConcordanceLine;

begin
end.
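
Before pointing WordSmith at a new .dll, it can help to call it from a small console program. The sketch below is not part of WordSmith. It assumes the compiled file is called WS5WordSmithCustomDLL.dll (from the library name in the source) and can be found from the program's folder. The program name, the type name TChangeWord and the test word are arbitrary.

```pascal
program TestCustomDLL;
{$APPTYPE CONSOLE}

uses
  Windows;

type
  // Same signature as the function exported by the .dll
  TChangeWord = function(
    original : pointer;
    language_identifier : DWORD;
    is_Unicode : WordBool) : pointer; stdcall;

var
  dll : HMODULE;
  ChangeWord : TChangeWord;
  answer : pointer;
begin
  // Assumed filename; adjust it to wherever your compiled .dll lives
  dll := LoadLibrary('WS5WordSmithCustomDLL.dll');
  if dll = 0 then begin
    Writeln('Could not load the .dll');
    Halt(1);
  end;
  try
    @ChangeWord := GetProcAddress(dll, 'WordlistChangeWord');
    if not Assigned(ChangeWord) then
      Writeln('WordlistChangeWord is not exported')
    else begin
      // 1033 = English (United States); False = 8-bit text
      answer := ChangeWord(PAnsiChar('Easter'), 1033, False);
      if answer = nil then
        Writeln('(word dropped)')
      else
        Writeln(PAnsiChar(answer));
    end;
  finally
    FreeLibrary(dll);
  end;
end.
```

With the example .dll above, this should print CHRISTMAS, since the comparison ignores case. The test program and the .dll must both be built for the same platform (both 32-bit or both 64-bit), or LoadLibrary will fail.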