Custom Processing

This feature, which like the API is not for those without a tame programmer to help, is found under Adjust Settings | Advanced.

 

The point of it…

I cannot know what criteria you use in processing your texts, other than the criteria already set up (the choice of texts, of search-word, etc.). You might need to do some specialised checks or alterations of data before they enter the WordSmith formats. For example, you might need to lemmatise a word according to the special requirements of your language.

This feature makes that possible. For example, if you have chosen to filter concordances, then as Concord processes your text files, every time it finds a match for your search-word it will call your .dll file. It tells your .dll what it has found and gives it a chance to alter the result or to tell Concord to ignore that match.

 

How to do it…

Choose your .dll file (it can have any filename you like) and check one or more of the options in the Advanced page. Your .dll must export functions with the standard names and parameter lists that WordSmith calls, so you need to know what those are. It is up to you to write your own .dll program which can do the job you want. This can be written in any programming language that can build a standard Windows .dll exporting stdcall functions (C++, Pascal, etc.).

 

An example of lemmatising a word in WordList

 

The following DLL is supplied with your installation, compiled & ready to run.

 

Your .dll needs to export a function with the following specification (in Delphi syntax):

 

function WordlistChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;

 

The language_identifier is the Windows locale identifier (LCID) of the language you're working with; for example, 1033 ($0409) is English (United States). See List of Locale ID (LCID) Values as Assigned by Microsoft.
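As a sketch of how such a number is put together, the following Free Pascal console program pulls an LCID apart. It assumes the standard Windows bit layout: bits 0-15 hold the language ID and bits 16-19 the sort ID, and the language ID is itself a 10-bit primary language plus a 6-bit sublanguage. The helper names mirror the Windows SDK macros LANGIDFROMLCID, PRIMARYLANGID, SUBLANGID and SORTIDFROMLCID but are written out here for illustration; they are not part of WordSmith.

```pascal
program DecodeLcid;
{$mode objfpc}{$H+}{$C+}

uses
  SysUtils;

type
  DWORD = LongWord;

// Bits 0-15 of an LCID: the language ID
function LangIdFromLcid(lcid: DWORD): Word;
begin
  Result := Word(lcid and $FFFF);
end;

// Bits 0-9 of a language ID: the primary language
function PrimaryLangId(langId: Word): Word;
begin
  Result := langId and $3FF;
end;

// Bits 10-15 of a language ID: the sublanguage (country/region variant)
function SubLangId(langId: Word): Word;
begin
  Result := langId shr 10;
end;

// Bits 16-19 of an LCID: the sort ID (0 = default sort order)
function SortIdFromLcid(lcid: DWORD): Word;
begin
  Result := Word((lcid shr 16) and $F);
end;

var
  lcid: DWORD;
  langId: Word;
  primary, sub: Integer;
begin
  lcid := $0809;  // English (United Kingdom)
  langId := LangIdFromLcid(lcid);
  primary := PrimaryLangId(langId);
  sub := SubLangId(langId);
  // prints: primary = $09, sub = $02, sort = 0
  WriteLn('primary = $', IntToHex(primary, 2), ', sub = $', IntToHex(sub, 2),
          ', sort = ', SortIdFromLcid(lcid));
end.
```

Because English (US) and English (UK) share the primary language $09, a .dll that only cares about the language, not the country, could compare just the primary part.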

 

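The "original" (sent by WordSmith) points to a null-terminated string: a PAnsiChar (7 or 8-bit) when is_Unicode is false, or a PWideChar (16-bit Unicode) when it is true. The result which your .dll supplies can point to:

a) nil (if you simply do not want the original word in your list)

b) the same PAnsiChar/PWideChar, if it is not to be changed at all

c) a replacement form of the same character type, held in memory that stays valid after your function returns (e.g. a constant, not a local or temporary string)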
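The three possible results can be illustrated with a minimal, hypothetical lemmatiser for the non-Unicode (PAnsiChar) case, written as a Free Pascal console program. The lemma table and the name LemmatiseA are invented for this sketch, and UpperCase only folds ASCII letters; a real .dll would more likely compare with CompareStringA and the language_identifier, as in the source code below. Every pointer it returns is the original, nil, or a typed constant, so it is still valid after the function returns.

```pascal
program LemmaSketch;
{$mode objfpc}{$H+}{$C+}

uses
  SysUtils;

type
  { One entry of a hypothetical lemma table }
  TLemmaEntry = record
    Form: PAnsiChar;   // inflected form, in upper case
    Lemma: PAnsiChar;  // headword to report, or nil to drop the word
  end;

const
  // Invented sample data, for illustration only
  LemmaTable: array[0..2] of TLemmaEntry = (
    (Form: 'WENT';     Lemma: 'GO'),
    (Form: 'CHILDREN'; Lemma: 'CHILD'),
    (Form: 'THE';      Lemma: nil)
  );

// Returns nil (drop the word), the original pointer (keep it as is),
// or a pointer to a constant replacement that outlives the call.
function LemmatiseA(original: PAnsiChar): PAnsiChar;
var
  key: AnsiString;
  i: Integer;
begin
  Result := original;                      // (b) unchanged by default
  key := UpperCase(AnsiString(original));  // ASCII-only case folding
  for i := Low(LemmaTable) to High(LemmaTable) do
    if key = AnsiString(LemmaTable[i].Form) then
    begin
      Result := LemmaTable[i].Lemma;       // (c) replacement, or (a) nil
      Exit;
    end;
end;

procedure Show(const w: AnsiString);
var
  r: PAnsiChar;
begin
  r := LemmatiseA(PAnsiChar(w));
  if r = nil then
    WriteLn(w, ' -> (dropped)')
  else
    WriteLn(w, ' -> ', r);
end;

begin
  Show('went');    // went -> GO
  Show('the');     // the -> (dropped)
  Show('Easter');  // Easter -> Easter
end.
```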

 

Here's an example where the source text was:

 

Today is Easter Day.

 

 

[Screenshot: WordList result for this text with the custom .dll applied]

 

 

Source code

The Delphi source code for the .dll is as follows:

 

library WS5WordSmithCustomDLL;

uses
  Windows, SysUtils;

{
 This example uses a very straightforward Windows routine for comparing
 strings: CompareStringA and CompareStringW, which are in the Windows
 system library kernel32.dll. Both return CSTR_EQUAL (= 2) when the two
 strings match.

 The comparison is case-insensitive because NORM_IGNORECASE (= 1) is
 used. If it were replaced by 0, the comparison would be case-sensitive.

 In this example, EASTER gets changed to CHRISTMAS.
}

 

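function WordlistChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
const
  { Typed constants stay in memory while the .dll is loaded, so pointers
    to them are still valid after this function returns. Returning a
    pointer to a local or temporary string would leave WordSmith holding
    a dangling pointer. }
  TargetW      : PWideChar = 'EASTER';
  ReplacementW : PWideChar = 'CHRISTMAS';
  TargetA      : PAnsiChar = 'EASTER';
  ReplacementA : PAnsiChar = 'CHRISTMAS';
begin
  Result := original;  { default: leave the word unchanged }
  if is_Unicode then begin
    if CompareStringW(
      language_identifier,
      NORM_IGNORECASE,
      PWideChar(original), -1,   { -1 = string is null-terminated }
      TargetW, -1) = CSTR_EQUAL
    then
      Result := ReplacementW;
  end else begin
    if CompareStringA(
      language_identifier,
      NORM_IGNORECASE,
      PAnsiChar(original), -1,
      TargetA, -1) = CSTR_EQUAL
    then
      Result := ReplacementA;
  end;
end;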

 

function ConcordChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
begin
  Result := WordlistChangeWord(original, language_identifier, is_Unicode);
end;

 

function KeyWordsChangeWord(
  original : pointer;
  language_identifier : DWORD;
  is_Unicode : WordBool) : pointer; stdcall;
begin
  Result := WordlistChangeWord(original, language_identifier, is_Unicode);
end;

 

{
 This routine appends each concordance line to the file named by
 results_filename, together with these tab-separated fields:
   the filename it was found in,
   a number stating how many bytes into the source text file the entry was found,
   its hit position in that text file, counted in characters (not bytes), and
   the length of the hit-word
   (so if the search was on HAPP* and the hit was HAPPINESS, this would be 9).
 The file is written in Unicode (UTF-16, with a byte-order mark at the start).
}

 

function HandleConcordanceLine
 (source_line : pointer;
  hit_pos_in_characters,
  hit_length : integer;
  byte_position_in_file,
  language_id : DWORD;
  is_Unicode : WordBool;
  source_text_filename,
  results_filename : pwidechar) : pointer; stdcall;

  { the extra tab-separated fields, as ANSI text }
  function extrasA : ansistring;
  begin
    Result := #9+ ansistring(widestring(source_text_filename))+
              #9+ ansistring(IntToStr(byte_position_in_file))+
              #9+ ansistring(IntToStr(hit_pos_in_characters))+
              #9+ ansistring(IntToStr(hit_length));
  end;

  { the same fields, as Unicode text }
  function extrasW : widestring;
  begin
    Result := #9+ widestring(source_text_filename)+
              #9+ IntToStr(byte_position_in_file)+
              #9+ IntToStr(hit_pos_in_characters)+
              #9+ IntToStr(hit_length);
  end;

var
  f : File;                  { untyped file, as BlockWrite requires }
  bom : WideChar;
  output_string : widestring;
begin
  Result := source_line;     { pass the concordance line through unchanged }
  if (results_filename = nil) or (results_filename^ = #0) then
    Exit;
  try
    AssignFile(f, results_filename);
    if FileExists(results_filename) then begin
      Reset(f, SizeOf(WideChar));     { one record = one UTF-16 code unit }
      Seek(f, FileSize(f));           { append to the end of the file }
    end else begin
      Rewrite(f, SizeOf(WideChar));
      bom := WideChar($FEFF);         { UTF-16 byte-order mark }
      BlockWrite(f, bom, 1);
    end;
    try
      if is_Unicode then
        output_string := pwidechar(source_line)+extrasW
      else
        output_string := pAnsichar(source_line)+widestring(extrasA);
      output_string := output_string + #13#10;  { one entry per line }
      BlockWrite(f, output_string[1], length(output_string));
    finally
      CloseFile(f);
    end;
  except
    { ignore file errors so that they never interrupt Concord }
  end;
end;

 

exports
  ConcordChangeWord,
  KeyWordsChangeWord,
  WordlistChangeWord,
  HandleConcordanceLine;

begin
end.
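If you use HandleConcordanceLine, you may want to read the results file back into another program. The following Free Pascal sketch assumes each entry has already been read from the UTF-16 file and isolated as one string; it then splits the entry on its last four tab characters, so that a tab inside the concordance line itself does not shift the other fields (Windows filenames cannot contain tabs). TConcordanceRecord and ParseRecord are hypothetical names, and the sample entry in the main program is made up.

```pascal
program ParseResults;
{$mode objfpc}{$H+}{$C+}

uses
  SysUtils;

type
  // Hypothetical record for one entry written by HandleConcordanceLine
  TConcordanceRecord = record
    ConcordanceLine: string;
    SourceFile: string;
    BytePosition: Int64;
    HitPosInChars: Integer;
    HitLength: Integer;
  end;

// Splits one entry at its last four tabs, so that any tab inside the
// concordance line itself stays part of ConcordanceLine.
// Returns False if there are fewer than four tabs or a number is invalid.
function ParseRecord(const s: string; out rec: TConcordanceRecord): Boolean;
var
  cuts: array[0..3] of Integer;  // positions of the last four tabs, left to right
  i, found: Integer;
begin
  Result := False;
  found := 0;
  i := Length(s);
  while (i >= 1) and (found < 4) do
  begin
    if s[i] = #9 then
    begin
      cuts[3 - found] := i;
      Inc(found);
    end;
    Dec(i);
  end;
  if found < 4 then
    Exit;
  rec.ConcordanceLine := Copy(s, 1, cuts[0] - 1);
  rec.SourceFile := Copy(s, cuts[0] + 1, cuts[1] - cuts[0] - 1);
  Result := TryStrToInt64(Copy(s, cuts[1] + 1, cuts[2] - cuts[1] - 1), rec.BytePosition)
    and TryStrToInt(Copy(s, cuts[2] + 1, cuts[3] - cuts[2] - 1), rec.HitPosInChars)
    and TryStrToInt(Copy(s, cuts[3] + 1, MaxInt), rec.HitLength);
end;

var
  rec: TConcordanceRecord;
begin
  // A made-up entry: line, file, byte offset, hit position, hit length
  if ParseRecord('Today is Easter Day.'#9'C:\texts\easter.txt'#9'120'#9'10'#9'6', rec) then
    WriteLn(rec.SourceFile, ': hit at character ', rec.HitPosInChars,
            ', length ', rec.HitLength);
end.
```

Parsing from the right is the design choice that matters here: the concordance line is free text, while the four trailing fields have a fixed shape.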

 

 

See also: API, custom settings

Page url: http://www.lexically.net/downloads/version5/HTML/?custom_processing.htm