
PlainKnowledge
Linguistic Enabling of Document Management Systems
Document Management Systems (DMS) ensure the organization and management of
text documents. Since they do not process information based on content, DMS
provide only the first step toward efficient document processing. The
second – by now indispensable – step is represented by Knowledge Management
Systems. Semantic analyses of the texts open new application horizons
leaving pure DMS in the past. With its innovative technologies, AppTek paves
the way to forward-looking Knowledge Management Systems.
AppTek's Products and Technologies
In the area of Knowledge Management AppTek offers several software modules
under the collective name PlainKnowledge, which have already been integrated
into third-party products or implemented as an SDK for end customers. The
individual modules are PlainCluster (grouping), PlainClassify
(classification), PlainSummarize (summarization), PlainExtract (content
specific extraction), PlainLingua (language recognition of a text) and
PlainRetrieve (associative search).
Since AppTek combines the areas of automatic translation and speech
recognition, the customer profits not only from improvement in quality but
also realizes further benefits. For example, the speech recognition system
PlainSpeech removes the limitation of textual processing. In the future, the
semantic processing will be expanded to include speech files, recorded video
conferences and films. Additionally, the automatic translation system
PlainTranslate can overcome language barriers. The illustration above
shows AppTek's three major fields of competence. The overlapping areas
enable such innovative products as Speech-to-Speech-Translation, Spoken
Document Retrieval and Multilingual Information Retrieval.
PlainKnowledge for Windream
Integrating the PlainKnowledge modules into Windream, the efficient Document
Management System from Windream GmbH, Bochum, extends that Document
Management System into a fully-fledged Knowledge Management System. The
Windream system integrates seamlessly into the Windows Explorer environment.
The document database is presented as a virtual drive. In contrast to a
regular hard disk directory, the Windream drive has the functionality of a
Document Management System: version control, permission assignment, document
history, type-specific indexing, full text extraction, etc. In addition to
the outstanding search function, Windream offers the option to save
additional information and comments with each individual document. Adding
AppTek's high-performance PlainKnowledge package expands this with semantic
information processing. With the help of this software you can, for example,
extract the subject of a text without reading the document. A secondary
ordering according to subject can be done in addition to the hierarchical
structure of the folders on the hard drive. The Windream search function is
supplemented with an associative search, which goes far beyond a full text
search and is based on a contextual evaluation of the query and the text
database.
PlainCluster
PlainCluster analyzes large amounts of unstructured documents. The result is
a proposed classification of the documents based on content, along with the
key words typical of each class. Each key word is listed with the frequency
of its occurrence and
its degree of relevance to the cluster. This significantly reduces the
effort of manual classification which would involve reading all of the
documents. At the same time, the keyword-based manual post-processing allows
the reassignment of incorrectly allocated documents and the inclusion of
customer specific requests. The post-processed classification can be used as
input for training of the classification module.
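
The key word listing described above can be illustrated with a minimal
sketch. It is not AppTek's method: it assumes the documents have already been
assigned to clusters, and it uses a hypothetical relevance measure (the share
of a word's occurrences that fall inside the cluster). The function name
cluster_keywords and its parameters are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_keywords(docs, assignment, top_n=3):
    """List key words per cluster with frequency and relevance.

    docs: list of document texts.
    assignment: one cluster id per document (assumed to be given).
    Returns {cluster_id: [(keyword, frequency, relevance), ...]}.
    Relevance is a stand-in measure: the share of the keyword's
    occurrences in the whole collection that fall inside the cluster.
    """
    overall = Counter()
    per_cluster = {}
    for text, cid in zip(docs, assignment):
        tokens = tokenize(text)
        overall.update(tokens)
        per_cluster.setdefault(cid, Counter()).update(tokens)

    report = {}
    for cid, counts in per_cluster.items():
        rows = [(w, c, c / overall[w]) for w, c in counts.items()]
        # Rank by frequency weighted with relevance, then alphabetically.
        rows.sort(key=lambda r: (-r[1] * r[2], r[0]))
        report[cid] = rows[:top_n]
    return report
```

A reviewer could scan such a listing per cluster and reassign documents whose
key words do not fit, which is the manual post-processing step the text
describes.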
PlainClassify
PlainClassify is a flexible, language-independent system which allocates
documents into defined classes. Its integration into the Document Management
System Windream
ensures the best possible archiving and quick retrieval of documents. The
fully automatic classification is based on a user-defined and trained
classification. The system is customized to the individual requirements of
the company without additional cost. The decisive advantage of
PlainClassify lies in its new technology. AppTek uses self-learning,
stochastic methods which deliver significantly better results than the
rule-based methods generally used. Furthermore, the statistical models can
be updated through repeated training without manual involvement. The
expensive maintenance of rule-based systems is eliminated. Training of the
system takes place on the basis of predetermined text examples which have
already been allocated to classes. All that needs to be done is to
right-click on the folder with the text examples to open the Context Menu
and select PlainClassify Train. Using this data,
the program learns the phrases and text building blocks, which characterize
the different classes. During the classification of new documents,
PlainClassify decides which class they should be allocated to based on the
self-learned characteristics. The possibility of multiple class allocation
is a special feature of PlainClassify. In this way, texts with a broad
subject matter spectrum can be allocated to all appropriate classes. Using a
confidence measurement, the user can evaluate the main subject and also the
level of confidence of the total classification. An additional hierarchical
class allocation allows any further, finely-detailed classification needed.
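
As a rough illustration of trained, statistical classification with multiple
class allocation and a confidence value, the following sketch uses a plain
naive Bayes model over word frequencies. This is a stand-in technique, not a
description of PlainClassify's internals (which, according to the text, also
model word contexts and phrases). The names train and classify and the
threshold parameter are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Build a simple data model from (text, class_name) examples."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocab}


def classify(model, text, threshold=0.3):
    """Return [(class_name, confidence)] for all classes above threshold.

    The threshold is a hypothetical cut-off: every class whose
    confidence reaches it is allocated, so a text can land in
    several classes at once.
    """
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]

    log_scores = {}
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for t in tokens:
            # Laplace smoothing keeps unseen words from zeroing a class.
            score += math.log((counts[t] + 1) / (total + vocab_size))
        log_scores[label] = score

    # Normalize the log scores into confidences that sum to 1.
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    z = sum(exp.values())
    ranked = sorted(((label, e / z) for label, e in exp.items()),
                    key=lambda p: (-p[1], p[0]))
    return [p for p in ranked if p[1] >= threshold]
```

Retraining in this sketch simply means calling train again with an updated
set of examples, which mirrors the idea that the model is refreshed from
data rather than by editing rules.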

[Figure: Document / Result of classification]

The self-learned characteristics are collected in a data model following
training. During each classification PlainClassify uses the appropriate
characteristics from this data model (above). In addition to word
frequencies, word contexts and phrases are also modeled. Training of the
data model is a sensitive part of the classification application and should
therefore include special care in the selection of the documents to be used
for training. It is especially important to have a realistic reflection of
the contents of the documents to be classified included in the training
documents. Difficult documents should be included and in sufficient quantity
for the training in order to be able to handle new texts of similar type.
PlainClassify interactively classifies both individual documents and
multiple documents at the same time. To start the classification,
right-click on the document to be classified to open the Context Menu and
select PlainClassify.
You can use the result of the classification in many ways. The user, for
example, can search for text in a limited subject matter area using the
included search function. Of course, the Windream search function is also
supported and includes the class index. An automatic classification
ultimately serves the development and long-term assurance of a selected
filing system. The increased efficiency of the automatic system is
especially significant during the initial installation of the Windream
system and the transfer of the existing inventory of documents.
In addition to the interactive classification, the PlainKnowledge Server
(left) allows routine classification and optional forwarding of the
classified texts depending on the allocated classification index. One
possible application of the Server is the automatic classification and
distribution of e-mails according to their content to the appropriate
employees. Other routinely arriving documents, such as ticker messages or
news, can be mined and efficiently processed without manual intervention.
PlainSummarize
PlainSummarize abstracts unstructured text according to content. The most
important sentences are extracted from a text based on an evaluation of the
degree of relevance of the words used. The length of the summary can be
selected freely as a percentage of the original text. Both individual texts
and whole directories can be summarized and displayed or saved. Text
summaries allow an efficient overview of the data inventory. PlainSummarize
also makes it easier to find information when a search has returned
extensive results.
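
The extraction approach described above can be sketched in a few lines. This
is an illustrative simplification, not PlainSummarize itself: word relevance
is approximated by plain word frequency, the summary length is taken as a
percentage of the sentence count rather than of the text length, and the
stop-word list and function names are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are"}


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase words of a sentence with stop words removed."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, percent):
    """Return the most relevant sentences, kept in their original order.

    percent: summary length as a percentage of the sentence count.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average frequency of the sentence's words across the text.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, math.ceil(len(sentences) * percent / 100))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Calling summarize(text, 25) on a four-sentence text keeps the single
highest-scoring sentence. Applying the same function to every file in a
directory would correspond to the directory-wide summaries mentioned above.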
PlainRetrieve
PlainRetrieve adds an associative, i.e. semantically based, document search
to the Windream search function, which already excels at full-text and index
searches. The query and the document inventory are evaluated based on
content and both are set in relation to each other. The result is a weighted
list of documents from the Windream database. The efficient evaluation of
the query allows the entry of not only a short keyword but also of entire
text fragments. In addition to listing the relevant documents, PlainRetrieve
evaluates the document inventory in view of possible continuing searches.
The evaluation results in semantic markers which show possible refinements
of the search in the form of keywords. The iterative execution of
PlainRetrieve, using these keywords, efficiently guides the user to the
information they are searching for. Because of the strong interactivity
between the system and the user, however, it does not force the user in a
certain direction. The display of the search history rounds off
PlainRetrieve into a high-performance search engine for unstructured texts
within the Windream system.
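
A weighted document list and keyword suggestions for refining a search can be
sketched with standard TF-IDF weighting and cosine similarity. This is a
generic stand-in and not PlainRetrieve's actual algorithm, which the text
does not specify. The functions search and refinement_keywords are
hypothetical names chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for each term in the collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text, idf):
    """TF-IDF vector as a dict; terms unknown to the collection are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Return (score, doc_index) pairs for matching documents, best first."""
    idf = build_idf(docs)
    q = vectorize(query, idf)
    scored = [(cosine(q, vectorize(d, idf)), i) for i, d in enumerate(docs)]
    return sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))


def refinement_keywords(query, docs, hits, top_n=3):
    """Suggest terms from the hit documents that the query does not contain."""
    idf = build_idf(docs)
    seen = set(tokenize(query))
    weights = Counter()
    for score, i in hits:
        for term, w in vectorize(docs[i], idf).items():
            if term not in seen:
                weights[term] += score * w
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]
```

Because the query is vectorized like any document, it can be a single keyword
or a whole text fragment. Appending one of the suggested keywords to the
query and calling search again corresponds to the iterative refinement
described above.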

Light-Version:
- PlainCluster
  - 5, 7 and 10 classes
- PlainClassify
  - Flat class structure, simple allocation of classes
  - Training units:
    - min. 10 up to max. 25 documents per class
    - max. 25 classes
- PlainSummarize
  - 25% summarization

Pro-Version:
- PlainCluster
  - Any number of classes
- PlainClassify
  - Hierarchical class structure, multiple class allocation
  - Training units:
    - no limitation of the number of documents or classes

Enterprise-Version:
- Same as Pro-Version
- PlainRetrieve
  - Associative search with semantic markers