Additional document information (ADI) is the information needed by weighting schemes that take properties of whole documents into account.
#include <CAdditionalDocumentInformation.h>
Public Member Functions | |
CAdditionalDocumentInformation (const string &inName="N.N.") | |
Constructor sets filename. | |
void | setFileNameBase (const string &inName) |
If necessary, set the filename later. | |
void | resetDF () |
Reset mMaximumDF and mDFSquareSum. | |
void | adjustDF (double inDF) |
Update mMaximumDF and mDFSquareSum with the document frequency inDF. | |
void | resetSquareDFLogICF () |
Reset mSquareDFLogICFSum. | |
void | adjustSquareDFLogICF (double) |
Add a feature's contribution to mSquareDFLogICFSum. | |
Accessors | |
double | getMaximumDF () const |
double | getDFSquareSum () const |
double | getSquareDFLogICFSum () const |
bool | output (ostream &outStream) const |
bool | output () const |
bool | input (istream &inStream) |
bool | input () |
Protected Attributes | |
string | mFileNameBase |
Filename of the document (from which the ADI file will be built) | |
double | mMaximumDF |
Maximum document frequency of any feature in the document. | |
double | mDFSquareSum |
Sum of the squared document frequencies of all features of the document. | |
double | mSquareDFLogICFSum |
Sum of (DF*DF*log(ICF)) for all features of the document. | |
Additional document information (ADI) is the information needed by weighting schemes that take properties of whole documents into account.
Quantities such as the Euclidean length of a vector have to be calculated beforehand.
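The following is a minimal sketch of how per-document sums of this kind might be accumulated before weighting. It is not the implementation of CAdditionalDocumentInformation: the `AdiSketch` and `Feature` types and the `summarize` helper are hypothetical names introduced for illustration. It also assumes that adjustDF tracks the maximum and adds the squared value, and that adjustSquareDFLogICF receives an already computed DF*DF*log(ICF) term. The page above documents only the member names, not their bodies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the accumulators documented above.
// This is NOT the real CAdditionalDocumentInformation class.
struct AdiSketch {
    double maximumDF = 0.0;          // mirrors mMaximumDF
    double dfSquareSum = 0.0;        // mirrors mDFSquareSum
    double squareDFLogICFSum = 0.0;  // mirrors mSquareDFLogICFSum

    void resetDF() {
        maximumDF = 0.0;
        dfSquareSum = 0.0;
    }

    // Assumption: track the largest DF seen and sum the squares.
    void adjustDF(double df) {
        maximumDF = std::max(maximumDF, df);
        dfSquareSum += df * df;
    }

    void resetSquareDFLogICF() { squareDFLogICFSum = 0.0; }

    // Assumption: the caller passes the finished DF*DF*log(ICF) term.
    void adjustSquareDFLogICF(double term) { squareDFLogICFSum += term; }
};

// Hypothetical per-feature record: document frequency and ICF value.
struct Feature {
    double df;
    double icf;
};

// One pass over the features of a document fills all three sums.
AdiSketch summarize(const std::vector<Feature>& features) {
    AdiSketch adi;
    adi.resetDF();
    adi.resetSquareDFLogICF();
    for (const Feature& f : features) {
        assert(f.icf > 0.0);  // log() needs a positive argument
        adi.adjustDF(f.df);
        adi.adjustSquareDFLogICF(f.df * f.df * std::log(f.icf));
    }
    return adi;
}
```

With sums like these stored per document, a weighting scheme could obtain the Euclidean length of the DF vector as `std::sqrt(adi.dfSquareSum)` without revisiting every feature at query time.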