Topic Management and Disambiguation

Precisely defining and disambiguating search queries is crucial for obtaining high-quality results, both when modifying the global filter setting and when managing your own personal bookmarks.

Defining and Accessing Bookmarks

As a registered user, you can use the left sidebar to create and edit bookmarks, combine multiple bookmarks into complex queries or set customized email alerts (hovering over a bookmark reveals a gear icon that opens the overlay menu). You also have access to the full set of advanced search options when defining or revising a bookmark. This includes the ability to filter by metadata elements and to use logical operators (AND, OR, AND NOT) to combine various filters and restrictions. Even complex queries that search the full text, title or URL of documents can be stored, revised and later accessed at the click of a button.

It is possible to exclude certain aspects of recent coverage, e.g., by restricting queries to one or more countries or by selecting content from a specific set of Web sites. Clicking on a bookmark label activates a tooltip to revise the global filter, while clicking on the small checkbox activates (or deactivates) the corresponding search. All matching documents are included in the list of search results and used for computing various charts and metrics. The label of the bookmark itself is not considered in the matching process.

For ad-hoc queries, simple text fields typically suffice. Defining and disambiguating topics, however, often requires a larger number of terms. To properly describe abstract concepts like “climate change” or popular but ambiguous brand names such as “Amazon”, “Apple”, “Gap” and “Three”, one needs to consider synonyms, singular and plural versions of a term, grammatical variations, lists of related products and services, etc. The phrase editor, as outlined in the next section, facilitates the inclusion of such variations in a search query.

Accessing the Phrase Editor

Users can search for ‘at least one word’, ‘all words’ or the ‘exact phrase’ using wildcard characters (‘*’, ‘?’), or select ‘list of phrases’ to activate the Phrase Editor. This editor allows you to define and manage lists of terms. It expects one word or phrase per line. There is no need to use quotation marks to mark phrases such as big data. The column on the right shows the number of documents matching the query defined by this particular line – considering the currently selected content source(s) and time interval. The lines can be sorted alphabetically, or by the number of matching documents.

Each line can contain (i) a single word, (ii) a phrase, or (iii) a regular expression (RegExp) that supports optional wildcards for defining queries more effectively. The Topic Editor uses the following simplified RegExp notation:

  • Question marks instruct the system to treat the preceding token as optional; ‘networks?’, for example, considers both the singular (‘network’) as well as the plural (‘networks’) of the term.
  • Brackets support the grouping of tokens. While ‘networks?’ is identical to ‘network(s)?’, brackets are mandatory to mark more than one character as optional; e.g. ‘network(ed)?’ > ‘network’, ‘networked’.
  • Vertical bars ‘|’ represent an “or” operator, considering the document whenever one or more of its operands match; e.g. ‘network(s|ed)?’ > ‘network’, ‘networks’, ‘networked’.
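
The three notation rules above can be illustrated with a small expansion routine. This is a minimal sketch, not the editor's actual implementation: the function name `expand` is hypothetical, and it assumes that a line uses only literal characters, brackets, vertical bars and question marks.

```python
def expand(pattern):
    """Expand a line written with literals, ( ), | and ? into every phrase it matches.

    Hypothetical helper for illustration only; not the editor's own code.
    """
    pos = 0  # current read position in the pattern

    def parse_alternation():
        # One or more sequences separated by '|'.
        nonlocal pos
        results = parse_sequence()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results = results + parse_sequence()
        return results

    def parse_sequence():
        # Concatenation of atoms, each optionally followed by '?'.
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)":
            if pattern[pos] == "(":
                pos += 1
                atom = parse_alternation()
                if pos >= len(pattern) or pattern[pos] != ")":
                    raise ValueError("unbalanced bracket")
                pos += 1
            else:
                atom = [pattern[pos]]
                pos += 1
            if pos < len(pattern) and pattern[pos] == "?":
                atom = [""] + atom  # the preceding token may be absent
                pos += 1
            results = [r + a for r in results for a in atom]
        return results

    expanded = parse_alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced bracket")
    return list(dict.fromkeys(expanded))  # drop duplicates, keep order


print(expand("networks?"))       # ['network', 'networks']
print(expand("network(ed)?"))    # ['network', 'networked']
print(expand("network(s|ed)?"))  # ['network', 'networks', 'networked']
```

Because this notation is a subset of common regular expression syntax, each expanded phrase also matches the original line when checked with Python's `re.fullmatch`.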

On mouse-over, the editor provides an expand/shrink option to preview a list of all phrases matching the regular expression. The gear icon opens a tooltip to access the Topic Wizard, a visual editor for regular expressions. If the line is not in a valid RegExp format supported by the editor, it will show a brief help text and disable the wizard for this particular line.

Additional Options

Below the text input lines, users can (i) activate on-the-fly negation detection with standard prefix sets for different parts of speech, and (ii) specify the minimum number of RegExp lines that a document must match to be included in the search results. This can improve the precision of the query at the cost of lower recall, especially for terms that are ambiguous without additional context information.
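
The effect of the minimum-match setting can be sketched as follows. This is an illustration under stated assumptions rather than the platform's matching logic: the function names, the case-insensitive whole-word matching and the sample term list are all hypothetical.

```python
import re


def count_matching_lines(lines, text):
    """Count how many editor lines match the text at least once.

    Assumption: case-insensitive, whole-word matching.
    """
    count = 0
    for line in lines:
        if re.search(r"\b(?:" + line + r")\b", text, flags=re.IGNORECASE):
            count += 1
    return count


def is_included(lines, text, min_matches=1):
    """Keep a document only if it matches at least min_matches lines."""
    return count_matching_lines(lines, text) >= min_matches


# Hypothetical term list for the ambiguous brand name "Apple".
lines = ["apple", "iphones?", "mac(book)?s?"]
brand_doc = "Apple unveiled two new iPhones today."
fruit_doc = "An apple a day keeps the doctor away."

print(is_included(lines, brand_doc, min_matches=2))  # True
print(is_included(lines, fruit_doc, min_matches=2))  # False
print(is_included(lines, fruit_doc, min_matches=1))  # True
```

With a threshold of two, the document about fruit is dropped because it only matches the ambiguous term itself. A document that mentions the brand without any of the other listed terms would be dropped as well, which is the recall cost described above.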

Both the regular expression list and the expanded term list can be exported as a comma-separated values (CSV) or plain text file. To import existing term lists, users can copy/paste text into the editor, which automatically creates the required number of lines.

Special Characters

In full-text fields (text, title), the system treats each of the reserved characters listed below as a word separator. Therefore, it is not possible to search directly for one specific special character.

  • List of reserved characters:
    + - = && || > < ! ( ) { } [ ] ^ " ~ * ? . , : \ /
  • Example: When searching for “climate-change”, the “-” is treated as a word separator (similar to a white space) and will also match documents containing “climate change”, “climate&change”, “climate/change”, etc.
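
The separator behavior in this example can be approximated with a simple tokenizer. This is a rough sketch, not the search engine's real analyzer: the names `RESERVED_CHARS`, `SEPARATORS` and `tokenize` are hypothetical, the character set is transcribed from the list above (assuming an ASCII hyphen and a straight double quote), and lowercasing is an added assumption.

```python
import re

# Single characters taken from the reserved list; '&&' and '||' are covered
# by treating '&' and '|' individually, as the "climate&change" example suggests.
RESERVED_CHARS = '+-=&|><!(){}[]^"~*?.,:\\/'

# One or more reserved characters or whitespace act as a single separator.
SEPARATORS = re.compile("[" + re.escape(RESERVED_CHARS) + r"\s]+")


def tokenize(text):
    """Split text into lowercase words, treating reserved characters like spaces."""
    return [token for token in SEPARATORS.split(text.lower()) if token]


for query in ["climate-change", "climate change", "climate&change", "climate/change"]:
    print(query, "->", tokenize(query))
# Each query yields ['climate', 'change']
```

Since all four spellings produce the same word sequence, a search for one of them cannot be told apart from the others.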