Scribo KDE roadmap
From Mandriva Community Wiki
Scribo stands for Semi-automatic Collaborative Retrieval of Information Based on Ontologies. The project has the hallmark of the System@tic competitiveness cluster http://www.systematic-paris-region.org. The project's Web site is . Scribo is delivering 1) a set of natural language processing engines capable of adding semantic annotations to documents (identification of named entities, of coreferences, of relations between entities, term disambiguation, etc.), 2) tools for managing annotations on the KDE desktop, and 3) applications dedicated to specific activities: activity management, Linux documentation annotation, and press article annotation.
This page is about the KDE roadmap of the Scribo developments in 2009 and 2010.
Note on Scribo analysis engines input/output
The NLP engines designed within Scribo take as input an XML representation of a document containing the headings and the body of the document. They return a set of annotations as an XML file that contains, for each annotation:
- the text fragment the annotation relates to,
- the context of the text fragment, i.e. n lines before the fragment, n lines after the fragment, and the position of the fragment in the input XML document section,
- the content of the annotation as a set of triples serialized in XML.
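As an illustration of the three parts listed above, here is a minimal sketch of how a client could read such an annotation file. The element and attribute names (`annotation`, `fragment`, `context`, `before`, `after`, `triple`) are assumptions made up for this example; they are not the actual Scribo schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation file. Every element and attribute name here is an
# assumption for illustration, not the real Scribo output format.
SAMPLE = """\
<annotations>
  <annotation>
    <fragment>Paris</fragment>
    <context section="body" offset="19">
      <before>Sebastian lives in</before>
      <after>and works on KDE.</after>
    </context>
    <triples>
      <triple subject="urn:fragment:1" predicate="rdf:type" object="pimo:City"/>
      <triple subject="urn:fragment:1" predicate="rdfs:label" object="Paris"/>
    </triples>
  </annotation>
</annotations>
"""


@dataclass
class Annotation:
    fragment: str   # the text fragment the annotation relates to
    section: str    # section of the input document holding the fragment
    offset: int     # position of the fragment inside that section
    before: str     # context preceding the fragment
    after: str      # context following the fragment
    triples: list   # (subject, predicate, object) tuples


def parse_annotations(xml_text):
    """Turn the (hypothetical) annotation XML into Annotation objects."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("annotation"):
        context = node.find("context")
        triples = [
            (t.get("subject"), t.get("predicate"), t.get("object"))
            for t in node.findall("triples/triple")
        ]
        result.append(Annotation(
            fragment=node.findtext("fragment"),
            section=context.get("section"),
            offset=int(context.get("offset")),
            before=context.findtext("before", default=""),
            after=context.findtext("after", default=""),
            triples=triples,
        ))
    return result


for annotation in parse_annotations(SAMPLE):
    print(annotation.fragment, annotation.triples)
```

Keeping the context next to the fragment would let a client locate the fragment again even if the stored offset no longer matches the document.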
Mandriva roadmap ideas
(All time estimates are given in full days. A real day is seldom "full" of development time.)
|Capability to highlight fragments in HTML files||Just as in plain text files, we would also like to highlight the annotations in HTML files. This should be done in the annotation window below and in the Konqueror integration below.
||1 day for integrating HTML into the annotation window||ongoing|
|Creation of an annotation sidebar for Konqueror||Create a sidebar for Konqueror similar to the Firefox OpenCalais plugin. The plugin lets the user configure the analysis engine to be used (OpenCalais, Scribo, Alchemy, etc.). When the user has a page analysed, the plugin highlights the identified text fragments in different colors.||
|Capability to manually add semantic annotations to a URL or to text fragments in an HTML page.||Manual annotations||
|Integrate automated text analysis results into Okular||The simplest way to implement this would be to add annotation support through Nepomuk/Scribo to Okular.|| Existing parts:
||Hard to say. This could take longer since it may involve changing some parts of Okular.|
|Manual annotations in Okular||Okular already provides annotations of some kind as mentioned above. This task will add manual annotations to the semi-automatic ones. This means that the user can select a passage in the PDF and then link this passage with some resource, tag it, or make it a resource (for example, selecting "Paris" and then stating that this is the city Paris).||Integrate the manual annotation system into Okular. For a first prototype it should be sufficient to use a context menu and a separate dialog similar to the annotation window discussed above.||Once the annotation window and the semi-automatic annotations in Okular are done, this should be fairly easy and take about 2 days.|
|CEA annotation engine integration||CEA provides another web service for text analysis. This service should be integrated into the Scribo framework as a plugin.||
|Integrate manual/automated annotation capabilities into KMail||
|Define actions for extracted entities||Actions can be defined for extracted entities such as cities or persons. The simplest one could be to open Google Maps for the extracted city. Idea: create something like the mimetype actions? Maybe also using desktop files?|
|Create a framework for defining actions based on annotation recommendations and information in the Nepomuk store|
|Define a set of standard actions and tie them to typical RDF classes (such as pimo:City maybe or OpenCalais' city class)|
|Integrate the system into the test shell|
|Create test data||ongoing|
|Capability to highlight fragments in text files (with various colors)||The automatic annotations created by the Scribo system mentioned above optionally relate to positions in the text. These positions should be highlighted.|| Existing parts:
||1-2 days once the annotation window above is done.||done|
|DONE Context action for launching a text analysis on a file from Dolphin||The user selects a file in Dolphin and clicks the "annotate" action. A window opens which provides the means to create manual annotations (tags, comments, relations to pimo things, relations to arbitrary things). In the background the Scribo system creates possible annotations using the annotation plugin system. The generated annotations are proposed to the user. The user can accept them with a simple click, ignore them (by doing nothing), or reject them as useless or wrong. The window also provides a means to configure the plugin system (choosing which plugins should be used for annotation creation: OpenCalais, DERI engine, Proxem, or others). The window also shows all current annotations set for the file. Whenever the user accepts an annotation or creates a new manual one, the view is updated.||
|As most of the code already exists and only needs to be combined, I would estimate 2-3 full days.||done|
|DONE UI for sending feedback to the annotation engine||Certain suggested semi-automatic annotations may be completely wrong. In this case it would be good to allow the user to give feedback by "telling" the system about the error. Most likely a reject button would be enough (see above).||
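The highlighting tasks in the table above rely on annotations that carry positions in the text. The following is a minimal sketch of that step, not the code used in the annotation window: it assumes each annotation is a `(start, end, type)` tuple of character offsets, that spans do not overlap, and that the color palette is a made-up one.

```python
import html

# Hypothetical palette: one background color per annotation type.
COLORS = {"City": "#ffd54f", "Person": "#81d4fa"}
DEFAULT_COLOR = "#e0e0e0"


def highlight(text, annotations):
    """Return HTML where every (start, end, type) span is wrapped in a colored <span>.

    Offsets are character positions in `text`. Spans are assumed not to overlap.
    """
    parts = []
    cursor = 0
    for start, end, kind in sorted(annotations):
        # Copy the unannotated text before the span, escaped for HTML.
        parts.append(html.escape(text[cursor:start]))
        color = COLORS.get(kind, DEFAULT_COLOR)
        parts.append(
            '<span style="background-color: {}">{}</span>'.format(
                color, html.escape(text[start:end])
            )
        )
        cursor = end
    # Copy whatever follows the last span.
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


text = "Sebastian lives in Paris."
city_start = text.index("Paris")
print(highlight(text, [(city_start, city_start + len("Paris"), "City")]))
```

The same function covers plain text and the text nodes of an HTML page, since everything outside the spans is escaped before it is emitted.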
Ideas for actions
|City, Country||Open Google Maps|
|Persons||Write email to|
|Dates||Create a date in the calendar (if possible use context: propose extracted entities as events)|
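The action ideas above could be organised as a small registry that ties entity types to actions, much like mimetype actions tie file types to applications. This is only a sketch under assumptions: the registry, the entity dictionaries, and the type names are hypothetical, and a real implementation would key on RDF classes from the Nepomuk store rather than plain strings.

```python
from urllib.parse import quote, quote_plus

# Registry: entity type -> list of (action title, function that builds a URL).
# The type names are plain strings here; tying them to RDF classes is left open.
ACTIONS = {}


def register(entity_type, title):
    """Decorator that registers a URL-building function as an action for a type."""
    def decorator(func):
        ACTIONS.setdefault(entity_type, []).append((title, func))
        return func
    return decorator


@register("City", "Open Google Maps")
@register("Country", "Open Google Maps")
def map_url(entity):
    # Search URL for the entity label.
    return "https://www.google.com/maps/search/?api=1&query=" + quote_plus(entity["label"])


@register("Person", "Write email to")
def mailto_url(entity):
    # Assumes the address is already known, e.g. from the address book.
    return "mailto:" + quote(entity["email"], safe="@")


def actions_for(entity):
    """Return (title, url) pairs for all actions registered for the entity's type."""
    return [(title, build(entity)) for title, build in ACTIONS.get(entity["type"], [])]


print(actions_for({"type": "City", "label": "New York"}))
```

New actions, such as the calendar entry for dates, would be added by registering one more function, without touching the code that displays them.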
Use information from the Nepomuk store
There could already be information about extracted entities (or about the resource itself) in the Nepomuk store. This should be presented to the user in a tooltip when hovering over the entity in the text. This tooltip could also be combined with the actions idea above: show a map in the tooltip and maybe even mark known points of interest from the Nepomuk store on the map.
- In KDE's playground we have a system for presenting arbitrary information from Nepomuk based on templates. This could be used and improved here.
- Map the extracted entities to entities in the Nepomuk store
- Extract all information about the mapped entities from Nepomuk and present them
- Use the framework for annotating doc4 contents (see http://doc4.mandriva.org)
- Consider developments for OpenOffice integration? KOffice integration?
- Realize an activity-oriented desktop: create activity-oriented plasmoids for various desktop objects: http://wiki.mandriva.com/en/Nepomuk-Scribo_task_oriented_desktop
- Standardisation at the freedesktop / OSCAF levels
- Annotations should mostly be based on PIMO. This means that, for example, the OpenCalais plugin is supposed to create a pimo thing which has the extracted OpenCalais resource as an occurrence. If possible, the pimo thing's type should match the OpenCalais class (unsure: should we also create new pimo classes in certain situations?)
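The PIMO-based approach in the last point, together with the idea of mapping extracted entities to entities already in the store, can be sketched as follows. This is a plain in-memory stand-in and does not use the Nepomuk API; the `Store` class, the class-to-type table, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from extracted class names to PIMO-style type names.
CLASS_TO_PIMO = {"City": "pimo:City", "Person": "pimo:Person"}
FALLBACK_TYPE = "pimo:Thing"


@dataclass
class Thing:
    label: str
    pimo_type: str
    occurrences: list = field(default_factory=list)  # URIs of extracted resources


class Store:
    """In-memory stand-in for the Nepomuk store (not the real API)."""

    def __init__(self):
        self._things = {}

    def thing_for(self, label, extracted_class, resource_uri):
        """Find or create the thing for an extracted entity and record the occurrence."""
        pimo_type = CLASS_TO_PIMO.get(extracted_class, FALLBACK_TYPE)
        # Naive matching on case-insensitive label plus type; an assumption,
        # real entity mapping would need something more robust.
        key = (label.casefold(), pimo_type)
        thing = self._things.get(key)
        if thing is None:
            thing = Thing(label, pimo_type)
            self._things[key] = thing
        if resource_uri not in thing.occurrences:
            thing.occurrences.append(resource_uri)
        return thing


store = Store()
paris = store.thing_for("Paris", "City", "urn:example:calais:1")
print(paris.pimo_type, paris.occurrences)
```

Classes without a matching type fall back to a generic thing here, which sidesteps the open question of whether new pimo classes should be created.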
Created by Sebastian and Stéphane, May 2009