GramLab Corpus Manager is an open source solution for converting and consolidating a set of heterogeneous documents
A first step toward an open source solution for the linguistic processing and analysis of unstructured data from Open Data and Big Data
Paris, July 20th, 2012 – GramLab, a research project funded by FEDER and labeled by Cap Digital, today announces the release of GramLab Corpus Manager. The tool collects documents in different formats, converts them into a common format (XML TEI Lite), groups them into collections arranged by subject, and then combines each collection into a single file.
GramLab Corpus Manager makes it possible to analyze and process a large set of heterogeneous documents, a problem computational linguists frequently face when creating or testing a new grammar. Once the “corpus” has been converted, organized by theme and compiled into a single document, GramLab IDE lets the linguist build computational grammars that analyze the document's content and extract, for example, every occurrence of a given subject, regardless of the wording used.
Objectives of GramLab
The GramLab Consortium aims to provide free, open source solutions for everyone (software vendors, SMEs, academics, etc.), whether or not they specialize in computational linguistics. These solutions can, for example, analyze the data now released through Open Data initiatives or the Big Data that is overwhelming the corporate world.
Tools available through the GramLab Project
The tools provided in the GramLab framework range from the development of Information Retrieval solutions to Content Management and the management of Big Data or Open Data. Tools coming soon:
- GramLab IDE, an environment dedicated to creating and maintaining computational grammars and linguistic resources for automatic analysis.
- GramLab linguistic resources, such as corpora, computational grammars and lexicons.
- GramLab Annotators, a generic UIMA annotator that performs text content analytics.
The GramLab Team
The GramLab Consortium is composed of four SMEs, an academic research laboratory and a non-profit organization:
- Kwaga, creator of WriteThat.Name, the semantic platform that analyzes emails and automatically updates the address book.
- Actimos, creator of Wipolo, the travel companion.
- LIGM (Université de Paris-Est Marne-La-Vallée), a public research laboratory.
- Lingway, developer of specialized solutions based on powerful multilingual semantic tools and business-specific linguistic resources.
- Qwam Content Intelligence, provider of software solutions for managing, searching and accessing electronic content.
- L’Aproged, a professional association for the digital economy.
GramLab Corpus Manager can be downloaded for free from the GramLab forge.
The GramLab Project aims to develop a platform for the rapid creation and robust execution of local grammars (FSA).
This open source project is designed to enrich both the scientific and industrial communities by developing and disseminating resources built around hand-crafted local computational grammars, the basic components of modules that analyze written text in all its diversity (emails, blogs, SMS, chat, etc.).
The GramLab Project is funded by FEDER (the European Regional Development Fund) and has been labeled by Cap Digital.
For more information and to sign up for the beta program: www.gramlab.org