Tag archive: RÉALISATIONS (projects)
MappSent, measuring Text-to-Text Similarity
MappSent, Python system implementing a Mapping Approach for measuring Text-to-Text Similarity
- Based on a linear embedding representation of text segments (e.g. sentences), its principle is to build a matrix that maps text segments into a joint subspace where similar sets of segments are pushed closer together.
- We evaluate our approach on the SemEval 2016 and 2017 question-to-question similarity tasks and show that, overall, MappSent achieves competitive results and in most cases outperforms state-of-the-art methods.
Download the sources (under Apache v2 license)
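The principle described above (a linear segment embedding followed by a matrix mapping into a subspace) can be illustrated with a small toy sketch. This is not MappSent's code: the word vectors, the matrix W and the helper names embed, apply_mapping and cosine are all hypothetical, and W is written by hand here, whereas MappSent learns its mapping from data.

```python
import math

def embed(sentence, word_vectors, dim):
    """Average the vectors of known words (a linear sentence embedding)."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def apply_mapping(vec, W):
    """Project a row vector into the subspace: vec @ W, with W of shape (dim, k)."""
    return [sum(vec[i] * W[i][j] for i in range(len(vec))) for j in range(len(W[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy, hand-made vectors (assumption): the third dimension acts as noise.
WORD_VECTORS = {
    "reset":    [1.0, 0.0, 0.9],
    "recover":  [1.0, 0.1, -0.9],
    "password": [0.0, 1.0, 0.0],
}

# Hypothetical mapping that keeps the first two dimensions and drops the third.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

q1 = embed("reset password", WORD_VECTORS, dim=3)
q2 = embed("recover password", WORD_VECTORS, dim=3)

before = cosine(q1, q2)
after = cosine(apply_mapping(q1, W), apply_mapping(q2, W))
print(f"similarity before mapping: {before:.3f}, after mapping: {after:.3f}")
```

In this toy setting the two questions score as more similar once both embeddings are projected with W, which is the effect the mapping is meant to produce for similar segments.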
EXIDE, Extracting information from presentations
EXIDE, Python module for information extraction (logical structure…) from presentation documents
- Supported file types: Office Open XML (PPTX), OpenDocument (ODP), LaTeX beamer
- Among the extracted information: general presentation structure and outline, slide titles, body text, emphasized text, …
Download the sources (under a GNU GPL v3 license)
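As an illustration of what extracting a slide title from a PPTX file involves, here is a minimal sketch using only the Python standard library. It is not EXIDE's implementation: the function name slide_title and the simplified SAMPLE slide are assumptions made for the example. In a real PPTX archive, each slide is an XML part that could be read with the zipfile module before being passed to a function like this one.

```python
import xml.etree.ElementTree as ET

# Namespaces used by PresentationML (slides) and DrawingML (text runs).
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

def slide_title(slide_xml):
    """Return the text of the first title placeholder in a slide, or None."""
    root = ET.fromstring(slide_xml)
    for shape in root.iter("{%s}sp" % NS["p"]):
        placeholder = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
        if placeholder is not None and placeholder.get("type") in ("title", "ctrTitle"):
            # Concatenate all text runs found inside the title shape.
            return "".join(t.text or "" for t in shape.iter("{%s}t" % NS["a"]))
    return None

# Simplified, hand-written slide (assumption): one title shape, one body shape.
SAMPLE = """<p:sld
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:cSld><p:spTree>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="2" name="Title 1"/><p:cNvSpPr/>
        <p:nvPr><p:ph type="title"/></p:nvPr></p:nvSpPr>
      <p:spPr/>
      <p:txBody><a:bodyPr/><a:p><a:r><a:t>Results</a:t></a:r></a:p></p:txBody>
    </p:sp>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="3" name="Content 2"/><p:cNvSpPr/>
        <p:nvPr><p:ph type="body" idx="1"/></p:nvPr></p:nvSpPr>
      <p:spPr/>
      <p:txBody><a:bodyPr/><a:p><a:r><a:t>Body text</a:t></a:r></a:p></p:txBody>
    </p:sp>
  </p:spTree></p:cSld>
</p:sld>"""

print(slide_title(SAMPLE))
```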
PyRATA, Python Rule-based feAture sTructure Analysis
- provides regular expression (re) matching methods on a more complex structure than a list of characters (string), namely a sequence of feature sets (i.e. a list of dict in Python jargon);
- is free from the information encapsulated in the features and can consequently work with word features, sentence features, calendar event features… Indeed, PyRATA is not dedicated solely to processing textual data.
- is fun and easy to use to explore data for research studies, solve deterministic problems, formulate expert knowledge in a declarative way, quickly prototype models, generate training data for Machine Learning (ML) systems, extract ML features, augment ML models…
Download the sources (under Apache v2 license)
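To make the idea of pattern matching over a list of dict concrete, here is a minimal pure-Python sketch. It does not use PyRATA and does not reproduce its API or pattern syntax: the search function and the pattern format (a list of constraint dicts) are assumptions made for illustration, and only fixed-length sequences are handled, without quantifiers or alternation.

```python
def search(pattern, data):
    """Return (start, end) of the first span of `data` matching `pattern`, or None.

    `pattern` is a list of constraint dicts; element i of the matched span
    must contain every key/value pair of constraint i.
    """
    n = len(pattern)
    for start in range(len(data) - n + 1):
        window = data[start:start + n]
        if all(
            all(token.get(key) == value for key, value in step.items())
            for step, token in zip(pattern, window)
        ):
            return start, start + n
    return None

# A sentence represented as a sequence of feature sets (toy annotations).
data = [
    {"raw": "The", "pos": "DT"},
    {"raw": "quick", "pos": "JJ"},
    {"raw": "fox", "pos": "NN"},
    {"raw": "jumps", "pos": "VBZ"},
]

# An adjective immediately followed by a noun.
span = search([{"pos": "JJ"}, {"pos": "NN"}], data)
if span is not None:
    print([token["raw"] for token in data[span[0]:span[1]]])
```

Because the matcher only compares keys and values, the same function works on any feature sets, not only on word annotations, for example dicts describing calendar events.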