Category archives: Projects

MappSent, measuring Text-to-Text Similarity

MappSent, a Python system implementing a mapping approach for measuring text-to-text similarity

  • Based on a linear embedding representation of text segments (e.g. sentences), its principle is to build a matrix that maps text segments into a joint subspace where sets of similar segments are pushed closer together.
  • We evaluate our approach on the SemEval 2016 and 2017 question-to-question similarity tasks and show that, overall, MappSent achieves competitive results and in most cases outperforms state-of-the-art methods.
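
The mapping idea described above can be illustrated with a minimal sketch. It is not MappSent's actual code or training procedure: the word vectors are made-up toy values, the segment embedding is assumed to be a plain sum of word vectors, and the mapping matrix is fitted here with ordinary least squares (NumPy's `lstsq`) as a stand-in. The names `embed`, `fit_mapping` and `similarity` are hypothetical.

```python
import numpy as np

# Toy 3-d word vectors; hypothetical values for illustration only.
WORD_VECTORS = {
    "reset": np.array([1.0, 0.0, 0.0]),
    "password": np.array([0.0, 1.0, 0.0]),
    "recover": np.array([0.0, 0.0, 1.0]),
    "login": np.array([0.5, 0.5, 0.0]),
    "book": np.array([0.2, 0.0, 0.9]),
    "flight": np.array([0.9, 0.1, 0.3]),
    "reserve": np.array([0.1, 0.8, 0.4]),
    "plane": np.array([0.3, 0.3, 0.3]),
}


def embed(segment):
    """Linear segment embedding: the sum of the known word vectors."""
    vecs = [WORD_VECTORS[w] for w in segment.lower().split() if w in WORD_VECTORS]
    return np.sum(vecs, axis=0) if vecs else np.zeros(3)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is null."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def fit_mapping(pairs):
    """Fit a matrix W such that embed(a) @ W is close to embed(b)
    for every pair (a, b) of segments known to be similar.
    Assumption: least squares replaces the real training objective."""
    X = np.stack([embed(a) for a, _ in pairs])
    Y = np.stack([embed(b) for _, b in pairs])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


similar_pairs = [
    ("reset password", "recover login"),
    ("book flight", "reserve plane"),
]
W = fit_mapping(similar_pairs)


def similarity(a, b):
    """Score two segments after mapping the first one with W."""
    return cosine(embed(a) @ W, embed(b))


raw = cosine(embed("reset password"), embed("recover login"))
mapped = similarity("reset password", "recover login")
print(f"cosine before mapping: {raw:.3f}, after mapping: {mapped:.3f}")
```

On this toy data the mapped score for a training pair is higher than the raw cosine, which is the "pushed closer" effect in miniature. A real system would learn the matrix from many annotated pairs and evaluate on held-out questions.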

Download the sources (under Apache v2 license)

PyRATA, Python Rule-based feAture sTructure Analysis

  • provides regular expression (re) matching methods on a more complex structure than a list of characters (string), namely a sequence of feature sets (i.e. a list of dicts in Python jargon);
  • is agnostic to the information encapsulated in the features and can therefore work with word features, sentence features, calendar event features… Indeed, PyRATA is not limited to processing textual data;
  • is fun and easy to use to explore data for research studies, solve deterministic problems, formulate expert knowledge in a declarative way, quickly prototype models, generate training data for Machine Learning (ML) systems, extract ML features, augment ML models…
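
To make the "list of dicts" idea concrete, here is a small self-contained sketch of pattern matching over a sequence of feature sets. It deliberately does not use PyRATA's API or pattern syntax: the `findall` and `step_matches` functions, and the representation of a pattern as a list of constraint dicts, are hypothetical simplifications (fixed-length patterns, equality constraints only, no quantifiers).

```python
def step_matches(constraint, features):
    """True if the feature set carries every key/value required by the constraint."""
    return all(features.get(key) == value for key, value in constraint.items())


def findall(pattern, data):
    """Return the non-overlapping runs of `data` in which each element
    satisfies the corresponding constraint of `pattern`, scanning left to right."""
    if not pattern:
        return []
    matches, i, n = [], 0, len(pattern)
    while i + n <= len(data):
        window = data[i:i + n]
        if all(step_matches(c, f) for c, f in zip(pattern, window)):
            matches.append(window)
            i += n  # resume after the match, as re.findall does
        else:
            i += 1
    return matches


# A sentence as a sequence of feature sets (one dict per token).
data = [
    {"raw": "It", "pos": "PRP"},
    {"raw": "is", "pos": "VBZ"},
    {"raw": "fast", "pos": "JJ"},
    {"raw": "regular", "pos": "JJ"},
    {"raw": "expressions", "pos": "NNS"},
]

# An adjective immediately followed by a plural noun.
pattern = [{"pos": "JJ"}, {"pos": "NNS"}]

found = findall(pattern, data)
print([[token["raw"] for token in match] for match in found])
```

Because the matcher only compares dictionary keys and values, the same function would work unchanged on calendar events or any other records, which is the point made in the second bullet.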

Download the sources (under Apache v2 license)