Home
====

Linalgo is a Python module that helps machine learning teams create and
curate datasets for Natural Language Processing. It tries to follow the W3C
Web Annotation Data Model and provides a system for adding metadata to the
most commonly used text and image formats: TXT, PDF, HTML, etc.

Installation
============

You can install linalgo using pip::

    pip install linalgo

For other options, see :ref:`install_page`.

Getting started
===============

Examples
--------

Examples will be available in the ``examples/`` folder.

Tutorials
---------

**Getting task data**

.. code-block:: python

    from linalgo.client import LinalgoClient

    # Credentials for your Linalgo account (left blank here)
    client_id = ''
    client_secret = ''
    linalgo_client = LinalgoClient(client_id, client_secret)

**Training a binary classifier**

.. code-block:: python

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    # Fetch the task using the client created in the previous tutorial
    task_id = 1
    task = linalgo_client.get_task(task_id)

    # Turn the task into documents and binary targets for one label
    label = 4
    docs, labels = task.transform(target='binary', label=label)

    # Hold out a third of the documents for evaluation
    X_train, X_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.33, random_state=42)

    # Bag-of-words counts, TF-IDF weighting, then logistic regression
    text_clf = Pipeline([
        ('vect', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', LogisticRegression()),
    ])
    text_clf.fit(X_train, y_train)
    predicted = text_clf.predict(X_test)

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`