facebookresearch/MUSE: A library for Multilingual Unsupervised or Supervised word Embeddings
MUSE is a Python library for multilingual word embeddings, built to serve the community. It includes two methods: a supervised one that uses a bilingual dictionary or identical character strings, and an unsupervised one that does not use any parallel data (see Word Translation without Parallel Data for more details).
MUSE runs on CPU or GPU, in Python 2 or 3.
The monolingual and cross-lingual word embedding evaluation datasets are obtained by running the provided download script in data/.
For pre-trained monolingual word embeddings, we recommend the fastText Wikipedia embeddings, or using fastText to train your own word embeddings on your corpus. The English (en) and Spanish (es) Wikipedia embeddings are among those available for download.
There are two ways to obtain cross-lingual word embeddings: the supervised way, which relies on a bilingual dictionary or identical character strings, and the unsupervised way.
Unsupervised: without any parallel data or anchor points, learn a mapping from the source space to the target space using adversarial training and (iterative) Procrustes refinement. Logs and embeddings are saved in the dumped/ directory. By default, the validation metric is the mean cosine of word pairs from a synthetic dictionary built with CSLS (Cross-domain Similarity Local Scaling).
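As a rough illustration of two of the ingredients named above, the sketch below shows an orthogonal Procrustes solve and a CSLS score matrix in NumPy. It is not MUSE's own code: the function names procrustes and csls_scores, the toy data, and the choice of k are assumptions made for this example, and the adversarial training step is left out entirely.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal W minimizing ||src @ W.T - tgt||_F.

    src and tgt are (n, d) arrays whose rows are paired word vectors.
    """
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt

def csls_scores(mapped_src, tgt, k=10):
    """CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y).

    r_tgt(x) is the mean cosine between x and its k nearest target words,
    r_src(y) the mean cosine between y and its k nearest source words.
    """
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = a @ b.T                                  # (n_src, n_tgt) cosines
    k_t = min(k, cos.shape[1])                     # neighbours among targets
    k_s = min(k, cos.shape[0])                     # neighbours among sources
    r_of_src = np.sort(cos, axis=1)[:, -k_t:].mean(axis=1)
    r_of_tgt = np.sort(cos, axis=0)[-k_s:, :].mean(axis=0)
    return 2 * cos - r_of_src[:, None] - r_of_tgt[None, :]

# Toy data (made up): the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                       # "source" vectors
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))       # hidden orthogonal map
Y = X @ Q.T                                        # "target" vectors
W = procrustes(X, Y)                               # recovers Q on this toy data
scores = csls_scores(X @ W.T, Y, k=5)              # higher score = better pair
```

In an iterative refinement loop, the best-scoring pairs under CSLS would serve as a synthetic dictionary for the next Procrustes solve, and the mean cosine of those pairs could be tracked as the validation metric.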
For some language pairs (e.g. en-zh), we recommend centering the embeddings with --normalize_embeddings center.
A simple tool is also included to evaluate the quality of monolingual or cross-lingual word embeddings on several tasks.
By default, the aligned embeddings are exported to a text format at the end of an experiment (--export txt). Exporting to a text file can take a while if you have a lot of embeddings. For a very fast export, you can set --export pth to write the embeddings to a PyTorch binary file, or simply disable the export (--export "").
When loading embeddings, several file formats are accepted, binary as well as text.
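To make the text format concrete, here is a minimal reader and writer sketch. It assumes the common word2vec/fastText .vec layout (a header line with the vocabulary size and the dimension, then one word and its vector components per line); the helper names write_vec and read_vec and the max_vocab parameter are hypothetical and are not part of MUSE.

```python
import io

def write_vec(words, vectors, fh):
    """Write embeddings as text: a 'count dim' header, then one word per line."""
    fh.write(f"{len(words)} {len(vectors[0])}\n")
    for word, vec in zip(words, vectors):
        fh.write(word + " " + " ".join(f"{x:.5f}" for x in vec) + "\n")

def read_vec(fh, max_vocab=None):
    """Read embeddings written by write_vec, optionally stopping after max_vocab words."""
    _count, dim = map(int, fh.readline().split())
    words, vectors = [], []
    for line in fh:
        if max_vocab is not None and len(words) >= max_vocab:
            break
        word, *values = line.rstrip().split(" ")
        if len(values) != dim:
            continue  # skip lines that do not match the declared dimension
        words.append(word)
        vectors.append([float(v) for v in values])
    return words, vectors

# Round trip through an in-memory buffer (toy vectors, made up for the example).
buffer = io.StringIO()
write_vec(["cat", "dog"], [[0.1, -0.2], [0.3, 0.4]], buffer)
buffer.seek(0)
loaded_words, loaded_vectors = read_vec(buffer)
```

Parsing every line as text is the reason this path is slow for large vocabularies compared with a binary dump.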
Loading from the binary formats is very fast (1 million embeddings in a few seconds), whereas loading from text files can take a while.
Multilingual word embeddings and bilingual dictionaries are available. These embeddings are fastText embeddings that have been aligned in a common space. The project releases fastText Wikipedia supervised word embeddings for 30 languages, aligned in a single vector space.
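Because aligned embeddings share one vector space, word translation can be approximated by a nearest-neighbor search across languages. The sketch below illustrates the idea with plain cosine similarity; the three-dimensional vectors are invented for the example (real embeddings are much larger), and the helper names cosine and translate are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate(word, src_emb, tgt_emb):
    """Return the target word whose vector is closest to the source word's vector."""
    query = src_emb[word]
    return max(tgt_emb, key=lambda candidate: cosine(query, tgt_emb[candidate]))

# Toy aligned embeddings (made-up values, not taken from the released files).
en = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]}
es = {"gato": [0.8, 0.2, 0.1], "perro": [0.2, 0.8, 0.1]}

best_for_cat = translate("cat", en, es)
best_for_dog = translate("dog", en, es)
```

Plain cosine search is the simplest choice here; a CSLS-based score could be swapped in for the cosine function to reduce the effect of hub words.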
The goal is to ease the development and evaluation of cross-lingual word embeddings and multilingual NLP. MUSE is the project at the origin of the work on unsupervised machine translation using monolingual data only [2].