We present an instance of MaltParser trained for Spanish. The parser was trained on the IULA Spanish LSP Treebank, which contains more than 42,000 sentences and almost 590,000 tokens. To achieve optimal performance, we used MaltOptimizer to select the parameters for training MaltParser.
We ran evaluation experiments using 80% of the IULA Spanish LSP Treebank for training and the remaining 20% for testing. We also evaluated the model on the Tibidabo corpus, which is a good evaluation set because its sentences are of a very different kind (they come from the newspaper domain).
The training and test corpora are available for download for Machine Learning experiments. This way, different researchers can use the same partitions and compare their results directly.
The following table summarizes the evaluation results on the two corpora:

| Corpus | LAS (labeled attachment score) | LCM (labeled complete match) |
|---|---|---|
| IULA Spanish LSP Treebank test set | 93.14% | 47.60% |
| Tibidabo Treebank | 88.95% | 36.20% |
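As an illustration of how the two scores are computed, here is a minimal Python sketch: a token counts toward LAS only if both its head and its dependency label match the gold annotation, and a sentence counts toward LCM only if all of its tokens do. It assumes gold and predicted sentences are already aligned token by token and given as lists of (head, label) pairs; the function name `las_lcm` and the toy labels are hypothetical. It scores every token, whereas official evaluation scripts may apply other conventions (for example, for punctuation), so it is not guaranteed to reproduce the figures above exactly.

```python
from typing import List, Tuple

# One token's dependency annotation: (head index, dependency label).
Arc = Tuple[int, str]
Sentence = List[Arc]


def las_lcm(gold: List[Sentence], pred: List[Sentence]) -> Tuple[float, float]:
    """Return (LAS, LCM) as percentages over aligned gold/predicted sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora have different sentence counts")
    correct_tokens = 0
    total_tokens = 0
    complete_matches = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentence length mismatch between gold and prediction")
        # A token is correct only if both its head and its label match.
        hits = sum(1 for g, p in zip(g_sent, p_sent) if g == p)
        correct_tokens += hits
        total_tokens += len(g_sent)
        # LCM: every token in the sentence must be correct.
        if hits == len(g_sent):
            complete_matches += 1
    las = 100.0 * correct_tokens / total_tokens
    lcm = 100.0 * complete_matches / len(gold)
    return las, lcm


if __name__ == "__main__":
    # Toy data with hypothetical labels, not taken from the treebank.
    gold = [[(2, "SUBJ"), (0, "ROOT")], [(0, "ROOT"), (1, "DO")]]
    pred = [[(2, "SUBJ"), (0, "ROOT")], [(0, "ROOT"), (1, "SUBJ")]]
    print(las_lcm(gold, pred))
```

On the toy data, three of four tokens and one of two sentences are fully correct, so the script prints `(75.0, 50.0)`.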
The resulting parser can be used in two different ways:
1. Accessing the malt_parser web service
MaltParser is expected to perform better when the PoS tags and tokens of its input are as similar as possible to those used to train it, that is, to the PoS tagging and tokenization of the IULA Spanish LSP Treebank. For that reason, we developed a package that performs the PoS tagging step with FreeLing, configured as it was when the Treebank was built, and then calls the MaltParser model to obtain the dependencies. This package has been deployed as a web service and can be used freely to parse plain-text sentences.
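The internals of the package are not described here. As a rough sketch of the hand-off between the tagging step and the parser, the function below turns tagged tokens into CoNLL-X rows, the format MaltParser reads. It assumes the tagger output is available as (form, lemma, tag) triples; the triple layout, the function name `to_conllx`, and the choice of the tag's first character as the coarse tag (for EAGLES-style tags, such as those FreeLing produces for Spanish, the first character encodes the main category) are all hypothetical, and the real service may fill the columns differently. HEAD and DEPREL are left as `_` because the parser fills them in.

```python
from typing import Iterable, List, Tuple

# Hypothetical tagger output: (form, lemma, PoS tag) for each token.
TaggedToken = Tuple[str, str, str]


def to_conllx(sentence: Iterable[TaggedToken]) -> str:
    """Render one tagged sentence as CoNLL-X rows (10 tab-separated columns)."""
    rows: List[str] = []
    for idx, (form, lemma, tag) in enumerate(sentence, start=1):
        # Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL.
        # Assumption: the coarse tag is the first character of the full tag.
        cols = [str(idx), form, lemma, tag[:1], tag, "_", "_", "_", "_", "_"]
        rows.append("\t".join(cols))
    # In CoNLL-X, a blank line terminates each sentence.
    return "\n".join(rows) + "\n\n"


if __name__ == "__main__":
    print(to_conllx([("El", "el", "DA0MS0"), ("perro", "perro", "NCMS000"),
                     ("ladra", "ladrar", "VMIP3S0")]), end="")
```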
2. Running the Malt parser Spanish module espmalt-1.0.mco

Download: Malt parser Spanish module espmalt-1.0.mco (e-repositori)
The file espmalt-1.0.mco contains a single MaltParser configuration for parsing Spanish text. To run the Malt parser Spanish module, follow the steps below:
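A minimal sketch of a parse run, assuming MaltParser's standard command line (`java -jar <jar> -c <configuration> -i <input> -o <output> -m parse`): the call is made from the directory that holds espmalt-1.0.mco, and the configuration is named without its `.mco` extension. The wrapper functions `build_parse_command` and `run_parse` and the jar file name are hypothetical, and the MaltParser release used presumably needs to be compatible with the one the model was built with. The input should be a CoNLL-X file whose tokens and PoS tags follow the IULA Treebank conventions.

```python
import subprocess
from pathlib import Path
from typing import List


def build_parse_command(jar: str, config: str, input_path: str, output_path: str) -> List[str]:
    """Build a MaltParser command line in parse mode (hypothetical helper)."""
    return ["java", "-jar", jar,
            "-c", config,       # configuration name, e.g. espmalt-1.0 for espmalt-1.0.mco
            "-i", input_path,   # CoNLL-X input with PoS tags filled in
            "-o", output_path,  # CoNLL-X output with HEAD and DEPREL added
            "-m", "parse"]      # apply an existing model instead of training one


def run_parse(jar: str, input_path: str, output_path: str,
              config: str = "espmalt-1.0", workdir: str = ".") -> None:
    """Run MaltParser inside `workdir`, which must contain <config>.mco.

    Relative jar, input and output paths are resolved against `workdir`.
    """
    if not (Path(workdir) / f"{config}.mco").exists():
        raise FileNotFoundError(f"{config}.mco not found in {workdir}")
    cmd = build_parse_command(jar, config, input_path, output_path)
    # check=True raises CalledProcessError if MaltParser exits with an error.
    subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `run_parse("maltparser-1.x.jar", "input.conll", "output.conll")`, where the jar name is a placeholder for the downloaded MaltParser release.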
3. MaltParser for Spanish training and test corpora

Download: MaltParser for Spanish training and test corpora (e-repositori)
In this package we offer a partition of the IULA Spanish LSP Treebank into training and test sets, and we also deliver the Tibidabo Treebank (Marimon 2010), a set of sentences extracted from the AnCora corpus and annotated in the same way as the IULA Treebank. The Tibidabo Treebank is a very good test set for models trained on the IULA Spanish LSP Treebank, because its sentences come from a very different domain. From the IULA Spanish LSP Treebank, we took 80% for training and 20% for testing. The following table summarizes the size of these two partitions plus the Tibidabo Treebank:
All these corpora follow the CoNLL-X shared task format, with the same syntactic function (dependency label) values as the IULA Spanish LSP Treebank.
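The exact procedure used to draw the 80/20 partition is not stated, and the released partition should be used to keep results comparable. Purely to illustrate what a sentence-level split of a CoNLL-X file involves, the sketch below reads sentences (blocks of 10-column, tab-separated rows separated by blank lines) and cuts the list at 80%. The function names are hypothetical, and taking the first 80% of sentences contiguously is an assumption, not necessarily the authors' method.

```python
from typing import List, Tuple

Sentence = List[List[str]]  # each token row holds the 10 CoNLL-X columns


def read_conllx(text: str) -> List[Sentence]:
    """Split CoNLL-X text into sentences; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        if line.strip():
            cols = line.split("\t")
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            current.append(cols)
        elif current:
            sentences.append(current)
            current = []
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


def split_train_test(sentences: List[Sentence],
                     train_ratio: float = 0.8) -> Tuple[List[Sentence], List[Sentence]]:
    """Contiguous split by sentence, so no sentence is divided between sets."""
    cut = int(len(sentences) * train_ratio)
    return sentences[:cut], sentences[cut:]
```

Splitting by sentence rather than by line keeps every dependency tree intact on one side of the partition.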
4. More information

For more information:
5. Acknowledgments

The development of the Malt Parser for Spanish was funded by the PANACEA project (7FP-ITC-248064). We thank the creators of the IULA Treebank, which was used to train this MaltParser model for Spanish. The creation of the Treebank was funded by the METANET4U project (CIP-PSP-270893) and IULA. We also thank Miguel Ballesteros for his kind help with MaltOptimizer.
© INSTITUT DE LINGÜÍSTICA APLICADA - UNIVERSITAT POMPEU FABRA