HunPos (Halácsy et al., 2007) is an open-source reimplementation of the statistical part-of-speech tagger Trigrams'n'Tags, also called TnT (Brants, 2000).

MODEL
=====

A POS tagging model for HunPos (Halácsy et al., 2007), trained on a mix of data from

- the TiGer treebank (Brants et al., 2002),
- the harmonised POS testsuite (Rehbein et al., 2018), and
- self-training data from Twitter.

USAGE
=====

Using the model (on a Linux machine):

1) Before you can use the model, you need to download the HunPos tagger (https://code.google.com/archive/p/hunpos/). For Linux, download the file hunpos-1.0-linux.tgz and decompress it with

    tar -xzf hunpos-1.0-linux.tgz

2) Go to the directory "hunpos-1.0-linux", which contains 2 files (hunpos-tag, hunpos-train). Copy hunpos-social-media.model into this directory, or give its full path in the commands below.

3) Test the tagger with the following command:

    echo "Heute war neben dem Mumien schiebe Tag im Aldi auch der ich bleibe einfach stehen und bewege mich gar nicht mehr Tag im #Aldi" | sed 's/ /\n/g' | ./hunpos-tag hunpos-social-media.model

The output should look like this (one word and its tag per line):

    Heute	ADV
    war	VAFIN
    neben	APPR
    dem	ART
    Mumien	NN
    schiebe	VVFIN
    Tag	NN
    im	APPRART
    Aldi	NE
    auch	ADV
    der	PRELS
    ich	PPER
    bleibe	VVFIN
    einfach	ADJD
    stehen	VVINF
    und	KON
    bewege	VVFIN
    mich	PRF
    gar	ADV
    nicht	PTKNEG
    mehr	PIAT
    Tag	NN
    im	APPRART
    #Aldi	NE

The model is not perfect: for example, "der" is tagged as a relative pronoun (PRELS), although here it is the article of the long ad-hoc compound ending in "Tag". Still, most tokens in this noisy social media sentence receive a plausible tag.

To tag your own input files, make sure the text is in a one-word-per-line format (see example.txt). Then run the tagger with:

    cat example.txt | ./hunpos-tag hunpos-social-media.model > output.txt

REFERENCES
==========

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith (2002): The TIGER treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories (TLT 2002), Sozopol, Bulgaria.

Thorsten Brants (2000): TnT -- A Statistical Part-of-Speech Tagger. In Proceedings of ANLP-2000, Seattle, WA.

Péter Halácsy, András Kornai, and Csaba Oravecz (2007): HunPos: An open source trigram tagger. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL 2007), Prague, Czech Republic.

Ines Rehbein, Josef Ruppenhofer, and Victor Zimmermann (2018): A harmonised testsuite for POS tagging of German social media data. In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018), Vienna, Austria.
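Preparing the one-word-per-line input described under USAGE, and reading the tagged output, can be scripted with two small shell functions. This is a minimal sketch under stated assumptions, not part of HunPos: the function names to_hunpos_input and from_hunpos_output are hypothetical; tokenization is naive (split on whitespace only, so punctuation stays attached to words); each input line is treated as one sentence (e.g. one tweet per line) and followed by an empty line; and hunpos-tag is assumed to print one word<TAB>tag line per token, as in the sample output in the USAGE section, with an empty line after each sentence.

```shell
# Hypothetical helper (not part of HunPos): read plain text on stdin and
# write one token per line. Assumptions: tokens are separated by whitespace
# only, and every non-empty input line is one sentence, so an empty line is
# printed after it as a sentence boundary.
to_hunpos_input() {
  awk '
    NF > 0 {
      for (i = 1; i <= NF; i++) print $i   # one token per line
      print ""                             # empty line ends the sentence
    }
  '
}

# Hypothetical helper (not part of HunPos): read tagger output on stdin,
# assumed to be "word<TAB>tag" lines with empty lines between sentences,
# and print each sentence on one line as word/tag pairs.
from_hunpos_output() {
  awk -F '\t' '
    NF >= 2 { line = (line == "" ? "" : line " ") $1 "/" $2; next }
    NF == 0 && line != "" { print line; line = "" }
    END { if (line != "") print line }   # flush a final sentence without a trailing empty line
  '
}
```

Usage, from the hunpos-1.0-linux directory, with tweets.txt standing in for a hypothetical file containing one tweet per line:

    cat tweets.txt | to_hunpos_input | ./hunpos-tag hunpos-social-media.model | from_hunpos_output

Treating each line as a sentence avoids a sentence splitter, which suits tweets; for longer running text, a proper tokenizer and sentence splitter would give the tagger better input.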