Search Results

open access

Phrasal Proper Names in German and Norwegian

Description: Article discusses the morpho-syntax of phrasal proper names like Deutsche Bahn 'German Railway' and Norske Skog 'Norwegian Forest' in German and Norwegian. The authors document that phrasal proper names may show features of recursivity, evidenced most clearly in Norwegian.
Date: September 9, 2023
Creator: Julien, Marit & Roehrs, Dorian
Partner: UNT College of Information
captions transcript

CoRSAL 6th Annual Planning Meeting

Description: Video recording of the 6th Annual Planning Meeting of the Computational Resource for South Asian Languages (CoRSAL), which focused on collecting, organizing, and archiving materials for dictionaries.
Date: September 30, 2022
Duration: 4 hours 32 minutes 43 seconds
Creator: University of North Texas. Department of Linguistics.
Partner: UNT College of Information
open access

What do complexity measures measure? Correlating and validating corpus-based measures of morphological complexity

Description: Article presents an analysis of eight measures used to quantify the morphological complexity of natural languages. The measures studied are corpus-based and vary in the corpus annotation they require.
Date: September 22, 2022
Creator: Çöltekin, Çağrı & Rama, Taraka
Partner: UNT College of Information
open access

User needs in language archives: Findings from interviews with language archive managers, depositors, and end-users

Description: This article is an exploratory study that provides empirical data on language archive user needs and supports some anecdotal evidence of known issues facing language archive end-users, depositors, and managers, primarily in academic contexts.
Date: April 2022
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Partner: University of North Texas
open access

Challenges to Representing Personal Names and Language Names in Language Archives: Examples from Northeast India

Description: Article reviewing one particular challenge to data management relevant to South Asia: the complexity of names (of individuals, groups, and languages). It was presented at the 1st International Workshop on Digital Language Archives, held September 30-October 1, 2021, as part of the ACM/IEEE Joint Conference on Digital Libraries 2021.
Date: October 7, 2021
Creator: Burke, Mary & Chelliah, Shobhana Lakshmi
Partner: UNT College of Information
open access

Proceedings of the International Workshop on Digital Language Archives: LangArc 2021

Description: Conference proceedings of the 1st International Workshop on Digital Language Archives held on September 30-October 1, 2021 as part of the ACM/IEEE Joint Conference on Digital Libraries 2021. It includes 14 peer-reviewed papers that were presented at the workshop and an introduction from the workshop organizers.
Date: October 7, 2021
Creator: Zavalina, Oksana & Chelliah, Shobhana Lakshmi
Partner: UNT College of Information
captions transcript

CoRSAL 5th Annual Planning Meeting

Description: Video recording of the 5th Annual Planning Meeting of the Computational Resource for South Asian Languages (CoRSAL), which focused on increasing engagement with CoRSAL through social media.
Date: October 1, 2021
Duration: 3 hours 35 minutes 21 seconds
Creator: University of North Texas. Department of Linguistics.
Partner: UNT College of Information

Leveraging Digital Library Infrastructure to Build a Language Archive

Description: Presentation describing the ongoing CoRSAL (Computational Resource for South Asian Languages) project, including background on the UNT Digital Library infrastructure and metadata schema, specific fields that have presented issues or areas of discussion for language data records (language, creator/contributor, and relation), and final conclusions about the collaboration so far.
Date: September 30, 2021
Creator: Phillips, Mark Edward & Tarver, Hannah
Partner: UNT Libraries
open access

Synthetic data for annotation and extraction of family history information from clinical text

Description: This article investigates the use of synthetic data for the annotation and automated extraction of family history information relating to cases of cardiac disease from Norwegian clinical text. This work assesses the validity and applicability of the annotated synthetic corpus using machine learning techniques. The methodology outlined in this article may be useful in other situations where limited availability of clinical text hinders NLP tasks.
Date: July 14, 2021
Creator: Brekke, Pål H.; Kasicheyanula, Taraka; Pilán, Ildikó; Nytrø, Øystein & Øvrelid, Lilja
Partner: UNT College of Information
open access

Neural classification of Norwegian radiology reports: using NLP to detect findings in CT-scans of children

Description: The authors trained machine learning models to classify Norwegian radiology reports of pediatric CT examinations according to their description of abnormal findings. The resulting models are robust with respect to different contexts and may be used in quality assurance processes.
Date: March 4, 2021
Creator: Dahl, Fredrik A.; Rama, Taraka; Hurlen, Petter; Brekke, Pål H.; Husby, Haldor; Gundersen, Tore et al.
Partner: UNT College of Information
open access

It’s not a Non-Issue: Negation as a Source of Error in Machine Translation

Description: Article investigates whether translating negation is an issue for modern MT systems, using 17 translation directions as a test bed, and provides a linguistically motivated analysis that explains the majority of the findings. The authors release their annotations and code for replicating the analysis at https://github.com/mosharafhossain/negation-mt.
Date: November 2020
Creator: Hossain, Md Mosharaf; Blanco, Eduardo; Palmer, Alexis & Anastasopoulos, Antonios
Partner: University of North Texas

Archives: Perspectives from Three Scholars of Tibeto-Burman

Description: Presentation on the author's interviews with three scholars of Bodish languages about their experiences, materials, and preservation plans, and their reflections on language archives, particularly with regard to accessibility and usability for pedagogy. It was presented at the CoRSAL 4th Annual Meeting held on October 1, 2020.
Date: October 1, 2020
Creator: Hildebrandt, Kristine
Partner: UNT College of Information
open access

A test of Generalized Bayesian dating: A new linguistic dating method

Description: Article addressing whether a new Bayesian framework can be introduced and how subjectivity can be overcome. The authors introduce a new method called Generalized Bayesian Dating (GBD) for inferring dates of language groups from lexical and phonological data. This work has implications for future performance testing in the area of linguistic dating.
Date: August 12, 2020
Creator: Kasicheyanula, Taraka & Wichmann, Søren
Partner: UNT College of Information
open access

Hierarchical Coding Scheme: Exploring Methods and Techniques for Facilitating Access to Digital Language Archives

Description: This is the hierarchical coding scheme used for qualitative analysis of interviews with language archive managers, depositors, and end-users as part of the 'Exploring Methods and Techniques for Facilitating Access to Digital Language Archives' project (January 2019-August 2020).
Date: June 2020
Creator: Burke, Mary; Zavalina, Oksana; Chelliah, Shobhana Lakshmi & Phillips, Mark Edward
Partner: UNT College of Information
open access

WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts

Description: Article presents WikiPossessions, a new benchmark corpus for the task of temporally oriented possession (TOP), or tracking objects as they change hands over time. In addition to the corpus, the authors release evaluation scripts and a baseline model for the task.
Date: May 2020
Creator: Blanco, Eduardo; Palmer, Alexis & Chinnappa, Dhivya
Partner: University of North Texas
open access

A Corpus of Negations and their Underlying Positive Interpretations

Description: Article presenting a corpus of negations and their underlying positive interpretations, built by taking negations from Simple Wikipedia, automatically generating potential positive interpretations, and collecting manual annotations that effectively rewrite the negation in positive terms. This article was presented at the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) in Minneapolis, Minnesota, June 6-7, 2019.
Date: June 2019
Creator: Sarabi, Zahra; Killian, Erin; Blanco, Eduardo & Palmer, Alexis
Partner: University of North Texas
open access

An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics

Description: Article presents a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference.
Date: 2019
Creator: Kasicheyanula, Taraka & List, Johann-Mattis
Partner: UNT College of Information
captions transcript

Discussions about documenting and archiving languages

Description: Video of one group discussion during the break-out session at the 2018 CoRSAL Symposium on Developing Infrastructure for a Computational Resource on South Asian Languages. Participants broke up into two groups to discuss various issues that emerge during the process of documenting and archiving languages.
Date: November 1, 2018
Duration: 1 hour 12 seconds
Creator: University of North Texas. Department of Linguistics.
Partner: UNT College of Information