What
Hyvmind is a 'research to earn' app for lawyers. It rewards them for contributing to an open source semantic-data layer.
Why
Human annotated datasets play a critical role in training language models. But sourcing them is considered expensive and time-consuming. The irony is that researchers invariably create such annotations for themselves, without being aware of their value.
How
Imagine an app where legal researchers could (a) own each annotation that they make on a public document, (b) encrypt and preserve it as a private asset or deposit it in a common pool for rewards, (c) collect assets shared by others, and (d) combine owned and collected assets to create private curations and knowledge subgraphs.
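The four capabilities above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not Hyvmind's actual design: the names `Annotation`, `Researcher`, `CommonPool` and `curate`, the hash-based asset id, and the flat reward of one unit per deposit are all hypothetical. Encryption of private assets and the ledger itself are not modelled.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A note made by one researcher on a span of a public document (hypothetical model)."""
    owner: str
    document_id: str
    span: tuple[int, int]  # (start, end) character offsets in the source text
    note: str

    @property
    def asset_id(self) -> str:
        # Assumption: an asset is identified by a hash of its content.
        payload = f"{self.owner}|{self.document_id}|{self.span}|{self.note}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Researcher:
    name: str
    owned: dict[str, Annotation] = field(default_factory=dict)
    collected: dict[str, Annotation] = field(default_factory=dict)
    rewards: int = 0

    def annotate(self, document_id: str, span: tuple[int, int], note: str) -> Annotation:
        # (a) every annotation is owned by the researcher who made it
        annotation = Annotation(self.name, document_id, span, note)
        self.owned[annotation.asset_id] = annotation
        return annotation


@dataclass
class CommonPool:
    assets: dict[str, Annotation] = field(default_factory=dict)

    def deposit(self, researcher: Researcher, annotation: Annotation, reward: int = 1) -> None:
        # (b) only the owner may deposit; a repeated deposit earns nothing
        if annotation.asset_id not in researcher.owned:
            raise PermissionError("only the owner can deposit an annotation")
        if annotation.asset_id in self.assets:
            return
        self.assets[annotation.asset_id] = annotation
        researcher.rewards += reward

    def collect(self, researcher: Researcher, asset_id: str) -> Annotation:
        # (c) any researcher may collect an asset that has been shared
        annotation = self.assets[asset_id]
        researcher.collected[asset_id] = annotation
        return annotation


def curate(researcher: Researcher, asset_ids: list[str]) -> list[Annotation]:
    # (d) a private curation combines owned and collected assets;
    # a KeyError is raised for any asset the researcher does not hold
    held = {**researcher.owned, **researcher.collected}
    return [held[asset_id] for asset_id in asset_ids]
```

In this sketch, a researcher who deposits an annotation is rewarded once, another researcher can collect it from the pool, and either can pass a list of asset ids to `curate` to assemble a private collection. A real system would replace the in-memory dictionaries with ledger entries and would encrypt anything kept private.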
Whitepaper Abstract
While the superiority of language models over traditional search engines has become clear in recent years, their ‘generativity’ remains a serious concern for policymakers. On the input side, it is generally acknowledged that most large models are trained on unethically sourced data. On the output side, their tendency to hallucinate and misinform makes them unfit for domain-specific work. This paper takes the view that explainability and transparency cannot be achieved simply by putting ‘a human in the loop’. Taking legal work as a concrete site, it proposes Hyvmind, an architecture that puts ‘humans in the centre’ by recording and rewarding semantic labour through tokenised annotations. Its novelty lies in conceptualising legal research as a set of four interconnected functions (source, watch, frame and curate) around a common data object (the source text). By storing and rewarding annotative work through a distributed ledger system with nested states, it creates a secure, ethical and organic pathway for generating high-quality datasets for the next generation of domain-specific language models.
Legal research is plagued by inefficiencies that result from siloed annotations. Without transparent and reliable incentives for sharing notes, collaboration is gridlocked and work is needlessly duplicated. At the same time, the absence of high-quality, ethically sourced, human-annotated datasets impedes the development of reliable, domain-specific language models for the legal field. Current AI models often rely on unethically sourced data (or on noisy and synthetic data parsed from documents), which leads to inaccuracies and hallucinations and makes them unsuitable for legal applications. There is an urgent need for a system that recognises and rewards the semantic labour of legal researchers while providing high-quality datasets for AI training, evaluation and benchmarking.
Hyvmind addresses this problem by creating a reward channel that turns siloed legal annotations into community-owned vector databases.
Concrete Deliverables
An open-source data layer for legal semantics will be invaluable for training, benchmarking and evaluating legal AI models. Hyvmind will ensure that such models get an organic supply of accurate and ethically sourced subgraphs with real-time weights. Since each annotation is tokenised and verified at root, Hyvmind will create transparent 'pay as you go' pathways for any models, curators or subscribers seeking access to accurate and context-rich legal data.
Want to grow through just grant funding
India
50.98 USD